FieldPerceiver: Domain agnostic transformer model to predict multiscale physical fields and nonlinear material properties through neural ologs
MJ Buehler, MATERIALS TODAY, 57, 9-25 (2022).
DOI: 10.1016/j.mattod.2022.05.020
Attention-based transformer neural networks have had significant impact in recent years; however, their applicability to modeling the behavior of physical systems has not yet been broadly explored. This is partly due to the high computational burden arising from the nonlinear scaling of very deep models, which has prevented application to a range of physical systems, in particular those involving complex field data. Here we report the development of FieldPerceiver, a general-purpose attention-based deep neural network model built on multi-headed self-attention. The model effectively predicts physical field data, such as stress, energy, and displacement fields, as well as overall material properties that characterize the statistics of stress distributions under applied loading and crack defects, solely from descriptive input that characterizes the material microstructure as a set of interacting building blocks, all while capturing extreme short- and long-range relationships. Rather than using images as input, the model realizes a neural olog description of materials in which the categorization is learned by multi-headed attention. It has no domain knowledge in its formulation, uses no convolutional layers, scales well to extremely large sizes, and has no knowledge of the specific properties of the material building blocks. As applied to the fracture mechanics problem considered here, the model captures size, orientation, and geometry effects of crack problems for near- and far-field predictions, offering an alternative way to model materials failure based on language modeling, without the convolutional layers commonly used for similar problems. We show that the FieldPerceiver can be used in a general framework in which insights learned during an initial, general training stage are fine-tuned for new scenarios, even with only small additional datasets, revealing its broad generalization capacity. Once trained, the model can make predictions for thousands of scenarios within a few minutes of compute time; computing similar output with molecular dynamics simulation, for instance, would take tens of hours, days, or months.
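The abstract does not include code, so the following is only a minimal illustrative sketch in PyTorch of the general idea it describes, not the paper's FieldPerceiver architecture: a small transformer encoder using multi-headed self-attention, with no convolutional layers, that maps a tokenized text description of a microstructure to a flattened field prediction and a scalar property. The class name, vocabulary size, model dimensions, head counts, and field shape are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

class FieldTransformer(nn.Module):
    """Illustrative sketch: encode a tokenized microstructure description
    with multi-headed self-attention (no convolutions) and decode it into
    a flattened 2D field plus a scalar material property."""
    def __init__(self, vocab_size=256, d_model=128, n_heads=8,
                 n_layers=4, max_len=512, field_shape=(32, 32)):
        super().__init__()
        self.field_shape = field_shape
        self.embed = nn.Embedding(vocab_size, d_model)
        # Learned positional embedding for the token sequence
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        n_pix = field_shape[0] * field_shape[1]
        self.field_head = nn.Linear(d_model, n_pix)  # e.g. a stress field
        self.prop_head = nn.Linear(d_model, 1)       # e.g. a scalar property

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer IDs describing building blocks
        x = self.embed(tokens) + self.pos[:, :tokens.size(1)]
        x = self.encoder(x)        # multi-headed self-attention layers
        pooled = x.mean(dim=1)     # sequence-level summary vector
        field = self.field_head(pooled).view(-1, *self.field_shape)
        prop = self.prop_head(pooled).squeeze(-1)
        return field, prop

# Usage: predict a 32x32 field and a scalar property from token sequences.
model = FieldTransformer()
tokens = torch.randint(0, 256, (2, 100))
field, prop = model(tokens)
print(field.shape, prop.shape)  # torch.Size([2, 32, 32]) torch.Size([2])
```

In the workflow the abstract describes, such a model would first be trained on a large general dataset and then fine-tuned on small task-specific datasets; the sketch above only illustrates the convolution-free, attention-based mapping from descriptive tokens to field output.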