Open Access Article
Mingjian
Wen
*a,
Matthew K.
Horton
bc,
Jason M.
Munro
b,
Patrick
Huck
d and
Kristin A.
Persson
ef
aChemical and Biomolecular Engineering, University of Houston, Houston, 77204, TX, USA. E-mail: mjwen@uh.edu
bMaterials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, 94720, CA, USA
cMicrosoft Research, Redmond, 98052, WA, USA
dEnergy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, 94720, CA, USA
eMolecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, 94720, CA, USA
fDepartment of Materials Science and Engineering, University of California, Berkeley, Berkeley, 94720, CA, USA
First published on 5th February 2024
The elasticity tensor is a fundamental material property that describes the elastic response of a material to external force. The availability of full elasticity tensors for inorganic crystalline compounds, however, is limited due to experimental and computational challenges. Here, we report the materials tensor (MatTen) model for rapid and accurate prediction of the full fourth-rank elasticity tensors of crystals. Based on equivariant graph neural networks, MatTen satisfies two essential requirements for elasticity tensors: independence of the frame of reference and preservation of material symmetry. Consequently, it provides a unified treatment of elasticity tensors for all seven crystal systems across diverse chemical spaces, without the need to deal with each separately. MatTen was trained on a dataset of first-principles elasticity tensors garnered by the Materials Project over the past several years (we are releasing the data herein) and has broad applications in predicting the isotropic elastic properties of polycrystalline materials, examining the anisotropic behavior of single crystals, and discovering materials with exceptional mechanical properties. Using MatTen, we have found a hundred crystals with extremely large maximum directional Young's modulus and eleven polymorphs of elemental cubic metals with unconventional spatial orientation of Young's modulus.
In spite of the importance, elasticity tensor data from experimental measurement is limited to a small number of materials. For example, for inorganic crystalline compounds, experimental data is only on the order of a few hundred, considering entries both tabulated in handbooks and scattered in individual papers.10 The major difficulty lies in preparing large enough single crystals for accurate experimental measurement using current techniques such as resonant acoustic spectroscopy.11 In the past decade, efficient and reliable electronic structure calculation methods such as density functional theory (DFT)12 with automation frameworks13–15 have enabled high-throughput computational investigation of materials. Using this approach in the Materials Project,16 we produced an initial dataset of elasticity tensors for 1181 crystals in 2015,10 which has been expanded over time to 10
276, which we now release as a new dataset with this work. Nevertheless, this only accounts for 6.6% of the more than 154
000 crystals in the Materials Project, let alone the even greater number of crystals recorded in the Inorganic Crystal Structure Database (ICSD)17 and predicted by universal interatomic potentials.18
It is therefore no surprise that machine learning (ML) has gathered substantial interest as a means to develop efficient surrogate models for the prediction of elastic properties. In a nutshell, state-of-the-art ML models for elastic properties encode compositional information19–21 and/or structural information20–23 in a material as feature vectors and then map them to a target using some regression algorithms. This approach is adopted in many existing works for learning elastic properties (e.g., for alloys24–27 and polycrystals28,29) and related atomic properties like stress and energy fields.30 They are, however, limited to derived scalar elastic properties such as bulk modulus and shear modulus, and separate models are built for each derived property. Ideally, one would hope to predict the full elasticity tensor, from which all other elastic properties can be derived. Along this direction, there have been attempts to predict individual tensor components.31,32 These models are great first steps, but essentially they still predict separate scalars in a specific coordinate system and thus unavoidably violate the transformation rules of tensors.
Geometric machine learning33 such as equivariant graph neural networks (GNNs)34–38 and equivariant kernel methods39,40 can directly operate in the space of high-rank tensors and adhere to their transformation rules. The main use case is still for scalar molecular and materials properties, but a couple of works have explored the application in predicting tensorial properties such as the molecular dipole moment,40,41 magnetic moment,42 and dielectric response.39 Other applications that do not directly predict final tensorial targets have also successfully taken advantage of internal tensorial representations to learn scalar fields such as molecular electron density.43,44 Although these efforts focus on scalars or low-rank tensors, they demonstrate the viability of direct machine learning of the full fourth-rank elasticity tensor.
In this work, we develop the Materials Tensor (MatTen) model for a rapid and accurate estimate of the fourth-rank elasticity tensors of inorganic compounds. Our model, based on equivariant GNNs, takes a crystal structure as input and outputs its full elasticity tensor with all symmetry-related transformation rules automatically satisfied. Other elastic properties such as bulk modulus and shear modulus can then be derived from the full elasticity tensor. The model satisfies two essential symmetry requirements for elasticity tensors: independence of the frame of reference, meaning that the choice of a specific coordinate system does not affect the model output, and preservation of material symmetry, meaning that the symmetry in a crystal is captured and reflected in the output elasticity tensor. The capabilities of MatTen are demonstrated via the study of both isotropic and anisotropic elastic properties. Using MatTen, we screened the Materials Project database for the identification of materials with a large maximum directional Young's modulus. On average, the values of the newly found materials are more than three times larger than existing data, as verified by first-principles calculations. In addition, we have identified 11 unconventional polymorphs of elemental cubic metals whose maximum directional Young's moduli are in the 〈100〉 directions.
The intrinsic material symmetry of a crystal can further reduce the number of independent components.8,47 For example, copper is a cubic crystal with mirror planes and three-fold rotation axes (point group m
m), and such symmetry results in a number of only three independent components. Formally, the material symmetry imposes the following constraints on the elasticity tensor:45,49
| Cijkl = QipQjqQkrQlsCpqrs, | (1) |
It turns out that there exists only eight distinct classes (Fig. 1), proved via a purely algebraic approach by directly identifying the equivalence classes corresponding to eqn (1).49 Of the eight classes, one is for isotropic materials, and each of the other seven corresponds to a crystal system.45,50 In our opinion, there is still significant confusion on this topic. For example, the categorization by Wallace51 and populated by Nye,47 which incorrectly gives two unique classes for each of the tetragonal and trigonal cases (Fig. S1 in the ESI†), is still widely cited in recent works.52–54 We refer to Section 6.5 of ref. 45 for a historical note on the development of the categorization.
![]() | ||
| Fig. 1 Symmetry classes and independent components of the elasticity tensor. The value in the parentheses after the name indicates the number of independent components. All matrices are symmetric about the main diagonal, with the lower triangular part omitted in the depiction. For single crystals considered in this work, the isotropic case does not apply. See ref. 45 for a detailed treatment of the classification. | ||
The Voigt matrix provides a visual way to observe the material symmetry of a crystal reflected in the elasticity tensor (Fig. 1). The values of the matrix components, however, depend on the choice of the coordinate system and do not show any systematic pattern upon coordinate transformation,55,56 making it difficult to build predictive models for elasticity tensors. This can be overcome by the harmonic decomposition,57 where the space of all elasticity tensors is factored into the direct sum of irreducible representations of SO(3). Consequently, any elasticity tensor can be written in the form,
| C = h1(λ) + h2(η) + h3(A) + h4(B) + h5(H), | (2) |
In the GNN model (Fig. 2), the input crystal is represented as a graph
, with atoms as the nodes V and bonds as the edges E. The feature Fi ∈ V characterizes atom i, and the initial value of Fi is obtained by encoding the atomic number Zi using a one-hot scheme. A bond/edge between two atoms is created if the distance ‖
ij‖ is smaller than a cutoff value, where
ij denotes the distance vector between atoms i and j. Periodic boundary conditions are considered when constructing the bonds, using super cell vectors. The distance vector
ij is separated into two parts: the unit vector
ij from atom i to atom j and the scalar distance rij between them. The former is expanded on real spherical harmonics Yml, and the latter is expanded on the Bessel radial basis functions.58 In sum, these embedding modules extract structural information (coordinates of atoms, atomic numbers, and super cell vectors) from the crystal and provide them to the interaction blocks.
![]() | ||
Fig. 2 Schematic overview of the MatTen graph neural network model. (A) The model takes a crystal structure as input, performs message passing with equivariant graph neural network (GNN) layers, and outputs the full elasticity tensor satisfying all symmetry requirements. (B) Architecture of the model. The model consists of four modules: the position embedding module converts an input displacement vector ij between atoms i and j to radial (R) and spherical (Y) expansions; the atom embedding module encodes the atomic number Zi as irreducible atom features Fi using a one-hot encoding; interaction blocks iteratively refine the features using convolutions based on tensor product; and the output head selects the appropriate irreducible features corresponding to the elasticity tensor and assembles them as the output tensor C. Following ref. 56, the fourth-rank harmonic part of the elasticity tensor is depicted as a generic matrix. The ⊕ symbol denotes addition. | ||
The interaction blocks iteratively refine the atom features via convolution operations. The architecture of the interaction block follows the design of Tensor Field Network34 and NequIP.36 Unlike many existing GNNs for molecules and crystals22,59–61 that utilize scalar features, here, the atom feature Fi is a set of scalars, vectors, and high-rank tensors. Formally, it is a geometric object consisting of a direct sum of irreducible representations of the SO(3) rotation group.34,36 There are two major benefits of using such geometric features. First, they are incorporated as inductive bias which can improve model accuracy and reduce the amount of training data. Second, from them, one can easily construct other physical tensors such as the elasticity tensor in this work.
The convolution on these geometric objects is an equivariant function, meaning that if the input atom feature F to the convolution is transformed under a rotation in SO(3), the output is transformed accordingly. This is achieved via the tensor product convolution by updating the atom feature in the (k + 1)th interaction block from that in the kth interaction block,
![]() | (3) |
denotes the set of neighboring atoms for atom i, R indicates a multilayer perceptron (MLP) on the radial basis expansion of rij, and Y indicates the spherical harmonics expansion of
ij. The tensor product ⊗ between Y and Fjk is a bilinear map, which is a generalization of the outer product of two vectors. The product output is decomposed back onto the irreducible representations, and the entire operation is equivariant.34 The use of an MLP makes the convolution learnable. After a skip connection,62 the feature Fik+1 is passed through a nonlinear activation function and finally normalized using an equivariant normalization function.63
The output head maps the refined features from the interaction blocks to the target materials tensor of interest. First, the features of all atoms are aggregated to obtain a representation of the crystal. For intensive properties such as the elasticity tensor, meaning that the property value does not depend on the size of the system, we adopt the mean pooling by averaging the features such that the representation of the crystal is independent of the number of atoms. Next, an appropriate subset of the pooled irreducible representations that correspond to the target tensor of interest is selected and then assembled as the model output. For the elasticity tensor, the selected ones are two scalars, two second-rank traceless symmetric tensors, and a fourth-rank harmonic tensor. They are assembled to an elasticity tensor according to eqn (2).
Overall, MatTen is a function C = f(x) that maps a crystal structure x to its elasticity tensor C. The function f is equivariant to the SO(3) group transformation, that is, for any x and g ∈ SO(3), we have Dy(g)f(x) = f(Dx(g)x), where Dx(g) and Dy(g) are rotation matrices parameterized by g for the crystal structure and the elasticity tensor, respectively. This ensures that the model can produce an elasticity tensor C that respects the orientation of the input crystal structure. In other words, the choice of a specific coordinate system does not affect the model output; if the coordinate system is rotated, the output tensor rotates accordingly. This independence of the frame of reference characteristic is an indispensable property for models that predict tensors. In addition, any such model should also preserve the material symmetry of the crystal. By construction, MatTen guarantees the material symmetry reflected in the elasticity tensor. Concretely, if the predicted elasticity tensor is represented as a Voigt matrix, the symmetry and number of independent components in Fig. 1 are automatically maintained for all seven crystal systems (proof in the ESI†). For example, for a cubic crystal, the model guarantees that there are only three independent components c11 = c22 = c33, c12 = c13 = c23, and c44 = c55 = c66 and that all other components are zero.
For comparison, we trained two additional models. The first is a variant of MatTen, where the tensor output head of MatTen is replaced with a scalar output head, referred to as MatSca hereafter. The second is the AutoMatminer algorithm, an automated machine learning pipeline designed for predicting scalar materials properties.20 We evaluated their performance in predicting the elastic moduli, and the results are listed in Table 1. Both MatTen and MatSca have smaller MAEs than AutoMatminer across all three moduli, owning to the effectiveness of the underlying neural networks in learning materials properties from structures. The performance of MatTen and MatSca are comparable. However, it is worth highlighting that while an individual MatSca model was trained for each modulus, a single MatTen model successfully produced all the elastic moduli, demonstrating the versatility of the MatTen model.
| K (GPa) | G (GPa) | E (GPa) | ||||
|---|---|---|---|---|---|---|
| MAE | MAE/MAD | MAE | MAE/MAD | MAE | MAE/MAD | |
| MatTen | 7.37 (0.10) | 0.130 (0.002) | 8.38 (0.16) | 0.280 (0.005) | 20.59 (0.35) | 0.275 (0.005) |
| MatSca | 7.32 (0.09) | 0.129 (0.002) | 8.63 (0.07) | 0.288 (0.002) | 19.87 (0.43) | 0.265 (0.006) |
| AutoMatminer | 9.84 (0.34) | 0.174 (0.006) | 9.27 (0.32) | 0.309 (0.011) | 22.10 (0.77) | 0.295 (0.024) |
Upon closer examination of Fig. 3A–C, we have identified some inconsistencies in the predictions. All crystals in the DFT dataset have positive moduli, the predicted moduli by MatTen, however, occasionally yield negative values, indicating that the associated crystal is elastically unstable. This is an inherent challenge faced by machine learning regression models in general, albeit with physical inductive biases embedded in the model such as the symmetry requirements in MatTen. The number of crystals with negative predicted moduli remains minimal, accounting for only 3, 2, and 2 out of the 1021 test data for bulk, shear, and Young's moduli, respectively. The moduli alone, however, do not provide a comprehensive understanding. For a crystal to be elastically stable, the sufficient and necessary condition is that the Voigt matrix should be positive definite.8 We checked this and found that 25 crystals in the test set do not satisfy this condition. The majority of them are due to the incorrect prediction of the relative magnitudes of the diagonal and off-diagonal components of the Voigt matrix. A breakdown of the errors is provided in the ESI.† Nevertheless, this is not a concern in practical use; one can filter out the negative ones if desired.
To assess how MatTen performs for different elastic properties as well as for different crystal systems, we computed the scaled error, SE = MAE/MAD, in which MAE and MAD are the mean absolute error and mean absolute deviation, respectively (see Section 4). MAD quantifies the distance of reference values to their mean, and larger MAD means the reference values are more scattered. A model that makes accurate predictions for each data point will have an SE of 0, and a model that always predicts the mean of the dataset will have an SE of exactly 1. Comparing between properties, we see from Fig. 3D that the SE of bulk modulus is smaller than those of shear modulus and Young's modulus across all crystal systems. This suggests that bulk modulus is easier to predict, in agreement with existing observations.19–22 Next, we compare between crystal systems. The dataset used for model development has an uneven distribution in terms of the number of materials in each crystal system (Fig. S2 in the ESI†). For example, it contains 4217 cubic crystals, fewer than 800 trigonal and monoclinic crystals, and only 60 triclinic crystals. As a result, the SE and the error bar of the predicted moduli are larger for trigonal, monoclinic, and triclinic crystals in general (Fig. 3D). Despite the slightly higher errors, it is notable that the model can still perform well for the crystal systems with a low presence in the training data, particularly for triclinic crystals. This is primarily because MatTen internally treats all crystals the same, enabling crystal systems with fewer data to leverage the abundant data from other crystal systems and acquire enhanced representations. This type of transferability is not possible with models that are built separately for each crystal system.
The elastic moduli have values across different orders from near zero to hundreds (Fig. S4–S7 in the ESI†). To mitigate the challenge of learning values across a broad spectrum, some existing models19–22 adopt the approach of predicting the logarithm of the moduli. Unlike these models, MatTen directly predicts the full tensor without any data transformation, and all other elastic properties (including their logarithms) can be computed from it. The logarithms of the bulk, shear, and Young's moduli obtained from MatTen are comparable to those that learn in the logarithmic space (Table S1 in the ESI†). Further, the performance of MatTen can be further improved with additional training data. The MAE almost decreases linearly with the logarithm of the number of data used to train the model (Fig. S10 in the ESI†). Finally, it is also possible to predict the full elasticity tensor by separately modeling its non-zero independent components.31,32 MatTen performs much better than this approach thanks to its ability to deal with all crystal systems within a united framework (further discussion in the ESI†).
Young's modulus E discussed in Section 2.3 is an averaged property for isotropic polycrystals. But for single crystals, Young's modulus depends on the direction along which the strain/stress is applied and measured. Given the elasticity tensor Cijkl (equivalently, the compliance tensor Sijkl, see Section 4), the directional Young's modulus can be computed as47,54
| Ed(n) = (ninjnknlSijlk)−1, | (4) |
θ·cos
φ, sin
θ·sin
φ, cos
θ] (Fig. 4C), it can be represented in two dimensions (Fig. 4D, with a Robinson map projection67). Such plots make it easier to visually investigate the anisotropic characteristics of Young's modulus. For example, for the cubic rocksalt CaS crystal (Fig. 4B), the maximum directional Young's modulus Emaxd is along the 〈100〉 directions (for instance, θ = 90° and φ = 0°), while the minimum Emind is along the 〈111〉 directions (for instance, θ = 54.7° and φ = 45°). In fact, for cubic crystals such as CaS, the extreme values of Ed are guaranteed to occur in these two directions.47Eqn (4) can be simplified as Ed(n) = [S1111 − 2(S1111 − S1122 − 2S2323)(n12n22 + n22n32 + n32n12)]−1 for cubic crystals, expressed in terms of their three independent elasticity tensor components. It can be mathematically shown that, if| S1111 − S1122 − 2S2323 < 0, | (5) |
To quantitatively assess the ability of MatTen in predicting anisotropic elastic properties, we measured the error between the model predicted directional Young's modulus Epredd and the DFT reference Erefd. For CaS, MatTen prediction closely follows DFT reference, with a maximum under-prediction of 8.7 GPa along the 〈111〉 directions and a maximum over-prediction of 9.4 GPa along the 〈100〉 directions (Fig. S14 in the ESI†). In addition to this example crystal, we calculated the error for the entire test set, computed as D = Ds·Dv for each crystal. The value of the error is
, where ΔEd(θ,φ) = Epredd(θ,φ) − Erefd(θ,φ) and |·| denotes the absolute value. The sign of the error is Ds = +1 if
, and Ds = −1 otherwise. Put differently, the value Dv quantifies the average deviation from the DFT reference, while the sign Ds characterizes whether the overall prediction is larger than the DFT reference. The integration over θ and φ is performed using the Chebyshev quadratures, which uniformly distribute the integration points on the sphere and can avoid biasing specific points.68 The distribution of the prediction error D of Ed for the test set is plotted in Fig. 4E. It has a Gaussian-like shape, and it almost overlaps with that of isotropic Young's modulus obtained using the Hill average. Similar behaviors are observed for shear modulus and MAEs by crystal systems (Fig. S15 and S16 in the ESI†). These observations suggest that, on average, MatTen performs equally well for the anisotropic and averaged isotropic elastic properties. Accurate prediction of anisotropic properties offers a comprehensive understanding of the elastic characteristics exhibited by crystals, enabling the discovery of new materials through the utilization of these predictive capabilities.
We have also applied the MatTen model to screen for crystals with large Emaxd. We first filtered crystals from the Materials Project database based on their energy above the convex hull values, selecting those with a value of ≤50 meV per atom. This energy determines the thermodynamic stability of a crystal and has been shown to correlate with the synthesizability of crystals.70,71 We further narrowed down the selection to crystals with fewer than 50 atomic sites to reduce computational cost and remove crystals already present in the dataset used to develop the model. This resulted in 53
480 crystals for further analysis. Next, we employed MatTen to predict the full elasticity tensors for these crystals and compute their Emaxd. The top 100 crystals with the highest Emaxd were then supplied for DFT computation to obtain their elasticity tensors. Fig. 5 presents a histogram of Emaxd for the identified 100 crystals. Their Emaxd values all fall at the tail of the distribution of Emaxd from the training data. Quantitatively, the mean of Emaxd for the identified crystals is 606 GPa, while that for the training data is 174 GPa, corresponding to the expected value for a randomly selected material. This demonstrates the effectiveness of leveraging MatTen to screen for materials with extreme elastic properties. The newly identified crystals are provided in Data Availability.
![]() | ||
| Fig. 5 Screening of crystals with large maximum directional Young's modulus. The training data and new discoveries are separately normalized such that each sums to 100 percent. | ||
In addition, we have identified 11 unconventional polymorphs of elemental cubic metals regarding the direction of the extreme values of Young's modulus. Five of them have already been experimentally synthesized (four stable ground-state polymorphs and one metastable polymorph71), and the other six are metastable polymorphs that have not yet been experimentally observed. It is believed that Emaxd is along the 〈111〉 directions (meaning eqn (5) is not satisfied) for all cubic metals except Mo.47 We suspect that there exist other polymorphs of elemental cubic metals satisfying the criterion in eqn (5). To test this, we performed a screening of the Materials Project database. From the dataset used to develop MatTen, we have identified six such crystals. Mo is among them, and the other five are V (one polymorph), Cr (two polymorphs), and W (two polymorphs). For crystals not in the dataset, we first used MatTen to predict their elasticity tensors and then selected the 18 crystals that meet the verification criterion using DFT. DFT predicted 12 crystals satisfying the criterion; six are elastically unstable structures whose Voigt matrix has negative eigenvalues,8,9 and the other six successful ones are Mn, Na, K, Cs, Rh, and Tl (one polymorph for each). Among the 11 newly identified crystals, polymorphs of alkali metals Na, K, and Cs have DFT calculated S1111 − S1122 − 2S2323 values much more negative than that of Mo, but they are all hypothetical crystals that have not been experimentally synthesized yet. Five polymorphs are indeed experimentally observed, and they are all neighbors of Mo in the periodic table, namely V, Cr, Mn, and W. Among them, four are thermodynamically stable ground-state polymorphs, and one polymorph of Cr is metastable. Crystal structures, ground-state information, and elasticity tensors of the 11 unconventional polymorphs are provided in Data Availability and Table S3 in the ESI.†
It should be noted, however, that a crucial aspect regarding the practical use of the model relies on the robustness of the input structure. Given two structures of a crystal where the atomic coordinates are slightly different, we would expect the elastic properties to be similar. If, otherwise, the model is extremely sensitive to the input, then it is not ideal for practical application. Extra work such as highly accurate DFT structure relaxation is needed before applying the model for predictions. We tested MatTen by using structures directly queried from the Materials Project database and structures with tighter geometry optimization. The MAE of Emaxd between using the two types of structures is 6.55 GPa (Fig. S17 in the ESI†), more than three times smaller than the MAE (22.36 GPa) of Emaxd between model predictions and DFT references. This suggests that MatTen is robust enough to its input, and reasonably optimized structures (for example, from online databases) would not introduce extra error larger than the intrinsic error in the model.
The MatTen model is not limited to inorganic crystalline compounds and even elasticity tensors in general. The elastic behavior of other classes of materials such as two-dimensional layered materials and molecular crystals play a significant role in determining their functionality, and MatTen can be directly applied to model their elasticity tensors. Of course, a curated dataset of reference elasticity tensors is needed. Such data already exists, for example, in the Computational 2D Materials Database (C2DB).72 Moreover, besides elasticity tensor, MatTen can be applied to other tensorial properties of materials. These can be broadly categorized into two classes: material-level property and atom-level property. While the former means a single tensor for each crystal, the latter means a separate tensor for each atom in the crystal. Other material-level properties such as piezoelectric and dielectric tensors can be modeled by updating the output head as in Fig. 2 to use the corresponding irreducible representations of the tensor of interest (for example, a single second-rank symmetric matrix for the dielectric tensor). For atom-level properties such as the neutron magnet resonance (NMR) tensor, instead of using a mean pooling to aggregate atom features, one can directly map the features of an atom to a tensor for that atom. Using MatTen, we have conducted such an analysis for NMR tensors of silicon oxides and found that MatTen significantly outperforms both historic analytical models and other machine learning models by more than 50% for isotropic and anisotropic NMR chemical shift.73
One potential limitation of the proposed approach is the reliance on a relatively large dataset to develop the model. We have curated a dataset of 10
276 elasticity tensors which took millions of CPU hours to obtain. Such large datasets for other tensorial properties may not be readily available, but they begin to emerge. For example, the Materials Project has about 3000 piezoelectric and 7000 dielectric tensors.74,75 This amount of data might still be a good start to training faithful models, given that piezoelectric and dielectric are third- and second-rank tensors, respectively, which are much simpler than the fourth-rank elasticity tensor. In fact, for the second-rank NMR tensor, we only used a dataset of 421 crystals to obtain the best-performing model.73 Another possibility is to apply a transfer learning approach, where the model is first trained on a different property with large data (for instance, elasticity tensor) and then finetuned on the target property of interest (for instance, piezoelectric tensor). A limitation of the trained model can come from the data. The data consists of DFT calculations of perfect single crystals with relatively small super cells at a temperature of 0 K. Given that the efficacy of the model is intrinsically tied to the scope of the training data, it is imperative to exercise caution when applying the model to scenarios that extend beyond these parameters. For example, the model is not appropriate for crystals with defects, such as vacancies, dislocations, and grain boundaries. Additionally, it is not advisable to directly employ the model for estimating the mechanical properties at finite temperate, especially for those materials, like metallic alloys, which exhibit a pronounced temperature dependency.
For atom a, its atomic number Za is embedded as a vector with c components using a one-hot encoding to obtain the initial atom feature Fa. The unit vector
ij from atom i to atom j is expanded using a spherical harmonics basis consisting up to a degree of l = 4. (Explicitly, this corresponds to the “0e + 1o + 2e + 3o + 4e” irreducible representations in e3nn notation63). The distance rij between atoms i and j is expanded into a vector R using the radial basis functions,58
![]() | (6) |
With the atom features F, the spherical harmonics expansion of
ij, and the radial basis expansion R of rij obtained from the embedding layers, the interaction block performs tensor product–based convolution to refine the atom features. This is achieved viaeqn (3), more specifically,34
![]() | (7) |
within the cutoff rcut; l is an integer indicating the degree of the spherical harmonic function, and m = −l, …, l; the subscripts i, o, and f indicate input, output, and filter, respectively; c is the channel index (for example, for the embedding layer, it indicates the components of the one-hot encoding); C denotes the Clebsch–Gordan coefficients; and finally, R(lcf,li) are learnable multilayer perceptrons (MLPs), which take the RBF expansion as the input and contain most of the parameters of the model. Essentially, this combines the atom features of neighbors b to be the new atom features of the center atom a, in the same spirit of a message-passing graph neural network. A major characteristic of eqn (7) is that the use of the spherical harmonics and the Clebsch–Gordan coefficients together imply that convolutions are equivariant.34
The atom features F is also passed through a self-interaction,
![]() | (8) |
![]() | (9) |
| SiLU(s) = σ(s)s, | (10) |
| G(t) = σ(x)t, | (11) |
The readout head aggregates the features of individual atoms to obtain a representation of the material via a mean pooling,
![]() | (12) |
276 elasticity tensors is split into three subsets for training, validation, and testing with a split ratio of 8
:
1
:
1. A random split with stratification is adopted where each of the seven crystal systems is separately treated in the split. The model parameters are optimized using the training set, model hyperparameters are determined based on model performance on the validation set, and error analysis is performed using the test set unless otherwise stated. We train the model with the Adam optimizer to minimize a mean-squared-error loss function
with a mini-batch size B of 32. Note, Ci denotes the irreducible representation of the model predicted elasticity tensor with 21 components (see eqn (2)), but not the Cartesian tensor with 81 components. Similarly, Crefi denotes the corresponding reference DFT values. The learning rate is set to 0.01, and a reduce-on-plateau learning rate scheduler is used, which decreases the learning rate by a factor of 0.5 if the validation error does not decrease for 50 epochs. The training stops when the validation error does not decrease for 150 consecutive epochs, and a maximum of 1000 epochs are allowed for the optimization. We performed a grid search to obtain model hyperparameters such as the rcut and c. Search ranges and their optimal values are listed in Table S4 in the ESI.†
Ten-fold cross validation is also performed to test the effects of different data splits. Fig. S11 and S12 in the ESI† present the results, and MatTen is not sensitive to data splits. Detailed information on the training, validation, and test split, as well as the ten-fold split is given in the released dataset (see Data Availability).
| s = c−1, | (13) |
![]() | (14) |
| KV = [(c11 + c22 + c33) + 2(c12 + c23 + c31)]/9, | (15) |
| GV = [(c11 + c22 + c33) − (c12 + c23 + c31) + 3(c44 + c55 + c66)]/15. | (16) |
The Reuss average assumes that the stress is the same in each grain, equal to the macroscopically applied stress.79 The Reuss bulk modulus is
| KR = 1/[(s11 + s22 + s33) + 2(s12 + s23 + s31)], | (17) |
| GR = 15/[4(s11 + s22 + s33) − 4(s12 + s23 + s31) + 3(s44 + s55 + s66)]. | (18) |
The Voigt and Reuss averages are the two extreme cases. The Hill average takes their arithmetic mean and is considered the most accurate in a wide range of experimental conditions.64 The Hill bulk modulus is
| KH = (KV + KR)/2, | (19) |
| GH = (GV + GR)/2. | (20) |
Given the bulk modulus and the shear modulus (from any of the Voigt, Reuss, and Hill schemes), Young's modulus can be computed as80
| E = 9KG/(3K + G). | (21) |
In this work, we report the bulk modulus K = KH, shear modulus G = GH, and Young's modulus E = 9KHGH/(3KH + GH) from the Hill average scheme.
and
, in which yi is the reference value of data point i, ypredi is the model prediction for the data point, and ȳ is the average of all reference values. The numbers N1 and N2 do not necessarily need to be the same. This is the case in Fig. 3 and Table 1, where N1 is the number of crystals in a specific crystal system and N2 is the total size of the test set. The scaled error (SE) is then computed as SE = MAE/MAD.
Footnote |
| † Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3dd00233k |
| This journal is © The Royal Society of Chemistry 2024 |