Kihoon Bang‡a, Jeongrae Kim‡ab, Doosun Hong‡a, Donghun Kim*a and Sang Soo Han*a
aComputational Science Research Center, Korea Institute of Science and Technology, Seoul 02792, Republic of Korea. E-mail: sangsoo@kist.re.kr; donghun@kist.re.kr
bDepartment of Artificial Intelligence Software Convergence, Korea Polytechnics, Chuncheon Campus, Gangwon-do 24409, Republic of Korea
First published on 1st February 2024
To accelerate materials discovery, inverse design schemes that find materials with desired properties have recently been introduced. Despite successful efforts, previous inverse design methods have focused on problems in which the desired properties are described by a single number (a one-dimensional vector), such as the formation energy or bandgap. This limitation becomes apparent for material properties that require representation by multidimensional vectors, such as the electronic density of states (DOS) pattern. Here, we develop a deep learning method for inverse design from multidimensional DOS properties. In particular, we introduce a composition vector (CV) that describes the composition of predicted materials and serves as an invertible representation of the DOS pattern. Our inverse design model achieves a composition accuracy of 99% and a DOS pattern accuracy of 85%, greatly surpassing the performance obtained with existing CVs. Furthermore, we successfully applied the inverse design model to find promising candidates for catalysis and hydrogen storage. Notably, our model suggests a hydrogen storage material, Mo3Co, that has not been reported before. These results show that our model can greatly expand the space of inverse design for materials discovery.
As a strategy for further accelerating materials discovery, an inverse design scheme has been introduced, in which a user defines the target properties of materials and an algorithm then suggests materials that meet them. State-of-the-art computer simulation techniques (e.g., DFT calculations) allow only forward prediction, from material information to properties. However, deep learning (DL) algorithms make inverse design possible not only for organic materials7 but also for inorganic materials,8 in which DL methods such as generative adversarial networks (GANs)9 or variational autoencoders (VAEs)10 have usually been used.7,8,11–19 For example, Noh et al.14 found new metastable vanadium oxide (V–O) compounds using a VAE. Ren et al.8 also used a VAE to generate materials with desired formation energies, bandgaps, or thermoelectric power factors. Xie et al.18 developed a crystal diffusion variational autoencoder (CDVAE) model, and Wines et al.19 used the CDVAE to design superconductors with high critical temperatures. Using a GAN instead, Kim et al.11 generated various porous zeolites with a desired heat of adsorption of methane. Despite these efforts, the methods above are applicable only to problems in which the target properties are described by a single scalar value, such as a formation energy of 1.0 eV per atom or a bandgap energy of 3.2 eV. Some material properties and performance metrics can be well represented by a single number, but many others need to be represented by multidimensional vectors (a series of numbers). Recently, various inverse design models dealing with multidimensional properties have been developed. Li et al.20 developed forward and inverse design models capable of handling multiple structural and property features of nanoparticles.
Building on these efforts, the same research group applied their methodologies to predict multiple target electrochemical properties of MXene-type materials.21 Dong et al.15 specifically addressed the light absorption spectrum, developing an inverse design framework that predicts a material's formula. These advances demonstrate the importance and potential of inverse design models that target multidimensional properties.
A representative example of such multidimensional properties, and the focus of this study, is the electronic density of states (DOS) pattern. The electronic DOS pattern is typically represented by the number of electronic states at each energy level on a discrete energy grid, i.e., a vector of a few hundred values. The DOS determines not only the electrical properties of materials but also their chemical properties; in particular, catalytic properties are well known to be significantly affected by the DOS pattern.22,23 Although the d-band center (a single number) has been widely used as a simplified but effective descriptor in catalyst design,22–24 it is not sufficient to represent the whole electronic structure of catalyst materials; indeed, recent DL studies25,26 show that the d-band center alone cannot fully describe adsorption energies. In this regard, the DOS pattern itself or its derivatives, although much more complex than the d-band center, have served as improved and complementary descriptors in catalyst design.27–31 Fung et al.29 developed an ML model that accurately predicts adsorption energies from DOS patterns. Similarly, Hong et al.30 predicted adsorption energies from DOS patterns and interpreted the correlation between DOSs and chemisorption properties. Knøsgaard et al.31 estimated quasiparticle band structures from DOS fingerprints obtained from standard DFT calculations. Despite the success of ML models that use DOS patterns as input, there have been no efforts thus far to directly predict material information (the compositions of inorganic materials) from DOS patterns, which is the key inverse design strategy in this work. It is therefore necessary and timely to develop a machine-learning-based inverse design strategy for multidimensional properties such as DOS patterns, which suggests material information (e.g., atomic structure or composition) from an input of desired DOS patterns.
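To make the notion of a DOS pattern as a fixed-length vector concrete, the sketch below broadens a set of discrete energy levels with normalized Gaussians onto a uniform energy grid. The grid range (−10 to 5 eV), the 300 grid points, the 0.1 eV smearing width, the helper name `dos_vector`, and the energy levels themselves are illustrative assumptions, not the settings used to build the DOS DB in this work.

```python
import math

def dos_vector(eigenvalues, e_min=-10.0, e_max=5.0, n_points=300, sigma=0.1):
    """Broaden discrete energy levels (eV) into a fixed-length DOS vector.

    Each level contributes a normalized Gaussian of width sigma, so the
    resulting vector integrates (approximately) to the number of levels.
    """
    step = (e_max - e_min) / (n_points - 1)
    grid = [e_min + i * step for i in range(n_points)]
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    dos = [
        sum(norm * math.exp(-0.5 * ((e - eps) / sigma) ** 2) for eps in eigenvalues)
        for e in grid
    ]
    return grid, dos

# Hypothetical energy levels (eV), chosen only for illustration
levels = [-6.2, -4.0, -3.9, -1.5, -0.3, 0.8]
grid, dos = dos_vector(levels)
print(len(dos))  # 300 values: one "multidimensional" DOS pattern

# Riemann-sum integral of the DOS, which recovers the number of levels
states = sum(dos) * (grid[1] - grid[0])
print(round(states, 2))
```

Any DOS computed this way lives in the same 300-dimensional space, which is what allows patterns of different materials to be compared and fed to a neural network.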
To develop such an inverse design technique, it is critical to have a machine-readable representation that reflects the electronic DOS information and is invertible, i.e., can be converted back to material information. Among the types of material information, both the atomic structure and the chemical composition affect the properties of materials. Consequently, various inverse design studies have employed representations that include both structural and chemical details.8,19 Nevertheless, the vast number of possible combinations of chemical composition and atomic structure makes this landscape challenging to navigate, so the information used in inverse design strategies still needs to be restricted. For instance, Fung et al.32 explored atomic structures at the fixed composition MoS2, while Lyngby et al.33 explored compositions within a 2D-type structure. Likewise, a restriction on either the atomic structure or the chemical composition is needed to keep the problem manageable. Interestingly, our previous DL studies showed that, in predicting material properties, feature vectors derived from chemical compositions carry more weight than those derived from X-ray diffraction patterns representing atomic structures.34 This result provides a meaningful guideline for inverse design: it would be more efficient to specify the chemical compositions of materials for a given atomic structure. This guideline calls for a representation of chemical composition that is invertible to the electronic DOS information. There have been several efforts to develop representations of the chemical compositions of materials, many of which use one-hot encoding.35–37 For example, Zhou et al.35 built one-hot encoded datasets of the elements in material formulae and their chemical environments, and derived element embeddings from them through singular value decomposition and a probability model.
Tshitoyan et al.36 extracted elemental information through natural language processing of published papers to generate a one-hot encoded dataset, and obtained element embeddings with word2vec.38,39 However, these previously reported representations do not include electronic DOS information and have not been tested for inverse design.
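A minimal sketch of a one-hot-style composition encoding, assuming a small hypothetical element vocabulary and weighting each element's one-hot vector by its atomic fraction; it illustrates the general idea and is not the specific encoding of refs. 35–37. It also shows why such an encoding carries no chemical information by itself: the vectors of any two distinct elements are orthogonal, so Pd is exactly as dissimilar to Pt as it is to O.

```python
import math

# Hypothetical element vocabulary; a real one would span the periodic table
ELEMENTS = ["H", "O", "Ti", "Mn", "Fe", "Co", "Ni", "Mo", "Pd", "Pt"]

def one_hot(element):
    """Return the one-hot vector of a single element over ELEMENTS."""
    vec = [0.0] * len(ELEMENTS)
    vec[ELEMENTS.index(element)] = 1.0
    return vec

def composition_one_hot(formula):
    """Fraction-weighted sum of one-hot vectors, e.g. {"Pt": 3, "Ni": 1}."""
    total = sum(formula.values())
    vec = [0.0] * len(ELEMENTS)
    for element, count in formula.items():
        vec[ELEMENTS.index(element)] += count / total
    return vec

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print(composition_one_hot({"Pt": 3, "Ni": 1}))  # 0.25 at Ni, 0.75 at Pt
# Distinct elements are always orthogonal, regardless of their chemistry
print(cosine(one_hot("Pd"), one_hot("Pt")), cosine(one_hot("Pd"), one_hot("O")))  # 0.0 0.0
```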
In this work, we develop a convolutional neural network (CNN)-based DL model for the inverse design of inorganic materials from multidimensional DOS patterns. For a given atomic structure, the model suggests chemical compositions and thus a ranked list of candidate materials. The composition vector (CV), constructed from the DOS patterns of each element, is used as the representation vector for inverse design and greatly enhances the performance of the model, as evidenced by a composition prediction accuracy of 99% and a DOS pattern accuracy of 85%. To demonstrate the effectiveness of our model, we apply it to two exemplary applications, oxygen reduction reaction catalysis and hydrogen storage, using the DOS patterns of Pt3Ni and Pd, respectively, as inputs, since these materials are regarded as prototypical in each field. In both applications, the model successfully proposes novel binary alloys with DOS patterns similar to those of the input materials. The workflow presented herein is not limited to DOS patterns and can be readily extended to many other properties described by multidimensional vectors, such as spectral data in materials science.
$$\mathrm{CV}_{\mathrm{A}_m\mathrm{B}_n} = m\,\mathrm{EV}_{\mathrm{A}} \oplus n\,\mathrm{EV}_{\mathrm{B}} \qquad (1)$$
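Eq. (1) can be sketched as follows, assuming that ⊕ denotes concatenation of the stoichiometry-weighted element vectors in a fixed site order (A first, then B). The three-dimensional EVs are toy values for illustration only; in this work the EVs come from an autoencoder trained on centroid DOSs, and the helper name `composition_vector` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def composition_vector(
    formula: Sequence[Tuple[str, int]],
    ev: Dict[str, List[float]],
) -> List[float]:
    """Build CV_{A_m B_n} = m*EV_A (+) n*EV_B, reading (+) as concatenation.

    formula is an ordered list such as [("Pt", 3), ("Ni", 1)]; the order of
    the entries fixes where each weighted EV block sits in the CV.
    """
    cv: List[float] = []
    for element, count in formula:
        # Scale the element vector by its stoichiometric coefficient
        cv.extend(count * x for x in ev[element])
    return cv

# Toy 3-dimensional element vectors (hypothetical, not the trained EVs)
ev = {"Pt": [0.2, -0.1, 0.5], "Ni": [0.4, 0.3, -0.2]}
print(composition_vector([("Pt", 3), ("Ni", 1)], ev))
```

Under this reading, a binary CV has twice the EV length and a ternary CV three times; if ⊕ were instead an element-wise sum, the loop would accumulate into a single EV-length vector.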
To validate whether the DOS-based EVs contain chemical information about elements and materials, t-distributed stochastic neighbor embedding (t-SNE)42 analysis is applied to the EVs (Fig. 2b). This algorithm is known to visualize high-dimensional vectors better than other algorithms such as principal component analysis (PCA).43 Interestingly, the t-SNE analysis reveals that elements in the same group of the periodic table are located at similar positions, so that each group occupies a distinct region. Additionally, the distances between groups reflect the chemical relations of the periodic table. For example, because the alkali metals and the alkaline earth metals differ by only one valence electron, they are located close to each other in the t-SNE plot. We highlight that the EVs reflect the chemical information of elements even though no element information, such as the atomic number or period number, was used in training; only the centroid DOS was used to construct the EVs with the autoencoder. We also tested the DOS-based CVs by training an artificial neural network (ANN) model that takes the CV of a material as input and classifies its bandgap energy into three categories: small gap (Eg < 0.2 eV), medium gap (0.2 eV < Eg ≤ 3.6 eV), and large gap (Eg > 3.6 eV). The DOS-based CVs gave a high accuracy of 92% (Table S1†). CVs created with the element embedding with normalized composition matrix (EENCM) method, which is trained on the chemical formulas of materials,34 give an accuracy of approximately 93%, similar to that of our DOS-based CVs, even though our CVs were trained on much less data (32659 DOS patterns) than the EENCM CVs (118176 chemical formulas). These results show that our DOS-based CVs represent the chemical composition information of materials well.
The performance of the inverse design model is shown in Fig. 4a. We tested it on 100 randomly selected DOS patterns from the DOS DB, using two metrics: the composition accuracy and the DOS pattern accuracy. The composition accuracy is defined as the fraction of test samples for which the composition of the input DOS is included among the five candidate materials predicted by the inverse design model. The DOS pattern accuracy, on the other hand, is based on a comparison of the DOS patterns of the input material and the five candidate materials. First, if the DOS pattern of a candidate material exists in the DOS DB, the DOS similarity between the input material (A) and the candidate (B) is calculated as a cosine similarity:
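$$\mathrm{Similarity}(A,B) = \frac{\sum_{i} \mathrm{DOS}_A(E_i)\,\mathrm{DOS}_B(E_i)}{\sqrt{\sum_{i} \mathrm{DOS}_A(E_i)^2}\,\sqrt{\sum_{i} \mathrm{DOS}_B(E_i)^2}} \qquad (2)$$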
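The cosine similarity of Eq. (2) and the top-5 composition accuracy described above can be sketched as below, assuming DOS vectors sampled on a common energy grid and candidate lists ordered by rank. The function names and the test formulas (AB3, CD, and so on) are hypothetical; only the composition accuracy and the pairwise DOS similarity are shown.

```python
import math
from typing import List, Sequence

def dos_similarity(dos_a: Sequence[float], dos_b: Sequence[float]) -> float:
    """Cosine similarity of two DOS vectors sampled on the same energy grid."""
    dot = sum(a * b for a, b in zip(dos_a, dos_b))
    norm_a = math.sqrt(sum(a * a for a in dos_a))
    norm_b = math.sqrt(sum(b * b for b in dos_b))
    return dot / (norm_a * norm_b)

def composition_accuracy(
    true_formulas: List[str], candidates: List[List[str]], k: int = 5
) -> float:
    """Fraction of test DOSs whose true composition is among the top-k candidates."""
    hits = sum(truth in cands[:k] for truth, cands in zip(true_formulas, candidates))
    return hits / len(true_formulas)

# Hypothetical ranked predictions for two test DOSs: one hit, one miss
truths = ["AB3", "CD"]
preds = [
    ["AB3", "AC3", "AD3", "AE3", "AF3"],
    ["CE", "DE", "CF", "DF", "EF"],
]
print(composition_accuracy(truths, preds))  # 0.5
print(dos_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0]))  # ~1.0: same shape, different scale
```

Because the cosine similarity ignores overall scale, two DOS patterns with the same shape but different magnitudes score close to 1, which suits comparing pattern shapes rather than absolute state counts.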
In Fig. 4a, we also compare the DOS-based CVs with other previously reported types of CVs (formula-based CVs34 and one-hot encoding-based CVs) used as the output of the inverse design model. Although one-hot encoding can represent an element or a composition, it does not include chemical information; accordingly, the inverse design model based on one-hot encoding is less accurate than that based on the DOS-based CVs. With formula-based CVs, the composition accuracy is approximately 0.77, which is higher than in the one-hot encoding case but still much lower than with our DOS-based CVs. Since our DOS-based CVs include the DOS information itself, the DOS pattern accuracy is also much higher than that obtained with formula-based or one-hot encoding-based CVs.
First, for catalysis applications, Pt3Ni has been regarded as one of the prototypical and best-performing catalyst materials for the oxygen reduction reaction (ORR) in proton exchange membrane fuel cells (PEMFCs).45,46 Using the inverse design model, we aim to design ORR catalysts with catalytic performance as high as that of Pt3Ni; therefore, the DOS pattern of Pt3Ni was used as the input. The model predicts the following five candidates: Pt3Ni (1st rank), Pt3Co (2nd rank), Pt3Rh (3rd rank), Pt3Fe (4th rank), and Pt3Mn (5th rank) (Fig. 5a). That the 1st-rank candidate is Pt3Ni (identical to the input material) again supports the high composition accuracy of the inverse design model. Notably, the Pt3Co, Pt3Rh, Pt3Fe, and Pt3Mn candidates have all been previously reported as potential ORR catalysts, and all of them show higher activity than Pt.46–50 In particular, Pt3Co47 and Pt3Fe49 have mass activities very similar to that of Pt3Ni.46 We also compare the DOS patterns of Pt3Co and Pt3Ni (Fig. 5b) and find that their DOS similarity is as high as 0.94. Although these candidates are previously reported ORR catalysts, our inverse design model found them without any training on the catalytic properties of the materials, which demonstrates the effectiveness of our DL model for catalyst design. Moreover, Pt3Co, Pt3Rh, and Pt3Mn are not included in the DOS DB used to train our inverse design model, indicating that our model can identify candidate materials both within and outside the training dataset. This is an advantage over high-throughput screening of the DOS patterns in the DB, which by construction can never find candidates outside the DB.
Fig. 5 Inverse design for the prediction of ORR catalysts with an input of the Pt3Ni DOS. (a) ORR mass activity of candidates predicted via our inverse design model. The materials shown in dashed bars are out-of-the-training-dataset samples. All mass activity values are gathered from ref. 46–50. (b) DOS patterns of the input Pt3Ni (red line) and predicted Pt3Co (cyan area). The inset atomic structure shows a unit cell of Pt3Co.
We further investigated the extensibility of our model by applying it to binary systems with a hexagonal crystal structure. The space groups of the predicted materials were set to the most common space group (P6₃/mmc) in the Materials Project database. When the Pt3Ni DOS is provided as the target, our model recommends three candidates with DOS similarities of approximately 0.9: AuCo2, PtMn2, and PtFe2 (Fig. S2†). In particular, the DOS similarity of AuCo2 exceeds 0.9, indicating that our model is readily applicable to hexagonal systems. To investigate the thermodynamic stability of the candidate materials, we also calculated their formation energies using DFT. PtMn2 and PtFe2 have negative formation energies. Although AuCo2 has a positive formation energy, the value is very close to zero, suggesting that AuCo2 could be stabilized at finite temperatures.
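For reference, the formation energy per atom used to judge thermodynamic stability is commonly computed relative to elemental reference energies, E_f = (E_total − Σ_i n_i μ_i)/N. The sketch below assumes that convention, with μ_i the energy per atom of each element in its bulk reference phase, and uses hypothetical elements X and Y with made-up energies; it does not reproduce the DFT values of this work.

```python
def formation_energy_per_atom(e_total, counts, mu):
    """E_f = (E_total - sum_i n_i * mu_i) / N_atoms, all energies in eV.

    e_total: total energy of the compound cell
    counts:  number of atoms of each element in that cell, e.g. {"X": 1, "Y": 2}
    mu:      reference energy per atom of each element in its bulk phase
    """
    n_atoms = sum(counts.values())
    reference = sum(n * mu[el] for el, n in counts.items())
    return (e_total - reference) / n_atoms

# Hypothetical energies (eV) for an XY2 cell, for illustration only
mu = {"X": -5.0, "Y": -8.0}
e_f = formation_energy_per_atom(-21.3, {"X": 1, "Y": 2}, mu)
print(round(e_f, 3))  # -0.1: negative, i.e. stable against the elements
```

A negative E_f means the compound is lower in energy than its separated elements; a small positive value, as for AuCo2, leaves room for stabilization by finite-temperature effects not captured in a 0 K total energy.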
Additionally, we extended our model to ternary systems with a tetragonal crystal system, keeping the Pt3Ni DOS as the target (Fig. S3†). The space groups of the predicted materials were set to the most common space group (P4/mmm) in the Materials Project database. As in the binary hexagonal case, our model recommends a ternary tetragonal material (CoRh2Pd) with a high DOS similarity of 0.9 and a formation energy close to zero. As second-tier candidates, our model recommends ZrRh2Ir and ScRh2Ir, with DOS similarities of 0.8 and negative formation energies. These two additional tests confirm that our inverse design model is not limited to cubic structures but works for various crystal structures, and that it is applicable to ternary as well as binary systems.
As a second application example, we apply the inverse design model to find a novel hydrogen storage material as an alternative to the prototypical Pd.51,52 The formation energy of interstitial hydrogen is well known to be a useful descriptor of the hydrogen storage properties of a material.53,54 For high hydrogen uptake and release, this formation energy should be small but negative. Because the formation energy of an interstitial defect is related to the DOS pattern,55,56 we used the inverse design model to search for a bimetallic hydrogen storage material whose DOS pattern is similar to that of pristine Pd.
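A minimal sketch of the interstitial hydrogen formation energy, assuming the common definition E_f(H) = E(host+H) − E(host) − ½E(H2) with gas-phase H2 as the hydrogen reference (an assumption; other references are possible). The helper name and the total energies below are hypothetical, not values from this work. A negative value marks hydrogen uptake as energetically favorable, in line with the criterion above.

```python
def interstitial_h_formation_energy(e_host_h, e_host, e_h2):
    """E_f(H) = E(host+H) - E(host) - 0.5 * E(H2), in eV per interstitial H.

    Assumes exactly one H atom is added to the host cell and that the
    hydrogen chemical potential is referenced to an isolated H2 molecule.
    """
    return e_host_h - e_host - 0.5 * e_h2

# Hypothetical DFT total energies (eV), for illustration only
e_f = interstitial_h_formation_energy(e_host_h=-112.95, e_host=-109.40, e_h2=-6.77)
print(round(e_f, 3), "favorable" if e_f < 0 else "unfavorable")
```

"Small but negative" balances two needs: a negative value lets the host absorb hydrogen, while a value close to zero keeps the hydrogen weakly enough bound to be released again.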
As shown in Fig. 6, the inverse design model proposes five candidates from the DOS pattern of Pd: Pt3Ni, Pt3Co, Pt3Fe, Mo3Ni, and Mo3Co. To investigate the formation energies of interstitial hydrogen, we first determined the structures of the candidate compositions. Because the input material, Pd, has a cubic structure, all candidates were assumed to have cubic Bravais lattices. Among the crystal structures with A3B composition in a cubic Bravais lattice, the L1₂ structure was selected as the prototype. DFT calculations were conducted for each candidate in the L1₂ structure, considering both octahedral and tetrahedral sites for interstitial hydrogen. Several Pt alloys are predicted as candidates because Pt and Pd are in the same group and have similar electronic structures, as shown in Fig. 2. However, bulk Pt is known to have no hydrogen storage properties;57 consistent with this, the Pt alloy candidates have positive formation energies for interstitial hydrogen, likely indicating poor hydrogen storage properties. In contrast, additional DFT calculations reveal that, among the candidates, Mo3Co has a negative formation energy of −0.18 eV per H site, implying that Mo3Co can store hydrogen. As in Pd, the preferred site for interstitial hydrogen in Mo3Co is an octahedral site. Pristine Mo and Co are not promising hydrogen storage materials under ambient conditions.58,59 However, homogeneous mixing of Mo and Co in the Mo3Co lattice creates an electronic structure that differs from those of pristine Mo and Co but resembles that of Pd, leading to favorable hydrogen storage properties. To the best of our knowledge, this work is the first report on the hydrogen storage properties of Mo3Co.
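The L1₂ (Cu3Au-type) prototype and its interstitial sites can be made concrete with a short geometric sketch: B sits on the cube corner and A on the three face centers, and the nearest-neighbor shell of each candidate site is found under periodic boundary conditions. The site labels and the helper `neighbors` are hypothetical illustrations, not the authors' DFT setup; the sketch only shows that the body-center octahedral site is surrounded by six A atoms, the edge-center octahedral site by four A and two B atoms, and a tetrahedral site by three A atoms and one B atom.

```python
import itertools
import math

# Fractional coordinates of the L1_2 A3B cell: B on the cube corner,
# A on the three face centers.
L12_SITES = [
    ("B", (0.0, 0.0, 0.0)),
    ("A", (0.5, 0.5, 0.0)),
    ("A", (0.5, 0.0, 0.5)),
    ("A", (0.0, 0.5, 0.5)),
]

# Candidate interstitial sites (fractional coordinates)
INTERSTITIALS = {
    "oct_body": (0.5, 0.5, 0.5),     # octahedral, body center
    "oct_edge": (0.5, 0.0, 0.0),     # octahedral, edge center
    "tet": (0.25, 0.25, 0.25),       # tetrahedral
}

def neighbors(site, a=1.0, tol=1e-6):
    """Sorted species of the nearest atoms around a fractional site (cubic cell, PBC)."""
    dists = []
    for species, frac in L12_SITES:
        # Include periodic images in the 27 surrounding cells
        for shift in itertools.product((-1, 0, 1), repeat=3):
            atom = [a * (f + s) for f, s in zip(frac, shift)]
            point = [a * x for x in site]
            dists.append((math.dist(atom, point), species))
    d_min = min(d for d, _ in dists)
    return sorted(sp for d, sp in dists if abs(d - d_min) < tol)

for name, site in INTERSTITIALS.items():
    print(name, neighbors(site))
```

In a real workflow the lattice parameter `a` would come from relaxing each candidate, and a structure library would generate the same sites; the point here is only the coordination geometry of the L1₂ interstitials.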
Lastly, since the training database also includes DOSs of non-metallic materials, our model could potentially be applied to oxide materials. To evaluate its applicability to oxide systems, we tested our model with the target DOS of BaTiO3, a material known for its high dielectric constant and common use in multilayer ceramic capacitors. Given the perovskite structure of BaTiO3, we assumed that the predicted materials also adopt a cubic perovskite structure. As shown in Fig. S4,† our model predicts SrTiO3, BaMnO3, and BaZrO3 as candidate materials with high DOS similarity. The DOS of SrTiO3 shows a high cosine similarity of 0.85, while BaMnO3 and BaZrO3 show DOS similarities of approximately 0.8. In addition, all three candidates have negative formation energies. Note that because the DOSs in the database were calculated with the standard PBE functional, they may contain inherent errors for oxide materials (for example, PBE is known to underestimate band gaps). Nevertheless, this result demonstrates that our model can be extended to non-metallic systems.
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3ta06491c
‡ These authors contributed equally.
This journal is © The Royal Society of Chemistry 2024