Discovery of zirconium dioxides for the design of better oxygen-ion conductors using efficient algorithms beyond data mining

It is important to find crystal structures with low formation (Ev) and migration-barrier (Em) energies for oxygen vacancies for the development of fast oxygen-ion conductors. To identify crystal structures with lower Ev and Em than those of ground-state ZrO2, we first reoptimize the crystal structures of various oxides reported in the database, and then directly construct them using an evolutionary algorithm. For efficient searching, we employ the linearized ridge regression model for Ev using descriptors obtained from density functional theory calculations of the unit cells and apply the predicted Ev as a fitness value in the evolutionary algorithm. We also find a correlation between the Ev and Em for the crystal structures of ZrO2. On the basis of this correlation, we confirm that the newly constructed crystal structures, as well as certain reoptimized structures from the database, that possess low Ev also have Em lower than that of ground-state ZrO2. Our successful strategy consisting of a combination of the evolutionary algorithm, first-principles calculations, and machine-learning techniques may be applicable to other oxide systems in finding crystal structures with low Ev and Em.


Introduction
A high oxygen ion conductivity (s O ) is an important property for applications 1,2 such as the electrolytes of solid-oxide fuel cells (SOFC), oxygen separation membranes, and gas sensors. Low vacancy formation (E v ) and migration-barrier (E m ) energies of oxygen vacancies V O are favorable for achieving a higher s O . 3 Currently, Y-doped ZrO 2 (YSZ) is widely used because of its advantages such as abundance, chemical stability, non-toxicity, and low cost. This material shows the s O of $10 À2 S cm À1 at the high temperature of 1000 K. 4 For useful industrial applications, it is necessary to have similar s O values at lower temperatures or a higher s O at a similar temperature. Indeed, several oxides such as Gd-doped CeO 2 (GDC), 5 pure or Er-doped d-phase Bi 2 O 3 , 6,7 and Sr-and Mg-doped LaGaO 3 (LSGM) 8,9 have been reported to show higher s O than YSZ at the same temperature. However, the need for the development of new oxygen-ion conductors remains.
The type of crystal structure strongly affects the s O as well as E v and E m . The monoclinic ground-state structure of ZrO 2 (space group P2 1 /c) is known to have a low s O . However, the s O can be signicantly increased through phase transformation into either the tetragonal (space group P4 2 /nmc) or cubic uorite structure (space group Fm 3m) upon doping with Y. 5 The d-phase Bi 2 O 3 in the defective uorite structure (its high-temperature phase) is known to have a high s O , whereas its a-phase counterpart is a low-temperature phase with a low s O . 6 Reported crystal structures with high s O are mainly uorites, perovskites, 5 and their derivatives, such as the defective uorite and melilite 10 structures. However, we expect that other undiscovered crystal structures may have high s O values. The evolutionary algorithm is useful for the construction and search of crystal structures. 11 This has been applied in combination with rst-principles calculations for determining crystal structures with maximized or minimized functional properties (tness values) such as enthalpy at high pressure, [12][13][14] hardness, 15 and dielectric constant. 16 This algorithm is especially powerful for easily obtained tness values, because it requires many individual crystal structures. From this perspective, the direct usage of the E v or E m as a tness value may be inefficient because these properties should be obtained from supercell calculations. 17 In this case, one useful method is machinelearning techniques.
A regression analysis, one of the machine-learning techniques, is used to predict a target variable on the basis of already accumulated data, which can be used for descriptors (also called "predictor variables" or "representations"). If the descriptors are composed of "density functional theory (DFT)unit-cell descriptors" that contain information regarding the unit-cells obtained only from DFT calculations, 18,19 the prediction model can be constructed with the advantage of computational efficiency. Some regression models have employed descriptors obtained from DFT calculations of the unit-cells to predict target variables such as the melting temperatures of binary metals, 20 GW-level band gaps for inorganic compounds, 21 and interatomic potentials. 22 In addition, Deml et al. 23 constructed a prediction equation for E v using ordinary least-square regression (OLSR) with four unit-cell descriptors and 45 oxides. However, it is not yet guaranteed whether this prediction model maintains accuracy in predicting E v for various types of crystal structures.
The purpose of the present study is to propose an efficient method to nd various crystal structures with lower E v and E m values than those of ground-state ZrO 2 . To this end, we reoptimized the reported crystal structures of various oxides in the Materials Project Database (MPD), 24 one of the inorganic materials databases based on rst-principles calculations, and reconsidered their E v and E m . We constructed a prediction model for E v based on linearized regression analyses using DFTunit-cell descriptors and subsequently constructed crystal structures based on the predicted E v using the evolutionary algorithm. We also investigated the relationship between E v and E m for the crystal structures of ZrO 2 . Finally, we found crystal structures having E v and E m values lower than those of groundstate ZrO 2 .

Overview
First, we reoptimized the crystal structures of several oxides from the MPD 24 for ZrO 2 . The computed E v and other properties of the unit-cells are used as the target variable and descriptors, respectively, for the regression analyses. Fig. 1 shows the workow for the construction of the crystal structures and prediction of the E v . New structures were constructed using the evolutionary algorithm. First-principles calculations were performed to optimize the constructed crystal structures and extract the DFT-unit-cell descriptors. The regression analyses were then applied to obtain the predicted E v . In total, 462 crystal structures were constructed through the repetition of cycles comprising the construction of crystal structures, calculation of DFT-unit-cell descriptors from the constructed structures, and application of regression analysis to predict E v .
We identied crystal structures with predicted E v in the target range and performed structural optimization again under tighter conditions. The accurate E v of the remaining crystal structures were computed from their supercells. Finally, we obtained 14 newly constructed crystal structures that have both the predicted and computed E v .
More details regarding each method are found in the next subsection.

Crystal structures for the training set in regression analysis
As mentioned in the previous subsection, we collected various crystal structures of oxides with cation-to-O ratios of 1 : 2 in the MPD 24 for use as training data in regression analysis and for considering their E v and E m . We set the number of atoms in the unit-cells to #30. We performed rst-principles calculations for structural optimization and computation of the E v . To identify crystal structures with reliable E v , we adopted several lters. Firstly, the crystal structures must satisfy the convergence of electronic and ionic relaxations. Secondly, they should be nonmetallic. Thirdly, they should maintain a displacement of the center of the V O site of 0.1Å at most aer structural relaxations; this is for avoiding the convergence of crystal structures into other crystal structures. On the basis of these lters, we collected 16 crystal structures of ZrO 2 with the computed E v , as summarized in Table 1. Henceforth, this set of 16 crystal structures is called "reoptimized-ZrO 2 " and each structure in the set is called by its space group type. Excluded crystal structures are summarized in Table S1 in ESI. † All the crystal structures in this study are displayed using the VESTA program. 25

Regression analysis
The linearized ridge regression (RR) 26 model uses a minimization function of the OLSR with a L2-norm penalty term that is given by where b is an n-dimensional vector of the regression coefficients of the descriptors, X is an (n Â p) descriptor matrix, y is an ndimensional vector of the target property for the training set, and l is a coefficient of the penalty term. Here, the penalty term is used for controlling the coefficients to avoid overtting. Table 2 shows the list of the DFT-unit-cell descriptors. To predict E v of the crystal structures of ZrO 2 , RR was employed with all 11 descriptors. The OLSR with three descriptors (x 01x 03 ), suggested by Deml et al., 23 was also used for comparison. Henceforth, the former and latter prediction models are referred to as RR-11-descriptors and OLSR-3-descriptors, respectively.
The training and test data were randomly divided in the ratio of 75% to 25%. To avoid overtting, three-fold cross-validation (CV) was also used. The prediction errors were dened as rootmean-square-errors (RMSE) that were averaged for 30 different random samplings of the training set.

Structure construction using an evolutionary algorithm
We used the USPEX 11,27,28 code to generate crystal structures based on an evolutionary algorithm. Here, the predicted E v was used as a tness value for minimizing the quantity |predicted E v À 5 eV|. This form was applied to focus on the search for crystal structures with E v values moderately lower than that (6.15 eV) of the ground-state P2 1 /c structure.
Each newly constructed unit-cell of ZrO 2 was set to have four Zr and eight O atoms. Sixty rst-generation crystal structures were produced using randomly selected space group types. The tness value was obtained aer structural relaxation based on the rst-principles calculations. From the second generation onwards, new crystal structures were produced by genetic operators, namely, heredity (30%), random symmetric algorithm (30%), and mutation (40%). Each generation consisted of 40 crystal structures. The evolutionary algorithm was terminated if the best-ranked crystal structure was not changed over eight generations or aer the number of generations reached 30.

First-principles calculations
All rst-principles calculations were performed using the projector augmented-wave (PAW) 29,30 method implemented in the Vienna Ab-initio Simulation Package (VASP) 31,32 within the framework of the Generalized Gradient Approximation (GGA) of Perdew-Burke-Ernzerhof (PBE) form, 33 including on-site Coulomb interaction 34 with an effective U À J of 3 eV (ref. 35) (GGA+U) for the d-orbitals of Zr. Here, we employed the GGA+U rather than the GGA to decrease the number of metallic crystal structures constructed by the evolutionary algorithm (See  Mean of |x 02 À x 02 (ground-state)| x 08 Mean of |x 03 À x 03 (ground-state)| x 09 Mean of |x 04 À x 04 (ground-state)| x 10 Mean of |x 05 À x 05 (ground-state)| x 11 Mean of |x 06 À x 06 (ground-state)| Fig. S1 in ESI †). The valence electrons occupied the 4s, 4p, 5s, and 4d orbitals for Zr, and the 2s and 2p orbitals for O. During the construction of crystal structures by the evolutionary algorithm, structural relaxations of the unit-cells (with the associated changes in lattice constants and atomic coordinates) were modeled until the interatomic force on each atom was reduced to within 0.01 eVÅ À1 . To avoid bad convergence, we constrained the number of ionic iteration cycles to the maximum of 120. The cutoff energy was set to 400 eV. The Brillouin zone was sampled by G-centered meshes with the density of 2p Â 0.12Å À1 for structural optimization such as fast screening. The calculations for the total energy and electronic density of states (DOS) of the optimized unit-cells were performed with a ner sampling of 2p Â 0.08Å À1 of the Brillouin zone. Subsequently, structural optimizations were performed for the survivors of the evolutionary algorithm again, but in tighter computational conditions: the cutoff energy was increased to 500 eV, and the Brillouin zone was sampled using G-centered 8 Â 8 Â 8 meshes.
The computed E v were obtained using the supercell method 17 as follows, where is the half-energy of an O 2 gas molecule (used as the chemical potential of O for the O-rich condition), q is the unit charge, E VBM is the energy of the valence band maximum (VBM), and E Fermi is the Fermi level, that is a variable changes between the VBM and the conduction band minimum (CBM). The chemical potential of O depends on the processing condition, but it provides a constant shi to the E v of all of the crystal structures of ZrO 2 and the general consequence related to the ranks of the E v are unchanged. Therefore, we only considered the O-rich condition in this study (See other O-rich conditions in Table  S2 in ESI †). We calculated both of the E v for a neutral oxygen vacancy ½E v ðV 0 O Þ and a doubly charged oxygen vacancy ½E v ðV 2þ O Þ. For a prediction model based on the evolutionary algorithm and regression analysis, E v ðV 0 O Þ was used as the computed E v . The calculations for the supercell with a V O were performed until the interatomic forces on each atom were reduced to within 0.02 eV A À1 with xed lattice constants. The sizes of the supercells were set such that the lengths of the three lattice constants were $10 A. G-centered 2 Â 2 Â 2 meshes were used to describe the kspace. The types of O sites were conrmed by SPGLIB implemented in PHONOPY. 36,37 For a crystal structure containing more than two types of O sites, the computed E v values were averaged.
The minimum energy path (MEP) for a V O migration was obtained by the climbing-images nudged-elastic-band (CI-NEB) method. 38,39 The E m of a V O can be dened as the energy difference between the transition state and the initial or nal state from the obtained MEP, as shown in Fig. 2. The CI-NEB calculations were performed with three intermediate images (states) until the forces decreased below 0.03 eVÅ À1 with a spring constant of 5 eVÅ À2 between images. Here, the climbing images with even numbers were used to increase the probability of nding the transition state. We employed a ðV 2þ O Þ for E m , assuming that V O were formed by the reduction of cation valency upon doping. The supercells with a ðV 2þ O Þ were described by decreasing two electrons of the background charge for compensation. The internal coordinates for the supercells with a ðV 2þ O Þ were relaxed in the same computational conditions as those for ðV 0 O Þ. crystal structures of the reoptimized-ZrO 2 in Table 1 were used to construct the prediction model. Fig. 3(a) shows the relationship between the predicted and computed E v obtained by using the OLSR-3-descriptors model. The distribution of data is largely scattered from the diagonal line that indicates equality between the predicted and computed E v . As arranged in Table 3, the CV-score and RMSE of the test set are 0.76 and 0.51 eV, respectively, which are much larger errors than the MAE of 0.20 eV for the 45 oxides. 23 The large difference between the CV-score and RMSE of the test set also indicates that the prediction model has a large dependence on the type of training set. Therefore, it implies that the prediction model using the OLSR-3-descriptors is not good enough to predict the E v of the various crystal structures of ZrO 2 .

Results and discussion
To improve the prediction accuracy, we constructed RR-11descriptors by adding more DFT-unit-cell descriptors on top of the three descriptors; these DFT descriptors are intuitively thought to be related to the forming and breaking of chemical bonds. We employed the RR model, which can control the coefficients of various descriptors through a penalty term and thereby minimize overtting. Fig. 3(b) shows the relationship between the predicted and computed E v obtained by using the RR-11-descriptors model. The scattering of points from the diagonal line is signicantly decreased. The CV-score and RMSE of the test sets are 0.17 and 0.16 eV, respectively, which are much smaller than those obtained from the OLSR-3-descriptors model. The similarity in the two values suggests the absence of any serious overtting.
Furthermore, we applied the RR-11-descriptor model with a training set consisting of all 16 crystal structures of reoptimized-ZrO 2 . In the results, the CV-score slightly decreased to 0.14 eV. This improvement can be attributed to the increased number of training data points. This value is also only 0.02 eV different from that of the RMSE of the test set of the prediction model with a 75% training set, which suggests the absence of any serious overtting. Finally, we identied the coefficients for a linear tting function applied as the tness value in the evolutionary algorithm by using the RR-11-descriptors model with all 16 crystal structures of the reoptimized-ZrO 2 .

Crystal structures with low E v
As Fig. 1 shows, we generated 462 crystal structures through combinations of the regression analysis for the E v and evolutionary algorithm, and screened out 71 crystal structures based on the criterion |predicted E v À 5| < 0.5 eV. Aer crystal relaxation once again under tighter conditions, we obtained 14 remaining crystal structures with the predicted and computed E v , excluding those that were metallic or could not satisfy the criteria for structural optimization. The detailed properties of   Table 4. Henceforth, this set of crystal structures is referred to as "generated-ZrO 2 ", and the name of each such structure is prexed with "Gen-" ("generated or constructed"); these structures are numbered in the order of the computed E v . These crystal structures have very low symmetry; most of them are P1. To identify similar crystal structures with higher symmetry, a looser tolerance for nding the space group was also applied. Fig. 4 shows the relationship between the computed and predicted E v of the crystal structures of reoptimized-ZrO 2 and generated-ZrO 2 , which are used as the training and test data, respectively. The E v of the ground-state P2 1 /c structure is emphasized with a star-shaped mark for easier viewing. Note that the distribution of the predicted E v for the 14 test-data crystal structures is in the range 5.11-6.53 eV, which is wider than the screening criteria. This is because some crystal structures are relaxed into more stable structures, with changes in the predicted E v .
The distribution of the test data is close to the diagonal line. The prediction error is only 0.11 eV, which is almost the same as the CV-score of the RR-11-descriptors model with the 16 training data (0.14 eV). This means that the E v of the 14 crystal structures of generated-ZrO 2 have been successfully predicted using only DFT-unit-cell descriptors. Note that the distribution of the 16 training data points is very close to the diagonal line in Fig. 3(b), because their predicted E v are obtained as the training data, and not as the test data. Fig. 5 shows Pearson's linear correlation coefficient (r p ) for each DFT-unit-cell descriptor with respect to the computed E v . The absolute value of r p is larger than 0.7, which indicates a strong correlation between the two properties for a given data point. 41   Because the descriptors also maintain their correlations with each other in the regression analysis, the important unit-cell descriptors for the prediction of E v can be changed according to the target materials and training data. Therefore, we suggest employing more types of unit-cell descriptors but with regression analysis using a penalty term; this is a useful way to construct the prediction model of the E v . We identify many crystal structures having E v lower than that of the ground-state P2 1 /c structure. As shown in Table 1 and Fig. 4, eight crystal structures, including the P4 2 /nmc and Fm 3m structures that are well-known high-temperature structures of ZrO 2 and that can be stabilized by extrinsic doping, 5 reveal lower E v than that of the ground-state P2 1 /c structure. In particular, the Pna2 1 structure, which is obtained from the structural data of SiO 2 , shows the lowest E v of 5.14 eV, which is almost 1 eV less than that of the ground-state P2 1 /c structure. This implies that exchanging the substituting elements in the crystal structures of other oxides can be an efficient way to reconsider a functional property. Additionally, we obtain 14 crystal structures based on the evolutionary algorithm that have lower E v than that of the ground-state P2 1 /c structure. This suggests that the discovery of a crystal structure with a special functional property can be accelerated by a combination of the machine-learning technique and the direct construction of crystal structures based on the evolutionary algorithm.
Meanwhile, we reveal why the neutral oxygen vacancy ðV 0 O Þ is mainly employed in the calculations of the E v for the prediction model. The V O in the crystal structures of ZrO 2 has been widely investigated. The P2 1 /c and P4 2 /nmc structures are known to form a deep level inside the E g based on the GGA 42-44 and HSE06 (ref. 45 and 46) calculations. Additionally, the E v of the ðV 0 O Þ in the dilute limit is physically well dened regardless of the doping level of the cation sites, and quickly converged with respect to the supercell size.
However, it is also worth investigating the relationship between    Table S3 in ESI †).
Before proceeding, it is worthwhile to discuss the comparison of the computed values (E g and E v shown in Fig. S1 in ESI †) between the GGA+U and GGA methods. It is noticeable that the E g obtained by the GGA+U are larger than those by the GGA, which implies that the GGA+U is useful to reduce the underestimation of the E g . For the computed E v , it is found that the E v obtained by the GGA are slightly smaller or almost the same as those obtained by the GGA+U. This suggests that the GGA+U does not lose its reliability compared with the GGA, despite including the empirical U parameter.

Crystal structures with low E m
We summarize the E m values of the Fm 3m and the P4 2 /nmc structures obtained herein and in former computational studies 42,47-50 in Table 5. Former computational studies have mainly used the GGA, rather than the GGA+U. To compare the results, we also employ the GGA with the PBE form. 33 The E m values obtained herein using the GGA are almost the same as those mentioned in previous computational reports. The E m values obtained using the GGA+U are larger than those obtained using the GGA for both the Fm 3m and P4 2 /nmc structures. We also compare the E m values of ðV 0 O Þ and ðV 2þ O Þ. Table 5  Hereaer, we will refer to the E m obtained by the GGA+U calculations.
In the case of the initial-state structure for the CI-NEB method, we employed the relaxed supercell possessing a ðV 2þ O Þ having the lowest energy among the different O sites. For the 14 crystal structures of reoptimized-ZrO 2 that have well-converged structures with a ðV 2þ O Þ, we considered various nal-state structures that have the V O at the nearest-neighboring O site with the cutoff distance of 3.5Å for the CI-NEB method (see Fig. S2 in ESI †). Among the E m values of different migration paths, the lowest value was extracted; we then checked whether the V O could migrate to the original site in the nearestneighboring supercell via the one type of migration path with the lowest E m , because the migration of the V O through the bulk system is practically meaningful. In 12 of the 14 crystal structures, V O could migrate to the original site in the nearestneighboring supercell via the migration path with the lowest E m . However, for the P4/n and R 3 structures, a V O needs different paths with different E m values to migrate to the  original site in the nearest-neighboring supercell. We dene the E m for these structures as the minimum barrier energy required for migration from the original site to that in the nearestneighboring supercell with several kinds of CI-NEB calculations. The energy diagrams for the migration of a V O are shown in Fig. S3 and Table S4 in ESI. † Fig. 7 and 8 show the relationships between the computed E v and E m values, and between the relative DE f (compared with that of the ground-state P2 1 /c structure) and E m values of various ZrO 2 crystal structures, respectively. Firstly, we compare the E m of the 14 crystal structures of the reoptimized-ZrO 2 . The E m values (0.39 and 0.34 eV, respectively) of the Fm 3m and P4 2 /nmc structures are lower than that (0.61 eV) of the ground-state P2 1 /c structure. Ten crystal structures, including the Fm 3m and the P4 2 /nmc structures, show lower E m than that of the ground-state P2 1 /c structure.
Among these ten, eight crystal structures have only 0.05 eV per atom higher DE f than the ground-state P2 1 /c structure (Table  1 and Fig. 8). This suggests that high s O can be realized for these structures with stabilizing methods such as extrinsic doping and the formation of epitaxial heterostructures. The P4 2 /nmc and Fm 3m structures are conrmed to have both relatively low E m and DE f . The Pbcn structure, which also shows relatively low DE f , is reportedly produced by biaxial tensile strain 53 on uoritestructured CeO 2 , which forms the (100) interface. This structure, obtained by 7% tensile strain, is also reported to show a higher O diffusivity than that of YSZ based on rst-principles molecular dynamics calculations. 54 The I4 1 /amd, Pbca, Pca2 1 , C2/c, and P2 1 /m structures also have the potential for stabilization, although they have somewhat larger E m than the P4 2 / nmc and Fm 3m structures do. These crystal structures have not been investigated as oxygen-ion conductors to the best of the authors' knowledge; however, some of them have been reported as metastable structures. The Pbca structure was experimentally synthesized at 600 C and 6 GPa. 55 This structure was also conrmed by rst-principles calculations to be stabilized by hydrostatic pressure over 4.1 GPa. 56 Mg-doped tetragonal ZrO 2 cooled to $30 K reportedly transforms to the orthorhombic structure with the space group of Pbc2 1 (Pca2 1 ). 57 The E m values for the 14 crystal structures of the reoptimized-ZrO 2 have similarly large r p of 0.64 and 0.62 with the computed E v ðV 0 O Þ and E v ðV 2þ O Þ, respectively. The E v is the energy required for breaking all the chemical bonds to separate an O atom from an oxide, and the E m is the energy required for breaking some of the chemical bonds locally in an oxide, as  shown in Fig. 2. Therefore, this correlation may be reasonable in terms of chemical bond cleavage. Correlations between E v and E m have also been reported for other oxides, like perovskite oxides 58 with various kinds of constituent elements and a multinary oxide of Ba 1Àx Sr x Co 1Ày Fe y O 3Àd . 59 On the other hand, the relative DE f and E m do not show any correlation, having an |r p | of only 0.10, which implies that the relative DE f is not a good single descriptor for the E m . To realize crystal structures with low values for both relative DE f and E m , special challenges in processing may be necessary.
The above information suggests that we are likely to identify a low-E m crystal structure when we initially investigate crystal structures with low E v . Therefore, for E m calculations, we choose the four crystal structures (Gen-01, Gen-02, Gen-07, and Gen-08) of the generated-ZrO 2 that possess the lowest E v values in ascending order. The crystal structures between Gen-03 and Gen-06 are not chosen because we expect these structures to have properties similar to Gen-02 (see Fig. S4 in ESI †). The crystal structures from Gen-09 to Gen-14 are also not considered for the E m because they may have relatively high E m values.
The E m values of these crystal structures from the generated-ZrO 2 are also shown in Fig. 7 and 8. The E m values of Gen-01, Gen-02, and Gen-08 are 0.42, 0.39, and 0.28 eV, respectively. Therefore, we succeed in constructing crystal structures with lower E m as well as lower E v than those of the ground-state P2 1 /c structure.
It is worth analyzing the Gen-08 structure further because it is very similar in crystal structure and energetic properties to P4 2 /nmc, despite its space group being P1. Fig. 9 shows the crystal structures of the P4 2 /nmc and Gen-08 structures. We conrm that Gen-08 and P4 2 /nmc do not become the same by structural optimizations with denser k-space sampling. However, the space group dened by using looser tolerance and structural properties, such as the radial distribution function and the bond lengths (see Fig. S4 in ESI †), imply that these crystal structures are quite similar; the Gen-08 structure is slightly distorted in terms of its bond lengths and lattice parameters (see Table S5 in ESI †). In the macroscopic view, these crystal structures may be difficult to distinguish from each other. For the construction of the Gen-08 structure, we only prepared the prediction model of E v by the regression analysis and evolutionary algorithm that requires the initial structurerelated input of only the number of constituting atoms of Zr and O in empty space. Therefore, the construction of Gen-08, which has a similar crystal structure to P4 2 /nmc, implies a successful result of a combination of evolutionary algorithm and regression analysis.
The Gen-07 structure, possessing an energy of 0.2 eV per atom higher than that of the ground-state P2 1 /c structure, is too unstable to contain a ðV 2þ O Þ. This suggests that the energetic stability and the low E v of the crystal structures should be considered in choosing candidates for low E m . The Gen-01-Gen- 07 crystal structures have energies more than 0.1 eV per atom higher than that of ground-state P2 1 /c structure, despite having E v lower than 5.7 eV. On the other hand, the crystal structures Gen-08-Gen-13 have less than 0.03 eV per atom higher energy than the ground-state P2 1 /c structure, although they have higher E v values. The latter group may have higher stabilization potential than the former.
Before concluding, we note our exclusion of dynamical stabilities based on the phonon calculations with harmonic approximations 36,37 from this study. The P4 2 /nmc and Fm 3m structures are known to be stabilized at temperatures exceeding $1173 C and $2370 C, respectively. However, as Fig. S5 in ESI † shows, the P4 2 /nmc structure includes no imaginary phonon frequencies, whereas the Fm 3m structure includes imaginary phonon frequencies despite its capacity for hightemperature stabilization, which is consistent with the phonon calculations by Wang et al. 60 Therefore, anharmonic phonon effects, 61,62 which are particularly signicant at high temperatures, must be considered for these materials.

Conclusion
We have constructed a prediction model for the E v values of various crystal structures of ZrO 2 using a linearized RR with 11 unit-cell descriptors. The predicted E v was used as the tness value for crystal structure constructions based on the evolutionary algorithm in order to mainly construct crystal structures having E v lower than that of the ground-state P2 1 /c structure. Our prediction model guarantees prediction errors of less than $0.15 eV for various crystal structures of ZrO 2 .
We also obtained the E m values of various crystal structures of ZrO 2 , and found that this property is correlated with E v . On the basis of this correlation, we calculated the E m values of several newly constructed crystal structures having low E v , and conrmed that these E m values are also lower than those of the ground-state P2 1 /c structure and lower than or similar to those of the P4 2 /nmc and Fm 3m structures. We have successfully discovered various crystal structures with low E v and E m values for ZrO 2 by the direct construction of crystal structures based on a combination of evolutionary algorithm and regression analysis and by reconsidering the crystal structures present in other oxides. Our DFT-unit-cell descriptors are constructed on the basis of the properties related to the chemical bonds between cations and O. Therefore, the good accuracy of the prediction model for E v and the correlation between the E v and E m may be reasonable because these properties are all related to the chemical bond cleavage. We expect that the method employed in this study can be applied to determine unknown low-E v and -E m crystal structures of other material systems, which have not been investigated closely. In addition, the strategy described herein can be also applied to search for crystal structures with other functional properties.

Conflicts of interest
There are no conicts to declare.