Accessing complex reconstructed material structures with hybrid global optimization accelerated via on-the-fly machine learning

The complex reconstructed structures of materials can be revealed by global optimization. This paper describes a hybrid evolutionary algorithm (HEA) that combines differential evolution and genetic algorithms within a multi-tribe framework. An on-the-fly machine learning calculator is adopted to expedite the identification of low-lying structures. The method outperforms other well-established approaches, and we further demonstrate its efficacy by optimizing the complex oxidized surfaces of Pt, Pd, and Cu with different facets under (4 × 4) periodicity. The obtained structures are consistent with experimental results and are energetically lower than previously presented models.


Introduction
The rational design of highly efficient materials requires an atom-level understanding of their structure-performance relationship. [1][2][3][4][5] However, under working conditions, most materials undergo structural reconstruction accompanied by unpredictable changes in performance. [6][7][8][9] For example, some bimetallic catalysts like Au-Ag alloys exhibit dynamic geometrical and compositional reconstruction during the reaction, which generates active sites that boost performance. 6,[10][11][12][13][14] In contrast, under high voltage, metal catalysts can be partially oxidized, which results in destabilization and greater dissolution of the active species. 15,16 Surface reconstruction varies sensitively with the nature of the unreconstructed surface and the composition/concentration of adsorbates. 17,18 Because modeling a reconstructed surface is difficult, it is a generally adopted, but improper, practice to oversimplify a complex surface when modeling. For example, to model a partially oxidized surface, some studies place adsorbed oxygen, 19 or only a layer of metal oxide, 20 upon a metal surface without considering reconstruction. 21 To achieve the rational design of highly efficient materials, it is essential to model their reconstructed structures properly so as to understand the real structure-performance relationship.
Reconstructed surfaces tend to adopt the most thermodynamically stable structure, 22 which corresponds to the lowest point on the potential energy surface (PES), the so-called global minimum (GM). 18 Finding the GM of a working material constitutes a global optimization (GO) problem. There has been significant progress in developing GO algorithms for chemical structure optimization. However, most of them were developed mainly for isolated systems such as crystals, clusters, or supported clusters, for which simplified models are deemed sufficient in most investigations. [22][23][24] The optimization of a surface system is more geometrically restricted than that of isolated particles, owing to the periodicity and the presence of strong covalent bonds between surface atoms and the underlying support, 25 and generally more atoms are required for reliable modeling.
Two major difficulties must be addressed when globally optimizing surface structures: how to efficiently explore the highly complex PES, and how to reduce the high computational cost of the massive number of local relaxations used to describe the PES. 26 Indeed, some GO algorithms that were originally developed for isolated particles have been applied to surface structures. 27,28 However, their efficiency has not yet been examined systematically. Previous reports have raised the concern about the efficiency of the genetic algorithm (GA) that two good parent structures may produce poor, high-energy candidates. 29,30 This insufficient efficiency also limits the model size for optimizing a surface system, as most studies are conducted with no more than (2 × 2) periodicity. [31][32][33][34][35][36][37] The results may even be unreliable, since no known criterion guarantees that the "best structure" encountered by a GO algorithm is the GM that reflects the reconstructed structure. 38 The absence of such a criterion is primarily attributed to the inherent limitations in the spatiotemporal resolution of characterization techniques, resulting in a dearth of prior knowledge about the precise atomic-level structure of a reconstructed surface. 39 This situation poses a significant challenge to the integration of machine learning (ML) aimed at mitigating the computational burden associated with GO. It is dangerous to rely solely on a pre-trained ML calculator that suffers from extrapolation problems, as the GM can correspond to a very narrow basin of the PES and can hardly be expected to appear in the training set. 40,41 Previous ML-assisted structural searches using techniques without extrapolation safeguards, such as neural networks, have generally relied on randomly generated structures or pre-built databases, 42 or have constrained atomic displacements very tightly during the search.
[43][44][45] Even for on-the-fly ML frameworks with sampling designs such as Bayesian optimization, 46 if they are applied alone without incorporating near-GM structural features into the training set, their effectiveness can be compromised. 47 This paper describes a new strategy for the global optimization of complex catalytic surfaces using a hybrid evolutionary algorithm (HEA) that combines differential evolution (DE) and genetic algorithms with a co-evolution framework. An on-the-fly machine learning calculator based on Gaussian processes is adopted to complement local evaluations and expedite the identification of low-lying structures. We demonstrate the HEA method by obtaining the complex surface oxide structures of different facets of transition metals (Pt, Pd, and Cu) using a (4 × 4) supercell. The globally optimized structures are lower in energy than previously reported theoretical models and are consistent with experimental observations, providing important clues for the rational design of catalysts.

HEA methods
A owchart of the HEA program is shown in Fig. 1. A "tribe" framework is adopted in the HEA program to simulate the real evolutionary process in nature. Specically, several optimization processes (each is considered as a "tribe") are concurrently run with a periodic exchange of the most stable members among tribes. Firstly, an initial set of structures is generated at random within appropriate limits for a (small) cell shape and interatomic distances, which is then enlarged to the required periodicity and relaxed at the DFT level. Thus, the generated structure has high symmetry and is more likely to be stable. 48,49 An ML calculator based on a Gaussian process (GP) regressor is trained on-the-y using relaxed structures to expedite the identication of low-lying structures, separately for each tribe.
For the offspring population in each tribe, an excess of offspring candidates is generated by a hybrid evolutionary operator. The first route produces offspring candidates from two parent structures by the GA operator introduced by Deaven and Ho. 29 Note that mutations (permutation, rattle, and mirror) are applied with pre-set probability to a newly generated structure. 51 The second route applies the DE operator, for which three strategies are introduced: 52

X_de = X_r1 + F(X_r2 − X_r3) (1)
X_de = X_best + F(X_r1 − X_r2) (2)
X_de = X_r1 + F(X_best − X_r1) + F(X_r2 − X_r3) (3)

where the new structure X_de is generated by a linear combination of randomly selected parent structures (X_r1, X_r2, X_r3, ...) and a scaled difference (controlled by the scaling factor F ∈ [0, 2]) between other donor structures, and X_best represents the most stable structure in the parent population. Eqn (1)-(3) are denoted "DE/rand/1", "DE/best/1" and "DE/rand-to-best/1", respectively.
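The three DE strategies are standard in the differential evolution literature and can be written compactly on flattened coordinate vectors. This is a sketch, assuming the population is stored as a NumPy array of such vectors; the function name and signature are illustrative, not the paper's API:

```python
import numpy as np

def de_offspring(pop, energies, F=0.8, strategy="rand/1", rng=None):
    """Build one DE trial vector from a population (rows = flattened
    coordinate vectors), following the standard DE/rand/1, DE/best/1
    and DE/rand-to-best/1 update rules."""
    rng = rng or np.random.default_rng()
    # three distinct, randomly chosen parents X_r1, X_r2, X_r3
    r1, r2, r3 = pop[rng.choice(len(pop), size=3, replace=False)]
    best = pop[int(np.argmin(energies))]  # most stable parent, X_best
    if strategy == "rand/1":              # DE/rand/1
        return r1 + F * (r2 - r3)
    if strategy == "best/1":              # DE/best/1
        return best + F * (r1 - r2)
    if strategy == "rand-to-best/1":      # DE/rand-to-best/1
        return r1 + F * (best - r1) + F * (r2 - r3)
    raise ValueError(f"unknown strategy: {strategy}")
```

In a surface search, the resulting trial vector would still need the usual sanity checks (minimum interatomic distances, cell bounds) before being accepted as a candidate.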
The generated candidates are first evaluated by the GP regressor. In practice, the GP regressor must deal with unrelaxed structures. Considering the time and memory cost of force prediction, the GP regressor is trained and applied directly on unrelaxed structures labeled with their relaxed energies. 47,53,54 The GP regressor was implemented using the GPyTorch Python library. 55 A Gaussian process uses Bayesian inference and assumes that the prior distribution for the data is a multivariate normal distribution; its task is to infer the Gaussian posterior distribution p(E_*|X, y, X_*) for an unseen data point X_* (awaiting exploration) based on the observed training set (X, E). X is taken to be the feature vector rather than the Cartesian coordinates, where the many-body tensor representation (MBTR) descriptor is adopted via the DScribe software package. 50,56 MBTR descriptors provide whole-system representations of periodic systems while exploiting the locality of chemical interactions. To reduce the complexity of representing large surface systems, only the top three layers (which are more prone to reconstruct) are selected for training, while the bottom layer (representing the bulk structure) is excluded. For more realistic modeling, the target value y differs from E by an independent, identically distributed Gaussian noise, ε ~ N(0, σ_n²). The key predictive equations for the GP regressor are 57-59

E_* = K(X_*, X)[K(X, X) + σ_n²I]^(−1) y
cov(E_*) = K(X_*, X_*) − K(X_*, X)[K(X, X) + σ_n²I]^(−1) K(X, X_*)

The predictive mean E_* of the distribution is the estimate of the potential energy, while the variance cov(E_*) provides an uncertainty quantification. K(·,·) denotes the covariance function (or kernel) matrix used to characterize the similarity between samples, which is the very heart of the GP regressor.
In our study, the covariance function was chosen to be the sum of a Matern kernel and a linear kernel:

K(x_i, x_j) = σ² [2^(1−ν)/Γ(ν)] (√(2ν) d(x_i, x_j))^ν K_ν(√(2ν) d(x_i, x_j)) + x_i · x_j

where Γ is the gamma function, d(x_i, x_j) is the scaled distance between x_i and x_j, ν is a smoothness parameter (smaller values are less smooth), K_ν is a modified Bessel function, and σ² is a variance parameter. The Matern kernel is a generalization of the Gaussian kernel that has proven advantageous for high-dimensional inputs, 60,61 and its synergy with the MBTR descriptor has been demonstrated previously. 50 The optimal hyperparameters Θ* are determined by maximizing the log marginal likelihood:

Θ* = arg max_Θ log p(y|X, Θ)

The Thompson sampling (TS) method is used as the acquisition function; it leverages the uncertainty in the posterior to guide exploration, a randomized strategy that samples a reward function (related to the potential energy) from the posterior and queries the structure x_(n+1) with the highest reward. 62,63 Fig. S1† shows an illustrative example of a TS-guided on-the-fly structure search in a non-convex search space.
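The Matern-plus-linear covariance and the marginal-likelihood hyperparameter fit can be illustrated with scikit-learn's kernel algebra, used here as a stand-in for the paper's GPyTorch implementation. The toy feature vectors stand in for MBTR descriptors, and the toy targets stand in for relaxed energies; none of the numbers come from the paper:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, DotProduct

# Matern + linear covariance, mirroring the text; fit() tunes the kernel
# hyperparameters by maximizing the log marginal likelihood
kernel = Matern(length_scale=1.0, nu=2.5) + DotProduct(sigma_0=1.0)
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-4, normalize_y=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))          # toy feature vectors (MBTR stand-ins)
y = np.sin(X[:, 0]) + 0.1 * X[:, 1]   # toy "relaxed energies"
gp.fit(X, y)

# predictive mean (energy estimate) and standard deviation (uncertainty)
mean, std = gp.predict(X[:5], return_std=True)
```

The `alpha` term plays the role of the Gaussian noise variance σ_n² added to the diagonal of the covariance matrix.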
x_(n+1) = arg max_x f_w(x), where w ~ p(w|X, y, Θ)

Equipped with this trained GP regressor, we choose a batch of multiple candidates, namely the batch Thompson sampling (B-TS) method. Different from the lower confidence bound (LCB) function used in similar studies, 47,59,64 the B-TS method naturally trades off exploration and exploitation of the PES with no free parameters, thus avoiding the loss of efficiency caused by an inappropriate parameter setting of the LCB function; 41 its effectiveness in searching chemical space has been demonstrated before. 63 Only these "most promising" structures are evaluated at the DFT level. The population is then updated by 'survival of the fittest': a certain number of the most stable structures from the current (parent + offspring) population are kept, while the others are eliminated. 22 Nonetheless, all DFT-evaluated structures are added to the training dataset, and the GP regressor is re-trained on-the-fly.
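One simple way to realize batch Thompson sampling is to draw one posterior function sample per batch slot over the whole candidate pool and let each draw vote for its own best candidate (here the lowest sampled energy, since we minimize). The sketch below again uses scikit-learn in place of GPyTorch, with made-up toy data; the function name and the selection scheme are illustrative, not the paper's implementation:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, DotProduct

def batch_thompson_select(gp, candidates, batch_size, seed=0):
    """Return indices of candidates picked by batch Thompson sampling:
    each joint posterior draw votes for its own minimum-energy candidate."""
    # joint posterior samples over all candidates: shape (n_candidates, batch_size)
    samples = gp.sample_y(candidates, n_samples=batch_size, random_state=seed)
    return np.unique(np.argmin(samples, axis=0))  # duplicate votes collapse

rng = np.random.default_rng(1)
X_train = rng.normal(size=(20, 4))
y_train = X_train[:, 0] ** 2              # toy energies
gp = GaussianProcessRegressor(Matern(nu=2.5) + DotProduct(), alpha=1e-4)
gp.fit(X_train, y_train)

pool = rng.normal(size=(50, 4))           # toy unrelaxed offspring candidates
picked = batch_thompson_select(gp, pool, batch_size=5)
```

Because each draw is an independent sample from the posterior, uncertain regions of the PES occasionally win a vote (exploration) while low-mean regions win most votes (exploitation), with no tunable trade-off parameter.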

Results and discussion
Performance of the HEA method

Fig. 2(a) shows the optimization performance on (4 × 4) 0.75 ML O-Pt(111) as a function of the number of local evaluations, comparing HEA (with different settings) against other well-established methods for surface systems, such as GOFEE 59 and SSW; 65 the HEA program (with and without the GP regressor) achieves the highest performance. With the on-the-fly GP regressor, the HEA program is further accelerated: the number of local evaluations decreases almost threefold (saving around 3600 local evaluations) to reach the same energy level, and the GM eventually found is much lower. A dimensionally-reduced visualization of this accelerated performance is presented in Fig. 2(b), showing that the search with the GP regressor is closer to the GM region than that without it after a certain number of generations. The superior performance of the HEA program is also presented in Fig. S3(a)† for the optimization of the (2 × 2) surface. After extension to the (4 × 4) supercell, the obtained (2 × 2) structure is 0.28 eV per O-site less stable than the directly optimized (4 × 4) structure, shown by the green dot of Fig. 4(a), highlighting the necessity of optimizing directly at the larger periodicity. As the structural search for a (4 × 4) surface requires as much as 10 times more local relaxations than that for a (2 × 2) surface, an accelerating module like the GP regressor is desirable.
The high efficiency of the HEA program stems from three features. First, as shown in Fig. 3(a and b), the newly introduced DE operators are much more effective and stable at generating lower-energy configurations than the GA operator. Mirror and permutation mutations, which are widely used in the optimization of isolated particles, perform poorly for surface systems. As a result, in Fig. 2(a), GA alone cannot efficiently optimize the (4 × 4) surface and is outperformed by the HEA program combining both GA and DE operators. Second, the (on-the-fly) accuracy of the GP regressor yields reliable predictions for (unrelaxed) candidates. In Fig. 3(b), despite the accuracy loss, the MAE for models trained on both relaxed and unrelaxed structures falls below 10 meV per atom after around 10 generations (at which point the training set contains 750 data points).
A relatively correct ranking of the offspring candidates is achieved, as shown by Kendall's τ coefficient in Fig. 3(d), demonstrating reliable sampling during the search. Fig. 2(b) also reflects that the MBTR descriptor contains the structural information relevant for approximating the energy, which is the prerequisite for an accurate GP regressor. Finally, introducing the "tribe" framework not only helps maintain high structural diversity within each tribe, 48 but also enhances the efficiency of the GP regressor. Fig. 3(b and d) show that the accuracy is improved and the sampling is more targeted. While a sparse GP regressor does not provide sufficient accuracy (Fig. S5†), a GP regressor with a full covariance matrix is used here, which requires O(N³) computational time and O(N²) memory for the Cholesky decomposition. It therefore becomes expensive, and its numerical stability degrades, for a large dataset. 66,67 Dividing the training dataset via the "tribe" framework naturally avoids these problems, as shown in Fig. 3(c). All of the above features contribute to the enhanced structural-search capability of the HEA program.
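The ranking check via Kendall's τ can be reproduced with SciPy. The energies below are made-up numbers for illustration, not values from the paper:

```python
import numpy as np
from scipy.stats import kendalltau

# hypothetical GP-predicted vs. DFT energies (eV) for five offspring candidates
e_pred = np.array([-3.10, -2.95, -2.40, -2.80, -1.90])
e_dft  = np.array([-3.00, -3.05, -2.35, -2.70, -2.00])

tau, p_value = kendalltau(e_pred, e_dft)
# one discordant pair out of ten -> tau = (9 - 1) / 10 = 0.8
```

A τ near +1 means the surrogate ranks candidates almost exactly as DFT would, which is what matters when the GP is used only to select which structures to relax, rather than to predict absolute energies.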

Application in reconstructed oxide structures
The efficacy of the HEA program is further demonstrated by modeling the reconstruction of metal oxides, knowledge of which has been limited by their complexity and the scarcity of surface-sensitive characterization techniques. 18,68 Global structural optimization is thus necessary. As a proof of concept, the complex surface oxides of different metals (Pt, Pd, and Cu) are studied here using the HEA program.
Pt can undergo irreversible restructuring under reaction conditions due to surface oxidation and subsequent Pt dissolution, which is thought to decrease catalytic efficiency and durability. 19,69-72 However, the atomic-level modeling of Pt oxidation remains uncertain. 73 The optimized structures of 0.75 ML O-Pt(111), which are thought to exist at around 1.0-1.2 V_RHE during electro-oxidation, 73,76 are shown in Fig. 4(d and e). The structures consist of two interconnected, protruding square-planar PtO4 units that are 1.7 Å in height. This is consistent with the scanning tunneling microscopy (STM) images in Fig. 4(b and c), in which the oxidized Pt(111) surface consists of a network of mono-atom-high (1.7 Å), worm-shaped islands. 75,77 The surface oxidation state of the PtO4 units lies between that of PtO (Pt2+) and Pt3O4 (Pt2.7+), which fits exactly the in situ XANES showing that the oxidized Pt surface formed at >1.0 V presents square-planar PtO4 units with Pt in a slightly higher oxidation state than in PtO. 78 With a similar oxidation state, the PtO4 units also resemble the PtO and Pt3O4 bulk oxides shown in Fig. S6.† Note that there is no known criterion guaranteeing that the "best structure" encountered by a GO algorithm is the "true" global optimum. 38 Nevertheless, its lower energy gives our obtained structure a greater probability of being the most abundant and representative phase under reaction conditions. 22

Pd and Cu are widely used catalysts, but both suffer from severe reconstruction under reaction conditions. 83-85 A typical characteristic of oxidized Pd(111) is the "Persian-carpet" pattern observed in STM, with which the simulated STM image based on the optimized structure is well consistent, as shown in Fig. 5b. Fig. 5c-f show that the optimized structure consists of several parallel chains with features ranging from PdO2 to PdO4 units. Such multiple co-existing oxygen species have been observed by in situ XPS studies. 86 Our optimized structure is 0.35 eV per O-site lower than the structure found by USPEX, as shown in Fig. S4,† and 0.61 eV per O-site lower than the previously reported CALYPSO structure. 87 Like the CALYPSO model, our model presents the subsurface O atoms that have been reported experimentally, 88 but the CALYPSO model fails to contain the parallel chains that form the "Persian-carpet" pattern. Fig. 5(h and i) show that oxidized Cu(100) forms via the creation of Cu-O chains that resemble bulk Cu2O. 89 Experimentally, a Cu2O signal has been observed during the initial oxide growth on Cu(100), 90 along with the missing-row reconstruction (MRR) in Fig. 5(g), 82,91,92 which is consistent with the simulated STM image based on our optimized structure. Such MRR is widely reported for Cu oxidation; it is caused by increasing surface stress and has previously been shown to be energetically favorable. 82,91 Subsurface oxygen is also contained in our structure, linking with the Cu-O chains through O-Cu-O units; it is believed to form above 0.5 ML O coverage experimentally, 92,93 and to contribute to the increased stability of the MRR structure. 94 Our optimized structure is 0.31 eV per atom more stable than the well-known (2√2 × √2)R45° reconstructed Cu(100) model, 80,95 which also presents an MRR structure but fails to reflect the experimentally observed formation of Cu2O. 96,97 Similarly, oxidized Cu(110) also consists of parallel, added-row Cu-O chains, whose simulated images are consistent with both experimental STM and TEM observations, as shown in Fig. 5(j-n). 81,82,98

Conclusions

In summary, we present a new strategy for global structural optimization using a HEA that combines DE and GA with a "tribe" framework. This algorithm combines the ability of the GA to explore the PES with the ability of the DE to exploit the PES.
In practice, the HEA program performs better than well-established methods for optimizing surface systems. Its high efficiency stems from the newly introduced DE operators, which are effective at generating lower-energy configurations; an efficient GP regressor that expedites the identification of low-lying structures; and a multi-tribe framework that maintains high structural diversity. We demonstrate the efficacy of the HEA program in resolving the complex reconstructed surface oxides of Pt, Pd, and Cu, which are consistent with experimental observations.

Data availability
The data that support the findings of this study are available within the article and its ESI,† or from the corresponding author on reasonable request.

Conflicts of interest
There are no conicts to declare.