Zijiang
Yang
,
Hanghang
Chen
and
Maodu
Chen
*
Key Laboratory of Materials Modification by Laser, Electron, and Ion Beams (Ministry of Education), School of Physics, Dalian University of Technology, Dalian 116024, P. R. China. E-mail: mdchen@dlut.edu.cn
First published on 12th April 2022
There has been increasing attention in using machine learning technologies, such as neural networks (NNs) and Gaussian process regression (GPR), to model multi-dimensional potential energy surfaces (PESs). A PES constructed using NNs features high accuracy and generalization capability, but a single NN cannot actively select training points as GPR does, resulting in expensive ab initio calculations as the molecular complexity increases. However, a PES constructed using GPR has a slow speed of evaluation and it is difficult to accurately describe a fast-changing potential. Herein, an efficient scheme for representing globally accurate reactive PESs with complex topography based on as few points as possible by incorporating active data selection of GPR into NN fitting is proposed. The validity of this strategy is tested using the BeH2+ system, and only 1270 points are automatically sampled. The generalization performance and speed of evaluation of the generated PES are much better than those of the GPR PES constructed using the same dataset. Moreover, an accurate NN PES is fitted by 12122 points as a benchmark for comparison to further test the global accuracy of the PES obtained using this scheme, and the corresponding results present extremely consistent topography characteristics and calculated Be+(2S) + H2 reaction probabilities.
In recent years, there has been increasing attention in representing high-dimensional PESs using machine learning algorithms.13–18 Among these methods, it is worth mentioning that the artificial neural network (NN)19 is a powerful and robust tool for fitting high-quality PESs of reactive systems in the gas phase and the interaction of molecules with surfaces.20–37 NNs can give an analytic form about the nuclear coordinates and energies by minimizing the cost function to obtain the optimal parameters of every neuron. However, the implementation of the NN approach is essentially based on the ab initio points selected in advance, and it becomes exponentially difficult to perform electronic structure calculations in enormous configurations as the complexity of the system increases. There have been widespread studies on how to saturate the energy points in a large configuration space based on the NN fitting. Raff et al. developed the use of trajectories and distance between ab initio points to choose new configurations.38 Behler proposed the sampled scheme of the configuration space by the large discrepancy between two different NN fits.39 Lin et al. put forward an uncertainty-driven strategy to automatically construct multidimensional NN PESs.40,41 They used the weighted negative squared difference surface between two independent NN structures as the uncertainty metric to search and add new data at the less reliable region of PES. These ways in generating an optimal dataset automatically amount to active learning, a machine learning algorithm with an efficient decision on the selection of training data.
Another popular machine learning method in modeling PESs is the kernel-based Gaussian process regression (GPR).42 Unlike NNs, GPR is a nonparametric model without a specific functional form, and it provides a statistical estimate of the energy value at an unknown configuration based on the pre-existing ab initio points. Benefiting from the Bayesian modeling, GPR can construct accurate PESs with fewer data points, and it has been successfully applied in multiple molecular systems.43–56 An important advantage of the GPR model is that it can explicitly provide the predictive uncertainty at an unknown configuration via variance, providing a straightforward active learning scheme to sample data points. Guan et al. utilized this unique feature to reproduce the H + H2O ↔ H2 + OH and H + CH4 ↔ H2 + CH3 reactive PESs by continually adding new energy points with the maximum of variance,57 and the two reliable PESs were obtained only by assembling 920 and 4000 points, respectively. Uteva et al. used three active learning strategies to determine the training sets of GPR in generating intermolecular PESs for CO2–Ne, CO2–H2, and Ar3 systems,58 and their studies further demonstrate the high efficiency of GPR in sampling data. Therefore, using the GPR method to represent PESs can save a mass of ab initio calculations, and it is more convenient and efficient for data acquisition than the NN model.
On the other hand, the speed of evaluation is also a very important factor for the assessment of constructed PESs. Reactive PESs obtained by the GPR approach are much slower to evaluate than the NN fitting because the product of two vectors with the size of the number of training points n needs to be evaluated numerically, scaling as O(n), which restrains the subsequent dynamics calculations to a great extent when the value of n is relatively large. In contrast, the numerical evaluation of NN PESs only depends on the number of layers and neurons of the network, thus it is very fast to access the energy values of the arbitrary configurations. Moreover, there exists the risk of ill-conditioned covariance matrices when the training set contains very close points or rapidly varying energy values, indicating that the GPR model is not suitable for constructing reactive PESs with obvious well or barrier structures owing to the requirement of dense points in these regions.
To sum up, a pure NN model can represent accurate PESs, but it cannot actively select training points as GPR does, thus the corresponding electronic structure calculations become much too expensive as the molecular complexity increases. PESs generated by the GPR method can significantly save the number of ab initio points compared with the NN fitting, but the speed of evaluation is slow and the regions with fast-changing energy cannot be accurately reproduced, thus the GPR model is usually used to construct PESs near the equilibrium geometry rather than global reactive PESs. Therefore, constructing globally accurate multi-dimensional PESs with a rapid speed of evaluation remains a challenge by using a single model. Combining the advantages of the above two machine learning algorithms could be a good idea to overcome this difficulty. In this work, we propose a new scheme for representing reactive PESs with complex topography by incorporating the active data selection of GPR into the NN fitting, which can generate accurate analytic PESs with a rapid speed of evaluation by sampling as few ab initio points as possible. First, a small initial dataset is selected to produce a rough GPR PES; and then the new ab initio data are added to the training set by searching the maximum in the variance space, and this procedure is automatically repeated until obtaining the convergent value of the highest predictive variance; finally, the NN method is used to construct the analytical PES based on the finally determined training points. We test the validity of this scheme in a simple triatomic Be+(2S) + H2 reactive system. The main reason is that accurate ab initio data can be obtained due to the relatively small number of electrons of this system, which is critical for testing a new approach for constructing PESs. Moreover, compared to H3, LiH2, or other simpler systems, the ground state BeH2+ PES features more complex topography characteristics, including wells, barriers, cusps formed by the avoided crossing, and rapidly changing energies in some regions. Therefore, the Be+(2S) + H2 system is a suitable candidate for examining the feasibility and universal applicability of this new strategy. The remainder of this paper is as follows: Section 2 gives the detailed methodology, including the ab initio calculations, searching points in GPR and NN fitting. In Section 3, the results and discussion are presented, which prove the effectiveness of this scheme in representing reactive PESs with few points. Section 4 concludes this article.
This new scheme that combines the actively selecting points of GPR and the NN fitting is a universal approach for the construction of multidimensional reactive PESs, and the basic procedure of this strategy is illustrated in Fig. 1. For the triatomic system, the initial configurations can be sampled in the Jacobi coordinates by using the Latin hypercube sampling (LHS) approach,63 which can avoid clustering and the samples cover the whole coordinate region in each degree of freedom. It is important to note that using the Jacobi coordinates to sample is not suitable for polyatomic reactive systems due to the significantly increasing complexity. In addition, a previous study58 suggests that the LHS has limitations for large systems. As the system's dimensions increase, the initial configurations can be sampled along each reaction pathway with the other coordinates fixed at the corresponding transition states, which are determined by relatively low-level ab initio calculations, ensuring that the dynamically relevant configuration space is covered. A preliminary GPR PES is constructed based on the initial energy points. Next, the new ab initio data are added to the training set by searching the configuration with the highest predictive variance by this GPR PES in a large coordinate space, and then a new GPR PES is constructed using the updated training dataset. This process is repeated automatically until obtaining the convergent maximum of predictive variance, and the training set is eventually determined. In the final step, the analytical PES is represented by the NN fitting based on this training set. To ensure the important permutation symmetry of the generated PES, the permutation invariant polynomial (PIP)9 is adopted as the input of GPR and NN models, namely, the PIP-GPR53 and PIP-NN21,24 methods are used in the process of constructing PES.
Next, the basic theory and important equations involved in this strategy are described. The Gaussian process is a kernel-based non-parametric supervised machine learning algorithm, which can be regarded as a limit of the Bayesian network with an infinite number of nodes. The detailed description of GPR can be found in the relevant literature,42 and here we only give a brief introduction about its application in active data selection for representing PESs. Supporting the initial dataset containing n configurations X = [x1,…, xn], where the inputs of xi are the PIPs constructed by the interatomic distance, and the outputs y = [y1,…, yn] of the GPR model are the corresponding normalized potential energies. The joint multivariate Gaussian distribution can be expressed as follows:
(1) |
(2) |
The goal of training a GPR is to obtain the parameters of kernel function by maximizing the following logarithm of the marginal likelihood,
(3) |
A new data point (x*, y*) to be predicted also follows the prior distribution of eqn (1),
(4) |
μ(x*) = [K*K + σn2I]−1y | (5) |
σ2(x*) = K** − K*T[K + σn2I]−1K* | (6) |
NNs consist of layers of interconnected mathematical function simulating biological neurons, and it can present a flexible form with arbitrary precision. There has been increasing interest in generating multi-dimensional PESs with NNs, and for more details please refer to the relevant reviews.14,15 For the studied system, the NN structure contains two hidden layers with 11 neurons. The analytical form in expressing the relationship between inputs and outputs is written as follows:
(7) |
Fig. 2 The highest predictive variance and the RMSE of the test database as a function of the number of training points in the GPR training for the BeH2+ system. |
Fig. 3 shows the difference between the predicted energies and the ab initio results in the test database, and the values of predictive energy are obtained on the PESs constructed using the PIP-GPR model and the proposed new approach with 1270 training data. The energy values of the abscissa axis are relative to the dissociation limit of triatomic Be+–H–H. The test RMSEs of the PES represented by the pure PIP-GPR model and this new method are 6.12 meV and 1.80 meV, respectively, implying the accuracy of the PES is remarkably improved by using an additional NN fitting after actively selecting points with the GPR model. In the new method, the final analytic PES is fitted by the PIP-NN scheme, which can yield extremely accurate PESs, such as the recently reported ultracold reactive system of KRb + KRb → K2 + Rb2,65 which presented an RMSE of only 1.86 cm−1. The Be+(2S) + H2 reactive PES features multiple wells, barriers, and cusps formed by the avoided crossing effect of the first excited state,66 which go against the fitting accuracy, and too small RMSE can also increase the risk of long-range potential for the tested system. Although the fitting RMSE does not reach the order of spectroscopic accuracy, the accuracy of the obtained PES is sufficient for dynamics studies on the endothermic reaction of Be+(2S) + H2. It can be seen from the distributions of the test errors that both the methods can give accurate predictions in the energy region below −4 eV. However, the PIP-GPR PES presents large predictive errors in the region of relatively high energy. On the contrary, the PES generated by the new strategy keeps a very small predictive error in the whole energy region. The absolute highest error values in the test dataset for the PIP-GPR method and the new model are 179.6 meV and 11.3 meV, respectively, suggesting that the generalization performance of the new approach is much better than that of the pure GPR model for the construction of PESs with complex topography.
Fig. 3 Predictive error distributions in the test database of the PESs constructed by the PIP-GPR model (1270 points) and the new method (1270 points) for the BeH2+ system. |
The speed of evaluation is also a very important aspect to assess PESs, and it directly affects the efficiency of the subsequent quantum scattering calculations. The evaluation of NN PESs depends on the number of neurons, and any two layers are linked by a simple function; thus, the speed of evaluating a NN PES is very fast. For the GPR model, as shown in eqn (5), it requires to calculate the covariance matrixes between the predicted configurations and all of the training data and the product of vector–vector with n dimensions, so the efficiency of predicting a new data point is very low when the number of training data is relatively large. We use a single core on a single compute node (Intel(R) Core(TM) i5-1035G1 CPU@1.00 GHz) to evaluate 100000 potential values on the PESs modeled using different schemes, and the evaluation times for the PIP-GPR PES and the PES constructed by the new method are 25.41 s and 7.37 s, respectively. The time cost is saved around three times than the pure GPR method, and this ratio will increase remarkably when the training set contains more data points.
Three-dimensional BeH2+ PESs constructed using the new scheme at four different Be+–H–H angles (45°, 90°, 135°, and 180°) are plotted in Fig. 4. The PES is smooth in the whole coordination space and no non-physical structures are present for every angle, demonstrating that there is no overfitting during PIP-NN training. The right valley and left valley on the PES correspond to the Be+(2S) + H2 channel and BeH+ + H channel, respectively. It can be seen that there exist a well and a barrier on the PES for each Be+–H–H angle, and the two structures become less obvious with the increase of the Be+–HH angle. The topography characteristics of BeH2+ PES imply the complicated changes between the nuclear configuration and energy.
Fig. 4 Three-dimensional BeH2+ PESs constructed by the new method at four different Be+–H–H angles (45°, 90°, 135°, and 180°). |
The molecular constants of diatomic species H2(X1Σg+) and BeH+(X1Σ+) are listed in Table 1. The values of bond length Re, dissociation energy De, vibrational frequencies ωe, and anharmonicity constants ωexe calculated on the PES generated by the new method coincide with the experimental values well,67 indicating that the new PES can accurately describe the distributions of the rovibrational states of the reactant and product when the dynamics calculations of the Be+(2S) + H2 → BeH+ + H reaction are implemented. Table 2 gives the geometries and energy values of stationary points for the ground state BeH2+, compared with the previous MRCI + Q results.68 The energy values are relative to the Be+ + H2 dissociation limit. It can be seen that the equilibrium structure and saddle points obtained on the PES constructed using this new scheme are in good agreement with the high-level ab initio calculations.
This work | Experimental data67 | ||
---|---|---|---|
H2(X1Σg+) | R e (bohr) | 1.400 | 1.401 |
D e (eV) | 4.743 | 4.747 | |
ω e (cm−1) | 4402.7 | 4401.2 | |
ω e x e (cm−1) | 104.90 | 121.33 | |
BeH+(X1Σ+) | R e (bohr) | 2.489 | 2.480 |
D e (eV) | 3.163 | 3.280 | |
ω e (cm−1) | 2220.5 | 2221.7 | |
ω e x e (cm−1) | 40.74 | 39.79 |
To further verify the reliability of this scheme in representing global reactive PESs, we use all the test data and additional 82 points on the minimum energy path to fit an accurate PIP-NN PES as the benchmark for comparison, and the fitting RMSE is only 0.91 meV. Fig. 5(a and b) show the collinear and global minimum energy paths of the Be+(2S) + H2 → BeH+ + H reaction, respectively. For the collinear collision, there are a shallow well and a low barrier, and the reactive paths calculated on the PESs generated by the three approaches are indistinguishable. Compared to the collinear path, the global minimum energy path includes a more obvious well and barrier, and there exists a small hump behind the barrier. It can be seen that the path obtained on the PES generated by the new method is completely consistent with the PIP-NN PES constructed with 12122 points, but the PIP-GPR PES produced by the same database cannot well reproduce the barrier and the complex shape of the product region. These results indicate that the GPR model only can produce sufficiently accurate results at the smooth parts of the PES, whereas the regions with fast-changing energy values cannot be described correctly for this system.
In addition to the assessments of the error, speed of evaluation, and topography characteristics, quantifying the quality of reactive PESs by quantum scattering is also very important. To prove that the new scheme can accurately character the dynamically relevant regions of PES, we perform the reactant coordinate based time-dependent wave packet69,70 calculations on the Be+(2S) + H2 → BeH+ + H reaction based on the PESs produced by different strategies. The time evolution of the wave function is based on the second-order split operator method.71 The main parameters used in the dynamical calculations are given in Table 3. In Fig. 6, the total reaction probabilities of the Be+(2S) + H2 → BeH+ + H reaction with the total angular momentum number J = 0 are displayed. The energy threshold value is 1.45 eV, which corresponds to the endothermicity of this reaction. The results obtained on PIP-NN PES with 12122 points save as the benchmark for comparison. It is clear that the reaction probabilities calculated on PIP-GPR PES have substantial errors, and the key resonance characteristics formed by the intermediate complex on the wells are not presented. This suggests that BeH2+ PES modeled by the GPR method is extremely unreliable for dynamics studies even though the number of training points has reached thousands. The main reason for the almost negligible probabilities calculated on PIP-GPR PES is that the collision of J = 0 partial wave is dominated by the global minimum energy path, as shown in Fig. 5(b), and this PES does not reproduce the key activated barrier, which corresponds to the transition state of this reaction, thus the calculated reaction probabilities are significantly weakened. To prove that the strategy can quickly converge to an error range necessary to run dynamics, we represent a PES with only 600 points based on this new scheme, and the corresponding reaction probabilities are also presented in Fig. 6. For the new PES with 600 points, the reaction probabilities agree with the results calculated on PIP-NN PES generated with 12122 points, indicating the high efficiency of the presented strategy in obtaining reliable PESs that can be used in dynamics studies. When the number of training data increases to 1270, the corresponding results are almost identical to those of the NN PES, suggesting that the PES obtained by this new scheme can accurately describe the dynamically relevant regions.
Our calculation results of the ground BeH2+ PES suggest that the standard GPR method could not correctly represent the reactive PESs with complex topography characteristics based on a small number of points. One important reason is that the GPR model inherently cannot give accurate prediction for the rapidly changing value, and the ill-conditioned covariance matrix may appear when the data distribution is relatively dense, resulting in numerical instability in those regions. But the GPR model is a very convenient and reliable tool for sampling data by giving the uncertainties of the predicted points. NNs have excellent generalization ability, but the performance of NNs in constructing PESs depends strongly on the distribution of training data. Incorporating the actively selecting data of GPR into the NN fitting can produce globally accurate reactive PESs with a rapid speed of evaluation based on as few ab initio points as possible.
The proposed scheme is an all-purpose method for representing global reactive PESs, characterized by high accuracy and rapid speed of evaluation, and its advantages are particularly prominent for systems with complex topography. For instance, the long-lived complex-forming reactive systems usually require a mass of ab initio points to model PESs and abundant energy grids to character the quantum dynamics. This method can greatly save the cost of electronic structure calculations and the generated PES can accelerate the quantum scattering calculations. The presented scheme seems to be very promising for constructing reactive PESs with more dimensionality. On the other hand, this scheme also has some limitations for particularly large systems. The inverse of the covariance matrix needs to be calculated in each GPR iteration, and the training complexity scales as O(n3), so the speed of actively selecting points dramatically decreases with the increase of dimension. This speed can be effectively improved by decreasing the number of training data at the cost of reducing the accuracy.
This journal is © the Owner Societies 2022 |