Tom A.
Young
*,
Tristan
Johnston-Wood
,
Hanwen
Zhang
and
Fernanda
Duarte
*
Chemistry Research Laboratory, 12 Mansfield Road, Oxford, OX1 3TA, UK. E-mail: tom.a.young@ucl.ac.uk; fernanda.duartegonzalez@chem.ox.ac.uk
First published on 10th August 2022
Recent advances in the development of reactive machine-learned potentials (MLPs) promise to transform reaction modelling. However, such methods have remained computationally expensive and limited to experts. Here, we employ different MLP methods (ACE, NequIP, GAP), combined with automated fitting and active learning, to study the reaction dynamics of representative Diels–Alder reactions. We demonstrate that the ACE and NequIP MLPs can consistently achieve chemical accuracy (±1 kcal mol−1) to the ground-truth surface with only a few hundred reference calculations. These strategies are shown to enable routine ab initio-quality classical and quantum dynamics, and obtain dynamical quantities such as product ratios and free energies from non-static methods. For ambimodal reactions, product distributions were found to be strongly dependent on the QM method and less so on the type of dynamics propagated.
Iterative training is essential to construct robust MLPs and can be summarised as follows: (i) developing a training set, (ii) performing the regression, (iii) repeating the process until the desired accuracy is obtained. Automated approaches to construct large training sets have been previously reported;16–18 however, their application remain limited to small systems. Moreover, the need to perform several thousands of energy and gradient evaluations on a reference potential energy surface (PES) makes it difficult to go beyond density functional theory (DFT) and employ accurate but computationally expensive wavefunction (WF)-based methods.19 Exceptions are rare and limited to systems with ≲10 atoms.17,20,21 Finally, achieving the desired accuracy when training reactive MLPs is challenging because the energy scale required for accurate dynamics is much larger than for non-reactive MLPs (where sampling around the minima is often sufficient). As such, more training data is required to reach the same accuracy. This is particularly relevant when sampling areas around the transition states (TSs), which often require WF-based methods, such as coupled-cluster (CC).22 Here, we show that recently developed MLP methods23,24 can be used to generate accurate potentials for modestly sized reactions (≈50 atoms) in an automated fashion and outline some of the enabled insights.
![]() | ||
Fig. 1 Diels–Alder reactions studied in this work, summarising the different aspects explored with the trained MLPs. |
Initial efforts to apply our original active learning (AL) strategy using GAP regression to model the reaction between ethene + butadiene, R1, proved promising, producing qualitatively reasonable reactive molecular dynamics (MD). However, the quality of these GAPs were not within 1 kcal mol−1 accuracy of the QM reference method, which is required for accurate rate estimation or dynamic studies (Fig. S1a, MAD = 5.3 kcal mol−1, ESI†). We hypothesised that this was due to the exothermicity of this reaction, which requires the MLP to be accurate in a 60 kcal mol−1 energy window. Indeed, using the same strategy and hyperparameters to model a less exothermic H-abstraction reaction (H3C˙ + C3H8 → CH4 + ˙CH(CH3)2) resulted in a more accurate potential (Fig. S1b, MAD = 2.1 kcal mol−1, ESI†). A modest improvement was obtained upon hyperparameter optimisation, at moderate computational cost (≈500 configurations required for R1). Specifically, reducing the regularisation, increasing the quality of the radial basis, and doubling the number of atomic environments considered in the training all improved the GAP (ESI,† Section S2). However, attempts to improve GAPs by training a component-wise potential separated over covalent bonds were unsuccessful (ESI,† Section S3.3).
We then turned our attention to the recently developed Atomic Cluster Expansion (ACE23) and Neural Equivariant Interatomic Potentials (NequIP24) MLP methods and evaluated their performance within the same training strategy for R1 (Fig. 2). While rather different in philosophy, both ACE and NequIP outperform GAPs and provide MLPs of similar accuracy (Fig. 2a and Fig. S24, ESI†). Here, accuracy is based on deviations between QM ground-truth energies (true) and predicted energies over independent DFT-MD trajectories propagated from the TS to the reactant and product states. Previously, we have shown that a prospective validation strategy in the configuration space accessible to that MLP is essential to characterising ‘good’ MLPs.17 These potentials are stable by construction as we use an AL strategy (Fig. 2, bottom left), where we define a stable potential where 10 × 1 ps trajectories can be propagated without encountering a configuration that deviates >2.3 kcal mol−1 (0.1 eV) from the true energy.
![]() | ||
Fig. 2 MLP methods trained on the [4+2] DA cycloaddition between ethene and butadiene in the gas phase. (a) Signed errors on total energies over three trajectories from reactants to products (ab initio AIMD ground truth, 300 K, PBE0/def2-SVP). (b) Relative performance between MLP methods on total time, data efficiency (number of training configurations, ntrain), and accuracy. Data normalised to unity (raw data in ESI,† Table S4). (c) Parity plot between MP2/def2-TZVP total energies and ACE predictions on MP2 AIMD trajectories from the TS. ACE trained on DFT-selected configurations with energies and forces re-evaluated at MP2. (d) Comparison of the predicted (red) and true PBE0/def2-SVP (grey) relaxed 2D potential energy surface surrounding the TS. Contour plot represents the ‘closeness’ of the training data to a point in the surface, with the darker regions being the closest. |
When considering the amount of data required to train an accurate MLP for R1, GAPs both with and without hyperparameter optimisation required ≈500 configurations. However, this is surpassed by both ACE and NequIP potentials, where ntrain≈
100 (Fig. 2b). Comparing training time across these methods is less straightforward due to their different architecture and implementation. In all cases they are found to be efficient and therefore suitable for rapidly developing bespoke MLPs. Among them, NequIP had the greatest training time (half a day using 10 CPU cores + 1 GPU), while GAP and ACE required just 5 ± 2 h of total training time on 10 CPU cores. It is important to note that this difference in training time reduces with the system size, as reference energy and force evaluations dominate the computational time for equally data efficient MLPs.
Tangent to our goal of developing accurate MLPs for DA reactions, we found that GAP regularisation could be harnessed to reduce the computational cost of developing CCSD(T)-quality potentials (ESI,† Section S4). For simple molecules, MP2 forces are accurate to within 1.1 kcal mol−1 Å−1 (0.05 eV Å−1) of their CCSD(T) counterparts (Fig. S16, ESI†), thus within the ‘expected error’ of the GAP. This removes the requirement for numerical CCSD(T) gradients. This strategy, combined with CUR30 selection, afford a 100-fold reduction in the number of required CCSD(T) energy calculations, thus greatly reducing the time required to train a CCSD(T)-quality potential (Table S3, ESI†). Furthermore, re-evaluating energies and forces from AL configurations with a different reference method (i.e. ‘uplifting’) reduces the computational cost associated with training WF-quality MLPs. Such strategies were also applied to other electronic structure methods. For example, uplifting PBE0/DZ configurations to MP2/TZ affords an ACE MLP within ±1 kcal mol−1 of MP2-MD-sampled configurations (Fig. 2c, DZ = double-ζ basis set, TZ = triple-ζ). These strategies enable us to attain a higher level of accuracy in the MLP at a reduced computational cost.
Comparison of the ACE-, GAP- and NequIP-predicted two-dimensional PES over the two forming C–C bonds in R1, where all other degrees of freedom are free to relax, reveals that all three potentials are accurate and smooth in the extrapolation regime (Fig. S26, ESI†). While all three potentials are suitable for training this reaction, ACE offers the best overall balance in terms of data efficiency, training time and accuracy of the final potential (Fig. 2b). Moreover, even when the closest configuration in the training data is 0.3 Å away in the forming bonds, the error is only a couple of kcal mol−1 when AL is initiated at the TS (r1 = r2 = 2.30 Å), suggesting that ACE is accurate in the extrapolation regime (Fig. 2d). Based on these results, in the following sections we focus our discussions on ACE due to its overall efficiency and accuracy.
![]() | ||
Fig. 3 Product distributions for the reaction between tropone and cycloheptatriene by ACE MD simulations initiated from a single TS (left). The ACE MLP was trained using a standard AL loop (diff, ET = 2.3 kcal mol−1 = 0.1 eV, 500 K dynamics) from two TSs, such that the training data included all three products. (a) Convergence of the product ratio (e.g. N→R2P1/Ntotal) with the number of trajectories propagated. Error is quoted as 2σ, obtained from bootstrapping with resampling (1000 iterations) on the same dataset. MD propagated from the TS in the NVE ensemble using initial Boltzmann-sampled velocities for 300 K. (b) Dependence of the product ratio on the type of propagated dynamics (Ntotal = 200), including classical and ring polymer MD (RPMD) from 1 ps simulations. (c) Product ratios from 200 classical NVE trajectories propagated from TS1 using ACE MLPs trained at B3LYP, PBE and M06-2X levels of theory (def2-SVP). The grey color refers to a no defined state being formed in 1 ps. Uplift corresponds to single point energy and force evaluations on ACE AL configurations (PBE0/def2-SVP) and then retraining the ACE potential. Uplift + AL uses PBE0/def2-SVP configurations reevaluated plus 5 iterations of AL at 500 K from the corresponding TS. See Methods sections S4 and S8 (ESI†) for more details. |
Initial training of an ACE potential from the TS (R2TS1) generated ≈450 configurations using standard AL with sampling in the reactant and product regions of two of the products (R2P1 and R2P2). However, even when propagating MLP-MD trajectories at 500 K, the Cope product (R2P3) was not observed in the training data. Only when the TS that leads directly to R2P3 was included as an AL starting configuration was the training data sufficiently complete. This second TS was 4.4 kcal mol−1 higher in energy than the TS that leads to R2P1 and R2P2.31 This result highlights the importance of considering relevant points in the PES that may not be obtained using automated sampling methods when training MLPs. For this reaction, ACE training required ≈2 days of AL.
Having successfully trained this ACE MLP for R2, we propagated dynamics to compute product distribution. Using 100 trajectories, each of them taking only minutes, we were able to obtain converged results (Fig. 3a). This efficiency also allowed us to propagate ring polymer MD (RPMD)32 using the developed ACE potential. As expected, given the nature of the bonds being formed during this reaction, the product ratios were similar to those obtained using classical MD.
We also evaluated the effect of temperature by equilibrating the system at the TS using constrained NVT dynamics (1 eV Å−2, ≈0.5 kcal mol−1 additional ZPE), prior to propagating NVE downhill dynamics. Both direct NVE downhill and equilibrated dynamics provide qualitatively similar trends. However, a more detailed analysis shows that the latter affords almost double the proportion of R2P1 and the formation of R2P3 as 1% of the total product distribution, which would otherwise not be observed (right most distribution in Fig. 3b). This demonstrates the importance of propagating equilibrated quantum dynamics to obtain converged distributions.33 Training an MLP uniquely allows for the 100-fold more expensive simulations to be performed, with the effect of ZPE and tunnelling accounted for.
Finally, we evaluated the effect of QM methodology on product distributions by performing MLP uplifts to different levels of theory. Uplift corresponds to performing single point energy and force evaluations on ACE AL configurations (PBE0/def2-SVP) at different levels of theory and then retraining the ACE potential to propagate classical NVE trajectories from the respective TSs. For the product distributions in Fig. 3c, MLPs can be obtained at the desired level of theory in 100s CPUh by applying uplifts, compared with >200000 CPUh without uplifts. It was found that some levels of theory require more AL iterations than others, suggesting that the overlap of the configuration space within the training set may not be optimal. For example, while the product distributions are unchanged for B3LYP on additional AL, M06-2X distributions continue changing upon further AL iterations, affording ≈10% more product R2P2 after 5 iterations. Product distributions vary considerably between QM methodology, with the major state varying from the reactant to R2P2, and the proportion of R2P1 varying from 1% to 10% (Fig. 3c). This evidences the large influence that the level of theory has on the propagated dynamics and consequently on the product distribution. The differences are significantly more important than those observed when comparing classical MD and RPMD results.
To further investigate the influence of the level of theory and the type of dynamics, the time gap, defined as the time difference between the formation of two C–C σ-bonds,34 is evaluated for the reaction of methyl-vinyl ketone (MVK) + cyclopentadiene (CPD), R3. ACE MLPs trained at different levels of theory led to slightly different time gaps. For example, both PBE0 and ωB97X-D functionals generated average time gaps of ≈20 fs, while for M06-2X it was ≈10 fs (Fig. 4a). The latter is consistent with the DFT-MD value of 11 ± 6 fs in the literature,35 indicating the reaction is dynamically concerted, as concluded by Houk and co-workers.34,36 Furthermore, as expected, quantum effects on this reaction are minimal, (Fig. 4b), since the differences between classical MD and RPMD are within 2 fs, while the differences due to the level of theory vary by over 5 fs. Traditionally, such estimates require several hundred quasi-classical trajectories, which are computationally demanding. However, by employing our strategy, we can compute the time gap from 1000 trajectories in fewer than 6 hours.
Using the ACE-MLPs trained for this reaction at the PBE0-D3BJ/def2-SVP level of theory we computed the free energy profile for R3 using umbrella sampling (MLP-US) over an averaged reaction coordinate, = (r1 + r2)/2 (Fig. 5a). This approach enabled us to fully sample the different conformations not only at the critical points but along the reaction pathway, totalling 1.5 ns of sampling, which would be impossible to reach with AIMD. Moreover, our MLP-US simulations also capture anharmonic effects arising from low vibrations modes, which are not directly accounted for using the static methods. The MLP-US also provides a piece-wise free energy, where the choice of reaction coordinate is flexible.
![]() | ||
Fig. 5 Reaction and activation free energies for the reaction between MVK and CPD, R3. (a) Comparison of the intrinsic reaction coordinate potential energy (ΔE), qRRHO free energy (ΔGqRROH) and MLP-US free energy at 300 K (ΔGMLP-US). MLP-US was performed using the ACE MLP trained at the PBE0-D3BJ/def2-SVP level of theory at 500 K from the TS, then 30 windows were simulated over the reaction coordinate (![]() |
Using this MLP, we also computed the temperature dependence for both activation and reaction energy in order to compare the MLP-US method to the static methods (Fig. 5b and c). For this DA reaction, MLP-US gives activation and reaction free energies comparable to the static methods, with a linear dependence on temperature. While no experimental data is available for the free energy of activation of this reaction in the gas phase, the comparison of the different static approaches to our MLP-US values suggest an estimate error of ≈3 kcal mol−1 for the free energy contributions. When comparing free energies of reaction (Fig. 5c), a larger deviation and larger error bars are observed at low temperature. This is likely due to the insufficient sampling; thus this point is excluded from the linear fit.
Using the same approach as for R3, we considered the DA reaction between cyclopentadiene and acetylene (R4, Table 1), for which comparison to experimental gas-phase kinetic data is possible.39 Performing MLP-US enabled us to obtain quantitative accuracy in both enthalphic and entropic contributions. These results show that MLPs, in combination with US, enable us to achieve accurate free energies with associated errors, and treat anharmonic effects without ad hoc corrections. While the MLP fitting and US protocol was completely automated and achieved within a day for this reaction, the choice of the QM reference method remains a critical choice that still requires human intervention. Comparison to a wider range of DA and other types of reactions, for which experimental data is available, will be the subject of future work.
ΔH‡/kcal mol−1 | ΔS‡/cal K−1 mol−1 | |
---|---|---|
US | 20(3) | −39(8) |
Expt.39 | 21.9(1) | −37.3(2) |
ACE MLPs were also obtained for other Diels–Alder reactions and consistently achieved chemical accuracy (±1 kcal mol−1). This enabled us to perform RPMD in order to introduce nuclear quantum effects, as well as to obtain free energies employing umbrella sampling. While product distributions were strongly dependent on the DFT reference method, they were less dependent on the type of dynamics propagated (classical vs. quantum). When computing free energies, MLP-US predicts barrier free energies within the static methods' ranges, which vary by ≈3 kcal mol−1. For reaction free energies, larger errors bars were obtained at low temperature due to insufficient sampling. Comparison to experimental values demonstrates that MLP-US provide quantitative accuracy.
The diverse range of properties studied in this work demonstrate the applicability of our strategy, delivering accurate and efficient MLPs within a day. However, advancements in reference methods are needed to obtain accurate potentials without a preceding benchmark study. We are confident this will enable the routine development of chemically accurate reactive MLPs.
A stable potential was defined by its ability to propagate 10 × 1 ps MD trajectories without finding a configuration |EMLP − E0| > ET. Training sets for initial GAPs including those that used GP variance as a selection criteria were constructed using the gap-train code (v. 1.0.0b).43 ACE23,44 potentials were trained using the ACE.jl code45 (v 0.8.4, Julia v. 1.6.3) and wrapped by pyjulip (v. 0.1, using PyCall). All MD simulations were propagated using the Langevin (0.02 friction coefficient) integrator in ASE v. 3.22.0 with initial velocities sampled from a Maxwell-Boltzmann distribution then propagated using a 0.5 fs timestep. NequIP24 potentials used the nequip46 v. 0.3.3 Python package (pytorch v. 1.9.1, pytorch_geometric v. 1.7.2, e2nn47 v.0.3.5) with a maximum 1000 epochs and a 90:
10 training to validation data split; all other hyperparameters were retained as their defaults (4.0 Å cut off radius, see nequip module in mlp-train, commit 0ba027c). All MLPs were trained on both energies and forces using the default loss functions for each MLP method with the default weights defined in the mlp-train package.40 For GAP, the relative weighting on energies and forces is controlled by σE and σF, which values are reported in Table S2 (ESI†). ACE uses a square norm of the energy difference predicted and true forces (eqn (18) in ref. 44) with wE = 20, wF = 1. Finally, NequIP uses a square norm of the force components (eqn (9) in ref. 24) with wE = 1, wF = 100. Gas phase (vacuum) simulations in periodic potentials used a cubic box (l = 10 nm). Combinations of harmonic restraining potentials were added and incorporated into the dynamics using ASE.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2cp02978b |
This journal is © the Owner Societies 2022 |