Reaction Dynamics of Diels-Alder Reactions from Machine Learned Potentials

Recent advances in the development of reactive machine-learned potentials (MLPs) promise to transform reaction modelling. However, such methods have remained computationally expensive and limited to experts. Here, we employ different MLP methods (ACE, NequIP, GAP), combined with automated fitting and active learning, to study the reaction dynamics of representative Diels-Alder reactions. We demonstrate that the ACE and NequIP MLPs can consistently achieve chemical accuracy (±1 kcal mol-1) to the ground-truth surface with only a few hundred reference calculations. These strategies are shown to enable routine ab initio-quality classical and quantum dynamics, and obtain dynamical quantities such as product ratios and free energies from non-static methods. For ambimodal reactions, product distributions were found to be strongly dependent on the QM method and less so on the type of dynamics propagated.


S1 GAPs for Reactions
We found that our initial GAP training strategy, [1] for developing an accurate reactive potential for the S N 2 reaction Cl − + CH 3 Cl was not easily translated to the higher-dimensional ethene+butadiene Diels-Alder case (MAD (R1) = 0.23 eV / 5.3 kcal mol −1 , Figure S1a). However, for a similarly complex H-abstraction reaction the obtained error is closer to the target 1 kcal mol −1 (MAD = 0.09 eV, 2.1 kcal mol −1 ) level of accuracy ( Figure S1b). Note that these results are consistent within repeats of the partially random active learning (AL) strategy and minor hyperparameter tweaks. ‡ ‡ a b MAD = 0.23 eV MAD = 0.09 eV Figure S1: Comparison of predicted and true energies over a GAP-MD propagated trajectories using initial velocities suitable for 300 K and a Langevin thermostat (300 K) with a 0.5 fs time step. 'True' energies calculated at PBE0-D3BJ/def2-SVP in ORCA. GAPs trained using active learning (AL) and the same hyperparameters in ref. [1] at 500 K. AL used a E T = 2 × 10 −5 eV atom −1 threshold on the maximum Gaussian process variance to select new configurations within a 1 ps window. Final datasets contained 217 and 223 configurations for ethene + butadiene (a) and methyl + propane (b), respectively.

S2
SI: Reaction Dynamics of Diels-Alder Reactions from Machine Learned Potentials

S2 Hyperparameter Optimisation
Starting from the hyperparameters shown in Table S1, which are based on our previous work [1], we optimised these hyperparameters as outlined in this section.

S2.1 QM Convergence
Using ORCA v. 4.2.1 the default SCF and DFT integration grids provide essentially converged forces ( Figure S2) thus the defaults are used throughout. The error introduced is negligible compared to the 'expected error' in the forces used in the GAP fitting (σ F = 0.1 eVÅ −1 by default), thus default ORCA parameters will be used herein.

S2.2 Expansion Order
Increasing the expansion order of the SOAP provides an improved approximation to the overlap between two atomic densities in the (dot product SOAP) kernel. Therefore, the energies and forces should be converged with respect to n max and l max , being aware of the increased computational cost of the SOAP computation. A slightly larger order of 8 in both l max and n max provides the S3 SI: Reaction Dynamics of Diels-Alder Reactions from Machine Learned Potentials a b Figure S2: Convergence of the force components of 10 randomly selected frames from an active learning instance from the TS of ethene+butadiene at the PBE/def2-SVP level of theory in ORCA. Convergence of the mean absolute deviation in forcers for the SCF energy change (E tol ) convergence (a) and DFT integration grid (b) is with respect to their maximum values in ORCA (10 −10 Ha and 7 respectively). The maximum MAD is 10× less than the default GAP σ F . optimum value, requiring a third of the training data to achieve a chemically accurate potential for propane ( Figure S3). Similar results are obtained for pentane over a smaller n max , l max window, albeit without the exponential decay observed in Figure S3. As found in other works (e.g. ref. [2]), accuracy converges faster with respect to l max than n max ( Figure S4b). Interestingly, decreasing the radial expansion order seems to reduce inhibit training slightly (n max = l max = 8 in Figure S4a cf.n max = 8, l max = 4 in Figure S4b).  Figure S3: Number of reference evaluations (n eval ) and number of training configurations (n train ) required to propagate 10 × 1 ps GAP-MD at 500 K without finding a prediction with an error > 1 kcal mol −1 from the true GFN2-XTB reference at a particular order of radial and angular SOAP expansion (n max , l max ) for propane. Other hyperparameters as Table S1. a b Figure S4: As Figure S3 for pentane with (a) n max = l max and (b) n max = 2l max .

S5
SI: Reaction Dynamics of Diels-Alder Reactions from Machine Learned Potentials

S2.3 Expected Errors
Using n max = l max = 8 and optimising the 'expected errors' in the energy and forces in training a gas-phase GAP for propane, increasing the 'smoothness' (larger σ E ) is not beneficial to the active learning rate ( Figure S5). Instead, fitting energies and forces more strongly (top right, Figure S5) enables only 75 training configurations required for an accurate propane potential. a Repeating the same along a slice in the (σ E , σ F ) space suggests that this result it not limited to just small systems ( Figure S6), no the choice of expansion order. Figure S5: Number of evaluations required to generate a chemically accurate GAP (as Figure S3) for propane as a function of σ E and σ F . Values are averages over three independent repeats. a b Figure S6: As Figure S5 for pentane along a slice of the 2D space with (a) n max = l max = 8, (b) n max = 12, l max = 6.
a Note that visualisation of a short 1 ps MD trajectory at 500 K suggests the potential is reasonable and is not circumventing the τacc error metric. Here, it is important to emphasize that the pentane potential trained using 500 K active learning at the GFN2-XTB level with the strongest fitting (σ E = 10 −4 eV atom −1 , σ F = 10 −2 eVÅ −1 , bottom right Figure S6) is highly accurate over 'long-time' dynamics ( Figure S7). MAD = 0.01 eV Figure S7: GAP-MD dynamics at 300 K (dt = 0.5 fs, 50Å cubic box equivalent to a vacuum) compared to GFN2-XTB ground truth. σ E = 10 −4 eV atom −1 , σ F = 10 −2 eVÅ −1 , 500 K active learning.

S7
SI: Reaction Dynamics of Diels-Alder Reactions from Machine Learned Potentials

S2.4 Sparse Points
Previous studies have shown that GAP accuracy can converge exponentially with the number of atomic environments ('sparse points'). [2]. Of course this will be system dependent, thus the scaling is evaluated for alkanes with 3, 4 and 5 carbons ( Figure S8). For these systems all GAPs are converged with respect to n sparse at 800 points. For pentane a substantial drop in the number of required evaluations is observed after 400 configurations, suggesting for 'large' systems (> 15 atoms) n sparse closer to 1000 is more suitable.  Figure S8: Convergence of the number of evaluations required to generate a potential as Figure S3 for linear alkanes as a function of the sparse points. Error bars are standard errors of the mean over three independent repeats.

S2.5 Summary
Based on these data, we suggest using the updated hyperparameters shown in Table S2 for small molecules in the gas phase. These updated hyperparameters are the ones used for GAP in the main text and herein.
n sparse 1000 We have shown previously that the AL sampling method (e.g. DFT functional) is important in obtaining accurate uplifted GAPs, 1 where the energy/forces of generated configurations are recalculated at a higher level. For the [4+2] Diels-Alder reaction between ethene and butadiene, sampling using a cheaper method than PBE does not seem to be suitable over the intrinsic reaction coordinate (IRC, Figure S9a). Although the thermodynamics at PBE are closer to the reference MP2 values, kinetics at PBE0 are slightly closer ( Figure S9b). Sampling is therefore performed with the more transferable PBE0 functional. Instead of using the difference between reference and predicted energies as the criteria for adding new configurations in the AL loop the GP variance may also be used. [1] For this system, using a threshold around 2 × 10 −5 eV atom −1 samples in a similar region of the energy space as using a 0.09 eV (2 kcal mol −1 ) difference threshold ( Figure S10), but at a much lower computational cost. b Therefore, a 'gp var' sampling strategy will be used for maximum training efficiency. Figure S10: Normalised probability density function over the energies sampled using AL from the ethene+butadiene [4+2] TS. gp var = x corresponds to running GAP-MD at 500 K until the predicted variance is above x eV for the whole system. diff = y eV corresponds to running GAP-MD at 500 K until |E 0 = E GAP | > y. S10 SI: Reaction Dynamics of Diels-Alder Reactions from Machine Learned Potentials

S3.2 Conformational Rearrangement
Upon cyclisation, the formed cyclohexene molecule undergoes conformational rearrangement to the more stable (and > 99% populated) chair conformer. While often assumed to be fast, in cyclohexane solution the interconversion at room temperature only occurs at a rate of 55 s −1 . [3] An MM forcefield (GAFF) affords rapid interconversion between the conformers, while the GAP (trained at 1200 K on cyclohexene) interconverts more slowly ( Figure S11). The MM however does afford the qualitatively correct distribution ( Figure S12), with the half-chair favoured over the half-boat. Developing potentials which allow for conformational changes without reparametrisation is necessary if accurate long-time simulations are to be possible.
ϕ Figure S11: Sample trajectories of cyclohexene at 900 K (0.5 fs timestep) using GAFF-parametrised (RESP charges) and GAP-AL generated potentials (latter uplifted from ∼ 500 GFN2-XTB AL configurations to PBE0 CUR selected to 50%). Shaded areas highlight the approximate regions of the half-chair, boat and chair conformations of cyclohexene. Plotted using a 5 fs block average. S11 SI: Reaction Dynamics of Diels-Alder Reactions from Machine Learned Potentials Figure S12: Histogram of absolute dihedral angles over 100 trajectories of cyclohexene, as Figure S11.

S3.3 Fragmentation
With a view to reducing the number of evaluations required to generate a chemically accurate potential for ethene+butadiene cyclisation we explored an additive approach to obtaining a potential. Training the intermolecular interaction between components is an effective fragmentation of a condensed phase molecular [1] or solid state system. [4] Extending this approach to covalent bonds however did not lead to a gain in accuracy for a specific number of configurations (i.e. τ acc ∼ 100 fs cf. τ acc ∼ 200 fs using the same 'gp var' selection strategy).
Fragmentation over a covalent bond in this manner produces two limitations (1) the intra GAP used to train the fragments may never have encountered the configuration adopted in the full system and (2) the energy scale over which the inter GAP must be accurate is much larger than without an I+I decomposition. Specifically for this system, even if the ethene and butadiene potentials are high quality (τ acc > 1 ps, Figure S14) they will not have well sampled the pyramidal sp 3 geometrics present in the cyclohexene product. This increases the amount of data required for the inter GAP to learn. Furthermore, the intra component distortion is only possible in the product making the energy scale larger that otherwise required ( Figure S10). The one advantage is that the reactant state (reactants separated by > 3Å) is well approximated by the intra+inter decomposition, as observed for e.g. bulk water.

S3.4 Reaction Training
Training a GAP for the whole ethene+butadiene reaction with the optimised hyperparameters (Table S2) from the TS using a 'gp var' selection strategy generated a MLP capable of a τ acc ∼ 500 fs at 1 kcal mol −1 . A representative trajectory back and forwards from the TS illustrates the achieved accuracy ( Figure S15) available in just 30 CPUh.  Figure S15: Comparison of true (PBE0/def2-SVP) and GAP predicted energies over one trajectory propagated from the TS to reactants (t < 0) and products (t > 0). GAP trained using 'gp var' (E t = 2 × 10 −5 eVÅ −1 , 300 K) and contained 181 configurations. S14 SI: Reaction Dynamics of Diels-Alder Reactions from Machine Learned Potentials

S4 Mixing Energy and Force Methods
Prior CCSD(T)-quality GAPs [1] have used computationally demanding numerical gradients. c It would therefore be beneficial to use a cheaper QM method to evaluate the gradients in combination with CCSD(T) energies.
The following examples are chosed to be of modest computational cost, and are assumed to be generalisable.  Generating frames using a DFTB(3ob) MD simulation and evaluating the force components at several levels of theory suggests that using hybrid DFT or MP2 forces in combination with CCSD(T) energies should be sufficient to train a CCSD(T)-quality GAP. The average error is less than or similar to an 'expected error' in the forces that optimises the active learning rate (σ F =< 0.1 eV A −1 , Figure S16).

S4.1 Methane
Training a GAP on CCSD(T) energies and MP2 forces for a gas-phase methane molecule is sufficient to generate a GAP within 1 kcal mol −1 of the true CC surface. Using active learning at XTB to sample the configuration space, MP2 forces and CC energies, highly accurate methane dynamics ( Figure S17 Table  S1.

S4.2 Methanol
Generating a GAP using active learning (AL) in parallel can lead to an overcomplete set of configurations. This is because in the initial stages the independent MD can sample similar regions of configuration space, which in turn do not provide much information for the fit. Selecting the most diverse set of configurations is therefore advantageous when the energy and forces are going to be reevaluated. Training a GAP for methanol using XTB AL then evaluating MP2/def2-TZVP energies and gradients using different number of CUR 5 selected configurations on the SOAP kernel matrix leads a reasonably stable τ acc with the number selected (Table S3).  Table S3: GAP accuracy on CUR selected configurations generated as Figure S17 for methanol values are averages over three τ acc evaluations and errors in standard error of the mean.

S4.3 Acetic acid
Employing both mixed energy and forces and CUR selection of the configurations allows dynamics broadly within (MAD = 0.7 kcal mol −1 ) chemical accuracy to the ground truth CC using only 167 configurations. For comparison this is ∼ 100 times fewer CC calculations than would have been required using numerical gradients on the whole set of configurations. d  Figure S18: As Figure S17 for acetic acid using a 50% CUR selection of configurations. Hyperparameters as Table S1.

S5 System Size Scaling
Extending our initial work on a selection of modestly-sized systems and configurational complexity, 1 the following section outlines how the number of evaluations required to train a chemically accurate GAP (over > 1 ps simulation time) scales with system size (hyperparameters shown in Table S1, standard AL methodology described in ref. [1]).

S5.1 Alkanes
First we consider a semi-optimal case for training larger systems, that of gas phase linear alkanes. This is because as more CH 2 units are added, the 'bonded' component of the energy is already known to the potential, such that only the 'non-bonded' components must be learnt.
The number of evaluations required to reach a potential with τ acc > 1 ps increases roughly linearly up to pentane ( Figure S19) and reaches a maximum at hexane (> 3000 evaluations). e While 3000 evaluations is fast at the GFN2-XTB level, sampling using a more accurate DFT method would exceed the goal of building potentials within a day. Furthermore, alkanes with more than 20 atoms (e.g. 26 in octane) require more than 1000 training configurations (selected using AL). It is therefore necessary to perform some hyperparameter optimisation.  Figure S19: Number of reference evaluations (n eval ) and number of training configurations (n train ) generated in 100 active learning loops for a linear alkane chain (500 K, 1 kcal mol −1 AL threshold, GFN2-XTB reference method). τ acc is an average from three simulations, E l = 1 kcal mol −1 , E t = 10E l , T = 300 K, max(τ acc ) = 1 ps. e As the system becomes larger, the less GAP-MD is run and fewer evaluations required to find a new configuration with a large enough error to be selected, hence the peak in n eval for hexane. S18 SI: Reaction Dynamics of Diels-Alder Reactions from Machine Learned Potentials

S5.2 Small Organic Molecules
A selection of solvents and small organic molecules ( Figure S20, inc. > 2 elements) forms a diverse set over which system scaling can be more realistically determined.
Unlike the alkanes, there is no obvious trend in the number of evaluations required to train a chemically accurate potential. Even correlating against a general complexity metric [6] reveals little to no correlation aside from a general increase.

S5.3 Atomic Energy Errors
In addition to the complexity increasing with system size, the accuracy per atom increases when the goal is to generate a potential that is accurate to 1 kcal mol −1 in total energy. To evaluate if the total (relative) energy is an important quantity for our target properties (free energies or reaction dynamics) we use the set of S N 2 reactions: Cl − + {MeCl, EtCl, n PrCl}.
Generating highly accurate (error ≪ 1 kcal mol −1 ) GAPs for the reaction is possible ( Figure S22). Using these potentials and increasing the regularisation ('expected error') on energies per atom leads to larger errors on the potential energy barriers ( Figure S23). Note that these GAPs are trained purely on energies, to isolate the effect of adding larger atomic errors. This scenario simulates training different potentials to the same per atom accuracy, which -as expected -leads to larger

S6 Method Comparison
Training different MLP methods on the ethene+butadiene reaction using active learning with a selection criteria of 0.1 eV f affords highly accurate potentials in all cases ( Figure S24, MAD ∼ 0.04 eV, 1 kcal mol −1 ). The data requirement of the GAP (406 configurations) was significantly higher than both ACE (114) and NequIP (126) potentials and required significant hyper-parameter tuning (S2). The total training time on 10 CPU cores was 7, 4 and 14 hours for GAP, ACE and NequIP potentials respectively, the latter also utilised an Nvidia RTX 2080 GPU. Evaluating the potentials on the intrinsic reaction coordinate again each perform comparably (Figure S25), and all provide smooth 2D surfaces ( Figure S26). The latter is in spite of the extrapolation within high-energy regions.

S7 Active Learning Selection Strategies
Using the predicted GP variance on a new configuration can be a highly effective selection strategy for sampling new configurations (see e.g. Figure S14). However, when training other kinds of MLP there may be no analogue to accelerate the 'diff' selection strategy. g Using a threshold on the maximum distance ('max dist') to any of the training set can afford a 10× speed-up in training for non-GAP MLPs where the reference evaluations dominate the execution time. Specifically, using a selection criteria defined by, where p i is the normalised SOAP vector for the i-th configuration in the training data and ζ is an positive power to sharpen differences. This is exactly the form of the kernel used in our GAPs and can provide a quantification of the similarity between one molecular configuration and another.
With an appropriately chosen k T , potentials can be trained efficiently ( Figure S28) despite not being correlated with the absolute energy difference ( Figure S29).
Using this strategy it is essential to backtrack until max(k * ) is not too small, to prevent high-energy structures (or SCF convergence failures) making their way into the training data. We found an upper threshold of (k T ) 2 to be sufficient without much tuning and ζ = 8 to be optimal.
In contrast to R1, direct use of this strategy to train a ACE potential for the DA/cope reactions between tropone and cycloheptatriene did afford a reasonable potential, but during AL training there was no sampling of one of the DA product (R2 P1 ).
g Using |Etrue − EMLP| and evaluating potentially 8 DFT evaluations with no selected configuration, for a 1 ps max time approaching the end of AL cycle.

S8 Tropone + cycloheptatriene
Accuracy of the ACE potential trained for the reaction between tropone + cycloheptatriene is outlined in Figure S30, with τ acc > 1 ps from both the ambimodal and Cope TSs (see ref. [7] for a definition of τ acc ).

S9 Methyl vinyl ketone + cyclopentadiene
Accuracy of the ACE potential for reaction Methyl vinyl ketone and cyclopentadiene is shown in Figure S31 with training from [4+2] DA TS. The time gap is an important property to distinguish between dynamically concerted and dynamically stepwise mechanisms in Diels-Alder reactions. [8]. When employing ACE potentials to calculate the time gaps, each trajectory initiated from TS was run until either the cycloadduct was formed (the two forming C-C bond lengths smaller than 1.6 A), or the reactants were separated (the two forming C-C bond lengths larger than 3.0Å), with a maximum simulation time of 500 fs.

S10 Acetylene + cyclopentadiene
With a view to obtain an accurate ACE potential capable of direct comparison to experimental enthalpy and entropies of activation we selected the reaction between cyclopentadiene and acetylene, for which high-quality experimental data is available. Re-analysis of the data in ref. [9] with an Eyring analysis afforded the experimental activation parameters ( Figure S33).
∆S ‡ = -37.25±0.2 cal K -1 mol -1 Figure S33: Eyring analysis of the rate data in ref. [9]. Rate coefficients in units of atm −1 s −1 thus ln(k/T ) includes an implicit factor of (atm K s). Linear fit performed with scipy.stats.linregress (v. 1.7.1) with the associated errors in derived quantities quoted as the standard errors.
Training an ACE potential using the default hyperparameters at the PBE0-D3BJ/def2-SVP level from the [4+2] DA TS generated a highly accurate potential ( Figure S34). To enhance the sampling over the reaction coordinate additional AL training was performed using harmonic biases (k = 10 eVÅ −1 ) in minimums of the reaction coordinate (average of forming bond lengths) histogram ( Figure S35). We found this to be crucial in enabling US at temperatures higher than ∼ 350 K. Uplifting this potential to CCSD(T)-quality required a benchmark of QM methods with analytic gradients, as to avoid ∼ 10 4 CC calculations. Double hybrid (DH) density functionals potentially offer accuracy above MP2, hence were selected as the target method. Interestingly, for this reaction there is a considerable spread (∼ 5 kcal mol −1 , 0.3 eV) in the barrier and reaction potential energies compared to the DLPNO-CCSD(T)[TightPNO]/def2-TZVP reference ( Figure S36). Only is the spin-component scaled DH DSD-PBEP86 achieves modestly accurate but underestimates the barrier by some 3 kcal mol −1 . Uplifting the PBE0-D3BJ potential to DSD-PBEP86 was performed with and without additional AL and the US free energy barriers plotted against temperature to extract the activation enthalpy and entropy ( Figure S37). Umbrella sampling in each case is well converged with respect to sampling ( Figure S38).