Maximilian X. Tiefenbacher,ab Brigitta Bachmair,abc Cheng Giuseppe Chen,cd Julia Westermayr,ef Philipp Marquetand,ac Johannes C. B. Dietschreit*c and Leticia González*ac
aResearch Platform on Accelerating Photoreaction Discovery (ViRAPID), University of Vienna, Währinger Straße 17, 1090 Vienna, Austria. E-mail: leticia.gonzalez@univie.ac.at
bVienna Doctoral School in Chemistry, University of Vienna, Währinger Straße 42, 1090 Vienna, Austria
cInstitute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Straße 17, 1090 Vienna, Austria. E-mail: johannes.dietschreit@univie.ac.at
dDepartment of Chemistry, Sapienza University of Rome, Piazzale Aldo Moro, 5, Rome, 00185, Italy
eWilhelm-Ostwald Institute, University of Leipzig, Linnéstraße 2, 04103 Leipzig, Germany
fCenter for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden/Leipzig, Humboldtstraße 25, 04105 Leipzig, Germany
First published on 24th April 2025
Excited-state nonadiabatic simulations with quantum mechanics/molecular mechanics (QM/MM) are essential to understand photoinduced processes in explicit environments. However, the high computational cost of the underlying quantum chemical calculations limits their application in combination with trajectory surface hopping methods. Here, we use FieldSchNet, a machine-learned interatomic potential capable of incorporating electric field effects into the electronic states, to replace traditional QM/MM electrostatic embedding with its ML/MM counterpart for nonadiabatic excited-state trajectories. The developed method is applied to furan in water, including five coupled singlet states. Our results demonstrate that with sufficiently curated training data, the ML/MM model reproduces the electronic kinetics and structural rearrangements of QM/MM surface hopping reference simulations. Furthermore, we identify performance metrics that provide robust and interpretable validation of model accuracy.
A popular method to carry out excited-state dynamics simulations of molecular systems is trajectory surface hopping (TSH).9 In this so-called mixed quantum-classical approach, electrons – responsible for electronic transitions and excited-state properties – are treated quantum mechanically, while the heavier nuclei are described using classical mechanics. At the expense of neglecting nuclear quantum effects, TSH effectively captures the quantum nature of electrons, even though it requires propagating numerous independent classical nuclear trajectories to accurately simulate the behavior of a nuclear wave packet as it splits during nonadiabatic events.7,10,11 Despite these attractive features, the need for many trajectories makes TSH simulations computationally expensive, especially when the underlying on-the-fly calculations of the coupled potential energy surfaces (PESs) are performed using accurate quantum mechanical methods.
The situation becomes more challenging when simulating photochemical processes in the condensed phase.12 For example, chromophores in nature are rarely isolated; rather, they are typically embedded within complex, heterogeneous environments that involve a variety of molecular interactions, such as solvent effects, hydrogen bonding, and intricate structural features. These environmental factors can significantly influence the photochemical behavior of the system, altering deactivation pathways. Implicit solvation models,13 which treat the solvent as a continuous medium, are often insufficient to capture detailed interactions. A more accurate treatment is achieved by modeling the environment explicitly. One efficient way to do so is by using hybrid quantum mechanics/molecular mechanics (QM/MM) methods, where the region of interest, such as the excited chromophore, is treated with QM, while the surrounding environment, including solvent or complex heterogeneous scaffolds, is modeled with MM.14–16 This approach offers a balance between accuracy and computational cost by restricting expensive QM calculations to the critical region of the system that actually requires a quantum mechanical treatment. The advantages of QM/MM strategies are indisputable, and in combination with TSH they have enabled impressive simulations of time-resolved photochemical processes in large complex systems.17–22 However, because the computational expense of a QM/MM calculation is largely determined by the level of theory employed in the QM region, QM/MM dynamical simulations suffer from comparable (or greater) costs than simulations of the isolated chromophore. The situation becomes more prohibitive the more trajectories are needed to achieve statistically meaningful results.17
Over the years, several strategies have been developed to reduce the cost of TSH simulations. One notable example is the use of vibronic coupling models,23 which replace the expensive on-the-fly calculations of the PESs by pre-parameterized potentials that are approximated by scaled harmonic oscillators in the simplest case.24,25 Recently, TSH simulations using linear vibronic coupling PESs have been extended to include a classical MM environment,26 enabling efficient time-resolved analysis of three-dimensional solvent–solute interactions.27 While this approach reduces the computational cost significantly, it is only applicable to rather rigid molecules, where anharmonic effects, such as large amplitude motions or bond rearrangements, do not play a role in the relaxation dynamics.
An alternative and highly flexible approach to reduce the cost of the underlying electronic structure problem is the use of machine-learned (ML) potentials.28–31 Trained on high-quality quantum mechanical data, ML potentials have demonstrated their ability to replicate the accuracy of ab initio calculations at a fraction of the computational cost.32 ML potentials have already shown considerable success in modeling dynamics in the electronic ground state.33–38 In contrast, their application to excited-state dynamics, which is significantly more expensive than ground-state simulations, is currently only feasible for small molecular systems, such as organic chromophores, and is limited by the availability of accurate reference data.39–45 Unlike many ground-state ML potentials, which are transferable between molecular systems and can thus simulate large biomolecules or materials using training data of smaller building blocks, excited-state potentials must currently be trained for each system individually. As a consequence, the simulation of large systems in their excited states also requires methods like mixed ML/MM (machine learning/molecular mechanics).46 However, the integration of ML potentials with MM for both ground- and excited-state dynamics remains in its infancy.
One of the key challenges in developing an ML/MM approach is to accurately describe the interaction between the ML potential and the surrounding MM environment. In QM/MM simulations, the interactions between the quantum region and the classical region are clearly defined by the Hamiltonian, which ensures that the treatment of the two regions is physically correct.14 For ML/MM, the interaction needs to be carefully modeled to ensure that the combined system behaves correctly; however, there appears to be little consensus on the best way to do so.47–58
A recent study by Mazzeo et al.59 demonstrates the use of Gaussian process regression to describe the excited-state dynamics of a solvated molecule by learning ML/MM energies and forces with kernel models in a two-step process. First, they fit the vacuum PESs and then subsume the differences between pure QM and QM/MM under polarization, which is described by a second model. Their approach is restricted to purely adiabatic dynamics in the excited state, neglecting any coupling between states. In contrast, to the best of our knowledge, no ML/MM implementation for nonadiabatic excited-state dynamics with electrostatic embedding exists. In this work, we propose the first ML/MM nonadiabatic excited-state dynamics using electrostatic embedding and a general number of electronic states. We use the FieldSchNet architecture of Gastegger et al.,51 which allows the inclusion of the electric field via an additional ML input. The electric field is generated by the point charges of the MM environment and alters the different excited states. As an application, we investigate the excited-state dynamics of furan in water (Fig. 1a), including three coupled electronic singlet states. Furan is a small heterocyclic organic molecule that serves as a building block in biologically relevant systems, such as DNA and proteins, and has long been the focus of theoretical and experimental studies.60–68 A few studies have also considered the explicit interaction of furan with water through hydrogen bonds,69,70 highlighting the need for QM/MM studies that can capture these interactions and their impact on the excited-state relaxation dynamics explicitly.
The remainder of the paper is organized as follows. In Section 2, we present the theory behind a QM/MM setup with electrostatic embedding and describe all the terms necessary when using an ML model. Next, in Section 3 we outline the data collection method, architecture, and training process of the ML interatomic potentials and summarize the numerical experiments performed. In Section 4, the results of various training settings, the quality of the obtained PESs, and the ML-driven dynamics are compared to on-the-fly TSH simulations that use the quantum chemical reference method with which our ML potentials were trained.
\[ H_\mathrm{tot} = H_\mathrm{MM} + H_\mathrm{QM} + H_\mathrm{QM\text{-}MM} \qquad (1) \]
\[ H_\mathrm{QM\text{-}MM} = H_\mathrm{QM\text{-}MM}^\mathrm{vdW} + H_\mathrm{QM\text{-}MM}^\mathrm{el} \qquad (2) \]

\[ \hat{H}_\mathrm{QM\text{-}MM}^\mathrm{el} = \sum_{j=1}^{N_\mathrm{MM}} q_j \left( \sum_{i=1}^{N_\mathrm{QM}} \frac{Z_i}{\left|\mathbf{R}_i^\mathrm{QM} - \mathbf{R}_j^\mathrm{MM}\right|} - \sum_{e=1}^{N_\mathrm{el}} \frac{1}{\left|\mathbf{r}_e - \mathbf{R}_j^\mathrm{MM}\right|} \right) \qquad (3) \]
Eqn (3) is part of HQM–MM and is included in the calculation of the quantum subsystem, thus coupling the QM and MM regions.14,53
The electric field εi at position RQMi of QM atom i is the sum over all atoms in the MM region (NMM) and is defined as
\[ \boldsymbol{\varepsilon}_i = \sum_{j=1}^{N_\mathrm{MM}} q_j \, \frac{\mathbf{R}_i^\mathrm{QM} - \mathbf{R}_j^\mathrm{MM}}{\left|\mathbf{R}_i^\mathrm{QM} - \mathbf{R}_j^\mathrm{MM}\right|^3} \qquad (4) \]
\[ \mathbf{F}_i^\mathrm{ML} = -\frac{\partial E^\mathrm{ML}}{\partial \mathbf{R}_i^\mathrm{ML}} - \sum_{k=1}^{N_\mathrm{ML}} \frac{\partial E^\mathrm{ML}}{\partial \boldsymbol{\varepsilon}_k} \cdot \frac{\partial \boldsymbol{\varepsilon}_k}{\partial \mathbf{R}_i^\mathrm{ML}} \qquad (5) \]

\[ \mathbf{F}_j^\mathrm{MM} = - \sum_{k=1}^{N_\mathrm{ML}} \frac{\partial E^\mathrm{ML}}{\partial \boldsymbol{\varepsilon}_k} \cdot \frac{\partial \boldsymbol{\varepsilon}_k}{\partial \mathbf{R}_j^\mathrm{MM}} \qquad (6) \]
The partial derivatives in eqn (5) and (6) of the ML energy with respect to the ML atoms and the field are computed via back-propagation. The derivatives of the field with respect to the nuclear positions (MM or ML) are obtained analytically, based on eqn (4).
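For illustration, the field of eqn (4) and the analytic chain-rule contribution to the MM forces of eqn (6) can be written compactly in NumPy (a sketch in atomic units; array names and shapes are our own, and `dE_deps` stands for the back-propagated derivative of the ML energy with respect to the field):

```python
import numpy as np

def mm_field(r_qm, r_mm, q_mm):
    """Electric field (a.u.) at each ML/QM atom from MM point charges, eqn (4)."""
    # pairwise displacement vectors R_i^QM - R_j^MM, shape (N_QM, N_MM, 3)
    d = r_qm[:, None, :] - r_mm[None, :, :]
    r3 = np.linalg.norm(d, axis=-1) ** 3           # |R_i - R_j|^3
    return np.einsum("j,ijk->ik", q_mm, d / r3[..., None])

def mm_forces_from_field(r_qm, r_mm, q_mm, dE_deps):
    """Force on MM atoms via the chain rule through the field, eqn (6).
    dE_deps: dE/d(eps_i), shape (N_QM, 3), e.g. obtained by back-propagation."""
    d = r_qm[:, None, :] - r_mm[None, :, :]
    r = np.linalg.norm(d, axis=-1)
    # analytic Jacobian d(eps_i)/d(R_j^MM) = q_j * (3 d d^T / r^5 - I / r^3)
    eye = np.eye(3)
    jac = q_mm[None, :, None, None] * (
        3.0 * d[..., :, None] * d[..., None, :] / r[..., None, None] ** 5
        - eye / r[..., None, None] ** 3
    )
    # F_j = - sum_i (dE/deps_i) . (deps_i/dR_j)
    return -np.einsum("ik,ijkl->jl", dE_deps, jac)
```

The same Jacobian, with opposite sign of the displacement derivative, yields the field term of eqn (5) for the ML atoms.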
\[ \mathcal{L} = w_E \sum_n \big( E_n^\mathrm{ML} - E_n^\mathrm{QM} \big)^2 + w_F \sum_n \sum_i \left\| -\frac{\partial E_n^\mathrm{ML}}{\partial \mathbf{R}_i} - \mathbf{F}_{n,i}^\mathrm{QM} \right\|^2 \qquad (7) \]

\[ \mathcal{L}_\mathrm{aug} = w_E \sum_n \big( E_n^\mathrm{ML} - E_n^\mathrm{QM} \big)^2 + w_F \sum_n \sum_i \left\| \mathbf{F}_{n,i}^\mathrm{ML} - \mathbf{F}_{n,i}^\mathrm{QM} \right\|^2 \qquad (8) \]
Since the contribution of the field to the nuclear gradient is expected to be small, this term was neglected in the original FieldSchNet paper.51 We call eqn (8) the “augmented loss” and use it for all ML trainings in this work, to be consistent with the forces used for dynamics simulations, as we found the field-dependent term to be crucial for stable excited-state molecular dynamics simulations.
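A minimal PyTorch sketch of such an augmented loss is given below; it assumes the model's energy output is differentiable with respect to both positions and field, and the tensor names, shapes, and per-state loop are our own choices, not FieldSchNet internals:

```python
import torch

def augmented_loss(E_pred, E_ref, F_ref, R, eps, deps_dR, w_E=1.0, w_F=10.0):
    """Augmented MSE loss of eqn (8): the force target is compared against the
    full ML force of eqn (5), including the field-dependent chain-rule term.
    E_pred: (S,) state energies built from R and eps; R, eps: (N, 3);
    deps_dR[k, i]: 3x3 Jacobian of the field at atom k w.r.t. atom i."""
    n_states = E_pred.shape[0]
    loss = E_pred.new_zeros(())
    for s in range(n_states):
        dE_dR, dE_deps = torch.autograd.grad(
            E_pred[s], (R, eps), create_graph=True, retain_graph=True)
        # full ML force: -dE/dR minus the field-dependent chain-rule term
        F_pred = -dE_dR - torch.einsum("kd,kidn->in", dE_deps, deps_dR)
        loss = loss + w_E * (E_pred[s] - E_ref[s]) ** 2 \
                    + w_F * ((F_pred - F_ref[s]) ** 2).mean()
    return loss / n_states
```

With `deps_dR` set to zero, the expression reduces to the plain loss of eqn (7).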
\[ i\hbar\,\dot{c}_n = \sum_m \big( H_{nm} - i\hbar\,\sigma_{nm} \big)\, c_m, \qquad \sigma_{nm} = \left\langle \psi_n \middle| \frac{\partial}{\partial t} \middle| \psi_m \right\rangle \qquad (9) \]
The probability of hopping between the electronic states is determined by the fewest switches algorithm,9 which minimizes the number of state transitions. The transition probability between two states n and m is given by
\[ P_{n \to m} = \max\!\left[ 0,\; \frac{2\,\Delta t}{|c_n|^2}\, \mathrm{Re}\big( c_m^{*}\, c_n\, \sigma_{nm} \big) \right] \qquad (10) \]
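As a concrete sketch, the hopping probability and the subsequent stochastic state selection can be implemented as follows (a textbook form of the fewest-switches formula in terms of the electronic coefficients and time-derivative couplings; variable names are ours):

```python
import numpy as np

def hop_probabilities(c, sigma, dt, n):
    """Fewest-switches probability P(n->m) for all target states m:
    P = max(0, 2*dt*Re(conj(c_m) * c_n * sigma_nm) / |c_n|^2)."""
    p = np.zeros(len(c))
    a_nn = abs(c[n]) ** 2
    for m in range(len(c)):
        if m != n:
            p[m] = max(0.0, 2.0 * dt
                       * (np.conj(c[m]) * c[n] * sigma[n, m]).real / a_nn)
    return p

def select_hop(p, rng):
    """Stochastic selection: hop to the first state whose cumulative
    probability exceeds a uniform random number; None means no hop."""
    r = rng.random()
    cum = np.cumsum(p)
    hits = np.nonzero(cum > r)[0]
    return int(hits[0]) if hits.size else None
```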
For energy minimization, heating, equilibration, and production runs, we used the MD engine sander.82 Fig. 2 shows a schematic timeline for the MM–MD simulations and how snapshots for the subsequent excited-state QM/MM simulations were collected.
The MM–MD simulations employ a time step of 2 fs, a Langevin thermostat with a friction constant of 2 ps−1, and constrained hydrogen-heavy atom bond distances by means of the SHAKE85 algorithm. After the initial setup, the energy of the system was minimized for 2000 steps using steepest descent. After minimization, the system was heated for 20 ps to 300 K through continuous heat transfer from the thermostat (the bath temperature is always at 300 K). Subsequently, the system was equilibrated for 600 ps in the isobaric–isothermal ensemble with the Berendsen barostat.86 Using the same settings as for the equilibration, we performed a production run with a total of 2.4 ns. Snapshots were recorded every 4 ps, resulting in a total of 600 equidistant frames, which were used to build two sets of initial conditions (sets I and II in Fig. 2) for subsequent excited-state QM/MM simulations – explained next.
The computational details specified here are common to both sets, unless stated otherwise. The interactions of QM and MM regions were described via electrostatic embedding. RATTLE88 was used to constrain the bond vibrations of the water molecules of the MM region, whereas the hydrogen atoms of furan were left unconstrained. Furan was described using time-dependent density functional theory (TD-DFT) at the BP86/def2-SVP89–92 level of theory, as implemented in Orca 5.0.93 This level of theory was chosen based on a benchmark performed by Hieringer et al.,94 in which a total of 12 different methods were compared, including other DFT functionals, and wave function methods such as CASPT2 or CC3. BP86 showed the best agreement with the experimental excitation energies from Kamada et al.95
The absorption spectrum of furan in water was calculated using the first 100 frames from the production run of the MM–MD simulations (recall Fig. 2). The spectrum comprises the lowest ten singlet excited states, covering an energy range up to 9.5 eV, see Fig. 3 and Table S1 of the ESI.†
In simulation set I, we use all 600 snapshots generated from the MM simulation as possible initial conditions (positions and velocities) for the subsequent QM/MM-TSH excited-state dynamics, which included the lowest eleven singlet states. Furan is then excited within the energy range of 7.3–7.5 eV, primarily targeting the S3 state. Based on the associated excitation energies and oscillator strengths of the 600 sampled geometries, the initially excited electronic states were selected stochastically,96 resulting in 46 initial conditions. Out of those, 39 started in S3 (the state with a large dipole moment, see Table S1†) and 7 started in S4.
Following previous studies,63,68 which indicate that the relaxation of furan to the ground state occurs within approximately 200 fs, the trajectories of set I were propagated for 330 fs. A nuclear time step of 0.5 fs and an electronic time step of 0.025 fs were employed. Nonadiabatic couplings between singlet states were obtained from the local diabatization algorithm by Granucci et al.,76 computing the overlap of the wave function between two consecutive time steps.77,78 To conserve energy, the velocities of the QM particles were uniformly rescaled after each hop to account for the potential energy difference between the new and old states (no explicit nonadiabatic coupling vectors were used for a projection of the velocity vectors). Since TD-DFT cannot describe conical intersections between S1 and S0 due to the degeneracy of the reference state (ground state) and the first excited state,97 trajectories were forced to hop to the ground state whenever the energy difference between the S1 and S0 states was smaller than 0.1 eV. Once in the ground state, no back transitions were allowed until the end of the propagation time. The data (set I) were then used for model training, validation, and testing.
Since the local diabatization scheme cannot be used in ML/MM simulations because the wavefunction is unavailable, a second set of QM/MM-TSH simulations was performed. In this so-called set II, the nonadiabatic couplings between the singlet states were determined using the curvature-driven TSH scheme recently developed by Zhao et al.,98 which relies on the second time derivative of the energies only and is thus accessible in conjunction with ML potentials. In this way, the resulting ML/MM-TSH dynamics are directly comparable with the reference QM/MM curvature-driven TSH ones. The initial conditions for set II of the QM/MM simulations were taken from the last 2 ns of the MM trajectory, corresponding to 500 possible initial conditions (see Fig. 2). In order to investigate the transferability of the ML potential, we excite between 6 and 7 eV, a different excitation window that yields a set of trajectories with initial conditions and energies different from those used in the training set (set I). The chosen window resulted in 66 excitations to the bright S2 state. The 26 excitations to S1 were not propagated, so that all trajectories start from the same state, which makes the dynamics and the kinetic fits easier to interpret. All other settings were identical to those employed in set I.
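The key point of the curvature-driven scheme is that the coupling strength is inferred from stored energies alone. As a minimal illustration (our own finite-difference discretization, not the published algorithm), the required second time derivative of an energy gap can be estimated from three consecutive time steps:

```python
def gap_curvature(gap_history, dt):
    """Second time derivative of an energy gap from its last three stored
    values, via a central finite difference. This is the essential ingredient
    of curvature-driven TSH: it needs energies only, no wavefunctions."""
    g0, g1, g2 = gap_history[-3:]
    return (g2 - 2.0 * g1 + g0) / dt ** 2
```

For a gap that varies quadratically in time, the estimate is exact.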
Fig. 4 Schematic representation of the communication between the driver of the nonadiabatic dynamics, the SHARC engine,99 and the ML model, FieldSchNet, needed to perform excited-state ML/MM dynamics. The SHARC interface extracts the value of the electric field at the position of every ML atom and passes both the field values and the atom positions to the ML model. The FieldSchNet model predicts the field-dependent energies and gradients and passes them back to the SHARC interface. The system is then propagated in time by the SHARC driver. None of these steps relies on file I/O, which further accelerates the dynamics.
Data set I comprises all points collected from the 46 QM/MM-TSH trajectories, each 330 fs long with a 0.5 fs time step (i.e. 46 × 330 × 2 points). However, because a few trajectories ended prematurely, the total amounts to 28935 data points. This data set was then divided into training, validation, and testing using 37, 5, and 4 trajectories, respectively. The validation and test sets were used solely for analyzing the training error on energies and forces, but not for comparing dynamics results, as this cannot be done under identical conditions. We call this approach “split by trajectory”, see Fig. 5a. Furthermore, we performed an alternative set of ML trainings, in which we randomly assigned frames from the 46 trajectories to the three different tasks (train, validation, and test) in a ratio of 80:10:10. We call this way of mixing the data “random split”, see Fig. 5c. The advantage of “split by trajectory” is that the test error is more likely to reflect the true performance, as it ensures that the model has not seen any frames of the trajectories belonging to the test set. The disadvantage is that the number of trajectories from which the model can learn is smaller than in the “random split” scheme. To further investigate the impact of data partitioning on the dynamics, we also examined the effect of subsampling in time for each partitioning scheme, as consecutive frames that are 0.5 fs apart are likely to be very similar and thus add little new information to the data set. We therefore tested the effect of using only every second (50% of data usage), third (33%), etc., data point. This is shown schematically in panels b and d of Fig. 5 for the example of 33% data usage.
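The two partitioning schemes and the time subsampling can be sketched as follows (a simplified NumPy illustration; function and variable names are ours):

```python
import numpy as np

def split_frames(n_traj, n_frames, mode, stride=1, ratio=(0.8, 0.1, 0.1), seed=0):
    """Partition (trajectory, frame) pairs into train/val/test.
    mode='trajectory': whole trajectories go to one task ("split by trajectory").
    mode='random': individual frames are assigned at random ("random split").
    stride subsamples in time (stride=3 -> every third frame, ~33% data usage)."""
    rng = np.random.default_rng(seed)
    pairs = np.array([(t, f) for t in range(n_traj)
                      for f in range(0, n_frames, stride)])
    if mode == "trajectory":
        order = rng.permutation(n_traj)
        n_tr = round(ratio[0] * n_traj)
        n_va = round(ratio[1] * n_traj)
        tasks = {t: "train" for t in order[:n_tr]}
        tasks.update({t: "val" for t in order[n_tr:n_tr + n_va]})
        tasks.update({t: "test" for t in order[n_tr + n_va:]})
        labels = np.array([tasks[t] for t, _ in pairs])
    else:
        r = rng.random(len(pairs))
        labels = np.where(r < ratio[0], "train",
                 np.where(r < ratio[0] + ratio[1], "val", "test"))
    return {k: pairs[labels == k] for k in ("train", "val", "test")}
```

With 46 trajectories, the trajectory mode reproduces the 37/5/4 split used in this work.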
The ML trainings were performed using the Adam optimizer.101 We set the starting learning rate to a value of 10−5, and the parameters for the exponential averages of past gradients (momentum term) and squared gradients (raw second moment) to 0.9 and 0.999, respectively. Higher initial learning rates were consistently found to cause numerical instabilities during training. A learning rate scheduler was used to adjust the learning rate. If the model's performance on the validation set did not improve for 20 consecutive epochs, the learning rate was decreased by 20% to avoid overstepping. The maximum number of epochs was set to 5000; alternatively, training was stopped early if the learning rate fell below 10−6. The batch size was 10. We used the augmented mean squared error from eqn (8) as the loss function with wE = 1 and wF = 10.
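In PyTorch, the stated optimizer and scheduler settings correspond to a training loop of the following shape (the model and the validation pass are placeholders, not the actual FieldSchNet training code):

```python
import torch

model = torch.nn.Linear(10, 1)              # stand-in for the FieldSchNet model
opt = torch.optim.Adam(model.parameters(), lr=1e-5, betas=(0.9, 0.999))
# decrease the learning rate by 20% after 20 epochs without validation improvement
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, mode="min",
                                                   factor=0.8, patience=20)
for epoch in range(5000):                   # maximum number of epochs
    val_loss = 1.0                          # placeholder: validation pass here
    sched.step(val_loss)
    if opt.param_groups[0]["lr"] < 1e-6:    # early stopping once the lr bottoms out
        break
```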
Fig. 6 Change in model test performance on (a) the test set taken from set I and (b) trajectories from set II, as a function of the fraction of data used during model training. Results of “random split” models are shown as circles, “split by trajectory” as stars. Mean absolute error (MAE) of the energy and root mean square error (RMSE) of the forces are shown in blue and grey, respectively. All values correspond to averages over all five electronic states. The three symbols for each combination of split type and fraction of data correspond to the three models trained with different random initializations. To guide the eye, lines connect the averages of the models with identical hyperparameters. To show the trends more clearly, two outliers in panel (b) have been excluded from this plot; a version that includes these points can be found in the ESI (Fig. S24†).
The learning curve (Fig. 6a) from the “random split” (circles) forms a nearly perfect line on a log–log graph, suggesting a power-law relationship between the sample count and the error size. This partitioning scheme often places training and testing frames in close temporal proximity. When the entire data set is used, the training set most likely contains the frames immediately surrounding each test frame in time. This introduces a strong dependency between training and testing errors, raising doubts about whether a small testing error truly reflects the model's ability to generalize, i.e. to extrapolate correctly to unseen configurations.
In contrast, the “split by trajectory” training approach shows a different pattern. The errors initially decrease before stabilizing at around 33% data usage, which corresponds to using every third time step. This behavior arises because the training and test sets are less correlated in this scheme. The “split by trajectory” method ensures that test errors directly evaluate the model's generalization capabilities. Beyond a certain point, adding more closely spaced data does not further improve the test performance. This is due to minimal configuration changes within a time step of 0.5 fs, offering little additional information for extrapolation.
The discrepancy between the “split by trajectory” and “random split” test errors when using closely spaced trajectory frames is noteworthy. The scatter (parity) plots shown in Fig. S2 and S4† for the models using 100% of the training data do not exhibit any trends that would point to a general difference between those models, except that the “split by trajectory” models perform worse, which one would expect based on their relative average test performance from Fig. 6a. However, one should note that the models trained using the “split by trajectory” scheme exhibit a much more homogeneous test performance in comparison to the “random split” models, where some appear to be much better or worse than the other two. If one were to choose settings based on Fig. 6a, then “random split” with 100% and “split by trajectory” with 33% of the data would be expected to perform best.
As can be seen, the electronic populations derived from ML/MM show significant differences depending on which model was used to generate the random split 100% and split by trajectory 33% trajectories, even if those models differ only by their random initialization. The random split 100% models #1 and #3 produce dynamics with very similar population curves as the reference QM/MM dynamics, whereas the dynamics of model #2 show much slower internal conversions. For the split by trajectory 33% models, the visual agreement of the predicted populations increases from model #1 to #3.
The agreement of the ML/MM electronic populations with the corresponding QM/MM reference cannot easily be explained with the test statistics presented in the previous section. Inspection of the parity plots for the energy gap between neighboring PESs (Fig. S2–S4 of the ESI†) reveals that the split by trajectory models appear to be worse at predicting this energy difference. This is problematic, as the energy gap is used to force hops from S1 into the ground state. To further probe the model performance in regions with small energy gaps, we employ an energy-gap-weighted error measure,
\[ \overline{\mathrm{RMSE}}_w = \sqrt{ \frac{ \sum_k w_k \big( y_k^\mathrm{ML} - y_k^\mathrm{QM} \big)^2 }{ \sum_k w_k } } \qquad (11) \]

where y stands for energies or force components and the weight w_k increases as the energy gap between neighboring states at frame k decreases, so that errors near the intersection seams dominate the measure.
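Such a gap-weighted error can be computed in a few lines; the exponential weight used below is our own assumption for illustration, as any weight that decreases monotonically with the gap serves the same diagnostic purpose:

```python
import numpy as np

def gap_weighted_rmse(err, gaps, weight=lambda g: np.exp(-g)):
    """Energy-gap-weighted RMSE: errors at frames with a small gap between
    neighboring states (i.e. near the intersection seam) receive a larger
    weight. The weighting function is an illustrative assumption."""
    w = weight(np.asarray(gaps, dtype=float))
    err = np.asarray(err, dtype=float)
    return np.sqrt((w * err ** 2).sum() / w.sum())
```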
To facilitate a more effective comparison across the various models, in addition to visually evaluating the population curves, we fitted a kinetic model to the different steps of the relaxation process. This model includes only two time constants: τ2→1 for the internal conversion from S2 to S1 and τ1→0 for the transition from S1 to S0. The QM/MM reference simulations predict the first internal conversion to be about four times faster (τQM/MM2→1 = 16.8 fs) than the second one (τQM/MM1→0 = 64.9 fs). These time constants are comparable to those predicted in the gas phase (9.2 fs and 60 fs for the lifetimes of the S2 and S1 states, respectively) by Fuji et al.63 using similar TSH simulations and TD-DFT, which in turn are also consistent with time-resolved photoelectron spectra recorded by the same authors. The similarity of the time constants indicates that the effect of the solvent is not very pronounced, slightly decelerating the decay of the bright S2 state.
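The two-time-constant model corresponds to sequential first-order kinetics S2 → S1 → S0 and can be fitted to the averaged populations with a standard least-squares routine (a sketch; function names and the initial guess are ours):

```python
import numpy as np
from scipy.optimize import curve_fit

def seq_populations(t, tau21, tau10):
    """S2 and S1 populations for sequential first-order kinetics S2 -> S1 -> S0,
    stacked into one vector so that curve_fit fits both states at once."""
    k1, k2 = 1.0 / tau21, 1.0 / tau10
    p2 = np.exp(-k1 * t)
    p1 = k1 / (k2 - k1) * (np.exp(-k1 * t) - np.exp(-k2 * t))
    return np.concatenate([p2, p1])

def fit_time_constants(t, p2, p1, p0=(20.0, 60.0)):
    """Least-squares fit of (tau_{2->1}, tau_{1->0}) to averaged populations;
    returns the fitted values and their 1-sigma uncertainties."""
    popt, pcov = curve_fit(seq_populations, t, np.concatenate([p2, p1]), p0=p0)
    return popt, np.sqrt(np.diag(pcov))
```

The ground-state population follows as 1 − P2(t) − P1(t).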
The different numerical values of the kinetic fits and (weighted) errors, obtained by the ML/MM simulations are collected in Table 1. We observe that none of the ML/MM models produces dynamics that relax as fast as observed in the QM/MM simulations. Furthermore, the relative error for τ2→1 is almost always larger than for τ1→0, because of their difference in magnitude. Intriguingly, the kinetics of the ML/MM simulations can differ greatly even for those models that have the same training settings and only differ by weight initialization. In general, models trained using the random split appear to perform better, especially when the training frames are taken at smaller intervals. The reason for this is likely the small set of trajectories available for training, such that splitting by trajectory imposed a stronger limit on the phase space available for training than the random splitting. More trajectories in the training set are expected to remove this trend. Indeed, random split with 100% of the training data delivers the best results.
Table 1 Time constants of the kinetic fits and (energy-gap-weighted) error measures obtained from the ML/MM simulations

| Split type | % of data | Model # | τ2→1 [fs] | τ1→0 [fs] | RMSE(E) [kcal mol−1] | RMSEw(E) [kcal mol−1] | RMSE(F) [kcal mol−1 Å−1] | RMSEw(F) [kcal mol−1 Å−1] |
|---|---|---|---|---|---|---|---|---|
| Random | 100 | 1 | 29 ± 4 | 73 ± 4 | 1.4 | 1.4 | 5.9 | 56.3 |
| Random | 100 | 2 | 138 ± 28 | 92 ± 12 | 2.7 | 3.2 | 9.1 | 76.2 |
| Random | 100 | 3 | 28 ± 5 | 73 ± 6 | 1.3 | 1.3 | 6.0 | 54.9 |
| Random | 33 | 1 | 63 ± 16 | 90 ± 8 | 3.3 | 3.8 | 10.8 | 95.3 |
| Random | 33 | 2 | 36 ± 8 | 97 ± 8 | 2.6 | 2.7 | 9.8 | 69.5 |
| Random | 33 | 3 | 66 ± 11 | 84 ± 6 | 5.0 | 5.8 | 19.0 | 126.0 |
| Random | 1 | 1 | 240 ± 41 | 164 ± 29 | 28.5 | 22.9 | 26.0 | 168.0 |
| Random | 1 | 2 | 134 ± 18 | 130 ± 16 | 9.5 | 10.0 | 23.6 | 126.7 |
| Random | 1 | 3 | 97 ± 10 | 116 ± 14 | 11.1 | 11.3 | 23.2 | 124.9 |
| By traj. | 100 | 1 | 153 ± 26 | 95 ± 10 | 6.3 | 6.3 | 13.7 | 81.7 |
| By traj. | 100 | 2 | 37 ± 6 | 251 ± 33 | 6.0 | 6.0 | 14.1 | 83.9 |
| By traj. | 100 | 3 | 292 ± 61 | 109 ± 17 | 6.0 | 5.9 | 13.9 | 82.6 |
| By traj. | 33 | 1 | 67 ± 17 | 67 ± 6 | 5.6 | 5.6 | 13.6 | 80.4 |
| By traj. | 33 | 2 | 86 ± 11 | 134 ± 18 | 5.9 | 6.0 | 14.1 | 82.7 |
| By traj. | 33 | 3 | 173 ± 32 | 76 ± 9 | 5.7 | 5.8 | 14.0 | 82.0 |
| By traj. | 1 | 1 | 231 ± 23 | 70 ± 11 | 10.0 | 10.7 | 23.2 | 122.8 |
| By traj. | 1 | 2 | 393 ± 70 | 100 ± 22 | 10.0 | 11.2 | 24.8 | 132.2 |
| By traj. | 1 | 3 | 292 ± 51 | 109 ± 18 | 12.1 | 13.4 | 25.4 | 132.6 |
While it is generally difficult to obtain a confidence measure for ML-based MD simulations in which the true behavior of the system is not known, it is gratifying to see that the weighted error, as specified in eqn (11), serves as a metric to predict which of two models with identical hyperparameters will perform better. Comparing the normal and weighted RMSE, it can be seen that the energy errors change by a maximum of 1 kcal mol−1, except for the models trained with only 1% of the data. We can therefore assume that energy values close to the intersection seams are learned with a similar accuracy as points further away from these important regions. However, the same does not appear to be true for the gradients of the PESs. The increase from non-weighted to energy-gap-weighted RMSE varies, but is approximately a factor of 5 to 10. Taken together, the weighted energy and force RMSEs suggest that the distance between the different electronic PESs is roughly correct, whereas the topology close to the avoided crossings is not. We therefore conclude that the general feature of the avoided crossing is represented correctly, but that the individual ML-PESs are significantly less smooth in these regions, leading to a strong increase in the force errors, albeit not in the energies. Surprisingly, a model with a weighted force RMSE of 50 kcal mol−1 Å−1 predicts the relaxation dynamics qualitatively correctly (random split, 100%, models no. 1 and 3). Since the weighted error decreases with larger training set sizes, increasing the training set size even further should lead to smaller errors. Focusing on the correct reconstruction of the slope near the intersection seams appears to be especially important. This interpretation is further supported by test calculations in which we used the time derivative of the gradients instead of the energies, which led to significantly worse agreement between the ground truth and the ML models.
Hence, the computed ML transition rates have the correct order of magnitude, because the couplings that were used in our simulations only depend on the energies and not on the gradients.
To further understand the performance of the different models, we also performed an error analysis regarding energies, gaps, and forces for the data in set II. Parity plots for set II are shown in Section S2.2 of the ESI† and the average errors in Fig. 6b. One can see that on these data – unseen by all models – “random split” and “split by trajectory” models perform similarly. This underlines that test statistics based on random splitting (see Fig. 6a) strongly underestimate the true error. Furthermore, the parity plots of Section S2.2 in the ESI† show that all models (random and trajectory split) strongly overestimate ES0 for configurations with a ground-state energy above 75 kcal mol−1 (likely geometries with strongly elongated or broken bonds), which leads to a significant underestimation of the energy gap to the first excited state in these regions. It may appear confusing that the models tend to underestimate this gap while their dynamics are consistently slower than the QM/MM reference. However, this only happens for configurations with a high ground-state energy, which are those occurring after relaxation to the ground state, where trajectories are prevented from hopping back up (see discussion in Section S3.2 and Fig. S18 of the ESI†).
In general, the errors of the models on set II are so large that it is surprising that they can produce reasonable population decays. These errors are computed on geometries from the entire 330 fs of dynamics, during which furan displayed ring opening and subsequent bond rearrangements (discussed in detail in Section S4 of the ESI†), which might be difficult to describe with (TD-)DFT. The initial part of the dynamics is reproduced much better by the ML/MM models than the later parts, since it corresponds to regions of configuration space that are reasonably described by (TD-)DFT. To test this hypothesis, we performed a second round of error analysis on set II, in which we only included frames from the first 75 fs. The much better performance of the models on these configurations becomes apparent in the parity plots in Section S2.3 of the ESI.† The errors (MAE and RMSE) of forces, energies, and energy gaps are significantly lower than those on the entire set II; for split by trajectory they are basically identical to the test statistics obtained from set I. This means that the models fit the PESs well in the part of configuration space that is explored right after irradiation, which is the part needed for the relaxation back to the electronic ground state. The subsequent distortions due to the excess energy of 6–7 eV are not well described, but these deficiencies are less relevant to the change in electronic populations. This is why most models were able to reproduce the relaxation dynamics of set II even though they cannot extrapolate correctly to the configurations visited in later stages of the QM/MM simulations. We want to acknowledge, however, that FieldSchNet51 is based on the invariant SchNet architecture.72 Hence, we expect an equivariant ML model capable of handling external charges102 to be more robust, just as such models are in the usual field-independent setting.103
Fig. 8 Aligned QM/MM S2–S1 (a) and S1–S0 (b) hopping geometries obtained from trajectories in set II. Comparison of S1–S0 hopping geometries obtained from QM/MM (c) and ML/MM (d) (random split, 100% data, model #1) by projecting them onto the maximum of the C1–O and C4–O bond distances (see Fig. 1b) and onto PC2 from a PCA performed solely on the QM/MM frames (Section S5).
To analyze the similarities of the hopping structures found in the QM/MM and ML/MM simulations, we performed a principal component analysis (PCA) on the set of QM/MM geometries for the transitions between S1 and S0 using a Coulomb matrix representation104 of furan (Section S4 of the ESI†). We found that the first principal component (PC1), which recovers about 89% of the overall variance, focuses on the distance between the oxygen atom and carbon atoms 1 and 4, whereas PC2, which reflects 9% of the variance, is a linear combination of several interatomic distances. In Fig. 8c we replaced PC1 with the maximum length of the two bonds C1–O and C4–O for easier visualization, as PC1 creates an inverted V due to the symmetry of the breaking bonds (see Fig. S19 in Section S4 of the ESI†). One can see that the geometries form one continuous hopping seam.
When applying the same linear transformation to the hopping geometries of the ML/MM simulation (here shown for random split, 100%, model #1) and projecting them onto PC2 and the maximum of the C–O bond distances, they fall on the same diagonal line, indicating that hops occur in the same region of configuration space. This holds not only for the single model analyzed here, but for most models (Fig. S20 in Section S4.2.1 of the ESI†). Deviations from the rather even distribution seen in Fig. 8c correlate with significantly different relaxation dynamics; e.g., some models hop only at certain points along the seam, creating visual clusters in the 2D projection. As the S2–S1 hopping geometries are all very similar, their projection onto the first two principal components forms a single cluster (Fig. S21 in Section S4.2.2 of the ESI†). Trajectories with slow internal conversion from S2 to S1 appear as outliers in the 2D projection (Section S4.2.2 and Fig. S22 of the ESI†).
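The projection workflow described above – featurize each hopping geometry as a Coulomb matrix, fit a PCA on the QM/MM frames only, then reuse that fixed linear map for the ML/MM frames – can be sketched as follows. The diagonal exponent follows the standard Coulomb-matrix convention of ref. 104; function names and array shapes are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np
from sklearn.decomposition import PCA

def coulomb_matrix(Z, R):
    """Flattened upper triangle of the Coulomb matrix of one geometry.

    Z: (n_atoms,) nuclear charges; R: (n_atoms, 3) coordinates.
    """
    n = len(Z)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, i] = 0.5 * Z[i] ** 2.4          # standard diagonal term
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    iu = np.triu_indices(n)
    return M[iu]

def project_hops(qm_feats, ml_feats, n_components=2):
    """Fit PCA on QM/MM hopping features only, then apply the same
    linear transformation to the ML/MM features for comparison."""
    pca = PCA(n_components=n_components).fit(qm_feats)
    return pca.transform(qm_feats), pca.transform(ml_feats)
```

Fitting the PCA exclusively on the QM/MM frames is the essential design choice: the ML/MM geometries are mapped through the reference coordinate system, so any displacement of the ML/MM hopping seam shows up directly in the 2D projection.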
We compared the training errors obtained with two distinct data-splitting strategies – “random sampling” and “split by trajectory” – depending on whether data points are sampled randomly from any trajectory or entire trajectories are assigned to training, validation, and testing. Even though its performance is limited by the smaller number of available trajectories, we recommend using “split by trajectory”, as it offers cleaner data separation and more realistic error statistics. In general, we observe a wide range in the performance of the ML/MM models when reproducing the nonadiabatic QM/MM dynamics of a set of held-out trajectories, even for ML models with the same hyperparameters. Strong sensitivities of the excited-state kinetics generated by ML models have already been noted in earlier studies.41 However, well-chosen test statistics served as reliable indicators of model performance. Energy-gap-weighted errors help to highlight discrepancies near the intersection seams. Projecting hopping geometries onto two dimensions provides valuable insight into the differences between the ground-truth dynamics and those derived from the ML model.
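The difference between the two splitting strategies is easiest to see in code. A minimal sketch, assuming frames are labeled by a trajectory index; the 80/10/10 fractions are illustrative defaults, not the fractions used in the paper:

```python
import numpy as np

def random_split(n_frames, rng, f_train=0.8, f_val=0.1):
    """Random split: frames from all trajectories are mixed, so test
    frames can be time-neighbors of training frames (optimistic errors)."""
    idx = rng.permutation(n_frames)
    n_tr, n_va = int(f_train * n_frames), int(f_val * n_frames)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

def split_by_trajectory(traj_ids, rng, f_train=0.8, f_val=0.1):
    """Split by trajectory: every frame of a given trajectory lands in
    exactly one set, so test frames are never time-correlated with
    training frames (more realistic errors)."""
    trajs = rng.permutation(np.unique(traj_ids))
    n_tr, n_va = int(f_train * len(trajs)), int(f_val * len(trajs))
    groups = (trajs[:n_tr], trajs[n_tr:n_tr + n_va], trajs[n_tr + n_va:])
    return tuple(np.where(np.isin(traj_ids, g))[0] for g in groups)
```

Because consecutive frames of one trajectory are strongly correlated, the random split effectively leaks near-duplicates of the test set into training, which is the mechanism behind the underestimated errors discussed above.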
Based on our findings, we recommend against using costly nonadiabatic QM/MM dynamics simulations to generate training data: a single time step of the nonadiabatic QM/MM dynamics already takes 2 minutes with (TD-)DFT for the presented test system, and a study that goes beyond a proof of concept would likely use an even more expensive level of theory that handles bond dissociation correctly. Most of the computational effort is spent on geometries far away from the critical regions of the PESs (the intersection seams). Furthermore, the produced geometries are highly correlated, as they are closely spaced in time. Future applications should therefore aim to collect training data through active learning schemes,105,106 which have the added benefit of producing uncorrelated data, so that no computational effort is wasted on frames that might later be discarded by subsampling in time. We expect models trained on such data to yield better forces near the intersection seams and to reproduce the dynamics of the QM method more reliably.
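The selection step of such an active learning scheme is often realized via query-by-committee: an ensemble of ML models is evaluated on candidate frames, and only frames where the ensemble disagrees are sent to the expensive QM/MM calculation. A minimal sketch of this selection criterion (the function name and threshold are hypothetical, and the actual schemes of refs. 105 and 106 may differ in detail):

```python
import numpy as np

def select_by_disagreement(ensemble_preds, threshold):
    """Query-by-committee selection of frames for QM/MM relabeling.

    ensemble_preds: (n_models, n_frames) predictions of an ML ensemble
    for some scalar per frame (e.g. an excited-state energy). Returns
    the indices of frames whose ensemble standard deviation exceeds
    the threshold; only these would be recomputed at the QM/MM level
    and added to the training set.
    """
    std = np.std(ensemble_preds, axis=0)
    return np.where(std > threshold)[0]
```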
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5dd00044k |
This journal is © The Royal Society of Chemistry 2025 |