Patrizia Mazzeo,* Edoardo Cignoni, Amanda Arcidiacono, Lorenzo Cupellini* and Benedetta Mennucci
Dipartimento di Chimica e Chimica Industriale, Università di Pisa, Via G. Moruzzi 13, 56124 Pisa, Italy. E-mail: patrizia.mazzeo@phd.unipi.it; lorenzo.cupellini@unipi.it
First published on 11th October 2024
The application of quantum mechanics (QM)/molecular mechanics (MM) models for studying dynamics in complex systems is nowadays well established. However, a significant limitation is their high computational cost, which restricts their use for larger systems and long-timescale processes. We propose a machine-learning (ML) based approach to study the dynamics of solvated molecules on the ground- and excited-state potential energy surfaces. Our ML model is trained on QM/MM calculations and is designed to predict energies and forces within an electrostatic embedding framework. We built a socket-based interface between our machinery and AMBER to run ML/MM molecular dynamics simulations. As an application, we investigated the excited-state intramolecular proton transfer of 3-hydroxyflavone in two different solvents: methanol and methylcyclohexane. Our ML/MM simulations accurately distinguished between the two solvents, effectively reproducing the solvent effects on proton transfer dynamics.
A critical issue of QM/MM methodologies is their computational cost, especially when applied to dynamics. At present, QM/MM molecular dynamics (MD) simulations allow processes to be studied on the order of 10–100 picoseconds,3,7 while purely classical models are needed for longer timescales. Several efforts are being made to speed up QM/MM MD simulations, for example by using enhanced sampling9–11 or by making informed guesses of the solution of the QM problem from the trajectory information.12–16
A possible strategy to significantly enhance the applicability of QM/MM methods to ground- and excited-state dynamics is to introduce surrogate machine learning (ML) models. In recent years, in fact, ML methods have proven to be a viable way to obtain energies and properties with QM accuracy at a much reduced computational cost.17,18 From the beginning, ML methods have been coupled with MD engines, allowing systems to be simulated over timescales inaccessible to pure QM simulations.19–24 In addition, several efforts have been devoted to properly coupling the ML description with a classical description of the environment. Some authors have used ML to obtain more accurate implicit solvent models.25,26 When ML is applied to modelling within an MM environment, some works adopt a mechanical embedding27–29 scheme, where ML potentials substitute the MM force field for part of the simulation. More recently, several models have tackled the challenges of a more rigorous electrostatic embedding level.30–37 These models attempt to reproduce the effects of electrostatic embedding QM/MM approaches, where the properties of the QM part are modified by the MM charges. Within deep learning potentials, some authors have directly included the electrostatic fields from the MM environment into existing neural network architectures.30–32 Others have focused on predicting atomic charges and polarizabilities for the ML part, which are then allowed to interact with the MM environment.33,35 A similar strategy is used by Grassano et al.36 to correct a purely mechanical embedding ML/MM scheme with the polarization of the ML part. Another strategy is to divide the system into more than two regions and treat their interactions at different levels of theory.38,39
ML potentials are usually trained to describe the ground state of molecules,40–42 even though works have appeared where ML is used to tackle the additional challenges of excited states.43–48 However, to the best of our knowledge, electrostatic ML/MM models to describe excited-state dynamics in solvated systems have not been developed.
In this work, we develop ML models to describe the dynamics of solvated molecules in their ground and excited states, taking into account the environment within an electrostatic embedding scheme. In the context of ML simulations in condensed-phase systems, we aimed to reproduce the environment effects by including electrostatic interactions with MM charges and the polarization of the QM region due to the surrounding environment.
Our work represents the first application of an ML/MM method with electrostatic embedding to excited-state dynamics simulations. We exploit the expressiveness of kernel methods such as Gaussian process regression to construct a model that can describe a molecule in different solvents using relatively small training sets. The model is implemented in JAX49 in an open-source package called GPX,50 and interfaced to the Sander program of the AmberTools51 suite to perform ML/MM molecular dynamics via a Python server script.52 The ML/MM MD is orders of magnitude faster than the corresponding QM/MM implementation, allowing the dynamics to be simulated over longer timescales with QM accuracy.
This machinery is applied to 3-hydroxyflavone (3HF), a model system for studying excited-state intramolecular proton transfer (ESIPT). The ESIPT reaction in 3HF allows for the interconversion between the normal (N) and the tautomeric (T) forms (Fig. 1), which exhibit different fluorescence properties. Experiments have shown that steady-state and time-resolved fluorescence properties depend strongly on the solvent.53–56 Furthermore, experimental and computational studies indicated that the ESIPT reaction is characterized by a fast component (tens of femtoseconds) and a slow component (around 10 picoseconds).10,57–60 The slow component is present exclusively in hydrogen-bonding solvents57,59 and is due to the breaking of hydrogen bonds between 3HF and the solvent.59 Clearly, describing such a drastic solvent dependence requires including the electrostatics of the environment in a QM/MM (or ML/MM) embedding. We show that our ML/MM MD simulations reproduce the fast and slow components of the ESIPT process when 3HF is solvated in methylcyclohexane (MCH) and methanol (MeOH). This is achieved with as few as 1000 training points and a few hours of training time, making our model an attractive strategy for ML/MM ground- and excited-state MD simulations.
Our model is built on a descriptor χ = χ(R) of the system geometry R, encoding both the QM and the MM parts, and on a set of parameters Θ. A QM/MM property (force or energy) is then given as the function
ŷQM/MM = f(χ(R); Θ) | (1)
The QM/MM energy can be written as a sum of contributions
EQM/MM = EQMvac + EQM–MMint + EQMpol = EQMvac + Eshift
Similarly to ref. 34, 37 and 61, we implemented a hierarchical model for both forces and energies. In this model, the QM-MM interaction is predicted as the environment shift to be added to the vacuum prediction.
ÊQM/MM = Êvac + Êshift | (2)
Both predictions are made with Gaussian process regression (GPR). The primary consideration is to maintain the integral/derivative relationship between energies and forces to ensure energy conservation throughout the simulation. Within GPR, this can be done by solving the following linear system:22,62
[K + σe²I, ∇2K; ∇1K, ∇1∇2K + σf²I](αe; αf) = (E; −F) | (3)
In this compact notation, ∇ represents the derivative with respect to the coordinates of the system. The subscripts in ∇1 and ∇2 indicate that the derivative is performed on the first and the second argument of the kernel, respectively. In eqn (3), the upper left block is the kernel matrix
Kij = K(χ(Ri),χ(Rj)) | (4) |
The upper right block contains the derivatives of the kernel with respect to the coordinates in its second argument:
(∇2K)ij = ∂K(χ(Ri),χ(Rj))/∂Rj | (5)
The lower left block indicates the element-wise derivative of the first argument of the kernel matrix with respect to the coordinates of the system:
(∇1K)ij = ∂K(χ(Ri),χ(Rj))/∂Ri | (6)
The lower right block is the Hessian of the kernel, which can be written as follows:
(∇1∇2K)ij = ∂²K(χ(Ri),χ(Rj))/∂Ri∂Rj | (7)
Solving eqn (3) yields the coefficients for the energy and force blocks,
α = (αe, αf) | (8)
from which the energy at a test point χ* is predicted as
Ê(χ*) = Σi αe,iK(χi,χ*) + Σi αf,i·∇1K(χi,χ*) | (9)
and the forces as the negative gradient of eqn (9) with respect to the test geometry:
F̂(χ*) = −Σi [αe,i∇2K(χi,χ*) + αf,i·∇1∇2K(χi,χ*)] | (10)
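The block system of eqn (3) can be assembled and solved numerically. Below is a minimal one-dimensional sketch (not the authors' GPX implementation) using an RBF kernel and a toy potential E(x) = sin(x), for which the exact forces F = −dE/dx are known:

```python
import numpy as np

def kernel_blocks(x1, x2, l=1.0):
    """RBF kernel and its derivative blocks (1-D analogue of eqns (4)-(7))."""
    d = x1[:, None] - x2[None, :]
    K = np.exp(-d**2 / (2 * l**2))
    d1K = -(d / l**2) * K                  # derivative w.r.t. first argument
    d2K = (d / l**2) * K                   # derivative w.r.t. second argument
    d12K = (1 / l**2 - d**2 / l**4) * K    # Hessian block
    return K, d1K, d2K, d12K

n = 15
x = np.linspace(0, 2 * np.pi, n)
E, F = np.sin(x), -np.cos(x)               # toy potential, F = -dE/dx

K, d1K, d2K, d12K = kernel_blocks(x, x)
sig = 1e-6                                 # equal regularization for E and F
C = np.block([[K + sig * np.eye(n), d2K],
              [d1K, d12K + sig * np.eye(n)]])
# right-hand side stacks energies and gradients (-F), as in eqn (3)
alpha = np.linalg.solve(C, np.concatenate([E, -F]))
ae, af = alpha[:n], alpha[n:]

xs = np.linspace(0.3, 6.0, 50)             # test points
Ks, d1Ks, d2Ks, d12Ks = kernel_blocks(xs, x)
E_hat = Ks @ ae + d2Ks @ af                # energy prediction
F_hat = -(d1Ks @ ae + d12Ks @ af)          # force = -gradient of the prediction
```

Because the force block is the exact derivative of the energy block, the predicted forces are the exact negative gradient of the predicted energy, which is the property needed for energy conservation.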
The kernel function we used to account for internal geometry changes belongs to the Matérn class, with ν = 2.5:
Kinternal(χID,i,χID,j) = (1 + √5dij/λ + 5dij²/(3λ²))exp(−√5dij/λ) | (11)
dij = ‖χID,i − χID,j‖. | (12) |
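A minimal implementation of eqns (11) and (12) might look as follows; the inverse-distance form of χID and the lengthscale value are illustrative assumptions:

```python
import numpy as np

def chi_id(R):
    """Inverse pairwise distances of an (N, 3) geometry, a common chi_ID choice."""
    i, j = np.triu_indices(len(R), k=1)
    return 1.0 / np.linalg.norm(R[i] - R[j], axis=1)

def k_internal(ci, cj, lam=10.0):
    """Matern nu = 5/2 kernel on descriptor distance, eqns (11)-(12)."""
    d = np.linalg.norm(ci - cj)                  # eqn (12)
    u = np.sqrt(5.0) * d / lam
    return (1.0 + u + u**2 / 3.0) * np.exp(-u)   # eqn (11)

R1 = np.random.default_rng(1).normal(size=(5, 3))
R2 = R1 + 0.01 * np.random.default_rng(2).normal(size=(5, 3))
k_same = k_internal(chi_id(R1), chi_id(R1))      # exactly 1.0 at zero distance
k_near = k_internal(chi_id(R1), chi_id(R2))      # slightly below 1.0
```

The kernel equals 1 for identical descriptors and decays smoothly with descriptor distance, with λ controlling the decay rate.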
For the vacuum model (see Fig. 2(a)), following the approach of the GDML models,40,41 we found that it was sufficient to solve only part of the general system in eqn (3). In particular, we trained this model only on forces, solving the following problem:
(∇QM1∇QM2Kvac + σvac²I)αvac = −Fvac | (13)
Kvac(χi,χj) = Kinternal(χID,i,χID,j) | (14) |
We employed this approach for both GS and ES vacuum models, where Fvac in eqn (13) represents forces for the QM part calculated in a vacuum for the ground state and the first excited state, respectively. From the model described above, the prediction of forces on a χ* test point is achieved through a linear combination of Hessian kernel evaluations with the training points:
F̂vac(χ*) = −Σi ∇1∇2Kvac(χi,χ*)·αvac,i | (15)
The prediction of energies is then obtained as the integral of the previous expression
Êvac(χ*) = Σi αvac,i·∇1Kvac(χi,χ*) + C | (16)
Notice that since this prediction requires an integral, this energy is defined up to a constant C, which can be defined as:41,64
C = (1/Ntrain)Σn[Evac(χn) − Σi αvac,i·∇1Kvac(χi,χn)] | (17)
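The force-only training of eqns (13)–(17) can be sketched on the same one-dimensional toy potential E(x) = sin(x); the kernel and all numerical values here are illustrative, not those of the GPX models:

```python
import numpy as np

def kernel_blocks(x1, x2, l=1.0):
    """RBF kernel, its second-argument derivative, and its Hessian (1-D)."""
    d = x1[:, None] - x2[None, :]
    K = np.exp(-d**2 / (2 * l**2))
    d2K = (d / l**2) * K
    d12K = (1 / l**2 - d**2 / l**4) * K
    return K, d2K, d12K

x = np.linspace(0, 2 * np.pi, 20)
F = -np.cos(x)                             # forces of E(x) = sin(x)

_, d2K, d12K = kernel_blocks(x, x)
# eqn (13): train on forces only, through the Hessian kernel
alpha = np.linalg.solve(d12K + 1e-6 * np.eye(len(x)), -F)

xs = np.linspace(0.5, 5.5, 40)
_, d2Ks, d12Ks = kernel_blocks(xs, x)
F_hat = -(d12Ks @ alpha)                   # eqn (15): Hessian-kernel combination
C = np.mean(np.sin(x) - d2K @ alpha)       # eqn (17): fix the integration constant
E_hat = d2Ks @ alpha + C                   # eqn (16): energy recovered up to C
```

Even though only forces enter the training, the integrated prediction recovers the energies once the constant C is fixed on a set of reference energies.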
For the environment, we use as a descriptor the electrostatic potential generated by the MM point charges on each QM atom,
χPot,i = Σm∈MM qm/‖RQMi − RMMm‖ | (18)
which enters a linear (dot-product) kernel:
Kdirect(χPot,i,χPot,j) = χPot,iT·χPot,j. | (19) |
Contrary to the vacuum model, the environment shift model needed to be trained on both energies and forces (eqn (3), see also Fig. 2(b)) to obtain low errors. We attribute this different behaviour to the amount of information contained in the χID and χPot descriptors. Indeed, while χID is a complete descriptor of the molecular geometry, the potential χPot contains a trace over the MM atoms and is evaluated on the QM atoms only, so it is intrinsically approximate. For this model, we imposed an equal regularization for energies and forces (σe = σf). In this case, the targets are energies and forces for both the ground and the first excited state, computed as the difference between the QM/MM calculation and the vacuum one:
Eshift = EQM/MM − Evac,  Fshift = FQM/MM − Fvac | (20)
Also in this case we trained two models, for GS and ES respectively. We put both the direct environment effect and the influence of internal geometry in the covariance matrix:
Kshift(χi,χj) = Kdirect(χPot,i,χPot,j)·Kinternal(χID,i,χID,j). | (21) |
This is possible due to the versatility of GPR, which allows the use of any symmetric and positive semi-definite function as a kernel, e.g. the Hadamard product of two symmetric and positive semi-definite functions.
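This is guaranteed by the Schur product theorem; a quick numerical check on random data (illustrative kernels, not the GPX ones) shows that the Hadamard product of two PSD kernel matrices remains PSD:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))

def rbf(X, l):
    """RBF Gram matrix, positive semi-definite by construction."""
    d2 = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
    return np.exp(-d2 / (2 * l**2))

def linear(X):
    """Dot-product Gram matrix, like K_direct in eqn (19); also PSD."""
    return X @ X.T

K = rbf(X, 1.0) * linear(X)        # element-wise (Hadamard) product kernel
eigmin = np.linalg.eigvalsh(K).min()   # >= 0 up to round-off
```

Since the product matrix is symmetric and has no negative eigenvalues (beyond numerical noise), it is a valid GPR covariance, as used for Kshift in eqn (21).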
Note that we train only on shift forces acting on QM atoms, but the QM–MM interaction contributes to the forces on both QM and MM atoms. This means that during the prediction phase, the derivative with respect to the first argument of the kernel always pertains to QM positions, as it corresponds to the training points. On the other hand, the derivative with respect to the test-point geometry varies: it is taken with respect to the QM coordinates for the QM/MM force contribution to QM atoms (F̂QMshift) and with respect to the MM coordinates for the contribution to MM atoms (F̂MMshift):
F̂QMshift(χ*) = −Σi [αe,i∇QM2Kshift(χi,χ*) + αf,i·∇1∇QM2Kshift(χi,χ*)] | (22)
F̂MMshift(χ*) = −Σi [αe,i∇MM2Kshift(χi,χ*) + αf,i·∇1∇MM2Kshift(χi,χ*)] | (23)
F̂QM = F̂vac + F̂QMshift | (24)
The energy (Êshift) is predicted following eqn (9). More detailed expressions of the derivatives in eqn (22) and (23) are provided in the ESI.† Our environment shift model essentially learns the relationship between QM/MM energy and force shifts and the electrostatic potential used as a descriptor. Since changing the solvent alters the MM charges and, consequently, the potential, our shift model should, in principle, be able to extrapolate to any solvent, despite being trained on a specific one.
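Assuming the Coulomb form of the potential descriptor described above (the exact functional form is our assumption), its gradient with respect to an MM position — the ingredient of the MM force contribution — can be written analytically and verified against finite differences:

```python
import numpy as np

def chi_pot(R_qm, R_mm, q_mm):
    """Electrostatic potential of the MM charges at each QM atom (chi_Pot)."""
    d = np.linalg.norm(R_qm[:, None, :] - R_mm[None, :, :], axis=-1)
    return (q_mm[None, :] / d).sum(axis=1)

def dchi_dRm(R_qm, R_mm, q_mm, m):
    """Gradient of chi_Pot w.r.t. the position of MM atom m, shape (N_QM, 3)."""
    r = R_qm - R_mm[m]
    return q_mm[m] * r / np.linalg.norm(r, axis=1, keepdims=True)**3

rng = np.random.default_rng(3)
R_qm = rng.normal(size=(4, 3))
R_mm = 5.0 + rng.normal(size=(6, 3))       # MM atoms away from the QM region
q_mm = rng.normal(size=6)

# central finite-difference check of the analytic gradient
m, h = 2, 1e-6
num = np.zeros((4, 3))
for k in range(3):
    Rp, Rn = R_mm.copy(), R_mm.copy()
    Rp[m, k] += h
    Rn[m, k] -= h
    num[:, k] = (chi_pot(R_qm, Rp, q_mm) - chi_pot(R_qm, Rn, q_mm)) / (2 * h)
```

Chaining this Jacobian with the kernel derivative with respect to χPot is what produces the MM force contribution of eqn (23).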
The separation of our QM/MM prediction into a vacuum component and a shift component gives us greater control over each model. It also makes our environment shift model compatible with any existing model for the vacuum. Additionally, the environment shift model depends only on the coordinates of the system (both QM and MM) and the MM charges. These quantities are readily available in any MD engine capable of performing QM/MM simulations.
Fig. 3 Scheme representing the Amber-GPX interface via the Python server script.
The communication between Python and Sander is based on a TCP socket: a Python server52 is launched at the start of the MD and kept running during the whole simulation. Avoiding a new Python initialization at each step saves the time needed to import all the necessary packages. In addition, since our models are implemented in JAX49 via the GPX50 package and are thus JIT-compiled, having a single Python instance running for the whole simulation means that the JAX functions are compiled only at the first step, with significant performance gains during the MD. The Python server communicates with a Sander client through a socket implementation in Fortran,66 which allows data to be exchanged between Python and Sander without writing to and reading from disk. Avoiding input/output (IO) operations on disk allows for a much faster exchange of information, at the price of recompiling Sander with the Fortran-based socket. We also provide the option to leave Sander untouched and perform the communication between Python and Sander through disk IO operations, similarly to the implementation of ref. 35. Throughout this paper, we present ML/MM MD results obtained with the faster, in-memory data exchange mode.
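The socket-based exchange can be sketched as follows; the length-prefixed wire format and the placeholder "model" are our inventions for illustration, not the actual Sander/GPX protocol:

```python
import socket
import struct
import threading
import numpy as np

def recv_exact(conn, nbytes):
    """Read exactly nbytes from a stream socket."""
    buf = b""
    while len(buf) < nbytes:
        chunk = conn.recv(nbytes - len(buf))
        if not chunk:
            raise ConnectionError("socket closed")
        buf += chunk
    return buf

def serve(sock):
    """Handle one request: receive coordinates, send back 'forces'."""
    conn, _ = sock.accept()
    with conn:
        (n,) = struct.unpack("!i", recv_exact(conn, 4))
        coords = np.frombuffer(recv_exact(conn, 8 * n), dtype=">f8")
        forces = -0.1 * coords             # placeholder for the ML prediction
        conn.sendall(forces.astype(">f8").tobytes())

srv = socket.socket()
srv.bind(("127.0.0.1", 0))                 # ephemeral port
srv.listen(1)
threading.Thread(target=serve, args=(srv,), daemon=True).start()

# the "MD client" side: send coordinates, receive forces, all in memory
cli = socket.create_connection(("127.0.0.1", srv.getsockname()[1]))
xyz = np.arange(9, dtype=">f8")            # 3 atoms x 3 coordinates
cli.sendall(struct.pack("!i", int(xyz.size)) + xyz.tobytes())
forces = np.frombuffer(recv_exact(cli, 8 * xyz.size), dtype=">f8")
cli.close()
```

Because the arrays travel as raw bytes over the socket, no files are written at any step, which is the point of the in-memory mode described above.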
RMSE = [(1/(3N·NQM))Σn,i,k(F̂n,ik − Fn,ik)²]^(1/2) | (25)
MAE = (1/(3N·NQM))Σn,i,k|F̂n,ik − Fn,ik| | (26)
(shown here for forces; for energies the average runs over the N test points only).
In eqn (25) and (26), N is the number of test points and NQM is the number of QM atoms.
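In code, these averaged error metrics are simply:

```python
import numpy as np

def rmse(pred, ref):
    """Root-mean-square error averaged over all entries, as in eqn (25)."""
    return np.sqrt(np.mean((np.asarray(pred) - np.asarray(ref))**2))

def mae(pred, ref):
    """Mean absolute error averaged over all entries, as in eqn (26)."""
    return np.mean(np.abs(np.asarray(pred) - np.asarray(ref)))

# shape (N test points, N_QM atoms, 3 Cartesian components) for forces
F_ref = np.zeros((10, 5, 3))
F_pred = F_ref + 0.5
err_rmse, err_mae = rmse(F_pred, F_ref), mae(F_pred, F_ref)   # both 0.5
```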
Since the aim was to simulate a reactive process, the training set was constructed to include all significant geometries that the molecule can assume during its time evolution. The most difficult part to describe is the transition state (TS), where the moving proton sits midway between the two oxygen atoms. For this reason, the training set was divided into three parts, corresponding to the two minima (N and T forms) and the region surrounding the TS. For both vacuum and environment models, the actual training set comprises 500 geometries among the two 3HF forms and 500 geometries near the TS, selected using the farthest point sampling (FPS)67 approach, which yields configurations as distant as possible from each other.
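A minimal farthest point sampling routine in descriptor space might look like the following sketch (greedy max-min selection; not the FPS implementation of ref. 67):

```python
import numpy as np

def fps(X, n, start=0):
    """Indices of n rows of X chosen greedily to be as far apart as possible."""
    idx = [start]
    dmin = np.linalg.norm(X - X[start], axis=1)   # distance to selected set
    for _ in range(n - 1):
        nxt = int(np.argmax(dmin))                # farthest from all selected
        idx.append(nxt)
        dmin = np.minimum(dmin, np.linalg.norm(X - X[nxt], axis=1))
    return idx

X = np.array([[0.0], [0.1], [5.0], [10.0]])
picked = fps(X, 3)    # picks the extremes first: [0, 3, 2]
```

Each iteration adds the point farthest from everything already selected, so near-duplicate geometries (like the first two rows above) are picked last.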
The prediction of excited-state energies and forces presents the additional challenge of dealing with the mixing between different states. We followed the same procedure described above to build the 1000-point training set, and we trained two models: one including only excitations with pure HOMO–LUMO character, and the other selecting geometries from the whole dataset, thus allowing for mixing with other states. While both models performed similarly on the test sets, we observed differences in their performance during simulation tests. In particular, the mixed-state model was more prone to simulation crashes. Therefore, we present the results obtained with the pure HOMO–LUMO transition model.
The target for the vacuum models is simply the 3HF forces in a vacuum (see Fig. 2(a)). For the environment shift model, the targets are the differences between QM/MM energies and forces and the respective vacuum quantities (see Fig. 2(b)), i.e., the environment shift model requires two sets of calculations for each geometry. Following the same approach as the US simulations, all calculations were performed at the ωB97XD level (TD-ωB97XD for the excited state). The basis set was customized to accurately describe the proton transfer: we used 6-31G(d,p) for the moving proton and the two oxygen atoms involved in bond breaking and formation, and 6-31G for all other atoms. All calculations were run with Gaussian16.68
Concerning the performance of our GPX package, we trained our models on a single AMD EPYC 7282 compute node running at 2.8 GHz, using 32 threads. By exploiting an analytic solver, GPX took 6.5 min for training on forces and 8.1 min for training on energies and forces with 1000 training points (average time between GS and ES models).
Since we imposed equal regularization for energies and forces in eqn (3), we have two hyperparameters to optimize for both vacuum and environment models: the λ of the Matérn kernel and the regularization σ. Hyperparameter optimization for each model was conducted using grid search (λ = [5, 10, 15, 20], σ = [10−5, 10−4, 10−3]) and four-fold cross-validation (CV). The total training time, including CV for hyperparameter optimization, was 4.29 h. For all the models, CV produced very small values of the σ hyperparameter (10−5), making them more prone to overfitting. Indeed, during initial MD simulation tests, we encountered instabilities in our environment shift models, leading to the breakdown of the system. We used these failed trajectories as additional test cases to further validate our models, but did not include them in the training set. This process revealed the need to increase the regularization strength to accurately describe geometries outside the training set. Raising the σ parameter from 10−5 to 10−3 caused slightly larger errors on the test set, but significantly improved the robustness of our environment shift models. Therefore, for both GS and ES shift models we set σ = 10−3 and again used cross-validation to find the optimal value of λ. These final models are the ones shown in the Results section and used for production simulations. We included the learning curves for all models in the ESI (Fig. S2†).
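The grid-search/cross-validation procedure can be sketched on a toy regression problem; the grids mirror those quoted above, while the exponential kernel and synthetic data are purely illustrative:

```python
import numpy as np

def cv_score(X, y, lam, sig, folds=4):
    """Mean squared k-fold CV error of a kernel ridge fit with lengthscale lam
    and regularization sig (toy stand-in for the GPR models)."""
    perm = np.random.default_rng(0).permutation(len(X))
    err = 0.0
    for f in range(folds):
        te = perm[f::folds]
        tr = np.setdiff1d(perm, te)
        K = np.exp(-np.abs(X[tr][:, None] - X[tr][None, :]) / lam)
        alpha = np.linalg.solve(K + sig * np.eye(len(tr)), y[tr])
        Ks = np.exp(-np.abs(X[te][:, None] - X[tr][None, :]) / lam)
        err += np.mean((Ks @ alpha - y[te])**2)
    return err / folds

X = np.linspace(0, 10, 80)
y = np.sin(X)
grid = [(lam, sig) for lam in (5, 10, 15, 20) for sig in (1e-5, 1e-4, 1e-3)]
best = min(grid, key=lambda p: cv_score(X, y, *p))   # best (lam, sig) pair
```

As in the text, the CV score alone tends to favour the smallest σ; robustness against out-of-distribution geometries is the reason to override it upward.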
In Table 1 we report the errors on the MeOH test set for all the models. GS and ES models have similar errors, indicating that, in the absence of mixing with other states, learning the excited state is not more difficult than learning the ground state. For the vacuum model, the error in energies is generally below 0.1 kcal mol−1 and the error in forces is below 0.5 kcal mol−1 Å−1. Notably, these very low errors in energy are achieved despite the model being trained solely on forces. Moving to the shift model, we can observe a decrease in the error in forces with respect to the vacuum, likely due to the lower absolute value of shift forces. Conversely, the error in energies increases despite the lower absolute value of environment shift energy. We attribute this behaviour to the fact that, as previously mentioned, the environment shift model contains only a trace of the MM atoms, while the energy depends equally on QM and MM atoms. However, the combined prediction gives an error well below the chemical accuracy (environment model, Table 1). We also reported a plot of the errors in energy and forces along the reaction coordinate in Fig. S15 in the ESI.†
Table 1 Errors on the MeOH test set (energies in kcal mol−1, forces in kcal mol−1 Å−1)

| | | @Vacuum RMSE | @Vacuum MAE | @Shift RMSE | @Shift MAE | @Environment RMSE | @Environment MAE |
|---|---|---|---|---|---|---|---|
| GS | Energy | 0.067 | 0.051 | 0.267 | 0.214 | 0.273 | 0.217 |
| GS | Force | 0.433 | 0.280 | 0.143 | 0.100 | 0.458 | 0.307 |
| ES | Energy | 0.066 | 0.050 | 0.265 | 0.215 | 0.272 | 0.218 |
| ES | Force | 0.473 | 0.291 | 0.184 | 0.125 | 0.497 | 0.321 |
The performance of our models is reported in Table 2 and Fig. 6. Plots of the force for all QM atoms can be found in Section S2 of the ESI.† When examining the @shift column of Table 2, we can conclude that the model is capable of describing the environment effect in MCH. The environment shift model errors in MCH are significantly lower than the ones in MeOH because MCH provides a minimal environmental contribution to energies and forces, resulting in reduced error.
Table 2 Errors on the MCH test set (energies in kcal mol−1, forces in kcal mol−1 Å−1)

| | | @Vacuum RMSE | @Vacuum MAE | @Shift RMSE | @Shift MAE | @Environment RMSE | @Environment MAE |
|---|---|---|---|---|---|---|---|
| GS | Energy | 0.221 | 0.157 | 0.091 | 0.074 | 0.240 | 0.171 |
| GS | Force | 0.890 | 0.606 | 0.062 | 0.043 | 0.891 | 0.607 |
| ES | Energy | 0.509 | 0.316 | 0.121 | 0.100 | 0.537 | 0.337 |
| ES | Force | 1.612 | 1.033 | 0.070 | 0.048 | 1.612 | 1.034 |
A larger error is present for the vacuum model. This dissimilarity can be attributed to the different ways in which the MeOH and MCH test sets were generated: for MeOH we used geometries extracted from the same ab initio trajectory as the training set, whereas the MCH test set consists of completely new geometries extracted from an ML/MM simulation (see Section 4.4). The larger error of the full prediction in the environment reflects the larger error of the vacuum model.
Fig. 7 shows the approximate timings for each significant part.
Fig. 7 Graphical representation of the average time required for each operation at each step of the simulation.
The bar plot highlights that 70% of the time is spent by Amber, which evaluates the MM-only classical energies and forces and computes the van der Waals interactions with 3HF. Concerning the ML part, the remaining 30% is almost equally divided between the computation of the descriptor and Jacobian for the MM part, χPot and JPot, and the predictions of the vacuum and shift contributions. The calculation of the ID descriptor χID and Jacobian JID, as well as the data exchange between Sander and the Python server, takes negligible time (more details are provided in Table S1 in the ESI†).
In terms of computational efficiency on similar hardware, reference QM/MM simulations achieve a performance of approximately 1.0 ps per day for ground-state simulations. In contrast, using our ML/MM strategy, we achieve a significantly higher performance of 200 ps per day. For excited-state simulations, the advantage is even more pronounced. Here, our ML/MM approach maintains the same high performance as ground-state simulations. In contrast, traditional TD-DFT simulations typically achieve a performance of only 0.1 ps per day.
This significant improvement in performance allows us to conduct simulations more rapidly and efficiently, which is particularly beneficial for studying complex processes such as excited-state dynamics.
Finally, to show the stability of our ML/MM potential for longer dynamics, we plot in Fig. S17 of the ESI† the results of a 1 ns long ML/MM simulation of the ES in MeOH in the NVE ensemble. Within this timescale we did not observe drifts in the total energy or sudden spikes in the force, which demonstrates the stability of our simulations.
In Fig. 9 we plotted the reaction coordinates (OTH and ONH) for the proton transfer over the simulation time for a representative MeOH trajectory. The moment in which the two lines cross identifies the proton transfer. Grey lines in Fig. 9 show all the other MeOH trajectories, which display PT at different times.
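Detecting the PT event from the two distance curves amounts to finding their first crossing. A sketch on synthetic curves (toy linear ramps and a hypothetical 0.5 fs timestep, not actual trajectory data):

```python
import numpy as np

def pt_time(d_on_h, d_ot_h, dt):
    """First time at which d(O_T-H) drops below d(O_N-H), or None if no PT."""
    crossed = np.nonzero(d_ot_h < d_on_h)[0]
    return crossed[0] * dt if crossed.size else None

steps = np.arange(1000)
d_on = 1.0 + 0.001 * steps      # donor O-H distance growing (toy model)
d_ot = 1.8 - 0.001 * steps      # acceptor O...H distance shrinking
t_pt = pt_time(d_on, d_ot, dt=0.5)   # transfer time in fs
```

Applying such a criterion to every trajectory gives the fraction of transferred trajectories as a function of time, i.e., the curves of Fig. 10.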
The cumulative results of our simulations in both solvents are reported in Fig. 10, which shows the fraction of trajectories that undergo PT over the simulation time. We can confirm that our model effectively captures the differences between the two solvents. The ultra-fast component is observed in both solvents, but with lower probability in MeOH. On the other hand, the slow component is present only in MeOH, owing to the hydrogen bonds that must be broken before the proton transfer can happen. As a result, all MCH trajectories exhibit ESIPT within approximately 3 ps, whereas for MeOH not all trajectories reached the T form even after 10 ps of simulation. This is in agreement with the experimental observation that fluorescence emanates only from the T form in non-hydrogen-bonding solvents, and from both N and T forms in hydrogen-bonding ones.53 Within the statistical uncertainty, this plot is comparable to the one obtained with ab initio simulations,10 even though a direct comparison is not possible due to the different environment description (electrostatic vs. polarizable embedding) and statistical accuracy (fewer trajectories were run in ref. 10).
In Table 3, we present the results of a multi-exponential fitting of the curves shown in Fig. 10. The fitted curves are depicted in Fig. S18 of the ESI.† We achieved a good fit using a combination of two exponential functions for MCH and three for MeOH. For both solvents, we found an ultrafast component of the order of 150 fs and a slower one of 1.0 ps. The presence of an ultrafast component is in agreement with transient absorption experiments,57,59 although in experiments this component seems even faster than in our simulations. A 1.6 ps component was also measured in ref. 59, which we connect with the 1.0 ps component found in simulations.
Table 3 Time constants (amplitudes in parentheses) from the multi-exponential fit of the PT curves

| | τ1/ps (a1) | τ2/ps (a2) | τ3/ps (a3) |
|---|---|---|---|
| MCH | 0.15 (0.50) | 1.0 (0.50) | — |
| MeOH | 0.18 (0.04) | 1.0 (0.23) | 16.0 (0.73) |
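A multi-exponential fit of this kind can be sketched with SciPy on synthetic data; the functional form (a weighted sum of 1 − exp(−t/τ) components) is our assumption about the fitting model, and the parameters below are illustrative:

```python
import numpy as np
from scipy.optimize import curve_fit

def pt_fraction(t, a1, tau1, a2, tau2):
    """Two-component model of the transferred-trajectory fraction vs. time."""
    return a1 * (1 - np.exp(-t / tau1)) + a2 * (1 - np.exp(-t / tau2))

t = np.linspace(0, 10, 200)                        # ps
data = pt_fraction(t, 0.5, 0.15, 0.5, 1.0)         # synthetic MCH-like curve
popt, _ = curve_fit(pt_fraction, t, data, p0=[0.4, 0.1, 0.6, 2.0])
```

For MeOH, a third 1 − exp(−t/τ3) term would be added to capture the slow hydrogen-bond-limited component.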
The slow (∼10 ps) component was observed only for MeOH, consistent with experimental measurements,57,59 in which the ∼10 ps component measured in MCH could be attributed to hydrogen-bonding impurities in the solution.
Overall, our simulations well reproduce the qualitative difference between MeOH and MCH solvents in the PT timescale and extent. The ultrafast components indicate that PT is almost barrierless in MCH, as well as in part of the MeOH configurations.10 Our timescales are slightly slower than experiments for this component, which we can tentatively attribute to missing nuclear quantum effects in the proton transfer coordinate.
For the training of our models we rely on GPR in the gradient domain (learning only forces or both energies and forces). This approach is beneficial because it provides 3 × NQM forces from a single sample, allowing us to train with a relatively small dataset. With only 2 × Nsample calculations (one in a vacuum and one in the environment) one can run numerous trajectories over extended timescales. This approach can thus significantly enhance ground-state simulations, but it has an even greater effect on excited-state simulations, which are considerably more computationally demanding. Training a model for excited states is very delicate, as strong mixing can significantly impact learning, sometimes necessitating the use of diabatization procedures.37,48,75 However, for excited states with low mixing, learning based on geometries that exhibit a pure HOMO–LUMO transition makes training for the excited state as straightforward as for the ground state.
We also built a machinery able to connect the GPX software with Amber to perform ML/MM simulations. The core of this interface is a Python server script, which receives coordinates and charges from Sander and returns energies and forces. This server is not specific to GPX, so any ML model can be plugged in and used for simulations. The socket-based data exchange is crucial, as it eliminates the overhead associated with reading and writing files, significantly reducing the cost of the simulation.
Our strategy was applied to the study of excited-state intramolecular proton transfer (ESIPT) of 3-hydroxyflavone in two solvents: methanol as a polar protic solvent and methylcyclohexane as an apolar, aprotic one. The use of ML overcomes the main limitations of ab initio approaches. By running 100 trajectories per solvent, we significantly increased the statistical accuracy of our description. Furthermore, we were able to investigate both the ultra-fast and the slow components of the ESIPT reaction with standard simulations, whereas the latter required enhanced sampling techniques in ab initio simulations.10 Our simulations reproduce with good accuracy the experimentally determined ESIPT time constants in both solvents. Importantly, our environment models proved able to extrapolate to a solvent different from the one used in training. On the other hand, these models are specific to the molecule in the QM part: both the inverse-distance descriptor and the electrostatic potential on QM atoms encode the QM molecule "globally". However, training our models requires only a relatively small number (1000) of training points, and the training time for a molecule like 3-hydroxyflavone is of a few hours for both GS and ES. This makes us confident that a similar protocol could be applied to diverse and more complex systems.
Footnote
† Electronic supplementary information (ESI) available: Derivatives of kernels; learning curves; full force plots; errors along the reaction coordinate; comparison between ML/MM and QM/MM trajectories; trajectory stability; timings; fitting of the PT curves. See DOI: https://doi.org/10.1039/d4dd00295d
This journal is © The Royal Society of Chemistry 2024 |