Carolin
Müller
a,
Štěpán
Sršeň
bc,
Brigitta
Bachmair
de,
Rachel
Crespo-Otero
f,
Jingbai
Li
g,
Sascha
Mausenberger
be,
Max
Pinheiro
Jr
h,
Graham
Worth
f,
Steven A.
Lopez
*i and
Julia
Westermayr
*jk
aComputer-Chemistry-Center, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nägelsbachstraße 25, 91052 Erlangen, Germany
bInstitute of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Währinger Straße 17, 1090 Wien, Austria
cDepartment of Physical Chemistry, University of Chemistry and Technology, Technická 5, 162 28 Prague, Czech Republic
dResearch Platform on Accelerating Photoreaction Discovery (ViRAPID), University of Vienna, Währinger Straße 17, 1090 Vienna, Austria
eVienna Doctoral School in Chemistry (DoSChem), University of Vienna, Währinger Straße 42, 1090 Vienna, Austria
fDepartment of Chemistry, University College London, 20 Gordon Street, London, WC1H 0AJ, UK
gHoffmann Institute of Advanced Materials, Shenzhen Polytechnic University, 7098 Liuxian Boulevard, Shenzhen, Guangdong 518055, P. R. China
hAix Marseille University, CNRS, ICR, Marseille, France
iDepartment of Chemistry & Chemical Biology, Northeastern University, 805 Columbus Avenue, Boston, MA 02120, USA. E-mail: s.lopez@northeastern.edu
jWilhelm-Ostwald Institute, Leipzig University, Linnéstraße 2, 04103 Leipzig, Germany. E-mail: julia.westermayr@uni-leipzig.de
kCenter for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI), Dresden/Leipzig, Germany
First published on 4th September 2025
Exploring molecular excited states holds immense significance across organic chemistry, chemical biology, and materials science. Understanding the photophysical properties of molecular chromophores is crucial for designing nature-inspired functional molecules, with applications ranging from photosynthesis to pharmaceuticals. Non-adiabatic molecular dynamics simulations are powerful tools to investigate the photochemistry of molecules and materials, but demand extensive computing resources, especially for complex molecules and environments. To address these challenges, machine learning has been integrated into these simulations. Machine learning algorithms can analyse vast datasets and accelerate discoveries by identifying relationships between geometrical features and ground- as well as excited-state properties. However, challenges persist, including the acquisition of accurate excited-state data and the management of data complexity. This article provides an overview of recent advances and best practices in machine learning for non-adiabatic molecular dynamics, focusing on pre-processing, surface fitting, and post-processing of data.
Understanding and predicting the photophysical and photochemical properties of molecular systems requires detailed knowledge of their potential energy surfaces (PESs). However, in polyatomic molecules, PESs are inherently high-dimensional, defined by numerous internal coordinates, making their full characterization computationally intractable. A powerful strategy to address this challenge is non-adiabatic molecular dynamics (NAMD) simulations, which enable the exploration of PESs by directly identifying the critical geometries visited upon photoexcitation (see also Section 2). Through this approach, NAMD facilitates the characterization of structure–property relationships that govern the excited-state processes and reactions that occur after light absorption. The trajectory data obtained from NAMD simulations serve as a basis for identifying different nonradiative decay channels, assessing their efficiency, and determining characteristic time scales – key insights that inform the rational design of novel chromophores, materials, and photonic devices. However, despite their ability to resolve real-time excited-state molecular vibrations and reaction pathways toward various photoproducts,5 NAMD simulations are computationally demanding. Due to their statistical nature, achieving meaningful insights requires the propagation of numerous trajectories, each typically evolved with time steps on the order of 0.5 fs. Consequently, a single 1 ps simulation necessitates approximately 2000 quantum chemical calculations. These high computational costs constrain the scope of NAMD applications, particularly for large molecular systems, complex environments,6 and simulations extending beyond the sub-nanosecond regime.
This challenge has spurred the integration of machine learning (ML) techniques into the study of photodynamics, offering a promising route to overcome existing limitations of NAMD simulations and to access previously unexplored regions of chemical space.7–10 ML has become a powerful tool in electronic structure theory, widely used to predict a broad range of properties including Hamiltonians,11 electronic energies, forces, dipole moments, and even experimental observables such as spectroscopic features.10,12–14 Among its many applications, ML potentials for ground-state dynamics are arguably the most successful and broadly adopted.15 More recently, efforts have expanded to include ML potentials capable of representing multiple electronic states, thereby enabling excited-state simulations.
In this setting, ML potentials serve as efficient surrogates for potential energy surfaces (PESs), either for a single state (ground state) or for multiple electronic states. Leveraging large datasets of quantum mechanical calculations or experimental data,10,12–14 ML models can learn complex structure–property relationships and accurately predict key quantities for NAMD simulations, such as energies, forces, non-adiabatic couplings (NACs), and spin–orbit couplings (SOCs). Their significantly lower computational cost compared to ab initio methods enables simulations of excited-state processes at extended timescales.16–18 However, ML-driven photodynamics remains a nascent field and faces several important challenges. These include (i) the limited availability of high-quality excited-state reference data, (ii) the non-uniqueness of certain properties due to wavefunction phase arbitrariness,19,20 and (iii) discontinuities in PESs near regions of strong coupling.8,21 Together with the high computational cost of generating reliable training data, these issues currently hinder broader adoption.
To address these challenges, this article presents a consolidated overview of current best practices for the development and application of ML potentials in excited-state dynamics. Drawing on insights from the CECAM workshop “Machine-learned potentials in molecular simulation: best practices and tutorials”, we focus on ML models tailored to excited-state simulations within the framework of mixed quantum-classical (MQC) NAMD20,22–27 – the most widely used setting for such approaches. Alternative strategies, such as the direct learning of wavepacket propagation, are not covered. Instead, the emphasis is placed on the practical aspects of supervised ML workflows, including data generation and pre-processing, model training and refinement, and the analysis of NAMD trajectories through ML-based post-processing.
The structure of this article reflects a typical ML-driven workflow for excited-state simulations (cf. Fig. 1). We begin with a concise introduction to NAMD, with a focus on surface hopping techniques as the underlying framework for most current ML applications in excited-state dynamics (Section 2). This is followed by a discussion of the data foundation required to train reliable ML potentials, including the selection or computation of quantum chemical reference data (Sections 3.1 and 3.2) and the pre-processing of quantum chemical data (Section 3.3). We proceed by outlining the construction of machine learning models for excited-state dynamics, beginning with the selection of suitable molecular structure representations (Section 4.1) and regression architectures (Section 4.2). Subsequently, we highlight representative implementations of excited-state ML potentials (Section 4.3) and provide a comparative analysis of single- versus multi-state architectures (Section 4.4). Further, we address the incorporation of phase correction during model training (Section 4.5) and examine the critical challenges of ML for quantum dynamics (Section 4.6) and transferability across chemical compound space (Section 4.7). Finally, we address the post-processing stage, highlighting best practices for explorative trajectory analysis (Section 5.1) using methods such as dimensionality reduction (Section 5.2) and clustering (Section 5.3).
NAMD methods have evolved into various approaches, from fully quantum to semiclassical methods. Quantum methods, such as the multiconfiguration time-dependent Hartree (MCTDH)34 or variational multiconfigurational Gaussian (vMCG)35 propagation methods, directly consider the nuclear wavefunctions and offer insights into nuclear quantum effects. A schematic example of wavefunction propagation is shown in Fig. 2a, where a wavepacket is split along the trajectory. However, because of their computational costs, these methods usually require selecting a few degrees of freedom for the propagation, the assumption of model PESs, or the use of linear vibronic coupling models. Trajectory-based methods, such as ab initio multiple spawning (AIMS),36–38 Ehrenfest dynamics,39,40 and trajectory surface hopping (TSH),40–42 assume that the nuclear wave function can be approximated by a swarm of classical trajectories, as exemplified in Fig. 2b. Among these approaches, the trajectory surface hopping method is one of the most widely used techniques for investigating photoinduced processes and reactions of medium-sized molecules on the picosecond timescale.36
Fig. 2 Quantum (a) and classical (b) propagation of the nuclei in excited-state molecular dynamics simulations. Reproduced from ref. 8 under CC-BY 4.0.
TSH simulations are usually performed on the fly, considering all vibrational degrees of freedom. However, approximate PESs obtained from, for instance, the linear vibronic coupling model can also be used.43 Notably, model PESs make the simulations much more computationally efficient but introduce approximations; accurate linear vibronic coupling descriptions can only be obtained for rigid systems. ML-based PESs offer both high accuracy and high computational efficiency by learning quantum chemical reference data.8,44 We will now focus on TSH trajectory data that can be fed into ML models, facilitating NAMD simulations.
Several NAMD packages are available to generate surface hopping trajectories, to name a few: SHARC,44,45 Newton-X,46,47 PyRAI2MD,16 MLatom25 and JADE-NAMD.48 Despite the various implementations of the surface hopping method in these packages, they share the same key ingredients for producing trajectory data: the surface hopping algorithm, the excited-state calculations, and the generation of initial conditions used as starting structures for NAMD.
The electronic wave function in surface hopping is a linear combination of multiple electronic states with the same or different spin multiplicities at specific nuclear positions. The temporal evolution of the nuclear positions, referred to as trajectory propagation, updates the nuclear coordinates and velocities according to classical dynamics, usually by integrating Newton's equations of motion. The current state for propagating the nuclear trajectory is determined stochastically via the coefficients of the electronic states. The squares of these coefficients represent the electronic state populations, and their time derivatives indicate the tendency of non-adiabatic electronic transitions between states.36
The fewest switches surface hopping (FSSH) method assumes the smallest number of switches between two electronic states, meaning that electronic population transfer occurs primarily from one electronic state to another.40–42 The transition probabilities in FSSH dynamics depend on the non-adiabatic couplings (NACs) between electronic states with the same spin multiplicity (i.e., internal conversion),49 which describe the steepest changes in the wave function as nuclear motions occur. When intersystem crossing is considered, spin–orbit couplings (SOCs) are also required to account for hops between electronic states with different spin multiplicities.50,51 The NACs and SOCs connect the classical nuclear dynamics and the electronic wave function, governing the temporal evolution of the electronic populations over all considered states.
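To make the stochastic hopping step concrete, the sketch below evaluates fewest-switches hop probabilities from the electronic coefficients, the nuclear velocities, and the NAC vectors. It is a minimal numpy illustration rather than code from any NAMD package; the array shapes, toy numbers, and the sign convention for the couplings are assumptions of this example.

```python
import numpy as np

def fssh_hop_probabilities(c, v, nac, active, dt):
    """Fewest-switches hop probabilities out of the active state.

    c      : complex electronic coefficients, shape (n_states,)
    v      : nuclear velocities, shape (n_atoms, 3)
    nac    : NAC vectors d_jk, shape (n_states, n_states, n_atoms, 3)
    active : index k of the currently active state
    dt     : time step in atomic units
    """
    pop_active = np.abs(c[active]) ** 2           # |c_k|^2, current population
    g = np.zeros(len(c))
    for j in range(len(c)):
        if j == active:
            continue
        # b_jk = -2 Re(c_j* c_k) (v . d_jk) drives population flow k -> j
        v_dot_d = np.sum(v * nac[j, active])
        b_jk = -2.0 * np.real(np.conj(c[j]) * c[active]) * v_dot_d
        g[j] = max(0.0, dt * b_jk / pop_active)   # negative flow: no hop
    return g

# toy two-state example with a stochastic hop decision
rng = np.random.default_rng(1)
active = 0
c = np.array([0.95 + 0.0j, np.sqrt(1.0 - 0.95**2) + 0.0j])
v = rng.normal(scale=1e-3, size=(3, 3))
nac = rng.normal(scale=1e-2, size=(2, 2, 3, 3))
g = fssh_hop_probabilities(c, v, nac, active, dt=20.0)
zeta = rng.random()
hops = np.nonzero(np.cumsum(g) > zeta)[0]         # first window containing zeta
new_state = int(hops[0]) if hops.size else active # otherwise stay on the state
```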
Traditional FSSH shows overcoherence between electronic states because the population transferred to an upper or lower state follows the gradients of the current state. Various approaches have been developed to address overcoherence, including augmented FSSH (A-FSSH), decoherence-induced surface hopping (DISH), methods based on the decay of mixing (DM), and overlap decoherence correction (ODC), among others.52 All of these methods can be used to generate training data for ML. However, due to its simplicity and effectiveness, the most commonly used approach is based on the simplified decay of mixing, which dampens the coefficients of the inactive states at each time step. Overcoherence corrections are usually built into NAMD programs, and approximations such as those proposed by Persico and Granucci are applied.44,46,49,53
Fig. 3 Checklist for data pre-processing in the context of ML-accelerated NAMD simulations: Selection of electronic structure method and selection or generation of training data. |
Single-reference methods compute excited-state electronic structures using the ground state as a reference. Common approaches include time-dependent density functional theory (TDDFT)60 and its spin-flip variant (SF-TDDFT),61–63 coupled-cluster methods such as approximate second-order coupled-cluster (CC2),58,64 and nth-order algebraic diagrammatic construction methods like ADC(2), ADC(3), ADC(2)-x, and SOS-ADC(2)-x.65 Among these, (spin-flip) TDDFT and ADC(2) are the most widely used single-reference methods for TSH simulations.29,59 These methods are computationally efficient and accurate for dynamics within excited-state potentials, and they can be advantageous over state-average-based multi-reference methods when many states (more than 3 or 4) are considered. However, they must be used with caution, particularly in TSH simulations, as they often fail to accurately describe conical intersections between the ground and excited states.66,67 This failure arises frequently in photochemical reaction pathways, where reactions may involve doubly excited states and proceed through S1/S0 crossing regions. In such cases, the common practice for propagating TSH trajectories is to continue the simulation until a crossing region is encountered, typically defined by an energy gap of less than 0.1 eV between states.44 At this point, one of the following actions is usually taken: (i) stopping the simulation, or (ii) assuming an instantaneous transition to the ground state.68 The assumption of an instant transition should be used with caution, as it may lead to a significant overestimation of decay rates.68 To address these challenges, the spin-flip technique (e.g., SF-TDDFT)61,63 was introduced to better describe the potential energy surfaces around conical intersections. While this method improves the accuracy of simulations in these regions, it is susceptible to spin contamination, which requires careful monitoring throughout the simulation. Alternatively, methods such as mixed-reference spin-flip TDDFT (MRSF-TDDFT)69 and spin-restricted ensemble-referenced Kohn–Sham (REKS)70 can also handle conical intersections, but at the cost of the simplicity associated with single-reference methods. Among these, MRSF-TDDFT has shown particular promise by addressing the spin contamination issues of SF-TDDFT, and it has been successfully applied in non-adiabatic dynamics simulations using the fewest-switches surface hopping (FSSH) algorithm.71
In contrast, multi-reference methods are essential for accurately describing regions of degeneracy and conical intersections, as they explicitly account for strong electron correlation effects. Several popular methods build upon the complete active space self-consistent field (CASSCF) reference. For example, complete active space second-order perturbation theory (CASPT2)72 adds second-order perturbative corrections (i.e., dynamical correlation) to the CASSCF reference. Other multireference methods, such as n-electron valence state perturbation theory (NEVPT2)73 and multireference configuration interaction (MRCI),74–76 also extend the capabilities of CASSCF to better describe excited states. Additionally, variants of CASPT2, such as extended multistate (XMS),77 dynamically weighted (XDW),78 and rotated multistate (RMS),79 provide improved energy corrections, particularly near state-crossing regions. Despite their advantages, these methods cannot be used as black boxes: their application is limited by the need to select active spaces capable of describing all steps in the dynamics, and intruder states can enter the active space, affecting energy conservation. These issues make excited-state dynamics with multi-configurational (MC) calculations extremely challenging. Recent works show that the adaptive sampling configuration interaction (ASCI) method can expand the active space size beyond 50 electrons and 50 orbitals.80,81 Still, the costs of MC calculations are only affordable for propagating TSH trajectories of small molecules.82,83 Multiconfiguration pair-density functional theory (MCPDFT) combines CASSCF for static correlation with DFT for dynamical correlation.84,85 It shows accuracy comparable to CASPT2 at the cost of a CASSCF calculation, offering another route to TSH trajectory data.86 In some cases, neither single-reference nor MC methods can propagate the TSH trajectories correctly, and a combination of QM methods has to be employed.87,88 An alternative strategy to learn the energies, forces, and NACs is the one adopted by Booth et al.,89 which builds upon the concept of eigenvector continuation. This approach allows for the interpolation of many-body wavefunctions and the analytical evaluation of forces and NACs. The method was applied to the surface hopping dynamics of a set of linear hydrogen chains within an active learning protocol; simulations for the longest chain, H28, required only 22 DMRG training calculations. The method can also be applied using other correlated electronic structure techniques.
Even when comprehensive electronic structure data are available, such as the electronic properties of a set of key geometries (cf. Section 3.2.1) or preliminary data from a few short NAMD trajectories (cf. Section 3.2.2), an ML model trained on this data may not be capable of accurately predicting the electronic properties of unseen molecular conformations.24 This includes conformations that are only visited during extended, long-time-scale NAMD simulations. However, the ability of the model to generalize to such unseen conformations is essential for performing ML-accelerated NAMD simulations over longer time scales.
To overcome this limitation, it is often necessary to construct databases that not only incorporate typical data points sampled via static and dynamic methods grounded in chemical intuition, but also supplement these with additional data points (conformational structures) that effectively map the potential energy surface. Ideally, the inclusion of new data should focus on points that significantly differ from the existing ones, thereby enhancing the predictive capability of the models.
In this section, we provide a brief overview of traditional approaches for sampling data points in computational photochemistry, both through static and dynamic methods (Sections 3.2.1 and 3.2.2), introduce the key principle of active learning for diversifying datasets (Section 3.2.3) and conclude with hybrid approaches (Section 3.2.4), emphasizing established best practices.
The identification of stationary points on PESs is guided by chemical intuition, utilizing well-established optimization tools such as the global optimizer algorithm (GOAT)90 and the conformer–rotamer ensemble sampling tool (CREST)91,92 to locate global and local minimum geometries. To find transition states in both ground and excited states, the pysisyphus software package93 can be employed; it supports traditional methods, including intrinsic reaction coordinates and chain-of-states approaches such as the nudged elastic band (NEB) and string method (SM). Minima in the conical intersection seam between two electronic states can be identified using standard geometry optimization algorithms. For these cases, single-reference methods, such as spin-flip time-dependent density functional theory (SF-TDDFT), or multireference methods are recommended for better accuracy (see also Section 3.1 and ref. 94–98). Once stationary points are located, additional static data points can be sampled through geometry interpolation techniques that connect these stationary points. This approach has been successfully applied in the construction of excited-state databases like WS22 (ref. 99) and XAlkeneDB24 (see cross-marks in Fig. 4a) or in ref. 22.
Fig. 4 Schematic illustration of a 2D-PES of 2-butene (shown for the ground state, S0, with excited-state minima indicated), defined along the rotation around and stretching of the central alkene bond (data is taken from ref. 24). The figure highlights key steps in constructing training databases for excited-state ML potentials. Static sampling approaches (a) include Wigner sampling around local minima (circle symbols), optimization of conical intersections (S01, square symbol), and interpolation between minima (cross symbols). Surface hopping trajectories (b) expand the sampled configurational space (particularly for the E-isomer), while active learning (c) identifies additional critical geometries in previously unexplored regions. |
Another important step is the generation of initial conditions, which serve as the foundation for performing subsequent NAMD simulations. Beyond defining positions and momenta, these conditions should also capture quantum effects in semi-classical simulations. Most NAMD studies assume vertical excitation by ultrashort pulses without explicitly modelling laser-molecule interactions.100 A widely used approach is the nuclear ensemble approach (NEA), where a distribution of ground-state geometries serves as the basis for generating an excited-state ensemble.101,102 If a molecule has multiple thermally accessible local minima in the ground state, such as conformers, the ensemble is typically constructed by sampling geometries from each conformer according to their Boltzmann-weighted distribution.87 The distribution of ground-state geometries itself is generally obtained from a local minimum or a slightly displaced structure using Wigner sampling or molecular dynamics (MD) simulations. While Wigner sampling accounts for vibrational wavefunctions in position and momentum space, it struggles with anharmonic low-frequency modes (<500 cm⁻¹). Moreover, Wigner sampling is limited to the configurations immediately related to equilibrium geometries. In contrast, classical MD sampling generates a Boltzmann-like ensemble at room temperature but often underestimates the total energy relative to the zero-point vibrational energy (ZPVE).103 This limitation can be mitigated by propagating dynamics at higher temperatures or applying a quantum thermostat to broaden the classical distribution.100,104,105 Alternatively, nuclear quantum effects can be incorporated through path integral methods.106–108
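For the harmonic case, Wigner sampling of initial conditions can be sketched in a few lines: each normal mode is sampled from the Gaussian position and momentum distributions of the v = 0 Wigner function. Atomic units with ħ = 1, as well as the function and variable names, are assumptions of this minimal example.

```python
import numpy as np

def wigner_sample_normal_modes(freqs, modes, x_eq, masses, n_samples, rng):
    """Sample positions/momenta from the ground-state harmonic Wigner function.

    freqs  : normal-mode frequencies omega_k in atomic units, shape (n_modes,)
    modes  : mass-weighted normal-mode vectors, shape (n_modes, n_atoms*3)
    x_eq   : equilibrium Cartesian geometry, shape (n_atoms*3,)
    masses : atomic masses repeated per Cartesian component, shape (n_atoms*3,)
    """
    geoms, momenta = [], []
    for _ in range(n_samples):
        # v = 0 Wigner function: Gaussians with sigma_Q^2 = 1/(2 omega) and
        # sigma_P^2 = omega/2 in mass-weighted coordinates (hbar = 1)
        q = rng.normal(0.0, np.sqrt(1.0 / (2.0 * freqs)))
        p = rng.normal(0.0, np.sqrt(freqs / 2.0))
        dx = (modes.T @ q) / np.sqrt(masses)   # back-transform to Cartesians
        px = (modes.T @ p) * np.sqrt(masses)
        geoms.append(x_eq + dx)
        momenta.append(px)
    return np.array(geoms), np.array(momenta)
```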
Since trajectory-based NAMD simulations are stochastic, a large number of trajectories is necessary to accurately capture the dynamics and their inherent variability. The required number of trajectories can vary depending on factors such as system complexity, excited-state reactivity, and the specific experimental data being compared. In previous studies, typically between 50 and 100 trajectories (performed at the electronic structure level of theory) were used to generate training data.17,27,109–111 This number of trajectories was found to provide adequate conformational information for learning the ground- and excited-state PESs.17,27,109–111 However, propagating such a set of ab initio NAMD trajectories can be prohibitively expensive for training. Thus, it is usually more efficient to run additional trajectories within the ML model itself, rather than generating a large volume of trajectory data directly with electronic structure theory.17,24,27
A further challenge is the underrepresentation of regions with small inter-state energy gaps in trajectory data.17,110 State crossing events typically occur at these small energy gaps, which can lead to insufficient sampling of critical regions. We therefore recommend including a limited number of trajectories or trajectory snapshots in the training data, coupled with additional sampling of geometries near small energy gaps; together with a careful choice of the electronic structure method, this is key to generating accurate and reliable datasets. This strategy ensures that critical regions, such as conical intersections (see Section 3.2.1), are adequately captured. Recently, an active learning strategy driven by inter-state energy gaps has further emphasized the importance of targeted sampling in these degeneracy regions112 (see the subsequent Section 3.2.3).
Notably, a recent study compared the test errors of random and trajectory-based data splitting (see Fig. 5a vs. 5b).27 While a random split led to seemingly better test performance, it underestimated the true error. This and other studies have highlighted the benefits of including trajectory snapshots rather than full trajectories.22,87,109–112
Fig. 5 Strategies for partitioning data from NAMD simulations into training, validation, and test sets, illustrated with 5 example trajectories of varying lengths. Random split (a): trajectory frames are pooled and randomly allocated to one of the three subsets. Split by trajectory (b): all frames from a single trajectory are assigned to the same subset. The figure is adapted from ref. 27.
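A trajectory-wise split as in Fig. 5b takes only a few lines to implement. The sketch below, with illustrative names and fractions, assigns whole trajectories to the train/validation/test subsets so that no frames from one trajectory leak across subsets.

```python
import numpy as np

def split_by_trajectory(frames, traj_ids, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole trajectories (not individual frames) to train/val/test.

    frames   : list of per-frame records (geometries, energies, ...)
    traj_ids : trajectory index of each frame, shape (n_frames,)
    """
    rng = np.random.default_rng(seed)
    unique = rng.permutation(np.unique(traj_ids))   # shuffle trajectory labels
    n_train = int(fractions[0] * len(unique))
    n_val = int(fractions[1] * len(unique))
    groups = {
        "train": set(unique[:n_train]),
        "val": set(unique[n_train:n_train + n_val]),
        "test": set(unique[n_train + n_val:]),
    }
    # every frame follows its parent trajectory into exactly one subset
    return {name: [f for f, t in zip(frames, traj_ids) if t in ids]
            for name, ids in groups.items()}
```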
To overcome the limitations of manually curated training sets, active learning (AL) is frequently employed to iteratively refine and optimize the data – selecting only the most informative points to improve model generality and efficiency.17,114,115 In essence, AL involves identifying and sampling regions of configurational space where the model is uncertain, computing their ab initio properties, and retraining the model accordingly.116
Determining which configurations require recalculation depends on assessing predictive uncertainty. While uncertainty can, in principle, be estimated from structural descriptors alone, for instance by determining whether a geometry's Coulomb matrix falls outside the 95% confidence interval of those in the training set,117 most approaches in the excited-state community rely on property-based uncertainty quantification. While some models, like Gaussian processes, offer built-in uncertainty estimates, ensemble-based approaches, such as multiple independently trained neural networks, are commonly used for models that lack this feature.7,21 Among these, the widely adopted strategy of query-by-committee leverages at least two separately trained neural networks to evaluate disagreement in predicted molecular properties (e.g., energies, forces, dipole moments, or NACs) during NAMD simulations.17,112,114 High discrepancies between models signal that a geometry likely resides in an undersampled or poorly learned region of the potential energy surface.
To quantify such disagreement, uncertainty quantification (UQ) metrics are important to assess the reliability and interpretability of models.118 Early work by Behler et al.119 established key principles for assessing predictive uncertainty in atomistic neural networks (NNs). Building on this, Musil et al.120 demonstrated the use of ensembles combined with resampling techniques to improve uncertainty estimates in machine-learned potentials. Comprehensive reviews on UQ for NNs can be found in ref. 121 and 122, covering a wide range of methodologies from Bayesian inference to ensemble methods. For Gaussian process regression (GPR), which naturally provides predictive variance, Deringer et al.123 offer an extensive review. More recent advances include novel UQ frameworks for NNs by Kellner et al.,124 which integrate uncertainty estimates with improved computational efficiency. Additionally, Ceriotti and co-workers have introduced computationally inexpensive UQ methods125,126 that are especially suited for high-throughput molecular simulations.
In the excited-state context, UQ measures like absolute errors112 or root mean squared errors (RMSEs)17,114 are applied. Once the model has undergone an initial training phase, it is used to sample new geometries. If the UQ value at a given configuration exceeds a predefined threshold, the simulation is paused, the configuration is recalculated using the reference electronic structure method, and the resulting data point is added to the training set. Retraining follows, and the process is repeated until the model performs consistently across all relevant regions of the PES. A schematic of this iterative AL workflow is provided in Fig. 6.
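Schematically, one query-by-committee decision inside such a loop can be written as follows. The model interface (predict), the reference_calc callable, and the threshold handling are illustrative assumptions of this sketch, not the API of any package discussed here.

```python
import numpy as np

def committee_uncertainty(models, geometry):
    """Disagreement of an ensemble of ML potentials on one geometry.

    models   : list of independently trained predictors, each assumed to
               return per-state energies for a geometry via .predict()
    geometry : molecular structure in the models' input format
    """
    preds = np.stack([m.predict(geometry) for m in models])  # (n_models, n_states)
    return np.max(np.std(preds, axis=0))      # worst-case spread over states

def active_learning_step(models, geometry, threshold, reference_calc, dataset):
    """One query-by-committee decision inside an ML-driven trajectory."""
    if committee_uncertainty(models, geometry) > threshold:
        # undersampled region: recompute with the reference method and
        # extend the training set; retraining happens after the batch
        dataset.append((geometry, reference_calc(geometry)))
        return True    # signal that the trajectory should be paused
    return False
```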
While this iterative process has been successful in improving ML potentials for excited-state simulations,17,87,114 the effectiveness of AL depends on many interconnected parameters. Researchers must make critical choices regarding the initial data, the ML model (including descriptors and regression algorithms; see Section 4), simulation conditions, UQ method, and thresholds – all of which can significantly influence the final model quality.
The thresholds used to trigger reference calculations in AL loops are highly system- and model-dependent and often refined through empirical testing, although Hu et al.117 provide a method for an automatic, statistically justified choice of thresholds. Energy-based thresholds typically range from 0.03 to 0.04 hartree, aligning with typical gap values where non-adiabatic transitions occur.127 Different studies have adapted this general strategy to the specifics of their systems. For example, in the PyRAI2MD (see Section 4.3) study on [3]-ladderdienes undergoing [2 + 2]-photocycloadditions, uncertainty was quantified via the standard deviation of model predictions, with thresholds set to 0.043 hartree and 0.25 hartree/bohr for energies and forces, respectively. In the SchNarc workflow (also Section 4.3) for the small cation CH2NH2+, a more diverse set of thresholds was applied: 0.03 hartree for energies (reduced iteratively by a factor of 0.95), 0.5 debye for dipole moments (kept constant), and 0.25 for NACs, which was temporarily raised to 3.0 to enhance sampling near conical intersections.17
A more recent study targeting azobenzene and pyrene derivatives further refined this approach by integrating query-by-committee with gap-driven molecular dynamics (gapMD), a strategy designed to focus sampling around conical intersections.112 Trajectories were steered toward small-gap regions (≤0.03 hartree), after which dynamics proceeded on either the upper or lower surface to ensure meaningful sampling and avoid dissociative pathways. The associated AL loop employed a multi-state neural network (MS-ANI, see Section 4.3) trained on both energies and gradients. Retraining was triggered by two conditions: negative predicted energy gaps, which indicate incorrect state ordering, and elevated uncertainty values, defined as the absolute difference between predictions from a main model (trained on energies and gradients) and an auxiliary model (trained on energies alone).
Recent work has demonstrated that a combined sampling strategy yields the most reliable and compact training data.22,24,99,112 Specifically, integrating Wigner sampling (to cover thermal fluctuations of reactants and products), geometrical interpolation (between optimized structures of reactants, minimum energy crossing points, and products), and short-time quantum chemical trajectories (to access relevant non-equilibrium conformations) represents current best practice. This hybrid approach efficiently captures the necessary configurational diversity for accurate ML-accelerated NAMD simulations.
We should note that the energy data are a series of scalar values. In contrast, the gradient data are vectorial data stored in 2D arrays of N nuclei and three Cartesian coordinates or flattened 1D arrays of 3N values. While the energy is invariant under permutation, translation and rotation, the gradient data are rotationally covariant (i.e., depending on the molecular orientation). While initial efforts addressed this problem by learning gradients as the first derivative of the energy with respect to nuclear coordinates, equivariant models have since become the state-of-the-art.128–130 These models enable the direct prediction of vectorial properties such as forces. However, it remains important to ensure that such direct predictions are consistent with the corresponding energy derivatives.131
Fitting couplings can be done by choosing an initial structure as a reference and correcting the phase for all other structures.17 This procedure requires computing overlap matrices between the initial structure and any other structure in the training set. A substantial overlap magnitude (>0.5 for each electronic state) allows a phase change to be identified from the sign of the overlap: negative overlap values indicate that a phase change has occurred, in which case the property of the new structure must be multiplied by −1 for the given state. Because couplings relate two electronic states, the same procedure must be carried out for the other contributing state (i.e., any inter-state property is multiplied by −1 or +1 twice). When the differences between two molecular structures are significant, the overlap may be too small to determine the phase factor; in this case, geometry interpolation between the two structures must be performed to determine the correct phase.17
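A minimal sketch of this reference-based phase correction is given below; the array layout and the helper name are illustrative, and the 0.5 overlap criterion follows the text above.

```python
import numpy as np

def phase_correct_couplings(couplings, state_overlaps, min_overlap=0.5):
    """Align the wavefunction phases of inter-state couplings to a reference.

    couplings      : coupling vectors C_ij for one geometry,
                     shape (n_states, n_states, n_atoms, 3)
    state_overlaps : overlaps <psi_i(ref) | psi_i(new)> per state, shape (n_states,)
    """
    if np.any(np.abs(state_overlaps) < min_overlap):
        # overlap too small to assign the phase: interpolate intermediate
        # geometries between the reference and the new structure instead
        raise ValueError("insufficient overlap for phase assignment")
    phases = np.sign(state_overlaps)          # +1 or -1 per state
    # an inter-state property picks up the phase of both involved states,
    # i.e. it is multiplied by -1 or +1 twice
    phase_matrix = np.outer(phases, phases)
    return couplings * phase_matrix[:, :, None, None]
```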
The learning of NACs poses significant challenges, primarily due to two intrinsic properties: (i) NACs exhibit a strong dependence on the inverse of the energy gap between coupled electronic states, which introduces singularities near conical intersections or avoided crossings and leads to highly non-smooth behavior; (ii) the arbitrary phase of electronic wavefunctions results in discontinuous NACs as a function of nuclear geometry. In the following, we outline best practices to mitigate these challenges, focusing on techniques for smoothing NACs and correcting for random phase behavior.
• Smoothed NACs. NACs between two electronic states i and j of the same spin multiplicity are formally given by
d_ij(R) = ⟨ψ_i(R)|∇_R ψ_j(R)⟩ = ⟨ψ_i(R)|∇_R Ĥ_el|ψ_j(R)⟩/(E_j(R) − E_i(R)) (1)
Fig. 8 Conical intersection between two states and corresponding non-adiabatic couplings (reddish, dashed lines) including smooth couplings multiplied by the energy gap (blueish, dashed lines). Reproduced from ref. 134 under CC-BY 4.0.
To address this, it is advantageous to train ML models on the smooth, gap-independent numerator of eqn (1), rather than on the full NACs. This approach yields continuous learning targets across the PES17 (see blue dashed line in Fig. 8). Full NACs can then be reconstructed on the fly by dividing the learned quantities by the ML-predicted energy gaps, resulting in smoothed couplings that are more amenable to ML.135
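As a concrete illustration, the following numpy sketch constructs the smooth training targets by multiplying raw NACs with the corresponding energy gaps, and reconstructs full NACs from ML predictions; the array shapes, names, and the small-gap guard are assumptions of this minimal example.

```python
import numpy as np

def smooth_nac_targets(nacs, energies):
    """Multiply raw NACs by the energy gap to obtain the smooth numerator
    of eqn (1) as a training target.

    nacs     : NAC vectors, shape (n_states, n_states, n_atoms, 3)
    energies : adiabatic energies, shape (n_states,)
    """
    gaps = energies[None, :] - energies[:, None]   # gap matrix E_j - E_i
    return nacs * gaps[:, :, None, None]

def reconstruct_nacs(smooth_pred, energies_pred, eps=1e-8):
    """Recover full NACs on the fly by dividing the ML-predicted smooth
    quantity by the ML-predicted gap (guarded against near-degeneracy)."""
    gaps = energies_pred[None, :] - energies_pred[:, None]
    gaps = np.where(np.abs(gaps) < eps, eps, gaps)  # avoid division by ~0
    return smooth_pred / gaps[:, :, None, None]
```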
• Phase-corrected NACs. The wavefunction phase is randomly initialized at different nuclear positions. However, preprocessing the NAC data for ML requires that the NACs across all trajectories share the same phase. Phase correction of the couplings is typically performed by computing wavefunction overlaps between structures at consecutive time steps.17,19 These complexities necessitate additional steps for rigorous data preprocessing, enabling the models to handle double-valued properties effectively.135,136 A recently developed methodology by Richardson136 can also deal with double-valued properties by means of a phase correction performed within the loss function of a model. This is described in more detail in Section 4.5.
The so-called Baeck–An approximation137,138 provides a starting point from which various formulas have been derived for the evaluation of NACs.139,140 For instance, Westermayr et al. used the Hessian of the squared energy-gap potential and the gradient difference vectors, which define the branching space, to approximate NACs using ML.87,135,141 Baeck–An couplings, however, are only accurate in regions of configuration space where two PESs are nearly degenerate. Thus, it is suggested to set the couplings to 0 when two PESs are separated by ΔE > 0.5 eV.19,51 Alternatively, for a two-state problem, the Landau–Zener scheme142,143 can be used to compute hopping probabilities. It requires the evaluation of the energy gaps and their time derivatives, obtained by finite differences along the trajectories; the scheme originally applies to internal conversion but was extended to account for intersystem crossing. The results from NAMD with Landau–Zener agree well for small to medium organic systems.144 A similar approach relying only on the PES shape is the Zhu–Nakamura theory of surface hopping,145 which uses the energies and gradients of two crossing states for the hopping probability calculations. The forces are diabatized in a generalized 1D model based on three-point interpolation.146 This method agrees well with fewest-switches surface hopping and applies to intersystem crossing.146–149
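For illustration, the sketch below evaluates the Landau–Zener hopping probability in its adiabatic (Belyaev–Lebedev) form from three consecutive energy gaps along a trajectory; atomic units with ħ = 1 and the finite-difference second derivative are assumptions of this minimal example.

```python
import numpy as np

def landau_zener_probability(gap_prev, gap_now, gap_next, dt):
    """Adiabatic Landau-Zener hop probability evaluated at a gap minimum.

    gap_* : adiabatic energy gaps at three consecutive time steps (a.u.)
    dt    : time step (a.u.); hbar = 1
    """
    if not (gap_now < gap_prev and gap_now < gap_next):
        return 0.0                        # hops are attempted only at a
                                          # local minimum of the gap
    # second time derivative of the gap by central finite differences
    d2gap = (gap_prev - 2.0 * gap_now + gap_next) / dt**2
    if d2gap <= 0.0:
        return 0.0
    # P = exp(-(pi/2) sqrt(gap_min^3 / d2gap)) in atomic units
    return float(np.exp(-0.5 * np.pi * np.sqrt(gap_now**3 / d2gap)))
```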
SOCs are known to increase substantially in molecules with heavy atoms (e.g., S, Br). They introduce important off-diagonal elements to the electronic Hamiltonian, significantly altering the topology of PESs near intersystem crossing points. Therefore, the electronic Hamiltonian including SOCs must be diagonalized to obtain PESs in a spin-adiabatic representation. It should be noted that the SOCs lift the degeneracy of the triplet components (m_s = 0, ±1) in the spin-adiabatic representation, which expands the total number of states to include all spin components of each multiplet state.
Learning SOCs in the spin-diabatic representation only requires computing the norm of SOCs without additional preprocessing. The spin-diabatic SOCs can be fitted as scalar values, similar to electronic energies (see Section 3.3.1). However, learning SOCs in the spin-adiabatic representation requires explicit calculations of the interstate couplings between the spin-adiabatic states. Such interstate couplings have the same nature as the NACs, where phase corrections are needed, and can be trained similarly to NACs (see Section 4.5).
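A minimal numpy sketch of the diagonalization step is shown below: the spin-diabatic energies (with all multiplet components expanded) and complex SOC matrix elements are assembled into a Hermitian Hamiltonian whose eigenvalues are the spin-adiabatic PESs. The data layout is an illustrative assumption.

```python
import numpy as np

def spin_adiabatic_states(energies, soc):
    """Diagonalize the electronic Hamiltonian including SOCs.

    energies : spin-diabatic state energies with all multiplet components
               expanded (e.g., three entries per triplet), shape (n,)
    soc      : complex SOC matrix elements, shape (n, n), zero diagonal
    """
    h = np.diag(energies).astype(complex) + soc
    # Hermitian diagonalization yields spin-adiabatic PESs and the unitary
    # transformation that mixes singlet and triplet components
    e_spin_adiabatic, u = np.linalg.eigh(h)
    return e_spin_adiabatic, u
```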
Conical intersections are particularly challenging for surface fitting because of the discontinuous potentials near the seam, and the NACs are singular at conical intersections. For potential fitting, it is thus advantageous to switch to a different basis that enables a smooth evolution along geometrical coordinates, using a geometry-dependent transformation matrix T(R):
ψ_m^dia(r;R) = Σ_j T_jm(R) ψ_j^adia(r;R) (2)
The diabatic potential matrix U(R) then follows from the diagonal matrix V(R) of adiabatic energies as

U(R) = Tᵀ(R)V(R)T(R) (3)
The diabatic basis is convenient for ML because of its smoothness, and the recommendation is to switch to a diabatic basis whenever possible for a given system. Some quantum chemistry codes, such as Molpro,172 can provide quasi-diabatic energies. The Quantics package can also deliver diabatic energies and couplings using propagation diabatization.173 Even for surface hopping, which has been shown to work better in the adiabatic basis,40,44 ML training can be performed more efficiently in the diabatic basis, with subsequent diagonalization to obtain the adiabatic quantities. Interestingly, a new approach for smooth fitting of coupled surfaces that avoids diabatization has been proposed recently:174–177 instead of the energies in either the adiabatic or diabatic basis, coordinate-dependent coefficients of the characteristic polynomial are fitted. Although the diabatic basis is convenient, obtaining it is usually difficult, so diabatization is often reserved for cases where quantum dynamics is needed. The fitting of adiabatic energies and properties for trajectory surface hopping has proved feasible, barring non-smooth cusps. The resulting dynamics are reliable because hopping events occur near conical intersections but rarely exactly on the seam; this simplifies the problem, since the seam itself need not be reproduced as accurately as the rest of the potentials. Moreover, thanks to the versatility of NNs, diabatic states can be fitted implicitly while minimizing a loss function based on the diagonalized adiabatic states.169 Very recently, a Deep Sets178 autoencoder has been incorporated into the MACE179 architecture in order to learn excited states in a permutationally invariant manner within X-MACE.26
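To illustrate how diabatic smoothness can be exploited during training, the following minimal PyTorch sketch treats the network outputs as diabatic matrix elements and compares their eigenvalues with adiabatic reference energies, in the spirit of eqn (3). The model interface and names are illustrative assumptions rather than the implementation of any package discussed here.

```python
import torch

def adiabatic_loss_from_diabatic(model, geometry, e_ref):
    """Train smooth diabatic outputs against adiabatic reference energies.

    model : NN assumed to return an (n_states, n_states) matrix of diabatic
            potential elements U(R) for one geometry
    e_ref : reference adiabatic energies, shape (n_states,)
    """
    u = model(geometry)                    # diabatic matrix U(R)
    u = 0.5 * (u + u.transpose(-1, -2))    # enforce symmetry
    # the eigenvalues of the smooth diabatic matrix are the adiabatic PESs;
    # torch.linalg.eigvalsh is differentiable, so the gradient of the loss
    # flows back into the implicit diabatic representation
    e_adiabatic = torch.linalg.eigvalsh(u)
    return torch.mean((e_adiabatic - e_ref) ** 2)
```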
Excited-state ML potentials establish a mapping f from nuclear charges Z and coordinates R to state-wise energies E_i, forces F_i, and inter-state couplings C_ij:

f: Z, R → E_i, F_i, C_ij (4)
Their successful development hinges on three pillars: access to high-quality reference data (Section 3), an appropriate molecular structure representation (Section 4.1), and a suitable regression model (Section 4.2). Together, these elements determine the model's accuracy, generalizability, and physical reliability. A checklist for surface fitting is provided in Fig. 10.
Fig. 10 Checklist for surface fitting in the context of ML-accelerated NAMD simulations: selection, setup and evaluation of ML potentials for excited states. |
To be effective, excited-state ML potentials must satisfy several essential criteria. They must preserve key physical symmetries: energy predictions should be invariant to translation and permutation of identical atoms, and vectorial properties (e.g., forces, NACs, and transition dipoles) must transform equivariantly under rotation (Section 4.1). Differentiability is required for computing forces as energy gradients, constraining model architectures (see Section 4.5). Inference should be fast and accurate, with sufficient model flexibility to capture excited-state complexity (see Sections 4.5 and 3.3.3). Additionally, models should scale to larger systems – often achieved through atom-wise descriptors (see also Section 4.1) – and generalize to unseen chemical environments (see Section 4.7). While these criteria are the same as for fitting ground-state properties, excited states introduce new problems that need to be considered and are discussed in the following sections. We begin by introducing the two major design components of ML potentials: molecular structure representations (Section 4.1) and regression models (Section 4.2). Next, we discuss different approaches for treating the manifold of excited states, comparing single- versus multi-state approaches (Section 4.4). This is followed by a discussion of training and validation strategies (Section 4.5). We subsequently review existing excited-state ML potentials in Section 4.3, distinguishing between models with fixed versus learned representations and models using local versus global descriptors. Finally, we comment on the transferability of excited-state ML potentials (Section 4.7).
While reference data (Section 3) forms the foundation for training ML models, the choice of molecular descriptor is equally crucial for performance. In the context of ML potentials, descriptors are typically categorized as global or local representations, which are discussed in Section 4.1. Equally important is the regression model, which maps descriptors to target properties. In Section 4.2, we introduce the main regression methods: kernel methods and neural networks.
To this end, molecular structure representations used in ML models must satisfy several key requirements. These include: (i) translational, rotational, and permutational invariance, ensuring that scalar properties such as energy remain unchanged under coordinate transformations or atom indexing; (ii) rotational equivariance for directional properties like forces, dipole moments, and non-adiabatic couplings (NACs), which must transform consistently with molecular orientation; (iii) smoothness and differentiability with respect to molecular geometry; (iv) a one-to-one (biunique) mapping to molecular structure; (v) transferability across chemical compound space; and (vi) computational efficiency.183 While symmetries can in principle be learned from data, explicitly encoding them into the descriptors significantly improves learning efficiency and model generalization. For instance, although atom sorting can be used to enforce permutational invariance, it often introduces discontinuities,184 whereas inherently invariant descriptors avoid this issue.185,186
Molecular representations can be broadly categorized as global, where the entire structure is encoded into a single descriptor, or local, where individual atoms are described based on their local environments. While early ML approaches often used global representations, recent developments, especially for ground-state ML potentials, have focused on local descriptors due to their scalability and ability to model systems of arbitrary size and composition, enhancing transferability and flexibility.
In the following, we briefly outline representative examples of both global and local descriptors employed in excited-state ML potentials. A summary of the key models and descriptors used in the excited-state community is provided in Fig. 11, while Section 4.3 offers an overview of representative software packages, focusing on their application domains, such as multi-state learning or the prediction of NACs, and highlighting how they have been used and evaluated in practice.
Fig. 11 Overview of excited-state machine learning potentials, categorized into kernel methods and neural networks, based on both global and local descriptors (fixed and learned). Models handled by MLatom are highlighted in italic font25 and are interfaced with Newton-X,46,47 such as MS-ANI.112 Models interfacing with the SHARC software suite include SchNarc,135 FieldSchNet,27 SPaiNN,24 X-MACE,26 and KRR-FCHL.190 Implementations interfacing with other MD drivers include PyRAI2MD22 and DANN.169 The scheme is adapted from ref. 191.
• Global descriptors encode the entire molecular geometry into a single vector or matrix, enabling ML models to predict total energies directly as a function of all atomic coordinates E = f(Z,R), without partitioning into atomic contributions (local descriptors).
The most common and straightforward representations include the Coulomb matrix (CM),114,134 which captures nuclear charges and interatomic distances; bag of bonds (BoB),187 which is based on sorting the CM elements; the inverse distance descriptor, which simplifies the CM by using only pairwise inverse distances; and the relative-to-equilibrium (RE) descriptor, which normalizes these distances by their equilibrium values.185 The latter descriptors are central to models like (s)GDML188 and KREG,185,189 which are commonly used ground-state ML potentials.
Global descriptors are typically complete and compatible with standard regression techniques, allowing them to capture all interactions regardless of atomic separation. However, they lack built-in permutational invariance and struggle with transferability to systems of varying size or composition, limiting their scalability. These limitations motivate the shift toward local descriptor approaches in larger or more diverse chemical systems.
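As a concrete example of a global descriptor, the following few lines construct the Coulomb matrix of a single molecule. This is a minimal numpy sketch using the standard definition (0.5 Z_i^2.4 on the diagonal), with names chosen for illustration.

```python
import numpy as np

def coulomb_matrix(charges, coords):
    """Global Coulomb-matrix descriptor of one molecule.

    charges : nuclear charges Z_i, shape (n_atoms,)
    coords  : Cartesian coordinates in bohr, shape (n_atoms, 3)
    """
    z = np.asarray(charges, dtype=float)
    xyz = np.asarray(coords, dtype=float)
    r = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        cm = z[:, None] * z[None, :] / r       # off-diagonal: Z_i Z_j / |R_i - R_j|
    np.fill_diagonal(cm, 0.5 * z ** 2.4)       # diagonal: 0.5 Z_i^2.4
    return cm

# toy usage: a water-like arrangement with charges O = 8, H = 1
cm = coulomb_matrix([8, 1, 1],
                    [[0.0, 0.0, 0.0],
                     [0.0, 0.0, 1.8],
                     [1.7, 0.0, -0.6]])
```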
• Local descriptors represent the chemical environment of each individual atom within a molecule and encode atom-wise information in vectors or matrices. These descriptors enable ML models to predict molecular properties by aggregating atomic contributions. For example, the total energy can be decomposed as a sum of atomic energies E = ∑_i E_i(Z_i, R_i).
The representation of the molecular structure can be handled as fixed or learned during training with NNs (see example architectures in Fig. 11). Popular state-of-the-art local and pre-defined (fixed) representations include the smooth overlap of atomic positions (SOAP),192 atom-centered symmetry functions (ACSF)193 and the Faber–Christensen–Huang–Lilienfeld (FCHL) representations.194 Examples of learned representations include SchNet,181 PaiNN130 and MACE.179
For fixed representations, additional hyperparameters must be optimized through a thorough hyperparameter search, which requires only modest computational resources. Learned representations involve fewer hyperparameters, but training is more expensive because the representation itself is refined during training. Thus, the choice between fixed and learned representations depends on the task's requirements, the available resources, and the requisite model complexity.8
NNs currently represent the state-of-the-art for NAMD, as relatively large training sets are often required. In addition, NNs allow for the simultaneous treatment of all involved electronic states, which is usually both more efficient and more accurate. Techniques such as diabatization (see Section 3.3.3) can increase the accuracy of kernel methods as well. While rarely applied in quantum chemistry, NNs can also be combined with KMs to get the best of both worlds.196,197
• Kernel methods (KMs). KMs use the kernel trick, allowing linear regression algorithms to fit non-linear functions by implicitly transforming the input data into a higher-dimensional space.198 KRR is straightforward and popular in quantum chemistry.199,200 KMs represent a category of elementary ML methods that are relatively simple to implement, with readily available frameworks and libraries. For instance, the scikit-learn Python library can be used to train a model with an arbitrary kernel in just a few lines of code.201
Within the KRR method, the properties of interest are modelled as a linear combination of kernel functions k (i.e., similarity functions) between the geometry x of interest and the training points x_i, written as
f(x) = Σ_i α_i k(x, x_i) (5)

The regression coefficients α are obtained in closed form from the kernel matrix K, with K_ij = k(x_i, x_j), the regularization strength λ, and the vector of training labels y:

α = (K + λI)⁻¹y (6)
Note that the performance of KMs depends on the molecular representation, that is, how we input the molecular structure into the ML model (see Section 4.1). Therefore, it might be advantageous to use one of the packages developed especially for chemical applications with several representations readily available, such as MLatom,199,203,204 GAP,205,206 or the QML code.207 A special kernel-based method called symmetric gradient-domain machine learning (sGDML)208 was designed especially for molecular dynamics.208–210 Kernel methods together with molecular representations are still being developed, and accurate applications to ground-state MD have been demonstrated.211
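Concretely, eqns (5) and (6) translate into a few lines of code. The snippet below implements closed-form KRR with a Gaussian kernel on precomputed descriptors; it is a bare-bones numpy sketch with an illustrative kernel width and regularization, not the implementation found in MLatom or scikit-learn.

```python
import numpy as np

def train_krr(x_train, y_train, sigma=1.0, lam=1e-8):
    """Closed-form kernel ridge regression with a Gaussian kernel."""
    d2 = np.sum((x_train[:, None, :] - x_train[None, :, :]) ** 2, axis=-1)
    k = np.exp(-d2 / (2.0 * sigma**2))                           # kernel matrix K
    alpha = np.linalg.solve(k + lam * np.eye(len(k)), y_train)   # eqn (6)
    return alpha

def predict_krr(x_train, alpha, x_new, sigma=1.0):
    """Evaluate eqn (5) for one new descriptor vector x_new."""
    d2 = np.sum((x_new[None, :] - x_train) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2)) @ alpha

# toy usage: learn a smooth 1D function from five training points
x = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
y = np.sin(2.0 * np.pi * x[:, 0])
alpha = train_krr(x, y, sigma=0.3)
y_new = predict_krr(x, alpha, np.array([0.5]), sigma=0.3)
```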
• Neural Networks (NNs). The fundamental unit of an artificial NN is a single neuron, which applies weights and a bias to its inputs and passes the result through a (typically non-linear) so-called activation function. The neural units are connected in layers, linked through trainable parameters. A NN typically consists of an input layer, one or more hidden layers, and an output layer. The interconnected node architecture is thus analogous to the brain: each connection has adjustable parameters that allow the network to learn from prior information.116,212
Various NNs are widely used to learn ground-state electronic properties of molecular systems, particularly energies and gradients. Many advanced methods leverage the local nature of chemical interactions by representing structures on a per-atom basis. These atom-wise descriptors predict extensive properties, like total energy, as a sum of atomic contributions. This approach offers key benefits: linear scaling with system size, preservation of size-extensivity, and permutational invariance of energy. Examples include Behler–Parrinello NNs (BPNNs),213 ANI,214 and ACE215 for fixed, rotationally invariant descriptors, and PaiNN,130 Nequip,216 MACE,179 and So3krates217 for learned, rotationally equivariant descriptors (cf. right side in Fig. 11).
In those examples, users do not necessarily have to develop the ML codebase on their own to train and predict properties. SchNetPack218 is a prominent example, which includes several models like SchNet,181 PaiNN,130 SchNOrb,219 or FieldSchNet,220 the latter allowing the treatment of electric fields and thus environmental effects. It offers various modules to predict ground state properties (e.g., energy, forces, dipole moments). The default settings of these models suffice for most tasks, but additional parameter adjustments enable more complex tasks.20,24,135
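The atom-wise decomposition used by these models can be sketched compactly. The minimal PyTorch example below sums per-element network outputs into a total energy, in the spirit of Behler–Parrinello NNs; the layer sizes and the descriptor input are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AtomwisePotential(nn.Module):
    """Behler-Parrinello-style potential: one small NN per element,
    total energy as the sum of atomic contributions."""

    def __init__(self, elements, n_descriptors, n_hidden=64):
        super().__init__()
        self.nets = nn.ModuleDict({
            str(z): nn.Sequential(
                nn.Linear(n_descriptors, n_hidden), nn.Tanh(),
                nn.Linear(n_hidden, n_hidden), nn.Tanh(),
                nn.Linear(n_hidden, 1))
            for z in elements
        })

    def forward(self, numbers, descriptors):
        # numbers: (n_atoms,) atomic numbers; descriptors: (n_atoms, n_desc)
        e_atoms = torch.stack([
            self.nets[str(int(z))](d).squeeze(-1)
            for z, d in zip(numbers, descriptors)
        ])
        return e_atoms.sum()   # size-extensive, permutationally invariant

# toy usage: a methane-like input with random atom-wise descriptors
model = AtomwisePotential(elements=[1, 6], n_descriptors=16)
numbers = torch.tensor([6, 1, 1, 1, 1])
descriptors = torch.randn(5, 16)
energy = model(numbers, descriptors)   # scalar total energy
```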
• Kernel methods.25,134,221 In the area of kernel methods, there are no chemistry-tailored codes that were primarily designed for excited-state dynamics; that is, none natively support the fitting of coupling properties or of multiple PESs at once. However, they can easily be utilized for training on individual states, especially when using curvature-driven NAMD such as Landau–Zener NAMD.
In one of the first approaches, the inverse distance and FCHL molecular structure representations were employed in NN and KRR models.134 All models were shown to learn the relation between the molecular structure and either energies and forces or NACs, when the properties are treated separately from each other. However, it was found that training a single ML model to predict energies, forces, and NACs simultaneously reduces accuracy compared to learning each property separately: including all properties in a shared loss function can even hinder the learning of individual properties due to conflicting learning signals.
Encoding of the energy levels (state numbers) in the representation was shown to improve results and make multiple outputs for KRR possible, while multi-outputs are achieved more straightforwardly for NNs. In both cases the modification of the representation to yield multi-state outputs was accompanied by an enlargement of the ML model, a larger kernel matrix in the case of KRR, and a larger input layer in the case of NNs.134
The MLatom package25,203,204 provides a suite of kernel- and NN-based models, which have recently been extended to support the prediction of excited-state properties.112,221,222 These extensions involve constructing separate machine learning models for each electronic state of interest,112,221 enabling the prediction of state-specific quantities such as energies, gradients, and oscillator strengths. Supported kernel-based approaches include KREG, KRR with Coulomb matrix descriptors, sGDML, and GAP-SOAP. Very recently, Dral et al.221 utilized the relative-to-equilibrium (RE) representation185 within the MLatom framework to develop KRR-based KREG models capable of accurately predicting non-adiabatic coupling (NAC) vectors.
• PyRAI2MD.9 Another package that includes NNs for excited-state properties is Python Rapid Artificial Intelligence Ab Initio Molecular Dynamics (PyRAI2MD).9 PyRAI2MD has enabled mechanistic understanding of complex photochemical organic transformations such as electrocyclizations, E/Z-isomerizations, and cycloadditions.16,180 It implements fully connected feed-forward NNs using inverse distance descriptors with multiple output nodes for predicting several excited-state energies and gradients at once. To avoid rapid growth of the inverse distance descriptors, it allows users to define the input distances between atom pairs in their local atomic environments. It can also learn symmetric reaction pathways by defining the permutations of symmetric atoms, as in hexafluorobenzene.223
Besides the excited-state potential, PyRAI2MD provides a virtual-potential model like SchNarc to predict the NAC vectors, as well as scalar-output NNs for predicting the norm of SOCs for NAMD in the spin-diabatic representation. In addition, PyRAI2MD supports both Zhu–Nakamura224 surface hopping and FSSH with curvature-driven time-dependent couplings, which run NAMD without using NACs. Recent updates of PyRAI2MD incorporated the NNs into the ONIOM approach with semi-empirical methods, like GFN2-xTB, which successfully revealed blocked non-radiative mechanisms in molecular aggregates225 and singlet fission mechanisms in molecular crystals.226
• SchNarc20,135 & SPaiNN.24 Two software packages, SchNarc20,135 and SPaiNN,24 integrate SchNetPack 1.0 and 2.0, respectively, with SHARC 3.0 (ref. 44) and SHARC 4.0,45 enabling ML-accelerated surface hopping simulations. SchNarc, interfaced with SchNetPack 1.0, relies on the rotationally invariant SchNet181 descriptor and, therefore, cannot directly predict vectorial properties. To overcome this, SchNarc applies a trick: it first predicts a "virtual" property and then derives the vectorial property by taking its derivative with respect to the nuclear coordinates. This method allows SchNarc to predict properties for multiple singlet or triplet states, transition dipole moments, and spin–orbit couplings (SOCs). SPaiNN, a follow-up to SchNarc, integrates with SchNetPack 2.0 and takes advantage of the rotationally equivariant PaiNN130 representation, enabling the direct prediction of vectorial properties, including NACs. This direct approach improves prediction quality; SPaiNN additionally combines scalar predictions, multiplied with the nuclear coordinates, with the direct vectorial prediction to enhance the accuracy of the final NAC vectors.24
• DANN.169 Also building on the PaiNN130 architecture, Axelrod et al.169 implemented the diabatic artificial neural network (DANN) model to predict the quantum yields of the photoinduced E/Z-isomerization of azobenzene derivatives from NAMD trajectories. This model incorporates diabatization implicitly into the NN architecture. The diabatization is used to ease the fitting of adiabatic states across chemical space; in particular, it addresses the issue of gap overestimation near conical intersections of unseen species.17,22,110,113,135
Since diabatic energies are smooth even at conical intersections (unlike adiabatic energies, which exhibit non-differentiable cusps), they are easier to approximate using ML models (see also Section 3.3.2). DANN enforces this smoothness through a specialized loss function related to the NAC vector. This loss penalizes the magnitude of the NAC vector after rotation from the adiabatic to the diabatic basis and includes contributions from the NAC forces and a phase-correction term for each geometry. In addition to standard energy and force loss terms, DANN includes a gap error loss and an application-specific term that discriminates between E- and Z-isomers of azobenzene. The latter is implemented using the root-mean-square deviation (RMSD) between the input geometry and its aligned equilibrium structure.
• FieldSchNarc27 and Field-MACE.227 FieldSchNarc,27 built upon FieldSchNet220 for ground-state simulations, and Field-MACE227 both enable the inclusion of environmental effects in ML-driven photodynamics simulations using a hybrid quantum mechanics/molecular mechanics (QM/MM) approach. In QM/MM simulations, the system is divided into two distinct regions: a QM region and an MM region. The QM region is treated using high-level quantum chemical methods or, in this context, ML potentials, while the MM region is described by classical force fields. This setup facilitates computationally efficient simulations of large systems or solvated molecules, preserving high accuracy where it matters most.
Specifically, FieldSchNarc incorporates environmental interactions by accepting additional inputs representing the electrostatic field derived from MM atom point charges. This allows the model to effectively respond to dynamic changes in the environment, accurately capturing electronic properties of the QM region influenced by adaptive MM surroundings. However, accurate ML potentials from FieldSchNarc require training data that include the same QM region embedded in diverse MM environments. In contrast, Field-MACE models environmental interactions using a multipole expansion integrated within the MACE179 and X-MACE26 architectures, tailored respectively for ground- and excited-state simulations. A notable advantage of Field-MACE is its ability to initialize parameters from foundational MACE models,228,229 significantly enhancing data efficiency. Nevertheless, a common drawback of employing either model type is the sparsity of QM/MM datasets, necessitating additional effort in data curation and preparation.27
• MS-ANI. The multi-state ANI (MS-ANI) architecture112 was recently introduced. MS-ANI employs separate NNs for each atom type (Zi), where atom-wise energy contributions are summed to yield the total molecular energy Ei of a given state. The adiabatic electronic state is included in the input of each element-wise NN. Notably, the number of electronic states is decoupled from the network architecture, allowing MS-ANI to handle an arbitrary number of states. This flexibility is achieved by incorporating the state index (i) into the model's input features alongside the geometric descriptors. The state information is passed through all hidden and output layers of the network, enabling the model to distinguish between different states during training. This design not only allows electronic state information to propagate through the entire network, but also facilitates training on incomplete datasets, such as those containing molecules with varying numbers of labelled electronic states. The MS-ANI architecture was recently employed in OMNI-P2x,222 a universal excited-state neural network potential that enabled real-time photodynamical simulations and the rational design of visible-light-absorbing azobenzene systems.
• Excited-state MACE versions.26,177 Recently, MACE has been extended to predict multiple potential energy surfaces, improving the description of regions around conical intersections.177 The architecture is designed to reconstruct intersecting energy surfaces from value-sorted data using smooth invariants. It involves approximating elementary symmetric polynomials or related invariants, which remain smooth despite the cusps caused by surface intersections. The original surfaces are recovered by solving a polynomial root-finding problem through companion matrices, specifically Frobenius, Schmeisser (symmetric tridiagonal), or Chebyshev colleague matrices. Each method has distinct numerical properties, advantages, and sensitivities to perturbations, influencing its stability and accuracy. This approach has been shown to improve accuracy in regions where electronic states are close to each other and has been tested on the prediction of the valence and conduction bands of graphene and the electronically excited states of organic molecules.
In addition, X-MACE26 combines the MACE framework with a DeepSets autoencoder, enabling smooth representation of inherently non-smooth excited-state surfaces. The model encodes adiabatic energies into permutationally invariant functions, reconstructing them through a Hermitian companion matrix decoder to ensure physically meaningful, real-valued eigenvalues. X-MACE significantly improves prediction accuracy for excited-state energies, forces, and non-adiabatic couplings compared to existing models, and notably allows transfer learning from foundational ground-state models to excited states, enhancing data efficiency and generalization to previously unseen molecular systems.
Early efforts predominantly employed single-state ML models, wherein independent models were trained for each electronic state.112,134 While conceptually straightforward, this approach inherently disregards correlations among electronic states, thereby limiting its efficacy in describing interstate couplings and correlated dynamics.
To address the limitations of single-state models, multi-output architectures have been developed.22,24,134,169 In this approach, a single neural network is trained to predict multiple potential energy surfaces (PESs) simultaneously. The model employs shared hidden layers and a final output layer with one neuron per electronic state.24 This shared representation allows the network to capture inter-state correlations implicitly, often leading to improved accuracy in surface hopping simulations compared to separate, state-specific models.134,135
However, despite these advantages, such models can still struggle in scenarios involving unseen species, particularly near conical intersections, where subtle differences in energy gaps are crucial. As a result, these models may overestimate energy gaps and fail to consistently deliver better predictions across all systems.17,22,110,113,135,169
More recently, multi-state models have been proposed, which incorporate the electronic state index (e.g., 0 for the ground state, 1 for the first excited state, etc.) as an additional input feature to the network.112,222 In this architecture, hidden and output neurons process state-specific information, enabling a unified treatment of all electronic states. This approach offers increased flexibility relative to multi-output models, as it allows for the seamless handling of an arbitrary number of states. Furthermore, it alleviates limitations associated with loss function scaling in multi-output architectures, which may affect the quality of predicted coupling properties.24
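As a rough illustration of this idea, the following sketch conditions a single feed-forward network on the electronic state index; all dimensions and names are hypothetical and not taken from any of the cited architectures.

```python
import torch
import torch.nn as nn

class MultiStateMLP(nn.Module):
    """Toy multi-state potential: the electronic state index is appended
    to the geometric descriptor, so one network serves any number of states."""
    def __init__(self, n_descriptors=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_descriptors + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),   # energy of the requested state
        )

    def forward(self, descriptor, state_index):
        # state_index: (batch, 1) float tensor, e.g. 0. for S0, 1. for S1
        return self.net(torch.cat([descriptor, state_index], dim=-1))

model = MultiStateMLP()
desc = torch.rand(8, 64)                     # batch of placeholder descriptors
e_s1 = model(desc, torch.full((8, 1), 1.0))  # predict S1 energies
```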
Another effective strategy for modeling excited-state potentials involves an internal ML-based diabatization scheme, where diagonalization is directly integrated within the model architecture. In this approach, the model predicts an internal ML-diabatic Hamiltonian matrix, which does not necessarily need to be physically meaningful and is subsequently diagonalized to obtain the adiabatic energies. The loss function explicitly considers these adiabatic energies. Several methods utilizing this strategy have been developed, consistently showing superior performance compared to models that directly predict adiabatic energies.87,169,230,231
Pre-processing of photodynamics data is typically cumbersome, and the arbitrary phase of coupling properties must be accounted for during training. To avoid explicit phase alignment during pre-processing, phase-free (or phaseless) loss functions have been developed and are now commonly used when training NN potentials for excited-state properties.22,24,135,169
The key idea behind phase-free training is to incorporate the phase correction directly into the loss function. One approach evaluates the loss for all possible sign combinations of the N coupled electronic states, i.e., $2^{N-1}$ permutations, and selects the combination with the lowest error relative to the target.135 However, this method scales exponentially as $\mathcal{O}(2^{N-1})$, making it computationally expensive for systems with many states.
To address this, more efficient alternatives treat each coupling vector independently by evaluating only two sign options per coupling, multiplying it by +1 and −1 and selecting the option with the lower loss.22,24,169 This reduces the computational complexity to linear scaling, $\mathcal{O}(N)$, while maintaining predictive accuracy.24
By selecting the sign combination that yields the lowest fitting error, this method effectively performs implicit phase correction, streamlining the training process without sacrificing performance.
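A minimal sketch of such a per-coupling, phase-free loss in Python, assuming PyTorch tensors of shape (n_couplings, n_atoms, 3); all names are hypothetical:

```python
import torch

def phase_free_nac_loss(nac_pred, nac_ref):
    """Per-coupling phase-free L2 loss: evaluate both global signs of each
    reference coupling vector and keep whichever gives the lower error."""
    err_plus  = ((nac_pred - nac_ref) ** 2).sum(dim=(-2, -1))  # sign +1
    err_minus = ((nac_pred + nac_ref) ** 2).sum(dim=(-2, -1))  # sign -1
    return torch.minimum(err_plus, err_minus).mean()

# Example: 3 coupling vectors for a 5-atom molecule
pred = torch.rand(3, 5, 3)
ref  = -pred + 0.01 * torch.rand(3, 5, 3)  # phase-flipped reference
loss = phase_free_nac_loss(pred, ref)      # small despite the sign flip
```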
Full quantum dynamics methods solve the time-dependent Schrödinger equation by propagating the complete wavepacket on a grid. The Multi-configuration Time-Dependent Hartree (MCTDH) method is an example of this approach.34 Unlike trajectory-based methods, however, global surfaces are needed, and a diabatic representation is also required (Section 3.3.3). While the surfaces only need to be accurately described in the region of space occupied by the wavepacket, its delocalized nature makes it sensitive to “holes” in the potential, i.e., regions where the potential drops to large negative energies due to poor fitting. Such holes are likely if insufficient data are used when learning the surfaces, particularly at the boundaries of the space to be covered.
Surface hopping data can provide such samples. However, the classical nature of these trajectories means that they may not sample the full space needed to describe the wavepacket. Gaussian wavepacket methods, such as Multiple Spawning38 or vMCG,234 may provide the answer by collecting data along trajectories that can be related directly to the wavepacket motion. This is particularly the case for vMCG, where the trajectories are not classical and spread faster to cover the required phase space. vMCG has been shown to provide the points for Shepard-interpolated surfaces during direct dynamics simulations,173,235,236 and more advanced sampling techniques are used in the GROW methodology.237 More recently, it has been used together with Gaussian process regression to learn potentials for MCTDH calculations.238 However, the latter simulations also exposed the challenge that many points were required for accurate surfaces.
ML techniques can take a complicated multi-dimensional potential function and cast it into the form required for efficient quantum dynamics simulations. An ongoing challenge is the high computational cost of multi-dimensional integrals; for efficiency, the potential must be in a “sum-of-products” form, i.e., written as a sum of terms, each a product of low-dimensional functions. NNs are well suited for this,239,240 but have so far only been applied to small systems.
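For reference, the sum-of-products form expands the potential as a sum of products of low-dimensional (here one-dimensional) functions:

```latex
% Sum-of-products form of a multi-dimensional potential, as required
% for efficient MCTDH integrals
V(q_1, \dots, q_f) \approx \sum_{s=1}^{S} c_s \prod_{\kappa=1}^{f} v_s^{(\kappa)}(q_\kappa)
```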
More and more research is directed towards post-processing (Section 5). Unlike surface hopping, which gives easily visualizable results with chemical structures moving along trajectories, wavepackets carry all the information in a complex multi-dimensional function. Extracting trajectories of molecules evolving in time across excited- and ground-state PESs would provide unprecedented insight into molecular excited states. Sampling and clustering techniques (see Sections 5.3 and 5.4) to find the significant regions are useful, but visualizing correlations is also needed to demonstrate how wavepacket components evolve coherently.
Unlike classical force fields, widely used for ground-state dynamics, ML models for excited states often require retraining for each specific system. Global representations, though efficient for predicting properties relevant to NAMD, tend to generalize poorly across diverse chemical structures compared to local descriptors. Even state-of-the-art models frequently struggle to deliver accurate predictions when trained on a single system.21
While many ML architectures are in principle transferable, excited-state coupling properties, such as NACs, are highly sensitive and system-specific, further limiting generalization. Nonetheless, recent work has demonstrated promising progress: models trained on multiple molecules and conformations have been applied to UV/vis spectra prediction241 and excited-state dynamics.169
Notable are the OMNI-P2x model by Dral et al. (2025)222 and X-MACE by Westermayr and co-workers,26 the first universal ML potentials capable of predicting excited-state energies, forces, and other properties across a diverse set of organic molecules. Both models have been tested in out-of-the-box ML-driven NAMD simulations and allow for the screening of derivatives of organic chromophores.
This section is divided into four parts: prior considerations; dimensionality reduction; clustering analyses of static data, which treat all data points simultaneously while considering each point individually; and dynamics analysis, which treats the data as a time series. We will not delve into the analysis of individual trajectories, as comparisons to experimental data necessitate the examination of statistical averages rather than individual events.
An example Jupyter notebook for the analysis of trajectory data from excited-state calculations can be found at the following link: https://colab.research.google.com/drive/14W3zqhvSjHxUkSM_JrSbQiU_G6cSBmQK.
Fig. 12 Checklist for post-processing in ML-accelerated NAMD simulations: data quality checks and analysis of NAMD results.
The choice of included features and of the representation is crucial for the outcome of any ML method, as already emphasized in Section 4.1. For post-processing analyses, this choice depends on – and is limited by – the nature of the NAMD data, the research question, and the extent of knowledge about the chemical problem. Many possible representations exist for geometrical information. To maintain interpretability, however, intuitive representations and features are chosen, such as pairwise distances,244 combinations of interatomic distances, angles, and dihedrals,245 the pairwise dissimilarity matrix between configurations,246 or normal-mode coordinate representations.247,248
The data should be normalized or scaled to avoid biases from different descriptors. The possibilities depend on the nature of the data and include a mere shift of the mean to zero, min–max normalization (subtraction of the minimum followed by dividing by the maximum–minimum difference), z-score scaling (subtraction of mean followed by division by standard deviation), and the division by the respective quantities of a reference data point (e.g., global minimum structure).
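As a brief illustration, these scaling options can be applied with a few lines of Python; the descriptor matrix below is a random placeholder.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.random.rand(500, 10)   # placeholder: (frames, descriptors) matrix

X_centered = X - X.mean(axis=0)                 # shift the mean to zero
X_minmax   = MinMaxScaler().fit_transform(X)    # min-max normalization
X_zscore   = StandardScaler().fit_transform(X)  # z-score scaling
X_relative = X / X[0]                           # relative to a reference point
```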
Principal Component Analysis (PCA) is one of the most widely used linear dimensionality reduction methods. It transforms the original high-dimensional data into a lower-dimensional space by identifying the directions of maximum variance in the data. The resulting set of uncorrelated variables is known as the principal components. The input to PCA consists of standardized descriptors, i.e., pre-defined geometric and/or property-based quantities that represent the molecular system under investigation. The covariance matrix of the input features is computed to capture the correlations between the variables. This matrix is decomposed into its eigenvectors and eigenvalues, where the eigenvectors with the largest eigenvalues correspond to the principal components containing most of the data's variance. These principal components define the new coordinate system for the transformed data. Ideally, the first few principal components capture most of the variance in the data, enabling dimensionality reduction to visualize the data in a few dimensions. PCA is most effective on jointly Gaussian-distributed data, for which the absence of correlation between components implies their independence.249
PCA enables the interpretation of the transformed features by investigating how much each pre-defined descriptor contributes to each principal component, revealing the relative importance of each feature. Kernel PCA is an extension of PCA that allows for nonlinear dimensionality reduction. It uses a kernel trick technique to implicitly map the data into a higher-dimensional feature space, where linear PCA is performed. By utilizing a nonlinear mapping, kernel PCA can capture more complex and nonlinear patterns in the data.250
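A minimal example of PCA and kernel PCA with scikit-learn on a placeholder (pre-scaled) descriptor matrix; all settings are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

X = np.random.rand(500, 10)   # placeholder scaled descriptor matrix

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)          # linear 2D projection
print(pca.explained_variance_ratio_)  # variance captured per component
print(pca.components_)                # descriptor loadings (interpretability)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.1)
X_kpca = kpca.fit_transform(X)        # nonlinear embedding via kernel trick
```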
Multidimensional scaling (MDS) is another common category of nonlinear dimensionality reduction methods, especially useful for visualization.251 MDS encompasses several algorithms that reconstruct a low-dimensional spatial representation while preserving given pairwise distances or dissimilarities among the data. Unlike other methods, MDS can handle any dissimilarity measure, making it applicable to a broad range of data types in chemistry, such as bond distances, angles, or torsional angles. The primary goal of MDS is to create a map with coordinates in a lower-dimensional space (usually 2 or 3 dimensions) that reflects the pairwise dissimilarities among the data points. These coordinates place each data point in the new space such that the Euclidean distance between points closely matches the original dissimilarity measure.
The algorithms most frequently applied are classical MDS, metric MDS, and non-metric MDS. Classical MDS is particularly useful when the original dissimilarities are based on metric measurements, such as bond lengths or angles. Metric MDS is designed to handle metric dissimilarities explicitly, ensuring that the distances in the reconstructed map satisfy the properties of a metric space. It is a valuable approach when the dissimilarity measure is derived from real distances with a consistent scale. Non-metric MDS can accommodate non-metric dissimilarities, which may not satisfy the triangle inequality. Non-metric MDS is often employed when direct metric measurements are unavailable and only ordinal relationships between molecules are known.252
Another nonlinear dimensionality reduction technique is isomap, which focuses on preserving the intrinsic geometry and manifold structure of the data. It leverages the concept of geodesic distances, measuring the shortest path along the manifold between data points. The critical steps in isomap include the construction of neighborhood graphs, computing the geodesic distances between data points connected via the neighborhood graph, and embedding the data into a lower-dimensional space. The latter transformation uses MDS to preserve the geodesic distances as much as possible. Isomap is particularly useful for datasets with nontrivial geometric or topological properties that are difficult to analyze with linear methods.253,254
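Both MDS and isomap are available in scikit-learn; the following sketch embeds a placeholder dissimilarity matrix with metric MDS and a descriptor matrix with isomap.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS, Isomap

X = np.random.rand(300, 10)   # placeholder descriptor matrix
D = squareform(pdist(X))      # pairwise Euclidean dissimilarity matrix

mds = MDS(n_components=2, dissimilarity="precomputed")
X_mds = mds.fit_transform(D)  # 2D map preserving pairwise dissimilarities

iso = Isomap(n_neighbors=10, n_components=2)
X_iso = iso.fit_transform(X)  # embedding preserving geodesic distances
```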
Other popular nonlinear dimensionality reduction techniques are t-SNE (t-Distributed Stochastic Neighbor Embedding) and UMAP (Uniform Manifold Approximation and Projection). t-SNE captures local structure and similarities between data points by constructing a probability distribution in the high-dimensional space, thus representing pairwise similarities between data points and a corresponding distribution in the low-dimensional space. It is especially effective at preserving local structures, enabling cluster visualization for identifying patterns within complex molecular datasets.255
UMAP is another nonlinear technique that focuses on preserving the global structure and connectivity of the data. It constructs a high-dimensional topological representation of the data and then optimizes a low-dimensional representation that preserves the topological relationships. UMAP is based on manifold learning and is particularly useful for capturing local and global structures in high-dimensional data. It can handle large datasets more efficiently than t-SNE while producing similar embedding results.249
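Both embeddings can be computed in a few lines; note that UMAP lives in the separate umap-learn package, and the parameters below are illustrative only.

```python
import numpy as np
from sklearn.manifold import TSNE
import umap   # pip install umap-learn

X = np.random.rand(500, 10)   # placeholder descriptor matrix

X_tsne = TSNE(n_components=2, perplexity=30).fit_transform(X)        # local structure
X_umap = umap.UMAP(n_components=2, n_neighbors=15).fit_transform(X)  # local + global
```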
K-means clustering partitions the data into a predefined number of clusters (K) by minimizing the within-cluster sum of squares. K-means clustering requires a distance metric and is suitable for continuous variables. It assumes clusters of similar sizes and spherical shapes.259
Hierarchical clustering, as the name implies, creates a hierarchical structure of clusters by merging or splitting clusters based on their similarity or dissimilarity. It does not require prior specification of the number of clusters and can handle both continuous and categorical variables.201
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) identifies clusters based on regions of high density separated by regions of low density. It is suitable for data with irregularly shaped clusters and effectively handles noise and outliers.260
Gaussian Mixture Models (GMM) are useful when the number of clusters is difficult to determine: while a GMM starts from an initial guess of the number of clusters, model-selection criteria can suggest a suitable number. GMM assumes that the data points are generated from a mixture of Gaussian distributions. It can estimate the distribution parameters and assign probabilities to data points belonging to each cluster.261
We recommend exploring the scikit-learn library for more clustering techniques, which provides a comprehensive overview of different clustering methods and their use cases.201 Note that experimenting with various methods of clustering or parameter settings can be highly valuable because there is usually no general solution. Still, many processes might lead to similar results or can provide complementary information. Selecting a clustering method may require iterative refinement based on the data's specific characteristics and the desired outcome.
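All four methods discussed above are available in scikit-learn; a compact, illustrative comparison on a placeholder descriptor matrix might look as follows.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.mixture import GaussianMixture

X = np.random.rand(500, 10)   # placeholder descriptor matrix

labels_km = KMeans(n_clusters=4, n_init=10).fit_predict(X)
labels_hc = AgglomerativeClustering(n_clusters=4).fit_predict(X)
labels_db = DBSCAN(eps=0.5, min_samples=10).fit_predict(X)  # label -1 = noise
labels_gm = GaussianMixture(n_components=4).fit_predict(X)
```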
A distinction can be made between extrinsic and intrinsic measures. Extrinsic measures rely on externally provided labels or known class assignments to evaluate clustering performance. Thus, they are often not practicable when clustering dynamics data, for which no labels are accessible before the analysis. Examples of extrinsic measures are the Rand index (measuring the similarity between two clusterings by comparing pairs of samples), mutual information (quantifying the mutual dependence between clusterings), the V-measure (assessing the homogeneity and completeness of clusters), and the Fowlkes–Mallows score (determining the similarity between two clusterings by comparing geometric means).
More practicable for our use case are intrinsic measures that assess clustering quality based solely on the characteristics of the clustering itself and the underlying data without requiring any external labels. They thus aim to capture how well the data points within each cluster are grouped and how distinct the clusters are from one another. Intrinsic measures include the Silhouette coefficient, Calinski–Harabasz index, and Davies–Bouldin index.
The Silhouette coefficient, or Silhouette score, evaluates the clustering performance based on the average intra-cluster distance a (the distance between data points within a cluster) and the average inter-cluster distance b (the distance to points in the nearest neighbouring cluster),262 which reads as
$s = \frac{b - a}{\max(a, b)}$ (7)
The Calinski–Harabasz index uses the ratio of between-cluster variance to within-cluster variance. The higher the index, the better the clusters are defined. The Davies–Bouldin score measures the average similarity of a cluster with respect to its most similar cluster. Lower values indicate better clustering performance.201,263
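All three intrinsic measures are implemented in scikit-learn and can be evaluated directly on a clustering result; data and labels below are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

X = np.random.rand(500, 10)   # placeholder descriptor matrix
labels = KMeans(n_clusters=4, n_init=10).fit_predict(X)

print(silhouette_score(X, labels))         # in [-1, 1]; higher is better
print(calinski_harabasz_score(X, labels))  # higher is better
print(davies_bouldin_score(X, labels))     # lower is better
```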
Exploring molecular interactions and connectivity patterns can be achieved with network visualization techniques such as Cytoscape.264 Molecular data are represented as a network or graph, where molecules are nodes, and their relationships are edges. Clusters can be visualized as subgraphs or communities within the larger network.
Besides these visualization techniques, visually inspecting the clusters is often unavoidable. To this end, cluster centroids can be computed to serve as representatives of each cluster, reducing the complexity and time required for the visual inspection.87,201
Time plots are probably the most intuitive approach because they demonstrate how molecular properties evolve throughout the simulation. Plotting molecular properties against time allows for identifying trends, fluctuations, and potential patterns or events in the system. Autocorrelation analysis can reveal the presence of correlations or dependencies in molecular data over different time lags. It helps identify any time-dependent patterns or autocorrelated behaviour within the system. Fourier Transform applied to molecular dynamics data can identify dominant frequencies or periodicities in molecular motions. It helps uncover characteristic vibrational modes, collective motions, or oscillatory behaviour within the system.
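A minimal NumPy sketch of the autocorrelation and Fourier analyses on a synthetic property time series; the time step and signal are placeholders.

```python
import numpy as np

dt = 0.5                                  # time step in fs (assumed)
t = np.arange(2000) * dt
x = np.cos(2 * np.pi * t / 30.0) + 0.1 * np.random.randn(t.size)  # toy signal

x = x - x.mean()                          # remove the mean before analysis
acf = np.correlate(x, x, mode="full")[x.size - 1:]  # autocorrelation function
acf /= acf[0]                             # normalize to acf(0) = 1

freqs = np.fft.rfftfreq(x.size, d=dt)     # frequency axis (1/fs)
spectrum = np.abs(np.fft.rfft(acf))       # peaks reveal dominant periodicities
```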
Tslearn is a Python library designed for time series analysis.265 It provides many tools and algorithms for analysing, modelling, and visualizing time series data. Key features include time-series clustering and classification, which group series or assign them to predefined classes, respectively; time-series forecasting; and time-series distance metrics, which quantify the similarities and dissimilarities between time series sequences.
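For example, trajectories of a molecular property can be clustered with a dynamic-time-warping metric; array shapes and parameters below are illustrative only.

```python
import numpy as np
from tslearn.clustering import TimeSeriesKMeans
from tslearn.metrics import dtw

# placeholder: 50 trajectories, 200 time steps, 1 property each
series = np.random.rand(50, 200, 1)

km = TimeSeriesKMeans(n_clusters=3, metric="dtw")
labels = km.fit_predict(series)   # cluster whole time series

d = dtw(series[0], series[1])     # pairwise DTW distance
```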
A major challenge that remains to be solved is transferability. Notably, the number and character of electronic states within a certain energy range can vary dramatically even between seemingly similar molecules. ML offers the possibility to carry out the many hundreds of NAMD trajectories needed to achieve statistically significant results, and active learning techniques are often indispensable when carrying out ML-driven NAMD. In addition, the large amount of data produced by ML-driven NAMD will likely require further advanced ML-based methods for post-processing and analysis in the future.