Machine learning enables long time scale molecular photodynamics simulations

Machine learning enables excited-state molecular dynamics simulations including nonadiabatic couplings on nanosecond time scales.


Abstract:
Photo-induced processes are fundamental in nature, but accurate simulations are seriously limited by the cost of the underlying quantum chemical calculations, hampering their application for long time scales. Here we introduce a method based on machine learning to overcome this bottleneck and enable accurate photodynamics on nanosecond time scales, which are otherwise out of reach with contemporary approaches. Instead of expensive quantum chemistry during molecular dynamics simulations, we use deep neural networks to learn the relationship between a molecular geometry and its high-dimensional electronic properties. As an example, the time evolution of the methylenimmonium cation for one nanosecond is used to demonstrate that machine learning algorithms can outperform standard excited-state molecular dynamics approaches in their computational efficiency while delivering the same accuracy.

Introduction
Machine learning (ML) is revolutionizing the most diverse domains, like image recognition [1], playing board games [2], or society integration of refugees [3]. Also in chemistry, an increasing range of applications is being tackled with ML, for example, the design and discovery of new molecules and materials [4,5,6]. In the present study, we show how ML enables efficient photodynamics simulations. Photodynamics is the study of photo-induced processes that occur after a molecule is exposed to light. Photosynthesis or DNA photodamage leading to skin cancer are only two examples of phenomena that involve molecules interacting with light [7,8,9,10,11]. The simulation of such processes has been key to learn structure-dynamicsfunction relationships that can be used to guide the design of photonic materials, such as photosensitive drugs [12], photocatalysts [4] and photovoltaics [13,14]. Computer simulations of photodynamics typically rely on molecular dynamics simulations of coupled nuclei and electrons. These simulations require the computation of high-dimensional potential energy surfaces (PESs), i.e., the electronic energy levels of the molecule for all possible molecular configurations, using quantum chemistry. The calculation of these PESs is usually the most expensive part of the dynamics simulations [15] and therefore, different approximations are necessary and ubiquitous. For the lowest electronic quantum level, the electronic ground state, the time-consuming quantum chemical calculations are often replaced with force fields [16] but no force fields are available to describe electronically excited states. Another drawback of most conventional force fields is their inability to describe the breaking and formation of chemical bonds. Recently, increasing effort has been devoted to ML potentials [17], where an accurate representation of the ground state PES including bond breaking and formation is promised [16,18,19,20,21,22]. However, the problem of obtaining accurate full dimensional PESs for excited states and simulating long time photodynamics has not been solved yet. A few studies included kernel-ridge regression to simulate nonadiabatic molecular dynamics, but omitted critical regions of the excited state PES [23,24], so that intermittent quantum chemical calculations were still needed. One such critical region is where two PESs get into close proximity. For two electronic states of the same (or different) spin-multiplicity, those regions are called conical intersections (or simply state crossings). Importantly, in these regions, non-radiative transitions between different electronic states occur involving ultrafast rearrangements of both nuclei and electrons. Such transitions are induced by different kinds of couplings, including nonadiabatic couplings or spin-orbit couplings. These couplings are fundamental to model photodynamics [25] but pose a real challenge for ML [23,24]. Here we overcome all these different bottlenecks and achieve the simulation of photodynamics for long time scales using deep neural networks. We expand on the idea of using ML to obtain potentials for electronic excited states, as well as arbitrary couplings within a framework that combines ML with trajectory surface hopping molecular dynamics (Fig. 1). Our ML approach is fully capable of describing all necessary properties for executing nonadiabatic excited-state molecular dynamics on the order of nanoseconds. These properties include electronic energies, gradients, spin-orbit couplings, nonadiabatic couplings, and dipole moments of molecules. Additionally, the underlying potentials and couplings can be used to optimize critical points of the configurational space, such as potential minima or crossing points, which are critical for interpreting photochemical mechanisms.

Surface hopping molecular dynamics with deep neural networks
The photodynamics simulations have been carried out with an extended version of the program suite SHARC (Surface Hopping including ARbitrary Couplings) [15,26]. Within surface hopping, the nuclei are propagated according to the classical equations of motion and the electrons are treated quantum mechanically via interfaces to external electronic structure program packages.
The electronic structure calculations are carried out on-the-fly at the nuclear geometries visited by the classical trajectories. The probability of a molecular system occupying a specific electronic state and population transfer between the different electronic states -in the form of stochastic, instantaneous hops from one electronic state to another -are dependent on the couplings between them.
For surface hopping simulations with neural networks, the idea of retrieving electronic properties from an external source stays the same, but instead of a quantum chemical calculation, neural networks predict energies, gradients, couplings and dipole moments. The relationships between nuclear coordinates and the corresponding electronic properties are learned from a training set, where each data point is one set of nuclear coordinates and the associated set of quantities as computed with a reference method. In order to make the procedure usable, the processes for generating neural networks potentials and their use in photodynamics simulations have been automated in SHARC. Additionally, the ML dynamics is executed with the newly Fig. 1: Schematic workflow of surface hopping molecular dynamics with deep neural networks. The scheme starts from a set of initial quantum chemical calculations, which are pre-processed by a phase-correction algorithm and constitute an initial training set. Using this set, two deep neural networks (NN1 and NN2) are trained and replace the quantum chemical calculations of energies (E) and gradients (G), nonadiabatic couplings (NACs), spin-orbit couplings (SOCs) and dipole moments (µ). The dynamics calculation starts with an input geometry, for which the two neural networks provide all electronic quantities. If the outcome of both neural networks is sufficiently similar, the configurational space around this input geometry is adequately represented by the training set and the electronic quantities are used for a propagation time step. If not, the nuclear configuration is recomputed with quantum chemistry, phase corrected and included in the training set -a process referred to as adaptive sampling. The neural networks are then re-trained and a new dynamics cycle is started. developed program pySHARC, a python wrapper for the SHARC program, which avoids writing data to the hard disk and thus reducing substantially the runtime of the program.

Building up the training set
The combination of quantum chemistry with ML requires a cost-effective generation of a training set that, while it samples the conformational space of a molecular system comprehensively, is small enough to keep demanding quantum chemical reference calculations feasible [22]. With this in mind, we employ an initial training set based on normal mode scans and then switch to an adaptive sampling scheme [20] that automatically identifies untrustworthy regions not covered by the initial training set. The adaptive sampling procedure employs excited-state dynamics simulations using two or more neural networks that are independently trained from the same training set. At every time step, the (root mean squared) error between the predictions of the different neural networks is compared to a predefined threshold. Whenever this threshold is exceeded, i.e., the different neural networks make very different predictions, the corresponding geometry is assumed to lie in a conformational region with too few training points. It is then necessary to expand the training set by computing the quantum chemistry data for this geometry. Along a dynamics run, the threshold for the error between predictions made by the neural networks is adapted until the conformational space is sampled sufficiently to make accurate predictions without any additional reference calculations.
In order to achieve translational and rotational invariance in the relations established between the predicted properties and the nuclear coordinates, we use as input the matrix of inverse distances. Energies and gradients are directly used for training purposes in a single neural network. Forces are predicted as analytical derivatives of the neural networks [27], ensuring energy conservation [20,23]. Similarly, permanent dipole moments are directly used in the training. However, couplings (as well as transition dipole moments) need to be preprocessed as they are computed from the wave functions of two different electronic states and therefore depend on the relative phases of these two wave functions. Phase inconsistencies need to be eliminated in order to avoid ill-behaved photodynamics [28], as it is described in the following.

Facing phases of wave functions
The electronic wave functions computed with quantum chemistry programs are usually obtained as the eigenfunctions of the electronic Hamiltonian. However, this requirement does not uniquely define an electronic wave function because multiplying it by a phase factor, a complex number with an absolute value of 1 (in the case of real wave functions the possibilities are ±1), still returns a valid eigenfunction. Thus, in practice two wave functions computed for two very similar geometries might randomly differ by their phase factor. This problem can be visualized using molecular orbitals, see  i.e., diagonal elements in matrix notation, the wave function for an electronic state enters twice and any phase is squared, thus canceling out; however, in off-diagonal elements, such as couplings, the wave functions of the two different electronic states enter, and the different phases do not necessarily cancel out. Consequently, the raw curves of such off-diagonal properties are discontinuous (middle panel of Fig. 2), prohibiting correct learning behavior in the neural network. It is thus necessary to globally track the phases of all wave functions from one reference geometry to every other data point in the training set and apply a phase-correction algorithm to obtain smooth curves (bottom panel of Fig. 2). It is important that the same phase convention is applied for all data points, and not only along a single trajectory as usually done within surface hopping molecular dynamics [26] in order to ensure a correct neural network training. The phases are tracked by computing wave function overlaps between adjacent molecular geometries [29]. If the geometries are close enough, the overlaps will be sufficiently large and contain values close to +1 or -1, allowing a detection of phase switches. However, in cases in which molecular geometries are too far apart, the overlaps will generally be close to zero, offering no information about a phase change. In this case, we resort to interpolation between those two molecular geometries and iterative computation of wave function overlaps. For details on the phase correction algorithm, see the supplementary information.

Arriving at nanosecond simulations of photodynamics
The performance of our deep learning molecular dynamics approach is first demonstrated with a one-dimensional The first panel shows molecular geometries that are given as an input to a quantum chemistry program. The results for properties corresponding to offdiagonal matrix elements of the Hamiltonian are shown in the middle panel. Random signs are obtained due to random assignments of the phases of a wave function appear. As can be seen in the bottom panel, these random switches can be removed with phase correction and smooth relations between a molecular geometry and any property can be found.
analytical model consisting of five harmonic oscillators, one to represent the ground state 1 X, one for an excited singlet state 1 A, and three degenerate ones to mimic a triplet state 3 B (dashed lines in Fig. 3A). The states are coupled by both nonadiabatic couplings and spin-orbit couplings. Sampling of configurations along the degree of freedom leads to a training set of 100 data points and the potential energy curves can be fitted nearly exactly by neural networks (solid lines in Fig. 3A). Excited-state dynamics are started from 100 different initial conditions generated in the 1 X state, each excited to the 1 A state. The time evolution of the different states using the original analytical potentials and the trained neural networks (Fig. 3B) show that the predictions are comparable and each state is populated similarly after 100 fs. The mean absolute error (MAE) in the predictions of the energies is 0.00031 eV (0.01 kcal/mol) and of the gradients is 0.0024 eV/A (0.057 kcal/mol/Å). This first proof-ofconcept thus shows the ability of our ML method to describe excited-state dynamics including nonadiabatic and spin-orbit couplings. In the following, as a realistic system we consider the full dimensional photodynamics of the methylenimmonium cation, CH2NH2 + -the simplest member of the protonated Schiff bases. Methylenimmonium is reported to undergo ultrafast switches between different electronic states after excitation with light [30]. A larger member of this family is retinal, which is fundamental for vision [31] but we choose the methylenimmonium cation because it is small enough to perform accurate reference photodynamics simulations for short time scales for comparison. The reference computations were carried out with the highly accurate multi-reference configuration interaction method including single and double excitations and a double-zeta basis set (abbreviated as MR-CISD/aug-cc-pVDZ and in the following labelled as QC1). Two neural networks are trained on data obtained with the QC1 method using the adaptive sampling scheme described above, resulting in about 4000 data points (MAE energies among all states: 0.032 eV ≙ 0.73 kcal/mol; MAE forces among all states: 0.51 eV/Å ≙ 11.9 kcal/mol/Å, see also Table S1). We then simulated the dynamics of the methylenimmonium cation after excitation from the electronic ground state (S0) to the second excited electronic state (S2) during 100 fs, independently with both QC1 and the trained neural network. As can be seen from very low computational cost of ML, we were able to perform a very large number of trajectories with neural networks potentials (3846) in comparison with standard quantum chemistry (90). This is clearly visible in the smooth population curves for the neural networks simulations due to this enlarged statistics (a comparison of the curves with identical number of trajectories for neural networks and QC1 can be found in Fig. S1). In order to estimate the magnitude of the error obtained with neural networks, we carried out a second ab-initio molecular dynamics study with an additional, very similar, quantum chemistry method where only the double-zeta basis set is changed from aug-cc-pVDZ to 6-31++G**; the new level of theory (MR-CISD/6-31++G**) is termed QC2 (see further computational details in supplementary information and Fig. S2). As Fig. 4B shows, the differences between both levels of theory are of the same order of magnitude as those encountered between neural networks and quantum chemistry, indicating that the agreement between the methods is very good. Time constants derived from these data also agree very well (see Table S2).
Finally, we show that our deep neural networks are able to overcome the problem of limited simulation time and predict long  After excitation to the S2 state, ultrafast internal conversion to the S1 state takes place, followed by recovery of the S0 state within 300 fs. excited-state dynamics. In Fig. 5, we show the population dynamics of the methylenimmonium cation on a logarithmic scale up to 1 nanosecond (ns), i.e., 10 4 times longer than they were simulated with our quantum chemical reference method. A movie of one trajectory over 10 picoseconds (ps) is part of the supplemental material (Movie S1). The propagation of CH2NH2 + for 10 ps can be executed in less than 6 hours, which is 300 times faster than the calculation with the quantum chemical reference method. The propagation of 1 ns took 59 days with deep neural networks, whereas an estimated ~19 years of computation would have been required with the quantum chemical reference.

Neural network conical intersections
Since the neural networks can provide energies, gradients, and couplings, they can also be used to optimize important point of the PES, like state minima or conical intersections. Conical intersections are the target of many quantum chemical studies as they are commonly deemed as the most probable geometries for radiationless transitions between electronic states. Here we optimize two minimum energy conical intersections in CH2NH2 + , one between the S2 state and the S1 state and another one between the S1 state and the S0 state. The optimizations were independently performed with the trained neural network, as well as with the QC1 and QC2 methods for comparison. The optimized geometries of the  Fig. 4 for nomenclature) as well as optimized S1/S0 (A) and S2/S1 (B) minimum energy conical intersections (CI). The actual geometry is depicted on top (geometrical parameters are given in Fig.  S5). A zoom of the regions near the optimized points is shown in Fig. S6 together with a definition of the dihedral and pyramidalization angles. minimum energy conical intersections projected along two important coordinates are shown in the scatter plots of Fig. 6 together with the hopping geometries. The optimized molecular geometries (shown in Fig. S5) agree well (Cartesian coordinates in Tables S3-S6). Furthermore, each method results in a comparable distribution of hopping geometries around the optimized points, which in practice is even more important [32] for describing the population transfer in the simulations correctly.

Outlook
Deep neural networks accelerate nonadiabatic excited-state molecular dynamics simulations by orders of magnitude, thus overcoming the constraints of limited time scales and limited statistics. This approach opens new avenues for studying the photodynamics of complex systems on long time scales relevant for chemistry, biology, medicine, and material design. Offering access to the precision of high-level quantum chemistry methods at only a fraction of the original computational cost, we expect this setup to become a powerful tool in several research fields.

Materials and Methods
Surface Hopping Molecular Dynamics Nonadiabatic molecular dynamics simulations were carried out with the program SHARC (Surface Hopping including ARbitrary Couplings) [15,26].
To construct a one-dimensional analytical model, we used the analytical interface of SHARC. The potential energy curves are defined in Table S7. Nonadiabatic couplings are responsible for transitions between the 1 X and 1 A state while spin orbit couplings coupled states of different spin multiplicity, i.e. 1 X and 3 B as well as 1 A and 3 B. These couplings as well as dipole moments between different states were described by constant values given in Table S8 and S9, respectively.
A total number of 2000 initial starting points were sampled according to a Wigner distribution [15], from which 100 were selected, excited in a range of 0-2 eV and propagated for 100 fs with a time step of 0.05 fs for the nuclear motion and 0.001 fs for the analytical solution of electronic amplitudes.
The quantum chemical reference calculations of the methylenimmonium cation, CH2NH2 + , are done with the program COLUMBUS [33] which uses the implementation of nonadiabatic coupling vectors via Dalton [34] integrals. CH2NH2 + possesses 16 electrons. Molecular orbitals were optimized with the state-averaged complete active space self-consistent field (SA-CASSCF) method, averaged over 3 states. The two lowest molecular orbitals representing the 1s orbitals of the C and N atom (Fig. S2(A)) were frozen at the MRCI level of theory. The next three lowest energetic orbitals were set to be inactive ( Fig. S2(B)) and 4 orbitals were assigned to the reference space with 6 active electrons (Fig. S2(C)). Single and double excitations from the (6,4) reference space to 73 virtual orbitals yield a total of about 660000 configuration state functions. A frequency calculation was carried out with COLUMBUS [33]. We performed scans with 100 points along each normal mode (Table S10) as well as with 72 points along the torsion around the central double bond in order to generate the initial training set. From these scans, we removed data points from computations that did not show proper convergence.
As a prerequisite for the dynamics, 1000 initial conditions were sampled from the Wigner distribution of the quantum harmonic oscillator defined by the above-mentioned frequency calculation. From these 1000 possible starting points, 200 were excited -according to the oscillator strengths in the excitation window of 9.44 ± 0.15 eV -to the brightest state, which is the second excited singlet state. This excitation shows mostly ππ* character. Each trajectory was propagated for 100 fs with a time step of 0.5 fs for nuclear motion and 0.02 fs for integration of the timedependent Schrödinger equation. Trajectories showing problems with energy conservation due to improper convergence of the quantum chemistry were excluded, resulting in 90 trajectories for QC1 and 88 trajectories for QC2. However, the trend of the populations when all trajectories were taken into account is the same. Therefore, no bias was introduced by sorting out trajectories due to bad energy convergence within quantum chemical calculations, which appeared mainly around conical intersections. Such problems do not occur in the neural networks, which demonstrates another advantage of the ML approach.
Optimization of minimum energy conical intersections were executed with the SHARC tools that utilize an external optimizer of ORCA [35], where the computed energies and gradients [36,37] from COLUMBUS are fed in.

Neural Networks (NNs)
Multi-layer feed forward neural networks (NNs) have been implemented in python using the numpy [38] and theano [39] packages. Network architecture To find (hyper)parameters of the optimal-network-architecture NNs, we have automated a random grid search [1] and adapted the learning rate, one of the most critical parameters during training [40].
The optimization of several parameters of NNs was done for the initial training set consisting of 100 data points for the analytical model and 992 data points for CH2NH2 + . After the adaptive sampling, the training set increased to twice its initial size. Each nuclear configuration was given to the NNs as its inverted distance matrix. For optimal performance, we tested several aspects of the NN architecture, like the type of non-linear basis function (hyperbolic tangent and shifted softplus function, ln(0.5 + 0.5)) used, the number of nodes, the number of hidden layers, as well as other parameters, such as the learning rate, lr, the L2 regularization rate, the number of epochs, the batch size, and predefined factors that regulate the influence of forces in the training of energies and lr. Emphasis was put on the optimization of the L2 regularization rate with respect to a given batch size and lr. For each quantity that has to be predicted, a separate NN was used, except for the gradients. The latter enter directly into the NNs dedicated to the potential energy prediction. The NNs for the analytical model consisted of one or two hidden layers, whereas deep NNs with 6 hidden layers were applied for training and predictions of properties of CH2NH2 + . Optimal parameters selected for propagation of two NNs in the last steps of dynamics simulations are given in Table S11 and S12 for the analytical model and CH2NH2 + , respectively.

Training
In general, training aims at fitting the weight parameters of a NN in order to minimize a cost function, which usually represents the mean-squared error (MSE) between predictions by NNs and reference data. For this task, we used Adam (adaptive moment estimation) [41], a stochastic gradient descent optimization algorithm. In combination with Adam, an exponential learning rate decay was applied [1]. Therefore, a decay-factor, flr, with values between 0 and 1, as well as a step size were defined. The step size gives the number of epochs that have to be passed after lr is adjusted according to the following equation: = • . (S1) Quantities, which can be related to a nuclear configuration and predicted by implemented NNs are energies as well as corresponding gradients, spin-orbit couplings, nonadiabatic couplings, and dipole moments. For predictions of energies, it is favorable to include gradients in the minimization process [20]. The cost function, thus depends on two terms, with running over all molecules (total number ), and α running over all Cartesian coordinates of atoms (total number 3N). The first part of Equation S2 is the MSE between energies predicted by the NN, NN , and reference-data energies, , obtained from quantum chemical calculations. The second part of this equation represents the MSE of molecular forces, once derived from NNs, NN , and once calculated with quantum chemistry, . The constant ƞ regulates the influence of forces on the overall cost function and is set to 1 here. All different electronic states were treated within one NN, thus the errors entering the cost function are for all electronic states. The reference data set was always split into a training and a validation set in a random fashion using a ratio of 9:1. For training and prediction, inputs and outputs were scaled with respect to the mean and standard deviation (std) of the training set according to equation S3.
An early stopping mechanism was used to control overfitting. Additionally, the convergence of the loss functions of the training and validation set were checked manually, ensuring, e.g., that their order of magnitude is similar. The training set of the analytical model consisted of 100 equidistant data points containing values of x between 0.4 and 1.8 (with x being the one degree of freedom of the harmonic oscillators, see Table S7). Two initial training sets were generated for CH2NH2 + : one via sampling along each degree of freedom as well as a torsional mode and another one by executing dynamics simulations with SHARC but starting each trajectory from the same geometry, i.e., the equilibrium geometry, with sampled velocities according to a Wigner distribution. The excitation window for these simulations was set to 7-11 eV. The training set sampled along normal modes and the torsional mode yielded a better initial fit of the different computed properties, thus the focus was set only on this set for CH2NH2 + . The initial data set was insufficient for excited-state dynamics simulations and was expanded via an adaptive selection scheme as described in reference [20]. For this reason, two slightly different sets of NNs (NN1 and NN2, see Table S12 for the differing parameters) were chosen to start dynamics simulations by replacing quantum chemical calculations with predictions made by NNs. The mean, � , of fitted properties, � , of each NN, J, is given to SHARC to propagate the nuclei. Further, the root mean-squared error (RMSE) between those predictions, , was calculated on-the-fly: Inputs according to a region of the PES that has not been visited yet could be detected onthe-fly by setting a threshold for the RMSE of each quantity predicted by NNs. The threshold is set to 18.8 kcal/mol (0.03 H) for the energies in the beginning and adapted to smaller values during sampling similar to reference [20] by multiplication with a factor of 0.95 up to 10 times. Whenever this threshold is exceeded, predictions by NNs are deemed untrustworthy and need to be recomputed with quantum chemistry as well as added to the training set. For the dipole moments, the threshold was set to 0.5 a.u. (atomic units). The latter was kept constant, since dipole moments do not directly enter into the dynamics in our current approach but are implemented for future research purposes.
For the MSE of the nonadiabatic couplings, the threshold is initially set to 0.25 a.u. (atomic units), reduced adaptively, and raised intermediately to 3.0 a.u. after a training set size of 3600 points is reached. This intermediate raise makes it possible to find important geometries close to conical intersections. These are necessary to describe the dynamics accurately. Consequently, the population dynamics converges towards the correct behavior, as can be seen in Fig. S3 for the last iterations of the sampling procedure.
Data pre-processing: Global phase correction We carried out a phase correction procedure to obtain a training set suitable for our ML algorithm. To this aim, a phase vector, p, had to be derived for each molecular geometry containing values of +1 and -1. This vector was obtained by the computation of wave function overlaps between two different geometries. The overlap matrix, S, between wave functions of following nuclear configurations, and , was computed according to reference [29] using the WFoverlap code: = ⟨ | ⟩.
(S6) The overlap matrix has the dimension of N states x N states , with N states being the number of electronic states included in a calculation. When the molecular geometries are similar enough to each other, overlaps are sufficiently large to gain information whether a phase switch has occurred or not. Wave function overlaps for an electronic state that are close to +1 indicate that no phase change has taken place, whereas overlaps close to -1 point out a change in the phase of a wave function. Far from conical intersections, p usually corresponds to the diagonal matrix elements of S and also has the dimension of N states . Whenever two PESs are in close proximity, off-diagonal elements of S can be larger than diagonal elements. In such cases, values of p depend on absolute values of row elements of S. A threshold of ±0.5 was set between overlaps of two following wave functions to ensure that the correct value of +1 or -1 was taken into account. Each entry, pi, of p was then calculated according to Equation 7 with i and j running over all N states .
= �max�| |� ( )� ∀ � � ≥ 0.5; , = 1,2, … , However, many situations arise, in which computed overlaps between two geometries yield values close to 0. With these, it is not possible to decide reliably whether a phase change is present or not. Therefore, an interpolation of n steps between those two molecular geometries needed to be carried out, with n usually being in the range of 5 to 10. Interpolation was performed linearly with respect to a molecule's Z matrix. Transformation from xyz format to a Z matrix format and vice versa was executed with OpenBabel. The size of interpolation steps was large enough when each computation yielded overlaps close to +1 or -1. The phase vector, pn, applicable to the last point, n, of the required interpolation was found by multiplication of all previous phase vectors, p0 to pn-1: p0 always corresponds to the phase vector between the reference geometry, upon which the complete training set is phase corrected, and the first interpolation step. The reference geometry for global phase correction was chosen to be the equilibrium geometry. If no row element of S had an absolute value above or equal to ±0.5, the interpolation was stopped and more interpolated nuclear configurations between the equilibrium geometry and the one that needs to be included in the training set were generated. If there were still not enough interpolated configurations to account for sufficiently large overlaps between two subsequent wave functions, the simulations were stopped. One reason for too small overlaps are intruder states, which are excluded at the reference geometry, but are included at another geometry due to an energy drop. Thus, a previously included state is excluded such that the overall molecular electronic wave function changes considerably. The phase correction of each matrix M -in our case, the Hamiltonian matrix, which contains energies and couplings, and the dipole matrices for each direction of the coordinate system -was carried out according to Equation 9. Matrices containing all relevant nonadiabatic coupling vectors to form the nonadiabatic coupling tensor with the dimension N states xN states xN atoms x3 (with N atoms being the number of atoms) were corrected according to Equation 10. Each two dimensional nonadiabatic coupling vector, , accounts for nonadiabatic couplings between two states i and j. It was corrected by multiplication of every element with the entry of p for each state i, , and j, .

Supplementary Text
Comparison of dynamics with QC1 and NN (Fig. S1) Fig. S1 depicts a direct comparison of the molecular dynamics computed with QC1 (MR-CISD/aug-cc-pVDZ) and NNs. In both cases, the populations of 90 trajectories starting from the second excited singlet state are shown. Two hundred initial conditions were excited with 9.44 ± 0.15 eV and 90 trajectories reached 100 fs. Therefore, from NN-simulations only the first 90 trajectories were used for comparison. As can be seen, the populations of each state are in good agreement when comparing both methods -as they are in Fig. 4. However, the curves are not smooth anymore due to statistical reasons that arise from considering only 90 trajectories instead of 3846. Geometry comparison (Fig. S4) In addition to the population dynamics, we also checked whether visited molecular geometries are comparable along the trajectories. We first calculated the mean of each nuclear configuration over time from the NNs leading to populations given in Fig. S1 and computed the root-mean squared displacement (RMSD) to the mean molecular geometry at the respective time step of the ensemble predicted with MR-CISD/aug-cc-pVDZ, i.e. QC1 (Fig. S4 continuous line). An analogous comparison is also carried out for QC1 and QC2 (MR-CISD/6-31++G**, Fig. S4, dotted line). For completeness, we also computed the RMSD between NN and QC2 (Fig. S4, dashed line). All RMSDs are of comparable size, further validating our NN approach. Minimum energy conical intersections (MECIs) (Fig. S5 and S6) With each method, QC1, QC2, and NN, we optimized the geometries of the two MECIs, S1/S0 and S2/S1. Geometries present at the MECIs are shown in Fig. S5. They were optimized starting from the different hopping geometries obtained with the afore-mentioned methods, as indicated in the scatter plot of Fig. 6 in the main text and the zoom in Fig. S6. The MECI between the S1 and S0 state shows a rotation of the molecule around the H3-N-C-H4 dihedral angle (see Fig.  S6B for the definition) of approximately 90 degrees. Due to symmetry of the molecule, the MECI is obtained at ~ 90 degrees as well as at ~ -90 degrees. The optimized geometries agree well between all methods (QC1, QC2, NN). All methods yield hopping geometries that are distributed in a rather large region around these points.
The second MECI, S2/S1, is defined by a slight bi-pyramidalization and a bond elongation between the carbon and the nitrogen atom of around 1.44 Å (see Fig. S5) compared to a bond length of around 1.29 Å at the equilibrium geometry (Fig. S6B). Again, molecular geometries obtained with different methods are very similar. The scatter plot in the right panel of Fig. S6A shows the good agreement among different methods.     MECI with each different level of theory. For each method, we could find two different conical intersections, one between the first excited singlet state, S1, and the ground state, S0, as well as one between the second excited singlet state, S2, and S1. QC1 (MR-CISD/aug-cc-pVDZ) is used as the reference method to train neural networks (NN) and compared to MECIs obtained with QC2 (MR-CISD/6-31++G**). For the two MECI, the bond length between the nitrogen (blue) and the carbon (grey) atom is shown, as well as the bond length between the carbon and one hydrogen atom. Values are given in angstrom. A dihedral angle between four atoms marked with the dashed line is given as well as an angle between the carbon and a hydrogen atom (S1/S0 MECI) and between two hydrogen atoms (S2/S1 MECI). Fig. 5. (A) Scatter plots showing the distribution of hopping geometries of each method as well as optimized nuclear configurations close to the MECI present between the S1 state and the ground state, S0, (left) as well as between the S2 state and the S1 state (right). (B) Shown is the equilibrium geometry of methylenimmonium in order to specify the atoms that describe the used dihedral and pyramidalization angle in Fig. 5 and Fig. S5A. The pyramidalization angle that is used to describe the MECI between S1 and S0 is defined by the angle between the bond between the nitrogen (N) and the carbon (C) atom (red line) and the plane that is spanned by the C atom and the two hydrogen atoms, H2 and H4 (red triangle). The dihedral angle that is chosen to distinguish geometries describing both MECI is defined by hydrogen atoms, H3 and H4, as well as by the C and N atom (dashed black line). It is the angle between the planes spanned by atoms H3-N-C and N-C-H4. Time constants for transitions between electronic states derived by fitting a kinetic model to populations described in Fig. 4. The table shows calculated time constants with QC1 (MR-CISD/aug-cc-pVDZ), QC2 (MR-CISD/6-31++G**) and NNs. k1 refers to the time constant from S2 to S1 and k2 to the time constant from S1 to S0.  Table S11.

Conical intersection analysis in addition to
Selected parameters to construct NNs trained on data of the analytical model to carry out dynamics simulations given in Fig. 3 of the main text. NN1 describes the first NN that was defined, whereas NN2 describes the second one. The influence of the forces for training of energies was set to 1.