Machine learning molecular dynamics for the simulation of infrared spectra

Artificial neural networks are combined with molecular dynamics to simulate molecular infrared spectra including anharmonicities and temperature effects.

described above. The resulting ML model was then used to predict the energies, forces and dipole moments along the 60 000 geometries sampled by the BP86 dynamics simulation of methanol. Figure S1 shows the distributions of the errors between the properties predicted by the ML model and the BP86 reference. Compared to the ML model based on the adaptive sampling scheme, much larger deviations from the BP86 values are found for the model constructed in this manner. While the former exhibits MAEs for energies, forces and dipole moments of 0.048 kcal mol⁻¹, 0.533 kcal mol⁻¹ Å⁻¹ and 0.016 D respectively, the MAEs of the latter are 10.580 kcal mol⁻¹, 108.253 kcal mol⁻¹ Å⁻¹ and 0.074 D. This decrease in accuracy has two main causes. First, the random selection of reference data points can neglect rarely occurring configurations that are nevertheless important for the description of the system. Second, when an approximate method is used to sample efficiently, the geometric properties (e.g. equilibrium bond lengths) of the configurations explored in this manner can differ significantly from those that would be obtained with the reference method (which is not used directly due to its high computational cost). As a consequence, the resulting ML potentials are only valid for e.g. the bond lengths sampled via the cheap method, which are not necessarily relevant for the expensive electronic structure reference. Both problems are avoided in the adaptive sampling scheme, since the HDNNPs used to sample new configurations closely resemble the reference level of theory, while the deviations between their predictions provide an uncertainty measure for the selection of new data points.
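The selection criterion at the heart of this uncertainty measure can be illustrated with a short sketch: geometries where the ensemble members disagree beyond a threshold are flagged as candidates for new reference computations. The interface below is hypothetical (the callables stand in for trained HDNNPs), not the actual implementation.

```python
import numpy as np

def ensemble_uncertainty(predictions):
    """Standard deviation across ensemble members as an uncertainty measure."""
    return np.std(predictions, axis=0)

def select_new_points(trajectory, models, threshold):
    """Flag geometries whose ensemble disagreement exceeds a threshold.

    `trajectory` is an array of geometries and `models` is a list of
    callables returning per-geometry energies (hypothetical interface).
    """
    preds = np.stack([m(trajectory) for m in models])  # (n_models, n_geoms)
    sigma = ensemble_uncertainty(preds)
    return np.where(sigma > threshold)[0]

# Toy demonstration with two mock "HDNNPs" that disagree on one geometry
geoms = np.arange(5)
m1 = lambda g: np.array([1.0, 1.0, 1.0, 5.0, 1.0])
m2 = lambda g: np.array([1.0, 1.1, 1.0, 1.0, 1.0])
print(select_new_points(geoms, [m1, m2], threshold=0.5))  # geometry 3 is flagged
```

In the actual scheme, flagged configurations would be recomputed with the electronic structure reference and added to the training set before the ensemble is refit.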
The typical behavior of the adaptive sampling scheme when selecting reference configurations is shown in Figure S2 using the n-alkane as an example. While new reference data points are added frequently during the early stages of the HDNNP ensemble construction, these events become increasingly rare as the ensemble approaches convergence. An interesting effect can be observed around timestep 2000, where a particularly undersampled region of the PES is encountered (dissociation of a hydrogen atom) and the sampling frequency briefly increases again. Table S1 shows the performance of ML models for methanol employing ensembles of different sizes. The respective MAEs for energies, forces and dipole moments (once again computed along the BP86 trajectory) are averaged over all possible ensembles of size N constructed from the best 5 HDNNPs and dipole models. For the energies and forces, accuracy improves as the number of HDNNPs and dipole NNs is increased. This gain is largest for the step from N = 1 to N = 2 but diminishes quickly as more models are added. The dipole moment model does not profit significantly from using more than one predictor.

HDNNP ensembles
As can be seen for N = 1, the basic building blocks of the above models already exhibit excellent accuracy, and consequently the improvements introduced by larger ensembles are relatively small. However, in the case of more approximate models (e.g. during the early stages of the adaptive sampling scheme), ensembles offer a much greater advantage. This effect is demonstrated by performing the above analysis for ML models based on the randomly selected reference data points (see Table S2). Here, the MAEs of the predicted energies and forces are reduced to less than half of their initial value when changing the number of base predictors from N = 1 to N = 5, which is close to the expected reduction of 1/√5 ≈ 0.447. However, only minor improvements are once again observed for the dipole moments.
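The 1/√5 figure is the reduction expected when averaging independent, identically distributed errors. This scaling can be verified with a quick numerical experiment (toy Gaussian errors, not the actual model residuals):

```python
import numpy as np

# Averaging N independent predictors reduces the random component of the
# error by roughly 1/sqrt(N).  Toy check with Gaussian prediction noise:
rng = np.random.default_rng(0)
n_samples, n_models = 100_000, 5
errors = rng.normal(0.0, 1.0, size=(n_models, n_samples))  # per-model errors

mae_single = np.abs(errors[0]).mean()
mae_ensemble = np.abs(errors.mean(axis=0)).mean()

print(mae_ensemble / mae_single)  # close to 1/sqrt(5) ≈ 0.447
```

The observed reduction falls short of this limit in practice because the errors of models trained on the same data are correlated, and because part of the error is systematic rather than random, which also explains why the dipole models barely benefit.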
Based on the above observations, we chose two HDNNPs to model the potential energies of all systems, as this offers the best trade-off between improved accuracy and the associated increase in simulation time. Since the dipole moment models do not profit from additional predictors and are only used once a converged reference data set has been obtained, no ensembles are employed in this case.

Infrared Spectra
IR spectra were obtained from molecular dynamics simulations employing the same timestep and initialization as the sampling procedure (see Section 2). After an initial equilibration period (3 ps for methanol, 5 ps otherwise), production simulations were run at 300 K for methanol and the n-alkanes (for 30 and 50 ps, respectively) and at 350 K for the tripeptide (for 50 ps). The temperature was kept constant using a massive Nosé–Hoover-chain thermostat 18-20 during equilibration and a standard Nosé–Hoover thermostat 18,20 during production runs. In both cases a chain length of 3 and a relaxation time of 100 fs were used. Forces and energies were obtained using an HDNNP ensemble of size two. In addition to the HDNNP-accelerated dynamics, AIMD simulations were carried out for methanol at the BP86 level of theory described above. Dipole–dipole autocorrelation functions were computed according to the Wiener–Khinchin theorem 21 with an autocorrelation depth of 2048 fs. A Hann window function 22 and zero-padding were applied to the autocorrelation functions before Fourier transformation in order to improve the quality of the final IR spectra.
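The post-processing chain (autocorrelation via the Wiener–Khinchin theorem, Hann window, zero-padding, Fourier transform) can be sketched as below. This is an illustrative reimplementation, not the production code: the 0.5 fs default timestep and the fourfold zero-padding are assumptions, and the autocorrelation depth is given in timesteps rather than femtoseconds.

```python
import numpy as np

def ir_spectrum(dipoles, dt_fs=0.5, acf_depth=2048, pad_factor=4):
    """IR lineshape from a dipole trajectory via the Wiener-Khinchin theorem.

    `dipoles`: (n_steps, 3) dipole moments sampled every `dt_fs` fs.
    `acf_depth` is in timesteps here (assumption); the paper specifies a
    depth of 2048 fs.
    """
    mu = dipoles - dipoles.mean(axis=0)
    n = len(mu)
    assert n >= acf_depth
    # Autocorrelation via FFT (Wiener-Khinchin), summed over x, y, z
    f = np.fft.fft(mu, n=2 * n, axis=0)
    acf = np.fft.ifft(f * np.conj(f), axis=0).real[:acf_depth].sum(axis=1)
    acf /= acf[0]
    # Hann window (decaying half) and zero-padding before the transform
    acf *= np.hanning(2 * acf_depth)[acf_depth:]
    padded = np.concatenate([acf, np.zeros((pad_factor - 1) * acf_depth)])
    spectrum = np.abs(np.fft.rfft(padded))
    # Frequency axis in cm^-1 (1 fs^-1 corresponds to 33356.41 cm^-1)
    freqs = np.fft.rfftfreq(len(padded), d=dt_fs) * 33356.41
    return freqs, spectrum

# Toy check: a dipole oscillating at 0.1 fs^-1 should peak near 3336 cm^-1
t = np.arange(8192) * 0.5
mu = np.zeros((8192, 3))
mu[:, 0] = np.cos(2 * np.pi * 0.1 * t)
freqs, spec = ir_spectrum(mu, dt_fs=0.5)
```

The window suppresses the spectral ringing caused by truncating the autocorrelation, while the zero-padding interpolates the spectrum onto a finer frequency grid.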

High Dimensional Neural Network Potentials (HDNNPs)
Training and construction of the HDNNPs was carried out with the RuNNer program. HDNNPs were trained for 100 epochs with the element-decoupled Kalman filter, using a forgetting schedule with λ_0 = 0.95 and λ_k = 0.995. 23 The hidden layers of the elemental NNs employed the softplus activation function, while the final layer applied a linear transformation. Weights were initialized using the procedure described in Reference 24. Molecular forces were included in the optimization procedure with a weighting factor of η = 10. To accelerate training, all ACSFs were scaled to the range between -1 and 1 and the energies of the free atoms were subtracted from the target potential energies. In addition, an adaptive filter threshold of 0.9 times the current RMSE was applied to energy and force updates. The final models were determined using cross validation combined with an early stopping schedule, with 10 percent of the data in the validation set. 25 The neural network architectures employed in the final HDNNPs are given in Table S3. These networks were selected automatically during the adaptive sampling runs from a pool of ten different architectures, based on their performance in reproducing reference energies and forces.
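The HDNNP energy evaluation itself (softplus hidden layers, a linear output layer, and the total energy as a sum of atomic contributions) can be sketched as follows; the weights below are toy placeholders, not the trained models of Table S3.

```python
import numpy as np

def softplus(x):
    """Softplus activation used in the hidden layers."""
    return np.log1p(np.exp(x))

def elemental_nn(acsf_vector, weights, biases):
    """Feed-forward elemental network: softplus hidden layers, linear output."""
    a = acsf_vector
    for W, b in zip(weights[:-1], biases[:-1]):
        a = softplus(a @ W + b)
    return (a @ weights[-1] + biases[-1]).item()

def hdnnp_energy(acsfs, elements, nets):
    """Total energy as the sum of atomic contributions.

    `acsfs`: per-atom ACSF vectors, `elements`: element label per atom,
    `nets`: dict mapping element -> (weights, biases). Illustrative only.
    """
    return sum(elemental_nn(g, *nets[el]) for g, el in zip(acsfs, elements))
```

Because every atom of a given element shares one elemental network, the same trained parameters transfer between molecules of different size, which is what makes the n-alkane and tripeptide simulations tractable.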

Table S3
Architectures of the elemental neural networks used in the final composite ML models. Given are the number of nodes in every hidden layer. The input dimensions are the lengths of the ACSF input vectors (see Table S6), output dimension is one in all cases. For a detailed discussion of the atom-centered symmetry functions used to describe the chemical environments in the HDNNPs and the dipole model, see Section 7.

Dipole Moment Model
The NN dipole models were implemented in Python 26 using the numpy 27 and theano 28 packages. The chemical environments of the individual atoms were described with the same ACSFs as for the HDNNPs (see Section 7). Weights were once again initialized using the above procedure. Unlike in the HDNNPs, hyperbolic tangent nonlinearities were used in the hidden layers.
For all systems, training was carried out for 10 000 epochs using the ADAM optimizer 29 with parameter settings of ε = 10⁻⁸, β₁ = 0.9 and β₂ = 0.999. Individual learning rates and L2 regularization strengths are given in Table S4.
Once again, the final models were determined using cross validation combined with an early stopping schedule (10 percent of data as validation data). The final NN architectures can be found in Table S3.
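How the atomic network outputs are combined into a molecular dipole is not spelled out in this section; one common construction is to interpret them as environment-dependent atomic charges q_i and form μ = Σ_i q_i r_i. The sketch below assumes that construction, and the neutrality correction is likewise an assumption.

```python
import numpy as np

def tanh_nn(x, weights, biases):
    """Atomic network with tanh hidden layers and a linear output."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.tanh(x @ W + b)
    return (x @ weights[-1] + biases[-1]).item()

def molecular_dipole(acsfs, positions, nets, elements):
    """Dipole from environment-dependent atomic charges: mu = sum_i q_i r_i.

    `acsfs`: per-atom ACSF vectors, `positions`: (n_atoms, 3) coordinates,
    `nets`: dict mapping element -> (weights, biases). The charge-based
    construction and the neutrality shift are assumptions for illustration.
    """
    charges = np.array([tanh_nn(g, *nets[el]) for g, el in zip(acsfs, elements)])
    charges -= charges.mean()  # enforce overall neutrality (assumption)
    return charges @ positions
```

A charge-based model of this kind keeps the dipole rotationally covariant automatically, since the learned quantities are scalars and the geometric information enters only through the atomic positions.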

Static Vibrational Frequencies and IR intensities
In order to gain further insights into the accuracy of the developed ML models, the static IR spectra predicted by these models were compared to the corresponding electronic structure spectra.
To this end, finite difference computations were used to obtain the Hessians and dipole moment derivatives necessary to calculate the normal mode frequencies and IR intensities. Prior to these computations, all molecular structures were optimized using the respective electronic structure methods and ML models. The deviations between the static ML and electronic structure spectra with respect to vibrational frequencies and absorption intensities can be found in Table S5. In the case of C₆₉H₁₄₀, no static electronic structure spectrum could be obtained due to the large number and prohibitive computational cost of the individual finite difference calculations.
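The Hessian part of this procedure can be sketched with a naive central-difference scheme on a generic energy function. This is illustrative only: the frequencies come out in the arbitrary units of the inputs, and the actual computations additionally require dipole derivatives for the IR intensities.

```python
import numpy as np

def finite_difference_hessian(coords, energy_fn, h=1e-3):
    """Hessian of a potential by central finite differences of the energy."""
    x = coords.ravel().astype(float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e = np.zeros(n); e[i] = h
            f = np.zeros(n); f[j] = h
            H[i, j] = (energy_fn(x + e + f) - energy_fn(x + e - f)
                       - energy_fn(x - e + f) + energy_fn(x - e - f)) / (4 * h * h)
    return H

def normal_mode_frequencies(H, masses):
    """Harmonic frequencies from the mass-weighted Hessian (arbitrary units).

    `masses` holds one mass per atom; each is repeated over the Cartesian
    components of that atom.
    """
    m = np.repeat(masses, len(H) // len(masses))
    Hmw = H / np.sqrt(np.outer(m, m))
    eigvals = np.linalg.eigvalsh(Hmw)
    return np.sqrt(np.abs(eigvals))
```

Using molecular forces instead of bare energies (as the displacement counts in Table S5 assume) reduces the number of required evaluations, since one force calculation yields an entire row of the Hessian.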

Table S5
Comparison of the static vibrational frequencies and IR intensities obtained with finite difference electronic structure computations and the respective ML models. In addition to the MAEs, the deviation of each property is also given as a percentage of the overall range of the observed values. Column three shows the number of displacements required to compute the static electronic structure spectra if molecular forces are utilized, while column four contains the number of configurations used to construct the corresponding ML models. The ratio between these two quantities is given in column five.

In addition to a comparison of the static spectra, Table S5 also gives the ratio between the number of reference samples used to construct the different ML models and the number of electronic structure finite difference computations. As can be seen, with a growing number of atoms contained

Atom-Centered Symmetry Functions
The local chemical environment of the different atoms in methanol and the n-alkanes is characterized via radial symmetry functions of the type

G_i^{rad} = \sum_{j} e^{-\eta (R_{ij} - R_s)^2} \, f_c(R_{ij})    (2)

and angular symmetry functions

G_i^{ang} = 2^{1-\zeta} \sum_{j,k \neq i} (1 + \lambda \cos\theta_{ijk})^{\zeta} \, e^{-\eta (R_{ij}^2 + R_{ik}^2 + R_{jk}^2)} \, f_c(R_{ij}) \, f_c(R_{ik}) \, f_c(R_{jk}).    (3)

R_{ij} is the distance between atoms i and j (analogously for atom k) and R_s is the offset of the Gaussian function. η, ζ and λ are parameters which determine the overall shape of the symmetry functions. f_c is a cutoff function introduced to limit the description of the local environment to the chemically relevant regions and is defined as

f_c(R) = \begin{cases} \frac{1}{2} \left[ \cos\!\left( \frac{\pi R}{R_c} \right) + 1 \right] & \text{for } R \le R_c \\ 0 & \text{for } R > R_c \end{cases}    (4)

with R_c as the cutoff radius. For a more detailed discussion of the different symmetry functions, see Reference 30. In the present work, cutoff radii of 6.35 Å, 4.00 Å and 5.00 Å were used for methanol, the n-alkanes and the protonated tripeptide, respectively.
In the case of the protonated alanine tripeptide, a slightly modified version of Equation 3 was used to describe the angular atomic environment:

G_i^{ang,mod} = 2^{1-\zeta} \sum_{j,k \neq i} (1 + \lambda \cos\theta_{ijk})^{\zeta} \, e^{-\eta (R_{ij}^2 + R_{ik}^2)} \, f_c(R_{ij}) \, f_c(R_{ik}),    (5)

which omits the terms depending on the distance R_{jk}. The overall composition of the symmetry functions for the different molecular systems can be found in Table S6. The individual parameters of the radial and angular symmetry functions used to describe the methanol molecule are given in Tables S7 to S12, those for the n-alkanes can be found in Tables S13 to S16 and the parameters for the tripeptide functions are given in Tables S17 to S24. HDNNPs and dipole models use exactly the same symmetry functions.
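For illustration, the radial symmetry function with a cosine-type cutoff can be written as a minimal NumPy sketch (not the production implementation):

```python
import numpy as np

def cutoff(r, r_c):
    """Cosine cutoff: 0.5 [cos(pi R / R_c) + 1] for R <= R_c, zero beyond."""
    return np.where(r <= r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)

def radial_acsf(distances, eta, r_s, r_c):
    """Radial ACSF: sum_j exp(-eta (R_ij - R_s)^2) f_c(R_ij) over neighbours j.

    `distances`: distances R_ij from the central atom i to its neighbours.
    """
    r = np.asarray(distances, dtype=float)
    return np.sum(np.exp(-eta * (r - r_s) ** 2) * cutoff(r, r_c))
```

Because the cutoff function and its first derivative vanish smoothly at R_c, atoms entering or leaving the cutoff sphere during dynamics do not introduce discontinuities into the predicted energies or forces.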