DOI:
10.1039/C5SC04786B
(Edge Article)
Chem. Sci., 2016,
7, 51395147
Machine learning exciton dynamics†
Received
10th December 2015
, Accepted 1st April 2016
First published on 1st April 2016
Obtaining the exciton dynamics of large photosynthetic complexes by using mixed quantum mechanics/molecular mechanics (QM/MM) is computationally demanding. We propose a machine learning technique, multilayer perceptrons, as a tool to reduce the time required to compute excited state energies. With this approach we predict timedependent density functional theory (TDDFT) excited state energies of bacteriochlorophylls in the Fenna–Matthews–Olson (FMO) complex. Additionally we compute spectral densities and exciton populations from the predictions. Different methods to determine multilayer perceptron training sets are introduced, leading to several initial data selections. In addition, we compute spectral densities and exciton populations. Once multilayer perceptrons are trained, predicting excited state energies was found to be significantly faster than the corresponding QM/MM calculations. We showed that multilayer perceptrons can successfully reproduce the energies of QM/MM calculations to a high degree of accuracy with prediction errors contained within 0.01 eV (0.5%). Spectral densities and exciton dynamics are also in agreement with the TDDFT results. The acceleration and accurate prediction of dynamics strongly encourage the combination of machine learning techniques with ab initio methods.
1 Introduction
Studying the exciton dynamics of large photosynthetic complexes, such as the Fenna–Matthews–Olson (FMO) and lightharvesting II (LHII) complexes, has been a topic of much theoretical and experimental interest in recent years.^{1–14} The theoretical community has focused on employing and developing reduced models to understand and describe the dynamics of these complexes.^{2–7,13–30} These models rely on the knowledge of a system Hamiltonian for the interacting chromophores as well as spectral densities to describe the coupling of chromophores to their environments (protein, solvent).^{31} Computing a system Hamiltonian or spectral density is an arduous computational task due to the large number of degrees of freedom in these complexes. The most detailed approaches used to obtain these quantities have been mixed quantum mechanics/molecular mechanics (QM/MM) or semiclassical simulations.^{32,33} In particular, one QM/MM approach which has become popular in recent years consists in propagating the nuclei in the electronic ground state of the photosynthetic complex.^{9,10,34,35} This approximation ignores the change in electronic structure due to excitation of the chromophores. Subsequently, for a subset of time frames, excited state energies for the chromophores are computed using a quantum method such as timedependent density functional theory (TDDFT).^{10} The energy trajectories are then employed to extract systembath correlation functions and finally spectral densities to use in exciton dynamics.
The downside of this approach is the large computational cost. Long molecular dynamics (MD) equilibration times of several tens of nanoseconds are required.^{36,37} The typical computational scaling of MD codes with the system size N is (NlogN).^{38} In contrast, TDDFT calculations scale as (N^{2}).^{39} Very often calculations need to be repeated for identical chromophores in similar environments to account for the effect of small variations. For instance in the case of a singlepoint mutation typically one would need to rerun the entire set of simulations.
In this work we propose an alternative route: using multilayer perceptrons, a special class of neural networks, to predict the excited state along a MD trajectory. Such approaches typically scale as (N) and were found to perform significantly faster than TDDFT approaches. As a test system we consider the Fenna–Matthews–Olson (FMO) complex of P. aestuarii. Neural networks have previously been used to predict potentialenergy surfaces.^{40} We use multilayer perceptrons as fully connected neural networks to predict the values of the first singlet excited state for the chromophores. We train the neural networks on the excited state energies obtained from QM/MM calculations. Several sampling methods are used to select the training data for the neural networks. In particular we tested a sampling method based on correlations of nuclear positions to improve on the spectral density predictions. Once trained, the neural networks are employed to make excited state energy predictions. Then one can build a Hamiltonian from the predictions and compute the exciton dynamics.
With optimal neural network training and 12 trained neural networks per BChl we predicted excited state energies with errors contained to 0.01 eV (0.5%) from the neural network ensemble average. Further, with neural networks trained on data based on correlation sampling we correctly predict the shape of the spectral density and observe an error which is squared with respect to the excited state prediction error. This demonstrates the power of machine learning in chemistry, as has also been found in recent work where machine learning was employed to extract other chemical properties.^{41–44}
2 Methods and computational details
2.1 Ground state QM/MM simulations
A semiclassical description of the FMO complex was obtained by combining ground state MD simulations with TDDFT calculations of the first singlet excited state, known as the Q_{y} state,^{45} at given molecular conformations along the timedependent trajectories. The MD runs were carried out using the NAMD software package^{46} with the AMBER99 force field (ff99SB).^{47} The BChla parameters employed are reported in ref. 48. The Xray crystal structure of the FMO trimer in P. aestuarii (PDB 3EOJ,^{49} see Fig. 1) was chosen as initial configuration. The trimer was solvated using a TIP3P water box.^{50} The minimum distance between the crystal structure and edges of the solvent box was taken to be 15 Å.^{37,51} The charge was neutralized by adding sodium ions. The total number of atoms in the system was 141624. The simulation was equilibrated for 4 ns and the production run was 40 ps long with a 2 fs timestep. Electrostatic interactions were calculated with the particlemesh Ewald method. Shake constraints were used for all bonds containing hydrogen. Simulations were carried out at 300 K and 1 atm using a Langevin thermostat and a Langevin piston as implemented in NAMD.

 Fig. 1 Crystal structure of the FMO complex in P. aestuarii (PDB 3EOJ).^{49} (A) 3EOJ trimer crystal structure. (B) BChls' geometric arrangement in monomer A (residues 360 to 367, corresponding to sites 1 to 8). Hydrogens are not shown in this representation.  
The timedependent Q_{y} excited ground state gaps of the BChla molecules were obtained using TDDFT with the PBE0 functional,^{52} within the Tamm–Dancoff Approximation (TDA),^{53} using QChem.^{54} We employed the 321G basis set due to the high computational cost of these simulations. The Q_{y} excited state was taken to be the one with the largest oscillator strength and the orientation of the transition dipole was verified using the same methodology as in ref. 37 and 55. Excited state energy distributions are shown in the ESI (see Section S.1.4†) and trajectories can be downloaded from ref. 56. Excited state energies were computed at every 4 fs of the production run. The number of TDDFT calculations performed initially is based on the time scale and time step required to reproduce the dynamics accurately. Some values were excluded based on the oscillator strength/angle criterion/failed convergence. The excluded values were at most 2.15% of the full trajectory.
2.2 Machine learning: neural networks training
2.2.1 Input data representation.
Multilayer perceptrons (neural networks) were used to predict quantum mechanical excited state energies of BChls in the FMO complex from the MD classical coordinate trajectory. Neural networks were trained in a supervised training scheme using the back propagation algorithm.^{57} Excited state energies from the TDDFT calculations described previously were provided to the neural network as targets. BChl conformations were represented by Coulomb matrices,^{43} which were successfully used to predict excited state energies of various molecules with one neural network.^{58} Both, input and target feature distributions, were rescaled to a zero mean and a unitary standard deviation prior to neural network training.^{59}
By using Coulomb matrices as input features, neural networks can be trained on a representation of BChls which is translation and rotation invariant. Coulomb matrices are particularly suitable to describe BChls in the FMO complex as these molecules do not undergo large conformational changes within time scales of several tens of picoseconds.^{36,37,60} Coulomb matrices were adapted to account for external charges within and around the represented BChl. The electrostatic influence of particles in the environment N was described by adding additional Coulomb potential terms to the corresponding Coulomb matrix entries (see eqn (1)):

 (1) 
Partial charges Z_{i} of atoms in the system were taken from the system topology (Amber 99SB force field,^{47} and ref. 48). Studies have shown that the tails of the BChls have little influence on the Q_{y} excited state energies.^{37,61} Thus, instead of representing the entire BChl in a Coulomb matrix, the phytyl tail was neglected and only the 90 atoms closest to the magnesium in the BChls were represented in Coulomb matrices to reduce their dimensionality. We included all external partial charges present in the system to generate Coulomb matrices.
2.2.2 Neural network architecture, choice of BChl molecule and overfitting.
We chose to use multilayer perceptrons (neural networks) with logistic activation functions and two hidden layers. This setup has been shown to perform particularly well for supervised regression problems.^{62} Optimal neural network hyperparameters were identified from a grid search. Both the learning rate and the number of neurons in the first and second hidden layer were varied to find the lowest deviations between predictions and target data.
Instead of performing the grid search on each BChl in the FMO complex individually, only the most representative BChl was used to determine optimal neural network hyperparameters to reduce the computational cost. We identified this BChl in terms of shared Coulomb matrix space. Coulomb matrices of all eight BChls were clustered and compared. We found that site 3 shares the most Coulomb matrix space with all other sites (see ESI Section S.1.1†).
From the grid search, we found that a learning rate of 10^{−4} with 204 neurons in the first hidden layer and 192 neurons in the second hidden layer results in the smallest average absolute deviation of predicted and target excited state energies.
Target feature overfitting was avoided by using early stopping.^{63} For all training sessions a total of 4000 trajectory frames was assigned to the training set as a balance between information and computational cost. Neural networks were trained on Intel(R) Xeon(R) CPUs (X5650 @ 2.67 GHz). Training one neural network on four cores took about (23.9 ± 5.0) h. Training times for other investigated training set sizes ranging from 500 to 5000 frames are reported in the ESI (see Section S.1.2†).
2.2.3 Reducing training set redundancies through clustering: taxicab, Frobenius and “correlation” clustering.
We employed different methods to select Coulomb matrices and corresponding excited state energies for neural network training. In a first approach training set Coulomb matrices were drawn randomly from the entire trajectory.^{63} This led to excited state energy predictions with an average accuracy of 13 meV.
However, as BChl conformations in the data set are not uniformly distributed a training set consisting of randomly drawn Coulomb matrices likely contains redundant information. MD simulations were carried out in the NPT ensemble with constant temperature and pressure. Thus, the BChl conformations are sampled from a Boltzmann distribution.^{64} To avoid selecting many similar conformations and thus similar Coulomb matrices we performed a cluster analysis on all Coulomb matrices of the entire trajectory to determine the most distinct Coulomb matrices. We implemented a Coulomb matrix cluster algorithm following the principles of the gromos method.^{65}
Distances between Coulomb matrices were measured using pnorms. Two different metrics were applied: p = 1 (taxicab norm) and p = 2 (Frobenius norm). Both clustering approaches resulted in more accurate predictions of excited state energies with accuracies of 9 meV but the prediction of exciton dynamics remained quite inaccurate (see Section 3.2).
To improve the prediction of exciton dynamics, we developed a clustering method based on coordinate correlations in the classical MD trajectory. We will refer to this approach as the “correlation” clustering method. The Q_{y} state in BChls is mostly distributed along one of the diagonals which connects opposite nitrogen atoms.^{37} Training set frames were thus selected based on high correlations in the nitrogen rootmean square deviation (RMSD). In particular, for the nth BChl we sampled from

C^{RMSD}_{n}(t)^{2} = 〈D^{nitrogen}_{n}(t)D^{nitrogen}_{n}(0)〉^{2}.  (2) 
until 4000 frames with the largest RMSD correlation were selected. Here, refers to the rootmean square difference in position of the four nitrogen atoms in the nth BChl at a given time t with respect to their position in the energy minimized crystal structure at time t = 0. This sampling led to a more accurate prediction of the spectral density (see Section 3.2).
2.3 Exciton dynamics and spectral densities
To further compare the predicted excited state energy trajectories with the TDDFT trajectories we computed the exciton dynamics in the FMO complex using two different methods. The first is a stochastic integration of the Schrödinger equation as used in previous work.^{10} The second method is the Markov Redfield master equation.^{66} Both of the methods are Markovian but the first relies only on the excited state energy trajectories while the latter also depends on the spectral density. Here we focus on the sensitivity of these methods to the changes related to using neural networks rather than to more subtle questions on dynamics such as those addressed by the comparison of Markovian with non Markovian methods (e.g. the hierarchy equation of motion approach).^{2}
Finally the spectral density j(ω), as used in the Redfield equations of motion (see eqn (3)), is obtained by normalizing the Fourier transform of the twotime correlation function as we discussed in previous work.^{31}

 (3) 
The superscript “harm” refers to the harmonic prefactor which is needed to connect the QM/MM results to the open quantum system approach. Following the notation of previous work,^{31}C^{cl}(t) denotes the classical correlation function.
3 Results and discussion
3.1 Excited state energy prediction using neural networks
3.1.1 Acceleration of excited state energy computations with neural networks.
Intel(R) Xeon(R) CPUs (X5650 @ 2.67 GHz) were used to train neural networks and predict excited state energies. A total of 12 neural networks was trained for each of the eight BChls in monomer A of the FMO complex. Predictions of each of the 12 neural networks per BChl were averaged in a neural network ensemble averaging approach to obtain a more accurate prediction for the excited state energy trajectory. The spread of predictions of individual neural networks is given in the ESI (see Section S.1.3†).
Training sets were generated with all four training set selection methods (see Section 2.2.3). Predicting Q_{y} excited state energies for the entire trajectory (10^{4} frames) for one BChl with one neural network took on average (3.9 ± 0.8) s on one core. In contrast, the quantum chemistry calculations using the TDDFT (PBE0/321G) model chemistry required approximately 60000 h for the entire trajectory on one core. Target feature generation is only necessary if trained neural networks are not available and independent of the number of frames to be predicted, thus reducing the reported costs if longer trajectories are predicted. Required calculation times to compute excited state energy trajectories for each of the eight BChls in the FMO complex are reported in Table 1. While two cores of an Intel(R) Xeon(R) CPU (X5650 @ 2.67 GHz) were used for TDDFT calculations and neural networks were trained on four cores the recorded calculation times were multiplied by the number of used cores for comparison reasons.
Table 1 Time required to compute excited state energies (10000 frames) for all bacteriochlorophylls (BChls) for TDDFT (PBE0/321G) and neural network (NN) predictions from correlation clustered Coulomb matrices. Reported times include neural network training (t_{train}) on 4000 frames with input (t^{input}_{Coul}) and target feature (t^{target}_{E}) generation, excited state calculations/predictions (t_{Calc}) and the total time (t_{tot}). If trained neural networks are available, only Coulomb matrices need to be calculated for neural network predictions, reducing the required time to 48 h. Reported times correspond to training a total of 12 neural networks independently to obtain ensemble averaged excited state energies. All reported times refer to calculations on a single core of an Intel(R) Xeon(R) CPU (X5650 @ 2.67 GHz)
Method 
Training [h] 
Calculation [h] 
Total [h] 
t
^{input}_{Coul}

t
^{target}_{E}

t
_{train}

t
_{Calc}

t
_{tot}

PBE0/321G 
— 
— 
— 
480000 
480000 
NN_{Corr} 
48 
192000 
9178 
<0.1 
201226 
If trained neural networks are available, excited state energies of given BChl conformations can be predicted directly from Coulomb matrices representing the BChl conformations. Thus, to obtain the excited state energies of an entire trajectory consisting of 10000 frames, Coulomb matrices need to be calculated first. With a calculation time of 2.19 s per Coulomb matrix on one Intel(R) Xeon(R) CPU (X5650 @ 2.67 GHz) core, about 6 h are needed to compute all 10000 Coulomb matrices. With excited state energy predictions requiring less than a minute, excited state energy trajectories can be obtained from neural networks about four orders of magnitude faster compared to TDDFT calculations.
3.1.2 Accuracy of neural network predictions.
As can be seen in Fig. 2 neural network predictions agree well with the TDDFT data. Predictions from the average over 12 networks deviate from TDDFT values by ∼0.3 meV regardless of the site and the input selection method. Ensemble averaging was found to improve the prediction accuracy for all sites (see Table S.2†). When considering excited state energies predicted from a single network, we observed a deviation of σ^{single}_{s,random} < 14 meV for all sites s when the neural network was trained on randomly drawn Coulomb matrices. This prediction error can be decreased to σ^{single}_{s,taxicab} < 9 meV by selecting the training set based on the Coulomb matrix space clustering taxicab method (see Section 2.2.3). The uncertainty of neural network predictions therefore corresponds to roughly 14% of the typical FWHM (64 meV) of the sites. Frobenius clustering showed similar deviations. In contrast, predictions from neural networks trained on correlation clustered Coulomb matrices show a slightly higher deviation of σ^{single}_{s,correlation} < 15 meV on average. The obtained prediction errors are about one order of magnitude smaller than those in ref. 58. This difference is due to the fact that we predict excited state energies of the same molecule in different timedependent geometries while in ref. 58 the neural network was trained to predict the excited state energies of many different molecules.

 Fig. 2 Mean and standard deviation of Q_{y} excited state energy distributions for all eight sites obtained from TDDFT calculations (PBE0/321G) and compared to neural network predictions. Presented distributions were obtained from single neural network predictions. Neural networks were trained on Coulomb matrices selected from the classical MD trajectory by the indicated selection method. Error bars indicate the width of the excited state energy distribution.  
3.1.3 Crosspredictions: predicting excited state energies for other bacteriochlorophylls.
Since all BChla molecules in the FMO complex consist of the same atoms and show similar geometrical conformations, we also used neural networks trained on one BChl to predict excited state energies of other BChls in the same monomer. This enabled us to understand how well the trained network can adapt to changes in the environment from changes in the Coulomb matrices. We observed that for any clustering method, the prediction error is about two times larger when performing this type of crossprediction, see Fig. 3. Nonetheless, as we see in panel A, the largest observed average absolute deviation is still below 1.14% (corresponding to 25 meV).

 Fig. 3 Relative absolute deviations of predicted excited state energies from TDDFT excited state energies. Neural networks trained on one particular site (indicated by “Network”) were used to predict excited state energies of another site (indicated by “Target”). Panel (A) shows the relative absolute deviation ^{rel}_{BChl} of predicted excited state energies from TDDFT excited state energies for each BChl, , in percent. Panel (B) shows the deviation σ^{mean}_{BChl} of the mean of the predicted excited state energies from the mean of the TDDFT calculated excited state energies, σ^{BChl}_{mean} = 〈ε^{NN}_{BChl}〉 − 〈ε^{TDDFT}_{BChl}〉/〈ε^{TDDFT}_{BChl}〉 in perthousand.  
3.1.4 Predicting excited state energies of bacteriochlorophylls in other monomers.
All neural networks were trained on BChla molecules in monomer A. These neural networks were then used to predict excited state energies of BChls in the other two FMO monomers. Each neural network predicted the excited state energies of the BChl corresponding to the one on which it was trained (i.e. a neural network trained on site 1 in monomer A predicted site 1 in monomer B and C).
Due to the fact that the FMO complex is a homotrimer, similar BChl conformations should be sampled during the MD simulation in each monomer. Thus, excited state energy averages are expected to be identical for corresponding BChls of different monomers, provided that the same phase space regions were covered in the simulation time. The results are presented in Fig. 4. In that figure, we show the timeaveraged excited state energy predicted from 12 independently trained neural networks for each monomer as well as the timeaveraged Q_{y} energies obtained from TDDFT for monomer A. The bars represent the spread of the distribution and not an error. The predicted distributions are narrower than the TDDFT distributions. This is probably due to the fact that frames corresponding to energies at the tails of the distribution are less sampled. Regarding the error on the other hand, the largest deviation encountered between mean values of excited state energy distributions is 3 meV.

 Fig. 4 Mean and standard deviation of excited state energy distributions for all eight sites obtained using TDDFT PBE0/321G (solid line with circle), for monomer A, and using neural network prediction with training on 40% of all Coulomb matrices of monomer A for all three monomers (dashed lines and triangle, square and star symbol for monomers A, B, C). Presented distributions were obtained from the ensemble averaged predictions of 12 neural networks. Error bars indicate the standard deviation of the obtained excited state energy distributions.  
We have found that trained neural networks can accurately predict TDDFT excited state energies. This allows for a large reduction of computational time. It is possible to train on a single BChl and predict excited state energies of other BChls in the same monomer or in different monomers.
3.2 Spectral densities and exciton dynamics with neural networks
We then used the calculated excited state energy trajectories of all BChls to obtain information about interactions of the BChls with their environment by computing spectral densities. In addition, we built a Hamiltonian to extract the exciton dynamics in the system.
The spectral density J^{harm}(ω) (see eqn (3)) was computed for all eight sites in the FMO complex from the TDDFT excited state energies and neural network predicted excited state energies.^{31} Spectral densities for each site were averaged over all BChls to obtain an averaged spectral density J_{ave}(ω). To minimize spurious effects in the Fourier transform, we multiplied the correlation function by a Gaussian of σ^{2} = 0.09t_{max}^{2} with t_{max} = 1600 fs.^{31} The Gaussian is normalized to have unitary area in frequency domain so that in frequency domain this corresponds to a convolution with a Gaussian with a FWHM of 26 cm^{−1}.
In Fig. 5 we show the comparison to our neural network prediction with training and prediction on the same site and the various Coulomb matrix selection methods. We found that predicted spectral densities all have a shape which resembles the overall shape of the TDDFT spectral density. However, the height and accurate position of the peaks in the spectral density is most accurately predicted using correlation clustering.

 Fig. 5 Spectral density averages . The spectral densities were computed from excited state energy trajectories obtained from TDDFT calculations (PBE0/321G) and compared to spectral densities from neural network predicted excited state energy trajectories. Neural networks were trained on the bacteriochlorophyll they predicted with the indicated Coulomb matrix selection method. The correlation method (see Section 2.2.3) gives the best prediction.  
Spectral densities for each BChl, calculated from TDDFT excited state energies and excited state energies predicted from neural networks trained on all input selection methods are plotted in the ESI (see Fig. S.11 in Section S.1.5†).
Average spectral densities were used to calculate the reorganization energy . Comparisons of reorganization energies are reported in Table 2. We observe that the smallest deviation between neural network predicted results and TDDFT results occurs for neural networks trained on correlation clustered Coulomb matrices.
Table 2 We report the percentage deviation of neural network predicted reorganization energies λ_{NN,β} from TDDFT (PBE0/321G) calculated reorganization energies λ_{TDDFT} as σ_{λ} = λ_{TDDFT} − λ_{NN,β}/λ_{TDDFT}, where β indicates the method used for Coulomb matrix selections. Results are given in percent (%)
Method (β) 
Random 
Frobenius 
Taxicab 
Correlation 
σ
_{
λ
} [%] 
70.0 
54.9 
53.4 
46.6 
We also computed the population dynamics in the FMO complex monomer with a stochastic integration method.^{10} We averaged 4000 stochastic trajectories to obtain converged population dynamics. The initial excited site was chosen to be site 1 and the dynamics was propagated for all eight coupled sites at 300 K. The couplings of the Hamiltonian were taken from ref. 55 and 67. Results are shown in Fig. 6. We see that neural network predictions from neural networks trained on randomly drawn and correlation clustered Coulomb matrices predict the exciton dynamics in agreement with TDDFT calculations.

 Fig. 6 Population dynamics of the FMO complex calculated at 300 K with initial state in site 1. Here only two site populations are shown but dynamics was carried out for the 8 BChls. Excited state energy trajectories were obtained from TDDFT calculations (PBE0/321G) as well as neural networks trained on randomly drawn and correlation clustered Coulomb matrices.  
We also employed the Markovian Redfield method to compute the exciton dynamics in the FMO complex (see Section 2.3).^{66} In this case there is an explicit dependence of the exciton dynamics on the spectral density. The energies in the Hamiltonian were taken to be the averages from the TDDFT or neural network predicted excited state energy trajectories. The same couplings as for the stochastic integration method were used.
To investigate the importance of excited state energies and spectral densities on the Redfield exciton dynamics we calculated the exciton dynamics in two different ways. First, we computed the exciton dynamics with neural network predicted excited state energies and the average spectral density obtained from TDDFT calculations and then we used the neural network predicted energies as well as the neural network predicted spectral densities. Results are presented in Fig. 7 panel A, for predicted excited state energies and TDDFT spectral densities and panel B, for excited state energies and spectral densities predicted by neural networks. We initialized the dynamics with the excitation in BChl 1 and propagated the dynamics for all eight coupled sites at 300 K.

 Fig. 7 Time evolution of the exciton population for BChl 1 (red) and BChl 2 (blue) in the FMO complex calculated from excited state energy trajectories and average spectral densities using the Redfield method. The initial state is site 1 excited. Panel (A) shows the exciton dynamics for neural network predicted excited state energies using the same TDDFT calculated average spectral density in all cases. Panel (B) shows the exciton dynamics with both, excited state energy trajectories and harmonic average spectral densities predicted by neural networks trained with the indicated selection method.  
In panel A we see that given a constant spectral density the error on energies is small and does not strongly influence the exciton dynamics. The main role is played by the spectral density as can be seen in panel B. Here we see much larger differences depending on the neural network sampling method and we notice that the prediction of neural networks trained on correlation clustered Coulomb matrices agrees better with the TDDFT exciton dynamics. Deviations of neural network predicted exciton dynamics from TDDFT calculated exciton dynamics are listed in detail in the ESI (see Section S.1.5†). Further methods should be investigated to improve the prediction of the spectral density, the error on the energy should be reduced further by an order of magnitude to predict spectral densities with the optimal reorganization energy.
4 Conclusion
The computational study presented in this article showed that multilayer perceptrons (neural networks) can be used to successfully predict excited state energies of BChls in the FMO complex. Using different methods to select a training data set for neural network training from features in the classical MD trajectory alone we were able to generate neural networks which can predict TDDFT excited state energies with high accuracy (0.01 eV).
Furthermore, the neural networks can predict properties such as QM/MM derived spectral densities or exciton population dynamics. The prediction of excited state energies using the neural networks is about seven orders of magnitude faster than TDDFT. If we include training feature generations we still observe a speedup of about four orders of magnitude. Even if neural networks need to be trained first, excited state energies are obtained in less than half the time needed for TDDFT. Based on the observations we made on the FMO complex we recommend the following procedure to apply machine learning to predict excited state properties of other systems:
(1) Represent all molecules of interest and their chemical environment with Coulomb matrices.
(2) Obtain optimal neural network architectures from hyperparameter grid searches. In particular, it is important identify the most representative molecule in terms of the space which contains the features used for neural network predictions.
(3) Determine an optimal training set. Training sets for neural networks with optimal network architecture can be generated by selecting Coulomb matrices based on properties which are related to the desired quantum mechanical properties. We observed that selecting Coulomb matrices which represent Coulomb matrix space clusters improves excited state energy predictions. However, these do not necessarily work well for dynamics. Investigate on the role of higher sampling of the tails of the energy distribution when selecting the training set.
(4) Make predictions. To predict spectral densities and exciton dynamics with high accuracy, Coulomb matrices should be selected for the training set if they reveal high excited state energy correlations. We found that nitrogen RMSD correlations in the BChls are a good indicator for excited state energy correlations. Thus, we selected Coulomb matrices based on high nitrogen RMSD correlations. Of course this might be complicated for some molecules with very delocalized excited states.
In conclusion, this approach provides a gigantic speedup to ground state QM/MM. From a neural network trained on a single BChl molecule, we can predict the excited states of 23 other molecules in the system at very low additional computational costs. This will be particularly useful to speed up or simply to enable the simulation of larger, more complex and challenging lightharvesting systems. Further, it will be helpful to study, for instance, the role of small changes in the environment on the exciton dynamics. Future questions we would like to address include the possibility of extending the prediction to other temperatures using for example multitarget machine learning.
Acknowledgements
S. V. and A. A.G. acknowledge support from the Center for Excitonics and Energy Frontier Research Center funded by the U.S. Department of Energy under award DESC0001088. Computations were run on the Harvard University's Odyssey cluster, supported by the Research Computing Group of the FAS Division of Science.
References
 J. Adolphs and T. Renger, Biophys. J., 2006, 91, 2778–2797 CrossRef CAS PubMed .
 A. Ishizaki and G. R. Fleming, Proc. Natl. Acad. Sci. U. S. A., 2009, 106, 17255–17260 CrossRef PubMed .
 C. Kreisbeck, T. Kramer, M. Rodriguez and B. Hein, J. Chem. Theory Comput., 2011, 7, 2166–2174 CrossRef CAS PubMed .
 M. Sarovar, G. R. Fleming and K. B. Wahley, Nat. Phys., 2010, 6, 462–467 CrossRef CAS .
 J. Moix, J. Wu, P. Huo, D. Coker and J. Cao, J. Phys. Chem. Lett., 2011, 2, 3045–3052 CrossRef CAS .
 M. Mohseni, P. Rebentrost, S. Lloyd and A. AspuruGuzik, J. Chem. Phys., 2008, 129, 174106 CrossRef PubMed .
 M. B. Plenio and S. F. Huelga, New J. Phys., 2008, 10, 113019 CrossRef .
 B. P. Krueger, G. D. Scholes and G. R. Fleming, J. Phys. Chem. B, 1998, 102, 5378–5386 CrossRef CAS .
 C. Olbrich and U. Kleinekathöfer, J. Phys. Chem. B, 2010, 114, 12427–12437 CrossRef CAS PubMed .
 S. Shim, P. Rebentrost, S. Valleau and A. AspuruGuzik, Biophys. J., 2012, 102, 649–660 CrossRef CAS PubMed .
 G. D. Scholes, C. Curutchet, B. Mennucci, R. Cammi and J. Tomasi, J. Phys. Chem. B, 2007, 111, 6978–6982 CrossRef CAS PubMed .
 P. Huo and D. F. Coker, J. Chem. Phys., 2010, 133, 184108 CrossRef CAS PubMed .
 J. Cao and R. J. Silbey, J. Phys. Chem. A, 2009, 113, 13825–13838 CrossRef CAS PubMed .
 J. Wu, F. Liu, Y. Shen, J. Cao and R. J. Silbey, New J. Phys., 2010, 12, 105012 CrossRef .
 S. Jang, M. D. Newton and R. J. Silbey, J. Phys. Chem. B, 2007, 111, 6807–6814 CrossRef CAS PubMed .
 P. Rebentrost, M. Mohseni, I. Kassal, S. Lloyd and A. AspuruGuzik, New J. Phys., 2009, 11, 033003 CrossRef .
 A. Ishizaki and G. R. Fleming, J. Phys. Chem. B, 2011, 115, 6227–6233 CrossRef CAS PubMed .
 D. Abramavicius and S. Mukamel, J. Chem. Phys., 2010, 133, 064510 CrossRef PubMed .
 N. Skochdopole and D. A. Mazziotti, J. Phys. Chem. Lett., 2011, 2, 2989–2993 CrossRef CAS .
 G. Ritschel, J. Roden, W. T. Strunz, A. AspuruGuzik and A. Eisfeld, J. Phys. Chem. Lett., 2011, 2, 2912–2917 CrossRef CAS .
 P. Rebentrost and A. AspuruGuzik, J. Chem. Phys., 2011, 134, 101103 CrossRef PubMed .
 N. Singh and P. Brumer, Faraday Discuss., 2011, 153, 41–50 RSC .

N. Singh and P. Brumer, arXiv:1106.5911v1, 2011.
 L. A. Pachón and P. Brumer, Phys. Chem. Chem. Phys., 2012, 14, 10094–10108 RSC .
 S. M. Vlaming and R. J. Silbey, J. Chem. Phys., 2012, 136, 055102 CrossRef PubMed .
 F. Caruso, S. K. Saikin, E. Solano, S. Huelga, A. AspuruGuzik and M. B. Plenio, Phys. Rev. B: Condens. Matter Mater. Phys., 2012, 85, 125424 CrossRef .

M. Mohseni, A. Shabani, S. Lloyd and H. Rabitz, arXiv:1104.4812v1, 2011.
 J. Zhu, S. Kais, P. Rebentrost and A. AspuruGuzik, J. Phys. Chem. B, 2011, 115, 1531–1537 CrossRef CAS PubMed .
 I. de Vega, J. Phys. B: At., Mol. Opt. Phys., 2011, 44, 245501 CrossRef .
 J. Roden, A. Eisfeld, W. Wolff and W. T. Strunz, Phys. Rev. Lett., 2009, 103, 058301 CrossRef PubMed .
 S. Valleau, A. Eisfeld and A. AspuruGuzik, J. Chem. Phys., 2012, 137, 224103 CrossRef PubMed .
 I. P. Mercer, I. R. Gould and D. R. Klug, J. Phys. Chem. B, 1999, 103, 7720–7727 CrossRef CAS .
 G. Tao and W. H. Miller, J. Phys. Chem. Lett., 2010, 1, 891–894 CrossRef CAS .
 C. Olbrich, T. L. C. Jansen, J. Liebers, M. Aghtar, J. Strümpfer, K. Schulten, J. Knoester and U. Kleinekathöfer, J. Phys. Chem. B, 2011, 115, 8609–8621 CrossRef CAS PubMed .
 A. Damjanović, I. Kosztin, U. Kleinekathöfer and K. Schulten, Phys. Rev. E: Stat., Nonlinear, Soft Matter Phys., 2002, 65, 031919 CrossRef PubMed .
 C. Olbrich, J. Strümpfer, K. Schulten and U. Kleinekathöfer, J. Phys. Chem. Lett., 2011, 2, 1771–1776 CrossRef CAS PubMed .
 S. Jurinovich, C. Curutchet and B. Mennucci, ChemPhysChem, 2014, 15, 3194–3204 CrossRef CAS PubMed .
 Y. Shan, J. L. Klepeis, M. P. Eastwood, R. O. Dror and D. E. Shaw, J. Phys. Chem., 2005, 122, 054101 CrossRef PubMed .

D. Rappoport and J. Hutter, ExcitedState Properties and Dynamics, in Fundamentals of TimeDependent Density Functional Theory, Lecture Notes in Physics 837, Springer Berlin/Heidelberg, 2012, pp. 317–336 Search PubMed .
 J. Behler and M. Parrinello, Phys. Rev. Lett., 2007, 98, 146401 CrossRef PubMed .
 R. Ramakrishnan, M. Hartmann, E. Tapavicza and O. A. von Lilienfeld, J. Chem. Phys., 2015, 143, 084111 CrossRef PubMed .
 K. Hansen, F. Biegler, R. Ramakrishnan, W. Pronobis, O. A. von Lilienfeld, K.R. Müller and A. Tkatchenko, J. Phys. Chem. Lett., 2015, 6, 2326–2331 CrossRef CAS PubMed .
 M. Rupp, A. Tkatchenko, K.R. Müller and O. A. von Lilienfeld, Phys. Rev. Lett., 2012, 108, 058301 CrossRef PubMed .
 E. O. PyzerKnapp, K. Li and A. AspuruGuzik, Adv. Funct. Mater., 2015, 25, 6495–6502 CrossRef CAS .

M. Olivucci, Density Functional Methods for Excited States: Equilibrium Structure and Electronic Spectra, in Computational Photochemistry, Elsevier, 2005, pp. 93–128 Search PubMed .
 J. C. Phillips, R. Braun, W. Wang, J. Gumbart, E. Tajkhorshid, E. Villa, C. Chipot, R. D. Skeel, L. Kale and K. Schulten, J. Comput. Chem., 2005, 26, 1781–1802 CrossRef CAS PubMed .
 W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz, D. M. Ferguson, D. C. Spellmeyer, T. Fox, J. W. Caldwell and P. A. Kollman, J. Am. Chem. Soc., 1995, 117, 5179–5197 CrossRef CAS .
 M. Ceccarelli, P. Procacci and M. Marchi, J. Comput. Chem., 2003, 24, 129–142 CrossRef CAS PubMed .
 D. Tronrud, J. Wen, L. Gay and R. Blankenship, Photosynth. Res., 2009, 100, 79–87 CrossRef CAS PubMed .
 W. L. Jorgensen and J. D. Madura, J. Am. Chem. Soc., 1983, 1407–1413 CrossRef CAS .
 C. Olbrich, J. Strümpfer, K. Schulten and U. Kleinekathöfer, J. Phys. Chem. B, 2011, 115, 758–764 CrossRef CAS PubMed .
 J. P. Perdew, M. Ernzerhof and K. Burke, J. Chem. Phys., 1996, 105, 9982–9985 CrossRef CAS .
 J. C. Taylor, Phys. Rev., 1954, 95, 1313–1317 CrossRef .
 Y. Shao, Z. Gan, E. Epifanovsky, A. T. B. Gilbert, M. Wormit, J. Kussmann, A. W. Lange, A. Behn, J. Deng, X. Feng, D. Ghosh, M. Goldey, P. R. Horn, L. D. Jacobson, I. Kaliman, R. Z. Khaliullin, T. Kús, A. Landau, J. Liu, E. I. Proynov, Y. M. Rhee, R. M. Richard, M. A. Rohrdanz, R. P. Steele, E. J. Sundstrom, H. L. Woodcock III, P. M. Zimmerman, D. Zuev, B. Albrecht, E. Alguire, B. Austin, G. J. O. Beran, Y. A. Bernard, E. Berquist, K. Brandhorst, K. B. Bravaya, S. T. Brown, D. Casanova, C.M. Chang, Y. Chen, S. H. Chien, K. D. Closser, D. L. Crittenden, M. Diedenhofen, R. A. DiStasio Jr, H. Dop, A. D. Dutoi, R. G. Edgar, S. Fatehi, L. FustiMolnar, A. Ghysels, A. GolubevaZadorozhnaya, J. Gomes, M. W. D. HansonHeine, P. H. P. Harbach, A. W. Hauser, E. G. Hohenstein, Z. C. Holden, T.C. Jagau, H. Ji, B. Kaduk, K. Khistyaev, J. Kim, J. Kim, R. A. King, P. Klunzinger, D. Kosenkov, T. Kowalczyk, C. M. Krauter, K. U. Lao, A. Laurent, K. V. Lawler, S. V. Levchenko, C. Y. Lin, F. Liu, E. Livshits, R. C. Lochan, A. Luenser, P. Manohar, S. F. Manzer, S.P. Mao, N. Mardirossian, A. V. Marenich, S. A. Maurer, N. J. Mayhall, C. M. Oana, R. OlivaresAmaya, D. P. O'Neill, J. A. Parkhill, T. M. Perrine, R. Peverati, P. A. Pieniazek, A. Prociuk, D. R. Rehn, E. Rosta, N. J. Russ, N. Sergueev, S. M. Sharada, S. Sharmaa, D. W. Small, A. Sodt, T. Stein, D. Stück, Y.C. Su, A. J. W. Thom, T. Tsuchimochi, L. Vogt, O. Vydrov, T. Wang, M. A. Watson, J. Wenzel, A. White, C. F. Williams, V. Vanovschi, S. Yeganeh, S. R. Yost, Z.Q. You, I. Y. Zhang, X. Zhang, Y. Zhou, B. R. Brooks, G. K. L. Chan, D. M. Chipman, C. J. Cramer, W. A. Goddard III, M. S. Gordon, W. J. Hehre, A. Klamt, H. F. Schaefer III, M. W. Schmidt, C. D. Sherrill, D. G. Truhlar, A. Warshel, X. Xua, A. AspuruGuzik, R. Baer, A. T. Bell, N.
A. Besley, J.D. Chai, A. Dreuw, B. D. Dunietz, T. R. Furlani, S. R. Gwaltney, C.P. Hsu, Y. Jung, J. Kong, D. S. Lambrecht, W. Liang, C. Ochsenfeld, V. A. Rassolov, L. V. Slipchenko, J. E. Subotnik, T. Van Voorhis, J. M. Herbert, A. I. Krylov, P. M. W. Gill and M. HeadGordon, Mol. Phys., 2015, 113, 184–215 CrossRef CAS .
 S. Chandrasekaran, M. Aghtar, S. Valleau, A. AspuruGuzik and U. Kleinekathöfer, J. Phys. Chem. B, 2015, 119, 9995–10004 CrossRef CAS PubMed .

F. Häse, S. Valleau, E. PyzerKnapp and A. AspuruGuzik, Machine learning for exciton dynamics: Qy trajectories, figshare, 2015 Search PubMed.

D. E. Rumelhart, G. E. Hinton and R. J. Williams, Learning Internal Representations by Error Propagation, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, Cambridge, MA, USA, 1986, pp. 318–362 Search PubMed .
 G. Montavon, M. Rupp, V. Gobre, A. VazquezMayagoitia, K. Hansen, A. Tkatchenko, K.R. Müller and A. von Lilienfeld, New J. Phys., 2013, 15, 095003 CrossRef .
 J. Heaton, J. Mach. Learn. Res., 2015, 16, 1243–1247 Search PubMed .
 O. A. von Lilienfeld, R. Ramakrishnan, M. Rupp and A. Knoll, Int. J. Quantum Chem., 2015, 115, 1084–1093 CrossRef CAS .
 N. H. List, C. Curutchet, S. Knecht, B. Mennucci and J. Kongsted, J. Chem. Theory Comput., 2013, 9, 4928–4938 CrossRef CAS PubMed .
 M. Riedmiller, Comput. Stand. Interfac., 1994, 16, 265–278 CrossRef .
 L. Prechelt, Neural Network, 1998, 11, 761–767 CrossRef .

D. Frenkel and B. Smit, Free Energies and Phase Equilibria, in Understanding Molecular Simulation, Academic Press, 2002, pp. 165–199 Search PubMed .
 X. Daura, K. Gademann, B. Jaun, D. Seebach, W. F. van Gunsteren and A. E. Mark, Angew. Chem., Int. Ed., 1999, 111, 249–253 CrossRef .

H.P. Breuer and F. Petruccione, The Theory of Open Quantum Systems, Oxford University Press, New York, 2002 Search PubMed .
 M. Aghtar, J. Strümpfer, C. Olbrich, K. Schulten and U. Kleinekathöfer, J. Phys. Chem. B, 2013, 117, 7157–7163 CrossRef CAS PubMed .
Footnote 
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c5sc04786b 

This journal is © The Royal Society of Chemistry 2016 