Machine Learning for Quantum Dynamics: Deep Learning of Excitation Energy Transfer Properties

Understanding the relationship between the structure of light-harvesting systems and their excitation energy transfer properties is of fundamental importance in many applications, including the development of next generation photovoltaics. Natural light harvesting in photosynthesis shows remarkable excitation energy transfer properties, which suggests that pigment-protein complexes could serve as blueprints for the design of nature-inspired devices. Mechanistic insights into energy transport dynamics can be gained by leveraging numerically involved propagation schemes such as the hierarchical equations of motion (HEOM). Solving these equations, however, is computationally costly due to the adverse scaling with the number of pigments. Therefore, virtual high-throughput screening, which has become a powerful tool in material discovery, is less readily applicable to the search for novel excitonic devices. We propose the use of artificial neural networks to bypass the computational limitations of established techniques for exploring the structure-dynamics relation in excitonic systems. Once trained, our neural networks reduce computational costs by several orders of magnitude. Our predicted transfer times and transfer efficiencies exhibit similar or even higher accuracies than frequently used approximate methods such as secular Redfield theory.

The energy transport in light-harvesting complexes is determined by coupled pigments which are embedded in a protein scaffold. 1,2 The large number of degrees of freedom in the system renders ab initio calculations on the atomistic level impossible. Therefore, the exciton transfer dynamics is typically modeled with an effective Frenkel exciton Hamiltonian. 3,4 The exciton Hamiltonian for a system of N sites in the single-exciton manifold reads

H_{\mathrm{ex}} = \sum_{i=1}^{N} \varepsilon_i \, |i\rangle\langle i| + \sum_{i \neq j} V_{ij} \, |i\rangle\langle j| ,

where \varepsilon_i denotes the first excited state energy of the i-th pigment molecule and V_{ij} denotes the Coulomb coupling between excited states at the i-th and j-th molecule. We assume that the exciton system couples linearly to the vibrational environment of each pigment, which is modeled as a set of harmonic oscillators. The phonon-mode dependent interaction strength is captured by the spectral density

J_i(\omega) = \sum_{k} d_{i,k}^2 \, \delta(\omega - \omega_{i,k}) .

Here, d_{i,k} defines the coupling to the k-th phonon mode (with creation operator b^\dagger_{i,k}) of the i-th pigment with frequency \omega_{i,k}.
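The site-basis Hamiltonian above is straightforward to assemble numerically. The following sketch builds it from arrays of site energies and couplings; the dimer values in the example are illustrative placeholders, not parameters of any of the complexes studied here.

```python
import numpy as np

def frenkel_hamiltonian(energies, couplings):
    """Assemble the single-exciton Frenkel Hamiltonian in the site basis.

    energies  -- length-N array of site energies eps_i
    couplings -- N x N symmetric array of Coulomb couplings V_ij
                 (its diagonal is ignored)
    """
    V = np.asarray(couplings, dtype=float)
    H = np.diag(np.asarray(energies, dtype=float))
    H += V - np.diag(np.diag(V))  # add only the i != j coupling terms
    return H

# Illustrative dimer (numbers are placeholders, in cm^-1)
H = frenkel_hamiltonian([12400.0, 12500.0],
                        [[0.0, -87.7], [-87.7, 0.0]])
```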
In the first step of photosynthesis, energy is absorbed in the antenna pigments and subsequently transferred to the reaction center, in which photochemical reactions are triggered. Within a simple picture, this process can be modeled by energy transfer from an initially excited pigment (donor) to a target state (acceptor). We model energy trapping in the acceptor state |acceptor⟩ phenomenologically by introducing an anti-Hermitian part in the Hamiltonian,

H_{\mathrm{trap}} = -i \hbar \Gamma_{\mathrm{trap}} \, |\mathrm{acceptor}\rangle\langle \mathrm{acceptor}| ,

where Γ_trap defines the trapping rate. In a similar way, we model radiative or non-radiative decay to the electronic ground state as exciton losses,

H_{\mathrm{loss}} = -i \hbar \frac{\Gamma_{\mathrm{loss}}}{2} \sum_{i} |i\rangle\langle i| .

The inverse rate Γ_loss^{-1} defines the exciton lifetime. In this study we are interested in two different exciton propagation characteristics: the average transfer time 2

\langle t \rangle = \frac{2 \Gamma_{\mathrm{trap}}}{\eta} \int_0^{\infty} dt \, t \, \langle \mathrm{acceptor} | \rho(t) | \mathrm{acceptor} \rangle ,

and the overall efficiency

\eta = 2 \Gamma_{\mathrm{trap}} \int_0^{\infty} dt \, \langle \mathrm{acceptor} | \rho(t) | \mathrm{acceptor} \rangle ,

which corresponds to the accumulated trapped population during the transfer process. For numerical evaluations, we replace the upper integration limit by t_max, which is chosen such that the total population within the pigments has dropped below 0.0001.
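Given a sampled acceptor population trace, the two integrals above reduce to simple quadratures. A minimal sketch, assuming a uniform time grid and using an explicit trapezoidal rule; the exponential decay in the example is a toy population, not output of an actual dynamics calculation.

```python
import numpy as np

def trapz(y, x):
    """Trapezoidal rule (written out for NumPy-version independence)."""
    dx = np.diff(x)
    return float(np.sum(0.5 * dx * (y[1:] + y[:-1])))

def transfer_properties(t, pop_acceptor, gamma_trap):
    """Efficiency and average transfer time from the acceptor population
    trace <acceptor|rho(t)|acceptor>, with the upper integration limit
    replaced by t_max as described in the text."""
    eta = 2.0 * gamma_trap * trapz(pop_acceptor, t)
    t_avg = 2.0 * gamma_trap * trapz(t * pop_acceptor, t) / eta
    return eta, t_avg

# Toy example: exponentially decaying acceptor population
t = np.linspace(0.0, 50.0, 5001)
eta, t_avg = transfer_properties(t, np.exp(-t), gamma_trap=0.5)
```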
The exciton dynamics is expressed in terms of the reduced density matrix ρ(t), which can be obtained from standard open quantum system approaches. Here we employ the hierarchical equations of motion (HEOM) approach, which accounts for the reorganization process, [5][6][7][8] in which the vibrational coordinates rearrange to their new equilibrium position upon electronic transition from the ground to the excited potential energy surface. The major drawback of the HEOM approach is its adverse computational scaling, which arises from the need to propagate a complete hierarchy of auxiliary matrices. Therefore, we employ a high-performance implementation of HEOM integrated into the QMaster software package. [9][10][11] As demonstrated in previous publications, QMaster enables HEOM simulations for large systems comprising up to a hundred pigments, 12 as well as accurate calculations for highly structured spectral densities. 10,11,13,14 A computationally much cheaper formalism, the Redfield approach, can be derived with the assumption of weak coupling between the system and the bath in combination with a Markov approximation. 4,15 The secular approximation simplifies the equation even further and allows one to write the dynamics in the form of a Lindblad master equation. This drastically reduces the computational demand of this approach compared to exciton propagation under the HEOM, which explains the popularity of the secular Redfield equations. However, secular Redfield has been shown to underestimate the transfer times in certain light-harvesting complexes. 16
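The Lindblad form mentioned above can be propagated with a few lines of matrix algebra. The sketch below integrates a generic Lindblad master equation with a simple explicit Euler scheme for a two-site toy model; the operators, rates, and step size are illustrative assumptions, not the secular Redfield rates derived from a spectral density.

```python
import numpy as np

def lindblad_step(rho, H, Ls, rates, dt):
    """One explicit-Euler step of a Lindblad master equation (hbar = 1):
    d(rho)/dt = -i[H, rho] + sum_k g_k (L rho L^+ - {L^+ L, rho}/2)."""
    drho = -1j * (H @ rho - rho @ H)
    for L, g in zip(Ls, rates):
        LdL = L.conj().T @ L
        drho = drho + g * (L @ rho @ L.conj().T
                           - 0.5 * (LdL @ rho + rho @ LdL))
    return rho + dt * drho

# Toy dimer with pure dephasing on each site (all numbers illustrative)
H = np.array([[1.0, 0.2], [0.2, 0.8]])
Ls = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
rho = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=complex)
for _ in range(2000):
    rho = lindblad_step(rho, H, Ls, rates=[0.05, 0.05], dt=0.01)
```

Both the commutator and the dissipator are traceless, so the Euler update preserves the trace of ρ to machine precision, a useful sanity check in any propagation code.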

B. Exciton transfer properties of biological Hamiltonians
We report the exciton transfer times calculated for the four light-harvesting complexes (LHCs) FMO, RC, CP43+RC and CP47+RC. [17][18][19][20] In this study we generated four datasets to train MLPs for excitation energy transport property predictions. For each dataset we randomly sampled excited state energies and inter-site couplings of Frenkel exciton Hamiltonians from uniform distributions within particular parameter value ranges inspired by existing light-harvesting complexes (see main text for details).
The four generated datasets differ in both the parameter value ranges and the number of excitonic sites, and thus in the number of parameters in the Frenkel exciton Hamiltonians. MLPs are trained on 12000 Frenkel exciton Hamiltonians sampled from each of the four spanned parameter spaces. Since MLP models have only limited extrapolation capabilities, the training set for MLP models needs to cover the parameter space of interest sufficiently well that test set points are close to training set points. As MLP models are fully connected, a deviation in even one of the input parameters can cause significantly different predictions.
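The sampling procedure described above can be sketched as follows; the site count and parameter ranges below are placeholders, not the values used for the FMO, RC, CP43, or CP47 datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hamiltonian(n_sites, e_range, v_range):
    """Draw one symmetric Frenkel exciton Hamiltonian with uniformly
    sampled site energies and inter-site couplings."""
    H = np.diag(rng.uniform(*e_range, size=n_sites))
    iu = np.triu_indices(n_sites, k=1)
    V = rng.uniform(*v_range, size=iu[0].size)
    H[iu] = V
    H[(iu[1], iu[0])] = V  # mirror the couplings to keep H symmetric
    return H

# A small dataset of 8-site Hamiltonians (illustrative ranges, in cm^-1)
dataset = [sample_hamiltonian(8, (12000.0, 12600.0), (-100.0, 100.0))
           for _ in range(100)]
```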
We estimate the relative coverage of the parameter spaces of the four investigated datasets by computing the distance between all Hamiltonians in each dataset based on the maximum norm as defined in Eq. 7, where both A and B are symmetric matrices with A, B ∈ R^{n×n}. The resulting distances are shown in Fig. 1. We observe much larger distances between Hamiltonians in the FMO and CP47 datasets compared to the RC and CP43 datasets. Based on this observation, we expect lower prediction accuracies of trained MLP models for the FMO and the CP47 dataset, which is consistent with the results reported in Tab. 2 in the main text.
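A minimal sketch of this distance computation, assuming Eq. 7 is the element-wise maximum norm d(A, B) = max_ij |A_ij − B_ij|:

```python
import numpy as np

def max_norm_distance(A, B):
    """Distance between two Hamiltonians in the maximum norm,
    d(A, B) = max_ij |A_ij - B_ij| (our reading of Eq. 7)."""
    return float(np.max(np.abs(A - B)))

def pairwise_distances(hamiltonians):
    """All pairwise distances within one dataset."""
    n = len(hamiltonians)
    return np.array([[max_norm_distance(hamiltonians[i], hamiltonians[j])
                      for j in range(n)] for i in range(n)])

# Two toy Hamiltonians differing only in one site energy
A = np.array([[1.0, 0.5], [0.5, 2.0]])
B = np.array([[1.0, 0.5], [0.5, 5.0]])
D = pairwise_distances([A, B])
```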

D. Computational cost of exciton dynamics calculations
Established methods for computing the population dynamics of excitonic systems, such as the hierarchical equations of motion (HEOM) approach, suffer from adverse computational scaling. Because of this drawback, less sophisticated techniques with lower computational demand such as the secular Redfield method are more popular. Both methods need to run a full population dynamics calculation to obtain exciton transfer properties such as the average transfer time or transfer efficiency.
We report the runtimes of HEOM and secular Redfield calculations observed during the generation of the four datasets used in this study. Each dataset consists of 12000 randomly generated exciton Hamiltonians for which we computed average transfer times and transfer efficiencies with both methods (see main text for details). HEOM calculations were carried out with the high-performance QMaster package, [9][10][11] which uses the architecture of GPUs to propagate a complete hierarchy of auxiliary matrices in parallel. Secular Redfield calculations were run on single CPU cores. The computation time for individual Hamiltonians varies, since depending on the excitation energy transfer times we need to run the exciton dynamics for a larger or smaller number of time steps. In Tab. II we show the average computation time for a single exciton Hamiltonian in each dataset (averaged over all 12000 Hamiltonians), as well as the total computational cost to generate the complete datasets.
As a compromise between accuracy and computational cost we truncate the hierarchy at N_max = 5 for the datasets with fewer pigments (RC and FMO) and at N_max = 4 for the datasets with more pigments (CP43 and CP47). We verified that the accuracy is retained at these truncation levels by simulating a random selection of 10 exciton Hamiltonians drawn from each dataset at the respective lower and higher truncation levels. We observe an average deviation of 1.6 % in the calculated transfer times between the higher and lower truncation levels for the FMO dataset. The RC and CP43 datasets show smaller deviations of 1.1 % for RC and 0.1 % for CP43. For the CP47 dataset we observe a deviation of about 2.4 %. Given these small deviations induced by the earlier truncation of the hierarchy, we use the lower truncation level in our dataset generation to keep the computational costs at a reasonable level (see Tab. II).
While transfer properties in the smaller systems modeled in the RC and FMO datasets can be calculated within a few minutes, calculations on the larger systems modeled in the CP43 and CP47 datasets show significantly higher computational demands, with typical runtimes on the order of 6 to 18 hours. These substantial computation times illustrate that large-scale simulations of artificial exciton systems can be computationally quite expensive.
Multi-layer perceptrons (MLPs), however, encode a set of matrix operations, which allows for a significantly faster calculation of the properties of interest. By construction, the time for computing the output of an MLP scales linearly with the number of neurons in the architecture. We find that our trained MLP models can predict exciton transfer times and transfer efficiencies of one complete dataset introduced in this study in less than 10 seconds on a single CPU. This estimate includes the time spent loading all 12000 exciton Hamiltonian matrices into memory, running the matrix operations encoded in the MLP architecture, and writing the prediction results to file. MLP models can therefore be considered a valid alternative to secular Redfield calculations when running exciton dynamics calculations on a large number of excitonic systems, due to their comparable accuracy at orders of magnitude lower computational cost.
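To illustrate why inference is so cheap, the sketch below runs a forward pass over a full batch of 12000 flattened Hamiltonians. The architecture, random weights, and input dimensions are hypothetical assumptions for timing purposes only, not the trained models of this study.

```python
import time
import numpy as np

def mlp_forward(x, weights, biases):
    """Forward pass of an MLP as a chain of matrix operations:
    ReLU hidden layers and a softplus output layer."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ W + b, 0.0)
    return np.log1p(np.exp(x @ weights[-1] + biases[-1]))  # softplus

# Hypothetical architecture: 64 inputs (a flattened 8x8 Hamiltonian),
# two hidden layers, 2 outputs (transfer time and efficiency)
rng = np.random.default_rng(1)
dims = [64, 256, 256, 2]
weights = [rng.normal(0.0, 0.05, size=(m, n))
           for m, n in zip(dims[:-1], dims[1:])]
biases = [np.zeros(n) for n in dims[1:]]

X = rng.normal(size=(12000, 64))  # one full dataset of flattened inputs
start = time.perf_counter()
Y = mlp_forward(X, weights, biases)
elapsed = time.perf_counter() - start
```

On commodity hardware this batched forward pass completes in a fraction of a second, consistent with the sub-10-second figure quoted for the full pipeline including file I/O.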

E. Dataset preparations for multi-layer perceptron predictions
MLP models constructed in this study were designed to predict exciton transfer times and transfer efficiencies from the Frenkel exciton Hamiltonian of an excitonic system. The numerical values of excited state energies and inter-site couplings in the exciton Hamiltonian, as well as the transfer times and transfer efficiencies, depend on the chosen unit system. Further, the ranges of these properties can be very different.
Prediction targets need to be rescaled onto an interval which lies within the codomain of the output layer activation function. The target properties in this study are exciton transfer times and efficiencies. Both of these properties are always positive, but while efficiencies also have an upper bound, transfer times could a priori be arbitrarily large. We therefore chose the softplus function as the activation function of the MLP output layer, as it exhibits similar properties. To use most of the non-linear regime of the softplus function, we map our training targets t (the collective set of transfer times and efficiencies) onto the interval (0, 4] based on the maximum transfer time and efficiency in the training set T_train,

\tilde{t} = 4 \, \frac{t}{\max_{t' \in T_{\mathrm{train}}} t'} . \qquad (9)

Exciton transfer times and transfer efficiencies were rescaled separately. Note that both exciton transfer times and efficiencies are always positive, which justifies the implicit lower bound of zero in the equation.
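The rescaling and its inverse can be sketched in a few lines, assuming Eq. 9 is the linear map onto (0, 4] described above:

```python
import numpy as np

def rescale_targets(targets, train_max):
    """Map positive training targets onto (0, 4] by dividing by the
    training-set maximum (our reading of Eq. 9)."""
    return 4.0 * np.asarray(targets, dtype=float) / train_max

def unscale_predictions(scaled, train_max):
    """Invert the mapping to recover values in physical units."""
    return np.asarray(scaled, dtype=float) * train_max / 4.0

# Transfer times and efficiencies are rescaled separately,
# each with its own training-set maximum.
times_train = np.array([2.0, 5.0, 10.0])
scaled = rescale_targets(times_train, times_train.max())
```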

F. Bayesian optimization for hyperparameter selection
Multi-layer perceptrons (MLPs) consist of a set of neurons organized in layers. Each neuron accepts an input, which is rescaled by a set of weights and biases intrinsic to the neuron, to calculate its output. The outputs of the neurons in one layer are propagated through the MLP as the input to the subsequent layer. While the weights and biases of each neuron are collectively referred to as the parameters of the MLP, an MLP model contains additional free parameters such as the number of layers and the number of neurons per layer; the latter are the hyperparameters of the model. MLP parameters are typically optimized on the training set with gradient-based optimization techniques such as stochastic gradient descent. Hyperparameters are instead selected by optimizing the parameters of an MLP on the training set and evaluating the prediction accuracy on the validation set.
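The parameter update underlying stochastic gradient descent can be written out explicitly. A minimal sketch for a single linear layer with a squared-error loss, purely to illustrate the update rule; this is not the training setup or loss used for the models in this study.

```python
import numpy as np

def sgd_step(W, b, x, y, lr):
    """One stochastic gradient descent update for a single linear layer
    with squared-error loss L = 0.5 * ||x @ W + b - y||^2."""
    err = x @ W + b - y            # prediction error for one sample
    W = W - lr * np.outer(x, err)  # dL/dW = outer(x, err)
    b = b - lr * err               # dL/db = err
    return W, b

# Fit a 2-parameter linear map from noiseless synthetic data
rng = np.random.default_rng(2)
W_true, b_true = np.array([[1.5], [-0.7]]), np.array([0.3])
X = rng.normal(size=(200, 2))
Y = X @ W_true + b_true

W, b = np.zeros((2, 1)), np.zeros(1)
for x, y in zip(X, Y):
    W, b = sgd_step(W, b, x, y, lr=0.05)
```

After one pass over the 200 samples the parameters have essentially converged to the generating values, which is the behavior gradient-based parameter optimization relies on.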
In this study, we employed a Bayesian optimization approach to find hyperparameters for accurate MLP architectures. We chose a total of six hyperparameters to be optimized and set fixed ranges for each of them for the Bayesian optimization. Hyperparameters and ranges are reported in Tab. III. In particular, we also included the number of training points as a hyperparameter to investigate by how much the prediction accuracy of MLPs can increase when expanding the training set. The ranges for individual hyperparameters were chosen based on a few test training runs and set large enough that the Bayesian optimizer is able to explore diverse MLP architectures. Although we found that MLPs perform more accurately when using more points in the training set, we restricted the number of training points to 6000 as a compromise between accuracy and computational demand. Especially for PCA sampled training sets we observed only a small advantage of larger training sets, as measured by the validation set error. Several activation functions were probed in the Bayesian optimization.

Convergence of Bayesian optimization runs
Bayesian optimization selects a particular set of hyperparameters from all the possible values for each of the hyperparameters (see Tab. III). The MLP corresponding to this set of hyperparameters is then constructed and trained on the training set. Prediction errors on the validation set are used to evaluate the prediction accuracy of each constructed MLP after it was trained.
The Bayesian optimization procedure was run for a total walltime of seven days on four GPUs (NVIDIA Tesla K80) for each dataset (FMO, RC, CP43, CP47) and each training set selection method (random, PCA). We extract the validation set error for all MLPs generated and trained in this process. During the optimization, we keep track of the validation error of the current optimal MLP architecture, which is shown in Fig. 2 to illustrate the progress of the hyperparameter optimization. After only a few iterations, the prediction error on the validation set has already dropped significantly. In each of the Bayesian optimization procedures we did not see any decrease in the validation set errors over at least the last 200 proposed MLP architectures. Further, we observe that, as opposed to a single best set of hyperparameters, the Bayesian optimization instead reveals a number of MLP architectures with different hyperparameter values but similarly small validation set errors. Therefore, we conclude that the Bayesian optimization converged for all datasets and identified reasonably accurate MLP architectures.

Bayesian optimization results
We recorded the minimum validation set errors of all MLPs constructed and trained during the Bayesian optimization procedure to study the effect of particular hyperparameter choices on the prediction accuracy (see Fig. 3).
Optimal sets of hyperparameters resulting in the smallest validation set error for all four datasets with MLPs trained on PCA selected (randomly drawn) training sets are reported in Tab. IV (Tab. V). While MLPs trained on the RC and FMO datasets, which consist of fewer excitonic sites, tend to prefer shallow but broad architectures, we achieved the smallest relative validation set errors with more hidden layers and fewer neurons per hidden layer for the CP43 and CP47 datasets, which contain more excitonic sites.

G. Comments on order ambiguities in Frenkel exciton Hamiltonians
As there is no well-defined ordering of excitonic sites in the considered system, Frenkel exciton Hamiltonians are only uniquely defined up to mutual permutations of rows and columns. This order ambiguity was found to be challenging for MLPs trained to predict atomization energies from Coulomb matrices, and different techniques have been developed to overcome this problem. 21 In this study, Frenkel exciton Hamiltonians were constructed by uniformly sampling excited state energies and inter-site couplings. For all sites in a system we used the same sampling distributions. In the case of Coulomb matrices, however, the distributions of values for diagonal and off-diagonal elements can differ between atom types, which is why the ordering ambiguity emerges.
To illustrate that the ordering ambiguity does not arise for the Frenkel exciton Hamiltonian, we used our trained MLPs to predict exciton transfer times and efficiencies for all exciton Hamiltonians in each dataset for which we permuted some of the rows and columns. Note, that we chose site 1 as the initially excited site for the population dynamics simulations while site 3 acted as an acceptor. Permutations involving sites 1 or 3 are therefore expected to significantly change the prediction results. The obtained average relative absolute prediction errors are reported in Tab. VI.
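A site permutation acts on the Hamiltonian by reordering rows and columns consistently, H → P H Pᵀ. A minimal sketch with a toy 4-site matrix (the permutation shown swaps two illustrative sites, not sites 1 and 3 of the actual systems):

```python
import numpy as np

def permute_sites(H, perm):
    """Relabel excitonic sites: H -> P H P^T permutes rows and columns
    consistently, so H'[i, j] = H[perm[i], perm[j]]."""
    P = np.eye(len(H))[list(perm)]
    return P @ H @ P.T

# Toy 4-site Hamiltonian; swap the second and third sites
H = np.arange(16.0).reshape(4, 4)
H = 0.5 * (H + H.T)                  # symmetrize
Hp = permute_sites(H, [0, 2, 1, 3])
```

The permuted Hamiltonian describes the same physical system, which is why its eigenvalue spectrum is unchanged even though individual matrix elements move.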
We generally observe prediction errors for permuted Hamiltonians comparable to the prediction errors on the unpermuted test sets if the permutations are performed on sites other than 1 and 3. If, however, sites 1 or 3 were involved in the permutation, the prediction errors drastically increased, by at least an order of magnitude.

H. Comparison of exciton transfer times obtained from exciton dynamics calculations and multi-layer perceptron predictions
We trained MLP models to predict exciton transfer times and efficiencies from Frenkel exciton Hamiltonians to provide an alternative to computationally costly exciton dynamics calculations. In this section, we comment on the prediction accuracy of the MLP models by comparing their predictions with results obtained from two popular exciton dynamics approaches: the secular Redfield method and the hierarchical equations of motion (HEOM) formalism (on which the MLPs were trained). While secular Redfield is known to underestimate exciton transfer times, it is computationally cheaper than HEOM.
We provide context for the MLP prediction accuracies by comparing MLP predicted transfer times to transfer times obtained from secular Redfield and HEOM calculations on the level of individual Hamiltonians. Fig. 4 shows scatter plots of transfer times obtained from all three approaches. We plot transfer times predicted by MLPs and secular Redfield versus transfer times calculated with HEOM, which we consider as the ground truth for the purpose of this study. The green line in Fig. 4 indicates perfect agreement between the HEOM results and the predictions of either method. Comparing the secular Redfield transfer times to the HEOM transfer times, we observe that secular Redfield almost always underestimates the transfer times, consistently across all four datasets. MLPs instead both over- and underestimate exciton transfer times, which is due to the symmetric loss function employed during MLP training and hyperparameter optimization.
The absolute deviation between Redfield and HEOM transfer times generally increases with the value of the exciton transfer time, as observed in previous studies. 16 For the MLP predictions, we also observe a larger deviation from the HEOM results for longer transfer times. However, in contrast to secular Redfield, the error remains distributed rather symmetrically around the ground truth. This observation is explained by the fact that the MLPs were trained to minimize the relative deviation between predicted and HEOM transfer times. That is, for longer transfer times the predictions can afford larger deviations from HEOM, which results in an opening funnel structure, seen most clearly in the scatter plot for the CP47 dataset. In addition, all of the presented datasets include fewer data points at larger transfer times, while the MLPs are trained to minimize loss functions which take the unweighted average over all transfer times.
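The relative error measure underlying this comparison can be sketched directly; the numbers below are toy values chosen to mimic the qualitative behavior described in the text (symmetric MLP errors, systematic Redfield underestimation), not data from the study.

```python
import numpy as np

def mean_relative_abs_error(predicted, reference):
    """Mean relative absolute deviation from the HEOM reference values,
    the per-Hamiltonian error measure used in the comparison."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(predicted - reference) / np.abs(reference)))

# Toy comparison of two methods against the same reference
heom = np.array([10.0, 20.0, 40.0])
mlp = np.array([9.0, 22.0, 38.0])       # scattered over-/underestimation
redfield = np.array([8.0, 15.0, 28.0])  # systematic underestimation
err_mlp = mean_relative_abs_error(mlp, heom)
err_redfield = mean_relative_abs_error(redfield, heom)
```

Because the loss is relative, a fixed percentage error contributes equally at short and long transfer times, which is exactly what produces the funnel-shaped scatter at large absolute transfer times.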