Florian
Häse
,
Christoph
Kreisbeck
* and
Alán
Aspuru-Guzik
*
Department of Chemistry and Chemical Biology, Harvard University, Cambridge, 02138, USA. E-mail: christophkreisbeck@gmail.com; aspuru@chemistry.harvard.edu; Tel: +1-617-384-8188
First published on 23rd October 2017
Understanding the relationship between the structure of light-harvesting systems and their excitation energy transfer properties is of fundamental importance in many applications including the development of next generation photovoltaics. Natural light harvesting in photosynthesis shows remarkable excitation energy transfer properties, which suggests that pigment–protein complexes could serve as blueprints for the design of nature inspired devices. Mechanistic insights into energy transport dynamics can be gained by leveraging numerically involved propagation schemes such as the hierarchical equations of motion (HEOM). Solving these equations, however, is computationally costly due to the adverse scaling with the number of pigments. Therefore virtual high-throughput screening, which has become a powerful tool in material discovery, is less readily applicable for the search of novel excitonic devices. We propose the use of artificial neural networks to bypass the computational limitations of established techniques for exploring the structure-dynamics relation in excitonic systems. Once trained, our neural networks reduce computational costs by several orders of magnitudes. Our predicted transfer times and transfer efficiencies exhibit similar or even higher accuracies than frequently used approximate methods such as secular Redfield theory.
Mechanistic studies reveal valuable insight into the microscopic details of EET. Prominent examples are given by studies probing the impact of electronic coherence or non-trivial interactions between excitons and specific vibrational modes on transfer characteristics.7–13 However such investigations are tedious since they require sophisticated experimental setups,11–16 as well as computationally involved accurate simulations of open-quantum system dynamics.7–9,17–20 Further, there are only a few fundamentally different natural light-harvesting complexes from which alone we cannot extract the relation between the structure of an excitonic system and its dynamics in full detail.
In order to relate the dynamics to the underlying structure, it is desirable to investigate a large number of artificially designed excitonic systems. This has been recently addressed in several theoretical works.21–24 For example, analyzing perturbations on pigment geometries in the Fenna–Matthews–Olson (FMO) complex revealed that higher transport efficiencies tend to be realized by more compact structures.25 The drawback of these statistical approaches is that they need to run exciton dynamics calculations for ten thousands of randomly generated physically-plausible multi-chromophoric structures. Due to the sheer number of performed dynamics simulations, such an analysis becomes quickly computationally exhaustive, even though less sophisticated methods such as Lindblad equations are used.25
Here, we follow a novel path and leverage concepts from deep learning to bypass the computational demand of established techniques for exploring EET properties (see Fig. 1). Specifically, we train multi-layer perceptrons (MLPs), a class of fully connected feed-forward artificial neural networks to predict average exciton transfer times and overall transfer efficiencies. The input features to the MLPs are hereby given by the parameters of the corresponding Frenkel exciton Hamiltonians.26,27 For large scale screening of parameter space, only a fraction of all systems needs to be actually calculated to train the MLPs. Once trained, our neural networks evaluate transfer times just within a few milliseconds and thus bypass the computational demand of established techniques for exploring EET properties, while maintaining sufficiently high prediction accuracy.
We demonstrate the potential of the MLPs by considering various artificial datasets which were generated by uniform sampling of pigment excitation energies and inter-pigment couplings in the vicinity of the energies and couplings of a set of relevant biological complexes: the FMO complex,28 as well as the light-harvesting complexes CP43, CP47 and the reaction center (RC) of photosystem II.29–31 We aim to predict average transfer times from an initially excited donor to a certain acceptor pigment. Fig. (1) shows the situation for the FMO complex, which serves as an energy wire bridging the chlorosome and the reaction center in the photosynthetic apparatus of green sulfur bacteria and has become a standard system for comparing energy transfer properties.32 Initial excitation is assumed to be located at the donor pigment 8 since this pigment is in the proximity of the light-harvesting chlorosome antenna. Then, the excitation energy needs to be transferred to the target pigment 3 which couples to the reaction center where photochemical reactions are triggered. In the context of EET, the latter process is typically modeled as irreversible energy trapping.33–36
The MLP models are trained based on transfer properties obtained with the hierarchically coupled equation of motion technique (HEOM),37–39 which is a non-perturbative open quantum system approach taking into account non-Markovian effects. HEOM has become one of the standard tools in the field (a ready-to-run online package is available on http://nanohub.org)40 and serves in this manuscript as ground truth to quantify the error for the predictions made by the neural networks. The accuracy of the predictions critically depends on the choice of hyperparameters such as the number of neurons, number of hidden layers or the learning rate, which collectively define the specific architecture of the neural network. However, the best set of these parameters is a priori unknown. Therefore, we determine the architectures for our MLP models from a Bayesian optimization on selected hyperparameters. This procedure is well-established in the machine learning community and was shown to outperform architectures built by domain experts.41
We assess the quality of our MLP predictions by comparing the relative error of our predicted transfer times to the relative error made by secular Redfield calculations. The latter is simple to implement and commonly used to avoid the numerical complexity of more accurate HEOM simulations. Our findings demonstrate that MLPs provide a computationally significantly cheaper alternative to secular Redfield computations at comparable or, in most of our examples, even higher accuracy. Results for the FMO complex are summarized in Fig. 1.
The study of excitation energy transport typically involves two steps: first, an effective Hamiltonian describing the system parameters needs to be constructed and second, transfer properties need to be computed from this effective Hamiltonian using open quantum system approaches. While some of the authors have successfully applied machine learning techniques to accelerate the construction of effective Hamiltonians by predicting excited state energies of excitonic sites from Coulomb matrices,48 to our knowledge there has been no attempt to adapt machine learning models to predict transport properties of open-quantum systems.
In the subsequent sections, we develop a machine learning framework based on multi-layer perceptrons (MLPs) which predict excitation energy transfer properties of excitonic systems based on an effective Hamiltonian rather than obtaining them from computationally expensive quantum dynamics calculations. In future applications, this approach could facilitate large-scale screening such as the search for best-performing devices or studies on structure–function relationships in natural light-harvesting. MLPs have been shown to generally perform well in supervised regression problems in chemistry.48,49 Further, we choose MLPs since there is no informative relation between neighboring elements in the Frenkel exciton Hamiltonian which could be exploited by convolutional or recurrent neural networks, as excitonic sites can be numbered in arbitrary order.
Overall, our procedure can be summarized as follows. Based on the Frenkel exciton Hamiltonian we leverage standard open quantum system approaches to generate a database comprising of average transfer times and efficiencies for EET from a donor to a target pigment for a random set of Frenkel exciton Hamiltonians. The complete dataset is split into a training set, on which we train each MLP model, as well as a validation and a test set. For training data selection we will compare two strategies: (i) random selection of data points and (ii) selection of training data based on a principal component analysis (PCA) which allows us to extract those data points covering the most information sampled in the dataset. As we show in Section 2, the latter strategy is of particular relevance if the space of transfer properties is not evenly sampled and many representatives in the training set exhibit redundant information. We run a Bayesian optimization procedure to identify the best architecture for our MLP models. The performance of each architecture is quantified by the average relative absolute error made when predicting transfer properties for the validation set. Finally, we run predictions on the test set to assess the ability of the optimized architecture to generalize to realizations that were neither employed for training nor for validation during the Bayesian optimization. The source code for exciton transfer property predictions along with all trained MLP models as well as the datasets generated in this study are made available on GitHub.50
For our first dataset, we sample Hamiltonians around the FMO complex (Fig. 1), which serves frequently as the prototype light-harvesting complex. We construct three additional datasets that are motivated by the photosystem II of higher plants. For one set, we consider the eight pigments of the reaction center (RC) core, in which the primary step of charge separation is initiated through the electronically excited pigment ChlD1.30,51 For the other two sets, the reaction center core is extended by including either light-harvesting complex CP47 or CP43 of photosystem II into the exciton system. For simplicity, we refer to the dataset inspired by the CP43 + RC (CP47 + RC) complex as the CP43 (CP47) dataset from hereon. For each dataset, we generated 12000 exciton Hamiltonians by uniformly sampling excited state energies and inter-site couplings from a fixed range of values, as is summarized in Table 1.
Label | #sites | ε low [cm−1] | ε high [cm−1] | V range [cm−1] |
---|---|---|---|---|
RC | 8 | 14800 | 15000 | −50 to 50 |
FMO | 8 | 12000 | 12800 | −100 to 100 |
CP43 | 21 | 14800 | 15100 | −60 to 60 |
CP47 | 24 | 14500 | 15300 | −100 to 100 |
In the following, we are interested in transfer characteristics such as average transfer times from an initially excited pigment (donor) to a target pigment (acceptor). This model provides a simple description of the first step of photosynthesis, where energy is absorbed in the antenna pigments and subsequently transferred to the reaction center in which photochemical reactions are triggered. The energy transport in light-harvesting complexes is determined by coupled pigments which are embedded in a protein scaffold,52,53 and is typically modeled with an effective Frenkel exciton Hamiltonian. We include energy trapping in the acceptor pigment phenomenologically by introducing anti-Hermitian parts in the Hamiltonian. The exciton dynamics is expressed in terms of the reduced density matrix, which can be obtained from standard open quantum system approaches.
We compute exciton transfer times for all Hamiltonians in our datasets with the hierarchical equations of motion (HEOM)37–39 method, implemented in the QMaster software package, version 0.2.33,54,55 HEOM is a numerically exact method which accurately accounts for the reorganization process,56–59 in which the vibrational coordinates rearrange to their new equilibrium positions upon electronic transition from the ground to the excited potential energy surface. For all Hamiltonians we assumed identical Drude–Lorentz spectral densities , describing the exciton–phonon interaction. We do not use the parameters of the spectral density as input features for our neural networks. Extending our approach to predict transfer properties for various spectral densities goes beyond the present scope and is the aim of future work. More details on the Frenkel exciton Hamiltonian and the exciton dynamics methods, as well as the definition of the transfer time and transfer efficiencies, are given in the ESI Section A.†
Distributions of transfer times for all exciton Hamiltonians of each dataset are depicted in Fig. 2. The transfer times for the Hamiltonians of the biological complexes are highlighted in every distribution. Excited states and inter-site couplings for the exciton Hamiltonians of the biological complexes are taken from literature,29–31,60 and are uploaded to the GitHub repository.50 All population dynamics simulations are initialized as a fully populated site 1, serving as a donor, while site 3 acts as acceptor that couples to an energy sink with trapping rate Γtrap (see ESI Section A†). Note that the labeling of the donor and acceptor state is without loss of generality as rows and columns of the Hamiltonian can be permuted in a suitable way, which effectively corresponds to a relabeling of the pigments. Since excited state energies and inter-site couplings are drawn from the same distributions for all sites in one dataset we did not explicitly account for the ordering ambiguity which arises, for instance, in the case of Coulomb matrices for which matrix entries depend on the particular types of atoms to which they correspond.43
We find large variations in the ranges of transfer times between the four datasets. The RC and CP43 datasets, both with relatively narrow ranges of excited state energies and site couplings, yield relatively small transfer times. In contrast, we observe a wider spread in transfer times for the FMO dataset and the CP47 dataset which is consistent with the broader range of excited state energies and site couplings that were sampled.
The transfer times of the actual biological complexes lie close to the mode of the distributions for all four datasets. This suggests that natural systems may not be specifically selected for extraordinary transfer properties, as they exhibit transport characteristics that are just likely to occur, even for a random choice of the exciton Hamiltonian. We note that providing a conclusive answer goes beyond the scope of the present manuscript, but could be the subject of a future more detailed structure–function analysis. A recent evolutionary study for the FMO complex61 goes along a similar direction and suggests that the FMO complex has evolved towards stability to mutations rather than a selection of specific transfer characteristics.
Therefore, we follow a different path and carry out a more sophisticated selection process. The idea is to add those Hamiltonians to our training set which give the most information. We perform a principal component analysis (PCA) on the 8000 Hamiltonians containing dataset (after separating 2000 Hamiltonians each for validation and testing). We project each Hamiltonian onto a reduced space spanned by the most relevant principal components. The Hamiltonians for the training set are then selected such that they are maximally separated in the reduced space. This procedure guarantees that our training set constitutes the most diverse entities.
The 12000 Hamiltonians of each dataset were split into three sets: a training set of up to 6000 Hamiltonians for training MLP model instances with particular hyperparameters, a validation set of 2000 Hamiltonians used to evaluate the MLP architecture during optimization of the hyperparameters and a test set of 2000 Hamiltonians to probe out-of-sample prediction accuracies. All constructed MLP models were trained with stochastic gradient descent with 200 data points per batch and the ADAM optimizer,62 until the average relative absolute error (see eqn (1)) on the validation set increased over three full consecutive training epochs. Neuron saturation was avoided with L2 regularization on all weights of all neurons but the output neurons.
An essential component in developing accurate machine learning models consists in choosing proper values for the model hyperparameters. For this MLP framework, we consider a total of six hyperparameters. The initial learning rate μ for the ADAM optimizer and the regularization parameter λ. We also included the number of MLP layers and the number of neurons per layer, as well as the activation functions for neurons in each layer, for which we allowed five different options to choose from. The only exception is the last layer, for which we always use the softplus activation function to constrain our MLP models to the prediction of always positive transfer times and efficiencies. Lastly, we treat the number of training points as a hyperparameter in order to study the effect of the variations in the number of training samples on the prediction accuracy. The set of hyperparameters to be optimized and their allowed ranges are summarized in the ESI in Table III.†
We employ a Bayesian optimization algorithm,63 in order to scan the space of hyperparameters for the most accurate model. The model accuracy was defined as the average relative absolute error (see eqn (1)) in exciton transfer times predicted by the MLP and corresponding HEOM simulations for the validation set. All generated MLP models were constructed and trained with the same random seed. Bayesian optimization is a common tool in machine learning and balances exploration of parameter space and exploitation of previous information. The idea of this ansatz is to reduce the number of costly function evaluations under the assumption that the unknown function was sampled from a Gaussian process. In contrast to gradient or Hessian based optimization techniques, Bayesian optimization uses information of all previously evaluated points and can thus find a good approximation to the minimum of non-convex functions in relatively few iterations. We carried out the Bayesian optimization of MLP hyperparameters in the spearmint software package.41 MLP models were generated and trained using the Tensorflow package, version 1.0.64
(1) |
Dataset | Model | Δτtrain [%] | Δτvalid [%] | Δτtest [%] |
---|---|---|---|---|
FMO | Network (PCA) | 4.53 | 4.38 | 7.41 |
Network | 10.53 | 10.75 | 11.56 | |
Redfield | 9.70 | 9.96 | 9.60 | |
RC | Network (PCA) | 2.71 | 2.73 | 3.35 |
Network | 3.61 | 3.58 | 3.76 | |
Redfield | 8.62 | 8.67 | 8.60 | |
CP43 | Network (PCA) | 4.42 | 4.47 | 4.72 |
Network | 4.66 | 4.71 | 4.86 | |
Redfield | 4.71 | 4.66 | 4.73 | |
CP47 | Network (PCA) | 12.36 | 12.32 | 12.59 |
Network | 13.36 | 13.34 | 13.59 | |
Redfield | 10.48 | 10.47 | 10.51 |
The predictions are carried out with the Bayesian optimized MLP architectures, which show slight variations in their best-performing hyperparameters depending on the dataset at hand. However, for all datasets, the neural networks tend to prefer shallow but broad architectures comprising of only a few layers with each layer containing a larger number of neurons. More details on the procedure and results for the hyperparameter optimization can be found in the ESI Section F.†
Overall we find a high accuracy of our predictions and small average relative errors on the test sets which are in the range between 3.35% for RC (PCA selected training set) and 13.59% for the largest considered exciton system CP47 attached to RC (random selected training set). The CP47 dataset exhibits the most diverse transfer properties (see Fig. 2), which explains the larger average relative absolute errors in the predictions when compared to the other datasets. Prediction accuracies for exciton Hamiltonians with permuted rows and columns are reported in the ESI Section G.† We find prediction accuracies similar to those achieved on the test sets for Hamiltonians with permutations not involving the source or target sites. The observed prediction errors are also consistent with the distance distributions of Frenkel exciton Hamiltonians for each of the four datasets (see ESI Section C†), which indicates that MLP models generally benefit from a finer sampling of the input parameter space.
The accuracy of the predictions can be enhanced by a more sophisticated PCA selection of the training set without the need of generating additional computationally expensive data points. The level of improvement of the PCA selection over a random selection of the training set differs for the four complexes. In general, we find that MLPs can be trained almost equally accurate with either selection method. The highest benefit of the PCA selected training set is obtained for the FMO and CP47 dataset, which are not only the most diverse ones out of our four datasets but are biased towards Hamiltonians showing fast transfer. As intuitively expected, selecting training points based on PCA is most advantageous for datasets with an extremely unevenly sampled feature space.
For example in the case of the biological exciton Hamiltonian of the FMO complex, HEOM reveals a transfer time of 7.95 ps. The trained MLP model predicts a transfer time of 7.52 ps which is slightly more accurate than secular Redfield calculations that result in 7.48 ps. Exciton transfer times obtained for all four biological complexes with all three approaches are reported in the ESI Table I.† However, while the MLP prediction takes about 5 ms, secular Redfield calculations took about 14.5 min on a single CPU (computation times are listed in the ESI in Table II†). We conclude that our trained MLP predictions are competitive to secular Redfield calculations in terms of their accuracy, but (once trained) come at a significantly reduced computational cost. Computational costs for all three approaches are summarized in the ESI Section D.†
Besides analyzing the accuracy in terms of averaging over all realizations in the datasets, we compare the relative errors in transfer time for secular Redfield and the MLP predictions in more detail on the level of individual Hamiltonians. Fig. 4 depicts scatter plots where the horizontal axes measure the accuracy of secular Redfield calculations and the vertical axes reflect the accuracy of MLP predictions for MLPs trained on the PCA selected datasets. We do not distinguish between training, validation, and test set and show the complete dataset. Almost all the Hamiltonians show a ΔtRedfield = (tHEOM − tRedfield)/tHEOM > 0, which demonstrates that secular Redfield systematically underestimates transfer time scales. On the other hand, the predictions under-as well as overestimate transfer time-scales yielding a more symmetrical distribution along the horizontal axis. For the RC (FMO) dataset, more than 95% (80%) of the Hamiltonians fall into regions marked as green, for which the neural networks provide higher accuracy than secular Redfield. For all other datasets, secular Redfield and the MLP predictions are equally likely to give better results, with about 59% (57%) of the Hamiltonians for CP43 (CP47) falling within the green shaded region. This is in agreement with our average relative absolute errors listed in Table 2. We did not observe any cases for which the MLPs show relative errors that significantly exceeded any of the secular Redfield ones.
The presented approach is of particular interest for large-scale analyses of the structure–transport relationship in excitonic systems. An area of great interest in excitonics is the study of the dynamics of charge dissociation at the interface present in bulk heterojunction photovoltaics.65,66 We believe a tool like this will help in the rapid screening of material properties in the mesoscale and therefore help the search for high-performance OPV systems.67
Once trained, evaluations of MLP models come at almost no additional cost. Our four generated MLP architectures (each optimized for one of the four datasets) predict transfer times for an aggregated set of 48000 exciton Hamiltonians just within a few seconds, while the corresponding quantum dynamics simulations take several GPU (CPU) years for the HEOM (secular Redfield) calculations. Our trained MLP models extend well to out-of-sample predictions for exciton Hamiltonians that are close to the sampled parameter regime. However, to employ MLPs on parameter regimes beyond those probed in the existing database requires running computationally expensive exciton dynamics for a few thousand Hamiltonians in order to extend our training set. To avoid this bottleneck a potential strategy could be to leverage already existing data, e.g. produced by a user community of existing software packages such as QMaster. However such data can be quite diverse. To this end, future research needs to focus on novel more general neural network architectures that accurately predict transfer times for flexible spectral density parameters as well as for differently sized exciton systems.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c7sc03542j |
This journal is © The Royal Society of Chemistry 2017 |