Towards an ab initio description of the optical spectra of light-harvesting antennae: application to the CP29 complex of photosystem II.

Light-harvesting pigment-protein complexes (PPC) represent the fundamental units through which the photosynthetic organisms absorb sunlight and funnel the energy to the reaction centre for carrying out the primary energy conversion reactions of photosynthesis. Here we apply a multiscale computational strategy to a specific PPC present in the photosystem II of plants and algae (CP29) to investigate in what detail should the environment effects due to protein and membrane/solvent be included for an accurate description of optical spectra. We find that a refinement of the crystal structure is needed before any meaningful quantum chemical calculations of pigment transition energies can be performed. For this purpose we apply classical molecular dynamics simulations of the PPC within its natural environment and we perform ab initio computations of the exciton Hamiltonian of the complex, including the environment either implicitly by the polarizable continuum model (PCM) or explicitly using the polarizable QM/MM methodology (MMPol). However, PCM essentially leads to an unspecific redshift of all transition energies, and MMPol is able to reveal site-specific changes in the optical properties of the pigments. Based on the latter and the excitonic couplings obtained within a polarizable QM/MM methodology, optical spectra are calculated, which are in good qualitative agreement with experimental data. A weakness of the approach is however found in the overestimation of the fluctuations of the excitonic parameters of the pigments along the MD trajectory. An explanation for such a finding in terms of the limits of the force fields commonly used for protein cofactors is presented and discussed.


Introduction
Sunlight is the main source of energy for most of the living organisms on earth, and the capability of converting its energy into a useful chemical form has allowed the first photosynthetic bacteria to survive in the initially anaerobic atmosphere of earth, and in the next billion years to evolve into cyano, red, brown, green bacteria, algae, plants and higher-plants. 1 In the course of this long evolutionary development, different apparatusas for carrying out photosynthesis have been adopted by nature. For the oxygenic process two photosystems, PSI and PSII, 2,3 are responsible for the catalysis of the photosynthetic electron transfer taking place in their reaction centres, assisted by secondary complexes 4-6 acting as efficient light-harvesting and photo-protecting centres. Nowadays, the roles of the individual complexes composing the photosynthetic apparatus are being elucidated primarily due to the improvements in X-ray diffraction techniques which have provided the atomic resolution required to accurately resolve their structural properties. [7][8][9][10][11] As a result, a more detailed analysis of the light-harvesting and excitation transfer dynamics has been made possible by combining the structural information with mutation analysis, [12][13][14][15] 2D electronic spectroscopy, [16][17][18][19][20] and theoretical studies. [21][22][23][24] In particular, these latter have proven to be an important tool in predicting site energy landscapes, excitonic couplings, environment effects, and energy transfer rates. 21,[23][24][25][26] In most cases, however, the predictive power of these theoretical studies is still hampered by the fact of being limited to the crystallographic structures, making the investigation largely susceptible to the quality of the structural conformations of the cofactors and the environment, which may be far from the real ones under ambient conditions. Different strategies exist to solve this problem. The simplest solution is given by an electrostatic treatment using quantum chemical calculations of excited and ground states of geometry a Dipartimento di Chimica e Chimica Industriale, University of Pisa, Via G. Moruzzi  optimized chromophores in vacuo to derive atomic partial charges used afterwards in electrostatic computations including the whole pigment-protein complex (for a recent review see ref. 27). Since the electrostatic calculations are not sensitive to slight distortions of the cofactors such an approach can provide a qualitative picture of the influence of the environment on site energies, excitonic couplings and spectral densities. However, important contributions, such as the influence of environmentally induced changes of the conformations of the chromophores on their electronic structure and short-range effects on site energies and excitonic couplings are missing. An alternative strategy considering these effects, is given by a quantum chemical geometry optimization of the whole pigment-protein complex, as used by Yin et al. 28 to calculate the exciton Hamiltonian of PS I core complexes. However, also in the latter approach there is room for improvement (for a comparison of the optical spectra of PSI obtained using the two approaches, see ref. 26).
In the present work we use an approach of intermediate complexity [29][30][31] and apply it to the CP29 complex, one of the three minor chlorophyll a/b-binding proteins associated with PSII in plants and algae. CP29 is a monomeric system containing nine chlorophylls a (Chl a), four chlorophylls b (Chl b), and three carotenoids. These 13 Chls densely packed within the protein matrix embedded in the lipid bilayer are responsible for the light-harvesting process through their low energy p-p* transitions (known as Q y ) (see Fig. 1).
We first explore the impact of different polarizable QM/classical strategies in the treatment of the crystallographic structures considering implicit and explicit environment models for the prediction of the excitonic parameters. The results are successively compared to those obtained combining the same QM/classical strategies with a statistical analysis based on a molecular dynamics (MD) simulation of the trans-membrane conditions. The excitonic parameters obtained from the static crystallographic structure and those averaged from the MD configurations are comparatively used to reproduce the experimental steady state optical spectra. The comparison is finally used to achieve a molecular-level interpretation of the nature of the excitonic states of CP29, to dissect the role of the fluctuations of geometrical parameters internal to the individual pigments as well as to the external environment and to quantify how the latter are reflected in the electrostatic and polarization effects due to the environment.

Structure and MD simulation
The high-resolution X-ray structure of light-harvesting CP29 from spinach (pdb entry: 3PL9, res. 2.80 Å) recently resolved by Pan et al. 9 was used. In this structure, the first 1-87 amino acid residues have been lost during the purification procedure, so the first residue (Gly88) has been capped with an acetyl terminal group (ACE). Only the six water molecules which are involved in the axial ligation of Chls have been retained. Specifically, water molecules are axial ligands of Chls a604, b606, b607, and b608, the axial ligand of Chl b606 forms hydrogen bonds with the side chain carboxyl group of Glu 151 and with another water molecule and the axial ligand of Chl b608 forms a hydrogen bond with another water molecule. The protein protonation pattern established at 210 K by Müh et al. 21 has been used. All the titratable residues are in their standard protonation states, except for Glu128, Glu151 and His235, which are protonated. All the other histidine residues are in e-configuration, except for His 99, 200 and 229 that are in the d-configuration to allow the interaction with the Mg atom of the pigments. We note that the ratio R ab = 2.25 between Chl a and Chl b resulting from the assignment of the crystal structure contrasts with the R ab = 2.3 to 3.4 reported in the literature. [32][33][34][35][36][37] From the comparison of quantum chemical/electrostatic calculations and experiments, 21 it was found that the assignment of chlorophylls in the crystal structure is not in contradiction to the experimental data, but it cannot be excluded that up to one Chl b is missing in the solubilized complex. As a possible candidate Chl b614 was discussed: in fact being located at the C-terminus and strongly solvent exposed, this chlorophyll might get lost more easily during the isolation procedure.
The positions of the hydrogen atoms have been minimized using the Amber12 suite of programs. 38 The ff99SB force-field 39 has been used for the protein, lipid11 40 for the lipids and ad hoc parameters for the cofactors. 41,42 In more detail, for Chl a the set of parameters reported in ref. 42 was used while for Chl b the missing parameters were taken from the generalized AMBER force field. 43 For the three carotenoids (lutein, violaxanthin and neoxanthin) a new set of force field parameters was determined following a similar procedure to that used in ref. 41 for bacteriochlorophyll. To be consistent with previous structural analysis of the crystalline structure, 21 in all calculations using such a structure we have included the G3P molecule identified as the axial ligand for the two facing Chl a611 and a615.
To reproduce the in vivo architecture, the CP29 complex was embedded into a double layer phospholipidic membrane constructed with 450 DOPC (1,2-dioleoyl-sn-glycero-3-phosphatidylcholine) lipid molecules (225 lipids per layer) and a solvation water layer generated with 30 Å thickness (see Fig. 1C) using the CHARMM-GUI 44 website tool. Three potassium ions have been added to neutralize the system CP29 charged-3. The MD simulation was performed in three main steps: (I) a minimization was performed to relax the system and avoid close contacts; (II) the system was slowly heated up to 300 K in a 100 ps simulation using positional restraints with a harmonic potential of 10.0 kcal mol À2 Å À2 to constrain lipids, protein and cofactors positions; (III) the constraints were removed and a 80 ns simulation was performed in a NPT ensemble with an integration step set of 2 fs. Temperature and pressure were regulated using the Langevin thermostat and the anisotropic barostat as implemented in Amber12 was used. Water molecules were treated using the TIP3P model, and the O-H distances and H-O-H angle, as well as the other bonds involving hydrogen in the system, were constrained using the SHAKE algorithm. 45 Periodic boundary conditions were applied together with the Particle Mesh Ewald approach to deal with long-range electrostatics and a non-bonded cutoff equal to 10 Å.

Quantum chemical calculations
QM calculations were performed over 400 uncorrelated configurations extracted every 50 ps from the last 20 ns of the trajectory. All the QM calculations have been performed using a locally modified version of the Gaussian09 46 package. Transition properties of the pigments have been computed using a TD-DFT scheme with the hybrid exchange correlation functional CAM-B3LYP 47 and 6-31G(d) basis set. Only Q y excitations have been considered in the analysis. In order to reduce the computational cost we worked with a truncated QM model for the chlorophylls in which only the atoms of the ring have been described at the QM level. The tail has been cut at the C1-C2 bond and the dangling bond has been saturated with a hydrogen atom (the tail atoms still remain MMPol sites). QM calculations have been performed both in vacuo (VAC) and including the environmental effects using implicit and explicit approaches by considering the polarizable continuum model (PCM) 48 and the polarizable QM/MM methodology (MMPol), 29,30 respectively. In the QM/ MMPol description, all the protein atoms and membrane lipids and water within 45 Å of each QM site were included in the MM region. A radius of 12 Å was used for the polarization cutoff.
In addition to configurations extracted from MD, the crystalline structure has also been used. In the latter case all the atoms in the system (except for the QM atoms) were included in the MMPol region. The PCM calculations were performed using the united atom cavity built using the g03defaults parameters; effective dielectric parameters have been employed to represent the protein/lipid/water environment (e = 15.0, e opt = 2.0). 49 MMPol atoms were described using the recent charge and polarizability parameters derived by Wang and co-workers in the context of the Amber force field. 50,51 In particular, the parameter set based on Thole's linear smeared dipole field tensor was used, in which 1-2 and 1-3 interactions are excluded.
Schematic representations of the different QM/classical approaches used in this work are reported in Fig. 2.
The excitonic couplings between pigments were computed using their TD-DFT transition densitiesr j (r 0 ), 52-54 namely: where the integral is performed using the standard numerical integration methods used in DFT. The expression (1) for large intermolecular distances has a physical correspondence with the point-dipole approximation (PDA) 55 where the electronic transition dipole moments are considered instead of the full densities. When MMPol sites are included in the classical region, an additional term arises (V MMPol ), which describes the interaction between the transition density of the chromophore i with the MMPol dipoles induced by the excitation in the chromophore j: 52,53

Excitonic calculations and steady state spectra
The interactions among pigments determine a delocalization of the excited states over different sites. The resulting excitonic states can be determined by diagonalizing the exciton Hamiltonian: where e i is the site energy of the pigment i and V ij 's are the couplings between the transitions of pigments i and j. The resulting delocalized excitonic states |ki can be written as a linear combination of the localized excited states: where the expansion coefficients c (k) i represent the contribution of each individual pigment i to the specific excitonic state k.
Once solved, the excitonic Hamiltonian (3), the linear absorbance, linear dichroism (LD) and circular dichroism (CD) spectra are obtained from eqn (5)-(7) respectively: Here,l k is the transition dipole moment between the ground state and the kth exciton state, y k is its angle with respect to the membrane normal, and the rotational strength of the exciton transition containing the local transition dipole momentsm m of pigments m = i,j that are centered at positions -R i and -R j , respectively. The local transition dipole moments were assumed to be oriented along the NB-ND axis (in pdb nomenclature) of the Chls. Recent experimental estimates for the angle b between the Q y transition dipole moment of Chl a and the NB-ND axis are inferred from timeresolved anisotropy measurements and from femtosecond polarization resolved visible pump-infrared probe spectroscopy range from b = À121 to b = 4.51 (where a positive angle refers to a rotation in the direction of the 13 1 ketogroup). 56,57 The spectra, in particular the linear dichroism spectrum, depend somewhat on the angle b. The variation of the simulated spectra with respect to b could in principle be used together with experiments in order to decide which angle is most realistic. From our analysis (Fig. S1, ESI †) the 0-101 angles seem to give a better agreement but the inaccuracy of the other parameters also surely plays a role and thus a more detailed study should be carried out in order to have a definitive answer.
The ratio of magnitudes of transition dipoles of Chl a and Chl b of 5.47/4.61 was estimated, based on the analysis of the dipole strength of these pigments in different solvents, 58 assuming an average refractive index of 2. The lineshape function D k (o) takes into account vibrational sidebands and lifetime-broadening due to exciton relaxation. It reads 59 and contains the time-dependent function which is related to the spectral density J(o) of the excitonvibrational coupling, the Bose-Einstein distribution function n(o) = (exp( ho/k B T) À 1) À1 of vibrational quanta, and an electronic prefactor g kk , which, for uncorrelated fluctuations of site energies, is the The lifetime broadening is described by the dephasing time t k in eqn (8), which is obtained from the Redfield relaxation Please note that J(o) = 0 for o o 0 and o kl = (e k À e l )/ h is the transition frequency between the exciton states k and l. The frequencyõ k in the lineshape function in eqn (8) is obtained from the exciton energy e k and takes into account a renormalization of this energy by the exciton-vibrational coupling: In the above equation P denotes the principal value of the integral.
We use the spectral density J(o) = SJ 0 (o) that contains the normalized function J 0 ðoÞ ¼ 1 with the parameters s 1 = 0.8, s 2 = 0.5, ho 1 = 0.069 meV, ho 2 = 0.24 meV, as extracted from the fluorescence line narrowing spectra of B777-complexes. 59 Although the B777-complex contains a bacteriochlorophyll pigment, this spectral density has been successfully applied to a large number of pigment-protein complexes, including those containing Chl a and/or Chl b. 21,22,26,60,61 Note that this spectral density does not contain high-frequency intramolecular modes of the pigments. The latter may be obtained, e.g., from the fluorescence line narrowing spectra. 62,63 Since the respective Huang-Rhys factors are very small, it can be expected that exciton delocalization involving excited vibronic states is different from that of 0-0 transitions. To include dynamic localization effects of exciton-vibrational states would require a nonperturbative description. Alternatively, an implicit treatment might be possible. 21 At the present level of site energy calculations we consider such a refinement of the theory of optical spectra as premature.
The Huang-Rhys factor S was obtained from a fit of the temperature dependence of the linear spectra as S = 0.5. 22 Finally, we note that the homogeneous spectra described above are averaged with respect to a static disorder in site energies, assuming a Gaussian distribution function of the same width D inh = 130 cm À1 (FWHM), as obtained from a fit of the spectra.
Since the selected level of calculation (TDCAM-B3LYP) overestimates the magnitude of the transition dipole moments and the transition energies with respect to the experimental values, 58 two global adjustments of the calculated excitonic parameters have been introduced: (1) the site energies (e.g. the diagonal elements of the exciton Hamiltonian) were red shifted by the same amount (980 cm À1 ) and (2) the couplings have been rescaled by the same factor obtained assuming a linear dependence of the couplings on the square of the transition dipole moments. Namely, the scaling factor is obtained as the ratio: where m a,b,exp and m a,b,calc are the experimental and the computed transition dipoles for Chl a and Chl b, respectively.

Simulation of excitonic parameters
For the sake of clarity we first report a detailed investigation on site energies and couplings by comparing different models either including or neglecting environment effects.

Site energies
3.1.1 Effect of the distortions of the crystal structure in various treatments. Although the Chl a and b share a common porphyrin-like structure, the Chl b differs from the Chl a by a formyl substituent in the C7 position, instead of the methyl group. We thus expect that this structural difference could lead to different absorption properties. In particular, the absorption maximum for the 0-0 Q y transition is displaced by 13 nm in a solvent-free state (around 647 and 624 nm for Chl a and Chl b, respectively) 64 and the extrapolated vacuum dipole strength differs by 6.3 D 2 (21.0 and 14.7 D 2 for Chl a and Chl b, respectively). 58 Within the protein, further geometrical and electronic distortions of each pigment may occur due to its confinement in a specific binding pocket of the protein and the electrostatic interactions of the pigment with its local environments. These structural and electrostatic effects are coupled but, in the calculations, they can be assessed separately by artificially ''switching-off'' one of the two. In the following we apply such a strategy by first focusing on the structural effects only (by exploring the site energies the pigments ''in vacuo'', i.e. neglecting the environment), and successively by adding the environmental effects through different QM/classical models. Fig. 3 reports the site energies computed in vacuo using the crystalline structure (VAC@CRY) and those averaged over the MD configurations (VAC@MD).
As expected, the two sets of Chls (a and b) form two distinct groups with an average difference of about 0.06 eV for both models. This difference is in good agreement with that obtained experimentally, 0.07 eV for the solvent-free estimation. This comparison shows that the ''intrinsic'' error of the QM level here used affects in the same way as the two types of chlorophylls. Thanks to that, in the following simulation of the spectra, an equal shift to the red will be used for all the site energies without the need for introducing different corrections for Chl a and Chl b.
The energy spread in the VAC@CRY data is larger than in the VAC@MD case, indicating that the distortions of the pigment's macrocycles found in the crystal are reduced when an average picture as obtained from MD is used, thus suggesting that the statistical fluctuations of the environment act to ''homogenise'' the pigment structures.
The interaction between pigments and the environment however can explicitly modulate the site energies, thus tuning the energy landscape. Here two alternative models have been tested for describing these environment effects, namely the continuum (PCM) and the explicit (MMPol) models. In PCM the Chl is embedded into a molecule-shaped cavity surrounded by a dielectric medium described by an effective dielectric constant (see Section 1.2 and Fig. 2a). In MMPol the protein/ membrane/solvent atoms are described as point charges and induced dipoles (MMPol) allowing the inclusion of both specific interactions and the anisotropy of the environment. The site energies calculated with the two models on the crystal structure (CRY) are reported in Fig. 3.
As expected for a p-p* transition, both environment models lead to a red-shift of all the site energies. When using a PCM description, a similar shift is found for all Chls of each group, namely ca. 0.05 eV for Chl a and 0.04 eV for Chl b. When MMPol is used in combination with the crystal structure (MMPol@CRY) instead, a larger red-shift is found for the a604 and b608 pigments. At first glance this result looks like site specific changes that might be relevant for the functioning of the complex. However, by comparing the MMPol@CRY with the MMPol@MD results, obtained by applying the MMPol methodology on the MD structures it becomes obvious that those energy sinks are artificial (see the ESI † for a more detailed explanation). The MD simulations seem to suitably release the distortions of the crystal structure and allow obtaining more realistic site energies (as will be proven further below by comparing the resulting optical spectra with experimental data). Therefore, in the following we will concentrate on the analysis of the MMPol@MD results.
3.1.2 Site specific shifts obtained from MMPol@MD. By comparing the MMPol@MD results with those computed in vacuo (VAC@MD results in Fig. 3) we observe an average red-shift of 0.05 eV for Chl a and 0.03 eV for Chl b in agreement with what found for PCM. Contrary to the latter, instead, MMPol environmental effects also modify the energy ladder found in a vacuum as the red-shift is not the same for all the pigments. These differential effects are quantified in Fig. 4 where we plot the relative difference between MMPol and VAC site energies, DE rel . To allow a more immediate evaluation instead of reporting the total energy shift (that, as said, is a red-shift for all pigments) we have introduced an ''effective difference'', namely we have corrected the total VAC-to-MMPol shift of the site energy of the individual pigments with the average energy shift of the corresponding pigment type (either Chl a or b), namely: (Chl a or b) (13) According to this definition, positive DE rel values indicate a small environmental effect (i.e. smaller than the average) while negative values indicate large effects.
The results reported in Fig. 4 show that pigment a609 shows the largest stabilization due to the environment while a615 and b606 are the most blue-shifted pigments. The large shift observed for a609 can be explained in terms of the effects due the axially coordinated Glu159.
Moving to the distributions of the site energies, we note that MMPol@MD and MMPol@VAC show similar standard deviations, namely 0.066 eV and 0.060 eV, respectively (see Fig. S2-S14, ESI †). The similar deviations obtained with and without the environment indicate that the main origin of site energy fluctuations along the MD trajectory is the geometrical distortion of the pigment structure, which masks other contributions like the environment electrostatics and polarization. The amplitudes of these distortions are overestimated as inferred below from the analysis of the optical spectra. The reasons for these too large geometrical fluctuations of the pigment structure are most probably related to the use of classical force fields which have not been optimized to correctly reproduce the temperature dependent amplitude of these oscillations. 65

Excitonic couplings
In Fig. 5 we report a schematic representation of the QM/MMPol couplings for the different Chl pairs averaged over the MD configurations (see eqn (1) and (2)).  (13)).
The analysis of Fig. 5 reveals that the Chls at the stromal layer are connected by a complex network of couplings with the most intense ones (on the order of 150-180 cm À1 ) being in the Chl pairs a611-a612, a611-a615 and a603-a609. In the lumenal layer, pigments form three distinct domains: a trimer (a604-b606-b607) with a strong coupled pair a604-b606 (120 cm À1 ), a medium coupled pair, a604-b607 (35 cm À1 ), and a weakly coupled (20 cm À1 ) heterodimer, a613-b614. The pigments at the stromal and lumen regions are not strongly coupled, with the largest value found in the heterotrimer a604-b606-b607. This picture is in agreement with previous theoretical studies on CP29 where the excitonic couplings were computed using the transition charges from the electrostatic potential method (TrEsp), 43 and the Poisson-TrEsp method. 20 However, the absolute values estimated in this work are almost twice as large. This discrepancy can be mainly ascribed to the magnitude of the CAM-B3LYP transition dipole moments that are larger than the experimental values by ca. 35% (see Table S1, ESI †).
In the CP29, pigment pairs can exhibit small inter-pigment distances (around 10 Å, see Table S3, ESI †); we thus expect that the point-dipole-approximation (PDA) of the couplings will fail, especially in pairs of close by Chls. This expectation is indeed confirmed in the left panel of Fig. 6, which reports the interpigment distance dependence of the ratio between the PDA and the Coulomb component of the QM coupling (see eqn (1)).
For large distances the PDA and the QM approach predict similar values, whereas at distances shorter than 15 Å we observe large deviations using PDA couplings that are even two times larger than the QM ones.
A further interesting analysis of the QM/MMPol results is obtained in terms of the screening effect of the environment on the different pairs. To quantify it, we introduce a hypothetical effective dielectric constant (e eff ), which accounts for the ''local'' polarity felt by the single pairs. Following the same strategy as that commonly used in the PDA to account for the solvent screening e eff can be defined as the ratio between the Coulomb component of the coupling (eqn (1)) and the total value (i.e. including the MMPol term reported in eqn (2)), namely: We computed the e eff values for all the pairs with a V Cou 4 10 cm À1 : these results are reported in Fig. 6 for the QM/MMPOL@MD configurations. From the data reported in the figure it is evident that the ''effective screening'' varies significantly from one pair to the other (e eff ranges from 1.3 to 2.1); this variation however cannot be simply correlated with the inter-pigment distance but clearly depends on the specific residues present in the neighborhood of the two interacting pigments. When moving to an average analysis, we see that he eff i is 1.67, which is lower than the values previously obtained for FMO 31 (he eff i = 1.70) and PE545 29 (he eff i = 1.82). To explain this finding we have simulated the optical dielectric constant of the MMPol composite environment using the expression:  where a i is the isotropic MMPol polarizability of the site i and V is the molecular volume of the MMPol system estimated by constructing a van der Waals surface from unscaled Bondi atomic radii. The obtained value (e opt = 2.07) is indeed lower than that previously obtained using the same procedure as that for the PE545 complex (e opt = 2.27). 29 This shows that the environment surrounding the pigments in CP29 is less polarizable with respect to the other PPC probably due to the presence of the lipidic region of the membrane.

Simulation of optical spectra
The site energies and couplings presented and discussed in the previous section are used here to simulate the optical spectra and compare with experiments. This analysis is reported for both MMPol@CRY and MMPol@MD sets of data to explicitly show the sensitivity of the spectra to the excitonic parameters. The full excitonic Hamiltonian for MMPol@CRY and MMPol@MD data are reported in Tables S4 and S5 (ESI †), respectively. The absorbance, linear and circular dichroism spectra obtained from the two descriptions are reported in Fig. 7 and 8 and are compared with experimental ones at different temperatures.
As can be seen from Fig. 7 and 8, the spectra in which excitonic parameters calculated on the crystal structure are used significantly disagree with the experiment, whereas those obtained from the MD data are in good qualitative agreement with the experimental absorbance, linear dichroism and circular dichroism data.
From the comparison of the two sets of spectra we conclude that site energies and couplings obtained using the crystal structure data in combination with QM/MMPol calculations are not accurate enough because of the possible artificial distortions of the pigments and the unphysical interactions between some pigments and close-by residues present in the crystalline structure. When an MD simulation of the system in its natural environment (here membrane and water) is used, instead, these artefacts are released and more realistic average site energies and couplings are obtained. Nevertheless, some deviations between the calculated and measured spectra in the high energy (short wavelength) Chl b region still remain. In the low temperature (13 K) absorption spectrum in Fig. 9 two separate bands are visible in the experiment but only one in the calculations. It seems that our calculations miss some contributions to site energy shifts, e.g. those resulting from the dispersive interactions. 66 In the electrostatic calculations of site energies of Müh et al., 21 b608 was found to be red shifted, but still one more Chl b (b607) had to be red shifted in a refinement fit, in order to describe the high energy side of the linear spectra: hence, at least for the latter, it remains open as to what the mechanism of the red shift is. Concerning the CD spectra there is a good qualitative agreement, but the present exciton theory can only describe conservative CD spectra for which the integral over the spectrum is zero. Obviously, in the case of CP29 there are additional negative contributions to the CD signal, as, e.g. the intrinsic CD of the pigments which was not included in the present analysis.
As a last note we observe that all the simulated spectra have used a fixed value for the static disorder as well as a homogeneous broadening based on a model spectral density (see the Methods section). As a matter of fact, the band shape could be directly obtained from the combined MD-QM/MMpol strategy using the fluctuations of the excitonic parameters as determined by the intra-and intermolecular vibrations (homogeneous broadening) and the averaging effect due to a distribution of different structural environments available to the system (inhomogeneous broadening). An accurate determination of these two broadenings, however, would require very long simulation times as well an optimized force field of the pigments which could accurately sample the modulation of exciton parameters by the vibrations (spectral density). However, the already mentioned overestimation of the site energy fluctuations obtained in the present MD simulation prevents the quantitative use of the simulated broadening (see also Fig. S15, ESI †).

Analysis of the excitonic states
After having analysed the accuracy of the calculated excitonic parameters when used to simulate optical spectra, here we present a discussion on the nature of the excitonic states and their composition in terms of pigment contributions. For this analysis we use the averaged QM/MMPol data on MD configurations, which yielded the best description of the optical spectra, as discussed above.
In Fig. 9 we report the excitonic energy ladder together with a graphical representation of the composition of each exciton state in terms of pigment contributions. In particular, the contribution of the pigment i to the specific excitonic state (k) has been quantified using the ''probability coefficient'', i.e. the square of the expansion coefficient, |c (k) i | 2 , reported in eqn (4). From the visual analysis of Fig. 9, the excitonic states can be divided into two main groups: (i) the low energy states (S 1 -S 9 ) delocalized over Chl a pigments and showing an energy progression in the range of 0.06 eV and the (ii) the high energy  states (S 10 -S 13 ) involving only Chl b and being closer in energy (in the range of 0.01 eV). More in detail, the lowest energy state is delocalized over pigments a609, a603 and a602 (stromal side of the membrane). Instead, the two highest energy states involve the chlorophylls b606 and b614 that face the luminal side of the membrane. Chlorophyll b608, which is the only Chl b on the luminal side, thus well separated from the other three Chls b, is mainly involved in the state S 11 . In general we observe that most of the excitonic states are delocalized over pigments that face the same side of the membrane. The only two exceptions are states S 4 and S 5 which have a composition of 80-20% from pigments facing different sides. The pigment which contributes most to the lowest excitonic state is a609 whereas the next exciton state is dominated by the a611-a612 dimer. This pair has been found to be responsible for the low-energy states also by Müh et al. 21 and Feng et al., 37 whereas the present suggestion of a609 as representing a low energy site is new and should be further evaluated in future work. The third lowest exciton state in our calculations is dominated by a604, which has been shown by site directed mutagenesis studies 67 and electrostatic calculations 21 to represent a low energy site in CP29. The composition reported in Fig. 7 represents an average picture; we have, however, verified that the main features are preserved along the whole trajectory: this is shown in Fig. S16 (ESI †) where we report the evolution of the single pigments contributions along the MD trajectory for three selected excitonic states, namely the lowest (S 1 ), the highest (S 13 ) and an intermediate one (S 5 ).
There are other interesting questions concerning the role of correlations in fluctuations among different elements of the exciton Hamiltonian. We note that correlations have been reported to be insignificant in other light harvesting complexes such as FMO 68 and PE545, 69 using a similar methodology to that applied here. On the other hand, strong correlations in site energy fluctuations of different pigments were found 70 in a normal mode analysis of the spectral density of the FMO protein at low frequencies, which are difficult to access using the present type of simulations, because very long simulation times would be required. To accurately quantify the possible consequences of such correlations is an interesting future goal as controversial results have been found so far: their relevance for excitation energy transfer has been reported to be low for the FMO complex 70 but significant effects on both the coherences and the transfer rates have been shown in model systems. 71,72

Summary
Quantum chemistry based multiscale models have been shown to be important in the molecular level understanding of the mechanisms acting in light-harvesting complexes. In most cases, however, their predictive power is still hampered by the fact of being limited to the crystallographic structures. In this study of CP29 we have shown that indeed the crystal structure contains serious geometrical distortions that are not compatible with quantum chemical calculations on protein bound cofactors. We have found that classical molecular dynamics simulations can be used in order to both release the artificial strain of the crystal structure and suitably include the effects of the natural environment. The excitonic parameters obtained from the combination of the MD simulation with a polarizable QM/MM description of the full ''solvated'' system have shown to correctly reproduce the experimental optical spectra indicating that the methodology is capable of giving an accurate average picture. However, the fluctuations of the same properties obtained along the MD trajectory are found to be too large due to an overestimation of the intramolecular motions of the individual pigments obtained using the classical MD simulation. This finding clearly shows that the combination of classical MD and QM/MM calculations cannot be used for a really quantitative investigation of the important coupling between electronic excitations and vibrational modes without a refinement of the classical MD being introduced. Such a refinement mainly involves the force fields used for the pigments which have to be obtained with a completely new parameterization strategy where not only equilibrium geometries are to be well reproduced but also vibronic features as predicted by QM methods need to be considered as fitting data. Work in this direction is already in progress in our group for the most common pigments present in light-harvesting proteins.