Multi-scale modeling of electronic spectra of three aromatic amino acids: importance of conformational averaging and explicit solute–solvent interactions

Electronic transitions in the ultraviolet and visible spectral range can reveal a wealth of information about biomolecular geometry and interactions, such as those involved in protein folding. However, the modeling that provides the necessary link between spectral shapes and the structure is often diﬃcult even for seemingly simple systems. To understand as to how conformational equilibria and solute–solvent interaction influence spectral intensities, we collected absorption (UV-vis), electronic circular dichroism (ECD), and magnetic circular dichroism (MCD) spectra of phenylalanine (Phe), tyrosine (Tyr) and tryptophan (Trp) zwitterions in aqueous solutions, and compared them with quantum-chemical simulations. These aromatic amino acids provide a relatively strong signal in the accessible wavelength range. At the same time, they allow for a relatively accurate modeling. Energies and intensities of spectral bands were reproduced by the time-dependent density functional theory (TD DFT). The solvent was approximated by a continuum as well as clusters containing solvent molecules from the first hydration sphere. The ECD signal was found to be strongly dependent on molecular conformation, and the dependence was much weaker in UV-vis and MCD spectra. All spectral intensities, however, were significantly affected by the solvent approximation; especially for ECD and MCD the usual polarizable continuum solvent model did not yield satisfactory spectral shapes. On the other hand, averaging of the clusters obtained from molecular dynamics simulations provided an unprecedented agreement with the experiment. Proper modeling of the interactions with the environment thus makes the information about the molecular structure, as obtained from the electronic spectra, more accurate and reliable.


Introduction
Biological functions of peptides and proteins are determined by their structure and interactions, which can be conveniently studied by optical spectroscopic methods.They are relatively cheap, simple, and enable a variety of experimental conditions.In particular, the electronic circular dichroism (ECD, detecting the difference in absorption of left-and right-circularly polarized light) proved to be a very sensitive probe of peptides' secondary structure and molecular conformation in general. 1,2In the last decade, the possibility to reliably model [3][4][5][6][7] the spectra renewed interest also in the magnetic circular dichroism (MCD).This technique measures the dichroism induced by the magnetic field parallel to the light beam; unlike natural ECD it is exhibited by both chiral and non-chiral molecules.
Most conveniently, the spectra involving electronic transitions are modeled using the time-dependent density functional theory (TD DFT), accessible for relatively sizable systems. 8,91][12][13][14][15][16][17][18] Semiempirical models already indicated that a large number of geometries were needed to reproduce peptide ECD. 19The need to average over many geometries, to account for the molecular motion, makes quantum-mechanical computations demanding in terms of computer time and other resources. 202][23][24] Practically, the computations are not directly applicable either to interpret experimental spectra of larger molecules, or to smaller systems such as single amino acids.][27][28] In the present study, we measured, modeled and analyzed the absorption, ECD and MCD of three aromatic amino acids, This journal is © the Owner Societies 2014 Phe, Tyr and Trp, aiming to better understand the relationship between the spectrum and the structure, including the effects of the environmental factors.The very presence of the aromatic residue lends the molecules a relatively strong signal in the accessible wavelength range.Although the zwitterions do not possess amide groups, their polarity is relevant to the strongest solute-solvent interactions in peptides as well.][31][32][33][34] In particular, factors important for the protein MCD signal have been explored only sporadically so far.1][42] As shown below, these results suggest that the sensitivity of MCD to environmental factors and the conformation is somewhere between the absorption and ECD spectroscopies.
The solvent studies are enhanced by the latest quantummechanical codes that made it possible to compute absorption, ECD and MCD spectral intensities relatively quickly, [3][4][5][6][7] thus paving the way for explicitly including the solvent in the computation.In the spirit of the ''multi-scale'' (QM/MM) approach we average the spectra simulated for solute-solvent clusters obtained from molecular dynamics (MD). 43nlike for ECD, the dependence of the MCD signal on the conformation has not been satisfactorily addressed yet, as far as we are aware.][47] Computations of both spectral intensities thus involve similar operators, and their relative conformational sensitivity is not apparent beforehand.As shown below, for our systems, MCD is not that sensitive to the conformation, but both techniques do sensitively reflect the molecular environment.

Spectral measurement
Commercial (Sigma-Aldrich) Phe, Tyr and Trp amino acids were dissolved in miliQ water to concentrations of 0.05-0.20 mg ml À1 .The pH of B4.7 for all samples stabilized the zwitterionic forms.Absorption, ECD and MCD spectra were recorded at room temperature (298 K) on a Jasco J-5 spectrometer (Japan) equipped with a 1.5 T permanent magnet.A quartz cell of 1 mm optical pathlength, a scanning speed of 10 nm min À1 , a 0.05 nm data pitch, a 32 s response time, and 4 accumulations were used.ECD was obtained as an average, and MCD as a difference of the CD signal obtained with the two magnet orientations.The baseline correction was performed by subtracting the signal of pure solvent.

Computations
For phenylalanine, tyrosine, and tryptophan relaxed potential energy surface (PES) scans were performed for the two side chain torsion angles w 1 and w 2 (Fig. 1, 151 increments), using the MP2 48 /PCM, B3LYP-D 49,50 /PCM, B3LYP/PCM, B3LYP-D/SMD and B3LYP/SMD methods with a standard 6-31++G** basis set.The MP2 method was chosen as a standard comprising the dispersion interaction, independent of the DFT parameterization. 51,52o DFT, the empirical dispersion correction indicated by ''-D'' uses the Grimme ''D2'' method. 53Two continuum solvent models were used, PCM 54 and SMD, 55 the latter presumably better accounts for the solvent dispersion interactions.Water was used as a solvent in the continuum solvent models, while employing the atomic radii derived from the universal force field (UFF) to construct solvent cavities.For geometries near the local energy minima unconstrained optimizations were performed as well.The Gaussian program, revision D01, 56 was used for the DFT computations.
Molecular dynamics simulations were employed to estimate the effect of explicit hydrogen bonds, presumably described only incompletely by the continuum solvent models.The simulations were performed within the Amber10 program. 57Each amino acid was placed in a (20 Å) 3 cubic box otherwise filled with water molecules, then the energy of the system was minimized and the dynamics equilibrated during 10 6 steps, using the Amber03 58,59 force field, the nVT ensemble, temperature T of 300 K, and a 1 fs integration step.Relative conformer energies (DG MD ) were obtained from average conformer populations , where R is the universal gas constant.
During the following production runs, each of the 10 8 MD steps, 500 snapshots evenly distributed in time were saved for spectral computations.In the snapshots, water molecules farther away than 3.4 Å from the amino acid were deleted.To reduce the MD geometry dispersion and thus to speed up the averaging convergence 60 the resultant clusters were partially optimized (ten steps only) using the vibrational normal mode coordinates [61][62][63] and the HF/PCM method, with the standard 6-31++G** and 6-31G basis sets applied to the solute and water atoms, respectively.The HF method was used as it provided a more reliable convergence than DFT, especially on geometries significantly distorted (too long bond lengths, etc.) during MD.It is known that HF and DFT geometries are different; 64 however, these differences (e.g. 2% in bond lengths) do not significantly influence the electronic transitions. 20or the equilibrium geometries and MD clusters, absorption, ECD and MCD spectra were calculated at the B3LYP/ 6-31++G**(solute)/6-31G(solvent)/PCM level, using the timedependent DFT. 65,66This level provided realistic electronic spectra before; in particular the 6-31++G** basis set appeared to be sufficient for the amide group modeling. 20Aromatic p-systems could be reasonably-well modeled even with a smaller basis set. 47,67Natural bond orbitals (NBOs) 68 were plotted as obtained using the Gaussian 56 software.
For several trial systems MCD spectra were calculated using the complex polarization propagator 3 (CPP, implemented in the Dalton 69 program) and the sum-over-states 7 (SOS) methods.As these two approaches provided similar results, the SOS method was used to average over many cluster geometries as it is faster.Gaussian curves (a full width at half maximum of 0.1 eV) were used for the spectra generation, presented in units of cm À1 mmol À1 dm 3 (e, absorption), cm À1 mol À1 dm 3 (De, ECD), and cm À1 mol À1 dm 3 Tesla À1 (De, MCD).

Geometry
The energy landscape of the three molecules is relatively simple.Characteristic torsional angles and relative energies of individual conformers calculated by the MP2, B3LYP and B3LYP-D methods are listed in Table 1.In the last column, Gibbs energies obtained from average conformer populations during the free molecular dynamics are added.For all compounds three w 1 values generating stable conformations are possible, approximately corresponding to the canonical values of 601, À601 and 1801.All compounds also seem to prefer the ''À601'' value of a ''puckered'' conformation where the aromatic ring points between the NH 3 + and the COO À group.For w 1 B 1801 the aromatic residue points in the opposite direction than NH 3 + , and the least probable (with highest relative energy) is a conformation where the aromatic ring points between the NH 3 + and a H groups (w 1 B 601, opposite to COO À ).
For Tyr (with a specific rotation of the OH group) and Trp, another set of conformers is generated by a rotation of their aromatic rings by about 1801.As expected, energy differences caused by the different orientation of the OH group in Tyr are much smaller than those caused by rotation of the larger Trp residue.Within a few kJ mol À1 , the MP2 results correspond to B3LYP, and the empirical dispersion correction (in B3LYP-D) does not significantly affect conformer ordering.We note, however, that the torsional angles calculated e.g. by the B3LYP and B3LYP-D methods are slightly different, by up to 151 for conformer III 0 of Trp.
Comparison of the electronic and Gibbs energies (listed for the MP2 level in Table 1) suggests an important effect of the vibronic motions on conformational equilibria, although the dynamical contributions based on the ideal gas model may not be applicable to our systems. 70,71The most striking inconsistency is thus the very different conformer ordering for Phe and Tyr, and to a lesser extent also for Trp, as obtained from molecular dynamics (cf.DG MD ).For Phe, for example, conformer III, least favored by the B3LYP and B3LYP-D methods, becomes the preferential one in MD.
This contrast is also apparent in the two-dimensional (w 1 , w 2 ) potential energy surfaces (PES) presented in Fig. 2. MD predicts for Phe and Tyr the most probable w 1 value of about 1801, whereas the electronic methods rather prefer the puckered value (w 1 B À601) and w 1 B 601.For Trp, the MD PES suggests a larger conformational flexibility than DFT, with conformer populations spread  By analyzing the MD trajectories, we could construct average water densities with respect to a given orientation of the NH 3 + -CH-COO À parts of the amino acids (Fig. 3).The first hydration spheres are well-developed.They are compact, and structured both radially and angularly with respect to the charged centers.They do not seem to be significantly dependent on the aromatic residues, although the side chain clearly perturbs the hydration sphere more strongly in Trp than in Phe and Tyr.

Spectra of individual conformers
Absorption, CD, and MCD spectra calculated for individual Phe, Tyr and Trp conformers are displayed with the experiment in Fig. 4. As discussed before 31 the electronic spectra of these amino acids can be understood to a large extent based on the dominant signal of the benzene, phenol and indole models.For example, experimental positions of the longest-wavelength absorption bands (240, 275 and 278 nm, Fig. 4) correspond to the CASPT2 predictions of 263, 274 and 283 nm, respectively.Likewise, absorption of benzene is very similar to phenol, at longer wavelengths (B270 nm) indole absorbs much more than benzene, around 200 nm indole absorption is about half of that of benzene, etc. 31 Around 200-220 nm, a significant contribution of the carboxyl group is assumed. 72he experimental absorption spectra of all the zwitterionic amino acids are well-reproduced by the TDDFT computations presented in Fig. 4, with a main peak position within a 2-10 nm distance from the experiment.As expected, the absorption is relatively independent of the conformation.Conformer I of Phe and conformers I and I 0 of Tyr absorb much more strongly Fig. 2 Calculated MD free energies and electronic energies, as dependent on two torsional angles (w 1 and w 2 ).The 6-31++G** basis set and PCM were used in the electronic computations.
around 220 nm than the other conformers.On the other hand, the single peak around 190 nm does not seem to be particularly affected by the conformation.For Trp, the differences are most apparent in the 180-230 nm region comprising two strong absorption bands.Also the longer-wavelength signal at B280 nm has the potential to distinguish the conformation.
The MCD spectra (bottom row in Fig. 4) exhibit only slightly greater conformational sensitivity than the absorption spectra, although a closer look does reveal distinct patterns.For Phe, a weak band at around 235 nm is apparent for conformer II only.
At around 188 nm the relative MCD differences in the intensity and apparent peak positions between the conformers are about twice as large as that for the absorption (e.g.MCD maxima vary between 5 and 6 cm À1 mol À1 dm 3 Tesla À1 , while absorption maxima between 75 and 80 cm À1 mmol À1 dm 3 ).Tyr behaves similar to Phe, although the differences between the conformers seem to be smaller.In contrast to Phe and Tyr, the Trp MCD signal is quite conformer-dependent, although also for this molecule all conformers have a positive signal at around 190 nm, a weak one within the 200-240 nm interval, and a positive/negative MCD at 260/275 nm.However, at around 220 nm, for example, even the MCD sign of conformer II 0 is clearly different from the others.
The ECD spectra (middle of Fig. 4) are most dependent on the conformation.This reflects the characteristic mechanism of the ECD phenomenon intimately linked to a molecular conformation, 45,73 more so than in MCD.One can realize, for example, that the planar aromatic and COO À residues provide most of the MCD signal, whereas without coupling to the rest of the molecule their ECD would be zero.However, in the chiral amino acids, electronic transitions within the planar chromophores may be the main source of ECD as well, as these are perturbed by covalent bonding, electrostatic interaction and mutual polarization with the chiral part of the molecule. 47In Tyr the change in the w 1 torsion angle causes much larger intensity changes than w 2 variation, whereas in Trp the relative influence of the two angles varies throughout the spectrum.
The key to a faithful reproduction of the experimental ECD thus seems to be proper conformer weighting.The spectra were calculated at the B3LYP/6-31++G**/PCM level and averaged  using the Boltzmann factors, and MP2, B3LYP, B3LYP-D and MD conformer energies (Fig. 5).The MD conformer populations provide the best results when compared to the experiment, probably because the MD model better reproduces both the real conformer ordering and magnitudes of the energy differences.The span of the MD relative conformer energies is indeed much smaller than for the electronic methods (cf.Table 1), so that individual conformers are more evenly populated and the averaged ECD spectra are weaker, closer to the experiment.Yet even for the simplest Phe molecule the simulated intensities are more than twice as large as the experimental ones in the longer-wavelength region.The MP2 conformer energies provide the least satisfactory spectral curves, most probably due to an overestimation of the dispersion and correlation effects by this method. 50,74Indeed, the dispersion-corrected B3LYP-D with a more balanced dispersion 75 often provides more realistic results.
However, the conformer weighting alone does not suffice for a reliable modeling of the electronic spectra.The water molecules present in the clusters obtained from MD and explicitly comprised in the quantum-chemical computations have an additional potential to modulate the spectral curves.The absorption, ECD and MCD spectra obtained using the B3LYP functional as an average from 500 snapshots are plotted in Fig. 6.We also explored the CAM-B3LYP functional recommended as a more universal tool comprising also charge-transfer excitations. 76In our case its performance was clearly inferior to B3LYP (data not shown).In particular, the transition wavelengths were too low.
On the other hand, the B3LYP method and the averaging with explicit water molecules (red curve in Fig. 6) makes especially the ECD intensities more realistic compared to the PCM solvent model (Fig. 4 and 5).The greatest deviations between the theory and experiment appear in the longest-wavelength region.Most probably, the inaccuracy of the B3LYP functional, as well as the rovibronic effects, 77 not comprised in the ab initio modeling are to a large extent responsible for the rest of the error.
9][80][81] This is also documented in Fig. 7, where dependencies of spectral errors measured against the 500 cluster average are plotted as a function of the number of averaged clusters.The 500 cluster ensemble provides a reference allowing to estimate the minimal error of the computation.Note that more clusters may in principle be needed for absolute convergence.From the dependence, we can see that only few clusters are necessary to achieve a converged absorption intensity, whereas MCD and ECD require a much more extensive averaging.
For 100 clusters, for example, the error (d) for absorption already approaches 1%.A reasonable convergence is also exhibited by the MCD spectra, although the error (B10%) is about ten times larger.For ECD, the corresponding error of 100% in intensities for Trp is not acceptable, except for an approximate reproduction of the main spectral features.It is also noteworthy that when the water molecules are removed from the clusters before averaging (right-hand side of Fig. 7), the ECD error drops at least to one half.We thus can conclude that both the conformational dispersion and solute-solvent interaction contribute approximately equally to the dispersion and inhomogeneous broadening of spectral lines.The drop of MCD error is rather unexpected when the water molecules are removed, which suggests that, unlike absorption, both ECD and MCD are very sensitive to the interaction of the solvated molecules with the aqueous environment.
For a more detailed insight into the effects of the solvent and conformational changes on the spectra, we ran molecular dynamics with a fixed geometry of Phe.The 25 and 50-cluster averages are compared with an implicit PCM model in Fig. 8. Clearly, even with fixed Phe geometry, the water molecules explicitly involved in the electronic computations are needed to yield a realistic spectral profile; the PCM model provides a reasonable absorption spectral shape, but rather unrealistic magnitudes of ECD intensities.As discussed elsewhere 22,23,82 the continuum PCM approximation is not suitable to reproduce in full the strong solvent-solute interaction, in particular the   (cf., e.g., Fig. 5).We also see that in spite of the high integral error (Fig. 7) a few tens of the clusters suffice for a reasonably accurate prediction of the ECD shape.This is similar as found in previous studies for the vibrational optical activity. 80However, for the vibrational spectra, the PCM model was found to be sufficient as a reasonable computationally cheap approximation, 81   which is something that cannot be concluded from our results simulating the electronic optical activity.The modeling thus shows that water molecules directly participate in the electronic transitions, which is consistent with previous observations of similar systems. 20Indeed, as apparent from two randomly chosen snapshots of Phe and the closest water molecules (Fig. 9), the frontier orbitals most strongly contributing to the measured spectral signal are in some cases dramatically modified by the presence of water.By comparing the first and third column of the orbitals in Fig. 9, we can also notice the significant conformational dependence even for a bare Phe molecule, i.e. without water.The orbital energies are not that different if calculated for different conformers and clusters, which indicates the relative stability of the absorption spectra.The ''different chemistry'' of the vacuum and hydrated Phe molecule is also reflected in the NBO orbitals 68 plotted in Fig. 10.The lowest unoccupied NBO, for example, is virtually unchanged, except for a phase factor, while the highest occupied, and the highest occupied -3 one switch their positions, being alternately localized near the phenyl and carboxyl residues.

Conclusions
To understand the multitude of factors affecting the electronic spectra of aromatic amino acids, we compared experimental absorption, electronic circular dichroism, and magnetic circular dichroism spectra of Phe, Tyr and Trp with a series of model computations.ECD was found to be the most sensitive method reflecting molecular conformation and interaction with the environment, followed by MCD and absorption.Conformer averaging based on molecular dynamics provided much better results than the ab initio conformer energies obtained with dielectric solvent models.However, explicit water molecules had to be included in the computations to achieve more realistic simulations of spectral intensities.Rather surprisingly, MCD intensities were not much influenced by the conformation, but were found to be significantly affected by the solute-solvent interaction.As the most important result we thus consider consistent comparison of the three types of spectra with respect to their solvent and conformational sensitivity, and evaluation of the role of solvent molecules implicitly involved in the computations.A further improvement of the computational methodology is needed for reliable interpretations of spectra, needed for determination of structure and interaction of biologically relevant molecules with the environment.
more evenly among the three canonical w 1 values.As expected, addition of the dispersion (in the MP2 or B3LYP-D methods) favors the puckered conformer, as opposed to B3LYP.As only minor differences were observed between the SMD and PCM solvent models, only the PCM results are shown.

Fig. 3
Fig. 3 Hydration spheres (isodensity surfaces) obtained by an MD run with the Amber03 force field.

Fig. 6
Fig. 6 Absorption, ECD and MCD spectra obtained by averaging of 500 amino acid-water clusters, 500 conformers of a bare amino acid molecule (both obtained during the same molecular dynamics) and the experiment.

Fig. 7
Fig. 7 Error of the absorption, ECD and MCD spectra d ¼ Ð omax o min S N ðoÞ À S 500 ðoÞ j j do .Ð omax o minS 500 ðoÞ j j do as dependent on the number of averaged snapshots, for both clusters and bare molecules, the integration ran over the whole calculated region of B180-320 nm.

Fig. 8
Fig. 8 ECD and absorption spectra for a randomly chosen Phe conformer (w 1 = À163, w 2 = 72): average of 25 and 50 clusters from MD run with fixed Phe geometry.

Fig. 9
Fig.9Selected Phe canonical molecular orbitals of two clusters, with and without water (left and right, for each cluster), with orbital energies in atomic units, for the isodensity value of 0.25.

Fig. 10
Fig. 10 Selected natural bond orbitals for Phe (left) and its MD cluster with water (right).From top to bottom: lowest unoccupied, highest occupied, and highest occupied -3 NBO.

Table 1
Characteristic torsional angles (deg.) and relative energies (kJ mol À1 ) of Phe, Tyr and Trp conformers a a The 6-31++G**/PCM level was used for all methods; DE and DG are, respectively, the electronic and Gibbs energies.