Insight into vibrational circular dichroism of proteins by density functional modeling †

Vibrational circular dichroism (VCD) spectroscopy is an excellent method to determine the secondary structure of proteins in solution. Comparison of experimental spectra with quantum-chemical simulations represents a convenient and objective way to extract information on the structure. This has been difficult for such large molecules, where approximate theoretical models have to be used. In the present study we applied the Cartesian-coordinate based tensor transfer (CCT), which makes it possible to extend density functional theory (DFT) and model spectral intensities of large globular proteins at nearly quantum-chemical precision. Indeed, comparison with experiment provided a better understanding of the dependence of VCD spectral shapes on the geometry, their sensitivity to fine structural details and interactions with the environment. On a model set of globular proteins the simulated spectra correlated well with experimental data and revealed which structural information can (and cannot) be obtained from this kind of spectroscopy. Although the VCD technique has been regarded as being rather insensitive to side-chain variations, we found that the spectra of human and hen lysozyme, which differ by only a few amino acids, are quite distinct. This has been explained by long-distance coupling of the amide vibrations. Likewise, the modeling reproduced some spectral changes caused by protein deuteration even when the protein structure was conserved.


Introduction
Vibrational circular dichroism (VCD) spectroscopy measures the absorption difference between left- and right-circularly polarized infrared light. The measurement is more difficult than for plain infrared absorption (IR), but the spectra are more sensitive to molecular structure. Available vibrational transitions are often more numerous and convey more information than the electronic ones. VCD is, for example, one of the very few methods by which the absolute configuration of small chiral molecules can be reliably determined. 1,2,5–7 Although molecular VCD was observed already in the 1970s, 8 a rigorous quantum-chemical formulation appeared much later. 9,10 Nowadays, Stephens' magnetic field perturbation formulation 10,11 implemented within density functional theory (DFT) 12 is a widely accepted standard for medium-sized molecules, balancing accuracy and computational cost well. An important component in the theory development was also an origin-independent algorithm for computing VCD intensities, such as that based on gauge-invariant atomic orbitals. 13 For molecules composed of hundreds or thousands of atoms, even DFT cannot be used, or its direct application is very inefficient in terms of required computer time and memory. Fragmentation schemes, such as the "atom in molecules" approach, are thus increasingly popular to simulate VCD and other vibrational optical activity spectra of such systems. 14 In the present study, we apply the Cartesian-coordinate tensor transfer (CCT) 15–17 to a set of globular proteins. The method has been recently automated so that it can be applied to molecules of up to ~10 000 atoms (the most serious limitation being the diagonalization of the harmonic force field) and used to simulate Raman optical activity (ROA). 18
The computational methodology thus significantly increases the application potential of VCD spectroscopy. The technique is widely used to understand not only isolated proteins 19 but also their aggregates, including amyloid fibrils implicated in various neurodegenerative diseases, 20 as well as other biopolymers 21 and new bio-inspired materials. 22 However, the origin of spectral signals is often poorly understood. The globular proteins investigated in the present study are reasonably rigid, so they can be used as benchmarks where specific spectral features can be more reliably assigned to a given structural motif. Of particular interest is the observation that proteins with rather similar
a Institute of Organic Chemistry and Biochemistry, Academy of Sciences, Flemingovo náměstí 2, 16610 Prague, Czech Republic. E-mail: bour@uochb.cas.cz
b Department of Optics, Palacký University, 17. listopadu 12, 77146 Olomouc,

Measurement of VCD and IR spectra
Hen egg-white and human lysozyme, human serum albumin, myoglobin, lactalbumin, insulin and concanavalin A (all from Sigma-Aldrich) were dissolved to a concentration of 300 mg mL⁻¹. By default, MilliQ water was used; concanavalin A was dissolved in 100 mM phosphate buffer, pH = 6, and for insulin the pH was adjusted to 2.5 with HCl. VCD measurements were performed using a BioTools ChiralIR-2X instrument, and the samples were placed in a demountable custom-made (by Meopta, Inc.) CaF₂ cell without any spacer, 23 as a spacer would be problematic for the 8 μm path length. High protein concentration and a thin path length are needed to collect data in a wide spectral range of ~1200–2000 cm⁻¹, which would otherwise be partially obscured by water absorption. Additional measurements were carried out in a BaF₂ cell using both D₂O- and H₂O-based solvents, with 50 and 15 μm spacers and protein concentrations of 30 and 150–250 mg mL⁻¹, respectively. This reduced spectral noise but narrowed the usable wavenumber interval. Each spectrum was accumulated for 16 hours at 8 cm⁻¹ resolution. Absorption (IR) and VCD spectra of pure solvents were subtracted to correct for instrumental artifacts.

Computations
The computation of vibrational protein properties from protein fragments closely followed that described in detail in ref. 18. Briefly, X-ray data from the Protein Data Bank (PDB) database (http://www.rcsb.org) were used as starting geometries. The simulations were performed for ionized species corresponding to the experimental conditions (neutral pH, except for insulin). The main peptide chain was split into overlapping fragments, each containing four amide (three amino-acid) residues and capped with methyl groups. For lactalbumin, shorter and longer fragments (containing 2–5 amide groups) were used to test the methodology (Fig. S1, ESI†). The four-amide approach appeared to be a reasonable compromise between accuracy and computational cost, although the limited fragment length can still lead to some variations of the amide I VCD intensity (around 1720 cm⁻¹). In some cases, small contact fragments were added to model interactions between close peptide chains; these, however, did not significantly alter principal spectral features (Fig. S2, ESI†).
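The splitting of the main chain into overlapping fragments can be illustrated by a short sketch. It is only a schematic sliding-window partition of a residue list: the function name, the step of one residue between neighbouring fragments and the handling of the chain end are assumptions made for illustration, and the methyl capping and geometry handling of the actual procedure (ref. 18) are not reproduced.

```python
def overlapping_fragments(residues, size=3, step=1):
    """Split a residue sequence into overlapping windows.

    `size` is the number of amino-acid residues per fragment (three residues
    correspond to four amide groups once the fragment is capped); `step` is
    the shift between consecutive fragments. Both defaults are assumptions.
    """
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    residues = list(residues)
    if len(residues) <= size:
        return [residues]
    last_start = len(residues) - size
    starts = list(range(0, last_start + 1, step))
    # Make sure the C-terminal end of the chain is covered by a fragment.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [residues[s:s + size] for s in starts]


if __name__ == "__main__":
    chain = ["Lys", "Val", "Phe", "Gly", "Arg"]  # hypothetical short chain
    for fragment in overlapping_fragments(chain):
        print("-".join(fragment))
```

With a step of one residue every interior residue appears in several fragments, which is what allows parameters to be taken from the fragment that represents a given atom best.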
Each fragment geometry was partially optimized by restrained energy minimization in vibrational normal mode coordinates 24–26 at the B3PW91 27 /CPCM(water) 28,29 /6-31++G** level; the Gaussian 09 (revision D.01) program was used for all quantum chemistry. 30 The B3PW91 functional previously provided excellent results for similar systems. 31 Then the force field and dipole derivatives (atomic polar and axial tensors) 32,33 were transferred from all fragments to the original protein using the Cartesian coordinate transfer (CCT) algorithm. 15,16 For some proteins the effect of neglecting longer-range interactions within CCT was roughly estimated by adding the transition dipole coupling (TDC) correction 34 to the transferred force field. For each atom pair i, j not included in the fragmentation scheme the force constants (second energy derivatives) were obtained as
$$f_{i\alpha,j\beta} = \frac{1}{4\pi\varepsilon_r\varepsilon_0}\,\frac{r_{ij}^{2}\,\mathbf{P}_{i\alpha}\cdot\mathbf{P}_{j\beta} - 3\,(\mathbf{P}_{i\alpha}\cdot\mathbf{r}_{ij})(\mathbf{P}_{j\beta}\cdot\mathbf{r}_{ij})}{r_{ij}^{5}}$$
where $\varepsilon_r$ and $\varepsilon_0$ are the relative and vacuum dielectric constants, $\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j$ is the distance vector between the atoms, $\mathbf{P}_{i\alpha} = \partial\mathbf{p}/\partial r_{i\alpha}$ is the atomic polar tensor 35 (dipole derivatives), $\mathbf{p}$ is the electric dipole moment, $\alpha$ and $\beta$ denote Cartesian coordinates, and the dots indicate scalar products. Absorption and VCD intensities were generated at the harmonic level, 3,33,36 and smooth spectra were obtained by a convolution with Lorentzian functions of 10 cm⁻¹ bandwidth (full width at half maximum).
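The final broadening step can be sketched as follows. This is a minimal illustration of convolving a stick spectrum with Lorentzian functions of 10 cm⁻¹ full width at half maximum; the function name, the area normalization of each band and the example line positions are assumptions, not the parameters of the actual program.

```python
import math


def lorentzian_spectrum(freqs, intensities, grid, fwhm=10.0):
    """Broaden a stick spectrum with area-normalized Lorentzian bands.

    freqs       -- transition wavenumbers in cm^-1
    intensities -- signed intensities (dipole strengths for IR,
                   rotational strengths for VCD)
    grid        -- wavenumbers at which the smooth spectrum is evaluated
    fwhm        -- full width at half maximum in cm^-1
    """
    hwhm = fwhm / 2.0
    spectrum = []
    for x in grid:
        total = 0.0
        for f, inten in zip(freqs, intensities):
            # Lorentzian of unit area centered at f
            total += inten * (hwhm / math.pi) / ((x - f) ** 2 + hwhm ** 2)
        spectrum.append(total)
    return spectrum


if __name__ == "__main__":
    # Hypothetical "-/+" couplet: negative lobe at higher wavenumber.
    grid = [1600.0 + i for i in range(101)]
    vcd = lorentzian_spectrum([1660.0, 1640.0], [-1.0, 1.0], grid)
    print(min(vcd), max(vcd))
```

Because VCD intensities are signed, closely spaced bands of opposite sign partly cancel after broadening, which is one reason why fine mode splittings may not be resolved in the smoothed spectra.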

Characteristic features of protein IR and VCD spectra
Typical calculated IR and VCD spectra and their conventional subdivision by characteristic protein vibrations are documented in Fig. 1 for the example of lactalbumin. The spectra were simulated for the whole protein as well as for an analogue in which all side chains were replaced by alanine (methyl groups). In either case a "classical" protein IR spectrum is obtained; 37,38 it is dominated by C=O stretching signals ("amide I" region), followed by "amide II", i.e. a combination of N-H bending and C(carbonyl)-N stretching motions.
The amide I band is simulated at a higher frequency (~1725 cm⁻¹) than obtained experimentally (~1650 cm⁻¹), 39 which is a notorious problem of such ab initio simulations and cannot be easily resolved unless less rigorous approaches (e.g., frequency "scaling") are applied. 41,42 Anharmonic interactions, the DFT error and an incomplete basis set are also part of the problem, 43–45 but tackling these factors for large proteins would be prohibitively expensive computationally. Fortunately, the amide I vibration is quite isolated from the other ones, the signal is easily recognizable, and the band shape (the best indicator of the structure) is not much affected by the frequency shift. 46 For lactalbumin, the "−/+" (when viewed from left to right, that is from higher to lower wavenumbers) VCD intensity pattern corresponds to its high content of α-helix, 43% according to the Protein Data Bank. This pattern has been observed in a large ensemble of α-helical proteins in the past. 47 The amide II band (~1550 cm⁻¹) and also a weak broad absorption band around 1300 cm⁻¹ conventionally attributed to "extended amide III" modes (mostly NH bending, C-N and C-C stretching coupled with backbone vibrations) are simulated at approximately the same positions as found experimentally. 48,49 Additional intensity due to the methyl groups (CH bending) is apparent for the all-Ala analogue around the amide III signal. The amide III region also comprises C-H bending, and the corresponding ROA signal was found useful to monitor the secondary structure of peptides and proteins; 50,51 in VCD, it plays only a minor role. 48
This journal is © the Owner Societies 2018
Between the amide I and amide II signals, the whole-peptide IR spectrum exhibits two relatively strong sharp peaks at 1630 and 1608 cm⁻¹ of side-chain origin, in particular due to out-of-phase C=O stretching of the glutamic acid COO⁻ group and NH bending of glutamate and arginine NH₂ groups. Visualization of the normal-mode motion revealed that they partially couple with the amide II C-N stretching vibration. In experimental spectra, these bands are hidden within the amide II absorption band. However, an analogous band, at least for the C=O stretching, is observable in D₂O experiments at ~1585 cm⁻¹ (Fig. S3, ESI†). As the groups are not chiral and are relatively isolated, the corresponding VCD intensity is very weak. The same groups also give rise to the 1400 cm⁻¹ band.
By comparing the whole-peptide and all-alanine simulations (Fig. 1) we can see that a direct contribution of longer amino acid side chains to IR and VCD intensities is rather small. Nevertheless, their inclusion still causes up to 50% variations in amide I and amide II VCD intensities, due to the vibrational coupling. In both simulations, we can also recognize the signals of the CH₃ and CH₂ umbrella and scissoring CH bending modes (1495 and 1480 cm⁻¹). These motions are associated with small changes of the electric dipole moment and are visible in IR spectra only because they are so numerous. Their VCD signal is negligible, which can also be seen in Fig. S4 (ESI†) where it was simulated by deleting the intensity parameters (atomic axial and atomic polar tensors) for the backbone.
On the other hand, we realize that our model is too crude to account for all aspects of the backbone-side chain interactions, such as the side chain electric field (charge state) and conformational averaging, which would require an excessive computational time.These were found to be important for the fine structure of absorption spectra 52,53 and are likely to be included for more faithful VCD models in the future.

Dependence of the spectra on protein secondary structure, comparison to experiment
Before comparing the simulated spectra to measured ones, one has to take into account both theoretical and instrumental limitations. As pointed out above, the amide I band is calculated to be about 60 cm⁻¹ higher than in experiment; the error for the amide II (experimentally ~1540–1550 cm⁻¹) and other bands is typically a few cm⁻¹ only. Although the globular proteins studied here are supposed to be relatively rigid, their solution structure might deviate from the X-ray geometry used in the simulations. The experimental bands are affected by inhomogeneous broadening, i.e. system dynamics and solute-solvent interactions, and by the instrumental resolution (8 cm⁻¹). The experimental intensity scale is only approximate; the real optical path length (nominally 8 μm) strongly depends on sample preparation, such as the force applied to keep the CaF₂ windows together. Small sample volumes (~20 μL) and baseline subtraction also affect the experimental accuracy.
Bearing these restrictions in mind, we find the agreement between theory and experiment very good. Fig. 2 compares theoretical and experimental results for proteins of rather distinct secondary structures. These include those with a higher content of α-helix, human serum albumin (46%) and myoglobin (77%), characterized by the principal "−/+" amide I couplet. While for both proteins the negative sign prevails, the couplet becomes slightly more conservative for myoglobin, i.e. integrated intensities of the negative and positive lobes are about equal. This trend is overestimated by the simulations, which provide an almost ideal couplet for myoglobin. The negative amide II signal remains approximately the same for the two "α-helical" proteins, as does the weaker but still clearly visible positive amide III signal.
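The notion of a conservative couplet can be made quantitative with a small sketch that integrates the positive and negative lobes of a band separately. The function names and the bias measure (0 for an ideal couplet, −1 for a purely negative band) are illustrative assumptions, not quantities used in the study, and the trapezoidal integration is only approximate near sign changes.

```python
def lobe_areas(x, y):
    """Trapezoidal areas of the positive and negative parts of y(x).

    Returns (positive_area, negative_area); the second value is <= 0.
    """
    pos = 0.0
    neg = 0.0
    for i in range(len(x) - 1):
        dx = abs(x[i + 1] - x[i])
        pos += 0.5 * dx * (max(y[i], 0.0) + max(y[i + 1], 0.0))
        neg += 0.5 * dx * (min(y[i], 0.0) + min(y[i + 1], 0.0))
    return pos, neg


def couplet_bias(x, y):
    """(A+ + A-) / (A+ - A-): 0 for a conservative couplet,
    -1 for a purely negative and +1 for a purely positive band."""
    pos, neg = lobe_areas(x, y)
    total = pos - neg
    if total == 0.0:
        return 0.0
    return (pos + neg) / total


if __name__ == "__main__":
    wavenumbers = [1700.0, 1690.0, 1680.0, 1670.0, 1660.0]  # hypothetical grid
    conservative = [0.0, -1.0, 0.0, 1.0, 0.0]
    negative_biased = [0.0, -3.0, 0.0, 1.0, 0.0]
    print(couplet_bias(wavenumbers, conservative))
    print(couplet_bias(wavenumbers, negative_biased))
```

Applied to the amide I region only, such a measure would distinguish a nearly ideal couplet from one in which the negative lobe prevails.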
Comparison of the α-helical versus β-sheet proteins reveals the "power" of VCD spectroscopy. Indeed, unlike for IR, the spectra of the predominantly β-sheet concanavalin A (46% β-sheet according to PDB) are quite different (Fig. 2, bottom). The helical "−/+" amide I signal changes to an approximately opposite "+/−" one; however, the positive part is further split (1732/1715 and 1681/1659 cm⁻¹ in simulation and experiment, respectively). The nature of the amide I splitting can be seen in Fig. 3 where typical vibrational modes from these two bands are plotted (other examples of normal modes can be found in Fig. S5, ESI†). The relatively regular β-sheet strand enables a longer-distance coupling of individual C=O stretching vibrations with distinct frequencies. The 1732 cm⁻¹ band is thus dominated by the "+ + +"/"− − −" patterns where three consecutive C=O bonds become longer/shorter when vibrating. The other calculated positive band, at 1715 cm⁻¹, is dominated by "+ + −" or "− − +" phasing of the amide I vibrations. The positive 1621 cm⁻¹ amide I signal encountered in the measured spectrum might be related to the small calculated couplet at ~1630 cm⁻¹. It is known that the β-sheet amide II VCD signal is often weaker than for α-helix, or becomes a "+/−" couplet. For concanavalin, the experimental couplet at 1559/1520 cm⁻¹ is reproduced by the computations at 1570/1537 cm⁻¹. The extended amide III region changes, too, giving rise to positive bands at 1406, 1296 and 1223 cm⁻¹, all of which are reasonably well reproduced by theory.
The spectroscopic behavior of proteins with mixed secondary structure is more complex; the simulated and experimental spectra of bovine α-lactalbumin and also hen egg-white lysozyme are shown in Fig. 4. These two proteins have similar secondary structure, and their spectra are also similar. The hen lysozyme spectrum differs in some details from that published previously, 54 most probably because of partial aggregation at high concentrations.
Unlike for the "pure" α-helical or β-sheet proteins discussed above, a band-to-band assignment within the amide I region is difficult, most probably because of the computational error stemming from the limited fragment size (Fig. S1, ESI†). In α-lactalbumin, for example, the lower experimental resolution prevents the up-going simulated VCD component at 1722 cm⁻¹ from being discernible, while the predominantly negative signal around 1661 cm⁻¹ corresponds to the simulation (Fig. 4). The experimental VCD spectra of the predominantly negative amide II are simulated reasonably well except for the positive 1564 cm⁻¹ peak, which is not resolved in experiment and is perhaps only manifested as a steeper onset of the dominant 1521 cm⁻¹ negative band. The amide III vibrations provide a broad weak VCD signal centered around 1300 cm⁻¹, also positive but weaker than for the purely α-helical proteins, both in theory and experiment.
The spectra of hen egg-white lysozyme (containing 41% of α-helix) differ in minor features only. Instead of the positive (1722 cm⁻¹) and negative (1709 cm⁻¹) low-wavenumber amide I VCD features in the α-lactalbumin spectrum, a positive band at 1712 cm⁻¹ is predicted. Indeed, the band is now also detectable at 1633 cm⁻¹ in experiment. The small amide I positive experimental band at 1697 cm⁻¹ is not reproduced.
Fig. 3 Example of two amide I "β-sheet" modes in concanavalin. In the first mode (1731 cm⁻¹, top) the C=O bonds vibrate locally in phase ("+ + +" or "− − −", etc.) while in the second one (1715 cm⁻¹, bottom) the typical pattern is "+ + −" or "− − +".

Amide I signal of deuterated proteins
Due to the high water absorption in the amide I region, IR and VCD protein measurements are often performed on deuterated samples, in D₂O. The deuteration, however, leads to medium to large variations of the amide I signal (often referred to as amide I′ in D₂O); 55 these represent a complicated product of solvent effects and protein dynamics. 40,41,56 The static DFT computation cannot fully explain this effect; it can nevertheless estimate the contribution of the NH → ND exchange stemming from vibrational coupling within the protein molecule. In Fig. 5 experimental IR and VCD α-lactalbumin amide I spectra are compared to the simulations. While deuteration shifts the absorption maximum by just 9 cm⁻¹ to lower wavenumbers, the VCD pattern changes more dramatically. The main negative component at 1659 cm⁻¹ loses ~40% of its intensity, shifts to lower wavenumbers, and splits. The 1628 cm⁻¹ positive band vanishes and a new negative signal at 1616 cm⁻¹ appears. The simulation reproduces these changes, although its precision is clearly insufficient for a perfect fit. A normal mode analysis (Fig. 6) gives a qualitative insight into the deuteration by revealing that while C=O stretching is dominant, the amide I vibrations also involve NH bending, which is significantly affected by the isotopic substitution.

Explicit solvent model included in the calculations
The effect of "explicit" solvent on the spectra was estimated for the smallest molecule investigated, the insulin dimer. Water molecules closer than 3.6 Å to the protein were kept with the fragments, so that the solvent-solute hydrogen bonds could be included at the same computational level as the protein parts. By comparing the spectra with those obtained using the default CPCM dielectric ("implicit") environment (Fig. 7) it is obvious that the "explicit" solvent not only conserves the principal spectral features, but makes the simulations more realistic.
The IR C=O stretching signal (originating mostly from the COOH glutamic groups) simulated at 1787 cm⁻¹ within CPCM shifts to lower wavenumbers and becomes broader, which better corresponds to the experimental shoulder at 1737 cm⁻¹. The central amide I CPCM band (~1716 cm⁻¹) also becomes smoother and shifts by 13 cm⁻¹ closer to the experimental value. Changes in other spectral parts are almost negligible. Interestingly, the amide II band goes slightly up in frequency (1552 → 1557 cm⁻¹) when the explicit model is used. This is farther from the experimental band (1543 cm⁻¹), most probably due to the DFT error. Even greater changes are observed in VCD spectra. The negative 1738 cm⁻¹ amide I band (CPCM) becomes broader, with a similar intensity as the amide II band (1551 cm⁻¹), and its position shifts to 1727 cm⁻¹, closer to the experimental one. On the other hand, the explicit solvent model still fails to reproduce the fine vibrational band splitting in the amide I region. While the positive band at 1738 cm⁻¹ splits into two (1718 and 1691 cm⁻¹), only a single positive band is detected experimentally at 1646 cm⁻¹. Rather minor changes occur at the amide II and lower-wavenumber spectral parts. Interestingly, when the water contribution to spectral intensities is included (dashed line in Fig. 7) some VCD intensity arises around 1630 cm⁻¹. This "transfer" of chirality from the protein to water could explain the negative feature around 1619 cm⁻¹ in the experimental spectrum. Visualization of the normal mode displacement suggests that amide carbonyls making two ("bifurcated") hydrogen bonds and also NH₂ group bending vibrations of asparagines and glutamines contribute to it.
Fig. 6 An amide I mode of α-lactalbumin; visualization of the normal mode movement reveals mutual coupling of C=O stretching vibrations, but also the involvement of N-H bending.
Fig. 7 Insulin, its absorption and VCD spectra simulated with an implicit solvent environment (CPCM) and when water molecules from the first hydration shell were retained. For the "explicit" case, the spectra were simulated with and without water intensity tensors while vibrational frequencies remained the same. The experiment is at the bottom.
The explicit consideration of water molecules can thus account for some inconsistencies in the simulated spectral shapes; similar conclusions have been drawn also for nucleic acids. 57,58 Solvent molecules should certainly be included in future modeling of protein spectra, although this is currently very demanding of computer resources. Typically, the first solvation shell is kept, and the geometry is based on the classical MD force field. In addition to the larger size of such clusters that need to be treated quantum-chemically, water/solvent molecules can contribute to the chiral signal. Therefore, the spectra of many MD snapshot geometries need to be averaged so that the solvent signal becomes realistic. 59 On the other hand, the sensitivity of the VCD spectra to the solvent-solute interactions makes them an excellent tool to study biologically relevant molecules and their interactions.

Variation of the a-helical VCD signal
Although α-helical VCD "signatures" are very characteristic in various proteins, 49 one may ask if minor variations of α-helical geometry caused by different amino acid sequences or interactions with the environment can distort them. We thus separately simulated the spectra for eight helical myoglobin segments. Their geometries were kept unchanged, without any treatment of the broken bonds, and the same spectral parameters (force field, atomic polar and axial tensors) as for the full protein were used. As apparent from Fig. 8, the main VCD sign pattern is conserved in all the helices; the minor variations are thus unlikely to cause a major error in quantitative structural predictions. 47,60 On the other hand, spectral details do vary, including different widths and positions of the absorption peak. Interestingly, the shortest α-helix (number 3 in the figure) has a very distinct pattern in the amide II region (~1550 cm⁻¹), implying that this structural motif is likely to be more easily identified by VCD spectroscopy. Because the torsion angles of the third helix ((φ, ψ) ≈ (−62°, −30°)) do not significantly differ from those found in the other helices and are close to standard values (−57°, −47°), 61 we explain the specific signal of the shortest helix by its length and strong "end-effects" caused by the absence of vibrational coupling to other protein parts.

Additivity of the spectra
Another implication of the stronger dependence of VCD than IR on long-range vibrational coupling is a different additivity of the spectra. The spectra of deuterated α-lactalbumin were simulated for the whole molecule and for the α-helical and β-sheet protein parts (Fig. S6, ESI†). The α-helical and β-sheet IR spectra add up to approximately 70% of the total intensity, with the rest coming from the connecting loop and "unordered" peptide chains. For VCD, the situation is more complex. Within the amide II′ region (around 1470 cm⁻¹) the spectrum of the whole protein is nearly the same as the α-helical one, i.e., contributions of the β-sheet and unordered parts are negligible. This explains the similar bandshape of the amide II VCD signal in all proteins containing a substantial amount of α-helix but variable amounts of other structures (myoglobin, HSA, lysozyme and lactalbumin, cf. Fig. 2 and 4). The higher-frequency component of the amide I′ signal (~1730 cm⁻¹) behaves in a similar way. On the other hand, the sum of the α-helical and β-sheet contributions accounts for only ~40% of the intensity of the lowest-frequency amide I′ band (~1705 cm⁻¹). This is in agreement with the higher sensitivity of these vibrations to deuteration mentioned above. Apparently, the lower-frequency amide I and amide I′ modes are "less local", i.e. they are better coupled to other vibrations with similar frequency, even if they occur in different protein parts. This is also consistent with experimental experience (mostly D₂O measurements). The higher-frequency amide I lobe comes mainly from α-helix, while the lower-frequency one comes from a combination of random coil and β-sheet. The random coil signal is usually stronger than that from β-sheet, approximately in the calculated ratio 60/40. 62

Electric-dipolar origin of protein chirality and simpler models
For any molecule, both the "irreducible" magnetic (local parts of the atomic axial tensors, AAT) and electric dipolar (atomic polar tensors, APT) parts contribute to VCD intensities. 11,63 In the past, neglecting AAT was suggested as a fast way to approximately simulate the spectra, but in general both contributions are needed for the results to be reliable. 64,65 In this context, it is interesting to note that except for some minor discrepancies the "APT-only" model nearly matches the exact results for proteins (the two approaches are compared in Fig. S7 (ESI†) for α-lactalbumin). In other words, protein VCD is dominated by electric dipole interactions. The transition electric dipole moments are mostly localized on the amide groups that are nearly planar, i.e. not locally chiral. However, for an accurate simulation also containing features such as the low-wavenumber amide I VCD component (1709 cm⁻¹) in α-lactalbumin, it is necessary to include the contributions from the AAT part as well.
70,71 TDC results are thus very dependent on input parameters; they capture the general dependence of the spectra on secondary structure well, but fail to reproduce important details. As an example of the performance, we present IR and VCD α-lactalbumin spectra calculated by the TDC method in Fig. S8 (ESI†). Electric transition dipoles (vectors and vibrational frequencies) were obtained for the amide I and II modes from a DFT calculation on N-methylacetamide, using the same approximation level as for CCT. The TDC frequency error is thus similar to that of the all-atom computation. For IR, the amide II band splitting is too large based on TDC; otherwise the spectral profile appears reasonable when compared to the experiment. The TDC VCD pattern is by definition conservative, i.e. the integrated intensity over the whole spectral range is zero, which is in contrast to the predominantly negative experimental and CCT results (Fig. 3). Nevertheless, the amide I TDC pattern exhibits some resemblance to the CCT one, and the correspondence is even more obvious for amide II, where both approaches predict the "+/−" couplet. We can thus interpret this similarity as confirmation of the importance of dipole-dipole interactions for the protein VCD spectral pattern.

Protein tertiary structure and long-distance interactions
Finally, we compare the spectra of structurally very similar hen egg-white and human (recombinant) lysozyme (Fig. 9). Structural differences may be reflected in tiny variations of protein tertiary structure, flexibility, interactions mediated through amino acid side chains, protein response to the environment, etc. Whereas the absorption spectra are almost identical, larger variations are observable in VCD. This is rather surprising for a low-resolution technique, although some differences were previously observed in another form of vibrational optical activity, ROA, 72 and confirmed by the simulations as well. 18 While both lysozymes from different organisms exhibit a couplet ("+/−") amide I VCD signal, the hen egg-white lysozyme has a much smaller negative lobe at 1662 cm⁻¹, and a third peak appears at 1633 cm⁻¹. The shape of the amide II signal (~1510 cm⁻¹) remains about the same, but the band is narrowed and slightly shifted to a lower wavenumber.
The simulations reproduce these observations reasonably well, but only when the TDC long-range interaction correction is added to the DFT force field (Fig. 9). Without TDC, predicted differences between hen and human lysozyme VCD spectra are much smaller (Fig. S9, ESI†). Supposedly, the long-range coupling of the protein vibrational modes can thus explain part of the variations. We do realize that the X-ray geometries may differ from those in solution and the experimental spectra may depend on sample preparation, 54 which may further modulate the comparison. Nevertheless, measurements in D₂O, where the signal-to-noise ratio in the amide I region is larger, confirmed the results for non-deuterated proteins. In Fig. S10 (ESI†) the results of two independent measurements for each protein are plotted; the variations in the VCD shapes are clearly larger than the measurement error. Although the differences are not big, the high sensitivity of VCD spectroscopy to minute structural mutations is very appealing for further applications.
The importance of the longer-range vibrational interactions is also documented for concanavalin A (Fig. S11, ESI†). The TDC correction is gradually added to the DFT force field by changing the dielectric constant ($\varepsilon_r$). Some spectral features are improved by that, such as the split positive amide II band (1559 cm⁻¹ in experiment) or a negative amide I high-frequency signal (1733 cm⁻¹). In both IR and VCD, the amide I band broadens and shifts down in frequency, i.e. closer to experiment. On the other hand, it still deviates from the experiment in some features. Also, the excessive band splitting for $\varepsilon_r$ = 1 suggests that the vacuum value of the permittivity constant in the TDC approximation overestimates the interactions of distant protein chromophores.
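How the dielectric constant scales the correction can be seen in a minimal sketch of a dipole-dipole (TDC) force-constant block for one atom pair. It assumes the usual point-dipole interaction form built from the symbols defined in the Computations section, atomic units (so that 1/(4πε₀) = 1), and atomic polar tensors stored as three dipole-derivative vectors per atom; the function names and the test geometry are hypothetical.

```python
def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def tdc_force_constant(p_ia, p_jb, r_ij, eps_r=1.0):
    """Dipole-dipole force constant between coordinate alpha of atom i
    and coordinate beta of atom j (atomic units assumed).

    p_ia, p_jb -- dipole derivatives dp/dr_ialpha and dp/dr_jbeta
    r_ij       -- vector r_i - r_j
    eps_r      -- relative dielectric constant screening the interaction
    """
    r2 = dot(r_ij, r_ij)
    if r2 == 0.0:
        raise ValueError("atoms i and j coincide")
    r5 = r2 ** 2.5
    numerator = r2 * dot(p_ia, p_jb) - 3.0 * dot(p_ia, r_ij) * dot(p_jb, r_ij)
    return numerator / (eps_r * r5)


def tdc_block(apt_i, apt_j, r_i, r_j, eps_r=1.0):
    """3x3 block of force constants for an atom pair.

    apt_i[alpha] is the dipole-derivative vector for coordinate alpha
    of atom i (a row-wise stored atomic polar tensor; an assumption).
    """
    r_ij = [a - b for a, b in zip(r_i, r_j)]
    return [[tdc_force_constant(apt_i[a], apt_j[b], r_ij, eps_r)
             for b in range(3)] for a in range(3)]


if __name__ == "__main__":
    unit_apt = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    for eps_r in (1.0, 2.0, 4.0):
        block = tdc_block(unit_apt, unit_apt, (0.0, 0.0, 2.0), (0.0, 0.0, 0.0), eps_r)
        print(eps_r, block[2][2], block[0][0])
```

Since the whole block scales as 1/ε_r, raising the dielectric constant weakens all long-range couplings uniformly, which is the single parameter varied when the correction is switched on gradually.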

Conclusions
The CCT fragment approach and density functional computations helped us to significantly clarify the relation between vibrational circular dichroism spectra of proteins and their structure. A very satisfactory agreement with the experiment has been achieved. The simulations explained the main differences observed among proteins differing in secondary structure, provided an insight into the nature of the underlying normal mode vibrations, made it possible to assign all principal vibrational bands, and could partially explain fine spectral changes due to deuteration. The results indicate that the high- and low-frequency amide I vibrations differ in their sensitivity to coupling with distant vibrations. To the best of the authors' knowledge, this has not been previously addressed, and may be important to explain fine changes in spectral shapes caused by protein tertiary structure. Surprisingly, human and hen lysozymes provided rather different VCD spectra; this could be explained by long-range coupling of vibrational modes, and somewhat changes the concept of locality typically attributed to this kind of spectroscopy. Clearly, the combination of vibrational optical activity spectroscopy and computational chemistry appears to be quite useful for protein studies, and can prevent erroneous interpretation of experimental data. On the other hand, the accuracy of the simulations still needs to be improved, namely by better describing the solvent and dynamics effects.

Fig. 1 VCD (Δε) and absorption (ε) spectra of bovine α-lactalbumin, simulated for the whole protein and an alanine analogue where all side chains were replaced by methyl groups. The backbone geometry remained unaltered.

Fig. 2 Calculated and experimental IR and VCD spectra of mostly α-helical (human serum albumin and equine myoglobin) and β-sheet (concanavalin A) proteins.

Fig. 4 Calculated and experimental IR and VCD spectra of proteins with mixed secondary structure, bovine α-lactalbumin and hen egg-white lysozyme.

Fig. 5 Simulated and experimental amide I IR and VCD spectra of α-lactalbumin, for deuterated and natural environment. The experimental spectra were normalized to absorption maxima, and the 5-amide based simulation was used in this case, instead of the default 4-amide fragmentation.

Fig. 8 Simulated VCD and absorption spectra for individual helical myoglobin parts. The spectra were normalized to the highest amide I absorption (~1724 cm⁻¹).

Fig. 9 IR and VCD spectra of human and hen egg-white lysozyme, simulation (with TDC correction applied) and experiment.