Shaun T.
Mutter
*^{a},
François
Zielinski
^{b},
Paul L. A.
Popelier
^{b} and
Ewan W.
Blanch†
*^{a}
^{a}Manchester Institute of Biotechnology and Faculty of Life Sciences, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK. E-mail: shaun.mutter@manchester.ac.uk; ewan.blanch@rmit.edu.au
^{b}Manchester Institute of Biotechnology and School of Chemistry, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
First published on 27th January 2015
By looking back on the history of Raman Optical Activity (ROA), the present article shows that the success of this analytical technique was for a long time hindered, paradoxically, by the deep level of detail and wealth of structural information it can provide. Basic principles of the underlying theory are discussed, to illustrate the technique's sensitivity due to its physical origins in the delicate response of molecular vibrations to electromagnetic properties. Following a short review of significant advances in the application of ROA by UK researchers, we dedicate two extensive sections to the technical and theoretical difficulties that were overcome to eventually provide predictive power to computational simulations in terms of ROA spectral calculation. In the last sections, we focus on a new modelling strategy that has been successful in coping with the dramatic impact of solvent effects on ROA analyses. This work emphasises the role of complementarity between experiment and theory for analysing the conformations and dynamics of biomolecules, so providing new perspectives for methodological improvements and molecular modelling development. For the latter, an example of a next-generation force-field for more accurate simulations and analysis of molecular behaviour is presented. By improving the accuracy of computational modelling, the analytical capabilities of ROA spectroscopy will be further developed so generating new insights into the complex behaviour of molecules.
This mutually beneficial relationship between the experimental and theoretical aspects of ROA spectroscopy creates a very powerful set of analytical tools. This offers an experimental technique with an unrivalled sensitivity to chirality,^{6} combined with the great amount of detail at the molecular level available through computational modelling. These computational tools aid analysis by simulating spectra that can be used to confirm structural parameters, understand the vibrational nature of observed bands, and give vital information on conformational dynamics. As ROA can be utilised as a solution structure technique it has great applicability for biomolecular systems, which have strong interactions with the aqueous environment. As such, any model needs to carefully consider the effect a solvent will have on the results of calculations. Fortunately the current state of modern computational modelling allows for the addition of a significant number of explicit solvent molecules, which has been shown to result in a significant increase in the accuracy of results.^{7}
ROA was first predicted in 1969 in Oxford by Atkins and Barron^{8} and was then first observed experimentally in 1972 by Barron, Bogaard and Buckingham at Cambridge.^{2} The sensitivity of ROA to molecular stereochemistry was then demonstrated in a series of studies by Barron and colleagues at the University of Glasgow. It is not the purpose of this review to discuss this body of work, and interested readers are directed to a number of relevant reviews.^{9–11} In summary, these studies established the direct correlation between the manner in which vibrational modes sense local stereochemistry and the signs of ROA band patterns, and then showed that ROA spectra are incisive probes for complex higher order structures. For example, studies performed by spectroscopists at Manchester and the Diamond Synchrotron showed that ROA can characterise the structures adopted by glycosaminoglycans,^{12} complex interactions between mucin glycoproteins,^{13} DMSO-induced unfolding of proteins,^{14} as well as identified how changes in local flexibility of the protein monellin correlate with reduced sweet-taste perception.^{15}
(1) |
(2) |
(3) |
Molecular multipole moments and the quantum mechanical expressions for the molecular property tensors can be defined by the fields and field gradients evaluated at the molecular origin. The molecular property tensors can be extracted from time-dependent perturbation theory to give eqn (4)–(6).
(4) |
(5) |
(6) |
(7) |
(8) |
(9) |
(10) |
(11) |
A quantitative experimental ROA observable that can be useful for biomolecular analysis is the dimensionless circular intensity difference (CID), introduced by Barron and Buckingham, as given by eqn (12);^{1}
Δ = (I^{R} − I^{L})/(I^{R} + I^{L}) | (12) |
(13) |
(14) |
eqn (13) shows the CID expression for a forward scattering geometry and eqn (14) shows the expression for a backscattering geometry. Where a molecule is composed of idealised axial symmetric achiral bonds, a situation where β(G′)^{2} = β(A)^{2} and αG′ = 0, ROA is generated entirely by anisotropic scattering and the forward scattering and backscattering CID expressions are reduced to eqn (15) and (16), respectively.^{18,19}
Δ(0°) = 0 | (15) |
(16) |
This shows that the ROA intensity is maximised in the back scattering geometry. As Raman scattering intensities are equal for the forward and backwards geometries, this increase in the ROA signal relative to the Raman shows that this is generally the best experimental strategy. A more extensive analysis of the theory of ROA can be found in one of several reviews on this subject.^{20–24}
As stated above, in the absence of electronic resonance effects the ROA spectra obtained from these different instruments are directly comparable. In each case, laser powers used for biological molecules are typically from 1–1.4 W (measured at the laser source), concentrations usually are in the range of ∼30–100 mg ml^{−1} for proteins and nucleic acids while those of intact viruses and complex polysaccharides are ∼5–30 mg ml^{−1}. For smaller molecules concentrations are usually in the range of 50–200 mg ml^{−1}. While measurements can be conducted on concentrations lower than those mentioned here, the higher the concentration the faster the measurement will be. Under these conditions ROA spectra over the spectral range of ∼300–2000 cm^{−1} are typically obtained in minutes to a few hours for small molecules, ∼5–24 hours for proteins and nucleic acids, and ∼1–4 days for intact viruses and complex carbohydrates.
With the increase in computing power the capability of calculating ROA spectra also increased, with access to larger basis sets and more advanced methods than those used in the early studies. As well as these improvements, deficiencies in the calculations were also able to be addressed. Helgaker et al.^{44} examined issues in the calculation of magnetic properties with finite basis sets, causing errors in the calculation of the electric dipole – magnetic dipole tensor. Magnetic properties are dependent on the gauge origin of the magnetic vector potential. Hence, as a solution, gauge-invariant atomic orbitals (GIAOs) were used, as they yield intensities independent of the gauge origin.
Density functional theory (DFT) methods are widely used in computational chemistry and have a wide range of applications. As such, they have become the standard approach in the calculation of ROA intensities. The first reported studies using DFT were from Ruud et al.,^{45} using the hybrid functional B3LYP with a combination of Pople and Dunning basis sets. Results for methyloxirane, α-pinene, and trans-pinene were compared to those using HF, as well as experiment. Spectra calculated using DFT were found to be superior to the earlier methods, and a significant improvement in the calculation of harmonic vibrational frequencies was also noted.
In one of the early computational ROA studies by Polavarapu in 1990,^{46} the author noted that ROA calculations were hindered by the need to obtain the derivatives of the electric dipole – magnetic dipole polarizability tensor. As no analytical differentiation approaches were available, numerical differentiation was used, complicating the computations and causing much greater computational expense. It was also noted that the development of an analytical method to evaluate the derivatives would greatly facilitate the ROA prediction of larger molecular systems. Early analytic protocols originated in 2007 from Liegeois et al.^{47} with an analytical time-dependent HF algorithm for calculation of the derivatives of the electric dipole – magnetic dipole polarisability tensor. This work allowed for the first time fully analytical evaluation of the three frequency-dependent invariants needed for ROA calculation. Although this method utilised non-London orbitals, resulting in erroneous gauge origin dependence, further advancements have since been made allowing for the use of GIAOs and DFT.^{48–51}
Early studies by Pecul and Rizzo^{52} examined the basis set dependence of ROA using a selection of Dunning's correlation consistent basis sets, in augmented and non-augmented forms. Using MCSCF methods they found aug-cc-pVDZ to be suitable for qualitative analysis but more importantly they noted that basis sets with diffuse functions were able to reproduce experimental spectra more accurately than those without. This was also noted by Hug,^{53} and that description of the diffuse part of the electron distribution on hydrogen nuclei was essential for basis sets used for ROA calculations. This led to a more extensive benchmark study by Zuber and Hug,^{54} which also led to the development of a new basis set, rDPS. This basis set is in the form 3–21++G, with semidiffuse p functions on hydrogens, with an exponent of 0.2. In their tests, rDPS performed very well, giving better results than a selection of Pople and Dunning basis sets, with only aug-cc-pVDZ outperforming it against the benchmark Sadlej basis set.
The most recent benchmark study was carried out by Cheeseman and Frisch,^{55} where they examined the basis set dependence of the ROA tensor invariants and force field calculations separately. This study took advantage of the fully analytical derivative methods as well as examining the onestep and twostep procedures (2n + 1 and n + 1 algorithms, respectively) as described by Ruud and Thorvaldsen.^{20} The twostep approach allows for the calculation ROA tensor invariants at a different level of theory than the optimisation and force field calculation, whereas the onestep approach calculates everything at the same level of theory. Cheeseman and Frisch noted that the influence of the basis set is different in the calculation of the Raman/ROA tensor invariants compared to the force field calculation and concluded from this that the twostep procedure is the more efficient, particularly for large molecules. A selection of basis set combinations for twostep calculations, based on system size, was also reported. For intermediate-sized system aug(sp)-cc-pVDZ//cc-pVTZ was suggested, while for large systems rDPS//6-31G* for the ROA tensor invariant and force field calculations, respectively.
There are comparatively fewer benchmark studies that explore the dependence on the method rather than on the basis set. Reiher et al.^{56} presented one of the first studies with a combined basis set and DFT functional. Local density approximation, generalised gradient approximation, and hybrid DFT methods were explored using SVWN, BLYP, and B3LYP, respectively. Results showed that BLYP and B3LYP outperformed SVWN.
A more extensive study was carried out by Danecek et al.^{57} using a mix of 23 pure and hybrid DFT functionals, as well as HF and MP2. Comparison between experimental spectra and those calculated with the DFT methods, for alanine and proline zwitterions, found that the hybrid functionals generally performed best, noting that B3LYP and B3PW91 performed particularly well. Sebek et al.^{58} also noted that, overall, the B3LYP functional provides a well-balanced model between accuracy and cost, as such this functional has been widely used in recent computational ROA studies.
The calculation of ROA spectra is now possible through several commercially available programs, including Gaussian03,^{59} Gaussian09,^{60} and the Amsterdam Density Functional code (ADF).^{61} Gaussian09 also includes the necessary subroutines for analytical derivatives of all the required ROA tensors. As well as these packages, there are several other programs for ROA calculation including Dalton,^{62} CADPAC,^{63} and SNF.^{64}
The inclusion of solvent in the ab initio calculation of ROA spectra can be carried out in two ways, broadly defined as implicit and explicit solvation. Implicit solvent models place the solute within a cavity in a uniform polarisable medium with a dielectric constant. There are two widely used implicit solvent models called PCM^{68} and COSMO,^{69} which treat the solvent as a dielectric and a conductor, respectively.
The explicit solvation approach includes explicit solvent molecules in the ab initio calculation, which can be modelled at several different levels of theory. A small number of solvent molecules can be added and treated at the quantum mechanics level along with the solute, if the solvent molecule is relatively small, such as water. It is also possible to include large numbers of solvent molecules and model them using hybrid quantum mechanics/molecular mechanics (QM/MM) methods. This approach treats the solute at the QM level and the solvent with a computationally much cheaper molecular mechanics approach. Electrostatic embedding can also be included for more accurate calculation by incorporating the MM charges in the QM Hamiltonian. Such an approach is able to model systems with large solvation shells, with solvent molecules numbering in the hundreds.
It is also possible to combine implicit and explicit solvation approaches. When carrying out calculations on systems with small numbers of explicit molecules it is trivial to include a continuum method. Recent work by Biczysko et al.^{70} has also explored the possibility of incorporating implicit solvent models within QM/MM frameworks.
The suitability of the solvation approach used depends greatly on the molecular property of interest. Implicit models offer reasonable accuracy for energy and structure calculations, with a minimal increase in computational expense compared to gas phase calculations.^{71,72} However, the accuracy of vibrational frequencies increases less from the gas phase to an implicit solvent model and a small number of explicit water molecules has been shown to have a dramatic effect on calculated spectra, resulting in much better agreement with experiment.^{65,73}
The addition of explicit solvent molecules to a system can present issues, however. The simplest approach to add solvent molecules is ad hoc solvation, where the solvent is added manually to regions of each of the molecular conformers, such as water molecules being added to the polar groups of a biomolecule. Studies have shown that this approach can offer significant improvement in the reproduction of several features within ROA spectra.^{74,75} However, this approach does have a shortcoming in that systems may converge very slowly to an optimised geometry, as a result of the shallow nature of the potential energy surface.^{57,75}
Given the importance of including the major conformers of a molecule for the simulation of ROA spectra, a more practical approach for the addition of solvent molecules is a dynamic method. The use of molecular dynamics (MD) simulations offers the benefits of being able to obtain accurate solvated structures, closer to optimised geometries, as well as giving valuable conformational information.
Several recently published studies explore the area of implicit and explicit modelling of solvent. Hopmann et al.^{75} studied the effect solvation had on the Raman and ROA spectra of lactamide and 2-aminopropanal, using PCM, ad hoc hydration, classical MD, and Car-Parrinello MD. Results showed that although the PCM gave basic conformational information, energies and ROA intensity patterns, the inclusion of explicit solvent provided better agreement with experiment. Inclusion of at least ad hoc hydration was recommended and it was noted that Car-Parrinello MD gave better agreement than classical MD. However, the additional computational cost of ab initio Car-Parrinello MD might mean it is not possible to fully sample the conformational space of systems in question.
Cheeseman et al.^{7} utilised a full MD simulation to model the hydration effects of methyl-β-D-glucose for ROA calculation. Carbohydrates are a class of biomolecules that benefit greatly from ROA studies as the larger, more complex polysaccharides are inherently difficult to study with traditional structural biology techniques. Calculation of ROA spectra is often difficult for these molecules due to their high conformational flexibility and the need for accurate solvent modelling. This study incorporated solvated structures from the MD trajectories as starting points for the QM/MM calculations. The explicitly solvated structures resulted in spectra with excellent agreement with experiment, particularly when compared to the PCM models, which offered comparatively poor agreement, as shown in Fig. 1. The authors concluded that the implicitly solvated models fail to accurately model the sensitivity of ROA features to hydration effects and that adopting a full MD approach to handle the aqueous environment is essential for carbohydrates.
Fig. 1 ROA spectra of methyl-β-D-glucose. (A) experimental spectrum, (B) gas phase calculated spectrum, (C) PCM calculated spectrum, (D) QM/MM explicitly solvated calculated spectrum. Modified with permission from ref. 7. |
Urago et al.^{76} carried out a similar study to that undertaken above, on a cyclic dipeptide. They found good agreement with experiment but found that a large number of MD snapshots were required to achieve this, particularly in the 1580–1800 cm^{−1} region, which required 120 snapshots. In comparison, Cheeseman et al. required only 16 snapshots for each of the two conformers for excellent agreement across the entire frequency range examined. However, it should be noted that different snapshot optimisation approaches were used in these two publications, where Urago et al. used a technique analogous to OPTSOLUTE and Cheeseman et al. used OPTALL, as described below. This difference in optimisation approach likely results in the difference of number of snapshots required for accurate spectral reproduction.
Fig. 2 Schematic of the approach to calculation of ROA spectra, with arrows representing flow of data between steps. |
In general terms, this approach consists of exploring the configurations accessible to the explicitly solvated system, for temperature and pressure conditions similar to experiment. When no reference data are available, it becomes crucial to extensively sample the conformational space to obtain reliable quantitative information. For flexible molecules such as carbohydrates, the required simulation time can then dramatically rise as the space to explore gets more complex and features more dimensions. Dynamic sampling was used in our work, but Monte–Carlo methods would be appropriate as well. Obviously, for such time-scales, it is an understatement to say that ab Initio MD is impractical. Currently, the only alternative is the use of atomistic molecular models based on classical mechanics (force-fields). So far, most of these rely on fitting methods, parametrisation, and a number of approximations (such as the point-charge description of atomic charge densities, neglecting all polarisation effects) that may severely limit accuracy and transferability.^{77,78} A number of initiatives and developments are progressing toward maturity and will become available in the near future, among them the new generation force-field presented below. In the meantime, special care needs to be dedicated to the choice and testing of the optimal molecular model among the readily available alternatives.
From the sheer mass of data generated during sampling, it is possible to extract qualitative and quantitative information about the energetically favoured configurations. If done carefully, even a superficial conformational analysis can yield very helpful data that can be used to lighten the computational cost of the subsequent QM/MM computation steps. Indeed, by focusing the selection of MD snapshots toward the most probable conformers, in their most average form, one can then lighten the subsequent steps by a reduction in the number of snapshots, and optimisation starting structures being closer to their energetic minima. Eventually, any quantitative insight on the relative occurrence probability of these conformers can be used as weights (represented by arrows of different size in Fig. 2) for the averaging of each snapshot's spectrum into the final prediction.
The term explicit solvation refers to the inclusion of solvent molecules in the simulation. Although implicit solvation models are far less computationally demanding, they rely on a number of approximations that have been shown several times to impede the proper description of solvent effects the spectroscopic responses of carbohydrates.^{7,79,80} As a result, we strongly recommend sticking to explicit solvation when considering carbohydrates.
A possible starting point can be the gas-phase optimised structure of the considered carbohydrate. If not readily available from the literature or the Glycam project,^{81} one can quickly obtain a sufficient estimate through optimisation thanks to a force field, or a quantum chemical computation at a low level of theory. Afterwards, this isolated molecule has to be immersed in a solvent bath: cubic cells of side 30 Å (approximately 900 water molecules) have been a safe choice so far for monosaccharides, while remaining relatively cost effective regards computing time. As interest tends to larger systems, this size is expected to evolve accordingly, so that the overall solvent bath increases in size with the increasing size of the solutes, as to ensure molecules stay surrounded from every side by at least 10 Å of explicit solvent molecules (within a single periodic cell).
Any molecular dynamics or Monte–Carlo computer program with basic features would be suited to the task at hand, provided it can handle a carbohydrate-specific force field (such as Glycam^{82}). Periodic boundary conditions have to be applied to the cubic cell previously described, thus we recommend handling electrostatic interactions with a method adapted to such a system (e.g. Ewald Sums or one of its derivatives).^{77}
Special care must be dedicated to bringing the system to proper equilibrium at experimental conditions (298 K, 1 atm). An explicitly solvated system is likely to be, at first, lying far from its minimal energy configuration, depending on the way the system has been prepared and the software tool used. In our experience, preliminary optimisation, progressive heating, and force-capping can sometimes be necessary. Similarly, we recommend letting the cell volume adjust until reaching a safe convergence threshold within the NPT ensemble, prior to moving toward production simulation (NVT).
In principle, a random sampling of a molecule's dynamic trajectory would provide the time-averaged conformational diversity necessary to predict the spectroscopic response.^{75,76} However, accuracy cannot be guaranteed until the sampling coverage reaches statistical significance. Instead, we advocate an alternative that aims at reducing the number of MD snapshots to process by QM/MM. Indeed, it is possible to maximise the statistical significance of the extracted snapshots by targeting the most probable structures, as identified by conformational analysis of the MD trajectory. Since it is already necessary to run long simulations (at least 50 ns for a monosaccharide^{83}) to ensure exhaustive sampling, the conformer populations can be expected to be reliable enough. As a result, it is possible to use ratios between these populations in order to weight the snapshots’ spectra into an average spectrum representative of the molecule's dynamic and complex conformational space.
To conduct a conformational analysis appropriate for our purposes, it is preferable to focus on general structural features since slight structural changes are to be expected through the QM/MM optimisation. Bond lengths and 3-atom angles are indeed more prone to readjustment and change during optimisation, whereas dihedral angles can be expected to be more constrained. The latter type of geometrical feature typically describes the orientation of chemical groups and ring puckering, which can be expected to be held into place by the environment's general influence. It is unlikely to see dramatic changes during optimisation.
In order to scan a geometrical feature's evolution and detect the occurrence of multiple conformers, it can be sufficient to plot its value against simulation time, as in Fig. 3. As we need a clear picture of the favoured structures and to define their corresponding domain of values, we recommend to push further the analysis and plot histograms, as in Fig. 4, i.e. the occurrence of the considered feature values within regular intervals (bins). Such a graph enables the ability to easily notice any overlap between conformers, shoulders or minor structures. Once each conformer's domain is determined, their populations and average values can be accurately calculated.
Fig. 3 Evolution of the ω dihedral angle (O5–C5–C6–O6) in GlcNAc, a relevant monosaccharide, along simulation time. Two grey lines mark the average value for each of the two observable conformers. |
Fig. 4 Histogram plot of the occurrence of the ω dihedral angle values in GlcNAc, with bins of 5 degrees. |
Once these last data are calculated, they can be used as combinations of target values to filter the MD frames and obtain representative snapshots of each conformer. The closer a snapshot assumes dihedral values from the conformer's average, the quicker it can be expected to optimise. Furthermore, selecting frames as far from each other in the trajectory ensures that the diversity of solvent layer configurations is properly sampled. In cases where it is necessary to include a counter ion in the simulation, it is advised to select only frames where the ion is lying as far as possible from the molecule of interest. It is still unclear what the ideal number of snapshots is in order to obtain an average spectrum in agreement with experiment. Even though good results have been obtained with as few as 6 snapshots, the number is likely to scale up with the conformational variety. Further investigations are currently in progress to address this concern.
The selection of MD frames, taken from the conformational analysis, is then used as a starting point for the QM/MM calculations. The initial step is optimisation of the system. Whilst this may seem contradictory, as it results in loss of dynamic information from the MD simulations, it is necessary, because the vibrational normal modes and frequencies are only valid for minima on the potential energy surface. There are two main approaches used for QM/MM optimisation, carried out using ONIOM^{84} as implemented in Gaussian 09. The first is to optimise the solute and freeze the solvent molecules in the MD geometry, which is dubbed OPTSOLUTE. The second approach is allowing the entire system to optimise, with full geometrical freedom for all molecules, named OPTALL. The second approach, OPTALL, is arguably more accurate as the optimisation of the solvent close to the solute creates a better model of solvent–solute interactions when calculating ROA. However, OPTALL does have a practical disadvantage as convergence to optimisation is difficult and, on occasion, these systems will never reach full optimisation. In these situations the electronic energies of each optimisation step oscillate but do not decrease in value overall. However, they can be considered to be optimised if the maximum and root-mean-square values of the forces meet the convergence criteria of 0.00045 and 0.00030 a.u., respectively, in Gaussian 09. The level of theory recommended for optimisation is B3LYP/6-31G* for the solute and AMBER/TIP3P for the aqueous solvent.
After optimisation of the solvent–solute clusters, harmonic frequency calculations are carried out at the same level of theory as the optimisation. Using the two step approach the ROA tensor invariants for the optimised clusters can be calculated at a different level of theory to that of the optimisation. Based on the benchmark studies of Cheeseman and Frisch^{55} B3LYP/rDPS is recommended, because for large system sizes a combination of 6-31G* and rDPS for force field and ROA calculation, respectively, has proven to be successful.
Raman and ROA intensities are obtained from the appropriate combinations of tensor invariants, and we have included the ν^{4} and Boltzmann factors, which are essential for experimental comparison.^{7} The most common experimental ROA set up utilises SCP in the backscattering geometry, at a laser excitation of 532 nm, such that spectra can be calculated for these experimental parameters.
A simulated spectrum for a single snapshot only takes into account the single conformer present in that cluster, as such conformational averaging over multiple snapshots needs to be carried out. Averaging can be done with equal weighting for all snapshots if a large number have been taken at regular intervals along the trajectory. When conformational analysis has been carried out to find the major conformers the more appropriate approach is to weight the snapshots based on the conformer populations of the MD simulations. Averaging of the spectra should be carried out with the lineshape form (average of the curves) and not the ROA intensities at individual frequencies.
With the steps outlined above it is possible to accurately simulate Raman and ROA spectra for many different chiral systems. These calculations account for not only conformational dynamics but also include explicit modelling of solvent interactions and offer great improvements over implicit solvent models.
One of the most important aspects of ROA analysis is the ability to assign the absolute configuration of chiral molecules. While other techniques are more commonly applied to this problem, such as X-ray diffraction and VCD, ROA possesses considerable advantages for such studies. A study by Haesler et al.^{6} proved this by means of the assignment of absolute configuration of chirally deuterated neopentane. This molecule is chemically inert and its chirality results from dissymmetric mass distribution, with configuration studies unable to be carried out successfully by any other method but ROA.
As ROA exhibits such a high sensitivity to stereochemistry, a large amount of structural and vibrational data can be elucidated from the spectral bands. Therefore, it is important to understand the origins of the aforementioned bands so that they can be used as identifiers of structural motifs. Some spectral regions already have well defined peak assignments, such as in peptides and proteins for which there are a number of bands known to be markers of secondary structure. For other molecules, such as carbohydrates and chiral transition metal complexes, relatively little insight into the details of higher order conformation have been obtained so far. New tools and approaches are still needed to explore the vibrational nature of such complex molecules.
Using software such as Gaussview^{88} or pyvib2,^{89} calculation output from ROA simulations can be visualised in terms of vibrational modes, from which the bands originate. Visualisation of these modes can then lead to the assignment of peaks, as exhibited by Cheeseman et al.^{7} for methyl-β-D-glucose and Humbert-Droz et al.^{90} for rhodium trisethylenediamine. This approach can be successful but has the disadvantage that assignments are limited to a spectrum simulated from one conformer in the gas phase or with implicit solvent models. Individual conformer assignments do offer interesting structural information but it is also important to consider origins of bands for spectra simulated with the more accurate combined approach, outlined above. In order to do so, it is important to consider all the snapshots used to generate the final spectrum. The calculation outputs for each individual snapshot's spectrum can be examined visually, using the software mentioned above. Initially, this can be problematic due to the difficulty of visual analysis when explicit solvent molecules are present, but options can enable them to be presented in different ways, such as wireform, to increase clarity. A preliminary examination of the final spectrum generated from all the snapshots should be carried out to determine the features of greatest interest on which to focus assignments for. Comparison between the assignments of individual snapshots at the frequency of the peaks of interest in the final spectrum can then lead to a final assignment, thus offering analysis of systems modelled in environments closer to that observed experimentally.
In cases where molecules have very diffuse vibrational modes it can be difficult to assign the most important vibrations. Small carbohydrates exemplify this, as nuclear motions arising from most, if not all, of the molecule often contribute to many of the observed vibrations. In these situations, for each individual mode, the analysis of the molecular displacement data (obtained from calculation outputs) can offer deeper insight. Corroboration of the molecular displacements and visual analysis can confirm which atoms make either larger or smaller/negligible contributions to the modelled vibrations.
Combined, visual analysis and molecular displacement data can enable deeper understanding of simulated spectra, and can be used to great effect when assigning peaks in a spectrum generated from several solvated snapshots. Better insight can be obtained regarding the origin of these bands than is possible from examining individual conformers in the gas phase or with PCM. However, this approach is much more laborious and can become much more time-consuming as more snapshots are used. Consequently, such an analysis is much more practical when ROA simulations are based on an optimally limited number of snapshots, as in the presented MD and quantum approaches that incorporates conformational analysis to identify the preferred conformers.
Biomolecular modelling urgently needs more realistic force field potentials. In order to enhance its utility for experimentalists, force field design needs an urgent overhaul because current force-field architecture has remained largely stagnant since the 1980s. This overhaul must be guided by both rigour and imagination, while respecting the quantum physics ultimately underpinning biomolecular structure and behaviour. A long-term concerted research effort^{91–93} in this direction is currently ongoing. At the core of a future-proof force-field is a maximally energy-transferable atom. In this respect, the best atom,^{94} according to literature evidence, is that defined by Quantum Chemical Topology (QCT).
In the condensed matter phase QCT atoms are parameter-free three-dimensional fragments of finite volume, appearing in the electron density. Fig. 5 shows an example in the gas phase, where an outer boundary is fixed for visual purposes. This figure shows how QCT naturally partitions a carbohydrate derivative into its constituent (topological) atoms. These atoms have sharp boundaries, they do not overlap, and they leave no gaps. A change in any nucleus's position will change the electron density and hence the shape of the atoms. Each atomic property (i.e. kinetic energy, charge, dipole moment, volume, etc.) is obtained from the same universal formula, which embodies a 3D integral of the atom's volume.
Combining this QCT partitioning with the universal quantum expression of energy, leads to four types of fundamental energy contributions from which all chemical features and phenomena can be derived. These fundamental energy contributions are (i) intra-atomic energy, (ii) inter-atomic exchange energy, (iii) inter-atomic Coulomb energy and (iv) inter-atomic correlation energy. The first energy covers stereo-electronic effects (e.g. rotation barriers, steric hindrance), the second governs chemical bonding (including weakly covalent interactions) and (hyper)conjugation effects, the third accounts for the ubiquitous electrostatics, while the fourth covers dynamic correlation, which gives rise to dispersion. These contributions are all physically well-defined. They are all derived under the same Ansatz, both conceptually and computationally. This means that they are properly balanced and give the best guarantee to describe energetics of large systems at atomistic level. This approach resolves typical debates in force–field design such as the one on the nature of torsion potentials and that on the need for dedicated hydrogen bonding terms. The QCT force field does not directly mimic energy terms but faithfully expresses what is behind them. It should also be emphasised that the Coulomb interaction will be categorically represented by atomic multipole moments.^{95} This accurate representation of the anisotropy, i.e. the deformation of atomic electron densities eliminates the inaccuracy of point charges.^{78}
Finally, machine learning captures the response of these four fundamental energy contributions upon any change in geometry of the system including its environment. We pioneered the use of kriging in the context of potential design.^{96} The machine learning method of kriging^{97} best handles the complexity^{98} of distant geometrical changes, and models pivotal polarisation effects. Kriging learns the mapping between an atomic quantity (as output, e.g. an atomic multipole moment or self-energy) and the coordinates of the neighbouring atoms (as input, called features).
There are a number of attractive features to Kriging, the main one being that this machine learning method is excellent in handling the high-dimensional complexity of configurational change in condensed matter. The four advantages of kriging are: (i) ranking of feature values according to importance: chemical insight, (ii) performance scales linearly with the dimension of feature space, (iii) trained model is analytical and differentiable, so forces can be computed quickly and accurately and, finally (iv) the knowledge of how an environment influences an atom is stored in Kriging parameters. The latter act as the trained memory of the atoms. Hence, there is no need to perform iterative calculations to self-consistent energies during a MD simulation.
Work is under way at Manchester that aims at demonstrating proof-of-concept that the QCT force field (which is still being developed) can be used in a molecular dynamics simulation. The analytically obtained forces, acting on the nuclei, and due to the interatomic multipolar electrostatic potential^{99} are currently being implemented in the molecular simulation program DL_POLY_4.^{100} The forces caused by the other fundamental forces are also added and the first ever simulation is expected in 2015.
Footnote |
† Current address: School of Applied Sciences, RMIT University, GPO Box 2476, Melbourne VIC 3001, Australia. |
This journal is © The Royal Society of Chemistry 2015 |