Jerelle A.
Joseph
a,
Konstantin
Röder
a,
Debayan
Chakraborty
ab,
Rosemary G.
Mantell
a and
David J.
Wales
*a
aDepartment of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK. E-mail: dw34@cam.ac.uk
bDepartment of Chemistry, The University of Texas at Austin, 24th Street Stop A5300, Austin, TX 78712, USA
First published on 2nd May 2017
The potential energy landscape perspective provides both a conceptual and a computational framework for predicting, understanding and designing molecular properties. In this Feature Article, we highlight some recent advances that greatly facilitate structure prediction and analysis of global thermodynamics and kinetics in proteins and nucleic acids. The geometry optimisation procedures, on which these calculations are based, can be accelerated significantly using local rigidification of selected degrees of freedom, and through implementations on graphics processing units. Results of progressive local rigidification are first summarised for trpzip1, including a systematic analysis of the heat capacity and rearrangement rates. Benchmarks for all the essential optimisation procedures are then provided for a variety of proteins. Applications are then illustrated from a study of how mutation affects the energy landscape for a coiled-coil protein, and for transitions in helix morphology for a DNA duplex. Both systems exhibit an intrinsically multifunnel landscape, with the potential to act as biomolecular switches.
There are actually two aspects to the protein folding problem, which have been extensively discussed in previous reviews.5–9 The first one concerns structure prediction: given an amino acid sequence, what three-dimensional structures will a biomolecule populate at thermodynamic equilibrium to fulfil a certain biological function? The second aspect concerns folding mechanisms and kinetics, which describe how the system achieves its functional three-dimensional structure from a relatively unstructured configuration by navigating along folding routes on the underlying landscape, or switches between different metastable conformations that dominate the equilibrium. Many evolved proteins conform to the ‘principle of minimum frustration’, corresponding to landscapes where kinetic traps are absent, and fold on physiological timescales. This scenario led to the funnel hypothesis, that the native state lies at the bottom of a relatively smooth free energy landscape.5,6 However, some proteins have evolved to exhibit multifunctional character, with an underlying multifunnelled landscape, which can be tuned via mutations,10 ligand-binding,11,12 or even subtle changes in environmental conditions.13 An analogous situation exists for nucleic acids, especially RNA.14,15 Although protein and RNA folding share several common features, significant differences arise due to the polyelectrolyte nature of RNA. Furthermore, due to the lower stability gap between native and non-native states, structure prediction as well as studies of folding mechanisms and kinetics become even more challenging for RNA.16,17
Various experiments18–20 and numerical simulations21–23 have succeeded in extracting specific features of biomolecular energy landscapes. Nonetheless, determining the global topography of the landscape, and predicting or interpreting emergent properties have proved more difficult. In this context, single molecule pulling experiments are an attractive approach.24–26 Recently, several studies27,28 have employed non-equilibrium pulling experiments to reconstruct landscape profiles, using an extended version of Jarzynski's equality29 suggested by Szabo and Hummer.30,31 The specific challenges encountered in computational studies of biomolecular structure and pathways are determined by the complexity of the underlying potential energy landscape. Most biological systems exhibit relatively large barriers between alternative competing morphologies, leading to broken ergodicity. In such situations, brute-force simulations have limited applicability, because the equivalence between time and ensemble averages breaks down over short timescales. Hence, the development of enhanced sampling techniques is essential. Many of these techniques primarily aim at predicting thermodynamic properties, and exploit multiple coupled simulations at different temperatures32–34 or energies,35 predefined reaction coordinates,36–38 or hypersurface deformations39–42 to improve conformational sampling. Obtaining kinetic information, and identifying key intermediates along the multitude of folding pathways, is more difficult. Although techniques based on systematic sampling of reactive trajectories, such as transition path sampling,43,44 transition interface sampling,45 milestoning,46,47 and forward flux sampling,48 can provide kinetic and mechanistic insight, they can be computationally intensive.
The construction of kinetic transition networks provides a complementary way to study biomolecular energy landscapes.49–51 We have actively developed methods based on the potential energy landscape perspective, which provides a convenient framework for building and analysing transition networks.4,50 In this approach, the landscape is coarse-grained into a set of interconnected stationary points [minima (M) and transition states (TS)]. This simplification is particularly attractive because stationary points can be located using geometry optimisation, largely independent of energy barriers. The global thermodynamics is dictated by the configuration space associated with the potential energy minima. The connections between different regions of the landscape are defined in terms of M–TS–M triples, which encode the kinetics.4 Observable properties are extracted from the databases of stationary points using established tools of statistical mechanics and unimolecular rate theory.52–55 It is also possible to analyse the landscape in terms of selected order parameters and lumping schemes to identify states or ensembles, commensurate with the experimental definition.56 Methods that employ explicit dynamics to construct transition networks, mainly in the context of Markov State Models (MSM),49,51,57 are again complementary to methodology based on geometry optimisation.
The potential energy landscape approach has been used to study a range of biomolecular processes. Recent applications include folding of RNA hairpins,58 the origin of heterogeneity in the intrinsically disordered PUMA peptide,59 ring puckering in collagen,60 the effect of mutation on the influenza A virus,61 and the transformation of the landscape for a model protein by a static pulling force.62 The purpose of the present contribution is to briefly summarise the potential energy landscape framework, and discuss recent advances that have made new applications possible.
Biological systems of practical interest can often exceed 100000 atoms in size. The number of degrees of freedom poses a serious challenge to current simulation techniques, and even for a biomolecule of moderate size with around 1000 atoms, attaining true equilibrium is time consuming. To circumvent this problem, a common approach is to coarse grain the atomistic representation of the molecule.63–65 However, such reductionist schemes may not be able to represent key dynamical features, and can result in an artificially smooth landscape. Here we present two strategies within the potential energy landscape perspective that have proved effective in structure prediction, and obtaining thermodynamic and kinetic insight at the all-atom level. In Section 3, we review the key findings of a general local rigid body approach from a recent study of the trpzip1 peptide. On the technological side, implementing geometry optimisation routines for GPU hardware has proved effective, as summarised in Section 4. We then discuss two recent case studies involving biological conformational switches. The first example analyses how mutations reshape the underlying landscape of a coiled-coil protein, and the second case study shows how the potential energy landscape framework can be used to probe transitions between different helical morphologies of a DNA duplex. Both examples exhibit multifunnel energy landscapes, which we associate with intrinsic multifunctional behaviour.59
We start by constructing an initial path from the reactant (A) to the product (B).86–88 Transition state candidates are first located using the doubly-nudged elastic band (DNEB) procedure.86 A double-ended interpolation between A and B produces an intermediate set of images [X1, X2…XM], where Xi represents the Cartesian coordinates of the ith image. Next, harmonic springs are used to connect equivalent atoms in adjacent images, resulting in a spring potential. Components of the spring gradient and the gradient of the true potential are then utilised in the derivation of the elastic band gradient. This gradient prevents interference of the spring potential (which affects convergence of images) and the true potential (which affects the spacing of images) and gives the band its ‘nudging’ properties.89 The complete set of images is then relaxed by L-BFGS minimisation, using a weak convergence condition, which focuses on the local maxima.86
Approximate transition states (the images corresponding to maxima in the DNEB energy profile) are then converged more tightly by hybrid eigenvector-following (HEF).90,91 This is a single-ended procedure for locating transition states, where a single starting configuration is considered.90,91 In contrast, for double-ended procedures, such as the DNEB method, two initial endpoint geometries are needed. In HEF, only one Hessian eigenvector (emin) and the corresponding eigenvalue (λmin) are used for uphill searches and a minimisation procedure, such as the L-BFGS algorithm, is used for optimisation in all other directions.91 The smallest non-zero eigenvalue can be found using the Rayleigh–Ritz ratio , where x represents a small perturbation in the current geometry. To avoid explicit computation of the Hessian (H), λ(x) is estimated from the numerical second derivative of the energy. Through successive iterations, tightly converged transition states are characterised. These transition states are then connected to minima by following approximate steepest-descent paths parallel and antiparallel to the eigenvector corresponding to the unique negative eigenvalue.
The minima and transition states found during DNEB/HEF searches form a database of stationary points. Recall that the goal is to find a connected path between the reactant (A) and product (B). Before each new DNEB/HEF cycle, a metric is needed to determine which minima in the database are most appropriate for subsequent connection attempts. A modified version92 of Dijkstra's algorithm,93 which selects pairs based on a minimised Euclidean distance metric, is used for this purpose. In a database of stationary points the total set of minima can be described using a complete graph G(M,D); where the nodes M represent minima and the edges D represent the transition states. The edge weight w(x,y) between arbitrary minima x and y is set to zero if the minima are connected by a single transition state. If the number of connection attempts between minima x and y reaches the maximum value (set by the user), the edge-weight becomes infinite. Otherwise, w(x,y) is expressed as a function of the Euclidean distance. At the beginning of each DNEB/HEF cycle, pairs of minima for connection attempts are prioritised based on these weights. This process is repeated until there are no missing connections along the path.
The doubly-nudged elastic band procedure, hybrid eigenvector-following, and the shortest path algorithm are implemented in the OPTIM code.94 Parallel OPTIM runs are organised by the PATHSAMPLE program.95
The SHORTCUT scheme96,97 chooses pairs of minima from the current ‘fastest’ path that are separated by a minimum number of transition states (steps) and attempts to connect them using the procedures discussed above. The SHORTCUT procedure usually decreases the total number of steps on the path and leads to a significant increase in the overall rate constant. Alternatively, the SHORTCUT BARRIER scheme92,96 selects minima on either side of the largest barriers on the current path, up to a maximum number of steps apart. Additional connection attempts between these minima may find paths avoiding such high barriers, again improving the rate constant.
However, these procedures may also introduce kinetic traps, in the form of low-lying minima separated from the product minimum by high barriers. Most of these traps are artificial and are due to insufficient sampling. To find low-barrier paths for these minima the UNTRAP scheme96 is used. Candidate minima for ‘untrapping’ are chosen based on the ratio of the potential energy barrier and potential energy difference from the product (B). Hence, minima with low potential energies connected by high barriers are most likely to be chosen. Connection attempts between these minima and the product minimum then proceed in search of better paths. Local free energies can also be used.
![]() | (1) |
![]() | (2) |
Fi(T) = −kBT![]() ![]() | (3) |
In the local rigidification scheme employed here, groups of atoms, whose relative coordinates vary on timescales significantly shorter than the process of interest, are treated as rigid units.115–117 Each non-linear rigid unit, regardless of size, then has six degrees of freedom: three translational and three rotational. Since interactions between sites within each local rigid body are not required, this formulation results in computational savings in evaluating the energy and gradient. Additionally, fewer geometry optimisation steps are generally needed, due to the dimensionality reduction of the conformational search space. Hence, significant speedups in the mean first encounter time for global optimisation are possible.116 Moreover, a clear mapping between the minima found on locally rigidified and unconstrained potential energy landscapes can be maintained.116
In a recent article,108 we investigated the systematic effects of local rigidification on the structure, thermodynamics and kinetics of trpzip1.119 The tryptophan side-chain rings, peptide bonds, trigonal planar centres and termini were grouped as rigid bodies, and, based on these sets, we formulated three LRB schemes: I – rings, II – rings and peptide bonds, III – rings, peptide bonds, termini, trigonal planar centres treated as local rigid bodies. Trpzip1 was represented by the AMBER99SB forcefield,120 and discrete path sampling71,72 was used to analyse the potential energy landscape for each LRB scheme, as well as the unconstrained (U) peptide.
The main conclusions are summarised in Fig. 2 and 3. The potential energy range and structural heterogeneity of the locally rigidified landscapes are consistent with the unconstrained landscape (Fig. 2). However, there is a significant increase in the number of prominent subfunnels in the disconnectivity graph for the most constrained case (Fig. 2d). This result suggests that excessive rigidification can reduce the conformational flexibility of the protein in an unphysical manner, leading to artificially high barriers on the potential energy landscape.
![]() | ||
Fig. 2 Potential energy disconnectivity graphs for TZ1 (ΔE = 2 kcal mol−1) at different levels of local rigidification. The branches are coloured based on order parameters L (the sum of the four inner native hydrogen-bond lengths and the distances between the CD2 atoms of the three TRP pairs) and S (the relative orientation of the TRP rings). The three main morphologies are: blue (L < 60 Å, S-value = +1), green (L < 60 Å, S-value = −1), and red (all other minima). Adapted from ref. 108. |
![]() | ||
Fig. 3 Summary of thermodynamic and mechanistic properties for trpzip1, as a function of local rigidification: U – no local rigid bodies; I – TRP rings, II – TRP rings and peptide bonds, III – TRP rings, peptide bonds, termini, trigonal planar centres treated as rigid bodies. (a) Constant volume heat capacity curves: the heat capacities are divided by the appropriate total number of degrees of freedom (DOF) and the melting temperature of the unconstrained peptide, TUm, is indicated. The free energy global minima at 0.48 and 0.88 kcal mol−1 are superimposed on the plot; key: red (U), green (I), blue (II), magenta (III). (b) Variation of the total potential energy (kcal mol−1) with the integrated path length (Å) for the fastest folding path from the denatured trpzip1 peptide to the global minimum. Some morphologies encountered along the paths are shown. Adapted from ref. 108. |
Nonetheless, the free energy global minimum (particularly at low temperatures) and the melting temperature for the locally rigidified implementations were in good agreement with the unconstrained benchmarks (Fig. 3a). To assess the effects of local rigidification on the folding mechanism, we extracted the fastest folding path71–73 from each transition network (Fig. 3b). The folding pathways for the unconstrained peptide, schemes I and II, were found to be consistent with an initial hydrophobic collapse (s1 → s2), followed by subsequent zipping (s2 → s4). However, for LRB scheme III a significant lengthening of the folding pathway was observed, which is consistent with excessive reduction in conformational flexibility.
These results suggest that local rigidification should be useful for analysis of thermodynamics and folding mechanisms of biomolecules. Moreover, the results provide some practical guidelines for systematically eliminating unnecessary degrees of freedom, and demonstrate how too much rigidification can significantly perturb pathways and kinetics. An appropriate choice of local rigidification (based on the level of resolution required, and the timescale to be probed) can preserve the underlying properties, and provide useful gains in computational efficiency.
The number of stationary points on the energy landscape increases exponentially with system size,98,121 but the mean first encounter time for the global minimum in basin-hopping global optimisation for atomic clusters seems to scale roughly as the cube of the number of atoms.68 Hence, removing irrelevant degrees of freedom will be even more beneficial if the size scaling is steeper, which may be the case for larger proteins.
The most time-consuming component in all of the computational energy landscapes approaches is the calculation of the potential energy and gradient. For biomolecules, the task of adapting a force field for GPU was facilitated by the release of a GPU-accelerated version of the AMBER potential.124 The generalised Born (GB) implicit solvent potential from AMBER 12 was interfaced with all the energy landscapes codes. The DPDP precision model was used, in which contributions to forces and their accumulation are both performed in double precision.
Basin-hopping global optimisation and transition state determination employ several variants of the L-BFGS algorithm. In each case, the whole algorithm was ported to the GPU, and compared to an implementation with just the potential on GPU. Our GPU implementation of L-BFGS was based on the code of Wetzl and Taubmann,125,126 with various modifications applied to make it as similar as possible to our CPU code. The coordinate transformation and gradient projection functions from our local rigid body framework were also ported to GPU. Extensive use was made of the cuBLAS library127 in this work, alongside custom CUDA kernels.
Tests were performed for eight different system sizes ranging from 81 to 22811 atoms. A subset of these results is presented here for the trimeric haemagglutinin (HA) glycoprotein of the influenza A(H1N1) virus128 (‘full trimer’, 22
811 atoms), a monomeric version of HA (‘full monomer’, 7585 atoms), and finally a truncated version of this monomer (‘truncated monomer’, 3522 atoms). These systems are shown in Fig. 4. L-BFGS was benchmarked in isolation as a proxy for full basin-hopping global optimisation. The change in coordinates and the change in gradient for the last m steps are used in calculating the step direction for minimisation.129 The history size, m, affects the accuracy of the L-BFGS step direction and the overall time taken for minimisation. Results for L-BFGS are shown in Table 1 for history sizes 4 and 1000. Significant speed ups are obtained of up to two orders of magnitude. A large history size is more favourable for bigger systems with the whole L-BFGS algorithm on GPU, whereas a smaller history size gives optimal performance for smaller systems; in contrast, for CPUs a larger history size is always more favourable. We extended our analysis to find the optimal history size and GPU implementation for the full HA trimer. The fastest average minimisation time obtained was 1788 s for a history size of 75 with the whole L-BFGS algorithm on GPU (implementation 1).
![]() | ||
Fig. 4 Structures of the biomolecules used in the tests of GPU geometry optimisation procedures. Adapted from ref. 122. |
History size | System | Average minimisation time for GPU implementation 1/seconds | Average minimisation time for GPU implementation 2/seconds | Average minimisation time for CPU implementation/seconds | Time for CPU implementation/time for GPU implementation 1 | Time for CPU implementation/time for GPU implementation 2 |
---|---|---|---|---|---|---|
a GPU implementation 1 has the entire L-BFGS routine on GPU, including the potential calculation. GPU implementation 2 has just the potential calculation on GPU. See the original paper for full descriptions of all the parameters.122 | ||||||
4 | Truncated monomer | 40.1 | 39.4 | 5937.7 | 148.1 | 150.8 |
Full monomer | 489.9 | 492.1 | 86552.8 | 176.7 | 175.9 | |
Full trimer | 2546.9 | 2527.1 | 517567.7 | 203.2 | 204.8 | |
1000 | Truncated monomer | 382.1 | 239.6 | 4561.2 | 11.9 | 19.0 |
Full monomer | 1249.5 | 1609.0 | 48754.5 | 39.0 | 30.3 | |
Full trimer | 2392.5 | 4747.1 | 337077.0 | 140.9 | 71.0 |
Tests were also performed to compare transition state searches on GPU and CPU. Interfacing the DNEB procedure with the GPU-accelerated potential gave a 178 times speed up compared to CPU for the full HA trimer with a small history size of 4. Results for hybrid eigenvector-following with HA are shown in Table 2. Again, significant improvements are obtained.
History size | System | Average time for GPU implementation (m = 4)/seconds | Average time for CPU implementation (m = 4)/seconds | Time for CPU implementation/time for GPU implementation (m = 4) |
---|---|---|---|---|
a The GPU implementation has the entire Rayleigh–Ritz L-BFGS and L-BFGS routines with gradient projection on GPU, including the potential calculation. See the original paper for full descriptions of all the parameters.122 | ||||
4 | Truncated monomer | 25.7 | 3529.1 | 137.4 |
Full monomer | 197.1 | 36779.8 | 186.6 | |
Full trimer | 1047.6 | 176550.7 | 168.5 | |
1000 | Truncated monomer | 186.3 | 3160.9 | 17.0 |
Full monomer | 669.6 | 32046.2 | 47.9 | |
Full trimer | 1225.4 | 169234.0 | 138.1 |
Local rigidification is now supported on GPU in conjunction with all of these methods in the GMIN85 and OPTIM94 programs. The GPU implementations of the coordinate transformations and gradient projection were found to incur negligible computational overhead in each L-BFGS procedure.
A single funnel on the free energy landscape leads to a single dominant ensemble corresponding to the native state, yet for many biological systems more than one stable fold is observed, e.g. activated states or misfolded states. In many cases these additional stable states require a significant rearrangement of the original folded structure. The resulting alternative morphologies are unlikely to exist within a single funnel on the energy landscape. As a result we observe multifunnel energy landscapes that contain at least two competing morphologies. One important group of multifunnel landscapes are associated with proteins that exhibit stable misfolded structures. Such structures alter the functionality of the protein, and hence they may be pathological. Particularly important examples are amyloid forming proteins, such as Aβ1–42. Another set of proteins likely to exhibit multifunnel landscapes are those that show activated and inactive states, such as kinases. Apart from these important biological and medically relevant proteins, artificial systems used in nanotechnology, such as peptide-based molecular switches, may also require multifunnel landscapes to represent alternative states of an engineered system.
An interesting question for multifunnel landscapes involves the changes caused by mutations, which affect the observable properties. Analysing these changes will lead to a better understanding of biological processes, such as activation and disease, as well as design principles for functional peptides. Previous studies of mutations suggest that naturally occurring proteins possess an inherent stability towards mutations.134 Hence, if a protein still folds after mutation, the folded structure may closely resemble the wild type native state. However, it has also been noted that this stability is likely to be marginal, and may be overcome.135–137
Recently, we have studied a system that exemplifies these features.133 The coiled-coil protein, GCN4-pLI, experimentally exhibits exclusively parallel assemblies of α-helices. A mutation of the parent sequence, namely E20S, leads to the observation of both parallel and antiparallel states.138 Other studies have considered a variety of similar coiled-coil systems, and concluded that there is competition between different oligomer sizes, as well as between parallel and antiparallel alignments.139–142 A multifunnel landscape was suggested as one explanation for this behaviour.143 Our calculations used the properly symmetrised144,145 AMBER ff99SB force field,120,146–148 with an implicit generalised Born solvation model149,150 using infinite interaction cutoffs and the Debye–Hückel approximation for salt (0.1 M).151
The energy landscapes for the parent sequence and the E20S mutant dimers resemble each other, as shown in Fig. 5, in that the main topography is maintained. There are two distinct funnels for parallel (red) and antiparallel (blue) structures. However, a key difference is found in the intermediate region between these funnels. The E20S mutant supports stable intermediate structures, which show a kink in one of the helices (one example is shown in Fig. 5). These low energy structures facilitate interconversion at lower temperatures, which will not occur for the parent sequence, a feature that leads to signatures in the calculated heat capacity curves. The structural basis for this effect is the change in interaction patterns caused by the mutation. When the energy landscapes are analysed regarding the distribution of the local minima with respect to potential energy, and the logarithm of the product of their normal mode frequencies,133 we find that the E20S mutant exhibits a bimodal distribution, corresponding to the parallel and the antiparallel funnels in the landscape. In contrast, the parent sequence only exhibits unimodal distributions, suggesting that the energy landscape is based on an underlying single funnel.
![]() | ||
Fig. 5 Disconnectivity graphs for the E20S mutant (top) and the parent sequence (bottom). The order parameter q1 represents the angle between the two helices and is 0 for parallel and 1 for antiparallel alignments. The separate funnels are clearly visible. Adapted from ref. 133. |
Further insight can be obtained by decomposing the contributions of local minima to the heat capacity features, using the temperature gradient of the occupation probabilities.152 For the E20S mutant three peaks in the heat capacity curve can be associated with three distinct transitions (see Fig. 6): (i) from the global minimum to other low-lying minima in the same funnel, (ii) from minima in the parallel funnel to minima in the intermediate region, and (iii) a melting peak. The second transition is the key to the thermal accessibility of both parallel and antiparallel states at ambient conditions.133 In contrast, the parent sequence only shows transitions within the same funnel and a single melting peak.
![]() | ||
Fig. 6 The heat capacity curve for the E20S mutant shows three distinct peaks. The central peak (A) corresponds to the transition from the parallel funnel to intermediate states. The disconnectivity graph (left) highlights the minima involved, colouring branches red and blue for decreasing and increasing occupation probabilities, respectively, for the states that make the largest contributions to the heat capacity peak.152 The highest temperature peak (B) corresponds to the melting peak and a disconnectivity graph with the key minima coloured in the same way is shown on the right. These results exploit a formulation of the heat capacity written in terms of occupation probability temperature gradients,152 which enables the contribution of each minimum to be quantified. Adapted from ref. 133. |
Moving to more disruptive mutations, the energy landscapes of three further mutants, E20P, E20A and E20G, all exhibit distinct features (Fig. 7). While all three still support stable dimers, both antiparallel and parallel, a number of significant changes occur. Firstly, the energy differences between funnels shift as the dimers are destabilised. Additionally, for E20A and E20G, we see new funnels appearing. Furthermore, the number of intermediate structures increases and they show a larger variety of conformations. While the E20S mutant exhibits one particular interconversion mechanism, the larger perturbations of the bonding pattern lead to coiling and helix kinking in many places.
![]() | ||
Fig. 7 Disconnectivity graphs for E20A (left), E20G (middle) and E20P (right) mutants. The order parameter q1 represents the angle between the two helices and is 0 for parallel and 1 for antiparallel alignments. Adapted from ref. 133. |
These results suggest a number of important conclusions. First of all, the effects of mutations can be analysed using energy landscape methods, leading to a detailed understanding of energetic balances, structural and mechanistic differences, and changes of associated observables, such as the heat capacity curves. Moreover, protein sequences may be mutated in silico to use energy landscape methods as a predictive tool in engineering and de novo design. Our results also reveal the inherent complexity of multifunnel landscapes and the nature of changes associated with mutations. Last but not least, even the relatively simple coiled-coil system highlights the marginal stability of these energy landscapes towards mutations.
Although the helical transitions are relatively slow, this does not present a problem for methodology based on geometry optimisation. In fact, the transition state theory formulation we employ to calculate the individual minimum-to-minimum rate constants using harmonic vibrational densities of states will probably be most reliable for pathways with higher overall barriers. We investigated the landscape and helix interconversion kinetics for a DNA hexamer duplex with the sequence (CGCGCG)2, using a properly symmetrised144,145 version of the AMBER99bsc0 force-field,179 and ε and ζ torsional corrections180 with implicit solvent treated by a generalised Born model,149,150 and an effective salt concentration of 1.0 M from the Debye–Hückel approximation.151 Full details are available in the original report,181 including comparisons with molecular dynamics runs using explicit solvent, which provide a check on the stability of the predicted structures on short timescales.
The resulting free energy landscape at 298 K is illustrated in Fig. 8, where minima separated by free energy barriers less than 5 kcal mol−1 have been recursively regrouped.56,182 There are two principal funnels, which can be largely assigned to left- and right-handed helices. The global free energy minimum under these conditions corresponds to an ensemble of BDNA configurations, including structures with terminal bases frayed out, stacked on top of the base-pairing partner, and trans Watson–Crick/sugar edge hydrogen-bonding patterns, in addition to the canonical form. Helices exhibiting B–Z junctions and ADNA structures are also present in this funnel, and are generally connected to BDNA minima via low downhill barriers. The B–Z junctions have the first two base-pairs in a ZDNA-like conformation, and the last three in a BDNA-like conformation, with the G3 and C10 bases at the boundary flipped out.
![]() | ||
Fig. 8 Free energy landscape computed at 298 K using a regrouping threshold182 of 5 kcal mol−1.181 The branches are coloured according to the handedness (H) of the DNA structure, with positive values representing right-handed conformations and negative values representing left-handed conformations.185 Some representative structures from the different conformational ensembles are also shown. Adapted from ref. 181. |
The funnel containing ZDNA is more structurally homogeneous, and our results support the suggestion that BDNA is more flexible. This funnel also contains conformations that we have described as left-handed BDNA.181 At high energy we see stretched DNA structures, where the helix is significantly unwound. These minima are different from the SDNA state that forms during overstretching, which is thought to be a ladder-like structure, with Watson–Crick base-pairs maintained throughout.183,184 The stretched forms that we have characterised have no such pairs.
A mechanistic analysis of the landscape reveals two distinct pathways (Fig. 9) for the B → Z transformation:181 one involving a B–Z junction and the other sampling stretched intermediates. The stretched minima are accessed via helix unwinding, loss of Watson–Crick base-pairs, with the central guanines flipping from anti to syn conformations before the guanines near the helix termini. This ordering is reversed for the BZ-junction mechanism, which is predicted to be more favourable at 298 K, since the enthalpy barrier is lower.181 However, stretched intermediates will become more competitive at higher temperatures.
![]() | ||
Fig. 9 Potential energy (V) as a function of the integrated path length (s) for pathways corresponding to the BDNA to ZDNA transition, via BZ junction (top panel), and stretched intermediates (bottom panel). Adapted from ref. 181. |
In contrast to the B → Z transition pathways, the A → B transformation is essentially downhill (Fig. 10), consistent with previous molecular dynamics simulations.186,187 The small barriers encountered along the pathway correspond to repuckering of the ribose sugars, which in turn are coupled to global changes in the helix morphology. The key features of the Calladine–Drew model are correctly captured in our proposed mechanism.188
![]() | ||
Fig. 10 Potential energy (V) as a function of the integrated path length (s) for pathways corresponding to ADNA to BDNA transition (top panel), left-handed BDNA to ZDNA transition (bottom panel). Adapted from ref. 181. |
A switch in the chain topology is required for the transformation from the left-handed BDNA structure to ZDNA (Fig. 10). This transition does not require large-scale deformations of the DNA duplex, and is largely driven by concerted motions involving only a few nucleobases. The anti to syn torsional flips of the guanine bases, which are necessary to reach the ZDNA state, introduce kinetic bottlenecks in the pathway.
Due to the intrinsic coupling between many degrees of freedom, probing helical transitions in a DNA duplex is a challenging task. This case study therefore provided an ideal testing ground for the potential energy landscape framework. Our simulations captured several aspects of the models proposed for helical transformations in DNA, and revealed some additional complexity, which would be difficult to capture in approaches that require the definition of a reaction coordinate.
Accelerating the sampling and exploration of the potential energy landscape has many benefits, especially in terms of treating larger systems. Initial surveys involving global optimisation to find favourable conformations are also a stringent test of the empirical force field. Identifying unphysical features can inform future force field development and improvement, which provides the foundations on which analysis of thermodynamic properties and kinetics depends.
Many new applications can be envisaged for biomolecules, including more coarse-grained models of mesoscopic systems. The two applications we have summarised for this report show how new insight into mechanism and design of new materials might arise. Both the coiled-coil protein and the DNA duplex landscapes exhibit multifunnel organisation, with competing low energy morphologies separated by relatively high barriers. The potential to tune such landscapes through mutations, applied fields, salt concentration, temperature, and specific ligands, raises the possibility of new design principles for multifunctional devices. The ability to explore such designs through theory and simulation, in tandem with feedback from experiment, could significantly facilitate progress in these endeavours.
This journal is © The Royal Society of Chemistry 2017 |