Exploring biomolecular energy landscapes

The potential energy landscape perspective provides both a conceptual and a computational framework for predicting, understanding and designing molecular properties. In this Feature Article, we highlight some recent advances that greatly facilitate structure prediction and analysis of global thermodynamics and kinetics in proteins and nucleic acids. The geometry optimisation procedures, on which these calculations are based, can be accelerated significantly using local rigidification of selected degrees of freedom, and through implementations on graphics processing units. Results of progressive local rigidification are first summarised for trpzip1, including a systematic analysis of the heat capacity and rearrangement rates. Benchmarks for all the essential optimisation procedures are then provided for a variety of proteins. Applications are then illustrated from a study of how mutation affects the energy landscape for a coiled-coil protein, and for transitions in helix morphology for a DNA duplex. Both systems exhibit an intrinsically multifunnel landscape, with the potential to act as biomolecular switches.


Introduction
The structure-function paradigm continues to play a central role in advancing our understanding of molecular biology.With the advent of modern spectroscopy and computer simulation techniques, it is now possible to study biomolecules at very fine spatial and temporal resolutions to obtain new insight into the structure-function relationship.It is becoming increasingly clear that evolution has endowed biomolecules with a certain degree of flexibility, which allows them to not only adopt stable structures, but also switch between different conformations over a hierarchy of timescales, to perform different functions. 1,2hese results highlight the importance of a dynamical perspective.The notion that biomolecules are 'static sculptures' is not sufficient to describe their impressive functional capabilities. 3 deeper understanding of the intimate connections between structure, dynamics and function at the molecular level is therefore necessary.In this Feature Article we illustrate how these connections can be addressed in the framework of potential energy landscape theory. 4][7][8][9] The first one concerns structure prediction: given an amino acid sequence, what three-dimensional structures will a biomolecule populate at thermodynamic equilibrium to fulfil a certain biological function?The second aspect concerns folding mechanisms and kinetics, which describe how the system achieves its functional three-dimensional structure from a relatively unstructured configuration by navigating along folding routes on the underlying landscape, or switches between different metastable conformations that dominate the equilibrium.Many evolved proteins conform to the 'principle of minimum frustration', corresponding to landscapes where kinetic traps are absent, and fold on physiological timescales.This scenario led to the funnel hypothesis, that the native state lies at the bottom of a relatively smooth free energy landscape. 5,6However, some proteins have evolved to exhibit multifunctional character, with an underlying multifunnelled landscape, which can be tuned via mutations, 10 ligand-binding, 11,12 or even subtle changes in environmental conditions. 13An analogous situation exists for nucleic acids, especially RNA. 14,15Although protein and RNA folding share several common features, significant differences arise due to the polyelectrolyte nature of RNA.Furthermore, due to the lower stability gap between native and non-native states, structure prediction as well as studies of folding mechanisms and kinetics become even more challenging for RNA. 16,17arious experiments [18][19][20] and numerical simulations [21][22][23] have succeeded in extracting specific features of biomolecular energy landscapes.Nonetheless, determining the global topography of the landscape, and predicting or interpreting emergent properties have proved more difficult.][26] Recently, several studies 27,28 have employed non-equilibrium pulling experiments to reconstruct landscape profiles, using an extended version of Jarzynski's equality 29 suggested by Szabo and Hummer. 30,31The specific challenges encountered in computational studies of biomolecular structure and pathways are determined by the complexity of the underlying potential energy landscape.Most biological systems exhibit relatively large barriers between alternative competing morphologies, leading to broken ergodicity.In such situations, brute-force simulations have limited applicability, because the equivalence between time and ensemble averages breaks down over short timescales.Hence, the development of enhanced sampling techniques is essential.Many of these techniques primarily aim at predicting thermodynamic properties, and exploit multiple coupled simulations at different temperatures [32][33][34] or energies, 35 predefined reaction coordinates, [36][37][38] or hypersurface deformations [39][40][41][42] to improve conformational sampling.Obtaining kinetic information, and identifying key intermediates along the multitude of folding pathways, is more difficult.Although techniques based on systematic sampling of reactive trajectories, such as transition path sampling, 43,44  This journal is © The Royal Society of Chemistry 2017 milestoning, 46,47 and forward flux sampling, 48 can provide kinetic and mechanistic insight, they can be computationally intensive.
][51] We have actively developed methods based on the potential energy landscape perspective, which provides a convenient framework for building and analysing transition networks. 4,50n this approach, the landscape is coarse-grained into a set of interconnected stationary points [minima (M) and transition states (TS)].This simplification is particularly attractive because stationary points can be located using geometry optimisation, largely independent of energy barriers.The global thermodynamics is dictated by the configuration space associated with the potential energy minima.The connections between different regions of the landscape are defined in terms of M-TS-M triples, which encode the kinetics. 4][54][55] It is also possible to analyse the landscape in terms of selected order parameters and lumping schemes to identify states or ensembles, commensurate with the experimental definition. 56ethods that employ explicit dynamics to construct transition networks, mainly in the context of Markov State Models (MSM), 49,51,57 are again complementary to methodology based on geometry optimisation.
The potential energy landscape approach has been used to study a range of biomolecular processes.Recent applications include folding of RNA hairpins, 58 the origin of heterogeneity in the intrinsically disordered PUMA peptide, 59 ring puckering in collagen, 60 the effect of mutation on the influenza A virus, 61 and the transformation of the landscape for a model protein by a static pulling force. 62The purpose of the present contribution is to briefly summarise the potential energy landscape framework, and discuss recent advances that have made new applications possible.
Biological systems of practical interest can often exceed 100 000 atoms in size.The number of degrees of freedom poses a serious challenge to current simulation techniques, and even for a biomolecule of moderate size with around 1000 atoms, attaining true equilibrium is time consuming.4][65] However, such reductionist schemes may not be able to represent key dynamical features, and can result in an artificially smooth landscape.Here we present two strategies within the potential energy landscape perspective that have proved effective in structure prediction, and obtaining thermodynamic and kinetic insight at the all-atom level.In Section 3, we review the key findings of a general local rigid body approach from a recent study of the trpzip1 peptide.On the technological side, implementing geometry optimisation routines for GPU hardware has proved effective, as summarised in Section 4. We then discuss two recent case studies involving biological conformational switches.The first example analyses how mutations reshape the underlying landscape of a coiled-coil protein, and the second case study shows how the potential energy landscape framework can be used to probe transitions between different helical morphologies of a DNA duplex.Both examples exhibit multifunnel energy landscapes, which we associate with intrinsic multifunctional behaviour. 59

Methods
The potential energy landscape is characterised by stationary points: local minima and transition states.A stationary point is a local minimum if all the non-zero normal mode frequencies are real, whereas transition states are classified geometrically as stationary points with one imaginary normal mode frequency. 662][73] We then explain how the free energy surface may be computed from the underlying potential energy landscape using the superposition approach. 4,74astly, we discuss the use of disconnectivity graphs [75][76][77][78] to visualise the landscape.The more recent developments, in terms of local rigidification and implementation of geometry optimisation techniques on GPU hardware, are described in more detail in the following sections.

Structure prediction by basin-hopping global optimisation
The basin-hopping global optimisation procedure 67,68 has been successfully applied to a wide range of systems, spanning atomic clusters, 68 glass formers, 79 and biomolecules. 80This method employs a hypersurface deformation, but does not change the global minimum of the potential energy surface (PES).Each configuration on the PES can be represented by a 3N-dimensional vector X, where N is the number of atoms and the energy corresponding to X is given by E(X).The energy obtained by a minimisation starting from X is written as min{E(X)}.In the present work energy minimisation was achieved using the limited-memory BFGS (L-BFGS) 81,82 algorithm, which is wellsuited for large-scale problems, since the user is able to control the amount of storage required.The PES is the union of the 'basins of attraction' 83,84 of all the local minima.This procedure effectively removes all downhill barriers between connected minima.Basin-hopping global optimisation has been implemented in the GMIN program, 85 which is available for use under the GNU General Public License.

Kinetic transition networks from discrete path sampling
In discrete path sampling [71][72][73] (DPS) the aim is to determine the kinetics of a system from a collection of transition pathways, connecting reactant (e.g. a denatured configuration) and product (e.g. the native structure) states.A discrete path is defined as a sequence of local minima on the potential energy surface (PES) and the transition states that directly connect them.
7][88]  located using the doubly-nudged elastic band (DNEB) procedure. 86 double-ended interpolation between A and B produces an intermediate set of images [X 1 , X 2 . ..X M ], where X i represents the Cartesian coordinates of the ith image.Next, harmonic springs are used to connect equivalent atoms in adjacent images, resulting in a spring potential.Components of the spring gradient and the gradient of the true potential are then utilised in the derivation of the elastic band gradient.This gradient prevents interference of the spring potential (which affects convergence of images) and the true potential (which affects the spacing of images) and gives the band its 'nudging' properties.89 The complete set of images is then relaxed by L-BFGS minimisation, using a weak convergence condition, which focuses on the local maxima.86 Approximate transition states (the images corresponding to maxima in the DNEB energy profile) are then converged more tightly by hybrid eigenvector-following (HEF).90,91 This is a singleended procedure for locating transition states, where a single starting configuration is considered.90,91 In contrast, for doubleended procedures, such as the DNEB method, two initial endpoint geometries are needed.In HEF, only one Hessian eigenvector (e min ) and the corresponding eigenvalue (l min ) are used for uphill searches and a minimisation procedure, such as the L-BFGS algorithm, is used for optimisation in all other directions.91 The smallest non-zero eigenvalue can be found using the Rayleigh-Ritz ratio lðxÞ ¼ x t Hx x 2 , where x represents a small perturbation in the current geometry.To avoid explicit computation of the Hessian (H), l(x) is estimated from the numerical second derivative of the energy.Through successive iterations, tightly converged transition states are characterised.These transition states are then connected to minima by following approximate steepest-descent paths parallel and antiparallel to the eigenvector corresponding to the unique negative eigenvalue.
The minima and transition states found during DNEB/HEF searches form a database of stationary points.Recall that the goal is to find a connected path between the reactant (A) and product (B).Before each new DNEB/HEF cycle, a metric is needed to determine which minima in the database are most appropriate for subsequent connection attempts.A modified version 92 of Dijkstra's algorithm, 93 which selects pairs based on a minimised Euclidean distance metric, is used for this purpose.In a database of stationary points the total set of minima can be described using a complete graph G(M,D); where the nodes M represent minima and the edges D represent the transition states.The edge weight w(x,y) between arbitrary minima x and y is set to zero if the minima are connected by a single transition state.If the number of connection attempts between minima x and y reaches the maximum value (set by the user), the edge-weight becomes infinite.Otherwise, w(x,y) is expressed as a function of the Euclidean distance.At the beginning of each DNEB/HEF cycle, pairs of minima for connection attempts are prioritised based on these weights.This process is repeated until there are no missing connections along the path.
The doubly-nudged elastic band procedure, hybrid eigenvectorfollowing, and the shortest path algorithm are implemented in the OPTIM code. 94Parallel OPTIM runs are organised by the PATHSAMPLE program. 95

Optimisation of stationary point databases
The initial path found between reactant and product states is usually long with many high barriers, particularly for states distant in configuration space.The objective is then to grow the stationary point database and locate kinetically relevant paths.At any point, the fastest path (B ' A) is taken as the one making the largest contribution to the steady-state rate constant, k SS BA , which can be defined as a sum over all discrete paths with the steady-state approximation for intervening minima. 71,72nce the fastest path is identified, it can then be used in various ways to search for new paths.
The SHORTCUT scheme 96,97 chooses pairs of minima from the current 'fastest' path that are separated by a minimum number of transition states (steps) and attempts to connect them using the procedures discussed above.The SHORTCUT procedure usually decreases the total number of steps on the path and leads to a significant increase in the overall rate constant.Alternatively, the SHORTCUT BARRIER scheme 92,96 selects minima on either side of the largest barriers on the current path, up to a maximum number of steps apart.Additional connection attempts between these minima may find paths avoiding such high barriers, again improving the rate constant.
However, these procedures may also introduce kinetic traps, in the form of low-lying minima separated from the product minimum by high barriers.Most of these traps are artificial and are due to insufficient sampling.To find low-barrier paths for these minima the UNTRAP scheme 96 is used.Candidate minima for 'untrapping' are chosen based on the ratio of the potential energy barrier and potential energy difference from the product (B).Hence, minima with low potential energies connected by high barriers are most likely to be chosen.Connection attempts between these minima and the product minimum then proceed in search of better paths.Local free energies can also be used.

Computation of free energies
Thermodynamic properties can be estimated directly from the underlying potential energy landscape using the superposition approach.At a given temperature, the canonical partition function Z(T) is written as a sum of contributions from the basins of attraction of the local minima: 4,52,74,98-101 A harmonic approximation is often used to estimate the vibrational partition function of each local minimum, where n i is the number of distinct permutational isomers of minimum i with potential energy V i , b = 1/k B T, k B is the Boltzmann constant, % v i is the geometric mean vibrational frequency and k is the number of vibrational degrees of freedom. 4he free energy of each minimum is then: This journal is © The Royal Society of Chemistry 2017 The harmonic approximation has also been used as a starting point for superposition calculations that include quantum corrections and anharmonicity. 99,102-107

Visualisation of energy landscapes: disconnectivity graphs
7][78] Fig. 1 illustrates how a disconnectivity graph (red lines) may be constructed from a database of minima and the transition states that connect them (blue curve).The energy is represented on the vertical axis of the graph, while the horizontal axis can be arbitrary or may represent an order parameter.In the disconnectivity graph, a vertical line is drawn from each minimum (A-D), beginning at the potential or free energy of that state.At the energy threshold E + DE minima A and B are grouped together, since the transition state connecting them lies below the threshold, and similarly for minima C and D. However, the two sets are disjoint, as the transition state connecting them lies above this threshold.When the threshold is high enough the two sets of minima merge.Since the energy spacing (DE) determines how the analysis is performed, the graph is most meaningful when the thresholds are spaced at suitable regular intervals.This analysis provides useful information on relative barriers separating minima in different regions of the landscape.

Systematic local rigidification for trpzip1
0][111][112][113] For instance, changes in ring planarity occur orders of magnitude faster than the formation of salt bridges.5][116][117] This approach is similar in spirit to the FEG-RBD procedure, 118 where rigid body dynamics is employed to compute free energy gradients in constrained MD simulations.
6][117] Each non-linear rigid unit, regardless of size, then has six degrees of freedom: three translational and three rotational.Since interactions between sites within each local rigid body are not required, this formulation results in computational savings in evaluating the energy and gradient.Additionally, fewer geometry optimisation steps are generally needed, due to the dimensionality reduction of the conformational search space.Hence, significant speedups in the mean first encounter time for global optimisation are possible. 116Moreover, a clear mapping between the minima found on locally rigidified and unconstrained potential energy landscapes can be maintained. 116n a recent article, 108 we investigated the systematic effects of local rigidification on the structure, thermodynamics and kinetics of trpzip1. 119The tryptophan side-chain rings, peptide bonds, trigonal planar centres and termini were grouped as rigid bodies, and, based on these sets, we formulated three LRB schemes: I -rings, II -rings and peptide bonds, III -rings, peptide bonds, termini, trigonal planar centres treated as local rigid bodies.Trpzip1 was represented by the AMBER99SB forcefield, 120 and discrete path sampling 71,72 was used to analyse the potential energy landscape for each LRB scheme, as well as the unconstrained (U) peptide.
The main conclusions are summarised in Fig. 2 and 3.The potential energy range and structural heterogeneity of the locally rigidified landscapes are consistent with the unconstrained landscape (Fig. 2).However, there is a significant increase in the number of prominent subfunnels in the disconnectivity graph for the most constrained case (Fig. 2d).This result suggests that excessive rigidification can reduce the conformational flexibility of the protein in an unphysical manner, leading to artificially high barriers on the potential energy landscape.
Nonetheless, the free energy global minimum (particularly at low temperatures) and the melting temperature for the locally rigidified implementations were in good agreement with the unconstrained benchmarks (Fig. 3a).To assess the effects of local rigidification on the folding mechanism, we extracted the fastest folding path [71][72][73] from each transition network (Fig. 3b).The folding pathways for the unconstrained peptide, schemes I and II, were found to be consistent with an initial hydrophobic collapse (s1 -s2), followed by subsequent zipping (s2 -s4).However, for LRB scheme III a significant lengthening of the folding pathway was observed, which is consistent with excessive reduction in conformational flexibility.
These results suggest that local rigidification should be useful for analysis of thermodynamics and folding mechanisms of biomolecules.Moreover, the results provide some practical guidelines for systematically eliminating unnecessary degrees of freedom, and demonstrate how too much rigidification can significantly perturb pathways and kinetics.An appropriate choice of local rigidification (based on the level of resolution required, and the timescale to be probed) can preserve the underlying properties, and provide useful gains in computational efficiency.The number of stationary points on the energy landscape increases exponentially with system size, 98,121 but the mean first encounter time for the global minimum in basin-hopping global optimisation for atomic clusters seems to scale roughly as the cube of the number of atoms. 68Hence, removing irrelevant degrees of freedom will be even more beneficial if the size scaling is steeper, which may be the case for larger proteins.

GPU-accelerated geometry optimisation
Basin-hopping global optimisation, 67,68 the doubly-nudged 86 elastic band 87,89 method (DNEB), hybrid eigenvector-following (HEF) 90 and the local rigid body framework 116,117 have all been adapted to run on graphics processing units (GPUs). 122GPU hardware is well suited to massively parallel computations, as a greater number of transistors are devoted to data processing than to data caching and flow control, relative to a CPU. 123he most time-consuming component in all of the computational energy landscapes approaches is the calculation of the potential energy and gradient.For biomolecules, the task of adapting a force field for GPU was facilitated by the release of a GPU-accelerated version of the AMBER potential. 124The generalised Born (GB) implicit solvent potential from AMBER 12 was interfaced with all the energy landscapes codes.The DPDP precision model was used, in which contributions to forces and their accumulation are both performed in double precision.
Basin-hopping global optimisation and transition state determination employ several variants of the L-BFGS algorithm.In each case, the whole algorithm was ported to the GPU, and compared to an implementation with just the potential on GPU.Our GPU implementation of L-BFGS was based on the code of Wetzl and Taubmann, 125,126 with various modifications applied to make it as similar as possible to our CPU code.The coordinate transformation and gradient projection functions from our local rigid body framework were also ported to GPU.Extensive use was made of the cuBLAS library 127 in this work, alongside custom CUDA kernels.
Tests were performed for eight different system sizes ranging from 81 to 22 811 atoms.A subset of these results is presented here for the trimeric haemagglutinin (HA) glycoprotein of the influenza A(H1N1) virus 128   This journal is © The Royal Society of Chemistry 2017 in gradient for the last m steps are used in calculating the step direction for minimisation. 129The history size, m, affects the accuracy of the L-BFGS step direction and the overall time taken for minimisation.Results for L-BFGS are shown in Table 1 for history sizes 4 and 1000.Significant speed ups are obtained of up to two orders of magnitude.A large history size is more favourable for bigger systems with the whole L-BFGS algorithm on GPU, whereas a smaller history size gives optimal performance for smaller systems; in contrast, for CPUs a larger history size is always more favourable.We extended our analysis to find the optimal history size and GPU implementation for the full HA trimer.The fastest average minimisation time obtained was 1788 s for a history size of 75 with the whole L-BFGS algorithm on GPU (implementation 1).
Tests were also performed to compare transition state searches on GPU and CPU.Interfacing the DNEB procedure with the GPU-accelerated potential gave a 178 times speed up compared to CPU for the full HA trimer with a small history size of 4.  Results for hybrid eigenvector-following with HA are shown in Table 2. Again, significant improvements are obtained.
Local rigidification is now supported on GPU in conjunction with all of these methods in the GMIN 85 and OPTIM 94 programs.The GPU implementations of the coordinate transformations and gradient projection were found to incur negligible computational overhead in each L-BFGS procedure.

Multifunnel energy landscapes for proteins
The question that originally motivated interest in the energy landscapes of proteins is how a sequence of amino acids achieves its native fold. 130From the application of statistical mechanics to various model systems the existence of intrinsic properties of protein sequences emerged, which lead to a preference for particular folded states.This idea produced the notion of a funnelled free energy landscape containing the native state as the global minimum. 9,131,132 single funnel on the free energy landscape leads to a single dominant ensemble corresponding to the native state, yet for many biological systems more than one stable fold is observed, e.g.activated states or misfolded states.In many cases these additional stable states require a significant rearrangement of the original folded structure.The resulting alternative morphologies are unlikely to exist within a single funnel on the energy landscape.As a result we observe multifunnel energy landscapes that contain at least two competing morphologies.One important group of multifunnel landscapes are associated with proteins that exhibit stable misfolded structures.Such structures alter the functionality of the protein, and hence they may be pathological.
Particularly important examples are amyloid forming proteins, such as Ab 1-42 .Another set of proteins likely to exhibit multifunnel landscapes are those that show activated and inactive states, such as kinases.Apart from these important biological and medically relevant proteins, artificial systems used in nanotechnology, such as peptide-based molecular switches, may also require multifunnel landscapes to represent alternative states of an engineered system.
An interesting question for multifunnel landscapes involves the changes caused by mutations, which affect the observable properties.Analysing these changes will lead to a better understanding of biological processes, such as activation and disease, as well as design principles for functional peptides.Previous studies of mutations suggest that naturally occurring proteins possess an inherent stability towards mutations. 134Hence, if a protein still folds after mutation, the folded structure may closely resemble the wild type native state.][137] Recently, we have studied a system that exemplifies these features. 133The coiled-coil protein, GCN4-pLI, experimentally exhibits exclusively parallel assemblies of a-helices.A mutation of the parent sequence, namely E20S, leads to the observation of both parallel and antiparallel states. 138Other studies have considered a variety of similar coiled-coil systems, and concluded that there is competition between different oligomer sizes, a GPU implementation 1 has the entire L-BFGS routine on GPU, including the potential calculation.GPU implementation 2 has just the potential calculation on GPU.See the original paper for full descriptions of all the parameters. 122][141][142] A multifunnel landscape was suggested as one explanation for this behaviour. 143Our calculations used the properly symmetrised 144,145 AMBER ff99SB force field, 120,146-148 with an implicit generalised Born solvation model 149,150 using infinite interaction cutoffs and the Debye-Hu ¨ckel approximation for salt (0.1 M). 151 The energy landscapes for the parent sequence and the E20S mutant dimers resemble each other, as shown in Fig. 5, in that the main topography is maintained.There are two distinct funnels for parallel (red) and antiparallel (blue) structures.However, a key difference is found in the intermediate region between these funnels.The E20S mutant supports stable intermediate structures, which show a kink in one of the helices (one example is shown in Fig. 5).These low energy structures facilitate interconversion at lower temperatures, which will not occur for the parent sequence, a feature that leads to signatures in the calculated heat capacity curves.The structural basis for this effect is the change in interaction patterns caused by the mutation.When the energy landscapes are analysed regarding the distribution of the local minima with respect to potential energy, and the logarithm of the product of their normal mode frequencies, 133 we find that the E20S mutant exhibits a bimodal distribution, corresponding to the parallel and the antiparallel funnels in the landscape.In contrast, the parent sequence only exhibits unimodal distributions, suggesting that the energy landscape is based on an underlying single funnel.
Further insight can be obtained by decomposing the contributions of local minima to the heat capacity features, using the temperature gradient of the occupation probabilities. 152For the E20S mutant three peaks in the heat capacity curve can be associated with three distinct transitions (see Fig. 6): (i) from the global minimum to other low-lying minima in the same funnel, (ii) from minima in the parallel funnel to minima in the intermediate region, and (iii) a melting peak.The second transition is the key to the thermal accessibility of both parallel and antiparallel states at ambient conditions. 133In contrast, the parent sequence only shows transitions within the same funnel and a single melting peak.
Moving to more disruptive mutations, the energy landscapes of three further mutants, E20P, E20A and E20G, all exhibit distinct features (Fig. 7).While all three still support stable dimers, both antiparallel and parallel, a number of significant changes occur.Firstly, the energy differences between funnels shift as the dimers are destabilised.Additionally, for E20A and E20G, we see new funnels appearing.Furthermore, the number of intermediate structures increases and they show a larger variety of conformations.While the E20S mutant exhibits one particular interconversion mechanism, the larger perturbations of the bonding pattern lead to coiling and helix kinking in many places.
These results suggest a number of important conclusions.First of all, the effects of mutations can be analysed using energy landscape methods, leading to a detailed understanding of energetic balances, structural and mechanistic differences, and changes of associated observables, such as the heat capacity curves.Moreover, protein sequences may be mutated in silico to use energy landscape methods as a predictive tool in engineering and de novo design.Our results also reveal the inherent complexity of multifunnel landscapes and the nature of changes associated with mutations.Last but not least, even the relatively simple coiled-coil system highlights the marginal stability of these energy landscapes towards mutations.

Helical transitions in a DNA duplex
The B -Z transition is one of the slowest conformational changes that has been characterised in biomolecules.ZDNA exhibits a left-handed helical structure, [153][154][155] in contrast to the well known right-handed helices of BDNA and ADNA.This left-handed helix Fig. 6 The heat capacity curve for the E20S mutant shows three distinct peaks.The central peak (A) corresponds to the transition from the parallel funnel to intermediate states.The disconnectivity graph (left) highlights the minima involved, colouring branches red and blue for decreasing and increasing occupation probabilities, respectively, for the states that make the largest contributions to the heat capacity peak. 152The highest temperature peak (B) corresponds to the melting peak and a disconnectivity graph with the key minima coloured in the same way is shown on the right.These results exploit a formulation of the heat capacity written in terms of occupation probability temperature gradients, 152 which enables the contribution of each minimum to be quantified.Adapted from ref. 133.Fig. 7 Disconnectivity graphs for E20A (left), E20G (middle) and E20P (right) mutants.The order parameter q 1 represents the angle between the two helices and is 0 for parallel and 1 for antiparallel alignments.Adapted from ref. 133.5][156] The discovery that ZDNA has important cellular functions [156][157][158][159][160][161] makes the complex rearrangements required to interconvert alternative helices particularly interesting.Experimentally, BDNA may convert to ZDNA at high salt concentrations, 153,156,162,163 through negative supercoiling, 164-166 methylation, 167-169 and binding to protein ligands. 170,171Various suggestions have been proposed for the mechanism, 172 ranging from cooperative schemes, 173 pathways involving base-pair opening, 154 ADNA intermediates, 174 stretched intermediates, 175,176 and a zipper model, which involves stepwise propagation of a B-Z junction. 177,178lthough the helical transitions are relatively slow, this does not present a problem for methodology based on geometry optimisation.In fact, the transition state theory formulation we employ to calculate the individual minimum-to-minimum rate constants using harmonic vibrational densities of states will probably be most reliable for pathways with higher overall barriers.We investigated the landscape and helix interconversion kinetics for a DNA hexamer duplex with the sequence (CGCGCG) 2 , using a properly symmetrised 144,145 version of the AMBER99bsc0 force-field, 179 and e and z torsional corrections 180 with implicit solvent treated by a generalised Born model, 149,150 and an effective salt concentration of 1.0 M from the Debye-Hu ¨ckel approximation. 151Full details are available in the original report, 181 including comparisons with molecular dynamics runs using explicit solvent, which provide a check on the stability of the predicted structures on short timescales.
The resulting free energy landscape at 298 K is illustrated in Fig. 8, where minima separated by free energy barriers less than 5 kcal mol À1 have been recursively regrouped. 56,182There are two principal funnels, which can be largely assigned to left-and right-handed helices.The global free energy minimum under these conditions corresponds to an ensemble of BDNA configurations, including structures with terminal bases frayed out, stacked on top of the base-pairing partner, and trans Watson-Crick/sugar edge hydrogen-bonding patterns, in addition to the canonical form.Helices exhibiting B-Z junctions and ADNA structures are also present in this funnel, and are generally connected to BDNA minima via low downhill barriers.The B-Z junctions have the first two base-pairs in a ZDNA-like conformation, and the last three in a BDNA-like conformation, with the G3 and C10 bases at the boundary flipped out.
The funnel containing ZDNA is more structurally homogeneous, and our results support the suggestion that BDNA is more flexible.This funnel also contains conformations that we have described as left-handed BDNA. 181At high energy we see stretched DNA structures, where the helix is significantly unwound.These minima are different from the SDNA state that forms during overstretching, which is thought to be a ladder-like structure, with Watson-Crick base-pairs maintained throughout. 183,184The stretched forms that we have characterised have no such pairs.A mechanistic analysis of the landscape reveals two distinct pathways (Fig. 9) for the B -Z transformation: 181 one involving a B-Z junction and the other sampling stretched intermediates.The stretched minima are accessed via helix unwinding, loss of Watson-Crick base-pairs, with the central guanines flipping from anti to syn conformations before the guanines near the helix termini.This ordering is reversed for the BZ-junction mechanism, which is predicted to be more favourable at 298 K, since the enthalpy barrier is lower. 181However, stretched intermediates will become more competitive at higher temperatures.
In contrast to the B -Z transition pathways, the A -B transformation is essentially downhill (Fig. 10), consistent with previous molecular dynamics simulations. 186,187The small barriers encountered along the pathway correspond to repuckering of the ribose sugars, which in turn are coupled to global changes in the helix morphology.The key features of the Calladine-Drew model are correctly captured in our proposed mechanism. 188 switch in the chain topology is required for the transformation from the left-handed BDNA structure to ZDNA (Fig. 10).This transition does not require large-scale deformations of the DNA duplex, and is largely driven by concerted motions involving only a few nucleobases.The anti to syn torsional flips of the guanine bases, which are necessary to reach the ZDNA state, introduce kinetic bottlenecks in the pathway.Due to the intrinsic coupling between many degrees of freedom, probing helical transitions in a DNA duplex is a challenging task.This case study therefore provided an ideal testing ground for the potential energy landscape framework.Our simulations captured several aspects of the models proposed for helical transformations in DNA, and revealed some additional complexity, which would be difficult to capture in approaches that require the definition of a reaction coordinate.

Conclusions
The computational procedures associated with the potential energy landscapes framework are largely complementary to conventional molecular dynamics and Monte Carlo schemes.Coarse-graining the landscape, in terms of local minima and the transition states that connect them, enables us to predict structure, dynamics, and thermodynamic properties using established methods and approximations of statistical mechanics and unimolecular rate theory.In this Feature Article we have highlighted the gains in efficiency resulting from implementation of all the key geometry optimisation techniques on GPU, and from a general local rigid body formulation.Recent results for systematic analysis of these schemes have been summarised, and specific applications to protein and nucleic acid systems provide examples of interesting test cases.
Accelerating the sampling and exploration of the potential energy landscape has many benefits, especially in terms of treating larger systems.Initial surveys involving global optimisation to find favourable conformations are also a stringent test of the empirical force field.Identifying unphysical features can inform future force field development and improvement, which provides the foundations on which analysis of thermodynamic properties and kinetics depends.
Many new applications can be envisaged for biomolecules, including more coarse-grained models of mesoscopic systems.The two applications we have summarised for this report show how new insight into mechanism and design of new materials might arise.Both the coiled-coil protein and the DNA duplex landscapes exhibit multifunnel organisation, with competing low energy morphologies separated by relatively high barriers.The potential to tune such landscapes through mutations, applied fields, salt concentration, temperature, and specific ligands, raises the possibility of new design principles for multifunctional devices.The ability to explore such designs through theory and simulation, in tandem with feedback from experiment, could significantly facilitate progress in these endeavours.
transition interface sampling, Rosemary G. Mantell Rosemary G. Mantell received her BSc in Chemistry in 2012 from the University of Bristol.She then went on to complete an MPhil in Scientific Computing at the University of Cambridge.She continued her graduate studies at the same institution under the direction of Professor David J. Wales.Her PhD has so far focused on the acceleration of computational energy landscape methods using Graphics Processing Units (GPUs).David J. Wales David J. Wales received his BA, PhD, and ScD degrees from Cambridge University in 1985, 1988 and 2004.He spent as a Lindemann Trust Fellow at the University of Chicago, working with Prof. R. S. Berry.He was a Research Fellow at Downing College Cambridge in 1990, a Lloyd's of London Tercentenary Fellow in 1991, and a Royal Society University Research Fellow from 1991 to 1998.He was awarded the Meldola Medal and Prize by the Royal Society of Chemistry in 1992, and the Tilden Prize in 2015.In 1998 he was appointed to a Lectureship in Cambridge and is now Professor of Chemical Physics and Chair of the Theory Group in the Chemistry Department.He was elected as a Fellow of the Royal Society in 2016.Debayan Chakraborty Debayan Chakraborty was born in Kolkata, India.He received his BSc degree from St. Stephens College, University of Delhi, in 2010, and MSc degree from the University of Oxford in 2011.He carried out his doctoral research in the area of Theoretical Chemistry from 2012 to 2016, under the supervision of Prof. David J. Wales at the University of Cambridge.Currently he is a postdoctoral research scholar at the University of Texas at Austin.His research interests include the study of protein and nucleic acid folding exploiting ideas rooted in energy landscape theory, and coarse-grained modelling of biomolecules.

Fig. 1
Fig. 1 Construction of a disconnectivity graph from a database of stationary points.Minima are labelled A-D.In the disconnectivity graph (red) each local minimum is represented by a vertical line, starting at the energy of that minimum.At a given energy threshold, E + nDE, minima connected by transition states that lie below the threshold are grouped into disjoint sets.
('full trimer', 22 811 atoms), a monomeric version of HA ('full monomer', 7585 atoms), and finally a truncated version of this monomer ('truncated monomer', 3522 atoms).These systems are shown in Fig. 4. L-BFGS was benchmarked in isolation as a proxy for full basin-hopping global optimisation.The change in coordinates and the change

Fig. 2
Fig. 2 Potential energy disconnectivity graphs for TZ1 (D E = 2 kcal mol À1 ) at different levels of local rigidification.The branches are coloured based on order parameters L (the sum of the four inner native hydrogen-bond lengths and the distances between the CD2 atoms of the three TRP pairs) and S (the relative orientation of the TRP rings).The three main morphologies are: blue (L o 60 Å, S-value = +1), green (L o 60 Å, S-value = À1), and red (all other minima).Adapted from ref. 108.

Fig. 4
Fig. 4 Structures of the biomolecules used in the tests of GPU geometry optimisation procedures.Adapted from ref. 122.

Fig. 3
Fig. 3 Summary of thermodynamic and mechanistic properties for trpzip1, as a function of local rigidification: U -no local rigid bodies; I -TRP rings, II -TRP rings and peptide bonds, III -TRP rings, peptide bonds, termini, trigonal planar centres treated as rigid bodies.(a) Constant volume heat capacity curves: the heat capacities are divided by the appropriate total number of degrees of freedom (DOF) and the melting temperature of the unconstrained peptide, T U m , is indicated.The free energy global minima at 0.48 and 0.88 kcal mol À1 are superimposed on the plot; key: red (U), green (I), blue (II), magenta (III).(b) Variation of the total potential energy (kcal mol À1 ) with the integrated path length (Å) for the fastest folding path from the denatured trpzip1 peptide to the global minimum.Some morphologies encountered along the paths are shown.Adapted from ref. 108.

Fig. 5
Fig.5Disconnectivity graphs for the E20S mutant (top) and the parent sequence (bottom).The order parameter q 1 represents the angle between the two helices and is 0 for parallel and 1 for antiparallel alignments.The separate funnels are clearly visible.Adapted from ref.133.

Fig. 9
Fig. 9 Potential energy (V) as a function of the integrated path length (s) for pathways corresponding to the BDNA to ZDNA transition, via BZ junction (top panel), and stretched intermediates (bottom panel).Adapted from ref. 181.

Fig. 10
Fig. 10 Potential energy (V) as a function of the integrated path length (s) for pathways corresponding to ADNA to BDNA transition (top panel), lefthanded BDNA to ZDNA transition (bottom panel).Adapted from ref. 181.

Table 1 L
-BFGS benchmarking for HA with history sizes 4 and 1000 using two GPU and one CPU implementation a

Table 2
Hybrid eigenvector-following benchmarking for HA with history sizes 4 and 1000 using GPU and CPU a This journal is © The Royal Society of Chemistry 2017