Mike
O'Connor
ab,
Emanuele
Paci
c,
Simon
McIntosh-Smith
b and
David R.
Glowacki
*ab
aSchool of Chemistry, University of Bristol, Bristol BS8 1TS, UK. E-mail: drglowacki@gmail.com
bDepartment of Computer Science, University of Bristol, Bristol BS8 1UB, UK
cAstbury Centre for Structural Molecular Biology, University of Leeds, Leeds, UK
First published on 28th June 2016
The past decade has seen the development of a new class of rare event methods in which molecular configuration space is divided into a set of boundaries/interfaces, and then short trajectories are run between boundaries. For all these methods, an important concern is how to generate boundaries. In this paper, we outline an algorithm for adaptively generating boundaries along a free energy surface in multi-dimensional collective variable (CV) space, building on the boxed molecular dynamics (BXD) rare event algorithm. BXD is a simple technique for accelerating the simulation of rare events and free energy sampling which has proven useful for calculating kinetics and free energy profiles in reactive and non-reactive molecular dynamics (MD) simulations across a range of systems, in both NVT and NVE ensembles. Two key developments outlined in this paper make it possible to automate BXD, and to adaptively map free energy and kinetics in complex systems. First, we have generalized BXD to multidimensional CV space. Using strategies from rigid-body dynamics, we have derived a simple and general velocity-reflection procedure that conserves energy for arbitrary collective variable definitions in multiple dimensions, and show that it is straightforward to apply BXD to sampling in multidimensional CV space so long as the Cartesian gradients ∇CV are available. Second, we have modified BXD to undertake on-the-fly statistical analysis during a trajectory, harnessing the information content latent in the dynamics to automatically determine boundary locations. Such automation not only makes BXD considerably easier to use; it also guarantees optimal boundaries, speeding up convergence. We have tested the multidimensional adaptive BXD procedure by calculating the potential of mean force for a chemical reaction recently investigated using both experimental and computational approaches – i.e., F + CD3CN → DF + D2CN in both the gas phase and a strongly coupled explicit CD3CN solvent. The results obtained using multidimensional adaptive BXD agree well with previously published experimental and computational results, providing good evidence for its reliability.
In this paper, we present a relatively simple adaptive algorithm for discovering minimum free energy pathways between states in a multidimensional space of CVs, which can then be used to calculate rate coefficients in complex systems. There is strong evidence within computational complexity theory that problems of this sort are NP-complete10–12 – i.e., it is possible to verify (within polynomial time) whether any proposed solution is indeed a solution, but there is no known polynomial time algorithm to find a solution in the first place. This has rather profound consequences for how we think about free energy path sampling in complex molecular systems: the emphasis is less on finding an algorithm which is well-suited to every type of rare event problem, but rather on having access to a flexible range of methods which can be practically used to tackle different conformational search problems.
‘Boxed Molecular Dynamics’ (BXD),13–16 a method we have been actively involved in developing over the last few years, allows one to obtain both thermodynamic and kinetic information from the same run, producing data that produces a Markov master equation.1,4,14,17 BXD can be formulated so as to conserve energy, accelerating NVE simulations as well as NVT simulations. As a result of these features, BXD has been successfully utilized to provide microscopic insight into a range of problems within condensed phase chemistry.15,16,18–30 The fact that BXD preserves the dynamics (unlike umbrella sampling, where dynamics is lost) has been experimentally confirmed for a growing set of systems.18,20–22,24 The fundamental idea in BXD is to accelerate dynamics simulations by introducing a set of hard boundaries within the hyper-dimensional configuration space of the system being simulated. When a trajectory passes a boundary, those components of the velocity vector that take the trajectory across the boundary are reflected. The statistics of reflections at the boundary of the box are subsequently used to renormalize the results. Within BXD, ‘boxes’ refer to the configuration space domain between a particular set of boundaries. In principle, it is possible to implement boundaries which depend on the 6n dimensional phase space of Cartesian coordinates and momenta (where n is the number of atoms); however, in practice the original implementations of BXD utilized one-dimensional CVs in configuration space.
BXD falls within a class of sampling methods in which molecular configuration space is divided into a set of boundaries (also called interfaces or hypersurfaces), and short trajectories are run between boundaries. These methods (e.g., milestoning,31,32 forward flux sampling,33,34 transition interface sampling,35 nonequilibrium umbrella sampling,36 and others37–39) have yet to displace umbrella sampling40 as the most widely used method to determine free energies (or potentials of mean force), but in fact they have a number of features which we believe make them more attractive than umbrella sampling: (1) because they do not require modification of the potential energy function, they perturb the dynamics far less than umbrella sampling; (2) they allow for exact renormalization of the results in each box (unlike the iterative numerical WHAM scheme typically utilized to renormalize umbrella sampling results); (3) they require specification of fewer parameters than umbrella sampling (i.e., BXD only requires specifying a boundary location; umbrella sampling requires specifying the umbrella position and force constant); (4) they can provide both thermodynamic (free energy) and kinetic (rate) data simultaneously; (5) unlike umbrella sampling, they provide results which are in fact dynamically meaningful; and (6) it is possible to rigorously define the regimes in which the accelerated dynamics they provide map onto the results that would have been obtained using standard unbiased simulations with standard initial conditions sampling strategies.
An important concern with these methods is how to generate boundaries (analogous to the umbrella sampling issue of how best to choose ‘umbrella’ potentials). In a broad range of molecular simulation studies, boundaries (or umbrella potentials) are located along a particular set of CVs which align with the intuition of the investigator (i.e., “user”). For example, in enzyme catalysis, it is usually possible to highlight a few key bonds as being particularly important; similarly in a drug binding study, it is often possible to identify a few key motions as particularly important to binding. Such user intuition is not a panacea: it may in fact fail to identify important CVs, and there are potential pitfalls41 owing to the fact that it is often extremely difficult to find good CVs.42 Nevertheless, for understanding dynamics in hyperdimensional systems, user ‘intuition’ as to the important CVs usually constitutes an important guess as to where to initiate sampling and make practical progress in a simulation study.
With BXD's implementation in the CHARMM molecular simulation package,43 it has found application to a range of chemical systems. These applications have highlighted two important issues: (1) the BXD velocity reflection procedure must be generalized to deal with a wider range of CVs than the relatively small subset with which it is currently compatible; (2) with the implementation of a wider range of CVs, BXD must be formulated in a way that can automatically identify optimal boundaries in multi-dimensional collective variable space. The reason for the latter point is that BXD in multi-dimensional CV space requires specifying a large number of parameters. The number of parameters required scales as NCV × NB, where NCV is the number of collective variables, and NB is the number of boundaries (NB is typically between 10 and 100 in systems studied so far). For relatively small systems where NCV = 1 and which require no more than ∼10 boundaries, a user can typically keep track of the number of parameters requiring specification; however, for larger systems where NCV > 1, the number of parameters which requires specification rapidly expands beyond what even an expert can keep track of, becoming extremely tedious (if not altogether impossible). By automating the boundary selection scheme outlined in this paper using an adaptive algorithm, we avoid these problems entirely, and we also guarantee the specification of optimal boundaries. Adaptive sampling strategies have been previously explored in the context of umbrella sampling,44,45 force biasing,46 weighted ensemble sampling,39 transition interface sampling,47 accelerated molecular dynamics,48 metadynamics49,50 and steered MD.51
BXD's robustness arises in part from the fact that it generates free energy profiles which are largely insensitive to the location of boundaries, so long as the typical transit time from one boundary of the box to the other is larger than the system's characteristic decorrelation timescale.13,14 This is in fact the only ‘hard-and-fast’ rule which must be satisfied in order for BXD to yield physically meaningful results: the average time between boundary reflections in any given box must be larger than the system's characteristic dynamical decorrelation timescale in that region of the free energy surface.14 This rule places a lower limit on the allowed distance between any box's boundaries; otherwise, ballistic reflection between box boundaries will occur, and the results are meaningless. So long as the boundaries are far enough apart to avoid problems related to dynamical decorrelation, then the choice of box boundaries is flexible, and the BXD results do not depend on boundary location.
However, the computational efficiency of BXD (i.e., the speed at which it converges a free energy or a rate calculation) does depend on the boundary placement. For maximum efficiency, the boundaries should be placed close enough together so that a typical trajectory will visit the boundaries of any given box in a reasonable amount of time. Optimally placed boundaries will result in faster convergence. This is an issue that has become particularly apparent as we have attempted to use BXD to accelerate dynamics obtained from on-the-fly electronic structure theory, and also in condensed phase reactions, where force evaluations are very expensive. Our experience to date has shown that ‘user-selected’ box boundaries are often far from ideal, and can result in wasted clock cycles, a point which is easily understood from Scheme 1. In regions with a large gradient, boxes should be smaller, given that an unbiased trajectory free to sample the box is more likely to get trapped downhill rather than travel uphill, while in flatter regions that have a small gradient, the boxes can be larger, given that an unbiased trajectory will more readily sample wide regions of the configuration space. Scheme 1 therefore allows us to understand how clock cycles are wasted as a result of two common boundary-selection pitfalls: (1) large boxes in a region of the free energy surface with steep gradients, or (2) small boxes in a relatively flat region of the free energy surface. In the former case, the trajectory will rarely visit high free energy configurations within the box, and convergence will be slow. In the latter case, clock cycles are wasted on constraining sampling in flat regions of the free energy surface that the trajectory would have naturally visited anyway – i.e., the boundaries actually slow down an intrinsic sampling rate which was already satisfactory. Scheme 1 highlights a final important point – i.e., sampling on any given free energy path often requires boxes of varying sizes, with the size of the box inversely related to the gradient of the free energy surface along a particular coordinate, which is generally unknown in advance. Box boundary placement is also sensitive to the local friction regime in which the dynamical process of interest takes place (a point discussed in further detail below). In general, efficient sampling in high friction environments (e.g., a chemical reaction occurring in a solvent) requires closely spaced boundaries, while boundaries in low friction environments (e.g., a chemical reaction in the gas phase) are farther apart.
In this paper, we outline an extension of BXD to multidimensional CV space, and an automated procedure that adaptively generates optimal BXD hypersurfaces to sample dynamical pathways within a user-specified multidimensional CV space. The underlying idea guiding this approach is simple, and exploits one of the key advantages of BXD compared to a method like umbrella sampling: because the underlying dynamics are in fact meaningful, ‘on-the-fly’ analysis of their information content is in fact the most reliable guide to boundary placement. This philosophy allows us to use BXD for generating optimal boundaries in multidimensional applications, which may be subsequently used to accelerate rare events or carry out free energy sampling. We also report on results using this multi-dimensional adaptive BXD scheme to accelerate free energy sampling along the F + CD3CN → DF + CD2CN reactive pathway, in both CD3CN solvent and in the gas phase. This system constitutes a stringent test of the methodology, owing to the extreme asymmetry of the PES either side of the transition state (TS) – e.g., similar to that shown in Scheme 1. The results are in good agreement with previous experimental and modelling studies, providing good evidence for the reliability of our extended BXD algorithm. We believe that the adaptive scheme described in this article may be useful to other methods that rely on sampling between configuration space interfaces.
The trajectory constraint procedure involves an elastic collision procedure applied at the boundaries, which works as follows: whenever the next time step in the dynamics would result in the trajectory crossing the boundary, the trajectory is reset to the previous step, and a velocity inversion (i.e., reflection) procedure is applied to those atoms that contribute to the definition of the collective variable. For a given box i bounded by ρi and ρi−1, the rate coefficient for transfer from box i to i − 1 is determined by the inverse of the mean first passage time (MFPT) 〈τ〉. The simplest way to compute this is to keep track of the number of times the trajectory has undergone velocity transformation at each boundary, hi,i−1, along with the total amount of time, ti, that the trajectory spends within box i. This gives the rate coefficient for transfer from box i to box i − 1 as follows:
(1) |
The equilibrium constant between box i and box i − 1 may then be obtained from equilibrium statistical mechanics as
(2) |
(3) |
Having determined the probability of residing in any specific box according to (3), it is then possible to determine p(ρ) to arbitrary resolution by renormalizing the statistics within each box using histogram binning. Letting pi(ρ) be the probability of a particular value of ρ observed in box i, estimated by histogram binning from a sample within the box, then the probability of residing anywhere along the reaction coordinate defined by the boxes is given by
p(ρ) = pi(ρ) × pi. | (4) |
Since only the box-to-box rate coefficients need to be computed, the length of time the trajectory needs to spend in each box is only determined by how long it takes for these rate coefficients to converge. The BXD method of partitioning the configuration space allows regions that are poorly sampled in standard MD trajectories to be isolated within a box and sampled independently, which lends itself well to parallelisation on modern cluster architectures. Alternatively, it is easy to formulate the BXD algorithm so that a given trajectory – after a specified number of reflection events at a particular boundary – is allowed to proceed to the next box, as illustrated in Fig. 1. Such a ‘box-to-box’ strategy allows trajectories to scan over adjacent boxes until convergence is achieved.
(5) |
Using the notation outlined above, Fig. 2 schematically illustrates a set of BXD boundaries that one might choose in order to partition a system defined in terms of two collective variables.
Within the space of collective variables, eqn (5) specifies that a BXD boundary Bj is defined in terms of a unit norm j ∈ M, which lies a distance Dj from the origin. The function ϕ((t)) = (t)·j + Dj provides a measure of how far the system is from a particular boundary at time t, with changes in the sign of ϕ((t)) indicating that the system has crossed Bj. In order to constrain dynamics so that they lie to a specific side of a particular boundary Bj, we wish to satisfy the following inequality:
ϕ((t)) ≥ 0. | (6) |
The inequality in this equation gives it the form of a so-called “unilateral constraint”58 – i.e., a constraint which is enforced only at times when the inequality is unsatisfied. For example, consider a case where ϕ((t)) ≥ 0 at time t, and ϕ((t + Δt)) < 0 at the next timestep t + Δt. In this case, the BXD procedure specifies that we revert back to (t), and invert the velocities to give new velocities ′(t), propagation according to which ensures that the constraints are satisfied at timestep t + Δt. By the chain rule, the time derivative of the constraint may be written as the projection of the atomic velocities onto the gradient of ϕ((t)):
(7) |
To ensure that the constraint will be satisfied at time t + Δt, the inverted velocities must satisfy the following:
∇ϕ·′(t) + ∇ϕ·(t) = 0. | (8) |
In the general case of a system of K constraints, ∇ϕ is a matrix of K rows by 3N columns, but here we are restricting ourselves to the case of a single constraint, and therefore ∇ϕ in eqn (8) represents a row vector. The inverted velocities are related to the unbiased ones through eqn (8) in order to ensure a fully elastic reflection of the velocities normal to Bj. This procedure is in contrast to the sorts of holonomic constraints typically employed in molecular dynamics (e.g., SHAKE59 and RATTLE60), in which velocities normal to the constraint are set to zero in order to constrain the dynamics. The equation of motion for dynamics58,60–62 under a single constraint may be written as:
M = F + G, | (9) |
G = −λ∇ϕT, | (10) |
′(t) = (t) + λM−1∇ϕT. | (11) |
By substituting eqn (11) into eqn (8) and rearranging for λ we have
(12) |
The Lagrangian multiplier and subsequent impulse is only computed and applied for time steps in which an unaltered velocity would result in the constraint being unsatisfied, similar to the strategy used in the original BXD velocity reflection algorithm. Defining BXD boundaries as hyperplanes ensures that the derivatives of ϕ in eqn (12) may be computed by combining the derivatives of the components of as follows:
(13) |
Eqn (13) means that the reflection procedure can easily be constructed from a linear combination of derivatives of collective variables. This allows for straightforward combination of arbitrary reaction coordinates for which gradients are defined. The appendix to this paper includes an illustrative example of how to implement a velocity inversion procedure in the space of two CVs.
(1) Specification of the CVs which the user wishes to adaptively sample along with a pair of limits that bound the sampling within a particular CV. In many cases, one of the CV limits (e.g., B0 in Fig. 2) helps define the extremum for what can be considered a reactant state, and the other CV limit (e.g., Bn in Fig. 2) helps define the extremum for what can be considered a product state.
(2) A ‘starting’ or ‘reactant’ geometry (characterized by a set of ‘starting’ CVs).
(3) A ‘target’ or ‘product’ geometry (characterized by a set of ‘target’ CVs).
We define Γ ⊂ M to be the region of CV space defined by two boundaries BR and BP (in Fig. 2, BR ≡ B0 and BP ≡ Bn), and Bi to be some arbitrary boundary that lies within Γ. The region Γ1 ⊂ M lies between BR and Bi, while the region Γ2 ⊂ M lies between Bi and BP, with Γ1 + Γ2 = Γ. The approach of adaptive BXD is to carry out a single sampling run that makes two passes over the CV space – i.e., from BR to BP, and then to reverse direction and go from BP to BR. Along the way, statistical analysis determines the most efficient location at which to place the next bound. After the placement of a bound, the BXD velocity reflection procedure is used to enhance the sampling of the next region. Passes in both directions are generally required so that barriers on the energy landscape are sampled in both directions (i.e., what may not require any acceleration going downhill will in fact require acceleration when going uphill).
The overall adaptive BXD procedure is illustrated by the flowchart in Fig. 3, which assumes that the adaptive procedure has been initialised near BR, so that the first pass involves generating boundaries en route to BP. At the start of the trajectory, Bi ← BR, and BEnd ← BP (i.e., Bi and BEnd initially enclose the region Γ ≡ Γ2, with Γ1 = 0). After n steps of dynamics, sampling within Γ2 (constrained through application of the BXD velocity reflection procedure), there are two possible outcomes: (1) velocity reflections against BEnd were observed, implying that the path from Bi to BEnd requires no additional acceleration, or (2) velocity reflections against BEnd were not observed, which means that an additional bound Bnew is required according to the procedure outlined below in section 2.4.2. Dynamics are then run until the system crosses Bnew, at which point Bi ← Bnew. The dynamics in this hitherto unexplored space are then restricted in the region of Γ2 through the application of the BXD velocity reflection procedure. The sampling procedure is repeated until the dynamics reach BEnd, at which point Bi ← BP and BEnd ← BR, and the dynamics sweep back for a second pass in the opposite direction. Upon completion of the reverse pass, the Fig. 3 schematic arrives at the “Stop” point, and Γ will have been partitioned into a set of boxes with bounds BR, B1, B2, …, Bn, BP for subsequent use in BXD runs to generate free energy surfaces to a specified degree of convergence.
Fig. 4 Illustration of the adaptive boundary generation procedure. Panel A shows sampling of values within a 2d collective variable space, with an existing boundary Bi; panel B shows histogram binning of the distances with respect to the existing boundary Bi. min andmax, located within histogram bins bmin and bmax, are both shaded in red. Panel C shows generation of a new bound Bnew, where the norm defined in eqn (14) is illustrated by the purple arrow. |
(1) A normalized histogram of is computed to give p(r), a probability density function representing the distances from Bi that a trajectory samples between reflections, as shown in Fig. 4B.
(2) From p(r), we calculate the cumulative distribution function . We then identify a histogram bin bmax in p(r) with a bin centre rmax chosen so that P(rmax) ≥ (1 − ε). ε ∈ (0, 1) is a parameter which specifies the “probability threshold” at which to place a new boundary (the value of ε is specified by the user, and typically ranges from 0.01–0.1). We then identify max (the mean value of the sampled values in S that fall within bin bmax), illustrated in Fig. 4B, as the point at which to place a new boundary Bnew.
(3) To determine the orientation of Bnew as an (M − 1)-dimensional plane, we use a simple strategy consistent with BXD's origins in transition state theory (TST) – i.e., Bnew should be more or less orthogonal to the path of the observed dynamics.13,14 With bmin defined as the first bin in the histogram of (see Fig. 4B), we calculate min (the mean value of the sampled values in S that fall within bmin). With this definition, min represents the average value of immediately prior to and after reflection against Bi, i.e. the mean crossing point through Bi. Similarly, max represents the average crossing point through Bnew. The vector from min to max thus serves as an approximate dynamical pathway through the box, and we define the unit norm for Bnew as
(14) |
The unit norm in eqn (14), combined with max, allow us to fully define the new boundary Bnew, as illustrated in Fig. 4C. The next time the trajectory crosses Bnew, it becomes enforced as a constraint (i.e., Bi ← Bnew), and an identical analysis will be carried out to determine the next Bnew.
As discussed above, adaptive boundary generation in this fashion will eventually lead to reflection against BP. Once a trajectory reaches the barrier via adaptive boundary generation on the reactant side of the barrier, reflection against BP generally follows rapidly without any need for boundaries on the product side of the barrier. To generate boundaries on the product side, a second adaptive sweep from BP to BR is required. To do this, the direction of sampling is reversed, and the adaptive boundary generation process is repeated going the opposite way. The only difference is that – because adaptive boundaries are already in place on the reactant side of the barrier – the reactant region is unlikely to require any more boundaries on the second sweep. For example, consider a BXD trajectory on its second sweep which is passing through the reactant region enclosed by boundaries Bi and Bi−1 (both of which were adaptively generated in the first pass). It is likely for reflection events against Bi−1 to be observed – i.e., sampling within this region is already suitably accelerated by BXD, and the trajectory can move on to the region defined by boundaries Bi−1 and Bi−2. Should we observe that the trajectory has not inverted against Bi−1 after n steps, then an additional boundary is adaptively generated as described above.
A one-dimensional implementation of BXD was previously used in these simulations to restrict the distance between the F radical and the reactive deuterium between 1.5 Å and 1.8 Å. This prevented the F radical diffusing away from its co-reactant during pre-production equilibration sampling runs. In the production NVE runs, the lower bound was removed. This allowed the reaction to occur, so that we could obtain an accurate measurement of energy deposition and relaxation in the nascent reaction products. At the time these studies were published, it was not possible to use BXD to generate a free energy surface for this reaction given that reversible reactive sampling requires the use of at least two CVs: the distance between the F radical and deuterium (F–D distance), and the distance between transferring deuterium and the carbon atom to which it is bonded (C–D distance). Constraint of the F–D distance accelerates abstraction over a relatively early barrier, and constraint of the C–D distance prevents the product DF from immediately diffusing away from its co-product. In addition to BXD sampling of the condensed phase reaction, we also carried out BXD sampling of the gas phase reaction, which included only three EVB states: the reactant F + CD3CN state, the co-product DF + CD2CN state, and the [CD3CND]+⋯[F]− state. Unless stated otherwise, all the results presented herein were run with a time step of 0.1 fs, using a Langevin thermostat at 300 K with a friction coefficient of 20 ps−1.
As outlined above, adaptive BXD free energy sampling was undertaken in a CV space comprised of the F–D and C–D distances: an F–D distance of 2.7 Å was used to define BR, and a C–D distance of 3.5 Å was used to define BP. Adaptive sampling times of 100 ps (in the gas phase) and 30 ps (in solution phase) per box were used to determine the placement of new boundaries, with ε = 0.01 (guaranteeing that new boundaries are placed at a location visited no more than 1% of the time). Fig. 6 shows a series of snapshots taken during the automated boundary location procedure, illustrating how the adaptive algorithm works. Beginning from an initial point sampled near BR, BXD adaptively generates a boundary, which allows it to sample regions near the TS. Once the dynamics arrive at the TS, the system rapidly descends toward products, and quickly arrives at BP. At this point, BXD begins sweeping back in the opposite direction, adaptively generating boundaries which eventually return it back to the first box bounded by BR. Fig. 7 shows the final set of adaptively generated boundaries used to sample the free energy along the reaction pathway in both solution phase and in the gas phase. The plots also show the dynamical traces in CV space used to construct the BXD boundaries. There are some important points to note with respect to Fig. 6 and 7: (1) the adaptively generated boundaries generally follow the route taken by the dynamics along the reaction pathway, with orientations that are roughly orthogonal to the dynamical pathway through CV space; (2) the spacing between boundaries varies as a function of the steepness of the free energy surface (the gradient) of the underlying PES and the corresponding free energy profile – i.e., steep regions with large gradients require several boundaries, whereas less steep regions with smaller gradients require fewer boundaries; and (3) the reaction pathway in solvent has more adaptively generated BXD boundaries than the corresponding gas phase pathway, as a result of solvent friction effects that do not occur in the gas phase. Placing such a large number of BXD boundaries by user trial and error would be an extremely labour intensive process.
Once sampling was completed within each box (generating statistics similar to those shown in Fig. 7), MFPTs were calculated as described in section 2.1, and the results used to generate a ‘box-averaged’ free energy profile and corresponding ‘box-averaged’ probability spanning BR to BP. A higher-resolution free energy profile was obtained placing the statistics for a particular box into histogram bins and then using eqn (4) to renormalize by the box-averaged probabilities. Fig. 9A shows the smaller histogram bins into which we partitioned the statistics in each box to accomplish this. In the 1d case, high-resolution partitioning along the dynamical pathway is straightforward; in this case (and more generally for higher-dimensional cases), our strategy is as follows:
(1) Define a path ρ which passes through the average dynamical crossing points through each boundary (i.e., max), and spans BR to BP.
(2) Each region between a set of boundaries is then partitioned into a series of bisecting hyperplanes, to an arbitrary user-specified resolution. The regions between these bisecting hyperplanes constitute the high-resolution histogram bins. The centre of each bin is chosen to be the point along ρ which is equidistant from the hyperplanes that bound the bin.
The blue line in Fig. 9A shows the path ρ which spans BR to BP, and which was used to generate finer histogram bins for plotting the high-resolution free energy profile. Fig. 9B shows the corresponding high-resolution BXD free energy profiles for the gas phase reaction. The overlapping curves in this plot show how the free energy profiles change with increasing sampling time, giving some indication of how quickly the BXD free energy profile converges in this particular system.
Fig. 10 shows a comparison of the reactive free energy surfaces obtained in both the gas phase and in solvent. In the vicinity of the reactants and transition states, the profiles are very similar; however, they show considerable differences in the post-reaction region. The reason for this difference arises from post-reaction hydrogen bonding complexes formed by the nascent DF. In the gas phase BXD free energy sampling, the DF rotates around the backside of its CD2CN co-product and finds a stable hydrogen-bonding complex with the nitrile moiety. In solvent, such interactions are possible with any of a wide range of nearby solvent molecules, and therefore no distinct minima can be observed along the free energy profile. In terms of understanding the DF energy deposition observed in the previously published experimental and MD results, the key quantity in Fig. 10 is the free energy difference between: (1) the maximum value observed near the transition state region, and (2) the minimum observed near the product state region. This quantity places an upper bound on the amount of energy which may be deposited into the nascent DF: for the reaction taking place in solvent the value is 27.6 kcal mol−1, and for the reaction taking place in the gas phase the value is 31.1 kcal mol−1. Both of these values are in good agreement with previous experimental and modelling studies. Our previously published experimental and MD studies indicated the prompt deposition of ∼23 kcal mol−1 into the stretching motion of the nascent DF prior to relaxation.63,64 While gas phase experiments of F + CD3CN are not available for direct comparison to our free energy results, experiments examining gas-phase energy deposition into HF in the F + CH3CN reaction have been performed,65 and suggest that the nascent diatomic product in solvent contains slightly less excitation than in the gas phase.63 This is consistent with the results in Fig. 10, which indicate that more energy is available to the products in the gas phase reaction than in the solvent reaction.
We believe that the adaptive scheme outlined in this paper, which allows us to generate hyperplanes in multi-dimensional collective variable space, may be more broadly useful to a wide range of techniques which rely on splicing up configuration space into a set of interfaces or boundaries. In the future, we will explore rigorous methods for estimating the error bars of free energy surfaces generated using BXD.67 We also plan to explore extensions of the adaptive BXD scheme in systems with CV spaces that have dimensionalities of three and higher – e.g., enzyme reactions and conformational dynamics,68 drug binding,69 and chemical reactions at surfaces and in liquids.70 As shown in eqn (7), implementation of BXD in multi-dimensional CV space requires definitions of the gradient in CV space, ∇ϕ, a wide library of which are available in the PLUMED71 package. We are presently working on writing the BXD algorithm as a portable, and mostly ‘standalone’ plugin that may be easily interfaced with a wide range of molecular dynamics packages, a similar philosophy to that which has been adopted by PLUMED.71 Implementation of adaptive BXD in a package of this sort should allow it to be used in a wide range of contexts.
For the sake of brevity we restrict ourselves to 2 spatial coordinates. Let = [ax, ay, bx, by, cx, cy] be the coordinates and = [Vxa, Vya, Vxb, Vyb, Vxc, Vyc] be the velocities of atoms A, B and C, and let M be the diagonal matrix of atomic masses, i.e.:
(A.1) |
Our collective variable s() is given by:
(A.2) |
Suppose we have some BXD boundary B, defined as a two-dimensional line in Hessian form with norm = (n1, n2) and point D. The constraint on our dynamics is
ϕ ≡ n1rAB + n2rBC + D ≥ 0. | (A.3) |
Suppose that we identify a timestep in which our constraint will no longer be satisfied – i.e., the stepping forward using the current velocities will result in a boundary being crossed, and we require a velocity reflection. In order to perform the velocity reflection using a Lagrangian multiplier, we need to compute ∇ϕ, the transpose of which is given by
(A.4) |
The expression above demonstrates how it is simple to construct the reflection procedure from the gradients of the individual collective variables. With ∇ϕ in hand, it is a simple matter to determine the Lagrangian multiplier λ with which the velocities may be inverted. From eqn (12) we may compute λ via
(A.5) |
This journal is © The Royal Society of Chemistry 2016 |