S-shooting: a Bennett – Chandler-like method for the computation of rate constants from committor trajectories

Mechanisms of rare transitions between long-lived stable states are often analyzed in terms of commitment probabilities, determined from swarms of short molecular dynamics trajectories. Here, we present a computer simulation method to determine rate constants from such short trajectories combined with free energy calculations. The method, akin to the Bennett – Chandler approach for the calculation of reaction rate constants, requires the de ﬁ nition of a valid reaction coordinate and can be applied to both under-and overdamped dynamics. We verify the correctness of the algorithm using a one-dimensional random walker in a double-well potential and demonstrate its applicability to complex transitions in condensed systems by calculating cavitation rates for water at negative pressures


Introduction
Many processes occurring in molecular systems are dominated by rare transitions between long-lived states. 1,2Examples include nucleation during rst-order phase transitions, chemical reactions in solution, and conformational changes of biological macromolecules.Understanding the molecular mechanism of such transitions is challenging due to the large number of interacting degrees of freedom and the resulting complex collective behavior.In analyzing rare transitions in complex systems, the goal is to nd a reaction coordinate, i.e., a dynamically meaningful variable that captures the essential physics of the transition and is capable of quantifying its progress.][7][8] A good reaction coordinate should be able to tell us what is likely to happen next.Hence, the quality of a reaction coordinate can be assessed in terms of the committor, i.e., the probability of a given conguration to rst reach the product rather than the reactant region.In fact, it has been noted [9][10][11] that the committor itself is the perfect reaction coordinate since, by its very denition, it measures the progress of reaction.However, knowledge of the committor as a function of the conguration does not automatically yield any insight into the nature of the collective variables that are relevant for the transition.Nevertheless, the committor is very useful in the analysis of reaction mechanisms, because it permits to test reaction coordinates postulated based on physical reasoning or on the analysis of reactive trajectories. 12,13][16][17] From a good reaction coordinate one expects that its value completely determines the committor, or, in other words, that iso-surfaces of the reaction coordinate are also iso-surfaces of the committor.Whether this is the case can be tested by computing the committor for congurations with a given value of the reaction coordinate.This can be done by initiating multiple short trajectories from these congurations and counting how many of them reach the product state before the reactant state.Frequently, such calculations are combined with an estimate of the free energy as a function of the reaction coordinate, which, provided the reaction coordinate reects the underlying mechanism, provides information on the nature of the transition state, i.e., of the dynamical bottleneck the system needs to cross during the transition.If committor calculations are carried out also near the transition state region, the dynamical information obtained from the eeting trajectories can be combined with the free energy landscape to determine rate constants for the transition, as has been recently suggested by Daru and Stirling. 18Their divided saddle theory, which may be viewed as generalization of the celebrated Bennett-Chandler method for the calculation of rate constants, 19,20 provides an efficient way to determine rate constants by post-processing information harvested in free energy and committor calculations.
In this article, we present an alternative way to extract reaction rate constants from committor trajectories and the free energy prole.Like the Bennett-Chandler method and the divided saddle theory, the method is based on a factorization of the rate constant expression into two factors, one that can be expressed in terms of the free energy and another one that contains dynamical information.In our approach, the factorization is applied on the level of the time correlation function of the populations of the stable states between which the transition occurs.From the linear regime of this correlation function the rate constant is then extracted.Since the time correlation function is considered instead of the reactive ux, which requires a well-dened time-derivative of the reaction coordinate at the interface, the method can be applied equally well to the under-and over-damped case.The shape of the time correlation function, which is evaluated from the committor trajectories, provides additional insight into the barrier crossing dynamics.In the following, we will rst outline the algorithm (details of the derivation are provided in the Appendix) and then demonstrate the application of the method to two test cases: a Brownian walker in a one-dimensional double well potential and cavitation of water at negative pressures.

S-shooting algorithm
The goal of the algorithm presented here is to determine the rate constant of transitions between two long-lived states, A and B, which can be viewed as the reactant and the product state, respectively.We assume that we are able to distinguish between these two states using a reaction coordinate q(x) that tracks the progress of the transition, i.e., a suitable collective variable dened for each microscopic conguration x.In practice, q(x) can vary considerably in its complexity depending on the investigated system: it can be as simple as a Cartesian coordinate in cases where the underlying (free) energy landscape is known (as is the case for a Brownian walker in a double well potential in Section 3.1) or can be based upon detecting the largest cluster of a nucleating phase in the case of rst-order phase transitions (see Section 3.2).

Time correlation function and reaction rate constant
The method presented here is rooted in the Bennett-Chandler approach, 19,20 in which the transition rate constant is expressed in terms of the time correlation function of the populations of the stable states.To introduce this correlation function, we rst dene the characteristic functions for states A and B, which indicate if the system is in the respective state or not, and h B (x) is dened analogously for region B. In all cases considered in this work, the underlying free energy landscapes determining the behavior of the systems exhibit a barrier dividing the two states.We assume that the regions A and B correspond to the ranges A ¼ (ÀN, q A ) and B ¼ (q B , N) of the reaction coordinate, respectively, and q A < q B .Accordingly, we dene the characteristic functions as h A (x) ¼ 1 À q[q(x) À q A ] and h B (x) ¼ 1 À q[q B À q(x)], where q(q) is the Heaviside step function.The time correlation function encodes the conditional probability to nd the system in B at time t provided it was in A at time 0. In the above expression, h B (t) is a shorthand for h B (x t ), where x t is the microscopic state of the system at time t, and the angular brackets h.i denote equilibrium averages.The equilibrium probability of nding the system in A can be expressed as where b ¼ 1/k B T with the Boltzmann constant k B and temperature T, and H(x) is the total energy of the system.The free energy F(q) is related to the probability density p(q) ¼ hd[q À q(x)]i of the reaction coordinate by F(q) ¼ Àk B T ln p(q).Aer initial transient behavior related to the details of the dynamics and the specic denition of the stable states, the time correlation function C AB (t) is expected to enter a linear regime and its time derivative gives the reaction rate constant k AB , The rate constant k BA for the inverse reaction from B to A is obtained by applying detailed balance, k BA ¼ k AB hh A i/hh B i. Together, the forward and backward rate constants determine the reaction time s rxn ¼ (k AB + k BA ) À1 , the time scale at which a non-equilibrium population of states A and B decays to equilibrium.

S-ensemble
The S-shooting algorithm presented here introduces an additional region, S, located such that any trajectory transitioning from A to B must cross S (see Fig. 1).As illustrated in the gure, there are various possibilities to dene region S, which plays essentially the same role as the saddle region in divided saddle theory. 18If A and B are adjacent and separated by a dividing surface, S could be located entirely in A or B or include parts of both regions.In this case, it is only required that the dividing surface is part of S. If regions A and B are not adjacent but rather separated from each other, region S can be in A or B or somewhere in between.The only requirement is that no trajectory can connect A with B without visiting S.Although the particular denition of S does not affect the validity of the expressions we will derive in the following, its particular location will have an effect on the statistical accuracy of the rate constant estimation.To make the rate calculation efficient, region S should include the transition state region, from which both stable states are accessible with non-vanishing probability.
Since any trajectory which gives a non-zero contribution to the correlation function C AB (t) has at least one conguration in S by construction, it should be possible to express C AB (t) as path average in the ensemble of trajectories touching S. We call this ensemble of trajectories the S-ensemble.In the following we consider discretized trajectories x(s) ¼ {x 0 , x 1 , x 2 , /, x L } with xed length s ¼ LDt consisting of L + 1 congurations separated by the time step Dt.Such trajectories may result, for instance, from a molecular dynamics simulation.The probability Fig. 1 Schematic representation of the S-shooting approach.In addition to the reactant state, q(x) < q A , and the product state, q(x) > q B , we introduce a state S, defined by q min S < q(x) < q max S , which is chosen such that any trajectory crossing from A to B or vice versa must cross S. As long as this criterion is obeyed, any arrangement of A, B and S is valid.(a) Regions A and B are separated by a dividing surface (dashed line) and S overlaps with both stable states.Regions defined in this manner are used to obtain cavitation rates in water under tension in Section 3.2.(b) Region S is located between the stable states A and B, adjacent to both.(c) Region S is located between the stable states A and B, where these regions are separated by areas not corresponding to any of the three states.We employ this setup to compute the reaction rate constant for a Brownian walker in a double-well potential in Section 3.
is the probability of a trajectory x(s) in the unconstrained ensemble of trajectories.In writing this expression, we have assumed that the initial conditions x 0 are distributed according to the equilibrium probability density r(x 0 ), and that the dynamics are Markovian such that the total path probability can be written as the product of short time transition probabilities p(x i / x i+1 ).In the ensemble of eqn ( 5), the path indicator function H S [x(s)] assigns a vanishing statistical weight to any trajectory x(s) that has no point in S, thereby restricting the S-ensemble to trajectories visiting S.
As shown in detail in Appendix A.1, the time correlation function C AB (t) can be expressed in terms of path averages in the S-ensemble, Here, h.i S denotes a path average over trajectories x(s) in the S-ensemble and N S [x(s)] is the number of congurations of x(s) in S (out of L + 1 total congurations).The characteristic function h S for region S is dened analogously to eqn (1).Conveniently, all quantities appearing in the equation above are either obtained from a free energy computation along q, namely hh A i and hh S i, or from sampling the trajectories touching S. Note that by considering trajectories of length s in the S-ensemble, the path average hh A (0)h B (t)i S appearing in eqn ( 6) can be evaluated for all times 0 # t # s.

Sampling the S-ensemble
In order to make the equation above useful for the calculation of the time correlation function C AB (t), and hence for computing the transition rate constant k AB , one needs an efficient way to sample the S-ensemble of trajectories.The ratio hh S i/hh A i of equilibrium probabilities needed in eqn ( 6) can be obtained by free energy calculation methods, e.g., umbrella sampling.By doing so, one also obtains a set of congurations x i ˛S distributed according to their equilibrium probabilities.Each of these congurations can be viewed as a time-slice of a path which has at least one conguration in S, and consequently these congurations can be used as shooting points from which trajectories touching S are generated.So, let us consider the following algorithm to create trajectories of length L + 1 that are guaranteed to have at least one conguration in S: Step 1. Generate a state x in S from the equilibrium probability distribution restricted to S, r S ðxÞ ¼ rðxÞh S ðxÞ Ð dxrðxÞh S ðxÞ : Step 2. Select an integer number n between 0 and L at random.The state x created in the previous step is now considered to be x n , that is, the nth conguration of the trajectory to be created.
Step 3. Starting from x n , perform L À n dynamics steps forward and n steps backward in time to create a trajectory starting at x 0 which consists of L + 1 congurations.
The algorithm outlined above can generate all possible trajectories which have at least one conguration in S and, as such, all trajectories occurring in the desired S-ensemble.However, a closer analysis of the probability to generate a particular trajectory x(s) using this procedure indicates that the resulting trajectories are not distributed according to the path probability density P S [x(s)].Rather, they are distributed according to a path probability density P G [x(s)], which gives a larger statistical weight to trajectories with many congurations lying in S. Since a trajectory x(s) with N S [x(s)] congurations in S has N S [x(s)] possible shooting points from which it can be generated and all of them are selected with the same probability, the probability of x(s) is proportional to N S [x(s)], i.e., P G [x(s)] f N S [x(s)]P S [x(s)].In order to account for the different weights assigned to trajectories by the procedure outlined above, eqn (6) needs to be modied to recover the correlation function C AB (t) in terms of averages in the ensemble of trajectories produced by the algorithm: Here, h.i G denotes an average in the ensemble P G [x(s)] generated by the algorithm.A detailed derivation of this result is provided in Appendix A.2.So far, we have assumed that the starting points x ˛S harvested from free energy calculation methods are distributed according to their Boltzmann weight.However, in many cases these congurations are sampled under the inuence of a bias, for instance by employing a parabolic potential in umbrella sampling.By expressing the time correlation function C AB (t) in terms of path averages over the ensemble P B [x(s)] of trajectories generated by shooting from points obtained using a bias potential, eqn (8) can be easily generalized (see Appendix A.3).

Improving sampling by shiing the pathway origins
The computation of the correlation function C AB (t) in the ensemble P G [x(s)] can be performed more efficiently by treating long trajectories as a collection of shorter pathways with shied starting points.Imagine generating one pathway consisting of 2L + 1 congurations from a starting conguration x in S by propagating L steps in the forward and in the backward direction.From the resulting long trajectory, one can extract a total L + 1 shorter trajectories of length L + 1, one starting in x 0 , one in x 1 and so on up to the trajectory starting at the shooting point x L ¼ x (see Fig. 2).Each of these trajectories contains at least one point in S, namely the shooting point x, and is thus a member of the desired ensemble.Moreover, each of the resulting trajectories is created with the same probability by the algorithm (see eqn (28) in the Appendix), so one can simply average over these pathways by way of the improved algorithm outlined below: Step 1. Generate a state x from the probability distribution r S ðxÞ ¼ rðxÞh S ðxÞ Ð dxrðxÞh S ðxÞ : Step 2. Starting from x, perform L dynamics steps forward and L steps backward in time.Here, s ¼ LDt is the maximum length for which the correlation function C AB (t) is to be computed.In practice, L should be chosen to be sufficiently large such that the system commits to either state A or state B when the trajectory is initiated from a point inside S.
Step 3. Run over all L + 1 possible trajectories that still envelop the original shooting point x L with the initial points x 0 ,.,x L and compute N S [x(s)] for each trajectory.
Step 4. Update the path average appearing in eqn (8) and repeat.
Once the path average hh A (0)h B (t)/N S [x(s)]i G has been determined according to this method, it can be combined with the ratio hh S i/hh A i, determined from the free energy calculation, to yield the time correlation function C AB (t).The transition rate constant k AB is then obtained from C AB (t) by numerical differentiation (or t of a straight line) in its linear regime (provided it exists).

Brownian walker in a double-well
We demonstrate the algorithm by applying it to a simple model for an activated process, namely, a one-dimensional Brownian walker in a double-well potential.For this model, the correlation function hh A (0)h B (t)i S can be computed in a direct simulation, providing a point of comparison to the results of biased and unbiased S-shooting.The system is supposed to obey over-damped Langevin dynamics, where b is the inverse temperature, D is the diffusion constant, F(x) ¼ ÀdU(x)/dx is the force exerted on the particle by the underlying double-well potential and x is delta-correlated Gaussian white noise with zero mean and unit variance.The chosen parameters are D ¼ 1.0, b ¼ 4.0 and Dt ¼ 0.001.The regions A, S and B are dened as x < À0.4,À0.1 < x < 0.1 and 0.4 < x, respectively (a schematic representation of the chosen region boundaries is shown in Fig. 1c).
As a reference, we obtain the correlation function hh A (0)h B (t)i S from a single long trajectory produced by a straightforward Brownian dynamics simulation of 5 Â 10 8 time steps.This is done by averaging over all trajectory segments of length s ¼ 0.5 with at least one point in S.Then, we compute the correlation function hh A (0)h B (t)i S by S-shooting, integrating forward and backward in time from points in S sampled by a Monte Carlo simulation with random displacements drawn from a Gaussian distribution.The whole procedure was repeated by carrying out this Monte Carlo simulation under the inuence of a bias U b ¼ x 2 /2, thereby obtaining the starting points for biased S-shooting.In terms of averages obtained in the ensemble of trajectories sampled by the algorithm, the correlation function is given by in the case without bias [see eqn (32)] and by in the case with bias [see eqn (40)].
The agreement between the two variants of S-shooting and the straightforward simulation is shown in Fig. 3 and 4. Aer an initial transient, the correlation function hh A (0)h B (t)i S enters a linear regime.Consequently, its time derivative exhibits a plateau whose value can be used in the estimate for the reaction rate constant k AB .Using the averages hh S i ¼ 0.00407 and hh A i ¼ 0.487 obtained from the straightforward simulation as well as the path average hN S [x(s)]i S ¼ 24.58, we obtain the rate constant k AB ¼ 0.056.
Analysis of the convergence of the rate constants as a function of the number of generated pathways indicates that the statistical error in the rate constants obtained by S-shooting is similar to that of divided saddle theory (provided the same method is used for the free energy calculation).Both methods extract the reaction rate constants from the same set of dynamical trajectories such that this equivalence of their efficiencies is not so surprising.Since divided saddle theory has been shown 18 to compare well with the reactive ux method of Bennett and Chandler using the effective positive ux approach for the calculation of the transmission coefficient, 21 this is the case also for the S-shooting method.

Cavitation in water under tension
Liquids can sustain remarkably strong tensions due to the free energetic cost associated with the formation of a liquid-vapor interface which impedes an immediate transition to the vapor phase.Water, in particular, can exist in such a metastable state for a long time before decaying into the vapor phase via cavitation, i.e., bubble nucleation, which has implications for the behavior of various biological systems [22][23][24][25][26] and for technical applications. 27,28As a further demonstration of S-shooting, we now use it to compute the cavitation rate, i.e., the number of cavitation events per unit time and unit volume, of liquid water at different negative pressures (the cavitation free energy and cavitation rates were rst obtained in ref. 29).Specically, we consider a system of N ¼ 2000 water molecules interacting via the TIP4P/2005 potential 30 with long range forces treated with Ewald sums.This system is exposed to pressures ranging from p ¼ À105 MPa to p ¼ À165 MPa at a temperature of T ¼ 296.4 K, where the equation of state is known at moderate negative pressures from experiments. 31o compute the free energy of bubble formation, one must be able to detect bubbles and determine their size for any molecular conguration of this system.This is accomplished using a grid-based procedure that is calibrated to give a thermodynamically consistent estimate for the volume of a bubble. 32The bubble volume obtained in this way corresponds to the average increase in system volume due to the presence of a bubble compared to the unconstrained metastable liquid at the same conditions.We use the volume of the largest bubble present in the system, v, as the reaction coordinate.The free energy, F(n), as a function of the bubble volume v is given by F(n) ¼ Àln[n 0 p(n)], where p(n) is the probability density of encountering a conguration with a largest bubble of size v and n 0 is an arbitrary constant volume required to make the argument of the logarithm dimensionless.We computed p(n), and from it the free energy F(n), by employing hybrid Monte Carlo 33,34 umbrella sampling 35 with "hard" windows in the isobaric-isothermal ensemble.
Free energy proles F(n) computed for several pressures are shown in Fig. 5.The shape of these curves can be understood in the general framework of classical nucleation theory.The tension applied to the metastable liquid favors the formation and subsequent growth of bubbles through a gain in mechanical work, pn, by expanding the system under tension.This contribution, in conjunction with the free energy cost of forming the interface (Ag, where A is the surface of the bubble and g is the surface tension), leads to a barrier in the free energy which separates the metastable liquid from the vapor (shown in Fig. 5).Once the system overcomes this barrier, it transitions to the vapor phase, which, in contrast to the liquid, cannot sustain tension.Consequently, there is no stable basin on the vapor side of the free energy barrier.
In order to apply the S-shooting formalism to the calculation of bubble nucleation rates, we need to dene the stable regions A and B as well as the transition region S. Based on the computed free energy proles, we dene the region S to be located around the top of the free barrier and regions A and B le and right of the barrier.A schematic representation of the regions A, B and S employed here is shown in Fig. 1a and a detailed list of the region boundaries is given in Table 1.Once the regions are dened, the averages hh A i and hh S i needed for the rate calculation can be determined from the free energy proles.Next, we need to generate dynamical trajectories from region S in order to compute the correlation function hh A (0)h B (t)i S .Starting from congurations in the region S generated in the free energy calculations, we create pathways by propagating the system backward and forward in time at constant pressure 36 and temperature 37,38 Fig. 5 Free energy F ¼ Àln[v 0 p(v)] as a function of the volume v of the largest bubble, where v 0 is an arbitrary constant volume, for various pressures.Due to the gain in mechanical work pv associated with the formation of a bubble, the height of the barrier and the size of the critical bubble decrease with increasing tension.Curves are shifted such that the lowest point in the liquid basin aligns for all pressures.0][41] In order to keep the computational cost manageable, the trajectories used in the rate computation are propagated until they reach a xed value along the reaction coordinate, rather than for a xed time.These xed values were chosen such that they correspond to being approximately 10 k B T lower than the top of the free energy barrier, at which point trajectories are unlikely to re-cross the barrier.† Since this approach leads to trajectories of varying length, trajectories shorter than the desired trajectory length 2L + 1 were padded with their nal value on either side of the barrier when computing the correlation function C AB (t).
Correlation functions hh A (0)h B (t)i S , obtained via eqn (32) from trajectories in the ensemble P G [x(s)], are shown in Fig. 6.Since the state S fully overlaps with the adjacent states A and B, one encounters non-nite contributions to the correlation hh A (0)h B (t)i S even for short times, leading to a steep slope for small t (in contrast to the shape of hh A (0)h B (t)i S observed in Section 3.1).Its time derivative,  a The equilibrium probability ratio hh S i/hh A i was obtained from the free energy data shown in Fig. 5 and hVi is the average volume of the metastable liquid at the appropriate pressure.
b The cavitation rate J is obtained by combining the equilibrium probability ratio with a t to the long-time tail of the time derivative of the correlation function from Fig. 7. † For the two lowest tensions investigated, p ¼ À105 MPa and p ¼ À120 MPa, we extrapolated the free energy to higher bubble volumes to determine the limiting value of the order parameter.

Paper Faraday Discussions
This This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.
shown in Fig. 7, exhibits transient behavior followed by the emergence of a plateau for longer times.The resulting cavitation rates J ¼ k AB /hVi, which span almost 40 orders of magnitude over the investigated range of pressures, are presented in Table 1.

Conclusion
In the study of rare transitions between long-lived stable states, one oen uses large numbers of short molecular dynamics trajectories to determine committors or estimate diffusion coefficients along some collective variable of interest.Here we have presented an algorithm to calculate reaction rate constants by combining dynamical information extracted from such brief trajectories with the results of free energy calculations.For the method to be computationally efficient, the trajectories need to be initiated close to the transition state region such that they have a non-negligible probability to connect the stable states.Hence, the method follows the central idea of the Bennett-Chandler approach, 19,20 in which one rst computes the transition state theory approximation of the reaction rate constant based on the free energy, and then applies a dynamical correction obtained from short trajectories started on a dividing surface separating the stable states.Our approach uses the same information as the divided saddle theory of Daru and Stirling, 18 but processes it in a different way to yield the time correlation function of the stable state populations, from which the reaction rate constant is obtained by taking a time derivative.Knowledge of the correlation function also yields information about the barrier crossing dynamics and permits to verify whether the kinetics follows the exponential behavior expected from the phenomenological rate equations.Just like the Bennett-Chandler approach and divided saddle theory, the new method, which is equally applicable to under-and overdamped dynamics, also requires a priori knowledge of a reaction coordinate.The reaction coordinate needs to provide an at least rough measure for the progress of the transition and it can be either continuous or discrete, such as the size of the largest crystalline cluster usually used in crystallization studies.As an illustration, we have applied the procedure to a Brownian walker in a double well potential and to cavitation in water at negative pressures, demonstrating that it can be used to determine reaction rate constants in complex condensed environments.

A Derivation of the time correlation function in the S-shooting formalism
A.1 The time correlation function in the S-ensemble The time correlation function which equals the conditional probability to nd the system in B at time t provided it was in A at time 0, can be written as an average in the ensemble P[x(s)] of trajectories x(s) of length s $ t: where the notation Ð DxðsÞ indicates a summation over all pathways x(s), x t is the microscopic state (which we also call conguration or point) of the system at time t, and the probability density of a trajectory consisting of L + 1 congurations is given by Here, r(x 0 ) is the equilibrium probability density of the conguration x 0 in the thermodynamic ensemble of interest and p(x i / x i+1 ) is the probability density of reaching conguration x i+1 when the system in conguration x i is propagated by one step.The time correlation function C AB (t) contains all the information needed to determine the rate constant k AB for transitions from A to B, which is equal to the time derivative of C AB (t) in its linear regime.If transitions from A to B are rare, it is difficult to determine C AB (t) from straightforward molecular dynamics simulations.In the following we present an algorithm to determine C AB (t) that is not affected by this limitation.
Although the integral in eqn ( 15) extends over all possible trajectories, the nonvanishing contributions to the correlation function C AB (t) stem from trajectories that start in A and reach the product state B by the time t.As such, one can obtain the correlation function C AB (t) by sampling from a constrained ensemble, provided that the constrained ensemble contains all trajectories going from A to B. To dene such an ensemble, we introduce the additional region S located such that any trajectory transitioning from A to B necessarily crosses S, i.e., has at least one point in S (see Fig. 1).The correlation function can then be written as Here, the path function where the denominator normalizes the distribution.Since any reactive trajectory, i.e., any trajectory connecting A and B, has to have points in S, the ensemble P S [x(s)] contains all reactive trajectories, albeit with a statistical weight that differs from that in the equilibrium trajectory ensemble P[x(s)] by the factor hH S [x(s)]i, the probability of an equilibrium trajectory of length s to visit S. In order to evaluate C AB (t) according to eqn (18) using trajectories sampled from P S [x(s)], we write eqn (18) as where hh S i ¼ ð q max S q min S pðqÞdq is the equilibrium population of the region S, such that the ratio hh S i/hh A i can be obtained from the free energy F(q) ¼ Àk B T ln p(q) computed as a function of the reaction coordinate q.Since hh A (0)h B (t)i S can be determined as a path average over the ensemble P S [x(s)], all that is still needed to compute C AB (t) is the ratio hH S [x(s)]i/hh S i.Since the dynamics is microscopically reversible and thus fullls detailed balance, one can express the average hh S i as a path average, evaluated at an arbitrary point x i along the trajectories.Since if h S (x i ) ¼ 1 for at least one point of the trajectory x(s) also H S [x(s)] ¼ 1, we can insert H S [x(s)] into the average hh S (x i )i without changing its value and we obtain But since all congurations x i along a path occur with the same likelihood (see eqn ( 21)), one can simply average hh S (x i )i S over all L + 1 time slices: A.2 Relation to the ensemble generated by S-shooting The algorithm presented in the main text generates trajectories by picking shooting points x in S according to the probability density and subsequently propagating these congurations forward and backward in time.But which ensemble is generated by this algorithm?Since every trajectory x(s) visiting S, and as such all reactive trajectories, has at least one conguration in S by denition, it can be generated by shooting from points in S.However, in order to verify whether the algorithm generates the desired ensemble, one has to determine how these trajectories are weighted with respect to one another, i.e., one has to inspect the likelihood with which the algorithm generates a particular trajectory.In particular, in doing that one has to take into account that trajectories with multiple points in S have more than one way to be generated by the algorithm.Consider a trajectory xðsÞ ¼ fx 0 ; /; x i1 ; /; x i N S ½xðsÞ ; /; x L g with N S [x(s)] congurations in region S. (Two examples of such trajectories are shown in Fig. 8.) Here, the N S [x(s)] subscripts i 1 ,i 2 ,.,i N S [x(s)] are the indices of the congurations of the trajectory x(s) that are in S. In the shooting algorithm presented in the main text, the trajectory x(s) can be generated by shooting from each of these N S [x(s)] points.Let us now consider the probability density that the trajectory x(s) is generated from the rst one of these points, x i 1 .For this to happen, conguration x i 1 must rst be selected from r S (x i 1 ) and then designated to be the i 1 -th conguration along the trajectory by drawing the index i 1 with probability 1/(L + 1) from the integers 0 to L. Starting from x i 1 , one then propagates the dynamics for L À i 1 steps in the forward and i 1 steps in the backward direction using the rules of the underlying dynamics, such that the probability density p[x(s);x i 1 ] to generate exactly trajectory x(s) from conguration x i 1 is given by p½xðsÞ where x is the conguration x with reversed momenta (because the rst product corresponds to propagating the system backward in time).Since the dynamics is microscopically reversible, the transition probability obeys detailed balance; in particular, r(x i+1 )p( x i+1 / x i ) ¼ r(x i )p(x i / x i+1 ), where we took advantage of the fact that the Hamiltonian does not change when the momenta of a conguration are reversed, i.e., r( x) ¼ r(x).Applying eqn (27) to eqn (26) repeatedly, the equilibrium probability density r(x i 1 ), initially applied to x i 1 , is shied to the rst conguration x 0 of the trajectory: P½xðsÞ; where h S (x i 1 ) ¼ 1 since the shooting point x i 1 is in S by construction.The equation above shows that the likelihood of generating x(s) is independent of the chosen shooting point and the probability of generating the trajectory from conguration x i 1 is the same as that of generating it from any of the other possible shooting points x i 2 to x i N S ½xðsÞ .Hence, the total likelihood to generate a particular trajectory x(s) by shooting from points in S is just N S [x(s)] times the likelihood of generating it by shooting from one specic point in S: Thus, the likelihood to generate a particular trajectory x(s) in the ensemble P G [x(s)] generated by the shooting algorithm is related to the likelihood in the ensemble P S [x(s)] of trajectories visiting S via P G [x(s)] f N S [x(s)]P S [x(s)], i.e., in the F41] as well as under grant P24681-N20. A. S. also acknowledges support from the Vienna Scientic Cluster (VSC) School.Calculations were carried out on the Vienna Scientic Cluster (VSC).

Fig. 2 A
Fig.2A long pathway consisting of 2L + 1 time-slices (top) can be viewed as the envelope of L + 1 trajectories of length L (below).Note that each of the paths contains at least one configuration in S, namely the original shooting point x L , and is thus a member of the S-ensemble.

Fig. 3
Fig. 3 Correlation function hh A (0)h B (t)i S for a Brownian walker in a double-well potential.The estimates for the correlation function obtained by the S-shooting method (red and blue lines) agree perfectly with the results of the straightforward Brownian dynamics simulation (black line).

Fig. 4
Fig. 4 Numerical time derivative dhh A (0)h B (t)i S /dt of the correlation function.Note the plateau emerging after the initial transient behavior at t z 0.3; this value enters the computation of k AB .

Fig. 6
Fig. 6 Correlation function hh A (0)h B (t)i S .Note the difference in shape compared to Fig. 3 due to the different boundaries of the states A and B.

Fig. 7
Fig.7Time derivative dhh A (0)h B (t)i S /dt of the correlation function depicted in Fig.6.A fit to the emerging plateau for long times is used in the computation of cavitation rates.
)where we inserted H S [x(s)] in the rst line to emphasize that this algorithm only creates trajectories with at least one conguration in S.The ensemble P G [x(s)] can be expressed in terms of the ensemble P S [x(s)] of trajectories visiting S, P G ½xðsÞ ¼ N S ½xðsÞ P½xðsÞH S ½xðsÞ Ð DxðsÞP½xðsÞH S ½xðsÞ Ð DxðsÞP½xðsÞH S ½xðsÞ ðL þ 1Þhh S i ¼ N S ½xðsÞP S ½xðsÞ hH S ½xðsÞi ðL þ 1Þhh S i :

Fig. 8
Fig.8When shooting points are picked from the region S according to the probability density r S (x), pathways with a larger number N S [x(s)] of configurations in S are more likely to be created.This overcounting of pathways must be corrected for when computing path averages.
1. DxðsÞ indicates a summation over all trajectories x(s) and the path function H S [x(s)] gives unity if the trajectory x(s) has at least one conguration in S and zero otherwise.The integral in the denominator on the right-hand side of the
journal is © The Royal Society of Chemistry 2016 Faraday Discuss., 2016, 195, 345-364 | 355 H S [x(s)] is unity if the trajectory x(s) has at least one point in S and zero otherwise.We can insert H S [x(s)] ¼ 1 in the average above without changing the correlation function, because if both h A (0) ¼ 1 and h B (t) ¼ 1 then H S [x(s)] ¼ 1, and if h A (0) ¼ 0 or h B (t) ¼ 0 the value of H S [x(s)] does not matter.Multiplying and dividing by hH S [x(s)]i one obtains Open Access Article.Published on 14 June 2016.Downloaded on 10/9/2023 8:59:37 AM.This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.S ðx i Þ is the number of congurations in S. Consequently, hh S i S is the fraction of congurations x i of a given path x(s) located in the region S. Inserting this result into eqn (20) yields the time-correlation function C AB (t) expressed in terms of path averages obtained in the ensemble of trajectories visiting S, Faraday Discussions Paper358 | Faraday Discuss., 2016, 195, 345-364 This journal is © The Royal Society of Chemistry 2016