Rajesh
Dutta
and
Eli
Pollak
*
Chemical and Biological Physics Department, Weizmann Institute of Science, 7610001 Rehovot, Israel. E-mail: eli.pollak@weizmann.ac.il
First published on 6th October 2022
Experimentally measured transition path time distributions are usually analyzed theoretically in terms of a diffusion equation over a free energy barrier. It is though well understood that the free energy profile separating the folded and unfolded states of a protein is characterized as a transition through many stable micro-states which exist between the folded and unfolded states. Why is it then justified to model the transition path dynamics in terms of a diffusion equation, namely the Smoluchowski equation (SE)? In principle, van Kampen has shown that a nearest neighbor Markov chain of thermal jumps between neighboring microstates will lead in a continuum limit to the SE, such that the friction coefficient is proportional to the mean residence time in each micro-state. However, the practical question of how many microstates are needed to justify modeling the transition path dynamics in terms of an SE has not been addressed. This is a central topic of this paper where we compare numerical results for transition paths based on the diffusion equation on the one hand and the nearest neighbor Markov jump model on the other. Comparison of the transition path time distributions shows that one needs at least a few dozen microstates to obtain reasonable agreement between the two approaches. Using the Markov nearest neighbor model one also obtains good agreement with the experimentally measured transition path time distributions for a DNA hairpin and PrP protein. As found previously when using the diffusion equation, the Markov chain model used here also reproduces the experimentally measured long time tail and confirms that the transition path barrier height is ∼3kBT. This study indicates that in the future, when attempting to model experimentally measured transition path time distributions, one should perhaps prefer a nearest neighbor Markov model which is well defined also for rough energy landscapes. Such studies can also shed light on the minimal number of microstates needed to unravel the experimental data.
Our recent study of the transition path time distribution based on a diffusion equation (Smoluchowski equation) led to the conclusion that long time tails in the transition path time distribution indicate the existence of a “trap” in the transition path region.37 Introduction of a well along the transition path gave good agreement with the experimental data provided that the transition path barrier height is ∼3kBT, significantly higher than previous estimates of ∼1kBT based on theoretical studies using bell shaped free energy surfaces and open boundary conditions. The same study negated the requirement of multi-dimensional effects, free energy barrier asymmetry, sub-diffusive memory kernels or systematic ruggedness to explain the experimentally measured data.
Our present understanding of the experiments is predicated on the use of a Smoluchowski equation (SE) to describe the dynamics. This assumption needs proper clarification. It is well understood that the SE is a coarse-grained description of the barrier crossing process. It may be derived as the continuum limit of a nearest neighbor coupled set of master equations38,39 for motion through a series of uncorrelated transitions or random hopping between the many microstates that exist along the transition path. The “friction coefficient” used in the SE is related to the trapping time in the microstates. The strong friction limit, which validates the use of the SE, is a result of the long trapping times in the microstates. The free energy surfaces are known to be “rugged”39–43 and may be described in principle by a series of random well and barrier energies along the reaction path. A rugged energy landscape is more “physical” than that of the smooth free energy surface used in the diffusion equation analysis of the experiments. It is applicable not only to protein folding–unfolding40,41,44 but also to the glass transition.45,46 Zwanzig described the effect of the roughness of the potential by the introduction of random components on the potential energy surface.42 He found that the roughness of the potential leads to a non-Arrhenius increase in the mean first passage time and provided a relation for the dependence of the diffusion coefficient on the variance of the randomness associated with the rough potential. Bryngelson and Wolynes47 used a generalized master equation formalism and continuous-time random walk description to estimate mean first passage times for protein folding. They considered the number of native contacts as the reaction coordinate.48 Nevo et al.49 showed how the roughness can be measured by introducing an external force on the biosystem. In a series of studies Bagchi and co-workers investigated the role of spatial correlation of ruggedness and dimensionality dependence of diffusion and entropy when the energy landscape is rugged.50–52 In a recent study, Wales investigated how kinetic traps and corresponding barriers affect the first passage time distribution using a master equation approach.53 A similar analysis was also used in the study of single molecule enzyme kinetics.54
Li et al. studied the first-passage-time distribution in a time-dependent parabolic potential with spatial roughness using Kramers theory and a nonsingular integral equation.55,56 They concluded that the theory based on an integral equation almost correctly describes the mean first passage time for the whole range of external force whereas Kramers' theory57,58 is valid for only small external forces. Metzler and coworkers36 explored the role of roughness in the barrier by varying the amplitude and periodicity of the roughness to describe the transition path properties. In a recent experiment, Woodside and co-workers found brief pauses in transition paths, suggesting roughness or the presence of micro-barriers in the free energy landscape.59
Despite the wide usage of the diffusion equation, it is not at all clear why it is really valid in the context of protein folding and unfolding dynamics. The continuum limit of the nearest neighbor master equation is valid provided that the heights of the barriers separating the microstates change systematically. The continuum limit is not well defined when the free energy profile is that of a rough landscape. Even if the barrier profile is smooth it remains unclear how many microstates are need in practice to assure that the SE provides a valid description of the dynamics. These are the two central questions which motivated the present study. Theoretical formulation of the nearest neighbor master equation and the diffusion equation is reviewed in Section 2.1. We then provide in Section 2.2 a numerical study which compares the dynamics obtained by the two approaches. Specifically, we compare the transition path time distributions obtained for a bell shaped barrier profile for different models with different numbers of microstates. In Section 3 we apply the master equation model to analyse the experimentally measured transition path time distributions for the hairpin DNA molecule and the PrP protein.14,15 We find that also the master equation model leads to the conclusion that there is a central well along the reaction path which is responsible for the experimentally measured long time tails. We conclude with a discussion of the implications of the present work on the widespread usage of diffusion equations when modeling transition path time distributions for the folding–unfolding transition dynamics.
The diffusion equation, or equivalently the Langevin equation describing the motion along a one dimensional reaction coordinate is
![]() | (1) |
〈F(![]() |
〈F(![]() ![]() ![]() ![]() | (2) |
![]() | (3) |
In the master equation formalism, the system gets trapped in different microstates from which it can escape to either side by an Arrhenius rate law. The location of the microwells is determined by discretizing the reaction coordinate x. The distances between the microwells is assumed to be l.
The distance between the wells is taken such that if there are N wells, the over all distance between the initial and final barrier is l(N + 1). As may be seen from ref. 38 and 39, that the continuum limit is obtained by letting l → 0 and n → ∞ such that the overall distance from right to left is kept fixed. The potential of mean force in the resulting diffusion equation is the profile of the barrier energies. Since we want to compare between the diffusion dynamics of a parabolic barrier obtained from the continuum limit of a master equation, we take the profile of the barrier heights to be that of a parabolic barrier, while the well depths have a Gaussian random component such that
![]() | (4) |
To obtain the transition path time distribution from the diffusion equation, one needs to consider that the trajectories enter the transition region by crossing the left boundary located at −L and arrive at the right boundary L without returning to the left boundary. The transition path time distribution p(tTP) is then defined as
![]() | (5) |
![]() | (6) |
The calculation of the transition path time does not require a smooth potential when it is evaluated using the master equation. To obtain the transition path time distribution from the master equation, we use an algorithm based on the kinetic Monte Carlo simulation.65 The master equation approach is the same as the nearest neighbor jump model for a continuous time Markov chain where transitions from one state to another are governed through the forward and backward jump probabilities. The detailed steps are as follows. First, using the Arrhenius law, we evaluated the jump probability for going forward PF and backward PB from the j-th well
![]() | (7) |
![]() | (8) |
![]() | (9) |
We assume that initially at t = 0 the system is located at the j = 0 micro-state. We then impose absorbing boundary conditions at the j = 0 micro-state and at the j = N micro-state. The time is then advanced in steps of Δt and is stopped when the system reaches the final microstate. The histogram of final time values then gives the transition path time distribution with the condition of absorbing boundary at both ends.
βU(x) = −βUTPx2; −1 ≤ x ≤ 1 | (10) |
For the master equation dynamics the rate for the nearest neighbor jump is evaluated using eqn (10) where the value of the prefactors (ν) are chosen to take the (reduced) values of 18, 90, 320 and 700 for the models with 5, 11, 21 and 31 microstates, respectively. These prefactors assure that the magnitude of the mean transition path times obtained from both diffusion and master equation models are the same. They are obtained for each model using the relation between the diffusion coefficient (D) and the jump rate (k) for a one dimensional system random walk39,66
![]() | (11) |
The barrier heights for each microstate model are taken so as to fit the parabolic barrier shape. However, as described above, the well depths are obtained from a Gaussian distribution with a mean well depth β〈Uj〉 = −1. As a result the average barrier height for the nearest neighbor jump is assumed to be 1.
The transition path time distribution obtained from the diffusion equation is compared with the results obtained from the master equation in Fig. 2. The brown dotted line shows the distribution obtained from the diffusion equation. In each panel, the solid red, dot-dash green and dashed blue lined show the transition path time distributions obtained from the master equation for three different realizations of the random wells. Panels (a)–(d) correspond to 5, 11, 21 and 31 microwells used in the master equations. Each distribution was obtained from 106 trajectories propagated for up to 3 × 103 time steps. Comparison of the plots shows that the master equation distributions converge to the diffusion equation result as the number of microstates increases, irrespective of the precise realization of the random well depths. The resulting mean transition path times (see eqn (15)) are compared in Table 1. These mean time values show that although the mean time is roughly the same, it does change somewhat randomly, depending on the model used.
Diffusion equation | Master equation |
---|---|
0.42 | 0.36 (1), 0.42 (2), 0.34 (3) (5 microstates) |
0.39 (1), 0.41 (2), 0.36 (3) (11 microstates) | |
0.41 (1), 0.39 (2), 0.35 (3) (21 microstates) | |
0.42 (1), 0.38 (2), 0.36 (3) (31 microstates) |
For a more quantitative comparison of the distributions we computed the Pearson correlation function and mean squared deviations of the master equation models with respect to the diffusion equation. The Pearson correlation function and mean squared error are defined as
![]() | (12) |
![]() | (13) |
The resulting quality of fit is provided in Table 2. The fitting parameter is calculated only for the best fits obtained from different realizations of random microstate well depths in the models. As noted qualitatively from Fig. 1, it is evident from Table 2 that the correlation between data sets increases and the mean squared error decreases as the number of microstates increases. The trend suggests the validation of the diffusion equation as a model for the transition path dynamics, provided that one may assume a sufficient number of microstates bridging the path from the unfolded to the folded states. When the number of microstates is greater than 20 the quality of the fitting coefficients is almost unity and remains unchanged.
No. of microstates | PCC(Pxy) | MSE(M) |
---|---|---|
5 | 0.92 | 0.13 |
11 | 0.96 | 0.06 |
21 | 0.98 | 0.03 |
31 | 0.98 | 0.03 |
For the diffusion equation solution a piecewise continuous potential is constructed. For the DNA hairpin the parameters are taken to assure the measured activation energy of ∼10kBT. The potential chosen is:
Region-I
8.1(x + 2)4 − 9.1; −3 ≤ x ≤ −1 | (14) |
−92.571(x + 0.825)2 + 1.835; −1 ≤ x ≤ −0.795 | (15) |
3.493x2 − 0.456; −0.795 ≤ x ≤ 0.795 | (16) |
−92.571(x − 0.825)2 + 1.835; 0.795 ≤ x ≤ 1 | (17) |
8.1(x − 2)4 − 9.1; 1 ≤ x ≤ 3 | (18) |
A similar piece-wise continuous potential is constructed for the PrP protein. Here, the Arrhenius activation energy is ∼5kBT and the potential is.
Region-I
3.4(x + 2)6 − 4.4; −3 ≤ x ≤ −1 | (19) |
−40.8(x + 0.75)2 + 1.55; −1 ≤ x ≤ −0.68 | (20) |
4.2x2 − 0.592; −0.68 ≤ x ≤ 0.68 | (21) |
−40.8(x − 0.75)2 + 1.55; 0.68 ≤ x ≤ 1 | (22) |
3.4(x − 2)6 − 4.4; 1 ≤ x ≤ 3. | (23) |
![]() | ||
Fig. 3 Model potential energy surfaces for the DNA hairpin and PrP protein. The transition path covers half of the distance between the minima of the folded and unfolded states. The transition path region is extended to obtain a full potential energy surface where the barrier height is fitted to the experimental estimate.14 |
To apply the master equation modelling, the potentials are discretized using either 11 or 21 microstates keeping the potential energy profile defined by the potentials shown in Fig. 3. The resulting surfaces are shown in Fig. 4. The upper two panels show the microstate structure used for the DNA hairpin with 11 and 21 wells. As in Fig. 1 we employed three different random realizations of the profile of the microstates, these are shown as the solid (red) line, dotted (green) line and dashed (blue) line. The same is shown for the PrP protein in the bottom two panels of the figure.
![]() | ||
Fig. 4 Microstate profile of the potentials used to simulate the experimentally measured transition path time distributions. For further explanations, see the text. |
One may well ask how does one determine the number of microstates needed for a given system. The method we are suggesting, which is based on knowledge of the experimental time distributions is as follows. One initiates the fit with a small number of microstates of the order of 3–5, choosing an odd number of microstates to maintain symmetry. One then must ascertain whether the experimental mean time agrees with the model as well as use quantitative measures such as the Pearson correlation coefficient (PCC) and mean squared error (MSE) to assure that the simulation gives good agreement with the measured time distribution. As the number of microstates is increased, one will find that the fitting measures saturate. The number of microstates needed to simulate the experiment is the number needed to reach the saturation region. Inspection of Table 1 shows that in the cases studied here, 21 microstates are needed for the quantitative description of the transition path time distribution.
A comparison between the theoretical transition path time distributions obtained from the diffusion and master equations with the experimentally measured distributions for the DNA hairpin and PrP protein molecules is shown in Fig. 5. The experimental distributions are adapted from ref. 14 and normalized to unit area, as are the theoretical distributions shown in the figure. The time scale is reduced, using the experimentally fitted frequency factor 6 × 104 s−1 for the DNA hairpin and 3 × 103 s−1 for the PrP protein. The frequency factor ν is chosen as ν = 90 for the 11 microstates model, while ν = 320 is chosen when using 21 microstates. The solid (red), dotted (green) and dashed (blue) lines are for the three different realizations of the microwells, shown in Fig. 4.
As may be observed in all panels of the figure, good agreement between theory and experiment is obtained, irrespective of the theoretical model used. As few as 11 microstates are sufficient to account for the measured distributions. These results further validate the need to include a trap in the transition path region that leads to the long time tail in the time distribution. In this context, following our work, Woodside and co-workers recently observed brief pauses in the transition paths, indicating the existence of ruggedness or traps in the transition region, negating models based on a simple barrier free energy profile.59 These results further verify our claim that the transition path barrier height is ∼3kBT barrier height37 rather than ∼1kBT as found in ref. 14. We believe that our result is realistic, especially when keeping in mind that the activation energy for the folding–unfolding transition is of the order of ∼10kBT while the transition path begins and ends at half the distance between the minima of the folded and unfolded bio-molecules.
The respective mean (reduced) transition times obtained from the diffusion and master equations are compared with experiment in Table 3 using the diffusion equation and the three different realizations of the microstates, as shown in Fig. 4. As may be seen from the table, the agreement between theory and experiment is reasonable, for all models considered, although it improves when using 21 microstates instead of 11.
Experiment | Diffusion equation | Master equation | |
---|---|---|---|
DNA hairpin | 1.62 ± 0.12 | 1.69 | 1.73(1), 1.54(2), 1.47(3) (11 microstates) |
1.51(1), 1.68(2), 1.69(3) (21 microstates) | |||
PrP protein | 1.50 ± 0.30 | 1.53 | 1.46(1), 1.34(2), 1.57(3) (11 microstates) |
1.51(1), 1.45(2), 1.50(3) (21 microstates) |
It is well accepted that the free energy landscape along the reaction path is not “smooth” but rugged with many microstates along the way. To account for this we presented results based on numerical solution of a master equation for the nearest neighbor hopping transition between microstates in the presence of absorbing boundary condition. In this picture, the friction comes from the mean residence time of the system in each of the microstates before escaping them. The mean residence time is long as compared to internal motion within a bio-molecule, leading to diffusive like dynamics as evidenced by comparison with a suitably chosen diffusion equation.
The comparison between transition path time distributions obtained using the diffusion and master equation dynamics shows that agreement increases with the number of microstates. One can obtain quite good agreement between mean transition times even for less than ten microstates, however inspection of the transition path time distribution shows that typically one would need two dozen microstates to obtain similar results. On the other hand, analysis of experimentally measured transition path time distributions showed that one can mimic the experimental results using as few as 11 microstates. This comparison also further validates our recent conclusion that the long time tail found in the experimental distribution indicates the existence of an intermediate well along the reaction path whose well depth is a few kBT. The master equation simulation also confirms that the transition path barrier is ∼3kBT height as also obtained from an analysis based on the diffusion equation.37
The present study suggests that in principle, there is no need to use the simplistic diffusion equation dynamics to analyze transition path time distributions. A master equation approach, which is arguably more realistic due to the rugged energy landscape inherent to it, may lead not only to similar conclusions, but may also provide more insight as to the dynamics. For example, the well needed to obtain agreement with the long time tail experimental data itself is not “smooth” as in the diffusion equation, but contains a number of microstates along it, as may be discerned from Fig. 4.
On the other hand, the diffusion equation has the distinct advantage that sometimes, as with a parabolic21 or cusped barrier the solution is analytic.33,34 The fact that the master equation dynamics may lead to diffusion dynamics provides some justification for employing such analytic models, which shed light on the diffusional dynamics.
Finally, we do note that also the Markov modelling presented in this paper is an idealised description of the transition between folded and unfolded conformations of bio-molecules. Perhaps the present study provides further impetus for molecular dynamics studies on “realistic” force fields which include all molecular degrees of freedom.23,67,68 These may then be compared with Markov and diffusion equation studies.
This journal is © the Owner Societies 2022 |