Fabian
Knoch
and
Thomas
Speck
*

Institut für Physik, Johannes Gutenberg-Universität Mainz, Staudingerweg 7-9, 55128 Mainz, Germany. E-mail: thomas.speck@uni-mainz.de

Received
29th August 2017
, Accepted 26th October 2017

First published on 30th October 2017

We show how multi-ensemble Markov state models can be combined with constant-force equilibrium simulations. Besides obtaining the unfolding/folding rates, Markov state models allow gaining detailed insights into the folding dynamics and pathways through identifying folding intermediates and misfolded structures. For two specific peptides, we demonstrate that the end-to-end distance is an insufficient reaction coordinate. This problem is alleviated through constructing models with multiple collective variables, for which we employ the time-lagged independent component analysis requiring only minimal prior knowledge. Our results show that combining Markov state models with constant-force simulations is a promising strategy to bridge the gap between simulation and experiments even for medium-sized biomolecules.

## Design, System, ApplicationEngineering at the molecular level requires reliable and robust computational tools to predict properties like binding affinities and to efficiently scan large chemical spaces. Given the size of small biomolecules and the time scales involved, these tools inevitably are going to be multiscale methods. We employ discrete Markov state models, which are built from atomistic molecular dynamics simulations but allow faithful sampling of the slowest time scales that are typically far beyond the reach of a single molecular dynamics trajectory. On this time scale, the folding/self-assembly into the native structure occurs, which is required for the particular functionality of the molecules. We demonstrate the feasibility and usefulness of Markov state models to study the folding of single molecules at constant force, an important experimental technique. Here we elucidate the folding pathways of two peptides under an applied external force, but our method is versatile and can be applied to a wide range of bio- and macromolecules. |

In experiments, probing folding pathways can be realized through applying an extensional force to individual proteins, which is commonly implemented by one of two different protocols: pulling or constant force (force clamp). In pulling experiments, ends are separated with a constant speed or a constant loading rate while recording the force-extension curve. Hence, the protein is driven out of thermal equilibrium but the free energy might be recovered from the fluctuations.^{9,10} On the other hand, a constant force can be achieved through a feedback loop,^{4,11,12} which in principle generates an equilibrium ensemble of configurations at that force. Both techniques monitor structural dynamics only along a single reaction coordinate, which means that the interpretation of the data has to rely on theoretical models (see for example ref. 13 and references within).

Computational studies, on the other hand, provide a detailed picture of folding pathways with the full structural dynamics accessible on time scales from nanoseconds to milliseconds. Although millisecond molecular dynamics (MD) trajectories are nowadays affordable for small proteins on supercomputers,^{14} predicting folding rates requires the computation of trajectories exhibiting many unfolding/folding events. Markov state modeling has tackled this problem successfully for the folding of small proteins^{15–17} as well as ligand binding kinetics.^{18} This is achieved by coarse-graining the configuration space of these molecules into discrete sets based on the dynamical information provided by molecular dynamics simulations. Necessarily, the dynamics in this coarse-grained space is stochastic and (approximately) Markovian by virtue of the time scale separation underlying the construction of the discrete sets. One key advantage is that Markov state models (MSMs) can be constructed from many short trajectories, inferring transition rates of processes that are much longer than individual trajectories. Of course, any shortcomings of the input molecular dynamics simulations such as the dependence of certain molecular properties on the chosen force field (e.g., the over-representation of alpha-helical structures^{51}) will still be present in the final MSM.

Despite the modeling advantages and increase in computational power, it is still a challenging task to simulate protein dynamics for more than 100 residues. To connect simulations with force-probe experiments,^{19–21} molecular dynamics simulations mimic the experimental protocol by either applying a time-dependent harmonic force or a time-independent constant force. The advantage of pulling simulations is that even medium or large-sized proteins can be unfolded in a reasonable amount of time, though pulling velocities commonly differ by orders of magnitude from experimental velocities. Theoretical and experimental studies^{22,23} show, however, that folding pathways are dependent on pulling velocities, which makes direct comparison with experiments challenging. Another drawback is that the protein and its environment are not in thermodynamic equilibrium. More precisely, through the driving, they do not reach a steady state and hence cannot be described by Boltzmann's statistics. In constant-force simulations,^{24–26} on the other hand, a time-independent constant biasing force is applied, allowing the expression of all molecular interactions through a Hamiltonian operator. Constant-force simulations thus follow equilibrium statistical mechanics, enabling the construction of equilibrium MSMs while still accelerating the protein dynamics. In this study, we illustrate the application of equilibrium Markov state modeling to constant-force simulations for two specific peptides.

We consider a system effectively described by a one-dimensional collective variable (CV) r (the generalization to multi-dimensional CVs is straightforward). Defining the dimensionless potential of the mean force as βU(r) with β^{−1} = k_{B}T, the probability distribution is given by the Boltzmann factor

p(r) = e^{−βU(r)}/Z, | (1) |

p^{k}(r) = p(r)e^{−Bk(r)}/Z^{k}, | (2) |

The first step is to discretize the CV r into finite, non-overlapping cells (or centroids) with R_{i} being the ith centroid. The space-continuous MD trajectories are mapped onto all R_{i} centroids returning a Markov chain. To infer dynamical information, all observed transitions between states R_{i} → R_{j} are counted by c^{k}_{ij}. Finally, to estimate the entries t^{k}_{ij} of the transition matrices, a maximum likelihood approach is employed, which estimates all entries t^{k}_{ij} based on the observed transition counts c^{k}_{ij},

(3) |

We emphasize that eqn (3) is valid for every ensemble. Equilibrium dynamics obeys detailed balance p^{k}_{i}t^{k}_{ij} = p^{k}_{j}t^{k}_{ji}, which is used to constrain the maximization of eqn (3). This constraint enforces the stationary state of the MSM to correspond to thermal equilibrium.^{30}

To infer information from multiple ensembles, we express the stationary probability of centroid R_{i}

(4) |

Z^{k}_{i} = Z^{k}∫_{Ri}p^{k}(r)dr | (5) |

Connecting eqn (2) with eqn (3) and exploiting the local equilibrium conditions in eqn (4), the TRAM maximum likelihood function follows as

(6) |

To study conformational changes of peptides, a reasonable set of coordinates is given by the distances between C_{α} atoms, still spanning a high-dimensional configuration space. To further reduce the complexity of the configuration space, we employ the time-lagged independent component analysis (TICA).^{32} In addition to spatial information, TICA uses temporal information through computing time-lagged covariance matrices ∑(τ) = (σ_{ij}(τ)) and solving the generalized eigenvalue problem

∑(τ)U = ∑(0)UΛ | (7) |

To construct a reasonable set of CVs, we employ the TICA algorithm. As original configuration space representation serving as the input for TICA, we choose all C_{α} pair distances which are separated by at least two residues and, additionally, all native hydrogen bond distances between donor (nitrogen) and acceptor (oxygen) atoms that stabilize alpha-helical structures. More precisely, we use the distance between CO of the ith residue and N–H of the (i + 4)th residue. We compute the TICA time scale −τ_{TICA}/log(λ_{TICA}) for multiple lag times τ_{TICA} (see Fig. 1a). We select τ_{TICA} = 5 ns for the final representation since it is the shortest lag time for which the slowest implied TICA time scale is approximately independent of the lag time. Moreover, the number of effective TICA components recovering at least 90% of the kinetic variance, i.e., , drops down to four.

For this 34-dimensional configuration space, we keep the first four TICA components exhibiting a cumulative kinetic variance >92%. Together with the end-to-end distance, the 5-dimensional reduced configuration space is discretized employing the k-means clustering algorithm with 300 clusters. For the TICA input, we use only trajectories for f = 5 kJ mol^{−1} nm^{−1} since TICA computes its dominant components based on the information stored in the time-lagged covariance matrix (see eqn (7)) and thus an input mixed from different constant-force ensembles would lead to inferior results for the independent components. However, all trajectories serve as the input for the k-means clustering algorithm. Because the output of the k-means algorithm depends on the starting configuration of centroids (zeroth iteration), we perform the clustering with 50 different randomly selected configurations and select the discretization exhibiting the smallest error. We verified that the best three discretizations do not show any qualitative difference.

Regarding the choice of the initial configuration space representation (C_{α} distances and hydrogen bond distances), all of these coordinates, including the biasing coordinate (end-to-end distance), come with a distance metric, which improves the clustering results of the k-means algorithm and the subsequent space-discretization of trajectories. Note that in the study of Jo et al.^{38} the authors pursue a similar approach, though only using the C_{α} distances.

After mapping the trajectories of all constant-force ensembles onto the reduced configuration space and counting all observed transitions, we use TRAM with a lag time of τ_{TRAM} = 4 ns returning a fine-grained MSM for every ensemble containing 300 micro states (clusters). The selected lag time is the shortest for which the slowest implied time scale, i.e., −τ_{TRAM}/log(λ_{TRAM}), is approximately constant (data not shown) for all ensembles. As a means of error estimation, we split the 48 trajectories per applied external force into six independent sets and repeat the TRAM estimation for each set individually. In the following analysis, error bars correspond to the standard error computed from these six sets. If not shown, error bars are smaller than the marker size.

In Fig. 1b, an exemplary time series of the first two TICA components and the end-to-end distance for deca-alanine is shown. First, note that more than two states are involved which can be clearly seen in the time series of the first TICA component (upper panel) illustrated by jumps between plateaus (metastable states) with different amplitudes. The time series of the second TICA component offers less plateaus with more fluctuations (see the shaded time series), while for the time series of the end-to-end distance (lower panel), plateaus are insufficiently resolved. Additionally, fluctuations of the end-to-end distance are much stronger for the TICA components shown. The comparison of all three time series illustrates that for deca-alanine both TICA components are better suited as CVs than the end-to-end distance of the peptide.

1. Free energies.
To gain quantitative insight and to compare different constant-force ensembles, we compute the free energy landscape along the end-to-end distance, which is obtained by computing the stationary solution of the MSMs. The stationary probability distribution corresponds to the eigenvector belonging to the eigenvalue λ = 1 of its respective transition matrix. The free energy follows from F_{i} = −k_{B}Tlogp_{i}. Fig. 2a (upper panel) shows the free energy of every micro-state with (f = 10 kJ mol^{−1} nm^{−1}) and without constant-force bias (f = 0 kJ mol^{−1} nm^{−1}) exhibiting its global minimum at f_{min} ≈ 1.6 nm corresponding to the alpha-helical structure. Although the position of the minimum is identical for both ensembles, their free energy landscapes differ drastically. While for f = 0 kJ mol^{−1} nm^{−1} the free energy for shorter end-to-end distances is lower than for elongated configurations, the opposite applies for f = 10 kJ mol^{−1} nm^{−1} where the free energy of elongated configurations is almost as low as for folded ones. Furthermore, the maximum free energy difference is up to 25% higher for the unbiased ensemble.

For a better comparison with previous studies (see the Discussion section), we integrate out all dimensions except the end-to-end distance. The results for all ensembles are shown in Fig. 2c (lower panel) for which the probability distributions are binned with a bin size of 0.2 nm. For f = {0, 5, 10} kJ mol^{−1} nm^{−1}, the global minimum is located at the same position as in Fig. 2a. However, for f = 15 kJ mol^{−1} nm^{−1}, the global minimum is shifted (≃2.5 nm) toward the elongated configurations.

Moreover, we use this integrated free energy profile to verify the TRAM approach (for an in-depth benchmark see ref. 29). To this end, we conduct 48 additional (50 ns each) MD simulations for the unbiased ensemble and determine the integrated free energy (along the end-to-end distance) by constructing a “regular” MSM on the same configuration space discretization (τ_{MSM} = 4 ns and maximum likelihood estimator, see eqn (3)) and by computing the histogram over all trajectory frames. The obtained free energy profiles agree well with the results obtained from TRAM (see Fig. 2b).

2. Metastable sets and unfolding/folding rates.
To gain insight into stable configurations of deca-alanine, we employ the robust Perron-cluster cluster analysis (PCCA+)^{41} that coarse-grains a given MSM into k metastable sets. The PCCA+ is a discrete spectral clustering algorithm using the k-dominant eigenvectors of a given transition matrix, returning a fuzzy assignment of all micro-states to k macro states (metastable sets). “Fuzzy” means that instead of having a crisp assignment of micro-states to macro-states every micro-state is linked to every macro-state by a certain probability. Here we employ the PCCA+ for the transition matrix belonging to the unbiased ensemble and keep the assignment of metastable sets throughout the other ensembles. The transition matrix shows a spectral gap between its 6th and 7th largest eigenvalues (data not shown) implying a time scale separation (Δλ ≈ 0.35). Note that the eigenvalue spectra of the other ensembles are mostly shifted toward lower values and do not show such a dominant time scale separation, indicating the acceleration of the slow processes due to the biasing force. To exploit the time scale separation of the unbiased dynamics, we choose k = 6 as input parameter for PCCA+ algorithm.

In Fig. 2c, two projections of the discretized configuration space are shown along the first two TICA components as well as along the first TICA component and the end-to-end distance. While different colors indicate different metastable sets, gray centroids belong to transition states. One issue arises here: although the PCCA+ suggests 6 metastable sets, we manually separate stretched configurations (configurations in which all residues are almost linearly aligned, light blue colored set) from unfolded/weakly coiled configurations (orange set) as the former can be clearly identified by their end-to-end distance. PCCA+ does not detect stretched configurations automatically because they are temporally not well separated from the remaining unfolded configurations (orange set). These remaining unfolded configurations are weakly coiled along the centered residues, while outer residues can fluctuate freely. Large fluctuations of the outer residues can lead to hairpin-like structures which are stabilized by hydrogen bonds of the outer residues (yellow set). Hairpin-like structures are misfolded configurations acting as traps which slow down the folding process. Other misfolded configurations are detected by the red, magenta, dark blue and cyan colored sets. All of these misfolded configurations have a thing in common – that at some residues, the helical structure is disturbed, e.g., coiled in the wrong direction and stabilized by hydrogen bonds different from the ones guaranteeing the correct alpha-helical structure. Moreover, these misfolded structures act as “dead ends” in the folding process, i.e., to reach the correctly folded helical structure the peptide has to reverse the “misfolding step” first and pass through folding intermediates (orange colored configurations). Lastly, the green metastable set represents correctly folded configurations, that is, the dihedral angles formed by the inner residues are in the alpha helical configuration.

Knowing the probability distribution for all ensembles, we determine the cumulated probability of populating misfolded states (cyan, yellow, dark blue and magenta colored states in Fig. 2c). As show in Fig. 2e, all probabilities decrease drastically with increasing biasing force f. The sum over all misfolded probabilities (inset) equals approx. 17% without bias and vanishes (less than 1%) for f = 15 kJ mol^{−1} nm^{−1}. Individually, cyan colored configurations exhibit the largest population probability for f = 0 kJ mol^{−1} nm^{−1} (approx. 8%), followed by yellow and magenta colored configurations, with red and dark blue colored states being less important. For f = {5, 10} kJ mol^{−1} nm^{−1}, the order is partially disturbed and lost (or not significant) for f = 15 kJ mol^{−1} nm^{−1}.

Finally, folding/unfolding rates of deca-alanine are determined by computing inverse mean-first passage times (MFPTs) between correctly folded (green) and stretched (light blue) configurations. Fig. 2d shows both rates for all ensembles ranging from 7 × 10^{−4}–4 × 10^{−2} ns^{−1} (unfolding) and from 2 × 10^{−2}–3 × 10^{−3} ns^{−1} (folding).

As an initial configuration space representation, we select all C_{β} distances excluding the two closest neighbors, as well as all hydrogen bond forming donor–acceptor distances that stabilize the beta-helical structure. More precisely, the distance between CO of the ith residue and N–H of the (i − 2)th residue is chosen. This 21-dimensional space is reduced by TICA to three effective dimensions (cumulative kinetic variance >95%), for which we select the lag time τ_{TICA} = 2 ns due to a similar reasoning to that for deca-alanine. Including the end-to-end distance, a reduced 4-dimensional configuration space representation is then discretized into 300 micro-states employing the k-means clustering algorithm. Again, we use for TICA only the trajectories belonging to f = 5 kJ mol^{−1} nm^{−1}, while all trajectories serve as an input for the k-means algorithm. Next, we map all trajectories onto the discretized configuration space and estimate the transition matrices for all ensembles by employing TRAM with a lag time τ_{TRAM} = 0.6 ns, due to a lag time analysis (data not shown). Similar to the deca-alanine example, we compute error bars by splitting the 48 trajectories per external force into six independent sets, repeating the analysis for every set, and determining the standard error. If not shown, error bars are smaller than the marker size.

1. Metastable sets.
To visualize the reduced configuration space of beta-alanine, we show in Fig. 3a subspaces spanned by the first two TICA components and the first TICA component with the end-to-end distance. Circles represent centroid positions with different colors distinguishing different metastable sets. Same as before, metastable sets are identified by applying the PCCA+ algorithm to the unbiased transition matrix, corresponding to the five slowest modes (λ_{5}–λ_{6} ≈ 0.25). Similar to the analysis of deca-alanine, we identify a seventh set, exclusively expressing stretched configurations, by the end-to-end distance. These seven identified metastable sets are distinguishing correctly folded beta-helical structures (green), stretched configurations (light blue), misfolded structures (cyan and red) and folding intermediates. Structures characterized as folding intermediates are exclusively coiled along the centered residue (magenta), coiled at the N-terminus (dark blue) and unfolded but weakly stabilized by one or two hydrogen bounds (orange), which also stabilize the correctly folded beta-helix.^{45}

2. Free energies.
In Fig. 3d, we show the free energy of every individual centroid plotted against the end-to-end distance for the unbiased ensemble. Compared with the previous example (see Fig. 2b), the free energy landscape shows a rather broad minimal region between 1–1.5 nm, coinciding with 6 out of 7 metastable sets (cf.Fig. 3a). However, when selecting the first TICA component, the free energy landscape exhibits 3 distinguishable minima as shown in Fig. 3e (upper panel) for which we have integrated out the remaining configuration space components. The first minimum for tica_{#1} ≤ −2.5 belongs to correctly folded structures. The second minimum at tica_{#1} ≃ −1.5 is identified as belonging to folding intermediates (magenta and blue colored structures) and finally the global minimum at tica_{#1} ≃ 0.5 corresponds to (orange) unfolded structures (colors match the colored sets illustrated in Fig. 3a). The free energy projection along the first TICA component thus reveals that the beta-helical structure is thermodynamically unstable, which is hidden when only taking the end-to-end distance into account. The lower panel of Fig. 3e shows the same analysis but for all ensembles. Interestingly, for f = {0, 2, 5} kJ mol^{−1} nm^{−1}, all free energy landscapes are almost identical but for f = 10 kJ mol^{−1} nm^{−1}_{,} a noticeable change occurs. The free energy landscape between the first two minima becomes rather flat, whereas a new minimum appears at tica_{#1} ≃ −0.7 and the global minimum is shifted to tica_{#1} ≃ −0.7.

3. Unfolding/folding rates and unfolding pathways.
To obtain quantitative unfolding/folding rates, inverse MFPTs are computed connecting folded (green) and stretched configurations (light blue). Both rates are displayed for all constant-force ensembles in Fig. 3f, ranging from 2.7 × 10^{−2}–5 × 10^{−3} per ns (unfolding) and from 2 × 10^{−2}–2 × 10^{−1} per ns.

Next, we quantify the importance of folding intermediates by determining the unfolding pathways of the peptide. To this end, we employ a discrete transition path theory^{46} by identifying and weighting pathways starting from folded and ending with stretched configurations. In Fig. 3b, we show the transition network for the unbiased ensemble, where metastable sets are represented by an exemplary snapshot and enumerated by colored labels with colors matching the colors used in Fig. 3a. The arrows point in the direction of the reactive flux with their width indicating the relative importance of its transition (the thicker, the more important). For example, the transition from state 1 to 4 illustrates that between 10% and 25% of all transition pathways – connecting folded and stretched configurations – run along this transition. The main contribution is given by pathways containing folding intermediates, while structures coiled at the N-terminus (dark blue) are less important. Interestingly, pathways including misfolded configurations do not contribute (<5%) to the unfolding pathways, indicating that these states act as kinetic traps. A similar picture is obtained when applying the same analysis to the constant-force ensembles shown in Fig. 3c, although slight differences appear. For f = {5, 10} kJ mol^{−1} nm^{−1}, for instance, the transition from 1 → 2 vanishes. However, a new transition from 4 → 1 appears or rather becomes more dominant. Moreover, the most dominant pathway changes from 1 → 3 → 5 for f = {0, 2} kJ mol^{−1} nm^{−1} to 1 → 4 → 3 → 5 for f = {5, 10} kJ mol^{−1} nm^{−1}. Numerical values for all shown transitions are presented in Table 1.

# Transition | f = 0 | f = 2 | f = 5 | f = 10 |
---|---|---|---|---|

1 → 2 | 8 ± 1 | 10 ± 1 | — | — |

1 → 3 | 45 ± 3 | 42 ± 2 | 46 ± 2 | 32 ± 4 |

1 → 4 | 31 ± 3 | 34 ± 3 | 42 ± 2 | 54 ± 2 |

2 → 3 | 8 ± 1 | 10 ± 2 | 7 ± 1 | 11 ± 2 |

3 → 5 | 74 ± 1 | 72 ± 2 | 74 ± 2 | 68 ± 1 |

4 → 2 | — | — | 8 ± 1 | 10 ±1 |

4 → 3 | 20 ± 2 | 22 ± 1 | 23 ± 1 | 26 ± 4 |

4 → 5 | 10 ± 1 | 11 ± 1 | 10 ± 1 | 18 ± 3 |

Although the dihedral angles spanned by every residue represent a well-tested set of CVs,^{47} our results show that the C_{α}/C_{β} and hydrogen bond distances employed here yield an equivalently good description. The results of the study of Vitalini et al.,^{48} in which the authors explored the (unbiased) helix–coil transition for deca-alanine for different force fields by constructing MSMs from a discretized dihedral space, are in good agreement with our results; in particular when comparing the slowest implied time scales and the shortest lag time at which the implied time scales become lag time independent.

To compare the unbiased PMF of deca-alanine with previous studies, we compute the PMF along the end-to-end distance (see Fig. 2b). While its global minimum is found at 1.46 nm in ref. 37 and at around 1.4 nm in ref. 38, we find it at 1.6 nm, which agrees well when regarding the different definitions of the end-to-end distance. The general form of the PMF is in very good agreement with ref. 38, but differs from ref. 37 due to the different force field (CHARMM36 (ref. 50)) used. This difference stems from the fact that the CHARMM22/CMAP force field is over-stabilizing the alpha-helical structure.^{51}

The PMF along the first TICA component of beta-alanine (cf.Fig. 3e) reveals three stable regions corresponding to: folded structures, an intermediate, and weakly folded structures; with the latter being the most dominant region. When only considering the end-to-end distance (Fig. 3d), the integrated PMF decreases monotonically (data not shown) in agreement with the results obtained by Uribe et al.^{44} Moreover, in their study, the authors employed force-probe MD simulations separating the N- and C-termini with a constant velocity. By means of force-extension curves and hydrogen bond analyses, the authors concluded that the unfolding pathway includes two intermediates before obtaining a fully stretched peptide. We identify these intermediates with the magenta and orange colored sets shown in Fig. 3a and b. These findings are further supported by the reactive flux analysis as illustrated in Fig. 3b and c. Imposing a constant force shifts the weight of transitions pathways. Specifically, the transition 1 → 4 becomes more frequent while the transition 1 → 3 becomes less frequent with increasing bias. In other words, when applying a larger force, it is more likely to pass magenta colored states than without an applied force.

The additional minimum in the PMF for f = 10 kJ mol^{−1} nm^{−1} (lower panel Fig. 3e), however, suggests that the free energy landscape is drastically modified by the biasing force. To elucidate the mechanism causing the additional minimum, we visualize an exemplary configuration for tica_{#1} ≃ −0.7 exhibiting a low free energy. The snapshot shown in Fig. 3e illustrates that the outer residues are stretched while the inner residues are stabilized by hydrogen bonds.

- A. Nicholls, K. A. Sharp and B. Honig, Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons, Proteins: Struct., Funct., Bioinf., 1991, 11, 281–296 CrossRef CAS PubMed.
- K. A. Dill, S. Bromberg, K. Yue, H. S. Chan, K. M. Ftebig, D. P. Yee and P. D. Thomas, Principles of protein folding: a perspective from simple exact models, Protein Sci., 1995, 4, 561–602 CrossRef CAS PubMed.
- C. M. Dobson, Protein folding and misfolding, Nature, 2003, 426, 884–890 CrossRef CAS PubMed.
- G. Žoldák and M. Rief, Force as a single molecule probe of multidimensional protein energy landscapes, Curr. Opin. Struct. Biol., 2013, 23, 48–57 CrossRef PubMed.
- M. Rief, M. Gautel, F. Oesterhelt, J. M. Fernandez and H. E. Gaub, Reversible unfolding of individual titin immunoglobulin domains by AFM, Science, 1997, 276, 1109–1112 CrossRef CAS PubMed.
- M. Carrion-Vazquez, A. F. Oberhauser, S. B. Fowler, P. E. Marszalek, S. E. Broedel, J. Clarke and J. M. Fernandez, Mechanical and chemical unfolding of a single protein: a comparison, Proc. Natl. Acad. Sci. U. S. A., 1999, 96, 3694–3699 CrossRef CAS.
- M. S. Z. Kellermayer, S. B. Smith, H. L. Granzier and C. Bustamante, Folding–unfolding transitions in single titin molecules characterized with laser tweezers, Science, 1997, 276, 1112–1116 CrossRef CAS PubMed.
- C. Cecconi, E. A. Shank, C. Bustamante and S. Marqusee, Direct observation of the three-state folding of a single protein molecule, Science, 2005, 309, 2057–2060 CrossRef CAS PubMed.
- G. Hummer and A. Szabo, Free energy reconstruction from nonequilibrium single-molecule pulling experiments, Proc. Natl. Acad. Sci. U. S. A., 2001, 98, 3658 CrossRef CAS PubMed.
- D. Collin, F. Ritort, C. Jarzynski, S. B. Smith, I. Tinoco and C. Bustamante, Verification of the Crooks fluctuation theorem and recovery of RNA folding free energies, Nature, 2005, 437, 231 CrossRef CAS PubMed.
- W. J. Greenleaf, M. T. Woodside, E. A. Abbondanzieri and S. M. Block, Passive all-optical force clamp for high-resolution laser trapping, Phys. Rev. Lett., 2005, 95, 208102 CrossRef PubMed.
- J. Schönfelder, R. Perez-Jimenez and V. Muñoz, A simple two-state protein unfolds mechanically via multiple heterogeneous pathways at single-molecule resolution, Nat. Commun., 2016, 7, 11777 CrossRef PubMed.
- O. K. Dudko, G. Hummer and A. Szabo, Theory, analysis, and interpretation of single-molecule force spectroscopy experiments, Proc. Natl. Acad. Sci. U. S. A., 2008, 105, 15755–15760 CrossRef CAS PubMed.
- K. Lindorff-Larsen, S. Piana, R. O. Dror and D. E. Shaw, How fast-folding proteins fold, Science, 2011, 334, 517–520 CrossRef CAS PubMed.
- F. Noé, C. Schütte, E. Vanden-Eijnden, L. Reich and T. Weikl, Constructing the equilibrium ensemble of folding pathways from short off-equilibrium simulations, Proc. Natl. Acad. Sci. U. S. A., 2009, 106, 19011 CrossRef PubMed.
- V. A. Voelz, G. R. Bowman, K. Beauchamp and V. S. Pande, Molecular simulation of ab initio protein folding for a millisecond folder ntl9 (1-39), J. Am. Chem. Soc., 2010, 132, 1526–1528 CrossRef CAS PubMed.
- G. R. Bowman and V. S. Pande, Protein folded states are kinetic hubs, Proc. Natl. Acad. Sci. U. S. A., 2010, 107, 10890–10895 CrossRef CAS PubMed.
- N. Plattner and F. Noé, Protein conformational plasticity and complex ligand-binding kinetics explored by atomistic simulations and Markov models, Nat. Commun., 2015, 6, 7653 CrossRef PubMed.
- H. Grubmüller, B. Heymann and P. Tavan, Ligand binding: molecular mechanics calculation of the streptavidin–biotin rupture force, Science, 1996, 997–999 Search PubMed.
- H. Lu, B. Isralewitz, A. Krammer, V. Vogel and K. Schulten, Unfolding of titin immunoglobulin domains by steered molecular dynamics simulation, Biophys. J., 1998, 75, 662–671 CrossRef CAS PubMed.
- D. deSancho and R. B. Best, Reconciling intermediates in mechanical unfolding experiments with two-state protein folding in bulk, J. Phys. Chem. Lett., 2016, 7, 3798–3803 CrossRef CAS PubMed.
- Z. T. Yew, M. Schlierf, M. Rief and E. Paci, Direct evidence of the multidimensionality of the free-energy landscapes of proteins revealed by mechanical probes, Phys. Rev. E: Stat., Nonlinear, Soft Matter Phys., 2010, 81, 031923 CrossRef PubMed.
- F. Rico, L. Gonzalez, I. Casuso, M. Puig-Vidal and S. Scheuring, High-speed force spectroscopy unfolds titin at the velocity of molecular dynamics simulations, Science, 2013, 342, 741–743 CrossRef CAS PubMed.
- E. Paci and M. Karplus, Forced unfolding of fibronectin type 3 modules: an analysis by biased molecular dynamics simulations, J. Mol. Biol., 1999, 288, 441–459 CrossRef CAS PubMed.
- M. Sotomayor and K. Schulten, Single-molecule experiments in vitro and in silico, Science, 2007, 316, 1144–1148 CrossRef CAS PubMed.
- R. B. Best and G. Hummer, Protein folding kinetics under force from molecular simulation, J. Am. Chem. Soc., 2008, 130, 3706–3707 CrossRef CAS PubMed.
- S. Kumar, J. M. Rosenberg, D. Bouzida, R. H. Swendsen and P. A. Kollman, The weighted histogram analysis method for free-energy calculations on biomolecules. i. the method, J. Comput. Chem., 1992, 13, 1011–1021 CrossRef CAS.
- A. S. J. S. Mey, H. Wu and F. Noé, xtram: Estimating equilibrium expectations from time-correlated simulation data at multiple thermodynamic states, Phys. Rev. X, 2014, 4, 041018 Search PubMed.
- H. Wu, F. Paul, C. Wehmeyer and F. Noé, Multiensemble Markov models of molecular thermodynamics and kinetics, Proc. Natl. Acad. Sci. U. S. A., 2016, 113, E3221–E3230 CrossRef CAS PubMed.
- J.-H. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. D. Chodera, C. Schütte and F. Noé, Markov models of molecular kinetics: Generation and validation, J. Chem. Phys., 2011, 134, 174105 CrossRef PubMed.
- M. K. Scherer, B. Trendelkamp-Schroer, F. Paul, G. Perez-Hernandez, M. Hoffmann, N. Plattner, C. Wehmeyer, J.-H. Prinz and F. Noé, PyEMMA 2: A Software Package for Estimation, Validation, and Analysis of Markov Models, J. Chem. Theory Comput., 2015, 11, 5525–5542 CrossRef CAS PubMed.
- G. Pérez-Hernández, F. Paul, T. Giorgino, G. DeFabritiis and F. Noé, Identification of slow molecular order parameters for Markov model construction, J. Chem. Phys., 2013, 139, 015102 CrossRef PubMed.
- M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess and E. Lindahl, Gromacs: High performance molecular simulations through multi-level parallelism from laptops to supercomputers, SoftwareX, 2015, 1, 19–25 CrossRef.
- G. Bussi, D. Donadio and M. Parrinello, Canonical sampling through velocity rescaling, J. Chem. Phys., 2007, 126, 014101 CrossRef PubMed.
- M. Parrinello and A. Rahman, Polymorphic transitions in single crystals: A new molecular dynamics method, J. Appl. Phys., 1981, 52, 7182–7190 CrossRef CAS.
- B. Hess, H. Bekker, H. J. C. Berendsen and J. G. E. M. Fraaije, Lincs: a linear constraint solver for molecular simulations, J. Comput. Chem., 1997, 18, 1463–1472 CrossRef CAS.
- A. Hazel, C. Chipot and J. C. Gumbart, Thermodynamics of deca-alanine folding in water, J. Chem. Theory Comput., 2014, 10, 2836–2844 CrossRef CAS PubMed.
- S. Jo, D. Suh, Z. He, C. Chipot and B. Roux, Leveraging the information from Markov state models to improve the convergence of umbrella sampling simulations, J. Phys. Chem. B, 2016, 120, 8733–8742 CrossRef CAS PubMed.
- A. D. MacKerell, M. Feig and C. L. Brooks, Extending the treatment of backbone energetics in protein force fields: Limitations of gas-phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations, J. Comput. Chem., 2004, 25, 1400–1415 CrossRef CAS PubMed.
- W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey and M. L. Klein, Comparison of simple potential functions for simulating liquid water, J. Chem. Phys., 1983, 79, 926–935 CrossRef CAS.
- S. Röblitz and M. Weber, Fuzzy spectral clustering by PCCA+: application to Markov state models and data classification, Adv. Data Anal. Classif., 2013, 7, 147–179 CrossRef.
- C. Oostenbrink, A. Villa, A. E. Mark and W. F. VanGunsteren, A biomolecular force field based on the free enthalpy of hydration and solvation: the GROMOS force-field parameter sets 53a5 and 53a6, J. Comput. Chem., 2004, 25, 1656–1676 CrossRef CAS PubMed.
- H. J. C. Berendsen, J. R. Grigera and T. P. Straatsma, The missing term in effective pair potentials, J. Phys. Chem., 1987, 91, 6269–6271 CrossRef CAS.
- L. Uribe, J. Gauss and G. Diezemann, Comparative study of the mechanical unfolding pathways of α- and β-peptides, J. Phys. Chem. B, 2015, 119, 8313–8320 CrossRef CAS PubMed.
- We separate stretched configurations from unfolded configurations according to their end-to-end distance.
- J.-H. Prinz, B. Keller and F. Noé, Probing molecular kinetics with Markov models: metastable states, transition pathways and spectroscopic observables, Phys. Chem. Chem. Phys., 2011, 13, 16912–16927 RSC.
- A. Altis, M. Otten, P. H. Nguyen, R. Hegger and G. Stock, Construction of the free energy landscape of biomolecules via dihedral angle principal component analysis, J. Chem. Phys., 2008, 128, 06B620 CrossRef PubMed.
- F. Vitalini, A. S. J. S. Mey, F. Noé and B. G. Keller, Dynamic properties of force fields, J. Chem. Phys., 2015, 142, 02B611_1 CrossRef PubMed.
- S. Park and K. Schulten, Calculating potentials of mean force from steered molecular dynamics simulations, J. Chem. Phys., 2004, 120, 5946–5961 CrossRef CAS PubMed.
- J. Huang and A. D. MacKerell, Charmm36 all-atom additive protein force field: Validation based on comparison to NMR data, J. Comput. Chem., 2013, 34, 2135–2145 CrossRef CAS PubMed.
- R. B. Best, J. Mittal, M. Feig and A. D. MacKerell, Inclusion of many-body effects in the additive charmm protein cmap potential results in enhanced cooperativity of α-helix and β-hairpin formation, Biophys. J., 2012, 103, 1045–1051 CrossRef CAS PubMed.

This journal is © The Royal Society of Chemistry 2018 |