Maziar
Heidari
ab,
Vahid
Satarifard
ac and
Alireza
Mashaghi
*a
aLeiden Academic Centre for Drug Research, Faculty of Mathematics and Natural Sciences, Leiden University, Leiden, The Netherlands. E-mail: a.mashaghi.tabari@lacdr.leidenuniv.nl
bMax Planck Institute for Polymer Research, Mainz, Germany
cMax Planck Institute of Colloids and Interfaces, Potsdam, Germany
First published on 22nd August 2019
Physics of protein folding has been dominated by conceptual frameworks including the nucleation–propagation mechanism and the diffusion–collision model, and none address the topological properties of a chain during a folding process. Single-molecule interrogation of folded biomolecules has enabled real-time monitoring of folding processes at an unprecedented resolution. Despite these advances, the topology landscape has not been fully mapped for any chain. Using a novel circuit topology approach, we map the topology landscape of a model polymeric chain. Inspired by single-molecule mechanical interrogation studies, we restrained the ends of a chain and followed fold nucleation dynamics. We find that, before the nucleation, transient local entropic loops dominate. Although the nucleation length of globules is dependent on the cohesive interaction, the ultimate topological states of the collapsed polymer are largely independent of the interaction but depend on the speed of the folding process. After the nucleation, transient topological rearrangements are observed that converge to a steady-state, where the fold grows in a self-similar manner.
There have been a number of numerical and analytical studies on stretching globular homopolymers under a controlled constant force or a constant unwinding velocity.28–33 However, they all lack information on how the internal organization and structure of the polymer changes during folding to the final compact globular state. Here, we provide the first circuit topological mapping of a folding process and search for the determinants of fold topology during the folding process and of the final “native” structure. We ask whether and how constraints on the end of the molecules and cohesive interactions affect the fold topology. While the latter is important for understanding biomolecular folding, the former is also technically important as single-molecule pulling tools are emerging as key technologies for resolving folding processes (formation and disruption of contacts).34–36 These techniques work by applying constraints on the polymer ends which raises a critical question whether the constraint itself affects the folding process.
![]() | (1) |
![]() | (2) |
![]() | (3) |
The equilibrium positions of the first and last monomer are given by x10 = [−0.5L, 0, 0]T and xN0 = [xN0(t), 0, 0]T, respectively. The chain is initially equilibrated in the fully stretched (or coil) configuration for 104τ with xN0(t) = 0.496L. This corresponds to the initial distance between the two ends as xN0 − x10 = 0.996L (see Fig. 1 right panel or Fig. S2 in the ESI†). Then the position of the last monomer decreases by folding velocity vf, i.e. xN0(t) = 0.496L − vft. The folding process continues until the end-to-end distance reaches lee = 0.025L. At this point, it is ensured that the size of the formed globule is smaller than the end-to-end distance (see Fig. 1 right panel). Since the chain does not have bending elasticity, the persistence length is one monomer size, i.e. lp = σ. For all cases, the averages and the error bars are calculated over 10 independent simulation runs. The initial equilibration process with different random seeds ensures that the initial conditions of the folding processes of all trajectories are independent.
![]() | ||
Fig. 1 (left panel) Folding process of a chain whose initial condition is in coil configuration (FCC) is shown with three successive snapshots generated by VMD.37 (middle panel) The folding pathway of the FCC and RC in a topological space (SPX) is shown. (right panel) Nucleation and folding sequence of a restrained chain (RC) is illustrated at different end-to-end lengths (lee) when the folding speed is vf = 1 × 10−3σ/τ. The chains start to nucleate at lee = 0.726L and the region in proximity to the nucleating globule is marked by a dashed circle. The cohesive strength between the monomers is ε = 2.0kBT in both the FCC and RC. |
We define contact between two non-bonded monomers if their relative distance is <1.5σ. We analyzed the topology of the chains during the folding processes by categorizing the intra-chain contact arrangements as defined previously. In this so-called circuit topology approach, the pair-wise arrangement of contacts from a partially or fully folded polymer chain is categorized into different arrangement types namely, series (S), parallel (P) and crossing (X). As it is shown in Fig. 2, such an arrangement is analogous to the arrangement of elements in an electrical circuit. The topological fraction of each category is calculated by the number of loop pairs in that category divided by the total number of loop pairs.
![]() | ||
Fig. 2 Illustration of loops in three different topological states, series (S), parallel (P) and cross (X). The contacts are displayed by two red circles. |
It is worth mentioning that when the folding process occurs rapidly the monomers do not have enough time to relax and they form local separate compartments within the globule (Fig. 1 left panel) whereas such compartments disappear in a slow folding process as the monomers can diffuse within the globule and relax the structure (Fig. 1 right panel).
The effect of the out-of-equilibrium collapsing process on the internal structures is investigated by comparing the circuit topology of the resulting folded chains at different collapsing speeds. As shown in Fig. 4 for two cohesive strengths ε = 1.0, 2.0kBT, when the collapsing speed increases to vf = 10−2σ/τ, the topological fractions of the RC internal loops approach those of the FSC (solid and dashed lines). As displayed in Fig. 4d, the number of contacts within the globules having a larger cohesive strength, ε = 2.0kBT, is higher than when ε = 1.0kBT. This is expected since in the collapsed globule with ε = 2.0kBT, the attractive forces are larger, and it is more probable to find two monomers within the contact region (1.5σ). Furthermore, the number of contacts of the RC is approximately equal to that of the FSC and does not change by varying the folding velocities. This implies the necessity of the topological arrangement as a piece of extra information to distinguish between different globular structures having the same number of contacts.
![]() | ||
Fig. 4 Topology fractions of series (a), parallel (b) and cross (c) loops of the restrained globules against folding speeds vf. In each panel, the circles and squares correspond to cohesive strength ε = 1.0kBT and 2.0kBT, respectively and the solid and dashed black lines represent the topological fractions obtained from freely collapsed chains (see Fig. 2). The number of contacts in the globules is shown in panel (d). |
The evolution of the internal structures of the globules against the end-to-end distance lee is shown in Fig. 5 for slow (panel a) and high (panel b) folding speeds. In both folding speeds when the chain is in the elongated regime (lee/L > 0.8), transient (entropic) loops appear along the chain, thus occupying the serial topology. Due to the chain elongation, the transient loops do not collide leading to zero fractions of parallel and cross topology. At the nucleation length (lnuc), the chain starts to nucleate and form its primary collapsed structure. The topological states of the loops within the nuclei are different from the transient loops leading to sudden drops in the serial topological fraction and rises in the fractions of the parallel and cross topology. This event confirms that within the nucleus, the dominant topological classes of the loops are parallel and cross; this is similar to the the topological changes associated with the formation of secondary structures (i.e., alpha helices and beta-sheet) during protein folding processes. Then the nucleus grows into a larger globule as the chain's moving end approaches the other end. During the growth process, while the monomers are added into the globule with the rate of the folding velocity vf, the topological states of the globule are preserved. Additionally, as it is shown in Fig. 3 for the slow folding speed, within the statistical error bars, the topological fractions of the final structures of the collapsed chains converge to the same values. This is interesting because although for the chain having a larger cohesive strength, the onset of the nucleation is earlier, this does not affect the final topological states of the globules. The self-similar circuit of the loops and the corresponding sizes after the nucleation events are also shown in Fig. S5 of the ESI.†
![]() | ||
Fig. 5 Topology fractions of the loops against end-to-end distance lee for different cohesive strengths (ε) and when the folding velocity is vf = 0.001σ/τ (a) and vf = 0.01σ/τ (b). |
To analyze the statistics of the loop size, we computed the probability of contacts Pc(s) as a function of monomer distance s (loop length) along the contour length of the chain (Fig. 6). A universal decay with scaling Pc ∼ s−1 is observed within the intermediate distance interval (approximately 10–100σ) for all globules, resembling the fractal-like structure.41 The contact probability distribution of the FCC and the FSC also follows the same decay (see Fig. 6 in the ESI†). It has been proven that for a network of interconnected chain, the configurational weight of the loops obeys power-law decay ∼s−α whose exponent α is universal and it is dependent on the topology of the loops, i.e. the number of emerging strands from the loop.43 Such a universal property has been used to study analytically the RNA translocation through nanopores44 and thermodynamics of RNA molecules close to folding transition.45
The size of the loops formed in each topological state can be quantified by calculating the contact orders. The contact order of two loops with topology of i is calculated by , where Ni is the number of double loops, which are categorized in the topological state i, and ΔL1i and ΔL2i are the monomer separation of each loop and L is the chain contour length. The contact orders of each topological sets in the course of folding are shown in Fig. 7 when the cohesive strength is ε = 2kBT and folding speed is vf = 0.001, 0.01σ/τ. As has also been found for the compact structure of the chain under spherical confinement,13 there is a universal trend i.e., COs < COp < COx. The reason is that the loop pairs in S topology are locally formed along the chain and the contour distance between the loop pairs is not important in the contact order. However the loop pairs in P and X are the result of nonlocal contacts of the chain segment that subsequently leads to the formation of large loops. The non-equilibrium effect of folding speed is also reflected by the decrease in the size of loops of all topological states. This is the consequence of the compartmentization of monomers in the globule structure and the emergence of large local domains in which the loops are mainly formed by the contacts between the monomers having a small distance along the contour length (see Fig. 1).
To specify the nucleation length at which a stable globule is formed and then grows, we computed the number of contacts during the folding process for all trajectories. The average 〈nc〉 and variance 〈nc2〉 − 〈nc〉2 of ten trajectories are plotted in Fig. 8. For all cohesive strengths, in the extended regime (lee/L > 0.8) the transient entropic loops are formed along the chain, and since the looping probability is proportional to the length L − lee, there is an increasing trend as the ends of the chain are closing (Fig. 8a). Such an increasing trend is observed in all trajectories and thus the variance becomes negligible (Fig. 8b). When lee approaches the vicinity of the nucleation length lnuc, the chain starts to nucleate and subsequently, there is an abrupt rise in the number of contacts. The uncertainty of the nucleation length in each trajectory is captured by the large change in the contact number variance. After the nucleation lee > lnuc, as the nucleus starts to grow to a larger globule, the number of internal contacts increases linearly with 1 − lee/L while the variance of the contacts among the trajectories becomes negligible.
To determine the nucleation length, the contact variations are fitted with Gaussian curves (solid curves in Fig. 8b). The means and standard deviations which are considered as the error bars of the Gaussian fits are plotted in Fig. 9.
As is described in the following, we build a thermodynamic model involving the internal energy and entropy of the chain during the folding and then we try to investigate the nucleation events and explain the observed trend in Fig. 9. It is supposed that we have a freely rotating LJ chain of length L whose cohesive inter-monomer interaction is ε. The chain is initially stretched along the x-direction, and the two ends of the chain are fixed. Then, one of the chain's ends approaches the other end along the x-axis by a constant velocity vf. If the distance between both ends reaches lnuc, the chain starts to nucleate. For a very slow speed, the process can be considered as quasi-static and close to equilibrium and then we can write the equilibrium free energy difference between the nucleated and nonnucleated states as follows:
![]() | (4) |
ΔE = −ε(γVNG + γSNG2/3). | (5) |
To calculate the entropy difference in the nucleation event (ΔS in eqn (4)), it is required to formulate the configuration entropy of the chain. In this respect, we neglect the chain cohesive interaction and assume it as a self-avoiding chain (SAC) instead, i.e. non-bonded monomers only interact through the repulsive part of the LJ potential (i.e. ε = kBT, the cut-off radius is set at Rij = 21/6σ and the potential is shifted by ε). Then we use Jarzynski's equality46,47 to relate the free energy difference between the chain's fully stretched state and the state at which the chain's end-to-end distance is lee to the work required to transit the chain between the two states. We use the same constraining harmonic potential as in eqn (3). The work required to contract the chain with a constant velocity vf is calculated by48
![]() | (6) |
e−ΔF/kBT = 〈e−W/kBT〉. | (7) |
![]() | (8) |
In the slow contraction speed, i.e., the quasi-static process, the chain remains close to the equilibrium state during the contraction process. In this case, one can neglect second and higher cumulants. Therefore, the equilibrium free energy is equal to the thermal average of work. Since in the SAC, the interaction of the monomers is short-ranged and repulsive, only the chain's configurational entropy contributes to the free energy, ΔF = −TΔS. The entropy of the fully stretched chain is zero because, in this state, the chain has only one configuration and we set it as the reference, i.e., F(lee) = −TS(lee). The free energy profile (FEP) of the SAC when vf = 10−3σ/τ is shown inFig. 11. Additionally, we calculate the FEP when the chain is being stretched. As shown in the inset, the FEPs in both directions of the contraction and stretching are approximately equal (there is maximum 20kBT deviation for the interval lee/L < 0.1 which is not under examination). This implies that under the velocity vf = 10−3σ/τ, the contracting/stretching process is quasi-static and reversible for lee/L > 0.1. It is worth mentioning that the contraction time scale τc = 1000τ is also comparable with the slowest relaxation time obtained from the Rouse model, τp = ζN2σ2/3π2kBT.49–51 In the Rouse model, the chain is assumed to be ideal and given the length of N = 1000 and the friction coefficient of the surrounding solvent ζ = 10−1 (this coefficient is obtained from the ratio of the monomer mass to the damping coefficient used in the Langevin thermostat, i.e. m/λ39), the slowest relaxation time becomes τp ≈ 3370τ. In addition to the SAC, we calculated the free energy of an ideal chain (IC). I.e. ULJ = 0 for non-bonded pairs. In the extended regime lee/L > 0.5, the excluded volume interaction is negligible. Thus FEPs of the IC and SAC follow similar trends. However, when the chain's ends approach closer lee/L ≲ 0.4, the self-avoiding interaction becomes more apparent, and FEPs deviate. We approximate the FEPs by the following polynomial functions:
![]() | (9) |
In the nucleation event, the entropy of the chain is given by . After the nucleation event, we assume that the tails of the chain are stretched, i.e. NG = L − lee, and there is only the contribution of nucleus translational entropy along the chain
. Thus, we can rewrite the free energy difference (eqn (4)) as
![]() | (10) |
Fig. 12 shows the free energy change of the chain in the nucleation events against lee for different strengths of cohesive interaction. As it is expected from the classical nucleation theory, there is a free energy barrier in the extended regime due to the interplay between the internal energy of nuclei and the chain's configurational entropy change upon nucleation. The nucleation lengths are obtained by letting and plotted in Fig. 9. Similar to the simulation results, there is a monotonic increase in the nucleation length as ε increases. However, quantitatively, there is a mismatch which originated from the simplification and assumption we used to obtain the chain entropy and nucleation energy.
In our analysis, the strength of the interaction energies appears as the main determinant of the folding pathway in the space of fold topologies. Importantly, we find that the interaction strength determines the onset of nucleation events. Our observation has also been justified using a thermodynamic model which is built on the chain internal energy and entropy. The interaction energy affects the native state topology as well. We noticed that by increasing the interaction energies, the total number of contacts increases and the series arrangement is slightly promoted in the final folded state.
We note that our model ignores the complexity of linear chains found in nature including biological molecules (e.g., proteins and nucleic acids). We however believe that despite its simplicity, the model enables us to reveal a generic topological picture of a chain folding process. The coarse-grained polymer model we used in this work can represent a mean-field picture of a protein in which the monomers of the chain represent a group of residues with uniform interaction. The protein folding problem can be seen at different levels of coarse-graining; at each level, the onset of folding often refers to the emergence of a nucleus which is composed of either large unstructured loops or partially formed secondary structures.52 In the future, our study can readily be extended to include more complex models. Furthermore, our topological analysis was limited primarily to the frequency changes in basic topological arrangements i.e., X, P and S. Circuit topology matrices however include additional information which could be relevant.6 Finally, our predictions however are theoretical and thus call for experimental validation. The experimental validations of our findings will be considered in our future studies.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c9cp03175h |
This journal is © the Owner Societies 2019 |