Time-dependent communication between multiple amino acids during protein folding

Cooperativity is considered to be a key organizing principle behind biomolecular assembly, recognition and folding. However, it has remained very challenging to quantitatively characterize how cooperative processes occur on a concerted, multiple-interaction basis. Here, we address how and when the folding process is cooperative on a molecular scale. To this end, we analyze multipoint time-correlation functions probing time-dependent communication between multiple amino acids, which were computed from long folding simulation trajectories. We find that the simultaneous multiple amino-acid contact formation, which is absent in the unfolded state, starts to develop only upon entering the folding transition path. Interestingly, the transition state, whose presence is connected to the macrostate cooperative behavior known as the two-state folding, can be identified as the state in which the amino-acid cooperativity is maximal. Thus, our work not only provides a new mechanistic view on how protein folding proceeds on a multiple-interaction basis, but also offers a conceptually novel characterization of the folding transition state and the molecular origin of the phenomenological cooperative folding behavior. Moreover, the multipoint correlation function approach adopted here is general and can be used to expand the understanding of cooperative processes in complex chemical and biomolecular systems.


Introduction
Biomolecular assembly, recognition and folding are complex processes in which building blocks, such as amino acids in proteins, search for favorable inter- or intramolecular interactions in intricate ways. [1][2][3] Cooperativity has been recognized as a key concept associated with these processes. [4][5][6] However, cooperativity in macromolecular systems is typically described at a phenomenological, macrostate level, and is broadly defined as a characteristic of processes in which intermediate states are disfavored, i.e., only the extreme states are significantly populated. Such all-or-none behavior, corresponding to switching between "on" and "off" states, is critical in regulation and signaling to avoid undesirable effects. The all-or-none character in ligand binding (receptor binding sites are either empty or fully occupied) is the basis for the Hill equation, which provides a commonly adopted measure of cooperativity. [7] The cooperativity concept in protein folding was also introduced at the macrostate level, [8] conveying that folding proceeds in a two-state, all-or-none fashion.
Such a macrostate cooperativity concept, however, does not reveal the underlying molecular mechanisms. In this regard, we note that the cooperativity between two events A and B can in general be captured by the correlation $\chi = P(A, B) - P(A)P(B)$, defined in terms of the joint probability $P(A, B)$ and the product $P(A)P(B)$ of the probabilities of the individual events: [9,10] $\chi > 0$ or $\chi < 0$ corresponds to positive or negative cooperativity, respectively. For example, when A and B refer to ligand binding events at receptor sites i and j, $\chi > 0$ indicates that the conditional probability $P(B \mid A) = P(A, B)/P(A)$ is larger than $P(B)$, i.e., ligand binding to site i enhances the binding affinity of site j relative to what it would be in isolation. Thus, the cooperativity formulated with $\chi$ is able to uncover the existence of communication between molecular events occurring at distinct sites (the term "communication" is used here only in this sense, i.e., when the correlation or cooperativity quantified by $\chi \neq 0$ is present). Owing to recent advances in experimental and computational technologies, the folding transition path, which was previously inaccessible, has now come within reach. [11][12][13][14][15][16] The folding transition path is the small fraction of an equilibrium folding trajectory in which the folding process actually takes place. The transition path thus contains, in principle, all the mechanisms of protein folding, and there must be certain concerted molecular processes that underlie the macrostate folding cooperativity.
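As a minimal sketch of this definition (not taken from the original work), the correlation $P(A, B) - P(A)P(B)$ between two binary events can be estimated from a list of joint observations. The function name `cooperativity` and the toy data are illustrative assumptions.

```python
def cooperativity(events):
    """Estimate P(A, B) - P(A) P(B) from a list of (a, b) boolean observations.

    Hypothetical helper for illustration: `a` and `b` flag whether events
    A and B occurred in a given observation (e.g., sites i and j occupied).
    """
    n = len(events)
    p_a = sum(1 for a, _ in events if a) / n
    p_b = sum(1 for _, b in events if b) / n
    p_ab = sum(1 for a, b in events if a and b) / n
    return p_ab - p_a * p_b


# Toy data (assumed, not measured): all four outcomes equally likely,
# versus two events that always occur together or not at all.
independent = [(True, True), (True, False), (False, True), (False, False)]
together = [(True, True), (False, False)]

print(cooperativity(independent))  # no cooperativity
print(cooperativity(together))     # positive cooperativity
```

A positive value means that observing A raises the probability of B above its unconditional value, which is the sense of "communication" used in the text.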
Here, we investigate folding cooperativity through the correlation $\chi$ defined with microscopic events occurring in the transition path. This is done for a number of small globular proteins displayed in Fig. 1 (see also Table S1†), whose all-atom simulations were reported by Shaw and coworkers. [17][18][19][20] Since protein folding requires the establishment of native amino-acid contacts, we choose the formation of those contacts as the relevant microscopic events. Of particular interest in the present work is the timing (early, intermediate, or late stage) at which the cooperativity sets in during the transition path. To this end, we introduce a time-dependent correlation $\chi(t)$, which probes time-dependent cooperativity or communication between amino acids. We thereby address how and when the folding process is cooperative on a molecular scale. We then argue how such microscopic cooperativity is connected to the emergence of the macrostate cooperative folding behavior.

Results
We start by surveying the folding behavior of the systems studied here. To describe our results succinctly, we mainly deal with the α-helical villin headpiece subdomain (HP-35) in the following; the results for the β-sheet WW domain (FiP35) are also included in the main text, and those for the other eight systems are presented in Fig. S1 to S8.† The folding process is monitored by the fraction of native amino-acid contacts $Q$ ($0 \le Q \le 1$), which was reported to be a good reaction coordinate of folding. [21] We computed $Q(r(t))$ for each protein configuration $r(t)$ along the trajectory (Fig. 2A), and constructed the probability distribution $P(Q)$ of the sampled $Q(r(t))$ values. The folding free energy profile is then obtained from $F(Q) = -k_\mathrm{B} T \log P(Q)$ with Boltzmann's constant $k_\mathrm{B}$ and temperature $T$ (Fig. 2B). It is observed that the system stays most of the time either in the folded or the unfolded state (Fig. 2A) and that the unfolded-state ($Q_\mathrm{u}$) and folded-state ($Q_\mathrm{f}$) minima are separated by a transition-state maximum ($Q^*$); their locations are indicated by the dashed lines (Fig. 2B). These results represent typical two-state behavior in the sense of the original, macrostate cooperativity concept.
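The construction of the free energy profile from the sampled Q values can be sketched as follows. This is an illustrative sketch only, assuming a fixed-width histogram on [0, 1]; the bin count, the choice of units with kT = 1, the function name and the toy samples are assumptions, not details of the original analysis.

```python
import math


def free_energy_profile(q_values, n_bins=20, kT=1.0):
    """Return [(bin_center, F)] with F = -kT * log P(Q) from sampled Q in [0, 1].

    Empty bins get F = +inf because P(Q) = 0 there.
    """
    counts = [0] * n_bins
    for q in q_values:
        # Clamp Q = 1.0 into the last bin.
        idx = min(int(q * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(q_values)
    profile = []
    for b, count in enumerate(counts):
        center = (b + 0.5) / n_bins
        f = -kT * math.log(count / total) if count > 0 else float("inf")
        profile.append((center, f))
    return profile


# Toy two-state sample (assumed): mostly unfolded or folded, rarely in between.
samples = [0.15] * 80 + [0.55] * 2 + [0.95] * 18
profile = free_energy_profile(samples, n_bins=10)
for center, f in profile:
    if f != float("inf"):
        print(f"Q = {center:.2f}  F/kT = {f:.2f}")
```

In this toy sample the sparsely populated middle bin shows up as a free energy maximum between two minima, which is the two-state signature described above.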
The transition path is the portion of the trajectory that starts from an unfolded configuration ($Q(r) < Q_\mathrm{u}$) and ends at a folded one ($Q(r) > Q_\mathrm{f}$) without recrossing the $Q = Q_\mathrm{u}$ line. To detect cooperativity among multiple amino acids, we introduce a time-dependent correlation,

$$\chi_{ij;kl}(t) = \langle \sigma_{ij}(0)\sigma_{ij}(t)\sigma_{kl}(0)\sigma_{kl}(t) \rangle - \langle \sigma_{ij}(0)\sigma_{ij}(t) \rangle \langle \sigma_{kl}(0)\sigma_{kl}(t) \rangle. \tag{1}$$

Here, the time $t$ is measured relative to the beginning of the transition path (i.e., $Q(r(t)) = Q_\mathrm{u}$ at $t = 0$); $\sigma_{ij}(t)$ is equal to 1 when there is a contact between amino acids i and j at time $t$, and equal to $-1$ otherwise; $\sigma_{ij}(0)\sigma_{ij}(t)$ therefore changes from 1 to $-1$ when a contact absent at time $t = 0$ is formed at time $t$; and the angular brackets denote an average over the configurations at $t = 0$. By definition, $\chi_{ij;kl}(t) = 0$ when contact formation in the (i, j) and (k, l) amino-acid pairs occurs independently. Therefore, $\chi_{ij;kl}(t) > 0$ indicates positive cooperativity between the (i, j) and (k, l) amino-acid pairs at time $t$. We also introduce $\chi(t) = (1/N) \sum_{(i,j),(k,l)} \chi_{ij;kl}(t)$, summed over all the pairs forming native amino-acid contacts and normalized by the number $N$ of those pairs, which measures the overall strength of the cooperativity present in a protein at time $t$. The time-dependent correlation $\chi(t)$, when viewed as a multipoint time-correlation function, is an analog of the dynamic susceptibility used for probing cooperative dynamics in glass-forming supercooled systems. [22][23][24] We computed $\chi(t)$ for the transition path (cyan curve in Fig. 2C) by averaging over all the transition paths identified in each system. We also computed $\chi(t)$ for the unfolded state (magenta curve in Fig. 2C) using the trajectory parts that are close to $Q = Q_\mathrm{u}$ (painted magenta in Fig. 2B). We find that, while $\chi(t)$ for the unfolded state remains small at all times, that for the transition path develops a significant peak.
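The two steps described above, locating transition paths in a Q(t) series and evaluating the multipoint correlation of eqn (1), can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: transition paths are taken to run from the last frame at or below the unfolded threshold to the first subsequent frame at or above the folded threshold, each path is stored as a list of frames holding ±1 contact indicators for the native pairs, and averages at time t run over the paths that are at least that long. All function names and the toy data are hypothetical.

```python
def transition_paths(q, q_u, q_f):
    """Return (start, end) frame indices of folding transition paths in q."""
    paths = []
    start = None  # most recent frame with Q <= q_u since the last folding event
    for i, value in enumerate(q):
        if value <= q_u:
            start = i
        elif value >= q_f and start is not None:
            paths.append((start, i))
            start = None
    return paths


def chi_matrix(paths, t):
    """Element-wise correlation of eqn (1) at time t for all contact-pair combinations.

    paths[m][t][p] is the +1/-1 contact indicator of native pair p at time t
    in transition path m (time measured from the start of each path).
    """
    usable = [path for path in paths if len(path) > t]
    m = len(usable)
    n_pairs = len(usable[0][0])
    # prods[path][p] = indicator(0) * indicator(t) for pair p
    prods = [[path[0][p] * path[t][p] for p in range(n_pairs)] for path in usable]
    mean = [sum(row[p] for row in prods) / m for p in range(n_pairs)]
    return [[sum(row[p] * row[q] for row in prods) / m - mean[p] * mean[q]
             for q in range(n_pairs)] for p in range(n_pairs)]


def chi_total(paths, t):
    """Overall correlation: double sum over pairs divided by the number of pairs."""
    chi = chi_matrix(paths, t)
    return sum(sum(row) for row in chi) / len(chi)


# Toy ensembles (assumed): two native pairs observed at t = 0 and t = 1.
synchronized = [[[-1, -1], [1, 1]], [[-1, -1], [-1, -1]]]
independent = [[[-1, -1], [a, b]] for a in (-1, 1) for b in (-1, 1)]
print(chi_total(synchronized, 1), chi_total(independent, 1))
```

In the toy data the two ensembles have the same single-contact statistics, but only the synchronized one produces off-diagonal contributions, so its overall correlation is larger.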
We confirmed that the peak indeed originates from the correlation of distinct amino-acid pairs by comparing the diagonal ($(i, j) = (k, l)$) and off-diagonal ($(i, j) \neq (k, l)$) contributions to $\chi(t)$ (Fig. S9†), denoted $\chi_\mathrm{diag}(t)$ and $\chi_\mathrm{off\text{-}diag}(t)$ in the following. Thus, the growth of the amino-acid correlation is a distinguishing characteristic unique to the transition path. Such behavior of $\chi(t)$ as a function of time closely resembles that of a microscopic measure of "thermodynamic cooperativity" versus temperature, [25] and the cooperativity described by $\chi(t)$ may be termed dynamic cooperativity. Our observation is also consistent with recent NMR measurements demonstrating that the amino acids forming key contacts in the transition state do not interact simultaneously in the denatured state. [26]
Here, a digression is useful to better understand the nature of $\chi(t) = \chi_\mathrm{diag}(t) + \chi_\mathrm{off\text{-}diag}(t)$, since a peak in $\chi(t)$ may arise for a trivial reason, i.e., simply because a number of amino-acid contacts are formed at roughly the same time (in fact, folding occurs within quite a short duration, as can be inferred from Fig. 2A). We introduce a simple random model in which amino-acid pair contact formations are assumed to occur at random, Gaussian-distributed times about the middle of the transition path. We find that $\chi(t)$ of this model exhibits a peak whose height is about 1. However, since this model does not incorporate any correlations between distinct amino-acid pairs, such a peak entirely reflects the "self" term, i.e., $\chi(t) \approx \chi_\mathrm{diag}(t) \approx 1$ and $\chi_\mathrm{off\text{-}diag}(t) \approx 0$ (Fig. 3A and B). Thus, the mere presence of a peak in $\chi(t)$ does not warrant the existence of cooperative processes. We next consider an extended model in which correlations (characterized by the correlation coefficient $r$) are imposed between the contact formation times of $n$ amino-acid pairs. This model can be implemented by using the $n$-variate Gaussian distribution. [9] (We note that $n = 1$ corresponds to the random model.) We find for the model with $r = 0.9$ that, whereas $\chi_\mathrm{diag}(t)$ remains the same as that of the random model, the peak of $\chi_\mathrm{off\text{-}diag}(t)$ increases linearly with $n$, and that the peak height of $\chi(t)$ provides a very rough estimate of the average number of correlated contact pairs (Fig. 3C to F). Thus, $\chi(t)$ conforming to $\chi_\mathrm{off\text{-}diag}(t) \gg 1$, which holds in the protein systems studied here (Fig. S9†), indeed indicates the presence of highly cooperative amino-acid contact formation.
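The Gaussian toy models described above can be explored with a short simulation. The sketch below is an assumption-laden illustration rather than the model as implemented in the original work: all n pairs are taken to be mutually correlated with the same coefficient (called `rho` in the code), which is realized with a shared-factor construction instead of a general n-variate sampler, and the contact-formation times have unit width around the mean `mu`. Function and parameter names are hypothetical.

```python
import math
import random


def sample_signs(n_pairs, rho, t, n_paths, mu=0.0, width=1.0, seed=0):
    """Sample indicator(0) * indicator(t) for each pair in each synthetic path.

    Contacts are absent at t = 0 and form at Gaussian times tau_p with pairwise
    correlation rho (0 <= rho <= 1), so the product is +1 before tau_p, -1 after.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_paths):
        shared = rng.gauss(0.0, 1.0)  # factor common to all pairs in this path
        row = []
        for _ in range(n_pairs):
            own = rng.gauss(0.0, 1.0)
            tau = mu + width * (math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * own)
            row.append(1 if t < tau else -1)
        rows.append(row)
    return rows


def chi_diag_offdiag(rows):
    """Return the diagonal and off-diagonal parts of the overall correlation."""
    m, n = len(rows), len(rows[0])
    mean = [sum(r[p] for r in rows) / m for p in range(n)]
    diag = off = 0.0
    for p in range(n):
        for q in range(n):
            cov = sum(r[p] * r[q] for r in rows) / m - mean[p] * mean[q]
            if p == q:
                diag += cov
            else:
                off += cov
    return diag / n, off / n


# Evaluate both models at the middle of the path (t = mu), where the peak sits.
for rho in (0.0, 0.9):
    diag, off = chi_diag_offdiag(sample_signs(5, rho, t=0.0, n_paths=20000))
    print(f"rho = {rho}: diag = {diag:.2f}, off-diag = {off:.2f}")
```

For jointly Gaussian formation times evaluated at their common mean, the sign correlation of two pairs is (2/π) arcsin(rho), so under these assumptions the off-diagonal part is expected to be close to (n − 1)(2/π) arcsin(rho), growing linearly with n, while the diagonal part stays near 1.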
Interestingly, we find that the time at which the amino-acid cooperativity attains its maximum corresponds to the time when the system crosses the transition state. This can be seen not only in Fig. 2C but also in the corresponding figures for the other systems, in which the average time $t^*$ at which the transition state is reached (i.e., $Q(t^*) = Q^*$ with $Q(t) = \langle Q(r(t)) \rangle$) is indicated by the vertical dashed line. This implies that the transition state can be characterized as the state in which the amino-acid cooperativity is maximal. To further corroborate this observation, we plotted $\chi(t)$ as a function of $Q(t)$ with $t$ as a parameter. The resulting $\chi(Q(t))$ profile is shown and compared with the free energy profile $F(Q)$ in Fig. 4A and B. We find that $\chi(Q(t))$ closely traces $F(Q)$ not only in the transition-state region ($Q = Q^*$), but also over the whole $Q$ range ($Q_\mathrm{u} \le Q \le Q_\mathrm{f}$) in which it is defined (Pearson's correlation coefficient is $R = 0.93$; corresponding results for the other systems are shown in Fig. 4C, D and in Fig. S10†). This is a nontrivial result since $\chi(Q(t))$ is purely a dynamic quantity, and it provides evidence that the macrostate, thermodynamic cooperativity (brought about by the presence of the transition-state barrier) is connected to the microscopic, dynamic cooperativity (characterized by $\chi(t)$).
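The comparison between the parametric profile and the free energy profile can be sketched as follows. This is only an illustration of the bookkeeping, assuming that the free energy is tabulated on an increasing Q grid and evaluated at each Q(t) by linear interpolation; the function names and the toy numbers are hypothetical, and the interpolation scheme is an assumption rather than a detail given in the text.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def interpolate(grid, values, q):
    """Linearly interpolate a tabulated profile; clamp outside the grid."""
    if q <= grid[0]:
        return values[0]
    if q >= grid[-1]:
        return values[-1]
    for k in range(1, len(grid)):
        if q <= grid[k]:
            w = (q - grid[k - 1]) / (grid[k] - grid[k - 1])
            return (1 - w) * values[k - 1] + w * values[k]


def profile_correlation(q_of_t, chi_of_t, q_grid, f_grid):
    """Correlate the time-parametrized correlation with F evaluated at Q(t)."""
    f_at_q = [interpolate(q_grid, f_grid, q) for q in q_of_t]
    return pearson_r(chi_of_t, f_at_q)


# Toy profiles (assumed): a single barrier and a correlation that peaks with it.
q_grid, f_grid = [0.0, 0.5, 1.0], [0.0, 2.0, 0.0]
q_of_t = [0.0, 0.25, 0.5, 0.75, 1.0]
chi_of_t = [0.1, 1.1, 2.1, 1.1, 0.1]
print(profile_correlation(q_of_t, chi_of_t, q_grid, f_grid))
```

Because Pearson's coefficient is insensitive to offsets and overall scale, it tests whether the two profiles share the same shape along Q, which is the sense in which one profile "traces" the other.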
The element-wise correlation $\chi_{ij;kl}(t)$ at $t = t^*$ (Fig. 2D) quantifies the strength of communication between individual amino-acid pairs. To facilitate its visual understanding, we present in Fig. 5A network representations of protein configurations during the transition path. In the upper section, the vertices (yellow circles) refer to amino acids and the edges (black lines) represent formed native amino-acid contacts; the folding process implies an increase in the number of black edges. In the lower section, the vertices and edges are colored cyan when $\chi_{ij;kl}(t) > 0.3$ for the amino acids in the (i, j) and (k, l) pairs (this criterion was chosen since such large amino-acid correlation is barely observed in the unfolded state, as shown in Fig. S11†). The growth of the amino-acid correlation toward the transition state and its subsequent diminution are clearly visible in the network graphs.
Further insights into the amino-acid cooperativity, which are smeared out in $\chi(t)$ after summing over all the pairs, can be gained through the analysis of individual $\chi_{ij;kl}(t)$ elements. For example, $\chi(t)$ can be decomposed into main-chain and side-chain contributions by examining whether main-chain or side-chain contacts are mainly involved in the (i, j) and (k, l) amino-acid pairs, and we find that the magnitudes of those contributions are comparable (Fig. S12†). The peak time $t^*_{ij;kl}$ for each $\chi_{ij;kl}(t)$ element can also be introduced. We observe that the $t^*_{ij;kl}$ values are dispersed around the average peak time $t^*$ (Fig. S13†). Again, this is a dynamical analog of the thermodynamic transition, in which residue-dependent variations were identified in the transition midpoint temperature. [27]

Discussion
The fact that the folding transition state can be characterized as the state of maximum cooperativity is, to the best of our knowledge, a novel view. However, it is in fact quite natural once the existence of such cooperativity is recognized. This is because the protein configurations exhibiting the maximum internal correlations will be the ones with the lowest probability of forming spontaneously. This new view in turn implies that the transition-state barrier height should be an increasing function of the strength of the cooperativity. This is indeed the case, as demonstrated in Fig. 4, which connects the microscopic cooperativity (characterized by $\chi(t)$) and the macrostate two-state folding cooperativity (brought about by the presence of the transition-state barrier in $F(Q)$).
Our current view of protein folding owes much to the funneled energy landscape perspective. [28][29][30] This perspective asserts that, in order to resolve Levinthal's paradox, [31,32] folding should not be a random conformation search; it must be energetically biased. However, the landscape perspective does not provide a clear picture of the transition-state barrier responsible for the emergence of cooperative two-state folding: the barrier is ascribed to a "mismatch" between the energy gain and the entropy loss at the middle of the funneled landscape. [33] As we argued here, the folding transition state comes out naturally as the state of maximum microscopic cooperativity once one realizes that amino-acid contact formation is not a random process, but occurs on a multiple-interaction basis. In this sense, the new view of the folding transition state represents an extension of the landscape perspective.
While native contacts are of primary interest in protein folding studies, non-native contacts can in principle contribute to the time-dependent amino-acid cooperativity discussed in the present work. This is because $\chi_{ij;kl}(t)$ defined in eqn (1) is invariant under the sign change $\sigma_{ij}(t) \to -\sigma_{ij}(t)$: $\sigma_{ij}(0)\sigma_{ij}(t)$ changes from 1 to $-1$ not only when a contact absent at time $t = 0$ ($\sigma_{ij}(0) = -1$) is formed at time $t$ ($\sigma_{ij}(t) = 1$), but also when a contact present at time $t = 0$ ($\sigma_{ij}(0) = 1$) is broken at time $t$ ($\sigma_{ij}(t) = -1$). Therefore, if there exist a number of non-native contacts that are highly populated in the unfolded state but are broken during the folding process, they would contribute to $\chi(t)$. For the systems studied here, highly populated non-native contacts were not detected, so we cannot illustrate such a possibility. Nevertheless, it is important to realize that amino-acid cooperativity does not necessarily refer to the formation of contacts; the breaking of contacts can also occur cooperatively.
Finally, we present a possible experimental method for detecting the cooperative contact formation of multiple amino acids by using a kind of Kirkwood relation that connects fluctuations and response. For this purpose, we introduce a two-point time correlation function $F(t) = \langle q(t) \rangle$ with $q(t) = (1/N) \sum_{(i,j)} \sigma_{ij}(0)\sigma_{ij}(t)$. This function describes how, on average, the native contacts are formed as folding proceeds. The multipoint function $\chi(t)$ capturing the time-dependent cooperativity can be written as the fluctuations around the average folding dynamics: $\chi(t) = N\langle \delta q(t)^2 \rangle$, in which $\delta q(t) = q(t) - \langle q(t) \rangle$. Let us introduce a susceptibility defined as the response of $F(t)$ to a perturbation $\varphi$ (such as a change in temperature): $\chi_\varphi(t) = \partial F(t)/\partial\varphi$. It was demonstrated for dielectric and density fluctuations in glass-forming systems that $\chi_\varphi(t)^2$ exhibits essentially the same dynamics as $\chi(t)$. [34] Since the average function $F(t)$ is intimately related to the "shape" function of the transition path that is now experimentally accessible, [35] measuring $\chi_\varphi(t)$ by varying experimental conditions will provide experimental evidence of the microscopic cooperativity in protein folding.
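The identity between the multipoint correlation and the fluctuations of q(t) follows directly from the definitions; the short derivation below spells out the intermediate step. It is written with $\chi$ for the multipoint correlation and $\sigma_{ij}$ for the contact indicator of eqn (1), and the shorthand $a_{ij}(t)$ is introduced here only for compactness (it is not notation from the original text).

```latex
% a_{ij}(t) is a shorthand introduced for this derivation only.
\begin{align}
a_{ij}(t) &\equiv \sigma_{ij}(0)\,\sigma_{ij}(t), \qquad
q(t) = \frac{1}{N}\sum_{(i,j)} a_{ij}(t), \\
N\langle \delta q(t)^2 \rangle
  &= N\left[\langle q(t)^2 \rangle - \langle q(t) \rangle^2\right] \\
  &= \frac{1}{N}\sum_{(i,j),(k,l)}
     \left[\langle a_{ij}(t)\,a_{kl}(t) \rangle
     - \langle a_{ij}(t) \rangle \langle a_{kl}(t) \rangle\right] \\
  &= \frac{1}{N}\sum_{(i,j),(k,l)} \chi_{ij;kl}(t) = \chi(t).
\end{align}
```

The factor of N arises because the double sum carries a prefactor of $1/N^2$ from the two factors of q(t), whereas the overall correlation is normalized by a single factor of N.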

Conclusions
Cooperativity in complex systems is typically described at a macrostate level, and its characterization in molecular terms has been very challenging. In the present work, we have identified time-dependent cooperativity among multiple amino acids concealed in the folding transition path, and argued how it might be connected to the macrostate cooperative behavior. The use of multipoint correlation functions is essential in this regard, since the cooperative nature of fluctuating processes occurring at two distinct sites cannot be disclosed by conventional, two-point correlation functions. Since cooperativity pervades complex biological phenomena (the most notable example being allostery [36]), the multipoint correlation function approach is expected to bring out novel microscopic insights into those complex processes.

Author contributions
S.-H. C. and S. H. designed the research, conducted the research, and wrote the manuscript.

Conflicts of interest
There are no conflicts to declare.