Open Access Article
Patrick Senet*a, Adrien Guzzob, Patrice Delaruea, Christophe Laforgea, Gia G. Maisuradzec, Jean-Marie Heydela, Fabrice Neiersa and Adrien Nicolaïa
aLaboratoire Interdisciplinaire Carnot de Bourgogne ICB, UMR 6303, Université Bourgogne Europe, CNRS, F-21000 Dijon, France. E-mail: psenet@ube.fr; Fax: +33 (0)3 80396132; Tel: +33 (0)3 80396130
bINSERM U1903 CAPS, Université Bourgogne Europe, Dijon, France
cBaker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14853, USA
First published on 16th December 2025
Proteins populate dynamic ensembles, yet how temperature and mutations reshape these ensembles remains poorly understood. We introduce a local entropy metric that assigns each residue a Shannon entropy based on a graph-derived map of accessible substates, providing a continuous measure of structural complexity across folded, unfolded, and intrinsically disordered states. In molecular dynamics simulations of the fast-folding gpW protein, the average local entropy exhibits a sharp transition near the melting point. Residue-specific entropy curves cluster into distinct unfolding categories and reveal that the apparent unfolding transition depends on the spatial scale used to describe amino-acid environments. We further show that local entropy captures features that differ markedly from other residue-level measures of structural fluctuations, such as the accessible volume (and the associated packing entropy), which is correlated with B-factors and primarily reflects the hydrophobic effect. In simulations of α-synuclein, an intrinsically disordered protein, local entropy varies strongly along the sequence at physiological temperature and resembles that of gpW near its melting point. Parkinson's-disease mutations in α-synuclein locally reduce entropy while also perturbing distant regions, including the P1, P2 and NAC segments implicated in fibril formation. These results highlight how temperature and subtle perturbations, such as single-residue changes, remodel conformational ensembles. Local entropy correlates with NMR observables and provides a generalizable framework for quantifying disorder, with broad potential applications beyond protein science.
In contrast, unfolded proteins, intrinsically disordered proteins (IDPs), or intrinsically disordered regions (IDRs)18–21 cannot be accurately represented by a single structure, as a large ensemble of accessible states contributes significantly to their free energy. The classification of IDRs and IDPs primarily relies on functional features related to recurring local sequence properties – such as linear motifs or molecular recognition elements associated with intermolecular interactions.22 Computational analyses of large conformational ensembles of IDRs and IDPs focus on clustering full-length conformations based on structural parameters.23,24 Because amino-acid composition strongly influences both protein disorder25,26 and dynamics,27 global structural features of IDRs are often correlated with their sequence.28 At the local scale, however, unfolded proteins and IDPs display dynamic structural organization, as shown by NMR and single-molecule spectroscopy.29
In reality, the distinction between the structural representations of folded and disordered proteins is not as clear-cut as often assumed. Even folded proteins exist as ensembles of conformations fluctuating around a well-defined structural state, whereas disordered proteins populate a much broader and more heterogeneous ensemble. A single representation of a folded protein structure does not adequately reflect the dynamic landscape of its local conformational substates.30 A unifying characteristic of protein conformations is the presence of short-range structural order around each amino acid, which varies continuously with thermodynamic parameters such as temperature and pH. This continuum is evident during thermal denaturation, where the local structural changes occur progressively with temperature. Below 240 K,31 local structural fluctuations are harmonic and can be represented by stationary motifs at short and long distances, enabling the identification and definition of recurrent patterns6–17 such as secondary structural elements.6–8 At physiological temperatures, proteins behave as surface-molten solids: surface-exposed (hydrophilic) residues undergo thermal fluctuations, while core (hydrophobic) regions retain relatively stable structures.32 Above the melting temperature—or in intrinsically disordered states—all residues fluctuate within dynamic micro-environments resembling molecular liquids, as demonstrated by NMR and single-molecule studies.30
In this work, we show that analyzing ensembles of short-range structures around each amino acid under varying conditions enables the quantification of local disorder along the protein sequence. This is achieved by computing a local entropy derived from a protein graph (PG). Replacing molecular geometries with graph representations, in which amino acids are nodes and edges denote geometric relationships, has long been employed in protein science33–35 to facilitate the detection of recurrent local patterns in databases of single-structure folded proteins. Since then, graph-based models have become powerful tools for analyzing protein structure, dynamics and function.36–52
Accurately computing the entropic contribution to the free energy of protein conformations remains a long-standing challenge.53–67 Since the early days of protein science,53–56 various methods have been developed to approximate this contribution by decomposing entropy into local components,65,68–70 based on local structural properties—for example, the NMR order parameter S2,71–73 atomic coordinate fluctuations within the (quasi)-harmonic approximation,53,55,56 backbone and side-chain torsion angles,70 or the amino acid packing fraction.65 In the present work, we do not aim to calculate this thermodynamic quantity directly. Instead, we employ Shannon entropy to quantify the degree of intrinsic disorder in the protein backbone, beyond local conformational descriptors such as Ramachandran angles or residue accessible volume. In this framework, local entropy serves as a measure of structural diversity within the interaction network surrounding each residue. Our approach differs from previous site-resolved entropy estimations by explicitly incorporating residue–environment interactions, thereby providing a more integrative and context-sensitive description of local conformational variability during protein folding and unfolding. Finally, by comparing our definition of local entropy with the packing entropy65—which is known to correlate strongly with site-resolved entropies derived from quasi-harmonic approximations and B factors65—we demonstrate both the differences and the complementarity between these two measures of structural entropy.
The present approach is applicable to both folded and unfolded proteins, including intrinsically disordered proteins and intrinsically disordered regions. Entropy plays a central role in IDP function, binding and aggregation.63,66 We first examined how local entropy evolves during thermal unfolding using all-atom molecular dynamics (MD) simulations of the W protein of bacteriophage lambda (gpW, PDB ID: 2L6Q).74 This 62-residue polypeptide adopts a folded structure comprising two α-helices (residues 4–19 and 40–54) stacked above a β-hairpin (residues 23–28 and 31–36),74 and folds via a downhill mechanism.74,75 Experimental local unfolding curves for gpW were inferred from temperature-dependent chemical shifts of atomic probes.76 These NMR measurements revealed abrupt chemical shift changes near the melting temperature, supporting their interpretation as local heat-induced denaturation curves.76 This behavior, along with the weak cooperativity of gpW's folding/unfolding transition, was successfully reproduced by all-atom MD simulations.76 In earlier work, we showed that these local denaturation curves could be captured using a coarse-grained two-state (folded/unfolded) model based on Cα–Cα pseudobond angles.77,78 Here, we demonstrate that local entropy varies along the sequence in the folded state, including within secondary structural elements, and acts as an order parameter for the unfolding phase transition, correlating well with experimental Cα chemical shifts.
Second, using coarse-grained MD simulations,79 we show that local entropy is heterogeneously distributed along the sequence of α-synuclein, a prototypical IDP. To enable comparison between the simulation results for local entropy and experimental data for wild-type α-synuclein, we selected two residue-level disorder descriptors derived from chemical shifts: the 13Cα secondary chemical shift80,81 and the Chemical Shift Z-score used to assess order/disorder.82 A detailed analysis of the three descriptors reveals both common features—when averaged over the three regions of the protein—and key differences in their local behavior. Finally, we show that local entropy is sensitive enough to detect subtle changes in the conformational ensemble induced by single-point mutations (A30P, E46K, and A53T) in α-synuclein, including long-range effects potentially linked to aggregation.83–85
For each conformation in the protein ensemble, we construct a corresponding PG. In each PG, we define a protein subgraph (sPG) centered on a selected node by including all nodes within a graph distance D from it. For D = 1, the sPG comprises the immediate neighbors of the central node, along with all edges connecting them. For D = 2, second-nearest neighbors in the PG are also included, thereby extending the structural context. In this study, we analyze sPGs with D = 1 and D = 2, which represent the local micro-environment of a residue, encompassing its first and second neighbors on the graph, respectively. Throughout the text, each graph node is identified by its sequence position and the corresponding amino acid name.
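The construction of a PG and of its sPGs can be sketched with NetworkX. This is a minimal illustration under stated assumptions, not the authors' implementation: the edge rule used here (a plain Cα–Cα distance cutoff, set to 8 Å) and the function names `build_protein_graph` and `protein_subgraph` are hypothetical. Only `nx.ego_graph`, which returns the subgraph induced by all nodes within a given graph distance of a center node, is a genuine library call.

```python
import math

import networkx as nx


def build_protein_graph(ca_coords, cutoff=8.0):
    """Build a graph with one node per residue (1-based sequence position).

    Assumption: two residues are linked when their C-alpha atoms lie within
    `cutoff` angstroms; the edge criterion of the paper may differ.
    """
    graph = nx.Graph()
    n_res = len(ca_coords)
    graph.add_nodes_from(range(1, n_res + 1))
    for i in range(n_res):
        for j in range(i + 1, n_res):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                graph.add_edge(i + 1, j + 1)
    return graph


def protein_subgraph(graph, residue, depth=1):
    """Return the sPG: nodes within graph distance `depth` of `residue`,
    together with every edge of the PG that connects them."""
    return nx.ego_graph(graph, residue, radius=depth)


# Toy chain: five residues on a straight line, 5 angstroms apart,
# so only consecutive residues are in contact.
coords = [(5.0 * k, 0.0, 0.0) for k in range(5)]
pg = build_protein_graph(coords)
spg_d1 = protein_subgraph(pg, 3, depth=1)
spg_d2 = protein_subgraph(pg, 3, depth=2)
```

In this toy chain, the D = 1 subgraph of residue 3 contains residues 2, 3 and 4, while the D = 2 subgraph extends to the whole chain.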
The local entropy Sk is then calculated for each residue at position k along the sequence, based on the ensemble of its sPGs, using the Shannon entropy definition:
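$$S_k = -\sum_{i=1}^{n_k} p_i \ln p_i \qquad (1)$$
where pi is the probability of the i-th distinct sPG observed for residue k in the ensemble and nk is the number of distinct sPGs (accessible micro-environments) of that residue.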
As predicted by the Boltzmann formula, the maximum value of Sk is constrained by the number of its accessible micro-environments in the ensemble, and equals:
$$S_{k,\max} = \ln(n_k) \qquad (2)$$
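The bound in eqn (2) can be recovered in one line from the concavity of the logarithm (Jensen's inequality) and the normalization of the probabilities pi. The derivation below is a standard argument added for illustration; it is not taken from the original text.

```latex
\begin{align}
S_k &= \sum_{i=1}^{n_k} p_i \ln\frac{1}{p_i}
     \;\le\; \ln\!\left(\sum_{i=1}^{n_k} p_i \cdot \frac{1}{p_i}\right)
     = \ln(n_k), \\
S_k &= \ln(n_k) \iff p_i = \frac{1}{n_k} \quad \text{for all } i .
\end{align}
```

The maximum is therefore reached only when all nk micro-environments are equally populated.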
The maximum value of nk is the number N of protein conformations in the ensemble. In the main text, we report normalized entropy values Sk/Smax, where Smax = ln(N). The local entropy Sk, defined by eqn (1), is dimensionless. In information theory, entropy is typically expressed in natural units (nats). When interpreted as a thermodynamic quantity with kB = 1, a value of S = 1 corresponds to an entropy of 1.987 cal mol−1 K−1, equivalent to a free-energy contribution of −0.616 kcal mol−1 at T = 310 K, for example.
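The unit conversion quoted above is simple arithmetic and can be checked directly. The helper names below are hypothetical; the only inputs are the gas constant R = 1.987 cal mol−1 K−1 and the temperature stated in the text.

```python
import math

R_CAL = 1.987  # gas constant in cal mol^-1 K^-1


def entropy_in_cal(s_nats):
    """Convert a dimensionless entropy (nats, kB = 1) to cal mol^-1 K^-1."""
    return s_nats * R_CAL


def free_energy_contribution_kcal(s_nats, temperature_k):
    """Return -T*S in kcal mol^-1 for an entropy given in nats."""
    return -temperature_k * entropy_in_cal(s_nats) / 1000.0


dg_310 = free_energy_contribution_kcal(1.0, 310.0)  # about -0.616 kcal/mol
s_max = math.log(100_000)  # normalization for 100 000 snapshots, about 11.513
```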
In a sPG, permuting the node labels alters the amino acid sequence of the protein; thus, graph homomorphisms are not treated as equivalences. The probabilities pi used in the entropy calculation are obtained by identifying automorphic sPGs—that is, structurally identical subgraphs across conformations. The local entropy values of Sk are computed using the NetworkX Python library.86 For simplicity, we denote local entropy as S instead of Sk in the remainder of the text.
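A minimal sketch of the entropy evaluation for one residue is given below in plain Python. It assumes that two sPGs count as the same micro-environment only when their labeled node sets and edge sets coincide, which is one reading of the label-preserving equivalence described above. The function names `spg_key` and `local_entropy` and the toy ensemble are hypothetical, and the actual NetworkX-based implementation may differ.

```python
import math
from collections import Counter


def spg_key(nodes, edges):
    """Hashable, label-preserving signature of an sPG.

    Node labels are sequence positions, so relabeled (merely isomorphic)
    subgraphs give different keys.
    """
    return (frozenset(nodes), frozenset(frozenset(edge) for edge in edges))


def local_entropy(keys):
    """Shannon entropy (in nats) of the distribution of distinct sPGs."""
    counts = Counter(keys)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Toy ensemble: the sPG of residue 3 in four conformations.
ensemble = [
    spg_key({2, 3, 4}, [(2, 3), (3, 4)]),
    spg_key({2, 3, 4}, [(3, 2), (4, 3)]),  # same sPG, edges written in reverse
    spg_key({2, 3, 4, 7}, [(2, 3), (3, 4), (3, 7)]),
    spg_key({2, 3, 4, 7}, [(2, 3), (3, 4), (3, 7), (4, 7)]),
]
s_k = local_entropy(ensemble)              # probabilities 1/2, 1/4, 1/4
n_k = len(set(ensemble))                   # three distinct micro-environments
s_norm = s_k / math.log(len(ensemble))     # normalized by ln(N), here N = 4
```

In this toy case Sk = 1.5 ln 2, which lies below the bound ln(nk) = ln 3 because the three micro-environments are not equally populated.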
In the main text, local entropy values were computed from 100 000 randomly selected snapshots from each MD trajectory and from self-avoiding random walk (SAW) ensembles of the same size. As illustrated in the SI using wild-type α-synuclein as an example (Fig. S2), this sampling strategy provides sufficient statistical accuracy while substantially reducing computational cost compared to full-ensemble analyses.
Fig. 1 Local entropy S at a graph distance D = 1 (upper panel) and D = 2 (lower panel), shown as a function of the amino acid sequence of gpW at various temperatures and calculated from all-atom MD simulations for R = 8 Å. Normalized values of the local entropy S/Smax are presented with Smax = ln(100 000) = 11.513 (eqn (2)). Lines are provided as a guide to the eye. The thick black solid line corresponds to results at T = 280 K (native, folded state). Symbols indicate residue properties: red for positively charged amino acids, blue for negatively charged ones, black for glycine, triangles for residues in β-sheets, and empty black circles for all other cases. Gray shading highlights regions corresponding to α-helices. Results at the experimental melting temperature Tm = 330 K are shown as a thick red line. Thin red dotted lines represent results at T = 325 K and 335 K. Gray dotted lines show results at increasing temperatures, from bottom to top: T = 280, 285, 290, 295, 305, 310, 315, 320, 325, 335, 340, 355, 380 K. The black dashed line represents the local entropy computed from a SAW ensemble representing a chain of 62 atoms.
We begin by examining the variation of local entropy S as a function of amino acid position at graph distance D = 1. In the native state (280 K), S is highly heterogeneous along the sequence, including within well-defined secondary structures represented in Fig. S3. The lowest entropy values are observed in the central region of the sequence, corresponding to residues involved in β-sheets. A clear trend emerges: S increases from the center toward the termini of the chain. The highest local entropy is observed for residue T54, located at the interface between the two helices within the three-dimensional structure of gpW, as shown in Fig. S3. Notably, if the folded protein were represented by a single static structure, S would be exactly zero for all residues. The results in Fig. 1 underscore the necessity of representing even folded proteins as conformational ensembles, rather than single snapshots.30
The spatial variations of S in the folded state provide complementary insight into protein dynamics, focusing on the network of intramolecular interactions rather than on local positional fluctuations (as captured by B-factors), bond vector order parameters (such as the NMR-derived S2)71–73 or local accessible volume (such as the packing fraction65). For comparison, we computed the packing entropy SP, which has been demonstrated to correlate strongly with entropies derived from the quasi-harmonic approximation or from B factors,65 at all temperatures along the amino acid sequence of gpW using structures extracted every nanosecond from all trajectories. The packing entropy of each amino acid, SP(k), is computed for every protein structure from its packing fraction (see ref. 65 and the SI). The results shown in Fig. S4 reveal that SP represents a distinct quantity. As expected from their small size, glycine residues contribute the most, and the packing entropy values of G20, G30, G55, and G62 are of the same order of magnitude. In contrast, the local entropy S depends more strongly on the position of the residue within the sequence than on the chemical nature of the amino acid itself. For instance, residues G30 and G55 display markedly different entropy values in the folded state at T = 280 K (0.11 and 6.27, respectively, as shown in Fig. 1). Local entropy is a property of the interaction network and depends on the identity of the interacting residues, whereas packing entropy measures the locally accessible volume around a residue and is less sensitive to the chemical nature of the amino acids forming the surface of this volume. Both quantities contribute to the overall entropy of the protein.
For the majority of residues, local entropy S increases progressively with temperature up to 320 K, i.e. below Tm, as shown in Fig. 1. Beyond this point, a notable shift occurs: the curves of S at 325 K, at the melting temperature Tm = 330 K, and at 335 K form a clearly distinct cluster, separated from those at 320 K and 340 K. This abrupt change is a hallmark of the unfolding phase transition. By contrast, the packing entropy SP varies little with temperature, and no global unfolding phase transition is observed in Fig. S4. This can be understood as follows: according to the Gaussian model of an ideal polymer and to molecular dynamics simulations, the unfolded state is a compact structure, so that for most residues the accessible volume is not very different in the folded and unfolded states.
At Tm and above (i.e., in the unfolded state), local entropy becomes approximately uniform along the sequence, except at the N-terminus (residues 1–3) and C-terminus (residues 60–62), where S remains lower (Fig. 1 and S3). Interestingly, this pattern closely resembles the entropy profile derived from ensembles of self-avoiding random walks of the same chain length. In particular, the SAW ensemble also exhibits reduced entropy at the chain ends, suggesting a common geometrical origin linked to reduced connectivity or spatial constraint near termini.
The low values of local entropy S observed at the N- and C-termini can be interpreted through combinatorial considerations. In the absence of specific interactions—such as in SAW models where only steric constraints are present—or at high temperatures in MD simulations where the potential energy landscape exerts minimal influence, the conformational ensemble is governed primarily by geometrical and entropic factors. In such regimes, the local entropy reflects the combinatorial diversity of local environments. According to the Gaussian polymer model, the probability of forming a contact between two residues i and j decreases sharply with their sequence separation dseq = |i − j|.87 As a result, residues located at the chain termini have fewer opportunities to establish contacts, since they can interact with residues only on one side of the sequence. This asymmetry limits the number of distinct subgraph configurations that can form around terminal residues, reducing the diversity of micro-environments and therefore leading to lower local entropy values compared to residues in the interior of the chain.
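The end effect described above can be illustrated with a toy calculation. The sketch assumes that the contact probability of an ideal Gaussian chain scales as dseq^(−3/2), the standard loop-closure scaling; the text itself only states that the probability decreases sharply with dseq. The function `contact_propensity` is hypothetical and returns an unnormalized sum, not a quantity computed in the paper.

```python
def contact_propensity(i, n_res, min_sep=1, exponent=1.5):
    """Unnormalized expected number of contacts of residue i (1-based).

    Assumption: the contact probability between residues i and j scales as
    |i - j| ** (-exponent), as for an ideal Gaussian chain (exponent 3/2).
    """
    return sum(
        abs(i - j) ** (-exponent)
        for j in range(1, n_res + 1)
        if abs(i - j) >= min_sep
    )


n_res = 62  # chain length of gpW
terminal = contact_propensity(1, n_res)   # partners on one side only
central = contact_propensity(31, n_res)   # partners on both sides
```

A terminal residue accumulates roughly half the propensity of a central one, which limits the number of distinct subgraphs that can form around it.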
Second, we examined the variation of local entropy at D = 2 in Fig. 1. The profile of S along the sequence exhibits features similar to those observed at D = 1, with the main difference being a marked increase in entropy values across all residues and a more uniform local entropy in the C-terminal region beyond residue L50. This increase arises from the fact that, by construction, the subgraphs at D = 2 contain more nodes and edges, and are thus more sensitive to conformational fluctuations over time.
At the melting temperature Tm, the local entropy reaches values close to the maximum possible entropy Smax for nearly all residues, except at positions 19 and 30, where the entropy remains low—consistent with their strong structural constraints in the native state at D = 1. At 380 K, the local entropy remains uniform across the central region of the sequence, similarly to the behavior observed at D = 1. Notably, the gap between the curve at Tm and those at 5 K above or below is smaller at D = 2, indicating that the temperature dependence of S is smoother at this scale.
Each sPG can be interpreted as a microstate of the local environment of a residue, defined by its network of connections. At equilibrium and sufficiently high temperatures, the local entropy for each residue is expected to approach a maximum given by the Boltzmann formula (eqn (2)). In this regime, the variations of Sk and of the number nk of accessible microenvironments (i.e., distinct sPGs) along the sequence should be highly correlated.
To quantify this, we computed the Pearson correlation coefficient r between the vectors Sk and nk as functions of the residue index k, at various temperatures, for gpW at D = 1 (Fig. S5). The correlation remains high across all temperatures, with a modest discontinuity at Tm. Specifically, at D = 1, the average value of r is 0.915 for T < Tm and 0.954 for T > Tm. At D = 2 (Fig. S6), the correlation is even stronger, increasing from 0.970 at 280 K to 0.992 at 380 K. These results show that, to a very good approximation, local entropy reflects the number of accessible structural states around each residue in the graph representation.
In addition, the Pearson correlation between the average size of each sPG (measured by the number of peptide bonds it includes) and S at D = 1 displays an abrupt shift at the melting temperature and converges at high temperatures to values comparable to the correlation between Sk and nk (see Fig. S5). In the unfolded state, the number of accessible sPGs nk becomes nearly uniform across the sequence, except at the terminal residues. As also observed for the global entropy curve S, the transition is less pronounced at D = 2, as illustrated in Fig. S6.
The curves S(T) extracted from Fig. 1 at D = 1 are shown for selected residues in Fig. 2a, and for all residues in the SI (Fig. S7). For most residues, S(T) exhibits a sharp change near the melting temperature (Tm = 330 K), which supports the interpretation of local entropy as a local order parameter of the unfolding phase transition.
Fig. 2 Local heat denaturation curves from MD of gpW and those extracted from NMR chemical shift data δ (in ppm) from ref. 76 for selected residues of gpW. The curves S(T)/Smax extracted from Fig. 1 at D = 1 are shown in panel (a). The local entropy differences, represented at D = 1 (panel (b)) and D = 2 (panel (e)), are ΔS = S(T) − Smin, where Smin is the minimum value of S between 280 K and 380 K. The chemical shift data are shown as Δδ = δ(T) − δmin if δ increases with T, and as Δδ = δmax − δ(T) if δ decreases with T (panel (c)). The local packing entropy differences, represented in panel (d), are ΔSP = SP(T) − SP,min, where SP,min is the minimum value of SP between 280 K and 380 K. Solid lines are provided as a guide to the eye. A red dashed curve represents the average of the curves over all residues for each quantity. The values of the Pearson correlation coefficient r and the Jensen–Shannon distance JS computed between each local entropy curve and the average curve are shown.
For each residue, the curve S(T) is compared to the average over all residues, denoted 〈S(T)〉. Although the local entropies are not strictly independent (since residues are connected), the average serves as a useful global entropic descriptor. The S(T) curves can be qualitatively classified into four categories based on their deviation from 〈S(T)〉, as illustrated in Fig. 2a. The similarity between the local entropy curve and the average curve is quantified using both the Pearson correlation coefficient (r) and the Jensen–Shannon distance (JS), computed using the SciPy library.88 The correlation coefficient measures the linear relationship between two curves and reaches its maximum value, i.e., 1, when the two curves are identical up to a positive multiplicative constant and an additive offset, whereas the JS distance quantifies the dissimilarity between the distributions of values along the two curves and reaches its minimum value, i.e., 0, for identical distributions.
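The two similarity measures can be sketched with SciPy as follows. The way each curve is turned into a distribution is an assumption here: `jensenshannon` rescales each input vector to unit sum, so the values along the temperature axis are treated as unnormalized probabilities. The helper `compare_to_average` and the synthetic curves are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import pearsonr


def compare_to_average(curve, average):
    """Return (Pearson r, Jensen-Shannon distance) between two S(T) curves."""
    curve = np.asarray(curve, dtype=float)
    average = np.asarray(average, dtype=float)
    r, _ = pearsonr(curve, average)
    # jensenshannon normalizes both inputs to sum to 1 (natural-log base).
    js = jensenshannon(curve, average)
    return float(r), float(js)


# Synthetic curves on a temperature grid (values are illustrative only).
temps = np.array([280.0, 300.0, 320.0, 330.0, 340.0, 380.0])
average = 1.0 / (1.0 + np.exp(-(temps - 330.0) / 5.0))  # sharp transition
linear = (temps - 270.0) / 120.0                         # gradual increase

r_same, js_same = compare_to_average(average, average)
r_lin, js_lin = compare_to_average(linear, average)
```

A curve identical to the average gives r = 1 and JS = 0, whereas the gradually increasing curve keeps a positive r but a nonzero JS distance.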
The first category includes 41 out of 62 residues whose local entropy curves S(T) are highly correlated with the average entropy 〈S(T)〉 (r ≥ 0.96). For this category, the unfolding transition is clearly cooperative. The local entropy curves can be further divided into two classes. In the first class (JS ≤ 0.06), 20 residues display local entropy profiles that closely follow the average curve, such as A13, K21, and K28 (Fig. 2a). This class also includes E5, L7, A9, A10, L14, M18, R22, A24, T25, V26, Q27, F35–A37, S39, V40, and L43 (Fig. S7). The second class (0.09 ≤ JS ≤ 0.28) comprises 21 residues whose S(T) curves undergo sharper transitions than the global curve due to their very low entropy in the folded state. Examples include H15 and G30 (Fig. 2a), as well as R11, A12, D16, T19, G20, V23, R31, R32–E34, T38, S41, D42, and K44–E49 (Fig. S7). Notably, G30 exhibits a multistep transition with three distinct plateaus, indicating complex unfolding behavior. Except for K21, R22, A37, and T38, residues in the first category are located within secondary structure elements.
The second category includes residues located in secondary structure elements whose S(T) curves are highly correlated with the average entropy curve (r ≥ 0.9), but that increase approximately linearly with temperature up to the unfolded state (0.07 ≤ JS ≤ 0.11), as illustrated by D29 in Fig. 2a. Additional examples include E6, L50, E51, M56, and T57 (Fig. S7).
The third category includes residues exhibiting more complex S(T) profiles that remain significantly correlated with the average entropy curve (0.83 ≤ r ≤ 0.96 and 0.07 ≤ JS ≤ 0.16). Examples include A8, R11, V52, Q53, and G55, which are located within secondary structure elements, and Q58–G62 in the C-terminal region (Fig. S7). These patterns may reflect the presence of intermediate states during unfolding.
Finally, the fourth category consists of five residues showing little to no variation in S(T) across the temperature range (0.11 ≤ JS ≤ 0.16), with low correlation (r ≤ 0.83) between the local and average entropy curves. Representative examples are V2 and T54 (Fig. 2a), the latter located at the interface between the two helices (Fig. S3). Other members include M1, R3, and Q4 (Fig. S7). These residues, mostly located at the termini, appear structurally disordered or only weakly affected by unfolding.
For all residues, the number of links in their sPGs decreases significantly above the melting temperature, and the probability distribution of the sPGs becomes flatter at T > Tm, as shown at 280 K (folded), 330 K (Tm), and 380 K (unfolded) in Fig. S8 for selected residues.
The first category of S(T), which accounts for more than two-thirds of the amino acids, corresponds to distinct behaviors of the sPG probability distribution pi in the folded state. In the first class of this category, where S(T) agrees closely with the average S(T) (JS ≤ 0.06), the most probable sPG has a probability of about 0.3–0.4, and only around ten sPGs have significant probabilities. In the second class of this category (0.09 ≤ JS ≤ 0.28), the most probable sPG has a probability close to 1, and only about three sPGs occur with non-negligible probabilities. This indicates that the links of these residues—such as H15 and G30—are very stable, even at the melting temperature.
The local entropy clearly reveals distinct behaviors among residues, depending on the stability of their local environments. This indicates that the unfolding transition is not a strict all-or-none, two-state process. In particular, extremely low values of S in the native state—such as those observed for residues H15 and G30—highlight key residues that contribute significantly to the stability of the folded structure. Both of these residues with low local entropy are located within secondary structural elements.
The S(T) curves extracted from Fig. 1 at D = 2 are shown in the SI (Fig. S9). As previously observed in Fig. 1, the unfolding transition is much less pronounced at D = 2, except for a few residues—mainly those located in secondary structure elements, notably residues M18–V23, T36–T38, and S41. As a result, the average curve of S does not exhibit a clear phase transition at D = 2 but instead shows a gradual increase with temperature up to Tm, beyond which S becomes nearly constant.
Previous NMR studies have revealed a variety of denaturation behaviors in the chemical shifts of gpW during unfolding, as measured by 13Cα nuclei.76 Given that the local entropy is a combinatorial property derived from the short-range environment of each Cα atom, we investigate here the relationship between this theoretical local order parameter and the variations in 13Cα chemical shifts observed during protein unfolding.
It is well established that 13Cα chemical shifts are sensitive indicators of secondary structure.89 More specifically, their values are known to vary with the Ramachandran angles89 or with the local backbone curvature θ and torsion γ.78 More generally, chemical shifts are highly sensitive to local structural segments or motifs.90 As temperature increases, the conformational space explored by atoms in the (θ, γ) map expands.78 However, the connection between a theoretical local order parameter and an experimental observable depends on both the nature of the probe and its resolution.91 As previously shown, changes in the shape and size of the (θ, γ) region explored by an atom during unfolding may lead to chemical shift variations too subtle to be detected experimentally.78 Nevertheless, since the number of microstates visited by an atom correlates with the accessible surface in this map, a link between local entropy and chemical shifts is expected.
To compare δ(T) and S(T), two fundamentally different physical observables, we define variations relative to their extreme values over the temperature range. For the local entropy computed from MD, we define ΔS(T) ≡ S(T) − Smin, where Smin is the minimum value observed between 280 and 380 K. For the chemical shift, we define Δδ(T) as either δ(T) − δmin or δmax − δ(T), depending on the monotonicity of δ(T) across the temperature range (see the caption of Fig. 2). Specifically, we use the first expression if δ(T) increases with T, and the second if it decreases, so that Δδ(T) is non-negative in both cases.
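These definitions can be written as two short functions. The sketch follows the convention of the Fig. 2 caption, in which Δδ = δmax − δ(T) for shifts that decrease with T. How the direction of δ(T) is decided is an assumption: it is judged here from the first and last temperatures only. The function names and numerical values are hypothetical.

```python
def delta_entropy(s_values):
    """Return S(T) - S_min for a list of entropies ordered by temperature."""
    s_min = min(s_values)
    return [s - s_min for s in s_values]


def delta_shift(shifts):
    """Return a non-negative chemical-shift variation.

    Assumption: the trend is judged from the first and last temperatures.
    Increasing shifts use delta(T) - delta_min; decreasing shifts use
    delta_max - delta(T), as in the Fig. 2 caption.
    """
    if shifts[-1] >= shifts[0]:
        low = min(shifts)
        return [d - low for d in shifts]
    high = max(shifts)
    return [high - d for d in shifts]


# Illustrative values only (ppm), ordered by increasing temperature.
decreasing = delta_shift([58.0, 57.5, 56.0])
increasing = delta_shift([52.0, 52.5, 54.0])
entropy_change = delta_entropy([1.0, 1.5, 3.0])
```

With this convention both profiles start at zero and grow with temperature, so their shapes can be compared directly.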
The ΔS(T) curves computed at D = 1 (Fig. 2b) and D = 2 (Fig. 2e) are compared to the Δδ(T) profiles (Fig. 2c) for the selected residues shown in Fig. 2. Results for all residues are provided in the SI (Fig. S10 and S11 for ΔS(T) at D = 1 and Δδ(T), respectively).
In Fig. 2, we observe a remarkable similarity between ΔS(T) and Δδ(T) at D = 1, with the exception of residues K21 and D29. Notably, the multi-step transition observed in S(T) for residue G30 is also reflected in the δ(T) data. Similar correlations are found across the full set of residues, as shown in the SI. These observations confirm that, at D = 1, the different classes of S(T) curves closely reproduce the δ(T) behaviors previously reported by Sborgi et al.76
In contrast, ΔS(T) computed at D = 2 does not exhibit a clear phase transition (see Fig. S12), and diverges significantly from the corresponding Δδ(T) measurements. This discrepancy is evident when comparing panels c and e in Fig. 2, as well as Fig. S11 and S12 in the SI. This result is expected, as the chemical shift of a nucleus is primarily influenced by its immediate local environment. The comparison of experimental δ(T) data with entropy calculations at two different graph distances suggests that the cooperative features of protein unfolding resemble a phase transition only within a limited interaction range. This conclusion holds for the complete set of residues, as illustrated in Fig. S10–S12.
It is relevant to compare the local packing entropy ΔSP(T) curves extracted from the Fig. S4 with the local packing entropy curves ΔS(T) at D = 1 (Fig. 2b and S12) and NMR chemical shifts variations (Fig. 2c and S11). The curves ΔSP(T) are shown in Fig. 2d for selected residues and in Fig. S13 for all residues. As it can be anticipated from Fig. S4, most of the local curves ΔSP(T) do not show a significant variation and consequently the average entropy curve 〈ΔSP(T)〉 does not show clearly a phase transition, in contrast with the local entropy ΔS(T) (Fig. 2b and S10) and NMR chemical shifts Δδ (Fig. 2c and S11).
Nevertherless, a zoom on the average curve is shown in Fig. S14 where we observed a small increase of the 〈ΔSP(T)〉 with T between 280 K and 380 K and a small jump between 325 K and 335 K which can be interpreted as due to the unfolding process. This small jump is due to only a few residues for which the local entropy does indeed show a transition as observed for residue A13 in Fig. 2d. Other residues in this category are L7, A9, A10, L14, L17, A24, V26, L43, Y46, L50. Their local packing entropy curve is correlated to average one with Pearson correlation coefficient r > 0.7 but also deviates significantly as the Jensen–Shannon distance is high 0.16 ≤ JS ≤ 0.46. All these residues are hydrophobic.
Two residues show a transition anti-correlated with 〈ΔSP(T)〉, i.e., the entropy of these residues decreases with T, as shown in Fig. 2d for G30 and in Fig. S13 for G20. Glycine residues tend to cluster with polar amino acids regarding change in contacts related to entropy variations.64 The decrease in SP for G20 and G30 with increasing T may be related to the hydrophobic effect, similarly to the increase in entropy observed for the hydrophobic residues mentioned above.65 Packing entropy is indeed correlated with the accessible surface area of residues.65 In the early days of protein science, Janin proposed a hydrophobicity scale by computing a free energy ΔGt = RT ln f, which can be interpreted as the free energy of transferring a residue from the protein interior to the surface, where f is the ratio of the buried to accessible molar fractions of that residue across a set of proteins.93 The values of ΔGt, computed using the ProtScale server,92 are compared with the variation of ΔSP = SP(380 K) − SP(280 K) in Fig. S15. Residue G30 is predicted to be more stable at the protein surface, and one can see that the change in packing entropy contributes to this stability. In contrast, residue L43, which exhibits a marked increase in entropy at high temperature (Fig. S13), has a positive value of ΔSP ≡ SP(380 K) − SP(280 K) in Fig. S15. Representative snapshots of gpW at 280 K and 380 K, shown in Fig. S16, confirm the relocation of G30 toward the protein interior and the movement of L43 toward the surface at high temperature. Packing entropy is therefore a powerful tool for quantifying the hydrophobic effect from static structures.
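As a worked illustration of the relation ΔGt = RT ln f, the sketch below evaluates the transfer free energy for a given buried-to-accessible ratio f. The function name, the default temperature of 298.15 K, the choice of kJ mol⁻¹ units, and the values of f used in the usage note are assumptions made for illustration; they are not taken from the Janin scale or from the ProtScale server.

```python
import math

# Molar gas constant in kJ mol^-1 K^-1.
GAS_CONSTANT = 8.314462618e-3


def transfer_free_energy(f, temperature=298.15):
    """Return RT ln f in kJ/mol.

    f is the ratio of buried to accessible molar fractions of a residue
    type; the default temperature is an assumption for illustration.
    """
    if f <= 0:
        raise ValueError("f must be strictly positive")
    return GAS_CONSTANT * temperature * math.log(f)
```

With this sign convention, a residue type found more often buried than exposed (f > 1) has a positive ΔGt, a residue type found more often exposed (f < 1) has a negative ΔGt, and f = 1 gives ΔGt = 0. For example, `transfer_free_energy(2.0)` and `transfer_free_energy(0.5)` have equal magnitude and opposite sign.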
Clearly, packing entropy quantifies a contribution to the total entropy that is distinct from local entropy, as it is directly related to accessible surface area and thus to the hydrophobic effect. This explains why it does not correlate with local entropy or with NMR chemical-shift variations, as shown by comparing panels b, c, and d of Fig. 2 and S9, S13, and S12. The two entropies do not probe the same microstates. For packing entropy (which correlates with B-factors65), the microstates correspond to local configurations of the accessible volume around a residue, without distinguishing the chemical nature of the surrounding amino acids. Local entropy, in contrast, accounts for the interaction network, which varies even for the same accessible volume depending on the identity and arrangement of neighboring residues. The number of microstates—and its variation with temperature—is therefore larger for local entropy, which also correlates with NMR chemical shifts that depend on nuclear interactions and structural organization. Both entropies contribute to the total entropy of the protein.
The monomeric form of α-synuclein is well known to be intrinsically disordered in solution. Fig. 3 displays the local entropy S at T = 310 K computed at distances D = 1 and D = 2 from coarse-grained MD simulations for the wild-type α-synuclein monomer, along with values obtained for a SAW ensemble of the same chain length. On average, the local entropy of α-synuclein is close to that of the SAW ensemble: the ratios 〈S〉/〈Ssaw〉 are 0.856 and 0.979 at D = 1 and D = 2, respectively.
Fig. 3 Local entropy S at a graph distance D = 1 (lower curves) and D = 2 (upper curves) is shown as a function of the amino acid sequence of α-synuclein at physiological temperatures, calculated from coarse-grained MD trajectories at R = 8 Å. Normalized values of the local entropy, S/Smax, are presented with Smax = ln(100 000) = 11.513 (eqn (2)). Lines are provided as a guide to the eye. Solid and dashed lines represent local entropies computed from MD and SAW structural ensembles, respectively. Solid red symbols indicate positively charged amino acids, while blue symbols indicate negatively charged ones. Diamond symbols mark the positions of missense mutations at A30, E46 and A53. Light gray shading indicates the P1 (residues 36–42) and P2 (residues 45–57) segments, while dark gray shading indicates the NAC region (residues 61–95).
It is informative to compare these values with the corresponding ratios between the local entropy of the unfolded state of gpW and that of a SAW ensemble of the same chain length: 〈S〉/〈Ssaw〉 = 1.139 at D = 1 and 1.046 at D = 2. These ratios, which are greater than one, reflect the presence of enhanced fluctuations in the unfolded globular protein, due to transient interactions between amino acids, which are less frequent in more extended structures like SAWs or intrinsically disordered proteins. As shown by comparing Fig. 1 and 3, the variation of S in gpW at Tm relative to its SAW reference closely resembles the entropy profile of α-synuclein at T = 310 K, for both D = 1 and D = 2. At Tm, gpW exhibits 〈S〉/〈Ssaw〉 = 0.730 at D = 1 and 0.996 at D = 2. This suggests that the conformational disorder of a globular protein at its melting temperature can closely match that of an intrinsically disordered protein under physiological conditions. Further investigation is required to determine whether this observation reflects a general principle applicable to other protein systems. Indeed, these observations may also arise from the different models employed (all-atom versus coarse-grained force fields) and from the different time scales involved (microsecond versus millisecond effective time scales).
Among the different regions, the NAC domain appears as the most disordered, while the C-terminal domain is comparatively more structured, particularly at D = 1. Specifically, the entropy ratios 〈S〉/〈Ssaw〉 at D = 1 are 0.893, 0.907, and 0.762 for the N-terminal, NAC, and C-terminal regions, respectively. The same trend holds at D = 2 with values of 0.982, 0.992, and 0.965.
At D = 2, the local entropy is nearly uniform along the sequence, except at the termini and for a few specific residues. Distinct local minima of S are observed at residues E13, G25, G31, K45, T59, E110, and D119, with shallower minima in the NAC region around residues G68, V74, and E83. Many of these residues also exhibit low entropy at D = 1, particularly E13, K45, T59, E83, E110, and D119.
As evident from both D = 1 and D = 2, the minima at K45 and T59 lie at the centers of the low-entropy segments K43–E46 and K58–E61, which separate P1 from P2 and P2 from the NAC region, respectively. The P1 and P2 regions were recently shown to play a crucial role in α-synuclein aggregation, in addition to the well-known NAC region.83 Maxima of local entropy at D = 1 are observed in these regions at hydrophobic residues L38 (P1), V52–A53 (P2), and I88–A89 (NAC). Interestingly, the residues forming native contacts in α-synuclein fibril-like dimers with probability greater than 0.9 in our previous simulations on a millisecond time scale are located near this flexible segment in the NAC (residues G86–F94).85 Further work will be required to establish whether a clear relationship exists between the local entropy of the monomeric state and aggregation propensity.
Interestingly, residues with the lowest entropy values tend to be charged. As for gpW, the Pearson correlation coefficient between the local entropy Sk and the number of distinct micro-environments nk visited by each residue is high (P = 0.954). Low S values can therefore be interpreted as reflecting a reduced number of accessible microstates, due to strong electrostatic interactions or hydrogen bonding. For example, deep entropy minima at K45, K60, and E110 correspond to the sampling of only 7217, 6349, and 3282 distinct sPGs, respectively, out of a maximum of 100 000, compared to the average 〈nk〉k = 17 800 and the maximum, 40 633, observed for L38.
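The relation between the local entropy Sk and the number of distinct micro-environments nk can be illustrated with a short sketch. It assumes that each sampled structure has already been reduced to a hashable label identifying the sPG of residue k (the labeling scheme is hypothetical here), that probabilities are estimated as frequencies over the sampled structures, and that the maximum of 100 000 corresponds to the number of structures in the ensemble, so that Smax = ln(100 000).

```python
import math
from collections import Counter


def local_entropy(substates):
    """Return (S, n) for one residue.

    substates: sequence of hashable labels, one per sampled structure,
    each identifying the micro-environment (sPG) of the residue.
    S is the Shannon entropy (natural logarithm) of the label frequencies
    and n is the number of distinct labels observed.
    """
    counts = Counter(substates)
    total = len(substates)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy, len(counts)


# Upper bound reached when every one of the 100 000 structures is distinct.
S_MAX = math.log(100_000)
```

A residue locked in a single micro-environment gives S = 0 and n = 1, whereas a residue visiting a different sPG in every structure gives S = Smax. Between these limits, S depends on how evenly the n visited sPGs are populated, which is why S and n correlate strongly without being equivalent.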
In the absence of experimental measurements of the temperature-dependent chemical shift variation Δδ(T) for α-synuclein, which would provide direct evidence for a link between Δδ and S in IDPs, we instead consider two residue-level disorder descriptors based on chemical shifts measured at a single temperature and compare them to the function 1 − S/Ssaw. This function is designed to reflect the relative degree of local order, with a maximum value of 1 corresponding to S = 0 (i.e., a single stable micro-environment), and values approaching zero when S = Ssaw, which corresponds to maximal steric disorder as represented by self-avoiding walks.
Notably, values of S exceeding Ssaw may occur when multiple minima in the interaction potential generate a greater diversity of sPGs. In such cases, the probability of the contact-free sPG—typically dominant in SAWs—drops significantly, resulting in negative values of the function 1 − S/Ssaw.
The first experimental descriptor used for comparison is the 13Cα secondary chemical shift,80,81 which reflects local backbone conformational preferences. The second is the Chemical Shift Z-score for assessing Order/Disorder (CheZOD score), which integrates the secondary chemical shifts of all backbone atoms into a quantitative measure of local order.82
It is important to note that the chemical shifts of α-synuclein are sensitive to solvent conditions and pH.81 For consistency with the simulation conditions, we used NMR data collected under near-physiological conditions, corresponding to the α-synuclein monomer entry in the Biological Magnetic Resonance Bank (BMRB ID: 6968).80,81
We first compare the 13Cα secondary chemical shifts,80,81 δCα, to the function 1 − S/Ssaw in Fig. 4. As discussed above, MD simulations indicate that the NAC region is the most disordered, while the C-terminal is the most ordered. This is reflected in the averaged values of the entropy-based order parameter: 〈1−S/Ssaw〉N-term/〈1−S/Ssaw〉NAC = 1.19, and 〈1−S/Ssaw〉C-term/〈1−S/Ssaw〉NAC = 2.63. These estimates can be compared to the experimental secondary chemical shifts: 〈|δCα|〉N-term/〈|δCα|〉NAC = 1.75, and 〈|δCα|〉C-term/〈|δCα|〉NAC = 2.13, supporting the conclusion that the NAC region is the most flexible and the C-terminal the most structured—consistent with MD results.
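The region-averaged comparison above can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, the per-residue inputs are assumed to be lists indexed from residue 1, and the N-terminal (1–60) and C-terminal (96–140) boundaries are assumptions chosen to complement the NAC region (residues 61–95) quoted in the figure captions.

```python
# Region boundaries (1-based, inclusive). Only the NAC limits are quoted
# in the text; the other two are assumed for illustration.
REGIONS = {"N-term": (1, 60), "NAC": (61, 95), "C-term": (96, 140)}


def order_parameter(s, s_saw):
    """Per-residue order parameter 1 - S/S_saw (negative when S > S_saw)."""
    return [1.0 - a / b for a, b in zip(s, s_saw)]


def region_mean(values, first, last):
    """Mean of a per-residue quantity over residues first..last (1-based)."""
    segment = values[first - 1:last]
    return sum(segment) / len(segment)


def region_ratios(values, reference="NAC"):
    """Region means divided by the mean over the reference region."""
    means = {name: region_mean(values, *bounds) for name, bounds in REGIONS.items()}
    return {name: mean / means[reference]
            for name, mean in means.items() if name != reference}
```

The same `region_ratios` function could be applied to any per-residue descriptor, such as |δCα| or a shifted CheZOD score, so that the entropy-based and NMR-based ratios are computed with identical region definitions.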
Fig. 4 Representation of 1 − S/Ssaw of α-synuclein at a graph distance D = 1 and R = 8 Å as a function of the amino acid sequence, computed from coarse-grained MD simulations (solid line, open symbols) at physiological temperature, and of the absolute values of the Cα secondary chemical shifts δCα extracted from Fig. 1 of ref. 81 (dashed line, solid symbols). The dot-dashed lines correspond to the condition S = Ssaw (right Y axis) and to the random-coil values of the chemical shifts (left Y axis). All lines are provided as a guide to the eye. Light gray shading indicates the P1 (residues 36–42) and P2 (residues 45–57) segments, while dark gray shading indicates the NAC region (residues 61–95).
A similar conclusion arises from the CheZOD descriptor82 (see SI, Fig. S15), which also shows increased order in the N-terminal and C-terminal regions. When averaged (after shifting to positive values), the CheZOD values yield: 〈CheZOD〉N-term/〈CheZOD〉NAC = 1.19, and 〈CheZOD〉C-term/〈CheZOD〉NAC = 2.63. This is also in agreement with NMR measurements of the 15N transverse relaxation rate R2, which indicate enhanced backbone flexibility in the NAC region under physiological pH and temperature conditions.94,95
Fig. 4 and S14 highlight noticeable differences between the two NMR-derived disorder descriptors and the entropy-based data at the residue level. A quantitative residue-by-residue comparison is thus difficult, given the inherent uncertainties in both NMR chemical shift measurements and MD simulations, but shared patterns do emerge. Along the N-terminal region, peaks indicative of increased local order appear at residues K6, V16–A29, K34, K45, V52, and T59 in |δCα|, and at comparable positions in 1 − S/Ssaw (notably K6, E13, K21–E35, K45, K60). The CheZOD score also shows elevated values at K6, V15–T33, and A53, but differs by presenting a distinct peak at V40 and lower order at K60.
In the NAC region, |δCα| shows peaks at V70, T75, A85, and I88. The entropy-based descriptor 1 − S/Ssaw displays corresponding peaks at G67, G73, K80, and G84. The CheZOD score shows similar variations with peaks at V66, K80, and A85.
The C-terminal region displays significant variability in all three descriptors. In |δCα|, prominent peaks are observed at A124, E126, and E137. The CheZOD score presents additional maxima at K96, A107, M116, G132, E126, and E137. The entropy-derived profile also reveals numerous peaks within residues K96–E139, including E110, D119, M127, G132, and P138.
While a perfect residue-by-residue correspondence is not observed, even between the two experimental descriptors based on the same δ data, overall trends and relative variations in local order along the α-synuclein sequence are consistent across all three descriptors, albeit with some positional shifts. Interestingly, the similarity between |δCα| and the entropy-based descriptor is stronger than with the CheZOD score. This is expected, as both |δCα| and S are sensitive to the local environment of Cα atoms, whereas CheZOD is derived from chemical shifts of all backbone atoms and therefore captures a broader range of structural features. The present discussion highlights the need for further experimental studies on IDPs across different temperatures to clarify the role of local entropy in protein dynamics and aggregation.
In this context, we propose using local entropy as a quantitative descriptor to assess the impact of single-point mutations on the structural ensemble of α-synuclein. This analysis is based on coarse-grained MD simulations reported in a previous study.79
The variations of the local entropy S at D = 1 and D = 2 for the mutants A30P, E46K, and A53T are presented in the SI (Fig. S1 panels c to h). The average values of S across the N-terminal, NAC, and C-terminal regions follow the same trend observed for the wild-type protein, as discussed in the previous section, namely: 〈S〉C-term < 〈S〉N-term < 〈S〉NAC. As in the wild-type, local inhomogeneities of S are observed along the sequence, with values tending to converge toward those of the SAW ensemble at D = 2. In general, residues near the mutation sites exhibit a decrease in local entropy, while distant residues can experience either an increase or a decrease. These effects are examined in more detail at D = 1 in Fig. 5, and are discussed below for each mutant.
In the A30P mutant, the substitution of alanine by proline induces a marked local effect, with a sharp reduction of S in the A27–K34 segment. The lowest entropy values are found at P30 and G31. Notably, P30 explores only 2521 microstates (sPGs), compared to 23 736 for A30 in the wild-type. Additional significant changes in S are observed at or near K21, as well as in key regions associated with aggregation: at K43, between the P1 and P2 regions; at E57, between the P2 and NAC regions; and at G73 within the NAC region. In contrast, the highly negatively charged C-terminus remains largely unaffected.
For the E46K mutant, the replacement of glutamate by lysine leads to substantial changes in local entropy across the sequence, including long-range effects in the C-terminal region far from the mutation site. The most prominent variations are observed at K46 and E110: the number of explored sPGs decreases from 9482 for E46 in the wild-type to 6527 for K46 in the mutant, and increases from 3282 in the wild-type to 4803 in the mutant for E110. Additional fluctuations are noted at S9, K32, and K60 in the N-terminal region, and at E110 and D119 in the C-terminal region. The key aggregation-related regions are significantly affected by the mutation: the entire P2 region, as well as the residues between P1 and P2 and between P2 and the NAC region. In the NAC region, aside from G73, S decreases significantly at T81 and increases at G86 and G93.
The A53T mutation also causes significant changes in local entropy, particularly in the N-terminal region. The largest reduction is observed at E57, where the number of substates decreases from 16 050 in the wild-type to 8343 in the mutant. For comparison, the number of sPGs varies from 36 556 in the wild-type to 28 276 in the mutant for residue 53. Substantial entropy variations are also seen for G25, G31, K43, and T53. In the C-terminal region, S generally decreases, except at P120 and D121. In the NAC region, N65 shows a notable reduction in entropy.
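To put the sPG counts quoted for the three mutants on a common footing, the following sketch computes the relative change in the number of distinct sPGs between the wild-type and each mutant. The counts are those given in the text; the dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# Numbers of distinct sPGs quoted in the text: (wild-type, mutant).
SPG_COUNTS = {
    ("A30P", "residue 30"): (23736, 2521),
    ("E46K", "residue 46"): (9482, 6527),
    ("E46K", "E110"): (3282, 4803),
    ("A53T", "E57"): (16050, 8343),
    ("A53T", "residue 53"): (36556, 28276),
}


def relative_change(wild_type, mutant):
    """Fractional change in the number of sPGs, relative to the wild-type."""
    return (mutant - wild_type) / wild_type


for (mutation, residue), (wt, mut) in SPG_COUNTS.items():
    change = relative_change(wt, mut)
    print(f"{mutation:5s} {residue:10s} {wt:6d} -> {mut:6d}  ({change:+.1%})")
```

Expressed this way, the roughly tenfold loss of micro-environments at position 30 in A30P stands out against the more moderate changes at the E46K and A53T mutation sites, while E110 in E46K is the only listed case where the count increases.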
Interestingly, the residues that form native contacts in α-synuclein fibril-like dimers with a probability greater than 0.9 in our previous millisecond-timescale simulations (nucleation phase) are located in regions spanning both the P1 and P2 segments (L38–V55 in E46K and L38–E57 in A53T). These regions display substantial alterations in local entropy compared with the wild-type protein, for which residues with native-contact probabilities above 0.9 lie in the NAC region (G86–F94).85
The localization of these mutation-induced changes in local order/disorder correlates with the physicochemical properties of the substituted amino acids. Alanine is small and flexible, whereas proline introduces a rigid constraint due to its side chain being covalently linked to the backbone, explaining the strong local reduction in S for A30P. The E46K mutation replaces a negatively charged glutamate with a positively charged lysine, which significantly alters the network of long-range electrostatic interactions. Similarly, the A53T mutation introduces a polar side chain (threonine), which explains the extended perturbation along the sequence.
Although a detailed analysis of the impact of a single mutation on the full network of interactions is complex, the local entropy can quantify these effects by comparing the values of S for the wild-type and mutant proteins, as shown in Fig. 5. These variations can be further interpreted by examining the individual sPGs.
As an example, we describe how the A30P mutation induces long-range effects on residues K21, K43, E57, and G73. Since describing each individual sPG would be overly detailed and cumbersome, we summarize the mechanisms underlying these distant mutational effects in Tables S1 and S2.
Table S1 lists the nodes of the most probable sPGs (i.e., with pi > 0.01) for residues A30, K43, V52, A53, T54, and V55 in the wild-type protein. Table S2 contains the corresponding nodes for residues P30 and K43 in the A30P mutant.
In the wild-type protein, three regions—namely A19–K23, E28–E35, and V52–V55—are involved in the sPGs of A30, showing high diversity with 15 graphs having pi > 0.01 (Table S1). In the mutant, two of these regions, A19–K21 and E28–K32, are still present in the sPGs of residue 30 (Table S2). The first has a slightly lower probability, explaining the increase of S around K21 in the mutant. The second exhibits much greater stability, with a probability an order of magnitude higher, accounting for the substantial decrease of S around P30.
Additionally, two new regions, M1–V3 and T64–V66, interact with residue 30 in the mutant, while one region, V52–V55, is no longer part of its micro-environment (Table S2). This loss explains the long-range effects on residues E57 and G73, since in the wild-type, residues A53–V55 interact with E57 and G73 (Table S1). Moreover, A30 was indirectly connected to region T64–V66, which is involved in the sPGs of the now-missing residues V52–V55 in the mutant. The substitution of A30 with P30 shifts the segment V62–V66 from being connected to residue 30 at a graph distance D = 2 in the wild-type protein to a distance D = 1 in the mutant. This reorganization impacts the local entropy.
The effect of the mutation on the entropy of K43 can be understood by comparing the node ensembles in Tables S1 and S2. In the A30P mutant, the absence of the region A53–A56 and the reduced probability of the A19–T22 segment compared to the wild-type result in a redistribution of the sPG probabilities for K43. Specifically, the probability associated with the Y39–K45 segment is significantly reduced in the mutant.
In conclusion, the substitution of a single amino acid in the N-terminal region of α-synuclein alters the local order/disorder not only near the mutation site but also at distant positions in key regions associated with aggregation, i.e., the P1, P2, and NAC segments, and, in the case of E46K, even in the C-terminal region. These modifications are expected to influence the dimerization behavior of α-synuclein. Indeed, we previously found that the nucleation centers for the formation of amyloid precursors in dimers are located in the NAC region for the wild-type and A30P proteins, whereas they shift predominantly to the N-terminal region (V40–K60 segment) for the A53T and E46K mutants.85
In structured proteins, local entropy varies significantly along the sequence, as do the B-factors and the packing entropy, including within secondary structure elements. This makes it a powerful tool for identifying inhomogeneities in unfolding transitions at the residue level, in good agreement with experimental observations, as illustrated here for the gpW protein.
In intrinsically disordered proteins such as α-synuclein, local entropy shows substantial sequence dependence and is, on average, lower than that of a self-avoiding walk (SAW) ensemble of the same chain length or the unfolded state modeled for gpW. The descriptor S is particularly sensitive to amino-acid substitutions and can capture their effects through a single quantitative metric. Significant differences in local entropy are observed between mutants and the wild-type protein, particularly in regions implicated in aggregation such as P1, P2, and the NAC segment. Further work will be required to determine whether the local entropy of the monomeric state is predictive of fibril formation.
Beyond quantifying disorder, the concept of local entropy has broad potential applications. For instance, it could be employed to study protein–protein interactions by characterizing the sPGs stabilized at binding interfaces, or to investigate allosteric mechanisms by quantifying entropy changes upon ligand binding or post-translational modifications. More generally, local entropy could be used outside protein science—for example, to characterize heterogeneity and temperature dependence in polymer hydration, or to quantify molecular-level or atom-level entropies at liquid–solid or disordered solid–solid interfaces.
Additional data supporting the findings of this work are included in the supporting information (SI). Supplementary information: details of the methods, supporting figures and tables. See DOI: https://doi.org/10.1039/d5sc06411b.
This journal is © The Royal Society of Chemistry 2026