Open Access Article
This Open Access Article is licensed under a Creative Commons Attribution-Non Commercial 3.0 Unported Licence

Nearest neighbour interactions between amino acid residues in short peptides and coil libraries

Reinhard Schweitzer-Stenner*
Department of Chemistry, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, USA. E-mail: rs344@drexel.edu

Received 16th December 2025 , Accepted 20th March 2026

First published on 30th March 2026


Abstract

Intrinsically disordered proteins (IDP) or proteins with intrinsically disordered regions (IDR) perform a plethora of functions mostly in a cellular environment. As unfolded proteins, IDPs can adopt molten globule or coil ensembles of conformations. Regarding the latter, the question arises as to whether they are describable as a self-avoiding random coil. Locally, this requires that amino acid residues sample the entire sterically allowed region of the Ramachandran plot with very similar probabilities and independent of the conformational dynamics of their neighbours. However, various lines of experimental and bioinformatic evidence suggest a more restricted, side chain and nearest neighbor dependent conformational space for individual residues. Over the last 25 years, short peptides and coil libraries were employed to determine conformational propensities of amino acid residues in unfolded states. The question arises as to whether conformational ensembles obtained from these two sources are comparable. In this paper, a variety of metrics were used to compare Ramachandran plots of a limited number of GXYG peptides (X,Y: guest residues) with XY dimers in the coil library of Ting et al. (Ting D., Wang G., Shapovalov M., Mitra R., Jordan M. I., Dunbrack R. L., Neighbor-dependent Ramachandran probability distributions of amino acids developed from a hierarchical dirichlet process model, PLoS Comput. Biol., 2010, 6(4), e1000763-1–21, 10.1371/journal.pcbi.1000763). The results reveal major differences between corresponding plots, which might in part be due to the fact that solely the influence of one of the two neighbours of a given residue is probed by the above coil library while averages were taken over the respective opposite neighbours. The presented results suggest that coil libraries alone might not be a sufficient tool for determining the characteristics of statistical coils of IDPS and IDRs alike.


Introduction

It is now well established that intrinsically disordered proteins (IDPs) and folded proteins with intrinsically disordered regions (IDRs) perform a plethora of functions mostly in a cellular environment. Their very existence shatters the conventional dogma that stipulate the need for a stable folded protein for its function.1–6 Estimates of the fraction of biologically relevant IDPs vary in the literature.2,6 A recent study reported 30% of eukaryotic proteins to be at least partially disordered.7 In many cases IDPs or IDRs of otherwise folded proteins are involved in molecular recognition processes that involve disorder → order transitions.8–13 In less frequent cases transitions into fuzzy or even another disordered state occur.10,14,15 Protein disorder was found to dominate in proteins associated with signal transduction, cell-cycle regulation, gene expression and chaperone action.5,6,16,17 Besides being of pivotal importance for biological processes IDPs can also malfunction by self-assembly into fibrillar structures that have been implicated in several neurological diseases.2,18–20 Aggregation of IDPs is also involved in the droplet formation due to liquid–liquid phase separations.21

Generally, IDPs exhibit a dynamic behavior that allows them to exist as an ensemble of energetically similar conformations under physiological conditions. Several lines of evidence suggest that in structural terms IDPs can be treated as unfolded proteins so that their conformational ensembles can be classified either as collapsed molten or pre-molten globules (if e.g. water is a poor solvent) or as self-avoiding random coils (if water is a good solvent).3 A more nuanced picture emerged from NMR studies using residual dipolar coupling and secondary chemical shifts, which revealed the occurrence of local residual structure in ensembles which otherwise appear dominantly coiled. The propensities of proteins for these global states generally depend on their amino acid composition and in particular on the respective charge distributions.22–25

This paper focusses on the local structural characteristics of totally unfolded, coiled proteins. Traditionally, they are described by invoking the random coil concept. Conceptually, both the ideal Flory random coil model and the self-avoiding random walk model are based on the assumption that a polymer can be described as a freely jointed chain of subunits.26,27 In the case of polypeptides/proteins this would require a nearly unhindered rotation around the NCα and CαC’ bonds that connect the peptide groups. Apparently, this idealization is at variance with the observation that steric hindrance and electrostatic interactions between residues restrict the corresponding dihedral backbone angles φ and ψ.28–31 The sterically allowed regions of the Ramachandran plot of individual amino acid residues are believed to be very similar with the sole exception of glycine (more conformationally flexible due to the lack of chirality) and proline (more restricted due to the pyrrolidine ring). Hence, individual sequences of polypeptides and proteins would not matter much for coiled proteins with regard to experimentally accessible parameters like the radius of gyration and end to end distances. Indeed, the radii of gyration Rg of proteins and IDPs in denaturing solvents were found to be describable by the classical scaling law RgNν (N: number of residues) with scaling exponents clustering around 0.6,32,33 the expectation value for a self-avoiding random walk in a good solvent.26,34

While totally unfolded (denatured) proteins behave globally like self-avoiding random coils in most cases, experimental, computational and bioinformatic evidence suggest a more restricted and side chain dependent conformational sampling of the Ramachandran space than assumed even by restricted random coil models. The latter stipulates that the Ramachandran plots of all residues besides proline and glycine are describable by a very broad and shallow minimum of the Gibbs energy landscape in the upper left quadrant of the Ramachandran plot so that all conformations in the region between φ = −160° and −50° as well as between ψ = 180° and 90° are practically isoenergetic. Traditionally, it was called the β-strand region.28,29 A less extended region between φ = −160° and −50° as well as ψ = 20° and −70° exhibits similar minimal Gibbs energies as the broad region in the upper left quadrant. It encompasses all types of right-handed helical and type I/II β-turn conformation generally sampled by their i+2 residue. Much smaller regions of higher Gibbs energy are associated with left-handed helical and other turn structures. In Fig. 1, the local restricted random coil distribution is approximated by a mixture of nearly isoenergetic Gaussian basins the parameters of which are listed in Table S1 (taken from ref. 35). Contrary to this view, spectroscopic experiments on short host-guest residue type peptides such as blocked dipeptides, NH3+-GXG-COOH and Ac-GGXGG-NH2, peptides (X: guest residue) revealed that e.g. alanine intrinsically exhibits a high preference for a conformation called polyproline II (pPII, φ > −100°), while valine slightly favors β-strand (φ ≤ −100°) over pPII (cf. ref. 36 and references cited therein). Protonated aspartic acid samples turn-supporting conformations to a significant extent. Similar though not identical conformational preferences have been obtained in so-called coil libraries. They were constructed from the dihedral angles adopted in segments of folded proteins, which do not depict one of the well-known regular structures (ref. 37 and references cited therein). They are considered as representative of coiled proteins because it is assumed that any non-local interactions and steric constraints are averaged out if one builds Ramachandran distributions with a large number of proteins that possess a significant fraction of loops. Computationally, an increasing number of molecular dynamics (MD) simulations using modified Amber, CHARMM or OPLS forcefield and TIP water models is capable of reproducing the high pPII propensity of alanine to some extent,38–44 but a reproduction of the side chain specificity of Ramachandran distributions has only recently been achieved, though only on a qualitative level.45


image file: d5ra09727d-f1.tif
Fig. 1 Ramachandran plot of a (local) restricted random coil distribution of conformational ensembles. The respective parameters have been taken from ref. 35. They are listed in Table S1.

The discovery of residue specific propensities changed our picture of coiled peptides and proteins in that it suggests that, locally, unfolded peptides and proteins exhibit a statistical coil rather than a (restricted) random coil behaviour. In the statistical coil case, the broad basin in the upper left quadrant splits into two much more confined minima with side chain dependent free energies, which in many cases are not identical. Conformations in the right-handed helical/β-turn I/II region are generally of significantly higher Gibbs energy. Hence, the free energy landscape is describable by a statistical canonical set of Boltzmann factors. The (restricted) random coil case thus appears as a special case of a statistical coil. Generally, statistical coil ensembles carry significant less conformational entropy then a classical restricted random coil distribution.35,46

Even though the above conformational restrictions of the Ramachandran space seem to have a limited influence on global properties of coils such as the radius of gyration or end to end distance,33,46,47 their assessment is of utmost importance for producing statistical coil reference systems based on which local transient residual structures in unfolded proteins and IDPs can be identified by nuclear magnetic resonance spectroscopy (ref. 48 and references cited therein). Information about the local properties of coiled peptides and proteins also allow for an accurate estimation of configurational entropies.49,50 Unfortunately, the task of identifying the coiled state of proteins is complicated by nearest and potentially even second nearest neighbour interactions (NNIs) between residues (ref. 37 and references cited therein). The occurrence of such interactions is well established in general, but the underlying mechanisms are not well understood. It is still not entirely clear to what extent NNIs involve correlated conformational motions between residue, in which case they would depend on the conformation adopted by the neighbours. Several lines of evidence suggest that such structural correlations exist, in which case the isolated pair hypothesis (IPH), another pillar of the random coil model, would no longer be valid (ref. 37 and references cited therein). As a consequence, configurational entropies and free solvation energies would no longer be additive.

Carrying out a complete quantitative analysis of NNIs is difficult because they are side chain dependent. Hence, even if one focuses on NNIs and thus ignores second neighbour effects, one would have to analyse e.g. 203 GXYZG unblocked penta-peptides or blocked tetra-peptides in solution in order to cover all possible combinations. The number reduces to 400 GXYG tetra-peptides (or blocked tripeptides), if one assumes that the influence of down- and upstream neighbours are additive. Several lines of evidence suggest that this is too optimistic an assumption.51–54 Spectroscopic investigations of tetra- and pentapeptides in water clearly revealed significant nearest neighbour interactions, which most likely involve correlation effects.53–57 Even though the number of investigated peptides has been significant, they cannot be considered as fully representative for the entire manifold of side chain combinations. Therefore, coil libraries have emerged as a preferred source for NNI analysis. A coil library of proteins is a database of backbone and side chain conformations, which generally excludes helical and sheet sequences of proteins. For some libraries, turns have also been excluded. However, the use of thus restricted libraries makes it difficult to obtain statistically significant Ramachandran plots of individual residues for all possible up- and downstream neighbours (cf. the Ramachandran plots that can be constructed at58). As a remedy, Ramachandran plots for individual amino acid residues have been constructed by averaging over all neighbours.59–62 Such a strategy would be valid if individual NNIs with different neighbours were very similar, at variance with the results of the above spectroscopic studies on tetra- and pentapeptides.52–54,57

In this paper, I explore the influence of residue averaging by a comparative analysis of Ramachandran distributions of amino acid residues in tetra-peptides and in the coil-library of Dunbrack and coworkers.63 Due to my best knowledge, the latter is the only one that provides statistically significant information about the specific influence of individual neighbours on Ramachandran distributions of all twenty amino acid residues. The authors restricted averaging to either the right (downstream) or the left (upstream) neighbours, respectively. In other words, for all-XY pairs, the Ramachandran plots of X were reported for all Y neighbours, while the influence of the left neighbours were averaged, whereas plots for Y were produced for all X neighbours with average over the right neighbours (XY-all pairs). Ting et al. used a mathematical algorithm to construct complete Ramachandran plots with a resolution of 5° for the φ and ψ coordinate from the obtained data points taken from a restricted coil library (no helices, strands and turns). A limited comparison of these coil library Ramachandran plots with tetra-peptide distributions had been undertaken by Schweitzer-Stenner and Toal based on the use of calculated Hellinger distances (vide infra).56 The current paper presents the results of a by far more comprehensive analysis, which compares respective population of mesostates as well as experimental scalar NMR-based J-coupling constants of tetrapeptides with coupling constants calculated for respective coil library distributions of Ting et al. Revised Hellinger distance calculations are augmented by comparisons of configurational Gibbs entropies for individual residues. To my best knowledge, such a set of different metrics have not yet been applied to any coil library. The results of this analysis reveal that the diversity of individual NNIs is not sufficiently represented by coil library distributions and that averaging over neighbours underestimates the influence of neighbours.

Theoretical background

The basics of the so-called Gaussian model implemented for the analysis of J-coupling constants as well Raman, IR, and vibrational dichroism amide I′ profiles of tri-, tetra- and pentapeptides have been described in several earlier papers of my research group.64,65 Briefly, observables O (i.e. scalar J-coupling constants and amide I′ profiles in the above spectra) were calculated as ensemble average of a statistical coil distribution:
 
image file: d5ra09727d-t1.tif(1)
where O represents the considered spectroscopic observable (vide supra) and Z denotes the partition sum of the ensemble. In the past, the distribution function P(φ,ψ) was constructed as a superposition of two-dimensional Gaussian functions:
 
image file: d5ra09727d-t2.tif(2)
where the index j represents the jth conformation summed over N different sub-distributions with statistical weights (mole fractions) χj. The position vector image file: d5ra09727d-t11.tif represent a coordinate pair (φ,ψ), image file: d5ra09727d-t12.tif denotes the position the center of the sub-distribution. [V with combining circumflex] is a 2 × 2 matrix, the diagonal of which contains of the standard deviations of the Gaussian sub-distributions with respect to φ and ψ. The off-diagonal elements account for correlations between the two coordinates, which in the case of unequal standard deviations for φ and ψ lead to a rotation of the axes of the respective ellipse. In the current study, the Gaussian model was solely used for non-linear least square fits of eqn (1) to a set of experimental scalar J-coupling constants of GXYG peptides (i.e. the φ-dependent 3J(HNH), 3J(HNC’), 3J(HC’), 3J(HNCβ) and the ψ-dependent 1J(NCα), which were carried out with a Matlab program (X and Y represent the two guest residues). For the fitting procedure, the statistical weights as well as the positions of the Gaussian sub-distributions were used as free parameters in an iterative procedure described by Milorey et al.54 The standard deviations of the distributions were slightly varied between 10° and 20° for a fine-tuning of the fits. Scalar coupling constants were calculated by using the Karplus parameters in ref. 66 for the 3J constants and in ref. 67 for the 1J constant. Ramachandran plots were calculated with a resolution of 2° for φ and ψ. In order to account for the amide I′ band profiles, conformations of the entire peptide rather than residue conformations have to be used. The final output of the above fitting procedure was used as input for a truncated model applied to reproduce the amide I′ profiles. This reduction was necessary to reduce the computational cost of the conformational sampling. A total of 21 pairs of φ, ψ angles represents an ellipse around the center pPII and β-strand troughs, respectively. Hence, we substitute the two-dimensional Gaussian distributions by ellipses (not cuboids, as stated in ref. 54) with axis widths of 2Δϕj and 2Δψj in ϕ and ψ direction, where Δϕj and Δψj were set equal to the standard deviations of the respective Gaussian distributions. The height of the ellipses is given by the statistical weights of the conformations divided by the total number of representing data points (i.e. 21). Representative plots of these distributions are shown in Fig. S1. In addition, the less dominant conformations (right-handed helical, inverse γ-turn, asx-turn, etc.) were represented by a single conformation positioned at the respective center of the Gaussian sub-distribution with the statistical weights obtained from the fits of the Gaussian model to J-coupling constants. Several checks revealed that the truncated model reproduced the coupling constants produced by the fitting to the J-coupling data sufficiently well. If the input distribution reproduced the measured amide I′ profiles, the analysis was considered complete. If the measured and calculated amide I′ profiles differed significantly, the above parameters were slightly modified to reduce the deviations. The newly obtained set of parameters was used a starting point for refitting the J-coupling constants. The procedure was repeated until convergence was obtained. This fitting procedure, which was first implemented by Milorey et al. for their conformational analysis of GRRG and GRRRG peptides,54 was found to be more reliable and efficient than the procedure that our group implemented in the earlier study of GXYG peptides.57 Therein, we had used the identical sub-spaces of the Ramachandran plot for manual simulations of the entire data set of J-coupling constants and amide I profiles. The resolution of the conformational sampling was 6°, much lower than the 2° used for the current Gaussian analysis of tetra- and penta-peptides.54 A change of the positions of Gaussian sub-distributions could move part of them outside of the area probed by the respective simulations. Therefore, in order to put all the available tetra-peptide data on an equal footing, I revisited the data of Toal et al.57 with the above procedure of Milorey et al.54 It should be noted that the same exciton coupling model was used in both studies to link residue conformations to amide I′ band profiles. Since vibrational excitonic coupling between amide modes in different residues depends on the dihedral angles between the coupled oscillators the intensity distributions of amide I′ are excellent indicators of backbone conformations. Details of the utilized excitonic model are described in earlier papers.64,65

Different metrics were used to compare the Ramachandran plots of amino acid residues flanked by different neighbours in the investigated tetra-peptides and in the coil library of Ting et al.63 The Hellinger distance H is a measure of overlap between different distributions. It is defined as follows:

 
image file: d5ra09727d-t3.tif(3)
where Pj(φ,ψ) and Pj’(φ,ψ) denote the Ramachandran probability density distributions of the jth residue in tri- and tetrapeptides, respectively. The Hellinger distance can vary between 0 (total agreement) and 1 (completely distinct). To interpret these numbers, earlier introduced criteria were used.56 If the Hellinger value lies in the region below 0.1, the corresponding distributions are considered as very similar. If 0.1 < H ≤ 0.25 and 0.25 < H ≤ 0.4, the distributions are moderately similar and moderately dissimilar, respectively. H values larger than 0.4 indicate very dissimilar distributions. The Hellinger distance responds very sensitively to changes of the positions of individual basins (i.e. Gaussian distributions). Sole changes of statistical weights have a lesser influence on H.

An alternative metric is the Gibbs (or Shannon) entropy which for an individual residue is written as:

 
image file: d5ra09727d-t4.tif(4)
where x is just an indicator of the considered residue and R is the gas constant. The numerical value of Sx depends on the mesh size of the Ramachandran plot (2° for the investigated peptides, 5° for the coil library Ramachandran plots).49,50 Therefore, we just used the differences between the entropies of X with different neighbours as a suitable metric for characterizing NNIs.35

Calculating the Gibbs entropies of residues does not provide a complete picture, if NNIs involve correlations between conformational motions of neighbours. In such a case only the conformational distributions of entire peptides and even of unfolded and disordered proteins have to be taken into account, because thermodynamic parameters are no longer additive. In order to model such a scenario e.g. for our tetra-peptides, one has to use the Ramachandran plots of GXG peptides as reference and model the NNI induced differences in terms of interaction free energies.46,52,55,68 The statistical weight of a distinct conformation of a tetra-peptide GXYG where X has adopted conformation L and Y the conformation K is thus written as:

 
image file: d5ra09727d-t5.tif(5)
where GL,X and GK,Y are the Gibbs free energies of the states L and K of the residues X and Y in GXG/GYG tripeptides. Hence, glycine as neighbour is used as a reference point, for which one would expect minimal nearest neighbour interactions. The interaction free energy between the two conformations in X and Y are denoted as δGLK,YX and δGKL,XY. The latter are obtained by reproducing the differences between the statistical weights of the conformations in GXG/GYG and GXYG.52,55 Z is the partition sum of the system. It can conveniently be calculated by employing Zimm-Bragg like transfer matrices. Details can be found in ref. 55 The configurational Gibbs entropy of a XY ‘dimer’ can now be written as:
 
image file: d5ra09727d-t6.tif(6)
where χLK is the normalized statistical weight (mole fraction) of dimers adopting the conformations L and K. Ramachandran plots of peptide and coil library residues can be compared by calculating the respective population of mesostates. Fig. S2 exhibits a graphic representation of the mesostates, which were used earlier for the comparison of the Gaussian model with the results of molecular dynamics calculations.44 Each mesostate is associated with a specific secondary structure. It should be noted that the classical β-strand region is subdivided into areas associated with parallel (pβ), antiparallel (aβ) and transitional β-strand (βt). While pβ and aβ are canonical, βt is a more recent addition introduced by Tran et al.69 More coarse-grained mesostate models partition this area only into a pPII and a β-strand region. The higher resolution of our model allows for a more nuanced picture of Ramachandran distributions. Since the same mesostates are used for all systems investigated, populations depend on the statistical weight of the respective Gaussian sub-distribution, its width and its position. The population of mesostates are just calculated as mole fraction by integrating the population density over its area in the Ramachandran plot.

Results

This section is organized as follows. First, the experimental data reported for GXYG peptides by Toal et al.57 are revisited using the protocol described in the preceding section. Thus, obtained conformational propensities (statistical weights) and basin positions (of Gaussian sub-distributions) are combined with the results of the more recent tetra-peptide studies of Milorey et al.52–54 Second, respective Ramachandran plots of GXYG and coil library distributions are compared on a qualitative, visual level. In a third step such comparisons are carried out on a more quantitative level by using calculated J-coupling constants, mesostate populations, Hellinger distances and configurational entropies as metrics.

Conformational propensities and basin positions

In a first step the experimental data which Toal et al. reported for a series of cationic GXYG peptides in aqueous solution were revisited by employing the protocol of Milorey et al. (vide supra). The choice of cationic peptides was motivated by the need to obtain sufficiently high intensities of amide proton signals. The influence of terminal charges on the conformational distributions of central residues in AAA, GAG and GVG have been found to be negligible, while it is moderate for GDG.43,53 These finding suggests that even in tripeptides hydration of terminal groups cause a significant shielding and thus a reduction of electrostatic interactions. The results of the GXYG analysis are presented in Tables S1 and S2 as well as in Fig. S3–S7. Table S1 compares the experimental J coupling constants with the ones resulting from the fitting procedure described in the preceding section. The corresponding calculated coupling constants reported by Toal et al. are listed for comparison. Table S2 lists the obtained statistical weights and peak coordinates of the Gaussian sub-distributions for the two non-terminal X,Y residues of the investigated GXYG. Corresponding values reported by Toal et al. are listed in parenthesis for comparison. Fig. S3 shows a representative set of fits to the amide I′ profiles in the IR, isotropic Raman, anisotropic Raman and VCD spectrum of the investigated GXYG peptides. The fits to corresponding band profiles were obtained with the same spectral parameters (i.e. positions and half widths of three Gaussian bands related to transitions into the first excited excitonic state of coupled oscillators) used by Toal et al. It should be noted that owing to the influence of nearest neighbour and non-nearest neighbour coupling, amide I′ profiles reflect the conformational states of the peptide rather than the ones of individual residues, which are specifically probed by the utilized J-coupling constants. Fig. S4–S7 depict the statistical weight of pPII, β-strand and of all helical and turn-supporting conformations for a specific residue in different GXYG peptides (e.g. of alanine in GDAG, GSAG, GALG, GAVG, GAFG and GFAG, target residue in bold). In addition to the revised statistical weights of the GXYG peptides investigated by Toal et al.,57 recent results obtained for GAFG, GFAG, GDFG, GFDG, GDDG and GKKG have been added to the graphs.52–54 Representative Ramachandran plots of GXYG peptides are shown below, where they are compared with corresponding coil library plots.

Similarities and discrepancies between the new and old parameter values of the Gaussian model are briefly delineated in the SI. Here, it can just be stated that the main observations of Toal et al. remain valid, namely the rather pronounced decrease of the pPII propensity of alanine in the presence of unlike neighbours with the exception of phenylalanine (average decrease by 29%). Concomitantly, the β-strand propensity is significantly increased. Aspartic acid ranks highest in terms of the nearest neighbour sensitivity of statistical weights, but in this case pPII increases by an average of 50% in the presence of unlike neighbours. Surprisingly, the pPII fraction of phenylalanine increases on average by 30%. On the contrary, the respective pPII sensitivities of serine, leucine and lysine are comparatively low (average decreases between 12% and 15%), again in line with Toal et al. Aspartic acid and serine itself as neighbour causes significant changes and rather position dependent changes. Like neighbours (i.e. in GKKG, GKKG, GDDG, GDDG, GRRG, GRRG and GLLG produce much more pronounced changes).52–54

In order to facilitate a thermodynamic understanding of the plots in Fig. S4–S7 and S8 depicts the Gibbs free energy difference between pPII and β-strand basins for different populations of the upper left quadrant of the Ramachandran plot. A comparison of these plots with the mole fractions listed in Table S2 reveal that a significant number of NNI induced population changes are associated with Gibbs energy changes of several kJ mol−1. For instance, serine and valine destabilize the pPII conformation of alanine, by Gibbs free energy changes of ca. 4 and 3 kJ mol−1 respectively.

Comparison of peptide and coil library Ramachandran plots

Fig. 2 and 3 show the Ramachandran plots of a selected number of investigated residue pairs in GXYG and all-XY/XY-all segments of the coil library (all indicates the averaging over all 20 neighbours). The plots of the remaining tetra peptides and coil library segments investigated in this paper are shown in Fig. S9–S16. In addition to the tetra-peptide plots, GXG plots for the respective target residues are shown, e.g. GAG for GAVG and all-AG for all-AY, where bold characters indicate the target residue.
image file: d5ra09727d-f2.tif
Fig. 2 Ramachandran plot of alanine residues in cationic GAG (A), cationic GAVG (B), in the coil library segments all-AG and all-AV (C and D), in cationic GLG (E), cationic GLLG (F) and in the coil library segments GL-all and LL-all (G and H). The tripeptide Ramachandran plots were created with the Gaussian parameters reported in ref. 70 The plots for GAVG and GLLG were produced for this study by an analysis of experimental data reported by Toal et al.,57 as described in the main text. The plots for A and L in the above coil library segments were created with files obtained from the website of the Dunbrack group with the kind permission of Dr Dunbrack.63

image file: d5ra09727d-f3.tif
Fig. 3 Ramachandran plot of the aspartic acid residue in cationic GDG (A), cationic GDDG (B), in the coil library segment GD-all and DD-all (C and D), in cationic GFG (E), cationic GDFG (F) and in the coil library segment GF-all and DF-all (F). The tripeptide Ramachandran plots were created with the Gaussian parameters reported in ref. 70 and 71, with slight modifications of the GDG mole fractions.53 The plots for GDDG and GDFG were created by using the respective Gaussian parameters reported in earlier papers.53 The plots for D and F in the above coil library segments were created with files obtained from the website of the Dunbrack group with the kind permission of Dr Dunbrack.63

A first superficial look at all the displayed Ramachandran plot reveals the following qualitative picture. In line with the above discussed statistical weight values of investigated tetrapeptides, nearest neighbour interactions vary the pPII↔β equilibrium, though to a different extent (vide supra). Moreover, for quite a substantial number of peptides, they increase the distance between the two basins along the φ-coordinate (GSAG, GDLG, GSLG, GLLG, GVLG, GSAG, GSKG, and GSVG). These changes do not correlate with population changes of the pPII and β-strand. Turn supporting conformations, which are significant in the Ramachandran plots of D and S containing peptides, are significantly varied by their neighbours. Generally, interactions between like residues in homo-dimers (GDDG, GDDG, GRRG and GRRG) are more pronounced than the ones in hetero-dimers. The coil dimer Ramachandran plots do not convey a similar story. While they clearly differ for different amino acid residues (just compare the all-XG plots), NNI induced changes seem to be rather limited. Most interestingly and also surprisingly, the data suggest a dominant population of the pPII basin for all residues with the exception of valine, for which the β-strand is more significantly populated, while the pPII basin has moved into a region which is generally associated with the ith residue of type II β-turns. The Ramachandran plots of most coil library dimers exhibit a significant distance between pPII and β-basins with the latter positioned outside of the canonical positions of residues in antiparallel and parallel β-sheets. This suggests a non-Arrhenius Kramers-type structure of the transition state.

The above description of peptide and coil Ramachandran is pretty much based on visual inspection. In what follows I use metrics introduced in the preceding section to provide a more quantitative comparison of Ramachandran plots.

Comparison of J-coupling constants

While the above discussed Ramachandran plots suggest significant differences between conformational distributions obtained from a Gaussian analysis of short peptides and those derived from coil library data of Ting et al., these discrepancies themselves do not per se show that peptide and coil library distributions are different, since the Ramachandran plots obtained for the former are model dependent. Fortunately, the Ramachandran plots of the coil library contain enough data points to allow for the calculation of ensemble averaged J-coupling constants. Fig. 4 displays the experimental and calculated J-coupling constants for a representative series of tetrapeptides (experimental and Gaussian analysis) and related XY dimers of the Ting et al. library.
image file: d5ra09727d-f4.tif
Fig. 4 Comparison of experimental, Gaussian modelling and coil library J-coupling constants of the indicated residues (by capital letters) with different neighbours in GXYG peptides (lower case for better identification). Lower case letters indicate the neighbours. Experimental values were taken from ref. 52 and 57 A: 3J(HNH), B: 3J(HNC’), C: 3J(HC’) and D: 1J(NCα).

Generally, the data in Fig. 4 reveal that Gaussian coupling parameter values are closer to experimental ones than coil library-based values. For 3J(HN,H), the differences between coil library and experimental values are most notable for AV, SA and DA pairs. Discrepancies are even more pronounced and numerous for the other three coupling constants of AV, SA, DA, DV, SV, DL, DK and SV. Coil values for the ψ-dependent 1J(NCα) are systematically lower than the experimental GXYG values which, as discussed further below, is indicative of a larger population of right-handed helix like conformations.

A visual inspection of the Ramachandran plots in Fig. 1, 2, and S6–S16 suggests that differences between corresponding coil library distributions are by far less significant than the ones between the Gaussian plots. This impression is put on a more quantitative basis by calculating the average experimental J coupling constants for the GXYG peptides with a common residue (e.g. GFAG, GAFG, GFDG and GDFG) and the corresponding coil library dimers (e.g. all-FA, AF-all, all-FD and DF-all). The respective values are listed in Table 1. First, a comparison with corresponding GXG values reveal residue specific differences for both data sets. Second, while differences between average coupling constant values are noteworthy, they are not as pronounced as the differences between the standard deviations, which for most of the considered sequences are larger for the experimental than for the coil library-based coupling constants. This result strongly suggests that compared with GXYG the side chain specificity of nearest neighbour interactions is significantly reduced in the Ting et al. coil library. It should be noted here that the term ‘standard deviation’ does not indicate that the averaged data set constitute a statistical ensemble in this case. It is solely used as a measure of the distribution width.

Table 1 Listing of average experimental (e)52–54,57 and calculated J-coupling constants of the Ting et al. coil library (c).63 The average was taken over the entire set of residue dimers in GXYG and coil library segments containing the indicated residue type. Coupling parameters of these residues in GXG peptides are listed in parenthesis for comparison70–72
  <3J(HNH)>e [Hz] <3J(HNHCα)>c [Hz] <3J(HNC’)>e [Hz] <3J(HNC’)>c [Hz] <3J(HC’)>e [Hz] <3J(HC’)>c [Hz] <3J(HNCβ)>e [Hz] <3J(HNCβ)>c [Hz]
A 7.03 ± 0.51 (6.11) 7.07 ± 0.54 1.15 ± 0.181 (1.18) 1.07 ± 0.15 2.39 ± 0.42 (2.02) 1.95 ± 0.09 2.12 ± 0.17 (2.32) 1.67 ± 0.1
D 7.2 ± 0.48 (7.44) 7.32 ± 0.12 0.8 ± 0.38 (1.19) 0.98 ± 0.16 2.79 ± 0.27 (3.24) 2.01 ± 0.14 1.43 ± 0.15 (1.25) 1.52 ± 0.06
S 6.82 ± 0.14 (6.99) 7.29 ± 0.15 0.79 ± 0.22 (0.87) 1.27 ± 0.07 3.26 ± 0.31 (2.33) 2.09 ± 0.07 1.78 ± 0.16 (1.71) 1.42 ± 0.08
K 6.82 ± 0.34 (6.63) 7.39 ± 0.2 0.98 ± 0.02 0.96 ± 0.12 2.37 ± 0.59 1.95 ± 0.08 1.67 ± 0.13 1.51 ± 0.11
L 7.63 ± 0.27 (6.78) 7.5 ± 0.11 0.74 ± 0.15 (0.84) 0.71 ± 0.04 2.72 ± 0.49 (2.45) 2.1 ± 0.04 1.64 ± 0.07 (1.75) 1.59 ± 0.04
V 7.94 ± 0.55 (7.46) 8.0 ± 0.17 0.89 ± 0.17 (0.91) 1.05 ± 0.08 2.67 ± 0.5 (2.33) 2.21 ± 0.07 1.45 ± 0.11 (1.59) 1.31 ± 0.08
F 7.33 ± 0.46 (7.45) 7.62 ± 0.11 0.82 ± 0.32 (0.74) 1.45 ± 0.09 1.96 ± 0.5 (2.2) 2.2 ± 0.1 1.59 ± 0.42 (1.79) 1.48 ± 0.19


Comparison of mesostate populations

In the Gaussian model conformational propensities are expressed in terms of statistical weights (mole fractions) of the utilized Gaussian sub-distributions. Within the context of the model this makes sense thermodynamically, since these Gaussian distributions correspond to a classical harmonic approximation of the respective potential energy minimum. However, this parametrization does not allow a quantitative comparison with Ramachandran plots from molecular dynamics simulations and coil libraries. As shown earlier for the use of the former, calculating the populations of mesostates (Fig. S2) provides a handy solution.44

The mesostate populations of the investigated residues in peptides and the Ting et al. coil library are shown in Fig. S17. The populations of right-handed helices and β-turn I/IIi+2 conformations as well as the ones of antiparallel and parallel β-strand mesostates have been added up. For alanine, pPII dominates all distributions. For GAVG, the pPII population is significantly reduced, but it is still slightly larger than the combined β-strand confirmations, while the respective mole fractions are practically identical (Fig. S4). The pPII mesostate of aspartic acid is comparatively weakly populated in GDG. Both, tetra-peptide and coil library data indicate that nearest neighbour interactions significantly increase the pPII population at the expense of asx-turns and βap. pPII is dominantly populated in the distributions of lysine and leucine; some exceptions notwithstanding (GSKG, GVKG, GLLG, GALG and GVLG). Most of the serine distributions exhibit comparable sampling of pPII and βap while most of the valine distributions are dominated by βt and βap (exceptions are DV-all, all-VL and KV-all). The sampling of the combined mesostate of right-handed helices and β-turn I/IIi+2 in the coil library distributions is significantly more pronounced than it is in the investigated peptides. Discrepancies between mesostate populations and the statistical weights displayed in Fig. S4–S7 result from the fact that the Gaussian sub-distributions overlap and may contribute to more than one mesostate. This is certainly true for many pPII and β-strand Gaussians with relatively high ψ-values, which overlap with the ε-mesostate. Generally, pPII sub-distributions may only contribute to the βt mesostate. Since some ψ-values of residues decrease in the presence of non-glycine neighbours, the pPII sampling can increase even if the statistical weight of the sub-distribution decreases. Alternatively, a narrowing of e.g. the pPII sub-distribution reduces the overlap with other mesostates thus increasing the population of its own mesostate.

Fig. 5 compare the changes of mesostate populations in the investigated tetra-peptides and corresponding coil library sequences caused by upstream and downstream neighbours. For the calculation of these changes, GXG peptides were used as a reference for the tetra-peptides, while all-XG and GY-all were used for all-XY and XY-all.


image file: d5ra09727d-f5.tif
Fig. 5 Plot of differences between the population of selected mesostates in GXYG and GXG (GYG) peptides and between populations in all-XY and all-XG as well as between XY-all and GY-all coil library segments. Note that the populations of mesostates associated with the parallel and antiparallel β-sheet structures as well as with helical and turn conformations have been added up, respectively.

The bar diagrams in Fig. 5 reveal a complex picture, from which I distil only the most significant differences and similarities between peptide and coil distributions. For alanine in tetra-peptides all pPII populations decrease, in particular in the presence of serine and valine. This observation is in line with the behaviour of the corresponding Gaussian statistical weights (Fig. S4), but it is in contrast to the pPII changes in the coil library plots of AF-all, AL-all, and all-AV. Similar discrepancies exist at least partially for the series of serine, lysine and valine. However, for aspartic acid and phenylalanine, pPII changes occur in the same direction. The changes obtained for aspartic acid are peculiar, in that nearly all pPII changes are positive, with all-DA as sole exception. For phenylalanine, the positive pPII changes are pronounced for FA and FD. In the coil library, the changes move in same direction, but to a much lesser extent.

As one can infer from Fig. S17, the combined population of the right-handed helical and type II/I β-turn mesostates is much more pronounced in nearly all coil library Ramachandran plots even though it is generally reduced by nearest neighbour interactions with of non-glycine neighbours.

It should be noted that the displayed changes do not add up to zero for the following reasons. First, the Ramachandran plots of GxG peptides exhibit substantial sampling of the so-called ε-region, which lies right above the lower boundary of the lower left quadrant. The population of this region is generally reduced in tetra-peptides. Second, the Ramachandran plots of coil library residues are more inhomogeneous than the Gaussian plots. While they display relatively sharp maxima at the basin positions, the transition regions are more populated in the coil library distributions which are not covered by the mesostates in Fig. S2.

Taken together, a comparison of conformational mesostate occupations in GXYG and in coil libraries reveal major differences. Overall, the nearest neighbour induced changes in the later seem to be less pronounced. Interestingly, however, they are also sometimes qualitatively different. This notion particularly applies to pPII the population of which is enhanced by aliphatic neighbours in coil libraries (V and L), whereas the same type of neighbour decreases the pPII population in GXYG. Right-handed helical and type I/II β-turn i+2 conformations is generally more populated in the Ting et al. library.

Comparison of Hellinger distances and configurational entropies

It should be noted that observed changes of statistical weights convey only a part of the picture. As one can infer from Table S2, noticeable changes of the φ-position and to a more limited extend of the ψ-position were obtained for all investigated series, including the leucine and serine series, for which the propensity changes are moderate. As a consequence, related J-coupling constants are different (Table S1). To illustrate this point, let us look on the propensities and 3J(HNH) value of some leucine residues: The statistical weights for pPII of GALG, GLLG, GVLG and GDLG are 0.42, 0.43, 0.4 and 0.49. Apparently, they are very similar. The first three values are identical within their statistical uncertainty, which generally does not exceed ± 0.5.64 The corresponding 3J(HNH) values are 7.4, 7.62, 8.15 and 6.78 Hz. These variations lie clearly outside of the statistical error limits (<±0.2 Hz). As shown below, changes of basin positions lead to rather large values of the Hellinger distance.

Eqn (3) was used to calculate the Hellinger distance between the Ramachandran plots of GXG (GYG) and GXYG/GXYG for all investigated amino acid residue targets. In addition, Hellinger distances between plots of all-XY/XY-all and all-XG/GY-all were calculated. As clearly indicated by the bar plots in Fig. 6 the Hellinger values obtained for the Ramachandran plots of the tetra-peptides exceed by far those of the Ting et al. library. With the exception of the serine series, all Hellinger values of tetra-peptide residues indicate a moderate or large dissimilarity between corresponding Ramachandran plots of tetra- and tripeptides. On the contrary, the overwhelming majority of the values obtained for the coil library sequences lie in the moderate similarity region. As indicated above, Hellinger distances are very sensitive to changes of basin positions and respond only moderately to changes of basin populations. Thus, the data plotted in Fig. 6 suggest minimal changes of basins in the coil library, in line with what one would infer from a visual inspection of Fig. 1, 2 and S9–S16. Altogether, the plots in Fig. 6 clearly suggest that nearest neighbour induced changes are much less pronounced in the Ting et al. library than in the investigated GXYG peptides.


image file: d5ra09727d-f6.tif
Fig. 6 Hellinger distances and configurational entropy contribution to Gibbs free energy changes at room temperature due to nearest neighbour interactions for the indicated amino acid residues in GXYG peptides (black) and corresponding all-XY and XY-all segments of the Ting et al. library (red).

Since nearest neighbour interactions affect Ramachandran distributions, the configurational entropy changes as well. The corresponding changes of the Gibbs energy contribution are also plotted in Fig. 6. Note that the depicted values were calculated for the indicated residue, not for the entire dimeric XY segments. Negative values for a distinct residue X(Y) imply that its Ramachandran plots is less entropic in the presence of the neighbour Y. The displayed TΔS values suggest a by far more complex picture than the Hellinger distances. While entropy changes in the tetra-peptides are dominant for some residues, they are rather small for others (FA, AV, SA, DF, FD, KL, KV, KK, KK and LL). Even more surprising is the observation that entropy changes of residues in GXYG and coil library dimers carry opposite signs for all serine neighbours and nearly all aspartic acid neighbours (SL and DD being an exception). Opposite behavior is also observed for e.g. KV, VL, SK, and DF. These findings suggest that relatively short side chains with hydrogen bonding capacity such as aspartic acid and serine behave differently as neighbours and as targets in short peptides and in coil libraries. For most GXYG peptides, nearest neighbour interactions decrease the entropic contribution to the Gibbs energy of their neighbours by several hundreds of Joule mol−1.

Thus, these data suggest that a coil state of a proteins might carry less configurational entropy than assumed based on the (local) random coil model. However, the mixture of entropy increases and decreases obtained for the Ting et al. coil library does not allow for such a clear assessment. A more detailed discussion of this issue is given in the subsequent paragraph.

Discussion

Nearly thirty years ago an analysis of 3J(HNH) coupling constants of unfolded proteins lead to the conclusion that Flory's isolated pair hypothesis is not a valid supposition.73 Subsequently, the influence of nearest and non-nearest neighbours on the conformational propensity was mostly inferred from the analysis of coil libraries46,49,50,60,62,63,68,74,75 and from computational work on model peptides.69,75–77 The general view that emerged from these analyses suggested that aliphatic and aromatic neighbours decrease a residue's pPII propensity. For alanine, Avbelji and Baldwin attributed this to conformational dependent changes of the solvation free energy.76 Computational work of the Sosnick group revealed that the observed nearest neighbour interactions involve correlations between the dynamics of interacting residues, which implies that these interactions depend on the adopted conformation of the respective neighbours.75 This is an important information since a violation of the isolated pair hypothesis requires such correlation effects. A conformational detuning that depends solely on the side chain properties of neighbours would not eliminate the additivity of conformational and solvation free energies and would therefore not lead to a violation of the isolated pair hypothesis.

Experimentally, the issue of nearest neighbour interactions was early recognized by the NMR community. Chemical shifts and to a lesser extent 3J(HNH) coupling constants in short glycine-based host-guest peptides were considered as ideal references for a random coil conformation. However, the context dependence of these parameters in proteins led to the conclusion that their dependence on the choice of the neighbour has to be taken into account.78,79 All attempts to resolve this issue without the need of investigating a total of at least 400 amino acid pairs were based on the assumption that nearest neighbour effects depend solely on the neighbour, not on the targeted residue itself. This notion implies for instance that valine as neighbour has the same influence on alanine and aspartic acid. Along this way, Schwarzinger et al. investigated the influence of the X-residue on the adjacent glycine residues in Ac-GGXGG-NH2.80 The use of glycine as representative target was later challenged by Kjaergaard and Poulsen, who introduced glutamine as a more useful alternative.81

The first experimental work aimed at exploring in more detail the residue specific nearest neighbour influence on chemical shifts and 3J(HNH) in short peptides was carried out by Jung et al.82 These authors undertook the incredible task of measuring the 3J(HNH) coupling constants and HN and H chemical shifts of 361 Ac-XY-NH2 blocked dipeptides. They subjected their data solely to a statistical analysis in that they calculated the average coupling constant and the corresponding standard deviation for a given target amino acid residue. Only coupling constants that were outside of this standard deviation were considered as reflecting nearest neighbour coupling. However, using the standard deviation as a criterion for statistical significance of differences between J-coupling constant values is problematic in this context unless it reflects the statistical errors of the experimental values, which are generally small.57 Nevertheless, it should be noted that neighbour induced changes of the J-coupling constant within a given target series is less pronounced for the dipeptide data set than it is for GXYGs. Significant differences between corresponding coupling constants are noteworthy for valine and leucine, mostly at the C-terminal position (AL, AV, VL, KV, with LL as an exception; cf. Fig. S18). One might be tempted to rationalize such differences in terms of end effects, i.e. different influences of the terminal NH3+ and COOH groups of GXYG and the blocking groups the dimers used by Jung et al., but such an explanation seems to be at variance with the experimentally based notion that the conformational propensity of blocked dipeptides with a single aliphatic residues and two peptide groups resemble the one of the respective guest residue in GxG.43 However, one has to note a peculiarity of the peptides used by Jung et al. Generally, dipeptides contain a single amino acid flanked by two peptide groups with methyl groups attached to the carbonyl carbon and the amide nitrogens. They can be regarded as suitable mimics of glycine. A corresponding system with two residues would actually be a blocked tripeptide with three peptide groups and terminal methyl groups. On the contrary, the C-terminal residue of the peptides used by Jung et al. are linked to a primary amide which differ from secondary amides. Their electronic structures can be expected to be different. In water, the amide I wavenumber (predominantly a C[double bond, length as m-dash]O stretching mode) of a primary amide is significantly blue-shifted by approximately 20 cm−1 with respect to its position in secondary amide spectra (acetamide versus N-methylacetamide).83 This suggests a stabilisation of the C[double bond, length as m-dash]O double bond over C[double bond, length as m-dash]N in the peptide's resonance structure.

The study of Jung et al. was followed by the above-described work of Toal et al.57 on GXYG peptides dealt within this paper. Compared with the former, it is rather limited in scope (15 peptide, later augmented by another 8 GXYG and 2 GXYZG peptides by Milorey et al.52–54). However, the data basis for each peptide is much broader (four φ- and one ψ-dependent scalar coupling constants and four different amide I′ profiles in different vibrational spectra). This large data set allowed for the construction of Ramachandran plots in terms of a superposition of two-dimensional Gaussian sub-distributions. While certainly still a simplification, the conformational analysis of GxG peptides with the Gaussian model yielded a far better agreement with experimental data than molecular dynamics simulations with rather different force fields and water models.44,84 Hence, it is justified to use the obtained GXYG Ramachandran distributions as a suitable approximation for a comparison with data sets obtained from restricted coil libraries. The use of the latter is based on the assumption that a large ensemble of residues located in ordered but not regularly structured segments of proteins produce conformational distributions representative of the coiled state of unfolded and disordered peptides and proteins. Restrictions to which even residues in structured loops of proteins are subjected are being thought to average out.

One of the main problems one encounters with the use of coil libraries is that even the enormous number of protein structures used for assembling the latest version of coil libraries insufficient to even explore the total of 8000 XYZ-segments which would account for exploring nearest neighbour interactions between all twenty amino acid residues. In order to address this issue, researchers average over all down- and upstream neighbours to obtain a sufficient number of entries to their Ramachandran plots. Bax and colleagues used this strategy to obtain a set of secondary chemical shifts and scalar coupling constants considered as a good approximation for the coiled state of peptides and proteins.59,60 Ting et al. dealt with the limitations of their data set by an interpolation procedure called the hierarchical Dirichlet process.63,85 Thus, they could produce Ramachandran plots with a 5° resolution in φ and ψ direction. Even with this approach they could not explore the full set of XYZ combinations, but they could restrict residue averaging to either the upstream or downstream neighbour of a selected target residue, which lead to Ramachandran plots for XY-all and all-XY.

The comparison of GXYG and coil library distributions revealed similarities, but also significant differences. In line with results on alanine containing peptides, the coil library confirms the comparatively high propensity of this residue type for pPII. Qualitatively, the coil library distribution also exhibits the relatively low pPII and the high turn (right-handed helical and type II/II i+2) propensities of aspartic acid. The relatively high β-strand propensity of valine is also reproduced. However, these similarities are more than outweighed by differences between the influence of nearest neighbour interactions. They are discussed in detail in the following.

In order to shed some light on the differences between tetra-peptide and coil library distributions, the average population of the combined mesostates of right-handed helical and the very similar type I/II β-conformation in GXYG and the Ting et al. library were compared to each other. Since the respective numbers are very low for the former the averaging was carried out over all neighbours in the investigated GXYG set. The result of this analysis is visualized by the upper plot in Fig. 7. The regression analysis yielded a slope of 0.433 ± 0.12 and a Pearson correlation coefficient value of 0.81. This value is indicative of a strong correlation, but one should keep in mind that the data basis of the plot is limited. The obtained correlation nevertheless indicates that the population of the right-handed helical/type I/II β region, which are merged in many Ramachandran plots, is by about a factor of two higher in the coil library, in spite of the relative to glycine negative influence of neighbours. Such a population enhancement is not surprising. One of the limitations of the use of short peptides as model systems for a statistical coil state of a protein is that the population of the right-handed helical region is underestimated compared with the situation in polypeptides and proteins. All theories of helix [left over right harpoons] coil transitions invoke a cooperative process where the helix formation goes through a nucleation process. The mole fraction of right-handed helices in GXG peptides or blocked dipeptides should be closely related to the nucleation parameters σ and v of the Zimm-Bragg and Lifson–Roig parameters, respectively.70 With an increasing length of an oligo-peptide, the formation of short transient helices becomes more likely. The total fraction would depend on the respective values of the helix propensity parameters s and w. For oligo-alanines, the increase of the helical fraction with the number of residues has been predicted by MD simulations,38,86 while experimental results are somewhat conflicting.87,88 Generally, it is permissible to state that the increased population of right-handed helices and very similar turn-supporting structures in the coil library of Ting et al. is something to be expected. The above scaling factor of 2.3 should be taken as an average subject to the above indicated significant standard deviation of ±0.6.


image file: d5ra09727d-f7.tif
Fig. 7 Correlation plots comparing mesostate populations in GxyG and the Ting et al. coil library.63 Upper panel: Comparison of the combined population of right-handed α-helix and type I/II β-turn i+2 mesostates in GXYG (p) and the coil library (c) averaged over all target neighbours of A, V, S, D, L and F. Lower panel: Comparison of the combined population of pPII and all β-strand mesostates in GXYG (p) and the coil library (c). The solid lines result from a regression analysis described in the text.

If one ignores other less populated β- and γ-turn structures as well as asx-turns and left-handed helices the result of the above analysis suggests most of the differences between nearest neighbour interactions in GXYG and coil library segments involve redistributions between pPII and β-strand. If this supposition is correct, the total fraction of extended mesostates in the two systems should exhibit some correlation. The lower panel plot in Fig. 7 compares the two fractions for all the pairs investigated experimentally. The Pearson coefficient of 0.64 still indicates a medium level correlation. However, it should be noted that the data basis for this correlation plot is much stronger than the one used for the comparison of right-handed helical populations. The slope of the correlation plot for the extended structures is 0.58 ± 0.15, which suggests a reduced variability of the extended structure population in the coil library.

The plots in Fig. 6 reveal that nearest neighbour interactions in the coil library enhance the pPII population more than they do in the investigated GXYGs. The AV dimer in GXYG and all-AV exemplifies this difference, in that valine substantially reduces the pPII population of alanine in GAVG, while it enhances it in all-AV. In order to check whether an upstream valine would have the same influence on alanine we analysed the mesostate population of alanine in VA-all. The obtained populations of A are nearly identical with the ones of GA-all. This result suggests that at least for alanine the apparent influence of upstream neighbours is nearly eliminated by the averaging over all downstream neighbours. I checked this supposition by calculating the mesostate populations for all-X-all for all X-residues investigated in this study. For most of the later, the differences between all-X-all and GX-all were very small (<5%). Only X = D showed a slight enhancement of pPII at the expense of type I/II β-turn.

In order to illustrate the possible influence of averaging over downstream neighbours I performed the following very simple calculation. It was assumed that the target residue in XY-all can adopt only pPII and β-strand conformations. The respective mole factions were calculated as follows:

 
image file: d5ra09727d-t7.tif(7)
 
image file: d5ra09727d-t8.tif(8)
where the partition sum Zj is written as:
 
image file: d5ra09727d-t9.tif(9)

ΔGY denotes the intrinsic Gibbs free energy difference between pPII and β-strand of residue Y, dGXY is the free energy change caused by interaction with the upstream residue X and dGXj represents the free energy change caused by interaction with the jth downstream residue. N = 20 is the number of residues. The distribution of interaction energies dGXj was heuristically described by a Gaussian function:

 
image file: d5ra09727d-t10.tif(10)
j0 is the position of the maximum the choice of which is arbitrary, σj is the standard deviation. All free energies are written in units of RT. Two scenarios, one with a high (0.67) and the other one with a low intrinsic pPII propensity (0.4) were considered. The correspond free energy differences are 0.7 and −0.4. Next, a dGXY value of 0.3 was added to account for the general tendency of downstream neighbours. It reduced the corresponding pPII population to 0.59 and 0.33 (vide infra). In a third step, the Gaussian term in eqn (10) was added. A variation of the free energy term and of the standard deviation revealed that dG0XY = -0.8, j0 = 5 and σj = 5 reproduced the assumed intrinsic propensities for both rather different scenarios. The results of this very heuristic calculation show that the averaging of Ramachandran plots of a distinct Y-residue over all natural downstream neighbours can practically neutralize the weaker nearest neighbour interactions with the upstream residue X. The reader should note that the rather large standard deviation of the utilized Gaussian distribution reflects an averaging over a relatively wide range of negative free energies.

The results of the above analysis suggest an asymmetry between nearest neighbour interactions in the coil library in that the interactions with upstream neighbours dominate over the ones with downstream neighbours. This notion is corroborated further by the observation that out of 26 residue pairs for which the coil library data indicate a nearest neighbour induced increase of the pPII population, 20 are of the all-XY type, while 6 out of 7 neighbours that reduce the pPII population of the target residue are at the upstream position (XY-all type).

The above unbalance between interactions with down- and upstream neighbours is still present in GXYG peptides, but to a lesser extent. In most peptides, nearest neighbour interactions reduce the pPII population. In 12 out of 19 cases this involves the respective upstream neighbours. pPII is enhanced in 14 cases, from which 9 involve downstream neighbours.

Another peculiarity of the coil library distributions is the rather large separation between the visible pPII and β-strand peaks in nearly all distributions. While the respective distances are often larger in GXYG than in respective GXG/GYG plots, they are not really close to the ones in the coil library. Fig. S19 illustrates this point by depicting the φ dependence of the probability density for the serine residue in all-SL at ψ = 150°. The plot actually reveals the existence of an additional very shallow maximum at 110° and of course the aforementioned dominance of pPII which is closer to the one in the corresponding GSLG plot (Fig. S11). Similar tendencies can be seen in Ramachandran plots of the coil library used by the Sosnick group.58 Interestingly, a very detailed coil library investigation of Jiang et al. revealed that the distance between the two basins depends on the χ1 angle of the side chain, where the g rotamer seems to produce the largest distance (their plots also represent an average over all neighbours in the library).89 Hence, the discrepancy between coil library and peptide basin position noted in this study might actually reflect different populations of side chain rotamers. Needless to say, that this would further complicate the picture. Sosnick and coworkers actually provided computational evidence for coupling between side chain and backbone dynamics.49

In view of the differences between the neighbour dependence of residues in GXYG peptides and respective coil library dimers the question arises which of the two data sets are more useful for the identification of coil states. Being able to do so is necessary for the identification of residual structures in unfolded and disordered proteins.59,60,90,91 The availability of a large data set for all natural amino acid residues in the coil library of Ting et al. is a clear advantage over the limited number of GXYG peptides that have thus far been investigated. The comparison of coil library and GXYG distributions has thus far revealed that averaging over neighbours might artificially diminish nearest neighbour effects and of course eliminates the side chain specificity of such interactions for a target residue. It is frequently believed that the differences caused by the latter are statistically insignificant. To check whether such a notion has merits I had a closer look at the reported 3J(HNH) constants of two IDPs that are generally thought to be in an ideal coil state, namely the amyloid peptide Aβ1–40 (ref. 92) and α-synuclein.93 Table 2 list the J-coupling constants of valine, phenylalanine, glutamic acid and alanine. The listed coupling constants of α-synuclein were taken from the C-terminal segment between residue 90 and 140, which does not exhibit any transient secondary structures. Note that experimental uncertainties of 3J(HNH) are very small and do rarely exceed 0.1 Hz.57,70,72 Since the values for α-synuclein had to be taken from a graphic representation, the corresponding error might be slightly larger. The data listed in Table 2 show that the coupling constant of valine varies within a range of 1.2 Hz. The corresponding intervals of alanine, glutamic acid and phenylalanine are 0.8, 1.4 and 0.7 Hz. Interestingly, while coupling constant values of valine scatter around the corresponding GVG value, the coupling constants for A and E are mostly lower, in line with an earlier observation for the tau protein.56,91 This might reflect a higher degree of sampling of right-handed helical conformations. The Ting et al. library value for the corresponding all-F-all distributions is close to the average value of F (7.46 Hz). For all-A-all and all-V-all the Ting library values are significantly larger than the average (5.2 and 7.55 Hz, respectively).

Table 2 List of 3J(HNH) constants in units of Hz of the indicated amino acid residue in 11–40,92 2α-synuclein (residues 89–140),93 3 GXG peptides70–72 and the coil libraries of 4Ting et al.63 and 5Shen et al.60 Values for the latter were obtained by averaging over both neighbours
Valine Phenylalanine Glutamic acid Alanine
6.78 (EVH)1 7.02 (EFR)1 6.3 (EF)1 5.6 (FAE)1
8.3 (LVF)1 7.7 (VFF)1 6.29 (YEV)1 5.5 (GAI)1
7.49 (MVG)1 7.74 (FFA)1 5.95 (AED)1 5.1 (SAA)2
7.94 (VVF)1   6.8 (NEE)2 5.0 (AAA)2
7.94 (GVV)1   6.6 (EEG)2 5.0 (AAT)2
8.02 (VVF)1   6.2 (QEG)2 5.9 (EAT)2
7.2 (MVD)2   7.4 (YEM)2  
7.46 (GVG)3 7.45 (GFG)3 6.99 (GEG)3 6.1 (GAG)
8.1 (all-V-all, Ting et al.)4 7.5 (all-F-all, Ting et al.)4   6.64 (all-A-all, Ting et al.)4


Summary and outlook

This paper presents a comparison of the influence of nearest neighbours on a selected number of amino acid residues in cationic tetra-peptides and the coil library of Ting et al. The way how the latter has been obtained allowed a direct comparison of amino acid pairs and the different influence of upstream and downstream neighbours. However, while these pairs are flanked by glycines in the tetra-peptides, averages were taken over all neighbours of the consider pairs in the coil library. As a consequence, either the influence of the upstream or of the downstream neighbour is explicitly taken into account. Ramachandran plots were compared by means of different metrics; i.e. mesostate populations, Hellinger distances, configurational entropy and several scalar J-coupling constants. Differences between GXYG and coil library distributions are noteworthy. Generally, nearest neighbour interaction induced changes are more pronounced in GXYG. For quite a noticeable number of pairs, the neighbours reduce the pPII propensity of amino acids in GXYG, while they increase it in the coil library. The averaging over either upstream or downstream neighbours for the construction of coil libraries reduce the specific influence of opposite neighbours. This result led to the conclusion that the residue specificities of nearest neighbour interactions should not be ignored. As a consequence, the identification of the coil state of a protein and/or an IDP of IDR is a complicated task. If one took short peptides as a starting point, one would have to investigate a total of 8000 GXYZG peptides just for a thorough assessment of nearest neighbour interactions. The exploration of non-nearest neighbour interactions would require an even larger number of hexa- and even hepta-peptides.

In principle, conducting molecular dynamics simulations could be a promising alternative venue. However, moving along this path requires combinations of force field and water models, which can reproduce intrinsic propensities as well as nearest neighbour interactions. Particularly regarding the latter, current approaches are still in their infancy.44,45,51,54,94 The GXYG peptides used for this study should be used as benchmarks for the necessary force field development.

Why is it necessary to gain a comprehensive picture of nearest neighbour interactions? First, it would be necessary to obtain an accurate, residue specific picture of the statistical coil state of proteins, which can be used as a baseline for identifying the occurrence transient secondary structures. In the field of NMR spectroscopy, the Blackledge group developed the so-called flexible meccano program, which they used to extract a set of possible conformational ensembles for a given polypeptide/protein sequence by exploiting conformational distributions of amino acid residues in a restricted coil library.90,95–97 Such ensembles are then used as a starting point for the structure analysis of IDPs by means of the ASTEROIDs algorithm (a selection tool for ensemble representations of intrinsically disordered states), which adjusts the conformational ensemble to constraints imposed by secondary chemical shifts, J-coupling constants (mostly 3J(HNHC⍺)), residual dipole coupling and experimentally obtained radii of gyration. From the description of flexible meccano in the literature97 it is obvious that nearest neighbour interactions are solely considered by averaging. As argued in an earlier paper, this might lead to misidentification of transient secondary structures in IDPs such as the tau protein.56 Second, correlations between conformational dynamics of residues reduce the configurational entropy of unfolded proteins to a significant extent.46 Being able to quantify the latter is essential for an understanding of the thermodynamics of protein folding and disorder to order transitions. The effect of correlation dynamics on the entropy of GXYG is illustrated in Table S4. It compares the sum of individual configurational entropies of a selected set of X and Y with the total configurational entropy of XY calculated with an interaction model described in ref. 52 and 55 Most of the obtained values (GAVG and GALG being an exception) are negative, indicating entropy reduction. The majority of the respective room temperature Gibbs energy values are in 102 J mol−1 range. Such values could become much larger for longer peptides and even proteins. GFDG stands out with an entropy reduction of 1.54 kJ mol−1. Generally, the values are lower than the ones obtained from MD simulations of Baxa et al., who reported NNI reported entropy changes in the range of −1.5 and 2.1 kJ/residue for the unfolded state of ubiquitin.49,50 However, their results and the ones presented in this study strongly suggest that the assumption of additivity of thermodynamic parameters (i.e. conformational entropy, free Gibbs energy of solvation, etc.) can now longer be considered as valid.

Author contributions

RSS is the sole author of this article. He performed all the data analyses described therein and wrote the paper.

Conflicts of interest

The author does not declare any conflict of interest.

Data availability

All data analysed in this article have been published and are therefore available. Reported secondary data are either listed in the Tables in the main manuscript and the supplementary information (SI) or will be available from the author upon request. Supplementary information is available. See DOI: https://doi.org/10.1039/d5ra09727d.

Acknowledgements

I would like to thank Dr Roland L. Dunbrack from the Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, for providing me access to the files containing the data points of the Ramachandran plots reported in Ting et al. Dr Siobhan E. Toal from Rowan University has checked the writing that concerned the content of ref. 57.

References

  1. V. N. Uversky, Why Are We Interested in the Unfolded Peptides and Proteins, Schweitzer-Stenner R., ed. Hoboken, John Wiley & Sons, 2012, pp. 3–54 Search PubMed.
  2. V. N. Uversky, C. J. Oldfield and A. K. Dunker, Intrinsically disordered proteins in human diseases: Introducing the D 2 concept, Annu. Rev. Biophys., 2008, 215–246,  DOI:10.1146/annurev.biophys.37.032807.125924.
  3. J. Habchi, P. Tompa, S. Longhi and V. N. Uversky, Introducing Protein Intrinsic Disorder, Chem. Rev., 2014, 114(13), 6561–6588,  DOI:10.1021/cr400514h.
  4. U. Jakob, R. Kriwacki and V. N. Uversky, Conditionally and transiently disordered proteins: Awakening cryptic disorder to regulate protein function, Chem. Rev., 2014, 6779–6805,  DOI:10.1021/cr400459c.
  5. V. N. Uversky, C. J. Oldfield and A. K. Dunker, Showing your ID: Intrinsic disorder as an ID for recognition, regulation and cell signaling, J. Mol. Recognit., 2005, 18(5), 343–384,  DOI:10.1002/jmr.747.
  6. A. K. Dunker, M. S. Cortese, P. Romero, L. M. Iakoucheva and V. N. Uversky, Flexible nets: The roles of intrinsic disorder in protein interaction networks, FEBS J., 2005, 5129–5148,  DOI:10.1111/j.1742-4658.2005.04948.x.
  7. A. Deiana, S. Forcelloni, A. Porrello and A. Giansanti, Intrinsically disordered proteins and structured proteins with intrinsically disordered regions have different functional roles in the cell, PLoS One, 2019, 14(8), e0217889,  DOI:10.1371/journal.pone.0217889.
  8. A. Srivastava, D. Yesudhas, S. Ahmad and M. M. Gromiha, Understanding disorder-to-order transitions in protein–RNA complexes using molecular dynamics simulations, J. Biomol. Struct. Dyn., 2022, 40(17), 7915–7925,  DOI:10.1080/07391102.2021.1904005.
  9. A. Soranno, Physical basis of the disorder-order transition, Archives of Biochemistry and Biophysics, Academic Press Inc., 2020, p. 108305,  DOI:10.1016/j.abb.2020.108305.
  10. S. Hadži and J. Lah, The free energy folding penalty accompanying binding of intrinsically disordered α-helical motifs, Protein Sci., 2022, 31, e4370,  DOI:10.1002/pro.4370.
  11. C. J. Oldfield, B. Cue, A. K. Dunker and V. N. Uversky. Binding Promiscurity of Unfolded Peptides, in Schweitzer-Stenner R., ed. Protein and Peptide Folding, Misfolding and Non-folding, Hoboken, Wiley & Sons, Inc, 2012, pp. 241–78 Search PubMed.
  12. A. K. Dunker, C. J. Brown, J. D. Lawson, L. M. Iakoucheva and Z. Obradovic´, Current Topics Intrinsic Disorder and Protein Function, Biochemistry, 2002, 41(21), 6543–6582,  DOI:10.1021/bi012159.
  13. A. Mohan, C. J. Oldfield, P. Radivojac, V. Vacic, M. S. Cortese and A. K. Dunker, et al., Analysis of Molecular Recognition Features (MoRFs), J. Mol. Biol., 2006, 362(5), 1043–1059,  DOI:10.1016/j.jmb.2006.07.087.
  14. O. M. Morris, J. H. Torpey and R. L. Isaacson, Intrinsically disordered proteins: modes of binding with emphasis on disordered domains, Open Biol., 2021, 11(10), 210222,  DOI:10.1098/rsob.210222.
  15. R. Dumarieh, D. Lagasca, S. Krishna, J. Kragelj, Y. Xiao and S. Ansari, et al., Structural Context Modulates the Conformational Ensemble of the Intrinsically Disordered Amino Terminus of α-Synuclein, J. Am. Chem. Soc., 2025, 147(14), 11800–11810,  DOI:10.1021/jacs.4c15653.
  16. A. K. Dunker and Z. Obradovic, The protein trinity - Linking function and disorder, Nat. Biotechnol., 2001, 19(9), 805–806,  DOI:10.1038/nbt0901-805.
  17. C. J. Oldfield, J. Meng, J. Y. Yang, M. Q. Yang, V. N. Uversky and A. K. Dunker, Flexible nets: disorder and induced fit in the association of p53 and 14-3-3 with their partners, BMC Genomics, 2008, 9, 1–20 CrossRef PubMed.
  18. C. M. Dobson, Protein misfolding, evolution and disease, Trends Biochem. Sci., 1999, 24(9), 329–332,  DOI:10.1016/S0968-0004(99)01445-0.
  19. N. Balasco, C. Diaferia, G. Morelli, L. Vitagliano and A. Accardo, Amyloid-Like Aggregation in Diseases and Biomaterials: Osmosis of Structural Information, Front Bioeng Biotechnol, 2021, 9, 641372,  DOI:10.3389/fbioe.2021.641372.
  20. A. Avni, H. M. Swasthi, A. Majumdar and S. Mukhopadhyay, Intrinsically disordered proteins in the formation of functional amyloids from bacteria to humans, Prog Mol Biol Transl Sci, 2019, 166, 109–143,  DOI:10.1016/bs.pmbts.2019.05.005.
  21. T. Mittag and J. D. Forman-Kay, Atomic-level characterization of disordered protein ensembles, Curr. Opin. Struct. Biol., 2007, 17(1), 3–14,  DOI:10.1016/j.sbi.2007.01.009.
  22. R. K. Das and R. V. Pappu, Conformations of intrinsically disordered proteins are influenced by linear sequence distributions of appositely charged residues, Proc. Natl. Acad. Sci. U. S. A., 2013, 110, 13392–13397 CrossRef CAS PubMed.
  23. A. S. Holehouse, K. Garai, N. Lyle, A. Vitalis and R. V. Pappu, Quantitative assessments of the distinct contributions of polypeptide backbone amides versus side chain groups to chain expansion via chemical denaturation, J. Am. Chem. Soc., 2015, 137(8), 2984–2995,  DOI:10.1021/ja512062h.
  24. A. H. Mao, S. L. Crick, A. Vitalis, C. L. Chicoine and R. V. Pappu, Net charge per residue modulates conformational ensembles of intrinsically disordered proteins, Proc. Natl. Acad. Sci. U. S. A., 2010, 107(18), 8183–8188,  DOI:10.1073/pnas.0911107107.
  25. E. W. Martin, A. S. Holehouse, C. R. Grace, A. Hughes, R. V. Pappu and T. Mittag, Sequence Determinants of the Conformational Properties of an Intrinsically Disordered Protein Prior to and upon Multisite Phosphorylation, J. Am. Chem. Soc., 2016, 138(47), 15323–15335,  DOI:10.1021/jacs.6b10272.
  26. P. J. Flory, Statistical Mechanics of Chain Molecules, Ithaca, Cornell University Press, 1953 Search PubMed.
  27. D. A. Brant and P. J. Flory, The Configuration of Random Polypeptide Chains. II. Theory, J. Am. Chem. Soc., 1965, 87(13), 2791–2800 CrossRef CAS.
  28. C. Ramakrishnan and G. N. Ramachandran, Stereochemical Criteria for Polypeptide and Protein Chain Conformation. II. Allowed Conformations for a Pair of Peptide Units, Biophys. J., 1965, 5, 909–933 CrossRef CAS PubMed.
  29. G. N. Ramachandran and V. Sasisekharan, Conformation of Polypeptides and Proteins, Adv. Protein Chem., 1968, 23(C), 283–437,  DOI:10.1016/S0065-3233(08)60402-7.
  30. G. N. Ramachandran, C. Ramakrishnan and V. Sasisekharan, Stereochemistry of polypeptide chain configurations, J. Mol. Biol., 1963, 7(1), 95–99,  DOI:10.1016/S0022-2836(63)80023-6.
  31. G. N. Ramachandran, C. M. Venkatachalam and S. Krimm, Stereochemical Criteria for Polypeptide and Protein Chain Conformations: III. Helical and Hydrogen-Bonded Polypeptide Chains, Biophys. J., 1966, 6(6), 849–872,  DOI:10.1016/S0006-3495(66)86699-7.
  32. J. E. Kohn, I. S. Millett, J. Jacob, B. Zagrovic, T. M. Dillon and N. Cingel, et al., Random-coil behavior and the dimensions of chemically unfolded proteins, Proc. Natl. Acad. Sci. U. S. A., 2004, 101(34), 12491–12496,  DOI:10.1073/pnas.0403643101.
  33. F. Ding, R. K. Jha and N. V. Dokholyan, Scaling Behavior and Structure of Denatured Proteins, Structure, 2005, 13(7), 1047–1054,  DOI:10.1016/j.str.2005.04.009.
  34. K. A. Dill and S. Bromberg, Molecular Driving Forces, Statistical Thermodynamics in Chemistry and Biology, New York, Garland Science, 2002 Search PubMed.
  35. R. Schweitzer-Stenner and S. E. Toal, Entropy reduction in unfolded peptides (and proteins) due to conformational preferences of amino acid residues, Phys. Chem. Chem. Phys., 2014, 16(41), 22527–22536,  10.1039/c4cp02108h.
  36. R. Schweitzer-Stenner, The relevance of short peptides for an understanding of unfolded and intrinsically disordered proteins, Phys. Chem. Chem. Phys., 2023, 11908–11933,  10.1039/d3cp00483j.
  37. R. Schweitzer-Stenner, Local and Global Behavior of Unfolded and Intrinsically Disordered Peptides and Proteins, ChemBioChem, 2025, 26(15), e202500172,  DOI:10.1002/cbic.202500172.
  38. A. E. Garcia, Characterization of non-alpha helical conformations in Ala peptides, Polymer, 2004, 45(2), 669–676,  DOI:10.1016/j.polymer.2003.10.070.
  39. S. Gnanakaran and A. E. Garcia, Validation of an All-Atom Protein Force Field: From Dipeptides to Larger Peptides, J. Phys. Chem. B, 2003, 107(46), 12555–12557,  DOI:10.1021/jp0359079.
  40. R. B. Best and G. Hummer, Optimized molecular dynamics force fields applied to the helix-coil transition of polypeptides, J. Phys. Chem. B, 2009, 113(26), 9004–9015,  DOI:10.1021/jp901540t.
  41. R. B. Best, W. Zheng and J. Mittal, Balanced Protein–Water Interactions Improve Properties of Disordered Proteins and Non-Specific Protein Association, J. Chem. Theory Comput., 2014, 10(11), 5113–5124,  DOI:10.1021/ct500569b.
  42. R. B. Best and J. Mittal, Protein simulations with an optimized water model: Cooperative helix formation and temperature-induced unfolded state collapse, J. Phys. Chem. B, 2010, 114(46), 14916–14923,  DOI:10.1021/jp108618d.
  43. S. Toal, D. Meral, D. Verbaro, B. Urbanc and R. Schweitzer-Stenner, PH-independence of trialanine and the effects of termini blocking in short peptides: A combined vibrational, NMR, UVCD, and molecular dynamics study, J. Phys. Chem. B, 2013, 117(14), 3689–3706,  DOI:10.1021/jp310466b.
  44. S. Zhang, R. Schweitzer-Stenner and B. Urbanc, Do Molecular Dynamics Force Fields Capture Conformational Dynamics of Alanine in Water?, J. Chem. Theory Comput., 2020, 16(1), 510–527,  DOI:10.1021/acs.jctc.9b00588.
  45. A. Suresh, R. Schweitzer-Stenner and B. Urbanc, Amber ff24EXP-GA, Based on Empirical Ramachandran Distributions of Glycine and Alanine Residues in Water, J. Chem. Theory Comput., 2025, 21, 2515–2534,  DOI:10.1021/acs.jctc.4c01450.
  46. A. K. Jha, A. Colubri, K. F. Freed and T. R. Sosnick, Statistical coil model of the unfolded state: Resolving the reconciliation problem, Proc. Natl. Acad. Sci. U. S. A., 2005, 102(37), 13099–13104,  DOI:10.1073/pnas.0506078102.
  47. N. C. Fitzkee and G. D. Rose, Reassessing random-coil statistics in unfolded proteins, Proc. Natl. Acad. Sci. U. S. A., 2004, 101(34), 12497–12502,  DOI:10.1073/pnas.0404236101.
  48. M. R. Jensen, M. Zweckstetter, J. R. Huang and M. Blackledge, Exploring free-energy landscapes of intrinsically disordered proteins at atomic resolution using NMR spectroscopy, Chem. Rev., 2014, 6632–6660,  DOI:10.1021/cr400688u.
  49. M. C. Baxa, E. J. Haddadian, J. M. Jumper, K. F. Freed and T. R. Sosnick, Loss of conformational entropy in protein folding calculated using realistic ensembles and its implications for NMR-based calculations, Proc. Natl. Acad. Sci. U. S. A., 2014, 111(43), 15396–15401,  DOI:10.1073/pnas.1407768111.
  50. M. C. Baxa, E. J. Haddadian, A. K. Jha, K. F. Freed and T. R. Sosnick, Context and force field dependence of the loss of protein backbone entropy upon folding using realistic denatured and native state ensembles, J. Am. Chem. Soc., 2012, 134(38), 15929–15936,  DOI:10.1021/ja3064028.
  51. R. Schweitzer-Stenner, R. Kurbaj, N. O'Neill, B. Andrews, R. Shah and B. Urbanc, Conformational Manifold Sampled by Two Short Linear Motif Segments Probed by Circular Dichroism, Vibrational, and Nuclear Magnetic Resonance Spectroscopy, Biochemistry, 2023, 62(17), 2571–2586,  DOI:10.1021/acs.biochem.3c00212.
  52. R. Schweitzer-Stenner, B. Milorey and H. Schwalbe, Randomizing of Oligopeptide Conformations by Nearest Neighbor Interactions between Amino Acid Residues, Biomolecules, 2022, 12, 684,  DOI:10.3390/biom12050684.
  53. B. Milorey, H. Schwalbe, N. O'Neill and R. Schweitzer-Stenner, Repeating Aspartic Acid Residues Prefer Turn-like Conformations in the Unfolded State: Implications for Early Protein Folding, J. Phys. Chem. B, 2021, 125(41), 11392–11407,  DOI:10.1021/acs.jpcb.1c06472.
  54. B. Milorey, R. Schweitzer-Stenner, B. Andrews, H. Schwalbe and B. Urbanc, Short peptides as predictors for the structure of polyarginine sequences in disordered proteins, Biophys. J., 2021, 120(4), 662–676,  DOI:10.1016/j.bpj.2020.12.026.
  55. R. Schweitzer-Stenner and S. E. Toal, Anticooperative Nearest-Neighbor Interactions between Residues in Unfolded Peptides and Proteins, Biophys. J., 2018, 114(5), 1046–1057,  DOI:10.1016/j.bpj.2018.01.022.
  56. R. Schweitzer-Stenner and S. E. Toal, Construction and comparison of the statistical coil states of unfolded and intrinsically disordered proteins from nearest-neighbor corrected conformational propensities of short peptides, Mol Biosyst, 2016, 12(11), 3294–3306,  10.1039/c6mb00489j.
  57. S. E. Toal, N. Kubatova, C. Richter, V. Linhard, H. Schwalbe and R. Schweitzer-Stenner, Randomizing the Unfolded State of Peptides (and Proteins) by Nearest Neighbor Interactions between Unlike Residues, Chem.—Eur. J., 2015, 21, 5173–5192 CrossRef CAS PubMed.
  58. T. R. Sosnick. Sampling library, https://www.godzilla.uchicago.edu/cgi-bin/rama.cgi Search PubMed.
  59. Y. Shen and A. Bax, Protein backbone chemical shifts predicted from searching a database for torsion angle and sequence homology, J. Biomol. NMR, 2007, 38(4), 289–302,  DOI:10.1007/s10858-007-9166-6.
  60. Y. Shen, J. Roche, A. Grishaev and A. Bax, Prediction of nearest neighbor effects on backbone torsion angles and NMR scalar coupling constants in disordered proteins, Protein Sci., 2018, 27(1), 146–158,  DOI:10.1002/pro.3292.
  61. F. Jiang, W. Han and Y. D. Wu, The intrinsic conformational features of amino acids from a protein coil library and their applications in force field development, Phys. Chem. Chem. Phys., 2013, 15(10), 3413–3428,  10.1039/c2cp43633g.
  62. C. Y. Zhou, F. Jiang and Y. D. Wu, Residue-specific force field based on protein coil library. RSFF2: Modification of AMBER ff99SB, J. Phys. Chem. B, 2015, 119(3), 1035–1047,  DOI:10.1021/jp5064676.
  63. D. Ting, G. Wang, M. Shapovalov, R. Mitra, M. I. Jordan and R. L. Dunbrack, Neighbor-dependent Ramachandran probability distributions of amino acids developed from a hierarchical dirichlet process model, PLoS Comput. Biol., 2010, 6(4), e10007631–21,  DOI:10.1371/journal.pcbi.1000763.
  64. R. Schweitzer-Stenner, Distribution of conformations sampled by the central amino acid residue in tripeptides inferred from amide i band profiles and NMR scalar coupling constants, J. Phys. Chem. B, 2009, 113(9), 2922–2932,  DOI:10.1021/jp8087644.
  65. R. Schweitzer-Stenner, Secondary structure analysis of polypeptides based on an excitonic coupling model to describe the band profile of amide I′ of IR, raman, and vibrational circular dichroism spectra, J. Phys. Chem. B, 2004, 108(43), 16965–16975,  DOI:10.1021/jp0477654.
  66. J. S. Hu and A. Bax, Determination of φ and χ 1 angles in proteins from 13C- 13C three- bond J couplings measured by three-dimensional heteronuclear NMR. How planar is the peptide bond?, J. Am. Chem. Soc., 1997, 119(27), 6360–6368,  DOI:10.1021/ja970067v.
  67. J. Wirmer and H. Schwalbe, Angular dependence of 1J(Ni,Cαi) and 2J(Ni,Cα(i-1)) couplings constants measured in J-modulated HSQCs, J. Biomol. NMR, 2002, 23(1), 47–55,  DOI:10.1023/A:1015384805098.
  68. J. Debartolo, A. Jha, K. F. Freed and T. R. Sosnick. Local Backbone Preferences and Nearest-Neighbor Effects in the Unfolded and Native States. in Schweitzer-Stenner R., ed. Protein and Peptide Folding, Misfolding, and Non-folding, Chichester, Wiley & Sons, 2012. pp. 79–98,  DOI:10.1002/9781118183373.ch3.
  69. H. T. Tran, X. Wang and R. V. Pappu, Reconciling observations of sequence-specific conformational propensities with the generic polymeric behavior of denatured proteins, Biochemistry, 2005, 44(34), 11369–11380,  DOI:10.1021/bi050196l.
  70. A. Hagarman, T. J. Measey, D. Mathieu, H. Schwalbe and R. Schweitzer-Stenner, Intrinsic propensities of amino acid residues in GxG peptides inferred from amide I’ band profiles and NMR scalar coupling constants, J. Am. Chem. Soc., 2010, 132(2), 540–551,  DOI:10.1021/ja9058052.
  71. R. Schweitzer-Stenner, A. Hagarman, S. Toal, D. Mathieu and H. Schwalbe, Disorder and order in unfolded and disordered peptides and proteins: A view derived from tripeptide conformational analysis. I. Tripeptides with long and predominantly hydrophobic side chains. Proteins: Structure, Proteins:Struct., Funct., Bioinf., 2013, 81(6), 955–967,  DOI:10.1002/prot.24225.
  72. A. Hagarman, D. Mathieu, S. Toal, T. J. Measey, H. Schwalbe and R. Schweitzer-Stenner, Amino acids with hydrogen-bonding side chains have an intrinsic tendency to sample various turn conformations in aqueous solution, Chem.–Eur. J., 2011, 17(24), 6789–6797,  DOI:10.1002/chem.201100016.
  73. C. J. Penkett, C. Redfield, I. Dodd, J. Hubbard, D. L. McBay and D. E. Mossakowska, et al., NMR analysis of main-chain conformational preferences in an unfolded fibronectin-binding protein, J. Mol. Biol., 1997, 274(2), 152–159,  DOI:10.1006/jmbi.1997.1369.
  74. M. B. Swindells, M. W. Macarthur and J. M. Thornton, Intrinsic φ,ψ propensities of amino acids, derived from the coil regions of known structures, Nat Struct Biol, 1995, 2(7), 569–603,  DOI:10.1038/nsb0795-596.
  75. M. H. Zaman, M. Y. Shen, R. S. Berry, K. F. Freed and T. R. Sosnick, Investigations into sequence and conformational dependence of backbone entropy, inter-basin dynamics and the flory isolated-pair hypothesis for peptides, J. Mol. Biol., 2003, 331(3), 693–711,  DOI:10.1016/S0022-2836(03)00765-4.
  76. F. Avbelj and R. L. Baldwin, Origin of the neighboring residue effect on peptide backbone conformation, Proc. Natl. Acad. Sci. U. S. A., 2004, 101(30), 10967–10972,  DOI:10.1073/pnas.0404050101.
  77. R. V. Pappu, R. Srinivasan and G. D. Rose, The Flory isolated-pair hypothesis is not valid for polypeptide chains: Implications for protein folding, Proc. Natl. Acad. Sci. U. S. A., 2000, 97(23), 12565–12570,  DOI:10.1073/pnas.97.23.12565.
  78. A. Bundi and K. Wüthrich, 1H-nmr parameters of the common amino acid residues measured in aqueous solutions of the linear tetrapeptides H-Gly-Gly-X-L-Ala-OH, Biopolymers, 1979, 18(2), 285–297,  DOI:10.1002/bip.1979.360180206.
  79. D. Braun, G. Wider and K. Wüthrich, Sequence-Corrected $^15$N “Random Coil” Chemical Shifts, J. Am. Chem. Soc., 1994, 116, 8466–8469 CrossRef CAS.
  80. S. Schwarzinger, G. J. A. Kroon, T. R. Foss, J. Chung, P. E. Wright and H. J. Dyson, Sequence-dependent correction of random coil NMR chemical shifts, J. Am. Chem. Soc., 2001, 123(13), 2970–2978,  DOI:10.1021/ja003760i.
  81. M. Kjaergaard and F. M. Poulsen, Sequence correction of random coil chemical shifts: Correlation between neighbor correction factors and changes in the Ramachandran distribution, J. Biomol. NMR, 2011, 50(2), 157–165,  DOI:10.1007/s10858-011-9508-2.
  82. Y. S. Jung, K. I. Oh, G. S. Hwang and M. Cho, Neighboring residue effects in terminally blocked dipeptides: Implications for residual secondary structures in intrinsically unfolded/disordered proteins, Chirality, 2014, 26(9), 443–452,  DOI:10.1002/chir.22285.
  83. S. Gómez, F. Egidi, A. Puglisi, T. Giovannini, B. Rossi and C. Cappelli, Unlocking the power of resonance Raman spectroscopy: The case of amides in aqueous solution, J. Mol. Liq., 2022, 346, 117841,  DOI:10.1016/j.molliq.2021.117841.
  84. D. Meral, S. Toal, R. Schweitzer-Stenner and B. Urbanc, Water-Centered Interpretation of Intrinsic pPII Propensities of Amino Acid Residues: In Vitro-Driven Molecular Dynamics Study, J. Phys. Chem. B, 2015, 119(42), 13237–13251,  DOI:10.1021/acs.jpcb.5b06281.
  85. Y. W. Teh, M. I. Jordan, M. J. Beal and D. M. Blei, Hierarchical Dirichlet Processes, J. Am. Stat. Assoc., 2006, 101(476), 1566–1581,  DOI:10.1198/016214506000000302.
  86. S. Gnanakaran and A. E. García, Helix-coil transition of alanine peptides in water: Force field dependence on the folded and unfolded structures, Proteins:Struct., Funct., Genet., 2005, 59(4), 773–782,  DOI:10.1002/prot.20439.
  87. D. Verbaro, I. Ghosh, W. M. Nau and R. Schweitzer-Stenner, Discrepancies between conformational distributions of a polyalanine peptide in solution obtained from molecular dynamics force fields and amide I′ band profiles, J. Phys. Chem. B, 2010, 114(51), 17201–17208,  DOI:10.1021/jp109404r.
  88. J. Graf, P. H. Nguyen, G. Stock and H. Schwalbe, Structure and dynamics of the homologous series of alanine peptides: A joint molecular dynamics/NMR study, J. Am. Chem. Soc., 2007, 129(5), 1179–1189,  DOI:10.1021/ja0660406.
  89. F. Jiang, W. Han and Y. D. Wu, Influence of side chain conformations on local conformational features of amino acids and implication for force field development, J. Phys. Chem. B, 2010, 114(17), 5840–5850,  DOI:10.1021/jp909088e.
  90. M. D. Mukrasch, P. Markwick, J. Biernat, M. Von Bergen, P. Bernadó and C. Griesinger, et al., Highly populated turn conformations in natively unfolded tau protein identified from residual dipolar couplings and molecular simulation, J. Am. Chem. Soc., 2007, 129(16), 5235–5243,  DOI:10.1021/ja0690159.
  91. M. Schwalbe, V. Ozenne, S. Bibow, M. Jaremko, L. Jaremko and M. Gajda, et al., Predictive atomic resolution descriptions of intrinsically disordered hTau40 and α-synuclein in solution from NMR and small angle scattering, Structure, 2014, 22(2), 238–249,  DOI:10.1016/j.str.2013.10.020.
  92. J. Roche, Y. Shen, J. H. Lee, J. Ying and A. Bax, Monomeric Aβ 1–40 and Aβ 1–42 Peptides in Solution Adopt Very Similar Ramachandran Map Distributions That Closely Resemble Random Coil, Biochemistry, 2016, 55(5), 762–775,  DOI:10.1021/acs.biochem.5b01259.
  93. A. B. Mantsyzov, A. S. Maltsev, J. Ying, Y. Shen, G. Hummer and A. Bax, A maximum entropy approach to the study of residue-specific backbone angle distributions in α-synuclein, an intrinsically disordered protein, Protein Sci., 2014, 23(9), 1275–1290,  DOI:10.1002/pro.2511.
  94. B. Andrews, J. Guerra, R. Schweitzer-Stenner and B. Urbanc, Do molecular dynamics force fields accurately model Ramachandran distributions of amino acid residues in water?, Phys. Chem. Chem. Phys., 2022, 24(5), 3259–3279,  10.1039/d1cp05069a.
  95. P. Bernadó, C. W. Bertoncini, C. Griesinger, M. Zweckstetter and M. Blackledge, Defining long-range order and local disorder in native α-synuclein using residual dipolar couplings, J. Am. Chem. Soc., 2005, 127(51), 17968–17969,  DOI:10.1021/ja055538p.
  96. V. Ozenne, R. Schneider, M. Yao, J. R. Huang, L. Salmon and M. Zweckstetter, et al., Mapping the potential energy landscape of intrinsically disordered proteins at amino acid resolution, J. Am. Chem. Soc., 2012, 134(36), 15138–15148,  DOI:10.1021/ja306905s.
  97. V. Ozenne, F. Bauer, L. Salmon, J. R. Huang, M. R. Jensen and S. Segard, et al., Flexible-meccano: A tool for the generation of explicit ensemble descriptions of intrinsically disordered proteins and their associated experimental observables, Bioinformatics, 2012, 28(11), 1463–1470,  DOI:10.1093/bioinformatics/bts172.

Footnote

In Ting et al., H was multiplied by 100 to express it in units of %.

This journal is © The Royal Society of Chemistry 2026
Click here to see how this site uses Cookies. View our privacy policy here.