Pau Bernadó*a and Dmitri I. Svergun*b
aInstitute for Research in Biomedicine, Parc Científic de Barcelona, Baldiri Reixac, 10, 08028 Barcelona, Spain
bEuropean Molecular Biology Laboratory, Hamburg Outstation, Notkestrasse 85, 22603 Hamburg, Germany. E-mail: pau.bernado@irbbarcelona.org; svergun@embl-hamburg.de
First published on 22nd September 2011
Small-angle scattering of X-rays (SAXS) is an established method to study the overall structure and structural transitions of biological macromolecules in solution. For folded proteins, the technique provides three-dimensional low resolution structures ab initio or it can be used to drive rigid-body modeling. SAXS is also a powerful tool for the quantitative analysis of flexible systems, including intrinsically disordered proteins (IDPs), and is highly complementary to the high resolution methods of X-ray crystallography and NMR. Here we present the basic principles of SAXS and review the main approaches to the characterization of IDPs and flexible multidomain proteins using SAXS. Together with the standard approaches based on the analysis of overall parameters, a recently developed Ensemble Optimization Method (EOM) is now available. The latter method allows for the co-existence of multiple protein conformations in solution compatible with the scattering data. Analysis of the selected ensembles provides quantitative information about flexibility and also offers insights into structural features. Examples of the use of SAXS and combined approaches with NMR, X-ray crystallography, and computational methods to characterize completely or partially disordered proteins are presented.
Pau Bernadó | Pau Bernadó is a research assistant at the Institute for Research in Biomedicine in Barcelona. At the end of 2011 he will become a group leader at the Centre de Biochimie Structurale in Montpellier. He received his PhD in Chemistry from the University of Barcelona in 2003. His research is focused on methodological developments to study the structure and dynamics of biomolecules and macromolecular complexes, with special interest in highly flexible proteins. To achieve these goals, he is combining Nuclear Magnetic Resonance, small-angle X-ray scattering and computational methods. He has co-authored 51 peer-reviewed publications and several book chapters. |
Dmitri I. Svergun | Dmitri Svergun, a graduate of Moscow University, Russia, is a group leader at the European Molecular Biology Laboratory, Hamburg Outstation. His research activities include small-angle X-ray scattering, applications of synchrotron radiation and neutron scattering in structural research of biological macromolecules and nanostructured materials, and development of mathematical methods for the interpretation of elastic scattering data. He was awarded the 2010 International Rusnanoprize for method development in nanodiagnostics. He is the author/co-author of over 270 peer-reviewed publications and 2 monographs (h-index 38), and a consultant of the Commission on Small-Angle Scattering of the International Union of Crystallography. |
In contrast to most other structural methods, one of the major advantages of SAS lies in its applicability not only to monodisperse solutions of well-folded and relatively rigid macromolecules and complexes but also to mixtures of diverse types of macromolecules. This versatility makes the technique applicable to the analysis of oligomeric mixtures,9 self-assembly processes,10 and protein or RNA folding.11–13 The technique is also effectively employed to study systems with conformational polydispersity, i.e. completely or partially disordered macromolecules, including multi-domain proteins with flexible linkers and intrinsically disordered proteins (IDPs). In the past, characterization of flexibility by means of SAS was done qualitatively, generally being restricted to the determination of a few simple geometric parameters. Recent developments have allowed quantitative description of the conformational ensembles for flexible macromolecules in solution.14,15 These new approaches will be briefly outlined below. Here we will mainly consider SAXS, which is the most popular technique; however, in principle these new approaches are also applicable to SANS studies. The latter method can be useful in some specific cases, in particular for multi-component complexes, when one can make use of contrast variation by isotopic hydrogen–deuterium exchange (e.g. specific perdeuteration).16 For IDPs, SAXS is the method most often used because it is faster, requires less material and usually provides more precise experimental data.
The basic scheme of a SAXS experiment is illustrated in Fig. 1. A solution of macromolecules placed in a capillary or cuvette is illuminated by a collimated monochromatic X-ray beam and the intensity of the scattered beam is measured as a function of the scattering angle (2θ). In solution, scattering is usually isotropic due to the random orientation of particles, but SAXS patterns are generally recorded by two-dimensional detectors, which provide better statistical accuracy of the signal after radial averaging. A similar measurement is performed on the same container filled with solvent, and the latter scattering is then subtracted from that of the macromolecular solution. The difference pattern arises from the dissolved particles and provides information about their structure. Note that structural studies require solute concentrations of a few mg ml−1 (i.e. below 1%), at which the interactions between the macromolecules are usually negligible. Only this dilute regime is considered below (measurements on concentrated solutions, which may also provide information about interparticle interactions, are outside the scope of this review).
Fig. 1 Basic scheme of a SAXS experiment. |
The scattering of X-rays by a macromolecular solution is proportional to the number of biomolecules in the illuminated volume (i.e. to the solute concentration) and to the difference in the electron density of the solute and solvent (often also called the contrast). As biomolecules are composed of light atoms, the contrast and thus the useful signal in aqueous solutions is usually rather small. The most precise biological SAXS experiments are therefore conducted on synchrotron sources, which provide high brilliance X-rays. All major synchrotrons, like ESRF (Grenoble), DESY (Hamburg), Diamond (Oxford), SSRL (Stanford), or SPring-8 (Himeji), have SAXS beamlines that are optimized for biological solution scattering experiments. In principle, satisfactory results can also be obtained on dedicated SAXS cameras using laboratory X-ray sources (for example those produced by Rigaku, Bruker and other firms), which yield a much lower flux but still a low background. The typical exposure times range from seconds or minutes on synchrotrons to hours on laboratory sources but the amount of sample required is approximately the same in both cases. Typically, a few tens of microlitres of solution per measurement and a total of 1–2 mg of purified protein per construct are needed, including a compulsory concentration series measurement to extrapolate to infinite dilution. At synchrotrons, the intense X-ray beams often damage samples. For a meaningful analysis of both the overall shape and the flexibility of macromolecules, solutions containing a single molecular species without aggregates are needed. Typically, monodispersity over 90% is required, which must be verified by other methods such as gel filtration chromatography, dynamic light scattering (DLS), or analytical ultracentrifugation (AUC) before measuring at a synchrotron.
Aqueous solutions of macromolecules yield an isotropic scattering intensity that depends on the modulus of the momentum transfer s (s = 4πsin(θ)/λ, where 2θ is the angle between the incident and scattered beam and λ is the X-ray wavelength):
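$$I(s) = \langle I(\mathbf{s}) \rangle_{\Omega} = \langle A(\mathbf{s})\,A^{*}(\mathbf{s}) \rangle_{\Omega} \qquad (1)$$
$$A(\mathbf{s}) = \mathcal{F}[\Delta\rho(\mathbf{r})] = \int \Delta\rho(\mathbf{r})\,\exp(i\,\mathbf{s}\cdot\mathbf{r})\,\mathrm{d}\mathbf{r} \qquad (2)$$
Here the scattering amplitude A(s) is the Fourier transform of the excess scattering density Δρ(r) of the solute over the solvent, and ⟨ ⟩Ω denotes the average over all particle orientations.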
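As a small worked example of the definition of the momentum transfer, the sketch below converts a scattering angle 2θ into s = 4π sin(θ)/λ. The function name and the numerical values (a wavelength of 1.5 Å and 2θ = 1°) are illustrative assumptions and are not taken from the text.

```python
import math


def momentum_transfer(two_theta_deg: float, wavelength: float) -> float:
    """Return s = 4*pi*sin(theta)/lambda.

    two_theta_deg is the full scattering angle 2*theta in degrees;
    the result is in reciprocal units of `wavelength` (e.g. 1/Å).
    """
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength


# Assumed, illustrative values: 1.5 Å X-rays and a scattering angle of 1 degree.
s_value = momentum_transfer(1.0, 1.5)
# 2*pi/s is the real-space distance probed at this momentum transfer.
print(f"s = {s_value:.4f} 1/Å, 2π/s = {2.0 * math.pi / s_value:.1f} Å")
```

Because s depends on both angle and wavelength, profiles recorded on instruments with different wavelengths can be compared directly once they are expressed on the s axis.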
Several relevant structural parameters can be obtained directly from these curves, thus providing information about the size, oligomeric state and overall shape of the molecule. For monodisperse dilute solutions of folded purified proteins, nucleic acids or macromolecular complexes, the background corrected intensity after subtraction of the solvent scattering is proportional to the scattering of a single particle, averaged over all orientations. Novel methods of SAXS data analysis allow the structural determination of these systems at low resolution (1–2 nm).5,6
Flexible macromolecules belong to another more complicated case of polydisperse systems, in which the solution contains various types of particles. In general, polydispersity makes particle structure analysis difficult or impossible, because, contrary to monodisperse systems, the intensity measured is not related to the scattering of a single particle. In general, given K distinct particle types (components) in a mixture, the SAXS intensity is a linear combination of their individual contributions
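$$I(s) = \sum_{k=1}^{K} v_k\,I_k(s) \qquad (3)$$
where v_k and I_k(s) are the volume fraction and the scattering intensity of the k-th component, respectively.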
Both IDPs and modular multi-domain proteins with flexible linkers can be represented as mixtures of different conformations of the same molecule, and their scattering is therefore described by eqn (3). However, these mixtures contain a very large number of configurations (K ≫ 1) and thus plain decomposition, as in the case of oligomeric mixtures, is not feasible. Below we describe the methods specifically developed for the studies of highly flexible systems.
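The idea that the scattering of a flexible molecule is a weighted sum over the profiles of its conformers can be sketched numerically. The example below is a toy model under stated assumptions: each conformer is a freely jointed chain with one point scatterer per residue (a crude stand-in, not Flexible-Meccano), profiles are computed with the Debye scattering formula, and all function names are hypothetical.

```python
import numpy as np


def random_chain(n_res, bond=3.8, rng=None):
    """Freely jointed chain: one bead per residue, fixed bond length in Å."""
    rng = np.random.default_rng() if rng is None else rng
    steps = rng.normal(size=(n_res - 1, 3))
    steps *= bond / np.linalg.norm(steps, axis=1, keepdims=True)
    return np.vstack([np.zeros(3), np.cumsum(steps, axis=0)])


def radius_of_gyration(coords):
    """Root-mean-square distance of the beads from their centroid."""
    centred = coords - coords.mean(axis=0)
    return np.sqrt((centred ** 2).sum(axis=1).mean())


def debye_profile(coords, s):
    """Orientationally averaged I(s)/I(0) for identical point scatterers.

    Debye formula: I(s) = sum_ij sin(s*r_ij) / (s*r_ij), normalised by N^2.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    r = np.linalg.norm(diff, axis=-1)                # (N, N) pair distances
    sr = s[:, None, None] * r[None, :, :]            # (S, N, N)
    # np.sinc(x) is sin(pi*x)/(pi*x), hence the division by pi.
    return np.sinc(sr / np.pi).sum(axis=(1, 2)) / len(coords) ** 2


def mixture_intensity(profiles, fractions):
    """Linear combination of component profiles weighted by their fractions."""
    fractions = np.asarray(fractions, dtype=float)
    if not np.isclose(fractions.sum(), 1.0):
        raise ValueError("fractions must sum to 1")
    return fractions @ np.asarray(profiles)


rng = np.random.default_rng(0)
s = np.linspace(0.005, 0.3, 60)                      # 1/Å
chains = [random_chain(100, rng=rng) for _ in range(20)]
profiles = np.array([debye_profile(c, s) for c in chains])
weights = np.full(len(chains), 1.0 / len(chains))    # equally populated conformers
i_avg = mixture_intensity(profiles, weights)

rgs = [radius_of_gyration(c) for c in chains]
print(f"Rg of individual conformers: {min(rgs):.1f} to {max(rgs):.1f} Å")
```

With equal weights the mixture profile is simply the arithmetic mean of the individual curves; unequal populations only change the `weights` vector.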
Fig. 2 Effect of the conformational sampling of IDPs on SAXS profiles and their Kratky representations. (A) Ten 100-residue long polyalanine conformations built with Flexible-Meccano, chosen from a pool of 10000, representing the variety of sizes and shapes encountered in a disordered protein. (B) Individual SAXS profiles (black) and (C) Kratky plots (black) of these ten randomly selected chains. The average of the SAXS intensities and the Kratky plots from the 10000 conformations are shown in red in (B) and (C), respectively. Averaged curves indicate the common behavior of fully disordered proteins. |
Traditionally, Kratky plots (I(s)·s² as a function of s) have been used to qualitatively identify disordered states and distinguish them from globular particles. The Kratky representation has the capacity to enhance particular features of scattering profiles, thus allowing easier identification of degrees of compactness.12 The scattering intensity of a globular protein behaves approximately as 1/s⁴, conferring a bell-shaped Kratky plot with a well defined maximum. Conversely, an ideal Gaussian chain has a 1/s² dependence of I(s) and therefore presents a plateau at large s values. In the case of a chain with no thickness, the Kratky plot also presents a plateau over a specific range of s, which is followed by a monotonic increase. The latter behavior is normally observed experimentally in unfolded proteins. Unfolding processes monitored by SAXS are good examples of how the decrease in protein compactness is translated into Kratky plots.19,20 To exemplify what typically occurs in an IDP, we present the Kratky plots of the above-mentioned ten individual SAXS profiles compared with the averaged one (Fig. 2C). The diversity of profiles observed is even more noticeable when using the Kratky representation. While some profiles present a clear maximum, indicating a compact conformation, others present less evident shapes. When averaging the 10000 curves computed, the average Kratky plot presents a monotonic increase, as expected for an unstructured system. Multi-domain proteins, where two or more globular domains are connected by flexible linkers, are a special case of IDPs that present a dual behavior. Consequently, SAXS profiles and Kratky plots present contributions from both structurally distinct regions.
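The contrast between the two limiting Kratky shapes can be reproduced with idealized form factors. The sketch below is an illustration only and makes two assumptions that are not in the text: the globular particle is modelled as a homogeneous sphere, and the disordered chain as an ideal Gaussian chain described by the Debye function. Function names and the chosen Rg of 30 Å are hypothetical.

```python
import numpy as np


def sphere_intensity(s, radius):
    """Normalised form factor of a homogeneous sphere (globular model)."""
    x = s * radius
    return (3.0 * (np.sin(x) - x * np.cos(x)) / x ** 3) ** 2


def gaussian_chain_intensity(s, rg):
    """Normalised Debye function of an ideal Gaussian chain."""
    x = (s * rg) ** 2
    return 2.0 * (x - 1.0 + np.exp(-x)) / x ** 2


def kratky(s, intensity):
    """Kratky transform: s^2 * I(s)."""
    return s ** 2 * intensity


s = np.linspace(0.005, 0.5, 500)           # 1/Å
rg = 30.0                                  # Å, same Rg for both models
radius = rg * np.sqrt(5.0 / 3.0)           # solid sphere: Rg^2 = (3/5) R^2

k_glob = kratky(s, sphere_intensity(s, radius))
k_coil = kratky(s, gaussian_chain_intensity(s, rg))

s_peak = s[np.argmax(k_glob)]
print(f"sphere: Kratky maximum at s = {s_peak:.3f} 1/Å (s*Rg = {s_peak * rg:.2f})")
print(f"chain:  s^2*I(s) at largest s = {k_coil[-1]:.5f}, plateau 2/Rg^2 = {2 / rg**2:.5f}")
```

The sphere curve rises to a maximum and then decays, whereas the Gaussian-chain curve increases monotonically towards the plateau value 2/Rg², which is the qualitative distinction used to recognise disorder.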
In the Guinier approximation, Rg is obtained by a simple linear fit of ln I(s) against s², assuming that at very small angles (s < 1.3/Rg) the intensity is represented as I(s) = I(0) exp(−(sRg)²/3).21 Alternatively, Debye's approximation, eqn (4), can be more precise than Guinier's as its validity extends to larger momentum transfer ranges.22
$$I(s) = \frac{2\,I(0)}{x^{2}}\left(x - 1 + e^{-x}\right), \qquad x = (sR_g)^{2} \qquad (4)$$
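The two ways of extracting Rg described above can be compared on synthetic data. The sketch below is a minimal illustration, assuming noise-free intensities generated from the Debye function of a Gaussian chain with Rg = 30 Å; the function names, the iterative choice of the Guinier range and the grid search used for the Debye fit are illustrative choices, not the procedure of any particular program.

```python
import numpy as np


def debye(s, rg, i0=1.0):
    """Debye function of a Gaussian chain, x = (s*Rg)^2."""
    x = (s * rg) ** 2
    return i0 * 2.0 * (x - 1.0 + np.exp(-x)) / x ** 2


def guinier_fit(s, intensity, rg_guess, limit=1.3, n_iter=20):
    """Fit ln I = ln I(0) - (Rg^2/3) s^2 over s*Rg < limit.

    The fitting range depends on Rg itself, so it is updated iteratively.
    """
    rg = rg_guess
    for _ in range(n_iter):
        mask = s * rg < limit
        slope, intercept = np.polyfit(s[mask] ** 2, np.log(intensity[mask]), 1)
        rg = np.sqrt(-3.0 * slope)
    return rg, np.exp(intercept)


def debye_fit(s, intensity, rg_grid):
    """Grid search over Rg; I(0) follows from linear least squares at each Rg."""
    best_rg, best_i0, best_res = None, None, np.inf
    for rg in rg_grid:
        shape = debye(s, rg)
        i0 = (shape @ intensity) / (shape @ shape)
        res = np.sum((intensity - i0 * shape) ** 2)
        if res < best_res:
            best_rg, best_i0, best_res = rg, i0, res
    return best_rg, best_i0


s = np.linspace(0.005, 0.15, 300)                  # 1/Å
i_obs = debye(s, rg=30.0, i0=100.0)                # synthetic, noise-free data

rg_g, i0_g = guinier_fit(s, i_obs, rg_guess=20.0)
rg_d, i0_d = debye_fit(s, i_obs, np.arange(10.0, 60.0, 0.1))
print(f"Guinier: Rg = {rg_g:.1f} Å   Debye: Rg = {rg_d:.1f} Å   (input 30.0 Å)")
```

In this synthetic test the Guinier estimate is expected to fall somewhat below the input value, because ln I(s) of a Gaussian chain is already curved within s < 1.3/Rg, whereas the Debye fit can use the full angular range.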
The experimental Rg is a single value representation of the size of the molecule, which for disordered states represents a z-average1 over all accessible conformations in solution. The most common quantitative interpretation of Rg for unfolded proteins is based on Flory's equation,23 which relates it to the length of the protein chain through a power law,
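$$R_g = R_0\,N^{\nu} \qquad (5)$$
where N is the number of residues in the chain, R0 is a constant prefactor and ν is the scaling exponent.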
The comparison of Rg for proteins with threshold values derived from Flory's equation is a very common strategy to assign disorder to a protein. However, the question as to whether conformational sampling in a chemically denatured state is equivalent to that found for IDPs in native conditions must be clarified. It has been postulated that the interaction of denaturants such as urea or guanidinium chloride with backbone and/or side chain atoms could perturb the Ramachandran populations that define the conformational variability of disordered proteins (see ref. 27 and references therein). A recent NMR study based on the measurement of several Residual Dipolar Couplings (RDCs) along the ubiquitin backbone indicated that chemically denatured proteins have an enhanced population of extended conformations compared to IDPs.28 As a result of this perturbation at residue level, the overall properties in the two conditions could also differ. This issue has been recently addressed by comparing Rg values measured for chemically denatured and natively disordered proteins with those computed from large ensembles of conformations with distinct percentages of extended conformations at residue level.29 That study found that a 15% increase in extended conformations was required to describe Rg data measured for chemically denatured proteins, in perfect agreement with the previously mentioned RDC study.28 The same enhancement had to be invoked to properly describe hydrodynamic radii measured with pulsed-field gradient diffusion NMR experiments. These results suggest that parameters of Flory's relationship derived for denatured proteins are not appropriate for IDPs (see below).
Footnotes to Table 1: a When present, His-tags were considered part of the protein. b Threshold Rg value obtained from the parametrization of Flory's relationship with the coil database. c Length of the most populated isoform of the samples was used. d Rg derived from averaging conformations selected with EOM. e Data measured by SANS in highly crowded conditions (130 mg ml−1 of BPTI). f Data derived from the 10 mM Arg/Glu buffer.

Protein | #Residuesa | Rg exp/Å | Rg RC/Åb | Ref. |
---|---|---|---|---|
MeCP2 | 486 | 62.5 ± 4.5 | 64.2 | 30 |
Msh6 N-term | 304 | 56 ± 2 | 50.2 | 31 |
Ki-1/57 | 292 | 47.5 ± 1.0 | 49.2 | 32 |
MeCP2 (78–305) | 228 | 37.0 ± 0.9 | 43.3 | 30 |
Synthetic Resilin | 185 | 50 ± 5 | 38.8 | 33 |
Pig Calpastatin domain I | 148 | 35.4 | 34.5 | 34 |
HrpO | 147 | 35.0 | 34.4 | 35 |
II-1 | 141c | 41.0 ± 1.0 | 33.6 | 36 |
α-Synuclein, pH 7.5 | 140 | 40 ± 1 | 33.5 | 37 |
α-Synuclein, pH 3.0 | 140 | 30 ± 1 | 33.5 | 37 |
N-tail nucleoprotein MV | 139 | 27.2 ± 0.5 | 33.4 | 38 |
β-synuclein | 137 | 49 ± 1 | 33.1 | 39 |
Human NHE1 cdt (5 °C) | 131 | 37.1d | 32.4 | 40 |
Human NHE1 cdt (45 °C) | 131 | 35.3d | 32.4 | 41 |
ERM Transactivation Domain | 130 | 39.6 ± 0.7 | 32.2 | 42 |
Neuroligin 3 | 118 | 31.5 ± 1.0 | 30.6 | 43 |
eIF4E binding Protein (4E-BP) | 117 | 48.8 ± 0.2 | 30.5 | 44 |
Prothymosin α, pH 7.5 | 109 | 37.8 ± 0.9 | 29.4 | 45 |
Prothymosin α, pH 2.5 | 109 | 27.6 ± 0.9 | 29.4 | 45 |
paNHE1 cdt (5 °C) | 107 | 32.8d | 29.1 | 40 |
paNHE1 cdt (45 °C) | 107 | 32.9d | 29.1 | 41 |
N-protein of bacteriophage λ | 107 | 33 ± 2e | 29.1 | 46 |
FEZ1 monomer | 103 | 36 ± 1 | 28.5 | 47 |
HIV-1 Tat133 | 101 | 33.0 ± 1.5 | 28.3 | 48 |
p53 (1–93) | 93 | 28.7 ± 0.3 | 27.1 | 49 |
Sic1 | 92 | 34.7 | 26.9 | 50 |
pSic1 (hexaphosphorylated) | 92 | 34.0 | 26.9 | 50 |
PIR domain | 75 | 26.5 ± 0.5 | 24.2 | 51 |
IB5 | 73c | 27.9 ± 1.0 | 23.8 | 36 |
ACTR (5 °C) | 71 | 25.8d | 23.5 | 41 |
ACTR (45 °C) | 71 | 23.8d | 23.5 | 41 |
N-term VS Virus phosphoprotein | 68 | 26 ± 1f | 23.0 | 52 |
Tau ht40 | 441 | 65 ± 3 | 61.0 | 53 |
Tau K32 | 202 | 42 ± 3 | 40.6 | 53 |
Tau K16 | 174 | 39 ± 3 | 37.5 | 53 |
Tau K18 | 130 | 38 ± 3 | 32.2 | 53 |
Tau ht23 | 352 | 53 ± 3 | 54.2 | 53 |
Tau K27 | 171 | 37 ± 2 | 37.2 | 53 |
Tau K17 | 143 | 36 ± 2 | 33.9 | 53 |
Tau K19 | 99 | 35 ± 1 | 28.0 | 53 |
Tau K44 | 283 | 52 ± 2 | 48.4 | 53 |
Tau K10 | 167 | 40 ± 1 | 36.7 | 53 |
Tau K25 | 185 | 41 ± 2 | 38.7 | 53 |
Tau K23 | 254 | 49 ± 2 | 45.7 | 53 |
Tau K32 AT8 AT100 | 202 | 41 ± 3 | 40.6 | 53 |
Tau ht23 S214E | 352 | 54 ± 3 | 54.2 | 53 |
Tau ht23 AT8 AT100 | 352 | 52 ± 3 | 54.2 | 53 |
Tau K18 P301L | 130 | 35 ± 2 | 32.2 | 53 |
Tau ht40 AT8 AT100 PHF1 (10 °C) | 441 | 66 ± 3 | 61.0 | 54 |
Tau ht40 AT8 AT100 PHF1 (50 °C) | 441 | 67 ± 3 | 61.0 | 54 |
An excellent example of the synergy between SAXS and other experimental and computational tools to structurally characterize IDPs is the 139-residue-long C-terminal region of the Measles virus nucleoprotein, N-tail.38 The CD spectra with random coil signatures and the low dispersion of the 1H NMR spectra clearly suggested that N-tail was disordered. These observations were in agreement with the notably large Rg obtained from Guinier analysis of the small-angle region (27.5 ± 0.7 Å). However, the N-tail Kratky plot presented a clear bump at s = 0.08 Å−1 followed by a flat region. This dual behavior indicates a certain degree of compactness. A bioinformatics analysis predicted a 16-residue-long fragment with a strong tendency to form an α-helix. In a subsequent crystallographic study, this fragment was found to form an α-helix in complex with its natural partner, the C-terminal region of the phosphoprotein. This observation confirmed that this stretch was a molecular recognition element.56 Interestingly, the same region of the protein belonging to the related Sendai virus family was identified to have a large population, around 75%, of helical conformations,57,58 thus substantiating the results for Measles N-tail. Importantly, populations of nascent secondary structural elements such as the one found in this study are often related to biomolecular recognition events involved in regulation and signaling.59–61
There have been attempts to reconstruct low resolution structures from SAXS data for disordered chains using standard programs designed to address rigid systems. The resulting ab initio reconstructions or rigid-body models display highly elongated shapes, dictated by their large Dmax. The validity of these single-conformation approaches for describing a highly plastic protein is uncertain, although they clearly help to visualize a largely diffuse molecule.
Intrinsically disordered fragments are often attached to or tether folded domains. This architecture provides advantages in recognition events. There are several SAXS studies of partially folded proteins. Some examples are the transcriptional repressor CtBP,62 Factor H,63,64 the C-terminus of Sendai virus phosphoprotein,14 N-terminus of Msh6,31 p53,65 Prion protein,66 Alix,67 proNGF,68 Phd N-terminal domain,69 XRCC4,70 TIAR,71 Replication Protein A,72 MMP-1,73 FERM domains,74 and the entire Measles virus nucleocapsid.75 The presence of highly flexible regions attached to globular particles induces a dual behavior in the Kratky plots of partially folded proteins. These plots display a clear maximum, corresponding to the folded part of the protein, and a contribution with a continuous rise produced by the presence of disordered regions. The relative number of amino acids from these structurally distinct domains dictates the features of the Kratky plot. Fig. 3 displays the Kratky plot for three distinct constructs of Src-Kinase and exemplifies the prototypical features of globular, fully disordered and partially folded proteins. In a recent study, the effects of disordered linkers in SAXS data analysis were systematically addressed using synthetic data.76 In that study, a decrease in the structural features in SAXS profiles, which report on the loss of structural correlations, was observed when moving from static scenarios to highly dynamic ones. This systematic smoothening of the profiles was enhanced in Kratky representations of the same data that displayed a single broad maximum. These observations are in agreement with several experimental studies of multi-domain proteins.72,77
Fig. 3 Kratky plot for three constructs of Src-Kinase. The globular SH3 domain (blue), the fully disordered Unique domain (red), and a construct joining both domains (purple). The prototypical features of globular and disordered domains are combined in the partially folded construct. Courtesy of Yolanda Pérez and Miquel Pons (Institute for Research in Biomedicine, Barcelona). |
Proteins with large disordered tails or flexible linkers show specific features when modeled using approaches based on the presence of a single conformation. Ab initio reconstructions display an apparent decrease in the resolution of the envelopes, where densities corresponding to the globular units are difficult to identify. Rigid body models show very elongated shapes with isolated globular domains. These resulting structural trends have been proposed as an indicator of large-scale motions and flexibility in multi-domain proteins.76
Prothymosin α has been studied by SAS in a range of conditions. Of its 109 residues, roughly half are aspartic and glutamic acids, such that the protein is predicted to have a global charge of −54 at neutral pH. SAXS curves of prothymosin α have been measured in near neutral (pH 7.5) and acidic conditions (pH 2.5).45 Although Kratky plots indicated that the system was unstructured in both conditions, a dramatic reduction in Rg from 37.8 ± 0.9 Å to 27.6 ± 0.9 Å was observed in response to a decrease in pH. Interestingly, a similar level of compaction, down to Rg of 28.1 ± 0.8 Å, was detected when 15 mM of Zn2+ was added to the sample at neutral pH.83 These dramatic changes in protein size can be explained by a screening of the electrostatic repulsion within the chain at low pH and by the presence of cations bound to prothymosin α.
The effect of temperature on IDP structure has recently been addressed in two studies that combined SAXS with other biophysical tools.41,54 Kjaergaard et al. found a contraction in the Rg values of two IDPs, human NHE1 and ACTR, when temperature was increased from 5 to 45 °C. This observation thus reflects changes in the conformational sampling at residue level (see Table 1). CD and NMR experiments in the same experimental conditions demonstrated that transient α-helices partially unfold upon increasing temperature. Assuming that the presence of an α-helix compacts the ensemble, the authors proposed that the Ramachandran space sampled by IDPs is redistributed with temperature and the population of Polyproline II (PPII) conformations is transformed into the less extended β-strand ones. The effect of temperature jumps on the structure of protein Tau (441 residues) has recently been addressed by Shkumatov et al.54 SAXS data on wild-type Tau and a mutant that mimics a phosphorylated state were collected at 10 °C and 50 °C and no changes in the apparent Rg were observed (66 ± 3 and 65 ± 3 Å, respectively). However, SAXS curves measured after a rapid temperature jump, from 10 °C to 50 °C or from 50 °C to 10 °C, presented Rg values that were notably smaller than curves measured in equilibrium, 55 ± 3 and 56 ± 3 Å, respectively. Interestingly, the structural compaction was preserved for several hours after the temperature jump until it reached the conformational equilibrium. The authors attributed this intriguing effect to structural memory of the protein, which may be associated with the more compact nature of hyperphosphorylated Tau involved in neurodegeneration in Alzheimer's disease.
In a recent study, the effect induced by crowders on the level of IDP compaction was evaluated for N-protein of bacteriophage λ in the presence of increasing concentrations (up to 130 mg ml−1) of bovine pancreatic trypsin inhibitor (BPTI).46 Performed with SANS, this pioneering study allowed the contrast matching of protonated BPTI. Deuterated N-protein displayed a severe compaction when going from 0 to 65 mg ml−1 of BPTI, but this level was maintained when crowder concentration was increased to 130 mg ml−1. Interestingly, the formation of large oligomers was inhibited at this high BPTI concentration. This non-linear compaction with the concentration of crowders suggests complex conformational behavior of IDPs in conditions that resemble those found in vivo. This study opens the door to the use of SAS to further study crowding effects at molecular level by approaching conditions similar to those encountered by biomolecules in cells.
The Msh2–Msh6 complex recognizes mismatched bases in DNA during mismatch repair. The N-terminal region of Msh6, a 304-residue-long IDP, recognizes PCNA, a homotrimeric protein that controls the processivity of DNA polymerases. Shell and co-workers demonstrated this direct interaction with SAXS.31 A comparison of the Rg, Kratky plots and p(r) functions of the isolated partners and the complex showed that PCNA does not induce substantial structure to the N-terminal region of Msh6, which remains mainly disordered and proteolytically accessible upon binding. The interaction of the Msh2–Msh6 complex with PCNA was also addressed by SAXS. The interaction was shown to produce a complex that could be considered a highly flexible dumbbell, where both globular domains are tethered by the N-terminal Msh6 fragment, which acts as a molecular leash. These observations were further confirmed in an additional experiment with a biologically active deletion mutant of Msh6 with a notably shorter N-terminal tail. In these conditions, the large changes in size upon binding were easily monitored by the p(r) and Dmax derived from the SAXS profiles.
The tumor suppressor p53 is a multifunctional protein that plays a crucial role in processes like the regulation of apoptosis and DNA repair. p53 is a homotetramer with two folded domains that are tethered and flanked by unstructured regions that represent 37% of the whole sequence. Rigid-body modeling of SAXS data measured for p53 suggests that the protein is an open cross-like tetrameric assembly that collapses to tightly embrace DNA.65 This flexible architecture helps p53 to fulfil its function of constantly watching over the DNA in the cell. Interestingly, the SAXS model of the nucleoprotein without the disordered N-terminal transactivation domains is in excellent agreement with an independent electron microscopy map of the same complex.
The quality of in silico structural models has been evaluated by their capacity to properly reproduce experimental data measured for disordered proteins. SAXS has arguably been the most popular technique, although hydrodynamic measurements and NMR have also been used.91 Several types of models with varying degrees of structural resolution have been validated with previously mentioned compilations of Rg for denatured proteins.29,92–99 These studies report remarkable agreement with the experimental data. However, for most of the models an additional parameterization accounting for the exclusion, the solvation terms, or the conformational sampling at residue level was required.
In more advanced studies, complete scattering profiles were considered. Extending the momentum transfer range has stimulated the development of more adequate ensemble models, which have to be tested at higher resolution. Zagrovic et al. addressed the conformational study of a synthetic peptide by combining state-of-the-art molecular dynamic simulations with the experimentally measured SAXS profile.100 In another example, scattering patterns of the reduced ribonuclease A in a range of denaturing conditions were well reproduced with a proper selection of the solvation term.99 A very simplistic structural model based on polymer theory was used to describe the SANS curve of denatured phosphoglycerate kinase.101
Fewer studies of this kind have been performed for IDPs. To explain the SAXS curve measured for the complex of p27kip1/Cdk2/Cyclin A, the plasticity of the system was explicitly accounted for.86 Hundreds of snapshots of the molecular dynamics simulation of the complex were collected, and their theoretical SAXS profiles were computed and averaged, yielding a curve with a better description of the experimental data than single conformation models.
The Flexible-Meccano (FM) program has been the most tested structural model for IDPs. FM assembles peptidic units, considered rigid entities, in a consecutive way.14 The force-field used for this algorithm includes a coil description of the residue-specific Ramachandran space sampled by the amino acids, and a coarse-grained description of the side-chains thereby preventing steric clashes. This program has been tested for a large number of IDPs and has successfully described several NMR observables and SAXS data measured for these proteins.102 Remarkable agreement was found between the SAXS curve derived from FM ensembles and the experimental ones for the partially folded Sendai virus PX,14 the transactivation domain of p53,49 and the K18 construct of Tau protein (Fig. 4).53,103 In these two latter cases, transient levels of structuration were found by RDCs. Remarkably, in all these cases the same structural model simultaneously described NMR properties, which mainly report on the conformational (local) properties, and SAXS curves, which report on the size and shape (global) of the proteins. Again, these results underline the synergy between SAXS and NMR observables.
Fig. 4 Comparison of the experimental SAXS profiles (empty dots) with the theoretical ones derived from large ensembles computed with Flexible-Meccano (FM)(red) for Sendai virus PX, Transactivation domain of p53, and K18 construct of Tau protein. FM ensembles were modified to reproduce RDC data in these cases. The excellent agreement in these three examples indicates the proper description of the overall properties of IDPs coded in FM. Note that these plots come from a direct comparison of SAXS curves and not from a fitting. |
The encouraging results obtained for the structural modeling of several IDPs using FM prompted the derivation of a specific parametrization of Flory's equation for these proteins (eqn (5)).29 This aim was accomplished by fitting Rg values derived from synthetic SAXS curves computed for large conformational ensembles, built with FM, of several proteins covering a broad spectrum of sizes (eqn (6)).
$$R_g = (2.54 \pm 0.01)\,N^{(0.522 \pm 0.01)} \qquad (6)$$
The exponential value obtained from the parametrization, ν = 0.522 ± 0.01, was notably smaller than that derived from the dataset of denatured proteins, ν = 0.598 ± 0.028, indicating that IDPs are more compact than chemically denatured proteins. Given a natively disordered protein of N residues, this new parametrization provides the expected Rg assuming random coil behaviour as implemented in FM (see above). Fig. 5 displays Rg values measured for 50 IDPs (Table 1) as a function of their length compared with Flory's models derived for chemically denatured and natively disordered proteins. As some IDPs are expected to have populations of secondary or tertiary structure, this parametrization can be used as an interpretative tool, and departures from expected values are indications of compactness or extendedness of the protein. Several proteins present considerable degrees of extendedness (Fig. 5) and, in some cases, this can be rationalized by analyzing the secondary structure predictions. FEZ1 monomer is predicted to have regions prone to form coiled coils.47 Synthetic resilin, (GAPAQTPSSQY)16, probably adopts some structure in its repetitive units that cooperatively enlarges the apparent size.33 II-1 and IB5 expectedly display large Rg values as a result of their high content of proline residues known to form PPII secondary structures.36 A number of Tau protein constructs are also systematically above the threshold because of the presence of proline-rich domains and partially structured regions, which have been identified and characterized by NMR.103,104 According to this compilation of data, the presence of long-range contacts that induce protein compactness is less common. The N-tail of Measles virus, fragment 78–305 of MeCP2, and α-synuclein are examples of compaction. For the latter, NMR studies report transient end-to-end interactions, which are hypothesized to prevent aggregation.105,106
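The use of the Flory parametrization as an interpretative threshold can be illustrated with a short calculation. The sketch below evaluates eqn (6) for prothymosin α (109 residues) and compares the result with the experimental Rg values quoted in the text for pH 7.5 and pH 2.5. The function name is hypothetical, and the prefactor used for chemically denatured proteins (1.927 Å) is an assumption taken from the general literature, since only the exponent 0.598 is stated here.

```python
def flory_rg(n_residues: int, r0: float, nu: float) -> float:
    """Flory power law: Rg = R0 * N**nu (Rg and R0 in Å)."""
    return r0 * n_residues ** nu


# Central values of eqn (6) for natively disordered proteins.
IDP = {"r0": 2.54, "nu": 0.522}
# Exponent from the text; the prefactor is an ASSUMED literature value.
DENATURED = {"r0": 1.927, "nu": 0.598}

# Experimental Rg values for prothymosin alpha quoted in the text (Å).
measurements = [
    ("Prothymosin α, pH 7.5", 109, 37.8),
    ("Prothymosin α, pH 2.5", 109, 27.6),
]

for name, n_res, rg_exp in measurements:
    rg_idp = flory_rg(n_res, **IDP)
    rg_den = flory_rg(n_res, **DENATURED)
    trend = "more extended" if rg_exp > rg_idp else "more compact"
    print(
        f"{name}: exp {rg_exp:.1f} Å, IDP coil {rg_idp:.1f} Å, "
        f"denatured coil {rg_den:.1f} Å -> {trend} than the IDP coil model"
    )
```

A ratio of experimental to predicted Rg above one points to extendedness and a ratio below one to compaction, although the experimental uncertainty should be considered before drawing conclusions from small departures.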
Fig. 5 Dependence of the experimentally measured Rg values on the chain length for the IDPs listed in Table 1. Full dots correspond to Tau protein constructs. Straight lines correspond to Flory's relationship parametrizations for the denatured proteins (dashed) and IDPs shown in eqn (6) (solid). Examples highlighted in the text presenting enhanced extendedness (blue) or compactness (red) are labeled in the plot. |
Fig. 6 Low-resolution SAXS models of the oligomer, fibril, and an on-pathway oligomer of α-synuclein. (A) The p(r) of the early and late fibrils (large spheres, light and dark colors respectively), and the cross section of the early fibril (inset, black spheres) compared with the p(r) of the oligomer [inset, (white spheres)]. (B) SAXS-derived structure of the α-synuclein oligomer. Two orthogonal views of the average structure (mesh representation) and the filtered averaged structure (surface representation) are displayed and superimposed. The filtered structure has a volume corresponding to the average volume of individual models, and the difference between the average and filtered structure indicates the general level of differences between individual models. (C) The SAXS-derived model of the symmetric, early fibril that is present in solution in equilibrium with native species and a fourth component. A single repeating unit is shown in cyan, with the averaged model and the filtered averaged models superimposed in mesh-representation. The principle of the repeats building the mature fibrils is shown to the right, where three repeats of the filtrated model are displayed. Repeats two and three have been translated 880 Å vertically with respect to the first repeat, and the model to the right is rotated 90° around a vertical axis with respect to the left model. (D) Model for the elongation of fibrils. In pink/purple, 26 oligomers constituting one repeating unit of the mature fibril (averaged and filtered model shown in cyan mesh) are displayed in surface representation. Below, the 26 oligomers are superimposed with the fibril repeating unit, whereas the two models are separated above. The lowest representation is rotated 90° around a horizontal axis with respect to the top two models. Figure reprinted from ref. 108. |
The structural characterization of soluble oligomers in amyloidogenic processes is difficult to achieve as these molecules are in dynamic equilibrium with other species, are difficult to isolate, and are sensitive to surface conditions imposed by microscopy techniques. This pioneering work overcame these intrinsic limitations by isolating the scattering signal from complex mixtures instead of isolating molecular species. Using equivalent or similar approaches, we anticipate a pivotal role for SAXS in future studies of amyloidogenic processes linked to disease.
Fig. 7 Schematic representation of the EOM strategy for the analysis of SAXS data in terms of Rg distributions. The M conformations/curves of the pool (random distribution), left part of the figure, are used to generate the initial C chromosomes and to feed the genetic operators (mutations, crossing and elitism) along the GA process that runs for G generations. The complete process is repeated R independent times, and each run provides N selected structures/curves that fit the experimental profile. The structural analysis of the resulting conformations is displayed on the right part of the scheme: the Rg distribution of the selected (N × R) conformations is compared with that derived from the pool, which is considered as a scenario of complete conformational freedom. From this comparison it is possible to derive a quantitative structural estimation of the protein conformations coexisting in solution. |
The scattering data from such an ensemble are computed by summing up the individual scattering patterns (eqn (2)). It is assumed that all conformers equally populate the subset, so that the intensity of a subset I(s) containing N conformers is
$$I(s) = \frac{1}{N}\sum_{n=1}^{N} I_n(s) \qquad (7)$$
(8) |
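The equal-population average described above, and its comparison with experimental data, can be sketched in a few lines of Python. This is a minimal illustration and not the EOM implementation: the function names are hypothetical, all curves are assumed to be sampled on a common s grid, and the discrepancy is written as a commonly used reduced χ2 with a least-squares scale factor, which is an assumption about the exact form of the target function.

```python
from typing import List, Sequence


def ensemble_intensity(curves: Sequence[Sequence[float]]) -> List[float]:
    """Equal-weight average of N conformer curves sampled on the same s grid."""
    n = len(curves)
    if n == 0:
        raise ValueError("at least one conformer curve is required")
    n_points = len(curves[0])
    if any(len(c) != n_points for c in curves):
        raise ValueError("all curves must share the same s grid")
    return [sum(c[j] for c in curves) / n for j in range(n_points)]


def optimal_scale(i_calc, i_exp, sigma) -> float:
    """Scale factor mu minimising sum(((mu * I_calc - I_exp) / sigma) ** 2)."""
    num = sum(c * e / s ** 2 for c, e, s in zip(i_calc, i_exp, sigma))
    den = sum(c * c / s ** 2 for c, s in zip(i_calc, sigma))
    return num / den


def reduced_chi2(i_calc, i_exp, sigma) -> float:
    """Reduced chi-square between a scaled calculated curve and the data.

    ASSUMPTION: normalisation by (K - 1), K being the number of data points.
    """
    mu = optimal_scale(i_calc, i_exp, sigma)
    k = len(i_exp)
    total = sum(((mu * c - e) / s) ** 2 for c, e, s in zip(i_calc, i_exp, sigma))
    return total / (k - 1)
```

For example, averaging two curves and comparing the result with data that differ only by a constant factor yields a scale factor equal to that constant and a χ2 close to zero.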
The possibility to choose from among millions of combinations of scattering patterns in order to fit the experimental data may give the impression that EOM could provide aliased results fitting SAXS curves with artefacts such as oligomerization, aggregation or interparticle interactions. However, in our experience it is not possible to fit a SAXS curve when the structures contained in the pool are not a reliable representation of what is present in solution. In these cases, the χ2 values obtained are large, and systematic errors are found, especially in the initial part of the curve. In a recent study, Mosbæk et al. achieved good agreement with the experimental SAXS curve measured for Factor VIIa only when the pool was populated with dimers.120 Therefore, in addition to describing the conformational fluctuations of the protein, the oligomerization equilibrium was also characterized. As a consequence of this ability, EOM can also be used to discard or quantify the presence of oligomeric species in solution.113,120
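To make the selection procedure more concrete, the sketch below implements a deliberately simplified genetic algorithm in the spirit of the scheme in Fig. 7: chromosomes are lists of pool indices, and mutation, crossing and elitism are applied over a number of generations. It is a toy under stated assumptions, not the EOM code. All names and parameter values are hypothetical, no scale factor is fitted, repeated conformers are allowed within a chromosome, and the pool is made of synthetic Guinier-like curves.

```python
import math
import random


def guinier_curve(rg, s_grid):
    """Toy curve I(s) = exp(-(s * Rg)**2 / 3); stands in for a computed SAXS profile."""
    return [math.exp(-((s * rg) ** 2) / 3.0) for s in s_grid]


def average(curves):
    """Equal-weight average of curves sampled on a common grid."""
    n = len(curves)
    return [sum(col) / n for col in zip(*curves)]


def chi2(calc, exp, sigma):
    """Reduced chi-square without scale factor (a simplifying assumption)."""
    return sum(((c - e) / s) ** 2 for c, e, s in zip(calc, exp, sigma)) / (len(exp) - 1)


def fitness(chrom, pool, exp, sigma):
    """Discrepancy of the sub-ensemble encoded by a chromosome of pool indices."""
    return chi2(average([pool[i] for i in chrom]), exp, sigma)


def mutate(chrom, pool_size, rng, rate=0.2):
    """Replace each gene by a random pool member with probability `rate`."""
    return [rng.randrange(pool_size) if rng.random() < rate else g for g in chrom]


def crossover(a, b, rng):
    """Single-point crossing of two chromosomes (requires length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def select_ensemble(pool, exp, sigma, n_conf=4, n_chrom=20, n_gen=50, seed=0):
    """Return the best chromosome and the best chi2 recorded at each generation."""
    rng = random.Random(seed)
    m = len(pool)
    population = [[rng.randrange(m) for _ in range(n_conf)] for _ in range(n_chrom)]

    def score(chrom):
        return fitness(chrom, pool, exp, sigma)

    history = []
    for _ in range(n_gen):
        offspring = [mutate(c, m, rng) for c in population]
        offspring += [crossover(*rng.sample(population, 2), rng) for _ in range(n_chrom)]
        # Elitism: parents compete with offspring and only the best survive,
        # so the best chi2 can never get worse from one generation to the next.
        population = sorted(population + offspring, key=score)[:n_chrom]
        history.append(score(population[0]))
    return population[0], history
```

In a realistic setting the whole selection would be repeated several times from different random seeds, and the selected sub-ensembles pooled before analysis, as indicated in Fig. 7.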
Due to the low-resolution nature of SAXS data, selected conformations cannot be analyzed at residue level as they do not hold this detailed structural information. The structures selected by the EOM, even the most populated ones, cannot be claimed to truly be present in solution. Instead, the EOM ensemble must be interpreted as a tool to describe the size and shape of disordered proteins in solution in terms of distributions. This interpretation is supported by the observation that selected ensembles derived from repeated EOM runs starting from distinct random chromosomes contain different conformations but provide similar Rg, Dmax and anisotropy distributions. The algorithm is therefore able to find equivalent minima in terms of distributions but not in terms of individual molecular configurations: the latter are not identifiable because of the low resolution of SAXS. In other words, several high-resolution descriptions of the system give rise to a common structural description at lower resolution. Nevertheless, the distributions provided by EOM represent a major advance over traditional approaches that condense all structural characteristics of disordered systems into averaged parameters.
It is highly instructive to compare EOM-derived distributions with those of the random ensembles (initial pools) to detect the potential presence of nascent secondary structural elements or low populations of transient long-range contacts as perturbations of the distributions with respect to the pool. The presence of secondary structure induces Rg distributions that are shifted towards larger values. Transient long-range contacts and residual tertiary structure provide more compact distributions than the random coil scenario represented in the pool. It is worth mentioning that this comparison is valid only when the structural model used to build the pool is physically meaningful. In other words, it must provide a true representation of the structural features of complete disorder in the regions defined as flexible. As discussed above, FM and RanCh ensure proper random coil distributions for IDPs and intrinsically disordered regions of proteins. Other approaches, such as molecular dynamics simulations, even when yielding good fits can provide structurally biased ensembles that can induce a misinterpretation of the structural properties of the system when the previously mentioned comparison of distributions is applied.
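The comparison between selected and pool distributions can be summarized numerically. The sketch below, with hypothetical function names and an arbitrary tolerance, computes the shift of the mean Rg and the ratio of the distribution widths. It only describes the direction of the perturbation; assigning it to secondary structure or to long-range contacts still requires the reasoning given above.

```python
from statistics import mean, pstdev


def compare_to_pool(rg_selected, rg_pool):
    """Return (shift of the mean Rg, ratio of widths) of selected versus pool."""
    shift = mean(rg_selected) - mean(rg_pool)
    width_ratio = pstdev(rg_selected) / pstdev(rg_pool)
    return shift, width_ratio


def interpret_shift(shift, tolerance=1.0):
    """Qualitative label; `tolerance` (in Å) is an arbitrary, assumed threshold."""
    if shift > tolerance:
        return "more extended than the pool"
    if shift < -tolerance:
        return "more compact than the pool"
    return "similar to the pool"


if __name__ == "__main__":
    selected = [20.0, 21.0, 22.0]          # toy Rg values of selected conformers
    pool = [20.0, 25.0, 30.0, 35.0, 40.0]  # toy Rg values of the random pool
    shift, ratio = compare_to_pool(selected, pool)
    print(shift, round(ratio, 3), interpret_shift(shift))
```

A width ratio well below one indicates that the selected ensemble samples a narrower range of sizes than the fully random scenario.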
HMGB1 consists of two tandem HMG-box domains joined by a linker with an acidic C-terminal tail. NMR studies showed that the acidic tail regulates HMGB1-DNA recognition by interacting intramolecularly with the DNA-binding surfaces of both HMG-boxes. On the basis of these studies, which were sensitive mainly to binding events at residue level, it was proposed that HMGB1 is in a dynamic equilibrium between a collapsed tail-bound and an open tail-free state. This scenario has been examined by the combined use of SAXS and NMR for the full-length protein and two deletion mutants lacking various residues of the acidic tail.121 The EOM analysis of a SAXS curve of HMGB1 at 0 mM salt displayed a relatively narrow Rg distribution shifted towards compact structures when compared to the pool of random conformations. The open conformation was not detected by this analysis, suggesting the predominance of collapsed conformers in solution. Upon increasing the ionic strength, the Rg distribution became slightly wider as a result of the breakage of some of the anchoring points of the tail, which are mainly electrostatic in nature. Importantly, no fully open conformations were detected in the EOM ensembles. Conversely, deletion mutants, unable to form intramolecular interactions, displayed Rg distributions that were very similar to the disordered state represented in the pool. The collapsed nature of HMGB1 driven by the acidic tail was validated by NMR relaxation experiments, which yielded a large correlation time for HMGB1 when compared to deletion counterparts, for which the two HMG-boxes showed independent motions. Despite the residue-level information provided by NMR, the domain architecture of HMGB1 was obtained only by performing EOM analysis of the SAXS data.
Mylonas et al. applied the multiple SAXS curve fitting approach with EOM to the structural analysis of tau protein,53 an IDP involved in neuronal microtubule stabilization and found in abnormal deposits in the brain of patients with Alzheimer's disease. Two tau isoforms were studied, and SAXS data for the full-length protein and several deletion mutants of each isoform were used. The EOM unambiguously identified the so-called repeat region as the source of residual secondary structure in tau, in agreement with previous NMR data indicating the presence of turns and extended fragments in this region.103,122 The multiple curve fitting, through the averaged Cα–Cα inter-residue distance matrix, identified a distinct conformational behavior depending on the number of repeats present in the isoforms. For isoform ht23, with three repeats, the maximum separation was found within the repeat domain itself. The ht40 isoform, with four repeats, revealed an enhanced separation between the repeat domain and the preceding region. These results suggest that the variation in the number of turns (one per repeat) leads to distinct global arrangements of the chain in that region, thus enhancing or shortening the average interdomain distances expected from a random coil.
The EOM approach provides information about the relative interdomain position in flexible multidomain proteins. However, the orientational configuration is missing because of the low-resolution nature of SAXS data. This limitation can be compensated by the use of NMR parameters such as RDCs and Pseudo-Contact Shifts (PCS), which are sensitive to the relative orientation of the domains. This synergy has recently been used to address the structural disorder of calmodulin, a two-domain flexible protein.130 Three RDC and PCS datasets measured with Tb3+, Tm3+ and Dy3+ loaded in the N-terminal domain of calmodulin demonstrated the presence of extensive interdomain mobility. NMR and SAXS data were integrated using a novel concept called Maximum Occurrence (MO). MO is defined as the maximum fraction of time the molecule can spend in a given conformation. The MO of each conformation is obtained by computing conformational ensembles with an increasing population of that conformation, while the populations of the other members are optimized to collectively describe all the experimental data. The MO is set as the population at which the disagreement with experimental data exceeds a predefined threshold. This computationally expensive procedure was applied to 400 conformations belonging to a large pool of 56000 that exhaustively sample the translational and orientational interdomain space (Fig. 8). Representation of the MO for each conformation on a common frame showed the prevalence of extended conformations (35%) while closed and fully extended structures, normally trapped in crystallographic studies, reached only 5% and 15% MO, respectively. In addition, regions with very low MO also appeared in the analysis, thus indicating that a fraction of the interdomain space was not sampled (Fig. 8).
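The logic of the MO calculation can be illustrated with a deliberately reduced toy problem. In the sketch below each conformation contributes a single scalar observable that averages linearly over the ensemble, so the optimization of the remaining populations has a closed form: their weighted mean can take any value between the smallest and the largest of their observables. The real procedure works on full RDC, PCS and SAXS datasets and requires numerical optimization; the function name, the scalar observable and the scanning step are all assumptions made for illustration only.

```python
def maximum_occurrence(values, index, target, tol, steps=100):
    """Largest weight w of conformer `index` compatible with `target` within `tol`.

    values : scalar observable of each conformer (toy stand-in for real data)
    target : ensemble-averaged experimental value
    tol    : predefined disagreement threshold
    """
    x = values[index]
    others = [v for i, v in enumerate(values) if i != index]
    if not others:
        raise ValueError("at least two conformers are required")
    lo, hi = min(others), max(others)
    mo = 0.0
    for i in range(steps + 1):
        w = i / steps
        # Range of ensemble averages reachable when the remaining weight
        # (1 - w) is freely distributed over the other conformers.
        reach_lo = w * x + (1.0 - w) * lo
        reach_hi = w * x + (1.0 - w) * hi
        if reach_lo - tol <= target <= reach_hi + tol:
            mo = w
        else:
            break  # disagreement exceeds the threshold: stop increasing w
    return mo


if __name__ == "__main__":
    observables = [10.0, 20.0, 30.0, 40.0]
    for k in range(len(observables)):
        print(k, maximum_occurrence(observables, k, target=25.0, tol=1.0))
```

A conformer whose observable is far from the measured average can only be populated up to the point where the other members can no longer compensate for it, which is what limits its MO in this toy.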
Fig. 8 Preferred positions and orientations of the C-terminal domain of calmodulin relative to the N-terminal domain. Frames centered in the center-of-mass of the C-terminal domain of calmodulin, color-coded with respect to the MO of the corresponding conformation from blue (low occupancy < 5%) to red (high occupancy > 40%) for 400 structures generated randomly with Ranch. The MO were obtained by the simultaneous analysis of NMR (RDCs and PCS) and SAXS data. Reprinted with permission from ref. 130. |
Ensemble approaches have proven useful to derive precise pictures of the conformational disorder achieved by flexible macromolecules. However, the time-scale at which these motions proceed remains elusive because of the averaged nature of the information coded in a SAXS experiment. NMR relaxation rates can, in principle, provide this missing information as they are sensitive to reorientation phenomena in solution in a time-scale range that extends from hundreds of picoseconds to tens of nanoseconds. However, the application of NMR relaxation rates to flexible proteins is hampered by the coupling between motion of the individual domains and the overall molecular reorientation.131 In a recent study, these two facets of molecular dynamics, namely structural disorder and time-scale of motions, have been embedded in a single structural/dynamic model by combining SAXS with relaxation rates. This combination has been done using the isotropic Reorientational Eigenmode Dynamics (iRED) approach. Through a spherical harmonics normal mode description, iRED provides a framework to interpret relaxation experiments when a reliable description of the protein ensemble is available.132 This combination has been applied to the ribosomal protein L12, a dimeric two-domain protein connected by a 20 amino acid-long linker (Fig. 9A).133 The EOM analysis of the SAXS curve provided information on the three-dimensional space sampled by the C-terminal domains with respect to the dimeric N-terminal one, and indicated that the overall shape of L12 is larger, more anisotropic, and has longer interdomain distances than that expected for a random linker (Fig. 9). Interestingly, this analysis indicated asymmetric behavior of the two linkers, which is hypothesized to be relevant for biological function when the protein is docked on the ribosome (Fig. 9E). 
The iRED analysis of the EOM ensemble indicated a small degree of motional coupling between the two domains, thereby suggesting a transient structuring of the linker. Additionally, the asymmetric behavior of the linker was manifested in a breaking of the degeneracy of some motional modes. Correlation times derived when fitting relaxation rates (R1, R2 and het-NOE) to the motional modes ranged between 4.9 and 9 ns, indicating that the entire range of reorientational motions takes place on a rapid time-scale. This study, like the previously described examples, demonstrates that the complementarity between SAXS and NMR provides structural and dynamic information that goes far beyond the sum of the individual techniques.
Fig. 9 EOM analysis of ribosomal L12 protein. (A) Cartoon of a L12 conformation (1rqu) with the dimerization N-terminal domain (NTD) in green and the C-terminal domain (CTD) in blue, flexible linker is shown in red. (B) Rg distributions from the EOM-selected ensemble (red) and that corresponding to the pool (black). Interdomain distance distributions, NTD-CTD (C) and CTD-CTD (D), from the EOM-selected ensemble (red) and the pool (black). The shift in the distributions of the selected ensemble indicates that L12 behaves as a more extended particle than expected from a random coil linker, and some degree of interdomain correlation is present. (E) Correlation between interdomain NTD-CTD distances in each conformation of the EOM-selected ensemble with respect to those found in the random pool. Positive values indicate overpopulation in the selected ensemble, whereas negative values imply depleted correlations. Peaks outside of the diagonal indicate a strong degree of asymmetry in selected conformations. Adapted with permission from ref. 133. |
SAS | Small-angle scattering |
SAXS | Small-angle X-ray scattering |
SANS | Small-angle neutron scattering |
IDP | Intrinsically disordered protein |
EOM | Ensemble optimization method |
NMR | Nuclear magnetic resonance |
RDC | Residual dipolar coupling |
GA | Genetic algorithm |
FM | Flexible-Meccano |
AUC | Analytical ultracentrifugation |
DLS | Dynamic light scattering |
MO | Maximum occurrence |
CS | Chemical shifts |
PRE | Paramagnetic relaxation enhancement |
Footnote |
† Published as part of a Molecular BioSystems themed issue on Intrinsically Disordered Proteins: Guest Editor M. Madan Babu. |
This journal is © The Royal Society of Chemistry 2012 |