DOI: 10.1039/C1MB05235G (Review Article)
Mol. BioSyst., 2012, 8, 185–193
An omics perspective of protein disorder†
Received 14th June 2011, Accepted 29th October 2011
First published on 18th November 2011
Abstract
Disordered regions within proteins have increasingly been associated with various cellular functions. Identifying the specific roles played by disorder in these functions has proved difficult. However, the development of reliable prediction algorithms has expanded the study of disorder from a few anecdotal examples to a proteome-wide scale. Moreover, the recent omics revolution has provided the sequences of numerous organisms as well as thousands of genome-wide data sets including several types of interactomes. Here, we review the literature regarding genome-wide studies of disorder and examine how these studies give rise to new characterizations and categories of this elusive phenomenon.
Introduction
Intrinsic protein disorder has presented both a technical and conceptual challenge to the study of proteins. While it has long been recognized as a possible type of protein structure1 and has been observed in specific contexts, its role (or rather, its many roles) in biological systems has not been satisfactorily uncovered.
One of the major issues in understanding the role of disordered proteins lies in the difficulty of studying them with traditional experiments.2 For instance, the gold standard in protein structure determination, X-ray crystallography, usually cannot observe unstructured regions, as they are missing from crystal structures.3 NMR spectroscopy is more adept at observing protein dynamics; however, because of a number of technical issues and its resource-intensive nature, it is limited to studying single proteins rather than performing large-scale analyses. Despite these difficulties, such studies have revealed that protein disorder is a multi-faceted phenomenon and that many different kinds of disorder exist, further confounding its study.4 Broadly, disordered regions range from forms close to random coils, through a molten globule-like state, to forms that are generally found structured when bound to a ligand.5
Beginning with Romero et al. in the late 1990s, it was recognized that disordered regions have properties at the primary amino acid sequence level that could be exploited for predicting their location. The prediction of protein disorder began with the observation that amino acids in disordered regions had a clear hydrophilic bias in comparison to ordered regions. The first algorithm designed for specific disorder prediction was PONDR (Predictor of Naturally Disordered Regions)2 followed by DisEMBL6 and GlobPlot,7 and then FoldIndex8 and NORS.9 At the same time, the first Critical Assessment of protein Structure Prediction (CASP) experiment was organized, initially focused on the prediction of structured proteins.10 Due to the increasing number of proteins discovered with disordered regions, the prediction of disordered regions in proteins was added to the experiment in CASP5.10 Since then, several other algorithms have become available including DISOPRED,11 IUPred,12 FoldIndex,13 RONN,14 SPRITZ,15 Wiggle16 and MD.17 These predictors allowed the examination of protein disorder to broaden from a set of known experimental examples to an effectively proteome wide scale including a variety of previously unstudied proteins and organisms. Now, a large array of predictors exists that, in their third generation, have adopted modern machine learning methods, such as SVMs,11 neural networks,18 semi-supervised learning19 and meta-learning.20 Recently, the accuracy of these algorithms in predicting disorder has become quite high, with an area under the receiver operating characteristic curve (AUC) consistently above 0.8.21,22
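To make the two ideas in this paragraph concrete, the sketch below scores each residue by the hydrophilicity of its sequence neighborhood and then evaluates such scores with an AUC. It is only an illustration and does not reproduce PONDR or any other predictor named above: the choice of the Kyte-Doolittle hydropathy scale, the window length of 21, and the function names `hydrophilicity_scores` and `auc` are all assumptions made for this example.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
# Using this particular scale is an assumption of the sketch.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophilicity_scores(seq, window=21):
    """Toy per-residue disorder score: negated mean hydropathy in a centered window.

    Windows are truncated at the sequence ends; unknown residues count as 0.0.
    The window length is a hypothetical default, not a published setting.
    """
    half = window // 2
    scores = []
    for i in range(len(seq)):
        chunk = seq[max(0, i - half): i + half + 1]
        scores.append(-sum(KD.get(aa, 0.0) for aa in chunk) / len(chunk))
    return scores


def auc(scores, labels):
    """Area under the ROC curve for per-residue scores.

    Computed as the probability that a randomly chosen disordered residue
    (label true) scores higher than a randomly chosen ordered residue
    (label false); ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one disordered and one ordered residue")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In practice the labels would come from experimentally annotated disorder and the scores from a real predictor; an AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of residues.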
Around the same time that the study of protein disorder became a field of study for computational biologists, advances in technology led to a massive invigoration of systems biology, and a variety of high throughput, genome wide data sets were becoming available for various model organisms. This included data sets of a variety of gene or protein associated features, such as gene expression level, protein count and half-life, expression noise, deletion phenotypes, and over-expression phenotypes. Moreover, a number of features describing interactions were measured at a genomic scale, such as protein–protein interactions, protein–DNA interactions, and genetic interactions. Finally, the comparative genomics revolution yielded a wealth of protein sequence features, such as protein domains, linear motifs or phosphosites,23–26 as well as their relationship to protein disorder. This wealth of data enabled great advances in the investigation of the function of protein disorder. Subsequently, a great variety of different roles of protein disorder have been discovered (for a review see ref. 27–29).
This review focuses on the various recent genome wide approaches to characterize disorder, and the picture being painted of this surprisingly ubiquitous phenomenon. In particular, we aim to bring together categorizations of protein disorder with its different functions, thereby structuring this diverse phenomenon. We begin by reviewing general properties and functions associated with protein disorder. We then discuss the evolutionary properties of disordered proteins and examine a number of different categorizations of disorder. Finally, we look at how the protein interaction network, the genetic interaction network and gene regulation have revealed types of protein disorder and how protein disorder can be used to characterize these phenomena.
Protein disorder as a diverse and widespread phenomenon
The availability of reliable disorder predictors has allowed studies of protein disorder to analyze any sequenced organism and to encompass their entire proteomes. This has allowed for both cross-species comparisons and unbiased comparisons among proteins of a given species.11 Perhaps the first and most surprising result of disorder prediction was the ubiquity of protein disorder among eukaryotes. Indeed, long disordered regions (more than 30 amino acids) were found in 2.0% of archaean, 4.2% of eubacterial and 33.0% of eukaryotic proteins.11 In the budding yeast S. cerevisiae, 50–60% of proteins contain at least one disordered region.30 On the basis of individual residues, disordered residues account for roughly 33% of all residues in the human proteome. Even if disorder is far less common in prokaryotes, it still appears to serve many of the same roles there as in eukaryotes.30 The prevalence of disorder among eukaryotes has been hypothesized to be an important step in the evolution of complexity and to reflect the complexity of signaling and regulatory processes within these organisms.31
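The statistics quoted above rest on two simple quantities: the fraction of residues called disordered, and whether a protein contains a contiguous disordered stretch of more than 30 residues. A minimal sketch of both follows, assuming that a predictor has already produced a per-residue list of boolean disorder calls; the function names and the dictionary layout of the proteome are hypothetical.

```python
def disordered_runs(calls, min_len=31):
    """Return (start, end) half-open index pairs of disordered runs.

    `calls` is a list of booleans, one per residue. The default min_len=31
    encodes "more than 30 amino acids".
    """
    runs, start = [], None
    for i, is_disordered in enumerate(calls):
        if is_disordered and start is None:
            start = i                      # a run begins here
        elif not is_disordered and start is not None:
            if i - start >= min_len:
                runs.append((start, i))    # a long enough run just ended
            start = None
    if start is not None and len(calls) - start >= min_len:
        runs.append((start, len(calls)))   # run extends to the C-terminus
    return runs


def percent_disorder(calls):
    """Percentage of residues called disordered (0.0 for an empty protein)."""
    return 100.0 * sum(calls) / len(calls) if calls else 0.0


def fraction_with_long_disorder(proteome, min_len=31):
    """Fraction of proteins with at least one long disordered region.

    `proteome` maps a protein identifier to its per-residue disorder calls.
    """
    if not proteome:
        return 0.0
    hits = sum(1 for calls in proteome.values() if disordered_runs(calls, min_len))
    return hits / len(proteome)
```

The first measure is per protein (or, pooled, per proteome), while the second is the kind of per-protein yes/no call behind figures such as "33.0% of eukaryotic proteins".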
Disordered proteins are present in a variety of cellular processes. Ward et al. performed a functional analysis of the yeast proteome using GO.11 Disordered proteins were over-represented in various molecular functions including transcription, phosphorylation, nucleic acid and protein binding activity. In terms of biological processes, transposition, development, morphogenesis, regulation, transcription and signal transduction were significantly enriched. In human proteins, Lobley et al.32 found that disordered proteins are enriched for regulation of transcription, protein phosphorylation, mRNA metabolism, RNA processing and regulation of cell cycle to name a few. Disorder has a major role in signaling and regulation. It is for instance a signature of post-translational modifications such as phosphorylation33 and has a key role in alternative splicing since regions of disorder were shown to correlate with alternative splicing breaks in human proteins.34 While this might indicate that disorder serves only a regulatory role in the cell, it also is known to occur in other key functional groups such as ribosomal proteins and chaperones.5 The chaperone HSP90 indeed contains two negatively charged and unstructured regions, which were recently shown to contribute to its anti-aggregation activity.35
Disordered regions often contain short linear motifs, such as those recognized by SH3 domains, that are important for protein function and interactions.6 In line with this, Beltrao et al. showed that intrinsically disordered regions are enriched in binding sites for peptide recognition domains compared to ordered regions.36 In addition, some proteins are unstructured when they are isolated but acquire a fixed conformation upon binding to a partner. For instance, calmodulin-binding kinases change from the disordered state to the ordered state when binding calmodulin. Their calmodulin binding domain transitions from a non-compact extended structure to a compact α-helix.37
The role of disorder in transport was recently illustrated in nuclear pore complexes in yeast.38 Nucleoporin proteins are known to contain disordered regions with multiple FG domains (phenylalanine–glycine repeats). Yamada et al. showed that those domains contain different disordered structures that contribute to the formation of a tubular gate structure at the center of the nuclear pore complex.38
Recently, disorder has also been associated with human disease.39 For instance, the tumor suppressor gene encoding the breast cancer protein 1 (BRCA1) is implicated in many cellular pathways including transcription, cell-cycle checkpoint control, apoptosis and DNA repair. Mark et al. showed that more than 80% of BRCA1 is unstructured suggesting that the central part of the protein acts as a long flexible scaffold for the integration of multiple signals in the DNA damage response pathway through intermolecular interactions.40
Evolutionary categorizations of protein disorder
Studies of the evolution and conservation of disordered proteins have produced a number of interesting and, at first glance, contradictory results. Several groups have pointed out that disordered regions evolve more quickly than structured regions, e.g., as measured by the ratio of non-synonymous to synonymous mutations.41,42 This has generally been understood as reflecting a lack of structural constraint on these disordered regions. Additionally, highly disordered proteins are, in general, less conserved in terms of the presence of an ortholog across the yeast clade.43
Conversely, the property of disorder tends to be conserved in many regions between species even if their amino acid sequence is not.43–45 That is, the amino acid sequence may drift between species but the region is maintained as disordered. Curiously, there appears to be a functional distinction between disordered proteins where the amino acid sequence is allowed to drift versus those where the sequence is maintained;43 proteins where disordered regions have a “flexible” amino acid sequence are associated with cell signaling and regulation, while disordered regions where the amino acid sequence is conserved are associated with ribosomal proteins and protein chaperones.43 In an in silico study,46 the authors found that disordered regions were actually more sensitive to mutation than conventional types of secondary structure. This, in combination with the abundant conservation of disordered regions, may indicate that there is a great deal of evolutionary pressure exerted in maintaining these regions despite the lack of sequence conservation. Bellay et al. speculated that the difference in conservation may have to do with the function the disorder serves, and whether the disordered region has an alternative structured state under certain conditions.
These varying evolutionary properties of disordered regions were exploited to categorize protein disorder into three groups (Fig. 1). First, following the observations above, “flexible” disorder corresponds to regions where the property of disorder is conserved, but not the sequence. Second, in constrained disorder both sequence and disorder are conserved. Lastly, in non-conserved disorder the property of disorder is not conserved.
Fig. 1 A schematic relating various proposed categories of protein disorder to the sequence defined categories of “flexible” and “constrained” disorder. The left gray rectangle contains properties and categories associated with flexible disorder while the rectangle on the right contains properties and categories associated with constrained disorder. The upper region of each rectangle includes associated GO terms, while the lower regions show speculated memberships of other categories of disorder.
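The three-way split into flexible, constrained and non-conserved disorder can be written down as a small decision rule. The sketch below is an illustration only: it assumes that each disordered residue of a reference protein has been aligned to its orthologs, and the two cutoffs of 0.5, like the function names, are hypothetical placeholders rather than the thresholds used in ref. 43.

```python
def column_conservation(ref_aa, ortholog_column):
    """Summarize one alignment column for a disordered reference residue.

    `ortholog_column` is a list of (amino_acid, is_disordered) pairs, one per
    ortholog; a gap can be given as ("-", False). Returns the fraction of
    orthologs that are disordered at this position and the fraction that
    carry the same amino acid as the reference.
    """
    n = len(ortholog_column)
    if n == 0:
        return 0.0, 0.0
    disorder_fraction = sum(1 for _, d in ortholog_column if d) / n
    identity_fraction = sum(1 for aa, _ in ortholog_column if aa == ref_aa) / n
    return disorder_fraction, identity_fraction


def classify_disorder(disorder_fraction, identity_fraction,
                      disorder_cutoff=0.5, identity_cutoff=0.5):
    """Assign a disordered residue to one of the three categories.

    Both cutoffs are assumed values chosen for illustration.
    """
    if disorder_fraction < disorder_cutoff:
        return "non-conserved"   # disorder itself is not maintained
    if identity_fraction >= identity_cutoff:
        return "constrained"     # disorder and sequence both maintained
    return "flexible"            # disorder maintained, sequence drifts
```

For example, a serine that stays disordered in three of four orthologs but is replaced by other residues in most of them falls into the flexible category under these assumed cutoffs.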
Regulation of disordered proteins
Disordered proteins have been shown to be under tight regulation in ref. 47. The authors show that disordered proteins are both under-expressed compared to other proteins and also have a shorter half-life. A natural interpretation of this result is that cells must carefully control production of disordered proteins due to their propensity to aggregate, as reported in ref. 48. Complementing this result, in ref. 49 Vavouri et al. observed that the proportion of disorder in a protein better predicted fitness defects resulting from over-expressing this protein than any other predictor, including protein abundance, aggregation potential, half-life, and codon usage. They propose that as disordered proteins take part in multiple specific but low affinity protein interactions in cell signaling,50,51 over-expression of these proteins results in promiscuous molecular interactions and general malfunction of the cell.49,52 In addition, some evidence was presented that disorder itself serves as a signal for protein degradation53 through ubiquitin-dependent or -independent proteasome-mediated degradation mechanisms.53,54 For example, highly unstructured domains in the N and C termini of p53 are responsible for its susceptibility to 20S proteasomal degradation.55
It is not the case, however, that tight regulation is a property of all kinds of disorder. In fact, in ref. 43 it is shown that only disordered regions that are evolutionarily flexible are under tight regulation. Proteins with sequentially conserved disordered regions are often highly expressed (for example, ribosomal proteins and chaperones). In combination, these results appear to imply that the tight regulation observed for disordered proteins is actually the tight regulation of proteins enriched for molecular recognition sites within their disordered regions (i.e. phosphorylation motifs). This implication has an interesting corollary: if these signaling proteins are tightly regulated, then the post-transcriptional signaling systems of the cell are closely linked to transcriptional regulation. Stoichiometric relationships in this signaling network must be tightly regulated as indicated in ref. 49, and the fitness decrease caused by over-expression is a result of signaling failure rather than the physical properties of disordered proteins.
The role of protein disorder in protein interactions
As suggested by the enrichment of regulatory and signaling functions of highly disordered proteins, protein disorder has often been associated with the binding properties of proteins. In recent studies, it has become clear that disorder is important for many protein–protein interactions (PPIs), and it contributes to these interactions through a variety of mechanisms.57 Protein disorder only shows a weak correlation with protein interaction degree as measured by various high throughput studies. Previously, it was reported in ref. 59 that there is no correlation between percent disorder and protein interaction degree. This complete lack of correlation may in part be due to the biased nature of the protein interaction studies available at the time; percent disorder does correlate with degree in recent genome wide complex pull-downs,60,61 yeast-two-hybrid,62 and kinase interactions,63 though not with PCA derived interactions64 (see Fig. 2A). However, even without these unbiased data sets, this lack of correlation was addressed in ref. 42, where the authors divide protein interaction hubs into those with few interfaces (singlish-interface hubs) and those with many interfaces (multi-interface hubs). They found that singlish-interface hubs are clearly more disordered than multi-interface hubs; the implication being that disorder more commonly plays a role in transient protein interactions (which is implied by a protein with a single interface but multiple partners) than in stable interactions (which may be the case in proteins that have multiple interfaces and thus may be members of stable protein complexes). This interaction of singlish and multi-interface hubs with disorder was explored further in ref. 43, where it is shown that single interface protein interaction hubs are enriched for “flexible” disorder, while no similar differentiation can be seen in protein hubs with multiple interfaces. Finally, in ref. 65 the authors examine protein complexes and find that three quarters include disordered domains.
Fig. 2 (A) The Spearman correlation coefficients between the degree on various networks and percent of total disorder, flexible disorder, and constrained disorder. The networks were derived from the following sources: AP/MS is the combination of ref. 60 and 61, Y2H is from ref. 62, PCA is from ref. 64, KIN-in is the number of interactions a protein has with the kinase set of ref. 63, while KIN-out is the number of interactions of the kinase set of ref. 63, TF-in is the number of transcription factors that bind the DNA of a gene, while TF-out is the number of DNA binding interactions of the protein. (B) The mean percent disorder of modular and non-modular hubs in the genetic interaction network. Modular hubs have more than 80% of their interactions in modular structures, while non-modular hubs have equal or fewer than 80% of their interactions in modular structures. The error bars are bootstrapped 95% confidence intervals.
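The two statistics named in the caption of Fig. 2 can be sketched with the Python standard library alone. The code below is an illustrative reimplementation, not the analysis pipeline behind the figure: the Spearman coefficient is computed as the Pearson correlation of average ranks, and the confidence interval uses a simple percentile bootstrap. The number of resamples, the fixed seed and all function names are assumptions.

```python
import random
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                       # extend over a block of ties
        avg = (i + j) / 2 + 1            # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation, e.g. network degree vs. percent disorder."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Here `spearman(degree, percent_disorder)` would take two equal-length lists with one entry per protein, and `bootstrap_mean_ci` would be applied separately to the percent disorder values of each hub class.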
Protein interactions have been traditionally assumed to occur between globular domains, which are large (30 AAs or more) and are remarkably well conserved between species. In contrast, linear motifs are short sequences (often less than 10 AAs)70 and their rate of conservation varies greatly between species. Short peptide sequences that are associated with recognition have been described for at least two decades.71 A number of specific protein–protein interactions that are facilitated by linear motifs have been characterized.72 For instance, many linear motifs correspond to the binding regions of peptide recognition domains, such as SH3 or PDZ domains, which are central in cellular signaling.73 They are well known to act in molecular targeting. Sometimes a single motif is sufficient but in many cases multiple linear motifs add specificity to an interaction,74 or a single module is capable of binding several different motifs.75 These motifs may be involved in stable interactions as well.76
Like protein disorder, the study of linear motifs has greatly benefitted from the omics revolution, and several linear motif prediction approaches have allowed for study of an otherwise elusive phenomenon in a variety of species.23,25,26 The lack of conservation of motifs between species is an especially interesting point. It has been suggested that the short size of the motifs allows for evolutionary flexibility, in that a single mutation can bring a new linear motif into being, inducing a new interaction, or reduce the affinity between two previously binding proteins.77,78 It could be imagined that these small variations might be a way in which the signaling network moves towards a more robust dynamic in the sense proposed by ref. 79.
The majority of linear motifs reside in disordered regions of proteins,66,77 suggesting that disorder facilitates binding to them. The same is true for phosphosites.43 In particular, it was found that the regions housing phosphosites or linear motifs were mostly composed of “flexible disorder”, i.e. unconstrained and freely evolving.43 In a similar result, it was observed in ref. 80 that while the recognition site itself is often well conserved, the surrounding disordered region may evolve quickly. The disordered regions that flank linear motifs appear to be under less evolutionary constraint than other parts of proteins or even other disordered regions.43,80,81 These recognition sites appear to display a great deal of evolutionary flexibility,78 and may reflect the flexible nature of the signaling pathways they support.77
Protein disorder in genetic interaction networks
Disordered proteins demonstrate interesting characteristics on other types of interaction networks as well. In ref. 86, it was observed that percent disorder had a positive correlation with negative genetic interaction degree. A negative genetic interaction between two genes implies that the deletion of both genes produces a surprisingly severe fitness defect in comparison with the fitness defects induced by deletion of either gene by itself. The enrichment of the often fast evolving disordered proteins for genetic interactions was in contrast to other genetic interaction hubs which tended to be conserved both in terms of dN/dS and phylogenetic persistence.43
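The definition of a negative genetic interaction can be made explicit with a short calculation. The sketch below assumes the commonly used multiplicative model, in which the expected double-mutant fitness is the product of the two single-mutant fitnesses; the text above does not state which model ref. 86 used, so both the model and the cutoff of -0.1 are assumptions, as are the function names.

```python
def interaction_score(f_a, f_b, f_ab):
    """Observed double-mutant fitness minus the multiplicative expectation.

    Fitness values are relative to wild type (wild type = 1.0). A score well
    below zero means the double deletion is more severe than expected.
    """
    return f_ab - f_a * f_b


def negative_degree(single_fitness, double_fitness, threshold=-0.1):
    """Count negative genetic interactions per gene.

    `single_fitness` maps gene -> single-deletion fitness;
    `double_fitness` maps (gene_a, gene_b) -> double-deletion fitness.
    The threshold is a hypothetical cutoff for calling an interaction.
    """
    degree = {gene: 0 for gene in single_fitness}
    for (a, b), f_ab in double_fitness.items():
        score = interaction_score(single_fitness[a], single_fitness[b], f_ab)
        if score < threshold:
            degree[a] += 1
            degree[b] += 1
    return degree
```

The resulting per-gene counts are the "negative genetic interaction degree" that can then be correlated with percent disorder.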
Given the association of flexible disordered regions and recognition sites, we hypothesized that this enrichment of genetic interactions was largely due to the important signaling role played by disordered proteins. However, explaining the trend in this fashion proved difficult. In ref. 87, Bellay et al. used an exhaustive modular decomposition of the genetic interaction network to differentiate between negative interactions that occurred between redundant modules and those that were non-modular and hence might be an indication of a general sensitivity induced by the gene's deletion. Using this dichotomy between modular and non-modular interactions, they found that among genetic interaction hubs, hubs with mostly modular interactions (>80%) are enriched for flexible disorder (p < 0.005, KS-test), while hubs with fewer modular interactions (≤80%) show no distinction between types of disorder (p > 0.5, KS-test, Fig. 2B). Thus, an enrichment of flexible disorder implies a clear “signaling” signature, and proteins with regions of flexible disorder therefore occur in multiple contexts. Their tendency to have more genetic interactions is due to their presence in various, often buffered, functions and pathways, and their genetic interactions are contained in modules that correspond to these different contexts. However, disorder exists in other important types of proteins as well, as indicated by the enrichment of genetic hubs for both types of disorder.
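The hub comparison described above combines a threshold on the fraction of modular interactions with a two-sample Kolmogorov-Smirnov (KS) comparison. The following sketch shows both steps under stated assumptions: the 80% cutoff follows the caption of Fig. 2, but the function names are hypothetical, and the exact pair of samples compared in ref. 87 is not given here, so the second sample is left to the caller. Only the KS statistic is computed; turning it into a p-value is deliberately left out.

```python
from bisect import bisect_right


def split_hubs(modular_fraction, cutoff=0.8):
    """Split hubs by the fraction of their interactions that are modular.

    `modular_fraction` maps hub -> fraction in [0, 1]. Hubs strictly above
    the cutoff are modular; the rest are non-modular.
    """
    modular = [h for h, f in modular_fraction.items() if f > cutoff]
    non_modular = [h for h, f in modular_fraction.items() if f <= cutoff]
    return modular, non_modular


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)   # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap
```

A typical use would compare the flexible-disorder percentages of the modular hubs against those of a reference set of proteins chosen by the analyst; a statistic near 0 indicates similar distributions and a statistic near 1 indicates almost complete separation.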
Structural characterizations of protein disorder and their roles in protein interactions
A number of other, often conceptual rather than systematic, categorizations of disorder have been proposed (Table 1). For instance, a continuum of disordered regions has been proposed based upon their dynamic properties.5,88,89 Undoubtedly, this variability exists, but it is difficult to ascertain for a given protein without appropriate data. We examined specific examples given in ref. 89 for “coil-like” and “premolten-globule-like” (PMG) proteins where homologs to yeast proteins were available. Curiously, there did not appear to be a clear trend with the categories of flexible and constrained disorder. For example, ribosomal proteins, which have tightly constrained sequences, appear as both coil-like and PMG proteins. Tompa and Fuxreiter proposed a different categorization of disorder based on the presumed behavior of different disordered regions upon binding.56 Tompa also defined a hierarchy of disorder based on both structural and binding characteristics,57 probably with some correspondence to flexible and constrained disorder. Finally, Rauscher and Pomes defined disorder based upon its observed behaviour in molecular dynamics simulations. In Fig. 1, we propose loose relationships between the categorizations, but it would be interesting to explore them in a systematic manner. For now there seems to be a range of structural flexibility and binding properties in disorder, both of which manifest in different evolutionary dynamics.
Table 1
| Ref. | Classification type | Categories |
| --- | --- | --- |
| Bellay et al. (2011)43 | Classified by conservation of disorder and conservation of AA sequence. | Flexible disorder: Disorder is conserved but specific AA sequence is not. Enriched in signaling proteins. |
| | | Constrained disorder: Disorder and AA sequence are conserved. Enriched in ribosome and protein refolding. |
| | | Non-conserved disorder: Disorder is not conserved across species. |
| Dyson & Wright (2005)5 | Classified by degree of disorder. | Unstructured: Entirely unstructured though may fold upon binding. |
| | | Molten globule: Mostly disordered but can form a compact globule. |
| | | Linked domain: Disordered regions between binding domains. |
| | | Mostly folded: Structured protein with a small disordered region. |
| Tompa & Fuxreiter (2008)56 | Classified by behavior after binding. | Static: Fold into a fixed conformation upon binding. |
| | | Static polymorphic: Have multiple possible folded conformations. |
| | | Dynamic clamp: Disordered region between domains. |
| | | Dynamic flanking: Disorder flanks a binding site such as a linear motif. |
| | | Dynamic random: Disorder has no fixed conformation even after binding. |
| Rauscher & Pomes (2010)92 | Classified by whether disordered region folds after binding. | Folders: Disordered proteins that fold after binding. |
| | | Unfolders: Disordered proteins that do not have an alternative structured conformation. |
| Tompa (2005)57 | A hierarchy of disorder based on both structural and binding characteristics. | Entropic chains: Completely disordered proteins that have no structured conformation. |
| | | Display sites: Regions of disorder that house molecular recognition sites. |
| | | Effectors: Modulate the activity of a partner molecule. |
| | | Assemblers: Assemble complexes. |
| | | Scavengers: Neutralize small ligands. |
| Vucetic et al. (2003)58 | Unsupervised learning of categories. | Flavor V: Associated with ribosomal proteins. |
| | | Flavor C: Associated with modification sites and DNA binding. |
| | | Flavor S: Associated with protein binding. |
As another systematic classification, Vucetic and co-workers applied an unsupervised learning algorithm to different proteins enriched in protein disorder and classified them into three different “flavors”,58 with proteins belonging to the V flavor being largely enriched in constrained disorder, whereas proteins that belong to the C or S flavor are enriched in flexible disorder (Fig. 1).
Likewise, different categories of disorder mediated protein binding have been proposed.56–58,66 Interestingly, in ref. 56 it is observed that protein disorder may play roles in a wide assortment of protein–protein interactions. While the authors acknowledge that disordered regions of proteins may undergo a disorder-to-order transition upon binding to a target protein, as was observed in ref. 67, they also note that the disordered region may not have a well defined conformation even after binding and may range from having several conformations to remaining completely disordered and random. In fact, there is significant evidence that some of these regions that do not have an alternative structured conformation do not lose their functionality if the specific AA sequence is scrambled,68,69 which agrees with the aforementioned observation that many disordered regions are conserved as such while their sequence is not. Tompa and Fuxreiter propose that disorder may allow for multiple binding types between two proteins (moonlighting), may provide a flexible tether between two structured binding domains, may flank a binding or recognition domain, or in some cases may maintain a completely disordered state even after binding.
A different type of disorder involved in protein interactions is that associated with linear motifs. As these are embedded in disordered regions, it is tempting to speculate that the physical flexibility facilitates binding to the linear motifs, perhaps by increasing accessibility of the short stretch of amino acids the motif is composed of. After all, disordered regions would most closely resemble isolated peptides, and many peptide recognition domains or protein kinases bind peptides well.73,82
Disorder appears to be essential in other types of PPIs as well. For example, the ribosomal protein Rpl5, consisting largely of constrained disorder, undergoes a disorder-to-order transition upon binding to 5S rRNA.67 An equally important but more mysterious case is that of protein chaperones, which have long regions of disorder. The importance of these regions is well established,83,84 but the function of these regions is an area of active investigation.85 Various mechanisms have been proposed to explain disorder's importance in these proteins, including that the hydrophilicity of disordered regions stabilizes the chaperone's client, that disordered regions may shield the client from interactions with other molecules, or that there may be a disorder-to-order transition that aids in refolding the client protein (see ref. 85 for a comprehensive review).
Conclusion
The combination of the availability of genome wide data sets with the simultaneous development of reliable protein disorder prediction opens new approaches to characterizing the roles of protein disorder. Genome-wide characteristics can be used to differentiate the particular contexts in which protein disorder functions, and the characteristics of an incidence of disorder that facilitates that function. This approach promises to lead to advances in areas that are traditionally difficult to access for experimental approaches. However, two main stumbling blocks continue to plague genomics approaches to protein disorder. First, bioinformatic approaches have been used to discover regions of disorder and study properties of disordered proteins, but rarely have they been used to specify mechanisms or functions of specific regions of disorder. Second, the hypotheses generated using genome wide data and protein disorder prediction must eventually be validated through biophysical experiments; a step that is sorely missing from almost all current genome-wide analyses of disorder.
Both of these problems might be addressed as disorder bioinformaticians turn from considering the properties of disordered proteins, to specific functions of regions of disorder. While metrics such as percent disorder have uncovered interesting trends relating disordered proteins to various biological characteristics and diseases, it is becoming increasingly clear that disordered regions do not necessarily serve the same function between different proteins, or even within the same protein. Similar to the efforts to identify and classify protein domains, a program of disorder region classification appears to be warranted.
Assigning functions to disordered regions will certainly provide new challenges to the bioinformatics community. Functional regions of disorder (for example those that house linear motifs) often show little conservation at the sequence level and therefore elude many of the approaches used to classify and track protein domains. However, progress has already been made on differentiating disordered regions; for example, by tracking the placement of linear motifs and the conservation of the specific AA sequence as mentioned above. More importantly, by assigning putative functions to specific regions of disorder, bioinformaticians provide experimentalists with clear targets to study. The collection of small scale experimental results (such as are collected in the database DisProt90) provides the link between hard won experimental results and their expansion by leveraging genomics data.
Protein disorder prediction itself could also benefit from a focus on disorder function. Rather than simply predicting the physical quality of protein disorder, perhaps it might be possible to predict functionally specific regions of disorder. In addition, differentiating types of disorder might allow for more accurate predictors, even only on a structural level. Finally, omics approaches can be used to better understand the role played by disorder in more complex eukaryotic functions, and its associations with disease. This has already begun in such works as ref. 91. For too long the protein disorder community has been in the business of proving the existence and importance of protein disorder. It is time to take the next step, and with the help of genomics, understand the particulars and varieties of this phenomenon.
References
- L. Pauling, A Theory of the Structure and Process of Formation of Antibodies*, J. Am. Chem. Soc., 1940, 62, 2643–2657, DOI:10.1021/ja01867a018.
- P. Romero, Z. Obradovic, C. Kissinger, J. E. Villafranca and A. K. Dunker, Identifying disordered regions in proteins from amino acid sequence, in International Conference on Neural Networks, 1997, vol. 1, pp. 90–95, DOI:10.1109/ICNN.1997.611643.
- R. Huber, Conformational flexibility and its functional significance in some protein molecules, Trends Biochem. Sci., 1979, 4, 276–277, DOI:10.1016/0968-0004(79)90298-6.
- A. K. Dunker, C. Oldfield, J. Meng, P. Romero and J. Yang, et al., The unfoldomics decade: an update on intrinsically disordered proteins, BMC Genomics, 2008, 9, 272–276.
- H. J. Dyson and P. E. Wright, Intrinsically unstructured proteins and their functions, Nat. Rev. Mol. Cell Biol., 2005, 6, 197–208, DOI:10.1038/nrm1589.
- R. Linding, L. J. Jensen, F. Diella, P. Bork and T. J. Gibson, et al., Protein disorder prediction: implications for structural proteomics, Structure, 2003, 11, 1453–1459.
- R. Linding, R. B. Russell, V. Neduva and T. J. Gibson, GlobPlot: Exploring protein sequences for globularity and disorder, Nucleic Acids Res., 2003, 31, 3701–3708.
- J. Prilusky, C. E. Felder, T. Zeev-Ben-Mordehai, E. H. Rydberg and O. Man, et al., FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically unfolded, Bioinformatics, 2005, 21, 3435–3438, DOI:10.1093/bioinformatics/bti537.
- J. Liu and B. Rost, NORSp: Predictions of long regions without regular secondary structure, Nucleic Acids Res., 2003, 31, 3833–3835.
- J. Moult, J. T. Pedersen, R. Judson and K. Fidelis, Proteins: Struct., Funct., Genet., 1995, 23, ii–v, DOI:10.1002/prot.340230303.
- J. J. Ward, J. S. Sodhi, L. J. McGuffin, B. F. Buxton and D. T. Jones, Prediction and functional analysis of native disorder in proteins from the three kingdoms of life, J. Mol. Biol., 2004, 337, 635–645, DOI:10.1016/j.jmb.2004.02.002.
- Z. Dosztányi, V. Csizmok, P. Tompa and I. Simon, IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content, Bioinformatics, 2005, 21, 3433–3434, DOI:10.1093/bioinformatics/bti541.
- J. Prilusky, C. E. Felder, T. Zeev-Ben-Mordehai, E. H. Rydberg and O. Man, et al., FoldIndex©: a simple tool to predict whether a given protein sequence is intrinsically unfolded, Bioinformatics, 2005, 21, 3435–3438, DOI:10.1093/bioinformatics/bti537.
- Z. R. Yang, R. Thomson, P. McNeil and R. M. Esnouf, RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins, Bioinformatics, 2005, 21, 3369–3376, DOI:10.1093/bioinformatics/bti534.
- A. Vullo, O. Bortolami, G. Pollastri and S. C. E. Tosatto, Spritz: a server for the prediction of intrinsically disordered regions in protein sequences using kernel machines, Nucleic Acids Res., 2006, 34, W164–168, DOI:10.1093/nar/gkl166.
- J. Gu, M. Gribskov and P. E. Bourne, Wiggle-predicting functionally flexible regions from primary sequence, PLoS Comput. Biol., 2006, 2, e90, DOI:10.1371/journal.pcbi.0020090.
- A. Schlessinger, M. Punta, G. Yachdav, L. Kajan and B. Rost, Improved disorder prediction by combination of orthogonal approaches, PLoS One, 2009, 4, e4433, DOI:10.1371/journal.pone.0004433.
- P. Romero, Z. Obradovic, X. Li, E. C. Garner and C. J. Brown, et al., Sequence complexity of disordered protein, Proteins: Struct., Funct., Genet., 2001, 42, 38–48.
- K. Shimizu, Y. Muraoka, S. Hirose, K. Tomii and T. Noguchi, Predicting mostly disordered proteins by using structure-unknown protein data, BMC Bioinformatics, 2007, 8, 78, DOI:10.1186/1471-2105-8-78.
- T. Ishida and K. Kinoshita, Prediction of disordered regions in proteins based on the meta approach, Bioinformatics, 2008, 24, 1344–1348, DOI:10.1093/bioinformatics/btn195.
- O. Noivirt-Brik, J. Prilusky and J. L. Sussman, Assessment of disorder predictions in CASP8, Proteins: Struct., Funct., Bioinf., 2009, 77, 210–216, DOI:10.1002/prot.22586.
- G. Chopra, N. Kalisman and M. Levitt, Consistent refinement of submitted models at CASP using a knowledge-based potential, Proteins: Struct., Funct., Bioinf., 2010, 78, 2668–2678, DOI:10.1002/prot.22781.
- C. M. Gould, F. Diella, A. Via, P. Puntervoll and C. Gemund, et al., ELM: the status of the 2010 eukaryotic linear motif resource, Nucleic Acids Res., 2009, 38, D167–D180, DOI:10.1093/nar/gkp1016.
- Protein Data Bank (n.d.) [http://www.pdb.org].
- R. J. Edwards, N. E. Davey and D. C. Shields, SLiMFinder: A Probabilistic Method for Identifying Over-Represented, Convergently Evolved, Short Linear Motifs in Proteins, PLoS One, 2007, 2, e967, DOI:10.1371/journal.pone.0000967.
- J. C. Obenauer, L. C. Cantley and M. B. Yaffe, Scansite 2.0: Proteome-wide prediction of cell signaling interactions using short sequence motifs, Nucleic Acids Res., 2003, 31, 3635–3641.
- H. Xie, S. Vucetic, L. M. Iakoucheva, C. J. Oldfield and A. K. Dunker, et al., Functional anthology of intrinsic disorder. 1. Biological processes and functions of proteins with long disordered regions, J. Proteome Res., 2007, 6, 1882–1898, DOI:10.1021/pr060392u.
- S. Vucetic, H. Xie, L. M. Iakoucheva, C. J. Oldfield and A. K. Dunker, et al., Functional Anthology of Intrinsic Disorder. 2. Cellular Components, Domains, Technical Terms, Developmental Processes, and Coding Sequence Diversities Correlated with Long Disordered Regions, J. Proteome Res., 2007, 6, 1899–1916, DOI:10.1021/pr060393m.
- H. Xie, S. Vucetic, L. M. Iakoucheva, C. J. Oldfield and A. K. Dunker, et al., Functional Anthology of Intrinsic Disorder. 3. Ligands, Post-Translational Modifications, and Diseases Associated with Intrinsically Disordered Proteins, J. Proteome Res., 2007, 6, 1917–1932, DOI:10.1021/pr060394e.
- P. Tompa, Z. Dosztanyi and I. Simon, Prevalent structural disorder in E. coli and S. cerevisiae proteomes, J. Proteome Res., 2006, 5, 1996–2000, DOI:10.1021/pr0600881.
- A. Schlessinger, C. Schaefer, E. Vicedo, M. Schmidberger and M. Punta, et al., Protein disorder—a breakthrough invention of evolution?, Curr. Opin. Struct. Biol., in press, corrected proof. Available: http://www.sciencedirect.com/science/article/B6VS6-52NNJ21-2/2/98c2edf98dfd1ed1a2fbe5656d2d0c3e. Accessed 13 May 2011.
- A. Lobley, M. B. Swindells, C. A. Orengo and D. T. Jones, Inferring Function Using Patterns of Native Disorder in Proteins, PLoS Comput. Biol., 2007, 3, e162, DOI:10.1371/journal.pcbi.0030162.
- L. M. Iakoucheva, P. Radivojac, C. J. Brown, T. R. O'Connor and J. G. Sikes, et al., The importance of intrinsic disorder for protein phosphorylation, Nucleic Acids Res., 2004, 32, 1037–1049, DOI:10.1093/nar/gkh253.
- M. M. Pentony and D. T. Jones, Modularity of intrinsic disorder in the human proteome, Proteins: Struct., Funct., Bioinf., 2010, 78, 212–221, DOI:10.1002/prot.22504.
- N. Wayne and D. N. Bolon, Charge-rich regions modulate the anti-aggregation activity of Hsp90, J. Mol. Biol., 2010, 401, 931–939, DOI:10.1016/j.jmb.2010.06.066.
- P. Beltrao and L. Serrano, Comparative Genomics and Disorder Prediction Identify Biologically Relevant SH3 Protein Interactions, PLoS Comput. Biol., 2005, 1, e26, DOI:10.1371/journal.pcbi.0010026.
- Y. Zhang, H. Tan, G. Chen and Z. Jia, Investigating the disorder-order transition of calmodulin binding domain upon binding calmodulin using molecular dynamics simulation, J. Mol. Recognit., 2009, 23, 360–368, DOI:10.1002/jmr.1002.
- J. Yamada, J. L. Phillips, S. Patel, G. Goldfien and A. Calestagne-Morelli, et al., A bimodal distribution of two distinct categories of intrinsically-disordered structures with separate functions in FG nucleoporins, Mol. Cell. Proteomics, 2010. Available: http://www.mcponline.org/content/early/2010/04/05/mcp.M000035-MCP201.abstract. Accessed 3 Jun 2011.
- V. Uversky, C. Oldfield, U. Midic, H. Xie and B. Xue, et al., Unfoldomics of human diseases: linking protein intrinsic disorder with diseases, BMC Genomics, 2009, 10, S7, DOI:10.1186/1471-2164-10-S1-S7.
- W.-Y. Mark, J. C. C. Liao, Y. Lu, A. Ayed and R. Laister, et al., Characterization of segments from the central region of BRCA1: an intrinsically disordered scaffold for multiple protein–protein and protein–DNA interactions?, J. Mol. Biol., 2005, 345, 275–287, DOI:10.1016/j.jmb.2004.10.045.
- Y. Xia, E. A. Franzosa and M. B. Gerstein, Integrated Assessment of Genomic Correlates of Protein Evolutionary Rate, PLoS Comput. Biol., 2009, 5, e1000413, DOI:10.1371/journal.pcbi.1000413.
- P. M. Kim, A. Sboner, Y. Xia and M. Gerstein, The role of disorder in interaction networks: a structural analysis, Mol. Syst. Biol., 2008, 4. Available: http://dx.doi.org/10.1038/msb.2008.16. Accessed 3 Mar 2010.
- J. Bellay, S. Han, M. Michaut, T. Kim and M. Costanzo, et al., Bringing order to protein disorder through comparative genomics and genetic interactions, Genome Biology, 2011, 12, R14.
- J. W. Chen, P. Romero, V. N. Uversky and A. K. Dunker, Conservation of Intrinsic Disorder in Protein Domains and Families: I. A Database of Conserved Predicted Disordered Regions, J. Proteome Res., 2006, 5, 879–887, DOI:10.1021/pr060048x.
- J. W. Chen, P. Romero, V. N. Uversky and A. K. Dunker, Conservation of Intrinsic Disorder in Protein Domains and Families: II. Functions of Conserved Disorder, J. Proteome Res., 2006, 5, 888–898, DOI:10.1021/pr060049p.
- C. Schaefer, A. Schlessinger and B. Rost, Protein secondary structure appears to be robust under in silico evolution while protein disorder appears not to be, Bioinformatics, 2010, 26, 625–631, DOI:10.1093/bioinformatics/btq012.
- J. Gsponer, M. E. Futschik, S. A. Teichmann and M. M. Babu, Tight Regulation of Unstructured Proteins: From Transcript Synthesis to Protein Degradation, Science, 2008, 322, 1365–1368, DOI:10.1126/science.1163581.
- A.-M. Fernandez-Escamilla, F. Rousseau, J. Schymkowitz and L. Serrano, Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins, Nat. Biotechnol., 2004, 22, 1302–1306, DOI:10.1038/nbt1012.
- T. Vavouri, J. I. Semple, R. Garcia-Verdugo and B. Lehner, Intrinsic protein disorder and interaction promiscuity are widely associated with dosage sensitivity, Cell, 2009, 138, 198–208, DOI:10.1016/j.cell.2009.04.029.
- V. Narayan, P. Halada, L. Hernychova, Y. P. Chong and J. Zakova, et al., A multi-protein binding interface in an intrinsically disordered region of the tumour suppressor protein interferon regulatory factor-1, J. Biol. Chem., 2011. Available: http://www.jbc.org/content/early/2011/01/18/jbc.M110.204602.abstract. Accessed 24 May 2011.
- M. O. Collins, L. Yu, I. Campuzano, S. G. N. Grant and J. S. Choudhary, Phosphoproteomic Analysis of the Mouse Brain Cytosol Reveals a Predominance of Protein Phosphorylation in Regions of Intrinsic Sequence Disorder, Mol. Cell. Proteomics, 2008, 7, 1331–1348, DOI:10.1074/mcp.M700564-MCP200.
- E. M. Marcotte and M. Tsechansky, Disorder, promiscuity, and toxic partnerships, Cell, 2009, 138, 16–18, DOI:10.1016/j.cell.2009.06.024.
- P. Tompa, J. Prilusky, I. Silman and J. L. Sussman, Structural disorder serves as a weak signal for intracellular protein degradation, Proteins: Struct., Funct., Bioinf., 2008, 71, 903–909, DOI:10.1002/prot.21773.
- S. Prakash, L. Tian, K. S. Ratliff, R. E. Lehotzky and A. Matouschek, An unstructured initiation site is required for efficient proteasome-mediated degradation, Nat. Struct. Mol. Biol., 2004, 11, 830–837, DOI:10.1038/nsmb814.
- P. Tsvetkov, N. Reuven, C. Prives and Y. Shaul, Susceptibility of p53 Unstructured N Terminus to 20 S Proteasomal Degradation Programs the Stress Response, J. Biol. Chem., 2009, 284, 26234–26242, DOI:10.1074/jbc.M109.040493.
- P. Tompa and M. Fuxreiter, Fuzzy complexes: polymorphism and structural disorder in protein–protein interactions, Trends Biochem. Sci., 2008, 33, 2–8, DOI:10.1016/j.tibs.2007.10.003.
- P. Tompa, The interplay between structure and function in intrinsically unstructured proteins, FEBS Lett., 2005, 579, 3346–3354, DOI:10.1016/j.febslet.2005.03.072.
- S. Vucetic, C. J. Brown, A. K. Dunker and Z. Obradovic, Flavors of protein disorder, Proteins: Struct., Funct., Genet., 2003, 52, 573–584, DOI:10.1002/prot.10437.
- S. Schnell, S. Fortunato and S. Roy, Is the intrinsic disorder of proteins the cause of the scale-free architecture of protein–protein interaction networks?, Proteomics, 2007, 7, 961–964, DOI:10.1002/pmic.200600455.
- A.-C. Gavin, P. Aloy, P. Grandi, R. Krause and M. Boesche, et al., Proteome survey reveals modularity of the yeast cell machinery, Nature, 2006, 440, 631–636, DOI:10.1038/nature04532.
- N. J. Krogan, G. Cagney, H. Yu, G. Zhong and X. Guo, et al., Global landscape of protein complexes in the yeast Saccharomyces cerevisiae, Nature, 2006, 440, 637–643, DOI:10.1038/nature04670.
- H. Yu, P. Braun, M. A. Yildirim, I. Lemmens and K. Venkatesan, et al., High-Quality Binary Protein Interaction Map of the Yeast Interactome Network, Science, 2008, 322, 104–110, DOI:10.1126/science.1158684.
- A. Breitkreutz, H. Choi, J. R. Sharom, L. Boucher and V. Neduva, et al., Global Protein Kinase and Phosphatase Interaction Network in Yeast, Science, 2010, 328, 1043–1046, DOI:10.1126/science.1176495.
- K. Tarassov, V. Messier, C. R. Landry, S. Radinovic and M. M. S. Molina, et al., An in vivo Map of the Yeast Protein Interactome, Science, 2008, 320, 1465–1470, DOI:10.1126/science.1153878.
- J. H. Fong, B. A. Shoemaker, S. O. Garbuzynskiy, M. Y. Lobanov and O. V. Galzitskaya, et al., Intrinsic disorder in protein interactions: insights from a comprehensive structural analysis, PLoS Comput. Biol., 2009, 5, e1000316, DOI:10.1371/journal.pcbi.1000316.
- P. E. Wright and H. J. Dyson, Linking folding and binding, Curr. Opin. Struct. Biol., 2009, 19, 31–38, DOI:10.1016/j.sbi.2008.12.003.
- J. P. DiNitto and P. W. Huber, Mutual induced fit binding of Xenopus ribosomal protein L5 to 5S rRNA, J. Mol. Biol., 2003, 330, 979–992.
- I. A. Hope, S. Mahadevan and K. Struhl, Structural and functional characterization of the short acidic transcriptional activation region of yeast GCN4 protein, Nature, 1988, 333, 635–640, DOI:10.1038/333635a0.
- K. P. Ng, G. Potikyan, R. O. V. Savene, C. T. Denny and V. N. Uversky, et al., Multiple aromatic side chains within a disordered structure are critical for transcription and transforming activity of EWS family oncoproteins, Proc. Natl. Acad. Sci. U. S. A., 2007, 104, 479–484, DOI:10.1073/pnas.0607007104.
- M. L. Miller, L. J. Jensen, F. Diella, C. Jorgensen and M. Tinti, et al., Linear Motif Atlas for Phosphorylation-Dependent Signaling, Sci. Signaling, 2008, 1, ra2, DOI:10.1126/scisignal.1159433.
- J. F. Dice, Peptide sequences that target cytosolic proteins for lysosomal proteolysis, Trends Biochem. Sci., 1990, 15, 305–309.
- R. Ren, B. Mayer, P. Cicchetti and D. Baltimore, Identification of a ten-amino acid proline-rich SH3 binding site, Science, 1993, 259, 1157–1161, DOI:10.1126/science.8438166.
- R. Tonikian, X. Xin, C. P. Toret, D. Gfeller and C. Landgraf, et al., Bayesian modeling of the yeast SH3 domain interactome predicts spatiotemporal dynamics of endocytosis proteins, PLoS Biol., 2009, 7, e1000218, DOI:10.1371/journal.pbio.1000218.
- F. Diella, N. Haslam, C. Chica, A. Budd and S. Michael, et al., Understanding eukaryotic linear motifs and their role in cell signaling and regulation, Front. Biosci., 2008, 13, 6580–6603.
- D. Gfeller, F. Butty, M. Wierzbicka, E. Verschueren and P. Vanhee, et al., The multiple-specificity landscape of modular peptide recognition domains, Mol. Syst. Biol., 2011, 7. Available: http://dx.doi.org/10.1038/msb.2011.18. Accessed 2 Jun 2011.
- J. C. D. Houtman, H. Yamaguchi, M. Barda-Saad, A. Braiman and B. Bowden, et al., Oligomerization of signaling complexes by the multipoint binding of GRB2 to both LAT and SOS1, Nat. Struct. Mol. Biol., 2006, 13, 798–805, DOI:10.1038/nsmb1133.
- M. Fuxreiter, P. Tompa and I. Simon, Local structural disorder imparts plasticity on linear motifs, Bioinformatics, 2007, 23, 950–956, DOI:10.1093/bioinformatics/btm035.
- P. Beltrao, J. C. Trinidad, D. Fiedler, A. Roguev and W. A. Lim, et al., Evolution of Phosphoregulation: Comparison of Phosphorylation Patterns across Yeast Species, PLoS Biol., 2009, 7, e1000134, DOI:10.1371/journal.pbio.1000134.
- S. Ciliberti, O. C. Martin and A. Wagner, Robustness Can Evolve Gradually in Complex Regulatory Gene Networks with Varying Topology, PLoS Comput. Biol., 2007, 3, e15, DOI:10.1371/journal.pcbi.0030015.
- A. N. Nguyen Ba and A. M. Moses, Evolution of Characterized Phosphorylation Sites in Budding Yeast, Mol. Biol. Evol., 2010, 27, 2027–2037, DOI:10.1093/molbev/msq090.
- C. S. H. Tan, B. Bodenmiller, A. Pasculescu, M. Jovanovic and M. O. Hengartner, et al., Comparative Analysis Reveals Conserved Protein Phosphorylation Networks Implicated in Multiple Diseases, Sci. Signaling, 2009, 2, ra39, DOI:10.1126/scisignal.2000316.
- J. Mok, P. M. Kim, H. Y. K. Lam, S. Piccirillo and X. Zhou, et al., Deciphering Protein Kinase Specificity Through Large-Scale Analysis of Yeast Phosphorylation Site Motifs, Sci. Signaling, 2010, 3, ra12, DOI:10.1126/scisignal.2000482.
- M. Hessling, K. Richter and J. Buchner, Dissection of the ATP-induced conformational cycle of the molecular chaperone Hsp90, Nat. Struct. Mol. Biol., 2009, 16, 287–293, DOI:10.1038/nsmb.1565.
- K. Machida, A. Kono-Okada, K. Hongo, T. Mizobata and Y. Kawata, Hydrophilic residues 526 KNDAAD 531 in the flexible C-terminal region of the chaperonin GroEL are critical for substrate protein folding within the central cavity, J. Biol. Chem., 2008, 283, 6886–6896, DOI:10.1074/jbc.M708002200.
- P. Tompa and D. Kovacs, Intrinsically disordered chaperones in plants and animals, Biochem. Cell Biol., 2010, 88, 167–174, DOI:10.1139/O09-163.
- M. Costanzo, A. Baryshnikova, J. Bellay, Y. Kim and E. D. Spear, et al., The Genetic Landscape of a Cell, Science, 2010, 327, 425–431, DOI:10.1126/science.1180823.
- J. Bellay, G. Atluri, T. L. Sing, K. Toufighi and M. Costanzo, et al., Putting genetic interactions in context through a global modular decomposition, Genome Res., in press.
- A. K. Dunker, J. D. Lawson, C. J. Brown, R. M. Williams and P. Romero, et al., Intrinsically disordered protein, J. Mol. Graphics Modell., 2001, 19, 26–59.
- V. N. Uversky, Natively unfolded proteins: a point where biology waits for physics, Protein Sci., 2002, 11, 739–756, DOI:10.1110/ps.4210102.
- M. Sickmeier, J. A. Hamilton, T. LeGall, V. Vacic and M. S. Cortese, et al., DisProt: the Database of Disordered Proteins, Nucleic Acids Res., 2007, 35, D786–793, DOI:10.1093/nar/gkl893.
- U. Midic, C. J. Oldfield, A. K. Dunker, Z. Obradovic and V. N. Uversky, Protein disorder in the human diseasome: unfoldomics of human genetic diseases, BMC Genomics, 2009, 10, S12–S12, DOI:10.1186/1471-2164-10-S1-S12.
- S. Rauscher and R. Pomès, Molecular simulations of protein disorder, Biochem. Cell Biol., 2010, 88, 269–290, DOI:10.1139/O09-169.
Footnotes
† Published as part of a Molecular BioSystems themed issue on Intrinsically Disordered Proteins: Guest Editor M. Madan Babu.
‡ Current address: Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA.
This journal is © The Royal Society of Chemistry 2012