Kevin Rue-Albrecht (a, b), Denis C. Shields (a) and Nora Khaldi* (a)
(a) UCD Conway Institute of Biomolecular and Biomedical Research, School of Medicine and Medical Sciences, and UCD Complex and Adaptive Systems Laboratory, University College Dublin, Dublin 4, Republic of Ireland. E-mail: kevin.rue@ucdconnect.ie; denis.shields@ucd.ie; nora.khaldi@ucd.ie; Fax: +353 1 716 5396; Tel: +353 1 716 5335
(b) Ecole Polytechnique de l'Université de Nice, Sophia Antipolis, 1645 Route des Lucioles, 06410 Biot, France
First published on 23rd November 2011
Protein disorder has been frequently associated with protein–protein interaction. However, our knowledge of how protein disorder evolves within a network is limited. It is expected that physically interacting proteins evolve in a coordinated manner. This has so far been shown in their evolutionary rate and in their gene expression levels. Here we examine the percentage of predicted disordered residues within binary and complex interacting proteins (physical and functional interactions respectively) to investigate how the disorder of a protein relates to that of its interacting partners. We show that the levels of disorder of interacting proteins are correlated, with a greater correlation seen among proteins that are co-members of the same complex, and a lesser correlation between proteins that are documented as binary interactors of each other. There is a striking variation among complexes not only in their disorder, but also in the extent to which the proteins within the complex differ in their levels of disorder, with RNA processes and protein binding complexes showing more variation in the disorder of their proteins, whilst other complexes show very little variation in the overall disorder of their constituent proteins. There is likely to be a stronger selection for complex subunits to have similar disorder than is seen for proteins involved in binary interactions. Thus, binary interactions may be more resilient to changes in disorder than are complex interactions. These results add a new dimension to the role of disorder in protein networks, and highlight the potential importance of maintaining similar disorder in the members of a complex.
The flexibility and functionality that disorder provides to a protein is just one side of the story. Indeed, disorder in protein–protein interactions is increasingly considered highly beneficial for protein complexes. Besides proposed benefits such as hosting sites for post-translational modifications and providing an interface for binding of other partners, it has been argued that disorder can increase the stability and prevent degradation of complexes.5–10 Disorder is also thought to increase the backbone conformational entropy upon ligand binding.11–13
Some studies have focused on protein disorder in the context of an interaction network. One of these, by Hegyi et al.,14 showed a significant correlation between disorder and the number of protein subunits of a complex. Another finding is that hub proteins (defined as proteins that have many interaction partners) tend to have more disorder than non-hub proteins.15–18 Kim PM and co-authors19 showed that proteins with single interfaces, which have significantly higher levels of disorder, also seem to have partners with significantly more disorder than other types of proteins. Conversely, multi-interface hub proteins, which have less disorder than single-interface hub proteins, interact with proteins that, like them, have less disorder than the partners of single-interface proteins. Hence, it seems reasonable to hypothesize that the level of disorder in proteins may be linked somehow to the disorder level in their interacting partners. In other words, is there a mechanism governing the overall disorder of a set of interacting partners?
Our interest in this study is three-fold. Firstly, as mentioned above, we want to examine whether the disorder levels of a set of interacting proteins are correlated.
Secondly, to determine the functional constraints giving rise to such correlations, we contrasted two types of interactions: binary interactions and complex interactions. Fundamental differences exist between the two types. Binary interactions include a wide range of high- and low-affinity transient protein–protein interactions,20 while the interactions between proteins in complexes are usually more stable and persistent in time. We wanted to examine the disorder of proteins in both binary and complex interactions, as this will help shed light on how the type of interaction can affect the disorder in the interacting partner(s). In other words, is the level of disorder in the individual proteins of a complex under evolutionary constraints similar to those acting on binary interactors?
Thirdly, we wished to investigate whether protein complexes with widely divergent levels of disorder among their constituent proteins, and those with proteins of similar disorder, might represent different types of complexes.
To investigate these questions we use the protein sequence data of S. cerevisiae along with its interaction data to examine the correlation of the level of disorder in a protein with the level in its interacting partners. We show that the level of disorder in a protein and its interacting partners is often correlated and that this correlation is significantly higher than seen in non-interacting proteins. We argue that interacting proteins may co-evolve in their disorder levels and that selection seems to be maintaining similar levels of disorder in some interacting partners and less so in others. We investigate which complexes in terms of cellular function have similar disorder levels in the individual interacting proteins, and seem to be under this selection, and which are not.
From the binary set we identified 1799 interactions involving 1271 proteins. The number of binary partners of a protein varied from 1 to 84, with an average of 2.6 partners per protein. Regarding the complex interactions, we find 11136 interactions with an average of 18.3 partners per interaction; complex size varies from 2 to 81 proteins.
In the remainder of this work, for simplicity, we refer to the percentage of disordered residues (IUPred score > 0.5) in a protein as the level of disorder of that protein (see Methods).
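The definition above can be illustrated with a minimal Python sketch. It assumes the per-residue scores have already been computed by a predictor such as IUPred and are available as a plain list of floats; the function name `disorder_level` and the toy scores are hypothetical and are not taken from the original analysis.

```python
def disorder_level(scores, threshold=0.5):
    """Return the fraction of residues whose score is strictly above the threshold.

    scores: per-residue disorder scores for one protein (assumed precomputed).
    threshold: cut-off above which a residue counts as disordered (0.5 in the text).
    """
    if not scores:
        raise ValueError("a protein needs at least one residue score")
    disordered = sum(1 for s in scores if s > threshold)
    return disordered / len(scores)


# Hypothetical scores for a 10-residue toy protein (illustration only).
toy_scores = [0.1, 0.2, 0.7, 0.9, 0.6, 0.4, 0.5, 0.3, 0.8, 0.2]
print(disorder_level(toy_scores))  # 4 of the 10 residues score above 0.5
```

Multiplying the returned fraction by 100 gives the percentage used in the text. A residue scoring exactly 0.5 is not counted here, following the strict inequality in the definition.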
Fig. 1 Comparison of the correlation coefficient of disorder between interacting proteins and non-interacting proteins in S. cerevisiae. (A) Histogram representing the distribution of 100 correlations in disorder level between each protein and the set of all random non-interacting partners generated from the binary interaction dataset. The blue vertical line in bold represents the correlation of the real data at x = 0.13 (p = 6e−06). The density line of the random distribution is represented in red. (B) Similar to (A) but generated from the S. cerevisiae complex dataset. When two ohnologs share the exact same set of partners, we only considered the most disordered protein of the pair, which reduced the set from 1615 to 1525 proteins. The blue vertical line in bold represents the correlation of the real data at x = 0.25 (p < 2.2e−16). The density line of the random distribution is represented in red.
We carried out a similar analysis for the protein complexes retrieved from the high-confidence protein–protein interactions involving 1615 proteins (Fig. 1B).21 Because complexes containing ohnologs (genes duplicated as part of a genome duplication) usually contain both ohnologs, we chose to represent only the most disordered ohnolog if both were present in the same complex, which reduced the set to 1525 proteins. For this set we also observe a correlation between the disorder of a protein and the disorder of its interacting partners (r = 0.25, p < 2.2e−16; robust simulation-based p < 0.01, Fig. 1B).
It is of interest to note that the mean disorder is quite similar in the proteins present in complexes but without binary interactors (0.19) and in those with binary partners but not found in any complexes (0.19), but is curiously somewhat higher in the 449 proteins that have information on both datasets (0.22).
The values of the correlations, although low, are shown by the random simulation of non-interacting proteins to be significant. Indeed, our results show that the correlation values between interacting proteins are significantly higher than those of non-interacting proteins. At the same time, the low values of the correlations in binary and complex interactions are interesting in that they reflect the diverse nature of the complexes. Some proteins within a complex will have a higher disorder correlation with their partners, while others may vary. This diversity may result from the different functions of the complexes and their differing functional constraints.
Fig. 2 Correlation between the level of disorder in a protein and that found in its binary interacting proteins, binned according to the level of disorder in each protein. The x-axis represents the level of disorder in a protein, which is split into 20 bins, representing an increment of 5% in disorder level from one bin to the next. A value of 100%, for example, represents fully disordered proteins. The y-axis represents the average disorder level found in all the interacting proteins. The dashed line represents the linear regression. Error bars are represented as black vertical lines around each point. The correlation coefficient is r = 0.43, with p = 0.03.
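The binning described in the caption of Fig. 2 can be sketched as follows. This is only an illustration under stated assumptions: disorder is stored as a fraction between 0 and 1, each protein contributes the mean disorder of its partners, and each bin reports the mean of those per-protein values (the source does not say whether partners were pooled or averaged per protein first). The names `binned_partner_disorder`, `toy_disorder` and `toy_partners` are hypothetical.

```python
from collections import defaultdict


def binned_partner_disorder(disorder, partners, n_bins=20):
    """Group proteins into equal-width disorder bins and average partner disorder.

    disorder: dict mapping protein -> disorder fraction in [0, 1].
    partners: dict mapping protein -> list of interacting proteins.
    Returns a dict mapping bin index -> mean of the per-protein partner means.
    """
    per_bin = defaultdict(list)
    for protein, mates in partners.items():
        # Ignore self-interactions and partners without a disorder value.
        mates = [m for m in mates if m != protein and m in disorder]
        if not mates:
            continue
        partner_mean = sum(disorder[m] for m in mates) / len(mates)
        # A fully disordered protein (1.0) is placed in the last bin.
        idx = min(int(disorder[protein] * n_bins), n_bins - 1)
        per_bin[idx].append(partner_mean)
    return {idx: sum(v) / len(v) for idx, v in sorted(per_bin.items())}


# Hypothetical toy network (illustration only).
toy_disorder = {"A": 0.02, "B": 0.12, "C": 0.48, "D": 1.0}
toy_partners = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
for idx, value in binned_partner_disorder(toy_disorder, toy_partners).items():
    print(f"bin {idx}: mean partner disorder {value:.2f}")
```

With 20 bins, each bin spans 5% of disorder, matching the increment described in the caption.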
To examine the diversity of the complexes in terms of the disorder of their individual proteins, we used the standard deviation of disorder between proteins of a complex. The standard deviation captures whether the proteins of a complex share homogeneous or heterogeneous levels of disorder. We plotted the mean disorder of the proteins within each complex against the standard deviation (Fig. 3; supplementary file 1). We split the graph into nine bins for ease of further analysis. We were particularly interested in complexes that have a high degree of disorder in their proteins and substantial variation in the level of their disorder (bin 8 [mean disorder between 0.16 and 0.35 and std > 0.3] and bin 9 [mean disorder > 0.35 and std > 0.3]), which contrasted with those with a similar level of disorder but much less variation between the proteins within the complex (bin 2 [mean disorder between 0.16 and 0.35 and std < 0.12] and bin 3 [mean disorder > 0.35 and std < 0.12]).
Fig. 3 Segregation of the protein complexes according to the level of variation in disorder level between the members of a complex. The 9 numbered bins represent different averages and variations of disorder level of the proteins within each complex of the complex interaction dataset (see Methods for bin border information). The whiskers of the boxplots on the x- and y-axes represent the 1st quartile, median, and 3rd quartile of the average and standard deviation of disorder level, respectively.
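The nine-bin segregation can be sketched in Python using the cut-offs quoted in the text (mean disorder 0.16 and 0.35; standard deviation 0.12 and 0.3). Several details are assumptions made for illustration: the sample standard deviation is used, values falling exactly on a border go to the middle class, and the numbering places bins 1 to 3 at low standard deviation, 4 to 6 at intermediate, and 7 to 9 at high, which is inferred from the bins named in the text. The function names are hypothetical.

```python
from statistics import mean, stdev

MEAN_CUTS = (0.16, 0.35)  # borders on mean disorder, as quoted in the text
STD_CUTS = (0.12, 0.30)   # borders on the standard deviation, as quoted in the text


def level(value, cuts):
    """Return 0, 1 or 2 for a value below, between or above the two cut-offs."""
    low, high = cuts
    if value < low:
        return 0
    if value <= high:
        return 1
    return 2


def complex_bin(disorders):
    """Assign a complex (list of member disorder fractions) to a bin from 1 to 9."""
    m = mean(disorders)
    s = stdev(disorders)  # sample standard deviation (assumption); needs >= 2 members
    return 3 * level(s, STD_CUTS) + level(m, MEAN_CUTS) + 1


# Hypothetical complexes (illustration only).
print(complex_bin([0.05, 0.10, 0.08]))        # ordered and homogeneous
print(complex_bin([0.40, 0.45, 0.50]))        # disordered and homogeneous
print(complex_bin([0.05, 0.95, 0.60, 0.20]))  # disordered and heterogeneous
```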
Complexes where all the proteins have high disorder may relate to common localization in subcellular locations where high disorder is acceptable or indeed desirable. We wished to understand these and other potential biological reasons underlying this striking variation. Accordingly, we investigated the distribution of Gene Ontology (GO) terms among the complexes shown in Fig. 3.
It is interesting to note that although proteins carrying a catalytic activity function have been shown to be depleted in disorder,16 our results show that they are found in complexes with high disorder in many members of the complex. The lack of disorder in proteins with a catalytic activity might be compensated by the high level of disorder in the other members of the complex. This disorder might be to bind the complex together or to regulate it.
An ideal analysis of GO-term enrichment would simply compare the set of complexes with the most disordered proteins to the set of complexes with the least disordered proteins. As this analysis was somewhat restricted, being limited to a small number of complexes in bin 9, we wished to analyse the entire dataset of complexes. However, it was important to counter the confounding effect of the enrichment of complexes with highly disordered subunits among the complexes that have a higher standard deviation in the disorder of their subunits. Indeed, the pattern of the standard deviation in Fig. 3 gives more emphasis to complexes with highly disordered proteins. Accordingly, we calculated the residual of the standard deviation, regressed against the mean disorder of each complex (supplementary Fig. 1). This residual is therefore a quantitative measure of how much variation there is in the disorder between members of a complex, corrected for the observed increase in standard deviation with increasing disorder seen in Fig. 3. While this correction is not perfect, it accounts for the bulk of the relationship between disorder and its standard deviation. We then investigated whether there was enrichment of GO terms among the genes that have a high residual standard deviation. Using GOrilla we found that the processes most associated with a high residual (Supplementary file 4) were mRNA metabolism and RNA processing (approximately twofold enrichment, with over 120 genes of this category defined as having high residuals). The functions most associated were ribosome, RNA polymerase, and RNA binding. The components most associated were RNA–protein complexes, including the ribosome and others. Thus, from these three ontologies, a picture emerges that the complexes with the greatest variation in disorder are associated with RNA. Why might this be the case?
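The residual described above can be sketched as follows, assuming an ordinary least-squares straight-line fit of the standard deviation on the mean disorder (the text says only that the standard deviation was regressed against the mean, so the linear form is an assumption). The function name `ols_residuals` and the toy values are hypothetical.

```python
def ols_residuals(x, y):
    """Residuals of y after a least-squares straight-line fit on x.

    x: mean disorder of each complex.
    y: standard deviation of disorder within each complex.
    """
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y need the same length, with at least two points")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance, so the slope is undefined")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical per-complex values (illustration only).
toy_means = [0.1, 0.2, 0.3, 0.4]
toy_stds = [0.05, 0.20, 0.10, 0.25]
for m, r in zip(toy_means, ols_residuals(toy_means, toy_stds)):
    print(f"mean disorder {m:.1f}: residual {r:+.3f}")
```

A positive residual marks a complex whose members vary more in disorder than expected for its mean disorder; a negative residual marks one whose members vary less.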
It is possible that the proteins directly coming into contact with the RNA in these complexes are more disordered, whilst those involved in stabilizing other complex interactions are more ordered. Although most ribosomal proteins are ordered, it has been shown that the ribosomal proteins L15 and L19e have important disordered regions.27,28 These regions become ordered upon binding ribosomal RNA, but do not take a particular form; rather, they adopt any shape that allows them to fill in the gaps in the ribosome structure and bind the necessary parts together.27,28 This function of disorder is referred to as “structural mortar”.
Previous work has shown that disordered proteins are enriched in protein binding classes,1,29–31 and more generally shows the involvement of unstructured regions in binding interactions.32–36 We find that the complexes with highly disordered subunits (bins 3, 6, and 9; Supplemental 5) are significantly enriched with the term ‘protein binding’ compared to the complexes with the more ordered proteins (bins 1 and 4; bin 7 is empty) (OR = 0.39; 95% CI 0.22–0.68; p = 0.001). We show that complexes with low disorder variation between their subunits (bins 2 and 3) are associated more significantly with ‘protein binding’ than complexes with high disorder variation between their subunits (bins 8 and 9) (OR = 0.33; 95% CI 0.14–0.79; p = 0.02).
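For readers unfamiliar with the statistic, an odds ratio and its 95% confidence interval can be computed from a 2×2 table as in the sketch below. It uses the Wald (log-scale) interval, which is an assumption: the source does not state which method or which counts produced the values quoted above, so the counts shown here are purely hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a, b: counts with / without the term in the first group.
    c, d: counts with / without the term in the second group.
    z: normal quantile (1.96 gives an approximate 95% interval).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four counts must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts (illustration only, not the data behind the reported values).
or_, lo, hi = odds_ratio_wald(10, 20, 20, 10)
print(f"OR = {or_:.2f}; 95% CI {lo:.2f}-{hi:.2f}")
```

An odds ratio below 1 means only that the group placed first carries the term less often than the group placed second, so the direction of an enrichment depends on which group is taken as the reference.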
Further, examining the percentage of proteins in a complex that carry a ‘protein binding’ designation reveals two results. Firstly, this value is much higher for complexes with highly disordered subunits than for those with low-disorder subunits. For example, complexes with low-disorder proteins (bin 1, with an average disorder below 0.16 and a standard deviation below 0.12) contain an average of 14.5% proteins with the term ‘protein binding’, in contrast to 39.6% for complexes with high disorder (bin 3, Supplemental 5). Secondly, this value is much higher for complexes with low disorder variation between their subunits than for those with high disorder variation between their subunits. For example, complexes with low disorder variation between their proteins (bin 3) contain an average of 39.6% proteins with the term ‘protein binding’, in contrast to 15% for complexes with high disorder variation between their subunits (bin 9, Supplemental 5). There are five ribosomal complexes among the 408 complexes that we studied. These five complexes contain proteins that bind to both proteins and RNA. Our observation above (protein binding enriched in complexes with low standard deviation of disorder between the subunits) seems to be a general trend. Indeed, this statement remains true for the comparison of bin 3 (high disorder with low standard deviation between subunits; Fig. 3) and bin 9 (high disorder with high standard deviation between subunits; Fig. 3), which do not contain RNA complexes (Supplemental 1).
These results may reflect the functional homogeneity of the complexes with low variation in the disorder of their proteins (bins 2 and 3; mean disorder between 0.16 and 0.35 and std < 0.12 for bin 2; mean disorder > 0.35 and std < 0.12 for bin 3), as opposed to those with high variation in the disorder of their proteins (bins 8 and 9; mean disorder between 0.16 and 0.35 and std > 0.3 for bin 8; mean disorder > 0.35 and std > 0.3 for bin 9), which, conversely, may reflect functional heterogeneity. Functional homogeneity has been argued for here using only the term ‘protein binding’. Although restricted to one term, this shows that the diversity in the disorder level of proteins within a complex may be due to the very different functions played by the different partners in a network. In contrast, bin 3 holds complexes whose proteins have more functional commonalities, which may also explain the higher correlation in disorder.
Our work reveals an important feature of disorder in the context of protein networks, namely that proteins of a complex tend to resemble each other in terms of the level of their disorder. When examining a protein's disorder in the context of its interacting partners, we realize that it may be maintained in relation to its network of interacting partners. It is possible that there are evolutionary constraints maintaining the levels of disorder in interacting proteins.
Further, what seems fascinating is the difference between binary interactions and complex interactions in their effects on the overall disorder of the interacting proteins. Protein disorder in complexes appears to be under tighter control, since the subunits of a complex seem to carry similar levels of disorder more often than binary interactors do. There may be a higher resilience to change in protein disorder in binary interacting proteins than in complex interacting proteins. There are a number of potential reasons why this could be the case, which fall into two broad groupings. (1) The proteins of complexes stick together and form blocks, and so influence each other more than the proteins in the more transient interactions of the binary interaction dataset do (for example, through disordered regions that may interact with each other). (2) Alternatively, proteins in a complex may share similar functional constraints (in terms of pH, subcellular location, or other common selection pressures).
It has been shown that the amount of disorder of a protein varies greatly depending on its number of binding interfaces.19 For example, single-interface hub proteins (containing one or two binding interfaces) have significantly more disorder than multiple-interface hubs.19 In addition, the interacting partners of single-interface hubs also have significantly more disorder than other proteins.19 The authors suggest that this may be explained by the tendency of single-interface hub proteins to bind to each other.19 These findings fit very well with our results, in that there seems to be a constraint on this type of hub protein to maintain higher levels of disorder in itself and in its interacting proteins, while the opposite seems to occur for multi-interface hub proteins and their interacting partners (which maintain lower levels of disorder). It would be of great interest in future work to combine the distinct sets of single- and multiple-interface hub proteins discussed by Kim PM et al.19 with our results, to examine the extent of correlation in disorder levels between these distinct types of proteins and their interactors.
The low value of the correlations we observed is interesting in that it reflects the diverse nature of protein complexes. Some proteins within a complex will have a higher disorder correlation with their partners, while others may vary. We have identified that certain complexes are highly disordered, but even among these complexes we note (Fig. 3; Supplemental Fig. 1) that some show a marked variation in the disorder of their constituent proteins (high standard deviation or high residual variation for a given mean disorder), whilst others appear to have a set of proteins of more similar disorder (low standard deviation or low residual variation for a given mean disorder). We are very interested in the hypothesis that there may be particular evolutionary constraints on certain complexes for a tight co-regulation of the extent of disorder, whilst for other complexes these constraints may be much weaker. Thus, in certain complexes with high residual deviation, the selective constraints on disorder may act on the individual proteins, not on the overall complex. It will be of great interest to determine whether such complexes have looser packing, which would enable this. Our analysis of Gene Ontology suggests that complexes that interact with RNA show a high residual deviation, and this may be associated with a looser packing and association, with the RNA sequences providing some structural support to the complex in place of tighter protein packing.
We found that complexes whose members are associated with “protein-binding” functions tend to have a lower standard deviation in their disorder. We interpret this finding with some caution, since some protein-binding functions may themselves be defined simply on the basis of the protein interactions within the complex. It is possible, however, that complexes that play a role in protein binding tend to be those in which the level of disorder is similar. This could be for a variety of reasons. For example, a series of proteins that bind DNA, such as transcription factors, also have important protein binding functions as a complex, recruiting activator and repressor proteins. Complexes that process proteins in various ways may actively use a certain level of disorder to destabilize or re-fold the proteins they interact with, in the manner of chaperones, and this may impose a constraint for lower variation in their levels of disorder. However, such arguments are currently speculative, and for the moment this is simply an observation that may aid those seeking to interpret the patterns of disorder in complexes with protein binding roles.
The correlations of disorder levels between interacting proteins may suggest a more general phenomenon, namely co-evolution. Indeed, if interacting proteins are co-evolving for a given trait, they will share this trait to a significantly greater extent than non-interacting proteins (which is what we show in this work). For example, correlation of the expression levels of interacting proteins was initially used, as a first step, to suggest possible co-evolution between interacting proteins.38 The hypothesis that the observed correlation reflects a more general trend of co-evolution of disorder levels between interacting proteins is attractive. However, it is hard to tease apart whether the similarity in the level of disorder is due to common evolutionary constraints on the interacting proteins, to the actual adaptation of interacting proteins to each other, or to both. Although our results may suggest co-evolution, and thus selection maintaining the overall disorder of interacting proteins, the dynamics of these complexes over time should be investigated in further studies. One possible approach is to examine changes of disorder in a protein through evolution and how they affect the disorder in its interacting partners.
Here we have shown that the level of disorder is correlated between many interacting proteins, arguing for selection maintaining a tight control over the amount of disorder in a protein with regard to its interacting partners. This selection seems to be even greater for complex interactions than for binary interactions. This work emphasizes once more the importance of disorder in a protein in the context of its interacting proteins and highlights the pressures exerted on proteins as a result of their being in a partnership.
The binary interaction dataset20 was obtained from Yeast-2-Hybrid (Y2H) data. This is a method to assess the interaction between two known proteins. The first protein is artificially linked to a binding factor, i.e. a protein region known to bind to a particular motif of DNA. The second protein is artificially linked to a protein region known to enhance the expression of a particular gene. A significant binding of the two proteins of interest will reunite the two protein regions and therefore cause a significant expression of the gene. Y2H may give a certain number of false positive/negative results because of non-specific interactions and non-physiological contexts, but gold standard Y2H datasets have been shown to perform well in representing real biological interactions, when compared with traditional literature data.20
We compare a protein to all the unique subunits of a protein complex. This is to avoid obvious correlation. For the same reason, all proteins involved only in self-interactions, with no other partners, were discarded entirely from the analysis.
The complex interaction dataset originates from a manually curated study of complexes identified in both experimental and high-throughput work. Therefore, the dataset has a considerably lower number of proteins compared to the binary set, but is reliable (see http://wodaklab.org/cyc2008/).
To calculate the significance of the observed correlation coefficients among interacting pairs, we compared them with a set of correlations calculated from the random assignment of partners. A p-value is calculated on the basis of how many times the random distribution reached or surpassed the real correlation value.
The non-independence of the correlation calculations does not bias our results. This is because we do not rely on the nominal p-values but carry out random simulations using this same method to calculate the correlations. The significance of our results comes as a result of the real data being significantly higher than the random simulations.
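The simulation-based test described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the correlation is taken between each protein's disorder and the mean disorder of its partners, random partners are drawn without replacement from proteins that do not interact with the protein in question, and each protein keeps its original number of partners. All function and variable names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 by convention if either input has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def partner_correlation(disorder, partners):
    """Correlate each protein's disorder with the mean disorder of its partners."""
    xs, ys = [], []
    for protein, mates in partners.items():
        mates = [m for m in mates if m != protein]  # ignore self-interactions
        if mates:
            xs.append(disorder[protein])
            ys.append(sum(disorder[m] for m in mates) / len(mates))
    return pearson(xs, ys)


def randomized_partners(partners, proteins, rng):
    """Give each protein the same number of partners, drawn from non-interactors."""
    shuffled = {}
    for protein, mates in partners.items():
        excluded = set(mates) | {protein}
        pool = [q for q in proteins if q not in excluded]
        shuffled[protein] = rng.sample(pool, min(len(mates), len(pool)))
    return shuffled


def simulation_p_value(disorder, partners, n_rep=100, seed=0):
    """Fraction of random replicates whose correlation reaches the real value."""
    rng = random.Random(seed)
    proteins = sorted(disorder)
    real = partner_correlation(disorder, partners)
    hits = sum(
        partner_correlation(disorder, randomized_partners(partners, proteins, rng)) >= real
        for _ in range(n_rep)
    )
    return real, hits / n_rep


# Hypothetical toy network: two groups of proteins with similar disorder.
toy_disorder = {"P1": 0.05, "P2": 0.08, "P3": 0.10, "P4": 0.12,
                "P5": 0.50, "P6": 0.55, "P7": 0.60, "P8": 0.65}
toy_partners = {"P1": ["P2", "P3"], "P2": ["P1"], "P3": ["P1", "P4"], "P4": ["P3"],
                "P5": ["P6", "P7"], "P6": ["P5"], "P7": ["P5", "P8"], "P8": ["P7"]}
real_r, p_value = simulation_p_value(toy_disorder, toy_partners)
print(f"real r = {real_r:.2f}, simulation-based p = {p_value:.2f}")
```

With 100 replicates, the smallest non-zero p-value this procedure can resolve is 0.01, so a run in which no replicate reaches the real correlation is reported as p < 0.01.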
Footnotes
† Published as part of a Molecular BioSystems themed issue on Intrinsically Disordered Proteins: Guest Editor M. Madan Babu.
‡ Electronic supplementary information (ESI) available. See DOI: 10.1039/c1mb05214d
This journal is © The Royal Society of Chemistry 2012