Comparison of the interactomic networks of different species in terms of accessibility

Francisco A. Rodrigues (a) and Luciano da Fontoura Costa (b)*
(a) Instituto de Física de São Carlos, Universidade de São Paulo, Av. Trabalhador São Carlense 400, Caixa Postal 369, CEP 13560-970, São Carlos, São Paulo, Brazil
(b) National Institute of Science and Technology for Complex Systems, USA. E-mail: luciano@if.sc.usp.br

Received 7th April 2009, Accepted 24th August 2009

First published on 29th September 2009


Abstract

Protein–protein interaction networks were investigated in terms of outward accessibility, which quantifies the effectiveness of each protein in accessing other proteins and is related to the internality of nodes. By comparing the accessibility of 144 ortholog proteins in yeast and the fruit fly, we found that the accessibility tends to be higher among proteins in the fly than in yeast. In addition, z-scores of the accessibility calculated for different species revealed that the protein networks of less evolved species tend to be more random than those of more evolved species. The accessibility was also used to identify the border of the yeast protein interaction network, which was found to be composed mainly of viable proteins.


Introduction

The behavior of cells can seldom be reduced to that of their molecular components. Vogelstein, Lane and Levine suggested that, in order to understand cancer development, it is necessary to look at the gene p53 (a tumor suppressor) not in isolation, but rather to investigate what they called the p53 network, i.e. a comprehensive set of molecules and genes interacting with that gene.1 In fact, a combined attack on the genes connected to p53 was observed to cause more severe effects than the removal of that gene.2 Such results have motivated a more integrated approach, instead of an exclusively reductionist research program. Fortunately, the concepts and tools for representing, analyzing and modeling biological networks are now available thanks to recent advances in, and the integration of, graph theory and statistical mechanics, which gave rise to the new field of complex networks.3–5

Another new area of science, systems biology, focuses on the systematic study of complex interactions in biological systems.6–9 Basically, cellular organization can be divided into three main levels of interaction: genes, proteins and metabolites.10 Genes are regulated by transcription factors, the proteome organizes itself into a protein interaction network, and metabolites are interconnected through an intricate metabolic network. At the protein level, molecules bind to one another, subject to shape and affinity constraints, in order to control biochemical reactions and provide the physical scaffolding for life. Therefore, an integrated understanding of such networks holds the key to seminal advances in biology.

The interactions between proteins are particularly important in defining biological functions (e.g. ref. 11 and 12). For example, signals coming from the exterior of a cell are mediated by protein–protein interactions. This process, called signal transduction, plays a fundamental role in many biological systems and diseases. Proteins might interact for a long time to form part of a protein complex. Alternatively, they can interact only briefly with another protein in order to modify it (e.g. a protein kinase adds a phosphate group to a target protein). Partial protein interaction maps are now available for several eukaryotic species,11,13–15 motivating several studies aimed at analyzing the structure and evolution of such networks.16–18 By contrast, the study of protein–protein interaction networks in terms of simulated dynamics has been addressed only more recently, e.g. through self-avoiding random walks.19

In the current work, in order to obtain further insights into protein interaction networks, we consider the dynamics of self-avoiding random walks, in which agents move through a network without visiting any vertex more than once. The choice of this particular type of non-linear dynamics is justified biologically because it is naturally related to sequential protein activation, such as in signal transduction.20 The signals are carried by messenger proteins, which transmit a signal from one part of the cell to another, e.g. from the cytosol to the nucleus. For instance, in insulin signalling, insulin receptor substrate proteins define critical interactions for transmitting the signal downstream. These intracellular protein–protein interactions are essential in transmitting the signal from the receptor to the final cellular responses, such as the translocation of vesicles containing GLUT4 glucose transporters from the intracellular pool to the plasma membrane, the activation of glycogen or protein synthesis, and the initiation of specific gene transcription.21 Because of the purposeful nature of these interactions, such dynamics can be suitably modeled by self-avoiding random walks.

Self-avoiding walks are highly dependent on the network structure and are thus able to sense specific structural patterns. In addition, self-avoiding random walks necessarily generate paths of limited length in finite biological networks, while traditional random walks would yield highly redundant walks of unbounded length (implied by repetitions of the same interactions). The properties of random walk dynamics can be quantified by several network dynamical measurements.22 The choice of such measurements typically depends on the properties one wants to analyze in the network. For instance, while the outward activation is related to the influence of proteins along the network paths,19 the accessibility quantifies the effectiveness of proteins in interacting with all the other proteins in the network. Previous investigations19 identified important relationships between the outward activation and protein lethality, in the sense that lethal proteins tend to present higher outward activation than viable proteins.

The outward accessibility is also related to the internality of nodes in a network, since nodes with the smallest accessibility values tend to belong to the borders of networks,23 while those with large values define the interior of networks. For instance, in Fig. 1, the accessibility of the protein YFR021w (dark gray node) is higher than that obtained for protein YLR309c. Note that these proteins occupy different regions of the network. The estimation of accessibility also allowed us to identify, possibly for the first time, the borders of protein–protein interaction networks. This is particularly useful because the borders of complex systems are known (e.g. ref. 23) to be capable of substantially biasing the characterization of real-world systems, as the components at the borders tend to exhibit structural and dynamical features which are distinct from those observed in the rest of the network. The identification of the borders of protein–protein interaction networks is also interesting in itself, as the border proteins are potentially less important for the overall biological processes. Since random walks starting from nodes with high accessibility at a given path length tend to visit, on average, all the nodes reachable at that length in the shortest time, the most critically important proteins are likely to present the highest outward accessibility values, i.e. to belong to the interior of the network.


Fig. 1 Illustration of the concepts of outward accessibility. While in (a) the accessibility of the protein YFR021w (dark gray node) is equal to 0.0022 for h = 2 because of the equal transition probabilities, in (b) the accessibility of protein YLR309c is smaller (equal to 0.0013) because of the rather different transition probabilities for h = 2.

The present work reports an investigation of the accessibility between proteins in protein–protein interaction networks. By comparing the accessibility of ortholog proteins (also called interlogs) in the yeast and the fruit fly, we observed that the accessibility tends to be higher between proteins in the fly than in the yeast. At a higher topological scale, after calculating the average accessibility in the networks of four species, namely Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans and Homo sapiens, and comparing them in terms of z-scores, we verified that higher z-scores are obtained for the most evolved species. While the protein network of H. sapiens tends to present the highest accessibility values compared to its randomized counterparts, the yeast network presents the smallest z-score values. We also determined the border of the yeast protein–protein interaction network and found that most proteins at the border tend to be viable (i.e. non-essential).

The present article starts by presenting, in an introductory and didactic fashion, the concepts related to network dynamical analysis, with special attention to the concept of accessibility, and proceeds by reporting and discussing the results for several species.

Concepts and methods

An undirected complex network is formed by a set of N nodes connected by E edges. The network can be represented by its adjacency matrix A, whose elements a_ij and a_ji are both equal to one whenever there is a connection between nodes i and j, and equal to zero otherwise. Two nodes are adjacent if they are connected by an edge, and two edges are adjacent if they share a node. A walk is a sequence of adjacent edges, possibly with repetition of edges or nodes. A path is a walk which never repeats either edges or nodes. The length of a path is given by its number of edges.

The characterization of networks can be performed in terms of structural and dynamical measurements.4,5 A simple topological measurement is the number of connections of a given node i, called its degree, which can be computed as k_i = ∑_j a_ij. Despite the intrinsic simplicity of this structural feature, several studies have revealed that the degree is strongly associated with protein functionality, such as lethality (e.g. ref. 16 and 19). Another important structural measurement related to protein function is the betweenness centrality of a node u,24 which is defined as

 
$$B_u = \sum_{i,j} \frac{\sigma(i,u,j)}{\sigma(i,j)} \qquad (1)$$
where σ(i,u,j) is the number of shortest paths between vertices i and j that pass through node u, σ(i,j) is the total number of shortest paths between i and j, and the sum takes place over all pairs i,j of distinct vertices. The shortest path between vertices i and j is the path with minimal length that connects these vertices.
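As an illustration of the definitions above, the following Python sketch builds the adjacency matrix from an edge list, computes the degree k_i = ∑_j a_ij, and evaluates eq. (1) by counting shortest paths with breadth-first search. It assumes nodes labeled 0,…,N − 1 and sums over unordered pairs {i, j} with i, j ≠ u (some conventions use ordered pairs, which doubles every value); the helper names are hypothetical, and this is a didactic sketch rather than the code used in this study.

```python
from collections import deque
from itertools import combinations


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix a[i][j] for nodes 0..n-1."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1
    return a


def degrees(a):
    """Degree k_i = sum_j a_ij of every node."""
    return [sum(row) for row in a]


def bfs_counts(a, s):
    """Shortest-path distances from s and number of shortest paths to each node."""
    n = len(a)
    dist, sigma = [-1] * n, [0] * n
    dist[s], sigma[s] = 0, 1
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in range(n):
            if a[v][w]:
                if dist[w] < 0:          # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]  # every shortest path to v extends to w
    return dist, sigma


def betweenness(a):
    """B_u = sum over pairs {i, j} (i, j != u) of sigma(i,u,j) / sigma(i,j)."""
    n = len(a)
    info = [bfs_counts(a, s) for s in range(n)]
    b = [0.0] * n
    for i, j in combinations(range(n), 2):
        d_i, s_i = info[i]
        if d_i[j] < 0:
            continue  # disconnected pair: no shortest paths
        for u in range(n):
            if u in (i, j):
                continue
            d_u, s_u = info[u]
            # u lies on a shortest i-j path iff d(i,u) + d(u,j) == d(i,j);
            # then sigma(i,u,j) = sigma(i,u) * sigma(u,j).
            if d_i[u] >= 0 and d_u[j] >= 0 and d_i[u] + d_u[j] == d_i[j]:
                b[u] += s_i[u] * s_u[j] / s_i[j]
    return b


path = adjacency(3, [(0, 1), (1, 2)])
print(degrees(path), betweenness(path))
```

In the three-node path 0–1–2, only the middle node lies on the single shortest path between the two ends, so it is the only node with non-zero betweenness. In a four-node cycle, each opposite pair has two shortest paths, so every node receives 1/2.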

In addition to structural measurements, networks can be characterized in terms of dynamical features.4 The choice of a particular dynamics should be compatible with the functions to be observed in each specific case. For instance, the disruption of power distribution grids is suitably represented in terms of cascade failure dynamics.25 In the case of proteins, the dynamics of their interactions, such as in signal transduction,20 can be approximated by self-avoiding random walks,19 in which agents move through the network without visiting any vertex or edge more than once. This type of dynamics allows a more purposeful spreading of activations, avoiding the many backward, repeated activations which would otherwise be obtained with traditional random walks. Although biological networks do involve a relatively small number of backward activations, for simplicity's sake in the present work we focus on purely self-avoiding random walks. It would be possible, though, to modify the simulations to allow any desired level of backward protein interaction.

A non-preferential self-avoiding random walk is obtained by having a moving agent start at a specific node and then proceed to other nodes by taking, with uniform probability, one of the outgoing edges that do not lead to already visited nodes. The probability of arriving at node i after the agent started at node j, h steps away, is given by the respective transition probability, henceforth denoted P_h(j,i). These transition probabilities can be calculated progressively along each path: the probability of stepping from a node k to its neighbor i is the probability of having reached k divided by the number of neighbors of k that have not yet been visited. For instance, in Fig. 2, the probability of going from node 1 to node 10 is given by the probability of going from node 1 to node 7 divided by the number of unvisited neighbors of node 7 to which the walk can propagate, i.e. P(1,10) = P(1,7)/2 = (1/4)/2 = 0.125. For the first move, we have P_1(i,j) = 1/k_i. The probability of arriving at node i after having started h steps before from node j is given by the sum of the probabilities of all paths of length h between those two nodes.
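The progressive calculation described above can be sketched in Python by enumerating every self-avoiding path of length h from the starting node and multiplying the uniform choice probabilities along it. The adjacency-set representation, the function names and the toy graph (built only to reproduce the numbers quoted for Fig. 2, whose full topology is not given in the text) are assumptions. Exhaustive enumeration grows exponentially with h, so this is meant for small examples rather than whole interactomes.

```python
from collections import defaultdict


def saw_probabilities(neighbors, start, h):
    """Return {j: P_h(start, j)} for a non-preferential self-avoiding random walk.

    The first argument of P is the starting node, as in P(1,2) of Fig. 2.
    """
    probs = defaultdict(float)

    def extend(node, visited, p, steps):
        if steps == h:
            probs[node] += p  # sum over all self-avoiding paths of length h
            return
        options = [w for w in neighbors[node] if w not in visited]
        # If no unvisited neighbour is left, the walk stops before h steps
        # and this path contributes nothing to P_h.
        for w in options:
            visited.add(w)
            extend(w, visited, p / len(options), steps + 1)
            visited.remove(w)

    extend(start, {start}, 1.0, 0)
    return dict(probs)


def build_neighbors(edges):
    """Undirected adjacency sets from an edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return neighbors


# Hypothetical toy graph: node 1 has four neighbours (2, 3, 5, 7),
# node 2 has five neighbours besides node 1, node 7 has two besides node 1.
toy_edges = [(1, 2), (1, 3), (1, 5), (1, 7),
             (2, 4), (2, 6), (2, 8), (2, 9), (2, 12),
             (7, 10), (7, 11)]
nbrs = build_neighbors(toy_edges)
p2 = saw_probabilities(nbrs, 1, 2)
print(p2[10])  # 0.125 = (1/4) / 2, as in the text
```

In this toy graph only half of the probability mass survives to h = 2, because walks that first step to the leaves 3 or 5 cannot continue without revisiting node 1.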


Fig. 2 Examples showing the calculation of the probability of transition with respect to the continuous and dashed arrows. The probability to go from node 1 to 2, P(1,2), is given by one divided by the number of neighbors of node 1. Similarly, to go from 1 to 4, the probability is given by P(1,2) divided by 5, i.e. the number of nodes connected to node 2 excluding node 1, which has already been visited. The other probabilities are calculated similarly.

The transition probabilities provide an important resource for quantifying random walk dynamics. Different dynamical measurements, such as activation, diversity and accessibility,22 can be adopted, depending on the type of phenomenon to be investigated. The activation quantifies the extent of the random walks initiating at each node, and has recently been considered in protein lethality analysis.19 The accessibility between nodes, on the other hand, can be characterized by the outward accessibility measurement, which quantifies the effectiveness of a node i in accessing all the other nodes in the network under specific dynamics (in our case, self-avoiding random walks). The outward accessibility of node i after h steps is defined as

 
$$OA_h(i) = \frac{\exp\left(E_h(\Omega,i)\right)}{N-1} \qquad (2)$$
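where Ω is the set containing all nodes different from i and E_h(Ω,i) is the entropy (e.g. ref. 26) of the non-zero probabilities P_h(i,j), j ∈ Ω, of reaching each node j after h steps starting from node i, i.e.
 
$$E_h(\Omega,i) = -\sum_{j \in \Omega,\; P_h(i,j) \neq 0} P_h(i,j)\,\log P_h(i,j) \qquad (3)$$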
The entropy is an important concept from statistical physics (e.g. ref. 27) that reflects how uniformly a set of probabilities is distributed. It reaches its minimum value of zero when a single node concentrates all the probability, and its maximum value, log(N − 1), when all the N − 1 probabilities are equal to one another. It can be shown that 0 < OA_h(i) ≤ 1. The maximum outward accessibility of a node i corresponds to the situation where it can reach all the other N − 1 nodes after h steps with identical probabilities 1/(N − 1). The outward accessibility decreases when the transition probabilities become distinct from one another, implying a higher probability of visits to specific nodes (see Fig. 1). Observe that if a few nodes have very high chances of being reached after h steps from a given reference node, the other nodes will rarely be accessed, implying a highly unbalanced interaction between the reference node and the nodes reachable at distance h. Higher accessibility also has the important implication that all the reachable nodes will be accessed, on average, after a shorter period of time during the random walk dynamics. The maximum level of interaction between a reference protein and those at distance h occurs when all the transition probabilities from the reference node to the reachable proteins are equal.
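Eqs. (2) and (3) translate directly into a few lines of Python. The sketch below takes the non-zero transition probabilities from a reference node i to the other nodes after h steps (for instance, as produced by an exhaustive path enumeration) together with the network size N. The function name is hypothetical, the natural logarithm is assumed so that it matches the exponential in eq. (2), and the probabilities are used as given, without renormalizing for walks that end before h steps.

```python
import math


def outward_accessibility(probs, n_nodes):
    """OA_h(i) = exp(E_h) / (N - 1), where E_h = -sum p ln p is taken over the
    non-zero probabilities of reaching the other nodes from i after h steps."""
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    return math.exp(entropy) / (n_nodes - 1)


N = 5  # hypothetical network: four nodes besides the reference node
uniform = {j: 0.25 for j in "abcd"}            # all N - 1 nodes equally likely
single = {"a": 1.0}                            # all probability on one node
skewed = {"a": 0.5, "b": 0.25, "c": 0.25}      # three nodes, unequal chances
even3 = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}   # three nodes, equal chances

for name, probs in [("uniform", uniform), ("single", single),
                    ("skewed", skewed), ("even3", even3)]:
    print(name, round(outward_accessibility(probs, N), 3))
```

The uniform case gives the maximum value 1, the fully concentrated case gives the minimum 1/(N − 1) = 0.25, and the two three-node cases show that unequal probabilities (about 0.707) yield lower accessibility than equal ones (0.75), as stated in the text.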

Networks can be compared globally in terms of the outward accessibility of their nodes. Since the networks can present different numbers of nodes and edges, it is fundamental to resort to analyses which do not depend on network scale. A possible approach is to compare the real networks with their randomized counterparts. To describe the deviation of the observed accessibility from the random expectation, we can consider the z-score, which is calculated as28

 
$$z_h = \frac{\mu_h - \mu_{\mathrm{random}}}{\sigma_{\mathrm{random}}} \qquad (4)$$
where μ_h is the average outward accessibility at distance h in the real network, and μ_random and σ_random are the average and standard deviation of the outward accessibility over the respective ensemble of randomized networks, which were generated by the configuration model29 and therefore present the same number of nodes and degree distribution as the respective real-world network under analysis.29 A large positive z-score means that the corresponding network has a relatively large accessibility compared to its random counterparts. On the other hand, negative z-scores mean that the network presents smaller accessibility than its random counterparts. Values near zero imply that the network under analysis has an accessibility similar to that of its random counterparts. By using this metric, we can compare networks of different species, with different numbers of nodes and connections.
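A minimal sketch of eq. (4) in Python follows. The study generated its random ensemble with the configuration model; to keep the example short and to preserve the degree sequence exactly without self-loops or multi-edges, the sketch instead uses degree-preserving double-edge swaps, a different but commonly used randomization. The function names, the number of swaps and the use of the sample standard deviation are assumptions. In practice, one would compute the average outward accessibility μ_h of the real network and of each randomized network (for example 100 of them, the number used in this study) and pass those averages to `z_score`.

```python
import random
import statistics
from collections import Counter


def z_score(mu_real, random_means):
    """z_h = (mu_h - mu_random) / sigma_random, eq. (4)."""
    mu_random = statistics.mean(random_means)
    sigma_random = statistics.stdev(random_means)  # sample std (assumption)
    return (mu_real - mu_random) / sigma_random


def double_edge_swap(edges, n_swaps, seed=None):
    """Degree-preserving randomization of a simple undirected graph.

    Repeatedly picks two edges (a, b) and (c, d) and rewires them to
    (a, d) and (c, b), rejecting swaps that would create self-loops or
    multi-edges. Every node keeps its degree.
    """
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both possible rewirings
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: a swap could create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


def degree_counts(edges):
    """Degree of every node appearing in an edge list."""
    return Counter(v for e in edges for v in e)


ring = [(k, (k + 1) % 12) for k in range(12)] + [(0, 6), (3, 9)]
shuffled = double_edge_swap(ring, n_swaps=50, seed=1)
print(degree_counts(ring) == degree_counts(shuffled))  # True: degrees preserved
print(z_score(5.0, [1.0, 2.0, 3.0]))                   # 3.0
```

Because each accepted swap removes one edge and adds one edge at every one of the four endpoints, the degree of every node is unchanged, which is the property shared with the configuration model that eq. (4) relies on.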

Another important feature of the outward accessibility is its ability to detect the borders of networks.23 More specifically, peripheral nodes have been found to present low accessibility values, since random walks starting from them have few options other than to access the internal nodes of the network. In contrast, non-border nodes tend to have more effective and balanced access to most of the network, resulting in higher accessibility values.

Results and discussion

In order to verify the variations in the outward accessibility of networks of different species, we start by performing ortholog analysis.30 The current protein interaction databases of different species are not well suited for direct comparison, as the overlap between them is still small. At the same time, since the four eukaryotic species with the most complete interaction maps (S. cerevisiae, D. melanogaster, C. elegans and H. sapiens) shared a common ancestor more than 900 million years ago,31 the lack of overlap could partly correspond to evolutionary divergence and not only to poor coverage. In any case, we identified 114 ortholog proteins between the yeast S. cerevisiae and the fruit fly D. melanogaster by considering the Inparanoid database.30 We used the yeast database provided by the Center for Cancer Systems Biology (CCSB),32 formed by 1278 nodes and 1810 interactions, and the Drosophila Interactions Database (DroID),15 constituted by 1345 nodes and 3172 edges.

After calculating the accessibility of all the 144 ortholog proteins, we obtained the cumulative distributions of accessibility depicted in Fig. 3, for h = 2,…,5. These results show that the accessibility tends to be larger for the proteins in the fruit fly network than for the ortholog proteins in the yeast network. Fig. 4 illustrates this trend by showing the local structure around a pair of ortholog proteins (in red) identified in these species. Since the fly presents a more complex biological network than the yeast, it is possible that more effective and balanced access to most of the network is required for its proper operation. Thus, the dynamical outward accessibility measurement can provide insight into the level of modification undergone by protein networks along the evolutionary process.


Fig. 3 Comparison of the cumulative accessibility distribution between 114 putative identifiable orthologs in the yeast and fruit fly.

Fig. 4 Example of the variation in the structure of an interlog present in the yeast (a) and the fruit fly (b), indicated by the red nodes. The nodes one edge away from the interlogs are shown in blue and those two edges away in yellow. While in (a) the protein YDR142C presents OA2 = 0.009, in (b) the ortholog protein FBgn0035922 presents OA2 = 0.059.

In order to compare the accessibility of ortholog and non-ortholog proteins, we obtained the distributions presented in Fig. 5. As observed before, the ortholog proteins in the fly tend to have higher outward accessibility than the ortholog proteins in the yeast. In addition, ortholog proteins present smaller accessibility than non-ortholog proteins. This tendency was observed for all values of h. Therefore, proteins with small accessibility values, which tend to be at the border of networks,23 seem to be more conserved than those at the centre. This effect could be related to the weaker influence that other proteins are expected to exert on proteins at the border. In fact, proteins at the centre of networks tend to have many connections and paths to other proteins. However, although this finding is potentially interesting, it should be borne in mind that our results may change for more complete databases of ortholog proteins. Note that the fraction of identified proteins in our data is small, mainly due to the relatively small overlap between the yeast and fly databases.


Fig. 5 Distribution of accessibility for ortholog and non-ortholog proteins.

In order to verify the variation of accessibility between different species, we analysed the following databases: (i) S. cerevisiae: composed of 2708 proteins and 7123 protein–protein interactions;13 (ii) D. melanogaster: formed by 1345 nodes and 3172 edges;15 (iii) C. elegans: composed of 2528 nodes and 3865 interactions;14 and (iv) Homo sapiens: with 1549 nodes and 2755 edges.11 Because each of these networks has different numbers of nodes and edges, we perform the comparison in terms of the respective z-scores.28 We performed 100 randomizations of each network and calculated the mean and standard deviation. z-Scores larger than zero indicate that the outward accessibility in the real network is larger than that observed in its random counterparts. On the other hand, z-scores smaller than zero imply that the network exhibits accessibility smaller than that observed in its random counterparts. Calculating the average accessibility for these networks and for their random counterparts, we obtained the z-scores presented in Fig. 6 for different values of h. Note that, for h = 6, the order of the curves partially reflects the complexity of the organisms, since H. sapiens presents the highest z-score and S. cerevisiae the smallest. The increase in accessibility through evolution may be a consequence of the increasing biological complexity required by the respective species. The abrupt variation in the z-scores at h = 3, which defines valleys for all species except S. cerevisiae, could be a consequence of the more complex signalling circuitry of the more highly evolved species, which results in higher accessibility at larger h.


Fig. 6 z-Score values calculated for S. cerevisiae (black squares), D. melanogaster (red circles), C. elegans (blue diamonds) and H. sapiens (yellow squares).

Since protein interaction networks are known to present sampling bias,33 we performed a perturbation analysis in order to verify the stability of the accessibility measurement. To this end, we randomly removed 10% of the proteins and compared the resulting accessibility with that obtained for the original networks. We verified that the variations are smaller than 10% for h = 2 and smaller than 5% for h > 2. Therefore, the sampling bias does not appear to substantially influence our results.
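The perturbation test can be sketched as follows: delete a random 10% of the nodes together with their edges, recompute a network-level measurement, and report the relative variation. The sketch is generic in the measurement; the mean degree used in the example is only a stand-in for the average outward accessibility at a given h. The relative-variation formula, the function names and the use of a single random draw (averaging over several draws would be more robust) are assumptions rather than the exact protocol of this study.

```python
import random


def perturbation_variation(nodes, edges, measure, fraction=0.1, seed=None):
    """Relative change |m_perturbed - m_original| / |m_original| of a network-level
    measurement after removing a random `fraction` of the nodes and their edges."""
    rng = random.Random(seed)
    nodes = list(nodes)
    removed = set(rng.sample(nodes, round(fraction * len(nodes))))
    kept_nodes = [v for v in nodes if v not in removed]
    kept_edges = [(u, v) for u, v in edges
                  if u not in removed and v not in removed]
    original = measure(nodes, edges)
    perturbed = measure(kept_nodes, kept_edges)
    return abs(perturbed - original) / abs(original)


def mean_degree(nodes, edges):
    """Stand-in measurement: average degree 2E/N."""
    return 2 * len(edges) / len(nodes)


ring_nodes = list(range(20))
ring_edges = [(k, (k + 1) % 20) for k in ring_nodes]
print(perturbation_variation(ring_nodes, ring_edges, mean_degree, seed=0))
```

For the 20-node ring, removing two nodes leaves a relative variation of either 1/18 (if the two removed nodes were adjacent) or 1/9 (if they were not), depending on the random draw.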

It is also interesting to analyse the overall distribution of protein accessibility in the interaction networks. To do so, we represented each network node i as a vector whose elements correspond to its outward accessibility at each distance h, v⃗(i) = (OA_1(i), …, OA_6(i)). These vectors were then projected into a two-dimensional space by principal component analysis (PCA), which optimally reduces the dimensionality while completely removing the linear correlations between the data.34,35 We quantified the percentage of dispersion along the first two axes by the coefficient d = (λ_1 + λ_2)/∑_i λ_i, where the λ_i are the eigenvalues associated with the principal axes, in decreasing order. This coefficient quantifies the dispersion of the outward accessibility of each species along the first two axes. We obtained the following coefficients: (i) yeast: d = 0.87, (ii) fly: d = 0.71, (iii) worm: d = 0.65, and (iv) human: d = 0.73. Therefore, the simplest organism presents the largest fraction of dispersion along the first two axes.
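The dispersion coefficient d can be computed from the eigenvalues of the covariance matrix of the node vectors v⃗(i). The NumPy sketch below assumes one row per node and one column per distance h, uses the covariance (rather than the correlation) matrix, and its function name is hypothetical; the example data are synthetic and only check that vectors confined to a plane give d ≈ 1.

```python
import numpy as np


def pca_dispersion(vectors):
    """Return (projection onto the first two principal axes, d).

    d = (lambda_1 + lambda_2) / sum_i lambda_i, with the eigenvalues of the
    covariance matrix sorted in decreasing order.
    """
    x = np.asarray(vectors, dtype=float)
    x = x - x.mean(axis=0)                  # centre each OA_h column
    cov = np.cov(x, rowvar=False)           # columns are the variables OA_h
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort in decreasing order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projection = x @ eigvecs[:, :2]
    d = eigvals[:2].sum() / eigvals.sum()
    return projection, d


# Synthetic check: 200 six-dimensional vectors lying in a two-dimensional plane.
rng = np.random.default_rng(0)
planar = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6))
proj, d = pca_dispersion(planar)
print(proj.shape, round(d, 6))
```

Because the synthetic vectors span only two dimensions, the first two eigenvalues carry essentially all of the variance and d equals 1 up to rounding; real accessibility vectors would give intermediate values such as those reported above.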

The biological importance of a protein can be associated with its relative position in the respective network. For instance, more central proteins tend to propagate their influence along the network more effectively than peripheral proteins, through shorter and more numerous paths. So, it could be expected that the more important a protein is, the smaller the probability that it lies at the border of the network. Since protein–protein interaction networks tend to display properties which are typical of geographical networks (e.g. ref. 5 and 36), accessibility measurements are a particularly suitable approach to border detection in these networks.23,37 In order to verify the relationship between the accessibility and the degree or betweenness centrality, which can also be used for border definition, we calculated the Pearson correlation coefficient between these measurements for the adopted networks. Fig. 7 presents the Pearson correlations as a function of h, showing that the correlations tend to decrease substantially as the distance h increases. This means that the accessibility can provide information complementary to that supplied by the degree or betweenness centrality.
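For completeness, the Pearson correlation used for Fig. 7 can be written in a few lines of plain Python. The sketch assumes that the accessibility values OA_h and the degrees (or betweenness values) are given as two equally long lists over the same nodes; the numbers in the example are made up and serve only to exercise the function.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five nodes: degree k and OA_h at two distances.
k = [1, 2, 3, 4, 5]
oa_by_h = {2: [0.01, 0.02, 0.03, 0.04, 0.05],   # perfectly aligned with k
           5: [0.03, 0.01, 0.05, 0.02, 0.04]}   # much weaker relation
for h, oa in oa_by_h.items():
    print(h, round(pearson(oa, k), 3))
```

With these illustrative values, the correlation drops from 1 at h = 2 to 0.3 at h = 5, mimicking the kind of decrease with h reported for the yeast network.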


Fig. 7 Pearson correlation coefficients between the accessibility and degree k, given by the number of connections of a node, (black circles) and between the accessibility and betweenness centrality B, given by the fraction of shortest paths passing through a node, (gray circles) for the yeast protein interaction network.

We obtained the borders of the adopted protein–protein interaction networks by applying the accessibility concepts and methods described by Travençolo et al.23 Thus, we determined that the border of the yeast S. cerevisiae network corresponds to the proteins with the smallest accessibility for each value of h. Note that a different threshold T_h was obtained for each value of h, i.e. T_2 = 0.058, T_3 = 0.010, T_4 = 0.0039, T_5 = 0.002 and T_6 = 0.001. We verified that the border proteins correspond to about 5% of the total number of proteins for all distances h.
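Given the outward accessibility of every node at each distance h, the border at that distance can be taken as the set of nodes whose accessibility falls below the threshold T_h. The sketch below applies the thresholds reported above to made-up values and also shows one hypothetical way of choosing a threshold so that roughly a given fraction (here 5%) of the nodes is labeled as border. The strict inequality, the quantile rule and the function names are assumptions, not necessarily the procedure of Travençolo et al.

```python
def border_nodes(oa_by_h, thresholds):
    """For each distance h, the nodes whose OA_h is strictly below T_h."""
    return {h: {node for node, oa in oa_by_h[h].items() if oa < t}
            for h, t in thresholds.items()}


def quantile_threshold(values, fraction=0.05):
    """Hypothetical rule: the value below which about `fraction` of the nodes lie."""
    ordered = sorted(values)
    k = min(len(ordered) - 1, max(1, round(fraction * len(ordered))))
    return ordered[k]


# Thresholds reported for the yeast network.
T = {2: 0.058, 3: 0.010, 4: 0.0039, 5: 0.002, 6: 0.001}

# Made-up accessibility values for three nodes at h = 2 and h = 3.
oa = {2: {"A": 0.01, "B": 0.20, "C": 0.05},
      3: {"A": 0.002, "B": 0.050, "C": 0.030}}
print(border_nodes(oa, {h: T[h] for h in (2, 3)}))
```

Here nodes A and C fall below T_2, while only A falls below T_3; for 100 distinct values, the quantile rule places exactly five nodes below the chosen threshold.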

We used the S. cerevisiae protein–protein interaction network from Krogan et al.,13 a highly reliable protein interaction map obtained by tandem affinity purification. We considered the core data set of the database,13 which comprises 2708 proteins and 7123 protein–protein interactions. Among these proteins, 648 are known to be lethal, 1918 are viable and 142 are of unknown phenotype. The protein lethality and viability were identified by using data from the Munich Information Center for Protein Sequences (MIPS).38 We calculated the fraction of lethal and viable proteins at the border for h = 1,…,6, as illustrated in Fig. 8. The percentage of lethal proteins at the border remains approximately constant for all distances h, varying from 13% (for h = 1) to 19% (for h = 4). Note that the percentage of lethal proteins in the whole network is equal to 24%, i.e. the probability of finding a lethal protein at the border is almost half that of finding it in the whole network.
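The lethality comparison reduces to simple proportions. The sketch below computes the fraction of lethal proteins in a node set and the ratio between the border and whole-network fractions, using the counts given above (648 lethal proteins out of 2708) and the two extreme border percentages quoted in the text; the function name is hypothetical.

```python
def lethal_fraction(node_set, lethal):
    """Fraction of the proteins in `node_set` that are lethal."""
    node_set = set(node_set)
    return len(node_set & lethal) / len(node_set)


whole_network = 648 / 2708          # lethal fraction in the core data set
print(round(whole_network, 3))      # 0.239, i.e. about 24%

# Border fractions quoted in the text for h = 1 and h = 4.
for h, border in [(1, 0.13), (4, 0.19)]:
    print(h, round(border / whole_network, 2))
```

Given the actual border set for each h and the set of lethal proteins from MIPS, `lethal_fraction(border, lethal)` would return the border percentages plotted in Fig. 8.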


Fig. 8 The fraction of lethal (black circles) and viable (gray circles) proteins at the border of the network in terms of the distance h. Note that most border proteins tend to be viable, whatever the value of h.

Conclusion

Protein–protein interactions underlie many biological functions, such as signal transduction.20 Such dynamical interactions are related to sequential protein activations and can be suitably modeled by self-avoiding non-linear dynamics (e.g. ref. 19), in which agents move through the protein network without visiting any node more than once. The properties of such random walks can be quantified by different measurements, such as diversity, activation and accessibility.39 The outward activation, which determines the extent of the influence of proteins through the network, has been shown to be related to protein lethality, in the sense that lethal proteins tend to present higher outward activation than viable proteins.19 The diversity and accessibility, on the other hand, are related to the internality of nodes in a network, in the sense that nodes with small accessibility (or diversity) tend to be at the border of the network,23 while those with large accessibility values define its interior. So, the outward accessibility can help to study important features of complex networks.

In the current work, we investigated protein–protein interaction networks in terms of the outward accessibility exhibited by each node for several distance values h. Three related investigations were reported. Initially, we determined the interlog proteins in the fruit fly and yeast and verified that proteins in the fly tend to present higher accessibility values than those in the yeast. This result suggests that the conserved proteins are more likely to be internal in the fly network than in the yeast network. Next, by comparing four different species in terms of z-scores, which allow comparison between networks with different sizes and numbers of connections, we found that the z-scores tend to partially follow the evolutionary ordering of the species, with the protein network of H. sapiens presenting the highest z-score and that of S. cerevisiae the lowest. By projecting the outward accessibilities for several values of h by principal component analysis, we verified that the obtained distributions tend to be similar for all species, suggesting a universal feature shared by many species. Finally, we investigated the essentiality of proteins at the border of the yeast protein interaction network, which was found to be formed by the proteins presenting the lowest accessibility. We verified that the border proteins tend to be viable, with the probability of finding a lethal protein at the border being almost half that of the whole network.

Further investigations can be performed by considering other dynamical measurements. For instance, it is possible to consider random walks with different propagation probabilities depending on the functionality of each protein or group of proteins. In addition, it would be particularly interesting to repeat the described accessibility analysis for other biological networks, such as transcription-regulatory and metabolic networks.

Acknowledgements

Luciano da F. Costa thanks CNPq (301303/06-1) and FAPESP (05/00587-5) for sponsorship. Francisco Aparecido Rodrigues is grateful to FAPESP (07/50633-9).

References

  1. B. Vogelstein, D. Lane and A. J. Levine, Surfing the p53 network, Nature, 2000, 408(6810), 307–310.
  2. D. S. Franklin, V. L. Godfrey, D. A. O’Brien, C. Deng and Y. Xiong, Functional collaboration between different cyclin-dependent kinase inhibitors suppresses tumor growth with distinct tissue specificity, Mol. Cell. Biol., 2000, 20(16), 6147–6158.
  3. R. Albert and A. L. Barabási, Statistical mechanics of complex networks, Rev. Mod. Phys., 2002, 74(1), 47–97.
  4. S. Boccaletti, V. Latora, Y. Moreno, M. Chaves and D.-U. Hwang, Complex networks: structure and dynamics, Phys. Rep., 2006, 424(4–5), 175–308.
  5. L. da F. Costa, F. A. Rodrigues, G. Travieso and P. R. Villas Boas, Characterization of complex networks: A survey of measurements, Adv. Phys., 2007, 56(1), 167–242.
  6. J. C. Smith and D. Figeys, Proteomics technology in systems biology, Mol. BioSyst., 2006, 2(8), 364–370.
  7. T. Sakata and E. A. Winzeler, Genomics, systems biology and drug development for infectious diseases, Mol. BioSyst., 2007, 3(12), 841–848.
  8. L. da F. Costa, F. A. Rodrigues and A. S. Cristino, Complex networks: the key to systems biology, Genet. Mol. Biol., 2008, 31(3), 591–601.
  9. A. Wuster and M. Madan Babu, Chemogenomics and biotechnology, Trends Biotechnol., 2008, 26(5), 252–258.
  10. Z. N. Oltvai and A.-L. Barabási, Systems biology: life’s complexity pyramid, Science, 2002, 298(5594), 763.
  11. J. F. Rual, K. Venkatesan, T. Hao, T. Hirozane-Kishikawa, A. Dricot, N. Li, G. F. Berriz, F. D. Gibbons, M. Dreze and N. Ayivi-Guedehoussou, et al., Towards a proteome-scale map of the human protein–protein interaction network, Nature, 2005, 437(7062), 1173–1178.
  12. R. Sharan, S. Suthram, R. M. Kelley, T. Kuhn, S. McCuine, P. Uetz, T. Sittler, R. M. Karp and T. Ideker, Conserved patterns of protein interaction in multiple species, Proc. Natl. Acad. Sci. U. S. A., 2005, 102(6), 1974–1979.
  13. N. J. Krogan, G. Cagney, H. Yu, G. Zhong, X. Guo, A. Ignatchenko, J. Li, S. Pu, N. Datta and A. P. Tikuisis, et al., Global landscape of protein complexes in the yeast Saccharomyces cerevisiae, Nature, 2006, 440(7084), 637–643.
  14. N. Simonis, J.-F. Rual, A.-R. Carvunis, M. Tasan, I. Lemmens, T. Hirozane-Kishikawa, T. Hao, J. M. Sahalie and K. Venkatesan, et al., Empirically controlled mapping of the Caenorhabditis elegans protein–protein interactome network, Nat. Methods, 2009, 6(1), 47.
  15. J. Yu, S. Pacifico, G. Liu and R. L. Finley Jr, DroID: the Drosophila Interactions Database, a comprehensive resource for annotated gene and protein interactions, BMC Genomics, 2008, 9(1), 461.
  16. H. Jeong, S. P. Mason, A.-L. Barabási and Z. N. Oltvai, Lethality and centrality in protein networks, Nature, 2001, 411(6833), 41–42.
  17. M. P. Joy, A. Brock, D. E. Ingber and S. Huang, High-Betweenness Proteins in the Yeast Protein Interaction Network, J. Biomed. Biotechnol., 2005, 2005(2), 96–103.
  18. L. da F. Costa, F. A. Rodrigues and G. Travieso, Protein domain connectivity and essentiality, Appl. Phys. Lett., 2006, 89(17), 174101.
  19. F. A. Rodrigues and L. da F. Costa, Protein lethality investigated in terms of long range dynamical interactions, Mol. BioSyst., 2009, 5(4), 385–390.
  20. A. J. Shaywitz, S. L. Dove, M. E. Greenberg and A. Hochschild, Analysis of Phosphorylation-Dependent Protein–Protein Interactions Using a Bacterial Two-Hybrid System, Sci. STKE, 2002, 2002(142), pl11.
  21. A. Virkamäki, K. Ueki and C. R. Kahn, Protein–protein interaction in insulin signaling and the molecular mechanisms of insulin resistance, J. Clin. Invest., 1999, 103(7), 931.
  22. L. da F. Costa and F. A. Rodrigues, Superedges: Connecting structure and dynamics in complex networks, 2008, arXiv:0801.4068.
  23. B. A. N. Travençolo, M. P. Viana and L. da F. Costa, Border detection in complex networks, New J. Phys., 2009, 11, 063019.
  24. L. C. Freeman, A set of measures of centrality based on betweenness, Sociometry, 1977, 40, 35–41.
  25. A. E. Motter, Cascade control and defense in complex networks, Phys. Rev. Lett., 2004, 93(9), 098701.
  26. T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
  27. T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
  28. R. J. Larsen and M. L. Marx, An Introduction to Mathematical Statistics and its Applications, Prentice-Hall, 1981.
  29. E. A. Bender and E. R. Canfield, The asymptotic number of labeled graphs with given degree sequences, Journal of Combinatorial Theory Series A, 1978, 24(3), 296–307.
  30. K. P. O’Brien, M. Remm and E. L. L. Sonnhammer, Inparanoid: a comprehensive database of eukaryotic orthologs, Nucleic Acids Res., 2005, 33(Database Issue), D476.
  31. S. B. Hedges, The origin and evolution of model organisms, Nat. Rev. Genet., 2002, 3(11), 838–849.
  32. H. Yu, P. Braun, M. A. Yildirim, I. Lemmens, K. Venkatesan, J. Sahalie, T. Hirozane-Kishikawa, F. Gebreab, N. Li and N. Simonis, et al., High-quality binary protein interaction map of the yeast interactome network, Science, 2008, 322(5898), 104.
  33. J. D. J. Han, D. Dupuy, N. Bertin, M. E. Cusick and M. Vidal, Effect of sampling on topology predictions of protein–protein interaction networks, Nat. Biotechnol., 2005, 23, 839–844.
  34. L. da F. Costa and R. M. Cesar Jr, Shape Analysis and Classification Theory and Practice, CRC Press, 2001.
  35. R. A. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis, Prentice Hall, Upper Saddle River, NJ, 2002.
  36. N. Przulj, D. G. Corneil and I. Jurisica, Modeling interactome: scale-free or geometric?, 2004.
  37. M. P. Viana, B. A. N. Travençolo, E. Tanck and L. da F. Costa, Characterizing the Diversity of Dynamics in Complex Networks Without Border Effects, arXiv:0805.2298, 2008.
  38. H. W. Mewes, D. Frishman, U. Guldener, G. Mannhaupt, K. Mayer, M. Mokrejs, B. Morgenstern, M. Munsterkotter, S. Rudd and B. Weil, MIPS: A database for genomes and protein sequences, Nucleic Acids Res., 2002, 30, 31–34.
  39. B. A. N. Travençolo and L. da F. Costa, Accessibility in complex networks, Phys. Lett. A, 2008, 373, 89–95.

This journal is © The Royal Society of Chemistry 2009