Emanuel Gonçalves a, Joachim Bucher b, Anke Ryll c, Jens Niklas b, Klaus Mauch b, Steffen Klamt c, Miguel Rocha d and Julio Saez-Rodriguez *a
a EMBL-EBI European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, Cambridge, UK. E-mail: saezrodriguez@ebi.ac.uk
b Insilico Biotechnology AG, Meitnerstraße 8, D-70563 Stuttgart, Germany
c Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstraße 1, D-39106 Magdeburg, Germany
d Computer Science and Technology Centre - CCTC, School of Engineering, University of Minho, Portugal
First published on 1st March 2013
Mathematical modelling is increasingly becoming an indispensable tool for the study of cellular processes, allowing their analysis in a systematic and comprehensive manner. In the vast majority of cases, models focus on specific subsystems, and in particular describe either metabolism, gene expression or signal transduction. Integrated models that are able to span and interconnect these layers are, by contrast, rare as their construction and analysis face multiple challenges. Such models, however, would represent extremely useful tools to understand cell behaviour, with application in distinct fields of biological and medical research. In particular, they could be used to study genotype–phenotype mappings, and the way they are affected by specific conditions or perturbations. Here, we review existing computational approaches that integrate signalling, gene regulation and/or metabolism. We describe existing challenges and available methods, and point to potentially useful strategies.
Cellular metabolism comprises catabolic pathways, which break down molecules to produce energy, and anabolic reaction routes, which provide the essential building blocks required by the cell to synthesise the molecules it consists of and, thus, to enable homeostasis and growth. The required enzymes catalysing these metabolic processes are encoded in the genome and are translated from the intermediate messenger RNA (mRNA). A cell needs to control and adapt its enzyme production and behaviour depending on its requirements. This is achieved through diverse regulation mechanisms at the level of transcription (where DNA is copied into RNA) and translation (where mRNA is decoded into proteins).2 Finally, the cell has to sense its environment to react accordingly via signal transduction mechanisms that are closely related to regulatory mechanisms through signalling cascades.3
All these processes are clearly interconnected (as illustrated in Fig. 1) but have usually been studied separately in distinct sub-disciplines of cellular biology. While all types of biological molecules and interactions directly or indirectly influence all processes, there are three main layers with distinct characteristics: (i) metabolism: the production and consumption of diverse small molecules (metabolites) in enzyme-driven metabolic reactions, (ii) gene regulation: the control of the abundance and translation of transcripts (mRNA) and consequently proteins, and (iii) signalling: the interaction of proteins that generate and process flows of information.
Fig. 1 Schematic of the interconnection among signalling, gene regulation and metabolism. In a cell, signalling networks are activated by external signals, e.g. ligands (grey shapes) binding to a receptor (black semi-circles) located in the cell membrane. The signal is then internally propagated in the cell by means of e.g. protein phosphorylation cascades. These cascades may lead to alterations in the expression of genes by activating or inhibiting transcription factors (TFs). Gene regulatory networks control the transcriptional level of genes, and thus the production of messenger RNA molecules, which are subsequently translated into proteins. These proteins are in turn involved in cellular functions, including signal transduction and the catalysis of metabolic reactions. Specific metabolites are known to affect the activity of proteins (e.g. by binding to an allosteric site) and can also influence gene regulation. As illustrated in the scheme, signalling, gene regulation, and metabolism are tightly interconnected, showing that the system's behaviour can only be accurately modelled and understood by properly integrating the sub-systems. The interactions between the molecules are represented by edges: arrow-shaped edges represent activating interactions; blunt edges represent inhibitory interactions; and edges ending in a circle depict the enzymatic catalysis of reactions.
The phenotype of a cell results from the interoperation of the three different layers of biological processes since, as stated above, they are linked through diverse types of interactions. An important, well-known example is the regulation of blood sugar in humans.4 The liver regulates the level of sugar in the blood by releasing or storing it in the form of glycogen. Glucose levels are exquisitely regulated by complex control mechanisms driven by extracellular signals. The main hormonal drivers are insulin and glucagon, both being synthesised and released from specialised pancreatic cells in a glucose-dependent manner. The information contained in these extracellular signals is decoded and translated by liver cells via signalling and regulatory processes.
Complex diseases are systemic phenomena affecting multiple cellular processes. For example, cancer is characterised by a deregulation of the mechanisms that govern transduction of extracellular signals into the gene expression system, but also by an impaired functioning of its metabolic machinery.5 Hence, only an integrated view of the processes involved can lead to a comprehensive understanding that may shed new light on the development of these diseases and, therefore, provide new treatment opportunities.
Mathematical modelling has become a key methodology for gaining a deeper understanding of complex biological phenomena and for predicting phenotypes under different conditions. Similarly to what happens in experimental studies, signalling, gene expression, and metabolism are often modelled separately and integrated models are still scarce. Accordingly, mathematical formalisms have been developed independently, tailored to the nature of the biochemical interactions and molecules involved, and to the specific features of the processes in each domain. Some efforts have attempted to connect these different processes, both experimentally and computationally. Given the complexity of the task, studies in this direction so far have been limited in number and scope.6
We believe that the time has come to address this challenge. The main motivation, in our opinion, is the rapid development of high-throughput measurement techniques for these different types of data, associated with the corresponding ‘omics’ label: in historical order, genomics, transcriptomics (gene expression), proteomics and metabolomics. While there are undoubtedly many challenges, modelling approaches that leverage these data, as well as improved parameter optimisation algorithms7 and high-performance computing, should be a major avenue of research in the coming years in systems biology. This should lead to a broad range of potential applications in biotechnology, biomedicine, and pharmaceutical research.
In this review, after a brief summary of existing modelling approaches for signalling, metabolic, and gene regulatory processes, we describe recent efforts to connect these layers (Fig. 2).
Fig. 2 Overview of formalisms for modelling signalling, gene regulatory, and metabolic networks. Multiple formalisms and simulation methods can be used to model and analyse each biological system. Due to specific biological features, some mathematical formalisms are more suitable for specific systems (see main text). Some methods can model different types of systems, using either different (e.g. SR-FBA40) or the same mathematical formalism.47 Specific references are only used for the cases where a general term is not available; see main text for more references. Bettenbrock et al.,44 iFBA,49 idFBA50 and Karr et al.6 represent the first efforts to integrate the three different systems. König et al.47 and Mosca et al.48 presented metabolic ODE models that integrate the hormonal control via insulin, glucagon and epinephrine only approximately, because the underlying signalling networks are not incorporated.
There is currently no single modelling formalism that can cover all biological aspects. Different types of biological networks are modelled using different formalisms that suit their behaviour and specific properties. On the one hand, ordinary differential equations (ODEs) describing the underlying biochemistry are often used, as they are detailed and have high explanatory power. However, their applicability is limited by the difficulty of obtaining the necessary model parameters. They also have limited scalability, and thus they are, in general, not applicable to genome-scale models and simulations. On the other hand, less detailed approaches like Boolean networks and constraint-based models have been used for larger networks. Choosing the best modelling formalism is a trade-off between detail and complexity.
Signalling networks can be mathematically represented using different types of formalisms. One can roughly distinguish between a mechanistic (biochemical) description, based on the chemical reactions that underlie signal transduction, and a causal description, where a connection between two nodes (typically proteins) represents the effect of one node on the other, without describing these processes in molecular detail.9
Within the causal set of formalisms, arguably the simplest modelling approach is the Boolean representation of networks, identifying a node's state to be either active/on or inactive/off. Boolean networks can be modelled as dynamic systems, following the state of nodes over a (discrete) time range,11 or by studying their basic input–output behaviour based on logical steady-states.12 Despite their simplicity, Boolean models can reveal important structural features of signalling pathways.13–15 Moreover, they can be refined or extended in various ways including multi-level logic, fuzzy logic, probabilistic Boolean networks, and logical ODEs.13
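The two uses of Boolean models mentioned above (discrete-time dynamics and logical steady states) can be illustrated with a minimal Python sketch. The three-node cascade, its node names and its update rules are hypothetical and chosen only for illustration; they are not taken from any of the cited models.

```python
from itertools import product

# Hypothetical toy cascade (illustrative only): a ligand activates a receptor,
# the receptor activates a kinase unless a phosphatase is present,
# and the kinase activates a transcription factor (tf).
RULES = {
    "receptor": lambda s: s["ligand"],
    "kinase": lambda s: s["receptor"] and not s["phosphatase"],
    "tf": lambda s: s["kinase"],
}


def step(state):
    """Synchronous update: every regulated node is recomputed from the previous state."""
    new_state = dict(state)  # input nodes (ligand, phosphatase) keep their values
    for node, rule in RULES.items():
        new_state[node] = bool(rule(state))
    return new_state


def simulate(state, n_steps):
    """Follow the network state over a discrete time range."""
    trajectory = [dict(state)]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory


def logical_steady_states(inputs):
    """Enumerate all states, for fixed inputs, that are unchanged by one update."""
    found = []
    for values in product([False, True], repeat=len(RULES)):
        state = dict(inputs)
        state.update(zip(RULES, values))
        if step(state) == state:
            found.append(state)
    return found
```

Starting from an all-off state with the ligand switched on, `simulate` propagates the signal one node per time step, whereas `logical_steady_states` directly returns the input–output behaviour without simulating the transient. Exhaustive enumeration is only practical for small networks.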
Mechanistic formalisms benefit from describing the process details, although this increases the model complexity. Mechanistic models are most commonly formulated as a set of biochemical reactions.10 A more sophisticated and principled way consists of writing rules describing the interactions among the different proteins that are then instantiated into biochemical reactions.16 In both cases, one often generates ODEs to be simulated, while in some cases (in particular when one simulates individual molecules), stochastic formalisms are used. Finally, spatial localisation in the cell is very important for signal transduction, and is sometimes explicitly considered when building models.10
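As a minimal sketch of how a set of biochemical reactions is turned into ODEs, consider a single phosphorylation cycle, X → Xp (driven by an upstream signal) and Xp → X, with mass-action kinetics. The species, rate constants and the forward Euler integrator are assumptions made for illustration; real models would use a dedicated stiff ODE solver.

```python
def simulate_cycle(signal, k_phos=1.0, k_dephos=1.0, total=1.0, dt=0.01, t_end=20.0):
    """Integrate a toy phosphorylation cycle with forward Euler.

    Reactions (mass action, hypothetical parameters):
        X  -> Xp   rate k_phos * signal * [X]
        Xp -> X    rate k_dephos * [Xp]
    Returns the final concentrations ([X], [Xp]).
    """
    x, xp = total, 0.0
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        v_phos = k_phos * signal * x   # flux through X -> Xp
        v_dephos = k_dephos * xp       # flux through Xp -> X
        # Each ODE is the sum of producing fluxes minus consuming fluxes.
        x += dt * (v_dephos - v_phos)
        xp += dt * (v_phos - v_dephos)
    return x, xp
```

For this cycle the steady state can be checked analytically: [Xp] = k_phos · signal · total / (k_phos · signal + k_dephos), which is 0.5 for the default parameters and a signal of 1.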
Given this large set of possible interactions, the construction of regulatory networks is far from trivial and our knowledge is still very limited. Indeed, substantial work has been devoted to reverse-engineer these networks from experimental data,21,22 mainly at the gene expression level, but also considering, for instance, transcription factor binding sites and protein–DNA interactions. Popular methods to reconstruct these networks include Bayesian inference, approaches based on mutual information, and modular approaches to reduce the problem's complexity.22 These have resulted in a few genome-scale models, mostly restricted to transcriptional regulatory networks, for well studied microbes such as Escherichia coli, as well as numerous other networks of small/medium scale for sub-systems of interest in biomedical research.
Similarly to the signal transduction networks, formalisms for representing regulatory networks range from Boolean approaches for larger-scale networks to ODEs for small/medium-sized networks.18
Large-scale metabolic networks have been used for some years now, being usually represented as biochemical (mechanistic) networks solely based on the stoichiometry and reversibility of the reactions involved.23 By assuming pseudo steady-state conditions, i.e. the concentrations of all intracellular compounds remain constant, various functional properties and capabilities of metabolic networks can be explored and phenotypes can be predicted under different environmental and genetic conditions. Most of the techniques belong to the class of constraint-based methods, which include flux balance analysis, metabolic flux analysis, and pathway analysis by elementary modes or extreme pathways.24 There is a rapidly increasing number of stoichiometric genome-scale metabolic models that have been reconstructed and verified using such methods, for organisms such as Escherichia coli,25 yeast26 and human.27 Stoichiometric models and constraint-based techniques have also been used to compute intervention strategies for Metabolic Engineering28–30 or to tackle biomedical issues related to biological discovery, including elucidation31 and targeting32 of cancer mechanisms.
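The pseudo steady-state assumption and the flux balance analysis problem can be written compactly in their commonly used textbook form. The symbols are introduced here for illustration and do not appear in the text above: $S$ is the stoichiometric matrix (metabolites × reactions), $\mathbf{x}$ the vector of intracellular metabolite concentrations, $\mathbf{v}$ the flux vector, $\mathbf{c}$ the weights of the objective (for example, selecting a biomass reaction), and $v_j^{\mathrm{lb}}$, $v_j^{\mathrm{ub}}$ the flux bounds that encode reversibility and uptake limits.

```latex
% Pseudo steady state: intracellular concentrations do not change
\frac{d\mathbf{x}}{dt} = S\,\mathbf{v} = \mathbf{0}

% Flux balance analysis as a linear programme
\begin{aligned}
\max_{\mathbf{v}} \quad & \mathbf{c}^{\top}\mathbf{v} \\
\text{s.t.} \quad & S\,\mathbf{v} = \mathbf{0}, \\
& v_j^{\mathrm{lb}} \le v_j \le v_j^{\mathrm{ub}} \quad \text{for all reactions } j .
\end{aligned}
```

Because only stoichiometry and bounds are required, and no kinetic parameters, this formulation scales to genome-size networks.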
Regarding kinetic models of metabolism, usually only small-scale metabolic networks are modelled using ODEs, and only a few larger ones exist.33–35 Simulation of large metabolic models requires a huge computational effort; therefore, model reduction is often used to reduce the size of the model and consequently the complexity of the mathematical problem.36
In eukaryotic cells, compartmentalization represents an important issue that has to be properly addressed for developing predictive models.37 Division of the cell into different compartments has enabled the specialization of organelles for carrying out specific metabolic functions. Important conditions for the performance of metabolic enzymes such as pH, energy, cofactors or, generally, metabolite concentrations have been evolutionarily optimized and are specific for the metabolic role of the compartment. Current genome-scale stoichiometric models already include compartmentalization.25–27
In the case of qualitative modelling, formalisms have been developed tailored to the different types of large-scale networks, but their integration is not straightforward. At least for signalling and regulatory networks, similar modelling formalisms (e.g. logical models) can often be used, as signal flows are a key characteristic of both network categories. In those cases, model merging becomes, in principle, feasible but is actually rarely done. The reasons for this lack of integrated models might be the lack of knowledge about the molecular interfaces between the layers, and the absence of suitable data simultaneously at both levels. Furthermore, as stated above, different time scales need to be considered for each layer, which is often difficult in qualitative models. Connecting gene regulation or signalling with metabolism is additionally hampered by the different types of interactions (signal flows vs. mass flows) that imply different semantics in model descriptions.
To summarise, no single mathematical formalism currently seems capable of simulating the phenotype of a cell taking signalling, gene regulation and metabolic systems into account. In previous work, several methods have been presented to integrate two of these layers. We review these cases in the next sections, and in the last section efforts to address integrated models covering the three layers are described.
A similar strategy, named steady-state regulated FBA (SR-FBA),40 introduced a unified constraint-based approach. Here, a mixed integer linear programming (MILP) formulation was used to address the underlying optimisation problem. The formulation combined binary variables for the regulatory layer and real valued variables for the reaction fluxes. It also included distinct types of constraints representing the metabolic layer (from FBA) and the regulation interactions, all integrated in a single framework. This revealed feasible combined regulatory and metabolic states. Both rFBA and SR-FBA simulate the metabolic phenotype of a metabolic network under different environmental and genetic conditions (e.g. after knockout of certain genes). Besides having a higher predictive power than FBA, they also reveal insights into novel types of regulatory mechanisms. However, despite their added value, these approaches have two main weaknesses. As the regulatory interactions are represented using Boolean logic, the metabolic reactions are limited to a binary response (on vs. off). Furthermore, rFBA and SR-FBA encounter problems with cyclic networks and cannot account for regulatory feedbacks from the metabolic back to the regulatory layer. These feedbacks can take place, for example, when certain metabolites (or metabolite concentrations) affect the transcription of certain genes.
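How binary regulatory variables and real-valued fluxes can coexist in a single MILP may be sketched as follows. This is a generic illustration of the coupling idea using standard linearisations of Boolean operators, not necessarily the exact constraint set of SR-FBA;40 the objective, the indices and the example rules are assumptions. Here $S$ is the stoichiometric matrix, $\mathbf{v}$ the flux vector with bounds $v_j^{\mathrm{lb}} \le 0 \le v_j^{\mathrm{ub}}$, and $y_j \in \{0,1\}$ indicates whether the enzyme for reaction $j$ is expressed.

```latex
\begin{aligned}
\max_{\mathbf{v},\,\mathbf{y}} \quad & \mathbf{c}^{\top}\mathbf{v} \\
\text{s.t.} \quad & S\,\mathbf{v} = \mathbf{0}
  && \text{(metabolic layer, as in FBA)} \\
& y_j\, v_j^{\mathrm{lb}} \le v_j \le y_j\, v_j^{\mathrm{ub}}, \quad y_j \in \{0,1\}
  && \text{(flux forced to zero when } y_j = 0\text{)} \\
& y_3 \le y_1, \quad y_3 \le y_2, \quad y_3 \ge y_1 + y_2 - 1
  && \text{(rule } y_3 = y_1 \wedge y_2\text{)} \\
& y_4 = 1 - y_1
  && \text{(rule } y_4 = \neg\, y_1\text{)}
\end{aligned}
```

The second constraint makes the binary limitation discussed above explicit: a reaction is either switched off entirely or left free within its original bounds.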
PROM (probabilistic regulation of metabolism)41 is another method proposed for connecting transcriptional regulatory networks with metabolism. Here, regulatory networks are represented by a probabilistic model, which is inferred (for a given organism) from gene expression data. Thus, PROM allows for a more quantitative description of regulatory events, instead of simple on/off rules. However, it may also have problems when closing the loop from the metabolic layer back to the regulatory layer.
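The idea of replacing on/off rules by probabilities estimated from expression data can be sketched in Python. This is a simplified illustration under stated assumptions, not the published PROM implementation: the binarisation threshold, the sample data and both function names are hypothetical, and the probability is simply used to scale a flux upper bound.

```python
def conditional_on_probability(samples, target, regulator, regulator_on, threshold):
    """Estimate P(target gene is on | regulator is on/off) from expression samples.

    `samples` is a list of dicts mapping gene names to expression values;
    a gene counts as "on" when its value exceeds `threshold` (an assumption).
    """
    matching = [s for s in samples if (s[regulator] > threshold) == regulator_on]
    if not matching:
        raise ValueError("no samples with the requested regulator state")
    n_target_on = sum(1 for s in matching if s[target] > threshold)
    return n_target_on / len(matching)


def scaled_upper_bound(v_max, probability):
    """Soften a flux bound: instead of 0 or v_max, allow probability * v_max."""
    return probability * v_max


# Hypothetical expression data for one transcription factor and one target gene.
SAMPLES = [
    {"tf": 0.1, "gene": 0.2},
    {"tf": 0.2, "gene": 0.9},
    {"tf": 0.9, "gene": 0.8},
    {"tf": 0.8, "gene": 0.7},
    {"tf": 0.1, "gene": 0.1},
    {"tf": 0.3, "gene": 0.4},
]

# Probability that the target is expressed when the regulator is off,
# e.g. to constrain the associated reaction in a regulator knockout.
p_on_given_tf_off = conditional_on_probability(SAMPLES, "gene", "tf", False, 0.5)
knockout_bound = scaled_upper_bound(10.0, p_on_given_tf_off)
```

The resulting bound would then be passed to a constraint-based simulation in place of the binary on/off bound used by Boolean approaches.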
Another approach capable of directly integrating high-throughput measurements with metabolic network models is integrative omics-metabolic analysis (IOMA), which quantitatively integrates proteomic and metabolomic data with genome-scale metabolic models.42 The method formulates a quadratic programming (QP) problem to search for a steady-state flux distribution in which the flux through reactions with measured proteomic and metabolomic data is as consistent as possible with kinetically derived flux estimations. Hyduke et al. provide an overview of methods for interpreting omics data with stoichiometric models.43
Some other efforts focused on integrating modules of metabolic and gene regulatory processes at the level of ODEs. A larger example of such a model was presented by Bettenbrock and colleagues,44 who quantitatively described the regulation of uptake and metabolism of various carbohydrates in Escherichia coli. By incorporating regulatory and metabolic events and by fitting the kinetic model against a large set of measurements, a complex phenomenon such as catabolite repression could be adequately described in a dynamic and quantitative manner. As stated above, such a fully mechanistic description based on ODEs does not seem feasible for large-scale networks.
However, this is actually rarely done, possibly because the interfaces between these two layers are not well characterised. These interfaces are not simple; Stelniec-Klotz and colleagues46 recently inferred a signalling and a regulatory network jointly, identifying complex relationships between these layers. Accordingly, there are only a few examples of models linking signalling with gene regulation, and these are relatively small in scale.45,46
Using a similar approach, Mosca et al. presented a dynamic metabolic model in which the cellular metabolic steady state condition was adapted to present two different phenotypes in HeLa cells.48 These two phenotypes are due to the regulatory effect of the PI3K/AKT/mTOR signalling pathway. The metabolic effects of the signalling pathway were modelled as modifications of the maximum rate (vmax) of distinct metabolic reactions. However, as in the previous model, the PI3K/AKT/mTOR signalling pathway was not described mathematically.
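The effect of representing a signalling pathway only through its impact on the maximum rate of a reaction can be sketched as follows. The Michaelis-Menten rate law, the scaling factors and all parameter values are hypothetical and serve only to illustrate the principle; they are not taken from the cited model.

```python
def michaelis_menten(substrate, v_max, k_m):
    """Irreversible Michaelis-Menten rate law."""
    return v_max * substrate / (k_m + substrate)


# Hypothetical scaling of v_max for two phenotypes. The signalling pathway
# itself is not modelled; only its net effect on the enzyme capacity is.
PHENOTYPE_VMAX_FACTOR = {
    "pathway_inactive": 1.0,
    "pathway_active": 2.0,
}


def reaction_rate(substrate, phenotype, base_v_max=1.0, k_m=0.5):
    """Rate of one metabolic reaction under a given signalling phenotype."""
    v_max = base_v_max * PHENOTYPE_VMAX_FACTOR[phenotype]
    return michaelis_menten(substrate, v_max, k_m)
```

With these values, the same substrate concentration yields twice the flux in the "pathway_active" phenotype, without any equation describing how the signal is transduced.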
iFBA has been applied to integrate different formalisms to create a medium scale model of Escherichia coli: the rFBA model for selected pathways of E. coli, including FBA simulation for the metabolic layer and Boolean networks for the regulatory part, is combined with an ODE model of the phosphotransferase system. This approach demonstrated a strategy to integrate modules of ODE/Boolean representations of metabolic/regulatory processes with FBA models. Importantly, regulatory events (and thus the resulting reaction activities) are modelled as simple binary variables.
In contrast, idFBA requires an integrated stoichiometric reconstruction of the three layers, incorporating slow and fast reactions in the framework. Slow reactions are incorporated directly into the stoichiometric matrix with a time-delay, while fast reactions rely on the pseudo steady-state assumption of FBA. idFBA was applied to the analysis of a prototypic integrated system of yeast.50
To the best of our knowledge, the largest integrated model that has been published so far is the whole-cell model of Mycoplasma genitalium,6 accounting for more than 500 genes. Consistent with our statement given above, the authors also concluded that no single formalism could capture the diverse types of cellular processes. Consequently, in their approach they divided the total functionality of the cell into 28 sub-modules. Each sub-module was modelled using a specific mathematical representation. For example, metabolism was modelled using flux-balance analysis, whereas RNA and protein degradation were modelled as Poisson processes. The integration of the sub-modules during the whole-cell simulation was based on the assumption that they are approximately independent on short timescales (less than 1 s). Simulations were then performed in discrete time steps: at each step the sub-modules are run independently, but depend on the values of variables determined by the other sub-modules at the previous time step. Hence, the different modules work autonomously for one second and then exchange material and information according to given rules.
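The time-stepping scheme described above can be sketched in a few lines of Python. The three toy sub-modules, the state variables and all numbers are hypothetical stand-ins (for instance, a fixed production rate replaces the flux-balance step, and a per-molecule random draw approximates a Poisson degradation process); the sketch only illustrates how independently run modules share state between time steps.

```python
import random


def metabolism(state, dt, rng):
    """Toy stand-in for the metabolic sub-module: produce nucleotides at a fixed rate."""
    return {"ntp": 10 * dt}


def transcription(state, dt, rng):
    """Make one transcript per step if enough nucleotides were available."""
    if state["ntp"] >= 5:
        return {"ntp": -5, "mrna": 1}
    return {}


def make_rna_degradation(rate):
    """Build a stochastic degradation module; each mRNA decays with probability rate * dt."""
    def rna_degradation(state, dt, rng):
        degraded = sum(1 for _ in range(state["mrna"]) if rng.random() < rate * dt)
        return {"mrna": -degraded}
    return rna_degradation


def simulate(modules, initial_state, n_steps, dt=1, seed=0):
    """Run all sub-modules independently at each step, then merge their changes."""
    rng = random.Random(seed)
    state = dict(initial_state)
    for _ in range(n_steps):
        snapshot = dict(state)  # every module sees only the previous step's values
        for module in modules:
            for variable, delta in module(snapshot, dt, rng).items():
                state[variable] += delta
    return state
```

A call such as `simulate([metabolism, transcription, make_rna_degradation(0.2)], {"ntp": 0, "mrna": 0}, n_steps=10)` advances the toy cell by ten one-second steps. Because every module reads the same snapshot, the order in which modules are called does not affect the result in this sketch.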
This hallmark study is a proof of concept showing that all the processes in a cell can be modelled in an integrated fashion, assuming different time-scales of operation.
To integrate different types of biological networks into models that can be simulated, three computational aspects need to be addressed: (i) a mathematical formalism has to be adopted to represent each layer; (ii) a simulation method capable of accounting for the different types of systems has to be developed; and (iii) the interactions among different layers have to be identified and modelled. The proper solution depends on the specific problem at hand. For example, small models can be integrated using ordinary differential equations (ODEs). In contrast, larger models can be integrated using the formalisms and simulation methods that best fit each layer. When different layers are modelled or simulated with different formalisms and methods, interfaces among them need to be established. An important issue to take into account is the proper compartmentalization of each biological entity and interaction, which affects the predictive capabilities of the model, but is still hard to achieve with current modelling tools.
The recent breakthrough by Karr and colleagues to assemble a predictive whole-cell model of a simple organism6 should inspire the development of similar models in more complex organisms. The complexity of a human cell (genome size and complex regulation, vast variety of biological molecules and interactions, etc.) is overwhelming, and our knowledge of many biological processes (e.g. RNA regulation) is still fairly limited. Whole-cell computational models for higher organisms and in particular humans are, in our opinion, still a long way away.
Nevertheless, these integrated models should be a long-term grand goal of systems biology as tools to enhance our understanding of biological complexity and human disease.
This journal is © The Royal Society of Chemistry 2013