Samuel D. Foster, S. Helen Oram, Nicola K. Wilson and Berthold Göttgens*
Haematopoietic Stem Cell Laboratory, Cambridge Institute for Medical Research, Wellcome Trust/MRC Building, Hills Rd, Cambridge, CB2 0XY. Tel: +44 (0)1223 336822
First published on 14th July 2009
Haematopoiesis (or blood formation) in general and haematopoietic stem cells more specifically represent some of the best studied mammalian developmental systems. Sophisticated purification protocols coupled with powerful biological assays permit functional analysis of highly purified cell populations both in vitro and in vivo. However, despite several decades of intensive research, the sheer complexity of the haematopoietic system means that many important questions remain unanswered or even unanswerable with current experimental tools. Scientists have therefore increasingly turned to modelling to tackle complexity at multiple levels ranging from networks of genes to the behaviour of cells and tissues. Early modelling attempts of gene regulatory networks have focused on core regulatory circuits but have more recently been extended to genome-wide datasets such as expression profiling and ChIP-sequencing data. Modelling of haematopoietic cells and tissues has provided insight into the importance of phenotypic heterogeneity for the differentiation of normal progenitor cells as well as a greater understanding of treatment response for particular pathologies such as chronic myeloid leukaemia. Here we will review recent progress in attempts to reconstruct segments of the haematopoietic system. A variety of modelling strategies will be covered, from small-scale protein–DNA or protein–protein interactions to large-scale reconstructions. Also discussed will be examples of how stochastic modelling may be applied to multicellular systems such as those seen in normal and malignant haematopoiesis.
Samuel Foster is a Medical Research Council funded PhD student in the Göttgens laboratory in the Department of Haematology, within the Cambridge Institute for Medical Research. His project involves investigating genome-wide transcriptional regulation within blood stem cells, as well as in-depth analysis of individual candidate regulatory regions. Using in silico, in vitro and in vivo techniques, he hopes to identify novel underlying patterns within the transcriptional networks that control blood stem cells.
S. Helen Oram is a Kay Kendall Clinical Research Fellow in the Department of Haematology at the University of Cambridge. She has an interest in the mechanisms and effects of aberrations in transcriptional control of key haematopoietic regulators in T-acute lymphoblastic leukaemia. She has completed clinical training in haematology at University College Hospital London where her primary interest was haemato-oncology.
Nicola Wilson is a post-doctoral research associate in the Göttgens laboratory of the Department of Haematology within the Cambridge Institute for Medical Research. She undertook a PhD at the University of Leeds, looking at the transcriptional regulation of genes critical within haematopoietic lineage choice. Her current research interests are focused around understanding the transcriptional networks controlling blood stem cells with a particular interest in the identification of combinatorial regulatory codes.
Berthold Göttgens is Reader of Molecular Haematology at the University of Cambridge. His laboratory is based in the Cambridge Institute of Medical Research and forms part of a wider consortium of groups studying normal blood stem cells and leukaemia. The particular focus of the Göttgens group is the integration of experimental and computational approaches for further understanding of the transcriptional control of blood stem cells.
Fig. 1 Simplified haematopoietic tree. The haematopoietic stem cell is able to self-renew and give rise to multipotent progenitor cells, which in turn give rise to more restricted progenitors and ultimately the terminally differentiated cells of both the myeloid and lymphoid lineages. The true haematopoietic tree as it is currently understood contains over 50 different cell types. HSC = haematopoietic stem cell, MPP = multi-potent progenitor, CLP = common lymphoid progenitor, CMP = common myeloid progenitor, MEP = megakaryocyte-erythroid progenitor.
Given its status as a model system for mammalian development and because of the biomedical impact of haematological disease, increasing effort has been placed on developing models that describe various aspects of the haematopoietic system. The underlying events of the haematopoietic system can be modelled at multiple levels. Analysis can involve study of the intricate workings of core regulatory circuits, through investigation on an -omic and cellular level, up to examination of processes on a tissue-wide dimension. Work undertaken on all these levels has provided a greater understanding of the haematopoietic system, with each casting light on a fragment of the complex regulatory processes that underpin the haematopoietic hierarchy. Here we provide an overview of recent efforts to construct network models at various levels based on specific experimental, as well as theoretical examples, thus outlining how modelling the haematopoietic system is providing new insights that are relevant to both fundamental and translational biomedical research.
Both Gata-1 and EpoR are critical for erythropoiesis, i.e. the formation of red blood cells.15–18 Palani and Sarkar describe the process of creating a mathematical model of the EpoR/Gata-1 core network, crucial to erythropoietic commitment, using an array of biochemical and kinetic data.19 The model is built upon the previously described relationship between EpoR and Gata-1.10–12 Of the 44 parameters used in the model, the vast majority were obtained directly from the literature; the others were refined from values in the literature or estimated using Gata-1 DNA binding time-course activity assays.19 The resulting mathematical model presents a bidirectional link between the lineage-specific receptor, EpoR, and the transcription factor Gata-1. The study is careful to make clear that, whilst it provides an insight into the relationship and mechanics of a small set of crucial erythropoietic molecular effectors, it only covers a small fraction of the full network controlling the process of erythropoiesis. Nevertheless, several aspects of the resulting model are worthy of note. The network described can exhibit ultrasensitivity and bistability between the two molecular effectors. This is a consequence of the links made between the regulation of EpoR transcription by Gata-1 and the activation of Gata-1 by the EpoR–PI3K/AKT cascade. This positive feedback loop provides a level of autoregulation that can be fine-tuned owing to the separation of the activation and synthesis of Gata-1. The model also appears to provide a meaningful link between an extrinsic and an intrinsic regulatory signal in the regulation of cell maturation as well as survival. The construction of this model is a clear example of how a core regulatory network can be modelled from previously described molecular relationships using, in this case, biochemical data.
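To make the notion of bistability arising from positive feedback more concrete, the following minimal sketch integrates a single-variable toy equation in which an active factor promotes its own production through a steep (Hill-type) feedback term. It is not the 44-parameter model of Palani and Sarkar; the function names, the one-variable reduction and every parameter value are illustrative assumptions chosen only so that two stable states exist.

```python
# Toy positive-feedback model (illustrative assumption, not the published model).
# x stands for the level of an active factor that enhances its own production.

def gata1_rate(x, basal=0.05, vmax=1.0, k=0.5, n=4, decay=1.0):
    """Net rate of change: basal synthesis + Hill-type feedback - first-order decay."""
    feedback = vmax * x**n / (k**n + x**n)
    return basal + feedback - decay * x


def steady_state(x0, dt=0.01, steps=5000):
    """Integrate with the forward Euler method from x0 and return the final level."""
    x = x0
    for _ in range(steps):
        x += dt * gata1_rate(x)
    return x


low = steady_state(0.0)    # starts below the unstable threshold, settles in the low state
high = steady_state(0.6)   # starts above the threshold, settles in the high state
print(round(low, 2), round(high, 2))
```

With these assumed parameters the two runs settle at clearly different levels although the equation is identical, which is the defining feature of a bistable switch: the outcome depends on which side of an unstable threshold the system starts.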
The creation of such large amounts of data has driven a demand for statistical and computational methods that can handle the required analysis. One such method is the ARACNE algorithm, which uses an information-theoretic approach to analyse microarray expression profiles and identify gene product interactions.20 This method differs from other existing computational methods that also use gene expression data to infer biochemical interactions by having a low polynomial computational complexity, using the full range of data and not making assumptions about underlying networks.20 This, it is suggested, ensures that ARACNE can identify interactions more accurately and more quickly than its rivals. ARACNE has been used to reconstruct a system-wide transcriptional network for human B cells21 from 336 B cell expression profiles, representing normal B cells, B cells transformed to represent B cell tumours and B cells that were experimentally manipulated to be similar to those seen in the lymph node germinal centre. With the resulting network containing ∼129000 interactions, the number of interactions modelled was clearly at genome scale, which would not be achievable through methods used to study core regulatory networks. The resulting network described how a small number of highly connected genes interact with most of the genes in the cell. Of interest, the proto-oncogene MYC emerged as one of the largest cellular hubs and was subjected to further in-depth investigation, which showed that >90% of inferred interactions corresponded to interactions seen in in vivo experiments.21 Whilst the authors acknowledge that work remains to be done to remove several limitations of the inferred network, the network does provide an example of the possibilities now open for the construction of genome-wide networks.
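The general idea behind information-theoretic network inference can be sketched in a few lines: score every gene pair by mutual information, keep pairs above a threshold, and then discard the weakest edge in each fully connected triplet on the grounds that it is probably an indirect effect (a pruning step based on the data processing inequality). The sketch below is a simplified illustration only; the equal-width binning, the threshold, the gene labels and the function names are assumptions and are not taken from the published implementation.

```python
import math
from collections import Counter
from itertools import combinations


def discretise(values, bins=3):
    """Map continuous expression values onto equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def infer_network(profiles, threshold=0.1, bins=3):
    """Return undirected edges that survive thresholding and triplet pruning."""
    genes = sorted(profiles)
    disc = {g: discretise(profiles[g], bins) for g in genes}
    mi = {
        frozenset(pair): mutual_information(disc[pair[0]], disc[pair[1]])
        for pair in combinations(genes, 2)
    }
    edges = {e for e, score in mi.items() if score > threshold}
    indirect = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in edges for e in triangle):
            indirect.add(min(triangle, key=lambda e: mi[e]))  # weakest edge
    return edges - indirect


# Made-up profiles in which B tracks A and C tracks B (a chain A - B - C).
profiles = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    "B": [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0],
    "C": [0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0],
}
network = infer_network(profiles)
print(sorted(tuple(sorted(e)) for e in network))
```

In this made-up example all three pairs share information, but the A–C pair has the lowest score in the triplet and is removed, leaving the chain A–B and B–C.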
Another recent study employing inference of regulatory interactions from genome-wide expression data was centred on an investigation of Toll-like receptor (TLR)-activated macrophages.22 By clustering gene expression data using a k-means algorithm, the authors were able to identify three distinct “waves” of transcription after stimulation by the TLR4 agonist bacterial lipopolysaccharide (LPS). Using the hypothesis that genes clustered through similar expression profiles might share similar cis-regulatory elements, cis-regulatory analysis was performed using the bioinformatic tool MotifMogul.22 The gene cluster with peak expression 1 h after LPS stimulation was predicted to have an over-representation of ATF/CREB binding sites. The only ATF/CREB family member up-regulated by LPS stimulation was ATF3. This observation served as a starting point to study the transcriptional network surrounding ATF3. Using protein–protein interaction maps, a link was made to two other proteins that were previously identified to be involved in TLR signalling, namely NF-κB and AP1,22 thus revealing potential combinatorial regulation downstream of ATF3. A common theme shared between this investigation and the MYC sub-network study is the use of large genome-wide datasets to reconstruct and visualise regulatory networks. However, both studies also agree that in silico inference alone is not yet sufficiently reliable to obviate the need for experimental validation when reconstructing complex networks from genome-wide data.
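As an illustration of how k-means clustering can separate expression time courses into temporal "waves", the sketch below groups a handful of invented profiles by the timing of their peak. The data, the deterministic choice of initial centroids and the function names are assumptions made for the example; the preprocessing used in the study itself is not reproduced here.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(profiles, k, max_iterations=100):
    """Plain k-means; initial centroids are evenly spaced rows (an assumption)."""
    centroids = [list(profiles[i * len(profiles) // k]) for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:   # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:            # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented expression values at four time points after stimulation.
profiles = [
    [0, 5, 1, 0], [0, 6, 1, 0],   # early peak
    [0, 1, 5, 1], [0, 1, 6, 1],   # intermediate peak
    [0, 0, 1, 5], [0, 0, 1, 6],   # late peak
]
labels, centroids = kmeans(profiles, k=3)
print(labels)
```

Genes sharing a label would then be candidates for a shared cis-regulatory analysis of the kind described above.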
As the technology is rapidly adopted by other fields, genome-wide catalogues of the binding sites for key haematopoietic transcription factors are being generated. One such factor is encoded by the Scl (Tal-1) gene, which encodes a basic helix-loop-helix (bHLH) transcription factor required for the specification of HSCs and the differentiation of the erythroid and megakaryocytic lineages.25–27 Within the blood system, Scl is thought to be a key component of the regulatory networks controlling the specification and subsequent differentiation of HSCs.9,14 Important upstream transcriptional inputs of murine Scl and its paralogue Lyl1 are Ets and Gata factors, as well as Scl itself.28–34 Only a handful of direct downstream targets of Scl had been described, including Gata-1,35 Runx1,36 c-kit37 and α-globin.38 To fully understand the function of Scl within haematopoiesis it is particularly important to gain information on downstream targets at early developmental time points where Scl function is critical. A recent ChIP-Seq screen for Scl targets took advantage of the early progenitor cell line HPC-7 as a model system for blood stem/progenitor cells.39 Stringent filtering of the ChIP-Seq data resulted in the identification of 228 high confidence binding events, many of which occurred within or next to genes known to be important within transcription and signalling. To assess the in vivo functionality of regions bound by Scl, comprehensive in vivo validation was performed using F0 transgenic mouse embryos, which validated 16 transcription factor genes as bona fide targets of Scl. Moreover, bioinformatic analysis highlighted that the consensus binding sites for several transcription factors were over-represented amongst the Scl-bound regions and subsequent analysis for one of these factors (Gata-2) demonstrated that 15/16 regions bound by Scl were also bound by Gata-2 (see Fig. 2).
ChIP-Seq analysis followed by in vivo validation therefore provided a rapid means of clarifying the transcriptional hierarchies of several known haematopoietic regulators (Cbfa2t3h, Cebpe, Nfe2, Zfpm1, Erg, Mafk, Gfi1b, and Myb) as well as providing links to previously unsuspected novel candidate regulators.40 Finally, data mining of protein–protein interactions curated from the literature showed extensive protein–protein interactions within this network many of which involved Scl and Gata-2 (see Fig. 2). This observation not only is suggestive of multiple feedback loops at the level of protein complexes but is also highly reminiscent of the embryonic stem cell pluripotency network.
Fig. 2 Scl- and Gata2-centric transcriptional network. In vivo validated core network. Each solid line indicates a binding event observed in a ChIP-Seq experiment which has then been functionally validated in transgenic assays, whereas the dashed lines represent protein–protein interactions curated from the literature.40
The study illustrated that analysis using whole-population expression averaging may not fully take into account the potential biological roles of outlying subpopulations. This issue was investigated further by analysing whether the heterogeneity of Sca-1 expression observed in the clonal population of EML cells bore any correlation with variable differentiation potential. The fastest rate of erythropoiesis was observed in the Sca-1low population, with the Sca-1high population showing the slowest rate. This rate variance reduced as the subpopulations reverted back to the parental expression pattern when placed into culture under self-renewing conditions. Of note, it required almost an additional 2 weeks of culture, after Sca-1 expression appeared consistent between the subpopulations, until the erythropoietic differentiation potential was equal among cultures derived from the 3 subpopulations.
Intrigued by this variation in lineage choice between the Sca-1low, Sca-1mid and Sca-1high populations, the authors set out to link this variability to levels of proteins known to affect erythroid lineage differentiation. Cross-antagonism between the Gata-1 and PU.1 transcription factors was known to affect erythroid lineage choice. Interestingly, Sca-1high cells exhibited the lowest Gata-1 expression and had higher PU.1 levels. Using granulocyte-macrophage colony-stimulating factor and interleukin-3, it was observed that, for the Sca-1 subpopulations, erythropoietic differentiation rates negatively correlated with myeloid differentiation rates. This showed that, at least on a small scale, heterogeneity within a clonal population can provide various cellular states each with distinct biological functions. The question remained whether this was small in scale, involving only Gata-1 and PU.1, or whether it was in fact a genome-wide pattern. Importantly, expression profiling of the three Sca-1 subpopulations revealed that the effect is genome-wide. Using significance analysis of microarrays (SAM), more than 3900 genes were highlighted as differentially expressed between the Sca-1low and Sca-1high populations. A further expression profile was taken from a population of cells 7 days after treatment with erythropoietin (Epo). This revealed that, of the three Sca-1 subpopulations, the Sca-1low population was most similar to the fully differentiated Epo-treated cells. However, the ability of all three subpopulations, including Sca-1low, to revert to parental Sca-1 expression shows that these subpopulations are not pre-committed to any particular lineage choice. Following this work, what may previously have been dismissed as random fluctuations in gene expression causing short-lived cellular phenotypes or cellular gene expression noise,42–47 can now be seen as a cell-wide network, with each phenotypic subpopulation having its own biological function and potential.
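Cross-antagonism of the kind described for Gata-1 and PU.1 is often illustrated with a mutual-repression ("toggle switch") model, in which a small initial bias towards one factor is amplified until that factor dominates. The following sketch is such a generic toy model, not one fitted to the data discussed above; the equations, the parameter values and the function name are assumptions made purely for illustration.

```python
def toggle_switch(gata1, pu1, synthesis=4.0, hill=2, dt=0.01, steps=5000):
    """Forward Euler integration of two mutually repressing factors.

    Each factor is produced at a rate reduced by the other factor and is
    lost by first-order decay (all values are dimensionless assumptions).
    """
    for _ in range(steps):
        d_gata1 = synthesis / (1.0 + pu1 ** hill) - gata1
        d_pu1 = synthesis / (1.0 + gata1 ** hill) - pu1
        gata1, pu1 = gata1 + dt * d_gata1, pu1 + dt * d_pu1
    return gata1, pu1


# A slight initial excess of one factor decides which state is reached.
erythroid_like = toggle_switch(gata1=1.2, pu1=1.0)
myeloid_like = toggle_switch(gata1=1.0, pu1=1.2)
print([round(v, 2) for v in erythroid_like], [round(v, 2) for v in myeloid_like])
```

In this toy setting, cells that begin with slightly more of one factor end in a state where that factor is high and the other low, which is one simple way to picture how subpopulations with different Gata-1/PU.1 balances could show inversely related erythroid and myeloid differentiation rates.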
Throughout the lifespan of the organism, haematopoietic tissue must produce a continuous supply of the correct number of differentiated, fully functional blood cells of each of the various types seen in adult blood. The mature cells are fully specified and, as such, are generally incapable of self-renewal or de-differentiation and re-differentiation into an alternative cell type. The source of all these various cells is the HSC. However, as the definition of an HSC is a functional one (they must be able to proliferate, differentiate, self-renew, and be able to reconstitute the blood system after marrow ablation), it is only possible to assign these characteristics to any one cell of interest after subjecting it to the appropriate biological assay during which the attributes and qualities of that cell will inevitably be altered. Another obstacle is that the total number of HSCs predicted in an organism is low, of the order of 100–200 HSCs per kg48 meaning that an adult male human may have only 10000 HSCs to sustain haematopoiesis throughout life. Attempting to locate and identify them is, therefore, a challenge in itself as is amassing a meaningful enough number of cells on which to perform multiple replicated assays. The heterogeneous nature of cell surface markers and of functional properties of stem cells adds further to the complexity of the situation.
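The order-of-magnitude estimate quoted above follows from simple multiplication, shown here for transparency. The 70 kg body mass is an assumed illustrative value that does not appear in the text, and the function name is hypothetical.

```python
def hsc_pool_estimate(body_mass_kg, per_kg_low=100, per_kg_high=200):
    """Return the (low, high) estimate of total HSC number for a given body mass."""
    return body_mass_kg * per_kg_low, body_mass_kg * per_kg_high


low, high = hsc_pool_estimate(70)   # 70 kg is an assumed adult body mass
print(low, high)                    # 7000 14000
```

For the assumed body mass the range brackets the figure of roughly 10000 HSCs mentioned in the text.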
By turning to stochastic calculus, one might attempt to generate a model that takes account of observations made of the output of the haematopoietic system (the maturing haematopoietic cells) and of observations made of the effect of the microenvironment on stem cells, in order to predict the early, currently unobservable, behaviour of these cells. The inter-relationship between the current state and current behaviour of the stem cells, as described by a system of partial differential equations, is too complex to be solved analytically with ease. The behaviour of the system must therefore be observed through simulation, in which the random behaviour of the stem cells defined by the partial differential equations is simulated in discrete time steps.
There has been interest in the application of stochastic calculus to HSC behaviour for many decades.49,50 Much of the recent work in this field has been published by Roeder and colleagues who, in 2002,51 proposed a model whereby it was assumed that cells could inhabit and change state between two growth environments, one which promoted a quiescent state and the other that promoted a proliferating state (see Fig. 3). It was assumed that each cell had a definable, but random, affinity to remain in the non-proliferating compartment or the cycling state. It was noted that cells in the non-proliferating compartment were likely to be attached to and receiving signals from stroma, whereas the cells in the proliferating compartment were likely to have detached. A number of assumptions were then overlaid on the model based on the propensity of any one cell to attach or detach from stroma in any given unit of time. Probability distributions for cycling times and attachment affinity were based on published data and in vitro observations. The resultant model was shown to accurately represent the contribution of donor stem cells to mature blood formation in chimeric animals following haematopoietic stem cell transplantation,52 the emergence of a dominant clone53–56 as well as the variability noted in engraftment potential and replating ability of stem cells dependent on variations in the microenvironment.57
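The core idea of cells switching stochastically between a stroma-attached, quiescent compartment and a detached, proliferating compartment can be sketched as a discrete-time simulation. The version below is a deliberate simplification and not the published model: it uses fixed per-step switching probabilities instead of cell-specific affinities, ignores cell division, and all names and values are assumptions.

```python
import random


def simulate_compartments(n_cells=200, steps=500, p_detach=0.05,
                          p_attach=0.10, seed=0):
    """Track the fraction of cells in the attached (quiescent) compartment.

    Each time step, an attached cell detaches with probability p_detach and
    a detached cell re-attaches with probability p_attach (assumed values).
    """
    rng = random.Random(seed)
    attached = [True] * n_cells
    history = []
    for _ in range(steps):
        for i in range(n_cells):
            if attached[i]:
                if rng.random() < p_detach:
                    attached[i] = False
            elif rng.random() < p_attach:
                attached[i] = True
        history.append(sum(attached) / n_cells)
    return history


history = simulate_compartments()
late_mean = sum(history[-200:]) / 200
print(round(late_mean, 2))
```

For fixed probabilities the long-run attached fraction fluctuates around p_attach / (p_attach + p_detach), here two thirds. The published model goes further by letting each cell's affinity change over time, which this sketch does not attempt.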
Fig. 3 Representation of the Roeder/Glauche model.51,67 The model shown above is based on the assumption that the development of haematopoietic stem cells depends upon microenvironmental signals. Accordingly, two different growth environments, one promoting ongoing ‘stemness’ and another promoting division and differentiation, are proposed. The variable likelihood of the cells transiting from one environment to the other is modelled as a stochastic process, with transition probabilities dependent on the individual cellular affinity for its current environment and on the effect of attractors drawing cells towards the other state.
In order to test the ability of the model to accurately represent the variable clonogenic and repopulation potential of HSCs after bone marrow transplantation, Roeder et al.52 have shown that their model correctly demonstrated the clonal competitiveness and unstable chimerism seen in recipients after chimera formation and may also predict the effects of cytokine-induced disturbance of HSC contributor proportions in stable chimeras. This work has subsequently been extended48,58–61 in an attempt to estimate the degree of clonal heterogeneity seen after transplant, to predict competitive differences between HSCs and to predict the end clonal composition of the recipient and was shown to closely follow experimental data.58–61
An extension of the Roeder model is to consider that there may be more than one attractor drawing the stem cell away from the quiescent/self-renewing compartment. The Kirkland model62 uses a branching Markov process which models a population of stem cells proceeding randomly in a fashion entirely independent of its condition in the past. It takes account of the prospect that, following division, some of the progeny of a stem cell may acquire new and different properties which may also be modelled independently of each other and of the parent cell’s behaviour. The model has been validated by analysing the output of the stem cell compartment and accurately predicted fluctuations in contributions to that output by progressively changing dominant clones.
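A branching Markov process of the general kind described here can be sketched as follows: at each generation every stem cell independently either divides into two stem cells, leaves the stem cell compartment by differentiating, or persists as a single stem cell, with no memory of earlier generations. This is a generic illustration rather than the Kirkland model itself; in particular it does not let progeny acquire new properties, and the probabilities and function names are assumptions.

```python
import random


def step_clone(size, p_renew, p_diff, rng):
    """Advance one clone by one generation and return its new stem cell count."""
    new_size = 0
    for _ in range(size):
        u = rng.random()
        if u < p_renew:
            new_size += 2          # symmetric self-renewal
        elif u < p_renew + p_diff:
            new_size += 0          # differentiation: leaves the compartment
        else:
            new_size += 1          # one stem cell is retained
    return new_size


def simulate_clones(n_clones=5, generations=20, p_renew=0.3, p_diff=0.3, seed=1):
    """Return the per-generation stem cell counts of independently evolving clones."""
    rng = random.Random(seed)
    sizes = [1] * n_clones
    history = [list(sizes)]
    for _ in range(generations):
        sizes = [step_clone(s, p_renew, p_diff, rng) for s in sizes]
        history.append(list(sizes))
    return history


history = simulate_clones()
print(history[-1])   # clone sizes after the final generation
```

Because each clone drifts independently, some clones die out while others expand, so the clone contributing most to the compartment can change over time even when the renewal and differentiation probabilities are balanced.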
Chronic myeloid leukaemia (CML) and its specific therapy using the bcr-abl targeted tyrosine kinase inhibitor imatinib have emerged as the paradigm of molecularly targeted cancer therapy. Stochastic modelling has been applied to good effect to aspects of the disease as diverse as accurate modelling of the progressive dominance of the malignant clone,54–56 changes in bcr-abl transcript levels,54–56,67 the effects of commencing and terminating imatinib treatment55,56 and the emergence of resistance to therapy.68 Of principal interest is that both the Roeder54 and the Michor68 models accurately represent one of the most significant challenges in CML therapy, the persistence of low-level residual disease despite therapy.
In addition to exploring the utility of modelling in the context of haematological malignancies, similar approaches have been applied to the field of gene therapy. There are a great number of challenges in the path of a safe and effective gene therapy programme involving the haematopoietic system: how to identify and acquire adequate numbers of HSCs, how to efficiently deliver the DNA to the HSC to be certain of adequate expression in the recipient, and how many HSCs must be transplanted to be assured of adequate engraftment to alleviate clinical symptoms. Abkowitz et al.58 utilised a computer-simulated stochastic model in the late 1990s to predict the outcome of altering these variables. Modelling exercises such as these provided vital information as to the eventual impact of vector choice and of the number of HSCs transduced and transplanted, at a time when the technical capability to perform the experiments in vivo was still in development, and their predicted outcomes have since been supported by experimental data.58,61,63,69,70
Footnote
† This article is part of a Molecular BioSystems themed issue on Computational and Systems Biology.
This journal is © The Royal Society of Chemistry 2009