Fernando J. Alvarez-Vasquez (a, *), Julio A. Freyre-González (b), Yalbi I. Balderas-Martínez (c, ‡), Mónica I. Delgado-Carrillo (d) and Julio Collado-Vides (c)
(a) National Institute of Agronomic Research, UR1115 PSH, Avignon, France. E-mail: fernando.alvarez-vasquez@inra.avignon.fr
(b) Evolutionary Genomics Program, Center for Genomic Sciences, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, Mexico
(c) Computational Genomics Program, Center for Genomic Sciences, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, Mexico
(d) Institute of Applied Mathematics, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
First published on 2nd February 2015
Transcription factors (TFs) modulate gene expression in response to internal or exogenous changes in cell signaling. TFs can bind to DNA either with their effector bound (holo conformation) or as free proteins (apo conformation). To contribute to the understanding of the evolutionary fitness and organizational principles behind the different TF conformations, we inquire into the origins of these conformational differences by analyzing the two TF conformations from the perspective of Savageau's demand theory. For the control of a gene whose function is in high demand, we found that evolutionary constraints are responsible for activator TFs binding to DNA mainly in the holo conformation, whereas apo activation is under-represented. The mathematically controlled comparison of the apo and holo conformations reveals formal and evolutionary arguments in favor of this TF control asymmetry, suggesting that evolution favors holo activation under environmental conditions commonly found by E. coli in the human digestive tract. Specifically, the sensitivity analysis performed for the holo conformation in the positive mode of regulation shows that the wild type is more robust in situations where realizable changes in the model's parameters favor better performance under the non-stressful conditions that E. coli commonly encounters in the gut. By contrast, the positive apo conformation is better adapted to adverse situations. For the negative mode of regulation, the sensitivity analysis shows that neither TF active conformation presents an advantage.
Why is the holo conformation of transcription factors the dominant mode of regulation in Escherichia coli K-12? Why is the active apo conformation under-represented? Are these alternative TF conformations historical accidents, or did they evolve on the basis of their functional differences?
In this work, we inquire into the possible evolutionary origins of this asymmetry from a population genomics perspective. We explored how mutations and selection could affect the preference for certain TF active conformations, and present evolutionary and mathematical arguments for the apo–holo asymmetry as a product of adaptations allowing the bacteria to respond optimally to the challenges it faces inside the mammalian gut.
Warm-blooded animals provide a favorable habitat and reproductive niche for Escherichia coli.2,3 However, even inside the host, this member of the Enterobacteriaceae faces stressful situations arising from, for example, host diet and competition with other microbiota.4
We evaluate the possible influence of the protective TF–DNA interaction on the different TF active conformations and modes of regulation under the environmental conditions that E. coli encounters inside the gut.
Theoretical studies have suggested a functional explanation for the demand theory of gene regulation (DTGR) predictions, claiming that the TF can protect the DNA from errors produced by unspecific interactions between DNA and proteins or other biological components.5 Recently, the possibility of TF–DNA error minimization has been tested experimentally with synthetically engineered organisms.6
Nevertheless, metabolism and gene regulation are strongly coupled by allosterism in bacteria. Interactions between metabolic effectors and their cognate TFs play a fundamental role in controlling genetic output,11,12 given that the genetic response depends not only on the presence or absence of the TF but also on the combinatorial control exerted by both the TF and the metabolic effector. Based on information collected from the RegulonDB database,13 a recent study found that activator TFs mainly regulate in the holo conformation, and provided evidence of statistical under-representation of apo activation in Escherichia coli K-12.1
Four types of gene control circuits were previously analyzed in DTGR: induction with positive and negative controls, and repression with positive and negative controls. These combinations define the anatomy of the molecular switches that modulate gene expression levels in bacteria when allosterism is neglected8 (Fig. 1). Therefore, this model depends only on the presence/absence of the TF and excludes the possibility of combinatorial control exerted by both the TF and metabolism.
To take this into account, we developed the transcription factor conformation (TFC) model (Fig. 2), which considers the mutation and growth rates of single and double mutant populations after mutations affect the TF's effector (allosteric) binding site (r1), the TF's DNA recognition site (r2), the TF binding site on the DNA, i.e. the modulator (m), or the operon promoter (p). As Fig. 2 shows, each double mutant population can be generated by two different routes. Note that the order of the mutations matters for the parameter assignment and the final gene expression (Table S1, ESI†).
Fig. 2 Schematic diagram representing the wild-type and mutant populations. The symbols are as follows: Xw, the number of wild-type organisms; Xp, the number of promoter mutants; Xm, the number of modulator mutants; Xr1, the number of regulator mutants at the ligand binding domain; Xr2, the number of regulator mutants at the DNA binding domain; Xd1⋯Xd6, the numbers of double mutants. The growth rates are represented by gi, where i can take the values {w, m, p, r1, r2, d1, d2, d3, d4, d5, d6}. The symbols inside the square frames correspond to the mutation taking place. The alphanumeric labels beside the arrows correspond to the mutation rates in the Table S1 (ESI†) key.
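The combinatorics behind Fig. 2 can be checked with a few lines of Python. This is a minimal sketch, assuming only that the four single-mutation targets are p, m, r1 and r2 as listed in the caption; the function name and route labels are hypothetical, and the d1 to d6 numbering in Fig. 2 may pair the targets in a different order.

```python
from itertools import combinations, permutations

# Single-mutation targets of the TFC model (Fig. 2): promoter (p), modulator (m),
# TF ligand/allosteric domain (r1) and TF DNA-binding domain (r2).
TARGETS = ("p", "m", "r1", "r2")


def double_mutant_routes(targets=TARGETS):
    """Map each unordered pair of targets (one double mutant) to its ordered routes.

    Each route is a tuple (first, second): Xw -> X<first> -> double mutant.
    """
    return {pair: list(permutations(pair)) for pair in combinations(targets, 2)}


if __name__ == "__main__":
    for pair, orders in double_mutant_routes().items():
        paths = ["Xw -> X%s -> X%s+%s" % (a, a, b) for a, b in orders]
        print("+".join(pair), ":", " | ".join(paths))
```

Running it lists six double mutants, each reachable through two orderings of the same pair of mutations, which is why the mutation sequence has to be tracked when rates are assigned.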
Our TFC model includes two new variables, Xr1 and Xr2, corresponding to the populations of mutants in the allosteric binding site (R1) and the DNA recognition site (R2), respectively (Fig. 2). We also include two new mutation rate parameters: the DNA protection exerted by the TF (ψ) and the allosteric binding site mutation rate (ω) (Table S1, ESI†). To model the combinatorial control exerted by both the TF and the effector, the TF is divided into two regions, each with its own mutation rate parameter: rho (ρ), the rate of loss of the functional DNA recognition site (R2 or r2), and omega (ω), the rate of loss of the allosteric binding site (R1 or r1) (Fig. 3, Fig. S1–S6 and Table S1, ESI†). This dissection of the TF is essential for appropriately modelling the apo and holo conformations. As a consequence, unlike Savageau's seminal model (see Table 1 of ref. 9), our TFC model does not collapse the rate of loss of the modulator target site (τ) and the rate of loss of the functional TF (ρ) into a single additive parameter. We used three values for the allosteric binding site mutation rate, ω ∈ {1, 20, 40}. These values are directly related to the average number of critical bases involved in the interaction between TFs and their cognate metabolic effectors; because the third codon position is the wobble position, leaving roughly two critical bases per codon, they correspond to around 1, 10 and 20 amino acids, respectively. We chose these values in agreement with experimental data for LacI showing that the region encoding the essential residues involved in the interaction with allolactose spans 20 to 40 critical bases.14 Note that ω = 1 is an extreme value that assumes that a single base mutation could disrupt the functionality of a fragile TF interaction with its effector.
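The correspondence between ω and the number of amino acids is simple arithmetic. The sketch below assumes, as the text does, that the wobble position leaves about two critical bases per codon; the constant and function names are illustrative.

```python
import math

# Assumption (from the wobble argument): roughly two of the three bases in
# each codon are treated as critical for the TF-effector interaction.
CRITICAL_BASES_PER_CODON = 2


def residues_for_critical_bases(omega):
    """Approximate number of amino acids spanned by `omega` critical bases.

    A single critical base still lies within one codon, hence the floor of 1.
    """
    return max(1, math.ceil(omega / CRITICAL_BASES_PER_CODON))


if __name__ == "__main__":
    for omega in (1, 20, 40):
        print(f"omega = {omega:>2} critical bases -> ~{residues_for_critical_bases(omega)} amino acids")
```

This reproduces the 1, 10 and 20 amino acids quoted for ω = 1, 20 and 40.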
Fig. 3 LacI regulation of an inducible system with negative control during high (a) and low (b) demand. The DNA can mutate (diagonal red line) in the modulator (M), the promoter (P) and/or the regulator, at site R1 if the mutation occurs in the TF–ligand domain or at R2 if it occurs in the TF–DNA binding domain. The horizontal arrow represents the expression of the structural gene (E). A blue line starting from R2 and ending in an arrowhead indicates the interaction of the TF with the DNA; a blue line ending in an X indicates no TF–DNA interaction with the operon. (a) High demand: wild type (a), four single mutants (b–e) and six double mutants (f–k). (b) Low demand: subpanels (a–k) as in Fig. 3a.
For all the TF conformations analyzed, we assume that the TF–effector interaction produces a conformational change in the TF that affects its DNA binding site. In mathematical terms, this implies an additive effect of ω and ρ on the mutation rates c, i1, j2 and k2 (Table S1, ESI†). This intrinsic coupling within the TF has been reported experimentally, at least for the well-documented LacI, both by molecular structure analysis15 and by changing residues that affect the binding site,16 among other approaches.
For all the TFs analyzed, we assume that the regulatory proteins follow a classical coupled circuit regulation in which the TF itself is unregulated,17 as has been reported experimentally for lac operon regulation.18 Mathematically, this implies that the mutation rate parameter epsilon (ε) does not affect TF expression when the expression of the structural gene is enhanced (Table S1, ESI†).
Following the same assumption as in the DTGR model, we did not analyze in detail the double, triple or quadruple mutant populations, given their low probability of occurrence. Nevertheless, the full set of double mutants is presented in Fig. 2 and eqn (S25)–(S30) (ESI†).
As represented by the unidirectional arrows in Fig. 2, reverse mutations that restore the original DNA functionality or compensate for the mutation effects are assumed to be rare and are neglected.
We also assume that the TF–modulator interaction reduces the basal mutation rate by a factor of ψ = 1/10. The parameter ψ represents the reduction in the DNA mutation rate resulting from DNA protection under extreme environmental conditions. This protein–DNA protection can occur under oxidative stress or starvation (e.g. ref. 19 and 20) and is associated with the non-specific binding of other TFs, metabolites and/or other proteins to the free binding site.5
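A back-of-envelope check of why higher-order mutants are rare: if two mutations are independent events, each with a rate proportional to the reference rate μ, the rate of acquiring both is of order μ², i.e. smaller than a single-mutation rate by a factor of order μ. The numerical value of μ below is purely illustrative and is not taken from the model.

```python
# Hypothetical reference mutation rate per generation (illustrative value only).
MU = 1e-8


def single_mutant_rate(factor, mu=MU):
    """Rate of one class of mutation, expressed relative to mu (cf. Table 1)."""
    return factor * mu


def double_mutant_rate(factor_a, factor_b, mu=MU):
    """Approximate rate of acquiring two independent mutations in one lineage."""
    return single_mutant_rate(factor_a, mu) * single_mutant_rate(factor_b, mu)


if __name__ == "__main__":
    single = single_mutant_rate(20)     # e.g. an omega-type mutation with omega = 20
    double = double_mutant_rate(20, 1)  # omega-type mutation plus a reference-rate mutation
    print(f"single ~ {single:.1e}, double ~ {double:.1e}, ratio ~ {double / single:.1e}")
```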
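The effect of ψ can be sketched as a multiplicative correction to the basal rate. This assumes, for illustration only, that the unprotected loss rate of the modulator site is τμ (τ relative to μ, as in Table 1) and that TF binding replaces μ with ψμ; the helper name is hypothetical and the exact form used in Table S1 may differ.

```python
PSI = 1 / 10  # 10-fold reduction of the basal rate when the TF protects its site (from the text)


def modulator_loss_rate(mu, tau, tf_bound, psi=PSI):
    """Hypothetical helper: rate of loss of the modulator (TF target) site.

    Assumes the unprotected rate is tau * mu and that TF binding lowers
    the basal rate from mu to psi * mu.
    """
    effective_mu = psi * mu if tf_bound else mu
    return tau * effective_mu


if __name__ == "__main__":
    mu, tau = 1.0, 1.0  # relative units, illustrative
    free = modulator_loss_rate(mu, tau, tf_bound=False)
    bound = modulator_loss_rate(mu, tau, tf_bound=True)
    print(f"free: {free}, TF-bound: {bound}, fold reduction: {free / bound:.0f}")
```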
The growth parameter delta (δ) was assigned according to the more nutritionally deficient environment along the proximal and distal portions of the human digestive tract (Table S2, ESI†).
In the case of Cbl, δ was assigned for the high-demand fraction of the E. coli cycle despite the presence of sulphur nutrients in the colon,21 under the assumption of starvation for sulphur scavenging as a consequence of competition with other sulphur-specialized microorganisms and/or with the host22 (see Discussion for details).
Given that the idea was to make mathematically-controlled comparisons of the active conformations within the activator and repressor modes of regulation, the TFs with dual modes of control are not included in this work.
Fig. 4a and Fig. S7 (ESI†) show that the LacI wild-type TS are similar to those of Savageau's seminal model with respect to their shapes and demand extreme values (Fig. 2A of ref. 8) but differ with respect to the TS enclosing the wild-type region. When ω equals 20 or 40, the wild-type region is bounded by Xr1/Xw and Xr2/Xw, with the modulator (Xm/Xw) and promoter (Xp/Xw) TS at the periphery. When ω increases, the Xr1/Xw curve moves to the right and the Xr2/Xw curve is displaced slightly to the left; together, these two shifts narrow the wild-type region.
Fig. 4b and Fig. S8 (ESI†) show the following: first, that the curves of the modulator and promoter are similar in shape to those obtained with LacI (Fig. 4a and Fig. S7, ESI†); second, that when ω increases, the Xr2/Xw threshold moves inwards through smaller values of the demand, narrowing the wild-type region; and third, that in all the simulations, the wild-type region is delimited by Xp/Xw on the left side of the demand and by Xr2/Xw on the right side.
Fig. 4c and Fig. S9 (ESI†) show that the shapes of the thresholds of selection for the modulator and promoter are similar to those obtained with Savageau's model (ref. 8, Fig. 3A). However, the wild-type region is now delimited by Xp/Xw and Xr2/Xw. When ω is increased, the Xr2/Xw threshold shifts to the left, enlarging the wild-type region.
Cbl regulation is intimately associated with the hierarchical preference of E. coli for sulphur sources: cysteine > sulphate > sulphonates.27 In the presence of cysteine, the preferred sulphur source, the Cbl-associated regulon is not expressed, because CysB, the major regulator of sulphur utilization, is inactive.
When sulphur is present, N-acetyl-L-serine (NAS) binds to CysB, switching it into its functional holo conformation.28 In the absence of sulphur, the concentration of adenosine 5′-phosphosulphate (APS), the Cbl effector, decreases, so Cbl can regulate its regulon in its functional apo conformation.
Fig. 4d and Fig. S10 (ESI†) show that the wild-type region is delimited by the Xr1/Xw and Xr2/Xw thresholds. When ω increases, the Xr1/Xw and Xr2/Xw thresholds shift to the right and to the left, respectively, narrowing the wild-type region.
Within each of the two modes of regulation, the wild-type regions overlap almost completely, indicating that the apo and holo conformations do not differ in this respect (Fig. 5).
Tables S4–S6 (ESI†) offer an overview of the population areas framed by the TS from Fig. 4 and Fig. S7–S10 (ESI†) after ω variation. They mark the wild-type as well as the realizable favorable (F) and unfavorable (U) single mutant population regions under high demand. The regions not marked represent zones of coexistence of single mutants.
Fig. 6 and Fig. S15 (ESI†) display the influence of parameter changes on the extreme values of the demand. Fig. S16 (ESI†) presents the influence of the parameters on the TS that do not surround the wild-type region.
Each TFC model parameter (Table 1 and Table S3, ESI†) was varied around its nominal value, and its influence on Dmin and Dmax was analyzed (see the ESI† Model description for details).
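One common way to quantify how a quantity such as Dmin responds to a parameter around its nominal value is the logarithmic gain (d ln D / d ln p) used in biochemical systems theory. Whether the authors used exactly this measure is not stated here; the sketch below estimates it by central finite differences, and `toy_dmin` is a hypothetical power-law stand-in, not a TFC model output.

```python
import math
from typing import Callable


def log_gain(f: Callable[[float], float], p0: float, rel_step: float = 1e-4) -> float:
    """Central finite-difference estimate of d ln f / d ln p at p0.

    Requires p0 > 0 and f > 0 in the neighbourhood of p0.
    """
    p_up, p_down = p0 * (1 + rel_step), p0 * (1 - rel_step)
    return (math.log(f(p_up)) - math.log(f(p_down))) / (math.log(p_up) - math.log(p_down))


def toy_dmin(omega: float) -> float:
    """Hypothetical stand-in for Dmin as a function of one parameter."""
    return 0.05 * omega ** 0.5


if __name__ == "__main__":
    s = log_gain(toy_dmin, p0=20.0)
    print(f"log-gain of toy Dmin with respect to omega at 20: {s:.3f}")
    print("Dmin increases with omega" if s > 0 else "Dmin decreases with omega")
```

For a power law the log-gain equals the exponent, which makes the toy function a convenient check; the sign tells whether increasing the parameter moves the threshold up or down the demand axis.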
Mutation rate parameter | Definition
---|---
μ | Reference mutation rate
π | Relative to μ, for loss of a strong promoter with negative control
υ | Relative to μ, for gain of an up-promoter with positive control
τ | Relative to μ, for loss of a regulator's functional target site
ρ | Relative to μ, for loss of the transcription factor DNA binding domain
ω | Relative to μ, for loss of the transcription factor ligand domain
ε | Relative to μ, when expression is increased 100-fold
ψ | Relative to μ, for a 10-fold decrease in μ when the transcription factor interacts with its functional DNA binding domain

Growth rate parameter | Definition
---|---
γ | Reference growth rate in the nutritionally richer of the two environments
δ | Relative to γ, for the more nutritionally deficient of the two environments
λ | Relative to γ, when there is a loss of expression with negative control
λ | Relative to γδ, when there is a loss of expression with positive control
σ | Relative to γ, when there is superfluous expression with positive control
σ | Relative to γδ, when there is superfluous expression with negative control
The sensitivities were analyzed by comparing their effects on the area of the wild-type region. A change that increases the wild-type region is considered advantageous over changes that have no discernible effect or that decrease the wild-type region. If no discernible difference is found, no advantage is assigned to either TF conformation.
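The selection rule above can be written as a small comparison of areas in the (D, C) plane. This is a minimal sketch under stated assumptions: the region predicates, the cycle range `c_max`, the grid size and the tolerance are all hypothetical, and the toy regions are simple demand bands rather than regions computed from the TFC equations.

```python
from typing import Callable

# (D, C) -> True if the wild type is selected at demand D and total cycle C.
Region = Callable[[float, float], bool]


def region_area(in_region: Region, c_max: float = 100.0, n: int = 200) -> float:
    """Midpoint-rule estimate of a region's area for D in [0, 1] and C in [0, c_max]."""
    d_step, c_step = 1.0 / n, c_max / n
    hits = sum(
        in_region((i + 0.5) * d_step, (j + 0.5) * c_step)
        for i in range(n)
        for j in range(n)
    )
    return hits * d_step * c_step


def classify_change(nominal: Region, perturbed: Region, rel_tol: float = 1e-3) -> str:
    """A larger wild-type region after a parameter change counts as an advantage."""
    a0, a1 = region_area(nominal), region_area(perturbed)
    if a1 > a0 * (1 + rel_tol):
        return "advantage"
    if a1 < a0 * (1 - rel_tol):
        return "disadvantage"
    return "no discernible effect"


def nominal_region(d: float, c: float) -> bool:
    return 0.2 < d < 0.8  # hypothetical wild-type band of demand values


def widened_region(d: float, c: float) -> bool:
    return 0.1 < d < 0.8  # e.g. a parameter change that lowers Dmin


if __name__ == "__main__":
    print(classify_change(nominal_region, widened_region))  # advantage
    print(classify_change(nominal_region, nominal_region))  # no discernible effect
```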
Parameter change | Demand extreme | Parameter type | LacI (negative) | TrpR (negative) | MalT (positive) | Cbl (positive)
---|---|---|---|---|---|---
Increase (→) | Dmin | Mutation | π | ω | — | μ, ρ, ω
Increase (→) | Dmin | Growth | — | — | σ, θ | δ, λ
Increase (→) | Dmax | Mutation | — | — | — | μ, υ, ρ, ω, ε
Increase (→) | Dmax | Growth | — | — | λ, θ | δ, σ
Decrease (←) | Dmin | Mutation | ω | π | μ, ρ, ω | —
Decrease (←) | Dmin | Growth | — | — | δ, λ | σ, θ
Decrease (←) | Dmax | Mutation | — | — | μ, υ, ρ, ω, ε | —
Decrease (←) | Dmax | Growth | — | — | δ, σ | λ, θ
For the negative mode of regulation, LacI and TrpR show mirror-image advantages for π and ω, both on the Dmin side of the demand (Table 2). However, because there is no significant room to further enlarge the wild-type region from the Dmin side, these advantages have no practical relevance, even if they are theoretically possible (see Fig. 4a, b, 5a, b and Fig. S7, S8, ESI†).
On the whole, from the point of view of the parameter sensitivities, the apo and holo conformations are both well adapted in the negative mode of regulation, at least when other factors that could bias the advantages are not taken into account. Examples might include mechanisms not included in the model, such as TrpR attenuation29,30 or autoregulation.13,24,31
Globally, the parameters with advantages are equally distributed between the two conformations, with 16 cases each (first row of Table S10, ESI†). In addition, Table S10 (ESI†) shows that the advantages are equally distributed after grouping with respect to the extremes of the demand or according to the mutation and growth parameters.
Marked differences are evident only when the parameters are grouped according to the increase or decrease in their nominal parameter values (Table 2 and Table S10, ESI†). This includes a bias for the apo conformation when the parameters increase (12 of 16) and for the holo conformation when they decrease (12 of 16).
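The tallies quoted above can be reproduced directly from the positive-mode (MalT and Cbl) columns of Table 2, counting each parameter listed in a cell as one advantage. The dictionary layout and function name are illustrative choices, not part of the model.

```python
# Positive-mode columns of Table 2: (parameter change, demand extreme, type) -> parameters
MALT_HOLO = {
    ("increase", "Dmin", "growth"): ["σ", "θ"],
    ("increase", "Dmax", "growth"): ["λ", "θ"],
    ("decrease", "Dmin", "mutation"): ["μ", "ρ", "ω"],
    ("decrease", "Dmin", "growth"): ["δ", "λ"],
    ("decrease", "Dmax", "mutation"): ["μ", "υ", "ρ", "ω", "ε"],
    ("decrease", "Dmax", "growth"): ["δ", "σ"],
}
CBL_APO = {
    ("increase", "Dmin", "mutation"): ["μ", "ρ", "ω"],
    ("increase", "Dmin", "growth"): ["δ", "λ"],
    ("increase", "Dmax", "mutation"): ["μ", "υ", "ρ", "ω", "ε"],
    ("increase", "Dmax", "growth"): ["δ", "σ"],
    ("decrease", "Dmin", "growth"): ["σ", "θ"],
    ("decrease", "Dmax", "growth"): ["λ", "θ"],
}


def count_advantages(table, change=None):
    """Count parameter advantages, optionally restricted to 'increase' or 'decrease'."""
    return sum(
        len(params)
        for (chg, _extreme, _kind), params in table.items()
        if change is None or chg == change
    )


if __name__ == "__main__":
    for change in (None, "increase", "decrease"):
        apo, holo = count_advantages(CBL_APO, change), count_advantages(MALT_HOLO, change)
        label = change or "all"
        print(f"{label:>8}: Cbl (apo) {apo}, MalT (holo) {holo}, of {apo + holo}")
```

The output gives 16 advantages for each conformation overall, 12 of 16 for Cbl when parameters increase and 12 of 16 for MalT when they decrease.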
The classification in Table 2 provides a clearer view of the advantages by grouping them by demand extreme within the parameters that increase or decrease from their basal values.
It is important to note that the MalT and Cbl wild-type areas almost completely cover the upper extreme of the demand, leaving no practical room for further increase (Fig. 5c and d). Thus, parameters with Dmax advantages, though mathematically feasible, do not offer realistic advantages and are therefore not analyzed here.
In Table 2, the Dmin extreme of the demand shows a bias for MalT advantages when the parameters decrease from their nominal values, involving three mutation and two growth parameters. The mutation parameters are the reference mutation rate (μ), the loss of the transcription factor DNA-binding domain (ρ) and the loss of the transcription factor ligand domain (ω). The growth parameters are the growth rate in the more nutritionally deficient of the two environments (δ) and the loss of expression with positive control (λ).
By contrast, the Dmin advantages when the parameters increase their nominal value show a bias for Cbl with the same mutation (μ, ρ, ω) and growth (δ, λ) parameters.
Table 2 shows that Cbl presents advantages in the growth parameters delta (δ) and lambda (λ) when the parameters increase their nominal value. For MalT, the growth parameters with advantages are sigma (σ) and theta (θ). These Cbl and MalT parameter results are reversed when their nominal value is decreased.
The individual analysis of the parameters in Table S7 (ESI†) highlights the advantage of Cbl under stress conditions in which the basal mutation rate mu (μ) increases. Cbl also presents an advantage when omega (ω) increases, reflecting better adaptation or flexibility of the apo conformation, compared with the holo conformation, to mutations in the DNA region coding for the effector binding site of the TF. In addition, Cbl copes better than MalT with mutations that increase rho (ρ), the rate of mutations at the TF site that interacts with the DNA (Table 1).
The selection criterion theta (θ) represents the minimal fraction to which a mutant population can decrease with respect to the wild type before it disappears in a given environment.32 A low value of θ indicates better adaptation under extreme conditions. Table S9 (ESI†) shows that decreasing θ is advantageous for Cbl over MalT.
In summary, the individual analyses of the parameter sensitivities indicate that the Cbl apo conformation is better adapted to stress situations in which mutation rates are likely to increase and the selection criterion theta (θ) to decrease.
Two parameters, gamma (γ) and psi (ψ), have no influence in any of the cases (Fig. S15h and S16i, ESI†). The parameter γ represents the reference growth rate in the nutritionally richer of the two environments (Table 1).
The parameter ψ represents the decrease in the basal mutation rate when the TF interacts with the DNA binding site (Table 1). Fig. S15g and S16h (ESI†) reveal no sensitivity to changes in ψ around its nominal value. However, Fig. S16h (ESI†) shows that a 20-fold and a 40-fold increase in the nominal value, for the negative and positive modes of regulation, respectively, produce an abrupt decrease in the sensitivities of the modulator threshold of selection. In addition, simulations (not shown) reproduce these abrupt sensitivity changes around the nominal value if the basal mutation rate (μ) is increased 100-fold. These simulations indicate that ψ can become an important parameter affecting the boundaries delimited by Xm/Xw in stress situations in which the basal mutation rate is increased (e.g. under heat shock, starvation or oxidative stress).
From an evolutionary standpoint, the results indicate that the positive apo conformation (Cbl) has been under selective pressure, likely owing to the particular stress of sulfate limitation in the distal digestive tract. By contrast, the positive holo conformation (MalT) is better adapted to the “normal” conditions that E. coli more frequently faces in the colon.
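A toy reading of the θ criterion may help fix ideas. The sketch below is not the TFC model: it ignores mutation input and the coupled population equations, and simply follows the mutant-to-wild-type ratio over one cycle of length C, a fraction D of which is spent in high demand, declaring the mutant lost once the ratio falls to θ times its starting value. All rates and values are hypothetical.

```python
import math


def ratio_after_cycle(r0, dg_high, dg_low, demand, cycle):
    """Mutant/wild-type ratio after one cycle, assuming exponential growth.

    dg_high, dg_low: mutant minus wild-type growth rate in the high- and
    low-demand environments; demand: fraction D of the cycle C spent in high demand.
    """
    return r0 * math.exp(dg_high * demand * cycle + dg_low * (1 - demand) * cycle)


def mutant_lost(r0, dg_high, dg_low, demand, cycle, theta):
    """Toy criterion: the mutant is lost once its relative abundance drops to theta * r0."""
    return ratio_after_cycle(r0, dg_high, dg_low, demand, cycle) <= theta * r0


if __name__ == "__main__":
    # Hypothetical mutant growing 0.1 per unit time slower in both environments.
    args = dict(r0=1.0, dg_high=-0.1, dg_low=-0.1, demand=0.5, cycle=30.0)
    print(ratio_after_cycle(**args))       # exp(-3), about 0.0498
    print(mutant_lost(**args, theta=0.1))  # True
    print(mutant_lost(**args, theta=0.01))  # False
```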
With the exception of LacI, where the Dmin threshold of selection changes from Xp/Xw (when ω = 1) to Xr1/Xw (when ω = 20 and 40), the rest of the TFs analyzed maintain the same TS boundaries for the wild-type region along the different ω values studied (Fig. 4 and 5 and Fig. S7–S10, ESI†).
In Fig. 4, 5 and Fig. S7–S10 (ESI†), it can be seen that the Xm/Xw TS are never part of the boundary limits for the wild-type population in either mode of regulation. Rather, Xp/Xw is frequently the wild-type lower limit of the demand. In many cases, at least one of the TS enclosing the wild-type regions corresponds to Xr1/Xw or Xr2/Xw.
As expected, the LacI and MalT promoter and modulator TS presented in Savageau's model8 have shapes similar to those obtained with the TFC model, although slight differences can be observed in the extent of the wild-type selection region. These differences arise from the more detailed description of regulation, in particular the dissection of the TF into two sectors, r1 and r2.
The parameter advantages for the positive mode of regulation are biologically realizable from the Dmin side (see Fig. 5c and d), which indicates that the organism can deal well with mutations related to short periods of high demand. The reverse is true for negative regulation, which is better adapted to longer periods of high demand (Dmax); in this case, the parameter sensitivities do not exhibit a bias for either transcriptional configuration (Table 2), in accordance with the more balanced frequencies reported in ref. 1. The selection of one or the other transcriptional mechanism is probably driven by other selective factors.
Exploratory studies of the six LacI double mutants (not shown) produced a range of different TS whose common feature was that they lie at low values of the total life cycle (C). These results would indicate better adaptation of these mutants to longer total life cycles or, in other words, a predominance of the wild type for shorter life cycles.
In principle, the presence of sulphur nutrients in the colon21 contradicts our model assumption that Cbl should be in its functional apo conformation in that later section of the intestine. A possible justification for this assumption is that E. coli could face starvation for inorganic sulphur during the period spent in the distal region of the intestine, as a consequence of competition for the element with sulfate-reducing bacteria in the large intestine33 (see the delta (δ) assignment for Cbl in Table S2, ESI†). In this highly competitive environment, cysteine and sulphate could be effectively unavailable to E. coli (or available only with low scavenging capacity). This would force the organism to use other sulphur sources such as taurine, which is found in high concentrations in the colon, where it is key for conjugating bile acids, or sulphonates, whose assimilation and catabolism into sulfite are activated by Cbl in its active apo conformation. Situations favoring the Cbl apo conformation could probably also occur in unpredictable sulphate-limited conditions outside the host.
In conclusion, the results presented here furnish evolutionary arguments favoring the holo conformation over the apo conformation in the positive mode of control, consistent with recent reports.1 In addition, the unbiased distribution of apo and holo frequencies observed for negative control is consistent with the absence of a preference in the model parameter sensitivities for the two TF conformations studied.
A better comprehension of the apo and holo transcriptional regulation connected to an organism's life cycle is fundamental for improving the design of “à la carte” bacteria that may not be as robust as the wild type,34 but will offer specific fitness advantages of human interest. In this respect, there is evidence in the literature for E. coli systems built on the basis of a deep understanding of the transcriptional regulation mechanisms.35
The TFC model consists of a set of binary S-system equations (eqn (S20)–(S30)) and can be log-transformed into linear equations allowing for reverse engineering with classic linear optimization techniques for the design of mutants able to grow in the demand and total cycle ranges of human interest.36 This technique promises to rationalize the search for mutants able to live during a given period of time and under certain environmental conditions from a universe of bacteria with different modes of transcriptional regulation.
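The log-linearization mentioned above is a general property of S-systems: for dX_i/dt = α_i ∏_j X_j^g_ij − β_i ∏_j X_j^h_ij, the steady-state condition becomes linear in y_j = ln X_j, namely Σ_j (g_ij − h_ij) y_j = ln(β_i/α_i). The sketch below solves that linear system for a hypothetical two-variable S-system; it does not reproduce eqn (S20)–(S30) or the authors' optimization-based reverse engineering, and the small Gaussian-elimination solver is included only to keep the example dependency-free.

```python
import math


def solve_linear(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: no unique steady state")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for r in reversed(range(n)):
        y[r] = (m[r][n] - sum(m[r][c] * y[c] for c in range(r + 1, n))) / m[r][r]
    return y


def s_system_steady_state(alpha, beta, g, h):
    """Steady state of dX_i/dt = alpha_i*prod X_j^g_ij - beta_i*prod X_j^h_ij.

    In log space the condition is sum_j (g_ij - h_ij) * y_j = ln(beta_i / alpha_i).
    """
    n = len(alpha)
    a = [[g[i][j] - h[i][j] for j in range(n)] for i in range(n)]
    b = [math.log(beta[i] / alpha[i]) for i in range(n)]
    return [math.exp(y) for y in solve_linear(a, b)]


if __name__ == "__main__":
    # Hypothetical example: dX1/dt = 2 - X1, dX2/dt = X1 - 0.5*X2
    alpha, beta = [2.0, 1.0], [1.0, 0.5]
    g = [[0.0, 0.0], [1.0, 0.0]]
    h = [[1.0, 0.0], [0.0, 1.0]]
    print(s_system_steady_state(alpha, beta, g, h))  # approximately [2.0, 4.0]
```

Because the steady state is the solution of a linear system in log space, design targets on the steady state translate into linear constraints, which is what makes linear optimization applicable.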
Footnotes
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c4mb00561a
‡ Current address: Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City, Mexico.
This journal is © The Royal Society of Chemistry 2015