Haoran Zhang, Brett A. Boghigian, John Armando and Blaine A. Pfeifer*
Department of Chemical & Biological Engineering, Science & Technology Center, Tufts University, 4 Colby Street, Medford, MA 02155, USA. E-mail: blaine.pfeifer@tufts.edu
First published on 9th November 2010
Covering: 1990 to 2010
Heterologous biosynthesis has emerged as a viable route to access the beneficial properties of natural products. This development owes primarily to the difficulties encountered with traditional methods of natural product discovery and production. However, heterologous biosynthesis presents challenges of its own, which have prompted an array of creative solutions and noteworthy success stories. In this review, we present a historical perspective together with an analysis of the experimental approaches used thus far to address the unique issues associated with heterologous natural product biosynthesis.
Haoran Zhang is completing his doctorate at Tufts University after receiving his B.S. and M.S. degrees from Xiamen University. His research at Tufts has focused on establishing and optimizing the biosynthesis of erythromycin A in E. coli. He has also worked to produce marine natural products heterologously using E. coli as a surrogate host.
Brett Boghigian is completing his doctoral research at Tufts University, where he also earned his B.S. and M.S. degrees. His research has focused on applying metabolic and process engineering to the heterologous production of complex natural products. Specifically, he has combined computational and experimental approaches to rationally improve the heterologous production titers of both polyketide and isoprenoid natural products in E. coli.
John Armando is completing a dual B.S./M.S. degree in chemical and biological engineering at Tufts. His research focuses on further expanding the metabolic capabilities of heterologous host systems to better accommodate complex natural product pathways.
Blaine Pfeifer received his bachelor's degree in chemical engineering from Colorado State University in 1997. He began graduate work the following year under the direction of Chaitan Khosla at Stanford University, where his doctoral thesis focused on the production of complex polyketide and nonribosomal peptide compounds using E. coli as a heterologous host. After receiving his PhD in 2002 and completing a postdoctoral position at MIT, he joined the chemical and biological engineering department at Tufts University as an assistant professor. Since then, his laboratory has continued to work on problems related to heterologous natural product biosynthesis, with particular focus on successful gene transfer to, and metabolic engineering of, recipient hosts.
Fig. 1 Established therapeutic natural products in three major classes: (a) polyketides, represented by erythromycin, daunorubicin, and lovastatin; (b) nonribosomal peptides, represented by vancomycin; and (c) isoprenoids, represented by Taxol.
Besides providing a rich environment for genetic and biochemical research, the key attraction of natural product biosynthesis is the impressive range of therapeutic value associated with the final compounds. This is partially captured in Fig. 1, but the compounds featured are only a fraction of those that have been instrumental in transforming modern medicine. For example, an analysis of small-molecule drugs introduced into the clinic from 1981 to 2006 indicates that 76% of antibacterials and 78% of anticancer agents were natural products or derived from natural products.10 Beyond antibacterial and anticancer treatment, natural products have also found applications as cholesterol-lowering, antimalarial, antiparasitic, and antifungal agents.
The therapeutic potential of natural products has also attracted applied scientists and engineers eager to access their medicinal value, and this same interest drove significant industrial investment in natural product programs. As a result, since the 1940s, a community of scientists and engineers has worked to understand and overproduce clinically relevant natural products. This review looks at the emerging frontiers of natural product biosynthesis, with a particular focus on heterologous biosynthesis. Specifically, it details the steps required to transfer the responsible genetic material to a given host and establish natural product formation (Fig. 2), with special emphasis on guiding methodology and the experimental tools currently available for this task. Several excellent reviews provide similar summaries of heterologous biosynthesis;11–14 here, we intend to update and expand the analysis across additional natural product classes and host systems. For brevity, however, the review is restricted to a select number of hosts and natural product systems and excludes certain natural product types (such as flavonoids, stilbenes, alkaloids, carotenoids, and others) and hosts (such as plant systems, myxobacteria, and others). We direct the reader to other reviews covering heterologous biosynthesis of these compounds and in these hosts.15–23 In addition, although they are mentioned peripherally, the review does not comprehensively address natural product screening; compound chemical characterization; host isolation; gene cluster identification, sequencing, analysis, or synthesis; or the metabolic and process engineering needed to optimize heterologous production once achieved.
Fig. 2 (a) A comparison of typical characteristics of native and heterologous hosts. (b) A generalized workflow for establishing natural product biosynthesis in a heterologous host. First, a native host is identified as a producer of a natural product of interest. Second, the genes responsible for producing the product are identified, isolated, and introduced into a heterologous host, either on an artificial replicon or within the host's own chromosome(s). Third, expression of the heterologous genes is initiated, producing soluble protein and the product of interest. Last, the host and process are scaled up and/or further modified for large-scale production.
From the time the earliest natural products were isolated, there was practical interest in extending their medicinal properties to widespread use. The most famous example is penicillin.29 Spurred in large part by the need to treat the wounded of World War II, the successful mass production of penicillin became a blueprint for similar efforts to capitalize on the therapeutic properties of other natural products as they emerged. The process began with natural product discovery, which usually involved an extensive effort to screen environmental samples for a desired bioactivity. Because antibacterial assays were (and still are) simple to implement, this class of natural products emerged first. As other screens and assays were developed, additional bioactivities were identified, often alongside the antibiotic properties of a given natural product. Once a promising natural product and its accompanying host were identified, production had to be scaled for eventual application. Arguably, this need was an early harbinger of the heterologous biosynthesis concept, because the native producers of many therapeutically relevant natural products were not process-friendly. For example, native hosts typically exhibited slow growth rates, fastidious nutrient requirements, and low production titers that made mass production costly in both time and money. Nonetheless, perhaps because molecular biology protocols were limited at this point in time, process development efforts were built around the native host.
In parallel, strain improvement programs were dedicated to increasing titers from the native producer, relying primarily on mutagenesis and screening.30 The result was a gradual improvement in natural product production, and the success of penicillin and other early compounds ushered in a “golden age” for natural products (approximately 1940–1960) that featured both industrial and academic commitment to the field.
Over time, however, natural product discovery became progressively harder.31,32 In particular, it became more difficult to isolate new natural products possessing new forms of bioactivity.33 This situation prompted a number of changes in academic and industrial perspectives. From an industrial viewpoint, the lack of readily available new natural products, coupled with the substantial time, effort, and cost of environmental screens, made continued research in this direction difficult to justify.34–37 At the same time, the continued use of existing antibacterial natural products was inevitably selecting for resistance mechanisms in target bacterial populations.38–40 Therefore, from a global perspective, there is a continual need for new natural products, yet industrial interest in initiating this search is waning. Nevertheless, there remain both basic and applied reasons for continuing the search for new natural products, and although several strategies have been pursued towards this goal, one of the simplest is to expand the search for new compounds into new environments.
A recent frontier in the search for new natural products has been the marine environment.38,41–46 For example, the marine actinomycete Salinispora tropica has received particular interest, both because approximately 10% of its >5 Mb genome is dedicated to secondary metabolism and because it produces the potent anticancer hybrid nonribosomal peptide-polyketide salinosporamide A.47–49 The cited reviews summarize and testify to the potential of this environment as a source of structurally and medicinally diverse new natural products. With this search, however, comes a range of new native host systems that pose even greater challenges for process scale-up and development. In many cases, compounds have been isolated from higher-order organisms (ascidians, sponges, mollusks, etc.) in which symbiotic microorganisms may or may not be the actual producers.50 Further, the references above point to the need for growth medium components that replicate the marine environment, highlighting the familiar challenge of satisfying the fastidious nature of many natural product producers.
Although there are several reasons for the emergence of heterologous natural product biosynthesis, the challenges detailed above provided much of the initial impetus. Producing a medicinally relevant natural product in a host that offers technical convenience for process development was considered an ideal way to access the potential of natural products. Of course, as detailed below, this approach has its own challenges, but the promise of natural products provides the motivation to overcome them.
To begin, one must recognize the origin, type, and complexity of the natural product of interest. First, it is important to consider the native host system and its associated biology, which serves as the first heuristic in choosing a heterologous host. For example, a natural product derived from a eukaryotic system will most likely benefit from reconstituted production in a heterologous eukaryotic host, on the logic that similarities in gene expression and cellular environment would better suit the pathway being reconstituted. The same logic applies to prokaryotic systems, where differences between Gram-positive and Gram-negative native and heterologous hosts may affect eventual heterologous production. Second, one needs to assess the genetic transfer techniques available for the chosen heterologous host. Although numerous methods and protocols have been developed to facilitate the introduction of foreign DNA into heterologous systems, host-specific technical barriers still exist, especially given the distinctive features common to natural product gene clusters.
Finally, beyond biological similarity between native and heterologous hosts and the availability of genetic transfer protocols, it is important to understand what is known about the candidate heterologous host's native metabolism. If the candidate host already produces a natural product, one can have some confidence that it will be able to support the metabolic requirements of a foreign natural product biosynthetic scheme. In addition, many common heterologous hosts have now been sequenced, and this information also provides clues about their native capacity to support heterologous biosynthesis. At the same time, an overabundance of native natural product pathways within a heterologous host may cause unwanted “crosstalk” with the pathway of interest: biosynthetic precursors may be sub-optimally supplied to the desired heterologous pathway or, alternatively, natively produced compounds may complicate heterologous product analysis. These criteria will be further emphasized in the specific examples below, which illustrate attempts at, and successes with, heterologous natural product biosynthesis.
Because the actinomycetes (and particularly the Streptomyces genus) were recognized as a prolific source of therapeutic compounds, early heterologous host choices included Streptomyces coelicolor and Streptomyces lividans. Largely through the efforts of Sir David Hopwood and colleagues at the John Innes Centre, these hosts had been extensively characterized, including the development of a set of recombinant DNA techniques to facilitate the steps needed for eventual heterologous gene transfer. Notably, many of the technical and scientific advances in heterologous natural product biosynthesis have coincided with general advances in molecular biology protocols and recombinant DNA technology. In this case, the emerging recombinant DNA technology no doubt helped establish S. coelicolor as a potential heterologous host. The same technical developments also contributed to the second requirement for heterologous biosynthesis: as DNA identification and sequencing technology progressed, it became increasingly feasible to identify the gene sequences responsible for natural product biosynthesis. Fortuitously, many natural product pathways were found to be clustered within the chromosomes of native producers. This is particularly true for polyketide and nonribosomal peptide products, and the clusters typically feature genes responsible for polyketide or nonribosomal peptide biosynthesis, product tailoring (e.g., sugar biosynthesis and glycosyl transfer), regulation, and self-resistance (i.e., antibiotic protection). The availability of recombinant protocols and gene sequence information for early natural product clusters allowed Streptomyces-based hosts to emerge as early candidates for heterologous biosynthesis.
There is an additional subtlety in the choice of Streptomyces host systems and their ability to support heterologous natural product biosynthesis. This distinction applied particularly to polyketide natural products traditionally associated with actinomycetes, although, as introduced above, the point also extends to other natural product classes. Specifically, the new host must also supply the newly introduced biosynthetic pathway with the intracellular substrates required for final product formation. The requirements for a heterologous host must therefore be amended: for successful heterologous biosynthesis, one must have available the sequence information and/or genetic material for a particular natural product and a well-characterized host with (1) associated recombinant DNA tools and (2) the metabolic capability to support the biosynthetic process.
In principle, successful transfer and reconstituted biosynthesis would eliminate the need to develop a new production process for every new natural product; instead, knowledge of the new host would allow more consistent process development. Through gene cluster identification and more basic biochemical analysis, models for natural product biosynthesis began to emerge. As detailed briefly above, these models vary with the natural product class in question, and the details of each biosynthetic process affect how readily it can be reconstituted in a heterologous host. For example, polyketide and nonribosomal peptide biosynthesis is often associated with megasynthase enzymes that can easily exceed 300 kDa in size and require post-translational modification. Moreover, these enzymes can associate into higher-order superstructures that often exceed 1 MDa. In addition, biosynthesis of the pharmacologically active polyketide, nonribosomal peptide, or isoprenoid product may require the coordinated activity of several to >20 enzymes. A heterologous host must therefore accommodate both the size of individual genes (especially those encoding multi-domain megasynthases) and the total number of genes required for biosynthesis. Furthermore, as before, the host must support the metabolic and enzymatic requirements of the newly introduced pathway. Parenthetically, as details of complex natural product biosynthesis were determined, it became clear that final compound structure could be altered by modifying the genes and proteins responsible for biosynthesis. Hence, another rationale for heterologous biosynthesis was to control the biosynthetic process in order to diversify natural product structure using the molecular biology tools available for the new host.51,52
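To put these protein sizes in genetic terms, a protein mass can be converted into an approximate coding-sequence length. The sketch below is illustrative only: it assumes an average amino-acid residue mass of about 110 Da (a common rule of thumb, not a value from this review), counts three nucleotides per residue plus one stop codon, and ignores introns and untranslated regions. The helper `estimated_gene_length_bp` is hypothetical.

```python
# Assumption: ~110 Da per amino-acid residue (rule of thumb, not from the review).
AVG_RESIDUE_DA = 110.0


def estimated_gene_length_bp(protein_kda: float,
                             avg_residue_da: float = AVG_RESIDUE_DA) -> int:
    """Estimate coding-sequence length (bp) for a protein of a given mass.

    residues ~= mass / average residue mass; each residue needs one
    3-nt codon, and one extra 3-nt stop codon ends the reading frame.
    """
    residues = round(protein_kda * 1000 / avg_residue_da)
    return 3 * residues + 3


if __name__ == "__main__":
    # 190 kDa matches the 6-MSAS size quoted in the text; 300 kDa is the
    # megasynthase size mentioned above.
    for name, kda in [("6-MSAS", 190), ("300 kDa megasynthase", 300)]:
        print(name, estimated_gene_length_bp(kda))
```

By this estimate, a 300 kDa megasynthase corresponds to roughly 8 kb of coding sequence, so genes for the larger synthases readily reach the >10 kb sizes that complicate routine cloning.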
However, one cannot overlook the advances in molecular biology and gene expression that have accompanied the use of these new hosts. The numbers of attempted and successful heterologous biosynthesis efforts in these hosts are growing, thanks in part to the increasing number of gene expression parameters (promoters, operators, plasmids, and strains) available to address some of the questions raised in the last paragraph. In addition, the speed at which DNA can be sequenced and synthesized is further expediting heterologous biosynthesis, first by rapidly identifying promising gene clusters and second by allowing custom design of genetic content for subsequent transfer to the host of choice. Such tools are intrinsically tied to well-characterized strains, and this, in part, can be traced back to a simple advantage these new hosts possess: rapid growth kinetics. As a result, the previously daunting challenges raised in the last paragraph are becoming more manageable as the genetic and metabolic engineering tools for these new hosts continue to improve.
Before specific examples are presented, we provide an overview of common gene transfer approaches and the options available for transferring gene clusters between native and heterologous host systems. At the outset, every heterologous biosynthesis attempt faces the challenge of genetic transfer. The complexity of natural product pathways presents two primary obstacles at this step: (1) the number of genes needed for complete biosynthesis and (2) the size of the individual genes. These constraints complicate the common molecular biology strategies used for heterologous gene transfer and have prompted alternative approaches and strategies.
As mentioned, improvements in genetic pathway identification, DNA sequencing, and DNA synthesis have provided a number of relevant natural product pathways, with many more anticipated. For example, sequencing of Saccharopolyspora erythraea and Streptomyces avermitilis revealed the expected gene clusters for erythromycin and avermectin, in addition to >20 other clusters per organism putatively capable of producing secondary metabolites.53,54 Once a cluster is identified, a plan must be formulated for its transfer to a heterologous host. Traditional molecular biology protocols would suggest using cloning or expression plasmids for this purpose. However, when individual genes are >10 kb or total genetic material is >50 kb, common steps (ligations, transformations, etc.) or plasmid stability may become problematic. One must therefore consider non-traditional uses of familiar plasmids or altogether different options.
The dilemma of stably cloning, transferring, and maintaining large genes or gene clusters has previously been addressed through the use of specialized plasmids capable of harboring large DNA fragments. Simpler options include cosmid or fosmid vectors.55,56 In some cases, conventional cloning vectors have been redesigned to stably maintain up to 300 kb of foreign DNA.57 Alternatively, P1-derived artificial chromosomes (PACs), bacterial artificial chromosomes (BACs), or yeast artificial chromosomes (YACs) have demonstrated the capability of stably maintaining 100–300 kb genetic inserts, sizes similar to many natural product gene cluster lengths.58–61 Many of these vectors have been developed for E. coli; however, they have also seen utility within other heterologous biosynthetic efforts and will be highlighted, where appropriate, in the specific examples below.
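The size considerations above can be expressed as a simple triage rule. The sketch below is a hypothetical heuristic, not a published decision procedure: it applies the thresholds quoted in the text (individual genes >10 kb, total clusters >50 kb, and large-insert vector capacities of roughly 100–300 kb) to a list of gene lengths. The `Gene` class, the `triage_cluster` function, the advice strings, and the example gene names are all illustrative.

```python
from dataclasses import dataclass

# Thresholds taken from the surrounding text; treat them as rough guides only.
LARGE_GENE_BP = 10_000         # single genes above this strain routine cloning
LARGE_CLUSTER_BP = 50_000      # total inserts above this strain ordinary plasmids
LARGE_INSERT_MAX_BP = 300_000  # approximate upper capacity of BAC/PAC/YAC vectors


@dataclass
class Gene:
    name: str
    length_bp: int


def triage_cluster(genes: list[Gene]) -> dict:
    """Flag gene-transfer concerns for a cluster (hypothetical heuristic)."""
    total_bp = sum(g.length_bp for g in genes)
    oversized = [g.name for g in genes if g.length_bp > LARGE_GENE_BP]
    if total_bp > LARGE_INSERT_MAX_BP:
        advice = "exceeds a single large-insert vector; divide the cluster"
    elif total_bp > LARGE_CLUSTER_BP or oversized:
        advice = "non-routine: consider large-insert vectors or chromosomal integration"
    else:
        advice = "routine cloning or expression plasmids may suffice"
    return {"total_bp": total_bp, "oversized_genes": oversized, "advice": advice}


if __name__ == "__main__":
    # Hypothetical cluster: two large synthase genes plus two tailoring genes.
    cluster = [
        Gene("pksA", 11_000),
        Gene("pksB", 9_500),
        Gene("tailor1", 1_200),
        Gene("tailor2", 900),
    ]
    print(triage_cluster(cluster))
```

Here a single oversized gene is enough to flag the cluster as non-routine, even though its 22.6 kb total is well under the 50 kb threshold, reflecting the two separate obstacles (gene size and total gene number) described above.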
Another option for harboring natural product gene clusters within a heterologous host is within the host's own chromosome. Both well-characterized and sequenced prokaryotic systems and more complicated eukaryotic hosts have been used in this context. This alternative would appear to present a stable final location for heterologous natural product gene clusters; however, regardless of the final option, challenges in gene transfer and localization will accompany the large gene clusters associated with most complex natural products.
The final plasmid or chromosomal locations must be accompanied by tools to enact genetic transfer. Here again, the starting point is the traditional in vitro methodology associated with standard molecular biology. These tools have been continually refined since their introduction in the mid-1970s and advances now allow for rapid manipulation of DNA, including impressive steps in PCR amplification that allow the direct isolation of genes >10 kb and a range of mutational techniques. There have also been a number of systems developed for plasmid-based insertion into the chromosomes of target heterologous hosts.62–65 These are just a few examples of the methods available, and others will be featured in specific cases below. In addition, two powerful advances that are expected to play a significant role in the design and transfer of natural product gene clusters to heterologous hosts include the capability to synthesize specific genes and the use of PCR-based recombination techniques. Efficient and inexpensive gene synthesis is now allowing the directed production of genes and DNA elements of impressive length, perhaps best highlighted by work at the J. Craig Venter Institute on reconstructing the Mycoplasma mycoides genome.66 Supporting these efforts are specialty academic institutes and private companies capable of supplying rapidly sequenced or precisely-designed synthesized DNA to customers. The situation allows one to quickly move from natural product native host to genetic material for heterologous transfer while simultaneously allowing for a higher degree of engineering in gene cluster design. Likewise, precise recombination techniques have offered new options in site-specific transfer of heterologous gene clusters.67,68 These and other methods will now be presented in the context of the various natural product classes and heterologous host systems and within case-specific examples.
By 1990, protocols existed for transferring foreign genes into S. coelicolor. Of particular utility was a method based on protoplast transformation, although conjugation methods also emerged.74–76 In addition, several plasmids were developed for use with protoplast transformation.77–79 These plasmids were derived from those found naturally in S. coelicolor or S. lividans and have been modified over time to allow finer control in heterologous expression attempts. As a result, newly deduced polyketide gene clusters could now be readily packaged and transferred to a well-characterized S. coelicolor heterologous host.
Heterologous transfer would then allow the expanding tools and knowledge available for S. coelicolor to aid in understanding or accessing the desired natural product. It should be noted, however, that most transfers were from other Streptomyces hosts. S. coelicolor therefore had an intrinsic advantage in accepting similarly GC-rich genetic material (most likely sharing any codon bias), supporting gene expression and post-translational processing, and supplying the required substrates (primarily acyl-CoA compounds) for biosynthesis. Considering the enormous number of biologically active polyketides derived from Streptomyces, choosing a member of this genus as an early heterologous host was only logical.
The last two decades have seen a steady output of polyketides using S. coelicolor as a heterologous host. These examples have fully utilized and expanded upon the recombinant tools available for heterologous production. Efforts have relied on either cosmids or expression vectors for gene transfer. Cosmid-based transfer takes advantage of similarities in gene expression between the native and S. coelicolor hosts; this was particularly apparent in one example in which heterologous biosynthesis was observed in S. coelicolor but not S. lividans, presumably because the original host shared a closer gene expression relationship with S. coelicolor.80 Both low- and high-copy plasmids have been used together with actI, mel (constitutive), aphI (constitutive), ermE (constitutive), tipA (thiostrepton-inducible), PnitA, gylP1/P2 (glycerol-inducible), and tcp830 (tetracycline-inducible) promoter systems.81–98 To reduce potential intra- or inter-plasmid recombination (especially for large PKS or NRPS genes containing homologous sequences), codon optimization and alternative gene subunit complementation strategies were successfully used in S. coelicolor.99 Further means of influencing gene dosage and transcript levels included a helper plasmid capable of increasing the copy number of the accompanying expression plasmid, as well as specific integrative designs that often used the ΦC31 phage integration system.100–104 Integrative methods have also been favored because self-replicating plasmids may inhibit biosynthesis; such plasmids may also require antibiotic selection throughout process development, which becomes prohibitively costly at large scale for any host relying on selectable plasmids.105 Also, owing to the often sizable gene clusters associated with natural product biosynthetic pathways, BAC plasmids have been designed for pathway capture and heterologous transfer.106–109 A recent study extended earlier work by using a copy-up BAC plasmid to aid pathway construction and conjugal integration at the less common ΦBT1 attB site, which can be used alongside ΦC31-based integration.65,110,111 As a result, an impressive number of heterologous polyketide products have been produced, including 6-deoxyerythronolide B, granaticin, epothilones A and B, medermycin, 6-methylsalicylic acid, novobiocin, and several type II polyketide products.112–120 Additional details will be included within the erythromycin case study presented below.
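Because recombination between homologous stretches can destabilize large PKS or NRPS constructs, a designer might first scan a construct for exact repeats. The sketch below is a simplified illustration, not a tool from the cited work: it reports every k-mer that occurs at more than one position in one sequence, or that is shared between two sequences (for example, two co-resident plasmids). The window length `k` is an arbitrary assumption, only direct repeats are detected (inverted repeats are ignored), and the function names are hypothetical.

```python
from collections import defaultdict


def repeated_kmers(seq: str, k: int) -> dict[str, list[int]]:
    """Return k-mers found at two or more positions in seq (0-based starts)."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    positions: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


def shared_kmers(a: str, b: str, k: int) -> set[str]:
    """Return k-mers present in both sequences (e.g., two co-resident plasmids)."""
    def kmers(s: str) -> set[str]:
        s = s.upper()
        return {s[i:i + k] for i in range(len(s) - k + 1)}
    return kmers(a) & kmers(b)


if __name__ == "__main__":
    # Toy construct: the same 10-nt stretch appears twice, separated by a spacer.
    construct = "ATGCGTACGT" + "TTTT" + "ATGCGTACGT"
    print(repeated_kmers(construct, 10))
```

Synonymous codon changes, as in the codon optimization described above, are one way to break up such identical stretches without altering the encoded protein.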
Given the critical role E. coli played in establishing modern molecular biology protocols, there is reason to believe that researchers considered E. coli as an early potential candidate host in natural product heterologous biosynthesis. However, there are of course marked differences in cell morphology, physiology, metabolism, and regulatory mechanisms between E. coli and the actinomycetes responsible for the majority of polyketide compounds. As such, any advantages E. coli possessed relative to host options like S. coelicolor or S. lividans were quickly diminished when viewed from this more global perspective.
Nonetheless, the allure of establishing heterologous polyketide production in E. coli was strong enough to attract leaders in the polyketide field. An early success was the production of 6-methylsalicylic acid (6-MSA) by an iterative type I polyketide synthase originally from Penicillium patulum.130 In this case, a T5 promoter under the control of an inducible lac operator was used to drive expression of the 6-methylsalicylic acid synthase (6-MSAS) gene. Highlighting some of the challenges of the E. coli system, the native holo-acyl carrier protein synthase was unable to correctly post-translationally modify 6-MSAS. In addition, 6-MSAS was observed to be insoluble and, consequently, inactive. The first problem was solved by including (on a separately selectable, compatible plasmid) the recently available promiscuous phosphopantetheinyl transferase Sfp from B. subtilis.131,132 The second problem was partially addressed by lowering the post-induction culture temperature to 30 °C. The folding problems were suspected to arise both from expressing a native eukaryotic gene in the bacterial environment of E. coli and from the large (190 kDa) multidomain nature of 6-MSAS.
Spurred by this early effort, more ambitious attempts at E. coli-based polyketide production began. The first modular type I polyketide product was produced in E. coli in 2001 (profiled below).133 This work resulted in a new E. coli production host, termed BAP1, capable of post-translationally modifying both polyketide synthase (PKS) acyl carrier protein (ACP) and nonribosomal peptide synthetase (NRPS) peptidyl carrier protein (PCP) domains. It was later found that phosphopantetheinylation of polyketide synthases expressed in a BAP1 derivative was nearly complete (>99%), further validating this strain for heterologous natural product biosynthesis.134 In addition, the host was engineered to supply propionyl-CoA and (2S)-methylmalonyl-CoA, common polyketide biosynthetic precursors that are uncommon E. coli metabolites. In the 6-MSA case above, the required precursors were acetyl- and malonyl-CoA, compounds known to exist within E. coli primary metabolism. These examples highlight both the advantages and the disadvantages of E. coli (or similar heterologous hosts generally devoid of natural product biosynthetic capabilities). On one hand, the lack of native polyketide products implies that E. coli may lack the metabolism to support more complex heterologous polyketide pathways. On the other hand, E. coli can be viewed as an attractive “clean host” without competing secondary metabolism: there are no native pathways to either contribute to or detract from the targeted heterologous products, and there is little concern that native polyketide products will complicate analysis or purification of the heterologous product.
The clean-host status of E. coli also demands more careful planning of heterologous production efforts. Compared with Streptomyces hosts, more thought is needed regarding gene design, expression parameters, and post-translational steps (including self-resistance to antibacterial products). As one example, the GC-rich genes from Streptomyces sources differ markedly in codon usage from native E. coli genes. On the other hand, E. coli expression options may simplify the regulatory aspects of heterologous polyketide production; however, there is also the concern of cellular burden, given the need to engineer so many different aspects of foreign polyketide biosynthesis. These points will be further explored in this section's case study.
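To make the codon-usage point concrete, the short Python sketch below computes overall GC content and GC content at the third codon position (GC3), two simple metrics often used to compare a foreign gene with its intended host. The function names and toy sequences are illustrative assumptions, not taken from any specific tool; for orientation, Streptomyces genomes are roughly 70% GC (higher still at third codon positions), whereas E. coli is close to 50%.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon.

    Assumes `cds` starts in frame (at the first base of a codon).
    """
    cds = cds.upper()
    third_positions = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    if not third_positions:
        raise ValueError("no complete codons")
    return sum(base in "GC" for base in third_positions) / len(third_positions)


# Hypothetical toy coding sequences.
gc_rich = "ATGCCGGCGCTG"  # Met-Pro-Ala-Leu, G-ending codons
at_rich = "ATGAAAGAATTT"  # Met-Lys-Glu-Phe, mostly A/T-ending codons
print(f"{gc_content(gc_rich):.2f} {gc3_content(gc_rich):.2f}")  # 0.75 1.00
print(f"{gc_content(at_rich):.2f} {gc3_content(at_rich):.2f}")  # 0.17 0.25
```

A large gap between a gene's GC3 and that of typical host genes is one quick signal that codon adjustment (or gene synthesis) may be worth considering.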
For filamentous fungi, considerable effort has been devoted to developing recombinant tools for these hosts. Once isolated, DNA coding for a particular fungal polyketide product may need to be adjusted for heterologous expression. For example, for 6-MSA production in S. coelicolor, the 6-MSAS gene was altered to remove intron sequences and further tailored to aid S. coelicolor gene expression. Although these concerns may be mitigated when related filamentous fungi are used as heterologous hosts, they must still be considered alongside the common issues of providing the correct intracellular precursors, gene expression elements, and post-translational modification steps. Once prepared for transfer, polyketide-coding genetic material is usually introduced by protoplast fusion or electroporation, and the plasmids used for transfer are more commonly integrated into the chromosome, in contrast to the autonomously replicating systems traditionally associated with E. coli.142,143 Promoters used in heterologous production attempts include amyB (inducible by starch or maltose), trpC (constitutive for A. nidulans), and alcA (cyclopentanone-inducible).144–146 Successful examples include the production of 6-MSA, melanin, squalestatin, 3-methylorcinaldehyde, and monacolin J (a late-stage intermediate of lovastatin biosynthesis).147–150
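Intron removal, as described for the 6-MSAS gene, amounts to joining exon segments in order. The sketch below assumes the exon coordinates are already known (0-based, half-open intervals) and also reports whether each removed intron has the canonical GT...AG splice-site ends. The sequence and coordinates are hypothetical; real fungal gene models come from annotation or cDNA evidence.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def splice_exons(genomic: str, exons: List[Interval]) -> str:
    """Concatenate ordered, non-overlapping exons, dropping the introns between them."""
    parts = []
    prev_end = 0
    for start, end in exons:
        if start < prev_end or end <= start or end > len(genomic):
            raise ValueError(f"invalid or overlapping exon ({start}, {end})")
        parts.append(genomic[start:end])
        prev_end = end
    return "".join(parts)


def introns(genomic: str, exons: List[Interval]) -> List[str]:
    """Return the sequences lying between consecutive exons."""
    return [genomic[a_end:b_start] for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def has_canonical_ends(intron: str) -> bool:
    """True if the intron begins with GT and ends with AG (common splice-site rule)."""
    intron = intron.upper()
    return intron.startswith("GT") and intron.endswith("AG")


# Hypothetical two-exon gene: exon 1, an 11 bp intron, exon 2.
genomic = "ATGAAA" + "GTAAGTTTTAG" + "CCGTAA"
exons = [(0, 6), (17, 23)]
print(splice_exons(genomic, exons))  # ATGAAACCGTAA
print([has_canonical_ends(seq) for seq in introns(genomic, exons)])  # [True]
```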
As with the actinomycetes, many fungal systems, especially members of the genus Aspergillus, produce a variety of complex secondary metabolites, which may or may not be generated under typical growth conditions. For example, Aspergillus niger contains a predicted 17 nonribosomal peptide coding regions, 34 polyketide coding regions, and 7 hybrid nonribosomal peptide-polyketide coding regions within its 33.9 Mb genome.151 The crosstalk and regulatory mechanisms within these potential hosts may strongly influence the outcome of heterologous biosynthesis; however, emerging recombinant strategies may be used (as previously with Streptomyces hosts) to either eliminate or harness these native features for biosynthetic success.
Fig. 3 A schematic of erythromycin biosynthesis. A molecule of propionyl-CoA is primed on the loading domain of the DEBS1 enzyme. Polyketide formation requires six (2S)-methylmalonyl-CoA substrates and NADPH as a cofactor for the reductive steps on DEBS1, DEBS2, and DEBS3. The thioesterase domain of DEBS3 is responsible for cyclization and release of the polyketide from the PKS complex. Glucose-1-phosphate is used as a substrate for the EryB and EryC pathways, which generate and attach the two sugars (L-mycarose and D-desosamine, respectively) to the aglycone core derived from 6dEB. Abbreviations: AT = acyltransferase; ACP = acyl carrier protein; KS = ketosynthase; KR = ketoreductase; DH = dehydratase; ER = enoyl reductase; TE = thioesterase.
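The substrate and cofactor requirements in the Fig. 3 caption can be tallied from the domain content of the six DEBS extension modules. The Python sketch below encodes the commonly cited DEBS module architecture (with the inactive ketoreductase of module 3 simply omitted) and counts one (2S)-methylmalonyl-CoA and one released CO2 per KS/AT-containing module, plus one NADPH per KR or ER domain. It is a bookkeeping illustration, not a kinetic model.

```python
# Domain content of the six DEBS extension modules (loading didomain and TE omitted).
# Module 3's ketoreductase is inactive, so it is left out here.
DEBS_MODULES = {
    1: {"KS", "AT", "KR", "ACP"},              # DEBS1
    2: {"KS", "AT", "KR", "ACP"},              # DEBS1
    3: {"KS", "AT", "ACP"},                    # DEBS2 (inactive KR)
    4: {"KS", "AT", "DH", "ER", "KR", "ACP"},  # DEBS2
    5: {"KS", "AT", "KR", "ACP"},              # DEBS3
    6: {"KS", "AT", "KR", "ACP"},              # DEBS3 (followed by TE)
}

NADPH_DOMAINS = {"KR", "ER"}  # DH performs a dehydration and uses no NADPH


def debs_requirements(modules):
    """Count starter, extender, NADPH, and CO2 for one 6dEB molecule."""
    extenders = sum(1 for domains in modules.values() if {"KS", "AT"} <= domains)
    nadph = sum(len(domains & NADPH_DOMAINS) for domains in modules.values())
    return {
        "propionyl-CoA (starter)": 1,
        "(2S)-methylmalonyl-CoA": extenders,
        "NADPH": nadph,
        "CO2 released": extenders,  # one decarboxylative condensation per extension
    }


for name, amount in debs_requirements(DEBS_MODULES).items():
    print(f"{name}: {amount}")
```

With this architecture the tally gives one propionyl-CoA, six (2S)-methylmalonyl-CoA, and six NADPH (five KR plus one ER) per 6dEB.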
At the time, S. coelicolor had gained momentum as a heterologous host for polyketide compounds. In particular, the strain CH999 developed by Khosla and coworkers had proven very effective in overproducing aromatic polyketide compounds.118 CH999 was designed around the actII-ORF4 regulatory mechanism (sourced from the actinorhodin cluster), which controls gene expression from the actI and actIII promoters during the growth transition to secondary metabolism; hence, production of heterologous compounds began in a semi-natural manner.157 As mentioned earlier, many polyketide clusters contain regulatory genes that are often implicitly included during heterologous biosynthetic efforts (i.e., when the native cluster is introduced on cosmid or BAC vectors). In other cases, positive regulators must be expressed explicitly to support reconstituted biosynthesis.158 In the case of CH999, regulation was built into the recombinant strategy from the outset. The same holds for non-Streptomyces heterologous hosts, in which gene expression is typically controlled by recombinant systems designed for the host of choice.
The CH999 strain had been modified so that an ermE (erythromycin resistance) gene replaced the native actinorhodin gene cluster, reducing unwanted crosstalk between this native pathway and those to be introduced, while also providing the resistance mechanism needed for full heterologous erythromycin biosynthesis. CH999 also carried a mutation that blocked production of undecylprodigiosin, another acetate-derived secondary metabolite pigment. (Additional native polyketide clusters have since been identified in S. coelicolor following genome sequencing, and eliminating these clusters should further alleviate concerns over metabolic crosstalk and product contamination.14,159) An E. coli–S. coelicolor shuttle vector was then introduced into this host, carrying the genes for three polyketide megasynthases (each >300 kDa), collectively referred to as deoxyerythronolide B synthase (DEBS) and individually as DEBS1, 2, and 3.112
In an early example that illustrates both the challenges of working with genes as large as those for DEBS and the tools available for manipulating them, Kao et al. used a recently developed homologous recombination technique to sidestep conventional cloning, which was hampered by the size of the genes and the lack of unique restriction sites.62 More specifically, regions homologous to the ends of the DEBS operon were used to capture all three genes through a double cross-over event, with temperature- and antibiotic-based selection used to favor the desired recombination events. Once constructed, the shuttle vector carrying the DEBS genes was introduced into S. coelicolor, and production of 6-deoxyerythronolide B (6dEB, the polyketide precursor to erythromycin) was observed. As might be expected for a pathway from a close bacterial relative, CH999 provided an accommodating intracellular environment. However, to increase product titers, exogenous propionate was added to the culture medium, presumably to boost intracellular propionyl- and (2S)-methylmalonyl-CoA levels. These levels may otherwise have been suboptimal, because the dominant native polyketides of S. coelicolor (such as actinorhodin) are derived from acetate units; thus, it could be argued that S. coelicolor exhibited a compound bias in this example.
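The logic of capturing a gene region by double crossover can be modeled, very loosely, as string surgery: whatever lies between two homology arms in the recipient is replaced by whatever lies between the same arms in the donor. The sketch below is an idealized illustration with made-up placeholder strings (not real DNA); it ignores recombination frequencies, the temperature- and antibiotic-based selection steps, and single-crossover intermediates, and the function name is hypothetical.

```python
def double_crossover(recipient: str, donor: str, left_arm: str, right_arm: str) -> str:
    """Idealized double crossover between two homology arms.

    Each arm must occur exactly once in both molecules, with the left arm
    upstream of the right arm. The segment between the arms in `recipient`
    is replaced by the corresponding segment from `donor`.
    """

    def arm_positions(seq: str, label: str):
        if seq.count(left_arm) != 1 or seq.count(right_arm) != 1:
            raise ValueError(f"homology arms not unique in {label}")
        left = seq.find(left_arm)
        right = seq.find(right_arm)
        if right < left + len(left_arm):
            raise ValueError(f"right arm not downstream of left arm in {label}")
        return left, right

    r_left, r_right = arm_positions(recipient, "recipient")
    d_left, d_right = arm_positions(donor, "donor")
    captured = donor[d_left + len(left_arm):d_right]
    return recipient[:r_left + len(left_arm)] + captured + recipient[r_right:]


# Hypothetical capture vector and donor DNA (placeholder letters, not real sequence).
vector = "vvvLEFTxxRIGHTvvv"
donor_dna = "chromLEFTgeneABCRIGHTchrom"
print(double_crossover(vector, donor_dna, "LEFT", "RIGHT"))  # vvvLEFTgeneABCRIGHTvvv
```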
S. lividans was also tested as a heterologous host. In particular, a system was implemented in which three individual plasmids were used to introduce the three DEBS genes into S. lividans K4-114, a strain constructed, like S. coelicolor CH999, by removing the actinorhodin gene cluster.123 Production at levels similar to those reported for the S. coelicolor CH999 system was recorded when two of the plasmids replicated autonomously and the third integrated into the host chromosome. Variations on this theme were explored, including high- and low-copy-number plasmids, compatible and (presumably) incompatible plasmid origins of replication, and localization of multiple DEBS genes to the host chromosome. Of these strategies, production titers peaked when two low-copy plasmids with the same origin of replication (but separately selectable) carried the DEBS1 and 2 genes and an integrative vector carried DEBS3. Similar titers were achieved when two integrative vectors (including pSAM2) carried the DEBS2 and 3 genes. This example demonstrates the additional flexibility offered by S. lividans. As outlined below, the use of multiple plasmids was also instrumental in enabling erythromycin biosynthesis in E. coli. Interestingly, the addition of exogenous propionate is not mentioned in this work, suggesting that S. lividans may have an innate advantage over CH999 in supplying substrates to DEBS.
With 6dEB production established in either S. coelicolor or S. lividans, this polyketide intermediate still had to be converted to full erythromycin. However, rather than building on 6dEB biosynthesis in CH999 (as might have been expected given the incorporated ermE resistance gene), purified 6dEB from CH999 production cultures was fed to mutants of S. erythraea, which converted it to full erythromycin. Using this strategy, numerous rationally designed 6dEB derivatives were generated in the Streptomyces strains and then converted to erythromycin derivatives by converter S. erythraea strains.160,161
Heterologous erythromycin biosynthesis was also attempted using E. coli. The rationale for choosing E. coli as a heterologous host has been introduced above. In this case, there was also a precedent to be set: success with such a challenging reconstitution problem in this host was expected to inspire similar efforts at complex natural product biosynthesis. In addition to the drastic differences in cell type compared with Streptomyces hosts, the regulatory mechanisms intrinsic to, and manipulated for, polyketide production in Streptomyces would be completely absent in E. coli. As such, an all-inclusive approach was needed for successful erythromycin production.
The earliest efforts to generate erythromycin from E. coli began with attempts at functional protein production. A notable early example was the attempt to produce DEBS3 using the T7 expression system.162 Here, the resulting DEBS3 protein, although produced in a partially active, properly folded, and dimeric state, was not correctly post-translationally modified by 4′-phosphopantetheinylation. Nonetheless, this example set the stage for later efforts employing a T7 promoter to drive expression. Subsequent biochemical analyses of individual DEBS modules also used E. coli as an expression host. In these studies, Sfp was co-produced to allow in vivo post-translational modification prior to protein purification and in vitro analysis.131,132,163 By this time, commercial strains and plasmids based on the T7 system were available and were used in these studies to overproduce DEBS components.126 Besides carrying the powerful T7 RNA polymerase, which seems particularly well suited to expressing the large genes often found in complex natural product biosynthesis,164 the strains selected as expression hosts were also reported to be deficient in certain intracellular proteases, a feature expected to aid foreign gene expression and potentially an advantage over other host options. Successful production of active DEBS proteins prompted experiments to further probe E. coli's ability to support erythromycin biosynthesis.
The first target molecule was the erythromycin intermediate 6dEB. Production of 6dEB required coordinated expression of DEBS1, 2, and 3 and a cellular environment capable of supporting biosynthesis. To meet the latter requirement, additional heterologous metabolism was introduced into E. coli. A metabolic pathway was designed in which exogenously added propionate was converted intracellularly to propionyl-CoA and then to (2S)-methylmalonyl-CoA, two substrates that, if present at all in the native BL21(DE3) host, were at concentrations insufficient to support 6dEB biosynthesis. This pathway was partially implemented with the same technology highlighted above for engineering the DEBS genes for introduction into S. coelicolor: plasmid-based homologous recombination was used to introduce the sfp gene into the prp operon of E. coli. The prp operon contains genes responsible for propionate catabolism.165,166 Because native catabolism would divert propionate away from 6dEB formation, the operon was eliminated except for one gene, prpE, which codes for a propionyl-CoA synthetase that converts propionate to propionyl-CoA. The integrated sfp allowed in vivo post-translational modification of the DEBS enzymes, with the additional benefit that its chromosomal location freed plasmid capacity for other expression constructs. Both the integrated sfp and the remaining prpE were placed under inducible (lac operator) T7 promoters.
The CH999 S. coelicolor system featured a shuttle plasmid containing all three DEBS genes. The size of these genes makes standard in vitro cloning challenging, mainly because of the lack of available restriction sites and the associated complications with standard ligation and transformation protocols. Hence, recombination techniques were used to construct the DEBS-CH999 shuttle vector. For E. coli, this problem was addressed by redesigning standard T7 expression plasmids so that more than one gene was expressed per plasmid. This allowed DEBS2 and 3 to be expressed from one plasmid (designed as an operon under one inducible T7 promoter), and DEBS1 and a propionyl-CoA carboxylase167 (PCC, from S. coelicolor) from another, separately selectable plasmid (with the PCC and DEBS1 genes under separate inducible T7 promoters). The PCC completed the substrate-supply pathway by converting propionyl-CoA to (2S)-methylmalonyl-CoA. Before the three DEBS genes were consolidated onto two expression plasmids, expression of each individual gene was confirmed. Once these plasmids were introduced into the newly designed E. coli strain (BAP1), gene expression and eventual 6dEB production could be coordinated by adding IPTG and propionate to the culture medium.
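Because both the starter and the extender units in the engineered BAP1 route derive from fed propionate (via PrpE to propionyl-CoA, then PCC to (2S)-methylmalonyl-CoA), a simple mass balance bounds the achievable yield. The sketch below assumes that every unit comes from propionate (no native supply, no loss to other metabolism) and uses the molecular formulas of 6dEB (C21H38O6) and propionic acid (C3H6O2). ATP, CoA, and bicarbonate costs are ignored; the carboxyl group added by PCC is lost again as CO2 during chain extension, so it does not enter the mass balance.

```python
import re

ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}  # standard atomic weights, g/mol


def molar_mass(formula: str) -> float:
    """Molar mass of a simple formula such as 'C21H38O6' (no parentheses)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * (int(count) if count else 1)
    return total


# Assumed supply route: propionate -> propionyl-CoA (PrpE) -> (2S)-methylmalonyl-CoA (PCC)
STARTER_UNITS = 1   # propionyl-CoA loaded on DEBS1
EXTENDER_UNITS = 6  # (2S)-methylmalonyl-CoA, one per extension module
propionate_per_6deb = STARTER_UNITS + EXTENDER_UNITS

mw_6deb = molar_mass("C21H38O6")
mw_propionic_acid = molar_mass("C3H6O2")
max_yield = mw_6deb / (propionate_per_6deb * mw_propionic_acid)

print(propionate_per_6deb)   # 7
print(round(mw_6deb, 1))     # 386.5
print(round(max_yield, 3))   # 0.745 (g 6dEB per g propionic acid, upper bound)
```

Real titers fall far below this ceiling; the point of the calculation is only that seven propionate molecules are consumed per 6dEB when all units come from the fed substrate.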
Using this system, 6dEB production in E. coli was accomplished with a key adjustment to the post-induction temperature.133 Production was observed only at 22 °C, and this lower temperature most likely benefited protein folding or higher-order association of the DEBS enzymatic complex. Later, newly developed recombineering methods were used to integrate the three DEBS genes into the chromosome of BAP1.168 This accomplishment was notable for applying λ-Red recombination to genes ∼10 kb in size. Although 6dEB production was markedly lower owing to reduced gene dosage, activity was observed at a higher post-induction temperature (37 °C), suggesting that chromosomal gene expression may benefit protein folding/association.
Having accomplished 6dEB biosynthesis, efforts then turned towards full erythromycin production. In principle, this goal required an additional 17 heterologous enzymes to generate erythromycin A (the most naturally abundant and clinically relevant form of erythromycin). Two separate efforts were initiated, using either an analogous pathway from Micromonospora megalomicea or a hybrid pathway with tailoring biosynthetic genes from S. fradiae, S. erythraea, and S. venezuelae.169,170 Why these pathways were used instead of the original S. erythraea pathway is not stated, although the authors indicate that these particular genes were successfully expressed in E. coli, implying early difficulties in obtaining active expression of the original S. erythraea tailoring genes. (A similar swapping of gene homologs was used to achieve 3-amino-5-hydroxybenzoic acid biosynthesis in E. coli from similarly designed and regulated T7-driven operons.171) During these attempts, individual gene expression was again confirmed before biosynthetic operons were constructed for introduction into E. coli. In the course of these studies, N-terminal leader sequences specific to the pET expression plasmids were found to improve expression of several tailoring genes, and these cassettes were then used in operon construction. Both approaches relied on designed operons and multiple expression plasmids to introduce the required genes into E. coli. This strategy is far more feasible in E. coli, because comparable transformation protocols (in particular, electroporation) and the range of available plasmid options are generally lacking for S. coelicolor and S. lividans. These efforts led to the production of 6-deoxyerythromycin D and erythromycin C in E. coli. Both efforts also used commercially available chaperone co-expression systems for E. coli to support biosynthesis, providing another E. coli tool for aiding protein folding.
More recently, an attempt was made to reconstitute the entire S. erythraea pathway within E. coli.172 Notably, nearly all of the studies dedicated to reconstituting erythromycin biosynthesis in E. coli used the native Saccharopolyspora, Streptomyces, or other original gene sequences, indicating that E. coli can accept and coordinately express significantly different (though still bacterial) genetic material. (In an interesting related study, the DEBS genes were synthesized and expressed in E. coli, with marked improvements in protein levels; however, 6dEB levels fell concomitantly, hinting at an imbalance in metabolic substrates for protein production, polyketide biosynthesis, or both, and pointing to the need for further genetic, protein, and metabolic engineering.173) In the study focused on reconstituting the S. erythraea pathway within E. coli, a combination of plasmid-specific gene leader sequences and chaperone co-expression, together with individual and coordinated (operon) gene expression and protein activity assays, led to the production of erythromycins B and D. Through careful examination of the designed operons and intermediate products, erythromycin A biosynthesis was then accomplished using a separate plasmid carrying eryK (encoding the terminal P450 hydroxylation step); this additional boost in gene dosage enabled the ultimate goal of producing erythromycin A. In all of the E. coli efforts, unconventional plasmid configurations, including plasmids with incompatible origins of replication (yet separately selectable) and up to six different plasmids per cell, enabled single-cell conversion of propionate to erythromycin A. Whatever the methods, the reconstitution of the full erythromycin pathway in E. coli stands as a model for the possibilities of heterologous biosynthesis in this host.
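The multi-plasmid designs described above rest on a simple rule: every plasmid in the cell needs its own selection marker, even when two share an incompatible origin of replication and can only be co-maintained under continuous selection. A minimal sketch of that bookkeeping check follows; the plasmid names and marker/origin assignments are hypothetical and do not reproduce the published constructs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Plasmid:
    name: str
    origin: str
    marker: str


def check_plasmid_set(plasmids: List[Plasmid]) -> Dict[str, List[str]]:
    """Require one distinct marker per plasmid; return origins shared by >1 plasmid.

    Plasmids sharing an origin are incompatible and, in this sketch, are assumed
    to be maintained only while selection for every marker is applied.
    """
    by_marker: Dict[str, List[str]] = defaultdict(list)
    by_origin: Dict[str, List[str]] = defaultdict(list)
    for p in plasmids:
        by_marker[p.marker].append(p.name)
        by_origin[p.origin].append(p.name)
    clashes = {m: names for m, names in by_marker.items() if len(names) > 1}
    if clashes:
        raise ValueError(f"plasmids cannot be separately selected: {clashes}")
    return {o: names for o, names in by_origin.items() if len(names) > 1}


# Hypothetical four-plasmid design (names and assignments are illustrative only).
design = [
    Plasmid("pA", origin="ColE1", marker="carbenicillin"),
    Plasmid("pB", origin="ColE1", marker="kanamycin"),
    Plasmid("pC", origin="RSF1030", marker="streptomycin"),
    Plasmid("pD", origin="p15A", marker="chloramphenicol"),
]
print(check_plasmid_set(design))  # {'ColE1': ['pA', 'pB']}
```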
In this sub-section, we highlight another tool for heterologous biosynthesis: feeding exogenous substrates to influence precursor supply and biosynthesis. This strategy was mentioned earlier in the context of adding propionate to the growth medium to increase intracellular propionyl-CoA and, subsequently, (2S)-methylmalonyl-CoA levels. Efforts have also been made to influence malonyl-CoA levels by adding glycerol to the growth medium.130 For nonribosomal peptides, the first example was the feeding of salicylate for yersiniabactin production.181 Boddy et al. took a precursor-directed approach to epothilone production, in which synthetic precursors of epothilone were fed to later-stage biosynthetic enzymes.185 This strategy is particularly useful when it is difficult to reconstitute the in vivo activity of biosynthetic enzymes or to provide suitable precursors. Exploiting the modular nature of polyketide and nonribosomal peptide systems, DEBS was altered so that its loading didomain was replaced with a nonribosomal peptide synthetase didomain from the rifamycin synthase, enabling production of a hybrid molecule when exogenous benzoate was added to the medium.133 Finally, in a nice example combining combinatorial biosynthesis with feeding of synthetically derived precursors, Watanabe et al. generated new derivatives of echinomycin.186
However, one apparent shortcoming of B. subtilis as a heterologous host is the lack of suitable autonomously replicating plasmids to facilitate gene transfer and heterologous expression. This problem is exacerbated for plasmids containing large inserts, as is likely for complex natural product gene clusters.189 The absence of such plasmids may explain the relatively few efforts to date using the bacterium as a host system. Instead, a diverse range of integrative vectors exists for introducing foreign DNA.190
Integrative vectors featured prominently in a seminal effort by Eppelmann and coworkers to heterologously produce the antibiotic bacitracin in B. subtilis.191 Akin to the strategies for S. coelicolor, the authors were interested in using B. subtilis as a universal host for the numerous peptide products made nonribosomally by various, and sometimes genetically intractable, Bacillus strains. In the case of bacitracin, the original host was B. licheniformis. The authors began by considering the net loss or gain of chromosomal genetic material and, accordingly, first removed the 26 kb native surfactin gene cluster (with the exception of the sfp gene). This step was also expected to aid later detection of the desired heterologous product. It was unknown whether the new B. subtilis host was metabolically optimal to support bacitracin biosynthesis, though this concern was mitigated by the close relationship between the native and heterologous hosts. The authors then systematically introduced the 49 kb bacitracin cluster through repetitive double-crossover homologous recombination steps that alternated between several antibiotic markers. The cluster was integrated in two steps, which might suggest a limit to the size of an integrated DNA fragment; however, research published since this example suggests that integrative transfer of foreign DNA ≥100 kb is possible.192,193 Over the course of the work, the authors used both an IPTG-inducible spac promoter and the native promoters of the bacitracin gene cluster. The native promoters directed expression within the heterologous B. subtilis host, further emphasizing and exploiting the similarities between native and heterologous hosts in this case. It was assumed that using native promoters would likewise not disrupt any regulatory or typical expression steps in the reconstituted production effort.
NRPS gene expression and bacitracin production were ∼50% higher in the heterologous host than in the native producer. This is an impressive result, considering that most heterologous production efforts result in reduced titers (at least initially) of the desired compound.194
Based upon the success of this work, the same authors developed a series of dual expression vectors specifically to aid heterologous production of nonribosomal peptides.195 The vectors were designed to shuttle between E. coli and B. subtilis: in E. coli they replicated extrachromosomally, whereas in B. subtilis they integrated into the same surfactin locus used previously. Expression was driven by a lac-operator-controlled T5 promoter functional in both hosts. Thus, these vectors not only allowed convenient cloning steps in E. coli but also allowed heterologous production to be tested in both hosts. Heterologous biosynthesis was attempted with a model NRPS from the surfactin system. The subsequent analysis indicated comparable protein levels in the two systems, even though one relied on chromosomal gene expression and the other on a multi-copy expression plasmid. In addition, when analyzed by SDS-PAGE, the protein product from B. subtilis showed better structural integrity than the product from E. coli, which showed a degradation pattern. The authors pointed out, however, that the comparison may have been biased, because the genetic material used as the model system derived from a Bacillus source. Notwithstanding this caveat, these vectors appear to be of great utility for heterologous efforts using B. subtilis.
In the course of these studies, Müller's group also applied several new techniques to the cloning and transfer of large gene clusters. First, in conjunction with Francis Stewart, Red/ET recombineering was used to rapidly assemble complete gene clusters from genetic material captured on cosmids. (The same general technique has greatly simplified chromosome manipulation in model hosts such as E. coli and has also been adapted to Streptomyces spp.199,200) As the authors pointed out, this approach allows rapid gene assembly compared with traditional in vitro molecular biology steps, which become cumbersome for the sizable gene clusters typical of complex natural product systems. The authors used both conventional Red/ET recombination protocols and triple recombination, which further aided the assembly process. Triple recombination is a protocol in which a short PCR-generated fragment links two other, substantially larger, DNA elements; because only the short fragment is PCR-derived, the risk of PCR-introduced errors within the larger fragments is minimized. Overall, this approach coupled the then-emerging E. coli-based Red/ET technology for gene cluster construction with the heterologous production advantages of P. putida. Gene transfer by conjugation, followed by chromosomal integration by homologous recombination, allowed efficient introduction of the desired gene clusters. A single xylS/Pm regulator/promoter system (toluic acid-inducible) successfully drove expression of the entire clusters. In the case of myxochromide S, heterologous production titers exceeded those of the native host.
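Triple recombination can likewise be reduced to an idealized string operation: a short linker whose two ends match the end of fragment A and the start of fragment B bridges the two, so only the linker's middle contributes new (PCR-derived) sequence. The function name and toy sequences below are hypothetical, and real Red/ET protocols rely on in vivo recombinase activity with homology arms typically a few dozen base pairs long rather than exact string matching.

```python
def triple_recombination(frag_a: str, linker: str, frag_b: str, homology: int) -> str:
    """Join frag_a and frag_b through a short bridging linker (idealized).

    The first `homology` bases of the linker must match the 3' end of frag_a,
    and the last `homology` bases must match the 5' start of frag_b.
    """
    if homology <= 0 or len(linker) < 2 * homology:
        raise ValueError("linker too short for the requested homology length")
    left_arm, right_arm = linker[:homology], linker[-homology:]
    if not frag_a.endswith(left_arm):
        raise ValueError("linker left arm does not match the end of fragment A")
    if not frag_b.startswith(right_arm):
        raise ValueError("linker right arm does not match the start of fragment B")
    # Only linker[homology:-homology] is new sequence; frag_a and frag_b are untouched.
    return frag_a + linker[homology:-homology] + frag_b


# Toy fragments standing in for two large cosmid-derived pieces.
fragment_a = "AAAAACCCGG"
fragment_b = "GGAATTTTTT"
pcr_linker = "CCGG" + "TTT" + "GGAA"
print(triple_recombination(fragment_a, pcr_linker, fragment_b, homology=4))
# AAAAACCCGGTTTGGAATTTTTT
```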
Finally, Müller's group also studied transposition as a method of introducing sizable natural product gene clusters into Myxococcus xanthus and P. putida. (Earlier work by Baltz and coworkers also studied the use of transposition and transduction in streptomycetes.201) Transposition succeeded in both hosts and was emphasized as an improved means of chromosomal localization compared with homologous recombination. This creative strategy allowed the incorporation of the myxochromide S (∼30 kb) and epothilone (∼60 kb) gene clusters into both organisms. In this study, the Tn5 promoter was used to drive heterologous gene cluster expression.
The epothilones exhibit impressive anticancer properties but are produced at low levels by the slow-growing native producer Sorangium cellulosum. Hence, the situation offered an ideal opportunity for heterologous biosynthesis. The first effort relied on S. coelicolor and was completed by Kosan Biosciences.115 At the time, tremendous success had been achieved with the CH999 S. coelicolor strain and accompanying recombinant plasmids, so the same system was used in an attempt to produce epothilone-based compounds. The epothilone biosynthetic process is illustrated in Fig. 4. The native building blocks were expected to be available within S. coelicolor. A more significant concern was the need to introduce a 55 kb biosynthetic pathway featuring a four-module megasynthase (765 kDa). The gene cluster was introduced on two separately selectable plasmids, one a low-copy replicating plasmid (containing epoA, B, C, D) and the other an integrative plasmid (containing epoE, F, K). Both sets of epo genes used the actI promoter, in accordance with the expression system designed for CH999. This approach allowed the successful production of epothilones A, B, C and D, and again demonstrated the capabilities of the S. coelicolor heterologous host.
Fig. 4 A schematic of epothilone biosynthesis. A molecule of acetyl-CoA is primed on the loading domain of the EpoA enzyme. The hybrid polyketide-nonribosomal peptide chain requires one molecule of cysteine on the NRPS EpoB, and one molecule of malonyl-CoA and eight molecules of (2S)-methylmalonyl-CoA substrates on EpoC, EpoD, EpoE and EpoF. The thioesterase domain on EpoF is responsible for cyclization and release of the hybrid chain from the NRPS-PKS complex. Epothilone A is generated by an EpoK-catalyzed epoxidation.
As expected, the production of epothilone compounds using the significantly different E. coli heterologous host was more complicated. In particular, successfully reconstituting biosynthesis from a ∼55 kb cluster featuring a ∼22 kb gene was a substantial challenge. As a result, the E. coli epothilone biosynthesis effort employed several strategies directly aimed at overcoming the challenges of gene transfer and active expression.
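A quick back-of-the-envelope calculation shows why a ∼22 kb gene is daunting: using the common rule of thumb of about 110 Da per amino acid residue, gene length translates directly into protein mass. The sketch assumes a single uninterrupted open reading frame that includes its stop codon; the 110 Da average is an approximation, not a measured value for any epo or DEBS protein.

```python
AVERAGE_RESIDUE_DA = 110.0  # rule-of-thumb mean residue mass (Da)


def estimated_protein_kda(orf_bp: int) -> float:
    """Approximate protein mass (kDa) for an ORF of `orf_bp` base pairs.

    Assumes the ORF length includes a single stop codon, which adds no residue.
    """
    residues = orf_bp // 3 - 1
    if residues <= 0:
        raise ValueError("ORF too short")
    return residues * AVERAGE_RESIDUE_DA / 1000


print(round(estimated_protein_kda(22_000)))  # 807 (cf. 765 kDa quoted for EpoD)
print(round(estimated_protein_kda(10_000)))  # 367 (a ~10 kb DEBS gene; cf. >300 kDa)
```

The estimate lands within about 5% of the 765 kDa quoted above, which is as close as such a rule of thumb can be expected to get.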
The first attempt used a precursor-directed approach in which the last four modules of the epothilone cluster were expressed from three different T7 expression plasmids.185 Next, Mutka and co-workers introduced the entire epothilone gene cluster, taking care to address many of the anticipated problems in biosynthetic reconstitution. First, the authors synthesized the entire cluster to eliminate expression problems arising from codon bias in E. coli. This step was a good early example of leveraging emerging technology, in this case, advances in megasynthase gene synthesis.202 Next, a combination of low post-induction temperature (15 °C), the arabinose-inducible PBAD promoter (as an alternative to the commonly used T7 system, in conjunction with a strain modified to eliminate arabinose catabolism and aid arabinose transport), and co-expression of E. coli chaperonins allowed successful expression of most of the epo genes. The exception was the 22 kb epoD. Impressively, the authors overcame this lack of expression by dividing the gene in two and using docking domains from the stigmatellin PKS system to ensure association of the two EpoD halves. Docking domains normally mediate inter-enzyme communication within specific natural product biosynthetic pathways.163,203 Here, they were engineered into the termini of the smaller EpoD subunits, effectively compensating for the loss of the covalent link between the two halves. (In an analogous effort, Watanabe et al. divided the 530 kDa RifA protein using docking domains from the DEBS system.163,171 Expression in this example was driven by T7 promoters and yielded active proteins able to communicate via the introduced docking domains, resulting in an early intermediate of rifamycin.)
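The EpoD splitting strategy can be pictured as a small data-structure operation: cut a multimodular protein at a module boundary, add a C-terminal docking domain to the upstream half and the cognate N-terminal docking domain to the downstream half, and rely on the cognate pair to restore the original module order. The sketch below is purely illustrative; the module labels, split point, docking-domain names, and compatibility table are hypothetical placeholders, not the stigmatellin sequences used by Mutka and co-workers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical cognate pairs: (C-terminal docking domain, N-terminal docking domain).
COGNATE_PAIRS = {("DD-C1", "DD-N1")}


@dataclass
class Subunit:
    name: str
    modules: List[str]
    n_dock: Optional[str] = None
    c_dock: Optional[str] = None


def split_with_docking(name: str, modules: List[str], after: int,
                       pair: Tuple[str, str] = ("DD-C1", "DD-N1")) -> Tuple[Subunit, Subunit]:
    """Split `modules` after index `after` and tag the new termini with docking domains."""
    if not 0 < after < len(modules):
        raise ValueError("split point must fall between two modules")
    upstream = Subunit(f"{name}-1", modules[:after], c_dock=pair[0])
    downstream = Subunit(f"{name}-2", modules[after:], n_dock=pair[1])
    return upstream, downstream


def assembled_order(upstream: Subunit, downstream: Subunit) -> List[str]:
    """Module order of the non-covalent complex, if the docking domains are cognate."""
    if (upstream.c_dock, downstream.n_dock) not in COGNATE_PAIRS:
        raise ValueError("docking domains are not a cognate pair")
    return upstream.modules + downstream.modules


four_modules = ["module A", "module B", "module C", "module D"]  # placeholder labels
part1, part2 = split_with_docking("EpoD", four_modules, after=2)
print(part1.modules, part2.modules)  # ['module A', 'module B'] ['module C', 'module D']
print(assembled_order(part1, part2) == four_modules)  # True
```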
The authors used a derivative of strain BAP1 to provide the needed (2S)-methylmalonyl-CoA substrate. Combining all of these steps allowed for the production of epothilones C and D.182
The first step towards heterologous artemisinin production was the production of its precursor amorphadiene in E. coli.206 In contrast to modular polyketide or nonribosomal peptide systems, isoprenoid biosynthesis does not usually involve large enzymes, although multiple genes may need to be introduced as part of the overall pathway. E. coli represents a relatively clean host for isoprenoid biosynthesis. Although E. coli has native isoprenoid biosynthetic machinery, its innate precursor supply is considered insufficient to support heterologous biosynthetic efforts; as a result, there is little concern that an overabundance of native isoprenoid compounds will contaminate heterologous production efforts. However, this situation necessitated metabolic engineering to ensure sufficient precursor supply for eventual biosynthesis.
To this end, Martin and co-workers favored heterologous transfer of the yeast mevalonate pathway into E. coli over engineering the native DXP pathway (Fig. 5), which raised concerns about unknown regulatory elements that might hamper over-production.206 Once a pathway was established for precursor supply, the first committed step towards artemisinin, catalyzed by amorphadiene synthase from Artemisia annua, was introduced. Without direct access to the amorphadiene synthase gene, the authors used gene synthesis both to optimize codon usage for the heterologous host and to provide de novo genetic material for heterologous transfer. This study featured lac operators and both lac and trc promoters, which contrast with the primarily T7-based expression elements used in polyketide and nonribosomal peptide heterologous expression attempts in E. coli. Also in this study, conventional K12 E. coli strains were used instead of the B strains that commonly accompany the T7 expression system.
Fig. 5 A schematic of artemisinin and Taxol biosynthesis. First, either the mevalonate or the non-mevalonate (DXP) pathway generates the two universal C5 precursors of isoprenoid natural products, isopentenyl diphosphate and dimethylallyl diphosphate, which can be interconverted by isopentenyl diphosphate isomerase. Geranyl diphosphate synthase condenses one molecule of dimethylallyl diphosphate with one of isopentenyl diphosphate to give C10 geranyl diphosphate. Farnesyl diphosphate synthase then generates C15 farnesyl diphosphate from one molecule of geranyl diphosphate and one of isopentenyl diphosphate. Lastly, geranylgeranyl diphosphate synthase generates C20 geranylgeranyl diphosphate from one molecule of farnesyl diphosphate and one of isopentenyl diphosphate. The first committed steps towards artemisinin and Taxol are the cyclization of the C15 and C20 intermediates to amorphadiene and taxadiene by amorphadiene synthase and taxadiene synthase, respectively. Amorphadiene and taxadiene then undergo significant oxidations of their cyclic cores to generate the final molecules.
Abbreviations: IPP = isopentenyl diphosphate; DMAPP = dimethylallyl diphosphate; IDI = isopentenyl diphosphate isomerase; GPP = geranyl diphosphate; FPP = farnesyl diphosphate; GGPP = geranylgeranyl diphosphate; GPPS = geranyl diphosphate synthase; FPPS = farnesyl diphosphate synthase; GGPPS = geranylgeranyl diphosphate synthase; ADS = amorphadiene synthase; CYP = cytochrome P450; A-4,11-D H = amorpha-4,11-diene hydroxylase; AAD = artemisinic alcohol dehydrogenase; A-ALD-R = artemisinic aldehyde reductase; AAR = artemisinic alcohol reductase; DHAA-D = dihydroartemisinic aldehyde dehydrogenase; T-5α-H = taxadiene 5α-hydroxylase; T-5α-ol O-AT = taxadien-5α-ol O-acetyltransferase; T-13α-H = taxadiene 13α-hydroxylase; T-10β-H = taxane 10β-hydroxylase; 2α-HT 2-O-BT = 2α-hydroxytaxane 2-O-benzoyltransferase; 10-DB III-10-O-AT = 10-deacetylbaccatin III 10-O-acetyltransferase.
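The chain-length arithmetic in the Fig. 5 caption can be checked with a short Python sketch. It tracks only carbon counts (C5 units added head-to-tail from a DMAPP starter), not stereochemistry or enzyme specificity; the function names are illustrative.

```python
# Carbon bookkeeping for prenyl chain elongation as described in Fig. 5.
# Each condensation adds one C5 IPP unit to the growing allylic diphosphate,
# starting from the C5 starter unit DMAPP.

C5 = 5  # carbons per IPP/DMAPP unit

# Allylic diphosphates in order of elongation.
ELONGATION_ORDER = ["DMAPP", "GPP", "FPP", "GGPP"]

# Committed cyclization products named in the caption.
CYCLASE_PRODUCTS = {"FPP": "amorphadiene", "GGPP": "taxadiene"}


def chain_lengths() -> dict:
    """Return {intermediate: carbon count} after successive IPP additions."""
    lengths = {}
    carbons = C5  # DMAPP starter
    for step, name in enumerate(ELONGATION_ORDER):
        if step > 0:
            carbons += C5  # one IPP condensed per elongation step
        lengths[name] = carbons
    return lengths


def ipp_units_needed(name: str) -> int:
    """Number of IPP molecules condensed onto DMAPP to reach `name`."""
    return chain_lengths()[name] // C5 - 1


if __name__ == "__main__":
    for name, carbons in chain_lengths().items():
        product = CYCLASE_PRODUCTS.get(name, "-")
        print(f"{name}: C{carbons}, {ipp_units_needed(name)} IPP added, cyclized to {product}")
```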
The next step towards artemisinin production was the functional expression of plant P450 enzymes in E. coli. Plant P450 enzymes are natively membrane-bound, which hints at the challenges likely to be encountered when expressing the corresponding genes in E. coli. However, numerous strategies have been developed to overcome poor heterologous expression and activity, and many of these were systematically employed by Chang et al. to produce artemisinic acid in E. coli.207 Namely, codon optimization, gene truncation, 5′-region modification, inclusion and modification of a reductase partner, promoter variation (T7, PBAD, lac, tac), and strain variation (BL21(DE3), DH1, DH10B) were tested for improvements in final activity and product formation. Although the best-performing combination found here may not transfer directly to other plant-derived P450 genes, the range of options explored is likely to be applicable and to lead to improvements in most cases.
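The variables listed above multiply quickly. A minimal Python sketch, assuming each gene-level change can be treated as simply present or absent (a simplification, since the study varied some of these in more than two ways), enumerates the promoter, strain, and modification combinations. It is not a claim that Chang et al. tested every combination; it only illustrates the size of the design space.

```python
from itertools import product

# Factors listed in the text for the plant P450 expression study.
PROMOTERS = ["T7", "PBAD", "lac", "tac"]
STRAINS = ["BL21(DE3)", "DH1", "DH10B"]
# Simplification (assumption): each gene-level change is treated as on/off.
GENE_CHANGES = [
    "codon optimization",
    "gene truncation",
    "5'-region modification",
    "reductase partner change",
]


def design_space():
    """Yield (promoter, strain, applied_gene_changes) for every combination."""
    toggles = list(product([False, True], repeat=len(GENE_CHANGES)))
    for promoter, strain, flags in product(PROMOTERS, STRAINS, toggles):
        applied = [name for name, on in zip(GENE_CHANGES, flags) if on]
        yield promoter, strain, applied


if __name__ == "__main__":
    designs = list(design_space())
    print(len(designs), "possible constructs")  # 4 x 3 x 2**4 = 192
```

The full factorial grows as the product of all factor levels, which is one reason such studies typically vary factors sequentially rather than exhaustively.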
A key alternative for reconstituting the biosynthesis of plant-derived natural products is a eukaryotic host, and the same group that spearheaded artemisinic acid production in E. coli studied production in S. cerevisiae in parallel. By this route, the first heterologous production of artemisinic acid was actually achieved before the success in E. coli. To do so, the amorphadiene synthase gene was placed on an autonomously replicating 2μ expression plasmid under the control of the inducible GAL1 promoter. The cytochrome P450–reductase pair, isolated by probing closely related plant species, was then introduced on a separately selectable 2μ expression vector, also under an inducible GAL1 promoter, with the gene sequences unaltered from their plant-derived origins. Lastly, over-expression of a truncated 3-hydroxy-3-methylglutaryl-CoA reductase (tHMGR), over-expression of upc2-1, and down-regulation of ERG9 (encoding squalene synthase, the entry point to the essential sterol biosynthetic pathway) using the methionine-repressible PMET3 promoter improved amorphadiene titers nearly 500-fold; these metabolic engineering steps aimed to reduce unwanted regulatory and crosstalk effects. Successful production required little effort to optimize gene expression conditions (at least compared with E. coli), suggesting that yeast might be the host of choice moving forward for either total or semisynthetic production of artemisinin.208
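The logic of down-regulating ERG9 can be illustrated with a toy branch-point model in Python: FPP is shared between squalene synthase (sterols) and amorphadiene synthase, and lowering the competing branch shifts more of the pool to the desired product. This sketch assumes both branches are first order in FPP, and the rate constants are hypothetical values chosen only to show the trend; it does not reproduce the reported 500-fold improvement, which also reflected increased FPP supply. Consistent with ERG9 initiating the essential sterol pathway, the sketch lowers rather than removes that branch.

```python
def fraction_to_product(k_product: float, k_competitor: float) -> float:
    """Fraction of a shared precursor pool routed to the desired branch,
    assuming both branches are first order in the precursor (toy model)."""
    return k_product / (k_product + k_competitor)


# Hypothetical rate constants (arbitrary units), for illustration only.
K_ADS = 1.0   # amorphadiene synthase branch
K_ERG9 = 9.0  # squalene synthase (sterol) branch at full expression

if __name__ == "__main__":
    # Relative ERG9 expression after promoter-based repression (not zero:
    # the sterol pathway is essential).
    for erg9_level in (1.0, 0.5, 0.1):
        f = fraction_to_product(K_ADS, K_ERG9 * erg9_level)
        print(f"ERG9 at {erg9_level:.0%}: {f:.2f} of FPP to amorphadiene")
```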
The first Taxol biosynthetic genes to be identified were often tested for expression in convenient hosts such as E. coli and S. cerevisiae.216,223 Although these efforts were primarily driven by interest in biochemically characterizing the enzymes of the Taxol biosynthetic pathway, they also began to yield information on the feasibility of reconstituting protein activity in these hosts.
At the same time, a variety of methods were being explored for the production of Taxol. Although the total chemical synthesis of Taxol has been achieved by several routes, the low yields made this approach economically infeasible for large-scale production.224 Semi-synthetic approaches were developed to convert a key intermediate extracted from yew needles, simultaneously providing a viable production process and minimizing the environmental impact of the initial isolation methods, which required destructive extraction from yew bark.225 Other efforts focused on plant cell culture. Several Taxus species have been studied for Taxol production in cell culture, with numerous strategies used to boost production,226 including feeding methyl jasmonate, rapid removal of the Taxol product, and immobilization of the plant cells.227–229 However, despite significant advances, plant cell cultures still suffer from low yields, high production costs, and unwanted by-products.226 Heterologous plant hosts have also been tested, with taxadiene successfully produced in Arabidopsis thaliana and tomato fruit.230,231 In addition to plants, certain Taxus-associated fungi were found to produce Taxol.232–234 One such strain, Nodulisporium sylviform, was subjected to rounds of mutagenesis to improve Taxol yield,234,235 but production made only modest gains from already low titers (314 to 393 μg/L). Although fungal production provides an alternative biological route to Taxol, the quantities produced thus far remain well below those of plant-dependent routes, and the fungal hosts offer slight, if any, biological advantages.
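The phrase "modest gains" can be made concrete with a one-line calculation on the titers quoted above. The helper name is illustrative.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100


if __name__ == "__main__":
    # Taxol titers (ug/L) quoted in the text for Nodulisporium sylviform mutants
    gain = percent_change(314, 393)
    print(f"{gain:.1f}% increase, i.e. {393 / 1000:.3f} mg/L at best")
```

Multiple rounds of mutagenesis thus raised the titer by about a quarter, still below 0.4 mg/L.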
The drawbacks of each production route attempted, and the lure of producing designed analogs, provided arguments for heterologous Taxol biosynthesis in convenient microbial hosts. It was therefore no surprise that the first efforts targeted production of Taxol intermediates in E. coli and yeast, relying on native cellular pathways to provide the precursors needed for biosynthesis. In E. coli, the taxadiene synthase gene was heterologously expressed, again using the popular T7 system. However, the researchers observed exclusive inclusion body formation and no protein activity, regardless of whether K12 or B strains were used or the post-induction temperature was modified; soluble and active protein was obtained when a thioredoxin fusion construct was expressed at a lowered post-induction temperature (20 °C).223 Later, gene expression was improved by removing an N-terminal leader region (required for plant plastid localization) and by supplying an additional copy of argU, which encodes the tRNA for the rare arginine codons AGA and AGG, to compensate for the codon bias expected in E. coli.236,237 Huang et al. achieved in vivo production of taxadiene in E. coli by also co-expressing selected DXP pathway genes.237 This theme would continue for E. coli-derived Taxol intermediates, with cues taken from an extensive analysis of the DXP pathway to support lycopene production.238
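The two responses to codon bias seen in these case studies, supplying extra tRNA (as with argU) or recoding the gene itself (as with the synthesized epothilone and amorphadiene synthase genes), can be contrasted in a minimal Python sketch. It handles only the two argU-decoded arginine codons and assumes CGT as the synonymous replacement; real codon optimization considers many more codons and sequence features, and the function names are hypothetical.

```python
# Codons decoded by the argU tRNA (rare arginine codons in E. coli).
ARGU_CODONS = {"AGA", "AGG"}
# Assumed replacement: CGT is a synonymous arginine codon frequently used in E. coli.
PREFERRED_ARG = "CGT"


def codons(seq: str) -> list:
    """Split an in-frame coding sequence into codons (RNA U is read as T)."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3 (in-frame CDS)")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def rare_arg_positions(seq: str) -> list:
    """1-based codon positions that would rely on the argU tRNA."""
    return [i + 1 for i, c in enumerate(codons(seq)) if c in ARGU_CODONS]


def recode_rare_arg(seq: str) -> str:
    """Replace AGA/AGG with a synonymous codon: the gene-synthesis alternative
    to supplying extra argU tRNA."""
    return "".join(PREFERRED_ARG if c in ARGU_CODONS else c for c in codons(seq))


if __name__ == "__main__":
    cds = "ATGAGAAGGCGTTAA"  # toy sequence: Met-Arg-Arg-Arg-stop
    print(rare_arg_positions(cds))
    print(recode_rare_arg(cds))
```

Both routes leave the encoded protein unchanged; recoding removes the dependence on host tRNA supply, whereas adding argU leaves the gene as isolated.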
Research also began to test heterologous production in yeast. S. cerevisiae was tested for expression of the first eight genes (after removal of plastidial domains) thought to be responsible for front-end Taxol intermediates, and the first five steps were then evaluated for in vivo heterologous reconstitution. To do so, separately selectable autonomous plasmids were used, one of which carried two genes. Initial efforts relied primarily on inducible GAL promoters plus one constitutive GPD promoter; later, the system was switched to GAL promoters exclusively. Using this arrangement, the first two committed Taxol intermediates (taxadiene and taxadien-5α-ol) were quantified from production cultures.239 Building on this work, Engels et al. focused on improving taxadiene production. Separately selectable plasmids were again used, this time with a codon-optimized taxadiene synthase gene under the control of a PGK promoter. In addition, care was taken to avoid biosynthetic overlap with native yeast sterol production (similar to the steps taken for artemisinic acid production described above). It should be noted here that although yeast may provide a more accommodating environment for plant-derived genes and intrinsic support for isoprenoid biosynthesis, it has the disadvantage that native metabolism can interact with and/or contaminate heterologous production efforts. To further address this, the yeast mevalonate pathway, which feeds both native sterol and heterologous taxadiene biosynthesis, was adjusted by eliminating feedback inhibition resulting from sterol production and by supplying a separate geranylgeranyl diphosphate (GGPP) synthase capable of decoupling the two pathways. A separate modification allowed exogenous uptake of sterols, which was expected to down-regulate native sterol metabolism that would otherwise detract from heterologous taxadiene production.
Notably, each metabolic engineering step was designed to eliminate crosstalk between native and heterologous pathways.240
To conclude this sub-section, we profile very recent work reporting the production of the first two Taxol pathway intermediates in E. coli at unprecedented titers.241 In this work, an array of modifications was first made to ensure sufficient expression of the first two genes of the pathway, including codon optimization, 5′-region modifications (both to remove plastidial domains and to improve cytochrome P450 gene expression), gene fusions, and low-temperature expression. These techniques have been highlighted throughout the previous case studies, but here their combination allowed a comprehensive effort to facilitate biosynthetic gene expression. However, the drastic improvements in titer resulted from careful balancing between precursor supply and biosynthetic conversion. (Earlier work on amorphadiene had indicated that improper balancing may negatively impact cell viability and product titers.206) To accomplish this balancing, combinations of promoters (trc, T5, and T7) and gene copy numbers (through chromosomal or plasmid-borne gene expression) were used to optimize product biosynthesis. Notably, and as alluded to above, it was the native DXP pathway, rather than a heterologous mevalonate pathway, that was engineered to supply the needed isoprenoid precursors. Success in this example adds another point to the debate over whether E. coli or S. cerevisiae should serve as the long-term host for reconstituting the remainder of the Taxol pathway.
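The idea of balancing precursor supply against conversion by mixing promoters and copy numbers can be sketched in Python. Here expression of a pathway module is crudely approximated as promoter strength times gene copy number, and the strengths and copy numbers are hypothetical placeholders, not measured values from the study. The sketch only shows how combining a few promoters and copy-number options spans a wide range of upstream-to-downstream expression ratios; it does not model titers.

```python
from itertools import product

# Hypothetical relative promoter strengths and copy numbers, for illustration only.
PROMOTER_STRENGTH = {"trc": 1.0, "T5": 2.0, "T7": 5.0}
COPY_NUMBER = {"chromosome": 1, "low-copy plasmid": 5, "high-copy plasmid": 20}


def expression_level(promoter: str, copies: str) -> float:
    """Crude proxy for module expression: promoter strength x gene copies."""
    return PROMOTER_STRENGTH[promoter] * COPY_NUMBER[copies]


def balance_grid() -> dict:
    """Upstream (precursor supply) / downstream (conversion) expression ratio
    for every pairing of promoter/copy-number designs."""
    designs = list(product(PROMOTER_STRENGTH, COPY_NUMBER))
    return {
        (up, down): expression_level(*up) / expression_level(*down)
        for up, down in product(designs, repeat=2)
    }


if __name__ == "__main__":
    grid = balance_grid()
    print(f"{len(grid)} pairings, ratios from {min(grid.values())} to {max(grid.values())}")
```

Screening across such a grid, rather than simply maximizing every module, is one way to search for the balance point the text describes.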
Microbe | Microbe information | Genome information | Heterologous host advantages | Heterologous host disadvantages | Key example of metabolite produced heterologously | Largest gene cluster expressed | Largest heterologous gene expressed | Largest number of genes expressed and product of the expression |
---|---|---|---|---|---|---|---|---|
Bacillus subtilis | Gram-positive, rod-shaped, doubling time 1 h | 4.2 Mb, 43.5% GC, 4100 ORFs (strain 168) | Naturally competent; readily available integrative vectors | No native plasmid systems; potentially limited metabolic support | Bacitracin191 | Bacitracin (49 kb)191 | bacC (19 kb)191 | Bacitracin (9 genes)191 |
Escherichia coli | Gram-negative, rod-shaped, doubling time 20 min | 4.6 Mb, 50.8% GC, 4288 ORFs (strain K-12 MG1655) | Readily available genetic engineering tools; rapid growth cycle; established bioreactor protocols | Does not support PKS or NRPS post-translational modification; limited metabolic support | Erythromycin,169,172 epothilone115 | Epothilone (55 kb)182 | epoD (21.7 kb)182 | Erythromycin (23 genes)169,172 |
Pseudomonas putida | Gram-negative, rod-shaped, doubling time 1 h | 6.18 Mb, 61.6% GC, 5437 ORFs (strain KT2440) | Supports PKS and NRPS post-translational modification | Potentially limited metabolic support | Myxochromide S,196 epothilone198 | Epothilone (58 kb)198 | epoD (21.7 kb)198 | Epothilone (7 genes)198 |
Saccharomyces cerevisiae | Eukaryotic, budding yeast, doubling time 1.5 h | 12.5 Mb, 38% GC, 5884 ORFs (strain S288c) | Plasmid systems available; excellent native recombineering system | Does not support PKS or NRPS post-translational modification; limited metabolic support | 6-methylsalicylic acid,140 dihydromonacolin L245 | — | lovB (10 kb),245 pcbAB (12.3 kb)246 | Taxadien-5α-acetoxy-10β-ol (5 genes)239 |
Aspergillus nidulans | Filamentous fungus, doubling time 1 h | 30 Mb, 50.3% GC, 9546 ORFs (strain FGSC A4) | Supports PKS and NRPS post-translational modification; native metabolic support | Slow growth cycle | 6-methylsalicylic acid,144 monacolin J150 | Monacolin J (37 kb)150 | tenS (12.7 kb)243 | Monacolin J (lovastatin intermediate) (10 genes)150 |
Streptomyces coelicolor | Gram-positive, forms spores and mycelium, produces natural products, doubling time 2 h | 8.7 Mb, 72.1% GC, 7825 ORFs (strain A3(2)) | Established recombinant DNA techniques; native metabolic support | Slow growth cycle; need for unmethylated DNA during genetic transfer | Caprazamycin,244 epothilone115 | Epothilone (55 kb)115 | epoD (21.7 kb)115 | Granaticin (37 genes)113 |
Streptomyces lividans | Gram-positive, forms spores and mycelium, produces natural products, doubling time 4 h | 8.2 Mb, 71% GC, 7551 ORFs (strain TK24) | No need for unmethylated DNA for transformation; established recombinant DNA techniques; native metabolic support | Slow growth cycle | Validamycin,242 daptomycin108 | Thiocoraline (53 kb)176 | dptBC (22 kb)108 | Clorobiocin (27 genes)117 |
We also want to emphasize that there are additional host choices available. In an attempt to more broadly capture the alternative options in heterologous natural product biosynthesis, Fig. 6 plots the heterologous natural product efforts across the three classes of natural products discussed above. Here, we have included other hosts in an attempt to present those omitted from the discussions above and to highlight emerging trends in heterologous biosynthetic efforts.
Fig. 6 Cumulative trends in heterologous hosts used for natural product production of polyketides, nonribosomal peptides, isoprenoids, and their intermediates/derivatives. Data were generated from the literature over the time period indicated, and each data point represents a publication presenting results of a novel heterologously produced natural product (including novel methods of producing previously reported natural products; for example, 6-MSA in S. cerevisiae and E. coli).243,263–361 Papers focused on gene expression (for recombinant protein products or enzyme biochemical characterization) or on optimization of a previously produced heterologous compound were omitted. Hybrid polyketide-nonribosomal peptide compounds were included in the nonribosomal peptide category totals. Extrapolation curves to 2015 were calculated from the trend slopes over the 5 years from 2005 to 2010. Owing to the volume of data involved, the results may not be completely comprehensive, but they serve to illustrate current and future trends within the heterologous natural product biosynthesis field. (a) Overall trends of heterologous natural product production for the three classes highlighted in this review. (b) Trends in heterologous hosts for polyketide production. (c) Trends in heterologous hosts for nonribosomal peptide production. (d) Trends in heterologous hosts for terpenoid/isoprenoid production.
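The caption states that the 2015 extrapolations were based on trend slopes from 2005 to 2010 but does not specify the fitting method. A minimal Python sketch, assuming an ordinary least-squares straight line through the cumulative counts, is shown below; the example counts are synthetic and are not the data behind Fig. 6.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def extrapolate(years, cumulative, target_year):
    """Project a cumulative count to `target_year` along the fitted line."""
    intercept, slope = linear_fit(years, cumulative)
    return intercept + slope * target_year


if __name__ == "__main__":
    years = [2005, 2006, 2007, 2008, 2009, 2010]
    counts = [10, 12, 14, 16, 18, 20]  # synthetic cumulative publication counts
    print(f"Projected 2015 total: {extrapolate(years, counts, 2015):.0f}")
```

A linear projection of a cumulative count assumes a constant publication rate; if the rate itself is rising, such a projection will underestimate future totals.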
The environment holds an unknown level of chemical diversity in the form of natural products, and there is every reason to believe this diversity holds tremendous therapeutic potential. Supporting this view is the track record of successful compounds isolated from soil and marine environments. However, the challenges of isolating and harnessing the native producers have been a primary driver in the development of heterologous biosynthesis programs. Natural product discovery and screening efforts have suffered over time from rediscovery of the same natural products and producers. Besides this diminishing return on discovery, it is now generally accepted that the majority of environmental microbes do not grow in artificial, man-made culture environments: over 99% of environmental microbes are estimated to be uncultivable outside their native environments.247 Hence, the tantalizing potential of this molecular diversity is crippled by lack of access to the native producers.
Several approaches have been developed to address this access problem. First, consensus sequence information arising from the genetic and biochemical similarities within certain natural product classes248 can be used to probe environmental samples for cell populations that putatively possess natural product pathways, making screening more efficient. In essence, this reduces the total number of microbes to be screened and narrows the search to the most promising candidates.249 However, this approach still does not guarantee successful culturing in the laboratory or during process scale-up. Alternative approaches have used creative culturing techniques to facilitate the growth of environmental microbes, generating laboratory environments that replicate those found in nature, including reconstructed growth environments and co-culturing.250,251 In an approach that embraces heterologous biosynthesis as the means of access, the field of metagenomics has emerged, in which environmental DNA (eDNA) is captured from a field sample, packaged into a suitable host plasmid, and either completely sequenced or first transferred to a heterologous host for screening of any encoded natural product activity.252,253 Here, the challenges of microbial cultivation are by-passed in favor of capturing the genetic material potentially encoding the desired natural product bioactivity. Finally, the number of fully sequenced genomes is growing at a staggering pace.254 As highlighted earlier for selected hosts, the resulting genomic information has revealed a surprising number of “silent”, “cryptic”, or “orphan” coding regions that serve as potential sources of additional natural products.255–258
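One practical consequence of the metagenomic route is library size. A standard way to estimate how many clones are needed so that a given sequence is represented with probability P is the Clarke–Carbon formula, N = ln(1 − P) / ln(1 − f), where f is the insert size divided by the total DNA being sampled. The Python sketch below applies it; the ~40 kb insert (roughly fosmid-sized) and 4 Mb single-genome example are assumptions for illustration, not values from this review. For eDNA, the effective "genome" is the combined DNA of the whole community, and capturing an intact gene cluster requires the entire cluster to fall within one insert, both of which push the required clone numbers much higher.

```python
import math


def clones_needed(insert_kb: float, genome_kb: float, probability: float) -> int:
    """Clarke-Carbon estimate of clones needed so that a given sequence is
    present in a random-fragment library with the stated probability."""
    if not 0 < probability < 1:
        raise ValueError("probability must be between 0 and 1")
    fraction = insert_kb / genome_kb
    if not 0 < fraction < 1:
        raise ValueError("insert must be smaller than the total DNA sampled")
    return math.ceil(math.log(1 - probability) / math.log(1 - fraction))


if __name__ == "__main__":
    # Assumed example: ~40 kb inserts from a single 4 Mb bacterial genome.
    print(clones_needed(40, 4000, 0.99))
    # Same insert size, but sampling a hypothetical community 100x larger.
    print(clones_needed(40, 400_000, 0.99))
```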
In the scheme of natural product heterologous biosynthesis, the research areas above help provide access to the DNA encoding pathways of interest. In the case of metagenomics, this access is aided by the fact that certain natural product pathways are genetically clustered (though this does not hold for most plant-derived natural product systems). Once a new host organism or an accompanying genetic pathway has been identified, the host-specific gene transfer and reconstitution methods outlined above can be applied towards heterologous biosynthesis. In addition to the host systems previously profiled, a recent alternative approach has been to re-design established production systems such as S. erythraea and S. fradiae (typically by removing the primary natural product biosynthetic pathway) so that they serve as heterologous hosts, with the advantage of a highly manipulated cellular background capable of accommodating biosynthesis.259,260
Once heterologous biosynthesis has begun or been achieved, additional areas of research can be divided into biosynthetic mechanics and metabolic support. Natural product gene clusters may contain very large (>10 kb), numerous (>20), notoriously foreign (e.g., cytochrome P450), and non-optimized (rare-codon-containing) genes that will tax any heterologous host's gene expression machinery. As highlighted throughout the case studies in this review, methods to aid gene expression can focus on a range of promoter, plasmid, or operon designs, in addition to control of gene expression and culture conditions. More generally, thought must be given to every stage of the protein's life to ensure that active enzyme is present in sufficient quantity and for an adequate period of time to support natural product biosynthesis. The process will no doubt be fine-tuned empirically, using the tools already developed for recombinant protein production and those yet to be developed for recombinant natural product biosynthesis. As with heterologous protein production, there will most likely be no general approach applicable to all cases; instead, guidelines will accompany the growing number of options to facilitate the overall process.
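The four warning signs listed above can be turned into a simple pre-screening checklist for a candidate gene cluster. The Python sketch below is illustrative: the `Gene` record and `expression_flags` function are hypothetical, the 10 kb and 20-gene thresholds come from the text, and the 5% rare-codon cutoff is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    length_kb: float
    is_p450: bool = False
    rare_codon_fraction: float = 0.0  # fraction of codons rare in the intended host


def expression_flags(genes, large_gene_kb=10.0, many_genes=20, rare_cutoff=0.05):
    """Return warnings for the expression hurdles named in the text.
    Thresholds for size and gene number follow the text; `rare_cutoff` is assumed."""
    flags = []
    if len(genes) > many_genes:
        flags.append(f"cluster has {len(genes)} genes (>{many_genes})")
    for g in genes:
        if g.length_kb > large_gene_kb:
            flags.append(f"{g.name}: very large gene ({g.length_kb} kb)")
        if g.is_p450:
            flags.append(f"{g.name}: cytochrome P450 (frequently difficult to express)")
        if g.rare_codon_fraction > rare_cutoff:
            flags.append(f"{g.name}: {g.rare_codon_fraction:.0%} rare codons")
    return flags


if __name__ == "__main__":
    cluster = [Gene("epoD", 21.7), Gene("hypothetical_p450", 1.3, is_p450=True)]
    for warning in expression_flags(cluster):
        print(warning)
```

Such a checklist only prioritizes where expression effort is likely to be needed; it cannot predict whether a given gene will in fact express in a given host.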
Overcoming the challenges of gene expression and biosynthetic mechanics then gives way to intracellular support. Future challenges associated with more complex natural products will include equipping the most convenient heterologous hosts with the capabilities to support biosynthesis. As mentioned, the intrinsic intracellular environment was the key early advantage of systems like S. coelicolor. If E. coli, B. subtilis, or others are to truly become universal hosts for complex natural products, they must be metabolically engineered to broadly support biosynthesis. This primarily means providing the substrates of the biosynthetic process, which may require the same gene transfer and expression refinement associated with natural product biosynthetic reconstitution. However, while direct precursors are the most obvious need, they are by no means the only metabolic support required. Analysis of the biochemical steps from precursor to final product reveals a number of cofactors and auxiliary components that contribute to overall biosynthesis, and these must also be present in non-rate-limiting quantities to optimize eventual biosynthesis. Towards this end, heterologous natural product biosynthesis is now reaching the stage at which traditional modes of metabolic engineering may be applied to optimize cellular production. Metabolic engineering has been defined as “the directed improvement of product formation or cellular properties through the modification of specific biochemical reactions or introduction of new ones with the use of recombinant DNA technology.”261 This definition may be further subdivided into heterologous metabolic engineering (genetic transfer and reconstitution, the focus of this review) and pathway metabolic engineering, in which the principles and methods of metabolic engineering are applied to optimize heterologous biosynthesis once product formation has been accomplished.
Optimization approaches have included numerous computational and experimental predictive and analytical methods to characterize and improve biosynthesis. Given the relatively recent success of heterologous complex natural product biosynthesis, these tools are just beginning to be applied, but they promise to further add to the success thus far achieved and they are particularly well-suited to the heterologous hosts highlighted above.262 Such approaches, in addition to -omics global characterization techniques and more traditional process engineering steps, will be paramount to maximizing natural product output from heterologous hosts. As emphasized throughout this review, heterologous natural product biosynthesis derives from situations that limit access to or control of natural product medicinal value. Heterologous production at minimal titers does not solve this problem. However, the genetic, metabolic, and process engineering strategies mentioned above serve as powerful routes to realize the full potential of heterologous biosynthesis.
One of the driving goals of the natural products field is to increase access to the medicinal potential of natural products, and heterologous biosynthesis sprang from this objective. Here, we have reviewed the origins of this quest, with a special emphasis on the heterologous hosts chosen and their associated experimental tools for aiding the gene transfer and reconstitution processes. Future research will continually fuel these efforts with additional promising therapeutic natural product gene clusters that push the limits of the heterologous biosynthetic approach. Looking further ahead, there will be numerous opportunities to engineer, fine-tune, and improve heterologous biosynthesis. With dedication to these efforts, there is no reason to believe that the impact of natural products is in decline; rather, it is in a state of re-invention, with the promise of even greater therapeutic impact in the future.
This journal is © The Royal Society of Chemistry 2011 |