Dong Yi†, Thomas Bayer†, Christoffel P. S. Badenhorst†, Shuke Wu†, Mark Doerr†, Matthias Höhne† and Uwe T. Bornscheuer*
Department of Biotechnology & Enzyme Catalysis, Institute of Biochemistry, University Greifswald, Felix-Hausdorff-Str. 4, D-17487 Greifswald, Germany. E-mail: uwe.bornscheuer@uni-greifswald.de
First published on 18th June 2021
Biocatalysis has undergone revolutionary progress in the past century. Thanks to the integration of multidisciplinary technologies, natural enzymatic reactions are constantly being explored. Protein engineering yields robust biocatalysts that are widely used in industrial production. These research achievements have gradually constructed a network containing natural enzymatic synthesis pathways and artificially designed enzymatic cascades. Nowadays, the development of artificial intelligence, automation, and ultra-high-throughput technology opens up new possibilities for the discovery of novel enzymes, enzymatic mechanisms and enzymatic cascades, and gradually fills in the key steps still missing in the pathway design of enzymatic total synthesis. Therefore, biocatalysis research is gradually moving towards an era of novel technology integration, intelligent manufacturing and enzymatic total synthesis.
However, since the natural synthetic pathways of many natural products have not been fully elucidated, coupled with the challenges of heterologous expression of enzymes, it is not easy to achieve the enzymatic total synthesis of natural products in vitro or in model host cells. Complex drug compounds are mostly artificially designed molecular structures, which require well designed enzymatic synthesis pathways and heavy protein engineering to reduce the synthetic steps and solve the problem of low enzyme activity towards non-natural substrates. Comfortingly, innovative results of novel enzymes, enzymatic reaction mechanisms, and enzymatic cascades make the de novo biocatalytic synthesis of natural products and their derivatives easier. Especially, at present, the fourth scientific and technological revolution characterised by informatization, networking, and intellectualization is in full swing. Following Moore's Law,3 the integration and computing power of electronic chips are being continuously doubled, which enable big data mining, network relational processes, and in silico protein design that require heavy computation to be realised. Machine learning (ML), which has recently entered the era of deep learning (DL), has opened up the intelligence of data analysis and network construction on big data processing and extended their application in genomics, proteomics and metabolomics.4–6 This brings us effective tools for the discovery of novel enzymes, biocatalytic reactions, and enzymatic synthetic pathways from massive gene, protein and chemical data analysis. In addition, ML also guides the rational design and protein engineering to create artificially evolved enzymes.7,8 Moreover, the rapid development of quantum computers, automation and ultrahigh-throughput (UHTP) screening technology has brought unlimited possibilities for biocatalysis research. 
Impressively, quantum computing is being directed towards practical use for calculating quantum chemistry models at very high speed.9 Automated screening robots have gained popularity in many laboratories. The next generation of experiment robots has begun to serve chemical experiments with an efficiency hundreds of times higher than that of humans.10 Combined with fluorescence-activated cell sorting (FACS) and microfluidics, these advancements increase the screening throughput of enzyme mutagenesis libraries by several orders of magnitude.11–13 It is foreseeable that biocatalysis research will enter a new era characterised by the integration of novel technologies and intelligent manufacturing in the next decade. Enzymes and enzymatic pathways designed by artificial intelligence, as well as adaptive evolution assisted by UHTP screening, have already simplified synthetic pathways of complex compounds and have been adapted to model host cells. Together with the latest achievements in synthetic biology, several important natural products, their derivatives, and non-natural compounds with pharmaceutical value have been fully enzymatically synthesised, which has elevated the field of biocatalysis substantially. Eventually, historical borders between classical biocatalysis – using isolated and often immobilised enzymes – and biotransformation – using (engineered) whole-cell microorganisms – will disappear, since nowadays all available tools can be integrated to create product-oriented designer pathways, and only efficiency in terms of productivity in grams per litre and time will decide which type of reaction system is superior.
In this review, we briefly summarize the application of novel advanced technologies applied in biocatalysis, highlight the identification of novel enzymes and enzymatic reactions/cascades, and collect the innovative artificial enzymatic total synthesis of natural products and valuable drug compounds to elaborate the features of the new trends in biocatalysis: integration of novel technologies, intelligent manufacturing, and enzymatic total synthesis of complex molecules (Fig. 1).
Fig. 1 New trends in biocatalysis: integration of novel technologies, intelligent manufacturing, and enzymatic total synthesis.
Most ML algorithms assist in finding rules or patterns in very high dimensional data, e.g., millions of 3D molecular coordinates of atoms of amino acid residues, and reduce these rules to likely (low dimensional) predictions of the behaviour of an investigated system, e.g., a (single) enzyme activity. The downside of current computer systems is that they have no experience and understanding of our experimental work, the objects of investigation, the relation of parts of an experiment and the conditions under which a certain experiment was performed and data acquired. All this information about the details and the process of experiments that lead to a certain outcome is called “meta-data”. Meta-data is therefore all information required for an uninstructed entity (like a machine) to reproduce a scientific experiment: exactly what components (e.g., down to the lot of a chemical), what devices (e.g., exact type and firmware version), what experimental conditions (e.g., temperature, pH, pressure, solvents, metabolites, metal ions), the sequence of operations, and additional information about the experiment itself, such as who did the experiments where and when, is meta-information that helps a machine to relate the outcome of different experiments and judge possible sources of error. The predictive power of “learning” machines is highly related to the quality and reliability of the data – and especially the meta-data – it “learned” from.14,15 Therefore, the future success of the application of ML strategies, also in protein engineering and biocatalysis, will heavily rely on proper data generation in conjunction with association of as much meta-data as possible. Meta-data and data linkage are the keys to good predictions since they describe many aspects under which a certain data set was generated and puts data into a context that can be “understood” by a machine.
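To make the notion of meta-data concrete, the following minimal Python sketch shows how a single activity measurement could be stored together with the information needed to reproduce it. All class names, field names and example values are hypothetical placeholders chosen for illustration; they do not follow any particular standard or ontology.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, List, Union


@dataclass
class Component:
    """A reagent, identified down to the lot of the chemical."""
    name: str
    supplier: str
    lot: str


@dataclass
class Device:
    """An instrument, identified by exact type and firmware version."""
    model: str
    firmware: str


@dataclass
class ExperimentRecord:
    """One measurement plus the meta-data describing how it was obtained."""
    operator: str                              # who did the experiment
    location: str                              # where
    timestamp: str                             # when (ISO 8601 string)
    components: List[Component]                # what components
    devices: List[Device]                      # what devices
    conditions: Dict[str, Union[float, str]]   # temperature, pH, solvent, ...
    operations: List[str]                      # ordered sequence of operations
    result: Dict[str, float]                   # the measured data itself


# Placeholder values only; none of them is taken from a real experiment.
record = ExperimentRecord(
    operator="researcher-01",
    location="lab-A",
    timestamp="2021-01-01T09:00:00+00:00",
    components=[Component("substrate-X", "supplier-Y", "LOT-0001")],
    devices=[Device("plate-reader-Z", "1.0.0")],
    conditions={"temperature_C": 30.0, "pH": 7.5, "solvent": "buffer"},
    operations=["mix", "incubate", "read absorbance"],
    result={"activity_U_per_mg": 1.0},
)

# Serialise data and meta-data together so that they cannot be separated.
record_json = json.dumps(asdict(record), indent=2)
```

Keeping the result and its meta-data in one serialised record is one simple way to ensure that a machine reading the data later also receives the context in which it was generated.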
If not instructed, machines have no notion of meaning, the “semantics”, of these objects and relations. With a formal language derived from the theories of logics and reasoning, we can provide a tiny subset of descriptive relations to objects and operations of experiments that enable the machine to perform a limited amount of logical reasoning and “understanding” of an underlying experiment or data set. This subset is called “ontology”, which is derived from the ancient Greek terms “being” and “logical discourse” – so a logical discourse about the being, e.g., of an experiment or measurement. Creating more and more fine grained logical descriptive term networks (“ontologies” – Google.com also calls them “knowledge graphs”) and associating them with the performed, biochemical experiments will help machines to more and more “understand” the underlying experiments and perform logical reasoning and relating information from very different sources (e.g., international databases such as UniProt, PDB, BRENDA, KEGG, PubMed, and PubChem) to particular experiments. Therefore, the development of precise and consistent ontologies in the realm of protein engineering or biocatalysis and connecting these terms with ontological terms from related disciplines, such as chemistry, molecular biology, and biology, is paramount for a very powerful aggregation of data to further improve the predictive power of machine-based reasoning and ML approaches. This also requires not only a standardised terminology and (fitting) ontologies, but also standardised means of (meta-)data transfer and a standardised query language. Good sources for actively developed ontologies are the EMBL-EBI Ontology Lookup Service (https://www.ebi.ac.uk/ols/index), ontologies of the Pistoia Alliance (https://www.pistoiaalliance.org/), and Allotrope foundation (https://www.allotrope.org/ontologies). 
Furthermore, NFDI4Cat, a very recent initiative funded by the German Research Foundation (DFG), is currently assembling a workgroup to develop and refine ontologies for (bio-)catalytic processes, which underlines their importance. In the case of database query language development, an international standardisation workgroup (under ISO/IEC JTC 1) is developing the Graph Query Language (GQL, https://www.gqlstandards.org/), a very promising candidate for aggregating data from large, public chemistry and biochemistry databases.
A foretaste of the power of combining knowledge from different databases is given by the recently published EnzymeMiner webtool (https://loschmidt.chemi.muni.cz/enzymeminer/) of the Damborsky group.16 Based on ML predictions, EnzymeMiner proposes a selection of promising enzymes, originating from a broad range of organisms, that potentially have an activity similar to that of a provided template and are very likely to be expressed in soluble form.17 This can dramatically enhance the exploration of new enzyme types that have not been in the focus of current research and knowledge.
Another webtool of this kind in the realm of synthetic biology pathway design is Galaxy SynbioCAD (https://galaxy-synbiocad.org/).18 SynbioCAD starts with a retrosynthetic pathway of a target compound. It then proposes the best biosynthetic pathways for the transformation, cloning strategies of the corresponding enzymes into target vectors and finally liquid handling protocols to execute the assembly of the complex vector constructs.
In current ML-based protein engineering approaches, ML experts need to work closely with biochemists to first extract the relevant features from a protein engineering question and then build a numeric feature set from them to train the ML algorithms. This requires a lot of experience, deep knowledge in modelling and the right questions on the ML side, as well as some general understanding on the biochemist's side. With good ontological systems in place, many aspects of this model building process might be automated in the future, so that ML can become a standard tool for enzyme engineers.
Semantics, derived from ontologies and logical reasoning, is currently a very fast-growing field for improving the retrieval of (bio-)catalytic information, but it is only one aspect. A closely related aspect that is also changing rapidly, and which will influence how we engineer novel enzymes in the future, is the development of lab automation standards, data file formats and data exchange protocols. It is very important that these standards are open, freely accessible and royalty free in order to achieve global adoption and to overcome the proprietary island solutions of large instrument vendors. Such free and open lab automation and data standards are developed by the SiLA (https://sila-standard.com/) and AnIML (https://animl.org/) workgroups to support reproducible and reliable data transfer and automated documentation. The quality of the collected data also relies heavily on a reasonable number of replicates, consistency checks, the personal responsibility of researchers, and a systematic way of reporting data (e.g., Standards for Reporting Enzymology Data – STRENDA19). It should also be stressed that reports about experiments with negative results are of very high value for the community: they not only reduce the number of redundant trials, but also enable machines to learn what did not work. It should further be noted that results might appear in a completely different context as our knowledge advances: results that are unexplainable today might become very understandable in the future, when new knowledge is created. Results of our biocatalytic exploration should therefore be handled neutrally, avoiding human bias.
If ML is to be successful in the future, the machines need standardised, consistent data together with the semantics to interpret them. Therefore, a paradigm shift in the way we design experiments is required: a high level of automation with validated, reproducible devices is needed to obtain good statistics about the intrinsic variation of a measurement (e.g., three technical replicates of an activity measurement are not enough). Variations related to the expression host, growth medium composition, expression vector, etc., need to be explored much more thoroughly.
A shift in thinking will also be necessary in modelling ML questions for protein engineering. Most aspects of current ML in protein engineering have been covered by very recent and exhaustive reviews from the groups of Frances Arnold,8 Jiri Damborsky,15 and Manfred Reetz.14 Briefly, ML has been successfully applied to improve enzyme activities,20 stereoselectivity by the AR algorithm21 which is related to the promising Adaptive Substituent Reordering Algorithm (ASRA),22 enzyme thermostabilities,17 and soluble expression using the SoluProt tool.16 Currently, most ML approaches use too crude assumptions about an enzymatic system despite experimental experience that in many cases, enzyme reactions rely on a very fine-tuned arrangement and dynamics of residues in the active site and very small levels of energy differences play crucial roles in their activities. Enzymes are complex multi-dimensional objects (at least three spatial and one temporal). The exchange of one single amino acid can sometimes completely alter the reactivity or stereo-/enantio-selectivity of an enzyme.23 Predicting these small effects on a quantum mechanical level seemed for a long time inaccessible. Very recent developments of ML derived/generated force fields with quantum chemical accuracy might very soon lead to much better predictions of enzymatic activities and lead to a deeper understanding of enzyme mechanisms.24
A very contrasting, holistic approach is followed by Alley et al.,25 who applied deep learning to approximately 24 million unlabelled natural amino-acid sequences to extract important features, like protein stability and – to some extent – activity.
These unified representations contain short- and long-term “memories” of a protein sequence, allowing structural features to be forecast with high accuracy, even for completely de novo proteins unknown in nature. The model thereby contains hidden knowledge of protein architecture and could be a very powerful tool to annotate or even improve protein activity. Very powerful algorithms for predicting protein structures, one foundation of enzyme activity prediction, have been developed by DeepMind, a Google.com company, most notably AlphaFold 2.26 A community version of the AlphaFold algorithms with open source code is maintained by Wendy Billings et al.18 AlphaFold and related algorithms use deep neural networks to learn and predict distances between pairs of amino acids and bond angles within amino acid residues from large protein structure databases, like the PDB. AlphaFold 2 won the 2020 Critical Assessment of Protein Structure Prediction contest (CASP14) with the highest Global Distance Test (GDT) score, 92.4. One of the currently fastest implementations of folding algorithms, based on Recurrent Geometric Networks (RGN), was published by Mohammed AlQuraishi.27 His algorithm can predict protein structures in milliseconds, with accuracies of a similar order to those of the AlphaFold algorithm.
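For readers unfamiliar with the GDT score, the sketch below illustrates the idea behind the commonly used GDT_TS variant: the percentage of residues whose predicted Cα position lies within 1, 2, 4 and 8 Å of the reference position, averaged over the four cutoffs. This is a simplified illustration only. It assumes that the two structures are already superimposed, whereas the full procedure searches over many superpositions to maximise each fraction. The function name and the toy coordinates are hypothetical.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (0-100) for two pre-superimposed C-alpha traces.

    `predicted` and `reference` are equal-length sequences of (x, y, z)
    coordinates in angstroms, one entry per residue.
    """
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Per-residue distance between predicted and reference positions.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    # GDT_TS is the mean of the four fractions, expressed in percent.
    return 100.0 * sum(fractions) / len(fractions)


# Toy example: four residues displaced by 0.5, 1.5, 3.0 and 9.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
score = gdt_ts(predicted, reference)  # fractions 1/4, 2/4, 3/4, 3/4 -> 56.25
```

A score above 90 therefore means that, on average across the four cutoffs, more than 90% of residues are placed close to their experimentally determined positions.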
One important aspect of recent ML-based protein engineering is the question of epistatic mutations28 and their pivotal role as enablers for the emergence of new enzymatic activities and as a means of leaving local activity optima towards higher enzymatic activities. More experimental data on folding and enzyme activity transitions via epistatic mutations are needed to exploit the full power of this concept.
Enzymes in a biological system, either a cell or an in vitro system, are also exposed to a very dynamically changing environment, including different metal ions, protonation states, electron transfers, and interactions with small molecules, oligomers and other proteins or (poly-)nucleic acids, all of which modulate their activities. Therefore, reducing enzymes to bare one-dimensional sequence information is merely a concession to our present limitations in computational power and to our lack of knowledge of the structural dynamics, molecular interactions and electrodynamics of the associated components. Advances in high-performance computing and quantum computing might overcome these restrictions in the future.
Fig. 2 An overview of droplet microfluidic technologies relevant to ultrahigh-throughput screening. (A) A schematic representation of droplet formation using two aqueous components and subsequent fluorescence activated droplet sorting (FADS). (1) Single cells are injected simultaneously with (2) a mixture of assay components, usually containing a fluorogenic substrate and a lysis agent. (3) A fluorinated oil containing a surfactant is injected into the third inlet. This breaks up the aqueous mixture into monodisperse water-in-oil droplets. (4) The single-cell lysate contains both the genotype (plasmid) and (5) the phenotype (enzyme). (6) Active variants convert the substrate to a fluorescent product, which (7) can be detected using the appropriate lasers and photomultiplier tubes (not shown). (8) A fluorescence signal triggers an electric pulse, delivered through imbedded electrodes. The dielectrophoretic force pulls the droplet from the path of least resistance into the “sorted” channel. DNA is subsequently recovered from sorted droplets by plasmid isolation or PCR (not shown). (B) Droplets can be reinjected into a second chip to make double emulsions. In this setup, inlet 1 would contain the droplets, inlet 2 some oil to space the droplets, and inlet 3 the outer aqueous phase containing a surfactant like Tween.43 The resulting double emulsion droplets can be sorted using flow cytometers, similar to FACS. (C) As an alternative to fluorescence measurements, imbedded fibre optic cables have been used to measure the absorbance of droplets. An increased absorbance signal triggers an electric pulse that moves the droplet into the sorting channel.44,45 (D) Droplets can be split, evenly or unevenly, into smaller droplets. 
This is useful for example if a destructive detection method like mass spectrometry is used, so the first droplet is sacrificed, and the second droplet sorted.46,47 (E) Droplets can also be fused, which is useful, for example, for delivering larger volumes of substrate or lysis agents. The chip first aligns droplets and then uses an electric pulse to merge them.48 (F) Smaller volumes of liquid can also be added to droplets using a method called picoinjection. Droplets flow past an aqueous inlet and an electric field is used to combine the two aqueous phases.49 (G) Droplets can also be sorted into multiple channels38 based on (H) at least two fluorescence measurements.40 (I) An interesting and promising approach allows the label free detection of molecules in droplets using RNA or DNA aptamers.50,51 A unique feature of aptamers is that they allow different enantiomers to be detected using the L- and D-forms of the nucleic acid aptamers. In the example shown, fluorescently labelled DNA aptamers are hybridised with antisense oligonucleotides labelled with quenchers. Binding to the target molecule separates the fluorophores and quenchers, resulting in a fluorescence signal. This allows both the concentration and enantiopurity of the substance to be determined. Importantly, if an aptamer for one enantiomer is available, the other enantiomer is easily detectable by synthesising the opposite enantiomer of the aptamer.50 This figure was inspired by Kintses et al. and Neun et al.52,53 |
Despite the technical simplicity of double emulsion sorting by flow cytometry, droplet microfluidics has made a relatively minor impact on biocatalyst discovery and engineering. While droplets have been used to screen hydrolase, oxidoreductase, aldolase, transferase, and isomerase activities, the hydrolases dominate by far (lipase, esterase, phosphatase, phosphonate hydrolase, sulfatase, β-glucosidase, β-galactosidase, and more).42,53–56 The reason for this is simply that most droplet sorting systems require a fluorescent signal and that it is relatively simple to design and synthesise fluorogenic hydrolase substrates. Four years ago, the Hollfelder group broadened the applicability of droplet sorting by introducing absorbance-activated droplet sorting (AADS) (Fig. 2C). They used a chip with embedded optic fibres to measure the absorbance of individual ∼80 μm droplets and to sort them at a rate of about 300 Hz. This enabled them to evolve an NAD+-dependent amino acid dehydrogenase.44 AADS has attracted significant attention and the paper has been highly cited. Just recently, the Hollfelder group published the first follow-up AADS papers, albeit for the same type of reaction and detection system.45,57 As far as we know, no other group has published the use of AADS for directed evolution. There might be several reasons for this. The droplets used are rather large (∼80 μm compared to ∼20 μm for fluorescence sorting, representing a >50-fold larger volume), resulting in a much lower final enzyme concentration and making the method unsuitable for the detection of very low activities. Furthermore, the detection limit was about 10 μM for a strongly absorbing formazan dye (extinction coefficient >37000 M−1 cm−1), meaning that much higher concentrations of dyes with lower extinction coefficients, like 4-nitrophenolate (18500 M−1 cm−1), would be needed.
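The two quantitative arguments above can be checked with a short calculation. The sketch below assumes spherical droplets and uses the Beer-Lambert law (A = ε·c·l) to estimate the concentration of a weaker chromophore that would give the same absorbance at the same optical path length. Both assumptions, and the function names, are ours for illustration. Because the formazan extinction coefficient is only given as a lower bound, the resulting value for 4-nitrophenolate is a lower bound as well.

```python
import math


def sphere_volume_pl(diameter_um):
    """Volume of a spherical droplet in picolitres (1 pL = 1000 cubic micrometres)."""
    return math.pi / 6.0 * diameter_um ** 3 / 1000.0


def equivalent_detection_limit_um(ref_limit_um, ref_epsilon, epsilon):
    """Concentration (in micromolar) of a dye with extinction coefficient
    `epsilon` that gives the same absorbance as `ref_limit_um` of a reference
    dye with `ref_epsilon`, at identical path length (Beer-Lambert law)."""
    return ref_limit_um * ref_epsilon / epsilon


# Droplet diameters quoted in the text: ~80 um (AADS) vs ~20 um (fluorescence).
aads_volume_pl = sphere_volume_pl(80.0)   # roughly 268 pL
fads_volume_pl = sphere_volume_pl(20.0)   # roughly 4.2 pL
volume_ratio = aads_volume_pl / fads_volume_pl  # (80 / 20) ** 3 = 64

# ~10 uM formazan (>37000 per M per cm) vs 4-nitrophenolate (18500 per M per cm).
limit_4np_um = equivalent_detection_limit_um(10.0, 37000.0, 18500.0)  # 20 uM
```

Under these assumptions, a single cell lysed in an 80 μm droplet is diluted about 64-fold more than in a 20 μm droplet, and at least about 20 μM 4-nitrophenolate would be required to reach the same absorbance signal.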
Unfortunately, this brings us to a much bigger problem, which affects not only absorbance assays but also assays based on fluorescence or any other type of detection system. Despite being predominantly charged at alkaline pH, 4-nitrophenolate is known to “leak” between droplets, meaning that it can transiently enter the oil phase and then move to neighbouring droplets.58 Some dyes leak within seconds (aminocoumarin), some in minutes (rhodamine 6G), some over hours (resorufin) and some over days (fluorescein).59,60 While adding charged groups to a dye is known to dramatically slow leaking (from seconds to days), the case of 4-nitrophenolate demonstrates that this is a complex and often counterintuitive phenomenon.58,60 Not only hydrophilicity but also size matters, as demonstrated by the ability of water molecules to diffuse across the fluorinated oil barrier rather easily.43 For simple hydrophobic molecules like haloalkanes there is a correlation between hydrophobicity (logP) and a tendency to partition into the oil phase.61 However, for more complex structures like fluorescent dyes, leakage seems to be related to surfactant concentration, with lower surfactant concentrations slowing down the leaking process, probably due to lower rates of micellar transport.59 Unfortunately, surfactants are needed to facilitate droplet formation and to stabilise droplets during incubation. Sindy Tang's group introduced the use of amphiphilic silica nanoparticles to address this problem. The partially fluorophilic nanoparticles adsorb at the surface of the aqueous droplets, forming very stable Pickering emulsions.62,63 Unlike surfactants, the nanoparticles cannot escape from the droplet surface and therefore cannot facilitate the transport of molecules between droplets. This exciting technology has been commercialised by Dolomite Microfluidics as a ready-to-use mix in HFE-7500 called Fluoro-Phase.
While this approach, combined with alternative oils like perfluoro(methyldecalin), has been demonstrated to significantly (but not completely) reduce leakage, we are not yet aware of a publication describing the use of Fluoro-Phase in high-throughput screening (HTS).62,63
Beyond leakage, limited detection options are certainly one of the most serious challenges of microfluidic droplet sorting. As mentioned before, fluorescence and absorbance assays are not always applicable, and when they are, the use of chromogenic surrogate substrates tends to bias the screening outcome (“you get what you screen for”).64 Therefore, there is intense interest in developing alternative detection strategies. Surface-enhanced Raman spectroscopy, light scattering, image analysis, mass spectrometry, impedance measurements, electrochemical detection, and even NMR have been used to detect droplet contents. Wang et al. recently reported Raman-activated droplet sorting (RADS) at a frequency of about 1–2 Hz. They achieved sensitive detection of intracellular triacylglycerols by using an electric field to temporarily halt a moving cell, allowing enough time for an accurate Raman measurement. They claim to have detected intracellular TAG levels previously undetectable using fluorescent stains like Nile Red. 
However, because they perform measurements on single cells before droplet encapsulation, it is not clear whether this method will find use for the analysis of products that do not accumulate intracellularly.65 Mass spectrometry (MS) has been used to detect droplet contents at rates up to 30 Hz,66 and sorting was reported at rates of about 0.7 Hz, using electrospray ionization (ESI)-MS to analyse ∼15000 droplets in 6 h.46 While droplets containing an in vitro expressed transaminase could be sorted, the large droplet volume (25 nanolitres compared to 2 picolitres for fluorescence activated droplet sorting (FADS)) means that this detection strategy would not be applicable to biocatalysts derived from single DNA molecules or single cells.53 However, recent advances in in vitro DNA amplification, transcription, translation, and assay in microfluidic droplets could address this issue.67 Another limitation of MS detection is that it is destructive, so droplets have to be split before analysis. Droplets stored in a delay line are later sorted based on the outcome of the MS analysis.46 This means that the sequence of droplets is critically important and that the fusion of any two droplets would desynchronise the droplet trains, resulting in the loss of all hits. While this problem can be dealt with by co-injection of reference droplets, the system is far from straightforward. An excellent summary by Neun et al. shows that despite all the progress, only fluorescence, absorbance, and electrochemical detection have been used for the sorting of actual directed evolution-derived or metagenome libraries.53 Furthermore, fluorescence detection is still unique in being the only format that uses droplets of only a few picolitres (high enzyme concentration) and is capable of detecting low nanomolar product concentrations and sorting at frequencies of several kHz.
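The practical consequence of these numbers can be illustrated with a short calculation of the average sorting rate implied by ∼15000 droplets in 6 h, the droplet volume ratio, and the time needed to screen a library at different rates. The library size of one million variants and the fluorescence sorting rate of 1 kHz (a conservative reading of "several kHz") are assumptions made purely for illustration, as are the function names.

```python
def mean_rate_hz(n_droplets, hours):
    """Average number of droplets processed per second."""
    return n_droplets / (hours * 3600.0)


def hours_to_screen(library_size, rate_hz):
    """Time in hours needed to process `library_size` droplets at `rate_hz`."""
    return library_size / rate_hz / 3600.0


# Values quoted in the text for MS-based sorting.
ms_rate_hz = mean_rate_hz(15000, 6.0)        # about 0.69 droplets per second

# Droplet volumes quoted in the text: 25 nL (MS) vs 2 pL (FADS), in litres.
ms_to_fads_volume_ratio = 25e-9 / 2e-12      # 12500-fold larger

# Hypothetical library of one million variants (assumption for illustration).
library_size = 1_000_000
ms_hours = hours_to_screen(library_size, ms_rate_hz)   # 400 h
fads_hours = hours_to_screen(library_size, 1000.0)     # under 20 min at 1 kHz
```

At the MS-based rate, such a library would take more than two weeks of continuous operation, compared to well under an hour for fluorescence-based sorting at kHz frequencies.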
High-speed absorbance measurements on picolitre and femtolitre droplets are possible using differential detection photothermal interferometry (DDPI). This recent technology allows 100 picolitre droplets to be analysed at 1 kHz, with a detection limit of 1.4 μM for Erythrosin B (82500 M−1 cm−1).68 DDPI therefore has the potential to dramatically expand the scope of AADS. However, due to its complexity, the technique will likely remain limited to a few specialist laboratories.
Smart libraries, powered by advances in rational design and DNA synthesis, reduce but have yet to abolish the need for UHTP screening in directed evolution. Furthermore, HTP screening will remain important for the identification of novel biocatalysts from metagenome libraries.42,69 Until DNA synthesis becomes significantly cheaper than it currently is, functional metagenomics will remain essential for exploring the rich functional diversity of nature, which is more relevant to this review than UHTP screening of mutant libraries. Therefore, continued research and development is critical, and it is important that the basic techniques become more accessible to larger numbers of researchers. Commercialisation of key technologies would certainly facilitate this process. Affordable commercially available devices capable of generating and sorting droplets based on fluorescence measurements, combined with simple and user-friendly software, would significantly encourage more researchers to start working with droplet microfluidics.
This bottleneck has been addressed by the implementation of genetically encoded biosensors such as (allosteric) transcription factors (TFs) or riboswitches (Fig. 3). Although TFs have already long been used to construct inducible gene expression systems for different prokaryotic and eukaryotic hosts,76–79 their added value as biosensors for the detection of small molecules has only been recognised in the last decade.80–82 Since then, genetically encoded biosensors facilitated the directed evolution of enzymes83–86 and the engineering of (natural) metabolic pathways by the high-throughput detection of metabolites,87–89 as well as the dynamic regulation of genetic circuits to improve overall pathway performance, among other applications.71,80,90–94
Fig. 3 Genetically encoded biosensors. (A) (Allosteric) TFs which bind metabolites can act as transcriptional activators (shown in blue at the top) or repressors (shown in purple at the bottom). TFs can also recruit other activators or repressors to regulate the activity of RNA polymerase (RNAP; not shown).71 (B) Upon binding a ligand or metabolite, riboswitches can act on the levels of transcription and translation by the formation/resolution of a terminator hairpin (top) and the sequestration/release of the ribosome binding site (RBS, bottom, shown in yellow).
To function as biosensors, TFs contain a ligand-binding domain (LBD) and a DNA-binding domain (DBD). The LBD detects the presence of a chemical compound in the environment or a metabolite inside the cell, whereas the DBD facilitates the association with the cognate nucleotide sequence or the dissociation of the TF upon binding of a ligand (Fig. 3A). TFs can act as transcriptional activators or repressors and have been successfully identified by gene expression and protein profiling in the presence of a desired small molecule through combined transcriptome and proteome analyses95–97 and the computer-assisted mining of databases.74,98,99 Furthermore, (microbial) TFs can be responsive to different but structurally related compounds, which has inspired both random mutagenesis and rational design of various LBDs and DBDs.100–105 For example, the TtgR regulatory protein from Pseudomonas putida was engineered by directed evolution and subjected to repeating rounds of FACS, yielding variants with enhanced response to resveratrol.102,103 The Keasling group employed a chemoinformatic approach inspired by small molecule drug discovery. 
By scouting catabolisable chemicals with molecular shapes similar to the metabolic engineering target and subsequent gene cluster analysis, the ChnR/Pb TF-promoter pair was identified as a suitable biosensor for lactams.106 Besides the identification of TF-based biosensors, improving their performance in terms of selectivity, sensitivity, and operational range can be challenging since additional regulatory elements in the 5′ and 3′ untranslated region (UTR) including (natural and synthetic) promoters, the context of RBS, and transcriptional terminators will influence the functionality of the biosensor.71,72,80 The operational range is defined as the concentration of the metabolite of interest (i.e., the input signal) required for the biosensor to provide a significant change in the output signal (e.g., fluorescence).89 It is not only affected by the attributes of TF including affinity for the ligand and the cognate DNA sequence and its concentration to saturate operator binding sites; expression levels of the reporter have to be carefully adjusted as well. Alternatively, the output signal can be amplified by the implementation of an enzymatic reporter.71,89 The dynamic range of the output signal can be expanded by TF, riboswitch, promoter, and RBS engineering.94,107–114 Curated databases for prokaryotic TFs, their associated regulatory elements and target genes may be helpful for initial designs but might not avoid the necessity of iterative rounds permutating different combinations of genetic parts.71,99,115,116 All these strategies are often time-consuming and results nonintuitive. Berepiki et al. addressed this issue and used a design of experiments (DoE) methodology to efficiently map gene expression levels and provide biosensors for protocatechuic acid and ferulic acid with maximised signal output, improved dynamic range, expanded sensing range and sensitivity.117
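The relation between operational range and dynamic range can be illustrated with a simple dose-response model. The sketch below assumes that the biosensor output follows a Hill-type transfer function, which is a common simplification and not a model taken from the studies cited above. Here the dynamic range is taken as the ratio of maximal to basal output, and the operational range as the ligand concentrations giving 10% and 90% of the full response. These definitions, the function names and all parameter values are hypothetical choices for illustration.

```python
def hill_output(ligand, basal, maximal, k_half, n):
    """Reporter output of an activating biosensor (Hill-type transfer function).

    ligand  : metabolite concentration (input signal)
    basal   : output without ligand
    maximal : output at saturating ligand
    k_half  : ligand concentration giving the half-maximal response
    n       : Hill coefficient (steepness of the response)
    """
    occupancy = ligand ** n / (k_half ** n + ligand ** n)
    return basal + (maximal - basal) * occupancy


def dynamic_range(basal, maximal):
    """Fold change between saturated and uninduced output."""
    return maximal / basal


def operational_range(k_half, n, low=0.1, high=0.9):
    """Ligand concentrations giving the `low` and `high` fractional response."""
    def concentration_at(fraction):
        # Invert the Hill function: c = K * (f / (1 - f)) ** (1 / n)
        return k_half * (fraction / (1.0 - fraction)) ** (1.0 / n)
    return concentration_at(low), concentration_at(high)


# Hypothetical sensor: basal 10, maximal 1000 (arbitrary fluorescence units),
# half-maximal response at 100 (arbitrary concentration units), n = 2.
half_max_signal = hill_output(100.0, 10.0, 1000.0, 100.0, 2.0)   # 505.0
fold_change = dynamic_range(10.0, 1000.0)                        # 100.0
lower, upper = operational_range(100.0, 2.0)                     # ~33.3 and ~300
```

In this simplified picture, raising the maximal output or lowering the basal output (e.g., by promoter or RBS engineering) expands the dynamic range, whereas changing the affinity of the TF for its ligand shifts the operational range, and a steeper response (larger n) narrows it.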
In contrast to the multicomponent design of TF-based biosensors, riboswitches comprise an RNA aptamer – an oligonucleotide sequence with a length of 30–80 nucleobases located in the 5′- or 3′-UTR of mRNA – that specifically binds a target molecule (Fig. 3B). Due to the physical proximity, binding of the ligand leads to a conformational change, which can directly affect the binding of ribosomes to the RBS on the mRNA upstream of a reporter gene or facilitate the formation of a terminator.71,107 Although riboswitch-based biosensors seem to have a simple architecture and exclusively act at the post-transcriptional level, their rational design is still in its infancy due to the limited understanding of ligand-induced structural changes and the frequently encountered small operational window of riboswitches.118–120 Accessible sources for riboswitches are the RiboD,121 Rfam,122 and RiboGap123 databases, compiling information about prokaryotic riboswitches and their ligands, sequence alignments and conserved secondary structures, and intergenic regions harboring noncoding RNAs and Rho-independent terminators, respectively.
Recently, Calero et al. connected a synthetic fluoride-responsive riboswitch (FRS) to the induction of artificial metabolic pathways for the biosynthesis of fluoronucleotides and fluorosugars in engineered P. putida using inorganic fluoride as both the only fluorine source (i.e., substrate) and as the inducer of the genetic circuit.124 The FRS post-transcriptionally (Fig. 3B, bottom) binds fluoride ions, which triggers the translation of the orthogonal T7 RNA polymerase, subsequently enabling the T7 promoter-controlled production of fluorinases and a purine nucleotide phosphorylase.
Regarding the construction of artificial riboswitches, the systematic evolution of ligands by exponential enrichment (SELEX) has been successfully employed. During SELEX, a library of oligonucleotides specifically binding a target ligand or ligands is produced, selected, and enriched in vitro.73,125,126 Furthermore, natural riboswitches can be engineered like TFs and enzymes.127 Examples include the engineering of a set of riboswitch-based genetic devices to enable the control of gene expression according to changes in the environmental pH128 and the switching of a thiamine pyrophosphate-sensing riboswitch from a device for the repression of downstream genes to an activator.129 Lastly, the physicochemical stability of DNA can be used to detect natural products such as biotin, vitamin D, and folate at nanomolar levels by strand displacement reaction-based biosensors, which have been shown to exhibit increased sensitivity, low interference, and high controllability.130,131
The application of (small-molecule) biosensors and the development and engineering of new sensory devices are certainly of interest for different industries to meet performance criteria through the directed evolution of enzymes,132,133 for the optimisation of microbial cell factories,134–136 and the real-time monitoring of the production of target molecules137 including (aromatic) alcohols, aldehydes, and acids,112,113,138–141 precursors for the synthesis of fatty acids and their derivatives,84,92,93,142–148 isoprene and terpenoids,149,150 steroids, as well as flavonoids. Biosensor systems for the last two will be highlighted in the following.
Steroids are polycyclic and highly functionalised compounds and their production is of high interest due to their broad significance as active pharmaceutical ingredients (APIs). However, the synthesis of steroids is demanding and often low-yielding, and advanced bio-based procedures are desirable, as addressed later in this review. The analysis of steroids usually involves time-consuming sample preparation and analysis by chromatographic methods, which limits sample throughput and the efficient development of production strains. Consequently, the development of biosensors for steroids offers advantages over established methods. Mazumder and McMillen constructed a dual-mode promoter in yeast that comprises five steroid hormone responsive elements and one lac operator upstream and downstream of the TATA box, respectively, in a minimal cytochrome C promoter. This dual controller is activated by testosterone (see also Scheme 8) and repressed by IPTG.151 More recently, Chamas et al. created biosensors for the detection of estrogens, progestogens (see also Scheme 8), and androgens in Arxula adeninivorans yeast strains by coupling human hormone receptors and different fluorescent reporter proteins.152 A complementary approach was followed by Maser and Xiong and put into the perspective of alternative steroid-sensing methods.153 Their Comamonas testosteroni steroid-sensor (COSS) system is based on the insertion of a green fluorescent protein (gfp) gene upstream of the regulatory region of the hsdA gene encoding a 3α-hydroxysteroid dehydrogenase/carbonyl reductase. Upon steroid exposure, GFP is produced. A disadvantage of the COSS assay was the high fluorescence background observed in both cellular and cell-free assays. Lastly, the Galagan group identified a progesterone-sensing TF from Pimelobacter simplex by exposure of cultures to different steroids, subsequent RNA sequencing, and bioinformatic analysis. 
The allosteric TF was ultimately implemented into an optical biosensor consisting of quantum dots coated with the TF and oligonucleotides. The latter resemble the TF binding site and are conjugated to a fluorescence resonance energy transfer (FRET) acceptor. Upon ligand binding to the TF, the DNA probe is released and the FRET signal quenched, corresponding to the concentration of progesterone.154 Optical and other emerging strategies for the design of biosensors for the detection of natural products were recently reviewed by Piroozmand and co-workers.155
For similar reasons, the synthesis and sensing of flavonoids has gained attention in the last decades. Hence, studies aimed at the design of flavonoid-biosensors and the improvement of microbial production. Siedler and colleagues developed a FdeR-based biosensor for naringenin and a QdoR-based sensing device to detect quercetin and kaempferol in real-time.135 Recently, De Paepe et al. followed two strategies for the development of chimeric LysR-type biosensors with customised ligand specificities towards the flavonoids naringenin, apigenin, and luteolin. The first strategy involved the construction of chimeric promoter regions to tune TF binding; the second approach created chimeric TFs by engineering and customization of the LBDs.156 Although DBDs and linker sequences connecting them to the LBDs as well as chimeric TFs were constructed previously,101,105,157,158 the combination of both strategies certainly points towards the expansion of the repertoire of (chimeric) biosensors for the detection of flavonoids and other natural products.159
Thus, biosensors are a powerful tool not only for the engineering of enzymes and the set-up of HTS by monitoring the presence of metabolites in real-time; biosensor systems can also precisely time and control the expression levels of pathway enzymes.71 Most of the selected examples of biosensor applications in living cells sensed target molecules in the cytosol. However, recent efforts have been made to sense natural products secreted into the extracellular environment as well.137,146,160–162 An elegant biosensor set-up was realised by Mukherjee et al., who coupled a medium-chain fatty acid (MCFA)-responsive G protein to a receptor on the cellular membrane, enabling the transduction of subsequent signals in the presence of extracellular MCFAs.147 Similarly, the group of Peralta-Yahya engineered a human serotonin G protein-coupled receptor to detect serotonin secreted by a serotonin-producing yeast strain.163
Remaining challenges involve the contextualization of novel biosensor designs in terms of their operational range and the functional implementation in heterologous hosts, especially transferring prokaryotic TFs into eukaryotic hosts.164,165 The continuous advancements in bioinformatics and synthetic biology provide a solid foundation to discover TFs and riboswitches and their cognate natural genetic parts or rationally combine them with artificial regulatory elements. DoE methodologies have already shortened this process.117 Furthermore, the combination of TFs and riboswitches that complement each other's shortcomings has emerged, as exemplified by Wang et al., who reported a hybrid controller consisting of a riboswitch-based detector and a protein regulator for compensating the low dynamic range of the riboswitch.111 Current and future strategies address feedback control and aim at synchronizing a cell population, reducing the metabolic burden, and balancing the expression of multiple pathway genes depending on the input signals.166–169 Ceroni et al. designed a dCas9-based feedback-regulation system in which the promoter automatically adjusts the downstream gene expression in response to burden167 and Liu et al. constructed quorum sensing-controlled CRISPRi systems, which can dynamically program bacterial consortia.166 To control multigene expression by one chemical signal, Cunningham-Bryant and co-workers reported a genetic controller that consists of catalytically inactive Cas9 and an RNA-binding protein fused to an inducible TF.168 These last examples not only highlight the versatility and variations of the CRISPR/Cas9 technology (which won the Nobel Prize in Chemistry in 2020) but also showcase how far our understanding of biosensors as integral parts in regulatory networks and associated metabolic pathways has already advanced.
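To give a sense of why DoE can shorten the permutation of genetic parts, the following minimal sketch contrasts a full factorial enumeration of part combinations with a two-level half-fraction design. It is a generic textbook illustration and not the specific design used in the cited work; the factor names (`promoter`, `rbs`, `tf_level`) and the function names are hypothetical.

```python
from itertools import product
from math import prod


def full_factorial(factors: dict[str, list[str]]) -> list[dict[str, str]]:
    """Every combination of the listed levels for each genetic part."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def half_fraction(names: list[str]) -> list[dict[str, int]]:
    """Two-level half-fraction design with levels coded as -1 (low) and +1 (high).

    The last factor is set to the product of all other factors, so only
    2**(k-1) constructs are needed instead of 2**k.
    """
    *base, last = names
    runs = []
    for levels in product((-1, 1), repeat=len(base)):
        run = dict(zip(base, levels))
        run[last] = prod(levels)  # aliasing the last factor with the interaction term
        runs.append(run)
    return runs
```

For three two-level factors, the full factorial requires eight constructs and the half fraction four; the price of the smaller design is that the main effect of the last factor cannot be separated from the interaction of the other two.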
In the precursor synthesis pathway, the biosynthesis of p-CA mainly relies on phenylalanine or tyrosine as starting materials (Scheme 1A). Aromatic ammonia lyases catalyse the deamination of phenylalanine or tyrosine to form cinnamic acid or p-CA, respectively. Cinnamic acid can be further hydroxylated by cinnamate 4-hydroxylase (C4H) to obtain p-CA. Phenylalanine ammonia lyase (PAL) and bifunctional phenylalanine/tyrosine ammonia lyase (PTAL) have been identified in dicotyledonous plants and some monocots, respectively.177,178 Both have quite high activity towards phenylalanine, and the latter can also accept tyrosine but with lower affinity.177 Plant-derived PALs and PTALs are good biocatalyst candidates for the synthesis of p-CA,178 but may require chaperones for better expression in prokaryotic hosts, which is a limiting factor for their synthetic application.179 On the other hand, PAL, PTAL and monofunctional tyrosine ammonia lyase (TAL) have been reported in microorganisms, especially fungi and bacteria that produce antibiotic phenylpropanoids or utilize phenylalanine and tyrosine as carbon and nitrogen sources.177,180 Among them, the PTAL from Rhodotorula glutinis (RgPTAL) shows impressive activity towards tyrosine181 and thus has been used for the biosynthesis of several phenylpropanoids.182–184 Furthermore, a mutant of RgPTAL (S9N/A11T/E518V), obtained through random mutagenesis, probably anchors the flexible loop region (Glu325–Arg336) to maintain the active-site pocket opening which ensures easy access by tyrosine and thus significantly improved its activity and the yield of p-CA.185 The regioselectivity of ammonia lyases to phenylalanine and tyrosine is also affected by the amino acid residue that binds to the para-hydroxyl group on the benzene ring. 
Substitution of this position by polar residues can markedly increase the specificity of PAL towards tyrosine.186 Recently, two novel TALs were identified from actinomycetes and achieved a p-CA productivity of up to 2.88 g L−1 h−1 using recombinant Escherichia coli as a whole-cell biocatalyst, which currently represents the highest efficiency for microbial production of p-CA.187 Therefore, these microbial-derived ammonia lyases and their mutants have become the preferred enzymes for the synthesis of p-CA in the precursor pathway, with significantly higher expression level, catalytic activity and p-CA yield.
In the central synthesis pathway, chalcone synthase (CHS) catalyses the synthesis of a chalcone, 2′,4,4′,6′-tetrahydroxychalcone (THC), from p-coumaroyl-CoA and malonyl-CoA (Scheme 1B).188 CHS is a plant-specific promiscuous type III polyketide synthase (PKS) which also produces other polyketides such as the p-coumaroyltriacetic acid lactone (CTAL). Therefore, improving its product specificity is of great significance for optimising metabolic flux and increasing flavonoid production. Very recently, a conserved strategy was uncovered by which non-catalytic chalcone isomerase-like proteins (CHILs), which are ubiquitous in plants, bind to CHS as a rectifier and increase the kcat value (2–15 times higher) for THC production, potentially through binding to the tetraketide-CoA intermediate in an energetically favourable manner, and thus enhance THC production and decrease CTAL formation.189 Since CHILs engage in macromolecular interactions with other enzymes in plant specialised metabolism, this result suggests that protein–protein interactions could be widespread in the biosynthesis of natural products, with a broader effect on promoting the activity and specificity of enzymes and the regulation of metabolic flux, which can provide an important tool for optimising heterologous biosynthesis.
Chalcone isomerase (CHI) catalyses the intramolecular cyclisation of THC and generates the flavanone naringenin which is a key intermediate for the structural differentiation to other flavonoids. Plant CHIs are considered to have evolved from fatty acid binding proteins,190 which shows the key role of protein evolution in modifying the catalytic mechanism of enzymes and broadening the source of novel enzymes by mutagenesis.191 According to substrate selectivity and catalytic mechanism, plant CHIs can be divided into type I and II. Both can accept 6′-hydroxychalcones as substrates, while the latter also has high activity towards 6′-deoxychalcones (Scheme 1C).192 Recently, the reaction mechanisms of enantioselective oxa-Michael cyclisation performed by type I and II CHIs have been revealed by X-ray crystal structure and molecular dynamics simulations, wherein the guanidinium ion of a conserved arginine positions the nucleophilic phenoxide and activates the electrophilic enone for cyclisation through Brønsted and Lewis acid interactions.193 This mechanism presents a new enantioselective Michael-type reaction in natural product biosynthesis that efficiently constructs C–O bonds. The crystal structure of type II CHI also revealed two unique water molecules in the active pocket which form an ordered hydrogen bond network with the polar amino acids in the pocket. This extended hydrogen bond network supports the role of ordered water in the destruction of the intramolecular interaction between ketone oxygen and 2′-OH and further provides a ring flip of 6′-deoxychalcone. Therefore, the catalytic efficiency towards 6′-deoxychalcone has been greatly improved.193 These results provide a theoretical basis for screening novel CHIs and broadening the substrate tolerance of CHIs through mutagenesis.
Besides in plants, CHIs also occur in some anaerobic intestinal bacteria as key enzymes for the degradation of flavonoids. The first bacterial CHI was isolated and cloned from Eubacterium ramulus (ErCHI), which has activity towards THC, isoliquiritigenin, butein, eriodictyol chalcone, and hesperetin chalcone (Scheme 1D).194,195 However, this bacterial CHI has no homology to plant CHIs and is even rare in protein databases, which shows its unique evolutionary origin.191 The protein structure of ErCHI consists of two ferredoxin domains as catalytic domains and a solvent-exposed domain.196 Unlike plant CHIs, the intramolecular cyclisation of chalcones is catalysed by bacterial CHI via a reversible Michael addition catalysed by histidine.196 Therefore, ErCHI can also catalyse the isomerisation of the flavanonol taxifolin to the auronol alphitionin.197 The study of these novel enzymatic mechanisms shows the impressive diversity of isoenzymes from various sources related to flavonoid metabolic pathways. In-depth studies on bacterial enzymes in the degradation pathway of flavonoids may enable them to replace plant-derived enzymes and enable the design of novel biosynthetic pathways for flavanones. For example, a flavanone- and flavanonol-cleaving reductase (Fcr) was recently identified from E. ramulus, which is an iron-sulfur flavoprotein containing an intramolecular electron transfer chain. It performs a cofactor-mediated hydride transfer from nicotinamide adenine dinucleotide (NADH) onto C2 of the respective substrate via flavin adenine dinucleotide (FAD), a 4Fe–4S cluster, and flavin mononucleotide (FMN), and further directly attaches the C2 of flavanones and flavanonols and cleaves the heterocyclic C-ring, which provides a novel pathway to synthesise dihydrochalcones and dihydroflavonols from flavanones and flavanonols, respectively (Scheme 1E).198
Hydroxylation and methylation greatly extend the structural differentiation of flavonoids. The hydroxylation mainly occurs on C3 of the A-ring and para- and ortho-positions of the B-ring catalysed by plant-derived flavanone 3-hydroxylase, 3′-hydroxylase, and 3′,5′-hydroxylase, respectively. Because of the low expression of P450 enzymes and the lack of effective electron transport systems in prokaryotic host cells, bacterial hydroxylases, such as an endogenous non-P450 hydroxylase complex from E. coli (HpaBC), have shown their advantages in cell factory construction and have been reported to additionally hydroxylate the ortho-position of the B-ring to achieve conversion of naringenin and afzelechin to eriodictyol and catechin, respectively, with high yields (Scheme 1F).199 Moreover, O-methylation of hydroxyl groups is a common modification of flavonoids catalysed by O-methyltransferases (OMTs) using S-adenosyl-L-methionine (SAM) as cofactor for providing the methyl group, which mostly takes place on the 7,3′,4′,5′-hydroxyl groups of flavonoids and the 7,4′-hydroxyl groups of isoflavonoids. OMTs have been widely found in plants, showing diverse substrate specificity and regioselectivity.200 Meanwhile, some flavonoid OMTs were also discovered in microorganisms, such as Bacillus and Streptomyces.201–203 Many OMTs have been recombinantly produced and used for the biosynthesis of flavonoids due to their superior chemo-, regio- and stereo-selectivity.204,205 However, the bulk demand of the methyl group donating cofactor SAM has hindered the industrial applications of OMTs.204 Therefore, in situ regeneration of SAM is one of the key factors affecting methylation biosynthesis. 
An early attempt at SAM regeneration was a complex SAM recycling cascade involving five additional enzymes on the basis of the physiological cycle of the metabolites in cells.206,207 S-Adenosyl-L-homocysteine (SAH) produced after transferring the methyl group is hydrolysed to adenosine and homocysteine by a SAH hydrolase, after which adenosine is sequentially phosphorylated by adenosine kinase and polyphosphate kinases 2 I and II, producing adenosine triphosphate (ATP). After that, SAM is reproduced from ATP and L-methionine by a methionine adenosyltransferase. Although this is a feasible way to regenerate SAM, such a long and energy-consuming coenzyme regeneration pathway is not suitable for biocatalytic methylation, at least in vitro. A newly established and more efficient SAM recycling system consists of only one enzyme, halide methyltransferase (HMT), which produces SAM directly from SAH with methyl iodide as donor (Scheme 1G).208 This novel cascade shows that a simpler cofactor regeneration system can be designed and realised by introducing non-natural donors and off-path tool enzymes, which helps to partly depart from the original metabolic pathway and simplify the biosynthetic pathways of natural products. In addition to methylation, the structural and functional diversity of flavonoids can be dramatically expanded via hydroxyl group bioalkylation with SAM analogues and promiscuous MTs. SAM analogues containing different alkyl substituents can be produced via chemical methods or by chemoenzymatic methods using L-methionine analogues catalysed by methionine adenosyltransferases or halogenases.209–211 A more advanced way is to explore promiscuous or engineered HMTs for the production of SAM analogues and to achieve flavonoid bioalkylation on the basis of the MT-HMT cofactor regeneration system.30,212 This artificial cofactor regeneration pathway provides a novel solution to the problem of low efficiency of SAM regeneration in biosynthesis.
Prenylation is another structural modification for the functionalization of flavonoids catalysed by aromatic prenyltransferases (PTs). Plant-derived PTs generally have high regiospecificity, transferring the prenyl moiety to C6 and C8 of the final flavonoid skeleton, as well as to C3′ of chalcones and to C3 and C5 of p-CA in the intermediate biosynthetic step (Scheme 1H).174 Recently, a novel di-PT was isolated from Artemisia capillaris which can accept p-CA as its specific substrate and transfers two prenyl residues stepwise to yield artepillin C.213 This is the first plant PT involved in the biosynthesis of phenylpropanes and capable of introducing multiple prenyl residues to native substrates with different regiospecificity. The plant-derived PTs are transmembrane enzymes. Due to the lack of high-resolution protein crystal structures, the substrate binding pocket and catalytic mechanism of PTs are currently unclear, which limits the protein engineering studies of PTs, such as widening the donor-binding pocket to accept longer chain prenyl donors and thereby broadening the diversity of product structures.174 Plant PTs prefer magnesium ions (Mg2+) to stabilize the pyrophosphate group of the donor. However, a recent study reveals that metal ions can change the substrate specificity of a flavonoid PT from Artocarpus heterophyllus (AhPT1). AhPT1 could catalyse 6-C-prenylation of genistein when Mg2+ served as cofactor but showed no activity towards 6-hydroxyflavone. In contrast, 5-C-prenylation of 6-hydroxyflavone by AhPT1 was observed when Mn2+ was used (Scheme 1I).214 This new discovery shows that metal ions play a key role in the substrate specificity, prenylation sites and catalytic mechanism of PTs, rather than just stabilizing the donor. In addition, O-prenylated products have also been found in plants. 
However, O-specific PTs have not been discovered yet, which suggests that O-specific PTs might have no homology with the C-specific ones.174 Therefore, the intelligent analysis of genomic, proteomic and metabolomic data could most likely bring new opportunities for the discovery of PTs with O-specificity. In addition, soluble PTs from bacteria show their catalytic capability towards flavonoids and prenylation specificity. For example, indole PT 7-DMATS from the fungus Aspergillus fumigatus accepted chalcones, isoflavonoids, and flavanones, and mainly catalysed prenylation at C6, while another indole PT, AnaPT, prefers prenylations at C6 or C3′ of flavanones and isoflavones (Scheme 1J). These fungal PTs have replaced plant-derived PTs for the heterologous biosynthesis of prenylated flavonoids.215 In addition, dimethylallyl diphosphate (DMAPP) is the preferred donor for PTs and is synthesised through the mevalonate (MEV) pathway and the methylerythritol phosphate (MEP) pathway in vivo. Ensuring an adequate donor supply is one of the limiting conditions of prenylation. Besides the optimisation of the natural donor synthesis pathway, one-step phosphorylation of dimethylallyl alcohol by acid phosphatase and isopentenylphosphate kinase with ATP as high-energy phosphate donor offers a simplified pathway to improve the efficiency of prenylation.216
Glycosylation is a major structural modification for flavonoids to increase their solubility, reduce toxicity, and improve bioavailability. Glycosylation takes place mainly on the multi-hydroxyl groups of the flavonoid structure (3-OH, 5-OH, 7-OH, 3′-OH, 4′-OH and 5′-OH) with glucose, mannose or galactose and their 6-deoxy derivatives, arabinose, apiose, and xylose (Scheme 1K). Some dideoxyhexosides, such as pyranoside and bovino pyranoside, have also been reported as sugar moieties.217 In addition, the carbon atoms of the benzene ring can also be glycosylated to form C-glycosides. Glycosylation is mainly catalysed by glycosyltransferases (GTs), which generally have high regio- and stereo-selectivity towards donors and acceptors. Therefore, mining novel GTs in whole genomes and the CAZy database (http://www.cazy.org) via bioinformatic methods is the main concept to find novel enzymes with specificity towards flavonoids.218 Meanwhile, protein engineering has been carried out on GTs to excise the transmembrane domain of GTs to improve soluble expression, optimize the substrate binding pocket to extend substrate scope and improve the efficiency of glycosyl transfer, and to reduce the flexibility of enzyme structures to improve the stability of GTs.218 The use of transglycosylation activity catalysed by glycosidases is another way to achieve the O-glycosylation of flavonoids. Recently, an amylosucrase obtained from Deinococcus geothermalis (DgASase) exhibited its unique transglycosylation activity towards various hydroxyflavones and hydroxyflavanones with high site specificity at the 6-OH and 4′-OH positions, leaving the 3-OH and 7-OH positions unchanged.219 This provides a reference for catalytic mechanism and glycosylation site specificity for predicting and screening more glycosidases. 
On the other hand, as more C-glycoside flavonoids have been discovered from plants and show great medicinal potential, C-glycosylation has become a hot topic in the study of flavonoid glycosylation. Some C-GTs derived from plants and fungi were cloned and confirmed to catalyse the C-glycosylation of flavonoids at positions C6 and C8 (Scheme 1K).220–225 Very recently, a promiscuous C-GT from Aloe barbadensis was identified to be capable of C-glycosylating scaffolds lacking an acyl group. With dihydrochalcones as substrates, di-C-glycosylation can occur at the C6 and C8 positions.226 Remarkably, a promiscuous C-glycosyltransferase from Trollius chinensis can accept multiple structures of flavones, flavonols, flavanones, flavanonols and dihydrochalcones, and introduce a glycosyl moiety at the C8 position. Meanwhile, it showed O-glycosylation activity on the C7 position when the C8 is already substituted by methoxyl or prenyl moieties.220 The study of the catalytic mechanism and site mutagenesis at two positions (I94E and G284K) switched its C- to O-glycosylation, which provides an important reference for the rational design and directed evolution of C- to O-GTs for synthetic purposes.220
As natural products with the most extensive pharmacological activity, the potential medical use of flavonoids has recently been expanded to treat infections by the coronavirus.227 The discovery of each enzymatic step in the natural synthesis of flavonoids and the replacement of designed de novo enzymatic reactions/cascades are completing the map of heterologous flavonoid synthesis. Under the guidance of the Design–Build–Test–Learn (DBTL) concept and the application of ML,227 the construction of in vitro biosynthesis and establishing new cell factories for flavonoid syntheses provides an efficient biosynthesis program for natural flavonoids and their novel structural derivatives.
Benzylisoquinoline alkaloids (BIAs) are one of the most important classes of plant alkaloids derived from tyrosine or phenylalanine. Because they include several important drugs, such as morphine, codeine, berberine and noscapine, the biosynthesis pathways of BIAs have been intensively investigated and almost all the key steps were recently elucidated in opium poppy (Scheme 2A).237 In brief, dopamine and 4-hydroxyphenylacetaldehyde (both derived from tyrosine) undergo a Pictet–Spengler reaction catalysed by a norcoclaurine synthase (NCS) to yield (S)-norcoclaurine, the first committed intermediate in BIA biosynthesis. (S)-Norcoclaurine is subjected to a hydroxylation and three methylation steps to form (S)-reticuline, the pivotal intermediate of many BIAs. The further synthesis of key BIAs branches here. For the synthesis of berberine and noscapine, a berberine bridge enzyme (BBE) catalyses the oxidative C–C bond formation to give (S)-scoulerine, which is further transformed into berberine or noscapine via enzymes from a 10-gene cluster in opium poppy.238 For the synthesis of morphine, (S)-reticuline is epimerised to (R)-reticuline by a unique (S)- to (R)-reticuline epimerase (STORR) with a fused cytochrome P450 monooxygenase (CYP) domain and a reductase domain, which was revealed by three research groups in 2015.239–241 (R)-Reticuline is transformed to salutaridine via a C–C phenol-coupling mediated by salutaridine synthase (SalSyn), and then further converted to thebaine and morphine via several tailoring enzymes.242 The almost full elucidation of enzymes in the biosynthetic pathway of several BIAs has significantly facilitated the engineering of fast-growing microbes for efficient heterologous production.243
In 2015, the Smolke group reported the complete biosynthesis of opioids from glucose in yeast, a major milestone in heterologous BIA production.240 In total, more than 20 enzymes from plants, mammals, bacteria, and yeast were over-expressed to access thebaine and hydrocodone. Although the final titres of thebaine and hydrocodone were only on the 0.3–6.4 μg L−1 scale, this work represents a ground-breaking advance for the total biosynthesis of complex natural products. One year later, the total biosynthesis of opiates was also achieved via stepwise conversion using four engineered E. coli strains, giving thebaine at 2.1 mg L−1.244 For the other branch of BIAs, the Smolke group reported the first total biosynthesis of the anticancer drug noscapine (2.2 mg L−1) in yeast via expression of >30 enzymes from various sources.245 In addition, by feeding 3-halogenated tyrosines, the yeast produced several 8-halogenated (S)-N-methylcoclaurines and (S)-reticulines. A much more practical synthesis of the BIA intermediate was reported very recently: 4.6 g L−1 of (S)-reticuline was successfully produced from sugar via extensive engineering of yeast and using more efficient key enzymes (e.g., NCS).246 Furthermore, by feeding dopamine and different L-amino acids (precursors for aldehydes) to a simplified version of yeast, an array of non-natural tetrahydroisoquinolines (THIQs) were produced, illustrating the broad substrate scope of NCS and methylation enzymes.
Besides engineering BIA pathways in heterologous hosts, many enzymes (especially the C–C bond forming NCS and BBE) in the biosynthetic pathway could be engineered and evolved for the in vitro synthesis of novel THIQs.247–249 NCS is well-known for its broad scope for accepting different aldehydes to give THIQs. In 2017, Lichman et al. discovered that the TfNCS from Thalictrum flavum catalysed the Pictet–Spengler reaction between dopamine and ketones, leading to novel chiral 1,1′-disubstituted and spiro-THIQs (Scheme 2B).250 The 1,1′-disubstituted THIQ features a chiral quaternary carbon centre, which is challenging to form in organic chemistry and unattainable through imine reductases (IREDs) or monoamine oxidases (MAOs).251 Several variants of TfNCS were explored to ensure high conversion and preparation of these unique THIQs. The Ward and Hailes groups continued to explore TfNCS and its variants for kinetic resolution of α-methyl aldehydes leading to (1S,1′R)-THIQs with two chiral centres in a single step (Scheme 2C).252 The broad scope (aldehyde and ketone), enantioselectivity and mechanism of NCS were investigated and explained in a recent quantum chemical study.253 Furthermore, NCS was combined with other enzymatic or chemical transformations as in vitro (chemo)-enzymatic cascades for the synthesis of THIQ analogues: (i) a carboligase-transaminase-NCS cascade to access chiral 1,3,4-trisubstituted THIQs;254 (ii) an NCS-catalysed Pictet–Spengler reaction and Na2CO3-mediated cyclisation to afford (S)-trolline;255 (iii) a network of tyrosinase, decarboxylase, transaminase and NCS for efficient synthesis of several natural and non-natural BIAs.256 Besides NCS, another important C–C bond forming enzyme, BBE, has been explored for synthetic purposes, such as preparation of (S)-scoulerine and its analogues via kinetic resolution257 or deracemization258 of the corresponding THIQs, and enantioselective dealkylation of N-ethyl THIQs.259
Given the importance of the chiral THIQ scaffolds, many other biocatalytic approaches (besides NCS and BBE in the BIA pathway) have also been developed. One facile approach is direct asymmetric reduction of chemically synthesised 3,4-dihydroisoquinolines (DHIQs) with natural IREDs260,261 or artificial transferhydrogenases.262,263 Many natural IREDs were able to enantioselectively reduce DHIQs with a 1-methyl- or simple 1-alkyl substituent.264 To produce chiral bulky 1-aryl-THIQs, the Qu group assayed a large number of diverse IREDs and found several (R)-selective IREDs and one unique (S)-selective IRED (IR45) converting chloro-, methyl-, and methoxyl-benzyl DHIQ into the corresponding (R)- or (S)-THIQ in high-to-excellent conversions and optical purities.265 To access plant-sourced alkaloids, they further engineered IR45 to improve its activity and combined it with coclaurine N-methyltransferase (CNMT) to achieve the one-pot synthesis of five N-methyl THIQ alkaloids (Scheme 2D).266 The THIQ analogue, 1-benzyl-1,2,3,4,5,6,7,8-octahydroisoquinoline (1-benzyl-OHIQ), is an important synthon for synthetic morphinan drugs (Scheme 2E). Recently, the Zhu group identified two IREDs with complementary enantioselectivity to produce (S)- or (R)-1-benzyl-OHIQs in high optical purity and yield from the corresponding imines.267
Another approach for accessing chiral THIQs is via enantioselective oxidation with MAO.268,269 Although the natural substrates for MAO are usually small primary amines, the Turner group pioneered the engineering of MAO-N from Aspergillus niger (A. niger) for bulky secondary and tertiary amines, such as 1-phenyltetrahydroisoquinoline (1-phenyl-THIQ, Scheme 2F).270 By enantioselective oxidation with MAO-N D11 and simultaneous reduction with BH3-NH3, racemic 1-phenyl-THIQ was deracemised to (S)-1-phenyl-THIQ (a precursor for Solifenacin) in excellent optical purity and yield. Reetz et al. simultaneously engineered the entrance tunnel and active site of MAO-N for efficient deracemization of several 1-substituted THIQs.271 Recently, the Hilvert group applied a UHTP microfluidic assay for single-round remodelling of a cyclohexylamine oxidase (CHAO).54 A highly active CHAO variant was obtained for the synthesis of several (S)-1-substituted THIQs via deracemization. For the 1- and 3-carboxyl-THIQs, a D-amino acid oxidase (DAAO) was successfully employed for deracemization.272 Furthermore, by combining DAAO-catalysed oxidation and a reductase (DpkA)-mediated reduction, the Wu group developed a fully biocatalytic deracemization process to produce (S)-1-carboxyl-THIQs in excellent enantiomeric excess (e.e.) and yield (Scheme 2G).273 This is similar to a previously developed MAO-artificial transferhydrogenase system for deracemization of simple THIQs.274
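The logic of such an oxidation/reduction deracemisation can be illustrated with a minimal Python sketch. It assumes an idealised, stepwise picture that is not taken from the cited work: the oxidase converts only the (R)-amine, conversion in each round is complete, and the chemical reductant returns the imine as a 1:1 mixture of enantiomers. The function name `deracemise` and its parameters are hypothetical.

```python
def deracemise(cycles: int, r_fraction: float = 0.5) -> float:
    """Return the e.e. of the (S)-amine after idealised oxidation/reduction rounds.

    Assumptions (illustrative only): perfectly (R)-selective oxidation,
    complete conversion per round, and non-selective reduction of the imine.
    """
    s_fraction = 1.0 - r_fraction
    for _ in range(cycles):
        imine = r_fraction           # all (R)-amine is oxidised to the imine
        r_fraction = imine / 2.0     # half of the imine returns as (R)-amine
        s_fraction += imine / 2.0    # the other half accumulates as (S)-amine
    return (s_fraction - r_fraction) / (s_fraction + r_fraction)


if __name__ == "__main__":
    for n in (0, 1, 3, 7):
        print(f"{n} rounds: e.e. = {deracemise(n):.1%}")
```

Under these assumptions the e.e. after n rounds is 1 − 0.5^n, so seven rounds already exceed 99% e.e. In a real one-pot process oxidation and reduction run simultaneously rather than in discrete rounds, so the sketch only conveys why the untouched enantiomer accumulates.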
Another very famous natural THIQ alkaloid, colchicine, is a potent microtubule inhibitor that has been used for many years both to treat inflammatory disorders and as a research tool. Early feeding studies on Colchicum plants suggested its biosynthesis from tyrosine and phenylalanine,275 but most of the enzymes remained unknown until very recently. The Sattely group applied metabolomics, transcriptomics and heterologous expression to elucidate a near-complete biosynthetic pathway of colchicine in Gloriosa superba (Scheme 3).276 In brief, dopamine (from Tyr) and 4-hydroxydihydrocinnamaldehyde (from Phe) are condensed to a 1-phenethylisoquinoline scaffold, which undergoes methylations and hydroxylations to give (S)-autumnaline. Next, P450-catalysed phenol coupling creates the bridged tetracyclic isoandrocymbine, which is further subjected to methylation and a unique P450-catalysed oxidative ring expansion to generate N-formyldemecolcine with the hallmark tropolone ring. Further N-modifications give colchicine. In this ground-breaking study, the authors not only elucidated the enzymes, but also reconstituted the pathway to N-formyldemecolcine in the heterologous host Nicotiana benthamiana.
Scheme 3 Novel enzymes, enzymatic mechanisms and cascades for the synthesis of colchicine. NCS: norcoclaurine synthase; P450: cytochrome P450 monooxygenase.
Monoterpenoid indole alkaloids (MIAs) are another very important class of alkaloids, including the anti-cancer drugs vincristine and camptothecin, the anti-arrhythmic ajmaline, and the anti-malarial quinine. The key intermediate of MIAs, strictosidine, is constructed from tryptophan-derived tryptamine and the monoterpenoid secologanin via a C–C bond-forming strictosidine synthase (STR).277 This key intermediate undergoes different transformations to several sub-classes of MIAs (Scheme 4A). In the pathway to vincristine and vinblastine in Catharanthus roseus, the conversion of strictosidine to tabersonine and catharanthine, which involves many unstable intermediates, remained mysterious until very recently, when the final missing enzymes were elucidated by two groups.278–280 Tabersonine is further converted by seven enzymes to vindoline, which is coupled with catharanthine by a peroxidase to form vinblastine and vincristine. Although the main research focus on MIAs is still the identification of enzymes and the elucidation of pathways in native plants, several studies have managed to reconstitute parts of the pathway in yeast. The O’Connor group introduced >20 different genes (including 14 from the MIA pathway) into yeast, and produced the key intermediate strictosidine de novo at ∼0.5 mg L−1.281 For the downstream part, the De Luca group discovered seven enzymes and reconstituted the pathway from tabersonine to vindoline (up to 2.7 mg L−1) in yeast.282 It is still very challenging to reconstitute the whole pathway of complex MIAs in yeast, yet several halogenated derivatives of MIAs have been cleverly accessed by introducing bacterial tryptophan halogenases into the hairy root culture of C. roseus.283
Scheme 4 Novel enzymes, enzymatic mechanisms and cascades for the synthesis of monoterpenoid indole alkaloids. STR: strictosidine synthase.
The strictosidine synthase (STR) in the biosynthesis pathway of MIAs has recently been explored in the synthesis of 1-substituted tetrahydro-β-carbolines (THBCs) by the group of Kroutil. Different from the natural (S)-strictosidine produced from tryptamine and secologanin, replacement of secologanin with several simple aliphatic aldehydes produced (R)-1-alkyl-THBCs in medium to high optical purity by several STRs (Scheme 4B).284 The STR-reaction was combined with a chemical reduction to achieve a facile two-step synthesis of (R)-harmicine. The switch of enantioselectivity was explained as the inverted binding of short-chain aliphatic aldehydes in STR through a structural and computational study.285 They further employed a substrate-walking strategy to engineer the STR to accept benzaldehydes and produce (R)-1-aryl-THBCs.286 Besides STR, IREDs have been used to produce simple 1-methyl-THBCs from the corresponding imines287 and MAO-N mediated deracemization has been explored to access a variety of chiral 1-substituted-THBCs.288
Scheme 5 Novel enzymes, enzymatic mechanisms and cascades for the synthesis of terpenoids. MEP: methylerythritol phosphate; MVA: mevalonate.
Complementary to the biosynthesis of terpenoids in microbes, cell-free in vitro multi-enzymatic synthesis avoids many problems in cells (e.g., toxicity, complex regulation), minimizes side reactions and genetic engineering efforts, and often offers higher yields of final products in a cleaner reaction system.305–308 These advantages were clearly demonstrated in the pioneering work on the cell-free one-pot production of monoterpenes from glucose by the Bowie group.309 Simply combining standard Embden–Meyerhof–Parnas glycolysis and the MVA pathway leads to an imbalance of cofactors: glycolysis generates ATP and NADH (in excess), while the MVA pathway consumes ATP and NADPH. The authors cleverly tackled this issue by creating enzymatic purge valve nodes:310 an NAD+-utilizing glyceraldehyde-3-phosphate dehydrogenase (GAPDH), an NADP+-utilizing mutated GAPDH and an NADH oxidase. Mathematical modelling was used to identify the potential bottlenecks (hexokinase, pyruvate dehydrogenase, and phosphate), and these key parameters were experimentally optimised. With careful consideration and optimisation, the final in vitro system comprising 27 enzymes converted glucose (500 mM) to limonene (12.5 g L−1) at 88% of the theoretical yield over 7 days. By replacing the terpene synthase, pinene and sabinene were also produced at 14.9 and 15.9 g L−1, respectively, in almost quantitative yields. These product titres are at least 10 times higher than those from microbial production, far exceeding the toxicity limits of these compounds.
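The cofactor imbalance can be made explicit with simple bookkeeping. The sketch below assumes textbook stoichiometries (EMP glycolysis: net 2 ATP and 2 NADH per glucose; an NAD+-dependent pyruvate dehydrogenase: 1 NADH per pyruvate; MVA pathway: 3 acetyl-CoA, 3 ATP and 2 NADPH per C5 unit). It is not the authors' kinetic model, and all names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Balance:
    """Net cofactor change per monoterpene (positive = produced)."""
    atp: int
    nadh: int
    nadph: int


def monoterpene_balance() -> Balance:
    # One C10 monoterpene needs 2 C5 units = 6 acetyl-CoA = 3 glucose.
    glucose = 3
    pyruvate = 2 * glucose
    c5_units = 2

    atp = 2 * glucose - 3 * c5_units   # glycolysis: +2 per glucose; MVA: -3 per C5 unit
    nadh = 2 * glucose + pyruvate      # GAPDH: +2 per glucose; PDH: +1 per pyruvate (assumed NAD+)
    nadph = -2 * c5_units              # HMG-CoA reductase: -2 per C5 unit
    return Balance(atp=atp, nadh=nadh, nadph=nadph)


if __name__ == "__main__":
    print(monoterpene_balance())
```

With these assumptions ATP is balanced, whereas 12 NADH are produced and 4 NADPH are required per monoterpene. A system that could only make NADH would stall for lack of NADPH while accumulating NADH, which is the kind of mismatch a purge valve is designed to absorb.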
In vitro multi-enzymatic synthesis has been reported for other terpenoids, such as the production of amorphadiene from mevalonic acid (6 steps with ATP recycling)311 and the production of geosmin and patchoulol from acetic acid (10 steps with cofactor regeneration).312 Cell-free in vitro multi-enzymatic synthesis has made great progress recently, but it is still difficult to work on some complex and difficult enzymes (e.g., membrane enzymes, P450s), and large-scale application is hampered by relatively high costs (e.g., enzyme purification, cofactors).
Currently, the biosynthetic pathways for the majority of natural terpenoids have not been fully elucidated. Thanks to the advances in genomic sequencing and bioinformatics, many putative terpenoid synthetic enzymes can be identified in silico. To verify and characterise these putative enzymes, heterologous expression in suitable (engineered) hosts often enables efficient production of terpenoid products for characterisation. For rather simple and small bacterial terpenes, heterologous expression in E. coli is often sufficient for rapid and facile characterisation.313,314 For more complex plant-derived terpenoids, however, heterologous expression of the enzymes in yeast or plants is necessary.315–317 For example, plant diterpene labdanes and clerodanes are often synthesised by a pair of distinct monofunctional class I and class II diterpene synthases (diTPSs). By mimicking the modular diterpene biosynthesis, the Hamberger group tested every combination of 9 class I and 11 class II diTPSs from 10 plant species in N. benthamiana by A. tumefaciens-mediated transient expression.318 In total, 51 diterpene skeletons were stereoselectively biosynthesised, including 41 new-to-nature ones. By engineering S. cerevisiae, four useful diterpenes were produced at a scale relevant for industrial applications. To quickly access highly diverse oxygenated plant triterpenes (>20 000 reported so far),319 the Osbourn group developed a translational synthetic biology platform based on transient expression in the whole plant of N. benthamiana (Scheme 6).320 Initially, a feedback-insensitive version of an HMG-CoA reductase (tHMGR) in the MVA pathway was found to significantly boost the production of β-amyrin when co-expressed with the oat β-amyrin synthase (SAD1) in N. benthamiana. To provide a gram-scale synthesis of triterpenes, the authors developed a vacuum agro-infiltration system for transient expression in the whole plant rather than individual leaves.
Co-expression of tHMGR and SAD1 in about 460 N. benthamiana plants and cultivation for 5 days allowed successful isolation of 800 mg of β-amyrin with >98% purity. This platform (tHMGR and SAD1 in N. benthamiana) was combined with one or two of five different β-amyrin-oxidising P450 enzymes to give 41 different oxygenated triterpenes (some new-to-nature). A handful of them were isolated on a 10 mg scale and further evaluated for antiproliferative and anti-inflammatory activities. Besides the proper functional expression of plant enzymes, another advantage of transient expression in N. benthamiana is that multiple genes can be co-expressed by simply co-infiltrating multiple A. tumefaciens strains. This feature allows combinations of enzymes to be tested quickly, either to generate new products (as shown in the two examples above) or to elucidate a biosynthetic pathway, as demonstrated in a recent study of root triterpenes in A. thaliana.321
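The size of such combinatorial co-infiltration experiments follows directly from the numbers given above. The short Python sketch below enumerates the class II/class I diTPS pairs and, reading the text as "one or two out of five P450s", the possible P450 sets. The enzyme labels are placeholders, not the names used in the cited studies.

```python
from itertools import combinations, product


def ditps_pairs(n_class_ii: int = 11, n_class_i: int = 9) -> list[tuple[str, str]]:
    """Every class II / class I diTPS pairing (placeholder labels)."""
    class_ii = [f"classII_{i}" for i in range(1, n_class_ii + 1)]
    class_i = [f"classI_{i}" for i in range(1, n_class_i + 1)]
    return list(product(class_ii, class_i))


def p450_sets(n_p450: int = 5, max_size: int = 2) -> list[tuple[str, ...]]:
    """Every set of one up to max_size P450s (placeholder labels)."""
    p450s = [f"P450_{i}" for i in range(1, n_p450 + 1)]
    return [c for k in range(1, max_size + 1) for c in combinations(p450s, k)]


if __name__ == "__main__":
    print(len(ditps_pairs()), "diTPS pairs")   # 11 x 9 pairings
    print(len(p450_sets()), "P450 sets")       # 5 singles + 10 pairs
```

These counts are the number of strain mixtures to infiltrate, not the number of products obtained; each mixture corresponds to one co-infiltration of the respective A. tumefaciens strains.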
Merochlorin A and B are common C4-prenylated meroterpenes, while certain vanadium-dependent haloperoxidases mediate an α-hydroxyketone rearrangement, leading to naphthomevalin with a unique C3-prenylation pattern (see also Section Meroterpenoids, Scheme 14).322 These enzymes were recently employed for total enzymatic syntheses of the antimicrobial and cytotoxic meroterpenoids napyradiomycin A1 and napyradiomycin B1 from 1,3,6,8-tetrahydroxynaphthalene, GPP and DMAPP (Scheme 7).323 By applying two aromatic prenyltransferases (NapT8 and T9) and two vanadium-dependent haloperoxidases (NapH1 and H3) in one pot, napyradiomycin A1 was synthesised in 22% yield. With the addition of the vanadium-dependent haloperoxidase NapH4, napyradiomycin B1 was synthesised in 18% yield.
Besides the vanadium-dependent haloperoxidases, other halogenating enzymes324–327 could also provide plenty of opportunities in the enzymatic syntheses of terpenoids (and other natural products) as well as their derivatives.
Traditionally, steroidal APIs have been synthesised through chemical processes that are characterised by the requirement for multiple sequential steps that offer only poor control over the stereo- and regioselectivity and very low yields.337 In the 1950s, the corticosteroid hormone cortisone was synthesised from the bile acid (BA) deoxycholic acid (DCA) over 31 steps with a yield of 0.16%. By including a fermentation step with the fungi Rhizopus arrhizus and A. niger, the number of chemical steps could be reduced to 11, markedly reducing production costs (Scheme 8).338 To date, many typical steroidal APIs are manufactured chemically but involve microbial biotransformations for the preparation of key intermediates335,337,339 or the late-stage functionalisation of steroids.340–343 The latter regularly involves stereo- and regioselective hydroxylations by P450s and will be described with focus on recombinant applications below.333,340,344,345
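The penalty of long linear sequences can be quantified from the figures quoted above. The following minimal Python sketch computes the geometric-mean yield per step that corresponds to a given overall yield. The function names are hypothetical, and the calculation assumes, for illustration only, that all steps have the same yield.

```python
def mean_step_yield(overall_yield: float, steps: int) -> float:
    """Geometric-mean yield per step for a linear synthesis."""
    return overall_yield ** (1.0 / steps)


def overall_yield(step_yield: float, steps: int) -> float:
    """Overall yield of a linear sequence with a uniform per-step yield."""
    return step_yield ** steps


if __name__ == "__main__":
    per_step = mean_step_yield(0.0016, 31)   # 0.16% over 31 steps
    print(f"average yield per step: {per_step:.1%}")
    # Hypothetical comparison, not a reported value: same step yield, 11 steps.
    print(f"11 steps at that step yield: {overall_yield(per_step, 11):.1%}")
```

An overall yield of 0.16% over 31 steps corresponds to roughly 81% per step, so even respectable individual steps compound into a very low overall yield. Shortening the route is therefore more effective than modestly improving single steps.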
Although microorganisms have long been used for precursor synthesis and steroid modifications, fully microbial processes featuring efficient platform strains or recombinant microbial cell factories are scarce. Limitations are directly caused by the intrinsic properties of steroidal compounds such as low solubility in aqueous media and cellular toxicity.29 Some of these issues have been addressed by the emulsification of substrates with surfactants or the utilization of two-phase systems with organic solvents.333,339,346 Furthermore, steroid transformations in well-characterised recombinant host cells (e.g., E. coli, S. cerevisiae) regularly yield low product titres simply because steroid-modifying enzymes only poorly express or function outside their native hosts, combined with insufficient substrate uptake and unintended metabolism.75,335,347
In the following, the bio-based synthesis of steroid precursors and (recombinant) functionalization will be highlighted as well as trends that project towards the de novo synthesis of steroidal APIs and their customization for future applications.
Important substrates, in the context of microbial production of API precursors, are sterols.330,335 This sub-group of steroidal compounds, bearing a 3-hydroxyl group, includes cholesterol, lanosterol, phytosterol, ergosterol, and BAs like DCA and lithocholic acid (LCA; Scheme 8). Sterols are intermediates of both anabolic and catabolic steroid pathways and are available from plant-based and animal feedstocks for biotechnological applications.335,348–352 Various strains from the genera Mycobacterium, Nocardia, and Rhodococcus were identified to transform sterols and, importantly, have been metabolically engineered to reroute fluxes towards the accumulation of value-added steroids. Examples feature the production of C-19 steroids such as 4-androstene-3,17-dione (AD), 1,4-androstadiene-3,17-dione (ADD), and testosterone in mutant strains of Mycobacterium smegmatis (M. smegmatis) starting from cholesterol. Single and multiple gene deletions including kstD and ksh, encoding 3-ketosteroid-Δ1-dehydrogenase and 3-ketosteroid-9α-hydroxylase, accumulated AD and ADD, respectively; the heterologous overexpression of 17β-hydroxysteroid dehydrogenases (HSDHs) from different (bacterial) sources successfully converted AD into testosterone.339,353
Whereas the biotransformation of sterols into AD requires multiple reaction steps carried out by endogenous host enzymes, targeted functionalization usually involves one-step transformations by the activity of a single (engineered) enzyme. Industrially relevant steroid modifications are Baeyer–Villiger oxidations,354,355 hydrogenation and dehydrogenation of C=C and C–C bonds, respectively,337,356–358 and alcohol/carbonyl group interconversions.359–362 Several wild-type organisms are used at industrial scales to perform these functionalisation reactions.333,335,363
Of particular interest are direct hydroxylations of inert C–H bonds as carried out, for example, by P450s, a superfamily of heme-containing enzymes.333,344,345,364,365 P450s can execute an impressive variety of other reactions340,366–368 and were even engineered to perform a set of ‘new-to-nature’ reactions.369–372 To direct the biological activity of steroid drugs, mainly their stereo- and regioselective hydroxylation activities are of interest.344 To name two, a hydroxyl function at position 11β is required for the anti-inflammatory activity of cortisol and prednisolone373 and the presence of two hydroxyl groups – 1α and 25α – is essential for the biological activity of vitamin D derivatives.374 Integral parts to customise hydroxylation activities have been the many well-established protein engineering techniques, exemplarily highlighted in the following.
The CYP enzyme P450BM3 from Bacillus megaterium (CYP106A2) was amongst the first bacterial steroid hydroxylases characterised375 and hydroxylates multiple pharmaceutically relevant steroids including cortisol, progesterone, and testosterone predominantly at the 15β position.376 P450BM3 is a self-sufficient CYP and, as such, does not require additional redox partner proteins for the transfer of electrons required for catalysis.364 It has been the target of numerous protein engineering studies (see section Terpenoids), not only to enhance the physiological 15β-hydroxylation activity but to invert stereoselectivities and shift regioselectivity. The group of Reetz has published thorough research on these topics, heavily employing directed evolution strategies including the combinatorial active site saturation test (CAST)377 and iterative site saturation mutagenesis (ISM).7,378,379
In two prominent examples, CASTing was used to transform the previously identified P450BM3 F87A mutant,380,381 which hydroxylates testosterone at the positions 2β and 15β with low selectivity, into biocatalysts with nearly perfect regioselectivity. Variants with the additional mutations A330W (KSA-1) and R47Y/T49F/V78L/A82M (KSA-14) catalysed the 2β- and 16β-hydroxylations of testosterone with 97% and 96% regioselectivity, respectively.382 In a subsequent study, ISM – based on information from mutability landscapes, molecular dynamics simulations, and X-ray crystallography – was used to generate P450BM3 variants with exquisite regio- and stereoselective hydroxylation activities for testosterone and four other steroids at the C16 position.383 Whereas the mutant WIFI-WC (combining the mutations R47W/S72I/A82F/F87I and Y51W/L181C from two distinct libraries after three rounds of ISM) produced 16α-hydroxy testosterone with 96% stereoselectivity, WWV-QRS (combining R47W/A82W/F87V and L181Q/T436R/M177S) produced the 16β-stereoisomer with 92% selectivity.
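Variant labels such as those above follow the common "wild-type residue, position, new residue" notation, which is easy to handle programmatically. The Python sketch below parses such labels and merges two mutation sets. It is a generic helper written for illustration (the names `Mutation`, `parse_variant` and `combine` are hypothetical) and does not reproduce any software used in the cited studies.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    wild_type: str
    position: int
    mutant: str


_TOKEN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_variant(label: str) -> list[Mutation]:
    """Parse a label such as 'R47W/S72I' into a list of Mutation tuples."""
    mutations = []
    for token in label.split("/"):
        match = _TOKEN.match(token.strip())
        if match is None:
            raise ValueError(f"not a point mutation: {token!r}")
        wild_type, position, mutant = match.groups()
        mutations.append(Mutation(wild_type, int(position), mutant))
    return mutations


def combine(*labels: str) -> list[Mutation]:
    """Merge several labels, sorted by position; reject clashing positions."""
    merged: dict[int, Mutation] = {}
    for label in labels:
        for mutation in parse_variant(label):
            previous = merged.get(mutation.position)
            if previous is not None and previous != mutation:
                raise ValueError(f"conflict at position {mutation.position}")
            merged[mutation.position] = mutation
    return [merged[position] for position in sorted(merged)]


if __name__ == "__main__":
    for mutation in combine("R47W/S72I/A82F/F87I", "Y51W/L181C"):
        print(mutation)
```

Reading out the new residues in label order reproduces the variant names used in the text: R47W/S72I/A82F/F87I gives "WIFI" and Y51W/L181C gives "WC".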
Most recently, the group of Wong demonstrated the crucial roles of glycine mutations in P450BM3 for different substrate binding orientations, resulting in a variant library capable of hydroxylating AD and testosterone, for example, at a wide range of positions (C1, C2, C6, C7, C15, and C16) with up to 97% selectivity.384 Very recently, Li et al. created P450BM3 mutants with 7β-hydroxylation activity towards testosterone and related steroidal compounds.385 Previously, CYP106A2 had only been described to yield the 7β-products from the steroids pregnenolone and dehydroepiandrosterone.376,386 The resulting compounds and their derivatives are considered to act as neuroprotective and anti-inflammatory agents to treat neuronal damage after stroke or trauma.387 Regarding the hydroxylation at position C7, BAs that predominantly occur in the bile of mammals,388,389 have also moved into the focus due to their clinical significance.390–392 Again, their synthesis requires tedious multi-step chemical procedures that suffer from low yields and poor control over regioselectivity.333,393,394 The synthesis of the BA ursodeoxycholic acid (UDCA) is no exception. Cholic acid (CA)351,361 or chenodeoxycholic acid (CDCA)360 were suggested as precursor molecules and the biocatalytic epimerisation of CDCA to UDCA at C7 further shortened the synthesis route and enhanced yields.395,396 However, apart from certain filamentous fungi such as Fusarium equiseti,343,397 direct hydroxylations, especially in recombinant systems, have not been described until very recently.
The CYP107D1 from Streptomyces antibioticus (OleP), which physiologically catalyses an epoxidation step in the oleandomycin biosynthesis pathway,398,399 hydroxylates testosterone at the positions 6β, 7β, 12β, and 15β.400 In contrast, BAs like LCA are hydroxylated exclusively at the 6β-position.401 Grobe et al. engineered OleP based on a semi-rational directed evolution approach and generated a triple-mutant (F84Q/S240A/V291G) with nearly perfect regioselectivity for the 7β-position.29 Hits after directed evolution were identified by a colorimetric HTP assay, which is based on the activity of a 7β-HSDH, specifically oxidizing the 7β-OH of UDCA to the corresponding ketone.359 The reaction also yields NADPH, which reduces a dye and results in an increase in absorption dependent on the concentration of UDCA.29 The assay principle offers an easy-to-implement alternative to time-consuming chromatographic methods that are currently employed to verify the success of P450 engineering approaches. Noteworthy, the heme group in CYPs has been used as ‘intrinsic chromophore’ in HTP screenings to identify potential CYP substrates (and inhibitors). Binding of a ligand causes the spin shift of the heme iron that can be detected as a signal spectrophotometrically.402–404
Although these selected examples certainly highlight the power of directed evolution to engineer CYPs to execute highly desired steroid modifications, they are typically far from industrial applications due to low yields (2% isolated yield after LCA to UDCA transformation by the best OleP mutant in E. coli co-expressing putidaredoxin and putidaredoxin reductase as redox partner proteins)29 and/or low substrate loads (1 mM testosterone for different hydroxylations in E. coli by self-sufficient P450BM3 variants).383,385 None of these studies addressed the optimisation of CYP enzyme production in vivo apart from precursor supplementation for heme production.29 Khatari et al. adjusted the stoichiometry of CYP260A1 from Sorangium cellulosum and the redox partner proteins adrenodoxin reductase (AdR) and adrenodoxin (AdX) and showed that CYP260A1 hydroxylates 11-deoxycorticosterone (11-DOC) mainly at the C1α-position at high ratios of the redox partners (e.g., CYP260A1:AdR:AdX = 1:3:10).405 At lower ratios (CYP260A1:AdR:AdX = 1:3:5), C1–C2-ene-11-DOC was also produced in vitro. A high ratio (CYP260A1:AdR:AdX = 1:3:20) and additional recycling of NADPH mainly formed 1α,14α-dihydroxy-11-DOC in an E. coli whole-cell biocatalyst.405,406 Besides cofactor recycling,383,407 an increase of gene copy numbers combined with genomic manipulations (integrations347,408 and deletions339,353) of target genes has been realised.335 Whereas enzyme and redox partner stoichiometry can be easily controlled in vitro, this is certainly challenging in vivo70 but should definitely be considered in future applications of CYPs. However, Khatari et al. only reached conversions up to 80% at very low substrate loads (0.2 mM of 11-DOC).405 These performances, modest to say the least, are a current and future challenge for biocatalytic syntheses, and not only for recombinant processes for steroid modifications.372,409
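For in vitro work, the ratios quoted above translate directly into component concentrations. The Python sketch below is a simple helper for that arithmetic; the function names and the 1 µM CYP concentration in the example are assumptions made for illustration and are not values from the cited study.

```python
def partner_concentrations(cyp_um: float, ratio: str) -> dict[str, float]:
    """Concentrations (µM) of CYP, AdR and AdX for a 'CYP:AdR:AdX' ratio string."""
    parts = [float(value) for value in ratio.split(":")]
    if len(parts) != 3 or parts[0] <= 0:
        raise ValueError("expected a ratio such as '1:3:10'")
    cyp, adr, adx = parts
    return {
        "CYP260A1": cyp_um,
        "AdR": cyp_um * adr / cyp,
        "AdX": cyp_um * adx / cyp,
    }


def product_mm(substrate_mm: float, conversion: float) -> float:
    """Product concentration (mM) formed at a given fractional conversion."""
    return substrate_mm * conversion


if __name__ == "__main__":
    for ratio in ("1:3:5", "1:3:10", "1:3:20"):
        print(ratio, partner_concentrations(1.0, ratio))   # 1 µM CYP is an assumed value
    print(f"{product_mm(0.2, 0.80):.2f} mM product at 80% conversion of 0.2 mM substrate")
```

The last line puts the reported conversion in perspective: 80% of 0.2 mM substrate corresponds to only 0.16 mM of converted material in total.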
To date, only two recombinant processes yielding steroidal APIs have been implemented industrially. Shi et al. developed a process that converts CA to 12-oxo-CDCA, a key precursor for chemoenzymatic synthesis of UDCA, in a single step with very high productivity (68 g L−1 h−1). The responsible enzyme, a 12α-HSDH from Rhodococcus ruber, was identified using a structure-guided genome mining approach and is applied as lyophilised E. coli whole-cell powder during the process, yielding 12-oxo-CDCA.362,409
The second example is the exploitation of a metabolically engineered S. cerevisiae strain harboring an artificial biosynthetic pathway consisting of four mammalian P450s. The heterologous cascade reaction yields cortisol from glucose by mimicking human steroid biosynthesis (Scheme 8).410,411
The lack of steroid-modifying enzymes as bottleneck has been overcome with the continuous discovery of new P450s from microbial but also eukaryotic sources.362,412–417 Recently, Szaleniec et al. reviewed P450s for the degradation of cholesterol (CYP125 and CYP142 family) and steroid hydroxylations (CYP106A, CYP109, CYP154, and CYP260), as well as Rieske-type monooxygenases, 3-ketosteroid 9α-hydroxylases, and molybdenum-containing steroid C25-dehydrogenases as alternative (bacterial) steroid hydroxylases.344 The impressive successful engineering of CYPs complements the steroid hydroxylations found in nature. Together with the other useful enzymatic reactions described above, steroidal APIs have been accessed through both the application of wild-type strains and recombinant systems, yielding anti-inflammatory cortisol410,411 and prednisolone,337,373 the sex hormones testosterone and progesterone,339,353,385 derivatives thereof exhibiting neuroprotective functions,376,386,418,419 the value-added BA UDCA,29,395 as well as the biologically active (1α,25-dihydroxylated) forms of vitamin D2420,421 and D3,374,422 and many more.333,335,344,346,365,423,424 Although the optimisation of these bio-based processes has been addressed by the emerging tools from synthetic and systems biology, steroids remain ‘tough’ substrates, intermediates, and products due to their low solubility in aqueous media and cellular toxicity. However, this trend is rapidly changing since new microbial chassis for the biotransformation and functionalization of steroids including P. putida,425 different Rhodococcus sp. 
and related mycobacterial strains are emerging.335,426 Corynebacterium glutamicum, for example, has beneficial properties for steroid biocatalysis such as efficient transport of steroidal compounds, high stress tolerance, and the absence of potentially interfering metabolic pathways.427 Lastly, genetic tools have become readily available for these strains428,429 and the revolutionary CRISPR/Cas9-based recombineering tool has accelerated genomic manipulations.430,431 Hence, the outlook for the development of potent microbial cell factories for the customization of steroidal APIs has never been brighter.
Polyketides are an excellent example of the central theme of this review, which is to demonstrate that it is still difficult to achieve the total enzymatic synthesis of natural products without the original host organism. In this case, the reason is not merely the complexity of host metabolic pathways, but also the extreme complexity of the megasynthases themselves. Their sheer sizes make cloning and standard DNA manipulations complicated. It also makes the proteins very hard to express and fold in heterologous hosts like E. coli.436 Polyketide synthase engineering is a very promising field of study but is restricted by the same technological limitations. Furthermore, it is extremely challenging to determine the structures of complete assembly-line polyketide synthases due to their large sizes and the often-weak protein–protein interactions between modules.437–439 Overcoming the hurdles to designer polyketide synthases would be a clear sign that new trends of biocatalysis have emerged. In this section we review recent trends in PKS engineering, suggesting that there is hope despite the decades of failing to deliver on the promise of on-demand designer polyketides.
The modular Type I PKSs are commonly described as “assembly-line” complexes because each module sequentially adds a unit to the growing product so that the sequence of functional groups in the final polyketide depends on the sequence of PKS modules.434,435,437,442 Each module of an assembly-line PKS consists of at least a ketosynthase, an acyltransferase, and an acyl-carrier protein domain (Fig. 4C).443 The acyltransferase domain of a loading module first transfers an acyl group, usually from acetyl-CoA, to the phosphopantetheinyl arm of its acyl-carrier protein. The ketosynthase domain of the downstream module catalyses both translocation of the acyl group and carbon–carbon bond formation by a decarboxylative Claisen condensation of the translocated acyl group with a malonyl derivative (bound to the acyl-carrier protein as a thioester). This reaction forms a 3-ketoacyl intermediate and releases carbon dioxide, which thermodynamically drives the process.437 This intermediate may then be reduced and further functionalised by an optional “reducing loop” composed of a ketoreductase, a dehydratase, and an enoylreductase.444 All these reactions are stereospecific, and the ketoreductase domains determine the stereo-configuration of the α- and β-carbon atoms of the resulting product molecules.442,444 Some modules also have methyltransferase domains, and repetition of the elongation cycle, with variation in each round, produces polyketides of great diversity.435 Finally, a thioesterase domain cleaves the polyketide from the terminal acyl-carrier protein, often cyclizing the product to form a macrolactone.434,437,442,443
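The assembly-line logic described above, in which the domain content of each module fixes the functional group left at the β-carbon, can be sketched in a few lines of Python. This is a deliberately simplified model: it assumes that the reductive domains act strictly in the order KR, DH, ER, it ignores stereochemistry, methyltransferases and substrate specificity, and the module compositions in the example are hypothetical rather than taken from a real PKS.

```python
REDUCTIVE_ORDER = ("KR", "DH", "ER")
BETA_CARBON_STATE = (
    "beta-keto",          # no reductive domain acts
    "beta-hydroxy",       # ketoreductase only
    "alpha,beta-alkene",  # ketoreductase + dehydratase
    "saturated",          # ketoreductase + dehydratase + enoylreductase
)
CORE_DOMAINS = {"KS", "AT", "ACP"}


def beta_carbon_state(domains: set[str]) -> str:
    """Functional group left at the beta-carbon by one extension module."""
    if not CORE_DOMAINS <= domains:
        raise ValueError("an extension module needs KS, AT and ACP domains")
    steps = 0
    for domain in REDUCTIVE_ORDER:
        if domain not in domains:
            break          # later reductive domains lack their substrate
        steps += 1
    return BETA_CARBON_STATE[steps]


def assembly_line(modules: list[set[str]]) -> list[str]:
    """Beta-carbon state installed by each module, in assembly order."""
    return [beta_carbon_state(module) for module in modules]


if __name__ == "__main__":
    hypothetical_pks = [
        {"KS", "AT", "ACP"},
        {"KS", "AT", "ACP", "KR"},
        {"KS", "AT", "ACP", "KR", "DH"},
        {"KS", "AT", "ACP", "KR", "DH", "ER"},
    ]
    for index, state in enumerate(assembly_line(hypothetical_pks), start=1):
        print(f"module {index}: {state}")
```

Reordering, deleting or swapping entries in the module list changes the predicted pattern of functional groups, which is the idea behind combinatorial PKS engineering. The sketch says nothing about whether such a chimeric assembly would be catalytically competent.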
Fig. 4 Traditional and updated assembly-line polyketide synthase module definitions. (A) Four different perspectives on the venemycin assembly-line PKS. (1) The synthase can be divided into two polypeptide chains called VemG (232 kDa) and VemH (140 kDa). (2) The PKS can also be viewed in terms of its domains. The adenylation domain (A) accepts the 3,5-dihydroxybenzoyl starter unit. The inactive ketoreductase domain (KR0) probably plays a structural role. The ketosynthase (KS), acyltransferase (AT) and acyl carrier protein (small circle) domains are the core components of PKSs. The thioesterase domain finally releases the polyketide from the synthase, often by cyclisation to form a macrolactone. (3) In the traditional view, the module boundaries are the N-terminus of the ketosynthase domain and the C-terminus of the acyl carrier protein domain. (4) In the updated definition, modules end at the C-termini of ketosynthase domains, reflecting the evolutionary co-migration of domains. (B) The venemycin and pikromycin assembly lines depicted using the new module definitions. The assembly-line steps and products are also shown. The pikromycin assembly line additionally includes dehydratase (DH) and enoylreductase (ER) domains. Note that modules can be split over different polypeptide chains. For example, Vem Mod2 is split between the VemG and VemH proteins. (C) The functions of the different catalytic domains exemplified by Pik Mod5, which has a full set of reductive domains. The domains are represented by spheres coloured by module as in (B). The process starts with a tetraketide intermediate attached to an acyl carrier protein, which is transferred to a cysteine residue on the ketosynthase domain. An acyltransferase domain loads the downstream ACP with an acyl-CoA-derived extender unit, in this case methylmalonyl-CoA. 
The ketosynthase catalyses decarboxylative condensation of the tetraketide intermediate and the extender unit to form an ACP-linked pentaketide intermediate. This intermediate is then subjected to reductive reactions by the ketoreductase, dehydratase, and enoylreductase domains (the carbonyl subjected to reduction is coloured green). All three reactions are optional so that other modules may stop at either the β-keto, β-hydroxy, or α,β-alkene intermediates. The resulting ACP-linked intermediate is a substrate for either the next KS domain or the terminal thioesterase, which usually results in cyclisation. This figure was simplified and redrawn from Miyazawa et al. and Smith et al.440,441
The rich chemical and structural diversity of naturally occurring polyketides can often be enhanced by structural fine-tuning.437,442 Chemical modification of natural polyketides is possible, but often inefficient, and selectivity is hard to achieve.434 However, successful modification of the biosynthetic machinery itself would enable diversely modified analogues to be produced. Therefore, scientists have been trying to reprogram the modular architecture of assembly-line PKSs since they were discovered in the 1990s.442 Not only should it be possible to produce targeted polyketide modifications, but libraries of ‘unnatural natural products’ could also be created and screened for novel activities, if only we could insert, delete, or swap out individual PKS modules at will.434,445 Unfortunately, this combinatorial assembly approach is not straightforward, since the chimeric assemblies are usually catalytically impaired, and the production of structural analogues of medically relevant polyketides by genetic/protein engineering is still a major challenge.434,442,446,447
Interactions between domains and modules seem to be as important to the activity and fidelity of assembly-line PKSs as enzyme-substrate interactions.448 Therefore, in addition to the typical protein engineering challenges like changing substrate specificity, delicate protein–protein interactions must be maintained. Klaus et al. studied chimeras with modules from the erythromycin, rapamycin, and rifamycin PKSs. Analysis of bi-modular chimeras revealed that turnover rate correlated with efficiency of the intermodular chain translocation, which depended on interactions between the ACP and downstream KS domains. These results demonstrate that more efficient engineering of domain-domain interactions could significantly facilitate the generation of highly productive chimeric PKSs.440,448 Difficulties in engineering the protein–protein interactions necessary for modules to functionally interact seem to be largely responsible for the limited success of PKS engineering.442,448
Adding to the difficulties of designing functional protein–protein interactions is the fact that we do not know much about the overall structures of entire multi-modular PKSs.440,442,446 Assembly-line polyketide synthases are some of the largest and most complex protein structures known. Their several-megadalton sizes are probably their most striking attributes.435,437,442 Protein–protein interactions between noncovalently attached modules can be rather weak, making it hard to isolate and structurally characterise entire complexes.437–439 Interestingly, catalytic modules of assembly-line PKSs are observed in both extended and arched conformations but until very recently it was unknown whether these conformations influence catalytic activity. Khosla's group used a high-affinity antibody to lock a PKS in the extended form, which retained catalytic activity.449 Only recently did Dutta et al. and Whicher et al. use cryoelectron microscopy to determine the structure of an entire pikromycin synthase module and key stages of its catalytic cycle.437,450,451 These developments will facilitate future rational design and engineering endeavours.
The modern protein engineering approaches that work so well for predominantly monomeric/independent enzymes are hard to apply to megasynthases, where rational design is essentially impossible due to the lack of structural information. As noted in an excellent review by Khosla's team, the rich natural diversity of modular PKSs is even more astonishing considering how hard it is to engineer these proteins in the laboratory.442 It seems like understanding and mimicking natural evolutionary mechanisms is currently one of the most promising PKS engineering strategies. Natural PKS evolution depends on functional PKSs resulting from domain exchanges. Therefore, there is a growing interest in understanding the ‘natural splice points' which could accelerate rational engineering. Analysis of many PKS systems has suggested that the KS-AT linker is a natural splice site, making it an attractive target for engineering by homologous recombination.442,452,453 Peng et al. showed that hybrid aureothin and neoaureothin synthases were more active when modules were split at the KS-AT linker than when split at the traditional ACP-KS interface, confirming that the KS-AT linker is a good fusion site for module swapping.442,446,454 These findings are in line with recently updated module definitions that place the module boundaries between the KS and AT domains. Evolutionary models based on gene duplication suggest that the unit of duplication would be the KS-AT-ACP module, which is functionally required for chain elongation and matches the boundaries of single-module PKSs.442 However, Zhang et al. recently compared four very large aminopolyol-producing PKSs (each 25–30 modules long or the size of about five ribosomes). They observed that the co-migrating module consisted of AT-DH-ER-KR-ACP-KS domains rather than the traditional KS-AT-DH-ER-KR-ACP module (Fig. 4).433,455 Importantly, this new module definition has led to some promising results. 
The lower activities of synthases engineered using traditional boundaries seem to result from weaker interactions between acyl carrier proteins and the downstream KS units that do not co-migrate in natural evolution.440,446 Miyazawa et al. reconstituted the venemycin PKS, a short aromatic polyketide-producing assembly line, in vitro. Venemycin production was achieved by incubation of the polypeptides VemG and VemH with the substrate 3,5-dihydroxybenzoate and ATP, malonate, coenzyme A, and the malonyl-CoA ligase MatB for malonyl-CoA production. Venemycin could be isolated on the milligram scale, without the need for chromatography, from dialysis reactors which also enabled enzyme recycling.440 They performed assembly line engineering using the venemycin and pikromycin synthase modules and demonstrated that chimeric synthases designed using the updated module definitions outperformed those based on traditional module boundaries by over an order of magnitude (Fig. 5).434,440 Peng et al. used genome mining to identify nine homologous biosynthetic gene clusters encoding assembly lines for aureothin and neoaureothin, two compact and highly functionalised polyketides. They successfully morphed the neoaureothin assembly line to produce the aureothin backbone by deletion of two modules. They found that the KS-AT linker is well suited for both insertion and deletion of modules, further supporting the alternative domain definition.446
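The difference between the two module definitions is essentially a difference in where a linear string of domains is cut. The following Python sketch illustrates this with a hypothetical domain order (a loading AT-ACP pair, two extension modules, and a terminal TE). The function name `split_modules` and the example list are assumptions for illustration only, and the sketch says nothing about linker sequences or protein–protein interfaces.

```python
def split_modules(domains, boundary="updated"):
    """Group a linear list of PKS domains into modules.

    boundary="traditional": a new module starts at the N-terminus of each KS
                            (modules read KS ... ACP).
    boundary="updated":     a module ends at the C-terminus of each KS
                            (modules read AT ... ACP-KS).
    """
    if boundary not in ("traditional", "updated"):
        raise ValueError("boundary must be 'traditional' or 'updated'")

    modules, current = [], []
    for domain in domains:
        # Traditional view: cut immediately before every KS domain.
        if boundary == "traditional" and domain == "KS" and current:
            modules.append(current)
            current = []
        current.append(domain)
        # Updated view: cut immediately after every KS domain.
        if boundary == "updated" and domain == "KS":
            modules.append(current)
            current = []
    if current:
        modules.append(current)
    return modules


# Hypothetical assembly line, written N- to C-terminus.
domains = ["AT", "ACP",
           "KS", "AT", "KR", "ACP",
           "KS", "AT", "DH", "ER", "KR", "ACP",
           "TE"]

traditional = split_modules(domains, "traditional")
updated = split_modules(domains, "updated")
print(traditional)
print(updated)
```

In the updated grouping, each ACP stays in the same unit as the KS that accepts its intermediate, which is the pairing that the chimera studies above identify as sensitive to disruption.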
Fig. 5 Hybrid polyketide synthases constructed using both the updated and the traditional module definitions. (A) The products of three hybrid assembly lines. (B) Three assembly lines constructed using the updated module definition are over an order of magnitude more productive (red numbers) than (C) assembly lines constructed using the traditional modules. Despite the success of assembly lines based on the new module definitions, the native VemG-VemH assembly line is still more than twice as active (36 min−1). The domain abbreviations are as for Fig. 4 and again the acyl carrier proteins are represented by small circles. This figure was simplified and redrawn from Miyazawa et al.440
Despite advances in modelling natural evolutionary processes and computational prediction of optimal splice sites, a major barrier to successful engineering has been a lack of experimental data for guiding optimal selection of splice sites for generating functional chimeric PKSs.447,450,451,456,457 Experimentally accelerated molecular evolution based on homologous recombination is a recently introduced strategy for gaining valuable information on optimal splice sites for functional chimeras. Homologous recombination based on naturally occurring stretches of sequence similarity can be used for the assembly of novel chimeric PKSs, similar to natural PKS evolution.442 Chemler et al. used homologous recombination between the erythromycin (DEBS) and pikromycin coding sequences in Saccharomyces cerevisiae to generate hybrid libraries containing many functional chimeras.456 This method has the potential for generating large libraries rich in functional variants. Wlodek et al. recently described a method for adding, removing, and replacing modules, based on recombination between regions of high sequence homology within a PKS gene cluster. Rather than using yeast, they harnessed the homologous recombination machinery of a Streptomyces strain, rapidly generating diverse and highly productive assembly lines by 'accelerating' the natural evolution of modular polyketide synthases.447 They generated 17 rapamycin synthase and 9 tylactone synthase chimeras, many of which were highly active, producing titres comparable to the wild-type strain.442,447 Sherman's group described the use of PKS modules in vitro to convert a chemically synthesised thiophenyl-activated analogue of the hexaketide intermediate of tylactone biosynthesis. The intermediate was accepted by the ketosynthase of the JuvEIV PKS module and further processed by the JuvEIV and JuvEV modules to form tylactone. 
Macrolactonization was followed by in vivo glycosylation, in vitro P450-mediated oxidation, and chemical oxidation, resulting in the total synthesis of a range of macrolide antibiotics from the juvenimicin, M-4365, and rosamicin family.458 Analogues of tylactone intermediates accessed by homologous recombination-based genetic engineering could be valuable alternative starting points for chemoenzymatic late-stage modification, enabling structural diversification to an even larger number of macrolide antibiotics.447,458
Khosla's group recently reported the use of in vitro reconstitution to decode the orphan polyketide assembly line responsible for producing the nocardiosis-associated polyketide (NOCAP). The reconstituted PKS, in the presence of octanoyl-CoA, malonyl-CoA, NADPH, and SAM, produced octaketide and heptaketide products that could partially be structurally elucidated by MS and NMR.459 More recently, they reconstituted the entire NOCAP assembly line both in vitro and in E. coli to fully “deorphanise” the NOCAP synthase, independent of its genetically challenging and hazardous natural host.460 These approaches for studying multi-megadalton assembly lines are far from standardised. For example, to overcome heterologous expression problems due to the exceptional size of the NOCAP PKS (3 MDa homodimer), multi-modular proteins had to be dissected into mono- or bimodular units that could be more easily expressed in E. coli.438,460 Optimisation or modification of polyketide synthesis in the native host strain is often desirable because the gene cluster is already functionally expressed. However, as the NOCAP case demonstrates, the hosts may be hazardous or otherwise hard to culture and manipulate. One of the key advantages of E. coli is that its metabolic background is not cluttered by complex natural product biosynthesis pathways, avoiding crosstalk with heterologously introduced PKSs and facilitating the identification of novel polyketides. The metabolism and molecular biology of E. coli are also well understood, and the genetic toolkit is unrivalled.436 The E. coli genome has not only been extensively sequenced but also completely recoded.461,462 Therefore, future advances in systems level understanding could be rapidly translated into genomically-reprogrammed hosts. Despite the advantages,443,460 only a few assembly-line PKSs have been functionally reconstituted in E. coli, because E. coli is not an ideal “universal host”.436 Nontrivial engineering is necessary for biosynthesis of precursors and performing critical post-translational modifications. No assembly-line PKS has yet been completely deorphanised solely by reconstitution in E. coli. This achievement would revolutionise natural product discovery if robustly implemented.436
As in other protein engineering endeavours, the number of PKS variants interrogated is limited by the throughput of the available screening methodologies. Chromatographic analyses are time consuming, but due to the complexity of the molecules, simple colorimetric or fluorometric assays are not typically applicable to PKS screening.434 While biosensors for detecting enhanced precursor (e.g., malonyl-CoA) formation have been reported,143,463 relatively few high-throughput methods for polyketide products are available. Kasey et al. recently showed that engineered variants of the promiscuous erythromycin-sensing transcription factor MphR could be used to detect related ligands like clarithromycin. This work demonstrated the potential of engineered biosensors to facilitate the directed evolution of macrolide synthases, but little work on this topic has been published recently.71,434,464 While biosensors are promising tools for HTS, their use is often limited to naturally occurring biosensors. Unfortunately, biosensor engineering is challenging in itself, effectively doubling the engineering effort if an off-the-shelf biosensor is not available.465 However, for a problem as complex as PKS engineering, it might well be worthwhile to first engineer a biosensor for screening PKS libraries. Rapid advances in both protein structure prediction466 and de novo design of bioactive protein switches165,467 might make biosensor design much simpler in the near future. Unfortunately, these advances cannot entirely solve the problem, since in many cases the structure of the target polyketide will not be known in advance, making activity-based screening indispensable. While some (e.g., antibiotic) activities are relatively easy to screen for using agar plate or microfluidic droplet-based UHTP screening technologies, others are not. In an interesting approach to screen for proapoptotic compounds, Theodorou et al. mixed bacteria in microfluidic droplets with mammalian cells, which could be assayed for apoptosis markers.468 For the foreseeable future, these complicated screening problems will have to be solved on a case-by-case basis.
Yuzawa et al. recently reported engineered strains of Streptomyces albus harbouring engineered hybrid polyketide synthases capable of converting plant biomass to methyl and ethyl ketones at titres of over 1 g L−1.469 While these product titres are too low to compete with fossil fuels, they demonstrate that engineered hybrid polyketide synthases can produce g L−1 titres, which is significant for more valuable pharmaceutical compounds. Probably the greatest challenge for the field is increasing production rates to industrially competitive values since both time and titre are relevant to the space-time yield. The ability to rationally exchange assembly line modules has long been one of the holy grails in biochemistry. While linking a loading domain to more than one extension module without losing functionality is still a major challenge, recent trends in evolution-guided engineering strategies suggest that there is hope for the future. Updated module definitions (Fig. 4 and 5), supported by high-throughput experimental recombination and characterisation of chimeras and machine learning, may well make this goal attainable.470 Advances in DNA synthesis and assembly coupled with automated and unbiased PKS characterisation are necessary to realise the full potential of machine learning in PKS research. The hurdles to designer polyketide synthases are expected to be overcome by taking advantage of the newly arising tools available for biocatalysis.
The successful total biosynthesis of islatravir is a textbook case for the design of a new synthetic route based on the salvage/degradation pathway, enzyme screening and modification assisted by protein engineering, and synthetic pathway optimisation. This will certainly inspire the total biosynthesis of other drug compounds in the pharmaceutical industry.
By integrating OA, GPP, and cannabinoid synthetic modules, Keasling and his colleagues realised the construction of a cannabinoid synthetic pathway in yeast. The obtained recombinant yeasts can synthesise cannabinoids and their non-natural derivatives up to a yield of 8 mg L−1 starting from galactose and fatty acid derivatives.473 However, the low supply of prenylation donors and the toxicity of product and intermediates to recombinant cells are still the barriers against achieving a high yield of cannabinoids by whole-cell conversion.
Therefore, Bowie and his colleagues chose a cell-free platform for the prenylation of natural products and application to cannabinoid production.474 The cell-free prenylation system contains glycolysis, acetyl-CoA, MEV, and cannabinoid modules, and involves 25 enzymes. The synthesis of the prenylation donor GPP was achieved by adding glucose from an external source and the prenylated products were synthesised by adding a variety of aromatic substrates. Constructing natural enzymatic pathways in a cell-free system often encounters problems of metabolic flow equilibrium, feedback inhibition, and low activity of enzymes towards non-natural substrates. By introducing a purge valve to allow carbon flux to continue through the glycolysis pathway without building up excess NADPH, a pyruvate dehydrogenase bypass to eliminate the inhibition by intermediates, and engineered PT (NphB) for improved activity and regioselectivity, a final yield of 1.25 g L−1 was achieved for CBGA. However, the shortcomings of this synthesis system are obvious: the 25-step enzymatic cascade is too complex, and expensive OA or DA are required as one of the starting materials.
In a recently reported cell-free system,475 the enzymatic synthesis pathway is designed into four modules: isoprenoid, aromatic polyketide, ATP regeneration, and cannabinoid modules. Isoprenoid was used as a starting material for the synthesis of GPP through a four-step enzymatic cascade (Scheme 10), which greatly shortens the original 23-step GPP synthesis pathway starting from glucose. Inexpensive acetyl-phosphate (AcP) acts as a phosphate donor for ATP regeneration, and at the same time as CoA transfer medium to generate malonyl-CoA through the catalysis of a phosphotransacetylase and a malonate decarboxylase α-subunit. This non-natural CoA transfer cascade greatly improves the efficiency of malonyl-CoA and further OA/DA synthesis. In the cannabinoid module, an efficient, water-soluble CBGA synthase replaced the natural membrane enzyme to directly prenylate OA/DA. In the end, the optimised cell-free system uses only 12 enzymes with low-cost organic acids as starting materials to produce CBGA and cannabigerovarinic acid (CBGVA) with a yield of 0.5 g L−1. This successful example demonstrates the unique advantages of cell-free systems for the enzymatic in vitro biosynthesis, including complete flexibility in pathway design, rapid design-build-test-learn cycles, precise control of all system components, and circumventing the toxicity of products and intermediates to cells.475 Meanwhile, this case also proves that careful selection of starting materials, introduction of heterologous synthetic pathways and isoenzymes, and simplification of the synthesis pathways of each module can effectively increase the supply of raw materials, improve the efficiency of coenzyme regeneration, and thus achieve product output with high yield.
The biosynthesis of enterocin and wailupemycin in S. maritimus uses benzoic acid and malonyl-CoA as starting materials and is catalysed by the enc type II polyketide synthase complex (Scheme 13). First, EncN catalyses an ATP-dependent activation and transfers benzoate to the acyl carrier protein EncC. Then, the benzoyl unit migrates from EncC to the ketosynthase heterodimer EncA-EncB. The resting EncC is malonylated under the catalysis of malonyl-CoA:ACP transacylase (FabD), followed by a Claisen condensation reaction between benzoyl and malonyl units. This reaction process is repeated six additional times to produce an octaketide, which is then reduced by the ketoreductase EncD to obtain an intermediate common to enterocin and wailupemycin. This intermediate can self-cyclise to generate wailupemycin D-G. However, in the presence of EncM, the linear polyketide intermediate is oxidatively converted in a Favorskii-like reaction to form the enterocin tricyclic scaffold. Finally, this scaffold is methylated by the O-MT EncK and hydroxylated by the cytochrome P450 hydroxylase EncR to produce enterocin. Interestingly, EncN has a wide substrate scope and is able to accept structural analogues of benzoic acid. As a result, 24 unnatural 5-deoxyenterocin and wailupemycin F and G analogues were successfully synthesised in vitro by adding halogenated benzoic acids, phenolic acids, and thiophene acids as starting materials.480 Unnatural starting materials may be toxic to cells in whole-cell transformations. However, an enzymatic total synthesis system in vitro does not have this disadvantage. Therefore, it is convenient to synthesise natural product derivatives by trying starting materials with different structures, which greatly widens the scope for expanding the activity and application of natural products.
The design of the heterologous biosynthetic pathway of hyoscyamine and scopolamine basically refers to the natural plant synthesis pathway. Through the mining of plant transcriptome data, the successful identification of the key enzymes AbPYKS, AbCYP82M3, AbUGT, AbLS, AbCYP80F1, DsHDH and DsH6H was the prerequisite for constructing a complete synthetic pathway. In addition, screening orthologues and introducing microorganism-derived enzymes are able to improve the efficiency of the single-step reaction and product yield. However, the enzymes involved in the original synthesis pathway are expressed in different tissue cells in plants and various subcellular structures in plant cells. Due to lacking the corresponding differentiated cells and subcellular structures, it is a great challenge to construct such complex enzymatic cascades in most heterologous microbial cells. In this influential biosynthesis example, enzymes are expressed in the cytoplasm, mitochondria, peroxisome, vacuole, and membrane of vacuole and endoplasmic reticulum, respectively, according to their protein transmembrane structure, cofactor regeneration and electron transport chain required for activity. This strategy of ‘regionalised’ expression not only ensures the activity and function of the enzyme, but also brings convenience to the regulation of synthetic pathways. The final product titres were reported as 30 to 80 μg L−1, hence extensive further optimisation is needed to reach grams per litre. Nevertheless, this wonderful case illustrates that the integration of classic biocatalysis and synthetic biology is constantly advancing the construction of complex enzymatic synthesis pathways in engineered whole cells and improving the research achievements of enzymatic total synthesis to a higher level.
Until recently, only four enzymes were known that used light energy directly to drive small molecule conversions.488 These enzymes – including photolyases involved in DNA repair – have not been synthetically applied. The characterisation of a fatty acid decarboxylase489 found in algal lipid metabolism complements this ensemble of natural photoenzymes and now facilitates novel reaction cascades, such as the biofuel production from triglycerides (Scheme 16A).490,491 In addition, engineered variants are able to carry out the kinetic resolution of α-hydroxy carboxylic acids,492 and the unnatural amino acid phosphinothricin, with high stereoselectivity.493
One approach to induce photocatalytic non-natural activities in enzymes relies on the ability of the naturally occurring nicotinamide cofactors and flavins to facilitate light-driven redox reactions that lead to radical intermediates not observed in reactions proceeding in the ground state. This possibility was explored in pioneering studies published by the Hyster lab, which enabled the preparation of α-chiral lactones – a structural motif found in drug molecules such as artemisinin and the psychotropic terpenoid salvinorin A. Instead of constructing the chiral lactone from a chiral acid precursor, racemic α-brominated lactones are used as the starting material: a ketoreductase (KRED) was shown to facilitate their enantioselective dehalogenation when irradiated with blue light (Scheme 16B).494,495 After excitation, NAD(P)H transfers a single electron to the substrate. This weakens its carbon-halogen bond and leads to loss of bromide with the subsequent formation of a prochiral radical intermediate. As the radical is still bound in the enzyme's active site, H˙ can now be delivered from NADPH+˙ in a stereoselective manner to form the chiral lactone product. Importantly, this reaction sequence is initiated while the cofactor and halogenated lactone are bound in close proximity inside the active site: only the resulting charge-transfer (CT) complex is excited at the chosen wavelength. As such a CT complex is not formed in solution, a potentially non-selective background reaction is avoided. Stereocomplementary KREDs were identified yielding the enantiomers of nine additional product analogues with up to 96% e.e. Racemic starting materials are employed, but interestingly, the 85% yield clearly outperforms a classical kinetic resolution: the enzyme does not discriminate between the substrate enantiomers, and dehalogenation leads to the same central prochiral radical. 
However, the hydrogen atom delivery from the nicotinamide occurs only from one face and thus is highly stereoselective. This makes it possible to transform both enantiomers of the halolactone substrate to one enantiomer of the lactone product.
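The advantage over a classical kinetic resolution can be put into numbers with a short Python sketch. It simply applies the definition of enantiomeric excess, e.e. = (major − minor)/(major + minor). Pairing the 85% yield with the 96% e.e. quoted above in a single calculation is an assumption made for illustration, since the text reports them as separate figures ("up to" values), and the helper names are hypothetical.

```python
# Theoretical ceiling for a classical kinetic resolution of a racemate:
# only one substrate enantiomer (half of the material) is converted.
KINETIC_RESOLUTION_MAX_YIELD = 0.50


def enantiomer_amounts(yield_fraction, ee):
    """Split an overall product yield into major and minor enantiomer fractions.

    Uses major + minor = yield and (major - minor) / (major + minor) = ee.
    """
    major = yield_fraction * (1 + ee) / 2
    minor = yield_fraction * (1 - ee) / 2
    return major, minor


# Assumed pairing of the two values quoted in the text (illustrative only).
major, minor = enantiomer_amounts(yield_fraction=0.85, ee=0.96)

print(f"major enantiomer: {major:.3f} of starting material")
print(f"minor enantiomer: {minor:.3f} of starting material")
print("exceeds kinetic-resolution ceiling:", major > KINETIC_RESOLUTION_MAX_YIELD)
```

Under these assumed values, roughly 83% of the racemic starting material ends up as a single product enantiomer, which is only possible because both substrate enantiomers pass through the same prochiral radical.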
Based on similar principles, photocatalytic dehalogenations generated radicals suitable for cyclisation and intermolecular hydroalkylation reactions in the active sites of double bond reductases (“ene reductases”) as protein scaffolds,496–499 where a flavin mononucleotide acts as photoredox catalyst for electron and hydrogen transfer reactions. One example is the construction of β-chiral lactams, a motif found, for example, in the antiepileptic drug brivaracetam. In the conversion of α-chloroamides, the dehalogenated intermediate adds in an intramolecular addition to a double bond, and two stereocentres can be set in this way (Scheme 16C) with up to 98% e.e. and 84% d.e.499 This route differs from the conventional construction of a lactam by building and cyclising a γ-amino acid precursor. Interestingly, this radical addition to a double bond is also possible in an intermolecular manner: in this case, a quaternary charge transfer complex suitable for excitation is formed between the protein, FMN hydroquinone, α-chloroamide, and the alkene substrate (Scheme 16D).497 Due to its modularity regarding the choice of different substrates, this hydroalkylation approach is a valuable tool for asymmetric C–C bond formation and creates a stereocentre in the γ-position of the amide function. In addition to α-halogenated ester or amide substrates, the photocatalytic approach was recently also expanded to N-alkyl-iodides, which yield unstabilised radical intermediates after the initial dehalogenation (Scheme 16E).498 All of the above-mentioned reactions require cofactor regeneration, as reducing equivalents from FMN or NAD(P)H are delivered to radical intermediates during the reaction. A different mechanism is the redox-neutral cyclisation496 shown in Scheme 16F for the production of oxindoles. Here, the light-driven first step generates the flavin semiquinone form of FMN in the active site at the expense of a sacrificial tricine buffer molecule. 
This opens the possibility to efficiently catalyse one-electron redox reactions, which do not occur in the ground state of the ERED. The formed oxindoles feature a quaternary stereocentre and are obtained in high optical purity and up to 95% yield from the racemic starting material. Compared to the above reactions, inclusion of photoorganoredox catalysts that have strong binding to proteins can unlock further promiscuous reactions: the rose bengal radical anion, which can be formed outside of the protein by light illumination, can strongly associate with a double bond reductase and thus inject its electron into an α-acetoxyketone located in the active site (Scheme 16G).500 Here, the role of the protein is to attenuate the redox potential of the ketone substrate by a hydrogen bonding interaction to its carbonyl group. This ensures that the single electron transfer from the photocatalyst only occurs to the enzyme-bound ketone substrate and not in the bulk reaction solution, and thus facilitates the exclusive asymmetric H-atom delivery from the enzyme-bound nicotinamide cofactor. The obtained α-chiral ketones or amides can serve as valuable synthons suitable for further functionalization.
For the construction of more complex molecules, it is a plausible next step to combine photocatalytic reactions with further enzymatic conversions in a one-pot sequential fashion or even as a one-pot concurrent cascade. The feasibility of this approach has been demonstrated very recently,501 and several examples are available from the last two years. In most cases, the photocatalytic step generates a (reactive) intermediate that is then converted by an enzyme in the second step.487,488 Exploited photoreactions include, for example, the oxyfunctionalisation of alkanes502 or alcohols503 by the water-soluble photocatalyst sodium anthraquinone sulfonate to yield ketones or aldehydes. Many different enzymatic reactions can then be coupled as the second step, yielding amines and cyanohydrins, amongst others. However, the one-pot two-step protocol was more efficient in terms of yield compared to the concurrent cascades. True concurrent cascades have been realised, for example, by the oxyfunctionalisation of 2-arylindoles to indol-3-ones, which can be alkylated with ketones by employing the promiscuous activities of lipases (Scheme 17A). In this way, a quaternary carbon stereocentre was constructed – a structural motif occurring in natural products such as (−)-isatisine A and (+)-hinckdentine A.504 The same Ru(bpy)3 catalyst was also used to generate a reactive but sufficiently stable radical intermediate, which could be reduced enzymatically to provide optically active pyridines, a privileged moiety in physiologically active compounds, such as the antihistamine dexchlorpheniramine.505 First, light irradiation of vinyl pyridines yields the neutral benzylic radical in the reaction solution, which then diffuses into an enoate reductase (ERED) and is reduced to the optically active hydrocarbon (Scheme 17B). In this way, a new reactivity of EREDs was unlocked, as double bond reduction otherwise occurs only in enones, enoates, and nitroalkenes.
Many natural products and pharmaceuticals contain (multiple) stereocentres. If asymmetric synthesis cannot be used for setting the stereocentre, kinetic resolutions using stereoselective enzymes have been realised in many cases, but yields are often limited to 50%. A racemization step can thus substantially increase the synthesis efficiency, and photocatalysts have been shown to facilitate the desired racemization. cis/trans-photoisomerization of a C=C double bond is one of the classical photocatalysed reactions and was one of the first examples coupled with an enzymatic stereoselective reduction. As the ERED used only converts the E-isomer, the cascade increased the efficiency of the reaction by facilitating near complete substrate conversion.501 One particular challenge is that many physiologically active molecules contain static stereocentres that are remote from the functional group targeted by the enzyme. One example is the class of 3-substituted ketones (Scheme 17C). A clever combination of an organo- and a photocatalyst facilitates racemization of the ketone enantiomers, and highly selective enzymes were identified that preferentially convert one of the ketone enantiomers.506 This yielded chiral alcohols or amines featuring both excellent enantiomeric and diastereomeric ratios and could be employed for the synthesis of the drug candidate LNP023. These few exemplary cascades demonstrate the potential of combining photo- and biocatalytic steps and we anticipate lively exploration of this research field.
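Why an interconversion step lifts the 50% ceiling can be illustrated with a minimal kinetic sketch in Python. It is a toy first-order model, not a description of any of the cited systems: the rate constants, time span, and function name are hypothetical, the enzyme is assumed to be perfectly selective for one substrate form, and the photocatalysed racemization or E/Z isomerization is represented by a single symmetric rate constant `k_rac`.

```python
def simulate_resolution(k_enz, k_rac, t_end=50.0, dt=0.01):
    """Toy model of a selective enzyme acting on two interconvertible substrate forms.

    reactive   --k_enz-->  product      (enzyme converts only this form)
    unreactive <--k_rac--> reactive     (racemization / isomerization)

    Integrated with simple explicit Euler steps; returns the product fraction.
    """
    reactive, unreactive, product = 0.5, 0.5, 0.0  # start from a 1:1 mixture
    steps = round(t_end / dt)
    for _ in range(steps):
        converted = k_enz * reactive * dt
        # Net flow from the unreactive to the reactive form (negative if reversed).
        interconverted = k_rac * (unreactive - reactive) * dt
        reactive += interconverted - converted
        unreactive -= interconverted
        product += converted
    return product


# Hypothetical rate constants in arbitrary units.
yield_without = simulate_resolution(k_enz=1.0, k_rac=0.0)
yield_with = simulate_resolution(k_enz=1.0, k_rac=1.0)

print(f"kinetic resolution only:   {yield_without:.3f}")
print(f"with racemization coupled: {yield_with:.3f}")
```

Without interconversion the product fraction levels off at one half of the starting material, whereas any non-zero `k_rac` lets the unreactive form feed the reactive pool until conversion approaches completion.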
Footnote
† Equal contribution.
This journal is © The Royal Society of Chemistry 2021