M. Winn, J. K. Fyans, Y. Zhuo and J. Micklefield*
School of Chemistry and Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK. E-mail: Jason.micklefield@manchester.ac.uk
First published on 24th December 2015
Covering: up to July 2015
Nonribosomal peptides are amongst the most widespread and structurally diverse secondary metabolites in nature with many possessing bioactivity that can be exploited for therapeutic applications. Due to the major challenges associated with total- and semi-synthesis, bioengineering approaches have been developed to increase yields and generate modified peptides with improved physicochemical properties or altered bioactivity. Here we review the major advances that have been made over the last decade in engineering the biosynthesis of nonribosomal peptides. Structural diversity has been introduced by the modification of enzymes required for the supply of precursors or by heterologous expression of tailoring enzymes. The modularity of nonribosomal peptide synthetase (NRPS) assembly lines further supports module or domain swapping methodologies to achieve changes in the amino acid sequence of nonribosomal peptides. We also review the new synthetic biology technologies promising to speed up the process, enabling the creation and optimisation of many more assembly lines for heterologous expression, offering new opportunities for engineering the biosynthesis of novel nonribosomal peptides.
Nonribosomal peptides are biosynthesised by large, modular, multifunctional enzymes known as nonribosomal peptide synthetases (NRPS) (Fig. 2). Each module within an NRPS is responsible for the incorporation of a single building block into the final polypeptide structure. Since every incorporated amino acid requires a specific module, nonribosomal peptide synthetases can be extremely large enzymes. For example, the single NRPS responsible for cyclosporine A assembly in Tolypocladium niveum is 1.6 MDa in size.2 In general, NRPS modules in bacteria tend to be distributed over a number of smaller subunit proteins which associate into a larger multi-enzyme system.
Major insights into the substrate specificity of NRPS domains came when the first structure of an adenylation (A) domain was determined. The structure of the phenylalanine activating A domain from GrsA, an NRPS involved in gramicidin S synthesis, was solved in complex with AMP and L-phenylalanine.3 In this structure the active site residues, responsible for binding the substrate Phe, were identified thus enabling the NRPS specificity code to be deciphered. This allows the prediction, with fairly high levels of accuracy, of the cognate substrate of a module.4,5 In addition to the 21 proteinogenic amino acids, NRPS modules can also incorporate unusual, non-proteinogenic, amino acids including D-amino acids. Hybrid NRPS assembly lines are also known which include polyketide synthase (PKS) and other enzyme activities.6
The first module in an NRPS is known as the initiation module and can typically be subdivided into an adenylation domain (A) and a thiolation domain (T), also known as a peptidyl carrier protein domain (PCP). Following this are a number of elongation modules which also contain A and T domains but have an additional upstream condensation domain (C). The cycle of nonribosomal peptide synthesis requires the priming of a conserved serine residue within the T domain by the addition of a flexible 4′-phosphopantetheine (PPT) prosthetic group, catalysed by a 4′-phosphopantetheinyl transferase (PPTase). This flexible linker allows tethered intermediates to be passed from one domain to another along the assembly line. Following the priming of the PCP, the A domain of the initiation module activates its cognate amino acid substrate through a reaction with ATP to generate an aminoacyl-AMP intermediate which is attacked by the thiol group of the PPT resulting in a PCP-tethered aminoacyl thioester (Fig. 2). The A domain of module 2 similarly activates its amino acid substrate to generate a second aminoacyl thioester tethered to the PCP of module 2. The condensation domain of module 2 then catalyses peptide formation to give a dipeptide intermediate tethered to the second PCP domain (Fig. 2). The initiation module can then load another substrate amino acid and commence assembly of another peptide. The peptidyl-thioester intermediate is passed from one module to the next with a single amino acid being added at each module. Finally the full length polypeptide is released by a terminating thioesterase (TE) domain which either hydrolyses the linear product or catalyses cyclisation during the release (Fig. 2).7 In addition to these standard modules, further structural variation can be introduced by other optional domains such as epimerization (E), methylation (MT) and cyclization domains (Cy). 
Epimerization domains occur at the C-terminal end of modules responsible for D-amino acid installation and act on the PCP-tethered peptide.7 As the product of these domains is a racemic mixture, the C-domain of the downstream module ensures that the correct enantiomer/diastereomer is subsequently used for elongation.8,9 Methylation of nonribosomal peptides is achieved by specialised methylation domains or by standalone enzymes, which fall into three classes (N-, C- or O-methyltransferases) and utilise S-adenosylmethionine as the methyl donor. N-Methyltransferases are most commonly found as domains inserted within the adenylation domain and typically methylate the PCP-tethered amino acid substrate prior to condensation, such as in thaxtomin A biosynthesis.10 Alternatively, methylation is catalysed by separate enzymes within the cluster which act in trans on the final, often cyclised, peptide, as in chloroeremomycin biosynthesis.11 C-Methyltransferases are much more commonly found in PKS rather than NRPS clusters, but an example can be found in the yersiniabactin biosynthetic cluster, a hybrid NRPS/PKS from Yersinia pestis. Here the methylation domain, found within a nonribosomal peptide module, catalyses methylation of a thiazolinyl-S-PCP intermediate. O-Methylation events are rarer still, but an example can be seen within the NRPS cluster for saframycin Mx1 biosynthesis.12 Cyclisation domains (Cy) are unusual tailoring enzymes as they take the place of the condensation domain in a module and catalyse both peptide bond formation and the heterocyclisation of cysteine, serine and threonine residues to thiazoline or oxazoline heterocycles. In many cases the resulting heterocycle is then oxidised by an oxidase domain (Ox) to the corresponding thiazole, for example during epothilone biosynthesis.13 More detailed explanations of tailoring domains and their functions have been covered in previous reviews.7,14
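The core initiation, elongation and release logic described above can be illustrated with a minimal Python sketch. It is only a bookkeeping model, not a biochemical simulation: the `Module` class, the `assemble` function and the three-residue example line are hypothetical names and values introduced here for illustration, and the sketch assumes the simplest domain layout (A–T for initiation, C–A–T for elongation, a TE domain on the final module) while ignoring optional E, MT and Cy domains.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Module:
    """One NRPS module: the residue its A domain activates plus its domains."""
    substrate: str
    domains: Tuple[str, ...]


def assemble(modules: List[Module], cyclise: bool = False) -> str:
    """Walk the assembly line from first to last module and return the product."""
    if not modules:
        raise ValueError("an assembly line needs at least one module")
    chain: List[str] = []
    for index, module in enumerate(modules, start=1):
        # Every module needs an A domain (activation) and a T/PCP domain (tethering).
        if "A" not in module.domains or "T" not in module.domains:
            raise ValueError(f"module {index} lacks an A or T domain")
        # In this simplified model, elongation modules need a C domain
        # to form the peptide bond with the upstream intermediate.
        if index > 1 and "C" not in module.domains:
            raise ValueError(f"elongation module {index} lacks a C domain")
        chain.append(module.substrate)
    # Release requires a terminating TE domain on the last module.
    if "TE" not in modules[-1].domains:
        raise ValueError("final module lacks a TE domain, so nothing is released")
    peptide = "-".join(chain)
    return f"cyclo({peptide})" if cyclise else peptide


# Hypothetical three-module line, chosen only to exercise the function.
line = [
    Module("Phe", ("A", "T")),
    Module("Pro", ("C", "A", "T")),
    Module("Val", ("C", "A", "T", "TE")),
]
print(assemble(line))                # linear release (hydrolysis)
print(assemble(line, cyclise=True))  # release with cyclisation
```

The one-module-per-residue mapping in the loop is the property that module and domain swapping strategies aim to exploit: changing one entry in the list changes one position in the product.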
Although nonribosomal peptides can have an important function in the producing organism, such as iron-scavenging carried out by siderophores,15 most interest in these molecules relates to the fact that nonribosomal peptides display a wide range of bioactivities; nonribosomal peptides can be exploited in agrochemical applications, or in the development of therapeutic agents including anti-tumour, antiviral, immunosuppressive and antimicrobial agents. Although many nonribosomal peptides exhibit significant biological activity many do not possess desirable pharmacokinetics or ADME properties and so the semi-synthesis or engineered biosynthesis of nonribosomal peptide variants is desirable. With the emergence of antibiotic resistance among pathogenic bacteria, there is currently massive interest in developing new and more effective antimicrobial agents. Early researchers in this field envisioned that the assembly lines of nonribosomal peptides could be engineered to incorporate different residues thereby producing new and improved “non-natural” products. This review seeks to cover the progress in engineering nonribosomal peptides that has occurred in the last ten years.
Fig. 3 Cyclosporin analogues incorporating nonnatural allylglycine (1), β-cyclohexylalanine (2) or D-serine residues (3) produced by precursor directed biosynthesis.16 |
Many other examples of precursor directed biosynthesis can be found in earlier reviews.17 Although there are many examples where precursor directed biosynthesis has been used effectively, one of the problems associated with this technique is that the synthetic precursors compete with the natural endogenous amino acid precursors, which likely act as the preferred substrates. As a result the isolated yields of novel compounds can be low with the wild-type products predominant. As a solution to this problem, mutasynthesis was developed. In this process the modified substrates are fed to an engineered organism which is deficient in the enzyme(s) required for the biosynthesis of a specific natural precursor, so that a precursor analogue may be more effectively incorporated. In contrast to precursor directed biosynthesis, a reasonable amount of genetic information has to be known about the biosynthetic gene cluster and the genetic tractability of the producing organism.
Using a mutasynthesis approach, novel calcium-dependent antibiotics (CDAs) were generated through the creation of a Streptomyces coelicolor strain where the production of CDA was abolished following deletion of the gene hmaS. This gene is involved in the biosynthesis of 4-hydroxymandelic acid, a precursor for the biosynthesis of 4-hydroxyphenylglycine (L-Hpg), which is one of the non-proteinogenic amino acids installed in the CDA structure (Fig. 4).18 A series of novel lipopeptides were produced when the mutant was instead supplied with a number of synthetic mandelate, arylglyoxylate and arylglycine analogues. Feeding of phenylglycine instead of L-Hpg led to CDA variant (4) being produced that lacked the hydroxy group relative to the wild-type CDA. More interestingly, cultures fed with 4-fluorophenylglycine or similarly 4-fluoromandelic acid or 4-fluorophenylglyoxylate, led to the detection of fluorinated analogues (5) and (6). However, similar L-Hpg analogues carrying the bulkier chlorine or methoxy functionalities did not lead to new products.
Fig. 4 Structures of calcium-dependent antibiotics CDA2a and 2b and variants (4–6) produced by mutasynthesis.18 |
In a similar study, the biosynthetic pathway of the vancomycin-related glycopeptide balhimycin was manipulated so that the gene responsible for the formation of the naturally incorporated β-hydroxytyrosine, bhp, was inactivated. Cultures of this deletion mutant were fed with either 2-fluoro-β-hydroxytyrosine (7), 3-fluoro-β-hydroxytyrosine (8) or 3,5-difluoro-β-hydroxytyrosine (9) to yield the corresponding fluorinated balhimycins (10), (11) and (12) (Fig. 5).19 As with the previous example, not all the tested β-hydroxytyrosine analogues led to novel glycopeptide structures; several β-hydroxytyrosine analogues lacking the para-hydroxyl group failed to be incorporated.
Fig. 5 (A) Structures of the glycopeptides vancomycin and balhimycin. (B) Feeding of fluorinated β-hydroxytyrosine led to the isolation of the correspondingly fluorinated balhimycins.19 |
These two examples help to highlight a recurring problem in the traditional mutasynthesis and precursor directed biosynthesis approaches in that the introduced changes are usually conservative due to limited or uncompromising substrate flexibility of the native enzymes. Moreover, these examples were limited to modifications of non-proteinogenic amino acids as gene deletions that abolish production of these non-essential amino acids do not, on the whole, affect growth. Introducing modifications to the proteinogenic amino acid residues can be more challenging, requiring the creation of amino acid auxotrophs and feeding experiments conducted in minimal media.20
Although the techniques of precursor directed biosynthesis and mutasynthesis have been utilized for some time they are still in regular use as they offer a simple means of generating functional analogues of nonribosomal peptides.21–24 However recent developments in synthetic biology are opening up new avenues for introducing structural diversity into nonribosomal peptides.
The introduction of halogen substituents into nonribosomal peptide scaffolds has been a common target, as simple changes in halogenation patterns can have a significant impact on the activity of a compound. For example, when the enzyme PrnA, a flavin-dependent tryptophan-7-halogenase from Pseudomonas fluorescens Pf-5, was expressed alongside the NRPS genes for the uridyl peptide antibiotic pacidamycin, which is produced by Streptomyces coeruleorubidus,25 a new halogenated analogue was generated. The halogenase gene was cloned into the plasmid pIJ10257, which integrates into the Streptomyces φBT1 site, and placed under the control of the ermE* constitutive promoter. The new pacidamycin analogue was halogenated at the C-terminal tryptophan moiety (13), with the tryptophan becoming halogenated by PrnA prior to incorporation by the NRPS. This modified analogue was produced as the minor product alongside the wild-type pacidamycin in a typical ratio of 1:5, but the authors note that in some cases the chlorinated product was produced as the dominant species. Chloropacidamycin was isolated at approximately 1 mg per litre, a yield that was comparable to that achieved in their previous precursor directed biosynthesis work with 7-chlorotryptophan.22 This halogenation approach also provided access to a range of new arylated analogues (14–17) via a semi-synthetic Suzuki–Miyaura coupling reaction performed on the purified pacidamycin analogues (Fig. 6).25
Fig. 6 Pacidamycin derivatives were generated by producing 7-chlorotryptophan in vivo, which is subsequently installed at the C-terminus of pacidamycin. Further analogues were then produced using a semi-synthetic approach, using purified pacidamycins.25 |
A significant portion of the work on engineering of nonribosomal peptides has focused on the family of lipopeptide antibiotics, an important class of antibiotics that includes the calcium dependent antibiotics, friulimicins and daptomycin. This family of lipopeptides all possess an N-terminal fatty acid chain which aids their penetration into the membrane of Gram-positive bacteria. The length of the fatty acid chain varies between family members and can have a significant impact on antimicrobial activity. The antibacterial activity generally rises with increasing acyl chain length, however chain lengths longer than 11 carbons tend to exhibit toxicity in humans.26,27 Deacylation mechanisms also play a part in increasing resistance to lipopeptide antibiotics, such as daptomycin, so being able to vary this chain offers a route to new effective antibiotic treatments.28 Lewis et al. were able to modify the active site of the β-ketoacyl-ACP synthase FabF3 from Streptomyces coelicolor leading to the installation of fatty acid chains of differing lengths onto CDA.29 CDA has a trans-2,3-epoxyhexanoyl fatty acid side chain, which is unusually short in comparison to most other lipopeptides. The authors first experimented with the fatty acid chain of CDA using more traditional mutasynthesis techniques and determined that the biosynthesis of the CDA lipid moiety is controlled by a fab operon of five genes (Fig. 7B).30 The operon includes a gene encoding an acyl carrier protein (ACP) which facilitates the biosynthesis and transfer of the fatty acid during the first stage of CDA assembly. Also present are genes fabF3 and fabH4 encoding β-ketoacyl-S-ACP synthase enzymes (KAS-II and KAS-III) which catalyse Claisen-type condensation reactions during chain elongation, leading to a hexanoyl-S-ACP intermediate (24). 
Additionally, there are genes encoding a hexanoyl-ACP oxidase (HxcO), which generates a trans-hexenoyl-S-ACP intermediate (25), and a monooxygenase (HcmO), which catalyses an epoxidation reaction to give the epoxyhexanoyl-ACP (26). Deactivation of the module 1 PCP of CDA biosynthesis prevented the transfer of the upstream ACP-tethered 2,3-epoxyhexanoyl fatty acid chain, and production of CDA was therefore abolished. Feeding an exogenous supply of synthetic N-acyl-L-serinyl-NAC analogues restored the production line and allowed detection of CDA analogues with pentanoyl (20) and hexanoyl (21) side chains (Fig. 7A).30
Fig. 7 (A) Structure of CDA4a analogues with altered fatty acid side chains. (B) Proposed biosynthesis of CDA epoxyhexanoyl-S-ACP side chain.30 |
Sequence analysis of the KAS-II type enzyme, FabF3, showed that the acyl-binding pocket contained a Phe residue at position 107 rather than the smaller amino acids, such as Ile or Leu, which are found in other similar enzymes. The authors speculated that this phenylalanine residue acts as a block to longer chain fatty acids, which explains why CDA contains an unusually short lipid chain. However when mutants were constructed where the Phe107 was replaced with Ile, Leu or Ser, wild-type CDA with the native trans-2,3-epoxyhexanoyl side chain (CDA4a) was still produced rather than CDA products with longer lipid chains. The F107I and F107L mutants did however also produce a small amount of two new products that were identified as being CDA modified with either a 2,3-epoxybutanoyl (18) or a butanoyl (19) fatty acid side chain (Fig. 7A). FabF3 is the second enzyme in the fatty acid chain elongation, which catalyses the condensation of a malonyl unit with butanoyl-S-ACP (23). The fact that CDA analogues were isolated with a butanoyl chain suggested that the FabF3 mutants were lacking in activity compared to the wild-type, leading to the accumulation of the butanoyl intermediate. Nevertheless, the fact that these intermediates were successful in initiating the CDA core peptide, together with the earlier mutasynthesis results, suggested a certain flexibility in the initiation module of CDA. The formation of CDA variants with epoxybutanoyl fatty acids also demonstrated that the epoxide forming monooxygenase also has a certain degree of substrate promiscuity.29 These results could potentially lead to further novel structures being produced with more variation in the fatty acid chain of lipopeptide antibiotics.
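The link between impaired FabF3 activity and the butanoyl products follows from simple carbon counting, which can be sketched in Python as below. The sketch assumes a C2 starter unit and a net gain of two carbons per decarboxylative condensation with a malonyl unit, as in standard fatty acid biosynthesis; the function name and the small lookup table are illustrative and are not taken from the cited work.

```python
# Acyl chain names for the lengths discussed in the text.
CHAIN_NAMES = {4: "butanoyl", 6: "hexanoyl"}


def acyl_chain_length(condensations: int, starter_carbons: int = 2) -> int:
    """Carbon count after a number of malonyl condensations.

    Assumption: each condensation adds a net two carbons, because the
    malonyl extender loses CO2 during the Claisen-type condensation.
    """
    if condensations < 0:
        raise ValueError("condensations must be non-negative")
    return starter_carbons + 2 * condensations


# One condensation gives the C4 intermediate; the second, catalysed by
# FabF3 according to the text, gives the C6 chain found in wild-type CDA.
for n in (1, 2):
    length = acyl_chain_length(n)
    print(n, length, CHAIN_NAMES[length])
```

On this counting, a FabF3 mutant that performs the second condensation poorly would be expected to accumulate the C4 intermediate, consistent with the butanoyl and epoxybutanoyl analogues reported.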
Fig. 8 Structures of related lipopeptides enduracidin and ramoplanin. The two structures differ in chlorination pattern (highlighted in red), fatty acid side chain (highlighted in blue), and glycosylation (highlighted in green).31,32 |
Employing the assumption that the structural similarities between the two lipopeptide structures would allow the binding and subsequent mannosylation of enduracidin, ram29 was expressed in the enduracidin-producing Streptomyces fungicidicus. An expression cassette containing the ram29 gene along with its native Shine–Dalgarno sequence under the control of the tetracycline inducible promoter and integrated at the ΦC31 site on the Streptomyces chromosome failed to produce any evidence of mannosylated enduracidin. The expression cassette was optimised by replacing the native Shine–Dalgarno sequence and GTG start codon with the corresponding sequence from the eGFP expression construct pIJ8668. This resulted in conjugates that produced novel monomannosylated enduracidins, although the new products were produced as minor products alongside the wild type enduracidin. The site of mannosylation was determined by tandem mass spectrometry to be on L-Hpg11, the same as found in ramoplanin. The failure of enduracidin to be mannosylated twice, as with ramoplanin, is unexplained at this time, however the authors hypothesised that another enzyme outside of the ramoplanin cluster could be conducting this second mannosylation in ramoplanin biosynthesis or that S. fungicidicus could contain an α-mannosidase that may be removing one of the mannosyl groups.38 This work successfully highlighted that expression of PPM-dependent glycosyltransferases could be used as a method to produce novel glycopeptides. In addition, the importance of expression cassette optimisation when engineering natural product clusters is worth noting.
Other lipopeptides also utilise glycosylation to modulate their activity. Teicoplanin A2-2 and the related A40926 are lipoglycopeptide antibiotics used as last-line treatments for multi-drug resistant Gram-positive bacterial infections.39,40 Both lipoglycopeptides have glucosamine derived glycosyl groups, with a long N-acyl side chain (Fig. 9). This acyl chain is vital for activity and is derived from the corresponding acyl-CoA thioester by an N-acyltransferase (NAT) enzyme present in both clusters. Sequence analysis of these NAT enzymes suggested they have a unique structure not found previously.41 Syue-Yi Lyu and colleagues solved the crystal structures of these unusual NAT enzymes and found some unique traits that suggested that they represent a new NAT architecture. Based on the crystal data, in combination with biochemical and mutagenic assays, they proposed that acyl-CoA first binds to the enzyme, triggering a conformational change which forms the teicoplanin pseudo-aglycone binding site. Following the acyl transfer, the departure of CoA enables the enzyme to re-adopt the open conformation and release the acylated antibiotic. The structural information highlighted that the acyl chain extends into a spacious tunnel. The authors found that this pocket could accept a variety of long and bulky acyl chains including stearoyl (29), biphenylacetyl (36), or naphthaleneacetyl (37), and allowed the generation of a series of new glycopeptide analogues. Steric limitations prevented the acceptance of branched chains such as benzoyl-, malonyl- or methylmalonyl-CoA, and ITC analysis showed that C10, the naturally incorporated chain length, was the optimal chain length for the enzyme, with efficiencies decreasing as chain length was lengthened or shortened. However, these results suggest that chain lengths longer than 16 may also be well tolerated, postulated to be due to the longer lipid chain forming a new favourable shape in the active site (Fig. 10). 
In addition to a range of monoacylated products, diacylated compounds were also formed including a 2-N-decanoyl-6-O-octanoyl-teicoplanin (43) (Fig. 11). The authors were able to test a number of these new compounds with variable length acyl chains for activity against known vancomycin resistant enterococcus (VRE) and revealed some very encouraging biological activities. In particular diacyl analogues showed significantly enhanced bactericidal activity against the tested strains when compared to mono-N-acylated teicoplanin.
Fig. 9 Structures of teicoplanin A2-2 and the related A40926. Glycosylation sites highlighted in green and acylation sites in blue.39,40 |
Fig. 10 Produced mono-acylated teicoplanin A2-2 analogues.41 |
Fig. 11 Produced di-acylated teicoplanin analogues.41 |
Recently another potential diversification option has been exploited which relies on the use of 3′-phosphoadenosine 5′-phosphosulfate (PAPS)-dependent sulfotransferase enzymes to modify teicoplanin-like antibiotic scaffolds. Notably, two glycopeptide clusters were identified from an environmental DNA (eDNA) library extracted directly from soil.42 It was discovered that one of these clusters, the teicoplanin-like eDNA derived gene cluster (TEG), included several unique sulfotransferase-like enzymes (TEG12, 13 and 14). These three enzymes were heterologously expressed in E. coli and IMAC purified in order to test their activities in vitro. The nonribosomal peptide product of the TEG cluster was predicted to be very similar to teicoplanin, with the only difference being the replacement of the tyrosine found in teicoplanin by the β-hydroxytyrosine (Bht2) found in TEG. The teicoplanin aglycone (47) was therefore tested as a surrogate substrate for the three enzymes in the presence of PAPS. Each of the three TEG sulfotransferases produced a monosulfated analogue of teicoplanin (48–50) and when all three enzymes were used in tandem a trisulfated product was formed (54) (Fig. 12), suggesting that each sulfotransferase has a particular regioselectivity, with TEG12, 13 and 14 sulfating the hydroxyls on Hpg3, Cl-Bht6 and Hpg4 respectively. Although these enzymes were not tested in vivo, they demonstrated a potential new class of important tailoring enzymes. The related sulfated peptide A47934, which had been isolated previously,43 was shown to be a weaker inducer of GPA resistance genes in actinomycetes compared with the corresponding desulfo-derivative.44 Based on this it was suggested that sulfation, which does not compromise anti-microbial activity, could be utilised to evade resistance to this class of antibiotics.
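Because each TEG sulfotransferase is reported to act at a distinct position, the set of sulfation patterns accessible from one aglycone can be enumerated combinatorially. The Python sketch below does this under two assumptions that are not established by the text: that the enzymes act independently of one another, and that prior sulfation at one site does not alter regioselectivity at another. The function name is illustrative; only the enzyme-to-site assignments come from the passage above.

```python
from itertools import combinations
from typing import Dict, Iterable, Tuple

# Regioselectivity as stated in the text for the teicoplanin aglycone.
SITE_BY_ENZYME = {"TEG12": "Hpg3", "TEG13": "Cl-Bht6", "TEG14": "Hpg4"}


def sulfation_patterns(enzymes: Iterable[str]) -> Dict[Tuple[str, ...], Tuple[str, ...]]:
    """Map every non-empty enzyme combination to the sites it would sulfate."""
    ordered = sorted(enzymes)
    patterns = {}
    for size in range(1, len(ordered) + 1):
        for subset in combinations(ordered, size):
            patterns[subset] = tuple(SITE_BY_ENZYME[e] for e in subset)
    return patterns


patterns = sulfation_patterns(SITE_BY_ENZYME)
for subset, sites in patterns.items():
    print("+".join(subset), "->", ", ".join(sites))
print(len(patterns), "patterns in total")
```

Three enzymes give three mono-, three di- and one trisulfated pattern, seven in all. Only the monosulfated products (48–50) and the trisulfated product (54) are described explicitly in the passage above.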
Fig. 12 (A) Structures of sulfated teicoplanin-related compound A47934 and the predicted product from the TEG pathway. (B) Structures of teicoplanin aglycones modified with TEG sulfotransferases. TEG12, 13 and 14 reaction sites are highlighted.42 |
During the characterisation of the 81-kb gene cluster involved in the biosynthesis of the unusual sulfated glycopeptide antibiotic UK-68597 from Actinoplanes sp. ATCC 53533,45 a number of potentially interesting tailoring enzymes were identified that are responsible for installing a number of features on UK-68597,46 including an aryl sulfate ester on Dpg3 (dihydroxyphenylglycine), four aromatic chlorinations and an α-keto acid in place of an amino acid at the N-terminus (Fig. 13). Even though in this study UK-68597 could not be detected following fermentation, the putative enzymes involved in its biosynthesis were assigned from genome sequencing analysis. In particular, the enzyme Auk20 was assigned as a sulfotransferase, overexpressed in E. coli and then assessed for activity with various glycopeptide substrates including vancomycin, vancomycin aglycone, A47934, DS-A47934 (desulfated A47934) and teicoplanin. Both teicoplanin and DS-A47934 were successfully sulfated by Auk20 ((55) in 95% and (56) in 51% yield), with MS and NMR data placing the position of the sulfation on teicoplanin Dpg3, the same position as reported for UK-68597,45 showing that the sulfation is regioselective. The gene under the control of the ermE* promoter was also introduced into the φC31 site on the chromosome of heterologous hosts, the A47934 producer Streptomyces toyocaensis and the S. toyocaensis ΔstaL mutant, where the native sulfotransferase had been disrupted. The activity of the enzyme was monitored in cell free extracts by HPLC and MS analysis. These results confirmed the in vitro studies and showed that the desulfated DS-A47934 (produced by the ΔstaL mutant) was a substrate for Auk20, producing a sulfated DS-A47934 modified at the Dpg3 moiety (56). The expression of Auk20 in the wild-type A47934 producer showed no evidence of the production of a disulfated variant (Fig. 14). 
Although sulfation events are rare, six different glycopeptide sulfotransferase genes have been discovered within the last 10 years. The increased rate of discovery of new sulfotransferases means that they will potentially be an important class of enzymes in the nonribosomal peptide tailoring toolkit in the years to come.
Fig. 13 Structure of UK-68597.45 Highlighted are post-NRPS modifications around the aglycone structure (chlorination, glycosylation and sulfation). An unusual α-keto acid moiety is also highlighted. The enzymes responsible for the sulfation and glycosylation, and sites of action, are labelled.46 |
Fig. 14 Effect of incubation of several teicoplanin- (A) and vancomycin- (B) like structures with Auk20 sulfotransferase and Auk10 glycotransferase. When incubated with Auk20 both teicoplanin and DS-A47934 showed evidence of sulfation on L-dpg3 (55 and 56). Neither vancomycin (lacking L-dpg3) nor A47934 (already sulfated at L-dpg1) acted as substrates for Auk20. Auk10 was shown to form glucosylated products from A47934 (57), DS-A47934 (58) and the vancomycin aglycone (59).46 |
In addition to sulfation, UK-68597 is also glycosylated with L-vancosamine-1,2-glucose at Dpg4 (Fig. 13). Three enzymes, Auk10, Auk11 and Auk14, have been identified as glycotransferases from gene cluster analysis. Auk10 showed similarity to characterised enzymes that glucosylate vancomycin on Hpg4. Auk11 showed similarities to the enzyme that installs dehydrovancosamine to balhimycin during its biosynthesis and it was, therefore, proposed that Auk10 glucosylates Hpg4 of UK-68597 while Auk11 transfers the L-vancosamine to complete the L-vancosamine-1,2-glucose glycosylation. Auk14 showed the most similarity to enzymes responsible for glycosylating amino acids at position 6 of glycopeptides such as the enzyme tGtfA that is known to install N-acetyl-glucosamine on beta-hydroxytyrosine at position 6 of teicoplanin. This enzyme, however, seems redundant as only two sugars are known to be attached to UK-68597 and none at the 6 position, although it does show some level of similarity to Auk10.
To determine which enzyme was responsible for the first glycosylation of UK-68597, both Auk10 and Auk14 were overexpressed and purified from E. coli and tested for activity with the same glycopeptide substrates as the sulfotransferase (minus teicoplanin). Auk10 was able to glucosylate A47934 (24%) (57), DS-A47934 (8%) (58) and the vancomycin aglycone (5%) (59), while Auk14 showed almost no in vitro activity with any of the tested glycopeptides, although a trace was detected when the vancomycin aglycone was used (Fig. 14). The regioselectivity of the better performing Auk10 was determined to be position 4. As with the sulfotransferase, Auk10 was also introduced into the chromosomes of S. toyocaensis and the ΔstaL mutant, but no production of glycosylated products was observed for either strain.
Together with the enzymes highlighted above, there are four additional chlorination events during UK-68597 biosynthesis and although the responsible enzymes have yet to be characterised it demonstrates that individual biosynthetic gene clusters have huge potential as rich sources of new tailoring enzymes. With the costs of genome sequencing decreasing, the number of newly characterised nonribosomal peptide gene clusters is growing which opens up the possibility that more and more unique and tantalising tailoring enzymes remain to be discovered. Discovery and characterisation of these will further increase our ability to introduce structural diversity into nonribosomal peptides during their biosynthesis. In the same environmental DNA library where the TEG pathway was discovered, a second identified cluster, the vancomycin-like eDNA derived gene cluster (VEG), was also found to encode a number of tailoring enzymes including a halogenase, 7 glycosyltransferases and 3 methyltransferases. This, in particular, highlights how environmental DNA libraries could be important to the discovery and development of new tailoring enzymes.
Fig. 15 (A) Schematic of the NRPS genes responsible for the biosynthesis of the related lipopeptides daptomycin, A54145, and CDA. (B) Subunit exchange strategy where DptD subunit was first deleted and then complemented in trans with either DptD or equivalents from A54145 or CDA biosynthesis to produce two novel daptomycin analogues (60) and (61).48 R = decanoic acid. |
Daptomycin is a cyclic 13-amino acid lipopeptide and is a product of three biosynthetic NRPS subunits, DptA, DptBC and DptD. In the first of a series of studies, in which the biosynthetic pathway was successfully engineered to produce new derivatives of daptomycin, dptD was deleted. The gene dptD encodes an NRPS subunit responsible for incorporating the final two amino acids, 3-methylglutamate (3mGlu) and kynurenine (Kyn), at the C-terminus (at positions 12 and 13 respectively) and also carries the TE domain for peptide cyclisation and release (Fig. 15).48 Following this knockout, and confirmation of the abolition of daptomycin production, successful complementation in trans was demonstrated using a strong constitutive promoter to drive expression of not only the wild-type dptD but also the heterologous genes cdaPS3 and lptD from CDA and A54145 biosynthesis respectively. Both of these heterologous genes are also responsible for installing the final two amino acids in their respective NRPS pathways. CdaPS3 incorporates Glu (or 3mGlu) and Trp at the end of CDA biosynthesis, while LptD installs Glu (or 3mGlu) and either Ile or Val to finalise A54145 biosynthesis (Fig. 15). An advantage in choosing these two subunits for heterologous exchange was that they both include an initial Glu/3mGlu-specifying A domain, similar to DptD. This similarity seemed to be sufficient to maintain the interaction of the altered C domain with the upstream PCP and aided the incorporation of the subsequent non-native amino acid (either Trp, Ile or Val) at the C-terminus. Another advantage of choosing this final subunit is that the inclusion of the TE domain means that there are no downstream interactions that could be adversely affected. Although both heterologous subunit exchanges produced modified daptomycin analogues, these changes came at the expense of a drop in yield to 25–50% of wild type levels (Fig. 15).48
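The subunit exchange logic can be summarised as a lookup from the complementing gene product to the residues expected at positions 12 and 13. The Python sketch below only tabulates the specificities stated in the passage above; the dictionary layout and function name are illustrative, and the output should be read as the residues each subunit specifies in its native pathway, not as a list of analogues confirmed in the cited study.

```python
from typing import Dict, List, Tuple

# (residue at position 12, alternatives at position 13), as stated in the text.
TERMINAL_SUBUNITS: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "DptD": ("3mGlu", ("Kyn",)),
    "CdaPS3": ("Glu/3mGlu", ("Trp",)),
    "LptD": ("Glu/3mGlu", ("Ile", "Val")),
}


def c_terminal_residues(subunit: str) -> List[Tuple[str, str]]:
    """Return the (position 12, position 13) pairs a terminal subunit specifies."""
    pos12, pos13_options = TERMINAL_SUBUNITS[subunit]
    return [(pos12, pos13) for pos13 in pos13_options]


for name in TERMINAL_SUBUNITS:
    print(name, c_terminal_residues(name))
```

All three subunits share a Glu/3mGlu-specifying first module, which is the feature the text identifies as helping to preserve the interaction with the upstream PCP; only position 13 differs between them.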
In a follow-on study, additional genetic modifications were made in the ΔdptD strain to help to improve yields. The first module of daptomycin biosynthesis, dptA, was also deleted. This module is responsible for the initiation of the biosynthesis by first coupling the decanoic acid precursor with the N-terminal tryptophan. It was envisioned that complementing this gene in trans under the control of the strong constitutive ermE* promoter would lead to the overexpression of the initiation module and therefore positively influence yields. Using this method the production of the daptomycin derivatives was boosted to around 40–69% of wild type levels when complemented with the dptD homologous genes from either CDA or A54145 biosynthesis.49 Interestingly while all the daptomycin biosynthetic genes were seen to be expressed on a single transcript, sequential translation was not required for robust production, meaning that deletion and trans-complementation of NRPS subunits was possible.
Fig. 16 (A) Structure of daptomycin and analogues produced through NRPS module exchange. (B) NRPS organisation of the daptomycin cluster and schematic showing module exchange strategy. Modules 8 and 11 were swapped for each other or for the Asn8 module from A54145 biosynthesis (see Fig. 15A). These NRPS chimeras were then expressed alongside either the wild type DptD subunit or similar LptD/cdaPS3 subunits from A54145/CDA biosynthesis. (C) Further module swapping involved swapping an entire four subunits from DptBC for the related subunits from A54145 biosynthesis. Daptomycin modules are shown in black, A54145 modules in red and CDA modules in blue.50 Domains originating from DptBC module 8 are shaded solid gray, domains originating from DptBC module 11 are hatched. |
Following this proof of concept, the genes from the similar A54145 biosynthetic cluster (the D-Asn encoding module 11) were used to replace either D-Ser8 or D-Ala11 positions with D-Asn (Fig. 16B). This also proved successful and two new analogues were isolated (D-Asn11 (69) and D-Asn8 (70)); however, production levels were further reduced relative to wild type (the D-Asn11 (69) analogue showing slightly higher production than D-Asn8 (70)). Replacing the original E domain with the heterologous E domain from A54145 also resulted in the formation of the new analogues, demonstrating that total module replacement (C–A–T–E) is possible, but this change caused a significantly decreased yield of product versus the native E domain, showing that maintaining the native module–module linker regions is important for activity.50 Activity assays showed that the D-Asn8 daptomycin analogue (70) was less active than daptomycin, but the D-Asn11 analogue (69) retained potency.
More extreme changes were also made to the core structure of daptomycin by exchanging four modules within DptBC (D-Ala8-Asp9-Gly10-D-Ser11) with four modules from LptC (D-Lys8-OmAsp9-Glu10-D-Asn11) from the A54145 cluster (Fig. 16C). Although production of the expected product of this module exchange was seen (minus the O-methylation of Asp9 due to the necessary tailoring enzymes not being present in the daptomycin cluster) (76), the yield was drastically reduced, with production being less than 0.5% of control levels.
The successful production of compounds with changes at positions 8 and 11 led to the combination of these with the previously successful changes to the final amino acids at positions 12 and 13 through exchange of dptD for lptD (62–66) or cdaPS3 (71–75) (as well as other modifications made to the lipid tail). This led to the production of multiple daptomycin hybrid compounds (Fig. 16B), with production levels ranging from approximately 0.5 to 45% of control levels and a general trend that the greater the number of changes imposed, the lower the production level. Each compound was assessed for antibacterial activity and although they were, on the whole, no greater in potency than daptomycin (with the exception of one against an E. coli imp mutant), the successful production of new compounds in a combinatorial manner indicates that this sort of approach is possible.
In a follow-on study dptD was again chosen to be the subject of modification. The C–A or C–A–T domains of module 13 (incorporates Kyn13 in daptomycin) were exchanged for different domains from cdaPS3 (incorporates Trp11 in CDA) or lptD (incorporates Ile13 in A54145) (Fig. 15A). Although no production was observed in the C–A–T domain swaps, exchange of just the C–A domains led to production of the predicted Trp13 and Ile13 daptomycin analogues at levels of approximately 20% of the control. Unfortunately the new compounds produced were found to be less potent than daptomycin when assessed in antibacterial assays.51 The fact that Cubist failed to identify any daptomycin variants with improved antimicrobial activity, despite a major industrial effort, is perhaps not surprising. There is a very close evolutionary relationship between the daptomycin, CDA and A54145 NRPS modules and domains that were exchanged. Most likely nature had already sampled these combinations and selected against the compounds Cubist created, in favour of daptomycin, which remains the most active antibiotic in this large family of natural and engineered lipopeptide variants.
A slightly more subtle domain swapping approach was demonstrated with the in vivo production of novel pyoverdine derivatives by Pseudomonas aeruginosa PAO1 through smaller domain substitutions where alterations were limited to either the A domain alone (A) or together with the C domain (C–A).52 Similar to dptD, pvdD encodes an NRPS subunit responsible for incorporating the final two residues, which are both threonine at positions 10 and 11 of pyoverdine (Fig. 17A, B). A previous attempt where changes were aimed at altering the penultimate amino acid in pyoverdine had failed, presumably due to disruptions in the interactions between modules in the system.15 In an attempt to minimise these disruptions only the final amino acid in pyoverdine (Thr11) was substituted with alternative amino acids. In a similar fashion to the earlier daptomycin experiments, the native A domain (or both the A and C domains) of module 11 of pvdD was first deleted and then expressed in trans to ensure successful complementation was possible before various A domain (or C–A domain) replacements were introduced. Replacement domains tested included homologous domains obtained from elsewhere in the same biosynthetic cluster (Fig. 17B), or heterologous genes obtained from pyoverdine biosynthetic clusters present in other Pseudomonas species (Fig. 17C). In total, nine constructs for each of the A and C–A domain replacements were complemented into the ΔpvdD strain: three Thr-specifying, three Ser-specifying (one of which accepted a Thr from the neighbouring module immediately upstream), one Lys-specifying, one Asp-specifying and one Gly-specifying. Each of these was assessed for production of pyoverdines, detected either by monitoring changes in UV or by a more sensitive fluorescence method.
Fig. 17 (A) Structure of pyoverdine. Amino acid chosen for replacement is shown in box. (B) Module replacement strategy for the replacement of Thr11 (shown in black box). Module 11 was replaced with three modules from within the pyoverdine cluster. Replacement modules (either C–A or just A) are highlighted with dotted lines. Modules that were successfully complemented are shown in green, unsuccessful in red. (C) Modules from homologous Pseudomonas clusters that were used to replace PvdD11 are highlighted within their native clusters. Modules that were successfully complemented are shown in green, unsuccessful in red. Relative production levels of successful chimeras are shown.52 hfOrn = L-N5-formyl-N5-hydroxyornithine. |
Where the native Thr11 A domain was substituted with the three non-native Thr-specifying A domains, production of pyoverdine could be detected by absorbance in each case, with two showing native production levels and the third producing at 29% of the control level. All the other (non-Thr-specifying) A domain substitutions were found to produce very low levels of a pyoverdine-like compound that could only be detected using the more sensitive fluorescence detection. Mass spectrometric analysis showed that all these analogues still contained Thr11, indicating that these heterologous A domain replacements had failed to function as anticipated in their new context.
Differing results were obtained with replacement of the C–A domains, with only one of the homologous Thr-specifying replacements (from the P. syringae pyoverdine biosynthetic gene cluster) producing high levels of pyoverdine (83% of the control). This result taken in isolation would seem to indicate that joint C–A domain replacement was a less favourable approach than the simpler A domain replacement, however, unlike with isolated A domain replacement, two novel pyoverdine derivatives with unnatural substitutions at position 11 were successfully obtained when replaced with non-threonine C–A substitutions. One of these, containing Lys11, was obtained at 76% of control level and was produced following domain exchange with a C–A domain taken from pvdJ from the same pathway in P. aeruginosa PAO1. The second, containing Ser11, was obtained at 18% of control level and was produced following replacement with a C–A domain from P. syringae pv. phaseolicola 1448A (which in its native context accepts Thr from an upstream neighbouring module). It was also reported that in the majority of C–A domain replacements a truncated product related to pyoverdine was detected, indicating stalling of biosynthesis as a result of the modifications. This study highlights that a single approach to engineering NRPS specificity is not always applicable and that domain substitutions do not always function in a predictable fashion. It also highlights that the condensation domains (C) may be performing a more complicated role than just peptide bond formation and may have some role in substrate selection.
It was concluded that the relationships between the modules upstream and downstream of inserted domains were important to successfully create hybrid NRPS assembly lines. Whilst the C–A domain from module 5 was kept intact, as this was deemed important for conferring D-Hpg specificity, the other domains were arranged so that the native linker regions were maintained. So while the C–A of module 5 was chosen to maintain connection with the upstream E of module 4, in the same way the epimerisation domain (E) from module 4 was chosen to make the most efficient contact with the downstream C domain of module 5.54
In another example, during dissection and heterologous expression of the three module beauvericin and bassianolide biosynthetic systems it was shown that product formation relied on maintenance of the N-terminal linker region of the C domain of the second module.55
Another recent study has looked into the effect of substituting the native T domain of IndC, from the indigoidine biosynthetic pathway of Photorhabdus luminescens, with a number of heterologous synthetic T domains selected following a computational analysis. In total seven synthetic T domains were assessed that showed either high homology, less homology or little homology to the native one and it was found that one third of the synthetic systems were functional. In addition the T domain from BpsA, an IndC homolog from Streptomyces lavendulae, was also tested. Due to the similarity of BpsA it was expected to yield a functional enzyme; however, it proved to be nonfunctional. The problem was thought to be caused by poor inter-domain interaction, therefore a number of genetic constructs with different A to T and T to TE linker regions of IndC or BpsA origin were produced and assessed. It was discovered that inclusion of longer lengths of the native linker regions originating from the incoming BpsA positively affected indigoidine production, indicating again that native linker regions are often essential for correct NRPS activity.56
Further importance of the linker hinge regions, in this instance between A and T domains, was shown in studies on EntF, an NRPS involved in enterobactin biosynthesis. A D857P mutation was introduced into the A–T linker region; this was based on previous work on acyl-CoA synthetases which showed that insertion of a proline in the hinge region restricts subdomain rotation and traps the enzyme in the adenylate forming conformation.57,58 As expected, this mutation in EntF abolished production of enterobactin in a reconstitution assay, despite detection of wild type levels of adenylation activity in a PPi exchange assay. This confirmed that the hinge region conformation, which the proline interferes with, is important for domain alignment and catalysis. Interestingly, subsequent sequence analysis of multiple A–T domains then revealed that the region following the A10 motif (the Stachelhaus A domain specificity conferring residues)4,5 is more proline-rich than those found in standalone A domains. Mutation of one (P961) or a combination of other proline residues (P959, P968 and P972) led to severely impaired production of enterobactin. A conserved LPxP motif was then identified at the N-terminus of the A–T domain linker region and shown, through homology modelling and further mutational analysis, to interact with a key residue (Y908) in the C-terminus of the A domain which is also required for movement of the T domain relative to the A domain to complete the catalytic cycle.59 This suggested that the linker regions need to form a specific conformation for activity which is in part controlled by these proline residues. This in vitro study provided mechanistic insight and biochemical evidence of the importance of linker regions in controlling domain conformation and lends greater weight to previous observations that suggest careful consideration of these regions should be undertaken when attempting any combinatorial biosynthesis studies with an NRPS.
The identified core sub-domain of the HrmO-(β-Me)Phe3 A domain (Fig. 18A & C) was replaced with three different sub-domains taken from other NRPSs in the hormaomycin biosynthetic pathway (HrmO-(3-Nep)Ala4, HrmO-Thr2 and HrmP-Val6) as well two sub-domains from CDA biosynthesis (cdaPS1-Asp5 and cdaPS1-Hpg6). The hybrid domains were then expressed and purified for use in adenylation activity assays. Although the hybrids derived from the sub-domains of the hormaomycin pathway were all active and recognised their cognate amino acid, those derived from the CDA biosynthetic pathway were inactive.60
Fig. 18 (A) & (B) 2D representation of adenylation domain secondary structures with circles representing helices and arrows representing sheets (adapted from Kries 2015).61 The swapped sub-domains are highlighted in pale green spanning residues 204 to 323 (A);60 or residues 221 to 352 (B).61 Magenta numbers highlight the eight substrate specifying residues and blue numbers highlight the invariable Asp and Lys residues. (C) The specificity-conferring codes of HrmO3A and GrsA. (D) 3D structure of the Phe-activating adenylation domain of GrsA3 with the flavodoxin-like domain coloured green. The eight substrate specifying residues are highlighted in magenta and the invariable Asp and Lys residues are highlighted in blue. |
Subsequent work by a different group identified further boundaries for replacement that were limited to only the flavodoxin-like sub-domain of the Phe-specifying A domain of GrsA (Fig. 18B and D). Nine sub-domain replacements were made, four taken from other NRPS subunits in the gramicidin biosynthetic pathway, and five from NRPS subunits of a range of other biosynthetic pathways. The hybrid domains were then expressed and purified for use in adenylation activity assays. Significant adenylation activity was shown for two sub-domains taken from the gramicidin pathway as well as two from other species, showing that the success of sub-domain transplantation is not restricted to within a particular species.61 A further assay coupled the activity of the hybrid A domain mutants to GrsB1 in an attempt to produce a diketopiperazine. In this case, one out of the four mutants that had previously been identified as successful via the A domain assay was tested and shown to catalyse this reaction. The fact that even one out of the nine flavodoxin-like sub-domain mutants was successful in both of the in vitro biochemical assays demonstrates the ability of this approach to form a functional A domain which, with further refinement, could be extended for use in other biosynthetic pathways and for in vivo production of hybrid nonribosomal peptides.
Another study examined the third module of Plu3262, from the luminmide NRPS, which includes a promiscuous adenylation domain that incorporates predominantly Phe (producing luminmide A), or Leu (producing luminmide B). Expression of the luminmide biosynthetic genes, taken from an entomopathogen, in E. coli allowed luminmide production to be examined in a non-pathogenic host and also led to the authors discovering that Tyr, Val or Met can also be incorporated by Plu3262 to produce novel luminmides, albeit at a much lower level. This heterologous system also permitted the use of ccdB counterselection (discussed later in this review) as an efficient and seamless means of introducing concurrent mutations into the A domain coding sequence. Based on the comparison of the specificity conferring code of Plu3262 with a range of Leu, Phe and Val-specifying adenylation domains, three residues were chosen for mutation with the aim of driving production towards luminmide B (C278M/T, I299F, and A301G). Variants of Plu3262 were produced harbouring either single or double mutations in these residues.66 Three of these strains were found to favourably overproduce luminmide B when compared to wild-type. The best result was seen with the single mutation A301G. The two other overproducing strains also harboured this mutation but in combination with mutations that in isolation were shown to negatively influence production of luminmide B. The change in production profile between luminmides A and B was also mirrored by some of the minor luminmides, which could prove advantageous for further characterization of these newly discovered analogues.
Fig. 19 (A) Sequence alignment of specificity-conferring codes for the Glu-specifying A domains of CdaPS3, SrfA and FenA and the Gln-specifying A domains of LicA and TycC, with positions relative to those in GrsA. Residues targeted for mutation in CdaPS3 are indicated with an underscore, red text indicates a basic K or H at positions 239 or 278 in Glu-specifying A domains and blue text indicates an uncharged Q at the same positions in Gln-specifying A domains. * indicates the residue in CdaPS3 mutated from K to Q leading to production of CDA derivatives. (B) Structures of CDA lipopeptides showing the native Glu/mGlu residues and the non-native (10Q and 10mQ) Gln/mGln residues incorporated at position ten, the latter of which were produced by S. coelicolor following K278Q mutation of the CdaPS3 A domain.67 |
Since then, two further examples where residues in the A domain binding pocket were mutated to promote the introduction of alternative and non-natural amino acids have been reported. The first of these in 2014 utilised the well-characterised Phe-specifying A domain of GrsA (DAWTIAAICK), in which the eight variable residues that recognise the substrate were chosen to create a library containing single point mutations in the A domain. Each of these mutants was expressed in E. coli and purified for use in a 96 well plate PPi exchange assay utilizing each of the 20 proteinogenic amino acids. Mutation of one residue at position 239 from a Trp to a Ser altered the substrate preference from L-Phe towards L-Tyr. Subsequent modelling suggested this was due to the mutation increasing the volume of the active site cavity. The mutant A domain was then shown to not only possess the ability to adenylate para-substituted phenylalanine derivatives but also the functionalisable ‘clickable’ amino acids p-azido-L-Phe and O-propargyl-L-Tyr (79) with high efficiency.69 When the mutant A domain was expressed, together with the adjacent T–E domains, alongside the first module of GrsB, the ability to form the diketopiperazine (DKP) product, O-propargyl-L-Tyr-L-Pro (81), was demonstrated (Fig. 20). Furthermore, the same mutation was transferred to TycA, and DKP formation was shown both in vitro and in vivo, the latter of which indicates that this mutant may be able to function during nonribosomal peptide biosynthesis to allow labelling of nonribosomal peptides by click chemistry which may, for example, be used to increase bioactivity and broaden structural diversity.
Fig. 20 GrsA-W239S mutant with the ability to adenylate the non-natural O-propargyl-L-Tyr (79). Coupling this mutant with GrsB1 allowed the formation of the diketopiperazine O-propargyl-L-Tyr-L-Pro (81).69
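The relationship between a point mutation such as W239S and the ten-residue specificity-conferring code quoted above can be sketched in a few lines of Python. This is only an illustration: the list of GrsA-numbered code positions is the commonly used Stachelhaus numbering and is an assumption here (the text gives the code and individual positions, not the full list), and the helper names `apply_mutation` and `code_differences` are hypothetical.

```python
# GrsA-numbered positions of the ten specificity-conferring residues.
# Assumption: standard Stachelhaus numbering, not stated in full in the text.
CODE_POSITIONS = (235, 236, 239, 278, 299, 301, 322, 330, 331, 517)

# Specificity-conferring code of the Phe-specifying GrsA A domain (from the text).
GRSA_PHE_CODE = "DAWTIAAICK"


def apply_mutation(code: str, mutation: str) -> str:
    """Return the code after a point mutation written like 'W239S'."""
    wild_type, position, new = mutation[0], int(mutation[1:-1]), mutation[-1]
    if position not in CODE_POSITIONS:
        raise ValueError(f"position {position} is not part of the code")
    index = CODE_POSITIONS.index(position)
    if code[index] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {code[index]}")
    return code[:index] + new + code[index + 1:]


def code_differences(code_a: str, code_b: str):
    """List (position, residue_a, residue_b) wherever two codes differ."""
    return [
        (pos, a, b)
        for pos, a, b in zip(CODE_POSITIONS, code_a, code_b)
        if a != b
    ]


mutant = apply_mutation(GRSA_PHE_CODE, "W239S")
print(mutant)                                   # DASTIAAICK
print(code_differences(GRSA_PHE_CODE, mutant))  # [(239, 'W', 'S')]
```

The same comparison could be applied to any pair of codes aligned to GrsA numbering, for instance to list which positions separate a Glu-specifying from a Gln-specifying domain.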
A second example of an A domain being engineered to allow the introduction of azide-containing amino acid derivatives was reported recently.70 In this study, the structural basis for the ability of an A domain to simultaneously recognise two different amino acids, Arg and Tyr, that are incorporated into anabaenopeptin was determined by solving the structure of the A domain in complex with each amino acid. Based on the structural information obtained, it was theorised that the mutation of amino acids at three positions (E204, S243, and A307) could lead to formation of an A domain with a shifted specificity towards Tyr. Following mutations at position 307 only 4 out of 19 mutations were active in a PPi exchange assay. A switch in substrate preference towards Tyr, relative to Arg, seemed to be most favoured when a large aliphatic side chain-containing amino acid was present at position 307 (Val or Leu). Mutations at position 243 produced 15 active mutants out of 19 mutations with large non-polar side chain-containing amino acids again selecting Tyr over Arg. Mutation of the Glu at position 204 in the A domain revealed that this was key for Arg selection, with mutations at this position accepting only tyrosine.70 A new substrate preference for Trp was also observed with the S243E mutant and the double mutant (E204G/S243E) was shown to actually prefer Trp as a substrate. Several of the single mutants also demonstrated the ability to activate Trp as well as the unnatural 4-azido-Phe (Az). Importantly the double mutant (E204G/S243E) was shown to be capable of using Az as a substrate with activity that was similar to the wild-type A domain with Arg.
Fig. 21 The serine specific A-domain EntF, from enterobactin biosynthesis, was replaced with a similar serine-specific domain, SyrE, from syringomycin biosynthesis. The new domain was 30-fold less active than the native domain. Subsequent rounds of random mutagenesis and directed evolution restored production to almost wild type levels with only 4 amino acid substitutions needed.72 ICL – isochorismate lyase – responsible for biosynthesis of 2,3-dihydroxybenzoate. The standalone EntE A-domain is responsible for loading of Dhb onto EntB.
The authors also demonstrated that directed evolution can be applied to an adenylation domain involved in the production of andrimid, an NRPS/PKS hybrid. The insertion of the promiscuous A domain CytC1 led to a non-functional chimera, as was expected. Through three rounds of mutagenesis, production of andrimid was restored to only 3-fold less than wild-type and the promiscuous nature of CytC1 was used to produce andrimid analogues containing non-native L-2-aminobutyrate or D-2-aminobutyrate following exogenous feeding. In a second experiment with the andrimid NRPS, an A domain was transplanted that was designed to change the specificity from isoleucine to valine, producing a derivative of andrimid known to have more potent antibacterial activity. This substitution produced the expected product at levels that were 7-fold less than wild-type. Following only a single round of mutagenesis one clone restored production of the andrimid derivative to wild-type levels.72 Mutations that increased production were found to be located not only within the active site of the A domain but also in other distal regions that were not predicted, further demonstrating the advantages of using directed evolution when engineering nonribosomal peptides for in vivo production of novel compounds.
In another study on andrimid published in 2011, error prone PCR was used to introduce a combination of mutations at three specific sites in the A domain, identified by multiple sequence alignment, to produce a library of 14330 mutants.73 These were grown in 96 well plates, pooled and assessed for production of new andrimid derivatives by LC-MS. Four clones were found with altered production profiles (following pooled row and column searches), two of which incorporated Ile or Leu instead of Val and the other two producing both Ala and Phe-containing andrimid derivatives. When assessed for bioactivity, two of these andrimid derivatives had lower MICs than andrimid itself against some of the organisms tested.73 These were double, triple or quadruple mutants, with the latter containing an additional spontaneous mutation that showed high activity and improved solubility. This again indicates the potential importance of residues outside the A domain active site.
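The pooled row and column search mentioned above can be illustrated with a short Python sketch. The exact pooling scheme used in the study is not described in the text, so this is a generic version under stated assumptions: a standard 8 x 12 plate layout, one pool per row and one per column, and hypothetical helper names `pool_plate` and `candidate_wells`.

```python
from itertools import product

# Assumed standard 96 well layout: rows A-H, columns 1-12.
ROWS = "ABCDEFGH"
COLUMNS = range(1, 13)


def pool_plate(hit_wells):
    """Simulate screening: return the row and column pools that test positive.

    hit_wells is a set of well names such as {"C7"} that contain a producer.
    """
    positive_rows = {well[0] for well in hit_wells}
    positive_columns = {int(well[1:]) for well in hit_wells}
    return positive_rows, positive_columns


def candidate_wells(positive_rows, positive_columns):
    """Wells at the intersection of every positive row and positive column."""
    return [
        f"{row}{column}"
        for row, column in product(sorted(positive_rows), sorted(positive_columns))
    ]


# One producer on the plate: the intersection identifies it directly.
print(candidate_wells(*pool_plate({"C7"})))        # ['C7']

# Two producers: four candidates remain and must be re-tested individually.
print(candidate_wells(*pool_plate({"C7", "F2"})))  # ['C2', 'C7', 'F2', 'F7']
```

Under these assumptions a plate needs 20 pooled analyses (8 rows plus 12 columns) rather than 96 individual ones, at the cost of some ambiguity when more than one well on a plate is positive.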
In a third study, the Phe-specifying A domain from the tyrocidine synthetase was subjected to saturation mutagenesis to introduce mutations into each of the eight non-conserved active site residues, producing eight individual libraries consisting of 45 clones at each position. As the wild-type protein is known to have weakly promiscuous activity for L-Thr, enhanced activation of this amino acid was detected using a PPi exchange assay with L-Thr as a substrate in 96 well plates. From this, two mutations (A301C and C331I) were identified as having a positive influence on activation of L-Thr and were therefore combined, activity assessed and entered into a second round of saturation mutagenesis.74 Two further mutations (I330V and W239M) led to enhanced activity in the PPi exchange assay and these two mutations were therefore combined to give a quadruple mutant which was entered into a third round of saturation mutagenesis. Further rounds of evolution contributed no further improvements to the selectivity towards threonine. The mutant containing three mutations (A301C/C331I/I330V) was found to have the highest catalytic efficiency for L-Thr, with a 12-fold increase compared to wild-type. The catalytic efficiency of this mutant adenylation domain was further assessed with nine substrates that ranged in size. Efficiency was found to increase with decreasing substrate size, indicating that a steric effect was conferred by the introduced mutations. As three substrates (L-Val, L-2-amino-butyric acid and L-Ala) showed efficiencies that were as good as those of native A domains for those amino acids, the authors demonstrated that the introduced mutations had led to a novel promiscuous A domain.74
In the final and most recent example of directed evolution of NRPS adenylation domains, a different approach was used to identify and select mutants through the combined use of yeast cell surface display and fluorescence activated cell sorting (FACS).75 First, a system normally used with antibodies was adapted to display the wild type A domain of DhbE on the yeast cell surface, which was confirmed by detection of two different fluorescent labels fused to the N- and C-termini of the protein. Having successfully shown this, a method of detecting substrate binding was demonstrated by preparing a chemical probe that mimics the acyl-adenylate form of the substrate, acyl-AMS (AMS is adenosine monosulfamate, an isostere of AMP), and has the ability to be labelled with biotin (Fig. 22A). The biotin tag was detected with phycoerythrin (PE)-labelled streptavidin. It was successfully shown that it was possible to detect both the C-terminal myc tag on the wild-type A domain (indicating production of a full length protein) and the probe (indicating binding of the substrate, or a salicylic acid substrate mimic, to the A domain) (Fig. 22B). This system was then used to assess a library of DhbE A domain mutants, created by randomization at four positions (His234, Asn235, Ala333 and Val337) that were shown in the crystal structure of DhbE to interact with the 2-OH group of the natural substrate 2,3-dihydroxybenzoic acid (DHB), a group that is absent from the desired new substrates 3-hydroxybenzoic acid (3-HBA) and 2-aminobenzoic acid (2-ABA). Following five rounds of cell selection, based on detection of probe binding, detection of N- and C-terminal tags and the ability to bind probe at lower concentration, thirty clones were selected for sequencing. Within these clones there were found to be eight distinct mutants with a combination of mutations in the A domain at three positions, with H234 being mutated to Trp in all cases.
Four of these eight mutants were assessed in a PPi exchange assay and it was found that only one had higher catalytic activity than wild-type for 3-HBA, but upon further assessment this mutant A domain was found to have lost the ability to transfer 3-HBA to the aryl carrier protein (ArCP) of DhbB, despite the ability of the wild-type protein to load around 10%. The structure of EntE, a related A domain, provided a clue as to why this may be: changes at position 234 may hinder the approach of the PPT arm, thus preventing transfer of the substrate. This mutation was changed back to His and upon reassessment of all four mutants they were found to catalyse adenylation of 3-HBA, with the best of these then being shown to load approximately 60% onto the ArCP of DhbB. Kinetic assessment of this mutant in a PPi release assay revealed a change in specificity of around 30-fold. A similar process using a 2-ABA analogue as the probe led to the identification of a mutant that showed a 206-fold change in substrate specificity when compared to the wild-type A domain. The authors noted that whilst binding of the substrate to the A domain may be improved using this method, that does not necessarily translate to improved catalytic activity of the enzyme. However, this method is a novel and efficient way to select mutants with improved binding and potentially improved ability for incorporation of non-native amino acids to create novel nonribosomal peptides.75
Fig. 22 (A) Chemical structure of AMS-biotin conjugated SA (salicylic acid) showing SA in red, AMS in blue, the long flexible linker in magenta and biotin in black. (B) Schematic representation of yeast cell surface display of mutant A domains harbouring an N-terminal Aga2p tag that forms disulphide bridges with Aga1p on the cell surface. Mutants that bind an amino acid substrate mimic (acyl-AMS) tagged with biotin are detected following incubation with a streptavidin–phycoerythrin (strep–PE) conjugate, leading to fluorescence-activated cell sorting (FACS) and identification of mutants showing preference for the substrate tested.75 |
Recently there has been a drive to establish optimised, engineered Streptomyces host expression platforms, in which the principal endogenous biosynthetic gene clusters have been deleted. These include the engineered model strain S. coelicolor M1152 and the genome-minimized industrial strain S. avermitilis SUKA.80–82 Of course heterologous expression systems have been developed in other organisms including the yeast Hansenula polymorpha,83Bacillus subtilis84,85 and even E. coli where biosynthetic gene clusters for nonribosomal peptides such as echinomycin, valinomycin, and alterochromides have been successfully expressed.86–88
In the case of Gibson assembly, the linearized target vector and the PCR fragments or chemically synthesized DNA parts containing overlapping sequences are mixed together in a single tube with an exonuclease, which chews back each fragment in the 5′ to 3′ direction, exposing complementary single-stranded overhangs that anneal to one another. Phusion polymerase then fills in the gaps and a ligase seals the remaining nicks (Fig. 23).90 This can be used simply to insert a single gene into a vector, but has also been used to assemble entire gene clusters, such as the 72-kb pristinamycin PII polyketide biosynthetic gene cluster, which was assembled from 14 individual DNA fragments of 4–5 kb in length. This allowed the authors to insert an additional PII biosynthetic gene cluster into the native host strain, which increased the production of PII by 45%.93
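The overlap-directed joining described above can be illustrated with a minimal Python sketch. It models only the outcome of the reaction (fragments merged at their shared terminal sequences), not the enzymology: sequences are treated as single top-strand strings, and the function names `find_overlap` and `assemble`, the `min_overlap` parameter and the toy sequences are all assumptions made for illustration rather than part of any published protocol.

```python
def find_overlap(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no shared end of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


def assemble(fragments, min_overlap=6, circular=False):
    """Join ordered fragments at their terminal overlaps (hypothetical helper).

    Each fragment must end with the sequence that the next one starts with.
    With circular=True the last fragment must also overlap the first, and the
    duplicated closing overlap is trimmed so the string represents a circle.
    """
    product = fragments[0]
    for nxt in fragments[1:]:
        size = find_overlap(product, nxt, min_overlap)
        if size == 0:
            raise ValueError("adjacent fragments share no usable overlap")
        # Keep the overlap once, then append the rest of the next fragment.
        product += nxt[size:]
    if circular:
        size = find_overlap(fragments[-1], fragments[0], min_overlap)
        if size == 0:
            raise ValueError("last and first fragments do not overlap")
        product = product[:-size]
    return product


# Toy example: a linearized "vector" and two "inserts" with 6-base overlaps.
vector = "ATGGCTAGCTTAGGCCATCG"
insert_1 = "CCATCGGATTACAGCTAG"
insert_2 = "AGCTAGGCTTAACGATGGCT"
plasmid = assemble([vector, insert_1, insert_2], min_overlap=6, circular=True)
```

Real designs typically use much longer overlaps than the 6 bases used here; the short value only keeps the toy sequences readable.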
These homology-based assembly methods have also been used for the direct capture of entire gene clusters from pure genomic DNA or even from environmental DNA samples. The most common method is to co-transform a linear cloning vector, flanked with homology arms, with the target genomic DNA into either yeast (TAR) or engineered Escherichia coli (LLHR).92,94 This engineered E. coli strain combines the traditional λ-red recombination system with two functionally similar proteins from the Rac prophage (RecET). Using this strain, all ten megasynthase clusters of unknown function (ranging from 10–52 kb in length) were successfully captured from restriction-digested Photorhabdus luminescens genomic DNA.94 Heterologous expression of two of these clusters identified them as producing the nonribosomal peptides luminmide A/B and the NRPS/PKS hybrid luminmycin A, respectively.
TAR cloning has been used for the capture of the 67-kb biosynthetic gene cluster responsible for the biosynthesis of the dichlorinated lipopeptide antibiotic taromycin A95 and a 67-kb amicoumacin NRPS/PKS cluster.84 It has also been applied in the successful reassembly of a 90-kb gene cluster from environmental DNA.96 However, the maximum size limitation for direct capture from genomic DNA has yet to be determined. In addition to these in vivo recombination methods, a similar approach has been demonstrated in vitro, in which digested genomic DNA is coupled with Gibson assembly to build a final cyclic product.97
While incredibly useful, assembly methods based on homologous recombination have limitations, especially if there are repeated sequences or stable secondary structures in the single-stranded DNA at the ends of the fragments to be assembled. Repeated sequences are commonly found in NRPS genes. Such repeats compete with the intended single-stranded partner, while secondary structures hinder annealing; both greatly reduce the efficiency of the assembly and can introduce unwanted errors.
Using a combination of these techniques can enable combinatorial biosynthesis of natural product clusters. For example, a homologous recombination method such as TAR or Gibson assembly could be used to capture an entire gene cluster from isolated genomic DNA or environmental libraries. A specific module of interest could then be flanked with phage integrase sites, allowing that module to be exchanged for entire libraries of different modules and so achieve true combinatorial biosynthesis.
The creation of larger libraries of NRPS encoding genes is empowering an improved understanding of how these complex assembly line enzymes function and advancing us towards a more combinatorial biosynthetic approach. Despite a large number of good examples of NRPS engineering, progress towards combinatorial nonribosomal peptide biosynthesis has been slow. Early attempts to exchange NRPS domains and modules showed mixed results, with some major successes but many more examples where the same approach was not as effective. Gradually the methods of domain and module exchange have become more surgical, but a true understanding of how to reprogram nonribosomal peptide synthetases, whilst maintaining activity comparable to the wild-type enzymes, seems to be some way off. As illustrated by the work of Cubist on daptomycin, NRPS module exchanges between biosynthetic gene clusters of very close evolutionary origins can lead to functional chimeric nonribosomal peptide synthetases which retain good activity. However, following too closely the evolutionary relationships between NRPS enzymes, and making obvious modular exchanges, runs the risk of re-creating nonribosomal peptide variants that nature has already sampled and discarded due to sub-optimal biological activity. Clearly the bigger goal for the field is to develop strategies that can allow NRPS re-programming to include new functionality, chemistry that nature is yet to sample, within nonribosomal peptide scaffolds. To this end there is hope with the excellent range of new DNA assembly and editing technologies becoming available, particularly CRISPR/Cas9, that promise to enable rapid changes within NRPS modules. Combined with ever decreasing costs of gene synthesis, the new assembly and editing techniques could allow a far greater number of mutant and chimeric NRPS constructs to be generated and tested than was possible by conventional genetic techniques.
Should these advances be fully exploited, the rules that govern the architecture of NRPS will become more evident and more radically altered nonribosomal peptides may ultimately be produced. The ultimate goal in engineering of nonribosomal peptides is often suggested to be attainable through a so-called “plug-and-play” approach, whereby bespoke modules can be assembled together with characterised linker regions, allowing peptide scaffolds to be assembled in order. However, early attempts at “plug-and-play” have not proved to be as simple as some would like to admit; there remains some significant and exciting work to be done, relying on traditional enzymology and structural biology combined with the major technological advancements.
This journal is © The Royal Society of Chemistry 2016