The biosynthetic implications of late-stage condensation domain selectivity during glycopeptide antibiotic biosynthesis

The condensation domain synthesising the last peptide bond in glycopeptide antibiotic biosynthesis has a preference for linear peptide substrates, with effective peptide formation linked to the rate of amino acid activation by the preceding adenylation domain.


Introduction
Natural product biosynthesis contains many examples of complex, bioactive molecules produced by the actions of equally complex enzymatic assembly lines. In particular, polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) assembly lines serve as potent examples of nature's ability to produce a diverse range of structures based on the assembly of repeating building blocks (acetate/malonate and amino acids, respectively) [1-4]. What makes both systems of great interest, in addition to the large number of important compounds produced by these pathways, is that such assembly lines typically consist of repeating groups of conserved catalytic domains clustered into modules, each responsible for the incorporation (and modification) of monomers into the growing product. In NRPS-mediated biosynthesis, a modular architecture allows the formation of peptides whose amino acid content, modifications and stereochemistry are far more diverse than those typically seen in peptides derived from ribosomal synthesis [3,5]. Central to NRPS synthesis are three domains: adenylation (A), peptidyl carrier protein (PCP) and condensation (C) domains, which together form the minimal unit required to extend a growing non-ribosomal peptide by one amino acid residue (Fig. 1) [3]. Selection and activation (adenylation) of the desired monomer is performed by the A-domain in an ATP-dependent process, which results in the initial activation of the desired monomer as an AMP adenylate [6]. This highly activated monomer is then transferred onto the terminal thiol group of the phosphopantetheine arm of the adjacent PCP domain, resulting in the formation of a thioester-bound aminoacyl-PCP species [7].
Peptide bond formation is then performed in the C-domain, where two (typically) PCP-bound substrates are condensed such that the upstream "donor" amino acid/peptide is transferred onto the downstream "acceptor" aminoacyl-PCP, resulting in peptide bond formation and elongation of the peptide by one residue [8,9]. Minimal NRPS modules are often supplemented by additional modification domains, arguably the most important of which are epimerisation (E)-domains [3]. These domains are responsible for the epimerisation of the C-terminal residue of the PCP-bound peptide from the L to the D form, and are believed to act together with C-domains to ensure that the correct stereochemistry is maintained during NRPS-mediated synthesis (Fig. 1). Upon completion of the peptide chain, the peptide is removed from the NRPS, typically through the actions of a terminal thioesterase (TE) domain, which serves as a further point for structural diversification of the peptide [10]. Given that the products of many NRPS assembly lines have important roles in medicine and that their structural complexity can limit their chemical synthesis at scale, the modular architecture of an NRPS is naturally highly attractive for potential redesign efforts to produce new bioactive peptide products [4]. Such efforts are often restricted, however, by our limited understanding of the exact structure, selectivity and rate of these complex molecular machines. Understanding the fundamental processes that underpin NRPS activity is therefore crucial to the success of future enzymatic redesign efforts for these important systems.
Within non-ribosomal peptide synthesis, condensation (C)-domains play the essential role of catalysing amide bond formation between neighbouring PCP-bound substrates (Fig. 1) [9,11]. Whilst previously seen as little more than stereochemical gatekeepers during NRPS-mediated peptide synthesis (a role that they share with structurally related E-domains), C-domains have now been shown to perform highly diverse roles during NRPS biosynthesis. Examples include the formation of beta-lactam rings, multiple-step heterocyclisation reactions, peptide cyclisation, ester bond formation and complex transformations to produce modified amino acid residues [3,8,12-16]. Beyond this expansion of conventional C-domain activity, many questions still remain concerning the specificity of C-domains during peptide bond formation, including selectivity for their upstream PCP-bound peptide substrates, the influence of trans-acting enzymes and the importance of coupling A-domain amino acid selection with the rate of C-domain activity. As in vivo studies have already demonstrated the potential for C-domains to display selectivity towards their peptide substrates [17,18], a detailed characterisation of C-domain behaviour in vitro is all the more pressing in order to understand the mechanism behind the apparent selectivity observed for these key NRPS domains.
The glycopeptide antibiotics (GPAs) serve as a potent example of the need to study and understand non-ribosomal peptide synthesis: these heptapeptide natural products remain one of the last clinical antibiotics with activity against methicillin-resistant Staphylococcus aureus (MRSA) [19]. Their complex chemical structures and the resulting difficulties in total synthesis are the reason that we remain reliant upon the natural biosynthetic pathways that produce these compounds for their clinical use (Fig. 2) [20]. GPAs rely on the interplay between a linear NRPS and a complex, late-stage peptide cyclisation cascade comprising 3 or 4 cytochrome P450 monooxygenase enzymes (known as Oxy enzymes) (Fig. 2) [21,22]. It is known that the cyclisation cascade in GPA biosynthesis occurs whilst the peptide substrates remain bound to the NRPS machinery, with the interaction between the Oxy enzymes and the NRPS-bound peptide mediated by a unique recruitment domain, known as the X-domain [23]. The X-domain, found in the final NRPS module of all GPA-producing assemblies, is an example of a modified C-domain and, along with the penicillin-producing δ-(L-α-aminoadipyl)-L-cysteinyl-D-valine (ACV) synthase, the only other reported example of a C/E-type domain immediately prior to a terminal thioesterase domain [23,24]. Whilst in vitro results have been supportive of the X-domain playing a role in the complete enzymatic crosslinking cascade introduced at the heptapeptide stage (and hence on the final NRPS module) [23,25-28], in vivo experiments provide a different hypothesis favouring hexapeptide cyclisation for all steps before that of the final AB ring insertion, which is catalysed by OxyC [29-34]. This raises the question as to the selectivity of the C-domain connecting modules 6 and 7 of the NRPS machinery, and hence the process of hexapeptide elongation to form the heptapeptide [32]. Furthermore, within the peptide synthesis machinery itself, a phylogenetic analysis of the C-domains within the NRPS machineries in GPA biosynthesis has shown that these are all configured to accept peptides bearing a C-terminal D-configured amino acid residue, despite several of these peptide substrates (including that of the C-domain connecting the 6th and 7th modules) actually bearing a C-terminal residue in the L-configuration [35]. Given these unanswered questions surrounding the late steps within GPA peptide biosynthesis, we determined that this would serve as an excellent system in which to address the impact of peptide structure and stereochemistry on the selectivity of condensation domains within the NRPS-mediated biosynthesis of complex peptides.

Fig. 2 Biosynthetic scheme for the glycopeptide antibiotics (GPAs), exemplified for teicoplanin (type-IV GPA, upper panel) as well as related GPA structures relevant for this work: actinoidin (type-II GPA, lower left) and balhimycin (type-I GPA, lower right). Type-III GPAs possess the same core peptide sequence as type-IV GPAs. In GPA biosynthesis, the NRPS-mediated synthesis of a linear heptapeptide precursor is followed by an oxidative peptide cyclisation cascade of cytochrome P450 (Oxy) enzymes, which transform the linear peptide into its rigid, active form whilst the peptide remains bound to the NRPS machinery. In the biosynthesis of the three GPAs indicated here, the NRPS machinery remains the same from a domain and module perspective: the main differences between these GPA biosynthetic machineries are the number of Oxy enzymes and hence crosslinks installed in the cyclic peptide (3 for balhimycin/actinoidin, or 4 for teicoplanin), the presence of 3 (balhimycin) or 4 (actinoidin/teicoplanin) NRPS-encoding proteins, and the residues contained within the peptide that are dictated by the selectivity of the A-domains.
HRMS analysis. HRMS was performed on an Agilent 6220 Accurate Mass LC-TOF system with an Agilent 1200 Series HPLC.
UV-vis spectrophotometer. For the A-domain activity assay, UV spectra were recorded using a JASCO V-750 spectrophotometer. For data analysis, the software Prism 7 was used.

Peptide synthesis
For the C-domain selectivity assay, peptides linked to coenzyme A were synthesised according to a previously established protocol [36]. Fmoc-based SPPS was performed manually on 2-chlorotrityl chloride resin (scale 0.05 mmol, 200 mg). Resin swelling was performed in DCM (8 mL, 30 min), followed by washing with DMF (3×), treatment with 5% hydrazide solution in DMF (6 mL, 2 × 30 min), washing with DMF and capping with a solution of DMF/TEA/MeOH (7 : 2 : 1) (4 mL, 15 min). Amino acid coupling used Fmoc-amino acid (0.06 mmol), COMU (0.06 mmol) and 2,6-lutidine (0.06 mmol, 0.12 M); the initial coupling was always performed overnight and a second coupling step using Boc-glycine-OH (1 h) was always performed to cap unreacted hydrazide groups. Subsequent amino acid couplings were incubated for 40 min. For Fmoc deprotection, a 1% DBU solution in DMF was used (3 mL, 3 × 30 s). In the last coupling step, a Boc-protected amino acid was always used. The hydrazide peptide intermediate was cleaved from the resin, including tBu and Boc removal, using a TFA cleavage mixture (TFA/TIS/H2O, 95 : 2.5 : 2.5 v/v′/v″, 5 mL) for 1.5 h with shaking at room temperature. The solution was concentrated under a nitrogen stream to ~1 mL and precipitated with ice-cold diethyl ether (~8 mL), followed by centrifugation in a flame-resistant centrifuge (Spintron). All crude hydrazide peptides were purified using preparative HPLC, and the purified hydrazide peptides were subsequently converted to CoA-linked peptides. To achieve this, the peptide hydrazide (5 mM) was dissolved in buffer A containing urea (6 M) and NaH2PO4 (0.2 M), pH 3 (obtained via addition of HCl), and the reaction mixture was cooled to −15 °C using a salt/ice bath. In the next step, 0.5 M NaNO2 (0.95 eq.) was added to the solution and stirred for 10 min before addition of coenzyme A (1.2 eq., dissolved in buffer A). The solution was adjusted to pH 6.5 by adding

Protein expression of Tcp12
All Tcp12 constructs (pET-MBP-1c) were co-expressed with the teicoplanin MbtH-like protein Tcp17. This was performed by transforming 50 µL of competent cells with a plasmid encoding Tcp17. Cells were thawed on ice and 1 µL of DNA (20-30 ng for both constructs) was added to the cells. The mixture was incubated for 30 min on ice, before performing a 42 °C heat shock for 10 s and returning the mixture to ice for 5 min. Cells were recovered by adding 750 µL of room temperature SOC media and incubating at 37 °C, 750 rpm for 60 min. After incubation, 450 µL of the mixture was spread onto an antibiotic-selective LB-agar plate carrying selection markers for both plasmids (kanamycin and streptomycin) and incubated overnight at 37 °C. Expression of the Tcp12 constructs was performed in auto-induction media (per 1 L media: 10 g tryptone, 5 g Na2HPO4, 3.4 g KH2PO4, 1.3 g Na2SO4, 0.24 g MgSO4, 5 g glycerol, 0.5 g glucose, 2 g lactose, pH 7.4 adjusted with NaOH) supplemented with the respective antibiotics (kanamycin 50 µg mL⁻¹ and streptomycin 50 µg mL⁻¹). Inoculation used 1/100 of culture volume of pre-culture. Bacterial growth was performed at 37 °C and 170 rpm for 5 h, followed by a reduction in temperature to 18 °C. The culture was then incubated for a further 16-40 h at 18 °C.

Protein expression of PCP 6
Transformation of the PCP6 domain derived from Tcp11 was performed in BL21(DE) cells following the same procedure as the Tcp12 constructs but without co-expression of an MbtH-like protein. Expression of the PCP6 construct (pET-Trx-1b) [37] was performed in LB media supplemented with the respective antibiotic (kanamycin 50 µg mL⁻¹). Inoculation used 1/100 of culture volume of pre-culture. Bacterial growth was performed at 37 °C and 170 rpm until an OD600nm of 0.6 was reached, upon which the temperature was reduced to 18 °C and protein expression induced by the addition of IPTG (0.1 mM final concentration), followed by incubation for 6 h at 18 °C.

Protein expression of cytochrome P450s
OxyB and OxyA (expression vectors pET28 or pET151d) were transformed into E. coli KRX, and expression took place in LB media supplemented with the respective antibiotic and inoculated by adding 1/100 of culture volume of pre-culture. Bacterial growth took place at 37 °C and 120 rpm until an OD600nm of 0.40-0.45 was reached. Subsequently, the temperature was reduced to 18 °C, δ-aminolevulinic acid (100 mg L⁻¹) was added and protein expression was induced through addition of 0.1% (w/v) rhamnose and 0.1 mM IPTG (final concentration); incubation continued overnight at 18 °C (90 rpm).

Protein purication of cytochrome P450 OxyB bal
Cell harvesting, lysis and Ni-NTA purification followed the same protocol as for Tcp12 and PCP6. After Ni-NTA chromatography, the fractions containing protein were pooled and dialysed overnight into anion exchange (AEX) buffer A (20 mM Tris HCl pH 8.0, 50 mM NaCl). Subsequently, AEX chromatography was performed (Äkta, GE Healthcare, 6 mL ResourceQ column). Protein was loaded using AEX buffer A and eluted by applying a gradient from 0-50% AEX buffer B (20 mM Tris HCl pH 8.0, 1 M NaCl) over 20 column volumes. As a final purification step, SEC was performed using the same buffer as for Tcp12 and PCP6. All proteins were flash frozen and stored at −80 °C.
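The salt concentration reached during the AEX gradient follows directly from the two buffer compositions given above. The following is a minimal sketch, assuming ideal linear mixing of buffers A and B and a single linear ramp; the function names are hypothetical and are not part of the original protocol.

```python
# Sketch only: assumes ideal linear mixing of AEX buffer A (50 mM NaCl)
# and AEX buffer B (1 M NaCl); function names are illustrative.

COLUMN_VOLUME_ML = 6.0  # ResourceQ column volume stated in the text


def gradient_percent_b(cv_elapsed, start_b=0.0, end_b=50.0, length_cv=20.0):
    """Percent buffer B after `cv_elapsed` column volumes of a linear ramp."""
    frac = min(max(cv_elapsed / length_cv, 0.0), 1.0)  # clamp to the ramp
    return start_b + frac * (end_b - start_b)


def nacl_mM(percent_b, nacl_a=50.0, nacl_b=1000.0):
    """NaCl concentration (mM) of the A/B mixture at a given percent B."""
    frac = percent_b / 100.0
    return (1.0 - frac) * nacl_a + frac * nacl_b


gradient_volume_ml = 20 * COLUMN_VOLUME_ML       # total gradient volume
end_of_gradient = nacl_mM(gradient_percent_b(20))  # NaCl at 50% B
print(gradient_volume_ml, end_of_gradient)
```

Under these assumptions the gradient spans 120 mL and ends at 525 mM NaCl, so the NaCl concentration at which a protein elutes can be read from the column volume at which its peak appears.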

In vitro experiments
Online A-domain activity assay. In order to monitor the rate of amino acid activation by ATP for the different Tcp12 constructs, an online activity assay detecting PPi release was used, which allows detection by spectroscopic methods [38]. The assay can be used with or without an acceptor domain such as the PCP. If it is performed with a PCP-domain present, the PCP can also be converted into the holo-form first to allow the loading of amino acids and two rounds of amino acid activation. For the optional PCP-loading reaction, 1 µM R4-4 mutant Sfp [39], 300 µM PCP and 600 µM CoA in 25 mM Tris, pH 7.4 and 5 mM MgCl2 were used. After optional PCP-loading, the A-domain activity assay was performed using 1 µM of the preloaded Tcp12, 0.5 mM ATP and Dpg (0-0.06 mM) in 100 mM Tris, pH 7.4, 1 mM MgCl2, 0.1 mM EDTA, 0.2 mM NADH and the components needed for detection (F-6-P = D-fructose-6-phosphate (3 mM), PPi-PFK = PPi-dependent phosphofructokinase (0.1 U mL⁻¹), aldolase (1 U mL⁻¹), TPI = triosephosphate isomerase (5 U mL⁻¹), GDH = glycerophosphate dehydrogenase (5 U mL⁻¹)). The final reaction volume was 0.5 mL.
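To illustrate how a turnover rate can be extracted from this coupled assay, the following sketch converts the slope of the NADH absorbance decrease into an apparent rate of PPi release per enzyme. It assumes the standard NADH extinction coefficient at 340 nm (6220 M⁻¹ cm⁻¹), a 1 cm path length and a stoichiometry of two NADH oxidised per PPi for the PPi-PFK/aldolase/TPI/GDH cascade; none of these values is stated in the text, and the slope and enzyme concentration used in the example are hypothetical inputs rather than measured data.

```python
# Sketch only: constants below are assumptions, not values from the paper.
NADH_EXT_COEFF = 6220.0  # M^-1 cm^-1 at 340 nm (standard literature value)
NADH_PER_PPI = 2         # assumed stoichiometry of the coupled detection cascade


def ppi_release_rate(slope_abs_per_min, enzyme_conc_molar, path_cm=1.0):
    """Apparent turnover (min^-1) from the rate of A340 decrease.

    slope_abs_per_min: absorbance change per minute (sign is ignored)
    enzyme_conc_molar: concentration of the A-domain construct in mol/L
    """
    nadh_molar_per_min = abs(slope_abs_per_min) / (NADH_EXT_COEFF * path_cm)
    ppi_molar_per_min = nadh_molar_per_min / NADH_PER_PPI
    return ppi_molar_per_min / enzyme_conc_molar


# Hypothetical example: slope of -0.0124 A340 per min with 1 µM enzyme.
rate = ppi_release_rate(-0.0124, 1e-6)
print(round(rate, 2))
```

With these illustrative inputs the apparent rate is close to 1 min⁻¹. A background slope recorded without amino acid would normally be subtracted before this conversion.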
C-domain selectivity assay/P450 crosslinking. After the CoA loading steps, peptidyl-PCP6 (50 µM), holo-Tcp12_ΔTE2 (50 µM), ATP (1 mM), MgCl2 (10 mM) and amino acid (1 mM) were combined in buffer (50 mM Hepes (pH 7), 50 mM NaCl) and incubated for 3 h at 30 °C, 300 rpm. If the reconstitution assay was combined with P450 turnover, OxyBbal (0.5 µM), PuR (0.66 µM), PuxB A105V mutant (2.5 µM) [40], glucose (0.33%), glucose dehydrogenase (0.033 mg mL⁻¹) and NADH (2 mM) were added. Peptide cleavage from the peptidyl carrier domain was performed through addition of 40% methylamine solution in water (0.5 M) at room temperature for 15 min. Subsequently, the samples were neutralised to pH ~7.0 with 0.1% formic acid in water and purified via solid phase extraction (SPE columns: Strata-X polymeric cartridges, reversed phase). Before the sample was loaded, the columns were first conditioned with 1 mL MeOH and activated with 1 mL water. The column material was washed with 1 mL 5% MeOH and elution took place using 0.5 mL of 1% FA in MeOH. The solvent was concentrated in vacuo using an Eppendorf concentrator. For HPLC-MS analysis the samples were dissolved in ACN/H2O (50 : 50).

Preparation for in vivo experiments
Strains and plasmids. E. coli XL1-blue was used as the general cloning host. Amycolatopsis balhimycina DSM5908 is the balhimycin-producing wildtype and was used to generate the NRPS mutant A. balhimycina ΔbpsC_X (this study). The inactivation plasmid pESbpsCX (this study) is a derivative of the non-replicative vector pSP1 [41].
Media and culture conditions. E. coli strains were grown in Luria broth (LB) medium at 37 °C, supplemented with 100 µg mL⁻¹ ampicillin when necessary to maintain plasmids. A. balhimycina strains were grown in R5 medium [42] at 30 °C. Liquid/solid media were supplemented with 50 µg mL⁻¹ erythromycin to select for strains carrying integrated antibiotic resistance genes.
Preparation and manipulation of DNA. Methods for isolation and manipulation of DNA were performed as reported [42,43]. PCR fragments were isolated from agarose gels with the QIAquick gel extraction kit (Qiagen, Hilden, Germany). Restriction endonucleases (NEB, Ipswich, MA, USA and Fermentas, St. Leon-Rot, Germany) were used according to their specifications.
PCR protocols for amplification of the fragments bpsCXleft and bpsCXright. PCRs were performed on a RoboCycler Gradient 40 thermocycler from Stratagene (La Jolla, CA, USA) with the Expand High Fidelity PCR System (Roche, Grenzach-Wyhlen, Germany). For the amplification of the fragments bpsCXleft and bpsCXright the following PCR conditions were used: initial denaturation (95 °C for 5 min), 30 cycles of denaturation (95 °C for 1 min), annealing (65 °C for 2 min) and polymerisation (72 °C for 2 min), and an additional polymerisation step (72 °C for 10 min) at the end. The primers used were as follows: for bpsCXleft (2079 bp), bpsCXleftP1 and bpsCXleftP2; for bpsCXright (1916 bp), bpsCXrightP1 and bpsCXrightP2 (Table 2).
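The cycling conditions above can be written down as a small data structure, which makes the total programmed hold time easy to check. This is a sketch only: the class and function names are hypothetical, and ramping time between temperatures is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One temperature hold of the PCR programme (illustrative type)."""
    name: str
    temp_c: int
    minutes: int


# Values taken from the conditions stated in the text.
INITIAL = Step("initial denaturation", 95, 5)
CYCLE = (
    Step("denaturation", 95, 1),
    Step("annealing", 65, 2),
    Step("polymerisation", 72, 2),
)
FINAL = Step("final polymerisation", 72, 10)
N_CYCLES = 30


def total_hold_minutes(initial, cycle, n_cycles, final):
    """Sum of programmed hold times; ramp times are not included."""
    per_cycle = sum(step.minutes for step in cycle)
    return initial.minutes + n_cycles * per_cycle + final.minutes


total = total_hold_minutes(INITIAL, CYCLE, N_CYCLES, FINAL)
print(total)
```

The programmed holds add up to 165 min, so a run takes a little under three hours before ramping is taken into account.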
Construction of the inactivation plasmid pESbpsCX. pESbpsCX was constructed for the inactivation of the X domain (Fig. 3).
Direct transformation of A. balhimycina. For transformation of A. balhimycina, a modified transformation method was used as described previously [41].
"Stress" protocol. The stress treatment was used essentially as described previously [44,45]. For further fragmentation, protoplasts were generated as described by Thompson et al. [45]. After storage on ice (10 min), 100 µL of appropriate dilutions (10⁻¹ to 10⁻⁴) were plated on R5 agar plates. After incubation at 30 °C for 10-14 days, the colonies were used for further investigation.
Determination of balhimycin biosynthesis. Balhimycin production was determined by bioassays using Bacillus subtilis ATCC6633 as a test organism and cell-free supernatants of A. balhimycina strains grown in R5 medium.

HPLC-ESI-MS measurements
Prior to HPLC-MS analysis the extracts were concentrated and desalted by solid phase extraction. To this end, a 1 g Chromabond C18 cartridge (Macherey & Nagel, Düren, Germany) was conditioned with methanol (MeOH, 1 column volume) and H2O (1 column volume), after which 2 mL of the respective extracts were applied to the column. The column was washed with H2O (3 column volumes) and eluted with MeOH (2 column volumes). The concentrated extracts were then dried in a Speedvac (Genevac EZ-2 MK2, Ipswich, United Kingdom), resuspended in 200 µL 50% MeOH and subjected to HPLC-ESI-MS as described below. The HPLC-MS measurements were conducted on an Exactive ESI-Orbitrap-MS (Thermo Fisher Scientific, Bremen, Germany) connected to an analytical Agilent 1200 HPLC system (Agilent, Waldbronn, Germany) equipped with a GRACE Grom-Sil 120 ODS-4 HE column (50.0 × 2.0 mm; Grace, Deerfield, IL, USA). The mobile phase consisted of H2O as solvent A and acetonitrile as solvent B, both acidified with 0.1% formic acid. The gradient increased linearly from 5-100% solvent B over 17 min. Measurements were conducted in positive ionization mode. Data analysis was performed using the Thermo Xcalibur 2.2 software.

Results and discussion
Reconstitution of nal GPA NRPS module encoded by the Tcp12 protein In order to study the nal condensation domain within GPA biosynthesis it was rst essential to reconstitute the activity of the nal module within the NRPS machinery -specically encoded by the protein Tcp12 in teicoplanin biosynthesis (Fig. 2). 46 This module consists of 5 domains and exhibits the C-A-PCP-X-TE architecture conserved for GPA producing NRPS systems bearing the specic P450 recruitment (X)-domain. 23,35 In order to study this module, we initially identied that overexpression in E. coli was enabled by the co-expression of the MbtH protein Tcp17 from the teicoplanin gene cluster, together with the expression of Tcp12 as an MBP fusion protein to improve protein yield. 37 Expression without an MbtH protein led to signicant degradation of the protein during expression, whilst co-expression of the other MbtH protein in the teicoplanin gene cluster (Tcp13) did not provide the same overall yield as Tcp17. Following a two-step purication protocol employing sequential Ni-affinity and gel ltration steps, the catalytic competence of the module was tested both in terms of the ability to convert the PCP from the apo to the phosphopantetheine bearing holo form and the subsequent ability of the neighbouring A-domain to select, activate and load amino acid substrates onto this PCP domain. First, reconstitution of the holo-PCP state was successfully accomplished using the promiscuous phosphopantetheinyl transferase Sfp (R4-4 mutant). 39 Subsequently, A-domain activity was tested for the natural substrate (S)-3,5-dihydroxyphenylglycine (Dpg) using a coupled enzymatic activity assay, which allows an assessment of the rate of activity of the A-domain as well as the number of Adomain cycles performed (based on the amount of PPi released, Fig. 4). 
38 This assay showed that the A-domain within the nal module encoded by Tcp12 was active and able to load Dpg onto the neighbouring PCP domain within the module at a rate of 0.8-1.1 min À1 (Table 3 and Fig. 4). This rate is comparable to that seen for the only other A-domain from teicoplanin to have been characterised (1.6 min À1 for Dpg activation by NRPS module 3, encoded by the protein Tcp10), 38 and is comparable to the rates reported for other complex assembly lines (pyochelin NRPS: $2 min À1 ; 45 Pseudomonas virulence factor NRPS: 3.4 min À1 ; 47 yersiniabactin NRPS/PKS hybrid: $1.4 min À1 ; 46 6deoxyerythronolide B PKS: 1 min À1 ). 47 The observed rate of Tcp12 A-domain activity is, however, signicantly slower than the observed rate peptide cyclisation enzymes that should act subsequent to heptapeptide bond formation (each $10 min À1 ). 26 The slower rate of amino acid activationand hence peptide bond formationwould allow the production rate for linear GPA peptides to be well matched to their complete maturation (3-4 cyclisation steps) prior to the selective cleavage of the completely cyclised peptide from the NRPS through the actions of the TE domain. 48 Before utilising the Tcp12 protein for peptide bond formation assays we were concerned about the potential interference of the C-terminal thioesterase (TE) domain in C-domain assays. Whilst this domain has been shown to have a preference for  activity against completely crosslinked PCP-bound peptides, hydrolysis of linear peptide has also been demonstrated for this domain. 48 Given that such hydrolysis would not allow us to assess the possible role of peptide hydrolysis performed by the Cdomain, we designed, expressed and puried three C-terminally truncated forms of Tcp12 (Fig. 4A). These constructs either removed the minimal TE-domain (Tcp12_DTE 1 ), the extended TE-domain (Tcp12_DTE 2 ) or the complete linker-TE region beyond the X-domain (Tcp12_DTE 3 ). 
All proteins could be expressed and purified as for the wildtype protein, and gratifyingly the activity of the A-domain within all constructs in their apo-PCP form was comparable to that of the apo-PCP wildtype protein (Table 3). For ongoing C-domain experiments, we then selected the construct Tcp12_ΔTE2 as this was the construct with the highest rate of amino acid activation. We also tested the acceptance of other phenylglycine substrates (Fig. 4B) by apo-Tcp12 in comparison to the natural, preferred Dpg substrate. This showed that the singly hydroxylated 4- and 3-Hpg substrates were also accepted by this A-domain, albeit at ~40% and ~20% of the Dpg rate respectively, whilst Phg was a poor substrate for this A-domain [49]. This result is somewhat surprising given the presence of 4-Hpg residues within GPAs (and hence the presence of this amino acid within the producer strain), although the activation of 4-Hpg does explain the presence of modified (i.e. Hpg-containing) GPAs in producer strains in which Dpg production had been abolished [32]. This result was also useful in the context of our current study, as it would allow us to probe the effect of A-domain rate upon the production of peptides by the neighbouring C-domain once this had been reconstituted (see below).
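The rate-matching argument made in this section can be illustrated with a short calculation. The sketch below assumes that the mean time per event is the reciprocal of the quoted rate and that the 3-4 cyclisation steps act sequentially, so that their times add; both are simplifying assumptions made here for illustration, and the function names are hypothetical.

```python
# Rates quoted in the text (min^-1); the kinetic model is an assumption.
A_DOMAIN_RATE_RANGE = (0.8, 1.1)  # Dpg activation by Tcp12
OXY_RATE = 10.0                   # approximate rate per cyclisation enzyme
RELATIVE_TO_DPG = {"Dpg": 1.0, "4-Hpg": 0.4, "3-Hpg": 0.2}  # approximate


def mean_time(rate_per_min):
    """Mean time per event (min), taken as the reciprocal of the rate."""
    return 1.0 / rate_per_min


def cascade_time(n_steps, rate_per_min=OXY_RATE):
    """Time for n sequential cyclisation steps, assuming times are additive."""
    return n_steps * mean_time(rate_per_min)


def substrate_rates(dpg_rate):
    """Approximate activation rates implied by the relative rates above."""
    return {name: factor * dpg_rate for name, factor in RELATIVE_TO_DPG.items()}


for n_steps in (3, 4):
    activation_times = [round(mean_time(k), 2) for k in A_DOMAIN_RATE_RANGE]
    print(n_steps, round(cascade_time(n_steps), 2), activation_times)

print(substrate_rates(A_DOMAIN_RATE_RANGE[0]))
```

Under these assumptions a full cascade of 3-4 cyclisations takes roughly 0.3-0.4 min, less than the ~0.9-1.25 min implied per Dpg activation, which is consistent with the suggestion that cyclisation can be completed before the next peptide is produced.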

C-domain displays broad substrate selectivity and stereochemical tolerance
With a functional, truncated Tcp12_ΔTE2 protein in hand, we then turned to the characterisation of the C-domain within this construct. To this end, we synthesised 11 different peptides (Table 4, SI1† and Fig. 5), initially based on a range of potential hexapeptide substrates, as their coenzyme A (CoA) thioesters using our reported Fmoc-based solid phase synthesis route [36,50]. The peptides conformed to the sequence of teicoplanin (1) and were designed to explore the tolerance of the C-domain for modifications in the peptide structure at various positions throughout the peptide. These included peptides in which the C-terminal Tyr residue was exchanged for other amino acid residues (Phe (2), 4-CN-Phe (3)), the variable amino acid at position 3 was exchanged for the type-II GPA sequence (Phe (4), actinoidin), and/or the Tyr residues in the peptide were exchanged for chlorinated Tyr residues (5, 6) (Fig. 5). Furthermore, we synthesised hexapeptides in which the C-terminal Tyr residue was present in the non-natural D-configuration to explore the stereochemical selectivity of the C-domain (D-1, D-4), and also truncated pentapeptides (7-9) to test the effect of alterations in peptide length on peptide bond formation. At this point, we cloned, expressed and purified the PCP domain from the preceding NRPS module (module 6) as a thioredoxin (Trx)-fusion protein [37] in order to use this protein to present these peptides to the C-domain. Use of the PCP-domain proved essential for this assay, as no C-domain activity was detected when isolated CoA peptides were used. Peptidyl-PCP substrates were then prepared for C-domain activity assays by loading the peptidyl-CoAs onto the apo-PCP domain using the promiscuous R4-4 Sfp mutant [39]. The C-domain activity assays were performed in triplicate, and utilised a 1 : 1 mixture of loaded peptidyl-PCP and holo-Tcp12_ΔTE2, along with Dpg and ATP to generate the required C-domain aminoacyl-PCP acceptor substrate (Fig. 5).
Initial results using the teicoplanin-like hexapeptide (1) demonstrated that the assay worked well, with more than 50% conversion into the heptapeptide determined (Table 4). This result also showed that the entire module 6 was not required to support peptide bond formation, thus greatly simplifying the assay. Modifications of the peptide, either at the C-terminal (6th from the peptide N-terminus) residue (2-3) or at the variable residue 3rd from the peptide N-terminus (4), maintained (and indeed improved) high levels of peptide formation. The chlorination state of the peptide (5-6) did not dramatically alter peptide formation in any case except for the pentapeptides, which showed significant variability in peptide yield depending on the sequence used (7-9). The tolerance for peptide chlorination is in keeping both with the reported activity of the Oxy enzymes and the timing of GPA chlorination during peptide synthesis [25], which has been demonstrated to occur on PCP-bound amino acids [51]. These results are in keeping with the general role ascribed to C-domains as merely stereochemical gatekeepers, with there being little need for C-domains to be highly selective for the peptide substrates themselves due to the selectivity of amino acid selection performed by A-domains. Unexpectedly, however, peptides bearing the C-terminal Tyr residue in the incorrect D-configuration (D-1, D-4) remained effective substrates for the C-domain, with only a 20% reduction in yield in these cases. This result is certainly unusual for a domain believed to be responsible for stereochemical selection during peptide bond synthesis, although a hypothesis explaining this result can be made based on the evolutionary history of the GPA NRPS machinery [35]. Phylogenetic analysis of GPA C-domains has shown that all these C-domains cluster in the DCL C-domain clade, and hence that all these domains initially accepted peptides bearing a D-configured C-terminal residue. As the residues found in positions 3 and 6 of most GPAs are L-configured (Fig. 2), it can be anticipated that the C-domains in modules 4 and 7 must have evolved to accept peptides with an L-configured C-terminal residue. Our results from the module 7 C-domain indicate that this evolution towards acceptance of L-configured substrates has not led to a significant loss of activity for D-configured peptide substrates. This is again attributable to the specificity of A-domains, albeit this time for L-configured residues, since D-configured residues within NRPS peptides typically require an epimerisation (E)-domain within the module to effect this change in stereochemistry. As there is no E-domain within module 6 of modern GPA NRPS assembly lines, there is no enzymatic means to generate the D-configured peptide substrate, and hence no need for the downstream C-domain to select against this substrate during synthesis. This is an important result, for it suggests that the evolutionary history of C-domains within modern NRPS clusters can have important and unexpected effects on their stereochemical selectivity.

Fig. 5 Condensation domain assay for the final module of the teicoplanin NRPS. Initially, peptidyl-CoA substrates prepared by solid phase peptide synthesis are loaded onto the isolated PCP domain from the preceding module using the promiscuous phosphopantetheinyl transferase Sfp (top left), after which these substrates are added to the Tcp12_ΔTE2 construct along with Dpg and ATP in order to assess the formation of Dpg-extended peptide products. Product peptides are extended by the addition of Dpg through the actions of Tcp12_ΔTE2 (green box, residual starting peptide shown in the red box). All peptides can either remain PCP-bound at the end of the assay (where they are then analysed as their methylamides through the addition of methylamine) or they can be hydrolysed from the PCP. Peptide structures synthesised and used as substrates in these assays are shown in the boxed area on the right of the figure (1-9, D-1, D-4). NRPS domain descriptions: C, condensation; A, adenylation; PCP, peptidyl carrier protein; X, P450 (Oxy) recruitment.

Table 4 footnotes.
a Total yield of extended peptide is based on the percentage reduction of initial hexapeptide peak from initial starting material. The hydrolysed/PCP-bound fractions for each peptide length are determined by dividing the area for each peak by the sum of both peptide peaks.
b Elongated product cleaved through the use of methylamine to cleave the PCP7-bound thioester.
c Elongated product hydrolysed from the PCP7 domain of the Tcp12 construct during the course of the reaction.
d PCP6-bound peptide substrate cleaved with methylamine.
e Starting peptide hydrolysed from PCP6 during the course of the reaction.

A-domain rate is coupled to the efficiency of peptide bond formation in neighbouring C-domains
With an understanding of the specificity of the C-domain for peptidyl-PCP donor substrates, we then turned to investigate the effect of utilising different aminoacyl-PCP acceptor substrates, specifically 4-Hpg (Table 5). We were particularly interested in this residue as our initial A-domain characterisation efforts had shown that this residue was accepted at a reduced rate compared to the natural Dpg substrate (Fig. 4), and we wanted to utilise this reduction in rate to explore the potential coupling between the rate of downstream A-domains and upstream C-domain activity. Given that the A-domain activation cycle has been demonstrated to play a major role in the positioning of the neighbouring PCP domain relative to upstream or downstream domains, 52,53 we hypothesised that a reduction in the rate of this A-domain cycle could lead to increased hydrolysis of the upstream donor peptide, due to it being bound to the C-domain in the absence of aminoacyl-PCP acceptor. We therefore tested this hypothesis and compared the levels of heptapeptide produced as well as hexapeptide hydrolysed in our assay using either Dpg or 4-Hpg as acceptor substrates (Table 5, Fig. 6).
Table 5 footnotes: a Total yield of extended peptide is based on the percentage reduction of initial hexapeptide peak from initial starting material. b Sum of elongated products either cleaved through the use of methylamine to cleave the PCP-bound thioester or hydrolysed from the PCP 7 domain of the Tcp12 construct during the course of the reaction. c PCP 6 -bound peptide substrate cleaved with methylamine. d Starting peptide hydrolysed from PCP 6 during the course of the reaction. e Reaction included co-incubation with OxyB bal enzyme, so these values also include a very small proportion of monocyclic peptide starting material (<5%).
Our results showed that there was a significant reduction in heptapeptide produced when using 4-Hpg, which displays a reduced A-domain activation rate (33%), as compared to assays containing Dpg (68%), which closely matches the reduction in rate for the A-domain (2.5× reduction in rate, 2.1× decrease in peptide formation) (Table 5, Fig. 6). Furthermore, the reduction in heptapeptide production is due to a significant increase in the hydrolysis of the hexapeptide in the 4-Hpg containing assays (58% vs. 12%). This supports the hypothesis that interrupting the coupling of C-domain and A-domain activity can cause a significant reduction in effective peptide production by such NRPS systems due to hydrolysis of C-domain bound peptides. We tested exclusion of an amino acid acceptor substrate from our C-domain assays and demonstrated that there was significant hydrolysis of the hexapeptide donor substrates in this case (72%) that was significantly above that of background peptide hydrolysis (8%) in the absence of the C-domain. This result further supports the hypothesis that a decoupling of A-domain activity from the neighbouring C-domain leads to hydrolysis of the peptide by the C-domain in these cases (Table 5 and Fig. 6). These results help to explain the results of NRPS A-domain modification experiments in vivo, which have shown that such modified assembly lines can produce significant amounts of truncated peptide immediately prior to incorporation of the modified amino acid residue. 17,18 Rather than this being ascribed to the effects of C-domain selectivity for the modified peptide (which our results have shown to be rather flexible), our hypothesis would instead suggest that peptide hydrolysis is a result of the slow formation and hence delivery of the aminoacyl-PCP acceptor substrate in these cases, which is caused by the introduction of a modified A-domain with a slower amino acid activation rate than the original A-domain.
This strongly argues for the need to test the properties of such modified constructs in vitro prior to engaging in in vivo NRPS redesign, which can have unintended deleterious consequences for NRPS efficiency if the rates of activity of modified A-domains are significantly slower than those present in the wild-type system. Studies have noted that the substrate selectivity of A-domains observed in vitro can be altered by the presence or absence of the adjacent C-domain: 54,55 our results now indicate that C-domain activity is closely coupled to that of the A-domain, which more than ever speaks to the need to characterise complete NRPS modules to truly assess their selectivity and function.
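As an illustrative check of the arithmetic behind this comparison, the short Python sketch below recomputes the fold changes from the percentages quoted above (Table 5, Fig. 6). The dictionary, constant and function names are hypothetical and chosen only for illustration, and the 2.5-fold reduction in A-domain rate is taken as quoted in the text, not recalculated from raw data.

```python
# Percentages quoted in the text (Table 5, Fig. 6), relative to the starting hexapeptide.
# All names here are illustrative and are not taken from the original study.
assay = {
    "Dpg": {"heptapeptide": 68.0, "hydrolysed_hexapeptide": 12.0},
    "4-Hpg": {"heptapeptide": 33.0, "hydrolysed_hexapeptide": 58.0},
}

# Fold reduction in A-domain activation rate for 4-Hpg vs. Dpg, as quoted in the text.
A_DOMAIN_RATE_FOLD_REDUCTION = 2.5


def fold_change(larger: float, smaller: float) -> float:
    """Return the ratio of two percentages as a fold change."""
    return larger / smaller


# Decrease in heptapeptide formation when Dpg is replaced by 4-Hpg.
yield_fold = fold_change(
    assay["Dpg"]["heptapeptide"], assay["4-Hpg"]["heptapeptide"]
)

# Increase in hexapeptide hydrolysis when Dpg is replaced by 4-Hpg.
hydrolysis_fold = fold_change(
    assay["4-Hpg"]["hydrolysed_hexapeptide"],
    assay["Dpg"]["hydrolysed_hexapeptide"],
)

print(f"A-domain rate reduction (quoted): {A_DOMAIN_RATE_FOLD_REDUCTION:.1f}x")
print(f"Decrease in heptapeptide formation: {yield_fold:.1f}x")
print(f"Increase in hexapeptide hydrolysis: {hydrolysis_fold:.1f}x")
```

The computed decrease in heptapeptide formation (about 2.1-fold) is of the same order as the quoted 2.5-fold reduction in A-domain rate, which is the correspondence the argument above relies on.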

Relationship between peptide bond formation and the X-domain mediated P450-cyclisation cascade: the timing of peptide cyclisation
GPA biosynthesis requires the essential, late stage modification of the peptide by cytochrome P450 enzymes to introduce crosslinks between the side chains of specific amino acids within the NRPS-bound peptide (Fig. 2). 21 Whilst the X-domain present in the final module of all GPA-producing NRPS machineries has been implicated in recruitment of these P450 enzymes, the exact timing of the cyclisation events within GPA biosynthesis is somewhat unclear. 23 Given that all crosslinks prior to the final AB ring, catalysed by OxyC, can theoretically be installed at the hexapeptide stage and that such species had been identified from in vivo experiments investigating GPA biosynthesis in A. balhimycina and Streptomyces toyocaensis, [29][30][31][32][33][34] we wanted to explore the cyclisation cascade in the context of the final peptide bond formation step to clarify the exact timing of the GPA cyclisation cascade. To this end, we turned to the balhimycin producer A. balhimycina, the most widely studied GPA assembly line in vivo due to it being the sole GPA producer that was able to be manipulated for many years. [32][33][34]41 We first created two modified GPA producer strains in which either the C-domain or the X-domain from the final NRPS module was deleted (Fig. 3) and searched for any evidence that cyclisation could occur at the hexapeptide stage (Fig. 7). Analysis of the culture filtrates from the C-domain deletion strain showed the absence of heptapeptides and the presence of both linear and monocyclic hexapeptides, supporting the ability of NRPS-bound hexapeptides to be modified by the GPA cyclisation cascade (Fig. SI1 and SI2†). Analysis of the X-domain deletion strain showed similar hexapeptide results to the C-domain deletion strain, although the presence of linear and monocyclic heptapeptides was now also detected due to the ability of this NRPS to elongate hexapeptides (Fig. 7). All peptides detected contained a Cl-Tyr 2 residue, with hexapeptide and heptapeptide species also containing a Cl-Tyr 6 residue, which is in keeping with GPA chlorination occurring during peptide synthesis on specific PCP-bound amino acid residues. 51
Fig. 7 Analysis of the roles of the C- and X-domains within the final NRPS module from balhimycin biosynthesis as assessed through the isolation and analysis of the peptide products formed by the resultant mutant producer strains in which these domains had been removed. Results show that the initial cyclisation step performed by OxyB can occur on the peptide at the hexapeptide stage, which raises the question of cyclisation vs. heptapeptide formation during GPA biosynthesis. NRPS domain descriptions: C, condensation; A, adenylation; PCP, peptidyl carrier protein; X, P450 (Oxy) recruitment; TE, thioesterase.
The presence of monocyclic peptides in these in vivo studies raised questions about the timing and substrate specificity of the OxyB bal enzyme during peptide synthesis. Whilst in vitro activity assays have shown that all OxyB homologues tested to date display the highest level of activity on substrates where the X-domain is present, some OxyB homologues also display reasonable activity against PCP-bound peptide substrates in the absence of the X-domain (including OxyB bal and OxyB van ). 23,26,36,[56][57][58][59][60][61] As the peptide synthesis machinery is likely to be stalled because of the modified NRPS assembly line in these mutant strains, this would provide an opportunity for relatively slow processes that are not typically involved in peptide synthesis (such as OxyB activity against peptides bound to PCP-domains without a neighbouring X-domain) to occur.
The lack of production of bicyclic peptides by these mutant strains, which is theoretically possible following OxyB activity, matches recent data from in vitro activity assays that show a strict dependence on the presence of the X-domain for the activity of the bicyclisation enzyme OxyA. Given this, we concentrated on understanding the timing of the initial peptide cyclisation step performed by OxyB. In order to explore whether the appearance of monocyclic hexapeptides was an on-pathway process or rather was being caused by the stalling of the NRPS machinery in modified producer strains, we then performed several in vitro experiments to characterise the relative acceptance of linear and monocyclic peptides by the C-domain from the final GPA NRPS module.
We rst conrmed reported results that OxyB-catalysed peptide cyclisation activity in vitro (Fig. 8A) was signicantly reduced for PCP substrates alone as compared to PCP-X didomain substrates (Table 6) 23,56,57,60 even when using the OxyB homologue from balhimycin activity that has high levels of reported activity using PCP-bound hexapeptides as substrates. We then showed that the hydrolysis of linear hexapeptides in the presence of the truncated Tcp12_DTE 2 construct was much faster than competitive OxyB bal activity against the PCP-bound linear hexapeptide (see Table 5, entry 3). However, to fully test the ability of the C-domain to accept monocyclic peptide substrates, we pre-incubated OxyB van with the PCP-bound hexapeptide 4 to generate signicant quantities of the PCP-bound monocyclic hexapeptide (Mono-4, $75%); it should be noted, however, that this cyclisation activity is signicantly slower and delivers lower nal yields that when the X-domain is also present, which is in keeping with the importance of the Xdomain for Oxy recruitment. [56][57][58][59][60][61] We then included Mono-4-PCP 6 into our established C-domain activity assay and could show that whilst PCP-loaded Mono-4 is able to be converted into the monocyclic heptapeptide by the C-domain, this is a very slow process (only $20% complete aer 3 hours). In comparison, we determined the relative rates of heptapeptide formation for both comparable linear L-and D-congured hexapeptides (L-4 and D-4) and showed that the PCP-bound monocyclic peptide Mono-4 was $3 orders of magnitude slower than the PCP-bound L-congured linear peptide and $2 orders of magnitude slower than the PCP-bound D-congured linear peptide (Fig. 8B). Given this dramatic difference in C-domain activity between the linear and monocyclic peptides, our results strongly suggest that all GPA crosslinking in a complete NRPS assembly line occurs on the nal NRPS module and is mediated by the X-domain. 
The presence of monocyclic peptides in modified GPA producer strains can be explained by the ability of OxyB bal to cyclise PCP-bound peptides in the absence of a neighbouring X-domain, albeit at a much lower rate than when the X-domain is present: given that the GPA-producing NRPS machinery is effectively stalled in the modified producer strains, the products of these slow reactions now become visible. These results greatly help with the interpretation of results from in vivo experiments using modified producer strains, which often display unexpected modified intermediates (in this case, cyclised hexapeptides). Our in vitro assays show that the detection of such intermediates can occur as a result of slow processes that in a fully functional NRPS system are unable to effectively compete with the on-pathway peptide synthesis process. Such possibilities must therefore be taken into account when interpreting the results of in vivo experiments that affect the NRPS assembly process.
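To make the relative rates quoted above more concrete, the following Python sketch converts the approximate order-of-magnitude differences (Fig. 8B) into fold ratios. The names are hypothetical, and because the source values are themselves approximate (~3 and ~2 orders of magnitude), the output is only a rough indication. In particular, the L-4 versus D-4 ratio is an inference from those two figures and is not a measured value reported in the text.

```python
# Approximate differences quoted in the text (Fig. 8B): how many orders of
# magnitude faster heptapeptide formation is from each linear hexapeptide
# than from the monocyclic hexapeptide Mono-4.
# Dictionary and function names are illustrative only.
ORDERS_FASTER_THAN_MONO4 = {"L-4": 3, "D-4": 2}


def fold_ratio(orders_of_magnitude: float) -> float:
    """Convert a difference in orders of magnitude into a fold ratio."""
    return 10.0 ** orders_of_magnitude


# Fold difference in rate for each linear peptide relative to Mono-4.
fold_vs_mono4 = {
    peptide: fold_ratio(orders)
    for peptide, orders in ORDERS_FASTER_THAN_MONO4.items()
}

# Rough ratio between the two linear peptides implied by the figures above.
implied_l_vs_d = fold_vs_mono4["L-4"] / fold_vs_mono4["D-4"]

for peptide, fold in fold_vs_mono4.items():
    print(f"{peptide} is elongated roughly {fold:.0f}-fold faster than Mono-4")
print(f"Implied L-4 vs. D-4 ratio: roughly {implied_l_vs_d:.0f}-fold")
```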

Conclusions
Condensation domains are essential for non-ribosomal peptide synthesis and the identified diversity of function of these domains is rapidly increasing. 3,5,8 Given the central role of these domains within NRPS synthesis, it is essential that we understand the selectivity and interplay of these domains in order to gain a complete overview of NRPS assembly lines and as a prelude to successful bioengineering to produce novel NRPS products. In this study, we have characterised the C-domain from the final NRPS module of GPA biosynthesis in order to address the effects of peptide structure, stereochemistry and crosslinking on peptide bond formation. We chose this domain due to (i) its pivotal role in heptapeptide assembly prior to peptide cyclisation, (ii) the unusual evolutionary origins of this domain and (iii) the general lack of characterisation of C-domains acting late within NRPS assembly lines. Our results show that this C-domain is tolerant of changes to the amino acids contained within the peptide and that this domain is also able to accept both L- and D-configured peptides with regard to their C-terminal residue. This result serves to illustrate that the typical expectation of a C-domain to be a stereochemical gatekeeper in isolation during NRPS-mediated synthesis is not always correct and that such selectivity also clearly depends on the presence or absence of a neighbouring E-domain. The rate of acceptance of crosslinked peptides by this C-domain is significantly slower than for the corresponding linear peptides, which reinforces the role of the unique X-domain in the final NRPS module as the site of all crosslinking during GPA biosynthesis. Finally, we have been able to demonstrate that C-domain mediated peptide bond formation is closely linked to the rate of amino acid activation performed by the downstream A-domain, with a reduction in A-domain rate leading to a concomitant increase in hydrolysis of the neighbouring C-domain donor peptide substrate.
This is an important result in the context of potential enzymatic redesign for such NRPS systems, as it underlines the importance of maintaining the overall rate of peptide synthesis in order to prevent unwanted peptide hydrolysis due to a loss of productive coupling of A- and C-domain activity. Overall, our results strongly suggest that the characterisation of complete NRPS modules (combining the analysis of A-domain and C-domain selectivity as well as coupled peptide bond formation) is essential if we are to understand and in future successfully redesign the function of these complex enzymatic assembly lines.