The many faces and important roles of protein–protein interactions during non-ribosomal peptide synthesis

Non-ribosomal peptide synthetase (NRPS) machineries are complex, multi-domain proteins that are responsible for the biosynthesis of many important, peptide-derived compounds. By decoupling peptide synthesis from the ribosome, NRPS assembly lines are able to access a significant pool of amino acid monomers for peptide synthesis. This is combined with a modular protein architecture that allows for great variation in stereochemistry, peptide length, cyclisation state and further modifications. The architecture of NRPS assembly lines relies upon a repetitive set of catalytic domains, which are organised into modules responsible for amino acid incorporation. Central to NRPS-mediated biosynthesis is the carrier protein (CP) domain, to which all intermediates following initial monomer activation are bound during peptide synthesis up until the final handover to the thioesterase domain that cleaves the mature peptide from the NRPS. This mechanism makes understanding the protein–protein interactions that occur between different NRPS domains during peptide biosynthesis of crucial importance to understanding overall NRPS function. This endeavour is also highly challenging due to the inherent flexibility and dynamics of NRPS systems. In this review, we present the current state of understanding of the protein–protein interactions that govern NRPS-mediated biosynthesis, with a focus on insights gained from structural studies relating to CP domain interactions within these impressive peptide assembly lines.


Introduction
Non-ribosomal peptide synthetases (NRPSs) are ribosomally independent macromolecular machineries involved in the production of many different classes of peptide-derived natural products. 1 The utility of NRPS machineries for producing bioactive peptides stems from two main sources: firstly, the ability to select a wide range of monomers for use in peptide synthesis, and secondly, the significant modifications that can be made to the peptide by the range of catalytic domains found within a specific NRPS. Unlike ribosomes, which strictly use a defined set of amino acids to produce peptides and proteins, the NRPS system has evolved to become extremely versatile regarding the substrate monomers that it accepts. Indeed, it has been reported that more than 500 different substrates are accepted by NRPS assembly lines. 2 In addition to the 20 well-known L-α-amino acids incorporated in proteins, these building blocks include a broad range of non-proteinogenic α- and β-amino acids 3 (including examples such as phenylglycines, cyclic guanidines, and alkene-, alkyne-, halo-, hydroxyl- or cyclopropyl-containing amino acids) 2 and extend even to homologated amino acids or monomers derived from aminobenzoic acid residues: 4 this variety of potential substrates leads to a vast number of conceivable final peptide products. Together with the related polyketide synthase (PKS) systems, these megasynthase machineries represent one of the best sources of biologically active compounds for exploitation in medicine. Along with the classic example of peptide production by ACV synthase during penicillin biosynthesis, 5 other NRPS-assembled peptides have been found to play diverse roles, with examples including those that act as siderophores, antibiotics (including clinically relevant examples like the glycopeptide antibiotics (GPAs) and daptomycin), cytostatic agents or regulators in bacterial quorum sensing. 6,7 The structure of an NRPS plays a vital role in its function.
A typical linear NRPS machinery is composed of multiple modules (100-200 kDa, depending on domain composition), with each module in the assembly line responsible for the incorporation of one amino acid monomer into the final peptide (Fig. 1). Architectures that differ from linear systems include iterative and non-linear NRPS systems: iterative systems possess a limited number of modules, with the final peptide being assembled from several copies of a shorter peptide fragment, whilst non-linear systems are more complex and less easily defined in terms of traditional modules. 1 Whilst both iterative and non-linear NRPS machineries are intriguing from a mechanistic standpoint, the vast majority of medicinally important NRPS products are produced by linear NRPS machineries. Each module within a linear NRPS can be further described as a series of at least three different domains: the adenylation domain (or A-domain), which is responsible for the selection and activation of amino acids; the peptidyl carrier protein domain (PCP, also known as a thiolation domain), whose role is to shuttle substrates between different domains; and the condensation domain (or C-domain), which catalyses peptide bond formation between PCP-bound amino acid monomers and peptides (or amino acids for the initial NRPS module). In addition to these three essential domains, a module can harbour additional domains: examples include epimerisation domains (E-domains), S-adenosylmethionine (SAM)-dependent methyltransferase domains or formylation domains, 8 adding yet further potential diversity to the structure of the final NRPS peptide product. The final module of most NRPS machineries contains a terminal thioesterase domain (or TE-domain) that releases the final PCP-bound peptide: this process can also serve to introduce yet further diversity into the final peptide structure, with one common example being the cyclisation of a linear peptide.
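The "one module, one monomer" logic of a linear assembly line can be sketched as a small data model. The following Python sketch is illustrative only: the class and function names are hypothetical, stereochemistry is ignored, and the validation rules (every module needs an A- and a PCP-domain; every module after the first needs a C-domain) are a simplifying assumption, not a complete description of NRPS architecture. The example input is the GrsA/TycB hybrid system discussed later in this review.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    """One NRPS module: its domains (N- to C-terminus) and its monomer."""
    domains: List[str]   # e.g. ["C", "A", "PCP"]
    monomer: str         # amino acid selected by the module's A-domain


def predicted_peptide(modules: List[Module]) -> List[str]:
    """Return the monomer order a simple linear assembly line would produce.

    Assumed rules (illustrative only): every module needs an A- and a
    PCP-domain, and every module after the first needs a C-domain to
    join its monomer to the upstream intermediate.
    """
    for index, module in enumerate(modules, start=1):
        for required in ("A", "PCP"):
            if required not in module.domains:
                raise ValueError(f"module {index} lacks a {required}-domain")
        if index > 1 and "C" not in module.domains:
            raise ValueError(f"module {index} lacks a C-domain")
    return [module.monomer for module in modules]


# GrsA A-PCP-E (Phe-activating) followed by TycB C-A-PCP (Pro-activating).
hybrid = [
    Module(["A", "PCP", "E"], "Phe"),
    Module(["C", "A", "PCP"], "Pro"),
]
dipeptide = predicted_peptide(hybrid)
```

Here `dipeptide` lists the monomers in assembly order; the effect of the E-domain (conversion of the Phe residue to its D-form) is deliberately not modelled.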
NRPS assembly lines range from a single module, as in the Pseudomonas pyreudione synthetase, 9 up to 18 modules in length and can be encoded by one or more genes. The two longest NRPS machineries encoded in a single gene are kolossin A synthase from Photorhabdus luminescens, an impressive 15 module-long NRPS enzyme, 10 and the peptaibol synthetase from Trichoderma virens with 18 modules. 11 Although NRPSs are mostly found in bacteria and fungi, where the product peptides provide a fitness advantage to the producing host, some rare examples of NRPS-like enzymes do exist in higher eukaryotes. However, those that have been identified display altered architectures when compared to standard NRPS machineries, since these eukaryotic enzymes are typically composed of an A-domain, a PCP-domain and a dedicated C-terminal catalytic domain that is specific to each NRPS. 12 Examples of these eukaryotic NRPSs are the 2-aminoadipic 6-semialdehyde dehydrogenases (AASDH) involved in lysine metabolism 13 and the carcinine and β-alanyl-dopamine synthetase Ebony involved in neurotransmitter recycling and cuticle sclerotisation in insects. 14 Given that some of our most critical medicines such as last-resort antibiotics (e.g. vancomycin, daptomycin), anti-tumour drugs (bleomycin, cryptophycin) or immune modulatory compounds (cyclosporine) are NRPS-produced and remain tied to biosynthesis for their production, great interest exists in understanding their biosynthesis. 15 Furthermore, the potential to produce new derivatives of such compounds in the future requires effective assembly line reengineering. Because of this, significant efforts have been made in order to understand how NRPS machineries function and the overall structure of these peptide assembly lines. 16 Structural data are available for examples of each catalytic domain from standard NRPS machineries: these have been obtained either by NMR, X-ray crystallography or a combination of these techniques. Despite the wealth of information derived from structures of isolated domains, understanding NRPS synthesis as a whole requires imaging domain-domain interactions, module-module interactions and even module-module interactions between modules separated across different proteins. To facilitate the latter, it has been proposed that multi-chain NRPS machineries interact through "communication/docking domains": 17-20 although limited data are available concerning these domains, there is evidence that such interactions can be mediated through a hand/helix interaction. 21
Of all domains present in NRPS assembly lines, PCPs are the ones involved in the most numerous interactions. As a domain involved in substrate shuttling, PCPs need to efficiently interact with catalytic domains from both upstream and downstream modules to ensure the effective transfer of amino acids and peptides along the NRPS machinery. Given the essential role of PCPs in all NRPS processes, this review will summarise the structural data available for NRPS systems with a focus on PCPs and their interaction with other domains of the NRPS assembly line.
[Fig. 1, partial caption: ... into its holo-form by transferring a PPant moiety onto the conserved serine of the PCP (CP). (B) Adenylation of amino acid monomers is catalysed by A-domains: following selection and adenylation using ATP, the activated monomer is then loaded onto the PPant moiety of the CP via a thiolation reaction. (C) Condensation domains catalyse peptide bond formation during NRPS-mediated peptide synthesis and possess two CP binding sites, known as the donor and acceptor sites. When both sites are occupied by their cognate loaded CP, C-domains are then able to catalyse peptide bond formation between the two CP-bound substrates. Epimerisation (E)-domains are often found in NRPS assembly lines and ...]

PCP: the central domain in NRPS-mediated synthesis
Carrier protein domains are essential in many processes found in both primary and secondary metabolism. 22 In NRPS systems, PCPs play a central role in shuttling amino acids and peptides between different catalytic domains and can also serve as a platform to present the amino acid or peptide chain to in trans modification enzymes, such as halogenases, transferases or monooxygenase enzymes (Fig. 1E). Typical NRPS assembly lines contain one PCP per module, meaning that there are as many PCP domains as monomers in the final peptide product of the assembly line. Before they can function in this role, PCPs first require activation by post-translational modification: the inactive apo-form of the PCP is the substrate of a phosphopantetheinyl transferase (PPTase) that catalyses the transfer of a phosphopantetheine moiety (PPant) onto an invariant serine residue located at the start of PCP helix 2 (Fig. 1A). This PPant moiety is derived from coenzyme A (CoASH) and terminates with a reactive thiol group: peptides and amino acids can then be temporarily loaded onto the PCP via this group in the form of a reactive thioester, and the PPant moiety also acts as a flexible arm that allows these tethered substrates to reach the catalytic sites within NRPS domains.
Whilst the exact timing of phosphopantetheine modification is not yet known, the highly similar structures of the unmodified apo-form and phosphopantetheine-modified holo-form (vide infra) suggest that this modification does not need to be performed co-translationally for successful synthesis of the NRPS protein by the ribosome. PCPs are related to the acyl carrier protein (ACP) domains found in megaenzyme synthases such as the fatty acid synthases and the polyketide synthases, which adopt a similar fold and also require post-translational addition of a phosphopantetheine moiety. One important difference between ACPs and most PCPs is the sequestration of acyl chains within the core of ACPs, which is not observed for PCP-bound substrates (some examples of bound substrate interaction with the PCP have been identified, although these appear to be more on the surface of the PCP). 23,24 Substrate sequestration can also lead to minor alterations in ACP structure due to the perturbation of the helical core of the ACP by the substrate. 25 A further consequence of ACP sequestration of substrates is that interacting domains must invoke a chain-flipping mechanism in order to allow the bound substrates to engage with them, 26 which is not required for PCPs during NRPS biosynthesis. 27 Due to their small size (typically fewer than 100 amino acids, ca. 10 kDa), the fold of isolated PCPs has mainly been studied by solution-state NMR, 24,28-30 although four structures are also available from X-ray diffraction experiments (ref. 31-33 and unpublished structure 4HKG). These studies have revealed that the PCP folds as a four-helix bundle, with the two N-terminal helices being the longest and the remaining helices typically shorter (see Fig. 2). All four helices are mostly amphipathic and the hydrophobic sides serve as the interaction surface holding the bundle together.
Helices 1 and 2 mostly run parallel to one another and form the back of the domain, whilst the shorter helices 3 and 4 pack against the two N-terminal helices. It is also apparent that while helix 4 aligns in a parallel fashion to helices 1 and 2, helix 3 adopts a perpendicular orientation with regard to the other helices. A long loop connects helices 1 and 2, with the post-translationally modified serine located at the start of helix 2. 34 Although not visible in the NMR structures, all crystal structures of PCPs obtained so far (mainly as multidomain structures, vide infra) contain a small additional helix (labelled 1′ in Fig. 2) between helices 1 and 2. In early studies, Koglin et al. 30 reported that PCPs adopted different conformations depending on the catalytic state of the assembly line. As indicated previously, PCPs first need to be "activated" by attachment of a PPant cofactor; it was therefore postulated that in order to reach the different domains with which it needs to interact (i.e. adenylation, condensation and thioesterase domains), the PCP would change conformation depending on its loaded state. This would mean that the PCP would itself "drive" the synthesis machinery by actively shuttling substrates from one catalytic domain to the other. However, this model was first challenged by the crystal structure of BlmI PCP 33 and in the light of many more recent studies it is now clear that the PCP can in fact be thought of as a rather rigid domain, with only slight differences between the different catalytic states of the NRPS machinery. This can be seen from a superposition of all available PCP structures deposited in the Protein Data Bank (PDB) in which the PCP has been studied in isolation: here, it is clear that, irrespective of its modification state (apo or holo), the fold of the PCP remains consistent.
To explore whether the PCP undergoes changes in structure during NRPS synthesis (especially considering new multi-domain and full module NRPS structures), a structural alignment of the PCPs from 18 structures was performed. This pool of structures (including 2VSQ; 35 Fig. 2D and E) represents the conformation of PCPs while interacting with different domains: specifically isochorismate lyase, condensation, thioesterase, adenylation and epimerisation domains. PCPs included in the structural alignment readily superimposed and even the two most dissimilar domains (4XZ1 and 2VSQ) still aligned with a relatively low RMSD value (3.1 Å; calculated using the structure comparison server DALI 48). Even though these two PCPs belong to different organisms and are involved in different interactions (4XZ1: thioester-forming state and 2VSQ: condensation state), the structural differences observed in these two cases (centred on helices 1 and 4) are most probably the result of the upstream/downstream linker regions (Fig. 2E). Thus, all available structural evidence to date indicates that PCPs do not undergo major structural changes during interactions with either their cognate PPant transferase or catalytic NRPS domains. Rather, the motion of PCPs during NRPS-mediated synthesis appears linked to the different states adopted by the adenylation domains (vide infra), which in turn alters the positioning of PCPs due to their close attachment to the mobile subdomain of the A-domain (Asub).
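For readers less familiar with the RMSD metric quoted above, it is the root-mean-square deviation over paired atom positions. The following is a minimal Python sketch, assuming the two PCPs have already been superimposed and their Cα atoms paired one-to-one (in the analysis above both steps are handled by DALI; this sketch is not DALI's algorithm, and the function name is illustrative):

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in Angstrom


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two paired coordinate lists.

    Assumes the structures are already superimposed and that a[i] and
    b[i] are equivalent atoms (e.g. matched C-alpha positions).
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(squared / len(a))
```

Because the deviations are squared before averaging, a few badly matched regions (such as the helix 1 and helix 4 differences noted above) can dominate the value even when the rest of the fold overlays closely.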
Because PCPs play the role of substrate shuttles during peptide synthesis, they need to interact with catalytic domains from both upstream and downstream modules. Owing to this requirement, it could be anticipated that PCPs would not be very specific when it comes to binding other NRPS domains. However, studies from the Ackerley group have cast doubt on such assertions, as their results indicate that PCPs normally found to interact with a condensation domain are unable to interact effectively with a thioesterase (TE) domain in a reengineered NRPS machinery. 49 Thus, in order to understand the grounds for the specificity of PCPs, we will now discuss the interaction network between PCPs and other interacting domains based on the structural evidence currently available. To assist the reader, a table of all structures mentioned in this review is included in the text (Table 1).

Activation of the PCP-domain through interaction with a PPTase
The PPTase superfamily was discovered in 1996, 50 and group II PPTases are involved in the activation of NRPS PCPs (Fig. 3). In contrast to group I and III PPTases, group II PPTases do not oligomerise, but rather are expressed as fused pseudo-dimers (Fig. 3C). 51 The first structure of a PPTase in complex with a PCP was solved with the PPTase Sfp, which is required for the activation of the PCP-domains of the surfactin synthetase in B. subtilis. 52 In this structure, solved initially by NMR and then later confirmed by X-ray crystallography, Sfp adopts, as expected, a pseudo-dimeric fold where each "monomer" is comprised of a 3-stranded beta-sheet core flanked by three alpha helices. A small additional 2-stranded beta-sheet then covers each "monomer". Owing to the promiscuity of Sfp, which needs to interact with and activate many different PCPs in the surfactin NRPS, the binding interface between the two proteins does not involve many residues. However, two clear interaction sites have been localised, including one hydrogen bond between the backbone carbonyl oxygen of PCP Gln40 and the amide hydrogen of Sfp Tyr36 (Fig. 3E); the rest of the interaction network relies on hydrophobic residues. Sfp displays a small hydrophobic cavity made of residues from the first helix and the preceding loop as well as the third helix and the subsequent loop. In the complex structures, the side chains of Leu46 and Met49 from helix 2 of the PCP occupy this cavity (Fig. 3D). Of interest, multiple sequence alignment of both group II PPTases and NRPS PCP-domains shows that these hydrophobic positions are widely conserved. 53 For these reasons, it is hypothesised that this recognition pattern is widespread across NRPS systems, which also helps to explain the general utility of Sfp for the modification of NRPS PCPs from multiple systems. 53

Function of adenylation domains
NRPS adenylation domains (or A-domains) belong to a large family of adenylate-forming enzymes (ANL superfamily, class 1a; ca. 500 residues, ca. 55 kDa) and are key domains within an NRPS assembly line as they are essential for the selection and activation of monomer units for incorporation into the growing peptide. 54,55 In a multi-step process, the A-domain first selects the amino acid to be added to the growing peptide chain, then activates this residue via adenylation (ATP + amino acid → aminoacyl adenylate + PPi, Fig. 1B) to make it a competent substrate for the last step: the transfer of the amino acid to the thiol group of the PPant prosthetic group of the PCP. The first step includes the specific recognition of the cognate amino acid by the catalytic pocket of the A-domain. Although typically specific for one substrate, some A-domains have been reported to select several amino acids: examples include the first A-domain of the nostopeptolide A hybrid PKS-NRPS machinery, which can bind and activate three branched hydrophobic residues: isoleucine, leucine and valine. 56 Another example is the first A-domain of the anabaenopeptin synthetase from Planktothrix agardhii (ApnA A1), which has been shown to activate two structurally different amino acids: arginine and tyrosine. 57 In this case, X-ray crystallographic studies have revealed that the source of the dual activity is that the arginine residue adopts a conformation that mimics that of a tyrosine within the ApnA A1 catalytic pocket. The specificity of these A-domains for several amino acids could offer great advantages to the source organism since different peptides are produced from only one assembly line, although in both cases the use of proteinogenic amino acids would limit the ability of the organism to select for one amino acid over another.
The amazing diversity of non-ribosomal peptides is to a large extent due to the significant diversity of molecules recognised by A-domains. In 1997 the structure of the phenylalanine-activating A-domain PheA from gramicidin synthetase 1 was solved, 58 which led to the identification of a specificity code allowing the prediction of the monomer an A-domain would accept as a substrate. 59 The refinement of this code and its utility led to the emergence of valuable web-based programs capable of predicting the substrate that an A-domain will accept (see websites NRPSpredictor2 (ref. 60) or NRPS/PKS substrate predictor 61). Additionally, the interactions between A-domains and other NRPS domains, such as condensation (C) domains, have recently been shown to alter the selectivity of certain A-domains: this not only indicates the importance of protein–protein interactions in NRPS machineries but also implies that NRPS activity should be assessed in complete modules where possible. 62,63 In addition to the great diversity of possible substrates, some A-domains (referred to as "interrupted A-domains") harbour insertions of modification domains (called auxiliary domains) involved in the alteration of the selected substrate. Auxiliary domains typically exhibit methyltransferase, ketoreductase, oxidase or monooxygenase activities. † Significant research has been performed on auxiliary domains and it has been shown that it is possible to generate a bi-functional A-domain by inserting the sequence of an auxiliary domain in a standard A-domain. 65 Similarly, it is possible to generate an uninterrupted A-domain from an interrupted one by deleting the sequence coding for the auxiliary domain. 65,66 The potential for engineering specific gain/loss of function within A-domains through the use of auxiliary domains is of great value for reengineering these domains, helping pave the way to generating novel compounds through simple modifications to NRPS machineries.
The recent structure of TioS, a natural A-domain interrupted by a methyltransferase auxiliary domain, is the first structural example of such an architecture. In this structure, the adenylation catalytic site is located 60 Å away from the methylation site, which again highlights the crucial role of PCP-bound peptide shuttling in NRPS systems. 67 A-domains are often found to form an essential complex with small proteins (ca. 70 residues), named MbtH-like proteins (MLPs) due to their discovery in M. tuberculosis. MLPs have a consensus sequence identified as NXEXQXSXWPX5-PXGWX13LX7WTDXRP 68 and share a conserved, relatively flat fold consisting of a core central beta-sheet with three strands covered by a single alpha-helix; some MLPs also possess an extra C-terminal alpha-helix as seen in the first crystal structure of PA2412. 69 The role of MLPs is still not fully understood since they have proven to be essential for the activity or solubility of some A-domains 69,70 and totally dispensable for others. 71
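The MLP consensus quoted above can be turned into a searchable pattern. The sketch below is a hedged illustration in Python: it assumes that "X" means any single residue, that "Xn" means n consecutive residues of any type, and that the hyphen in the printed consensus carries no meaning; the function name is hypothetical, and a strict pattern match of this kind will miss MLPs that deviate from the consensus at any fixed position.

```python
import re

# Consensus as printed in the text (X = any residue, Xn = n residues).
MLP_CONSENSUS = "NXEXQXSXWPX5-PXGWX13LX7WTDXRP"


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate the X / Xn consensus notation into a compiled regex."""
    parts = []
    # Each token is one letter optionally followed by a repeat count.
    for letter, count in re.findall(r"([A-Z])(\d*)", consensus.replace("-", "")):
        if letter == "X":
            parts.append(".{%s}" % count if count else ".")
        else:
            parts.append(letter * int(count or 1))
    return re.compile("".join(parts))


MLP_PATTERN = consensus_to_regex(MLP_CONSENSUS)
```

`MLP_PATTERN.search(sequence)` then returns a match object for a one-letter protein sequence that contains the full consensus, and `None` otherwise.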

Structure of adenylation domains
A number of crystal structures of A-domains have been deposited in recent years, which indicate that A-domains share a consistent fold similar to that of firefly luciferase, which catalyses a similar adenylation reaction. 58,72 The A-domain fold comprises two distinct subdomains: a larger N-terminal domain (ca. 400 residues), also known as Acore, and a smaller C-terminal domain (ca. 100 residues), known as Asub (Fig. 4B), linked together through a small "hinge" loop. 40,73 Although the large N-terminal core domain is relatively well constrained, the smaller C-terminal domain has been shown to rotate substantially relative to the Acore. Crystal structures of A-domains in different catalytic states have revealed what is commonly referred to as the "A-domain cycle", or "domain alternation". Adenylation domains have been shown to exist in at least two different catalytically relevant states, 40,46,73,74 with in-solution studies confirming this general mechanism whilst also indicating the potential for different catalytic states to exist in mixed conformations as the reaction progresses. 75 Although structures of A-domains in different catalytic states were available before, Reimer et al. 46 provided snapshots of these catalytic states from the same machinery, LgrA (Fig. 4B). In addition to providing data on the A-domain catalytic cycle, these structures also show how the newly activated amino acid is subsequently passed, via the movement of the PCP, to a tailoring domain (formylation/F-domain) that adds a formyl group to the amino acid N-terminus. Of further interest, the Acore and F-domain adopt a very similar conformation in all four structures reported for LgrA. These two domains possess an interaction surface of around 830 Å² (similar in area to the A/C domain interaction; see below), which even if not very extensive appears to be sufficient to maintain a constant relative orientation throughout the catalytic cycle of the A-domain.
The interaction between the two domains is mostly hydrophobic in nature: specific interactions of note include Phe172 from the formylation domain docking within a hydrophobic cleft in the A-domain formed by Leu516 and Leu520, with other important interactions involving Leu522-Leu187 (residues are from A-domain to F-domain respectively) and Leu184 docking into a hydrophobic cavity on the F-domain formed by Phe87 and Trp88 (Fig. 4B and D). Although the Acore-F interaction stays consistent throughout the catalytic cycle, the position of Asub is highly variable. The Asub domain can be seen as a lid above the Acore, controlling access of the substrates and downstream PCP-domain to the catalytic site. At the beginning of the A-domain catalytic cycle, Asub is located above Acore, leading to an "open" conformation (Fig. 4B). In this conformation, the active site is accessible to the substrates (amino acid, ATP, Mg2+). Upon substrate binding, the Asub acts as a lid and closes access to the catalytic site, hence forming the closed state (or adenylate-forming state) by rotating by approximately 30 degrees. In doing so, the final loop of Asub moves deeper into the Acore and positions a conserved, essential lysine residue (Lys672 in LgrA) in close proximity to the catalytic site. This lysine residue not only serves to stabilise both the amino acid substrate and ATP but it also stabilises the highly negatively charged reaction intermediate, making it a key component of the active site. 76 Completion of the adenylation reaction then triggers the rotation of the Asub domain by around 140 degrees. This considerable rotation allows the release of PPi and drives the PCP-domain to dock onto Acore, thus forming the thiolation state (Fig. 4B and E) (the interaction between the PCP-domain and the A-domain will be detailed in the next section).
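The sequence of states described so far (open, closed and thiolation) can be summarised as a small state machine. This Python sketch is purely a bookkeeping aid: the class and function names are hypothetical, the triggers are paraphrased, and the rotation values are simply the approximate figures quoted above for LgrA rather than parameters of any simulation.

```python
from enum import Enum
from typing import List, Tuple


class AdenylationState(Enum):
    OPEN = "open"
    CLOSED = "closed (adenylate-forming)"
    THIOLATION = "thiolation"


# state -> (next state, trigger, approximate Asub rotation in degrees);
# the rotation values are the approximate figures quoted in the text.
TRANSITIONS = {
    AdenylationState.OPEN: (
        AdenylationState.CLOSED, "substrate binding", 30),
    AdenylationState.CLOSED: (
        AdenylationState.THIOLATION, "adenylation complete", 140),
}


def walk(start: AdenylationState) -> List[Tuple[str, str, int]]:
    """List the transitions from `start` until no further step is defined."""
    steps = []
    state = start
    while state in TRANSITIONS:
        next_state, trigger, rotation = TRANSITIONS[state]
        steps.append((f"{state.value} -> {next_state.value}", trigger, rotation))
        state = next_state
    return steps
```

Calling `walk(AdenylationState.OPEN)` lists the two transitions in order; further states could be added to `TRANSITIONS` in the same form.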
In the thiolation state, the prosthetic PPant group attached to the PCP-domain is then loaded with the amino acid via a thiolation reaction. After this reaction has been completed, the PCP first needs to shuttle its cargo to the formylation domain, covering a distance of roughly 60 Å and undergoing a further rotation of 75 degrees. This is made possible by the motion of the Asub domain and the effect this has on the neighbouring, linked PCP. With this step, the cycle can reset at step one with the Asub domain in the open conformation (ready for substrates to bind). Of further significance is the structure of AB3404, 40 in which a termination module is found in the condensation state (holo-PCP bound to the C-domain) whilst the A-domain is present in the adenylate-forming state. This shows how NRPS machineries have evolved to be an efficient, coupled system in which two catalytic domains are active at the same time within the same module. The residues composing the hinge between Acore and Asub are therefore of extreme importance for the proper progression of the catalytic cycle: for example, mutation of a hinge residue (invariantly an aspartic acid or a lysine in the linker sequence between the Acore and the Asub) into a proline residue was sufficient to constrain the A-domain in the

Adenylation domain interactions with PCPs
The transfer of the adenylated substrate onto the PCP needs to be efficiently controlled as the accidental release of such highly reactive intermediates could lead to non-specific protein modification. To avoid such an event, the A-domain needs to interact with the PCP-domain in a configuration that allows the thiolation reaction to occur. Many structures have been solved with an A-domain in a thiolation state with a bound PCP. Isolation and characterisation of such complexes has been enabled by the use of chemical probes that trap the PPant arm of the PCP in the A-domain active site, thus delivering A/PCP complexes for structural characterisation. 78 The general binding mode in A/PCP complexes involves PCP helix 2, which interacts in a parallel fashion with helix 11 of the Acore via hydrophobic as well as ionic interactions ((Ar-CP) EntB/EntE (4IZ6, 3RG2), 38,39 A-PCP in PA1221 (4DG9), 44 and the EntF A-PCP interaction (5JA1 and 5T3D) 40). In addition, "loop 1" between helices 1 and 2 of the PCP forms a network of charged interactions with the last structural motif (loop + strand) of the Asub domain. Mutation analyses confirmed the importance of these regions for the activity of EntB/EntE and of assembly lines involved in the formation of pyoluteorin/prodiginine. Indeed, NRPS activity is reduced when point mutations are introduced in the loop 1 motif. 40,79 Whilst the PA1221 PCP forms many interactions with its cognate A-domain (Fig. 4A), the number of interactions reported in the case of LgrA in the thiolation state is much lower. Specifically, Gln734 from PCP helix 2 forms a hydrogen bond with A-domain Gln447, and Tyr748 from PCP helix 3 engages in hydrophobic interactions with Tyr421 (Fig. 4E). The analysis of the structure of LgrA in the formylation state also reveals that the PCP uses the same Tyr748 from helix 3 to bind a small hydrophobic region on the F-domain composed of Leu127 and Met178 (Fig. 4C).
In addition, the structure shows that the Asub participates in positioning the PCP for peptide formylation by creating an electrostatic surface composed of Asn648 and Asp652, allowing interaction with the PCP helix 4 Arg residues 758 and 762 (Fig. 4C). It is important to mention here that, in contrast to the interactions described with any other domain, PCP helix 2 is not involved in any direct binding with the F-domain. Whilst many PCP-A structures show a conserved domain conformation, a recent structure of DhbF with a PCP-A di-domain arranged in the thiolation state varies substantially from the other available structures: 43 in this structure, both PCP and Asub domains have moved away considerably from their canonical locations. Indeed, in this structure the Asub domain displays an "open" conformation and the PCP has rotated 86 degrees around the PPant arm attachment site. Whether this conformation is biologically relevant or imposed by the crystal packing remains to be assessed.
Interactions between A-domains and their cognate PCPs also include the linker connecting them. It has been demonstrated that over 70% of linkers between A-domains and PCPs have a conserved motif, which follows the essential conserved catalytic lysine of the Asub and displays an LPxP consensus. A mutation of the leucine residue (L958D in EntF) severely hindered the production of enterobactin (a 1000-fold reduction). 80 The analysis of this linker in crystal structures of A-PCP di-domains reveals that the leucine residue docks in a conserved hydrophobic pocket created by residues from the beta-sheet in the C-subdomain, and is thus important for positioning the Asub and the PCP in a conformation competent for the adenylation reaction.

Condensation and epimerisation domains
Condensation (C)-domains were first identified as catalytic domains involved in peptide bond formation in NRPS assembly lines in the late 1990s. These domains (450 residues, 50 kDa), which contain a conserved HHxxxDG motif, were found to occur the same number of times as the condensation and epimerisation events in the peptide synthesis process. 81 Mutational studies undertaken to evaluate the importance of this conserved motif, and to assess the catalytic activity of domains bearing it, revealed that mutation of the second histidine of the motif into valine was sufficient to disrupt the formation of a linear dipeptide (D-Phe-L-Pro) by a hybrid NRPS assembly line (GrsA A-PCP-E phenylalanine activating module together with TycB C-A-PCP proline activating module). 82 These data established the C-domain as the peptide-bond forming domain in NRPS biosynthesis. Four years after this discovery, the first crystal structure of an NRPS C-domain was published. 83 This structure revealed the architecture of VibH, a standalone C-domain from the siderophore vibriobactin assembly line (a non-linear NRPS). This structure, which has been shown to be well conserved across all C-domains, is reminiscent of a pseudo-dimer of chloramphenicol acetyl transferase (CAT) 84 with additional loss/gain of secondary structure elements. Each "half" of the protein is referred to as the N-terminal lobe or the C-terminal lobe. Both lobes are made up of a central beta-sheet flanked by large alpha-helices. More precisely, the N-terminal lobe possesses a 5-stranded beta-sheet (with one strand originating from a sequence from the C-terminal lobe that is known as "the latch"), a peripheral small 2-stranded beta-sheet, five large alpha-helices and a smaller helix found in the "floor-loop" motif, which also originates from the C-terminal half of the C-domain. 
In the other half of the C-domain, the C-terminal lobe harbours two central beta-sheets (one with two and one with four beta-strands) protected on one side by eight alpha-helices. The overall fold of the C-domain can be seen as an upright V shape where each half forms one branch of the letter. 36 The catalytic site motif HHxxxDG forms part of a loop between beta-strand 6 and alpha-helix 4 in the N-lobe, connecting the central strand of the largest beta-sheet with one of the flanking helices. This motif is exposed in the centre of the tunnel formed by the two halves of the domain.
Located around 15 Å from the surface at either end of the tunnel, the catalytic site is ideally placed relative to both the donor and acceptor PCP binding sites. No structure of a C-domain with both donor and acceptor substrates has been obtained so far; however, a model of this tri-domain structure is reviewed in ref. 85. Given the pseudo-dimeric nature of the C-domain and the low number of interactions between each sub-domain half (floor loop and latch), it has been reported that C-domains are rather flexible and can be found in different conformations: these range from conformations seen as more "open" to those best described as being more "closed". 86 The relevance of such conformations is not fully understood, although interactions with PCP-bound substrates could be expected to provoke changes in C-domain state. In this way, controlling the specific order of PCP binding would help to maintain efficient NRPS synthesis, with the directionality of NRPS synthesis maintained through the asymmetry of the condensation reaction. Such ordered substrate binding would also provide a hypothesis to explain the increase in hydrolysis of peptides sometimes observed from NRPS assembly lines immediately preceding engineered A-domains: 87 in these systems, modification of A-domain specificity can lead to a reduced rate of A-domain activity, which in turn would allow water to compete effectively with the acceptor aminoacyl-PCP for attack of the thioester of the donor peptide substrate, due to the slow rate of generation of this intermediate. 
C-domains can also be seen as crucial gatekeepers in ensuring not only the stereochemistry of the donor peptide (through the presumed dynamic competition for peptidyl-PCP substrate with a neighbouring E-domain) but also the correct modification state of the aminoacyl-PCP acceptor substrate, by allowing sufficient time for the PCP-bound amino acid to interact with the essential modifying domains, either in cis (such as methyltransferase domains) 67 or in trans (such as hydroxylase or halogenase enzymes). 88,89 Given these important roles that C-domains must play within NRPS catalysis (Fig. 1C), it is clear that many important insights remain to be gained from structural and biochemical investigation of these domains and their PCP-bound complexes. Analysis of the structures of C-A-PCP-TE termination modules revealed that condensation domains share an extensive interaction surface with neighbouring adenylation domains (total of 1100 Å²). 35,40 It has been hypothesised that these two domains could act as a catalytic platform, possibly arranged in a helical fashion. 35,90 However, the relevance of this interaction has been challenged by the structure of the EntF terminal module: the A/C interface in EntF is much less extensive than previously observed (780 Å²). The reason behind this discrepancy lies in the fact that in the first two structures the A-domains are seen in the "open state", where the Asub is packed against the C-domain. This interaction is not present in the EntF structure, as the A-domain is in the "closed state", with the Asub folded over the Acore and hence not interacting directly with the C-domain. Additionally, it would appear that whilst the A- and C-domains from the same module interact together, this cannot necessarily be extended to catalytic domains from different modules, since no direct interaction could be seen in a crystal structure of a cross-module NRPS (albeit the only example known to date). 43
With a total of only four structures available to date providing insights into A/C interactions, it is difficult to provide a definitive picture of the interaction network between these two important domains, and more structural and biochemical data are clearly needed to address this in the future.
C-domains are essential for the process of peptide chain elongation. As described above, both donor and acceptor PCPs bind at a dedicated side of the catalytic tunnel and present the peptides to the catalytic site (Fig. 1C). There, the catalytic histidine (H126 in VibH, the second histidine of the HHxxxDG motif) has been postulated to act as a general base, enhancing the nucleophilicity of the acceptor aminoacyl-PCP and allowing the nucleophilic attack on the carbonyl of the thioester-bound amino acid. This then results in the extension of the peptide chain by one residue, which is transferred from the upstream PCP-domain upon peptide bond formation (a mechanism conserved in both CAT and dihydrolipoamide acetyltransferase (E2p)). 91 However, this mechanism has been questioned in the light of mutational analysis of several C-domains. Although in the CAT system the equivalent mutant (H195A) is six orders of magnitude less active than the wild type, 92 the H126A VibH mutant shows only a minor decrease in catalytic activity (less than two-fold). 83 In a study by Bergendahl et al., 93 the mutation of the second histidine residue (H146A) in the C-domain of the NRPS TycB was shown to render the enzyme insoluble, suggesting an important structural role. In the same study, the mutation of the aspartate residue of the catalytic motif (D151N) was also reported to yield an inactive enzyme, which has been verified for the equivalent mutations in the other NRPS C-domains VibH (D130A) and EntF (D142A). 94 There is evidence that the histidine residue can interact directly with the amino group of the acceptor substrate (gained from the structure of a C-domain from CDA biosynthesis (H157) that was engineered to covalently bind a mimic of the acceptor substrate), 95 which could also indicate a role for this residue in positioning the amino group of the acceptor for attack of the donor thioester. 
Thus, despite being highly conserved, the HHxxxDG motif now appears to play varied roles in different C-domains, and the specifics of the peptide bond forming reaction catalysed within C-domains could well vary depending on the specific domain involved.
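As an illustration of how consensus motifs of this kind can be located in a protein sequence, the following is a minimal Python sketch, not a method used in the studies cited here. It assumes one-letter amino acid codes and treats "x" as any residue; the function names and the toy sequence are hypothetical and are not taken from any real NRPS.

```python
import re


def consensus_to_regex(consensus):
    """Turn a consensus such as 'HHxxxDG' into a regex; 'x' means any residue."""
    body = "".join("[A-Z]" if ch == "x" else re.escape(ch) for ch in consensus)
    # The lookahead lets overlapping occurrences be reported as well.
    return re.compile("(?=(" + body + "))")


def find_motif(sequence, consensus):
    """Return (1-based start position, matched residues) for every motif hit."""
    pattern = consensus_to_regex(consensus)
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical toy sequence, invented for illustration only.
toy_sequence = "MSTALHHIVVDGWSLKRA"
hits = find_motif(toy_sequence, "HHxxxDG")
```

The same helper can be applied to other consensus strings written in this notation. A sequence match alone says nothing about catalytic competence, which, as discussed above, differs between C-domains.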
Delivery of substrates to the catalytic site of the C-domain involves the correct docking of both donor and acceptor PCPs at the surface of the condensation domain. Although no structure of a donor PCP has been solved in complex with a standard elongation C-domain in a productive conformation, the structure of the fungal TqaA PCP-CT complex supports the original hypothesis that the binding location and the binding mode should resemble that of a PCP bound to an epimerisation domain, for which structural data are also available (see below). 41 Structures of the acceptor PCP-domain bound to the C-domain have, however, been determined: both structures of the terminal modules from SrfA-C and AB3403 are seen in the condensation state, with the acceptor PCP bound to the C-domain (Fig. 5B and C). 35,40 When these structures are compared, the C-domains superimpose relatively well (RMSD 4 Å; calculated using the Matchalign routine in PyMOL (368 Cα aligned from a total of 443 Cα)). However, the PCPs are rotated around the PPant attachment site by more than 30° relative to each other (Fig. 5B). In the case of the structure of AB3403, most of the interactions originate from PCP helix 2 (which carries the PPant attachment site, Ser1006 in this case) as well as the preceding and subsequent loops. In particular, Leu1007 and Val1010 (N-terminal portion of PCP helix 2) are engaged in hydrophobic interactions with Leu22 and Ile80 of the C-domain. Additionally, residues Val1026, Ala1027 and Ala1030 (beginning of PCP helix 3) form hydrophobic interactions with Tyr26 and Leu30 from the C-domain (Fig. 5E). There are limited hydrophilic interactions: those noted involve the side chain of Lys1011 of the PCP and the main chain carbonyl of Gln78 from the C-domain, together with Arg344 from the C-domain interacting with the phosphate of the PPant moiety. 
In the SrfA-C structure, it is noticeable that PCP helix 2 runs parallel to C-domain helix 1, making possible a number of hydrophobic interactions. Most of the interactions again involve PCP helix 2 and neighbouring loops in the same manner as seen for the AB3403 structure (Fig. 5A). Specific PCP residues include Met1007 and Phe1027, which form hydrophobic interactions with C-domain helix 1 Phe24/Leu28 and helix 10 Tyr337; the importance of these interactions has been probed for the EntB system, which showed that the corresponding residues were essential for productive PCP-C interactions. 96
Epimerisation (E)-domains are non-canonical V-shaped domains that are structurally highly reminiscent of C-domains (Fig. 5D) despite a rather low sequence homology (<20%). 83,97,98 E-domains play a vital role in modifying the stereochemistry of amino acids incorporated in the growing peptide chain (i.e. altering the configuration of the C-terminal residue of the PCP-bound peptide from L to D). The first structure of an E-domain was of an isolated domain excised from TycA (the first module of the tyrocidine synthetase). From the point of view of the overall structure, E- and C-domains initially appear very similar; however, two E-domain specific features have been implicated as playing important roles in their catalytic function. The first feature is found in the so-called "floor loop", which is extended by at least five residues in E-domains and is postulated to be involved in interactions with the neighbouring PCP-domain. The second important difference is located within the bridge region at the top of the V-shaped structure that corresponds to the C-domain binding site for the acceptor PCP. In TycA, this region is blocked by an insertion of eleven residues, which serves to obstruct access to the catalytic site from this side of the catalytic tunnel. 83,97
The structure of the gramicidin synthetase GrsA PCP-E di-domain 41 shows the interaction network required for functional complex formation and can also be seen as a mimic of an acceptor PCP-bound C-domain. One of the most noticeable features of the PCP-E domain interaction is that the linker between the two domains appears to play a prominent role in recognition and binding (Fig. 5D). Indeed, in contrast to the usually flexible loops linking C-domains and PCPs, the linker region in this case forms extensive ordered interactions along the surface of the E-domain that are mainly charged/polar in nature. Notably, the residue pairs Arg613/Asp788 and Arg614/Glu785 act as anchor-like electrostatic "hooks" of importance for the localisation of the linker region on the E-domain and the correct positioning of the PCP relative to the E-domain active site tunnel. 41 To confirm the significance of the linker interaction with the surface of the E-domain, mutation analyses were carried out, revealing that an E785/D788R double mutation was enough to disturb the linker interaction network with the E-domain, which resulted in 20% by-pass of the epimerisation reaction. This result emphasises the importance of the linker in PCP-E domain interactions. This structure also resolved the direct interactions between the PCP and the E-domain, which are largely formed by residues from PCP helices 2 and 3. Four hydrogen bonds stabilise the interface between the domains: PCP/E Gln578 (helix 2)/Asp983, Asp572/Gln979, Gln587/Glu785 and finally, PCP Thr592 (helix 3) forms a hydrogen bond with Glu898 from the extended floor-loop of the E-domain. 
It has been suggested that this floor-loop participates in the correct positioning of PCP helix 2 (and hence the PPant moiety) to allow catalysis: the recent structure of the unusual CT domain from TqaA, a fungal NRPS C-domain-like domain involved in macrolactamisation and release of the final product, also demonstrates such a positioning of the PCP in the donor site of the catalytic channel (Fig. 5F). 47 Although the first structure of a PCP-C di-domain was obtained earlier for a part of the tyrocidine synthase Tyc6, 36 TqaA represents the only structure to date of a donor PCP bound to a C-domain in a catalytically competent state. Upon analysis of the interaction surface between the donor PCP and the C-domain, the following interactions have been reported: PCP Arg3571 and CT Asp3906 form the only salt bridge, PCP Phe residues 3554 and 3555 are engaged in hydrophobic interactions with Gly3868 and Ile3869 from the CT domain, and PCP Ile3561 docks into a hydrophobic pocket contributed by CT residues Val3772 and Ile3981 (Fig. 5G). Such structural comparisons show that the similar structures of C/E domains are matched by comparable donor PCP-bound states, although the importance of the linker region in E-domains appears to be a crucial difference to C-domains. The example of TqaA also shows that condensation reactions can lead to peptide chain release from an NRPS, although this function is usually the result of a separate thioesterase domain (or, less commonly, of reductive cleavage).

Thioesterase domains
Thioesterase (TE)-domains play an essential role in catalysing the release of the complete peptide chain at the end of the NRPS-mediated assembly process (Fig. 1D), ensuring the machinery does not stall and is able to perform multiple cycles of catalysis. 99 In NRPS assembly lines, two types of thioesterase domains can be found. Type I TE domains are typically the final domain of the last NRPS module, whereas type II TE domains are standalone enzymes and are involved in the recognition of incorrectly loaded PCPs. Such misprimed PCPs would lead to the inactivation of the NRPS assembly line and could occur due to modifications blocking the reactive thiol group at the extremity of the PPant moiety (i.e. the incorrect amino acid or an acetyl group from the PPT-catalysed loading of acetyl-CoA). In such cases a trans-acting type II TE will exert its enzymatic action to hydrolyse and release the improperly loaded cargo from the PPant moiety of the PCP, ensuring that the machinery is maintained in a productive state. 100,101 TE domains are relatively small (250 residues, 30 kDa) and belong to the superfamily of α/β hydrolases that includes a number of lipases and acetylcholinesterases, with a catalytic triad typically composed of serine, aspartic acid and histidine residues. TE domains catalyse the release of the substrate from the bound PCP through a two-step reaction: firstly, the TE-domain mediates the transfer of the peptidyl group from the donor PCP onto the activated serine residue in the TE-domain active site, thus forming an O-acyl-enzyme intermediate (the only non-PCP bound intermediate after A-domain activation of amino acid residues). In the second step, a nucleophilic attack on the enzyme-tethered ester can take one of several different (and typically highly specific) routes. One common example is hydrolysis, which is triggered when the nucleophile is a water molecule and leads to the release of the linear peptide from the NRPS. 
Another very important example is macrocyclisation, which occurs when the nucleophile is a functional group from within the linear peptide (i.e. the N-terminal amino group or a nucleophilic side chain) and leads to cyclisation of the peptide with concomitant release from the NRPS. For a comprehensive review on PKS and NRPS release mechanisms, see Du [...]. Structural approaches have provided insights into both classes of TE domains. The terminal TE domain from the surfactin assembly line SrfA-C has been excised and structurally characterised as an exemplar of the type I TE-domain fold. 104 It exhibits the conserved superfamily fold of a 7-stranded central beta-sheet surrounded by eight α-helices. Of particular interest are the three α-helices known as the "lid": these cover the active site composed of the conserved serine (Ser80 in SrfA-C) within the signature motif GxSxG together with residues His207 and Asp107. The SrfA-C TE domain was crystallised with two monomers in the unit cell, with the lid regions of each monomer adopting different conformations referred to as the "open" and "closed" forms. The overall structure of the SrfA-C TE domain is reminiscent of a bowl, with a groove under the lid to accommodate the large final peptide substrate. 104 The overall architecture of the TE domain was further confirmed by the structure of the excised fengycin NRPS TE-domain (FenTE). 105 Aside from a different "lid" conformation, SrfA-C TE and FenTE are closely related structures, with an RMSD of around 1.1 Å when the domain cores are compared (excluding the lid regions). Structural data concerning type II thioesterases were obtained several years later from the external thioesterase of the surfactin synthetase. 106 The core domain of SrfTEII (surfactin thioesterase type II) superimposes well onto the core structure of type I TE domains, albeit with some important differences. 
When compared to SrfA-C TE, SrfTEII possesses an extra helix between the active site residues Asp189 and His216 and also shows a repositioning of the "lid" region. The consequence of these modifications is that the catalytic triad is only partially covered by a short loop, which in turn makes the catalytic site much more accessible than it is in the other (type I TE) structures. In addition, the catalytic pocket in SrfTEII is smaller than the one in SrfA-C TE, which matches well with a role in hydrolysing small groups from the PPant arm as opposed to large peptides. This also ensures that the type II thioesterases do not cleave off the growing peptide chain in a "normal" NRPS process, which would be highly deleterious to the efficiency of the assembly line.
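The RMSD values quoted when structures are compared in this review (for example, around 1.1 Å for the SrfA-C TE and FenTE cores) summarise the average displacement between matched Cα atoms. The following minimal Python sketch shows only that final arithmetic step; it assumes the two coordinate lists are already paired atom-by-atom and already superposed, which structural alignment software would normally take care of. The function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    Assumes the structures are already superposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy coordinates in angstroms, invented for illustration only.
core_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
core_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
value = rmsd(core_a, core_b)
```

Because the value depends on which atoms are included, figures such as those above are only meaningful alongside the selection used (here, domain cores with lid regions excluded).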
The crystal structure of another type II TE, RedJ, confirmed the shared fold with type I TEs and the importance of both the catalytic site pocket and the "lid" in maintaining a high degree of selectivity regarding thioesterase substrates. 107 Although TE domains share an overall common fold as described above, examples from the NRPS machineries of the glycopeptide antibiotics and the related GPA-like peptide complestatin possess a longer than usual N-terminal linker to this TE domain. Enzymatic assays carried out on the teicoplanin synthesis machinery have recently shown that this linker is important for the activity of the TE domain. 108 Of particular interest is the fact that secondary structure predictions show the linker is mostly alpha-helical in nature, which in turn suggests that it could play a structural role. This is supported by the crystal structure of the macrocycle-forming TE domain from the clinically relevant erythromycin antibiotic synthase, which displays an extended N-terminal linker folded as two additional helices covering the "lid region". 109 Further structural studies of this unusual linker are needed in order to provide new insights into the activity and selectivity of terminal NRPS thioesterase domains in GPA systems.
One of the critical steps in the function of the NRPS assembly line is the recognition of the donor PCP by the TE domain. Structural information has been provided through NMR studies performed on the PCP-TE di-domain structures of the apo EntF 37 (type I TE with PCP) and the type II surfactin thioesterase with its cognate PCP. 106 The EntF structure reveals that the PCP lies in a small cradle formed by the lid region (residues 226-266, helices 4 and 5) and the core of EntF-TE (Fig. 6B); this lid region covers both PCP and TE domain catalytic sites. The PCP and TE domains mainly interact through a network of hydrophobic interactions burying a surface of 1300 Å². As in other complexes involving a PCP, interactions are predominantly found to involve PCP helix 2 and the loop between helices 1 and 2 (residues 41 to 55) (Fig. 6A). Within this region, the PCP interacts both with the core of the TE domain and with the tip of the lid. It was proposed that Phe41 is structurally important to maintain the 4-helix bundle fold of the PCP through hydrophobic interactions, whereas Phe42 directly interacts with the first beta-strand of the TE core. Specifically, PCP residues Phe41, Phe42 and Met72 act as a hydrophobic clamp on the TE Trp121 (Fig. 6C). These two Phe residues are highly important, since mutation of either results in the loss of interaction between the domains. 37 Also, PCP residues Leu49 and His47 dock into a pocket formed at the surface of the TE domain (Fig. 6C). NOE couplings indicate additional interacting residues from the TE lid (Leu240, Ala241, Ala242 and Gln244) and the TE core (Phe119, Gln122, Leu100 and Leu102). PCP residues involved in the interaction include helix 1/2 loop residue Gly46, helix 2 residue Leu50, and the distant residue Val73. Despite this large interaction network, the PCP-TE di-domain structure shows large movement around the contact region, indicative of a continuous breathing/opening motion of the lid. 
The authors of this study emphasise that this motion is essential for providing the TE with the conformational plasticity needed to accommodate the PCP PPant moiety and to allow the PPant to traverse the catalytic cradle.
Although class I and class II TE enzymes are involved in slightly different catalytic activities, NMR interaction studies of SrfTEII with TycC3 PCP indicate a very similar mode of interaction between PCPs and the two classes of thioesterase. 106 The interface of the TycC3 PCP is mainly comprised of PCP helix 2 and some additional residues within PCP helix 1 and the PCP C-terminal region. As reported for type I thioesterases, the "lid" region of SrfTEII plays a major role in recognition of PCP helix 2, mostly in the region of the catalytic Ser45 residue. The additional crystal structure of a class I TE (ClbQ) in complex with a donor PCP confirmed the role of the flexible lid region in substrate binding and specificity. 110

Other NRPS interactions
[...] trans-modifying enzymes to specifically interact with the NRPS machinery at the desired carrier protein/s, which in turn requires a mechanism to ensure selective interaction of the modifying enzymes solely with the correct carrier protein domains. Limited structural evidence has been gathered considering the range of probable trans-modifying enzymes within NRPS-mediated biosynthesis; however, two such examples, both related to cytochrome P450 monooxygenase enzymes (P450s), have been structurally characterised from the biosynthetic machineries producing glycopeptide antibiotics (GPAs) as well as the cyclic depsipeptide skyllamycin (for reviews on the role of such P450s in NRPS-mediated biosynthesis, see ref. 111 and 112).
In both these systems, P450 enzymes have been identified as a source of the β-hydroxyl groups found within the final peptide structures of these compounds. In the case of GPA biosynthesis, machineries encoding a homologue of the P450 OxyD have been shown to incorporate β-hydroxytyrosine (Bht) residues directly into the NRPS peptide 113,114 (a further subgroup utilises a non-heme iron oxygenase that is believed to act directly against NRPS-bound amino acids during peptide synthesis, although this has yet to be investigated in detail). 115 The production of Bht by OxyD also relies on two further proteins: a minimal NRPS module comprising A- and PCP-domains (balhimycin homologue BpsD), and a separate thioesterase (balhimycin homologue Bhp). Mechanistically, OxyD utilises amino acids bound to the PCP-domain of the BpsD protein, which following hydroxylation are then cleaved by the thioesterase for subsequent incorporation into the heptapeptide-producing NRPS. The structure of OxyD reveals a well ordered and highly exposed active site, which is unusual for a structure of a substrate-free P450 enzyme. 113 Subsequent analysis of the active site residues responsible for orchestrating the open and rigid conformation of the P450 active site revealed that these are highly conserved amongst P450s responsible for the β-hydroxylation of PCP-bound amino acids, suggesting that these P450s recognise and specifically bind to the carrier protein portion of the substrate. 88,113 Structural data relating to a P450/PCP complex were obtained in 2014 with the co-crystal structure of a PCP (PCP7) and a P450 from the skyllamycin NRPS machinery. 45 In this structure, the PCP adopts the classical 4-helix bundle with helices 2 and 3 arranged as an X-shaped cradle to accommodate residues from helix G of the P450 (Fig. 7B), with the majority of the interactions found within these regions of the two proteins (Fig. 7A). 
Specically, a large hydrophobic cavity is formed by PCP residues Phe35, Phe36, Ala45, Phe65, Phe66 and Leu62 to accommodate Trp193 and Leu194 from helix G of the P450 (Fig. 7C). From the P450, residues Ala90, Met94, Leu200 and Leu239 interact with Leu43 from the PCP (the +1 residue from the catalytic Ser42) as well as the two methyl groups of the PPant moiety. It is important to note that except from Ala45, none of those residues belong to PCP helix-2 and mostly belong the PCP-helix 3 and the loop between helices 1 and 2. However, PCP helix-2 still plays an important role in the interaction with the P450, but mainly contributes residues involved in hydrogen bonding. Indeed, Thr46 and Lys47 from PCP helix-2 interact with E235 and Asn197 from the P450 G-helix, whilst PCP helix-3 residues Leu62 and Arg63 interact with Asp191 and Glu198 from the P450 G-helix, respectively. In addition to these protein-protein interactions, a network of hydrogen bonds within the P450 is involved in stabilising the PPant arm of the PCP. The true nature of these interactions is likely perturbed by the PCP cargo present in this structure, which was a small molecule inhibitor mimic of an amino acid that was necessary in order to improve the affinity of the P450/PCP complex for structural analysis. 45 At this stage of our analysis of PCPs interaction with other domains, and when comparing TE-PCP and P450-PCP complexes ( Fig. 6 and 7), it appears obvious that PCPs interact with these 2 domains in a very similar way, using a group of hydrophobic residues (phenylalanine clamp) to secure a solid anchor to their partner domain.
In addition to providing valuable data about the P450/PCP complex interface, Haslinger et al. 45 also discussed the role of the PCP three-dimensional structure in selectively recognising cognate P450s. Given that PCPs are small and share a high level of sequence conservation, it is unlikely that the amino acid sequence of a PCP would dictate its selectivity. However, it has been shown that subtle changes in tertiary structure can be important for PCP specificity. 45 It is important to note that in a comparable P450/CP structure solved from the biotin operon of B. subtilis, in which the acyl carrier protein (ACP) is bound to a P450 (P450BioI), 116 the ACP is located in a very different position on the P450. The differences in the substrate (amino acid vs. fatty acid), carrier protein (amphiphilic PCP vs. acidic ACP) and reaction performed (hydroxylation vs. carbon bond cleavage) likely guide the different binding modes observed in these two complexes, although in both cases these structures reveal how P450 enzymes can use carrier protein binding partners in order to bind and oxidise their desired substrates.
A further, highly complex in trans modification of NRPS-bound substrates has been identified from GPA biosynthesis, in which P450 enzymes perform sequential oxidative cyclisation reactions to generate rigid, biologically active aglycones from the original linear heptapeptide product of the NRPS machinery. 117,118 These P450 enzymes (also known as Oxy enzymes) each insert one ring into the final GPA structure: the three enzymes from vancomycin-type GPAs catalyse insertion of the essential C-O-D, D-O-E and AB rings (catalysed in that order by OxyB, OxyA and OxyC, respectively), whilst the non-essential F-O-G ring from teicoplanin-type GPAs is inserted by the enzyme OxyE immediately after the activity of OxyB. 119,120 Due to the complexity of the cyclisation cascade, a separate NRPS domain, known as the X-domain, has been implicated in the recruitment of these P450s to the PCP-bound heptapeptide substrate. 121 This to date is the only example of a separate recruitment domain for trans-modifying enzymes, with comparable single step modifications (for example the aryl crosslinking observed in arylomycin biosynthesis) 122 not requiring such a domain. The essential role of the X-domain in GPA crosslinking has been implied in a number of in vivo studies 118-120 and more recently proven by in vitro experiments, where the use of X-domain containing constructs has allowed the characterisation of both OxyE and OxyA enzymes for the first time. 108,121,123-125 Definitive evidence that the X-domain was indeed a binding platform for the Oxy enzymes came with the structure of the complex between the X-domain and OxyB from the teicoplanin NRPS assembly line. 121 In this structure, as anticipated, the fold of the X-domain resembled that of a C/E-domain, albeit with insertions that blocked the tunnel usually occupied by the acceptor PCP substrate. In addition, the canonical C-domain catalytic motif was modified in the X-domain, making it inactive for peptide bond formation or epimerisation. 
When compared, structures of the X-domain in the presence or absence of OxyB are extremely similar. 121 This observation, also found to be true for OxyB, shows that the formation of this complex does not trigger domain rearrangement and depends solely on a rigid-body type of interaction.
Unusually for an NRPS, the interaction forces are mostly driven by hydrogen bonds and salt bridges, with few hydrophobic residues involved. The novel position of the Oxy enzyme within the complex retains space for the simultaneous binding of a PCP-bound peptide substrate. This structure, together with extensive biochemical evidence, supports the notion that the catalytically inactive X-domain acts as a platform onto which the Oxy enzymes can bind in order to effect the complex process of peptide cyclisation during GPA biosynthesis. 123 In this case, the subsequent TE-domain also plays a role in proofreading the crosslinked state of the PCP-bound peptide, becoming active only against fully crosslinked (and thus mature) peptide aglycones. 108 Thus, it appears as though the ability to target PCP-domains is sufficient for enzyme targeting in relatively straightforward trans-modification steps of either aminoacyl- or peptidyl-PCP substrates, whilst the highly complex process of GPA crosslinking requires a separate recruitment domain in order to prevent this step from stalling the NRPS machinery.

Conclusions
Over recent years, our structural insight into NRPS-mediated peptide synthesis has been rapidly advancing. Examples of this can be found in understanding the importance of PCP motion coupled to substrate activation via adenylation during A-domain activity, the structural rigidity of PCP-domains during substrate shuttling and mechanistic insights provided by characterising how NRPS domains are assembled into modules. The rapid expansion of our understanding of the potential scope of C-domain catalysis beyond peptide bond formation also adds significantly to our understanding of the catalytic potential of NRPS assembly lines and must also be seen as a major area of future research, in particular the need to determine the structural determinants of C-domain mediated selectivity, novel reactivity and the relevance of coupling A-domain selectivity and rate to effective peptide bond formation in neighbouring C-domains. Furthermore, it is clear that significant work remains in order to fully understand the nature of the interactions between NRPS modules, the process of assembly of NRPS machineries encoded across multiple proteins and, most intriguingly of all, the higher order structure of NRPS assembly lines. Several models have been postulated for higher order NRPS assemblies, spanning all the way from highly ordered helical-type arrangements through to flexible assemblies with no appreciable ordered structure: 40,43,90 here again, our limited access to the structures of larger NRPS assemblies (in this case complete NRPS modules and dimodules) makes it difficult to understand the relevance and accuracy of such models. One technique that will clearly be of great use in this area is cryo-electron microscopy, which has already delivered impressive insights into related, complex assembly lines. 126-128
A recent and highly important example of the use of this technique to investigate polyketide synthesis was carried out by the Maier and Townsend groups, who identified and characterised a functionally relevant asymmetric conformation of the protein that was not apparent from crystallographic studies of the same protein. 129 Considering the monomeric nature of NRPS machineries and the high degree of variation in module architecture even within one assembly line, there is little doubt that cryo-electron microscopy (particularly when coupled with the use of chemical probes to trap the machinery in specific, defined catalytic states) will deliver important contributions to our understanding of NRPS machineries over the years to come. Given the diversity of NRPS systems and their resultant products, it is also conceivable that different NRPS systems will adopt different higher order structures due to the constraints placed on the enzymatic catalysis required to be performed by each individual system. In order to address this, future research into NRPS biosynthesis should prioritise the structural and functional characterisation of complete NRPS assembly lines, as it is only with an understanding of complete machineries that we will be able to understand their impressive catalytic and bioengineering potential.