Thierry Izoré (a,b) and Max J. Cryle* (a,b,c)
(a) The Monash Biomedicine Discovery Institute, Department of Biochemistry and Molecular Biology and ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Clayton, Victoria 3800, Australia. E-mail: max.cryle@monash.edu
(b) EMBL Australia, Monash University, Clayton, Victoria 3800, Australia
(c) Department of Biomolecular Mechanisms, Max-Planck Institute for Medical Research, 69120 Heidelberg, Germany. E-mail: max.cryle@mpimf-heidelberg.mpg.de
First published on 12th September 2018
Covering: up to July 2018
Non-ribosomal peptide synthetase (NRPS) machineries are complex, multi-domain proteins that are responsible for the biosynthesis of many important, peptide-derived compounds. By decoupling peptide synthesis from the ribosome, NRPS assembly lines are able to access a significant pool of amino acid monomers for peptide synthesis. This is combined with a modular protein architecture that allows for great variation in stereochemistry, peptide length, cyclisation state and further modifications. The architecture of NRPS assembly lines relies upon a repetitive set of catalytic domains, which are organised into modules responsible for amino acid incorporation. Central to NRPS-mediated biosynthesis is the carrier protein (CP) domain, to which all intermediates following initial monomer activation are bound during peptide synthesis up until the final handover to the thioesterase domain that cleaves the mature peptide from the NRPS. This mechanism makes understanding the protein–protein interactions that occur between different NRPS domains during peptide biosynthesis of crucial importance to understanding overall NRPS function. This endeavour is also highly challenging due to the inherent flexibility and dynamics of NRPS systems. In this review, we present the current state of understanding of the protein–protein interactions that govern NRPS-mediated biosynthesis, with a focus on insights gained from structural studies relating to CP domain interactions within these impressive peptide assembly lines.
The structure of an NRPS plays a vital role in its function. A typical linear NRPS machinery is composed of multiple modules (∼100–200 kDa each, depending on domain composition), with each module in the assembly line responsible for the incorporation of one amino acid monomer into the final peptide (Fig. 1). Architectures that differ from linear systems include iterative and non-linear NRPSs: iterative systems possess a limited number of modules, with the final peptide resulting from several copies of a shorter peptide fragment, whilst non-linear systems are more complex and less easily defined in terms of traditional modules.1 Whilst both iterative and non-linear NRPS machineries are intriguing from a mechanistic standpoint, the vast majority of medicinally important NRPS products are produced by linear NRPS machineries. Each module within a linear NRPS can be further described as a series of at least three different domains: the adenylation domain (or A-domain), responsible for the selection and activation of amino acids; the peptidyl carrier protein domain (PCP, also known as a thiolation domain), whose role is to shuttle substrates between different domains; and the condensation domain (or C-domain), which catalyses peptide bond formation between PCP-bound amino acid monomers and peptides (or amino acids for the initial NRPS module). In addition to these three essential domains, a module can harbour additional domains, such as epimerisation domains (E-domains), S-adenosylmethionine (SAM)-dependent methyltransferase domains or formylation domains,8 adding yet further potential diversity to the structure of the final NRPS peptide product.
The final module of most NRPS machineries contains a terminal thioesterase domain (or TE-domain) that releases the final PCP-bound peptide: this process can also introduce further diversity into the final peptide structure, one common example being the cyclisation of a linear peptide. NRPS assembly lines range from a single module, as in the Pseudomonas pyreudione synthetase,9 up to 18 modules in length, and can be encoded by one or more genes. The two longest NRPS machineries encoded by a single gene are kolossin A synthase from Photorhabdus luminescens, an impressive 15-module NRPS enzyme,10 and the peptaibol synthetase from Trichoderma virens with 18 modules.11
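To make the modular logic concrete, the following Python sketch encodes a linear NRPS as an ordered list of modules and predicts the product under the one-module-one-residue rule described above. It is a deliberately simplified, hypothetical data model: the class and field names (`Module`, `LinearNRPS`, `epimerise`, `te_cyclises`) are illustrative assumptions, only E-domain epimerisation and TE-mediated cyclisation are modelled, and iterative or non-linear systems do not follow it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Module:
    """A hypothetical linear-NRPS module, reduced to what matters for colinearity."""
    substrate: str           # monomer selected and activated by the module's A-domain
    epimerise: bool = False  # True if the module carries an E-domain (L- to D-configuration)


@dataclass
class LinearNRPS:
    modules: List[Module] = field(default_factory=list)
    te_cyclises: bool = False  # True if the terminal TE-domain releases a macrocycle

    def product(self) -> str:
        """Predict the product: one residue per module, in assembly-line order."""
        residues = []
        for module in self.modules:
            prefix = "D-" if module.epimerise else ""
            residues.append(prefix + module.substrate)
        chain = "-".join(residues)
        # The TE-domain releases the PCP-bound chain either as a linear or a cyclic peptide
        return f"cyclo({chain})" if self.te_cyclises else chain


# Hypothetical three-module assembly line with an E-domain in module 1
nrps = LinearNRPS([Module("Phe", epimerise=True), Module("Pro"), Module("Val")], te_cyclises=True)
print(nrps.product())  # cyclo(D-Phe-Pro-Val)
```

In this toy model, adding, removing or swapping a `Module` changes the predicted product at exactly one position, which mirrors the intuition behind module-based reengineering of assembly lines.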
Although NRPSs are mostly found in bacteria and fungi, where the peptide products provide a fitness advantage to the producing host, some rare examples of NRPS-like enzymes do exist in higher eukaryotes. However, those identified so far display altered architectures compared to standard NRPS machineries, as they are typically composed of an A-domain, a PCP-domain and a dedicated C-terminal catalytic domain that is specific to each NRPS.12 Examples of such eukaryotic NRPS-like enzymes are the 2-aminoadipic 6-semialdehyde dehydrogenases (AASDH) involved in lysine metabolism13 and Ebony, the carcinine and β-alanyl-dopamine synthetase involved in neurotransmitter recycling and cuticle sclerotisation in insects.14
Given that some of our most critical medicines, such as last-resort antibiotics (e.g. vancomycin, daptomycin), anti-tumour drugs (bleomycin, cryptophycin) and immunomodulatory compounds (cyclosporine), are NRPS-produced and remain tied to biosynthesis for their production, great interest exists in understanding their biosynthesis.15 Furthermore, producing new derivatives of such compounds in the future will require effective assembly line reengineering. Consequently, significant efforts have been made to understand how NRPS machineries function and the overall structure of these peptide assembly lines.16 Structural data are available for examples of each catalytic domain from standard NRPS machineries, obtained by NMR, X-ray crystallography or a combination of these techniques. Despite the wealth of information derived from structures of isolated domains, understanding NRPS synthesis as a whole requires imaging domain–domain interactions, module–module interactions and even interactions between modules located on different proteins. To facilitate the latter, it has been proposed that multi-chain NRPS machineries interact through "communication/docking domains":17–20 although limited data are available concerning these domains, there is evidence that such interactions can be mediated through a hand/helix interaction.21 Of all domains present in NRPS assembly lines, PCPs are involved in the most numerous interactions. As substrate-shuttling domains, PCPs need to interact efficiently with catalytic domains from both upstream and downstream modules to ensure the effective transfer of amino acids and peptides along the NRPS machinery. Given the essential role of PCPs in all NRPS processes, this review summarises the structural data available for NRPS systems, with a focus on PCPs and their interactions with other domains of the NRPS assembly line.
Whilst the exact timing of phosphopantetheine modification is not yet known, the highly similar structures of the unmodified apo-form and the phosphopantetheine-modified holo-form (vide infra) suggest that this modification does not need to occur co-translationally for successful synthesis of the NRPS protein by the ribosome. PCPs are related to the acyl carrier protein (ACP) domains found in megaenzymes such as fatty acid synthases and polyketide synthases, which adopt a similar fold and also require post-translational addition of a phosphopantetheine moiety. One important difference between ACPs and most PCPs is the sequestration of acyl chains within the core of ACPs, which is not observed for PCP-bound substrates (some interactions between bound substrates and the PCP have been identified, although these appear to occur on the surface of the PCP).23,24 Substrate sequestration can also lead to minor alterations in ACP structure through perturbation of the helical core of the ACP by the substrate.25 A further consequence of substrate sequestration by ACPs is that interacting domains must invoke a chain-flipping mechanism to access the bound substrates,26 which is not required for PCPs during NRPS biosynthesis.27
Due to their small size (typically fewer than 100 amino acids, ∼10 kDa), isolated PCPs have mainly been studied by solution-state NMR,24,28–30 although four structures are also available from X-ray diffraction experiments (ref. 31–33 and the unpublished structure 4HKG). These studies have revealed that the PCP folds as a four-helix bundle, with the two N-terminal helices the longest and the remaining helices typically shorter (see Fig. 2). All four helices are largely amphipathic, and their hydrophobic faces form the interface that holds the bundle together. Helices 1 and 2 run mostly parallel to one another and form the back of the domain, whilst the shorter helices 3 and 4 pack against the two N-terminal helices. Whereas helix 4 aligns parallel to helices 1 and 2, helix 3 adopts a perpendicular orientation relative to the other helices. A long loop connects helices 1 and 2, with the post-translationally modified serine located at the start of helix 2.34 Although not visible in the NMR structures, all crystal structures of PCPs obtained so far (mainly multidomain structures, vide infra) contain a small additional helix (labelled 1′ in Fig. 2) between helices 1 and 2. In early studies, Koglin et al.30 reported that PCPs adopt different conformations depending on the catalytic state of the assembly line. As indicated previously, PCPs first need to be "activated" by attachment of a PPant cofactor; it was therefore postulated that, in order to reach the different domains with which it interacts (i.e. adenylation, condensation and thioesterase domains), the PCP would change conformation depending on its loaded state. In this model, the PCP would itself "drive" the synthesis machinery by actively shuttling substrates from one catalytic domain to the next.
However, this model was first challenged by the crystal structure of the BlmI PCP,33 and in the light of many more recent studies it is now clear that the PCP can be thought of as a rather rigid domain, with only slight differences between the different catalytic states of the NRPS machinery. This can be seen from a superposition of all PCP structures deposited in the Protein Data Bank (PDB) in which the PCP has been studied in isolation: irrespective of its modification state (apo or holo), the fold of the PCP remains consistent. To explore whether the PCP undergoes structural changes during NRPS synthesis (especially considering new multi-domain and full-module NRPS structures), a structural alignment of the PCPs from 18 structures was performed. This pool of structures (2VSQ (ref. 35), 2FQ1 (ref. 34), 2JGP (ref. 36), 2ROQ (ref. 37), 3RG2 (ref. 38), 4IZ6 (ref. 39), 4ZXH (ref. 40), 4ZXI (ref. 40), 5ISW (ref. 41), 5ISX (ref. 41), 5JA1 (ref. 42), 5JA2 (ref. 42), 5T3D (ref. 40), 5U89 (ref. 43), 4DG9 (ref. 44), 4PWV (ref. 45), 5ES8 (ref. 46) and 5EJD (ref. 47); Fig. 2D and E) represents the conformation of PCPs while interacting with different domains, specifically isochorismate lyase, condensation, thioesterase, adenylation and epimerisation domains. The PCPs included in the structural alignment superimposed readily, and even the two most dissimilar domains (4XZ1 and 2VSQ) aligned with a relatively low RMSD (3.1 Å; calculated using the structure comparison server DALI48). Although these two PCPs belong to different organisms and are involved in different interactions (4XZ1: thioester-forming state; 2VSQ: condensation state), the structural differences observed (centred on helices 1 and 4) most probably result from the upstream/downstream linker regions (Fig. 2E). Thus, all available structural evidence to date indicates that PCPs do not undergo major structural changes during interactions with either their cognate PPant transferase or catalytic NRPS domains.
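The RMSD values quoted above come from a dedicated structure comparison server (DALI), which determines residue correspondences itself. As a minimal sketch of the underlying quantity only, the Python function below computes the Cα RMSD between two PCP models after optimal rigid-body superposition using the Kabsch algorithm (a different method from DALI's distance-matrix alignment), assuming the residue correspondence is already known and the coordinates are supplied as matched (N, 3) arrays. The function name `kabsch_rmsd` is illustrative.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition.

    P and Q must list corresponding atoms (e.g. aligned Calpha atoms) in the same order.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the 3x3 covariance matrix yields the optimal rotation (Kabsch algorithm)
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = (R @ Pc.T).T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Because RMSD averages over all matched atoms, it is best read alongside a per-residue inspection of where the deviations are concentrated (for example, helix ends or linker regions).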
Rather, the motion of PCPs during NRPS-mediated synthesis appears linked to the different states adopted by the adenylation domains (vide infra), which in turn alters the positioning of PCPs due to their close attachment to the mobile subdomain of the A-domain (Asub).
Because PCPs act as substrate shuttles during peptide synthesis, they need to interact with catalytic domains from both upstream and downstream modules. Owing to this requirement, it could be anticipated that PCPs would not be very specific when binding other NRPS domains. However, studies from the Ackerley group have cast doubt on this assumption, as their results indicate that PCPs that normally interact with a condensation domain are unable to interact effectively with a thioesterase (TE) domain in a reengineered NRPS machinery.49 Thus, to understand the basis for PCP specificity, we will now discuss the interaction network between PCPs and their partner domains based on the structural evidence currently available. To assist the reader, a table of all structures mentioned in this review is included (Table 1).
PDB code | Protein name | UniProt ID Number | References |
---|---|---|---|
2N5H | PltL-holo | Q4KCZ1 | 24 |
2N5L | PltL-pyrrolyl | Q4KCZ1 | 24 |
5U3H | PCP1 yersiniabactin | Q7CI41 | 28 |
2MR7 | PCP7 teicoplanin | Q70AZ6 | 29 |
2MR8 | PCP7 teicoplanin | Q70AZ6 | 29 |
2GDW | PCP TycC3 A/H state | O30409 | 30 |
2GDX | PCP TycC3 H state | O30409 | 30 |
2GDY | PCP TycC3 A state | O30409 | 30 |
1DV5 | PCP DltC | P55153 | 31 |
4BPH | PCP DltC | P39579 | 32 |
4HKG | PCP PksN | V5VHR7 | Unpublished |
2MY6 | PCP KstB | A0A023GUP0 | Unpublished |
4NEO | PCP BlmI | Q9XC48 | 33 |
2FQ1 | EntB | P0ADI4 | 34 |
2VSQ | SrfA-C | Q08787 | 35 |
2JGP | PCP-C TycC5-6 | O30409 | 36 |
2ROQ | PCP-TE EntF | P11454 | 37 |
3RG2 | A-ACP EntE-EntB | P10378/P0ADI4 | 38 |
4IZ6 | A-ACP EntE-EntB | P10378/P0ADI4 | 39 |
4ZXH/4ZXI | AB3403 | A0A0X1KH98 | 40 |
5ISW/5ISX | Apo/holo PCP-E (GrsA) | P0C062 | 41 |
5JA1 | EntF + YbdZ | P11454 + P18393 | 42 |
5JA2 | EntF + PA2412 | P11454 + Q9I169 | 42 |
5T3D | Holo EntF | P11454 | 40 |
5U89 | DhbF | A0A0F6BHX2 | 43 |
4DG9 | Holo PA1221 | Q9I4B7 | 44 |
4PWV | PCP + P450 skyllamycin | F2YRY5 + F2YRY7 | 45 |
5ES8 | LgrA (thiolation state) | Q70LM7 | 46 |
5ES5 | LgrA (open and closed states) | Q70LM7 | 46 |
5ES9 | LgrA (formylation state) | Q70LM7 | 46 |
5EJD | PCP-CT TqaA | F1CWE4 | 47 |
1QR0 | Sfp | P39135 | 51 |
4MRT | Sfp + PCP (TycC) | O30409 | 52 |
4D4I | ApnA A1 | G0WVH3 | 57 |
1AMU | PheA | P0C061 | 58 |
5WMM | TioS | Q333U7 | 67 |
2PST | PA2412 | Q9I169 | 69 |
1LC1 | Firefly luciferase | P08659 | 72 |
3E7W | DltA | P39581 | 73 |
3CW8 | 4-Chlorobenzoate:CoA ligase | Q8GN86 | 74 |
3FCC | DltA | Q81G39 | 76 |
3DLP | 4-Chlorobenzoate:CoA ligase | Q8GN86 | 77 |
1L5A | VibH | Q9KTV9 | 83 |
3CLA | Chloramphenicol acetyltransferase | P00484 | 84 |
4JN3 | C1 CDA synthase | Q9Z4X6 | 86 |
5DU9 | C1 CDA synthase | Q9Z4X6 | 95 |
2XHG | E-Domain TycA | G1K3P2 | 97 |
1JMK | TE surfactin | Q08787 | 104 |
2CB9 | TE fengycin | Q45563 | 105 |
2RON | TE surfactin | Q08788 | 106 |
3QMV | TE prodiginine | O54157 | 107 |
1KEZ | TE DEBS | Q03133 | 109 |
3TEJ | PCP – TE surfactin | P11454 | 110 |
3MGX | OxyD vancomycin | Q939Y1 | 113 |
3EJB | P450 + ACP biotin | P53554 + P0A6A8 | 116 |
4TX3 | X + OxyB teicoplanin | Q70AY8 + Q6ZZJ3 | 121 |
4TX2 | X-domain teicoplanin | Q70AY8 | 121 |
The amazing diversity of non-ribosomal peptides is to a large extent due to the significant diversity of molecules recognised by A-domains. In 1997 the structure of the phenylalanine-activating A-domain PheA from gramicidin synthetase 1 was solved,58 which led to the identification of a specificity code allowing the prediction of the monomer an A-domain would accept as a substrate.59 The refinement of this code and its utility led to the emergence of valuable web-based programs capable of predicting the substrate that an A-domain will accept (see websites NRPSpredictor2 (ref. 60) or NRPS/PKS substrate predictor61). Additionally, the interactions between A-domains and other NRPS domains, such as condensation (C) domains, have recently been shown to alter the selectivity of certain A-domains – this not only indicates the importance of protein–protein interactions in NRPS machineries but also implies that NRPS activity should be assessed in complete modules where possible.62,63
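The following Python sketch illustrates the principle behind specificity-code prediction: the pocket residues of a query A-domain are compared with a table of reference codes, and the substrate of the most similar reference is reported. Assumptions: codes are 10-residue strings (the length of the widely used Stachelhaus code), similarity is a simple Hamming distance, and the entries in `REFERENCE_CODES` are placeholders rather than curated data. Tools such as NRPSpredictor2 use considerably more sophisticated models than this nearest-neighbour lookup.

```python
from typing import Dict, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("specificity codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def predict_substrate(query: str, reference: Dict[str, str]) -> Tuple[str, float]:
    """Return (substrate, fractional identity) of the closest reference code."""
    best = min(reference, key=lambda code: hamming(query, code))
    identity = 1.0 - hamming(query, best) / len(query)
    return reference[best], identity


# Placeholder 10-residue codes with hypothetical labels; NOT a curated code table
REFERENCE_CODES = {
    "DAWTIAAICK": "substrate_A",
    "DVWHISLIDK": "substrate_B",
    "DLTKVGHIGK": "substrate_C",
}

print(predict_substrate("DAWTIAAIGK", REFERENCE_CODES))  # closest match: substrate_A
```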
In addition to the great diversity of possible substrates, some A-domains, referred to as "interrupted A-domains", harbour insertions of modification domains (called auxiliary domains) that alter the selected substrate. Auxiliary domains typically exhibit methyltransferase, ketoreductase, oxidase or monooxygenase activities.† Significant research has been performed on auxiliary domains, and it has been shown that a bi-functional A-domain can be generated by inserting the sequence of an auxiliary domain into a standard A-domain.65 Similarly, an uninterrupted A-domain can be generated from an interrupted one by deleting the sequence coding for the auxiliary domain.65,66 The potential for engineering specific gain or loss of function within A-domains through auxiliary domains is of great value for reengineering these domains, helping pave the way to novel compounds through simple modifications to NRPS machineries. The recent structure of TioS, a natural A-domain interrupted by a methyltransferase auxiliary domain, provides the first structural example of such an architecture. In this structure, the adenylation catalytic site is located 60 Å away from the methylation site, which again highlights the crucial role of PCP-bound substrate shuttling in NRPS systems.67
A-domains are often found to form an essential complex with small proteins (ca. 70 residues) named MbtH-like proteins (MLPs) after the first such protein, MbtH, identified in M. tuberculosis. MLPs have the consensus sequence NXEXQXSXWPX₅PXGWX₁₃LX₇WTDXRP (ref. 68) and share a conserved, relatively flat fold consisting of a central three-stranded β-sheet covered by a single α-helix; some MLPs also possess an extra C-terminal α-helix, as seen in the first crystal structure of PA2412.69 The role of MLPs is still not fully understood, as they have proven essential for the activity or solubility of some A-domains69,70 but entirely dispensable for others.71
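Because the MLP consensus uses the usual notation (X = any residue, Xn = n arbitrary residues), it translates directly into a regular expression. The Python sketch below scans a protein sequence for exact matches; `find_mlp_motif` is a hypothetical helper, and since the consensus is an idealisation, natural MLPs that deviate at a conserved position would be missed by this strict pattern.

```python
import re
from typing import List, Tuple

# NXEXQXSXWPX5PXGWX13LX7WTDXRP as a regex: "X" -> ".", "Xn" -> ".{n}"
MLP_CONSENSUS = re.compile(r"N.E.Q.S.WP.{5}P.GW.{13}L.{7}WTD.RP")


def find_mlp_motif(seq: str) -> List[Tuple[int, int]]:
    """Return 0-based (start, end) spans of exact matches to the MLP consensus."""
    return [(m.start(), m.end()) for m in MLP_CONSENSUS.finditer(seq.upper())]


# Synthetic 46-residue sequence built to satisfy the consensus (not a real MLP)
core = "NAEAQASAWP" + "A" * 5 + "PAGW" + "A" * 13 + "L" + "A" * 7 + "WTDARP"
print(find_mlp_motif("MSG" + core + "KK"))  # [(3, 49)]
```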
The general binding mode in A/PCP complexes involves PCP helix 2, which interacts in a parallel fashion with helix 11 of the Acore via hydrophobic as well as ionic interactions (as seen for the aryl carrier protein (ArCP)/A-domain pair EntB/EntE (4IZ6, 3RG2),38,39 the PA1221 A-PCP di-domain (4DG9)44 and the EntF A-PCP interaction (5JA1 and 5T3D)40). In addition, "loop 1" between helices 1 and 2 of the PCP forms a network of charged interactions with the last structural motif (loop + strand) of the Asub domain. Mutation analyses confirmed the importance of these regions for the activity of EntB/EntE and of the assembly lines involved in the formation of pyoluteorin/prodiginine: NRPS activity is reduced when point mutations are introduced in the loop 1 motif.40,79 Whilst the PA1221 PCP forms many interactions with its cognate A-domain (Fig. 4A), far fewer interactions are observed for LgrA in the thiolation state. Specifically, Gln734 from PCP helix 2 forms a hydrogen bond with A-domain Gln447, and Tyr748 from PCP helix 3 engages in hydrophobic interactions with Tyr421 (Fig. 4E). Analysis of the structure of LgrA in the formylation state also reveals that the PCP uses the same Tyr748 from helix 3 to bind a small hydrophobic region on the F-domain composed of Leu127 and Met178 (Fig. 4C). In addition, the structure shows that the Asub participates in positioning the PCP for peptide formylation by creating an electrostatic surface composed of Asn648 and Asp652 that interacts with PCP helix 4 residues Arg758 and Arg762 (Fig. 4C). Notably, in contrast to the interactions described with all other domains, PCP helix 2 is not involved in any direct binding with the F-domain. Whilst many PCP-A structures show a conserved domain conformation, a recent structure of DhbF with a PCP-A di-domain arranged in the thiolation state varies substantially from the other available structures:43 in this structure, both the PCP and Asub domains have moved considerably away from their canonical locations.
Indeed, in this structure the Asub domain displays an “open” conformation and the PCP has rotated ∼86 degrees around the PPant arm attachment site. Whether this conformation is biologically relevant or imposed by the crystal packing remains to be assessed.
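A rotation such as the ∼86° reorientation of the DhbF PCP can be quantified from the rotation matrix R that superposes the PCP of one structure onto the other, once both structures have been aligned on a common reference domain: for a proper rotation, trace(R) = 1 + 2cos θ. The minimal Python sketch below extracts θ from R; the helper names are illustrative, and this is not necessarily how the published value was derived. Because the angle depends on R alone, it does not depend on where the rotation axis passes (e.g. through the PPant attachment site).

```python
import numpy as np


def rotation_angle_deg(R: np.ndarray) -> float:
    """Rotation angle (degrees) of a 3x3 proper rotation matrix, from trace(R) = 1 + 2cos(theta)."""
    cos_theta = (np.trace(R) - 1.0) / 2.0
    # Guard against rounding pushing the value just outside [-1, 1]
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


def rot_z(theta_deg: float) -> np.ndarray:
    """Hypothetical test helper: rotation by theta_deg about the z-axis."""
    t = np.radians(theta_deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t), np.cos(t), 0.0],
                     [0.0, 0.0, 1.0]])


print(round(rotation_angle_deg(rot_z(86.0)), 3))  # 86.0
```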
Interactions between A-domains and their cognate PCPs also involve the linker connecting them. It has been demonstrated that over 70% of linkers between A-domains and PCPs contain a conserved motif with an LPxP consensus, located after the essential conserved catalytic lysine of the Asub. Mutation of the leucine residue (L958D in EntF) severely hindered the production of enterobactin (a 1000-fold reduction).80 Analysis of this linker in crystal structures of A-PCP di-domains reveals that the leucine residue docks in a conserved hydrophobic pocket formed by residues from the β-sheet of the C-subdomain, and is thus important for positioning the Asub and the PCP in a conformation competent for the adenylation reaction.
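Locating the LPxP motif downstream of the catalytic lysine is a simple pattern search. In the Python sketch below, `find_lpxp` (a hypothetical helper) reports motif positions within a window after a user-supplied lysine index; the 40-residue window and the example sequence are assumptions chosen for illustration, not values from the studies cited above.

```python
import re
from typing import List


def find_lpxp(seq: str, lys_index: int, window: int = 40) -> List[int]:
    """Return 0-based start positions of LPxP motifs within `window` residues after lys_index."""
    if seq[lys_index] != "K":
        raise ValueError("expected the catalytic lysine at lys_index")
    start = lys_index + 1
    region = seq[start:start + window]
    # Lookahead so that overlapping motifs would also be reported
    return [start + m.start() for m in re.finditer(r"(?=LP.P)", region)]


# Hypothetical linker fragment: lysine at index 3, LPEP motif starting at index 9
print(find_lpxp("GGGK" + "AAAAA" + "LPEP" + "AAA", 3))  # [9]
```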
Located around 15 Å from the surface at either end of the tunnel, the catalytic site is ideally placed relative to both the donor and acceptor PCP binding sites. No structure of a C-domain bound to both donor and acceptor substrates has been obtained so far; however, a model of this tri-domain structure is reviewed in ref. 85. Given the pseudo-dimeric nature of the C-domain and the low number of interactions between its two halves (floor loop and latch), C-domains have been reported to be rather flexible and can be found in different conformations, ranging from more "open" to more "closed" states.86 The relevance of these conformations is not fully understood, although interactions with PCP-bound substrates could be expected to provoke changes in C-domain state. In this way, controlling the specific order of PCP binding would help to maintain efficient NRPS synthesis, with the directionality of NRPS synthesis maintained through the asymmetry of the condensation reaction. Such ordered substrate binding would also provide a hypothesis to explain the increased hydrolysis of peptides sometimes observed from NRPS assembly lines immediately preceding engineered A-domains:87 in these systems, modification of A-domain specificity can reduce the rate of A-domain activity, so that the acceptor aminoacyl-PCP is generated slowly and water can compete effectively with it for attack on the thioester of the donor peptide substrate.
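The kinetic argument above can be made explicit with a toy model, under the assumption that the donor peptidyl-PCP thioester is consumed by two competing first-order processes: hydrolysis (rate constant k_hyd) and condensation, whose effective rate k_acc is limited by how quickly the A-domain generates the acceptor aminoacyl-PCP. The fraction of donor lost to hydrolysis is then k_hyd/(k_hyd + k_acc). The function name and rate values are hypothetical.

```python
def fraction_hydrolysed(k_hyd: float, k_acc: float) -> float:
    """Fraction of donor thioester lost to hydrolysis in a toy two-pathway model.

    Assumes two competing first-order processes: hydrolysis (k_hyd) versus arrival
    of the acceptor aminoacyl-PCP followed by fast condensation (effective rate k_acc).
    """
    if k_hyd < 0 or k_acc < 0 or k_hyd + k_acc == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return k_hyd / (k_hyd + k_acc)


# Hypothetical rates (s^-1): a 100-fold slower acceptor supply after A-domain engineering
print(fraction_hydrolysed(0.01, 1.0))   # ~0.0099
print(fraction_hydrolysed(0.01, 0.01))  # 0.5
```

With k_hyd fixed, a 100-fold drop in k_acc raises the hydrolysed fraction from about 1% to 50% in this model, illustrating how a slowed engineered A-domain could shift flux towards premature peptide release.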
C-domains can also be seen as crucial gatekeepers, ensuring not only the stereochemistry of the donor peptide (through the presumed dynamic competition with a neighbouring E-domain for the peptidyl-PCP substrate) but also the correct modification state of the aminoacyl-PCP acceptor substrate, by allowing sufficient time for the PCP-bound amino acid to interact with the essential modifying domains, either in cis (such as methyltransferase domains)67 or in trans (such as hydroxylase or halogenase enzymes).88,89 Given the important roles that C-domains play within NRPS catalysis (Fig. 1C), many important insights clearly remain to be gained from structural and biochemical investigation of these domains and their PCP-bound complexes.
Analysis of the structures of C-A-PCP-TE termination modules revealed that condensation domains share an extensive interaction surface with neighbouring adenylation domains (total of ∼1100 Å²).35,40 It has been hypothesised that these two domains could act as a catalytic platform, possibly arranged in a helical fashion.35,90 However, the relevance of this interaction has been challenged by the structure of the EntF terminal module, in which the A/C interface is much less extensive than previously observed (∼780 Å²). The reason behind this discrepancy is that in the first two structures the A-domains are in the "open" state, where Asub is packed against the C-domain; this interaction is not present in the EntF structure, as the A-domain is in the "closed" state, with Asub folded over the Acore and hence not interacting directly with the C-domain. Additionally, whilst the A- and C-domains from the same module interact, this cannot necessarily be extended to catalytic domains from different modules, since no direct interaction was seen in a crystal structure of a cross-module NRPS (albeit the only example known to date).43 With only four structures available to date providing insights into A/C interactions, it is difficult to provide a definitive picture of the interaction network between these two important domains, and more structural and biochemical data are clearly needed.
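Interface areas such as those quoted above are usually derived from the solvent-accessible surface areas (SASAs) of the isolated domains and of the complex. The Python sketch below shows the arithmetic under the common convention that the total buried area is SASA(A) + SASA(B) − SASA(AB), with the per-side interface area being half of this; the review does not state which convention underlies its values, and the numbers in the example are hypothetical.

```python
def buried_surface_area(sasa_a: float, sasa_b: float, sasa_complex: float) -> float:
    """Total surface (Å^2) buried on complex formation: SASA(A) + SASA(B) - SASA(AB)."""
    return sasa_a + sasa_b - sasa_complex


def interface_area(sasa_a: float, sasa_b: float, sasa_complex: float) -> float:
    """Per-side interface area (Å^2): half of the total buried surface."""
    return buried_surface_area(sasa_a, sasa_b, sasa_complex) / 2.0


# Hypothetical SASA values (Å^2) for an isolated C-domain, an isolated A-domain and their complex
print(buried_surface_area(20000.0, 25000.0, 43900.0))  # 1100.0
print(interface_area(20000.0, 25000.0, 43900.0))       # 550.0
```

Because the two conventions differ by a factor of two, interface areas from different studies are only comparable once the convention is known.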
C-domains are essential for the process of peptide chain elongation. As described above, both donor and acceptor PCPs bind at dedicated sides of the catalytic tunnel and present the peptides to the catalytic site (Fig. 1C). There, the catalytic histidine of the conserved HHxxxDG motif (H126 in VibH) has been postulated to act as a general base, enhancing the nucleophilicity of the amino group of the acceptor aminoacyl-PCP and allowing its nucleophilic attack on the carbonyl of the donor thioester. This results in extension of the peptide chain by one residue, with the chain transferred from the upstream PCP-domain upon peptide bond formation (a mechanism conserved in both CAT and dihydrolipoamide acetyltransferase (E2p)).91 However, this mechanism has been questioned in the light of mutational analysis of several C-domains. Although in the CAT system the equivalent mutant (H195A) is six orders of magnitude less active than the wild type,92 the H126A VibH mutant shows only a minor decrease in catalytic activity (less than two-fold).83 In a study by Bergendahl et al.,93 mutation of the second histidine residue (H146A) in the C-domain of the NRPS TycB was shown to render the enzyme insoluble, suggesting an important structural role. In the same study, mutation of the aspartate residue of the catalytic motif (D151N) was also reported to yield an inactive enzyme, which has been verified for the equivalent mutations in the NRPS C-domains of VibH (D130A) and EntF (D142A).94 There is evidence that the histidine residue can interact directly with the amino group of the acceptor substrate (from the structure of a C-domain from CDA biosynthesis (H157) engineered to covalently bind a mimic of the acceptor substrate),95 which could indicate a role for this residue in positioning the amino group of the acceptor for attack on the donor thioester.
Thus, despite being highly conserved, the HHxxxDG motif now appears to play varied roles in different C-domains and thus the specifics of the peptide bond reaction catalysed within C-domains could well vary depending on the specific domain involved.
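The HHxxxDG motif can be located in a C-domain sequence with a simple pattern search. In the Python sketch below, `find_hhxxxdg` (a hypothetical helper) returns the 1-based sequence position of the first histidine of each match; these positions refer to the input sequence and need not coincide with the residue numbering of a given construct or PDB entry (e.g. H126 in VibH).

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping matches would also be reported
HHXXXDG = re.compile(r"(?=(HH...DG))")


def find_hhxxxdg(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based position of the first His, matched motif) for each HHxxxDG hit."""
    return [(m.start() + 1, m.group(1)) for m in HHXXXDG.finditer(seq.upper())]


# Hypothetical sequence fragment
print(find_hhxxxdg("MAHHLIADGKK"))  # [(3, 'HHLIADG')]
```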
Delivery of substrates to the catalytic site of the C-domain requires the correct docking of both donor and acceptor PCPs at the surface of the condensation domain. Although no structure of a donor PCP has been solved in complex with a standard elongation C-domain in a productive conformation, the structure of the fungal TqaA PCP-CT complex supports the original hypothesis that the binding location and binding mode should resemble those of a PCP bound to an epimerisation domain, for which structural data are also available (see below).41 Structures of the acceptor PCP-domain bound to the C-domain have, however, been determined: the structures of the terminal modules from both SrfA-C and AB3403 are in the condensation state, with the acceptor PCP bound to the C-domain (Fig. 5B and C).35,40 When these structures are compared, the C-domains superimpose relatively well (RMSD ∼4 Å; calculated using the Matchalign routine in PyMOL, 368 of 443 Cα atoms aligned). However, the PCPs are rotated around the PPant attachment site by more than 30° relative to each other (Fig. 5B). In the AB3403 structure, most of the interactions originate from PCP helix 2 (which carries the PPant attachment site, Ser1006 in this case) as well as the preceding and subsequent loops. In particular, Leu1007 and Val1010 (N-terminal portion of PCP helix 2) engage in hydrophobic interactions with Leu22 and Ile80 of the C-domain. Additionally, residues Val1026, Ala1027 and Ala1030 (beginning of PCP helix 3) form hydrophobic interactions with Tyr26 and Leu30 of the C-domain (Fig. 5E). Hydrophilic interactions are limited: the side chain of PCP Lys1011 interacts with the main-chain carbonyl of C-domain Gln78, and C-domain Arg344 interacts with the phosphate of the PPant moiety.
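Residue-level contacts like those listed above are typically identified with a distance cutoff between atoms of the two partners. The Python sketch below is a minimal illustration, assuming plain coordinate arrays and a 4.0 Å cutoff (a common but arbitrary choice): it returns all atom-index pairs across a PCP/C-domain interface that fall within the cutoff. Mapping atom indices back to residue names such as Leu1007 or Ile80 would require the structure's atom records and is not shown; `contact_pairs` is a hypothetical helper.

```python
from typing import List, Tuple

import numpy as np


def contact_pairs(coords_a, coords_b, cutoff: float = 4.0) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of atoms in A and B closer than `cutoff` (Å)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    # All-against-all distance matrix of shape (len(a), len(b)) via broadcasting
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(d < cutoff))]


# Hypothetical coordinates: two "PCP" atoms and two "C-domain" atoms
pcp = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
cdom = [[3.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
print(contact_pairs(pcp, cdom))  # [(0, 0)]
```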
In the SrfA-C structure, it is noticeable that PCP helix 2 runs parallel to C-domain helix 1, making possible a number of hydrophobic interactions. Most of the interactions again involve PCP helix 2 and neighbouring loops in the same manner as seen for the AB3403 structure (Fig. 5A). Specific PCP residues include Met1007 and Phe1027 that form hydrophobic interactions with C-domain helix 1 Phe24/Leu28 and helix 10 Tyr337; the importance of these interactions has been probed for the EntB system, which showed the corresponding residues were essential for productive PCP-C interactions.96
Fig. 5 Interactions between condensation/epimerisation domains and PCPs. (A) Linear localisation of PCP residues involved in interactions with either C-domains or E-domains. (B) and (C) Crystal structures of complexes formed between PCPs and C-domains, showing the flexible positioning of the PCP (PCPs rotated by 30° between the structures 2VSQ and 4ZXH). (D) Crystal structure of a PCP-domain in complex with an epimerisation domain (5ISX). (E) Close-up of the interactions between the AB3403 PCP and C-domains (coloured as in panel (C)). (F) Crystal structure of a PCP bound to the donor site of a CT domain. (G) Close-up of the interactions identified in the complex of the TqaA PCP (cartoon representation, residues shown as sticks) and CT domains (protein surface representation, residues shown as sticks), coloured as in panel (F).
Epimerisation (E)-domains are non-canonical V-shaped domains that are structurally highly reminiscent of C-domains (Fig. 5D) despite rather low sequence identity (<20%).83,97,98 E-domains play a vital role in modifying the stereochemistry of amino acids incorporated into the growing peptide chain (i.e. converting the C-terminal residue of the PCP-bound peptide from the L- to the D-configuration). The first E-domain structure was of an isolated domain excised from TycA (the first module of tyrocidine synthetase). In terms of overall structure, E- and C-domains initially appear very similar; however, two E-domain-specific features have been implicated as playing important roles in their catalytic function. The first is found in the so-called "floor loop", which is extended by at least five residues in E-domains and is postulated to be involved in interactions with the neighbouring PCP-domain. The second important difference is located within the bridge region at the top of the V-shaped structure, which corresponds to the C-domain binding site for the acceptor PCP. In TycA, this region is blocked by an insertion of eleven residues, which obstructs access to the catalytic site from this side of the catalytic tunnel.83,97
The structure of the gramicidin synthetase GrsA PCP-E di-domain41 shows the interaction network required to form a functional complex, and can also be seen as a mimic of a donor PCP bound to a C-domain. One of the most noticeable features of the PCP-E interaction is that the linker between the two domains appears to play a prominent role in recognition and binding (Fig. 5D). Indeed, in contrast to the usually flexible linkers between C-domains and PCPs, the linker in this case forms extensive ordered interactions along the surface of the E-domain that are mainly charged or polar in nature. Notably, the residue pairs Arg613/Asp788 and Arg614/Glu785 act as anchor-like electrostatic “hooks” that are important for localising the linker on the E-domain and for correctly positioning the PCP relative to the E-domain active site tunnel.41 To confirm the significance of this linker interaction, mutational analyses were carried out, revealing that an E785R/D788R double mutation was sufficient to disrupt the interaction network between the linker and the E-domain, resulting in 20% bypass of the epimerisation reaction. This result emphasises the importance of the linker in PCP-E domain interactions. The structure also resolved the direct interactions between the PCP and the E-domain, which are largely formed by residues from PCP helices 2 and 3. Four hydrogen bonds stabilise the interface between the domains (listed as PCP/E-domain residues): Gln578 (helix 2)/Asp983, Asp572/Gln979 and Gln587/Glu785; finally, PCP Thr592 (helix 3) forms a hydrogen bond with Glu898 from the extended floor loop of the E-domain.
It has been suggested that this floor loop participates in correctly positioning PCP helix 2 (and hence the PPant moiety) for catalysis. The recent structure of the unusual CT domain from TqaA, a C-domain-like domain from a fungal NRPS that is involved in macrolactamisation and release of the final product, also shows such positioning of the PCP in the donor site of the catalytic channel (Fig. 5F).47 Although the first structure of a PCP-C di-domain was obtained earlier for a part of the tyrocidine synthetase Tyc6,36 TqaA represents the only structure to date of a donor PCP bound to a C-domain in a catalytically competent state. Analysis of the interface between the donor PCP and the C-domain revealed multiple interactions: PCP Arg3571 and CT Asp3906 form the only salt bridge; PCP Phe3554 and Phe3555 engage in hydrophobic interactions with Gly3868 and Ile3869 of the CT domain; and PCP Ile3561 docks into a hydrophobic pocket formed by CT residues Val3772 and Ile3981 (Fig. 5G). Such structural comparisons show that the similar structures of C- and E-domains are matched by comparable donor PCP-bound states, although the importance of the linker region in E-domains appears to be a crucial difference from C-domains. The example of TqaA also shows that condensation reactions can lead to peptide chain release from an NRPS, although this function is usually performed by a separate thioesterase domain (or, less commonly, via reductive cleavage).
TE domains are relatively small (∼250 residues, ∼30 kDa) and belong to the α/β-hydrolase superfamily, which includes a number of lipases and acetylcholinesterases, with a catalytic triad typically composed of serine, aspartic acid and histidine residues. TE domains catalyse release of the peptide from the bound PCP through a two-step reaction. First, the TE domain mediates transfer of the peptidyl group from the donor PCP onto the activated serine residue in the TE active site, forming an O-acyl-enzyme intermediate (the only non-PCP-bound intermediate after A-domain activation of amino acid residues). In the second step, nucleophilic attack on the enzyme-tethered ester can follow one of several different, and typically highly specific, routes. One common example is hydrolysis, which occurs when the nucleophile is a water molecule and leads to release of the linear peptide from the NRPS. Another very important example is macrocyclisation, which occurs when the nucleophile is a functional group from within the linear peptide (i.e. the N-terminal amino group or a nucleophilic side chain) and leads to cyclisation of the peptide with concomitant release from the NRPS. For comprehensive reviews of PKS and NRPS release mechanisms, see Du and Lou 2010 (ref. 102) and Horsman et al. 2016.103
Structural approaches have provided insights into both classes of TE domains: type I TE domains, which are integrated at the C-terminus of the NRPS assembly line, and type II thioesterases, which act in trans. The terminal TE domain of the surfactin assembly line protein SrfA-C was excised and structurally characterised as an exemplar of the type I TE-domain fold.104 It exhibits the conserved superfamily fold of a seven-stranded central β-sheet surrounded by eight α-helices. Of particular interest are the three α-helices known as the “lid”, which cover the active site composed of the conserved serine (Ser80 in SrfA-C) within the GxSxG signature motif, together with His207 and Asp107. The SrfA-C TE domain crystallised with two monomers in the unit cell, with the lid regions of the two monomers adopting different conformations, referred to as the “open” and “closed” forms. The overall structure of the SrfA-C TE domain resembles a bowl, with a groove under the lid to accommodate the large final peptide substrate.104 This architecture was further confirmed by the structure of the excised fengycin NRPS TE domain (FenTE).105 Aside from a different lid conformation, SrfA-C TE and FenTE are closely related, with an RMSD of around 1.1 Å when the domain cores (excluding the lid regions) are compared. Structural data for type II thioesterases were obtained several years later from the external thioesterase of the surfactin synthetase.106 The core domain of SrfTEII (surfactin thioesterase type II) superimposes well onto the core structure of type I TE domains, albeit with some important differences. Compared with SrfA-C TE, SrfTEII possesses an extra helix between the active site residues Asp189 and His216, and its lid region is repositioned. As a result, the catalytic triad is only partially covered by a short loop, making the catalytic site much more accessible than in the type I TE structures.
In addition, the catalytic pocket of SrfTEII is smaller than that of SrfA-C TE, consistent with a role in hydrolysing small groups from the PPant arm rather than large peptides. This also ensures that type II thioesterases do not cleave off the growing peptide chain during “normal” NRPS operation, which would be highly deleterious to assembly-line efficiency.
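The core-only comparison of SrfA-C TE and FenTE mentioned above amounts to optimally superposing matched atoms (typically Cα atoms) from the two domain cores and measuring the residual deviation. A minimal sketch of that calculation using the Kabsch algorithm follows. It assumes NumPy is available, that residues have already been paired by a sequence or structural alignment, and that lid residues are flagged with a boolean mask; the helper name `core_rmsd` and the random toy coordinates are hypothetical and not taken from the cited studies.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between paired N x 3 coordinate sets (same units as input, e.g. Å)
    after optimal rigid-body superposition (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must be paired N x 3 arrays")
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and measure the residual deviation.
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def core_rmsd(P, Q, is_lid):
    """Hypothetical helper: RMSD over core atoms only, excluding lid residues."""
    core = ~np.asarray(is_lid, dtype=bool)
    return kabsch_rmsd(np.asarray(P)[core], np.asarray(Q)[core])


# Toy example: Q is a rotated and translated copy of P, except that the last
# two atoms (flagged as "lid") have moved, so only the core superposes exactly.
rng = np.random.default_rng(0)
P = rng.normal(scale=10.0, size=(20, 3))
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -3.0, 2.0])
Q[-2:] += 8.0  # simulate a different lid conformation
is_lid = np.zeros(20, dtype=bool)
is_lid[-2:] = True
print(round(core_rmsd(P, Q, is_lid), 6))  # core only: essentially zero
print(kabsch_rmsd(P, Q))                  # whole domain, lid included
```

Excluding the lid in this way reflects the reasoning behind a core-only comparison: a lid that adopts a different conformation inflates the whole-domain RMSD even when the cores are essentially identical.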
The crystal structure of another type II TE, RedJ, confirmed that it shares the type I TE fold and highlighted the importance of both the catalytic pocket and the lid in maintaining a high degree of substrate selectivity.107 Although TE domains share the common overall fold described above, those from the NRPS machineries of the glycopeptide antibiotics and the related GPA-like peptide complestatin possess a longer than usual N-terminal linker. Enzymatic assays carried out on the teicoplanin biosynthesis machinery have recently shown that this linker is important for TE domain activity.108 Of particular interest, secondary structure predictions indicate that the linker is mostly α-helical, which suggests that it could play a structural role. This is supported by the crystal structure of the macrocycle-forming TE domain from the synthase of the clinically relevant antibiotic erythromycin, which displays an extended N-terminal linker folded as two additional helices covering the lid region.109 Further structural studies of this unusual linker are needed to provide new insights into the activity and selectivity of terminal NRPS thioesterase domains in GPA systems.
One of the critical steps in NRPS assembly-line function is the recognition of the donor PCP by the TE domain. Structural information has been provided by NMR studies of the PCP-TE di-domain of apo EntF37 (a type I TE with its PCP) and of the type II surfactin thioesterase with its cognate PCP.106 The EntF structure reveals that the PCP lies in a small cradle formed by the lid region (residues 226–266, helices 4 and 5) and the core of the EntF TE (Fig. 6B); this lid region covers the catalytic sites of both the PCP and the TE domain. The PCP and TE domains interact mainly through a network of hydrophobic interactions that buries a surface of ∼1300 Å². As in other PCP complexes, the interactions predominantly involve PCP helix 2 and the loop between helices 1 and 2 (residues 41 to 55) (Fig. 6A). Within this region, the PCP interacts with both the core of the TE domain and the tip of the lid. It was proposed that Phe41 is structurally important for maintaining the four-helix bundle fold of the PCP through hydrophobic interactions, whereas Phe42 directly interacts with the first β-strand of the TE core. Specifically, PCP residues Phe41, Phe42 and Met72 act as a hydrophobic clamp on TE Trp121 (Fig. 6C). These two Phe residues are critical, as mutation of either results in loss of the interaction between the domains.37 In addition, PCP residues Leu49 and His47 dock into a pocket on the surface of the TE domain (Fig. 6C). NOE couplings indicate additional interacting residues from the TE lid (Leu240, Ala241, Ala242 and Gln244) and the TE core (Phe119, Gln122, Leu100 and Leu102). Further PCP residues involved in the interaction include Gly46 in the helix 1/2 loop, Leu50 in helix 2, and the more distant residue Val73. Despite this extensive interaction network, the PCP-TE di-domain structure shows large movements around the contact region, indicative of a continuous breathing/opening motion of the lid.
The authors of this study emphasise that this motion is essential to provide the conformational plasticity the TE needs to accommodate the PPant moiety of the PCP and to allow the PPant to traverse the catalytic cradle.
Although class I and class II TE enzymes catalyse slightly different reactions, NMR interaction studies of SrfTEII with the TycC3 PCP indicate a very similar mode of interaction between PCPs and the two classes of thioesterase.106 The interface of the TycC3 PCP is mainly formed by PCP helix 2, with some additional residues from PCP helix 1 and the PCP C-terminal region. As reported for type I thioesterases, the lid region of SrfTEII plays a major role in recognising PCP helix 2, mostly in the region of the catalytic Ser45 residue. The additional crystal structure of a class I TE (ClbQ) in complex with a donor PCP confirmed the role of the flexible lid region in substrate binding and specificity.110
In both these systems, P450 enzymes have been identified as the source of the β-hydroxyl groups found within the final peptide structures of these compounds. In the case of GPA biosynthesis, machineries encoding a homologue of the P450 OxyD have been shown to incorporate β-hydroxytyrosine (Bht) residues directly into the NRPS peptide113,114 (a further subgroup uses a non-heme iron oxygenase that is believed to act directly on NRPS-bound amino acids during peptide synthesis, although this has yet to be investigated in detail).115 The production of Bht by OxyD also relies on two further proteins: a minimal NRPS module comprising A- and PCP-domains (balhimycin homologue BpsD), and a separate thioesterase (balhimycin homologue Bhp). Mechanistically, OxyD acts on amino acids bound to the PCP domain of BpsD, which, following hydroxylation, are cleaved off by the thioesterase for subsequent incorporation by the heptapeptide-producing NRPS. The structure of OxyD reveals a well-ordered and highly exposed active site, which is unusual for a substrate-free P450 enzyme.113
Subsequent analysis of the residues responsible for the open and rigid conformation of the P450 active site revealed that they are highly conserved amongst P450s responsible for the β-hydroxylation of PCP-bound amino acids, suggesting that these P450s recognise and specifically bind to the carrier protein portion of the substrate.88,113 Structural data for a P450/PCP complex were obtained in 2014 with the co-crystal structure of a PCP (PCP7) and a P450 from the skyllamycin NRPS machinery.45 In this structure, the PCP adopts the classical four-helix bundle, with helices 2 and 3 arranged as an X-shaped cradle that accommodates residues from helix G of the P450 (Fig. 7B); the majority of the interactions are found within these regions of the two proteins (Fig. 7A). Specifically, a large hydrophobic cavity formed by PCP residues Phe35, Phe36, Ala45, Leu62, Phe65 and Phe66 accommodates Trp193 and Leu194 from helix G of the P450 (Fig. 7C). From the P450, residues Ala90, Met94, Leu200 and Leu239 interact with PCP Leu43 (the +1 residue from the catalytic Ser42) as well as with the two methyl groups of the PPant moiety. Notably, apart from Ala45, none of these PCP residues belong to helix 2; most belong to helix 3 and the loop between helices 1 and 2. However, PCP helix 2 still plays an important role in the interaction with the P450, mainly by contributing residues involved in hydrogen bonding. Indeed, Thr46 and Lys47 from PCP helix 2 interact with Glu235 and Asn197 from the P450 G-helix, whilst PCP helix 3 residues Leu62 and Arg63 interact with Asp191 and Glu198 from the P450 G-helix, respectively. In addition to these protein–protein interactions, a network of hydrogen bonds within the P450 stabilises the PPant arm of the PCP.
The precise nature of these interactions is likely perturbed by the PCP cargo present in this structure: a small-molecule inhibitor mimicking an amino acid, which was necessary to increase the affinity of the P450/PCP complex for structural analysis.45 At this stage of our analysis of PCP interactions with other domains, a comparison of the TE-PCP and P450-PCP complexes (Fig. 6 and 7) suggests that PCPs interact with these two domains in a very similar way, using a group of hydrophobic residues (a phenylalanine clamp) to anchor firmly to their partner domain.
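Residue-level interface descriptions such as the phenylalanine clamp and the hydrogen-bond partners listed above are commonly compiled by finding residue pairs from the two proteins whose atoms lie within a short distance of each other in the complex structure. The sketch below illustrates that idea in plain Python. It assumes atoms are supplied as (residue label, coordinates) tuples and uses a 4.0 Å cutoff, a common but arbitrary choice; the function name, residue labels and coordinates are hypothetical, and this is not necessarily the procedure used in the cited studies.

```python
import math
from itertools import product


def interface_contacts(chain_a, chain_b, cutoff=4.0):
    """Return sorted (residue_a, residue_b) pairs with at least one atom pair
    within `cutoff` (same units as the coordinates, e.g. Å).

    chain_a, chain_b: iterables of (residue_label, (x, y, z)) tuples, one per atom.
    """
    contacts = set()
    # Compare every atom of chain A with every atom of chain B.
    for (res_a, xyz_a), (res_b, xyz_b) in product(chain_a, chain_b):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            contacts.add((res_a, res_b))
    return sorted(contacts)


# Toy coordinates (invented for illustration, not taken from any structure).
pcp_atoms = [
    ("PCP-res1", (0.0, 0.0, 0.0)),
    ("PCP-res2", (3.8, 0.0, 0.0)),
    ("PCP-res3", (20.0, 0.0, 0.0)),
]
partner_atoms = [
    ("P450-resA", (0.0, 3.5, 0.0)),
    ("P450-resB", (20.0, 3.0, 0.0)),
    ("P450-resC", (50.0, 0.0, 0.0)),
]

for pair in interface_contacts(pcp_atoms, partner_atoms):
    print(pair)
```

The all-against-all comparison scales with the product of the two atom counts; for full-sized complexes a spatial index such as a k-d tree is the usual refinement, and classifying a contact as hydrophobic, a salt bridge or a hydrogen bond requires additional chemical and geometric criteria.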
In addition to providing valuable data about the P450/PCP interface, Haslinger et al.45 also discussed the role of PCP three-dimensional structure in the selective recognition of cognate P450s. Given that PCPs are small and share a high level of sequence conservation, it is unlikely that the amino acid sequence of a PCP dictates its selectivity; however, it has been shown that subtle changes in tertiary structure can be important for PCP specificity.45 Notably, in a comparable P450/CP structure from the biotin operon of B. subtilis, in which the acyl carrier protein (ACP) is bound to a P450 (P450BioI),116 the ACP binds at a very different location on the P450. The differences in substrate (amino acid vs. fatty acid), carrier protein (amphiphilic PCP vs. acidic ACP) and reaction (hydroxylation vs. carbon–carbon bond cleavage) likely account for the different binding modes observed in these two complexes, although in both cases the structures reveal how P450 enzymes can use carrier protein binding partners to bind and oxidise their desired substrates.
A further, highly complex in trans modification of NRPS-bound substrates has been identified in GPA biosynthesis, in which P450 enzymes perform sequential oxidative cyclisation reactions to generate rigid, biologically active aglycones from the original linear heptapeptide product of the NRPS machinery.117,118 These P450 enzymes (also known as Oxy enzymes) each insert one ring into the final GPA structure: the three enzymes from vancomycin-type GPAs insert the essential C–O–D, D–O–E and AB rings (in that order, by OxyB, OxyA and OxyC, respectively), whilst the non-essential F–O–G ring of teicoplanin-type GPAs is inserted by OxyE immediately after OxyB has acted.119,120 Due to the complexity of this cyclisation cascade, a separate NRPS domain, known as the X-domain, has been implicated in recruiting these P450s to the PCP-bound heptapeptide substrate.121 This is, to date, the only example of a separate recruitment domain for trans-modifying enzymes; comparable single-step modifications (for example the aryl crosslinking observed in arylomycin biosynthesis)122 do not require such a domain. The essential role of the X-domain in GPA crosslinking was implied by a number of in vivo studies118–120 and has more recently been proven by in vitro experiments, in which X-domain-containing constructs allowed OxyE and OxyA to be characterised for the first time.108,121,123–125 Definitive evidence that the X-domain is indeed a binding platform for the Oxy enzymes came from the structure of the complex between the X-domain and OxyB from the teicoplanin NRPS assembly line.121 In this structure, as anticipated, the X-domain adopts a C/E-domain-like fold, albeit with insertions that block the tunnel usually occupied by the acceptor PCP substrate. In addition, the canonical C-domain catalytic motif is modified in the X-domain, rendering it inactive for peptide bond formation or epimerisation.
Structures of the X-domain in the presence and absence of OxyB are extremely similar.121 The same holds for OxyB, showing that complex formation does not trigger domain rearrangement and relies solely on rigid-body association. Unusually for an NRPS, the interaction is mostly driven by hydrogen bonds and salt bridges, with few hydrophobic residues involved. The novel position of the Oxy enzyme within the complex leaves space for the simultaneous binding of a PCP-bound peptide substrate. This structure, together with extensive biochemical evidence, supports the notion that the catalytically inactive X-domain acts as a platform onto which the Oxy enzymes bind in order to effect the complex process of peptide cyclisation during GPA biosynthesis.123 In this case, the subsequent TE domain also plays a role in proofreading the crosslinked state of the PCP-bound peptide, as it is only active against fully crosslinked (and thus mature) peptide aglycones.108
Thus, the ability to target PCP domains appears sufficient for enzyme recruitment in relatively straightforward trans-modification steps of either aminoacyl- or peptidyl-PCP substrates, whilst the highly complex process of GPA crosslinking requires a separate recruitment domain to prevent this step from stalling the NRPS machinery.
Footnote
† For a comprehensive review of auxiliary domains, refer to Labby et al. 2015.64 K. J. Labby, S. G. Watsula and S. Garneau-Tsodikova, Nat. Prod. Rep., 2015, 32, 641–653.
This journal is © The Royal Society of Chemistry 2018