The cycloaspeptides: uncovering a new model for methylated nonribosomal peptide biosynthesis

The cycloaspeptide gene cluster includes a pathway-specific N-methyl transferase. Its disruption allowed incorporation of N-methylated amino acids provided in the culture medium, allowing efficient production of cycloaspeptide E and novel related products.


Introduction
Cycloaspeptides are bioactive cyclic pentapeptides which were originally identied from an unclassied Aspergillus species in 1987. 1 Cycloaspeptides A-C 1-3 contain anthranilic acid (position p1), L-alanine (p2), L-phenylalanine (p3), L-leucine (p4) and L-tyrosine (p5) residues, and differ only in the Nmethylation pattern of the phenylalanine and tyrosine moieties (Chart 1). A later screen of the Aspergillus genus failed to identify any cycloaspeptide producers, but multiple cycloaspeptide-producing Penicillium species have since been identied. [2][3][4] Cycloaspeptide D 4, which contains a valine at p4, was isolated along with cycloaspeptide A 1 from Penicillium ribeum and P. algidum, in work which also identied these cycloaspeptides as being active against the malarial parasite Plasmodium falciparum. 3,4 In 2006, cycloaspeptide E 5 was isolated as a minor metabolite from several Penicillium species and one Trichothecium species, and reported to exhibit insecticidal activity against lepidoptera. 5 5 differs from 1 in having phenylalanine instead of tyrosine at p5 (Chart 1), and was proposed to have a neurotoxic mode of action. There is signicant interest in natural products with insecticidal properties due to the huge economic and social burden created by insect damage to crop species worldwide. 6 Further research into the bioactivity of cycloaspeptide E 5 has been hampered by the fact that yields are only approximately 2% of the main metabolite; cycloaspeptide A 1, meaning that purication for bioactivity screening is problematic and the natural titres are too low to be commercially viable. Most recently cycloaspeptides F 6 and G 7 from the entomopathogenic fungus Isaria farinosa, have been isolated and characterized. These compounds have cytotoxic properties against tumour cell lines. 7 In order to investigate cycloaspeptide biosynthesis, with the key aim of improving the yields of 5, we set out to discover and manipulate the cycloaspeptide biosynthetic pathway, using two publicly available cycloaspeptide producers, Penicillium soppii (CBS 869.70) and Penicillium jamesonlandense (CBS 102888). 2 Our original working hypothesis was that the cycloaspeptides would t the accepted paradigm for methylated cyclic peptide biosynthesis, and as such would consist of a traditional 5module NRPS containing the two requisite N-methyltransferase (N-MeT) domains. We reasoned that substrate promiscuity of the NRPS could account for the range of cycloaspeptides reported, whereas we considered a RiPP system unlikely due to the delity of ribosomally synthesised peptides to their encoded precursor peptide sequence. A combination of approaches including bioinformatic analyses, heterologous expression, gene deletions and feeding experiments have demonstrated that indeed, an NRPS is responsible for the biosynthesis of the cycloaspeptides. However, rather than containing N-Met domains, the NRPS requires a pathway-specic trans-acting methyltransferase to provide the necessary substrates and preferentially incorporates methylated amino acids at two positions. This discovery unlocked a new avenue for pathway engineering through synthetic biology, by allowing unprecedented control over substrate availability.
Results and discussion P. soppii and P. jamesonlandense were cultured on a range of solid media and screened by LCMS for the production of Chart 1 Structures of the cycloaspeptidesisolated from various filamentous fungi. Fig. 1 Cycloaspeptide gene clusters from P. jamesonlandense (top) and P. soppii (bottom). The N-methyltransferase (PjcyA and PscyA) and NRPS (PjcyB and PscyB) genes are adjacent to one another in both species. Additional genes identified at the same locus in P. jamesonlandense, which could potentially be involved in secondary metabolite biosynthesis, such as an amino acid transporter and transcription factor were identified on a separate scaffold in P. soppii. Arrows denote homologous genes; vertical black bars within genes denote introns. cycloaspeptides. A compound with the correct mass for cycloaspeptide A 1 could be reliably detected by LCMS, from both species and on all media aer 2 weeks. This was conrmed by purication and NMR analysis (Table S8 †). P. soppii was also grown in three different liquid media (CYB, MEB and YEB), and again 1 could be detected in all, with the highest concentrations being achieved with CYB ( Fig. S21 †). A compound with the correct mass for cycloaspeptide E 5 could be detected in certain cultures, but only in trace amounts and not reliably. Analysis using the highly sensitive UPLC-Orbitrap mass analyser system conrmed that the accurate masses matched the chemical formulae of the predicted compounds ( Fig. S22 and Table S6 †). The sensitivity of this system also allowed the detection of a compound with the correct predicted formula to be cycloaspeptide G 7, a compound which has not been detected in a Penicillium species before.
Cycloaspeptide A 1 could also be easily detected by LCMS in MEB cultures of P. jamesonlandense (Fig. S23, S26 and Table  S7 †). There were also compounds present with the correct masses to be cycloaspeptides D 4, E 5 and G 7 (Table S7 †). Unfortunately, the yields of 5 were too low to purify for NMR and structural validation.
Paired-end genomic sequence data was generated for both species and assembled to produce dra genomes (Table S2 †), which were analysed using AntiSMASH 8 to identify putative biosynthetic gene clusters (BGCs). A total of 82 and 83 BGCs were detected for P. soppii and P. jamesonlandense respectively, demonstrating the rich metabolic potential of these species (Table S3 †).
P. soppii contains 20 NRPS clusters whereas P. jamesonlandense contains 17, with 8 such clusters being common to both species. The domain architecture of each predicted NRPS was analysed using the NCBI conserved domain database (CDD) (Table S4 †). Only one was found to contain the ve modules that would be expected for pentapeptides such as the cycloaspeptides (Fig. S7 †). Unexpectedly this NRPS lacked any integral N-methyltransferase (N-MeT) domains, which appeared inconsistent with cycloaspeptide biosynthesis, as N-methylation, if present, is normally performed by such domains embedded within the NRPS. However, an adjacent gene present in both species is predicted to encode a methyltransferase, so might perform this activity in trans. Additionally, the NRPS appears to contain a nal condensation-like (CT) domain, where rather than having the highly conserved HHXXXDXXS/T motif found in condensation domains, it has the modied motif SHXXXDXXS/T (5377SHAQYDGVS5385). CT domains are known to catalyse the macrocyclisation of cyclic peptides in lamentous fungi 9a role analogous to that of the nal thiolesterase in bacterial cyclic peptide biosynthesis.
In P. jamesonlandense, adjacent to the NRPS and putative N-MeT encoding genes are various additional predicted genes that could potentially be involved in cycloaspeptide production. These encode an amino acid transporter, a Zn 2 Cys 6 transcription factor and an aminotransferase. In P. soppii however, these genes are translocated on another scaffold (sc038, Fig. 1). It is still possible that they are involved in cycloaspeptide biosynthesis, as split secondary metabolite clusters have been observed in fungi multiple times. 10 However, it is also possible that the NRPS and N-MeT are the only proteins required for the production of cycloaspeptides.
Bioinformatic analysis was conducted for the adenylation (A) domains of the NRPS. This focused on the predicted presence of an anthranilic acid adenylating domain, due to their distinct nature. 11 The online NRPS predictor tool 12 was used to generate the 10 amino acid code for each domain (Table 1), and these  were compared with those of known fungal NRPS enzymes. 11 The presence of a glycine in the rst position of the 10 amino acid code for domain 1, rather than the aspartic acid typically seen, is indicative of domain 1 being an anthranilic acid adenylating domain. This is consistent with the fact that all known fungal anthranilic acid A-domains are positioned within the rst module of the synthetase. 11 It then follows that the second module incorporates an alanine, the third a phenylalanine, the fourth a leucine, and the h, either a tyrosine (to produce 1) or a phenylalanine (to produce 5). Evidence supporting this assignment is the similarity of the third and h modules, which are predicted to incorporate Phe and Phe/Tyr respectively. In P. soppii these domains have 80% sequence similarity when comparing the 10 amino acid code, and 85% sequence similarity when comparing the 34 extracted residues. Interestingly, it was not possible to identify homologous gene clusters in publicly available fungal genomes, even when specically searching Aspergillus, Penicillium, Trichothecium or Isaria genomes for homologues of PscyA and PscyB. This suggests that although the cycloaspeptides have been reported from various fungal genera, the gene cluster is not particularly wide-spread and cycloaspeptide biosynthesis may be limited to a fairly small fungal clade, or clades.
To allow an investigation of gene function in P. soppii, a protoplast-based transformation system was developed, using either hygromycin or geneticin resistance as selectable markers. This transformation system was optimized using eGFP as a reporter gene (Fig. S9 †) and a bipartite knock-out strategy 13 was then used to disrupt specic genes. The disruption of either the NRPS (PscyB) or the N-MeT (PscyA) led to a complete loss of production of both cycloaspeptide A 1 and E 5, ( Fig. 2A-C) conrming that both genes are required for cycloaspeptide biosynthesis and that 1 and 5 are synthesized by the same pathway.
The disruption of the transcription factor (PscyD) led to a marked reduction in cycloaspeptide yields, suggesting that although it is translocated in P. soppii, it is still involved in the regulation of cycloaspeptide biosynthesis (Fig. S24 †).
To further conrm the identity of the gene cluster and conclusively demonstrate that the loss of production was not an indirect consequence of the transformation system, strain NRPS-KO1 (DPscyB) was complemented by transformation with the plasmid pTYGen-N-MeT-NRPS which contains PscyA and PscyB under the control of Padh (the alcohol dehydrogenase promoter from A. oryzae) and Ptub (the tubulin promoter from P. soppii) respectively. Selection of geneticin resistant transformants led to the restoration of cycloaspeptide production (Fig. 2D).
Heterologous production of both 1 and 5 was achieved in Aspergillus oryzae NSAR1 via the co-expression of PscyA and PscyB from a multi-gene expression vector with arginine prototrophy selection ( Fig. 2E and F). 14 This demonstrates that PscyA and PscyB are the only two structural genes required for cycloaspeptide biosynthesis. Interestingly, the ratio of cycloaspeptide A 1 to E 5 appeared to differ in A. oryzae when compared to the natural producers, with cycloaspeptide E 5 being easily detectable in initial screens of the A. oryzae transformants. An apparent decrease in tness and growth rate was observed in A. oryzae transformants producing cycloaspeptides, and production was not stable. A loss of production occurring aer subculturing prevented quantication of titres. In an attempt to push the ratio of 5 : 1 further in favour of cycloaspeptide E 5 biosynthesis, a full A domain swap was conducted. The seemingly promiscuous A domain from module 5 (Tyr/Phe) was replaced with a second copy of the A domain from module 3 (Phe). This engineered NRPS was expressed both in A. oryzae and the P. soppii DPscyB strain but the NRPS was dysfunctional, with no cycloaspeptides being detected in either strain (data not shown).
To ascertain the order of events in the cycloaspeptide biosynthetic pathway, LCMS chromatograms for the N-MeT knock-out strains were searched for any putative un-methylated intermediates. The absence of any such compounds suggested that rather than the N-methylations serving to decorate the product of the NRPS, the N-MeT may act rst, providing methylated amino acids for incorporation by the NRPS. To test this, N-MeT knock-out strain 4 (NM-KO4) was fed with 1 mM of both N-methylated tyrosine (N-mTyr 8) and N-methylated phenylalanine (N-mPhe 9). This fully restored cycloaspeptide biosynthesis (Fig. 3B), demonstrating that the NRPS accepts free methylated amino acids, a feature not previously observed in non-ribosomal peptide systems in either fungi or bacteria.
In further experiments, wild-type P. soppii and NM-KO4 cultures were fed with either a 1 : 1 mixture of N-mPhe 8 and N-mTyr 9 (1 mM each), or with N-mPhe alone (either 1 mM or 5 mM nal culture concentration) (Fig. 3). Increases in the cycloaspeptide yields when the WT strain was fed with the Nmethylated amino acids suggests that substrate availability is a limiting factor in this system (Fig. 3C). The yields of the cycloaspeptides in the fed NM-KOS cultures were most striking. In addition to producing wild-type yields of cycloaspeptide A 1, cultures fed with both N-mPhe 8 and N-mTyr 9 produced over four times more cycloaspeptide E 5 than the unfed wild-type strain (Fig. 3D). When supplemented with either 1 mM or 5 mM N-mPhe 8 cycloaspeptide E 5 titres were increased further, to approximately 14.5 and 16.5 times respectively (Fig. 3D).
Such increased yields allowed purication of cycloaspeptide E 5 for structural conrmation by NMR (Table S9 and Fig. S23 and S24 †). In addition to 5, a compound could be detected in NM-KO4 cultures fed with N-mPhe 8 that has the expected mass and UV to be cycloaspeptide B 2 (Fig. 4). 2 was identied from the original Aspergillus sp. NE-45, 1 and has a methylated phenylalanine at p3, but an unmethylated tyrosine at p5. This compound has not been reported in Penicillium species before.
A minor metabolite with the correct accurate mass to be cycloaspeptide G 7, which has methylated tyrosine at both p3 and p5, was detected in wild-type P. soppii cultures using an UPLC-Orbitrap mass analyser. To determine whether the titres of this compound could be increased, NM-KO4 cultures were fed with N-mTyr alone. This resulted in the easy detection of 7 in all fed cultures using standard LC-MS methods (Fig. 4). Feeding with N-mTyr also led to the production of a compound not detected in the wild-type cultures which has the expected mass to be cycloaspeptide C 3. 3 has un-methylated phenylalanine at p3 and N-mTyr at p5. As with 5, this compound has not been previously observed in Penicillium strains. These results indicate that the third and h modules of the NRPS have a strong preference for methylated amino acids, but also the ability to accept and incorporate un-methylated amino acids.
To further investigate the substrate selectivity of the NRPS, a range of alternative amino acids were fed to strain NM-KO4. N-Methyl-leucine 10, N-methyl-isoleucine 11, N-methyltryptophan 12, a-methyl-phenylalanine 13 and N-methyl-Dphenylalanine 14 were obtained commercially (Chart 2). Racemic p-methyl-N-mPhe 15, and a range of uorinated N-mPhe analogues (with the uorine at the ortho, meta and para positions [18][19][20] were synthesised by alkylation of the Boc-Nmethylglycine dianion with the appropriate benzyl bromides, followed by Boc deprotection with triuoroacetic acid (TFA) Scheme 1 The synthesis of amino acid analogues. (Scheme 1A). N,O-Dimethyl-L-Tyr 17 was synthesised by double deprotonation of Boc-L-tyrosine with sodium hydride, followed by dimethylation with methyl iodide (Scheme 1C). Boc deprotection was again performed with TFA. N-Ethyl-L-Phe 16 was synthesised by reductive amination of L-phenylalanine with acetaldehyde and sodium cyanoborohydride (Scheme 1D). These compounds were either fed alone (5 mM) or in combination with N-mPhe 8 (at a ratio of 5 : 1-5 mM : 1 mM). The only cultures to produce novel compounds in the initial screens were those supplemented with uorinated phenylalanine analogues. The synthesis of 4F-N-mPhe 20 was therefore scaled up (see ESI †) and an LCMS analysis identied compounds with the correct masses to be uorinated analogues of cycloaspeptide E 5, and cycloaspeptide B 2 (Fig. 5). 4Fcycloaspeptide E 22 was puried and characterized using proton, carbon and uorine NMR (Table S10 and Fig. S28-S30 †). Puried 5 and 22 were screened for anti-lepidopteran activity using the tobacco budworm (H. virescens) as an industrially relavent target organism, but no activity was observed against this species following injection with either compound (Table S12 †).
A search of the NCBI database for any potential PscyA homologues identied an N-MeT from A. avus with 39% sequence identity to PscyA at the protein level (Table S5 †). Investigating the genomic context of this gene determined that it is part of a BGC containing a 2 module NRPS and cytochrome P450, which has been identied as the ditryptophenaline 23 gene cluster by Watanabe and coworkers. 15 23 is a dimeric diketopiperazine consisting of two cyclic Trp:N-Met-Phe dipeptides.
An A. avus strain with the ditryptophenaline N-MeT disrupted (DdtpB) was kindly supplied by Prof. Kenji Watanabe of the University of Shizuoka, and feeding studies were conducted. Analogous to the cycloaspeptide system, feeding cultures with N-mPhe 8 fully restored ditryptophenaline 23 production, demonstrating the acceptance of 8 by the ditryptophenaline NPRS (dtpA). Furthermore, the production of a uorinated ditryptophenylene analogue 24 was again achieved by feeding cultures with 4F-N-mPhe 20 (Fig. 6). Interestingly, feeding with N-mTyr 9 did not result in the production of hydroxylated analogues, demonstrating that the ditryptophenaline NRPS exhibits lower promiscuity than the cycloaspeptide NRPS. The rapid discovery of a second NRPS which accepts methylated amino acids suggests that these systems represent a new fungal route to methylated peptide natural products, rather than the cycloaspeptide system being unique.

Conclusions
The gene cluster responsible for the biosynthesis of the cycloaspeptides has been identied in two psychrotolerant Penicillium species: P. soppii and P. jamesonlandense. Heterologous expression in A. oryzae has demonstrated that the minimal gene set required to produce both cycloaspeptide A 1 and cycloaspeptide E 2 is a 5-module NRPS and a new type of pathway specic N-methyltransferase (N-MeT). Gene knock-outs and feeding studies have demonstrated that two modules of the NRPS preferentially accept and incorporate N-methylated amino acids, which are provided by the pathway specic N-MeT, a system not previously seen in secondary metabolism. Selective amino acid N-methyltransferases are very rare in either primary or secondary metabolism. The only known enzyme to catalyse a single N-methylation of a proteogenic amino acid is glycine Nmethyltransferase from primary metabolism. The most similar known examples from fungal secondary metabolism are 4dimethylallyltryptophan N-methyltransferase, which is the second pathway-specic enzyme in the production of ergot alkaloids, 16 and a methyltransferase from the ergothioneine pathway which trimethylates histidine. 17 Deletion of the cycloaspeptide N-Met allowed the supply of methylated amino acids to the NRPS to be controlled, and this was exploited to direct biosynthesis towards specic minor metabolites, accomplishing the original aim of increasing cycloaspeptide E 5 yields. Feeding cultures with a uorinated phenylalanine analogue also led to the production of novel uorinated cycloaspeptides. The ditryptophenaline pathway from Aspergillus avus was quickly conrmed as being a second example of such a system, suggesting that as more NRPS gene clusters are identied and characterised, further examples will be uncovered. Again, feeding a N-Met knock-out strain allowed the production of an unnatural ditryptophenaline analogue.
The preferential acceptance of methylated amino acids by an NRPS combined with the ability to remove the natural substrate supply provides a unique biotransformation opportunity over traditional feeding of modied amino acids. Firstly, methylated amino acids fed into the system have little competition from unmethylated cytoplasmic amino acids. Secondly, the incorporation will be more efficient because synthetic methylated amino acids are not at risk of being consumed by other cellular processes such as protein production.
The ability to produce natural product analogues is valuable due to the potential for altered bioactivities or pharmacokinetic properties. N-Methylated peptides are particularly desirable as they are known to oen display improved stability over their non-methylated counterparts. Currently medicinal chemistry employs expensive and toxic reagents to synthesise N-methylated peptides. Consequently, new approaches for de novo biosynthesis of N-methylated peptides could be an attractive alternative to chemical synthesis. The ability to generate uorinated natural product analogues is particularly relevant for natural product research due to their extreme rarity in nature (less than 0.005% of identied natural products contain uorine 18 ) combined with the various benets observed with compounds containing one or more uorine atom, 19,20 demonstrated by the fact that 15-20% of pharmaceuticals now contain at least one uorine atom. 18 Such compounds could also be more amenable to semi-synthetic derivatization, which is a major route to drug development.
Using A-domains which accept methylated amino acids in domain swaps could also be used to introduce methylated amino acids, or unnatural analogues such as the uorinated amino acids into other NRP natural products of agricultural of pharmaceutical interest. Indeed, there have been some major advances in NRPS structural biology and engineering recently, [21][22][23] which means that the prospect of swapping in domains/modules to introduce N-methylated amino acids into existing NRPs could become feasible. Detailed in vitro studies of the N-MeT enzyme and the N-Me amino acid activating Adomains and modules, including crystallography, could help to elucidate where the selectivity and specicity of such systems lie, and the roles that the individual NRPS domains play in controlling the output of such systems. In the longer term, such an understanding could help guide engineering to alter the substrate specicity of these systems, enabling methylation and incorporation of a wider range of N-methylated amino acids. Also, further genome sequencing and mining could lead to the discovery of other N-MeT and NRPS that incorporate different Nmethylated amino acids in nature. Taken together, such studies could provide a set of N-Met enzymes and NRPS domains/ modules for engineering the de novo biosynthesis of N-methylated peptide 'non-natural' products through the assembly of novel chimeric NRPS.

Conflicts of interest
There are no conicts to declare.