Ling Yu Li a, Yi Ling Hu a, Jia Lin Sun a, Long Bo Yu a, Jing Shi a, Zi Ru Wang a, Zhi Kai Guo b, Bo Zhang a, Wen Jie Guo a, Ren Xiang Tan *a and Hui Ming Ge *a
a State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Chemistry and Biomedicine Innovation Center (ChemBIC), Nanjing University, Nanjing 210023, China. E-mail: rxtan@nju.edu.cn; hmge@nju.edu.cn
b Key Laboratory of Biology and Genetic Resources of Tropical Crops, Ministry of Agriculture, Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China
First published on 17th October 2022
Tetracyclines are a class of antibiotics that exhibit potent activity against a wide range of Gram-positive and Gram-negative bacteria, yet only five members have been isolated from actinobacteria, two of which are approved as clinical drugs. In this work, we developed a genome mining strategy that uses the TetR/MarR-transporter pair, a common pair of resistance elements in tetracycline biosynthesis, as a probe to find potential tetracycline gene clusters in the actinobacteria genome database. Further refinement using phylogenetic analysis of chain length factors resulted in the discovery of 25 distinct tetracycline gene clusters, which ultimately led to the isolation and characterization of a novel tetracycline, hainancycline (1). Through genetic and biochemical studies, we elucidated the biosynthetic pathway of 1, which involves a complex glycosylation process. Our work discloses nature's huge capacity to generate diverse tetracyclines and expands the chemical diversity of tetracyclines.
However, since the first discovery of chlorotetracycline over 70 years ago,9 only five natural tetracyclines have been reported, the others being oxytetracycline,10 SF2575,11 chelocardin,12 and dactylocycline13 (Scheme 1A). It is remarkable that among the five tetracyclines known to date, two (chlorotetracycline and oxytetracycline) have been developed into marketed drugs, representing an astonishing 40% success rate. Considering the rarity of tetracycline-type compounds, their fascinating mode of action, and the promise of tetracyclines as novel drug leads, we set out to search for novel members of this family by exploring nature's biosynthetic repertoire. We report here the use of a genome mining approach to target tetracycline-type natural products. From the genomic database, we discovered 25 distinct biosynthetic gene clusters (BGCs) that have the potential to produce different tetracyclines. As a proof of concept, we isolated and characterized a highly glycosylated tetracycline, hainancycline, and elucidated its biosynthetic pathway. This study indicates the huge potential of tetracycline BGCs encoded in actinobacteria.
Similarly, MarR controls the expression of a multidrug efflux pump.20 As all known tetracycline BGCs share this resistance mechanism, we reasoned that this pair of genes could be a good indicator for mining tetracycline-type natural products.
We thus collected all assembled genome annotation files from the NCBI database, which included over 20000 genomes (June 2021). Surprisingly, using tetR/marR and the adjacent transporter gene as probes, we found 351172 BGCs. These BGCs are not all related to tetracycline biosynthesis, because many strains have acquired tetracycline resistance genes owing to the prolonged and extensive use of tetracyclines worldwide.21,22 We reasoned that the minimal PKS genes, which are responsible for assembling the tetracycline carbon skeleton, should also be colocalized with the resistance genes in a potential tetracycline gene cluster (Scheme 1B). Thus, we took the conserved ketosynthase (KS) and chain length factor (CLF) into consideration. Based on this, we developed an algorithm that identifies BGCs in which the tetR/marR and transporter genes lie within a distance of 30 genes of the KS and CLF genes. This refinement resulted in 1744 BGCs, which included all known tetracycline BGCs, confirming the reliability of this approach. Given how widespread tetracycline-resistant strains and type II PKS BGCs are in actinobacteria, it is notable that this first analysis, which used the resistance genes and type II PKS genes as probes, returned only ∼1700 BGCs from over 20000 genomes in the public database, encouraging us to perform further analysis.
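The colocalization filter described above can be sketched as follows. This is only an illustrative sketch, not the authors' algorithm: the role labels (`regulator`, `transporter`, `KS`, `CLF`), the representation of a contig as (gene index, role) pairs, and the function name `is_candidate_cluster` are assumptions made for the example. Only the adjacency of the resistance pair and the 30-gene distance come from the text.

```python
from collections import defaultdict

MAX_GENE_DISTANCE = 30  # distance cutoff stated in the text


def is_candidate_cluster(genes, max_distance=MAX_GENE_DISTANCE):
    """Return True if a contig carries a regulator/transporter pair near KS and CLF genes.

    `genes` is an iterable of (index, role) pairs, where `index` is the ordinal
    position of a gene on the contig and `role` is a hypothetical annotation
    label: "regulator" (tetR/marR), "transporter", "KS", or "CLF".
    """
    positions = defaultdict(list)
    for index, role in genes:
        positions[role].append(index)

    transporters = set(positions["transporter"])
    for reg in positions["regulator"]:
        # The probe is a regulator with a transporter gene directly beside it.
        if not ({reg - 1, reg + 1} & transporters):
            continue
        # Both minimal PKS genes must lie within the distance cutoff.
        has_ks = any(abs(i - reg) <= max_distance for i in positions["KS"])
        has_clf = any(abs(i - reg) <= max_distance for i in positions["CLF"])
        if has_ks and has_clf:
            return True
    return False


# Toy contigs (invented for illustration only).
near = [(10, "regulator"), (11, "transporter"), (25, "KS"), (26, "CLF")]
far = [(10, "regulator"), (11, "transporter"), (80, "KS"), (81, "CLF")]
print(is_candidate_cluster(near), is_candidate_cluster(far))
```

Measuring the distance from the regulator alone is a simplification chosen for brevity; a real pipeline would also need to parse annotation files and decide how homologs of each probe are recognized.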
We noticed that the identified BGCs included many known BGCs for type II polyketide biosynthesis, such as those for enterocin,23 landomycin,24 and tetracenomycin,25 indicating that they share a similar resistance mechanism (Fig. S1†). To directly target tetracycline BGCs, we analyzed the CLFs, whose phylogeny correlates strongly with the number of polyketide condensations and the cyclization pattern.26,27 To test whether the tetracycline BGCs fall in a separate clade, we first generated a phylogenetic tree using CLFs from over 160 characterized type II PKS BGCs. Indeed, we found that compounds with different chain lengths and cyclization patterns were well distinguished (Fig. S2†). Gratifyingly, the five CLFs from known tetracycline BGCs grouped together, although they are closely related to the CLF from the cervimycin BGC, supporting the use of CLF phylogeny to predict the product structure. Thus, the CLF sequences from the 1744 BGCs were extracted and a phylogenetic tree was constructed (Scheme 1D). The most abundant BGCs are those for angucyclines. Spore pigment, naphthoquinone, pentangular polyphenol, and anthracycline BGCs were also widely distributed, indicating that the TetR/MarR-transporter resistance mechanism is widespread in type II PKS biosynthetic pathways. Notably, we observed a small group of 103 CLFs, including the 5 known tetracycline CLFs, that clustered together. Among them, 72.8% are from Streptomyces, 14.5% are from Kitasatospora, 4.8% are from Amycolatopsis, and the others are from rare actinomycetes such as Spongiactinospora and Micromonospora, indicating that tetracycline BGCs are widely distributed across strains (Fig. S3†). After dereplication, we obtained 25 BGCs that are distinct from all characterized tetracycline BGCs in the literature (Fig. S4 and Tables S3–S27†).
To analyze the BGCs, we carried out genome neighborhood network (GNN) analysis (Scheme 1E), which has recently become a powerful visual bioinformatics tool for predicting enzyme functions based on their genomic context.28,29 A GNN consisting of 1021 proteins from 30 BGCs (including the five known BGCs) was generated to annotate these tetracycline BGCs. The GNN analysis showed that all of them contain the core genes for biosynthesis of the tetracyclic skeleton. Beyond this shared core, the newly discovered BGCs are diverse, featuring many enzyme functions that differ from those in known BGCs. Notably, the cluster in Streptomyces sp. NAK774 encodes multiple post-modification enzymes, including six glycosyltransferases (GTs), three KAS III enzymes, and a FAD-dependent halogenase, suggesting that this strain may produce a tetracycline with interesting peripheral groups (Scheme 2A).
Scheme 2 Biosynthesis of hainancycline (1). (A) The biosynthetic gene cluster for 1 and (B) the proposed biosynthetic pathway for 1.
To simplify the structure elucidation procedure, we attempted to dissect the sugar linkages in 1 through acid hydrolysis. Treatment of 1 with trifluoroacetic acid/MeOH/H2O (1:1:8) at 60 °C for 1 hour led to the complete hydrolysis of 1 and afforded three major fragments: 1a (m/z 658.2501 [M − H]−), 1b (m/z 329.0794 [M − H]−) and 1c (m/z 483.1390 [M + Na]+). Compound 1a was elucidated to have the same tetracyclic aglycon as observed in SF2575 (Table S30†).16,30 However, the 1H–1H COSY spectrum showed that the sugars in 1a differ from those in SF2575. The diagnostic HMBC correlations H8/C1A, H1A/C8, H1A/C9, and H1B/C13 indicated that two sugar moieties were anchored at the C9 and N1 positions, respectively. The relative configuration of the sugar moieties was determined by interpretation of the NOESY data and coupling constants. The large diaxial coupling constant (JH1A = 11.1 Hz) and the NOE correlations of H1A with H5A, and of H2Aβ with H4A, indicated that H1A, H4A, and H5A all possess axial orientations. Thus, sugar A was determined to be amicetose. Similarly, based on NMR analysis, sugar B was also determined to be amicetose. Fragment 1b was elucidated to have two subunits, 5-chloro-6-methylsalicylic acid and a methylated oliose, the stereochemistry of which was determined by NOE analysis (Table S31†). The HMBC correlation of H4F with C1′ suggested that the oliose was substituted at the C1′ position. Compared to 1b, fragment 1c has an additional oliose moiety at the C1F position, as determined by the HMBC correlations H3E/C1F and H4F/C1′ (Table S32†). After further scrutiny of the NMR data of 1, two additional sugar moieties, sugar C and sugar D, were identified. The diagnostic HMBC correlations H1B/C13, H1C/C4B, H4C/C1′′, H2′′/C1′′, H1D/C12a, H8/C1A, H4A/C1E, H3E/C1F, and H4F/C1′, together with the 1H–1H COSY correlation of H1B with NH, linked all units and established the complete structure of 1, which was designated hainancycline (Scheme 2B and Table S29†).
To the best of our knowledge, 1 represents the most modified tetracycline discovered so far.
To verify this hypothesis, we inactivated haiG1, whose product shows 57% sequence identity to SsfS6 (glycosyltransferase) in SF2575 biosynthesis.30 The resulting ΔhaiG1 mutant no longer produced 1 but clearly accumulated 2 (Fig. 1, iii and Table S33†), a product that was also isolated from the ΔssfS6/ΔssfM1 double mutant,16 confirming that HaiG1 has the same function as SsfS6, namely transferring an amicetose to the C9 position. Based on the same biosynthetic origin, we proposed that 1 possesses the same stereochemistry at C4 and C12a as 2. In addition, the NOE correlations between H4a and H5a, H5a and 6-OCH3, and H4a and H1D determined the stereochemistry of C5a and C6 in 1. HaiO2, HaiJ, and HaiM3 were found to be homologous to SsfO1 (oxygenase), SsfP (dehydrogenase), and SsfM2 (O-methyltransferase), respectively (Table S3†),16,30 which are involved in the successive decoration of the tetracyclic core to form a similar biosynthetic intermediate 8 in SF2575 biosynthesis (Fig. S7†). Indeed, the inactivation of haiO2 led to the disappearance of 1 and the accumulation of 3 (Fig. 1, iv and Table S34†). The inactivation of haiJ, which encodes an NADPH-dependent reductase with 60.4% sequence identity to SsfP, led to the accumulation of 4 and 5 (Fig. 1, v). NMR analysis indicated that 4 is a C-ring rearranged shunt product, and 5 is a C6-methoxyl derivative of 3 (Tables S35 and S36 and Fig. S8†). The structure of 4 is intriguing, and no such compound has been isolated from the SF2575 biosynthetic pathway. To test whether 4 was derived from the proposed intermediate 6 (Scheme 2B), we overexpressed the HaiO2 protein in E. coli BL21(DE3). The purified HaiO2 protein was yellow and was confirmed to contain FAD as a cofactor (Fig. S9†). When HaiO2 was incubated with 3 in the presence of NADPH, we detected the formation of 4 as the major product, together with a product (m/z 528.1510 (C26H26NO11 [M − H]−)) that matched the formula of 6 (Fig. 2, iv).
In contrast, HaiO2 did not accept 2, which lacks the sugar unit at C9, as a substrate (Fig. 2, vi). Taking the structural features of 4 and 5 into account, we concluded that HaiO2 is responsible for the hydroxylation of 3 to give 6 (Scheme 2). In the absence of downstream enzymes, 6 can be further hydroxylated to 6a, which can undergo a spontaneous Michael-type addition followed by a retro-Claisen condensation to give 4 (Fig. S8†). We speculated that the unexpected methoxyl group in 5 could be installed on the biosynthetic intermediate 3 by HaiO2 and HaiM3 or by other unknown enzymes. In addition, the ΔhaiM3 mutant strain accumulated 7 (Fig. 1, vi and Table S37†). Therefore, HaiO2 functions immediately after HaiG1, and HaiO2, HaiJ, and HaiM3 act as the C6 hydroxylase, C5a–C11a-ene reductase, and C6–OH methyltransferase, respectively.
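As a quick consistency check on the m/z 528.1510 value quoted for the HaiO2 reaction product, the expected m/z of the formula can be recomputed from monoisotopic masses. The sketch below assumes that C26H26NO11 is the composition of the [M − H]− ion itself (that is, already deprotonated) and that the ion is singly charged. The function names are hypothetical, and the atomic masses are standard monoisotopic values rounded to eight decimals.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELECTRON_MASS = 0.00054858


def anion_mz(ion_composition):
    """m/z of a singly charged anion whose elemental composition is given as a dict."""
    atoms = sum(MONOISOTOPIC_MASS[el] * n for el, n in ion_composition.items())
    # A singly charged anion carries one extra electron.
    return atoms + ELECTRON_MASS


def ppm_error(observed, calculated):
    """Relative deviation of the observed from the calculated m/z, in ppm."""
    return (observed - calculated) / calculated * 1e6


# Ion formula quoted in the text for the [M - H]- species (assumed to be the ion composition).
calculated = anion_mz({"C": 26, "H": 26, "N": 1, "O": 11})
print(f"calculated m/z = {calculated:.4f}")
print(f"error vs 528.1510 = {ppm_error(528.1510, calculated):.2f} ppm")
```

Under these assumptions the calculated value is about 528.1511, which agrees with the reported 528.1510 to well within 1 ppm.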
We noticed that five GTs (HaiG2–HaiG6) remain in the hai BGC, which is consistent with the remaining five deoxysugar units in 1 (Table S3†).34,35 To assign their functions, we individually inactivated these GT-encoding genes (Fig. S5†). HPLC analysis of the metabolic extracts indicated that each mutant showed a metabolic profile distinct from those of the other mutants and of the wild-type strain. Subsequent large-scale fermentation led to the isolation of nine products (8–15 and 1a) from these mutant strains (Fig. 1). All these structures were elucidated by extensive analysis of HRESIMS and NMR data (Tables S38–S45†). Compound 8, isolated from the ΔhaiG2 mutant, is a C-glycoside with an intact tetracycline core, suggesting that HaiG2 is the second GT, responsible for appending a sugar unit onto 8 (Table S38†). Compounds 1a and 9 from the ΔhaiG3 mutant lack the sugar C unit, suggesting that HaiG3 is a terminal GT that transfers L-amicetose to sugar B (Tables S29 and S39†). Compounds 8, 10, and 11, which accumulated in the ΔhaiG4 mutant, contain no sugar D unit at the C12a position (Tables S38, S40 and S41†). In addition, 12 and 13, which are produced by the ΔhaiG5 mutant, lack the sugar E and F units, whereas 14 and 15 from the ΔhaiG6 mutant lack only the sugar F unit (Tables S42–S45†). These data revealed that HaiG4, HaiG5, and HaiG6 transfer the sugar D, E, and F units, respectively, which in turn indicated that the role of HaiG2 is to transfer sugar B to the amide group of the tetracyclic core (Scheme 2).
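The reasoning that leads from the knockout phenotypes to the GT assignments can be written out as a small sketch. The attachment map is taken from the HMBC linkages reported for 1 (A, B, and D on the core; C on B; E on A; F on E), and the missing sugar units are those stated for each mutant. The data layout and the function name `assign_glycosyltransferases` are assumptions made for illustration, and the ΔhaiG2 phenotype is deliberately left out so that HaiG2 is assigned by elimination, as in the text.

```python
# Attachment point of each sugar unit, taken from the HMBC linkages in the text.
ATTACHED_TO = {"A": "core", "B": "core", "C": "B", "D": "core", "E": "A", "F": "E"}

# Sugar units reported as missing from the products of each mutant.
MISSING_IN_MUTANT = {
    "HaiG3": {"C"},
    "HaiG4": {"D"},
    "HaiG5": {"E", "F"},
    "HaiG6": {"F"},
}

# HaiG1 was assigned separately (amicetose at C9, i.e. sugar A).
KNOWN = {"HaiG1": "A"}


def assign_glycosyltransferases(missing, attached_to, known, all_gts):
    """Map each GT to the sugar unit it most plausibly transfers."""
    assignment = dict(known)
    for gt, lost in missing.items():
        # The GT's own sugar is the lost unit whose attachment point is still
        # present; any other lost unit is missing only because its anchor is.
        direct = [s for s in lost if attached_to[s] not in lost]
        if len(direct) != 1:
            raise ValueError(f"ambiguous knockout data for {gt}")
        assignment[gt] = direct[0]
    # Assign the one remaining sugar to the one remaining GT by elimination.
    free_gts = [g for g in all_gts if g not in assignment]
    free_sugars = [s for s in attached_to if s not in assignment.values()]
    if len(free_gts) == 1 and len(free_sugars) == 1:
        assignment[free_gts[0]] = free_sugars[0]
    return assignment


gts = [f"HaiG{i}" for i in range(1, 7)]
result = assign_glycosyltransferases(MISSING_IN_MUTANT, ATTACHED_TO, KNOWN, gts)
for gt in gts:
    print(gt, "->", "sugar", result[gt])
```

The sketch only formalizes the bookkeeping; it says nothing about the order in which the GTs act, which the text infers from the structures of the accumulated intermediates.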
Cell line | 1 | 2 | 9 | 11 | 17 |
---|---|---|---|---|---|
HCT116 | 12.7 | 8.4 | 15.4 | 15.3 | 28.8 |
HT29 | 24.8 | 10.1 | 28.5 | 23.5 | >30 |
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2sc03965f
This journal is © The Royal Society of Chemistry 2022