Progress on targeted discovery of microbial natural products based on the predictions of both structure and activity

Yuwei Zhang *a, Jianfa Zong ab, Yufeng Liu ac, Keyu Zhou ac, Haibo Shi ac, Wen-Bing Yin ac, Ling Liu ac and Yihua Chen *ac
a State Key Laboratory of Microbial Diversity and Innovative Utilization, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China. E-mail: zhangyw@im.ac.cn; chenyihua@im.ac.cn
b State Key Laboratory of Tea Plant Germplasm Innovation and Resource Utilization, Anhui Agricultural University, Hefei, 230036, China
c University of Chinese Academy of Sciences, Beijing 100049, China

Received 18th February 2025

First published on 8th July 2025


Abstract

Covering: up to 2025

Microbial natural products (NPs), with their diverse structures and fascinating activities, are a fertile source for drug discovery. Genomic and metagenomic data have revealed abundant valuable resources that remain to be explored. With advances in technology, methods for discovering NPs from microorganisms are undergoing notable changes. In this highlight article, we group NP discovery methods into activity-guided and structure-guided categories, emphasizing the characteristics of the target compounds and providing typical examples of NPs. We focus primarily on recently developed representative methods that can simultaneously predict the structure and activity features of target compounds, as well as on the discovery trends of NPs reflected by these cutting-edge methods.



Yuwei Zhang

Dr Yuwei Zhang received her BS in Biological Science from the China Agricultural University. After receiving her PhD under the direction of Prof. Yihua Chen from the Institute of Microbiology, Chinese Academy of Sciences in 2023, she is currently a postdoctoral scholar in the same group. Her research interests focus on discovering and exploring the functions of bioactive natural products related to human health.


Jianfa Zong

Dr Jianfa Zong received his PhD degree in Pharmacognosy from the School of Pharmacy, Fudan University, in 2020. He completed two and a half years of postdoctoral research on natural product biosynthesis under the supervision of Prof. Yihua Chen at the Institute of Microbiology, Chinese Academy of Sciences. Since 2024, he has been working at the State Key Laboratory of Tea Plant Germplasm Innovation and Resource Utilization, Anhui Agricultural University. His research focuses on the discovery and biosynthesis of functional natural products in tea plants.


Yufeng Liu

Yufeng Liu earned his BS in Biology from Yunnan University. Since 2021, he has been enrolled in a successive postgraduate and doctoral program at the Institute of Microbiology, Chinese Academy of Sciences, under the mentorship of Prof. Yihua Chen. His research focuses on natural products from oral and gut bacteria and on chassis construction, with the aims of finding bioactive molecules, understanding host–microbe interactions, and creating new chassis for research.


Yihua Chen

Dr Yihua Chen received his PhD degree in Biochemistry and Molecular Biology from the Institute of Microbiology, Chinese Academy of Sciences, in 2005. He undertook postdoctoral studies in natural product biosynthesis for five years at the University of Wisconsin–Madison. Since 2011, he has been a Research Group Leader at the Institute of Microbiology, Chinese Academy of Sciences, and he is currently a professor and vice-director of the State Key Laboratory of Microbial Diversity and Innovative Utilization, China. His research interest lies at the crossroads of genetics and natural product chemistry, with an emphasis on the biosynthetic mechanisms of microbial natural products and the discovery of small-molecule drug leads from human-associated microorganisms.


1. Introduction

Natural products (NPs) derived from plants, animals, and microorganisms are important drug sources. Of the 1394 small-molecule drugs approved worldwide between 1981 and 2019, approximately 31.6% were directly sourced from NPs and their derivatives, while approximately 35.1% were synthetic drugs that carry NP pharmacophores or mimic NPs.1 Notably, microbial NPs have been a crucial reservoir for antibiotics, anticancer drugs, insecticides, immunomodulators and other agents widely used in medicine, agriculture and livestock.2,3 The ongoing exploration of this NP source continues to deliver novel molecules with strong application potential.

Over the past century, methods for exploring microbial NPs have significantly expanded.4–9 Although these mining methods vary in strategies, they primarily focus on two core features of NPs: their activities (application potential) and structures (structural novelty).

The activity-guided mining strategy encompasses the cultivation of microorganisms, followed by screening for specific biological activities; tracking, isolation, and purification of bioactive compounds; and structure elucidation of these NPs.10,11 Since the golden age of NP discovery (1940s–1960s), this strategy has facilitated the discovery of numerous drugs, such as neomycin,12 erythromycin,13 and daunorubicin,14 many of which are still used in our medical practice and have profoundly impacted human health and medical interventions. With the diversification of cultivation techniques,15 establishment of different activity screening systems (including high-throughput screening), and expansion of exploration scope from terrestrial to marine and other environments,6,16 this strategy continues to play a key role in obtaining bioactive NPs.

Since 2000, advancements in sequencing technology and a deeper understanding of the relationship between gene sequences and NP biosynthesis logic have enabled the development of algorithms capable of predicting biosynthetic gene clusters (BGCs) and their product structures based on sequence data.5 This has given rise to the structure-guided mining strategy, which involves analyzing sequences using bioinformatics tools, predicting BGCs and possible NP structures, and then employing various techniques, such as regulatory manipulation or heterologous expression, to obtain the inferred novel NPs and explore their activities.17–20 With extensive sequencing and analysis, this strategy has been successfully applied to mine novel compounds from well-studied NP-producing microbes, model and non-model strains, as well as numerous microbes from new habitat sources, effectively accelerating the progress of microbial NP discovery.

In recent years, a multitude of large-scale microbiome studies have sequenced and analyzed microorganisms from different environments, yielding resources such as the Genomes from Earth's Microbiomes (GEM) catalog,21 the Human Microbiome Project (HMP),22 the Cultivated Gut Fungi (CGF) catalog,23 the Global Ocean Microbiome Genome Catalogue (GOMC),24 and the Tibetan Glacier Genome and Gene (TG2G) catalog.25 The known diversity and complexity of microbial resources are growing at an unprecedented rate. Faced with this massive amount of data, the two mining strategies above remain important, but both have limitations. The activity-guided mining strategy may lead to the repeated discovery of known compounds because it does not predict the structures of NPs and therefore cannot ensure their structural novelty.26 The structure-guided mining strategy may overlook the activities of NPs during the mining process, which hinders further exploration of the potential medicinal value of novel compounds. Consequently, there is a pressing need for innovative mining strategies to reveal the enormous potential of unexplored microbial NPs, enabling researchers to discover novel active compounds more efficiently and accurately.

In this highlight, we summarize several representative methods for microbial NP mining based on a strategy that simultaneously predicts both NP activity and structural novelty. These methods can effectively address two common problems: the rediscovery of known NPs when using an activity-guided mining strategy, and the difficulty of foreseeing NP activity when using a structure-guided mining strategy. They are expected to lead to breakthroughs in the efficient identification of new therapeutic agents and other useful compounds from microorganisms, heralding a new stage of microbial NP research.

2. Discovery of NPs based on structure and activity prediction

As described above, the increasing availability of microbial resources provides a great opportunity for NP discovery. At the same time, it poses a major challenge: how to efficiently and accurately identify novel molecules with desired activities from numerous candidates. In the following sections, we introduce four representative types of methods: (1) structure–activity relationship (SAR) guided discovery; (2) self-resistance gene-guided discovery; (3) ecosystem-guided discovery; and (4) artificial intelligence (AI)-assisted discovery. These innovative methods are expected to enable microbial NP exploration that predicts both activity and structural novelty.

2.1 Structure–activity relationship (SAR) guided discovery

As knowledge of the structural features underlying microbial NP bioactivities accumulates, researchers have identified an increasing number of correlations between compound structures and their activities. This enables the use of gene sequences that produce specific structural features as probes to explore NPs with particular activities. With this SAR-based mining method, researchers are more likely to obtain NPs that share an activity but differ in structure, which makes the discovery of novel NPs with a desired activity more efficient. Additionally, it contributes to a deeper understanding of the SARs of active NPs, thereby providing valuable insights for the optimization of lead molecules.7,27

The general procedure of SAR-guided discovery can be summarized as follows: (1) based on known SARs, use the sequences of domain/gene/BGC responsible for generating bioactive structures as probes for genome mining; (2) from candidate BGCs, select those with the potential to generate NPs with novel structures to serve as research targets; (3) obtain corresponding NPs through various biological or chemical tools and conduct activity verification. Here, we introduce several representative NPs discovered through the SAR-guided method, which have different activities and were obtained using different techniques (Fig. 1).
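The first two steps of this procedure can be sketched as a simple filter-and-rank over candidate BGCs. This is only an illustrative sketch: the record fields (`probe_hit`, `similarity_to_known`), the example entries, and the 0.5 similarity cutoff are hypothetical placeholders, not values from the studies cited here. In practice, the probe hits and similarity scores would come from a genome-mining tool.

```python
from dataclasses import dataclass


@dataclass
class CandidateBGC:
    name: str
    probe_hit: bool             # step 1: does the BGC contain a match to the SAR-derived probe?
    similarity_to_known: float  # 0..1, similarity to the closest characterized BGC (hypothetical score)


def prioritize(bgcs, max_similarity=0.5):
    """Keep probe-positive BGCs, then prefer those least similar to known BGCs."""
    hits = [b for b in bgcs if b.probe_hit]                            # step 1: probe-based mining
    novel = [b for b in hits if b.similarity_to_known <= max_similarity]  # step 2: novelty filter
    return sorted(novel, key=lambda b: b.similarity_to_known)


# Invented example records, for illustration only.
candidates = [
    CandidateBGC("bgc_A", True, 0.92),   # probe hit, but nearly identical to a known BGC
    CandidateBGC("bgc_B", True, 0.31),
    CandidateBGC("bgc_C", False, 0.10),  # novel, but lacks the bioactive structural feature
    CandidateBGC("bgc_D", True, 0.45),
]
shortlist = [b.name for b in prioritize(candidates)]
print(shortlist)
```

The shortlisted BGCs would then enter step 3 (compound production and activity verification), which remains an experimental task.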


Fig. 1 General process of structure–activity relationship-guided discovery and the representative NPs obtained by this method.

Enediynes are a class of NPs with potent antitumor activity that feature a pair of alkynes flanking an alkene within a 9- or 10-membered ring as the warhead.28 Four enediyne NPs have been used in clinical or preclinical research as antibody–drug conjugates or polymer conjugates.29 Since the discovery of enediyne BGCs in 2002, thousands of enediyne BGCs have been identified.30 For instance, Shen's group used specific primers for high-throughput screening. By inactivating the key gene tnmE in Streptomyces sp. strain CB03234, they discovered tiancimycin A (1), a novel enediyne with sub-nanomolar IC50 values against various cancer cell lines (Fig. 1).31

Another type of anticancer NPs, the epoxyketone proteasome inhibitors (EPIs), possess the distinctive epoxyketone warhead and are potential therapeutic agents for diseases such as multiple myeloma. In 2013, the characterization of EPI-producing BGCs revealed the crucial role of the KS domain in epoxyketone biosynthesis.32 In a subsequent study, 185 global soil metagenomes were screened, and two promising BGCs were selected and heterologously expressed in Streptomyces albus J1074, leading to the discovery of novel EPIs such as clarepoxcin A (2) and landepoxcin A (3) (Fig. 1).33

ADP-D-glycero-β-D-manno-heptose (ADP-heptose (4)) is an effective innate immune agonist in mammals. It can be recognized by ALPK1 (alpha-protein kinase 1) and activate NF-κB (nuclear factor κB) through the ALPK1-TIFA (TRAF-interacting protein with forkhead-associated domain) axis, with potential for use as an immunomodulator or a vaccine adjuvant.34 Although 4 is biosynthesized by NDP-heptose biosynthetic enzymes (HBEs), the understanding of these enzymes was limited to bacteria for years. Motivated by the activity of these compounds, we recently conducted a bioinformatics analysis of HBEs, which revealed their widespread distribution across bacteria, archaea, eukaryotes, and even viruses. The polymorphism of HBEs indicates an opportunity to explore ADP-heptose analogues that can elicit innate immune responses. Through in vitro reconstitution of the NDP-heptose biosynthetic pathway with tens of different HBEs from diverse kingdoms, we obtained two new innate immune agonists, CDP-heptose (5) and UDP-heptose (6), which elicit stronger ALPK1-dependent immune responses in cells and mice than 4 (Fig. 1).35

Tetracyclines (TCs) are a class of antibiotics that have been used clinically for over 70 years, characterized by their typical linearly fused tetracyclic structures. In a recent study, researchers achieved efficient identification and analysis of TC BGCs using a set of enzymes responsible for TC cyclization (OxyK, OxyN, OxyH, and OxyI). By overexpressing the SARP family regulatory genes, they activated the production of a series of TCs, including the highly glycosylated misiomycin A (7), which exhibits strong antibacterial activity against methicillin-resistant Staphylococcus aureus (MRSA) and vancomycin-resistant Enterococcus faecium (VRE).36

Polyene macrolide antibiotics, such as amphotericin B, are important antifungal drugs, and the characteristic mycosamine moiety of the clinically used drugs is key to their activity. To discover novel polyene macrolide compounds, researchers developed a profile hidden Markov model (pHMM) using 11 mycosamine glycosyltransferases from reported polyene antifungal BGCs. They detected 280 candidate BGCs and, through phylogenetic analysis, identified the unique mand BGC in Streptomyces netropsis DSM 40259, which led to a new polyene macrolide antibiotic, mandimycin (8), with potent and broad-spectrum activity against various multidrug-resistant (MDR) fungal pathogens. Moreover, 8 has a unique mechanism of action, targeting various phospholipids in fungal cell membranes.37

Obtaining NPs from native producers can be complex and time-consuming in some cases. Improved accuracy in predicting NP structures from sequence data makes it possible to directly synthesize potentially active compounds based on predicted structures. The resulting compounds are termed synthetic–bioinformatic NPs (syn-BNPs). This innovative method has been successfully employed in the study of different non-ribosomal peptides (NRPs). Colistin, a polymyxin, is used as a last line of defense against MDR Gram-negative pathogens. Through systematic analysis of polymyxin/colistin-like BGCs, researchers discovered the mac BGC encoding a novel decapeptide. This predicted compound was chemically synthesized and N-terminally acylated with (S)-6-methyloctanoic acid, leading to the discovery of macolacin (9) with superior antibacterial activity (Fig. 1).38 Subsequently, the same research team analyzed the condensation starter domains, which typically install the N-terminal lipids that are essential for the biological activity of lipopeptides.39 Based on phylogenetic analysis, they identified a novel cil BGC and obtained cilagicin (10),40 an effective agent against Gram-positive pathogens, through a similar chemical synthesis approach (Fig. 1).

SAR-guided discovery has long played a crucial role in microbial NP research. However, with the rapid growth of available microbial resources, researchers will need to invest a significant amount of time and experimental resources to screen numerous candidates. The efficiency of this mining method could therefore be improved in two ways: by establishing and using data mining tools that integrate and analyze the correlations between gene sequences, structural features, and activities scattered across different databases and the literature, and by using techniques such as the phylogenetic analysis mentioned above to help screen out the more promising candidates.

2.2 Self-resistance gene-guided discovery

During the biosynthesis of active NPs, microbial producers must develop resistance to toxic products for self-protection. This can be achieved through various mechanisms, such as efflux pumps, inactive prodrug production inside cells, or chemical modification of NPs.41 Another prevalent mechanism relies on self-resistance genes (typically homologous variants of the targeted housekeeping genes), which are often located within or adjacent to BGCs and are expressed in coordination with NP production to exert protective effects.42 Considering this fact, the identification of self-resistance genes can aid in pinpointing the BGCs of NPs with known activities but unclear biosynthetic pathways.43–45 It can also be used to infer the activities or targets of known NPs by analyzing the self-resistance genes in or adjacent to their BGCs.46–49 Furthermore, it can be employed for targeted genome mining of BGCs with specific self-resistance genes to discover NPs that may possess corresponding activities. The successful application of these methods has been extensively reviewed.7,9,42 Herein, we focus on representative examples from the last category and currently established mining platforms.

The general process of self-resistance gene-guided mining can be summarized as follows: (1) conserved housekeeping enzyme-encoding genes are selected as targets for genome mining; (2) desired BGCs that either contain or are adjacent to these genes are located; (3) various biological tools and chemical techniques are used to obtain the BGC products; and (4) the bioactivity of the NPs is validated (Fig. 2).
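Steps (1) and (2) amount to a coordinate search: for each predicted BGC, look for homologs of the chosen housekeeping gene family inside the cluster or in a flanking window. The sketch below illustrates that logic only. The `Gene` and `BGC` records, the coordinates, and the 10 kb flank are assumptions made for illustration; dedicated tools such as ARTS apply additional criteria that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    family: str  # housekeeping family label assigned by homology search ("" if none)
    start: int
    end: int


@dataclass
class BGC:
    name: str
    start: int
    end: int


def resistance_candidates(bgc, genes, target_families, flank=10_000):
    """Return target-family genes that overlap the BGC or its flanking regions."""
    lo, hi = bgc.start - flank, bgc.end + flank
    return [
        g for g in genes
        if g.family in target_families and g.end >= lo and g.start <= hi
    ]


def anchor_bgcs(bgcs, genes, target_families, flank=10_000):
    """Map each BGC with at least one putative self-resistance gene to those genes."""
    anchored = {}
    for bgc in bgcs:
        hits = resistance_candidates(bgc, genes, target_families, flank)
        if hits:
            anchored[bgc.name] = [g.name for g in hits]
    return anchored


# Invented single-contig example, for illustration only.
genes = [
    Gene("fabF_1", "FabB/F", 50_000, 51_200),    # housekeeping copy, far from any BGC
    Gene("fabF_2", "FabB/F", 205_000, 206_200),  # second copy inside bgc_1
    Gene("gyrB", "GyrB", 425_000, 427_000),      # near bgc_2, but not a selected target
]
bgcs = [BGC("bgc_1", 200_000, 230_000), BGC("bgc_2", 400_000, 420_000)]
anchored = anchor_bgcs(bgcs, genes, target_families={"FabB/F"})
print(anchored)
```

A hit from such a search is only a hypothesis: as discussed later in this section, a housekeeping gene can also sit near a BGC by coincidence, so the products still need to be obtained and tested.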


Fig. 2 Process of self-resistance gene-guided discovery and the representative NPs obtained by this method.

In 2015, Moore's group pioneered the development of a self-resistance gene-guided genome mining workflow and applied it to Salinispora strains.50 They identified the tlm BGC containing a putative fatty acid synthase resistance gene homologous to FabB/F and obtained a series of thiotetronic acid compounds (11–14) with inhibitory activity against type II fatty acid synthase through heterologous expression (Fig. 2). Moreover, they discovered a related BGC containing two FabB/F homologs in Streptomyces afghaniensis and isolated several thiolactomycin analogues (15–18), including novel compounds 16 and 17.

A similar procedure was used by Müller's group in myxobacteria. A topoisomerase-targeting pentapeptide repeat protein was used as a probe, and a BGC adjacent to a homolog of the probe gene was located. Through promoter exchange and heterologous expression, they obtained the novel compounds pyxidicyclines A (19) and B (20), which effectively inhibit the unwinding activity of topoisomerases, including E. coli topoisomerase IV and human topoisomerase I (Fig. 2).51

This method has also been applied to the development of novel herbicides. Tang's group used dihydroxyacid dehydratase (DHAD) as a probe; DHAD is a key enzyme in the biosynthetic pathway of branched-chain amino acids, which is essential for plant growth and absent in animals. With this probe, they located a BGC conserved across multiple fungal genomes. Through heterologous expression, they obtained aspterric acid (21), a compound with a known structure but previously unidentified activity, and highlighted its potential as a broad-spectrum herbicide by validating its activity (Fig. 2).52

In recent years, bioactive molecules, such as clipibicyclene (22)53 (using ClbP protease as a probe, antibacterial), phenelfamycin B (23)54 (using EF-Tu as a probe, against MDR Neisseria gonorrhoeae), and roseopurpurin C (24)55 (using cyclin-dependent kinase 2 (CDK2) as a probe, CDK2 inhibitor) have been successively obtained by researchers (Fig. 2). These successes further underscore the significant role of this method in advancing the discovery of novel NPs with desired activities.

Concurrently, a series of corresponding bioinformatics tools have been developed to support this research. For example, ARTS (antibiotic resistant target seeker)56 is a sophisticated tool specifically designed for the screening of self-resistance genes in Actinobacteria. The enhanced version, ARTS 2.0,57 expands upon ARTS's capabilities to include both bacterial genomes and metagenomic data. ARTS-DB58 serves as a repository of pre-computed ARTS results, providing abundant resources to researchers. Additionally, FRIGG (fungal resistance gene-directed genome mining)59 and FunARTS (fungal bioactive compound-resistant target seeker)60 are designed for mining NPs in fungi. Collectively, these tools enable more efficient self-resistance gene-guided discovery.

Building on these pioneering efforts, Zhao's group developed FAST-NPS, a fully automated high-throughput platform for the discovery of bioactive NPs in Streptomyces, by integrating ARTS and the Illinois Biological Foundry for Advanced Biomanufacturing (iBioFAB).61 They predicted BGCs from 11 newly sequenced Streptomyces genomes and analyzed BGCs containing potential self-resistance genes using ARTS 2.0. A total of 105 candidate BGCs containing known resistance genes or duplicated housekeeping genes were selected and subjected to Cas12a-assisted precise targeted cloning using in vivo Cre-lox recombination (CAPTURE) and heterologous expression in Streptomyces lividans TK24. Eight NPs with antibacterial and/or antitumor bioactivity were obtained. Two examples are compounds 3 and 4 of that study, referred to here as FAST-NPS 3 (25) and FAST-NPS 4 (26), whose BGC contains an acyl-CoA carboxylase self-resistance model gene (Fig. 2).

Information on self-resistance genes can be more conveniently integrated into databases and used for high-throughput discovery than SAR. Therefore, in recent years, relevant tools for this mining strategy have been continuously developed and widely applied. However, the application of this method is mostly limited to BGCs containing or adjacent to known resistance genes/duplicated housekeeping genes. Moreover, it cannot be excluded that housekeeping genes appear coincidentally within or around BGCs rather than being NP targets. We believe that the discovery of more self-resistance mechanisms will further expand the range of gene types applicable to this mining method. At the same time, with the in-depth study of enzyme mechanisms in NP biosynthesis and the advancement of bioinformatics analysis tools, the accuracy of distinguishing true self-resistance genes from biosynthetic genes will be improved, and the analysis scope will be expanded from within and near BGCs to a wider genomic region, thereby making the method more effective for the discovery of novel bioactive NPs.

2.3 Ecosystem-guided discovery

Microorganisms usually exist as communities within their habitats, and the dynamic balance of ecosystems is maintained through various interactions among microorganisms, such as symbiosis, competition, and predation, as well as interactions with their hosts. Throughout long-term co-evolution, microorganisms have developed the capacity to produce NPs to either outcompete rivals or adapt to specific environmental conditions. As a result, different environments shape the unique metabolic characteristics of the microorganisms within them, providing a basis for NP mining and insights into NP activity. As reviewed in some recent articles, bioactive NPs have been successfully obtained by exploiting the intense competition among microbes in oligotrophic habitats (such as oceans, plant leaves, and deserts), leveraging the interactions between microbes and insects, and considering predatory relationships.8,62 This effectively expands the chemical novelty of the NP repertoire. In this section, we present some typical examples to demonstrate how ecosystem-guided mining can provide valuable clues for the discovery of novel NPs with specific bioactivities.

While a standardized research methodology for ecosystem-guided NP discovery has not yet been systematically developed, it can be considered to encompass the following four primary steps: (1) select specific habitats for investigation based on identified ecological interactions; (2) isolate and cultivate microorganisms from these habitats, then screen them for bioactivity according to the relevant interaction; (3) establish the correlation between BGCs and bioactive NPs using bioinformatics tools combined with molecular biology techniques and other methodologies; (4) obtain the corresponding NPs and validate their activities (Fig. 3).


Fig. 3 General process of ecosystem-guided discovery and the representative NPs obtained from different niches.

Darobactin (27)63 serves as a typical example of this method (Fig. 3). Given the high abundance of Gram-negative bacteria in the microbiome of entomopathogenic nematodes, researchers chose Photorhabdus and Xenorhabdus, two nematode gut symbiotic bacteria closely related to common human opportunistic pathogens, for mining. Through bioactivity screening against E. coli, 27 was identified from Photorhabdus khanii HGB1456; it uniquely targets BamA, an essential protein for folding outer membrane proteins. 27 also exhibits potent activity against pathogens such as Pseudomonas aeruginosa and Acinetobacter baumannii. With its unusual structure containing two fused rings, it represents a new class of antibiotics against Gram-negative bacteria, a rarity since the 1960s.

By leveraging the predator–prey interactions between amoebas and bacteria, Götze et al. selected Pseudomonas strains closely associated with Dictyostelium discoideum as targets. They obtained keanumycins from Pseudomonas sp. QS1027 through genome mining, mutant strain construction, and amoebicidal activity screening. Keanumycin A (28) has strong amoebicidal activity and can inhibit the growth of various pathogenic fungi and fungal phytopathogens.64

Many phytopathogenic fungi can produce perithecia to survive long-term on infested plant debris, providing a highly specific ecological niche for bacteria. Targeting Fusarium graminearum, the major pathogen causing wheat Fusarium head blight, researchers isolated and cultured associated bacteria from its perithecia and conducted antagonistic activity screening, obtaining Pantoea agglomerans ZJU23, which strongly inhibits F. graminearum. Herbicolin A (29), which has broad-spectrum antifungal activity, was then identified through transposon mutagenesis.65

The human microbiota is increasingly recognized as a potential source of new therapeutic agents. Its members have undergone extensive co-evolution with humans and their pathogens, which may have endowed them with the ability to produce many active NPs with enhanced biosafety. Previous studies have revealed that some nasal Staphylococcus isolates can inhibit the growth of S. aureus.66 Consequently, the nasal niche was selected by researchers for NP mining. Lugdunin (30), a novel thiazolidine-containing cyclic peptide, was discovered from Staphylococcus lugdunensis through bioactivity screening against S. aureus (Fig. 3).67 Compound 30 exhibits potent antimicrobial activity against a wide range of Gram-positive bacteria as well as immunomodulatory activity.68 Similarly, epifadin (31), which has broad-spectrum antimicrobial activity, was discovered from another nasal commensal, Staphylococcus epidermidis IVK83 (Fig. 3).69

Furthermore, ecosystem-guided discovery has been instrumental in elucidating the mechanisms by which human pathogens impact host health through NPs. For example, despite the early discovery of genotoxic colibactin (32) from E. coli, a series of milestone studies have been conducted in recent years on its unique structure, association with diseases such as colorectal cancer (CRC), and impact on other human microbes (Fig. 3).70–72 Similarly, the cytotoxic tilimycin (33) and its derivative tilivalline (34),73,74 isolated from the pathogen Klebsiella oxytoca, which causes antibiotic-associated hemorrhagic colitis (AAHC), are currently being studied for their association with colitis and effects on other human microbiota (Fig. 3).75

With further exploration of diverse habitats, ecosystem-guided mining will continue to uncover structurally novel NPs with different metabolic characteristics, and the corresponding ecological interactions will also provide reliable clues for NP activity. Nevertheless, experimental conditions are heterogeneous across studies, with varying resolutions of sequencing data and different strains and cultivation conditions, so establishing a clear connection between an NP and its ecological functions through literature research and existing bioinformatics tools remains challenging. If more universal standards for microbiome research, especially in classification and sequencing, could be established, and technologies like culturomics and artificial microbial communities could be further developed to more precisely replicate microbial interactions within specific niches, the application of this method would undoubtedly expand.

2.4 Artificial intelligence (AI)-assisted discovery

In recent years, the integration of AI into biological research has marked a significant transformation. The application of AI techniques has greatly improved the efficiency of large-scale data analysis and has profoundly influenced the study of microbial NPs across various dimensions.4,76–78 In BGC detection, AI techniques have demonstrated higher accuracy and efficiency than traditional bioinformatics tools, as well as the ability to discover more types of novel BGCs. In structural analysis, AI techniques have played an important role in the structure elucidation of NPs and in dereplication against established compound databases, which has accelerated the identification of novel NPs. Additionally, various AI models can be trained on large-scale datasets of known compounds and then used to predict the biological activity of NPs and identify potential targets through high-throughput virtual screening.

For example, in BGC prediction, Rios-Martinez et al. developed BiGCARP,79 a self-supervised, neural network-based masked language model, which enhanced both BGC prediction and NP classification. Lai et al. established BGC Prophet,80 a transformer-based deep learning model that captures location-dependent relationships among biosynthetic genes, enabling ultrahigh-throughput BGC detection and accurate classification. AI-assisted methods are particularly effective for mining RiPPs (ribosomally synthesized and post-translationally modified peptides), which usually lack typical characteristic biosynthetic genes. In recent years, various tools, such as NeuRiPP,81 DeepRiPP,82 decRiPPter,83 RiPPMiner-Genome,84 and TrRiPP,85 have been developed, greatly improving the efficiency of mining novel RiPPs.

In terms of activity prediction of BGC products, Hannigan et al. developed DeepBGC,86 a deep learning framework that uses recurrent neural networks (RNNs) to identify new BGC categories and a random forest (RF) classifier for activity prediction (Fig. 4). PRISM4 (ref. 87) improved the chemical similarity between predicted BGC product structures and the known real products by adding many hidden Markov models. It also includes a trained support vector machine (SVM) classifier for activity prediction (i.e., antibacterial, antifungal, antiviral, anti-tumor, or immune regulatory activity). Walker et al. assembled a dataset of known bacterial BGCs paired with the activity of their products using information from the MIBiG (minimum information about a biosynthetic gene cluster) database and the literature. Based on this dataset, machine learning models were trained to predict the activity of NPs, and the resulting binary classifiers achieved balanced accuracies of 57–79%.88 The use of similar tools will facilitate dual-focus NP mining that predicts both structural novelty and activity.77,88 This can be achieved by using the structural prediction capabilities of the tools to screen for BGCs that may produce novel NPs. Concurrently, the activity prediction functions of the tools can be leveraged to make preliminary evaluations of the corresponding products.
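To put the balanced accuracies quoted above in context: for a binary classifier, balanced accuracy is conventionally the mean of sensitivity and specificity, so a classifier that always predicts the majority class scores 50% regardless of class imbalance. The short Python sketch below illustrates this standard definition. The confusion-matrix counts are invented for illustration and are not taken from the cited study, whose exact evaluation protocol is not described here.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def accuracy(tp, fn, tn, fp):
    """Plain accuracy: fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fn + tn + fp)


# Invented example: 200 BGCs, of which only 20 have products with the activity.
model = dict(tp=14, fn=6, tn=153, fp=27)
always_inactive = dict(tp=0, fn=20, tn=180, fp=0)  # trivial majority-class predictor

ba_model = balanced_accuracy(**model)
ba_trivial = balanced_accuracy(**always_inactive)
acc_trivial = accuracy(**always_inactive)

print(f"model balanced accuracy:   {ba_model:.3f}")
print(f"trivial balanced accuracy: {ba_trivial:.3f}")
print(f"trivial plain accuracy:    {acc_trivial:.3f}")
```

In this invented example, the trivial predictor looks strong under plain accuracy but sits at chance level under balanced accuracy, which is why the latter is the more informative metric when active compounds are rare in the training data.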


Fig. 4 General process of artificial intelligence-assisted discovery, representative tools developed, and NPs (displayed as structures predicted by AlphaFold3) obtained by this method.

Antimicrobial peptides (AMPs) have become an excellent application scenario for simultaneous structure and activity prediction. This is primarily because, compared to other types of NPs, their structures can be directly predicted from the amino acid sequences, and there is a wealth of data readily available for the training of AI models to predict their activities. Since 2010, models such as AntiBP2,89 AmPEP,90 and AMP scanner91 have been applied in the mining and activity prediction of AMPs.
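Because a peptide is fully specified by its amino acid sequence, simple descriptors can be computed from the sequence alone and passed to a classifier. The sketch below computes three crude descriptors (length, approximate net charge, and hydrophobic fraction). These are illustrative assumptions, not the feature sets used by AntiBP2, AmPEP, or AMP scanner, and the peptide sequence is hypothetical.

```python
HYDROPHOBIC = set("AILMFVW")

def peptide_features(sequence):
    """Return simple sequence-derived descriptors for one peptide."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    positive = sum(seq.count(res) for res in "KR")  # Lys and Arg counted as +1
    negative = sum(seq.count(res) for res in "DE")  # Asp and Glu counted as -1
    hydrophobic = sum(1 for res in seq if res in HYDROPHOBIC)
    return {
        "length": len(seq),
        # Crude estimate: ignores histidine, the termini, and pH effects.
        "net_charge": positive - negative,
        "hydrophobic_fraction": hydrophobic / len(seq),
    }

# Hypothetical cationic peptide used only for illustration.
features = peptide_features("GKKLLKKLAEW")
print(features)
```

Descriptors of this kind are cheap to compute for millions of candidate sequences, which is one reason sequence-based activity prediction scales well to metagenome-sized datasets.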

Moreover, in recent years, some advancements have been made in AMP mining using AI-assisted methods combined with the aforementioned ecosystem-guided concept. Targeting the human body, a habitat rich in microbes and their untapped NPs, we developed an AMP mining pipeline in 2022 using three natural language processing neural network models, integrated with techniques such as cross-validation against metaproteomic data and correlation network analysis. This pipeline efficiently reduced the number of candidate AMPs identified from the human gut microbiome from over 2 × 10⁷ to approximately 200, with a positive rate of 83.8% in antibacterial activity testing, yielding 181 AMPs. Notably, highly active AMPs such as cAMP_1043 (35) also exhibited activity against clinically isolated MDR bacteria (Fig. 4).92 Subsequently, based on the high overlap between anticancer peptides (ACPs) and AMPs, Ma et al. searched for potential ACPs among the 2349 candidate AMPs obtained through metaproteomic cross-validation in this study. Relative abundance analysis of these peptides in the metagenomes of patients with CRC and healthy individuals identified a series of potential ACPs that were significantly enriched in healthy samples, including pACP2283 (36) and pACP1780 (37), which exhibited activity against multiple cancer cell lines and significantly reduced tumor sizes in athymic nude mice.93

In another recent study, AmPEP (an RF classifier) and SmORFinder (a tool that combines pHMMs and deep learning models) were employed to analyze a dataset comprising 444,054 predicted peptides annotated from HMP metagenomes. This analysis identified 323 AMPs detected by both models, with a positive rate of 70.5% for the antibacterial activity test of 78 synthesized AMPs. Notably, the activity of prevotellin-2 (38) derived from Prevotella copri is comparable to that of polymyxin B in vivo (Fig. 4).94
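Keeping only the peptides flagged by both predictors amounts to a set intersection over the two hit lists. The sketch below shows this consensus step under stated assumptions: the peptide identifiers, scores, and 0.5 thresholds are hypothetical and do not reproduce the actual outputs or cut-offs of AmPEP or SmORFinder.

```python
def consensus_hits(scores_a, scores_b, threshold_a=0.5, threshold_b=0.5):
    """Return the peptide IDs that both models score at or above their thresholds."""
    hits_a = {pid for pid, score in scores_a.items() if score >= threshold_a}
    hits_b = {pid for pid, score in scores_b.items() if score >= threshold_b}
    return hits_a & hits_b  # intersection: candidates supported by both models

# Hypothetical per-peptide scores from two independent predictors.
model_a = {"pep1": 0.91, "pep2": 0.40, "pep3": 0.77, "pep4": 0.55}
model_b = {"pep1": 0.88, "pep2": 0.93, "pep3": 0.12, "pep4": 0.61}

shared = consensus_hits(model_a, model_b)
print(sorted(shared))  # ['pep1', 'pep4']
```

Requiring agreement trades recall for precision: fewer candidates survive, but each one is supported by two models with different inductive biases, which is useful when every candidate must be chemically synthesized before testing.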

In addition to the human body, Chen et al. targeted Blattella germanica, a prevalent insect pest that hosts various harmful microbes. They developed a deep learning model named AMPidentifier and obtained AMP1 (39) from the gut symbiotic microbe Blattabacterium cuenoti. AMP1 demonstrated broad-spectrum antibacterial activity and a potent wound-healing effect in mice.95

Another study further expanded the scope of mining, covering different habitats such as the human gut, mammalian gut, plants, and sediments. Researchers used machine learning to predict AMPs in a vast dataset containing 63,410 metagenomes and 87,920 prokaryotic genomes. This large-scale analysis identified nearly one million prokaryotic AMP sequences. Of 100 AMPs tested, 79 demonstrated activity, with 63 targeting pathogens (lachnospirin-1 (40) and enterococcin-1 (41) as examples) (Fig. 4).96

These latest developments indicate that introducing AI techniques into microbial NP mining can significantly improve research efficiency and help discover novel compounds with specific activities. In particular, methods such as DeepBGC and PRISM4, as well as the work of Walker et al.,88 have explored the possibility of using AI-assisted approaches to predict NP activities by leveraging the associations between known BGC products and their activities, and have extended this approach beyond AMPs to other types of NPs. If the relevant AI methods continue to be optimized, and if information about BGCs, their product structures, and their activities in the literature and databases is further integrated to improve the quality and quantity of training data, AI-assisted methods are expected to greatly boost NP mining efficiency.

3. Conclusions and prospects

Microbial NP discovery continues to play an important role in human health. With further in-depth research and the rapid development of tools such as various omics technologies and AI, integrating structure and activity predictions in microbial NP mining is becoming increasingly feasible. Although the above methods have shown some success in discovering microbial NPs for which both activity and structural novelty were predicted, further optimization of these methods and the development of new strategies are necessary.

Firstly, as mentioned above and emphasized in some reviews in recent years,4,78 the acquisition and standardization of high-quality data are crucial for further improving NP mining methods. Currently, the establishment of large-scale and standardized NP databases remains challenging. Therefore, increasing the utility of existing data is one of the more feasible approaches. The progress in the standardization and interoperability of mass spectrometry data in recent years is instructive. Researchers worldwide have developed tools such as ReDU (Reanalysis of Data User Interface)97 and MassQL (Mass Spectrometry Query Language),98 which enable the search and analysis of data from different sources within a unified framework across public databases such as GNPS/MassIVE, Metabolomics Workbench, and MetaboLights. This has greatly enhanced the capabilities of researchers in analyzing and using mass spectrometry data. Improving the data interoperability between existing NP databases, such as MIBiG99 (Minimum Information about a Biosynthetic Gene cluster), the NPASS (natural product activity and species source) database,100 the Natural Products Atlas,101 and the SMC (Secondary Metabolism Collaboratory),102 would be a meaningful step towards better NP data utilization. The stable accumulation of these database entries will continue to serve as a valuable source of data.
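At its simplest, interoperability between a BGC-centred resource and an activity-centred resource means joining their records on a shared key. The sketch below links hypothetical records by a normalized compound name. All field names, identifiers, and records are assumptions for illustration and do not reflect the actual schemas of MIBiG, NPASS, or any other database. In practice, a structure-derived identifier such as an InChIKey would be a more robust join key than a name.

```python
def normalize_name(name):
    """Lower-case a compound name and collapse repeated whitespace."""
    return " ".join(name.lower().split())

def link_records(bgc_records, activity_records):
    """Attach reported activities to each BGC record via the compound name."""
    activities_by_compound = {}
    for rec in activity_records:
        key = normalize_name(rec["compound"])
        activities_by_compound.setdefault(key, set()).add(rec["activity"])
    linked = []
    for rec in bgc_records:
        key = normalize_name(rec["compound"])
        linked.append({
            "bgc_id": rec["bgc_id"],
            "compound": rec["compound"],
            "activities": sorted(activities_by_compound.get(key, set())),
        })
    return linked

# Hypothetical records from two differently formatted sources.
bgc_records = [
    {"bgc_id": "BGC_A", "compound": "Compound X"},
    {"bgc_id": "BGC_B", "compound": "Compound Y"},
]
activity_records = [
    {"compound": "compound  x", "activity": "antibacterial"},
    {"compound": "Compound X", "activity": "antifungal"},
]

linked = link_records(bgc_records, activity_records)
print(linked)
```

The joined table is exactly the kind of BGC, structure, and activity triple that activity-prediction models need for training, so even a modest gain in linkage coverage translates directly into more usable training data.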

Secondly, it would be beneficial if AI could be further applied and developed in NP research. This may lead to breakthroughs in steps of the NP discovery process that currently lack good solutions, such as employing suitable molecular representations to characterize structurally complex NPs or utilizing appropriate models to make efficient use of the existing data. AI-assisted mining has shown significant results in identifying compounds with desired activities when high-quality data are combined with suitable AI models. For example, Collins et al. trained neural network models on a large amount of standardized antibacterial activity data collected against specific microorganisms, together with the structural information of the compounds. From known compound libraries, these models identified halicin and abaucin,103,104 both of which exhibit excellent activity against A. baumannii, as well as multiple bioactive compounds that effectively inhibit MRSA,105 thereby achieving a structure–activity correspondence. These groundbreaking successes have demonstrated the necessity of using standardized structure–activity paired data for AI model training and the great potential of AI-assisted methods in achieving high-throughput virtual screening. Following a similar pattern, the success in mining new antibiotics from small-molecule compound libraries can be expected to extend to a broader range of NP research.

Furthermore, with the advancement and increasing application of omics technologies and the establishment of standards for multi-omics research, we can reveal the biosynthetic potential of microorganisms more comprehensively. For instance, the improved accuracy of strain-level analysis of omics data enables the prediction of metabolic potential and functional association at strain resolution; metabolomics results, such as mass spectrometry data, can be effectively analyzed through computer-aided compound annotation, elucidation, and dereplication. These developments will facilitate the efficient discovery and investigation of novel NPs and the identification of NPs with specific activities.

The integration of cutting-edge technologies, such as protein structure and function prediction, molecular docking, and molecular dynamics simulation, will significantly enhance our ability to evaluate the potential activities of NPs based on their structures and to predict their potential binding patterns with candidate targets. Building on this, combining various mining methods, as demonstrated by the successful integration of AI-assisted methods with ecosystem-guided mining in AMP discovery, will empower researchers to discover NPs more efficiently using strategies guided by both structure and activity, propelling microbial NP research into a new stage.

4. Data availability

No primary research results have been included, and no new data were generated or analysed as part of this highlight article.

5. Conflicts of interest

There are no conflicts to declare.

6. Acknowledgements

This work was supported by the China NSFC (32025002, 32400090) and the Postdoctoral Fellowship Program of CPSF (GZB20230819 to Yuwei Zhang). Figures were created using BioRender. Zhang, Y. (2025) https://BioRender.com/x56e190.

7. References

  1. D. J. Newman and G. M. Cragg, J. Nat. Prod., 2020, 83, 770–803.
  2. J. W. H. Li and J. C. Vederas, Science, 2009, 325, 161–165.
  3. M. I. Hutchings, A. W. Truman and B. Wilkinson, Curr. Opin. Microbiol., 2019, 51, 72–80.
  4. M. W. Mullowney, K. R. Duncan, S. S. Elsayed, N. Garg, J. J. J. van der Hooft, N. I. Martin, D. Meijer, B. R. Terlouw, F. Biermann, K. Blin, J. Durairaj, M. G. González, E. J. N. Helfrich, F. Huber, S. Leopold-Messer, K. Rajan, T. de Rond, J. A. van Santen, M. Sorokina, M. J. Balunas, M. A. Beniddir, D. A. van Bergeijk, L. M. Carroll, C. M. Clark, D. A. Clevert, C. A. Dejong, C. Du, S. Ferrinho, F. Grisoni, A. Hofstetter, W. Jespers, O. V. Kalinina, S. A. Kautsar, H. Kim, T. F. Leao, J. Masschelein, E. R. Rees, R. Reher, D. Reker, P. Schwaller, M. Segler, M. A. Skinnider, A. S. Walker, E. L. Willighagen, B. Zdrazil, N. Ziemert, R. J. M. Goss, P. Guyomard, A. Volkamer, W. H. Gerwick, H. U. Kim, R. Müller, G. P. van Wezel, G. J. P. van Westen, A. K. H. Hirsch, R. G. Linington, S. L. Robinson and M. H. Medema, Nat. Rev. Drug Discovery, 2023, 22, 895–916.
  5. N. Ziemert, M. Alanjary and T. Weber, Nat. Prod. Rep., 2016, 33, 988–1005.
  6. A. Milshteyn, D. A. Colosimo and S. F. Brady, Cell Host Microbe, 2018, 23, 725–736.
  7. K. D. Bauman, K. S. Butler, B. S. Moore and J. R. Chekan, Nat. Prod. Rep., 2021, 38, 2100–2129.
  8. F. Hemmerling and J. Piel, Nat. Rev. Drug Discovery, 2022, 21, 359–378.
  9. K. Scherlach and C. Hertweck, Nat. Commun., 2021, 12, 3864.
  10. F. Bucar, A. Wube and M. Schmid, Nat. Prod. Rep., 2013, 30, 525–545.
  11. L. Katz and R. H. Baltz, J. Ind. Microbiol. Biotechnol., 2016, 43, 155–176.
  12. S. A. Waksman and H. A. Lechevalier, Science, 1949, 109, 305–307.
  13. K. C. Nicolaou and S. Rigol, J. Antibiot., 2018, 71, 153–184.
  14. A. Di Marco, G. Cassinelli and F. Arcamone, Cancer Treat. Rep., 1981, 65, 3–8.
  15. W. H. Lewis, G. Tahon, P. Geesink, D. Z. Sousa and T. J. Ettema, Nat. Rev. Microbiol., 2021, 19, 225–240.
  16. A. R. Carroll, B. R. Copp, R. A. Davis, R. A. Keyzers and M. R. Prinsep, Nat. Prod. Rep., 2023, 40, 275–325.
  17. S. Lautru, R. J. Deeth, L. M. Bailey and G. L. Challis, Nat. Chem. Biol., 2005, 1, 265–269.
  18. J. P. Gomez-Escribano, L. Song, D. J. Fox, V. Yeo, M. J. Bibb and G. L. Challis, Chem. Sci., 2012, 3, 2716–2720.
  19. S. Bergmann, J. Schümann, K. Scherlach, C. Lange, A. A. Brakhage and C. Hertweck, Nat. Chem. Biol., 2007, 3, 213–217.
  20. J. Franke, K. Ishida and C. Hertweck, Angew. Chem., 2012, 124, 11779–11783.
  21. S. Nayfach, S. Roux, R. Seshadri, D. Udwary, N. Varghese, F. Schulz, D. Y. Wu, D. Paez-Espino, I. M. Chen, M. Huntemann, K. Palaniappan, J. Ladau, S. Mukherjee, T. B. K. Reddy, T. Nielsen, E. Kirton, J. P. Faria, J. N. Edirisinghe, C. S. Henry, S. P. Jungbluth, D. Chivian, P. Dehal, E. M. Wood-Charlson, A. P. Arkin, S. G. Tringe, A. Visel, IMG/M Data Consortium, T. Woyke, N. J. Mouncey, N. N. Ivanova, N. C. Kyrpides and E. A. Eloe-Fadrosh, Nat. Biotechnol., 2021, 39, 499–509.
  22. Integrative HMP (iHMP) Research Network Consortium, L. M. Proctor, H. H. Creasy, J. M. Fettweis, J. Lloyd-Price, A. Mahurkar, W. Y. Zhou, G. A. Buck, M. P. Snyder, J. F. Strauss, G. M. Weinstock, O. White and C. Huttenhower, Nature, 2019, 569, 641–648.
  23. Q. L. Yan, S. H. Li, Q. S. Yan, X. K. Huo, C. Wang, X. F. Wang, Y. Sun, W. Y. Zhao, Z. L. Yu, Y. Zhang, R. C. Guo, Q. B. Lv, X. He, C. L. Yao, Z. M. Li, F. Chen, Q. R. Ji, A. Q. Zhang, H. Jin, G. Y. Wang, X. Y. Feng, L. Feng, F. Wu, J. Ning, S. Deng, Y. An, D. A. Guo, F. M. Martin and X. C. Ma, Cell, 2024, 187, 2969–2989e24.
  24. J. W. Chen, Y. Y. Jia, Y. Sun, K. Liu, C. H. Zhou, C. Liu, D. H. Li, G. L. Liu, C. S. Zhang, T. Yang, L. Huang, Y. Y. Zhuang, D. Z. Wang, D. Y. Xu, Q. L. Zhong, Y. Guo, A. D. Li, I. Seim, L. Jiang, L. S. Wang, S. M. Y. Lee, Y. J. Liu, D. T. Wang, G. Q. Zhang, S. S. Liu, X. F. Wei, Z. Yue, S. M. Zheng, X. C. Shen, S. Wang, C. Qi, J. Chen, C. Ye, F. Zhao, J. Wang, J. Fan, B. T. Li, J. H. Sun, X. D. Jia, Z. Y. Xia, H. Zhang, J. N. Liu, Y. Zheng, X. Liu, J. Wang, H. M. Yang, K. Kristiansen, X. Xu, T. Mock, S. Y. Li, W. W. Zhang and G. Y. Fan, Nature, 2024, 633, 371–379.
  25. Y. Q. Liu, M. K. Ji, T. Yu, J. Zaugg, A. M. Anesio, Z. H. Zhang, S. N. Hu, P. Hugenholtz, K. S. Liu, P. F. Liu, Y. Y. Chen, Y. F. Luo and T. D. Yao, Nat. Biotechnol., 2022, 40, 1341–1348.
  26. A. G. Atanasov, S. B. Zotchev, V. M. Dirsch, C. T. Supuran and I. N. P. S. Taskforce, Nat. Rev. Drug Discovery, 2021, 20, 200–216.
  27. C. M. F. Ancajas, A. S. Oyedele, C. M. Butt and A. S. Walker, Nat. Prod. Rep., 2024, 41, 1543–1578.
  28. S. G. Van Lanen and B. Shen, Curr. Top. Med. Chem., 2008, 8, 448–459.
  29. C. Gui, E. Kalkreuter, L. Lauterbach, D. Yang and B. Shen, Nat. Chem. Biol., 2024, 20, 1–10.
  30. E. J. Han and M. Seyedsayamdost, Curr. Opin. Chem. Biol., 2024, 81, 102481.
  31. X. H. Yan, H. M. Ge, T. T. Huang, Hindra, D. Yang, Q. H. Teng, I. Crnovcic, X. L. Li, J. D. Rudolf, J. R. Lohman, Y. Gansemans, X. C. Zhu, Y. Huang, L. X. Zhao, Y. Jiang, F. Van Nieuwerburgh, C. Rader, Y. W. Duan and B. Shen, mBio, 2016, 7, e02104.
  32. M. Schorn, J. Zettler, J. P. Noel, P. C. Dorrestein, B. S. Moore and L. Kaysser, ACS Chem. Biol., 2014, 9, 301–309.
  33. J. G. Owen, Z. Charlop-Powers, A. G. Smith, M. A. Ternei, P. Y. Calle, B. V. B. Reddy, D. Montiel and S. F. Brady, Proc. Natl. Acad. Sci. U. S. A., 2015, 112, 4221–4226.
  34. P. Zhou, Y. She, N. Dong, P. Li, H. B. He, A. Borio, Q. C. Wu, S. Lu, X. J. Ding, Y. Cao, Y. Xu, W. Q. Gao, M. Q. Dong, J. J. Ding, D. C. Wang, A. Zamyatina and F. Shao, Nature, 2018, 561, 122–126.
  35. Y. Tang, X. Y. Tian, M. Wang, Y. L. Cui, Y. She, Z. X. Shi, J. Q. Liu, H. J. Mao, L. L. Liu, C. Li, Y. W. Zhang, P. W. Li, Y. Ma, J. Y. Sun, Q. Du, J. Li, J. Wang, D. F. Li, B. Wu, F. Shao and Y. H. Chen, Science, 2024, 385, 678–684.
  36. H. Y. Wang, L. J. Wang, D. Li, K. Q. Fan, Y. Z. Yang, H. L. Cao, J. N. Sun, J. W. Ren, Y. Liu, L. J. Xiang, W. S. Li, M. H. Pan, H. T. Hu, Y. H. Chen, Z. R. Xu, Y. Huang, W. S. Wang and G. H. Pan, J. Am. Chem. Soc., 2025, 147, 15100–15114.
  37. Q. S. Deng, Y. C. Li, W. Y. He, T. Chen, N. Liu, L. M. Ma, Z. X. Qiu, Z. Shang and Z. Q. Wang, Nature, 2025, 640, 743–751.
  38. Z. Q. Wang, B. Koirala, Y. Hernandez, M. Zimmerman, S. Park, D. S. Perlin and S. F. Brady, Nature, 2022, 601, 606–611.
  39. L. Zhong, X. T. Diao, N. Zhang, F. W. Li, H. B. Zhou, H. N. Chen, X. P. Bai, X. T. Ren, Y. M. Zhang, D. L. Wu and X. Y. Bian, Nat. Commun., 2021, 12, 296.
  40. Z. Q. Wang, B. Koirala, Y. Hernandez, M. Zimmerman and S. F. Brady, Science, 2022, 376, 991–996.
  41. K. H. Almabruk, L. K. Dinh and B. Philmus, ACS Chem. Biol., 2018, 13, 1426–1437.
  42. Y. Yan, N. Liu and Y. Tang, Nat. Prod. Rep., 2020, 37, 879–892.
  43. Y. L. Zhang, J. Bai, L. Zhang, C. Zhang, B. Y. Liu and Y. C. Hu, Angew. Chem., Int. Ed., 2021, 60, 6639–6645.
  44. T. B. Regueira, K. R. Kildegaard, B. G. Hansen, U. H. Mortensen, C. Hertweck and J. Nielsen, Appl. Environ. Microbiol., 2011, 77, 3035–3043.
  45. Z.-X. Wang, S.-M. Li and L. Heide, Antimicrob. Agents Chemother., 2000, 44, 3040–3048.
  46. B. F. Zhong, J. Wan, C. H. Shang, J. J. Wen, Y. J. Wang, J. Bai, S. Cen and Y. C. Hu, Acta Pharm. Sin. B, 2022, 12, 4193–4203.
  47. T. A. Scott, S. F. Batey, P. Wiencek, G. Chandra, S. Alt, C. S. Francklyn and B. Wilkinson, ACS Chem. Biol., 2019, 14, 2663–2671.
  48. A. Kling, P. Lukat, D. V. Almeida, A. Bauer, E. Fontaine, S. Sordello, N. Zaburannyi, J. Herrmann, S. C. Wenzel, C. Konig, N. C. Ammerman, M. B. Barrio, K. Borchers, F. Bordon-Pallier, M. Bronstrup, G. Courtemanche, M. Gerlitz, M. Geslin, P. Hammann, D. W. Heinz, H. Hoffmann, S. Klieber, M. Kohlmann, M. Kurz, C. Lair, H. Matter, E. Nuermberger, S. Tyagi, L. Fraisse, J. H. Grosset, S. Lagrange and R. Muller, Science, 2015, 348, 1106–1112.
  49. K. L. Dunbar, B. Perlatti, N. Liu, A. Cornelius, D. Mummau, Y.-M. Chiang, L. Hon, M. Nimavat, J. Pallas and S. Kordes, Proc. Natl. Acad. Sci. U. S. A., 2023, 120, e2310522120.
  50. X. Y. Tang, J. Li, N. Millán-Aguiñaga, J. J. Zhang, E. C. O'Neill, J. A. Ugalde, P. R. Jensen, S. M. Mantovani and B. S. Moore, ACS Chem. Biol., 2015, 10, 2841–2849.
  51. F. Panter, D. Krug, S. Baumann and R. Müller, Chem. Sci., 2018, 9, 4898–4908.
  52. Y. Yan, Q. Liu, X. Zang, S. Yuan, U. Bat-Erdene, C. Nguyen, J. Gan, J. Zhou, S. E. Jacobsen and Y. Tang, Nature, 2018, 559, 415–418.
  53. E. J. Culp, D. Sychantha, C. Hobson, A. C. Pawlowski, G. Prehna and G. D. Wright, Nat. Microbiol., 2022, 7, 451–462.
  54. V. Yarlagadda, R. Medina, T. A. Johnson, K. P. Koteva, G. Cox, M. N. Thaker and G. D. Wright, ACS Infect. Dis., 2020, 6, 3163–3173.
  55. K. L. Dunbar, B. Perlatti, N. Liu, A. Cornelius, D. Mummau, Y. M. Chiang, L. Hon, M. Nimavat, J. Pallas, S. Kordes, H. L. Ng and C. J. B. Harvey, Proc. Natl. Acad. Sci. U. S. A., 2023, 120, e2310522120.
  56. M. Alanjary, B. Kronmiller, M. Adamek, K. Blin, T. Weber, D. Huson, B. Philmus and N. Ziemert, Nucleic Acids Res., 2017, 45, W42–W48.
  57. M. D. Mungan, M. Alanjary, K. Blin, T. Weber, M. H. Medema and N. Ziemert, Nucleic Acids Res., 2020, 48, W546–W552.
  58. M. D. Mungan, K. Blin and N. Ziemert, Nucleic Acids Res., 2022, 50, D736–D740.
  59. I. Kjærbølling, T. Vesth and M. R. Andersen, mSystems, 2019, 4, e00085, DOI:10.1128/msystems.00085-19.
  60. T. M. Yilmaz, M. D. Mungan, A. Berasategui and N. Ziemert, Nucleic Acids Res., 2023, 51, W191–W197.
  61. Y. J. Yuan, C. S. Huang, N. Singh, G. H. Xun and H. M. Zhao, Cell Syst., 2025, 16, 101237.
  62. Y. Y. Wang, Y. N. Shi, H. Xiang and Y. M. Shi, Nat. Prod. Rep., 2024, 41, 1630–1651.
  63. Y. Imai, K. J. Meyer, A. Iinishi, Q. Favre-Godal, R. Green, S. Manuse, M. Caboni, M. Mori, S. Niles, M. Ghiglieri, C. Honrao, X. Ma, J. J. Guo, A. Makriyannis, L. Linares-Otoya, N. Böhringer, Z. G. Wuisan, H. Kaur, R. Wu, A. Mateus, A. Typas, M. M. Savitski, J. L. Espinoza, A. O'Rourke, K. E. Nelson, S. Hiller, N. Noinaj, T. F. Schäberle, A. D'Onofrio and K. Lewis, Nature, 2019, 576, 459–464.
  64. S. Götze, R. Vij, K. Burow, N. Thome, L. Urbat, N. Schlosser, S. Pflanze, R. Müller, V. G. Hänsch, K. Schlabach, L. Fazlikhani, G. Walther, H. M. Dahse, L. Regestein, S. Brunke, B. Hube, C. Hertweck, P. Franken and P. Stallforth, J. Am. Chem. Soc., 2023, 145, 2342–2353.
  65. S. D. Xu, Y. X. Liu, T. Cernava, H. K. Wang, Y. Q. Zhou, T. Xia, S. G. Cao, G. Berg, X. X. Shen, Z. Y. Wen, C. S. Li, B. Y. Qu, H. F. Ruan, Y. R. Chai, X. P. Zhou, Z. H. Ma, Y. Shi, Y. L. Yu, Y. Bai and Y. Chen, Nat. Microbiol., 2022, 7, 831–843.
  66. B. Krismer, M. Liebeke, D. Janek, M. Nega, M. Rautenberg, G. Hornig, C. Unger, C. Weidenmaier, M. Lalk and A. Peschel, PLoS Pathog., 2014, 10, e1003862.
  67. A. Zipperer, M. C. Konnerth, C. Laux, A. Berscheid, D. Janek, C. Weidenmaier, M. Burian, N. A. Schilling, C. Slavetinsky, M. Marschal, M. Willmann, H. Kalbacher, B. Schittek, H. Brötz-Oesterhelt, S. Grond, A. Peschel and B. Krismer, Nature, 2016, 535, 511–516.
  68. K. Bitschar, B. Sauer, J. Focken, H. Dehmer, S. Moos, M. Konnerth, N. A. Schilling, S. Grond, H. Kalbacher, F. C. Kurschus, F. Götz, B. Krismer, A. Peschel and B. Schittek, Nat. Commun., 2019, 10, 2730.
  69. B. O. T. Salazar, T. Dema, N. A. Schilling, D. Janek, J. Bornikoel, A. Berscheid, A. M. A. Elsherbini, S. Krauss, S. J. Jaag, M. Lämmerhofer, M. Li, N. Alqahtani, M. J. Horsburgh, T. Weber, J. M. Beltrán-Beleña, H. Brötz-Oesterhelt, S. Grond, B. Krismer and A. Peschel, Nat. Microbiol., 2024, 9, 200–213.
  70. M. Z. Xue, C. S. Kim, A. R. Healy, K. M. Wernke, Z. X. Wang, M. C. Frischling, E. E. Shine, W. W. Wang, S. B. Herzon and J. M. Crawford, Science, 2019, 365, eaax2685.
  71. M. R. Wilson, Y. Jiang, P. W. Villalta, A. Stornetta, P. D. Boudreau, A. Carrá, C. A. Brennan, E. Chun, L. Ngo, L. D. Samson, B. P. Engelward, W. S. Garrett, S. Balbo and E. P. Balskus, Science, 2019, 363, eaar7785.
  72. P. J. Dziubańska-Kusibab, H. Berger, F. Battistini, B. A. M. Bouwman, A. Iftekhar, R. Katainen, T. Cajuso, N. Crosetto, M. Orozco, L. A. Aaltonen and T. F. Meyer, Nat. Med., 2020, 26, 1063–1069.
  73. G. Schneditz, J. Rentner, S. Roier, J. Pletz, K. A. Herzog, R. Bücker, H. Troeger, S. Schild, H. Weber, R. Breinbauer, G. Gorkiewicz, C. Högenauer and E. L. Zechner, Proc. Natl. Acad. Sci. U. S. A., 2014, 111, 13181–13186.
  74. K. Unterhauser, L. Pöltl, G. Schneditz, S. Kienesberger, R. A. Glabonjat, M. Kitsera, J. Pletz, F. Josa-Prado, E. Dornisch, C. Lembacher-Fadum, S. Roier, G. Gorkiewicz, D. Lucena, I. Barasoain, W. Kroutil, M. Wiedner, J. I. Loizou, R. Breinbauer, J. F. Díaz, S. Schild, C. Högenauer and E. L. Zechner, Proc. Natl. Acad. Sci. U. S. A., 2019, 116, 3774–3783.
  75. L. Pöltl, M. Kitsera, S. Raffl, S. Schild, A. Cosic, S. Kienesberger, K. Unterhauser, G. Raber, C. Lembacher-Fadum, R. Breinbauer, G. Gorkiewicz, C. Sebastian, G. Hoefler and E. L. Zechner, Cell Rep., 2023, 42, 112199.
  76. V. J. Sahayasheela, M. B. Lankadasari, V. M. Dan, S. G. Dastager, G. N. Pandian and H. Sugiyama, Nat. Prod. Rep., 2022, 39, 2215–2230.
  77. Y. J. Yuan, C. Y. Shi and H. M. Zhao, ACS Synth. Biol., 2023, 12, 2650–2662.
  78. A. Gangwal and A. Lavecchia, J. Med. Chem., 2025, 68, 3948–3969.
  79. C. Rios-Martinez, N. Bhattacharya, A. P. Amini, L. Crawford and K. K. Yang, PLoS Comput. Biol., 2023, 19, e1011162.
  80. Q. L. Lai, S. Yao, Y. G. Zha, H. H. Zhang, H. B. Zhang, Y. Ye, Y. H. Zhang, H. Bai and K. Ning, Nucleic Acids Res., 2025, 53, gkaf305.
  81. E. L. C. de los Santos, Sci. Rep., 2019, 9, 13406.
  82. N. J. Merwin, W. K. Mousa, C. A. Dejong, M. A. Skinnider, M. J. Cannon, H. X. Li, K. Dial, M. Gunabalasingam, C. Johnston and N. A. Magarvey, Proc. Natl. Acad. Sci. U. S. A., 2020, 117, 371–380.
  83. A. M. Kloosterman, P. Cimermancic, S. S. Elsayed, C. Du, M. Hadjithomas, M. S. Donia, M. A. Fischbach, G. P. van Wezel and M. H. Medema, PLoS Biol., 2020, 18, e3001026.
  84. P. Agrawal, S. Amir, Deepak, D. Barua and D. Mohanty, J. Mol. Biol., 2021, 433, 166887.
  85. Y. Gao, Z. Zhong, D. W. Zhang, J. Zhang and Y. X. Li, Microbiome, 2024, 12, 94.
  86. G. D. Hannigan, D. Prihoda, A. Palicka, J. Soukup, O. Klempir, L. Rampula, J. Durcak, M. Wurst, J. Kotowski, D. Chang, R. Wang, G. Piizzi, G. Temesi, D. J. Hazuda, C. H. Woelk and D. A. Bitton, Nucleic Acids Res., 2019, 47, e110.
  87. M. A. Skinnider, C. W. Johnston, M. Gunabalasingam, N. J. Merwin, A. M. Kieliszek, R. J. MacLellan, H. Li, M. R. M. Ranieri, A. L. H. Webster, M. P. T. Cao, A. Pfeifle, N. Spencer, Q. H. To, D. P. Wallace, C. A. Dejong and N. A. Magarvey, Nat. Commun., 2020, 11, 6058.
  88. A. S. Walker and J. Clardy, J. Chem. Inf. Model., 2021, 61, 2560–2571.
  89. S. Lata, N. K. Mishra and G. P. Raghava, BMC Bioinf., 2010, 11, 1–7.
  90. P. Bhadra, J. Yan, J. Li, S. Fong and S. W. Siu, Sci. Rep., 2018, 8, 1697.
  91. D. Veltri, U. Kamath and A. Shehu, Bioinformatics, 2018, 34, 2740–2747.
  92. Y. Ma, Z. Y. Guo, B. B. Xia, Y. W. Zhang, X. L. Liu, Y. Yu, N. Tang, X. M. Tong, M. Wang, X. Ye, J. Feng, Y. H. Chen and J. Wang, Nat. Biotechnol., 2022, 40, 921–931.
  93. Y. Ma, X. L. Liu, X. Zhang, Y. Yu, Y. J. Li, M. S. Song and J. Wang, Adv. Sci., 2023, 10, 2300107.
  94. M. D. T. Torres, E. F. Brooks, A. Cesaro, H. Sberro, M. O. Gill, C. Nicolaou, A. S. Bhatt and C. de la Fuente-Nunez, Cell, 2024, 187, 5453–5467.
  95. S. Z. Chen, H. T. Qi, X. Z. Zhu, T. X. Liu, Y. T. Fan, Q. Su, Q. Y. Gong, C. Z. Jia and T. Liu, Microbiome, 2024, 12, 272.
  96. C. D. Santos-Júnior, M. D. T. Torres, Y. Q. Duan, Á. Rodríguez del Río, T. S. B. Schmidt, H. Chong, A. Fullam, M. Kuhn, C. K. Zhu, A. Houseman, J. Somborski, A. Vines, X. M. Zhao, P. Bork, J. Huerta-Cepas, C. de la Fuente-Nunez and L. P. Coelho, Cell, 2024, 187, 3761–3778.
  97. A. K. Jarmusch, M. X. Wang, C. M. Aceves, R. S. Advani, S. Aguirre, A. A. Aksenov, G. Aleti, A. T. Aron, A. Bauermeister, S. Bolleddu, A. Bouslimani, A. M. C. Rodriguez, R. Chaar, R. Coras, E. O. Elijah, M. Ernst, J. M. Gauglitz, E. C. Gentry, M. Husband, S. A. Jarmusch, K. Jones, Z. Kamenik, A. Le Gouellec, A. Lu, L. I. Mccall, K. L. McPhail, M. J. Meehan, A. Melnik, R. C. Menezes, Y. A. M. Giraldo, N. H. Nguyen, L. F. Nothias, M. Nothias-Esposito, M. Panitchpakdi, D. Petras, R. A. Quinn, N. Sikora, J. J. J. van der Hooft, F. Vargas, A. Vrbanac, K. C. Weldon, R. Knight, N. Bandeira and P. C. Dorrestein, Nat. Methods, 2020, 17, 901–904.
  98. T. Damiani, A. K. Jarmusch, A. T. Aron, D. Petras, V. V. Phelan, H. N. Zhao, W. Bittremieux, D. D. Acharya, M. M. A. Ahmed, A. Bauermeister, M. J. Bertin, P. D. Boudreau, R. M. Borges, B. P. Bowen, C. J. Brown, F. O. Chagas, K. D. Clevenger, M. S. P. Correia, W. J. Crandall, M. Crüsemann, E. Fahy, O. Fiehn, N. Garg, W. H. Gerwick, J. R. Gilbert, D. Globisch, P. W. P. Gomes, S. Heuckeroth, C. A. James, S. A. Jarmusch, S. A. Kakhkhorov, K. B. Kang, N. Kessler, R. D. Kersten, H. Kim, R. D. Kirk, O. Kohlbacher, E. E. Kontou, K. Liu, I. Lizama-Chamu, G. T. Luu, T. Luzzatto Knaan, H. Mannochio-Russo, M. T. Marty, Y. Matsuzawa, A. C. McAvoy, L.-I. McCall, O. G. Mohamed, O. Nahor, H. Neuweger, T. H. J. Niedermeyer, K. Nishida, T. R. Northen, K. E. Overdahl, J. Rainer, R. Reher, E. Rodriguez, T. T. Sachsenberg, L. M. Sanchez, R. Schmid, C. Stevens, S. Subramaniam, Z. Tian, A. Tripathi, H. Tsugawa, J. J. J. van der Hooft, A. Vicini, A. Walter, T. Weber, Q. Xiong, T. Xu, T. Pluskal, P. C. Dorrestein and M. Wang, Nat. Methods, 2025, 22, 1247–1254, DOI:10.1038/s41592-025-02660-z.
  99. M. M. Zdouc, K. Blin, N. L. L. Louwen, J. Navarro, C. Loureiro, C. D. Bader, C. B. Bailey, L. Barra, T. J. Booth, K. A. J. Bozhüyük, J. D. D. Cediel-Becerra, Z. Charlop-Powers, M. G. Chevrette, Y. H. Chooi, P. M. D'Agostino, T. de Rond, E. Del Pup, K. R. Duncan, W. J. Gu, N. Hanif, E. J. N. Helfrich, M. Jenner, Y. Katsuyama, A. Korenskaia, D. Krug, V. Libis, G. A. Lund, S. Mantri, K. D. Morgan, C. Owen, C. S. Phan, B. Philmus, Z. L. Reitz, S. L. Robinson, K. S. Singh, R. Teufel, Y. J. Tong, F. Tugizimana, D. Ulanova, J. M. Winter, C. Aguilar, D. Y. Akiyama, S. A. A. Al-Salihi, M. Alanjary, F. Alberti, G. Aleti, S. A. Alharthi, M. Y. A. Rojo, A. A. Arishi, H. E. Augustijn, N. E. Avalon, J. A. Avelar-Rivas, K. K. Axt, H. B. Barbieri, J. C. J. Barbosa, L. G. B. Segato, S. E. Barrett, M. Baunach, C. Beemelmanns, D. Beqaj, T. Berger, J. Bernaldo-Agüero, S. M. Bettenbühl, V. A. Bielinski, F. Biermann, R. M. Borges, R. Borriss, M. Breitenbach, K. M. Bretscher, M. W. Brigham, L. Buedenbender, B. W. Bulcock, C. Cano-Prieto, J. Capela, V. J. Carrion, R. S. Carter, R. Castelo-Branco, G. Castro-Falcón, F. O. Chagas, E. Charria-Girón, A. A. Chaudhri, V. Chaudhry, H. Choi, Y. Choi, R. Choupannejad, J. Chromy, M. S. C. Donahey, J. Collemare, J. A. Connolly, K. E. Creamer, M. Crüsemann, A. A. Cruz, A. Cumsille, J. F. Dallery, L. C. Damas-Ramos, T. Damiani, M. de Kruijff, B. D. Martín, G. Della Sala, J. Dillen, D. T. Doering, S. R. Dommaraju, S. Durusu, S. Egbert, M. Ellerhorst, B. Faussurier, A. Fetter, M. Feuermann, D. P. Fewer, J. Foldi, A. Frediansyah, E. A. Garza, A. Gavriilidou, A. Gentile, J. Gerke, H. Gerstmans, J. P. Gomez-Escribano, L. A. González-Salazar, N. E. Grayson, C. Greco, J. E. G. Gomez, S. Guerra, S. G. Flores, A. Gurevich, K. Gutiérrez-García, L. Hart, K. Haslinger, B. B. He, T. Hebra, J. L. Hemmann, H. Hindra, L. Höing, D. C. Holland, J. E. Holme, T. Horch, P. Hrab, J. Hu, T. H. Huynh, J. Y. Hwang, R. Iacovelli, D. Iftime, M. Iorio, S. Jayachandran, E. Jeong, J. Y. Jing, J. J. Jung, Y. Kakumu, E. Kalkreuter, K. B. Kang, S. Kang, W. Kim, G. J. Kim, H. Kim, H. U. Kim, M. Klapper, R. A. Koetsier, C. Kollten, A. T. Kovács, Y. Kriukova, N. Kubach, A. M. Kunjapur, A. K. Kushnareva, A. Kust, J. Lamber, M. Larralde, N. J. Larsen, A. P. Launay, N. T. H. Le, S. Lebeer, B. T. Lee, K. Lee, K. L. Lev, S. M. Li, Y. X. Li, C. Licona-Cassani, A. Lien, J. Liu, J. A. V. Lopez, N. V. Machushynets, M. I. Macias, T. Mahmud, M. Maleckis, A. M. Martinez-Martinez, Y. Mast, M. F. Maximo, C. M. McBride, R. M. McLellan, K. M. Bhatt, C. Melkonian, A. Merrild, M. Metsä-Ketelä, D. A. Mitchell, A. V. Müller, G. S. Nguyen, H. T. Nguyen, T. H. J. Niedermeyer, J. H. O'Hare, A. Ossowicki, B. O. Ostash, H. Otani, L. Padva, S. Paliyal, X. Y. Pan, M. Panghal, D. S. Parade, J. Park, J. Parra, M. P. Rubio, H. T. Pham, S. J. Pidot, J. Piel, B. Pourmohsenin, M. Rakhmanov, S. Ramesh, M. H. Rasmussen, A. Rego, R. Reher, A. J. Rice, A. Rigolet, A. Romero-Otero, L. R. Rosas-Becerra, P. Y. Rosiles, A. Rutz, B. Ryu, L. A. Sahadeo, M. Saldanha, L. Salvi, E. Sánchez-Carvajal, C. Santos-Medellin, N. Sbaraini, S. M. Schoellhorn, C. Schumm, L. Sehnal, N. Selem, A. D. Shah, T. K. Shishido, S. Sieber, V. Silviani, G. Singh, H. Singh, N. Sokolova, E. C. Sonnenschein, M. Sosio, S. T. Sowa, K. Steffen, E. Stegmann, A. B. Streiff, A. Strüder, F. Surup, T. Svenningsen, D. Sweeney, J. Szenei, A. Tagirdzhanov, B. Tan, M. J. Tarnowski, B. R. Terlouw, T. Rey, N. U. Thome, L. R. T. Ortega, T. Torring, M. Trindade, A. W. Truman, M. Tvilum, D. W. Udwary, C. Ulbricht, L. Vader, G. P. van Wezel, M. Walmsley, R. Warnasinghe, H. G. Weddeling, A. N. M. Weir, K. Williams, S. E. Williams, T. E. Witte, S. M. W. Rocca, K. Yamada, D. Yang, D. Yang, J. W. Yu, Z. Y. Zhou, N. Ziemert, L. Zimmer, A. Zimmermann, C. Zimmermann, J. J. J. van der Hooft, R. G. Linington, T. Weber and M. H. Medema, Nucleic Acids Res., 2025, 53, D678–D690.
  100. H. Zhao, Y. Yang, S. Q. Wang, X. Yang, K. C. Zhou, C. L. Xu, X. Y. Zhang, J. J. Fan, D. Y. Hou, X. X. Li, H. B. Lin, Y. Tan, S. S. Wang, X. Y. Chu, D. Zhuoma, F. Y. Zhang, D. W. Ju, X. Zeng and Y. Z. Chen, Nucleic Acids Res., 2023, 51, D621–D628.
  101. E. F. Poynton, J. A. van Santen, M. Pin, M. M. Contreras, E. Mcmann, J. Parra, B. Showalter, L. Zaroubi, K. R. Duncan and R. G. Linington, Nucleic Acids Res., 2024, 53, D691–D699.
  102. D. W. Udwary, D. T. Doering, B. Foster, T. Smirnova, S. A. Kautsar and N. J. Mouncey, Nucleic Acids Res., 2024, 53, D717–D723.
  103. J. M. Stokes, K. Yang, K. Swanson, W. G. Jin, A. Cubillos-Ruiz, N. M. Donghia, C. R. MacNair, S. French, L. A. Carfrae, Z. Bloom-Ackermann, V. M. Tran, A. Chiappino-Pepe, A. H. Badran, I. W. Andrews, E. J. Chory, G. M. Church, E. D. Brown, T. S. Jaakkola, R. Barzilay and J. J. Collins, Cell, 2020, 180, 688–702.
  104. G. Liu, D. B. Catacutan, K. Rathod, K. Swanson, W. G. Jin, J. C. Mohammed, A. Chiappino-Pepe, S. A. Syed, M. Fragis, K. Rachwalski, J. Magolan, M. G. Surette, B. K. Coombes, T. Jaakkola, R. Barzilay, J. J. Collins and J. M. Stokes, Nat. Chem. Biol., 2023, 19, 1342–1350.
  105. F. Wong, E. J. Zheng, J. A. Valeri, N. M. Donghia, M. N. Anahtar, S. Omori, A. Li, A. Cubillos-Ruiz, A. Krishnan, W. G. Jin, A. L. Manson, J. Friedrichs, R. Helbig, B. Hajian, D. K. Fiejtek, F. F. Wagner, H. H. Soutter, A. M. Earl, J. M. Stokes, L. D. Renner and J. J. Collins, Nature, 2024, 626, 177–185.

Footnote

Authors contributed equally.

This journal is © The Royal Society of Chemistry 2025