AI and mechanistic modeling for characterizing biosynthetic pathways of natural products

Byung Tae Lee a, Byeongsub Lee b, Joon Young Kwon a, Tilmann Weber *d and Hyun Uk Kim *abc
aDepartment of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea. E-mail: ehukim@kaist.ac.kr
bGraduate School of Engineering Biology, KAIST, Daejeon 34141, Republic of Korea
cBioProcess Engineering Research Center, KAIST, Daejeon 34141, Republic of Korea
dThe Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, 2800, Denmark. E-mail: tiwe@biosustain.dtu.dk

Received 28th August 2025

First published on 7th November 2025


Abstract

Covering: 2020 to 2025

Natural products are a major source of bioactive compounds, yet elucidating their biosynthetic pathways remains a major challenge due to complex genotype–phenotype relationships. Recent advances in computational approaches, particularly artificial intelligence (AI) and mechanistic modeling, are transforming this field. This highlight examines key databases that underpin computational studies, AI-driven methods for predicting biosynthetic pathways and enzyme–substrate interactions, and mechanistic simulations that provide energetic and structural insights. We also discuss current challenges and future opportunities for integrating these strategies to accelerate discovery, engineering, and application of natural products in drug discovery, biotechnology, and synthetic biology.



Byung Tae Lee

Byung Tae Lee obtained his BS, MS, and PhD degrees in Chemical and Biomolecular Engineering from the Korea Advanced Institute of Science and Technology (KAIST). He is currently working as a postdoctoral researcher at KAIST. His research focuses on the systems biology of microorganisms, with particular emphasis on genome-scale metabolic modeling and AI. He applies these approaches to rationally design and engineer microbial hosts for the efficient production of natural products.


Byeongsub Lee

Byeongsub Lee is an MS student at the Graduate School of Engineering Biology, KAIST. He earned his BS from the Department of Life Sciences at Gwangju Institute of Science and Technology (GIST). Under the supervision of Prof. Hyun Uk Kim, he uses AI-based prediction and mechanistic modeling simulations to study natural product biosynthesis, with the goal of enabling more efficient production strategies.


Joon Young Kwon

Joon Young Kwon is an MS student in the Department of Chemical and Biomolecular Engineering at KAIST. He earned his BS in the same department at KAIST. His research centers on the use of deep learning to characterize enzyme–substrate interactions, enabling the rational engineering of biosynthetic processes under the supervision of Prof. Hyun Uk Kim.


Tilmann Weber

Tilmann Weber is Professor for Natural Products Genome Mining at the Novo Nordisk Foundation Center for Biosustainability of the Technical University of Denmark, where he leads the interdisciplinary research group “Natural Products Genome Mining”. His main research interest is deciphering the molecular pathways and engineering the biosynthesis of natural products by combining genetic, biochemical and bioinformatics methods. He is a pioneer in developing software for automated genome mining and for the analysis and engineering of secondary metabolite biosynthetic pathways.


Hyun Uk Kim

Hyun Uk Kim is an Associate Professor at KAIST, affiliated with the Department of Chemical and Biomolecular Engineering and the Graduate School of Engineering Biology. His research focuses on bio-big data and AI-driven systems biology, with applications in synthetic biology, metabolic engineering, and precision medicine. For natural product studies, his research group develops AI models, genome-scale metabolic models, and other types of computational models to better understand and engineer biosynthetic pathways.


1 Introduction

Natural products exhibit remarkable structural and functional diversity, often arising from the complex interplay between genetic information (genotype) and observable chemical traits (phenotype). Typically, natural products are synthesized by the products of biosynthetic gene clusters (BGCs), groups of co-localized genes encoding the enzymes that catalyze sequential biosynthetic reactions.1 Characterizing BGCs is essential for understanding the genetic basis of natural product biosynthesis and for facilitating their engineering and improved production. However, characterizing the genotype–phenotype relationship of BGCs is very challenging because of the highly complex biochemistry involved.2 Although modern high-throughput technologies have accelerated the accumulation of genetic and biochemical data,3 much remains to be explored before natural product biosynthesis can be understood comprehensively and systematically.

Meanwhile, computational techniques, including artificial intelligence (AI), have experienced unprecedented advancements across various scientific fields. AI methods have demonstrated substantial progress in tasks such as image recognition,4 natural language processing,5 and drug discovery6 by automatically learning patterns from large-scale data. Despite these successes, the application of computational strategies to the study of natural product biosynthesis has been relatively limited.

Recent computational approaches are beginning to bridge this gap by predicting enzyme-catalyzed reactions within biosynthetic pathways and providing detailed analyses of enzyme–substrate interactions. Such methods typically rely on natural product biosynthesis databases, which serve as the basis for predicting individual reaction steps and substrate specificities through data-driven AI approaches. Additionally, mechanistic models that explicitly simulate atomic-level catalytic interactions represent an emerging opportunity to better understand underlying mechanisms.7 When properly integrated with existing biological databases and experimental knowledge, these predictive models can streamline the discovery of novel pathways, enhance enzyme engineering efforts, and guide synthetic biology strategies more efficiently.

In this highlight, we discuss recent advances in natural product databases, AI-driven pathway prediction, and the analysis of enzyme–substrate interactions using AI and mechanistic modeling. By outlining current challenges and highlighting promising future directions, we aim to illustrate how computational approaches can provide a powerful framework for deepening our understanding of natural product biosynthesis, thereby benefiting fields such as drug discovery, biotechnology, and synthetic biology.

2 Databases on natural product biosynthesis

2.1 Databases on natural product compounds

Databases on natural products provide essential chemical data, mass-spectrometric (MS) and nuclear magnetic resonance (NMR) spectra, and taxonomic information, serving as starting points for molecular activity and pathway characterization, compound discovery, and biosynthetic engineering (Table 1). Their curated and well-annotated datasets also power machine learning models that predict biological activity from structure, trace natural products back to their BGCs, and optimize synthetic routes. Representative databases, which are regularly updated, include but are not limited to the Natural Product Atlas (NP-Atlas),8 the Natural Products Magnetic Resonance Database (NP-MRD),9 the COlleCtion of Open Natural prodUcTs (COCONUT),10 and Global Natural Products Social molecular networking (GNPS).11
Table 1 Databases on various types of natural product information
Category | Database | Main focus | Data contents | Data size | Latest update
Natural products | COCONUT10 | Aggregated open natural products collection | Chemical structures; source organisms; references | ∼450 000 compounds | 2024 (v2.0)
Natural products | GNPS11 | Public MS2 data repository & community molecular networking | MS2 spectral libraries & networks | More than 1000 public datasets; ∼220 000 reference spectra | 2025
Natural products | NP-Atlas8 | Collection of microbially derived natural products | Chemical structures; biological & chemical metadata | ∼36 500 compounds | 2024 (v3.0)
Natural products | NP-MRD9 | NMR spectra and prediction for natural products | Experimental, simulated & predicted NMR spectra | ∼281 800 compounds; more than 5.5 million NMR spectra | 2024 (v2.0)
Biosynthetic gene clusters (BGCs) | ClusterCAD19 | PKS/NRPS module design platform | PKS/NRPS and hybrid BGC module sequences; predicted chemistries | 531 BGCs, including 183 manually curated BGCs | 2022 (v2.0)
BGCs | JGI-SMC18 | Large-scale prediction of microbial biosynthetic potential | Predicted BGCs from isolate genomes & metagenomes | 13.1 M BGC regions from 1.3 M source sequences | 2023
BGCs | MIBiG14 | Expert-curated, experimentally validated BGCs | BGC metadata; product structures & activities | ∼4200 BGCs | 2024 (v4.0)
BGCs | antiSMASH database17 | Genome mining for secondary metabolite BGCs | Annotated BGCs from microbial genomes | ∼231 500 BGC regions from 592 archaeal, 35 726 bacterial, and 236 fungal genomes | 2024 (v4.0)
Enzymes | Rhea22 | Expert-curated biochemical reaction knowledgebase | Reactions; participants; literature references | ∼13 700 curated reactions; ∼11 900 participants; ∼15 500 references | 2025
Enzymes | BRENDA20 | Enzyme function, kinetics, and taxonomy | kcat, Km, Ki values; substrates; inhibitors | ∼92 000 kcat values; ∼183 000 Km values; ∼48 500 Ki values | 2025
Enzymes | SABIO-RK21 | Experimentally measured biochemical reaction kinetics and conditions | kcat, Km, Vmax values; rate laws; pH, temperature; experimental setups | ∼75 800 kinetic entries for more than 11 000 reactions | 2024
Metabolic pathways | KEGG25 | Integrated pathway maps, genes, compounds and networks | Pathways; KEGG Orthology (KO); BRITE hierarchies; chemicals | ∼500 pathway maps; ∼48 000 KO entries; thousands of genomes | 2025
Metabolic pathways | MetaCyc24 | Curated reference of metabolic pathways & enzymes | Pathways; reactions; enzymes; compounds | ∼3600 pathways; ∼19 600 reactions; ∼20 000 compounds | 2025 (v29.0)


NP-Atlas8 is a curated repository of microbially derived secondary metabolites. Its latest release, version 3.0 (2024), includes ∼36 500 entries manually extracted from the primary literature, capturing Molfile structures, exact molecular formulae, strain identifiers, isolation sources, and both biosynthetic and chemical class annotations. These features make NP-Atlas a standard reference for dereplication and microbial metabolome mapping. Its well-curated structures also serve as a high-quality benchmark for exploring natural product biosynthesis.

NP-MRD9 serves as a comprehensive NMR spectral resource for natural products, integrating experimental, simulated, and AI-predicted spectra. An intuitive drag-and-drop portal allows researchers to submit raw spectral files together with essential metadata such as solvent, field strength, and temperature. Researchers can also perform interactive searches by specifying one or more chemical shifts to quickly find matching 1D and 2D spectra. By bringing together diverse NMR data in a single platform, NP-MRD not only accelerates structure determination and metabolomics workflows but also provides high-quality benchmark spectra that drive automated dereplication, analog discovery, and cross-dataset comparison pipelines.

COCONUT10 is a comprehensive collection of open-access natural product structures. Its latest version, 2.0, integrates data from 64 public sources, providing each molecule with a standardized structure file, taxonomic origin, sampling location, and literature citation. To facilitate cross-database navigation, each entry is cross-linked to external resources such as PubChem12 and ChEBI.13 COCONUT's extensive and diverse chemical space forms an initial candidate pool for large-scale scaffold-ranking pipelines.

Finally, GNPS11 is a living LC-MS/MS (MS2) repository and molecular-networking platform for small molecules. Researchers upload their MS2 datasets to the Mass Spectrometry Interactive Virtual Environment (MassIVE), then run molecular-networking workflows that compute spectral similarity across reference spectra from public datasets. These features make GNPS an essential tool for efficient dereplication, analog discovery, and comparative metabolomics.
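Molecular networking rests on a pairwise spectral similarity score; spectrum pairs scoring above a user-set threshold are typically connected by an edge. The sketch below computes a plain cosine score between two centroided MS2 spectra by greedily matching peaks within an m/z tolerance. It is a simplified illustration only: GNPS uses a modified cosine that also allows matches shifted by the precursor mass difference, and the tolerance, function name, and spectra here are hypothetical.

```python
import math


def cosine_score(spec_a, spec_b, tol=0.02):
    """Plain cosine similarity between two centroided MS2 spectra.

    Each spectrum is a list of (mz, intensity) pairs. Peaks are matched
    one-to-one within an m/z tolerance `tol` (in Da, an assumed default),
    taking candidate pairs greedily from the largest intensity product down.
    This is NOT the GNPS modified cosine, which also allows shifted matches.
    """
    # Square-root intensities to damp dominant peaks (a common convention).
    a = [(mz, math.sqrt(i)) for mz, i in spec_a]
    b = [(mz, math.sqrt(i)) for mz, i in spec_b]
    norm_a = math.sqrt(sum(i * i for _, i in a))
    norm_b = math.sqrt(sum(i * i for _, i in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    # All peak pairs whose m/z values agree within the tolerance.
    pairs = sorted(
        ((ia * ib, ja, jb)
         for ja, (mza, ia) in enumerate(a)
         for jb, (mzb, ib) in enumerate(b)
         if abs(mza - mzb) <= tol),
        reverse=True,
    )
    used_a, used_b, total = set(), set(), 0.0
    for product, ja, jb in pairs:
        if ja in used_a or jb in used_b:
            continue  # each peak may contribute to at most one match
        used_a.add(ja)
        used_b.add(jb)
        total += product
    return total / (norm_a * norm_b)


# Hypothetical query and library spectra
query = [(105.07, 40.0), (133.06, 100.0), (161.06, 25.0)]
library = [(105.07, 35.0), (133.07, 90.0), (189.09, 10.0)]
print(round(cosine_score(query, library), 3))
```

Greedy matching by intensity product keeps the example short; an optimal assignment would be a drop-in refinement when peaks crowd within the tolerance window.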

2.2 Databases on biosynthetic gene clusters

Connecting natural products to their corresponding BGCs introduces additional layers of biological complexity and requires integrated, standardized resources that bridge genetic information, chemical structures, and experimental data. To address these challenges, several important community-based platforms have been developed, which also serve as key resources for developing relevant computational models.

Minimum Information about a Biosynthetic Gene cluster (MIBiG)14 compiles over 3600 experimentally validated BGCs into a standardized reference database. Each entry records genomic coordinates, domain architectures, and core annotation fields, and links to the product's chemical structure, known bioactivity, and primary literature. All submissions undergo community review to ensure consistent naming and domain-boundary definitions. This makes MIBiG the definitive source for verifying whether a predicted locus corresponds to a biochemically characterized pathway. Genome mining tools such as Antibiotics & Secondary Metabolite Analysis Shell (antiSMASH)15 and Prediction informatics for secondary metabolomes (PRISM)16 routinely benchmark against MIBiG, and comparative studies of biosynthetic capacity across strains rely on its peer-validated annotations. A major complementary resource is the antiSMASH database,17 which provides antiSMASH-predicted BGCs, without experimental validation, for publicly available microbial genomes.

The Joint Genome Institute's Secondary Metabolism Collaboratory (JGI-SMC)18 is a cloud-based platform for high-throughput discovery of novel BGCs across genomes and metagenomes. It processes hundreds of isolate genomes and metagenomes in parallel, annotating taxonomic origin, module and domain architecture, and similarity scores to MIBiG entries. These richly annotated candidate sets feed into workflows that connect genomic data with metabolite profiles.

In the context of pathway engineering, ClusterCAD 2.0 (ref. 19) provides an integrated suite of tools for the rational design of novel polyketide and nonribosomal peptide biosynthetic routes. The platform curates over 500 modular polyketide synthase (PKS) and nonribosomal peptide synthetase (NRPS) components, each annotated with domain boundaries, substrate specificity motifs, and interdomain linkers, enabling in silico assembly of chimeric enzymes. An interactive domain architecture search tool facilitates the identification of donor and acceptor modules with shared sequence motifs or intermediate-structure fingerprints (binary representations of structural features). The platform also supports modeling of enzyme assemblies, prediction of product scaffolds, and simulation of pathway flux. These modular design and flux prediction capabilities provide a foundation for machine learning-guided BGC engineering strategies.

2.3 Databases on enzymes and metabolic pathways

The characterization and engineering of natural product biosynthesis rely on comprehensively curated data resources that integrate information on enzymes, biochemical reactions, and metabolic networks. In recent years, several complementary databases have become foundational resources in enzymology, systems biology, and synthetic biology beyond natural products and their BGCs. Each database focuses on a distinct layer of curated information, ranging from enzyme kinetics to reaction stoichiometry and pathway maps. When used together, they enable a comprehensive understanding of metabolic logic, from individual enzymes to metabolic network architecture, and provide a structured knowledge base that supports data-driven prediction of biosynthetic pathways and enzyme properties.

BRaunschweig ENzyme DAtabase (BRENDA)20 is the most comprehensive repository of experimentally validated enzyme data. Each entry provides details on substrate specificity, cofactors, kinetic parameters such as Km and kcat, optimal assay conditions, organism and tissue sources, and links to primary literature. The platform also offers interactive pathway maps, 3D structure viewers, and omics integration tools. These features make BRENDA a key resource for identifying, comparing, and prioritizing enzymes in natural product biosynthesis and synthetic biology workflows. BRENDA's comprehensive enzyme property data are also widely used to benchmark models that predict kinetic parameters directly from protein sequences.

System for the Analysis of Biochemical Pathways - Reaction Kinetics (SABIO-RK)21 provides detailed kinetic data for enzymatic reactions under well-defined experimental conditions. Each entry describes a specific reaction in a specific organism, capturing experimental context such as pH, temperature, enzyme variant, substrate concentration, kinetic law, and parameter values. This level of detail complements the broader scope of BRENDA and makes SABIO-RK a key resource for enzyme mechanism studies, quantitative synthetic biology, and the calibration of dynamic models.
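To make the link between stored parameters and a kinetic law concrete, the sketch below evaluates the Michaelis–Menten equation, v = kcat[E]0[S]/(Km + [S]), which is a frequently encountered rate law in kinetic databases. The parameter values are hypothetical, and real SABIO-RK entries may instead use other rate laws (for example, with inhibition terms).

```python
def michaelis_menten_rate(s, kcat, km, e_total):
    """Initial rate v = kcat * [E]0 * [S] / (Km + [S]).

    Units must be consistent: e.g., kcat in 1/s and all concentrations in mM
    give v in mM/s.
    """
    if s < 0 or km <= 0:
        raise ValueError("need [S] >= 0 and Km > 0")
    vmax = kcat * e_total  # maximal rate at saturating substrate
    return vmax * s / (km + s)


# Hypothetical parameters: kcat = 12 1/s, Km = 0.5 mM, [E]0 = 0.001 mM
for s in (0.1, 0.5, 5.0):
    print(s, michaelis_menten_rate(s, kcat=12.0, km=0.5, e_total=0.001))
```

At [S] = Km the rate is exactly half of Vmax, which is the usual way Km is read off a saturation curve.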

Rhea22 is a curated database that provides information on chemically balanced biochemical reactions. Each reaction is represented as a unique, stoichiometrically balanced equation using standardized ChEBI identifiers for all reactants and products. Each Rhea entry is linked to primary literature and to enzyme annotations in other databases such as UniProt.23 Its standardized stoichiometries enable consistent mapping, validation, and comparison of reactions across metabolic models, link enzyme- and pathway-focused databases, and serve as reaction templates for rule-based retrosynthesis engines.
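The notion of a balanced reaction can be made concrete: each element and the net charge must agree on both sides. The sketch below checks this for simple formulas, using the fumarate hydratase reaction written with the dianionic forms of fumarate and (S)-malate. The function names are illustrative, and Rhea's own curation handles cases (e.g., polymers and generic compounds) that this sketch ignores.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a flat molecular formula such as 'C4H4O5'.

    Assumption of this sketch: no parentheses, isotopes, or hydrate dots.
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_totals(side):
    """Sum atoms and charge over (coefficient, formula, charge) species."""
    atoms, charge = Counter(), 0
    for coeff, formula, z in side:
        for element, n in parse_formula(formula).items():
            atoms[element] += coeff * n
        charge += coeff * z
    return atoms, charge


def is_balanced(left, right):
    """True if both element counts and net charge agree on the two sides."""
    return side_totals(left) == side_totals(right)


# Fumarate hydration: fumarate(2-) + H2O = (S)-malate(2-)
left = [(1, "C4H2O4", -2), (1, "H2O", 0)]
right = [(1, "C4H4O5", -2)]
print(is_balanced(left, right))
print(is_balanced([(1, "C4H2O4", -2)], right))  # water omitted
```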

At the pathway level, the Metabolic Encyclopedia of pathways and enzymes (MetaCyc)24 and the Kyoto Encyclopedia of Genes and Genomes (KEGG)25 provide curated maps of metabolic network organization. MetaCyc focuses on experimentally verified pathways, reactions, and enzymes, with each entry curated from the primary literature. KEGG integrates pathway diagrams with genomic, chemical, and disease information, using its KEGG Orthology system to enable organism-specific pathway reconstruction and functional annotation. Widely employed for pathway annotation, metabolic reconstruction, and cross-taxonomic comparisons, the MetaCyc and KEGG pathway maps also serve as reference frameworks that provide structured biochemical reaction data essential for training and evaluating AI models for biosynthetic pathway prediction.

3 AI approaches for predicting biosynthetic pathways of natural products

A central challenge in characterizing natural products lies in elucidating their biosynthetic pathways. Predicting these pathways not only advances our understanding of native metabolism but also enables pathway engineering for the efficient production of novel compounds. Traditional approaches have relied on analytical and omics methods, but computational strategies are now being applied more actively. A notable example is retrobiosynthesis, which takes the structure of a target molecule as input and, drawing on knowledge extracted from reaction data, predicts plausible biosynthetic routes to it (Fig. 1). Here, we discuss recent advances in both template-based and template-free methods for retrobiosynthesis. The former apply predefined reaction rules that encode common structural changes around enzyme reactive sites, whereas the latter learn chemical transformations directly from large reaction datasets without predefined rules. Multi-step prediction builds on these single-step predictions to identify complete biosynthetic routes from simple precursors to target natural products. Recent work reframes retrobiosynthesis as an end-to-end, AI-enabled pipeline that not only designs candidate routes but also quantifies their kinetic plausibility26 and rapidly validates and improves key steps through autonomous experimentation,27 moving from route enumeration to deployable, host-compatible pathways.28
Fig. 1 AI-based strategies for predicting biosynthetic pathways of natural products. To construct a complete pathway toward a target compound, multiple iterations of single-step reaction predictions are performed within a retrosynthetic tree search framework. For each reaction step, two major prediction strategies are applied: (1) template-based methods, which select transformations from a predefined set of curated reaction rules; and (2) template-free methods, where the product SMILES is embedded into a numerical vector and a transformer model generates plausible reactants. The overall pathway is ranked by summing the cost functions of each reaction step, reflecting synthetic accessibility.33 Created with https://www.BioRender.com.

3.1 Template-based methods

Recent retrobiosynthesis studies have improved traditional template-based methods by extending their scope to diverse enzymes, automating rule generation, and broadening the range of explored reactions.29 For efficient generation of reaction templates (or reaction rules), EHreact introduced a data-driven strategy to extract enzymatic templates by identifying conserved substructures across known enzyme-catalyzed reactions.30 EHreact builds templates at different levels of detail, generating broader templates for enzymes that act on chemically diverse substrates and more specific templates for enzymes with narrow substrate ranges.

Building on these advances, other studies have focused on enhancing template-based predictions through reaction scoring systems. Rather than relying solely on predefined templates, RetroBioCat31 incorporated a fingerprint-based similarity scoring system to match predicted steps with literature-reported reactions, thereby aiding enzyme selection. It also employed a neural network-based molecular complexity score to prioritize pathways that proceed through synthetically simpler intermediates. This combination allowed RetroBioCat to recover 52 known biocatalytic pathways, including the conversion of limonene to chiral carvolactones and the synthesis of cinnamyl alcohol and phenylpropionic acids, highlighting its practical utility in pathway prioritization.
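Fingerprint-based similarity scoring of this kind commonly relies on the Tanimoto coefficient between binary fingerprints. The sketch below represents each fingerprint as a set of "on" bit indices and ranks literature precedents against a predicted step. The fingerprints and precedent names are hypothetical placeholders, and RetroBioCat's own fingerprint type and thresholds are not reproduced here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_precedents(query_fp, precedents):
    """Rank literature reactions (name -> fingerprint) by similarity to a predicted step."""
    scored = [(tanimoto(query_fp, fp), name) for name, fp in precedents.items()]
    return sorted(scored, reverse=True)


# Hypothetical on-bit sets for a predicted substrate and two literature substrates
predicted = {3, 17, 42, 88, 130}
literature = {
    "precedent_A": {3, 17, 42, 88, 131},
    "precedent_B": {5, 17, 200},
}
for score, name in rank_precedents(predicted, literature):
    print(f"{name}: {score:.2f}")
```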

Another study by Sankaranarayanan et al.32 developed an enzymatic retrosynthesis model using RDEnzyme, an automated tool that extracts stereochemically consistent enzymatic reaction templates directly from reaction databases such as UniProt23 and Rhea.22 To extend beyond predefined templates, RDEnzyme integrates a neural network trained on known enzyme pairs to evaluate catalytic feasibility. The neural network model distinguished between homologous and evolutionarily distant enzyme reactions, effectively filtering and prioritizing experimentally viable synthetic steps by favoring homologous ones. This AI-enhanced method was demonstrated by successfully rediscovering known enzymatic pathways for natural product compounds, such as (R)-4-hydroxyisophorone, a terpenoid-derived intermediate, and hydroxystyrene derivatives that serve as precursors for bioactive natural products.

3.2 Template-free methods

In contrast to template-based systems, recent advances in template-free approaches have leveraged deep learning models to directly learn reactions from large-scale data without predefined reaction rules. These models commonly adopt transformer-based neural architectures and are capable of generating novel or previously unseen biosynthetic reactions (Fig. 1). A key strength of these approaches is their generalizability to unseen substrates and flexibility in handling diverse chemical spaces, which significantly broadens the scope of biosynthetic pathway prediction.

Among template-free models, BioNavi-NP33 was developed to predict biosynthetic pathways by taking a target molecule as input and generating possible reactants through single-step retrosynthesis, which can be recursively applied to construct full pathways. It employs a transformer-based model trained on the BioChem dataset, which combines biosynthetic reaction data from MetaCyc,24 KEGG,25 and MetaNetX,34 with natural product-like synthetic reactions from the US Patent and Trademark Office (USPTO) dataset.35 BioNavi-NP integrates the Retro* algorithm36 to explore multistep biosynthetic routes by systematically building a reaction network in a tree-like structure. Retro* is a neural-guided search algorithm that assembles single-step reactions into complete pathways by prioritizing high-scoring reactions and selecting routes with the lowest overall cost. At each step, BioNavi-NP evaluates multiple enzymatic reactions that could generate a given molecule and, for each selected reaction, determines whether all required precursors can also be synthesized. By organizing these possibilities into a structured search process, the system efficiently identifies complete pathways from simple precursors to complex natural products.
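The route-ranking logic described above can be illustrated with a small exhaustive AND-OR search: a molecule is either an available precursor or is made by a reaction whose precursors must all be makeable, and a route's cost is the sum of its step costs. This is a simplified stand-in rather than Retro* itself, which additionally uses a learned value estimate to choose which node to expand next. The proposal table, costs (think of them as negative log-probabilities from a one-step model), and molecule names are hypothetical.

```python
import math

# Hypothetical single-step proposals: product -> list of (step cost, precursors).
# In practice these would come from a one-step retrobiosynthesis model.
PROPOSALS = {
    "target": [(0.4, ("int1", "acetyl-CoA")), (1.2, ("int2",))],
    "int1": [(0.7, ("malonyl-CoA",)), (0.2, ("int3",))],
    "int2": [(0.1, ("tyrosine",))],
    "int3": [],  # no known route
}
# Simple precursors assumed to be available in the host.
BUILDING_BLOCKS = {"acetyl-CoA", "malonyl-CoA", "tyrosine"}


def best_route(molecule, depth=5):
    """Return (total cost, steps) of the cheapest route, or (inf, []) if none.

    OR node: choose the cheapest reaction producing `molecule`.
    AND node: every precursor of that reaction must itself be reachable.
    `depth` bounds the recursion so cyclic proposals cannot loop forever.
    """
    if molecule in BUILDING_BLOCKS:
        return 0.0, []
    if depth == 0:
        return math.inf, []
    best = (math.inf, [])
    for step_cost, precursors in PROPOSALS.get(molecule, []):
        total, steps = step_cost, [(molecule, precursors)]
        for p in precursors:
            sub_cost, sub_steps = best_route(p, depth - 1)
            total += sub_cost
            steps = steps + sub_steps
        if total < best[0]:
            best = (total, steps)
    return best


cost, route = best_route("target")
print(round(cost, 2))
for product, precursors in route:
    print(product, "<=", " + ".join(precursors))
```

Exhaustive enumeration is only feasible for toy tables like this one; guided best-first expansion is what makes searches such as Retro* tractable on real one-step models.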

In addition, READRetro37 adopted an ensemble architecture that combines reaction database searches with two complementary predictive models, RetroFormer38 and Graph2SMILES,39 to improve the accuracy of single-step retrosynthetic predictions. RetroFormer is a transformer-based model that learns reaction rules from SMILES strings (linear text representations of molecules), while Graph2SMILES uses graph neural networks to capture structural features and generate likely reactants. This combination of sequence- and structure-based representations allows the model to capture diverse aspects of chemical reactivity. READRetro was trained on the same datasets as BioNavi-NP33 and also employed the Retro* algorithm36 for multistep pathway construction. By integrating generalized model predictions with directly retrieved information from reference databases, READRetro proved effective in predicting biosynthetic pathways of complex plant secondary metabolites, including catharanthine, tabersonine, and cannabichromenic acid.
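One simple way to combine database retrieval with several single-step models is to place exact database matches first and then fuse the models' ranked outputs, for example by reciprocal-rank fusion. The sketch below illustrates that idea with placeholder reactant-set strings; it is not READRetro's published ensembling procedure, and the constant k = 60 is a conventional reciprocal-rank-fusion default assumed here.

```python
def ensemble_candidates(product, reaction_db, model_rankings, k=60, top_n=5):
    """Fuse single-step retrosynthesis candidates for one product.

    reaction_db: dict mapping a product to a list of known reactant sets.
    model_rankings: list of ranked candidate lists, one per model.
    Known reactions are placed first; model candidates are ordered by the
    reciprocal-rank fusion score sum(1 / (k + rank)) across models.
    """
    known = list(reaction_db.get(product, []))
    scores = {}
    for ranking in model_rankings:
        for rank, candidate in enumerate(ranking, start=1):
            scores[candidate] = scores.get(candidate, 0.0) + 1.0 / (k + rank)
    fused = sorted(scores, key=lambda c: scores[c], reverse=True)
    merged = known + [c for c in fused if c not in known]
    return merged[:top_n]


# Hypothetical inputs: placeholder reactant sets in SMILES-like dot notation
db = {"P": ["A.B"]}
model1 = ["C.D", "A.B", "E"]  # e.g., a sequence-based model's ranking
model2 = ["E", "C.D"]         # e.g., a graph-based model's ranking
print(ensemble_candidates("P", db, [model1, model2]))
```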

4 AI models for enzyme–substrate interactions in natural product biosynthesis

Building on pathway prediction from target structures, the next step is to assess whether the proposed steps are kinetically and mechanistically feasible. Several AI models evaluate substrate specificity of enzymes and estimate kinetic parameters such as kcat and Km. Using curated enzyme kinetics data, standardized reaction definitions, and pathway maps, these models can be integrated with pathway search to prioritize functionally viable routes and recommend enzyme candidates and operating conditions.

4.1 Prediction of enzyme–substrate interactions

Recent advances in AI have also established a general pipeline for predicting enzyme–substrate interactions in natural product biosynthesis (Fig. 2). The amino acid sequences of enzymes are often encoded using large pretrained protein language models such as Evolutionary Scale Modeling40 or ProtT5,41 while substrates are represented by SMILES strings, molecular fingerprints, or graph-based embeddings (network-like encodings of molecular connectivity). These features are concatenated and provided as input to AI models including tree-based ensembles, multilayer perceptrons, and transformer architectures to predict whether a given substrate is suitable for a specific enzyme, or to rank or classify potential substrates.
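The generic pipeline can be sketched in a few lines: look up an enzyme embedding and a substrate embedding, concatenate them, and pass the joint vector to a classifier. In this minimal sketch the embeddings are tiny hypothetical vectors standing in for protein language model and fingerprint features, and a plain logistic regression trained by gradient descent replaces the tree ensembles, multilayer perceptrons, or transformers used in published models.

```python
import math


def concat_features(enzyme_vec, substrate_vec):
    """Joint representation: plain concatenation of the two embeddings."""
    return list(enzyme_vec) + list(substrate_vec)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Predicted probability that the enzyme accepts the substrate."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Logistic regression fitted by batch gradient descent on the log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


# Hypothetical embeddings (stand-ins for protein language model / fingerprint features)
enzymes = {"E1": [1.0, 0.0], "E2": [0.0, 1.0]}
substrates = {"S1": [1.0, 0.0], "S2": [0.0, 1.0]}
# Toy labels: 1 = enzyme accepts substrate, 0 = no activity
pairs = [("E1", "S1", 1), ("E1", "S2", 1), ("E2", "S1", 1), ("E2", "S2", 0)]

X = [concat_features(enzymes[e], substrates[s]) for e, s, _ in pairs]
y = [label for _, _, label in pairs]
w, b = train_logistic(X, y)
for (e, s, _), x in zip(pairs, X):
    print(e, s, round(predict_proba(w, b, x), 2))
```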
Fig. 2 Machine learning pipelines for predicting enzyme–substrate interactions and kinetic parameters in natural product biosynthesis. Enzyme sequences are encoded as numerical embeddings using pretrained protein language models or graph neural networks applied to residue-level structural graphs. Substrates are represented by graph neural network embeddings or SMILES-based transformer models. Optional features such as temperature, host organism, and pH can also be incorporated to reflect biological context. These features are integrated using strategies such as multilayer perceptrons, cross-attention mechanisms, or tree-based models (e.g., XGBoost). The resulting joint representations enable classification tasks (e.g., enzyme–substrate compatibility, substrate class assignment) and regression tasks (e.g., kinetic parameters such as kcat, Km, or binding affinity). Model predictions further enable downstream applications such as motif discovery via interpretation of attention maps, prioritization of enzyme candidates for experimental testing, and reduction of experimental workload through activity-based filtering. This integrated pipeline enables accurate, scalable, and interpretable predictions of enzyme activity and substrate specificity, thereby accelerating discovery and engineering in natural product biosynthesis. Created with https://www.BioRender.com.

For predicting enzyme–substrate interactions, the Enzyme Substrate Prediction (ESP) model by Kroll et al.42 pioneered the use of protein embeddings from a transformer-based language model combined with graph neural network fingerprints for substrate encoding. These representations were then fed into a gradient-boosted tree classifier to predict whether a given enzyme–substrate pair is functionally feasible. ESP achieved high accuracy and demonstrated strong generalizability across diverse enzyme families and chemically varied small-molecule substrates, including pairs absent from the training set.

As a further development, Kroll et al.43 introduced ProSmith (PROtein-Small Molecule InTeraction, Holistic model), a multimodal transformer architecture that jointly processes the enzyme sequence and the substrate SMILES. By learning links between specific amino acids in the protein and atoms in the substrate, ProSmith directly models enzyme–ligand interactions instead of relying on late-stage feature concatenation. This joint representation not only improved substrate-specificity classification over ESP but also extended the framework to quantitative tasks, enabling regression of kinetic parameters.
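Linking individual residues to individual substrate atoms can be illustrated with scaled dot-product attention, in which each residue token attends over the substrate's atom tokens and the resulting weight matrix can be inspected as a residue-to-atom map. This is a minimal sketch with hypothetical vectors; ProSmith itself processes protein and SMILES tokens together in one multimodal transformer with learned query, key, and value projections, which are omitted here.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def residue_to_atom_attention(residue_vecs, atom_vecs):
    """Scaled dot-product attention weights: one row per residue, one column per atom."""
    d = len(residue_vecs[0])
    weights = []
    for q in residue_vecs:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in atom_vecs]
        weights.append(softmax(scores))
    return weights


def attend(weights, atom_vecs):
    """Attention output: for each residue, the weighted sum of atom vectors."""
    dim = len(atom_vecs[0])
    return [[sum(w * v[j] for w, v in zip(row, atom_vecs)) for j in range(dim)]
            for row in weights]


residues = [[2.0, 0.0], [0.0, 2.0]]           # hypothetical residue embeddings
atoms = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical atom embeddings
W = residue_to_atom_attention(residues, atoms)
for row in W:
    print([round(w, 2) for w in row])
```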

Goldman et al.44 analyzed three large enzyme families, each screened against hundreds of shared small-molecule substrates. They compared pairwise models, which concatenate enzyme and substrate embeddings, with single-entity baselines that score each component separately. Overall, the pairwise approach performed better, although single-entity models remained more robust for sparsely annotated families. To reduce overfitting in such cases, they introduced an active-site pooling technique (averaging features over the amino acids that form the active site), which yielded measurable accuracy gains on limited datasets.
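Active-site pooling, as described above, restricts the usual average over all residues to those lining the active site. A minimal sketch, assuming per-residue embeddings (e.g., from a protein language model) and a list of active-site positions obtained from a structure or annotation; the numbers and positions below are hypothetical.

```python
def mean_pool(residue_embeddings, indices=None):
    """Average per-residue embeddings into one enzyme-level vector.

    indices=None pools over the whole sequence (standard mean pooling);
    passing active-site residue indices gives active-site pooling.
    """
    rows = residue_embeddings if indices is None else [residue_embeddings[i] for i in indices]
    if not rows:
        raise ValueError("no residues selected for pooling")
    dim = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]


# Hypothetical per-residue embeddings (5 residues, 3 features each)
emb = [
    [0.1, 0.0, 0.2],
    [0.9, 0.8, 0.7],
    [0.0, 0.1, 0.0],
    [0.8, 0.9, 0.6],
    [0.2, 0.1, 0.1],
]
active_site = [1, 3]  # hypothetical active-site positions (0-based)
print(mean_pool(emb))
print(mean_pool(emb, active_site))
```

Pooling over a handful of catalytic residues discards signal from the rest of the sequence, which is one plausible reason it helps when training data are scarce.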

For peptide-based and pathway-specific enzyme–substrate interaction tasks, Clark et al.45 applied masked language modeling (an approach in which parts of the peptide sequence are intentionally hidden and the model learns to predict them, thereby learning sequence rules) and transfer learning to encode peptide substrates of biosynthetic enzymes for ribosomally synthesized and post-translationally modified peptides (RiPPs). This strategy enabled more accurate prediction of enzyme substrate scope and facilitated the identification of key sequence motifs.
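The masking step of masked language modeling can be sketched as follows: hide a fraction of residues and keep the hidden residues as prediction targets. This is a simplified illustration; the 15% rate is a common default assumed here, the BERT-style refinement of sometimes substituting random or unchanged tokens is omitted, and the peptide is hypothetical.

```python
import random

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MASK = "<mask>"


def mask_peptide(seq, mask_rate=0.15, seed=None):
    """Build one masked-language-modeling training example from a peptide.

    Returns (tokens, targets): `tokens` has randomly chosen positions replaced
    by MASK, and `targets` maps each masked position to the residue the model
    must recover. At least one position is always masked.
    """
    if not seq or set(seq) - AMINO_ACIDS:
        raise ValueError("expected a non-empty sequence of standard amino acids")
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(seq)))
    tokens = list(seq)
    targets = {}
    for i in sorted(rng.sample(range(len(seq)), n_mask)):
        targets[i] = tokens[i]
        tokens[i] = MASK
    return tokens, targets


peptide = "ACDKTGSLFCWGTHCSAKVE"  # hypothetical 20-residue peptide
tokens, targets = mask_peptide(peptide, seed=0)
print(" ".join(tokens))
print(targets)
```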

Structural information has also been leveraged to improve enzyme–substrate interaction prediction. Englund et al.46 focused on acyltransferase (AT) domains of polyketide synthases. By computing the three-dimensional volume of each AT substrate pocket, they predicted non-native extender-unit substrates that fit the cavity and validated these predictions by engineering new polyketide analogs through AT domain swaps.
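One generic way to estimate a pocket volume is to count empty grid points inside a region around the pocket. The sketch below assumes a spherical pocket region, atoms represented as van der Waals spheres, and a fixed grid spacing; it is not necessarily the procedure used by Englund et al., and the function name, coordinates, radii, and candidate extender-unit volume are all hypothetical.

```python
import math


def pocket_volume(center, radius, atoms, spacing=0.5):
    """Estimate empty volume (Å^3) inside a spherical pocket region.

    center: (x, y, z) of the pocket region; radius: region radius in Å.
    atoms: list of ((x, y, z), vdw_radius) tuples for pocket-lining atoms.
    A grid point counts as empty if it lies inside the region sphere and
    outside every atom's van der Waals sphere.
    """
    cx, cy, cz = center
    n = int(math.ceil(radius / spacing))
    empty = 0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            for k in range(-n, n + 1):
                x, y, z = cx + i * spacing, cy + j * spacing, cz + k * spacing
                if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 > radius ** 2:
                    continue  # outside the pocket region
                if any((x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2 < r ** 2
                       for (ax, ay, az), r in atoms):
                    continue  # occupied by an atom
                empty += 1
    return empty * spacing ** 3


# Hypothetical pocket: region of radius 5 Å around the origin lined by three atoms.
lining = [((3.0, 0.0, 0.0), 1.7), ((-3.0, 0.0, 0.0), 1.7), ((0.0, 3.5, 0.0), 1.52)]
free_volume = pocket_volume((0.0, 0.0, 0.0), 5.0, lining)
extender_volume = 90.0  # hypothetical molecular volume (Å^3) of a candidate extender unit
print(f"free volume ~ {free_volume:.0f} Å^3; fits: {extender_volume <= free_volume}")
```

Comparing a single free-volume number with a substrate volume is a crude fit criterion; pocket shape, side-chain flexibility, and specific contacts also matter.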

In the area of membrane transporter substrate prediction, the SPOT model introduced by Kroll et al.47 adopted a transformer-based embedding strategy for both transport proteins and small-molecule substrates, and scored each pair with XGBoost to predict which metabolites a given transporter can import or export. This multi-class prediction model delivered high precision and recall across more than forty transporter families.

Although most current enzyme–substrate interaction models rely on sequence-based representations, there is a growing trend in regression studies to incorporate 3D structure-based embeddings and auxiliary features (additional biological or chemical information such as protein domains, cofactor presence, or pathway context). These developments suggest promising directions for expanding the feature space in enzyme–substrate interaction modeling.

4.2 Prediction of enzyme kinetic parameters

AI-based regression models for enzyme kinetic parameter prediction have received increasing attention, particularly for the Michaelis constant (Km) and the turnover number (kcat). These models typically use enzyme amino acid sequences, substrate SMILES, and occasionally environmental factors (e.g., pH and temperature) as inputs. They are usually trained on curated datasets such as BRENDA20 and SABIO-RK,21 which compile kinetic measurements and associated experimental conditions for enzyme-catalyzed reactions. Inputs are embedded using pretrained AI models (e.g., ProtT5 (ref. 41) for proteins and SMILES Transformer48 for small molecules), concatenated, and then passed to regression networks that estimate kinetic parameters (Fig. 2). This standardized workflow enables scalable prediction of enzymatic activities for both wild-type and mutant enzymes, although its accuracy and generalizability depend strongly on how similar a query enzyme is to the training data.

A deep learning model named DLKcat49 was developed to predict turnover numbers (kcat) using a graph neural network for substrates and a convolutional neural network for proteins. It also incorporated an attention mechanism to identify key residues influencing enzymatic activity. However, Kroll et al.50 pointed out that DLKcat struggles to generalize to enzymes outside its training distribution and fails to accurately predict kcat changes caused by mutations.

To address these limitations, subsequent studies have focused on improving generalization and mutation sensitivity. UniKP51 adopted a unified framework that predicts kcat, Km, and kcat/Km using concatenated representations of enzyme sequences and substrate SMILES, generated via pretrained models such as ProtT5 (ref. 41) and SMILES Transformer.48 It introduced a two-layer architecture to incorporate environmental factors such as pH and temperature into predictions, offering broader biological relevance. Reweighting methods were also applied to correct for imbalanced data distributions. In practical applications, UniKP successfully identified highly active homologs and mutants of tyrosine ammonia-lyase from Rhodotorula glutinis (RgTAL). The homolog TrTAL from Tephrocybe rancida and the mutant RgTAL-489T showed 2.6-fold and 3.5-fold increases in catalytic efficiency (kcat/Km), respectively, relative to wild-type RgTAL.
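Reweighting for imbalanced data can take several forms, and UniKP's specific schemes are not reproduced here. As an illustrative assumption, the sketch below uses simple inverse-frequency weights over bins of log10(kcat), so that rare high- or low-activity examples contribute more to the training loss; the function name, bin width, and labels are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(log_kcat_values, bin_width=1.0):
    """Weight each sample inversely to the number of samples in its bin.

    Samples are binned on the log10(kcat) scale; rare ranges (very high or
    very low kcat) are up-weighted so they are not drowned out by the bulk.
    Weights are normalised to average 1.
    """
    bins = [math.floor(v / bin_width) for v in log_kcat_values]
    counts = Counter(bins)
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical log10(kcat / s^-1) labels: most near 1, one low and one very high value.
labels = [0.8, 1.1, 1.3, 1.6, 1.2, 3.5]
weights = inverse_frequency_weights(labels)
print(weights)
```

These weights would then multiply the per-sample loss during training (for example, a weighted squared error).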

CataPro52 refined kcat/Km predictions by combining the outputs of kcat and Km submodels with an additional correction module. Distinctive features include the use of Molecular ACCess System (MACCS) key fingerprints alongside pretrained substrate embeddings, and a neural correction module that produces the final outputs. As a demonstration, CataPro guided the engineering of a carotenoid cleavage oxygenase from Sphingobium sp., an enzyme used for vanillin biosynthesis; an optimized variant (T216M-M351F-V384G) exhibited a 65.2-fold increase in activity over the original enzyme.
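Because kinetic predictors usually work on a log10 scale, combining kcat and Km submodels into catalytic efficiency reduces to a subtraction, and fold changes follow by exponentiation. The sketch below assumes log10-scale predictions and replaces CataPro's learned correction module with a hypothetical constant offset; the function names, predicted values, and variant names are invented for illustration.

```python
def log_efficiency(log10_kcat, log10_km, correction=0.0):
    """log10(kcat/Km) from separate log-scale predictions.

    `correction` stands in for a learned correction term; here it is just a number.
    """
    return log10_kcat - log10_km + correction


def fold_change(log10_variant, log10_reference):
    """Convert a difference on the log10 scale into a fold change."""
    return 10 ** (log10_variant - log10_reference)


# Hypothetical (log10 kcat [s^-1], log10 Km [mM]) predictions for a wild type and two variants.
predictions = {
    "WT": (0.5, -0.2),
    "variant_A": (0.9, -0.3),
    "variant_B": (0.4, 0.1),
}
wt = log_efficiency(*predictions["WT"])
ranked = sorted(predictions, key=lambda name: log_efficiency(*predictions[name]), reverse=True)
for name in ranked:
    print(name, round(fold_change(log_efficiency(*predictions[name]), wt), 2))
```

Ranking candidates by predicted log10(kcat/Km) and reporting fold changes relative to the starting enzyme mirrors how such predictors are typically used to prioritize variants for experimental testing.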

CatPred53 introduced a probabilistic regression framework that outputs not only predicted values for kcat, Km, and Ki, but also quantifies prediction uncertainty arising from data noise and model limitations. By using an ensemble of models, CatPred supports more reliable prediction on unseen data and enables filtering based on prediction confidence. The model demonstrated strong performance for enzymes with less than 40% sequence identity to the training set, maintaining robustness even with limited data.
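One standard way to obtain such uncertainty estimates from an ensemble is to let each member predict a mean and a variance and then decompose the spread. The sketch below follows this generic deep-ensemble decomposition under the assumption of Gaussian member outputs; CatPred's exact formulation may differ, and the member outputs and the 0.5 log-unit filter threshold are hypothetical.

```python
import statistics


def ensemble_prediction(member_outputs):
    """Combine an ensemble of (mean, variance) predictions for one input.

    Each member predicts a Gaussian over log10(kcat). The ensemble mean is the
    average of member means. Total variance splits into:
      aleatoric = average of member variances (noise in the data),
      epistemic = variance of member means (disagreement between models).
    """
    means = [m for m, _ in member_outputs]
    variances = [v for _, v in member_outputs]
    mu = statistics.fmean(means)
    aleatoric = statistics.fmean(variances)
    epistemic = statistics.pvariance(means)
    return mu, aleatoric, epistemic, aleatoric + epistemic


# Hypothetical outputs of three ensemble members for one enzyme-substrate pair.
members = [(1.0, 0.04), (1.2, 0.05), (0.8, 0.06)]
mu, alea, epi, total = ensemble_prediction(members)
keep = total ** 0.5 < 0.5  # hypothetical confidence filter on the predictive standard deviation
print(f"mean={mu:.2f} aleatoric={alea:.3f} epistemic={epi:.3f} keep={keep}")
```

Epistemic uncertainty tends to grow for inputs far from the training data, which is what makes it useful for flagging unreliable predictions on unseen enzymes.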

Finally, DeepEnzyme54 incorporated detailed 3D structural features of enzymes using a graph convolutional network trained on enzyme structures predicted by ColabFold.55 This approach provided a spatially aware interpretation of residue importance. DeepEnzyme maintained high predictive performance even for enzymes with low sequence identity to the training set and accurately ranked both single and multiple mutations. As a demonstration, it identified activity-enhancing mutations in PafA, an alkaline phosphatase, by correctly distinguishing high- from low-activity mutants.
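Structure-aware graph models of this kind typically start from a residue contact graph. The sketch below assumes a graph built from C-alpha distances with an 8 Å cutoff (a common but arbitrary choice) and applies one symmetric-normalized graph-convolution step without learned weights; it illustrates the message passing, not DeepEnzyme's actual architecture, and the coordinates and features are hypothetical.

```python
import math


def contact_map(ca_coords, cutoff=8.0):
    """Adjacency matrix: residues i, j are connected if their C-alpha atoms are within `cutoff` Å."""
    n = len(ca_coords)
    return [[1.0 if i != j and math.dist(ca_coords[i], ca_coords[j]) <= cutoff else 0.0
             for j in range(n)] for i in range(n)]


def gcn_layer(adj, features):
    """One graph-convolution step with self-loops and symmetric degree normalisation:
    H' = D^-1/2 (A + I) D^-1/2 H  (learned weight matrix and nonlinearity omitted).
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    dim = len(features[0])
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if a_hat[i][j]:
                norm = a_hat[i][j] / math.sqrt(deg[i] * deg[j])
                for k in range(dim):
                    row[k] += norm * features[j][k]
        out.append(row)
    return out


# Hypothetical C-alpha coordinates (Å) of four residues, e.g. from a predicted structure.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
adj = contact_map(coords)
features = [[1.0], [0.0], [0.0], [5.0]]  # hypothetical per-residue features
out = gcn_layer(adj, features)
print(out)
```

After one step, each residue's representation mixes information from its spatial neighbours, so residues far apart in sequence but close in 3D can influence each other.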

5 Mechanistic models for enzyme–substrate interactions in natural product biosynthesis

While AI-based approaches have shown great promise in predicting enzyme–substrate interactions, these models often lack interpretability in terms of biophysical mechanisms. To complement them, computational chemistry-based mechanistic modeling approaches, such as molecular docking, molecular dynamics simulations, and quantum mechanics/molecular mechanics calculations, can be used to interpret the mechanisms of enzyme–substrate interactions.

5.1 Molecular docking simulation

Molecular docking simulations are used to predict how ligands interact with and bind to proteins (Fig. 3).56,57 This approach can help explain how substrates interact with enzymes and how enzymes catalyze reactions. Because experimental enzyme–substrate complex structures are not available for every enzyme, docking is particularly useful for predicting plausible substrate binding poses within enzyme active sites, including those of natural product biosynthetic enzymes. Molecular docking simulations are widely performed using software such as AutoDock,58 AutoDock Vina,59,60 Glide,61 and Genetic Optimisation for Ligand Docking (GOLD).62
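In practice, docking into an active site requires defining a search box around it. The sketch below computes a box from the centroid and extent of active-site atom coordinates and writes an AutoDock Vina-style configuration using the `receptor`, `ligand`, `center_*`, `size_*`, and `exhaustiveness` options; the function names, padding value, file names, and coordinates are assumptions for illustration.

```python
def vina_box(site_coords, padding=8.0):
    """Centre a search box on the centroid of active-site atoms.

    Box edges span the atoms' extent along each axis plus `padding` Å
    (the padding value is an arbitrary assumption).
    """
    n = len(site_coords)
    center = tuple(sum(c[k] for c in site_coords) / n for k in range(3))
    size = tuple(
        max(c[k] for c in site_coords) - min(c[k] for c in site_coords) + padding
        for k in range(3)
    )
    return center, size


def vina_config(receptor, ligand, center, size, exhaustiveness=8):
    """Return the text of an AutoDock Vina configuration file (key = value lines)."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    lines += [f"center_{axis} = {value:.3f}" for axis, value in zip("xyz", center)]
    lines += [f"size_{axis} = {value:.3f}" for axis, value in zip("xyz", size)]
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines)


# Hypothetical active-site atom coordinates (Å) taken from an enzyme model.
site = [(10.0, 12.0, 5.0), (14.0, 10.0, 7.0), (12.0, 14.0, 9.0)]
center, size = vina_box(site)
config_text = vina_config("enzyme.pdbqt", "substrate.pdbqt", center, size)
print(config_text)
```

The resulting text can be saved to a file and passed to Vina with its `--config` option. Deriving the box from active-site residues keeps the search focused on catalytically relevant poses rather than the whole protein surface.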
Fig. 3 Schematic overview of mechanistic approaches for modeling enzyme–substrate interactions, ranging from less computationally intensive but more approximate methods to highly accurate but computationally expensive models. Molecular docking simulations predict optimal binding poses and affinities between proteins and ligands. Molecular dynamics (MD) simulations capture the dynamic behavior of molecular systems over time, while metadynamics explores the free energy surface by adding bias potentials to overcome energy barriers. Quantum mechanics/molecular mechanics (QM/MM) combines molecular mechanics and quantum mechanics to balance computational cost with accurate modeling of chemical reactions. The blue dotted line in the QM region (light green box) indicates hydrogen bonds between the enzyme and substrate. The gray dotted arrows labeled with computational method names indicate where AI augments each modeling approach. Specifically, AlphaFold 3 (ref. 107) enables more accurate protein structure prediction; the electrostatic machine learning embedding (EMLE),109 Graph Attentional Protein Parametrization (Grappa),112 and Espaloma-0.3 (ref. 113) improve force-field parametrization and electrostatic embedding with higher accuracy and lower computational cost in MD; and machine learning models trained on QM/MM-derived features114 predict reaction barriers and selectivity, thereby accelerating QM/MM calculations. Created with https://www.BioRender.com.

In the case of DesD, a siderophore synthetase that catalyzes the iterative ATP-dependent condensation of N1-hydroxy-N1-succinyl-cadaverine (HSC) using HSC-adenosine monophosphate (HSC-AMP) as its natural substrate, docking simulations were used to investigate the enzyme's catalytic mechanism.63 Although several crystal structures of DesD orthologs have been determined, none of them captured the enzyme in a catalytically poised state. To address this issue, molecular docking simulations were used to generate plausible substrate-binding poses. The structure of DesD in complex with the HSC-AMP analog (HSC-((2R,3S,4R,5R)-5-(6-amino-9H-purin-9-yl)-3,4-dihydroxytetrahydrofuran-2-yl)methyl sulfamate [AMS]), a partial inhibitor of DesD, was experimentally determined prior to docking with the native substrate. Docking of HSC-AMP to DesD was subsequently validated by consistency with the experimentally determined DesD–HSC–AMS complex. Using this strategy, a trimeric analog was also docked into the active site to model the enzyme–substrate complex poised for macrocyclization, suggesting a mechanistically reasonable arrangement for intramolecular nucleophilic attack.
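Pose validation of this kind is commonly quantified as the root-mean-square deviation (RMSD) between matched atoms of a docked ligand and a co-crystallized reference, with about 2 Å often used as a success threshold. The sketch assumes both poses are already in the same receptor frame and that atom correspondence is known (for example, atoms shared by a substrate and its co-crystallized analog); the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of matched atom coordinates (Å).

    No superposition is done: docked and reference poses are assumed to be in
    the same receptor frame, as when docking into the experimental structure.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("atom lists must be the same length")
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


# Hypothetical matched atoms shared by a docked pose and a co-crystallized analog.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.3, 0.0, 0.0), (1.5, 0.4, 0.0), (1.5, 1.5, 0.0)]
value = rmsd(docked, reference)
consistent = value < 2.0  # widely used success threshold for pose reproduction
print(f"RMSD = {value:.2f} Å, consistent with reference: {consistent}")
```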

5.2 Molecular dynamics simulation

While docking provides a mechanistically plausible model of enzyme–substrate complexes, its rigid treatment of protein structures makes it difficult to capture accurate substrate positioning. To overcome this limitation, several studies have used molecular dynamics (MD) simulations, which allow proteins and ligands to move over time and thus reveal the structural determinants of enzyme–substrate interactions (Fig. 3).57 Widely used MD engines include GROMACS,64 the Assisted Model Building with Energy Refinement (AMBER) suite,65,66 and the Chemistry at Harvard Macromolecular Mechanics (CHARMM) program.67 These engines compute the forces that drive atomic motion from a force field, a set of mathematical functions and parameters that describe the potential energy of a molecular system. Proteins are typically described with biomolecular force fields such as the AMBER force fields (e.g., ff14SB, ff19SB),68–70 CHARMM (e.g., CHARMM36, CHARMM36m),71–73 GROningen MOlecular Simulation (GROMOS),74,75 and Optimized Potentials for Liquid Simulations (OPLS).76,77 Small-molecule substrates are generally parameterized using ligand-specific schemes such as the General AMBER Force Field (GAFF)78,79 or the CHARMM General Force Field (CGenFF).80
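For orientation, the functional form used by AMBER-type protein force fields (and by GAFF for small molecules) can be written as below, with bonded terms for bonds, angles, and dihedrals and nonbonded Lennard-Jones and Coulomb terms over atom pairs; CHARMM force fields use a closely related form with additional terms such as Urey–Bradley, improper dihedral, and CMAP corrections. A given force field supplies the parameters: force constants $k_b$ and $k_\theta$, equilibrium values $r_0$ and $\theta_0$, torsional barriers $V_n$ with periodicity $n$ and phase $\gamma$, Lennard-Jones coefficients $A_{ij}$ and $B_{ij}$, and partial charges $q_i$.

```latex
E_{\text{total}} = \sum_{\text{bonds}} k_b (r - r_0)^2
  + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2
  + \sum_{\text{dihedrals}} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right]
  + \sum_{i<j}\left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}}
  + \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}\right]
```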

Kalkreuter et al.81 conducted MD-guided motif swaps to reprogram AT domain specificity in PKS. MD simulations with the Yet Another Scientific Artificial Reality Application (YASARA) package were used to pre-screen AT mutants for structural viability by evaluating their conformational stability and active-site geometry (Fig. 3). In particular, the steric compatibility of the swapped motifs was shown to be important for reprogramming specificity.

While Kalkreuter et al. used MD simulation as an evaluation tool for substrate reprogramming, Huang et al.82 utilized it to investigate substrate specificity of AT domains in PKS. Their simulations revealed dynamic hydrogen-bonding patterns between malonyl-CoA derivatives and AT domains, highlighting persistent hydrogen bonds that stabilized suitable binding poses for catalysis. Binding free energy calculations using the molecular mechanics Poisson–Boltzmann surface area (MM-PBSA),83 which estimates binding affinities by combining molecular mechanics energies with polar (electrostatic solvation) and nonpolar solvation contributions, further demonstrated the importance of electrostatic interactions in substrate recognition. Additionally, mutating residues near the α-substituent groups altered steric hindrance, indicating the critical role of steric effects in substrate specificity.
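The MM-PBSA estimate described above can be summarized as follows, with averages taken over MD snapshots. In the common single-trajectory variant, receptor and ligand geometries are extracted from the complex trajectory so that the internal (bonded) terms cancel, and the entropy term $-T\Delta S$ is often omitted or approximated separately.

```latex
\begin{aligned}
\Delta G_{\text{bind}} &= \langle G_{\text{complex}} \rangle - \langle G_{\text{receptor}} \rangle - \langle G_{\text{ligand}} \rangle \\
G &= E_{\text{MM}} + G_{\text{polar}} + G_{\text{nonpolar}} - TS \\
E_{\text{MM}} &= E_{\text{internal}} + E_{\text{elec}} + E_{\text{vdW}}
\end{aligned}
```

Here $G_{\text{polar}}$ is obtained by solving the Poisson–Boltzmann equation and $G_{\text{nonpolar}}$ is usually estimated from the solvent-accessible surface area, so the electrostatic contributions highlighted by Huang et al. enter through both $E_{\text{elec}}$ and $G_{\text{polar}}$.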

5.2.1. Metadynamics. Although conventional MD simulations provide insights into the dynamic nature of enzyme–substrate interactions, they often become trapped in local free-energy minima (metastable states separated by high energy barriers), so sampling may be insufficient to explore the full conformational space.84 For example, Watson et al.85 conducted conventional MD to evaluate ribosomal reactivity toward non-L-α-amino acid monomers. The first step of the ribosomal reaction is nucleophilic addition of the A-site amine to the P-site carbonyl carbon. Two geometric parameters are critical: the distance between the α-amino group in the A-site and the sp2-hybridized carbonyl carbon (Nα–Csp2 distance), and the angle of nucleophilic approach, known as the Bürgi–Dunitz angle (αBD).86 However, no correlation was found between reactivity and the Nα–Csp2 distance, even though this parameter is usually important for nucleophilic attack.
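The two geometric descriptors can be computed directly from atomic coordinates. A minimal sketch, assuming the Bürgi–Dunitz angle is measured as the N···C=O angle at the carbonyl carbon and using hypothetical coordinates for a single MD frame; the window check uses the ranges reported by Watson et al. in the following paragraph.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attack_geometry(n_alpha, c_carbonyl, o_carbonyl):
    """Return (Nα–Csp2 distance in Å, Bürgi–Dunitz angle in degrees).

    The Bürgi–Dunitz angle is taken as the N···C=O angle at the carbonyl carbon,
    i.e. the angle between the C→N and C→O vectors.
    """
    cn = _sub(n_alpha, c_carbonyl)
    co = _sub(o_carbonyl, c_carbonyl)
    dist = math.sqrt(_dot(cn, cn))
    cos_angle = _dot(cn, co) / (dist * math.sqrt(_dot(co, co)))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))  # clamp for rounding
    return dist, angle


# Hypothetical frame: carbonyl C at the origin, O along +x, amine N 3.0 Å off the C=O axis.
d, a = attack_geometry((-0.8, 3.0, 0.0), (0.0, 0.0, 0.0), (1.23, 0.0, 0.0))
in_reactive_window = d < 4.0 and 76.0 <= a <= 115.0
print(f"N–C distance = {d:.2f} Å, αBD = {a:.1f}°, in window: {in_reactive_window}")
```

Applied frame by frame to a trajectory, these two numbers define the two-dimensional space whose sampling is discussed next.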

The limitation of conventional MD in this context is its inability to adequately sample the conformational space defined by the Nα–Csp2 distance and αBD. To address this, Watson et al. employed metadynamics, an enhanced sampling method that accelerates exploration of the free energy landscape by adding Gaussian bias potentials along predefined reaction coordinates (collective variables, CVs) (Fig. 3).84 Using Nα–Csp2 distance and αBD as CVs, metadynamics revealed that reactive amino acids are significantly enriched in conformational states with Nα–Csp2 distance less than 4 Å and αBD values between 76° and 115°, regions of conformational space that conventional MD failed to sample adequately (Fig. 3).
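The bias used in metadynamics is a running sum of Gaussians centered at previously visited CV values. The sketch below evaluates such a bias in the two-dimensional (Nα–Csp2 distance, αBD) space; the hill height, widths, and visited states are hypothetical, and production runs would use an enhanced-sampling package such as PLUMED rather than hand-written code. In standard (non-well-tempered) metadynamics, once the explored CV region has been filled, the accumulated bias approximates the negative free energy, F(s) ≈ −V(s) + constant.

```python
import math


def gaussian_hill(s, center, height, widths):
    """One Gaussian hill in CV space with a separate width for each CV."""
    exponent = sum((si - ci) ** 2 / (2.0 * w ** 2) for si, ci, w in zip(s, center, widths))
    return height * math.exp(-exponent)


def bias_potential(s, hills, height, widths):
    """Metadynamics bias V(s): sum of Gaussians deposited at previously visited CV values."""
    return sum(gaussian_hill(s, c, height, widths) for c in hills)


# CVs: (Nα–Csp2 distance in Å, Bürgi–Dunitz angle in degrees).
widths = (0.1, 5.0)  # hypothetical Gaussian widths per CV
height = 1.2         # hypothetical hill height (kJ/mol)
hills = []
trajectory = [(4.8, 130.0), (4.7, 128.0), (4.8, 131.0), (4.6, 125.0)]  # hypothetical visited states
for s in trajectory:  # in a real run, a hill is deposited every fixed number of MD steps
    hills.append(s)

v_visited = bias_potential((4.8, 130.0), hills, height, widths)
v_unvisited = bias_potential((3.5, 100.0), hills, height, widths)
print(f"bias at visited state: {v_visited:.2f} kJ/mol; at unvisited state: {v_unvisited:.2e} kJ/mol")
```

Because the bias piles up where the system has already been, the biased forces push it toward unexplored regions, such as the short-distance states that conventional MD rarely reached.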

5.3 Quantum mechanics/molecular mechanics

While previous approaches have been effective in explaining structural and mechanistic determinants of enzyme–substrate interactions, they do not directly address the catalytic reaction itself. To evaluate enzymatic catalysis at the electronic level, quantum mechanics/molecular mechanics (QM/MM) simulations can be utilized to model bond formation and cleavage events.87 Such simulations are commonly implemented in quantum chemistry packages such as Gaussian,88 ORCA,89,90 and Q-Chem,91 while widely used MD engines such as AMBER65,66 and CHARMM67 also provide QM/MM interfaces.

In QM/MM calculations, the QM region can be treated at different levels of theory (i.e., with different trade-offs between the accuracy and computational cost of the quantum calculation). Semiempirical methods (e.g., Austin Model 1 (AM1) and Parametric Method 6 (PM6)),92,93 which introduce empirical parameters derived from experimental data into the quantum equations, are computationally efficient but less accurate. In contrast, ab initio approaches such as Hartree–Fock94–96 or second-order Møller–Plesset perturbation theory (MP2)97 compute electronic structures directly from first principles, offering higher accuracy at much greater computational cost.98 In practice, density functional theory (DFT),99,100 which replaces explicit wavefunctions with electron density as the central variable, usually offers the most favorable balance of accuracy and efficiency for enzymatic systems. Functionals such as Becke, three-parameter, Lee–Yang–Parr (B3LYP)101,102 and M06-2X103 combined with medium-sized basis sets (predefined sets of mathematical functions used to approximate molecular orbitals) such as 6-31G(d)104 are widely used in enzymatic studies. The MM region is typically described with biomolecular force fields such as AMBER68–70,78 or CHARMM.71–73,80

Ji et al.105 combined MD simulations with QM/MM to elucidate the mechanism of substrate recognition and the first half of the transacylation reaction in the AT domain of the salinomycin PKS. To simulate the acyl-transfer reaction, QM/MM calculations were performed using the “our own n-layered integrated molecular orbital and molecular mechanics” (ONIOM) method. ONIOM is a hybrid computational method that divides the system into multiple layers (different regions of the molecular system), each treated at a different level of theory. The high-level layer (the active-site residues and substrate) is treated with accurate quantum mechanics to capture bond-making and bond-breaking events, while the lower-level layers (the rest of the protein and the solvent environment) are treated with simpler molecular mechanics to reduce computational cost (Fig. 3).106 The QM/MM calculations revealed that the first step of transacylation, in which the active-site serine is deprotonated by histidine and then attacks the thioester carbonyl group of acyl-CoA, passes through the highest-energy transition state, indicating that it is the rate-limiting step of the reaction (Fig. 3). Moreover, the energy barrier of this first step was not the only factor influencing substrate specificity in the AT domain, as catalytic activities did not always correlate with the calculated energy barriers.
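In its two-layer form, ONIOM combines the layer energies by subtractive extrapolation, shown below for the case in which the high level is QM and the low level is MM. Here "model" denotes the QM region (capped with link atoms where covalent bonds are cut) and "real" the full system; a reaction barrier such as that of the serine attack step then follows as the difference in ONIOM energy between the transition state and the reactant state.

```latex
\begin{aligned}
E_{\text{ONIOM(QM:MM)}} &= E_{\text{QM}}(\text{model}) + E_{\text{MM}}(\text{real}) - E_{\text{MM}}(\text{model}) \\
\Delta E^{\ddagger} &= E_{\text{ONIOM}}(\text{TS}) - E_{\text{ONIOM}}(\text{reactant})
\end{aligned}
```

With mechanical embedding, the QM–MM electrostatic interaction is handled at the MM level; with electronic embedding, the MM point charges are included in the QM Hamiltonian so that the environment polarizes the QM region.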

5.4 AI-assisted mechanistic modeling

Even though mechanistic models provide robust biophysical explanations of enzyme–substrate interactions, they also have limitations. A key challenge is their strong dependence on high-quality protein structural data, which are often unavailable or incomplete for many enzymes. AI-based protein structure prediction methods such as AlphaFold, which can generate structural models without experimental data, have had a major impact on overcoming this limitation.107 For example, Su et al.108 combined AlphaFold-predicted structures with mechanistic simulations for monoterpene synthases; the predicted structures were essential to resolve two loop regions (residues 82–99 and 147–170) missing from the crystal structure. These models guided the design of enzyme variants, which were subsequently validated and refined through MD and QM/MM simulations, enabling catalytic modifications that boosted terpenoid production.

The accuracy of MD simulations also strongly depends on how well the force fields reflect real physical systems. Classical force fields, however, describe interactions with fixed empirical functions and typically do not explicitly include quantum mechanical effects such as electronic polarization or bond formation and cleavage. AI-based learned force fields are narrowing the gap between empirical potentials and quantum accuracy while reducing computational demands.109–111 Examples under development for more accurate simulations include the AI-based force fields Graph Attentional Protein Parametrization (Grappa)112 and Espaloma-0.3,113 as well as the Electrostatic Machine Learning Embedding (EMLE) method,109 which allows a machine-learned potential to account for the electrostatic influence of its surrounding environment.

Finally, another major limitation of mechanistic modeling is its requirement for extensive computational resources and time. Even for single mutations or homologous proteins, mechanistic modeling requires recomputation of the entire system for each case. AI models trained on pre-calculated results from mechanistic simulations can address these limitations by learning the relationships between protein sequences and mechanistic effects. Studies on Candida parapsilosis carbonyl reductase have shown that machine-learning models trained on QM/MM-derived pre-reaction-state features can predict activation barriers and stereoselectivity with high accuracy, effectively substituting for costly QM/MM calculations in this context.114 These advances reduce the long-standing trade-off between computational expense and predictive accuracy.
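Predicted barriers translate into rates and selectivity through transition-state theory. The sketch below applies the Eyring equation and the standard relation between the barrier difference of two competing pathways and enantiomeric excess; it assumes both pathways start from a common, rapidly equilibrating state, and the barrier values are hypothetical rather than results from ref. 114.

```python
import math

R = 8.314462618e-3          # gas constant, kJ mol^-1 K^-1
KB_OVER_H = 2.083661912e10  # Boltzmann constant / Planck constant, s^-1 K^-1


def eyring_rate(dg_act_kj, temperature=298.15):
    """Transition-state-theory rate constant k = (kB T / h) exp(-ΔG‡ / RT), in s^-1."""
    return KB_OVER_H * temperature * math.exp(-dg_act_kj / (R * temperature))


def enantiomeric_excess(ddg_act_kj, temperature=298.15):
    """ee from the barrier difference ΔΔG‡ (kJ/mol) between two competing pathways.

    Ratio of rates (major/minor) = exp(ΔΔG‡ / RT); ee = (ratio - 1) / (ratio + 1).
    """
    ratio = math.exp(ddg_act_kj / (R * temperature))
    return (ratio - 1.0) / (ratio + 1.0)


# Hypothetical ML-predicted barriers (kJ/mol) for the two stereochemical outcomes.
barrier_r, barrier_s = 60.0, 68.0
k_r, k_s = eyring_rate(barrier_r), eyring_rate(barrier_s)
ee = enantiomeric_excess(barrier_s - barrier_r)
print(f"k_R = {k_r:.3g} s^-1, k_S = {k_s:.3g} s^-1, ee = {ee:.1%}")
```

Because selectivity depends exponentially on the barrier difference, an 8 kJ/mol gap already corresponds to roughly 92% ee at room temperature, which is why accurate barrier prediction matters for stereoselectivity.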

6 Conclusion

Natural products, many of which are synthesized by enzymes encoded in biosynthetic gene clusters (BGCs), display immense structural diversity, but their genotype–phenotype relationships remain difficult to resolve. Curated databases of compounds, BGCs, enzymes, and pathways provide the foundation for computational methods, while recent AI approaches, ranging from template-based and template-free pathway prediction to enzyme–substrate interaction and kinetic parameter modeling, are rapidly advancing the field. Complementary mechanistic simulations, including docking, molecular dynamics, and QM/MM, offer structural and electronic insights into catalysis. Together, these strategies integrate experimental and computational knowledge to accelerate discovery, prediction, and engineering of natural product biosynthesis for applications in biotechnology and drug development.

Despite recent progress, both AI and mechanistic models in natural product biosynthesis still face significant challenges. While AI models can suggest plausible biosynthetic pathways, they have yet to see widespread integration into real-world synthetic biology pipelines. A major bottleneck is the lack of metabolic reaction data specific to secondary metabolites, which hampers generalization and limits the ability to predict novel or less-studied pathways. To enhance their practical utility, future work must address key issues such as enzyme availability and expression compatibility, reaction condition prediction, and integration with chassis-specific metabolic constraints.

AI and mechanistic models, however, are increasingly complementary. AI delivers impressive accuracy, but its predictions are often difficult to interpret mechanistically. Physics-based simulations such as docking, MD, and QM/MM therefore remain indispensable for mechanistic explanation and validation. Although AlphaFold 3 (ref. 107) reduces reliance on large-scale docking searches for structure generation, tasks such as free-energy profiling, analysis of transition-state stabilization, and assessment of kinetic feasibility still require physics-based simulations. In this integrated workflow, AI predicts enzyme properties, while mechanistic simulations provide the energetic and dynamic rationale needed to turn static predictions into validated models of catalysis.

In parallel, AI is reshaping mechanistic simulations themselves. AI-based learned force fields are narrowing the gap between empirical potentials and quantum accuracy while reducing computational cost.109–111 AI-accelerated MD engines apply reinforcement learning and differentiable algorithms to overcome sampling bottlenecks.115 AI-integrated QM/MM frameworks are beginning to deliver quantum-level accuracy across thousands of catalytic sites at tractable cost.116 The emerging workflow thus forms an iterative cycle of AI-driven hypothesis generation, mechanistic validation, and AI-guided refinement, enabling faster, more interpretable, and more reliable discovery and engineering of biosynthetic pathways for the next generation of drug discovery, biotechnology, and synthetic biology.

7 Author contributions

Byung Tae Lee: Visualization, writing – original draft, writing – review & editing. Byeongsub Lee: Visualization, writing – original draft, writing – review & editing. Joon Young Kwon: Visualization, writing – original draft, writing – review & editing. Tilmann Weber: Conceptualization, supervision, writing – review & editing. Hyun Uk Kim: Conceptualization, funding acquisition, project administration, supervision, writing – review & editing.

8 Conflicts of interest

There are no conflicts to declare.

9 Data availability

No new data were generated or analyzed in this highlight article.

10 Acknowledgements

This work was supported by the National Research Foundation (NRF) funded by the Korean government (MSIT) (RS-2024-00352229 and RS-2024-00509338). This work was also funded by the Novo Nordisk Foundation through the Center for Biosustainability at the Technical University of Denmark (NNF20CC0035580).

11 References

  1. H. E. Augustijn, A. M. Roseboom, M. H. Medema and G. P. van Wezel, J. Ind. Microbiol. Biotechnol., 2024, 51, kuae011.
  2. K. Scherlach and C. Hertweck, Nat. Commun., 2021, 12, 3864.
  3. M. W. Mullowney, K. R. Duncan, S. S. Elsayed, N. Garg, J. J. J. van der Hooft, N. I. Martin, D. Meijer, B. R. Terlouw, F. Biermann, K. Blin, J. Durairaj, M. Gorostiola González, E. J. N. Helfrich, F. Huber, S. Leopold-Messer, K. Rajan, T. de Rond, J. A. van Santen, M. Sorokina, M. J. Balunas, M. A. Beniddir, D. A. van Bergeijk, L. M. Carroll, C. M. Clark, D.-A. Clevert, C. A. Dejong, C. Du, S. Ferrinho, F. Grisoni, A. Hofstetter, W. Jespers, O. V. Kalinina, S. A. Kautsar, H. Kim, T. F. Leao, J. Masschelein, E. R. Rees, R. Reher, D. Reker, P. Schwaller, M. Segler, M. A. Skinnider, A. S. Walker, E. L. Willighagen, B. Zdrazil, N. Ziemert, R. J. M. Goss, P. Guyomard, A. Volkamer, W. H. Gerwick, H. U. Kim, R. Müller, G. P. van Wezel, G. J. P. van Westen, A. K. H. Hirsch, R. G. Linington, S. L. Robinson and M. H. Medema, Nat. Rev. Drug Discovery, 2023, 22, 895–916.
  4. M. Awais, M. Naseer, S. Khan, R. M. Anwer, H. Cholakkal, M. Shah, M.-H. Yang and F. S. Khan, IEEE Trans. Pattern Anal. Mach. Intell., 2025, 47, 2245–2264.
  5. D. Khurana, A. Koli, K. Khatter and S. Singh, Multimed. Tools Appl., 2023, 82, 3713–3744.
  6. K. Zhang, X. Yang, Y. Wang, Y. Yu, N. Huang, G. Li, X. Li, J. C. Wu and S. Yang, Nat. Med., 2025, 31, 45–59.
  7. P. Peluso and B. Chankvetadze, J. Pharm. Biomed. Anal., 2024, 238, 115836.
  8. E. F. Poynton, J. A. van Santen, M. Pin, M. M. Contreras, E. McMann, J. Parra, B. Showalter, L. Zaroubi, K. R. Duncan and R. G. Linington, Nucleic Acids Res., 2025, 53, D691–D699.
  9. D. S. Wishart, T. Sajed, M. Pin, E. F. Poynton, B. Goel, B. L. Lee, A. C. Guo, S. Saha, Z. Sayeeda, S. Han, M. Berjanskii, H. Peters, E. Oler, V. Gautam, T. Jordan, J. Kim, B. Ledingham, Z. M. Tretter, J. T. Koller, H. A. Shreffler, L. R. Stillwell, A. M. Jystad, N. Govind, J. L. Bade, L. W. Sumner, R. G. Linington and J. R. Cort, Nucleic Acids Res., 2025, 53, D700–D708.
  10. V. Chandrasekhar, K. Rajan, S. R. S. Kanakam, N. Sharma, V. Weißenborn, J. Schaub and C. Steinbeck, Nucleic Acids Res., 2025, 53, D634–D643.
  11. A. T. Aron, E. C. Gentry, K. L. McPhail, L.-F. Nothias, M. Nothias-Esposito, A. Bouslimani, D. Petras, J. M. Gauglitz, N. Sikora, F. Vargas, J. J. J. van der Hooft, M. Ernst, K. B. Kang, C. M. Aceves, A. M. Caraballo-Rodríguez, I. Koester, K. C. Weldon, S. Bertrand, C. Roullier, K. Sun, R. M. Tehan, C. A. Boya P, M. H. Christian, M. Gutiérrez, A. M. Ulloa, J. A. Tejeda Mora, R. Mojica-Flores, J. Lakey-Beitia, V. Vásquez-Chaves, Y. Zhang, A. I. Calderón, N. Tayler, R. A. Keyzers, F. Tugizimana, N. Ndlovu, A. A. Aksenov, A. K. Jarmusch, R. Schmid, A. W. Truman, N. Bandeira, M. Wang and P. C. Dorrestein, Nat. Protoc., 2020, 15, 1954–1991.
  12. S. Kim, J. Chen, T. Cheng, A. Gindulyte, J. He, S. He, Q. Li, B. A. Shoemaker, P. A. Thiessen, B. Yu, L. Zaslavsky, J. Zhang and E. E. Bolton, Nucleic Acids Res., 2023, 51, D1373–D1380.
  13. J. Hastings, G. Owen, A. Dekker, M. Ennis, N. Kale, V. Muthukrishnan, S. Turner, N. Swainston, P. Mendes and C. Steinbeck, Nucleic Acids Res., 2016, 44, D1214–D1219.
  14. M. M. Zdouc, K. Blin, N. L. L. Louwen, J. Navarro, C. Loureiro, C. D. Bader, C. B. Bailey, L. Barra, T. J. Booth, K. A. J. Bozhüyük, J. D. D. Cediel-Becerra, Z. Charlop-Powers, M. G. Chevrette, Y. H. Chooi, P. M. D'Agostino, T. de Rond, E. Del Pup, K. R. Duncan, W. Gu, N. Hanif, E. J. N. Helfrich, M. Jenner, Y. Katsuyama, A. Korenskaia, D. Krug, V. Libis, G. A. Lund, S. Mantri, K. D. Morgan, C. Owen, C.-S. Phan, B. Philmus, Z. L. Reitz, S. L. Robinson, K. S. Singh, R. Teufel, Y. Tong, F. Tugizimana, D. Ulanova, J. M. Winter, C. Aguilar, D. Y. Akiyama, S. A. A. Al-Salihi, M. Alanjary, F. Alberti, G. Aleti, S. A. Alharthi, M. Y. A. Rojo, A. A. Arishi, H. E. Augustijn, N. E. Avalon, J. A. Avelar-Rivas, K. K. Axt, H. B. Barbieri, J. C. J. Barbosa, L. G. Barboza Segato, S. E. Barrett, M. Baunach, C. Beemelmanns, D. Beqaj, T. Berger, J. Bernaldo-Agüero, S. M. Bettenbühl, V. A. Bielinski, F. Biermann, R. M. Borges, R. Borriss, M. Breitenbach, K. M. Bretscher, M. W. Brigham, L. Buedenbender, B. W. Bulcock, C. Cano-Prieto, J. Capela, V. J. Carrion, R. S. Carter, R. Castelo-Branco, G. Castro-Falcón, F. O. Chagas, E. Charria-Girón, A. A. Chaudhri, V. Chaudhry, H. Choi, Y. Choi, R. Choupannejad, J. Chromy, M. S. C. Donahey, J. Collemare, J. A. Connolly, K. E. Creamer, M. Crüsemann, A. A. Cruz, A. Cumsille, J.-F. Dallery, L. C. Damas-Ramos, T. Damiani, M. de Kruijff, B. D. Martín, G. D. Sala, J. Dillen, D. T. Doering, S. R. Dommaraju, S. Durusu, S. Egbert, M. Ellerhorst, B. Faussurier, A. Fetter, M. Feuermann, D. P. Fewer, J. Foldi, A. Frediansyah, E. A. Garza, A. Gavriilidou, A. Gentile, J. Gerke, H. Gerstmans, J. P. Gomez-Escribano, L. A. González-Salazar, N. E. Grayson, C. Greco, J. E. G. Gomez, S. Guerra, S. G. Flores, A. Gurevich, K. Gutiérrez-García, L. Hart, K. Haslinger, B. He, T. Hebra, J. L. Hemmann, H. Hindra, L. Höing, D. C. Holland, J. E. Holme, T. Horch, P. Hrab, J. Hu, T.-H. Huynh, J.-Y. Hwang, R. Iacovelli, D. Iftime, M. Iorio, S. Jayachandran, E. Jeong, J. Jing, J. J. Jung, Y. Kakumu, E. Kalkreuter, K. B. Kang, S. Kang, W. Kim, G. J. Kim, H. Kim, H. U. Kim, M. Klapper, R. A. Koetsier, C. Kollten, Á. T. Kovács, Y. Kriukova, N. Kubach, A. M. Kunjapur, A. K. Kushnareva, A. Kust, J. Lamber, M. Larralde, N. J. Larsen, A. P. Launay, N.-T.-H. Le, S. Lebeer, B. T. Lee, K. Lee, K. L. Lev, S.-M. Li, Y.-X. Li, C. Licona-Cassani, A. Lien, J. Liu, J. A. V. Lopez, N. V. Machushynets, M. I. Macias, T. Mahmud, M. Maleckis, A. M. Martinez-Martinez, Y. Mast, M. F. Maximo, C. M. McBride, R. M. McLellan, K. M. Bhatt, C. Melkonian, A. Merrild, M. Metsä-Ketelä, D. A. Mitchell, A. V. Müller, G.-S. Nguyen, H. T. Nguyen, T. H. J. Niedermeyer, J. H. O'Hare, A. Ossowicki, B. O. Ostash, H. Otani, L. Padva, S. Paliyal, X. Pan, M. Panghal, D. S. Parade, J. Park, J. Parra, M. P. Rubio, H. T. Pham, S. J. Pidot, J. Piel, B. Pourmohsenin, M. Rakhmanov, S. Ramesh, M. H. Rasmussen, A. Rego, R. Reher, A. J. Rice, A. Rigolet, A. Romero-Otero, L. R. Rosas-Becerra, P. Y. Rosiles, A. Rutz, B. Ryu, L.-A. Sahadeo, M. Saldanha, L. Salvi, E. Sánchez-Carvajal, C. Santos-Medellin, N. Sbaraini, S. M. Schoellhorn, C. Schumm, L. Sehnal, N. Selem, A. D. Shah, T. K. Shishido, S. Sieber, V. Silviani, G. Singh, H. Singh, N. Sokolova, E. C. Sonnenschein, M. Sosio, S. T. Sowa, K. Steffen, E. Stegmann, A. B. Streiff, A. Strüder, F. Surup, T. Svenningsen, D. Sweeney, J. Szenei, A. Tagirdzhanov, B. Tan, M. J. Tarnowski, B. R. Terlouw, T. Rey, N. U. Thome, L. R. Torres Ortega, T. Tørring, M. Trindade, A. W. Truman, M. Tvilum, D. W. Udwary, C. Ulbricht, L. Vader, G. P. van Wezel, M. Walmsley, R. Warnasinghe, H. G. Weddeling, A. N. M. Weir, K. Williams, S. E. Williams, T. E. Witte, S. M. W. Rocca, K. Yamada, D. Yang, D. Yang, J. Yu, Z. Zhou, N. Ziemert, L. Zimmer, A. Zimmermann, C. Zimmermann, J. J. J. van der Hooft, R. G. Linington, T. Weber and M. H. Medema, Nucleic Acids Res., 2025, 53, D678–D690.
  15. K. Blin, S. Shaw, L. Vader, J. Szenei, Z. L. Reitz, H. E. Augustijn, J. D. D. Cediel-Becerra, V. de Crécy-Lagard, R. A. Koetsier, S. E. Williams, P. Cruz-Morales, S. Wongwas, A. E. Segurado Luchsinger, F. Biermann, A. Korenskaia, M. M. Zdouc, D. Meijer, B. R. Terlouw, J. J. J. van der Hooft, N. Ziemert, E. J. N. Helfrich, J. Masschelein, C. Corre, M. G. Chevrette, G. P. van Wezel, M. H. Medema and T. Weber, Nucleic Acids Res., 2025, 53, W32–W38.
  16. M. A. Skinnider, C. W. Johnston, M. Gunabalasingam, N. J. Merwin, A. M. Kieliszek, R. J. MacLellan, H. Li, M. R. M. Ranieri, A. L. H. Webster, M. P. T. Cao, A. Pfeifle, N. Spencer, Q. H. To, D. P. Wallace, C. A. Dejong and N. A. Magarvey, Nat. Commun., 2020, 11, 6058.
  17. K. Blin, S. Shaw, M. H. Medema and T. Weber, Nucleic Acids Res., 2024, 52, D586–D589.
  18. D. W. Udwary, D. T. Doering, B. Foster, T. Smirnova, S. A. Kautsar and N. J. Mouncey, Nucleic Acids Res., 2025, 53, D717–D723.
  19. X. B. Tao, S. LaFrance, Y. Xing, A. A. Nava, H. G. Martin, J. D. Keasling and T. W. H. Backman, Nucleic Acids Res., 2023, 51, D532–D538.
  20. A. Chang, L. Jeske, S. Ulbrich, J. Hofmann, J. Koblitz, I. Schomburg, M. Neumann-Schaal, D. Jahn and D. Schomburg, Nucleic Acids Res., 2020, 49, D498–D508.
  21. U. Wittig, M. Rey, A. Weidemann, R. Kania and W. Müller, Nucleic Acids Res., 2018, 46, D656–D660.
  22. P. Bansal, A. Morgat, K. B. Axelsen, V. Muthukrishnan, E. Coudert, L. Aimo, N. Hyka-Nouspikel, E. Gasteiger, A. Kerhornou, T. B. Neto, M. Pozzato, M.-C. Blatter, A. Ignatchenko, N. Redaschi and A. Bridge, Nucleic Acids Res., 2022, 50, D693–D700.
  23. UniProt Consortium, Nucleic Acids Res., 2023, 51, D523–D531.
  24. R. Caspi, R. Billington, I. M. Keseler, A. Kothari, M. Krummenacker, P. E. Midford, W. K. Ong, S. Paley, P. Subhraveti and P. D. Karp, Nucleic Acids Res., 2020, 48, D445–D453.
  25. M. Kanehisa, M. Furumichi, Y. Sato, Y. Matsuura and M. Ishiguro-Watanabe, Nucleic Acids Res., 2025, 53, D672–D677.
  26. S. Choudhury, I. Toumpe, O. Gabouj, V. Hatzimanikatis and L. Miskovic, bioRxiv, 2025, DOI:10.1101/2025.03.31.646317.
  27. N. Singh, S. Lane, T. Yu, J. Lu, A. Ramos, H. Cui and H. Zhao, Nat. Commun., 2025, 16, 5648.
  28. A. Sveshnikova, O. Oftadeh and V. Hatzimanikatis, Nat. Commun., 2025, 16, 4839.
  29. G. Gricourt, P. Meyer, T. Duigou and J.-L. Faulon, ACS Synth. Biol., 2024, 13, 2276–2294.
  30. E. Heid, S. Goldman, K. Sankaranarayanan, C. W. Coley, C. Flamm and W. H. Green, J. Chem. Inf. Model., 2021, 61, 4949–4961.
  31. W. Finnigan, L. J. Hepworth, S. L. Flitsch and N. J. Turner, Nat. Catal., 2021, 4, 98–104.
  32. K. Sankaranarayanan, E. Heid, C. W. Coley, D. Verma, W. H. Green and K. F. Jensen, Chem. Sci., 2022, 13, 6039–6053.
  33. S. Zheng, T. Zeng, C. Li, B. Chen, C. W. Coley, Y. Yang and R. Wu, Nat. Commun., 2022, 13, 3342.
  34. S. Moretti, V. D. T. Tran, F. Mehl, M. Ibberson and M. Pagni, Nucleic Acids Res., 2021, 49, D570–D574.
  35. D. M. Lowe, PhD thesis, University of Cambridge, 2012, DOI:10.17863/CAM.16293.
  36. B. Chen, C. Li, H. Dai and L. Song, in Proceedings of the 37th International Conference on Machine Learning, PMLR, 2020, pp. 1608–1616.
  37. T. Kim, S. Lee, Y. Kwak, M.-S. Choi, J. Park, S. J. Hwang and S.-G. Kim, New Phytol., 2024, 243, 2512–2527.
  38. Y. Wan, C.-Y. Hsieh, B. Liao and S. Zhang, in Proceedings of the 39th International Conference on Machine Learning, PMLR, 2022, pp. 22475–22490.
  39. Z. Tu and C. W. Coley, J. Chem. Inf. Model., 2022, 62, 3503–3513 CrossRef CAS PubMed.
  40. Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, N. Smetanin, R. Verkuil, O. Kabeli, Y. Shmueli, A. Dos Santos Costa, M. Fazel-Zarandi, T. Sercu, S. Candido and A. Rives, Science, 2023, 379, 1123–1130 CrossRef CAS.
  41. A. Elnaggar, M. Heinzinger, C. Dallago, G. Rehawi, Y. Wang, L. Jones, T. Gibbs, T. Feher, C. Angerer, M. Steinegger, D. Bhowmik and B. Rost, IEEE Trans. Pattern Anal. Mach. Intell., 2022, 44, 7112–7127 Search PubMed.
  42. A. Kroll, S. Ranjan, M. K. M. Engqvist and M. J. Lercher, Nat. Commun., 2023, 14, 2787 CrossRef CAS PubMed.
  43. A. Kroll, S. Ranjan and M. J. Lercher, PLoS Comput. Biol., 2024, 20, e1012100 CrossRef CAS PubMed.
  44. S. Goldman, R. Das, K. K. Yang and C. W. Coley, PLoS Comput. Biol., 2022, 18, e1009853 CrossRef CAS PubMed.
  45. J. D. Clark, X. Mi, D. A. Mitchell and D. Shukla, Digital Discovery, 2025, 4, 343–354.
  46. E. Englund, M. Schmidt, A. A. Nava, A. Lechner, K. Deng, R. Jocic, Y. Lin, J. Roberts, V. T. Benites, R. Kakumanu, J. W. Gin, Y. Chen, Y. Liu, C. J. Petzold, E. E. K. Baidoo, T. R. Northen, P. D. Adams, L. Katz, S. Yuzawa and J. D. Keasling, J. Am. Chem. Soc., 2023, 145, 8822–8832.
  47. A. Kroll, N. Niebuhr, G. Butler and M. J. Lercher, PLoS Biol., 2024, 22, e3002807.
  48. S. Honda, S. Shi and H. R. Ueda, arXiv, 2019, preprint, arXiv:1911.04738, DOI:10.48550/arXiv.1911.04738.
  49. F. Li, L. Yuan, H. Lu, G. Li, Y. Chen, M. K. M. Engqvist, E. J. Kerkhoven and J. Nielsen, Nat. Catal., 2022, 5, 662–672.
  50. A. Kroll and M. J. Lercher, Biol. Methods Protoc., 2024, 9, bpae061.
  51. H. Yu, H. Deng, J. He, J. D. Keasling and X. Luo, Nat. Commun., 2023, 14, 8211.
  52. Z. Wang, D. Xie, D. Wu, X. Luo, S. Wang, Y. Li, Y. Yang, W. Li and L. Zheng, Nat. Commun., 2025, 16, 2736.
  53. V. S. Boorla and C. D. Maranas, Nat. Commun., 2025, 16, 2072.
  54. T. Wang, G. Xiang, S. He, L. Su, Y. Wang, X. Yan and H. Lu, Briefings Bioinf., 2024, 25, bbae409.
  55. M. Mirdita, K. Schütze, Y. Moriwaki, L. Heo, S. Ovchinnikov and M. Steinegger, Nat. Methods, 2022, 19, 679–682.
  56. P. C. Agu, C. A. Afiukwa, O. U. Orji, E. M. Ezeh, I. H. Ofoke, C. O. Ogbu, E. I. Ugwuja and P. M. Aja, Sci. Rep., 2023, 13, 13398.
  57. L. H. S. Santos, R. S. Ferreira and E. R. Caffarena, in Docking Screens for Drug Discovery, ed. W. F. de Azevedo Jr, Springer, New York, NY, 2019, pp. 13–34.
  58. G. M. Morris, R. Huey, W. Lindstrom, M. F. Sanner, R. K. Belew, D. S. Goodsell and A. J. Olson, J. Comput. Chem., 2009, 30, 2785–2791.
  59. O. Trott and A. J. Olson, J. Comput. Chem., 2010, 31, 455–461.
  60. J. Eberhardt, D. Santos-Martins, A. F. Tillack and S. Forli, J. Chem. Inf. Model., 2021, 61, 3891–3898.
  61. R. A. Friesner, J. L. Banks, R. B. Murphy, T. A. Halgren, J. J. Klicic, D. T. Mainz, M. P. Repasky, E. H. Knoll, M. Shelley, J. K. Perry, D. E. Shaw, P. Francis and P. S. Shenkin, J. Med. Chem., 2004, 47, 1739–1749.
  62. G. Jones, P. Willett, R. C. Glen, A. R. Leach and R. Taylor, J. Mol. Biol., 1997, 267, 727–748.
  63. J. Yang, V. S. Banas, K. D. Patel, G. S. M. Rivera, L. S. Mydy, A. M. Gulick and T. A. Wencewicz, J. Biol. Chem., 2022, 298, 102166.
  64. H. Bekker, H. Berendsen, E. Dijkstra, S. Achterop, R. van Drunen, D. van der Spoel, A. Sijbers, H. Keegstra and M. Renardus, in Physics Computing '92, 1993, pp. 252–256.
  65. D. A. Case, T. E. Cheatham III, T. Darden, H. Gohlke, R. Luo, K. M. Merz Jr., A. Onufriev, C. Simmerling, B. Wang and R. J. Woods, J. Comput. Chem., 2005, 26, 1668–1688.
  66. R. Salomon-Ferrer, D. A. Case and R. C. Walker, Wiley Interdiscip. Rev. Comput. Mol. Sci., 2013, 3, 198–210.
  67. B. R. Brooks, C. L. Brooks III, A. D. Mackerell Jr., L. Nilsson, R. J. Petrella, B. Roux, Y. Won, G. Archontis, C. Bartels, S. Boresch, A. Caflisch, L. Caves, Q. Cui, A. R. Dinner, M. Feig, S. Fischer, J. Gao, M. Hodoscek, W. Im, K. Kuczera, T. Lazaridis, J. Ma, V. Ovchinnikov, E. Paci, R. W. Pastor, C. B. Post, J. Z. Pu, M. Schaefer, B. Tidor, R. M. Venable, H. L. Woodcock, X. Wu, W. Yang, D. M. York and M. Karplus, J. Comput. Chem., 2009, 30, 1545–1614.
  68. J. W. Ponder and D. A. Case, in Advances in Protein Chemistry, Academic Press, 2003, vol. 66, pp. 27–85.
  69. J. A. Maier, C. Martinez, K. Kasavajhala, L. Wickstrom, K. E. Hauser and C. Simmerling, J. Chem. Theory Comput., 2015, 11, 3696–3713.
  70. C. Tian, K. Kasavajhala, K. A. A. Belfon, L. Raguette, H. Huang, A. N. Migues, J. Bickel, Y. Wang, J. Pincay, Q. Wu and C. Simmerling, J. Chem. Theory Comput., 2020, 16, 528–552.
  71. A. D. MacKerell, D. Bashford, M. Bellott, R. L. Dunbrack, J. D. Evanseck, M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy, L. Kuchnir, K. Kuczera, F. T. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen, B. Prodhom, W. E. Reiher, B. Roux, M. Schlenkrich, J. C. Smith, R. Stote, J. Straub, M. Watanabe, J. Wiórkiewicz-Kuczera, D. Yin and M. Karplus, J. Phys. Chem. B, 1998, 102, 3586–3616.
  72. R. B. Best, X. Zhu, J. Shim, P. E. M. Lopes, J. Mittal, M. Feig and A. D. Mackerell, J. Chem. Theory Comput., 2012, 8, 3257–3273.
  73. J. Huang, S. Rauscher, G. Nawrocki, T. Ran, M. Feig, B. L. de Groot, H. Grubmüller and A. D. MacKerell, Nat. Methods, 2017, 14, 71–73.
  74. L. D. Schuler, X. Daura and W. F. van Gunsteren, J. Comput. Chem., 2001, 22, 1205–1218.
  75. M. M. Reif, P. H. Hünenberger and C. Oostenbrink, J. Chem. Theory Comput., 2012, 8, 3705–3723.
  76. W. L. Jorgensen and J. Tirado-Rives, J. Am. Chem. Soc., 1988, 110, 1657–1666.
  77. W. L. Jorgensen, D. S. Maxwell and J. Tirado-Rives, J. Am. Chem. Soc., 1996, 118, 11225–11236.
  78. J. Wang, R. M. Wolf, J. W. Caldwell, P. A. Kollman and D. A. Case, J. Comput. Chem., 2004, 25, 1157–1174.
  79. J. Wang, W. Wang, P. A. Kollman and D. A. Case, J. Mol. Graphics Modell., 2006, 25, 247–260.
  80. K. Vanommeslaeghe, E. Hatcher, C. Acharya, S. Kundu, S. Zhong, J. Shim, E. Darian, O. Guvench, P. Lopes, I. Vorobyov and A. D. Mackerell, J. Comput. Chem., 2010, 31, 671–690.
  81. E. Kalkreuter, K. S. Bingham, A. M. Keeler, A. N. Lowell, J. J. Schmidt, D. H. Sherman and G. J. Williams, Nat. Commun., 2021, 12, 2193.
  82. S. Huang, H. Ji and J. Zheng, FEBS J., 2024, 291, 3839–3855.
  83. P. A. Kollman, I. Massova, C. Reyes, B. Kuhn, S. Huo, L. Chong, M. Lee, T. Lee, Y. Duan, W. Wang, O. Donini, P. Cieplak, J. Srinivasan, D. A. Case and T. E. Cheatham, Acc. Chem. Res., 2000, 33, 889–897.
  84. A. Laio and M. Parrinello, Proc. Natl. Acad. Sci. U. S. A., 2002, 99, 12562–12566.
  85. Z. L. Watson, I. J. Knudson, F. R. Ward, S. J. Miller, J. H. D. Cate, A. Schepartz and A. M. Abramyan, Nat. Chem., 2023, 15, 913–921.
  86. H. B. Bürgi, J. D. Dunitz, J. M. Lehn and G. Wipff, Tetrahedron, 1974, 30, 1563–1572.
  87. A. Warshel and M. Levitt, J. Mol. Biol., 1976, 103, 227–249.
  88. M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. Scalmani, V. Barone, G. A. Petersson, H. Nakatsuji, X. Li, M. Caricato, A. V. Marenich, J. Bloino, B. G. Janesko, R. Gomperts, B. Mennucci, H. P. Hratchian, J. V. Ortiz, A. F. Izmaylov, J. L. Sonnenberg, D. Williams-Young, F. Ding, F. Lipparini, F. Egidi, J. Goings, B. Peng, A. Petrone, T. Henderson, D. Ranasinghe, V. G. Zakrzewski, J. Gao, N. Rega, G. Zheng, W. Liang, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T. Vreven, K. Throssell, J. A. Montgomery Jr., J. E. Peralta, F. Ogliaro, M. J. Bearpark, J. J. Heyd, E. N. Brothers, K. N. Kudin, V. N. Staroverov, T. A. Keith, R. Kobayashi, J. Normand, K. Raghavachari, A. P. Rendell, J. C. Burant, S. S. Iyengar, J. Tomasi, M. Cossi, J. M. Millam, M. Klene, C. Adamo, R. Cammi, J. W. Ochterski, R. L. Martin, K. Morokuma, O. Farkas, J. B. Foresman and D. J. Fox, Gaussian 16, Revision C.01, 2016.
  89. F. Neese, Wiley Interdiscip. Rev. Comput. Mol. Sci., 2012, 2, 73–78.
  90. F. Neese, Wiley Interdiscip. Rev. Comput. Mol. Sci., 2025, 15, e70019.
  91. Y. Shao, Z. Gan, E. Epifanovsky, A. T. B. Gilbert, M. Wormit, J. Kussmann, A. W. Lange, A. Behn, J. Deng, X. Feng, D. Ghosh, M. Goldey, P. R. Horn, L. D. Jacobson, I. Kaliman, R. Z. Khaliullin, T. Kuś, A. Landau, J. Liu, E. I. Proynov, Y. M. Rhee, R. M. Richard, M. A. Rohrdanz, R. P. Steele, E. J. Sundstrom, H. L. Woodcock III, P. M. Zimmerman, D. Zuev, B. Albrecht, E. Alguire, B. Austin, G. J. O. Beran, Y. A. Bernard, E. Berquist, K. Brandhorst, K. B. Bravaya, S. T. Brown, D. Casanova, C.-M. Chang, Y. Chen, S. H. Chien, K. D. Closser, D. L. Crittenden, M. Diedenhofen, R. A. DiStasio Jr., H. Do, A. D. Dutoi, R. G. Edgar, S. Fatehi, L. Fusti-Molnar, A. Ghysels, A. Golubeva-Zadorozhnaya, J. Gomes, M. W. D. Hanson-Heine, P. H. P. Harbach, A. W. Hauser, E. G. Hohenstein, Z. C. Holden, T.-C. Jagau, H. Ji, B. Kaduk, K. Khistyaev, J. Kim, J. Kim, R. A. King, P. Klunzinger, D. Kosenkov, T. Kowalczyk, C. M. Krauter, K. U. Lao, A. D. Laurent, K. V. Lawler, S. V. Levchenko, C. Y. Lin, F. Liu, E. Livshits, R. C. Lochan, A. Luenser, P. Manohar, S. F. Manzer, S.-P. Mao, N. Mardirossian, A. V. Marenich, S. A. Maurer, N. J. Mayhall, E. Neuscamman, C. M. Oana, R. Olivares-Amaya, D. P. O'Neill, J. A. Parkhill, T. M. Perrine, R. Peverati, A. Prociuk, D. R. Rehn, E. Rosta, N. J. Russ, S. M. Sharada, S. Sharma, D. W. Small, A. Sodt, T. Stein, D. Stück, Y.-C. Su, A. J. W. Thom, T. Tsuchimochi, V. Vanovschi, L. Vogt, O. Vydrov, T. Wang, M. A. Watson, J. Wenzel, A. White, C. F. Williams, J. Yang, S. Yeganeh, S. R. Yost, Z.-Q. You, I. Y. Zhang, X. Zhang, Y. Zhao, B. R. Brooks, G. K. L. Chan, D. M. Chipman, C. J. Cramer, W. A. Goddard III, M. S. Gordon, W. J. Hehre, A. Klamt, H. F. Schaefer III, M. W. Schmidt, C. D. Sherrill, D. G. Truhlar, A. Warshel, X. Xu, A. Aspuru-Guzik, R. Baer, A. T. Bell, N. A. Besley, J.-D. Chai, A. Dreuw, B. D. Dunietz, T. R. Furlani, S. R. Gwaltney, C.-P. Hsu, Y. Jung, J. Kong, D. S. Lambrecht, W. Liang, C. Ochsenfeld, V. A. Rassolov, L. V. Slipchenko, J. E. Subotnik, T. Van Voorhis, J. M. Herbert, A. I. Krylov, P. M. W. Gill and M. Head-Gordon, Mol. Phys., 2015, 113, 184–215.
  92. M. J. S. Dewar, E. G. Zoebisch, E. F. Healy and J. J. P. Stewart, J. Am. Chem. Soc., 1985, 107, 3902–3909.
  93. J. J. P. Stewart, J. Mol. Model., 2007, 13, 1173–1213.
  94. D. R. Hartree, Math. Proc. Cambridge Philos. Soc., 1928, 24, 89–110.
  95. V. Fock, Z. Phys., 1930, 61, 126–148.
  96. C. C. J. Roothaan, Rev. Mod. Phys., 1951, 23, 69–89.
  97. C. Møller and M. S. Plesset, Phys. Rev., 1934, 46, 618–622.
  98. D. R. Katti, A. Sharma and K. S. Katti, in Materials for Bone Disorders, ed. S. Bose and A. Bandyopadhyay, Academic Press, 2017, pp. 453–492.
  99. P. Hohenberg and W. Kohn, Phys. Rev., 1964, 136, B864–B871.
  100. W. Kohn and L. J. Sham, Phys. Rev., 1965, 140, A1133–A1138.
  101. C. Lee, W. Yang and R. G. Parr, Phys. Rev. B: Condens. Matter Mater. Phys., 1988, 37, 785–789.
  102. A. D. Becke, J. Chem. Phys., 1993, 98, 5648–5652.
  103. Y. Zhao and D. G. Truhlar, Theor. Chem. Acc., 2008, 120, 215–241.
  104. W. J. Hehre, R. Ditchfield and J. A. Pople, J. Chem. Phys., 1972, 56, 2257–2261.
  105. H. Ji, T. Shi, L. Liu, F. Zhang, W. Tao, Q. Min, Z. Deng, L. Bai, Y. Zhao and J. Zheng, Catal. Sci. Technol., 2021, 11, 6782–6792.
  106. M. Svensson, S. Humbel, R. D. J. Froese, T. Matsubara, S. Sieber and K. Morokuma, J. Phys. Chem., 1996, 100, 19357–19363.
  107. J. Abramson, J. Adler, J. Dunger, R. Evans, T. Green, A. Pritzel, O. Ronneberger, L. Willmore, A. J. Ballard, J. Bambrick, S. W. Bodenstein, D. A. Evans, C.-C. Hung, M. O'Neill, D. Reiman, K. Tunyasuvunakool, Z. Wu, A. Žemgulytė, E. Arvaniti, C. Beattie, O. Bertolli, A. Bridgland, A. Cherepanov, M. Congreve, A. I. Cowen-Rivers, A. Cowie, M. Figurnov, F. B. Fuchs, H. Gladman, R. Jain, Y. A. Khan, C. M. R. Low, K. Perlin, A. Potapenko, P. Savy, S. Singh, A. Stecula, A. Thillaisundaram, C. Tong, S. Yakneen, E. D. Zhong, M. Zielinski, A. Žídek, V. Bapst, P. Kohli, M. Jaderberg, D. Hassabis and J. M. Jumper, Nature, 2024, 630, 493–500.
  108. L. Su, P. Liu, W. Liu, Q. Liu, J. Gao, Q. Zhao, K. Jia, X. Sheng, H. Ma, Q. Wang and Z. Dai, ACS Catal., 2024, 14, 17699–17715.
  109. K. Zinovjev, L. Hedges, R. Montagud Andreu, C. Woods, I. Tuñón and M. W. van der Kamp, J. Chem. Theory Comput., 2024, 20, 4514–4522.
  110. T. Wang, X. He, M. Li, Y. Li, R. Bi, Y. Wang, C. Cheng, X. Shen, J. Meng, H. Zhang, H. Liu, Z. Wang, S. Li, B. Shao and T.-Y. Liu, Nature, 2024, 635, 1019–1027.
  111. V. Gradisteanu, E. W. Chan, L. Hedges, M. Malagarriga, R. David, M. de la Puente, D. Laage, I. Tuñón, M. W. van der Kamp and K. Zinovjev, ChemRxiv, 2025, DOI:10.26434/chemrxiv-2025-nw9lt.
  112. L. Seute, E. Hartmann, J. Stühmer and F. Gräter, Chem. Sci., 2025, 16, 2907–2930.
  113. K. Takaba, A. J. Friedman, C. E. Cavender, P. Kumar Behara, I. Pulido, M. M. Henry, H. MacDermott-Opeskin, C. R. Iacovella, A. M. Nagle, A. Matthew Payne, M. R. Shirts, D. L. Mobley, J. D. Chodera and Y. Wang, Chem. Sci., 2024, 15, 12861–12878.
  114. S. Luo, L. Liu, C.-J. Lyu, B. Sim, Y. Liu, H. Gong, Y. Nie and Y.-L. Zhao, Cell Rep. Phys. Sci., 2022, 3, 101128.
  115. J. Zhang, D. Chen, Y. Xia, Y.-P. Huang, X. Lin, X. Han, N. Ni, Z. Wang, F. Yu, L. Yang, Y. I. Yang and Y. Q. Gao, J. Chem. Theory Comput., 2023, 19, 4338–4350.
  116. S. Naserifar, Y. Chen, S. Kwon, H. Xiao and W. A. Goddard, Matter, 2021, 4, 195–216.

Footnote

These authors contributed equally to this work.

This journal is © The Royal Society of Chemistry 2026