 Open Access Article
Yiling Zhou a, Dejun Jiang a, Xiao Wei a, Jiacai Yi b, Yikun Wang a, Youchao Deng a and Dongsheng Cao *a

a Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, Hunan, P.R. China. E-mail: oriental-cds@163.com
b College of Computer, National University of Defense Technology, Changsha 410073, Hunan, China
First published on 5th September 2025
Predicting drug metabolism remains a long-standing challenge in pharmacokinetics due to the mechanistic complexity of enzymatic transformations and the fragmented nature of current computational tools. Existing models are typically limited to isolated tasks – substrate recognition, metabolic site identification, or metabolite generation – lacking mechanistic fidelity, holistic integration, and chemical interpretability. Here, we introduce DeepMetab, the first comprehensive and mechanistically informed deep graph learning framework for end-to-end prediction of CYP450-mediated drug metabolism. DeepMetab uniquely integrates three essential prediction tasks – substrate profiling, site-of-metabolism (SOM) localization, and metabolite generation – within a unified multi-task architecture. It employs a dual-labeling strategy that simultaneously captures atom- and bond-level reactivity, and infuses multi-scale features including quantum-informed and topological descriptors into a graph neural network (GNN) backbone. A curated knowledge base of expert-derived reaction rules further ensures mechanistic consistency during metabolite synthesis. DeepMetab consistently outperformed existing models across nine major CYP isoforms in all three prediction tasks. Its strong generalizability was further validated on 18 recently FDA-approved drugs, achieving 100% TOP-2 accuracy for SOM prediction and accurately recovering several experimentally confirmed metabolites absent from the training set. Visualization of learned representations reveals expert-level discernment of electronic characteristics, steric architecture, and regiochemical determinants, underscoring the model's interpretability. Together, DeepMetab represents a next-generation AI system that bridges symbolic reaction rules and deep graph reasoning to deliver accurate, interpretable, and end-to-end metabolism predictions, offering tangible value for both preclinical research and regulatory applications.
Among the numerous metabolic enzymes in the human body, the cytochrome P450 enzyme system stands out as the most critical Phase I metabolic pathway, responsible for approximately 75% of drug metabolism.5 As a typical monooxygenase, CYP450 primarily introduces polar functional groups into substrate molecules via oxidation, reduction, hydrolysis, and other related transformations, thereby facilitating subsequent Phase II metabolism.6 Given the versatility of its catalytic functions, the CYP450 enzyme system is subdivided into 18 families and 43 subfamilies, with isoenzymes from the CYP1, CYP2, and CYP3 families accounting for the predominant metabolic activities.7 Notably, different isoenzymes exhibit substrate specificity, catalyzing the metabolism of specific drugs and their metabolites through distinct functional transformations.8 For instance, CYP2C19 catalyzes the conversion of voriconazole to its inactive N-oxide metabolite;9 CYP2D6 mediates the transformation of codeine into morphine, enhancing its analgesic activity;10 and CYP3A4 catalyzes the conversion of nefazodone to its toxic metabolite 4-hydroxynefazodone.11 These examples illustrate the functional diversity of CYP450 isoenzymes and their profound impact on drug efficacy and toxicity. A comprehensive understanding of drug metabolism, particularly the elucidation of the metabolic pathways and metabolites produced by the major CYP450 isoenzymes, therefore holds significant theoretical and practical value for pharmaceutical development and safety assessment. However, traditional experimental approaches are often constrained by high resource demands, time-consuming procedures, and substantial economic costs.12 In contrast, computational approaches offer an efficient alternative that significantly reduces resource consumption.
Current computational approaches to drug metabolism prediction fall into two main categories. The first is data-driven, based on natural language processing (NLP) models such as MetaPredictor.13 The second is mechanism-based, emphasizing the systematic prediction of metabolic enzymes, identification of sites of metabolism, and application of relevant metabolic transformation rules to generate metabolites, as exemplified by tools such as GLORYx14 and BioTransformer3.0.15 While the data-driven approach offers end-to-end metabolite prediction, it often overlooks critical intermediate information, such as metabolic pathways, which is equally important. Its performance in metabolite prediction also remains suboptimal. For example, MetaPredictor,13 owing to its reliance on promiscuous datasets and the inherent limitations of text generation algorithms, shows limited accuracy in metabolite generation: its site of metabolism (SOM) predictor achieves a TOP-1 accuracy of only 57.8% on the test set. Moreover, MetaPredictor13 frequently generates meaningless text, so its output requires multiple filtering steps by automated scripts after the prompt-based metabolite predictor module.
Mechanism-based approaches, which offer deeper insight into the individual stages of metabolism, have gained significant recognition. Nevertheless, current approaches face critical challenges. First, previous studies are typically limited to isolated tasks and do not provide comprehensive metabolic characterization, particularly in terms of metabolite generation.16 For instance, CypReact17 and CYPstrate18 concentrate solely on metabolizing-enzyme prediction, while SMARTCyp3.0,19 FAME3,20 and CyProduct21 (the CYP450 metabolism prediction module of BioTransformer3.0 (ref. 15)) specialize only in SOM prediction. Among these, FAME3 (ref. 20) does not differentiate between isoforms, while tools like SMARTCyp3.0 (ref. 22) cover only three CYP450 isoforms. Second, the accuracy of existing work is constrained by algorithmic limitations and insufficient exploitation of available features. Although BioTransformer 3.0,15 which integrates CypReact,17 CyProduct,21 and a metabolite generation module,21 claims relatively comprehensive coverage of major isoforms and metabolic phenotypes, key limitations remain. Specifically, its metabolite generation module lacks several critical reaction rules (e.g., dehalogenation), its algorithmic framework relies solely on simple machine learning (ML) models such as random forest, and it uses only basic descriptors, restricting the richness of the information employed. Beyond traditional ML methods, deep learning (DL) has also been applied to SOM prediction for its superior ability in molecular representation learning. Vladimir et al.23 (2023) demonstrated that Graph Convolutional Networks (GCNs), which extract features directly from molecular topology graphs, perform better than traditional ML methods in predicting enzyme metabolic sites.
However, these methods rely on basic chemical features, lacking critical reactivity and physicochemical details necessary for accurate predictions, and depend on promiscuous enzyme data, which fail to effectively distinguish the metabolism of different enzymes. While ML and DL methods tend to offer increased convenience and faster processing times, quantum chemistry approaches are capable of delivering more comprehensive and detailed information regarding molecular interactions and mechanisms. In 2022, StarDrop24 addressed this by integrating semi-empirical quantum chemical descriptors with ML techniques. Despite the limited dataset and the use of simpler algorithms, this approach enhanced efficiency and accuracy to a certain extent, suggesting that combining computational chemistry with DL may produce more robust and generalizable models for metabolic predictions. All in all, despite significant advances, contemporary mechanistic-driven metabolism prediction models remain largely confined to discrete tasks, with their predictive capacity further hampered by intrinsic algorithmic and representational limitations. These constraints substantially undermine both accuracy and translatability across the vast chemical and structural diversity of drug-like molecules. Moreover, the persistent absence of a comprehensive, systematized, and hierarchically structured formalization of metabolic rules continues to fundamentally compromise the robustness, reproducibility, and precision of metabolite generation.
To address these challenges, we present the first integrated and mechanistically grounded graph learning platform for end-to-end prediction of CYP450-mediated drug metabolism. Among prior integrated or end-to-end tools, BioTransformer 3.0 (ref. 15) relies on traditional ML heuristics with limited mechanistic fidelity and scalability; the commercial platforms StarDrop24 and ADMET Predictor25 employ semi-empirical/force-field methodologies that are computationally demanding and less amenable to end-to-end learning; and MetaPredictor13 targets mixed-enzyme metabolism without recovering pathway information. In contrast, DeepMetab's mechanism-informed GNN is the first deep-learning-based system that prioritizes pathway inference while localizing SOMs and generating metabolites, thereby broadening task coverage and mechanistic rigor and yielding a comprehensive end-to-end platform for CYP450-mediated drug metabolism. Its generalizability was further validated on 18 recently FDA-approved drugs, achieving 100% TOP-2 accuracy for SOM prediction and accurately recovering several experimentally confirmed metabolites absent from the training set. Visual analysis of model outputs highlights advanced recognition of key electronic, steric, and regiochemical factors, attesting to its high interpretability. Altogether, DeepMetab offers a robust computational tool for advancing drug metabolism studies.
The SOM data in our study were compiled from multiple sources, including the research work of Zaretzki et al.,26 modeling data from CyProduct,21 established databases such as DrugBank29 and BRENDA,30 and relevant literature.31 Compared to the existing comprehensive EBoMD dataset,21 our expanded SOM dataset includes approximately one-third more compounds, reaching over 900 molecular substrates. This expansion substantially enhances both the richness and diversity of the available SOM data. To address the current problems of ambiguous matching and incomplete summarization of metabolic rules, we developed a site-labeling methodology that combines the advantages of atom of metabolism (AOM) and bond of metabolism (BOM) labels to ensure accuracy and uniqueness when metabolites are generated through reaction rules. This methodology has two key aspects. First, we implemented a combined SOM labeling approach that uses both AOM and BOM to prevent reaction type misclassification. During curation of the SOM dataset, reactions occurring at a single atomic site (such as hydroxylation and heteroatom oxidation) are labeled using AOM, while reactions involving atomic pairs (chemical bonds), including dealkylation, hydrolysis, and epoxidation, are labeled using BOM. This approach significantly reduces misclassification when multiple reactions can occur at the same site. As illustrated in the right panel of Fig. 1 Rules, the parent molecule undergoes an N-dealkylation reaction, leading to the formation of an exposed amino group while the cleaved methyl group is converted to formaldehyde as a byproduct. Because conventional AOM annotation labels the carbon adjacent to the amino nitrogen, this reaction can be confused with C-hydroxylation at the same carbon, a common error in GLORYx.14 Our refined annotation strategy differentiates between these reaction types by labeling N-dealkylation as BOM and C-hydroxylation as AOM.
Second, we introduced the principles of minimal and optimal labeling to address multi-site reactions. This approach labels as few reaction sites as possible when multiple sites are involved, thereby reducing misclassification risks. The left panel of Fig. 1 Rules demonstrates this principle with the reduction of an α,β-unsaturated ketone carbonyl. Simultaneous labeling of all changing chemical bonds could lead to misidentification of the carbon–carbon double bond as an epoxidation site, a limitation of the BioTransformer rules.15 Our minimal labeling approach resolves such multi-site reaction ambiguities. Finally, as shown in the SOM part of Fig. 1, we encoded both AOM and BOM labels as binary vectors (consisting of zeros and ones), where element positions correspond to the indices of the respective atoms or bonds within the molecule.
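The binary encoding described above can be illustrated with a minimal sketch. The helper `encode_labels`, the toy three-atom fragment, and its atom and bond indices are all hypothetical and chosen only for illustration; the actual indexing used in DeepMetab is not specified in this section.

```python
def encode_labels(n_items, labeled_indices):
    """Return a 0/1 vector of length n_items with ones at labeled_indices."""
    vec = [0] * n_items
    for idx in labeled_indices:
        if not 0 <= idx < n_items:
            raise IndexError(f"index {idx} outside 0..{n_items - 1}")
        vec[idx] = 1
    return vec


# Hypothetical toy fragment C-N-C: atoms 0 (methyl C), 1 (N), 2 (C).
n_atoms = 3
bonds = [(0, 1), (1, 2)]  # bond index 0 is the N-CH3 bond

# N-dealkylation is a bond-level event: mark the C-N bond that is cleaved (BOM).
bom = encode_labels(len(bonds), [0])

# C-hydroxylation is an atom-level event: mark the carbon that is oxidized (AOM).
aom = encode_labels(n_atoms, [2])

print("BOM vector:", bom)
print("AOM vector:", aom)
```

Keeping the two vectors separate is what lets a bond-level N-dealkylation and an atom-level C-hydroxylation near the same nitrogen be told apart.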
The metabolic reaction rules in this study were systematically compiled through an extensive literature review.32 Building upon existing rules from BioTransformer15 and SyGMa,33 and combining them with the metabolic site labeling approach described above, we developed a more comprehensive set of metabolic rules. To achieve systematic and comprehensive categorization, metabolic reactions were hierarchically classified into four primary categories: oxidation, hydrolysis, reduction, and dehalogenation. These were further refined into 15 secondary subcategories and several tertiary subcategories, exceeding the number of BioTransformer15 rules by about 25% and demonstrating an improvement of over 10% in comparative analyses with similar models, as illustrated by the pie chart in the Rules section of Fig. 1. The rules were implemented using the RDKit34 package in Python.
Specifically, the workflow of the GNN architecture consists of three steps: initialization, message passing and updating, and final embedding generation. During initialization, edge features are derived from their individual properties (e.g., bond type) and the features of connected nodes (e.g., atom types). The directed nature of the graph is respected because messages do not rely on their reverse messages: the message $m_{vw}^{t+1}$ does not depend on its reverse message $m_{wv}^{t}$ from the previous iteration. Eqn (1) shows how the edge hidden states are initialized before the first step of message passing:
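| $h_{vw}^{0} = \tau\left(W_i\,\mathrm{cat}(x_v, e_{vw})\right)$ | (1) |
| $h_{vw}^{t+1} = \tau\left(h_{vw}^{0} + W_m m_{vw}^{t+1}\right)$ | (2) |
|  | (3) |
|  | (4) |
| $h_v = \tau\left(W_{\alpha}\,\mathrm{cat}(x_v, m_v)\right)$ | (5) |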
After several iterations, the final embeddings for nodes and edges are obtained. In the substrate prediction module, atom states are derived by aggregating the features of connected edges, while bond states are directly computed from the updated edge representations, resulting in comprehensive and informative molecular graph embeddings. In contrast, the SOM prediction module does not aggregate atom and bond embeddings into a full molecular graph representation. Instead, individual bond or atom embeddings are utilized directly for predictive tasks.
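As a rough illustration of eqn (1), (2) and (5), the following sketch runs directed message passing on a toy graph in plain Python. Several parts are assumptions rather than the authors' implementation: the aggregation steps (a sum over incoming directed edges that excludes the reverse edge, and a sum over all incoming edges for the atom readout) follow the common directed message-passing scheme and are not recoverable from the text here; the weights are random; τ is taken to be ReLU; and all function names are hypothetical.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def dmpnn(atom_feats, bond_feats, steps, hidden, seed=0):
    """atom_feats: {v: [...]}; bond_feats: {(v, w): [...]}, one entry per bond."""
    rng = random.Random(seed)
    # Each undirected bond yields two directed edges, v->w and w->v.
    e = {}
    for (v, w), f in bond_feats.items():
        e[(v, w)] = f
        e[(w, v)] = f
    n_atom = len(next(iter(atom_feats.values())))
    n_bond = len(next(iter(bond_feats.values())))
    W_i = rand_matrix(hidden, n_atom + n_bond, rng)
    W_m = rand_matrix(hidden, hidden, rng)
    W_a = rand_matrix(hidden, n_atom + hidden, rng)  # plays the role of W_alpha

    # Eqn (1): initial hidden state of every directed edge.
    h0 = {(v, w): relu(matvec(W_i, atom_feats[v] + e[(v, w)])) for (v, w) in e}
    h = dict(h0)
    for _ in range(steps):
        new_h = {}
        for (v, w) in e:
            # Assumed aggregation: sum over edges k->v, skipping the reverse edge w->v.
            m = [0.0] * hidden
            for (k, u) in e:
                if u == v and k != w:
                    m = add(m, h[(k, u)])
            # Eqn (2): update from the initial state plus the transformed message.
            new_h[(v, w)] = relu(add(h0[(v, w)], matvec(W_m, m)))
        h = new_h

    # Assumed atom readout: sum all incoming edge states, then apply eqn (5).
    atoms = {}
    for v, x in atom_feats.items():
        m_v = [0.0] * hidden
        for (k, u) in e:
            if u == v:
                m_v = add(m_v, h[(k, u)])
        atoms[v] = relu(matvec(W_a, x + m_v))
    return atoms, h


# Toy chain 0-1-2 with 2-dimensional atom features and 1-dimensional bond features.
toy_atoms = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 0.0]}
toy_bonds = {(0, 1): [1.0], (1, 2): [1.0]}
atom_emb, edge_emb = dmpnn(toy_atoms, toy_bonds, steps=2, hidden=4)
print(len(atom_emb), "atom embeddings,", len(edge_emb), "directed-edge embeddings")
```

In this sketch the per-atom and per-edge embeddings are returned separately, mirroring the SOM module's use of individual atom or bond embeddings rather than a pooled molecular vector.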
These embeddings ($h_v$) are then processed by task-specific feed-forward neural networks (FFNN), which produce outputs for each prediction.
| $y = f(h)$ | (6) |
In this study, the model predicts the metabolism for nine CYP enzymes, with predictions represented as FFNN outputs. Specifically, the substrate prediction module generates predictions at the molecular level, indicating whether the entire molecule serves as a metabolic substrate. In contrast, the SOM prediction module performs predictions at the atomic (node) or bond (edge) level, identifying specific metabolic sites within the molecule.
We implemented differentiated feature construction strategies for the two core modules: substrate prediction and SOM prediction. The substrate prediction module primarily uses basic topological information from molecular graphs, because the distinction between substrates and non-substrates for CYP enzymes depends largely on global molecular features such as overall shape, size, and structural connectivity, which can be captured effectively by simple graph representations. The SOM module, in contrast, employs a multi-scale feature representation that systematically integrates molecular graph structural information, atom-level reactivity descriptors, and global molecular descriptors. Notably, the global molecular descriptors are directly stacked onto each individual atomic feature vector within the molecular graph, so that every atom carries both local and global molecular properties. Further details are provided in the SI (Section S2.3).
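The stacking of global descriptors onto per-atom features can be pictured with a short sketch. The function name `stack_global` and all feature values below are hypothetical placeholders; the actual descriptor sets are described in the SI and are not reproduced here.

```python
def stack_global(atom_features, mol_descriptors):
    """Append the same molecule-level descriptor vector to every atom's features."""
    return [list(a) + list(mol_descriptors) for a in atom_features]


# Hypothetical inputs: 3 atoms with 2 local features each, 2 global descriptors.
atom_features = [
    [0.12, 1.0],  # e.g. an atom-level reactivity value and an aromaticity flag
    [0.40, 0.0],
    [0.05, 1.0],
]
mol_descriptors = [2.7, 310.4]  # e.g. a lipophilicity-like and a mass-like value

stacked = stack_global(atom_features, mol_descriptors)
for row in stacked:
    print(row)
```

Each atom's feature vector grows by the number of global descriptors, so the input dimension of the graph network must be sized accordingly.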
… : 10. In contrast, more common isoforms display a relatively better balance. For example, CYP3A4 has a positive-to-negative ratio of approximately 1 : 2. This ratio reflects not only its metabolic activity but also the greater availability of data for these more prevalent enzymes. The pie chart in Fig. 2B highlights the distribution of substrate counts among the different CYP isoforms, with CYP3A4 accounting for 34% of the total substrates, underscoring its prevalence and significance in metabolism compared to the other isoforms. The further UpSet plot (Fig. S1.1A) reveals that over half of the molecules are metabolized by multiple enzymes, with 16 molecules being substrates for all enzymes. These findings demonstrate strong correlations among the metabolic tasks of different isoforms, suggesting that a multi-task modeling approach would be beneficial.
        
| Dataset | Label | CYP1A2 | CYP2A6 | CYP2B6 | CYP2C8 | CYP2C9 | CYP2C19 | CYP2D6 | CYP2E1 | CYP3A4 |
|---|---|---|---|---|---|---|---|---|---|---|
| Training | Substrate | 416 | 102 | 154 | 157 | 369 | 314 | 396 | 148 | 1081 |
| Training | Non-substrate | 1719 | 1736 | 1684 | 1681 | 1779 | 1776 | 1694 | 1690 | 1910 |
| Test | Substrate | 40 | 14 | 12 | 12 | 33 | 29 | 27 | 8 | 63 |
| Test | Non-substrate | 115 | 120 | 123 | 120 | 114 | 120 | 117 | 122 | 113 |
The SOM training dataset contains nearly 1500 reactions from 874 substrate molecules (Table 2). We performed a comprehensive statistical analysis of the nine CYP enzymes, focusing on the number of sites classified as AOM and non-AOM, as well as BOM and non-BOM. The results reveal that the largest numbers of substrates are metabolized by CYP3A4, CYP1A2, and CYP2D6, with CYP3A4 accounting for more than half of the total. In the SOM training dataset, CYP3A4 is associated with 420 AOMs and 361 BOMs, CYP1A2 follows with 276 AOMs and 214 BOMs, and CYP2D6 shows 209 AOMs and 217 BOMs (Table 2). This highlights the metabolic significance of these enzymes and supports the development of more broadly applicable models based on this dataset.
| Dataset | Label | CYP1A2 | CYP2A6 | CYP2B6 | CYP2C8 | CYP2C9 | CYP2C19 | CYP2D6 | CYP2E1 | CYP3A4 |
|---|---|---|---|---|---|---|---|---|---|---|
| Training | Substrates | 313 | 104 | 152 | 150 | 251 | 238 | 297 | 129 | 550 |
| Training | AOM | 276 | 80 | 125 | 115 | 224 | 210 | 209 | 98 | 420 |
| Training | Non-AOM | 2515 | 718 | 1135 | 1374 | 2502 | 2201 | 2694 | 724 | 6105 |
| Training | BOM | 214 | 72 | 108 | 105 | 152 | 142 | 217 | 90 | 361 |
| Training | Non-BOM | 3272 | 890 | 1609 | 1850 | 2371 | 2441 | 3854 | 923 | 7561 |
| Test | Substrates | 17 | 9 | 13 | 14 | 17 | 16 | 27 | 10 | 40 |
| Test | AOM | 293 | 86 | 123 | 112 | 220 | 197 | 213 | 107 | 390 |
| Test | Non-AOM | 2649 | 793 | 1136 | 1310 | 2532 | 2145 | 2671 | 798 | 5713 |
| Test | BOM | 230 | 83 | 113 | 108 | 157 | 153 | 225 | 115 | 348 |
| Test | Non-BOM | 3438 | 1083 | 1657 | 1884 | 2627 | 2578 | 3949 | 1169 | 7182 |
We further analyzed the composition of the datasets. Hydroxylation reactions (Fig. 2a3) are the most prevalent, constituting nearly half of the reactions. The next most common type is cleavage reactions, which include O-dealkylation, N-dealkylation and ester hydrolysis and account for about one-third of the reactions. Fig. 2a1 shows the proportion of each reaction type for each enzyme, with certain reaction types showing significantly elevated frequencies in specific enzymes. For instance, SNP-oxidation reactions occur in CYP2A6 at approximately twice the average frequency of the other isoforms, while epoxidation reactions are notably more prevalent in CYP2E1 than in other isoforms. In contrast, hydroxylation and cleavage reactions maintain consistently high proportions across all isoforms, whereas rearrangement reactions exhibit uniformly low frequencies. These findings highlight that modeling each isoform separately is crucial for accurately predicting metabolic outcomes and understanding drug interactions, efficacy, and toxicity. To display the shared and unique substrate molecules of the SOM dataset among the different metabolic enzymes more intuitively, an UpSet plot was constructed, as shown in Fig. S1.1B. The enzyme with the highest number of unique substrate molecules is CYP3A4, with 193 unique substrates. However, these unique molecules account for only about one-third of all CYP3A4-metabolized molecules. Additionally, across the entire dataset, 473 substrates are metabolized by more than one enzyme, which exceeds the 401 substrates that are unique to a single enzyme. These observations suggest a high degree of interconnectivity among the metabolic profiles of these enzymes.
To elucidate the reaction rules we formulated and the SOM annotation methods employed for the various metabolic processes, Table 3 lists the principal categories along with illustrative examples of metabolic reactions mediated by CYP450 enzymes. It comprises a total of 6 AOM and 14 BOM examples, classified according to their labeling types, which reflect the specific enzymatic activities involved. In classifying reaction types, we adopted a streamlined approach to enhance clarity and comprehensiveness. For instance, within the category of dealkylation reactions, we consolidated N-dealkylation, O-dealkylation, and S-dealkylation into a unified category termed X-dealkylation (No. 12 in Table 3). This category captures the broader concept of heteroatom dealkylation, facilitating a clearer understanding of the underlying mechanisms. Conversely, for ester hydrolysis reactions, we distinguished groups such as thioesters and nitrate esters from general carbon esters because of their significant biochemical differences (No. 13 and No. 14 in Table 3). This distinction underscores the need for precise labeling in the study of metabolic pathways. To further exemplify these classifications, the “SOM annotation method” column provides representative instances of specific reaction types, illustrating the metabolic transformations involved. The red markings indicate the labeled sites on the molecular structures, drawing attention to the functional groups affected during the metabolic transformations.
The input module converts molecular SMILES into molecular graphs with multi-scale features. For both the substrate and SOM components, we implemented a multi-task GNN36 architecture to construct models for metabolic substrate prediction and metabolic site prediction. This multi-task approach shares underlying feature representations and learning parameters across the different CYP isoform prediction tasks, enabling the model to capture common characteristics among isoforms while preserving their unique features. During training of the multi-task models, loss weighting strategies37 were introduced to address the imbalance between positive and negative samples in the dataset, enabling the model to pay more attention to underrepresented classes. In our pipeline, the substrate module first analyzes compounds to predict which metabolic enzymes will process them. Once these enzyme predictions are established, the SOM module uses this information to determine which enzyme-specific model should be applied, enabling accurate prediction of sites of metabolism. In addition, to achieve precise metabolic site prediction, we developed two complementary multi-task models within the SOM module: one dedicated to identifying AOMs and another focused on predicting BOMs. This dual-model architecture enables comprehensive characterization of metabolic sites from both the atomic and the chemical bond perspective.
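One common way to realize loss weighting for imbalanced binary tasks is to scale the positive-class term of the binary cross-entropy by the negative-to-positive ratio. The sketch below assumes this scheme purely for illustration; the specific weighting strategy used in DeepMetab (ref. 37) is not detailed in this section, and the function names are hypothetical. The counts are the training-set substrate and non-substrate numbers from Table 1.

```python
import math


def pos_weight(n_pos, n_neg):
    """Weight for positive samples: the negative-to-positive ratio (assumed scheme)."""
    return n_neg / n_pos


def weighted_bce(probs, labels, w_pos):
    """Mean binary cross-entropy with the positive term scaled by w_pos."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(w_pos * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Training-set counts from Table 1: (substrates, non-substrates).
counts = {"CYP2A6": (102, 1736), "CYP3A4": (1081, 1910)}
weights = {name: pos_weight(p, n) for name, (p, n) in counts.items()}
for name, w in weights.items():
    print(f"{name}: positive weight = {w:.2f}")

# A missed substrate (true label 1, predicted probability 0.1) costs more
# for the rarer isoform because its positive weight is larger.
loss_rare = weighted_bce([0.1], [1], weights["CYP2A6"])
loss_common = weighted_bce([0.1], [1], weights["CYP3A4"])
```

In a multi-task setting, one such weight would typically be computed per isoform and the per-task losses combined, which is again an assumption about the setup rather than a documented detail.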
Next, in the rules module, we established a systematic metabolite generation framework that matches the metabolic sites predicted by the SOM module against a pre-constructed database of metabolic reaction rules to generate the corresponding metabolite structures. To enhance prediction efficiency and accuracy, we implemented a matching mechanism differentiated by site type, with distinct reaction matching strategies for AOM and BOM. Finally, the output module integrates the results into a structured record that documents all predicted metabolizing enzymes, metabolic sites, metabolites, and scoring metrics. This output format facilitates subsequent analysis and interpretation of the prediction results.
In the SOM prediction module, both single-task and multi-task models employ the GNN framework across nine CYP enzyme isoforms. For each isoform, SOM prediction encompasses both AOM and BOM prediction. Fig. 4A reveals that the multi-task model consistently outperforms its single-task counterpart across all evaluation metrics, with an average improvement of 2–3%. The most substantial improvement is observed in the PRC-A metric, reaching approximately 5%. Furthermore, the multi-task approach shows lower variance across all evaluation metrics. As illustrated in Fig. 4C, the performance analysis across different CYP isoforms reveals distinct patterns for SOM prediction. The major CYP enzymes, including CYP1A2, CYP2D6, and CYP3A4, perform well in terms of AUC and PRC metrics, achieving AUC scores of 0.92, 0.93, and 0.94, respectively, in the single-task framework. However, enzymes with limited substrate data, such as CYP2E1, failed to achieve an AUC score exceeding 0.9, presumably because data scarcity limits model performance. Multi-task learning substantially improved performance on these data-limited tasks, with CYP2E1 showing a 4% gain in AUC score. Similar improvements were consistently observed in the PRC curves across these isoforms. The reduction in variance, coupled with the improved performance metrics, indicates that the multi-task learning strategy enhances both the accuracy and the stability of SOM prediction models.
To further enhance the reliability of models, we implemented an optimized data split strategy for the SOM dataset. While the initial approach solely considered substrate proportions, our refined methodology ensures proportional representation of various reaction types across all folds (detailed optimization procedures are documented in the SI (Section S2.2)). This optimization strategy resulted in a more balanced dataset distribution. As demonstrated in Fig. 4A, this refined methodology yielded improvements across all weighted average evaluation metrics. Notably, the PRC-R metric showed the most substantial enhancement, with an improvement of approximately 1%. The multi-task model incorporating optimized data partitioning demonstrated superior performance, achieving significant improvements across all evaluation metrics. This comprehensive enhancement suggests that the combination of multi-task learning and optimized data partitioning creates a synergistic effect, leading to more robust and reliable model performance for SOM prediction.
Furthermore, we conducted comprehensive ablation studies to investigate the impact of multi-scale information integration on the performance of SOM prediction. Based on the multi-task framework, we compared three ablated variants, categorized by information source, against the full model: (1) DeepMetab without atom-level reactivity descriptors and global molecular descriptors (DeepMetab (w/o atom, mol)), (2) DeepMetab without global molecular descriptors (DeepMetab (w/o mol)), (3) DeepMetab without atom-level reactivity descriptors (DeepMetab (w/o atom)), and (4) DeepMetab with full integration of baseline, atomic, and molecular descriptors. Fig. 4B demonstrates that incorporating either atom-level reactivity descriptors or molecular descriptors outperforms DeepMetab (w/o atom, mol) across all evaluation metrics. Furthermore, the full DeepMetab achieves the best performance on all metrics except Jaccard, where it trails DeepMetab (w/o mol) by only 0.4%, which highlights that integrating multi-scale information into the GNN framework is important for reliable SOM prediction.
Then, we conducted comprehensive external validation of the SOM module by comparing DeepMetab's predictive capabilities with established models in the field, including FAME3, SMARTCyp3.0, and the CYP450 component (CyProduct) of Biotransformer3.0. Since FAME3 does not differentiate between CYP450 isoforms, the comparison focuses on overall CYP450 metabolic site prediction capability (Fig. 5b1). DeepMetab consistently outperformed FAME3 across all metrics, with particularly notable improvements of approximately 10% in TOP-1 and TOP-2 metrics (Fig. 5b1). Fig. 5b2 presents a weighted average comparison with CyProduct, where weights were assigned based on substrate quantities for each enzyme isoform. DeepMetab demonstrated superior performance across all metrics, with a particularly significant advantage exceeding 10% in the TOP-1 and TOP-2 metrics. The comparison with SMARTCyp3.0, shown in Fig. 5b3, focuses on the three CYP450 isoforms (CYP2C9, CYP2D6, and CYP3A4) that SMARTCyp3.0 can predict. The weighted average results across these three enzymes demonstrate DeepMetab's substantial advantages, with performance improvements exceeding 10% in multiple metrics (Jaccard, TOP-1 and TOP-2). To provide detailed insights into isoform-specific performance, Fig. 5b4–8 presents radar plots comparing performances across different CYP450 isoforms. DeepMetab exhibited superior performance across most isoforms, with particularly notable improvements in low-data isoforms such as CYP2A6, CYP2B6, and CYP2C8. Substantial improvements were also observed across other isoform-specific tasks.
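The TOP-k, Jaccard, and substrate-weighted averages used in these comparisons can be sketched as follows. The definitions below are the conventional ones and are assumed here, since the exact formulas are given in the SI rather than in this section: a molecule counts as a TOP-k hit if any of its k highest-scoring sites is an experimentally observed SOM, and Jaccard is the intersection over union of predicted and observed site sets. All function names and toy values are hypothetical.

```python
def top_k_hit(scores, true_sites, k):
    """True if any of the k highest-scoring site indices is an observed SOM."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return any(i in true_sites for i in ranked[:k])


def top_k_accuracy(molecules, k):
    """Fraction of molecules with at least one observed SOM in their top-k sites."""
    hits = sum(top_k_hit(scores, sites, k) for scores, sites in molecules)
    return hits / len(molecules)


def jaccard(pred_sites, true_sites):
    """Intersection over union of predicted and observed site sets."""
    pred, true = set(pred_sites), set(true_sites)
    union = pred | true
    return len(pred & true) / len(union) if union else 1.0


def weighted_average(values, weights):
    """Average of per-isoform metric values weighted by, e.g., substrate counts."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Two hypothetical molecules: per-site scores and the set of observed SOM indices.
molecules = [
    ([0.9, 0.2, 0.1], {0}),
    ([0.3, 0.8, 0.5], {2}),
]
top1 = top_k_accuracy(molecules, 1)
top2 = top_k_accuracy(molecules, 2)
print("TOP-1:", top1, "TOP-2:", top2)
print("Jaccard:", jaccard({0, 1}, {1, 2}))

# Hypothetical per-isoform TOP-1 values weighted by hypothetical substrate counts.
overall = weighted_average([0.8, 0.6], [40, 10])
print("Weighted TOP-1:", overall)
```

Weighting by substrate count keeps isoforms with very few test molecules from dominating the summary figure.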
To further evaluate DeepMetab's rule coverage and metabolite prediction accuracy, we conducted a comparative analysis with two other rule-based metabolite prediction platforms, with detailed comparison methodology and specific results presented in the SI (Section S3.3). The results are presented in Fig. 5c, where performance was assessed using Jaccard, TOP-1, and TOP-2 metrics for the final metabolite predictions. Fig. 5c1 illustrates the overall comparison of CYP450 metabolite predictions without isoform differentiation, while Fig. 5c2–4 present isoform-specific comparisons across different CYP450 isoforms. The analysis demonstrates DeepMetab's superior overall performance, the one notable exception being the Jaccard metric for CYP2D6. In all other cases, DeepMetab either significantly outperformed or matched the competing platforms, with particularly large advantages in the TOP-1 and TOP-2 metrics, where improvements exceeded 10%. These results not only validate DeepMetab's accuracy in metabolic site prediction but also demonstrate the robustness of our metabolic rule system. The consistently superior performance across multiple metrics and isoforms underscores the comprehensive nature of our approach to metabolite prediction.
Fig. 6B presents the visualization analysis of chemical bonds across various environmental contexts, employing the same methodology. The analysis reveals that C–N bonds and aromatic bonds predominantly exhibit blue coloration, indicating higher predicted reactivity. Similarly, certain C–O, C–X, and aromatic bonds demonstrate elevated reactivity potential. Specific examples illustrated in Fig. 6D provide deeper insights into these patterns. For instance, the red coloration of certain C–O bonds can be attributed to the absence of available hydrogen atoms on the carbon center (Case 2), which precludes the empirically common dealkylation reactions at these sites. Conversely, other examples highlighted in the figure represent sites with historically documented high metabolic reactivity (Cases 1, 3, and 4). In addition, Cases 5 and 6 are examples of aromatic carbon–carbon bonds with high similarity scores. Based on empirical knowledge, epoxidation typically occurs at such sites. The slightly lower similarity in Case 5 compared to Case 6 may be due to the influence of the electron-withdrawing chlorine substituent on the benzene ring.
Through this comprehensive hidden layer visualization analysis, we discovered that the model can deeply learn and understand the complex chemical environments of metabolic sites, thereby developing expert-like “insight” for accurately predicting metabolic sites. This visualization not only uncovers multiple potential patterns in CYP450-catalyzed metabolism but also significantly enhances the interpretability of the model.
First, we examined amiodarone (AMD), an FDA-approved antiarrhythmic drug (1985) known for its potential hepatotoxicity risk,38 where CYP450-mediated metabolites are suspected to be significant contributors to liver injury. DeepMetab identified CYP2C8 and CYP3A4 as the responsible enzymes (with scores of 0.86 and 0.93), predicted the metabolic site with a probability of 0.98, and – by applying the corresponding N-dealkylation biotransformation rule – accurately anticipated the formation of desethylamiodarone (DEA). This metabolite has been experimentally validated by Shohei et al.39 to induce hepatotoxicity through oxidative stress mechanisms. To further evaluate DeepMetab's capability in predicting functionally significant metabolites, we analyzed codeine metabolism (Fig. 7B). Codeine, one of the earliest opioid medications, undergoes metabolic transformations that are critical for its therapeutic effects.40 Specifically, DeepMetab accurately predicted the formation of morphine, the CYP2D6-mediated active metabolite included in the training dataset. Moreover, DeepMetab successfully predicted norcodeine, the inactive metabolite primarily formed via CYP3A4 – with enzyme and metabolic site prediction scores both at 0.99 – by applying the corresponding dealkylation biotransformation rule, despite this metabolite being absent from the training data. Additionally, we investigated flecainide, a classical sodium channel blocker first marketed in Europe in 1982. Its metabolism involves CYP450-mediated pathways, with CYP3A4 being the primary enzyme responsible for the formation of hydroxy-flecainide. 
This metabolite retains similar electrophysiological properties to those of the parent drug, as demonstrated in previous studies.41 DeepMetab accurately predicted the formation of hydroxy-flecainide, identifying CYP3A4 as the responsible enzyme with a score of 0.77 and the metabolic site with a score of 0.98, before successfully applying the dealkylation biotransformation rule, highlighting CYP3A4's role in the dealkylation process (Fig. 7C). These applications demonstrate DeepMetab's reliability and practical value in predicting bioactive metabolites beyond the training dataset and show excellent generalization ability across diverse chemical structures and metabolic pathways.
DeepMetab accurately predicted the primary CYP-mediated metabolite (TOP-1) for 14 out of 18 drugs, achieving approximately 78% accuracy, while for the remaining four drugs (Mobocertinib, Ritlecitinib, Quizartinib, and Palovarotene), the correct metabolite appeared as the second-best prediction (TOP-2), yielding an overall TOP-2 accuracy of 100%. This performance reflects the model's effective end-to-end prediction capability: it successfully identified several major metabolizing CYP enzymes, such as CYP3A4 and CYP2D6, with scores ranging from 0.81 to 1.00. It also predicted less common enzymes, for example, the metabolism of Infigratinib by CYP2C8 with a score of 0.74. Concurrently, DeepMetab precisely localized the metabolic sites, often scoring above 0.9, exemplified by the dealkylation site on avacopan (0.97) and the hydroxylation site on daprodustat (0.92). Finally, the model applied specific metabolic transformation rules – such as dealkylation, hydroxylation, epoxidation, S/P-oxidation, and hydrolysis – to generate detailed metabolite structures consistent with enzyme and site predictions, resulting in accurate and comprehensive metabolic predictions. This high performance highlights not only the reliability and accuracy of DeepMetab but also its ability to rank the most relevant metabolites of a drug at the top of its predictions, even in cases where detailed metabolic pathway information was limited, demonstrating its practical value in drug metabolism research. These findings underscore the potential of DeepMetab to significantly advance the study of drug metabolism, particularly for new and unexplored molecules, by providing accurate and actionable predictions that can guide further experimental investigations.
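The TOP-1 and TOP-2 figures quoted above follow directly from the reported counts, as the short calculation below shows. Only the counts (14 drugs at rank 1, 4 drugs at rank 2) come from the text; the list layout and the helper name are illustrative.

```python
# Rank at which the correct primary metabolite appeared for each of the 18 drugs,
# as reported in the text: 14 at rank 1 and 4 at rank 2.
ranks = [1] * 14 + [2] * 4


def top_k_accuracy(ranks, k):
    """Fraction of drugs whose correct metabolite is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


print(f"TOP-1: {top_k_accuracy(ranks, 1):.1%}")  # 14/18
print(f"TOP-2: {top_k_accuracy(ranks, 2):.1%}")  # 18/18
```

TOP-k accuracy is cumulative, so the four drugs recovered at rank 2 are what lift the score from 14/18 to 18/18.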
Despite these notable advancements, several methodological and practical limitations remain to be addressed in subsequent research. The expansion of training datasets to encompass a more diverse chemical space and rare metabolic transformation patterns would significantly enhance the model's generalizability across the pharmaceutical chemical space. Furthermore, the incorporation of more sophisticated enzymatic interaction mechanisms and phase II conjugation pathways would yield a more comprehensive metabolic prediction system that better reflects the complexity of in vivo drug biotransformation.
Future methodological enhancements will concentrate on integrating additional cytochrome P450 isoforms and non-CYP metabolic enzyme systems to facilitate comprehensive metabolite prediction and construction of more holistic drug metabolism networks. We intend to extend DeepMetab's predictive domain to encompass a broader spectrum of metabolic enzyme systems, thereby providing more exhaustive metabolic prediction capabilities across multiple biotransformation pathways. Concurrently, we will develop an intuitive web-based interface to facilitate accessibility for researchers and pharmaceutical scientists, enabling seamless integration of these computational predictions into established drug discovery and development workflows. Collectively, these advances position DeepMetab to compress preclinical timelines by rapidly surfacing metabolic hotspots and high-risk metabolites, thereby guiding targeted in vitro follow-ups (e.g., isoform panels, TDI/reactivity screens) and rational prioritization of metabolically favorable scaffolds. In turn, earlier identification of CYP-mediated bioactivation and reactive species is expected to curtail late-stage attrition. Mechanism-aware, end-to-end predictions that inform SOM-blocking substitutions, soft-spot hopping, and isoform-selective optimization enable safer compound design and tighter integration into DMTA workflows, ultimately improving decision quality across discovery and development.
Supplementary information is available. See DOI: https://doi.org/10.1039/d5sc04631a.
This journal is © The Royal Society of Chemistry 2025