Guanlue Li (a, †), Chenran Jiang (d, †), Ziqi Gao (c), Yu Liu (b), Chenyang Liu (d), Jiean Chen (d), Yong Huang (b, *) and Jia Li (c, *)
(a) Data Science and Analytics, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 511400, China. E-mail: guanlueli@gmail.com
(b) Department of Chemistry, The Hong Kong University of Science and Technology, Hong Kong SAR 999077, China. E-mail: yliuil@connect.ust.hk; yonghuang@ust.hk
(c) Division of Emerging Interdisciplinary Areas, The Hong Kong University of Science and Technology, Hong Kong SAR 999077, China. E-mail: zgaoat@connect.ust.hk; jialee@ust.hk
(d) Pingshan Translational Medicine Center, Shenzhen Bay Laboratory, Shenzhen, 518118, China. E-mail: jiangcr@szbl.ac.cn; cyberyanglinox@outlook.com; chenja@szbl.ac.cn
First published on 2nd September 2025
Effective generation of molecular structures that bind to target proteins is crucial for lead identification and optimization in drug discovery. Despite advances in atom- and motif-wise models for 3D molecular generation, current methods often struggle with validity and reliability. To address these issues, we develop the Atom-Motif Consistency Diffusion Model (AMDiff), which uses a joint-training paradigm for multi-view learning. The model features a hierarchical diffusion architecture that integrates both atom and motif views of molecules, allowing comprehensive exploration of their complementary information. By leveraging classifier-free guidance and incorporating topological features as conditional inputs, AMDiff generates molecules robustly across diverse targets. Compared with existing approaches, AMDiff achieves superior validity and novelty in generating molecules tailored to various protein pockets. Case studies targeting protein kinases, including Anaplastic Lymphoma Kinase (ALK) and Cyclin-dependent kinase 4 (CDK4), demonstrate its capability for structure-based de novo drug design. Overall, AMDiff bridges the gap between atom-view and motif-view drug discovery and accelerates the development of target-specific molecules.
Existing models for de novo molecular generation often draw inspiration from real-life lead optimization strategies used in drug discovery, primarily chemical derivatization13 and scaffold hopping.14 Chemical derivatization follows a sequential approach in which molecules grow outward from a known starting point. In contrast, scaffold hopping retains the molecule's overall 3D shape while altering its atom connectivity. Recent tools such as GraphBP15 and FLAG16 implement chemical derivatization by sequentially introducing specific atoms or motifs into a binding site. Meanwhile, ScaffoldGVAE17 performs scaffold hopping by preserving side chains and modifying the central core. Additionally, one-shot generation methods offer an intriguing alternative by creating entire molecular structures simultaneously.18 TargetDiff19 and DecompDiff20 adopt this approach, using diffusion models that generate all atoms of a molecule jointly rather than one at a time. Regardless of strategy, these models typically use either individual atoms or motifs as building blocks for molecule construction. Atom-based de novo drug design methods21–23 are not limited by predefined motif libraries, which allows them to explore vast chemical spaces and generate highly diverse compounds. However, they often produce invalid bond lengths and angles, which can result in structurally bizarre molecules. In contrast, motif-based approaches16,24,25 assemble molecules from predefined libraries, but their reliance on existing datasets and current chemical knowledge limits exploration of unknown chemical space. This restriction confines generation to structures built from the available fragments.
To balance novelty and validity, a hierarchical graph model can be used to generate molecules simultaneously at both the atom and motif levels. Several pioneering works have been inspired by this multi-granularity modeling. DrugGPS21 incorporates the intrinsic two-level structure of the protein, using atom-level and residue-level encoders to learn sub-pocket prototypes, and generates molecules motif by motif. Our previously developed HIGH-PPI27 aims to establish a robust understanding of protein–protein interactions (PPIs) by constructing a hierarchical graph that comprises both the PPI graph and the protein graph.
To model molecules hierarchically at both the atom and motif levels, we introduce the Atom-Motif Consistency Diffusion Model (AMDiff), designed to efficiently generate high-quality 3D molecules for specific binding targets. AMDiff learns target information and constructs a graph structure that incorporates topological details. At the ligand level, it employs a hierarchical diffusion approach that captures both the atom view and the motif view of molecules to make full use of the available information. During molecular generation, we ensure that samples from the two views remain closely aligned in chemical space. The motif view provides prior patterns, such as aromatic rings, that the atom view might miss, while the atom view models diverse structures without being constrained by predefined motifs. The joint training approach leverages complementary information from the two views and strengthens their interaction during training. AMDiff employs a classifier-free guidance diffusion model in each view: we use features extracted from binding sites as conditional inputs and train both conditional and unconditional diffusion models by randomly omitting the conditioning. This approach supports balanced molecule generation across multiple targets. To enhance the coherence and connectivity of generated molecules, we incorporate persistent homology, a technique from topological data analysis (TDA) that captures multiscale topological features of both molecules and binding sites. Integrating these topological features strengthens the structural characteristics of the generated molecules and refines the identification of binding-site topology based on shape properties. We apply AMDiff to a benchmark dataset and two kinase targets. Across diverse metrics, AMDiff outperforms the baseline methods, including both atom-only and motif-only models.
Further robustness analysis shows that AMDiff can produce compounds tailored to varying pocket sizes.
In the proposed model, AMDiff operates hierarchically, incorporating both atom and motif views, as illustrated in Fig. 1(b). An interaction network connects the two views, exchanging complementary information between the atom view and the motif view to improve overall model performance. We establish two kinds of interactions: ligand–protein interactions and cross-view interactions. Ligand–protein interactions are modeled by an equivariant graph neural network, which helps the generated molecules fit the target binding sites by accounting for both geometric and chemical properties. Cross-view interactions bridge the gap between atom-level precision and motif-level abstraction: motifs interact with the target pocket and supply clustering information to the atom view, while the atom view supplies detailed positional information to the motif view. This bidirectional flow of information ensures that the generated ligands not only fit the binding sites but also maintain structural coherence beyond the predefined motif vocabulary. A schematic view of the AMDiff architecture is shown in Fig. 1(c). AMDiff first obtains representations of the protein and ligand through an embedding network. A denoising network then predicts the noise-free ligand state. Each view uses a denoising diffusion process conditioned on the binding site, consisting of a forward chain that perturbs data into noise and a reverse chain that converts noise back into data. In the atom view, the model captures fine-grained details of atomic positions and interactions. This involves learning precise atomic-level forces and positional information, providing a broader context that aids in forming reasonable molecular clusters and overall topology.
In the motif-view, the model captures higher-level structural patterns, such as functional groups and larger molecular fragments, ensuring that the generated ligands are structurally coherent and chemically valid. By obtaining the persistence diagram and encoding it as topological fingerprints, AMDiff effectively captures the multi-scale topological features essential for accurate ligand generation, as detailed in Section 4.3.
We train our model on the CrossDocked dataset.26 During training, both atom-view and motif-view particles, along with their corresponding binding protein pockets, are input to the model. The protein pocket remains fixed because it serves as the conditional information. In the sampling stage, we initialize the data state by sampling from a standard normal distribution, $G_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. Subsequent states are generated iteratively using pθ(Gt−1|Gt,C), where C represents the condition. We evaluate AMDiff on the CrossDocked dataset and on two targets, Anaplastic Lymphoma Kinase (ALK) and Cyclin-dependent kinase 4 (CDK4). We assess our model from two perspectives: (1) learning the characteristic property distributions of ligands in different protein pockets, which entails learning the interaction patterns with protein pockets needed for stronger binding; and (2) generating molecules for real-world therapeutic targets and exploring their interactions with mutated target proteins and across varying pocket sizes.
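To make the two chains concrete, the following Python sketch shows a standard Gaussian (DDPM-style) diffusion over ligand coordinates: the closed-form forward chain and an ancestral reverse loop that starts from N(0, I) and repeatedly samples the next state given the condition C. It is a minimal illustration under stated assumptions rather than AMDiff's implementation: the linear β schedule, the variance choice σt² = βt and the `eps_model(x, t, cond)` denoiser interface are hypothetical, and the categorical atom-type and motif-ID chains are omitted.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T on a linear schedule (an assumed schedule)."""
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alpha_bar, rng):
    """Forward chain in closed form: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps, eps

def p_sample_loop(eps_model, shape, cond, betas, rng):
    """Reverse chain: start from x_T ~ N(0, I), then draw x_{t-1} ~ p(x_{t-1} | x_t, C)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)                            # x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        eps_hat = eps_model(x, t, cond)                       # hypothetical conditional denoiser
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0  # no noise on the final step
        x = mean + np.sqrt(betas[t]) * noise
    return x

# Usage with a placeholder denoiser that always predicts zero noise.
rng = np.random.default_rng(0)
betas = linear_beta_schedule(100)
coords = p_sample_loop(lambda x, t, cond: np.zeros_like(x), (20, 3), None, betas, rng)
```

In a trained model, `eps_model` would be the pocket-conditioned network, and `cond` would carry the pocket features C.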
Table 1 reports the mean and standard deviation of each evaluation metric. Overall, our method performs best among the compared methods. AMDiff achieves 98.9% validity, indicating that it accurately learns the chemical constraints of topological structures. Its higher diversity and novelty indicate that the model explores chemical space beyond the molecular structures present in the training dataset. We assess the binding affinity of the generated molecular conformations using Vina docking scores against the target proteins. Compared with the second-best method, AMDiff improves QED by 5.5%, and it achieves the highest SA score among all methods. AMDiff attains an average affinity of −7.466 kcal mol−1, demonstrating its capability to generate molecules with favorable binding affinity.
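The diversity and novelty figures rest on simple set computations. The sketch below is a minimal illustration, assuming fingerprints are already available as sets of on-bit indices (for example, Morgan fingerprints computed elsewhere) and scaffolds as canonical strings. It takes internal diversity as one minus the mean pairwise Tanimoto similarity, a common convention; the exact definition behind Table 1 may differ.

```python
from itertools import combinations

def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def internal_diversity(fps) -> float:
    """1 - mean pairwise Tanimoto similarity over all unordered pairs."""
    pairs = list(combinations(fps, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

def scaffold_novelty(generated_scaffolds, reference_scaffolds) -> float:
    """Fraction of unique generated scaffolds absent from the reference (test + known ligand) set."""
    unique = set(generated_scaffolds)
    if not unique:
        return 0.0
    return len(unique - set(reference_scaffolds)) / len(unique)

fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]        # toy on-bit sets
print(round(internal_diversity(fps), 3))    # 0.833
```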
To further evaluate how well AMDiff captures the distribution of the training set and detects distribution shifts relative to the test set, we analyze the physicochemical properties and topological structures of the generated molecules. We apply the medicinal chemistry filters described in Section 4.6 to exclude molecules with excessively high or low molecular weights, as well as those that are toxic or chemically infeasible. The distributions of key metrics, including docking score, molecular weight, QED and SA, are presented in Fig. 2. Molecules generated by AMDiff have lower docking scores with smaller standard deviations than those of the baselines, indicating higher affinity for the target proteins. The molecular-weight distribution of our model closely resembles that of known ligands, outperforming DiffBP and FLAG. For QED, our model yields mean values closer to those of the training and test sets. AMDiff and FLAG tend to produce compounds with lower SA scores, primarily because they generate more structurally intricate molecules, such as fused-ring or polycyclic frameworks, which are inherently more challenging to synthesize. To better understand the chemical space occupied by the generated molecules, we assess their 3D shape distribution using normalized principal moment of inertia ratio (NPR) descriptors. As shown in Fig. 2(e), molecules generated by our model tend toward a linear (rod-like) shape, leaning slightly toward the "planar" corner. This alignment with the training set suggests that our model produces molecules consistent with the reference ligands. Conversely, FLAG exhibits a more divergent distribution whose center lies away from that of the bioactive ligands, indicating a deviation from the expected shape characteristics.
Although DiffBP has a distribution center closer to that of the bioactive ligands, its shape distribution spreads more widely toward the planar region. These quantitative evaluations show that AMDiff generates diverse, high-affinity molecules and outperforms the other methods in this drug design setting.
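NPR descriptors place each conformer in a triangle whose corners are rod (0, 1), disc (0.5, 0.5) and sphere (1, 1). The sketch below computes NPR1 = I1/I3 and NPR2 = I2/I3 from the principal moments of inertia I1 ≤ I2 ≤ I3 of a set of 3D coordinates. Unit masses are an assumption made for illustration; RDKit also provides `rdMolDescriptors.CalcNPR1` and `CalcNPR2` for conformers.

```python
import numpy as np

def npr_descriptors(coords, masses=None):
    """Return (NPR1, NPR2) = (I1/I3, I2/I3) from principal moments of inertia."""
    coords = np.asarray(coords, dtype=float)
    m = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    r = coords - (m[:, None] * coords).sum(axis=0) / m.sum()   # center-of-mass frame
    # Inertia tensor: I = sum_i m_i * (|r_i|^2 * Id - r_i r_i^T)
    inertia = (m * (r ** 2).sum(axis=1)).sum() * np.eye(3) \
        - (m[:, None, None] * r[:, :, None] * r[:, None, :]).sum(axis=0)
    i1, i2, i3 = np.linalg.eigvalsh(inertia)                   # eigenvalues in ascending order
    return i1 / i3, i2 / i3

# A linear three-atom arrangement sits at the rod corner (0, 1).
npr1, npr2 = npr_descriptors([[-1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
```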
Fig. 2 Quantitative evaluations of the models targeting the CrossDocked26 test set. (a–d) Distributions of the following metrics: (a) docking score; (b) molecular weight; (c) QED; (d) SA, comparing AMDiff (purple), DiffBP (yellow), FLAG (blue), train set (red) and test set (green) molecules. (e) 3D shape distribution of the generated molecules, visualized using NPR descriptors.
We also compute the distributions of various bond angles and dihedral angles for the generated molecules and compare them with the corresponding reference empirical distributions using the Kullback–Leibler (KL) divergence. Bond angles describe the spatial arrangement of three connected atoms, and dihedral angles describe the rotation between the planes defined by four sequential atoms. As shown in Table 2, our model achieves lower KL divergence than all other atom-based baselines and performs competitively with FLAG, which operates on a motif-wise basis. Motif-wise models can predict more precise angles within motifs because they combine predefined motifs from an established vocabulary; however, they face the challenge of forming cohesive connections between motifs. We additionally report statistics for aromatic angle distributions, where our model performs comparably to FLAG. These results highlight the effectiveness of AMDiff in capturing realistic bond-angle and dihedral-angle geometry by using both atom and motif views, approaching the performance of motif-based models without relying on predefined substructures.
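The angle statistics can be outlined as follows: compute bond angles (three bonded atoms) and dihedral angles (four sequential atoms) from coordinates, histogram the generated and reference values on shared bins, and take the KL divergence. The bin count, the smoothing constant and the direction KL(generated ‖ reference) are assumptions for this sketch and are not specified in the text.

```python
import numpy as np

def bond_angle(a, b, c):
    """Angle (degrees) at atom b formed by bonded atoms a-b-c."""
    a, b, c = (np.asarray(p, float) for p in (a, b, c))
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond, in [-180, 180]."""
    p0, p1, p2, p3 = (np.asarray(p, float) for p in (p0, p1, p2, p3))
    b0, b1, b2 = p0 - p1, p2 - p1, p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1        # component of b0 perpendicular to the central bond
    w = b2 - np.dot(b2, b1) * b1        # component of b2 perpendicular to the central bond
    return float(np.degrees(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))))

def histogram_kl(p_samples, q_samples, bins=36, value_range=(0.0, 180.0), eps=1e-10):
    """KL(P || Q) between two empirical angle distributions on shared histogram bins."""
    p, _ = np.histogram(p_samples, bins=bins, range=value_range)
    q, _ = np.histogram(q_samples, bins=bins, range=value_range)
    p = p / p.sum() + eps               # smoothing avoids log(0) in empty bins
    q = q / q.sum() + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```

For dihedrals, `value_range=(-180.0, 180.0)` would be passed instead of the bond-angle default.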
Following previous medicinal chemistry efforts,28,29 we focus on the ATP-binding pockets of ALK and CDK4, perform large-scale molecular generation, and systematically compare the generated molecules across various parameters. We use AMDiff to generate 15000 molecules and apply the molecular filter described in Section 4.6 to identify high-quality candidates. Fig. 3(a)–(d) shows the affinity and drug-likeness properties of molecules targeting ALK (PDB id: 3LCS) produced by different methods. A bioactive ligand dataset serves as the reference, establishing a baseline for models capable of designing ALK-targeted active-like molecules.
Fig. 3 Quantitative evaluations of the models targeting ALK (PDB id: 3LCS). The distributions of the following metrics were analyzed: (a) docking score; (b) molecular weight; (c) QED; (d) logP.
In terms of predicted affinity (docking score), AMDiff performs best, indicating that our model has effectively learned favorable molecular conformations within the target pockets. We also assess the molecular properties of the generated compounds: AMDiff achieves the highest QED and SA values, whose distributions closely resemble those of active compounds. Fig. 3(e) displays heatmaps of the docking score and QED distributions for molecules generated by DiffBP, FLAG and AMDiff, with each grid cell color-coded by the corresponding SA score. Molecules generated by AMDiff show more favorable docking scores than those generated by DiffBP and FLAG, close to the docking scores of bioactive ligands. Beyond these indicators, we also assess the spatial similarity between the molecular poses generated directly by the models and those obtained after molecular docking. Specifically, we compute the root-mean-square deviation (RMSD) between the generated conformations and the docked structures, as defined in eqn (16) of Section 4.7. As shown in Fig. 3(f), our model yields lower RMSD values, indicating that its generated conformations deviate only slightly from, and closely align with, the docked poses.
To verify AMDiff's ability to recognize protein pockets, we further examine its outputs using 3D visualization. First, to visualize the generative process of AMDiff, Fig. 4(a) shows the gradual diffusion process within the binding pocket of CDK4. At time steps 200, 500, 800 and 1000, the atom view and the motif view progressively capture features of the protein pocket and guide the diffusion process. Through generation and interaction at these steps, compound 1 is ultimately formed. AMDiff shows distinct generative processes in the atom and motif views, in which atom or motif substitutions can be observed. However, after cross-view interaction within the hierarchical graph, the final molecule integrates the benefits of both views, resulting in a structure well suited to the protein pocket. Then, using the same generation procedure, we screen other compounds from the generated molecules as potential CDK4 inhibitors and analyze whether our model learns the intricate interaction patterns within protein–ligand complexes. Key molecular descriptors are shown, including QED (quantitative estimate of drug-likeness), SA score and the top-1 docking score from AutoDock Vina. The best conformation of each compound is shown in both 2D and 3D views. As shown in Fig. 4(b), most of these molecules interact with the same amino acid residues, suggesting that the generated molecules fit into the binding site. Regarding pharmacophoric groups, AMDiff reproduces important pharmacophore elements shared with the reference ligands. Specifically, compounds 1 and 2 form hydrogen bonds with Val96 and Glu144. Compound 3 forms hydrogen bonds with Val96, Glu144 and Asp99, as well as a pi–cation interaction with Asp99. Compound 4 forms hydrogen bonds with Val96, Asp97 and Asp158.
The binding modes of the generated compounds align with the recognized binding patterns, demonstrating the target-aware ability of AMDiff to utilize known interactions while potentially uncovering novel ones.
Fig. 4 Examples of molecules generated by AMDiff targeting CDK4 (PDB id: 7SJ3). (a) An example of a conditional design trajectory. At initial time steps, substructures progressively explore interactions with the pocket in both the atom view and the motif view, and the trajectory gradually refines into a realistic molecular structure. (b) Molecules designed to target CDK4 (PDB id: 7SJ3), with molecular properties such as QED and SA score, as well as binding affinity and protein–ligand interaction analysis.
To address this, we generate ALK inhibitors against two wild-type ALK structures, one from the Protein Data Bank (ALKWT, PDB id: 3LCS) and one predicted by AlphaFold3 (AF-ALKWT). We also design two mutant proteins based on AF-ALKWT through site-directed mutagenesis: (i) AF-ALKG1202R, in which glycine (Gly) at position 1202 is replaced with arginine (Arg), and (ii) AF-ALKS1206Y, in which serine (Ser) at position 1206 is substituted with tyrosine (Tyr). Fig. 5(a) presents a t-SNE visualization of the USRCAT fingerprint30 distributions of molecules generated for these four proteins. The generated molecules overlap substantially in chemical space, yet there are distinct regions where the distributions do not overlap. This indicates that AMDiff can explore local variations and align generated molecules with the target binding sites. Fig. 5(b) further shows the 3D binding modes of AMDiff-generated molecules and how they differ between proteins. These molecules form robust hydrogen bonds with Met1199 in AF-ALKWT, AF-ALKG1202R and AF-ALKS1206Y. For mutations at positions 1202 and 1206, AMDiff recognizes the resulting steric hindrance and generates differentiated molecular structures in a targeted manner, coordinating energy loss at the global level. Given AMDiff's sensitivity to mutation-induced differences during molecular generation, we further investigate its performance for protein pockets of different sizes. Specifically, we assess ligands generated within ALK pockets ranging from 4 Å to 30 Å. Fig. 5(c) compares the docking score, molecular weight, QED and SA score of these molecules with those generated by DiffBP and FLAG. Fig. 5(d) shows 3D molecular spheres illustrating how molecules generated by AMDiff fit the pocket. The results indicate that AMDiff generates viable molecules at all pocket sizes.
In contrast, FLAG struggles to generate reasonable molecules when the binding site is very small (4 Å), suggesting that it is constrained by its preset pocket boundaries. AMDiff's robustness in this regime could be attributed to its atom view, which can construct smaller fragments when encountering greater steric hindrance and thus accommodate small pockets.
Fig. 5 (a) Distribution of molecules generated after mutating ALK (PDB id: 3LCS). The clustering results of USRCAT fingerprints for molecules generated for the four protein structures are visualized with t-SNE in two dimensions. ALKWT: wild-type ALK protein from the PDB (PDB id: 3LCS). AF-ALKWT: wild-type ALK protein predicted by AlphaFold (PDB id: 3LCS). AF-ALKG1202R: substitution of Gly with Arg at position 1202 in the protein sequence. AF-ALKS1206Y: substitution of Ser with Tyr at position 1206 in the protein sequence. (b) Examples of molecules generated after modifying residues within the pocket of AF-ALKG1202R, AF-ALKS1206Y and AF-ALKWT. (c) Conditional generation of molecules for various pocket sizes targeting ALK (PDB id: 3LCS), comparing key properties (docking score, molecular weight, SA and QED) across binding pockets of varying sizes. (d) Visualization examples showing generated samples adjusted to different pocket sizes; the molecular volumes are tailored to the given pocket volumes.
Variant | Validity ↑ | Diversity ↑ | Novelty ↑ | QED ↑ | SA ↑ | Affinity (kcal mol−1) ↓
---|---|---|---|---|---|---
w/o M | 0.896 ± 0.105 | 0.623 ± 0.089 | 0.576 ± 0.035 | 0.427 ± 0.073 | 0.603 ± 0.047 | −7.349 ± 1.205
w/o CG | 0.919 ± 0.061 | 0.598 ± 0.123 | 0.607 ± 0.156 | 0.459 ± 0.018 | 0.638 ± 0.104 | −7.026 ± 1.050
w/o TF | 0.937 ± 0.026 | 0.631 ± 0.078 | 0.618 ± 0.118 | 0.462 ± 0.063 | 0.652 ± 0.069 | −7.109 ± 1.509
AMDiff | 0.989 ± 0.007 | 0.672 ± 0.013 | 0.663 ± 0.104 | 0.479 ± 0.209 | 0.684 ± 0.125 | −7.466 ± 2.062
Test set | — | — | — | 0.476 ± 0.206 | 0.727 ± 0.140 | −7.502 ± 1.898
In this paper, we tackle the de novo ligand design problem from a hierarchical perspective, introducing a cross-view diffusion model that generates molecular structures at both the atom and motif levels. Our model recognizes multi-level geometric and chemical interactions between ligands and target proteins. By capturing multiple levels of granularity, we construct ligand–protein and cross-view interactions, whereas existing methods often neglect or underuse the hierarchical structure within molecules. In empirical evaluations, AMDiff generates valid molecules while maintaining the integrity of local segments, and it remains robust across diverse protein structures and pocket sizes.
To evaluate the performance of AMDiff, we conduct experiments aimed at designing potential hits for selected drug targets, using ALK and CDK4 as case studies. The results demonstrate that AMDiff can successfully generate drug-like molecules with novel chemical structures and favorable properties, exhibiting significant pharmacophore interactions with the target kinases. Additionally, AMDiff demonstrates high flexibility in generating 3D structures with various user-defined pocket sizes, further enhancing its utility. Overall, our work introduces a promising tool for structure-based de novo drug design, with the potential to significantly accelerate the drug discovery process.
[Equation (1)]
[Equation (2)]
[Equation (3)]
[Equation (4)]
The representation of atom types can be achieved through a one-hot vector, denoted as vt. To predict the atom types in molecules, we utilize a discrete diffusion model.38,39 At each timestep, a uniform noise term, βt, is added to the previous timestep vt−1 across the K classes. This discrete forward transition can be expressed as follows:
[Equation (5)]
[Equation (6)]
[Equation (7)]
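A minimal sketch of a uniform-transition categorical forward process of the kind described above, assuming it follows the standard multinomial-diffusion form of refs 38 and 39: one step mixes the one-hot vector with the uniform distribution, q(vt|vt−1) = Cat((1 − βt)vt−1 + βt/K), and composing steps gives the closed form q(vt|v0) = Cat(ᾱt v0 + (1 − ᾱt)/K) with ᾱt = ∏s(1 − βs). These are the standard formulas and may differ in notation from eqns (5)–(7).

```python
import numpy as np

def forward_step_probs(v_prev, beta_t):
    """q(v_t | v_{t-1}) = Cat((1 - beta_t) * v_{t-1} + beta_t / K), row-wise."""
    K = v_prev.shape[-1]
    return (1.0 - beta_t) * v_prev + beta_t / K

def marginal_probs(v0, alpha_bar_t):
    """q(v_t | v_0) = Cat(alpha_bar_t * v0 + (1 - alpha_bar_t) / K), row-wise."""
    K = v0.shape[-1]
    return alpha_bar_t * v0 + (1.0 - alpha_bar_t) / K

def sample_categorical(probs, rng):
    """Draw one one-hot sample per row from categorical probabilities."""
    K = probs.shape[-1]
    idx = np.array([rng.choice(K, p=p) for p in probs])
    return np.eye(K)[idx]

# Two atoms with K = 4 hypothetical atom types; noise them to alpha_bar = 0.5.
v0 = np.eye(4)[[0, 2]]
vt = sample_categorical(marginal_probs(v0, 0.5), np.random.default_rng(0))
```

Because each step is a linear mix with the uniform distribution, applying two steps with β = 0.1 and 0.3 matches the closed form with ᾱ = 0.9 × 0.7.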
For the training objective of atom types, we compute the KL divergence between categorical distributions:
[Equation (8)]
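One common way to realize this objective, used for example by TargetDiff,19 is to compare the true posterior q(vt−1|vt, v0) with the posterior obtained by substituting the network's prediction v̂0 for v0. The sketch below assumes that form, with the standard posterior for uniform transitions; it is not taken from eqn (8), whose exact form may differ.

```python
import numpy as np

def categorical_posterior(v_t, v0_probs, beta_t, alpha_bar_prev):
    """q(v_{t-1} | v_t, v_0) for uniform-transition multinomial diffusion (row-wise)."""
    K = v_t.shape[-1]
    theta = ((1.0 - beta_t) * v_t + beta_t / K) * (alpha_bar_prev * v0_probs + (1.0 - alpha_bar_prev) / K)
    return theta / theta.sum(axis=-1, keepdims=True)

def categorical_kl(p, q, eps=1e-12):
    """Row-wise KL(p || q) between categorical distributions."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def atom_type_loss(v_t, v0, v0_pred, beta_t, alpha_bar_prev):
    """Mean KL between the true posterior and the posterior implied by the predicted v0."""
    true_post = categorical_posterior(v_t, v0, beta_t, alpha_bar_prev)
    pred_post = categorical_posterior(v_t, v0_pred, beta_t, alpha_bar_prev)
    return float(categorical_kl(true_post, pred_post).mean())
```

A perfect prediction (v0_pred equal to v0) gives zero loss; a uniform prediction gives a positive loss.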
Motivated by the ability of guided diffusion models to generate high-quality conditional samples, we apply classifier-free guided diffusion to pocket-conditional molecule generation. Pocket features provide a useful guidance signal for generating 3D structures in binding sites. During training, the pocket condition in the diffusion model εθ(Gt|R,P) is replaced with a null label ∅ with a fixed probability. During sampling, the model output is extrapolated further in the direction of εθ(Gt|R,P) and away from εθ(Gt|∅,P):
[Equation (9)]
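Classifier-free guidance needs only two small changes to a conditional denoiser: randomly replacing the pocket condition with a null embedding during training, and combining conditional and unconditional predictions at sampling time. The sketch assumes the common parametrization ε̃ = (1 + w)εθ(Gt|R,P) − wεθ(Gt|∅,P) with guidance weight w, and folds any other inputs into a hypothetical `eps_model(g_t, t, cond)`; the exact weighting in eqn (9) and the drop probability used in AMDiff may differ.

```python
import numpy as np

def drop_condition(cond, null_cond, p_uncond, rng):
    """Training: replace the pocket condition with a null embedding with probability p_uncond."""
    return null_cond if rng.random() < p_uncond else cond

def guided_noise(eps_model, g_t, t, cond, null_cond, w):
    """Sampling: extrapolate toward the conditional prediction and away from the unconditional one.

    eps = (1 + w) * eps(G_t | cond) - w * eps(G_t | null); w = 0 recovers the conditional model.
    """
    eps_cond = eps_model(g_t, t, cond)
    eps_uncond = eps_model(g_t, t, null_cond)
    return (1.0 + w) * eps_cond - w * eps_uncond
```

Larger w pushes samples more strongly toward pocket-specific structure, usually at some cost in sample diversity.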
Each motif is treated as a rigid structure associated with a local coordinate frame. During the diffusion process, the motif view predicts both a motif ID (from a predefined vocabulary) and a Euclidean transformation consisting of a 3D translation vector. This transformation is applied to the idealized coordinates of the selected motif, placing it in global space. Motif-level updates influence atom-level generation through a shared embedding space and synchronized updates in the joint diffusion network. Additional details are provided in SI Section C. For motif IDs, we adopt a discrete diffusion process governed by a uniform transition matrix, consistent with the atom-level scheme in AMDiff. We also introduce an alternative variant for comparison, AMDiff-Embedding Distance (AMDiff-ED), which uses transition matrices constructed from motif embedding similarities to guide the forward noising process. Further details are provided in SI Section E.4.
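Because motifs are rigid and only a translation is predicted, placing a motif reduces to shifting its idealized local coordinates to the predicted center. In the sketch below, the two-entry `MOTIF_VOCAB` (a single atom and a benzene-like hexagon with 1.39 Å edges) and the `place_motif` helper are hypothetical stand-ins for the learned vocabulary described in the SI.

```python
import numpy as np

# Hypothetical motif vocabulary: idealized coordinates (in angstroms) in each motif's
# local frame. A real vocabulary would be extracted from the training data.
MOTIF_VOCAB = {
    0: np.array([[0.0, 0.0, 0.0]]),                                   # single-atom placeholder
    1: 1.39 * np.array([[np.cos(a), np.sin(a), 0.0]                   # benzene-like hexagon
                        for a in np.arange(6) * np.pi / 3]),
}

def place_motif(motif_id, translation, vocab=MOTIF_VOCAB):
    """Map a motif's idealized local coordinates into the global (pocket) frame.

    Only a translation is applied; motif rotations are not predicted.
    """
    local = vocab[motif_id]
    local = local - local.mean(axis=0)           # center the local frame on the motif centroid
    return local + np.asarray(translation, dtype=float)

ring = place_motif(1, [1.0, 2.0, 3.0])           # six ring atoms centered at (1, 2, 3)
```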
We employ atom-view and motif-view diffusion models to generate feature representations in a latent space. To formulate the proposed AMDiff approach, we introduce a hierarchical diffusion model for the atom view and motif view, denoted by Φ(Gt,θ1) and Φ(Mt,θ2), respectively. These hierarchical diffusion networks update simultaneously, and the overall recovery process can be represented as:
$(\hat{G}_0, \hat{M}_0) = \Phi_{\theta_1,\theta_2}(G_t, M_t, t, (R, P))$ (10)
The reverse network, denoted Φθ1,θ2(Gt,Mt,t,(R,P)), is built from equivariant graph neural networks (EGNNs).40 We build k-nearest-neighbor graphs connecting ligand atoms and motifs with the protein atoms of the conditioning pocket, and use message passing to model the interactions between them:
[Equation (11)]
$v_G = \phi_v(h_G^{L}), \quad w = \phi_w(h_M^{L})$ (12)
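The sketch below shows one E(n)-equivariant message-passing layer in the style of ref. 40 on a k-nearest-neighbor graph over a combined point cloud of ligand and pocket nodes. Layer sizes, tanh activations, random weights, degree normalization and the `movable` mask (which keeps pocket coordinates fixed, mirroring the fixed conditioning pocket) are illustrative assumptions rather than AMDiff's settings. Messages depend only on node features and squared distances, so rotating or translating the input rotates or translates the output coordinates and leaves node features unchanged.

```python
import numpy as np

def knn_edges(x, k):
    """Edges (i, j): node i receives a message from each of its k nearest neighbors j."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                              # exclude self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]
    return np.repeat(np.arange(len(x)), k), nbrs.reshape(-1)

def mlp(w1, w2):
    """Two-layer perceptron with tanh activation."""
    return lambda z: np.tanh(z @ w1) @ w2

class EGNNLayer:
    """One E(n)-equivariant message-passing layer (random weights, for illustration only)."""

    def __init__(self, dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        w = lambda *shape: 0.1 * rng.standard_normal(shape)
        self.phi_e = mlp(w(2 * dim + 1, hidden), w(hidden, hidden))   # edge message
        self.phi_x = mlp(w(hidden, hidden), w(hidden, 1))             # scalar weight per edge
        self.phi_h = mlp(w(dim + hidden, hidden), w(hidden, dim))     # node feature update

    def __call__(self, h, x, edges, movable):
        i, j = edges
        diff = x[i] - x[j]                                   # relative positions (equivariant)
        dist2 = (diff ** 2).sum(axis=-1, keepdims=True)      # squared distances (invariant)
        m = self.phi_e(np.concatenate([h[i], h[j], dist2], axis=-1))
        n = len(x)
        deg = np.maximum(np.bincount(i, minlength=n), 1)[:, None]
        dx = np.zeros_like(x)
        np.add.at(dx, i, diff * self.phi_x(m))               # weighted sum of relative vectors
        agg = np.zeros((n, m.shape[1]))
        np.add.at(agg, i, m)                                 # summed incoming messages
        x_new = x + movable[:, None] * dx / deg              # pocket nodes keep their coordinates
        h_new = h + self.phi_h(np.concatenate([h, agg], axis=-1))
        return h_new, x_new
```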
For final molecular integration, we use atom-view rather than motif-based assembly, enabling more flexible structures unconstrained by the predefined motif vocabulary. In our implementation, prediction of motif rotations is omitted, since motif positions and vocabulary IDs together adequately capture both the structural features and the protein–ligand interaction patterns.
We use filtration functions, denoted f, to compute the persistence diagrams (PD) ph(x,f) = {D0, …, Dl}, where Dl is the l-dimensional diagram and x is the point cloud of the pocket or ligand. The resulting PD summarizes multi-scale topological information. We then compute the normalized persistent entropy from the PD using the method introduced in ref. 44:
[Equation (13)]
$F_G = \phi_d(E_{\mathrm{norm}}(\mathrm{ph}(x_G))), \quad F_R = \phi_d(E_{\mathrm{norm}}(\mathrm{ph}(x_R)))$ (14)
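As an illustration of ph(x, f) and Enorm, the sketch below computes only the 0-dimensional diagram D0 of a Vietoris–Rips filtration (connected components merging as the distance threshold grows) using Kruskal's algorithm, followed by the persistent entropy of the finite bars. Normalizing by the logarithm of the number of bars, so that the value lies in [0, 1], is an assumption; ref. 44 may normalize differently. Higher-dimensional diagrams (loops, cavities) require a TDA library such as GUDHI or Ripser, and the learned projection ϕd is omitted.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """0-dimensional Vietoris-Rips persistence pairs (birth, death) via Kruskal's algorithm.

    Every point is born at 0; a component dies at the edge length that merges it into
    another component. The single infinite bar is omitted.
    """
    parent = list(range(len(points)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]           # path halving
            a = parent[a]
        return a

    edges = sorted((math.dist(p, q), i, j)
                   for (i, p), (j, q) in combinations(enumerate(points), 2))
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                                # edge merges two components
            parent[ri] = rj
            bars.append((0.0, d))
    return bars

def normalized_persistent_entropy(bars):
    """Shannon entropy of normalized bar lengths, divided by log(number of bars)."""
    lengths = [d - b for b, d in bars if d > b]
    if len(lengths) < 2:
        return 0.0
    total = sum(lengths)
    entropy = -sum((l / total) * math.log(l / total) for l in lengths)
    return entropy / math.log(len(lengths))
```

Equal-length bars give a value of 1, while a diagram dominated by one long bar gives a value near 0.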
The topological fingerprints are incorporated into the pocket and ligand representations: FG and FR are concatenated with the geometric and chemical features of the ligand and pocket to form the initial input representations. These features are also fed into the global context encoder, allowing the model to learn how topological patterns influence atomistic interactions. To further preserve critical global structures, such as rings or cavities, topological coherence constraints are incorporated during optimization. These constraints are implemented as regularization terms in the loss function that compare the persistence diagram of the predicted molecule with that of the ground truth.
$L = L^{a}_{\mathrm{pos}} + \lambda_1 L^{a}_{\mathrm{type}} + L^{m}_{\mathrm{pos}} + \lambda_2 L^{m}_{\mathrm{id}}$ (15)
In this work, we apply widely used metrics for deep generative models to evaluate our method. (1) Validity is the percentage of generated molecules that pass the RDKit sanitization check. (2) Uniqueness is the percentage of unique structures among the generated outputs. (3) Diversity considers the proportion of unique scaffold structures generated, as well as internal diversity values computed for 2D and 3D structures using the Morgan fingerprint and USRCAT, respectively. (4) Novelty is the proportion of unique generated scaffold structures that are present in neither the test set nor the known ligand set. (5) Molecular properties: we report the distributions of several important 2D and 3D molecular properties, including molecular weight (MW), QED, synthetic accessibility (SA), logP and the normalized principal moment of inertia ratios (NPR1 and NPR2). SA is a widely used computational metric that estimates how easily a molecule can be synthesized based on its structural features. These distributions are compared with those of the training and test sets to assess the model's ability to learn important molecular properties. (6) Affinity is the average binding affinity of the generated molecules, approximated by the docking score and estimated in kcal mol−1 with AutoDock Vina. (7) RMSD (root-mean-square deviation) measures the dissimilarity between different conformations of the same molecule; a smaller RMSD indicates greater similarity between the conformations. RMSD is calculated as follows:
[Equation (16)]
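A minimal sketch of the pose RMSD, assuming the generated and docked poses list the same atoms in the same order and share the pocket coordinate frame, so no superposition is applied. Symmetry-equivalent atoms (for example, in a phenyl ring) would need an atom-mapping search, which is omitted here.

```python
import numpy as np

def pose_rmsd(generated, docked):
    """RMSD between two poses of the same molecule with matched atom order (no alignment)."""
    a = np.asarray(generated, dtype=float)
    b = np.asarray(docked, dtype=float)
    if a.shape != b.shape:
        raise ValueError("poses must contain the same atoms in the same order")
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

# Shifting every atom by 1 angstrom along x gives an RMSD of exactly 1.
shifted = pose_rmsd([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
```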
We conducted a comparative analysis with several baselines. liGAN50 combines a 3D CNN architecture with a conditional variational autoencoder (VAE) to generate molecular structures as atomic density grids. AR47 introduces an autoregressive sampling scheme that estimates the probability density of atom occurrences in 3D space; atoms are sampled sequentially from the learned distribution until no room remains within the binding pocket. Pocket2Mol51 designs a graph neural network that captures both spatial and bonding relationships between atoms of the binding pocket, then samples new drug candidates conditioned on the pocket representations from a tractable distribution. GraphBP15 uses a flow model to generate the type and relative location of new atoms, obtaining geometry-aware and chemically informative representations from the intermediate contextual information. TargetDiff19 is an SE(3)-equivariant 3D diffusion model that jointly generates atomic coordinates and atom types in a non-autoregressive manner for target-aware molecular design. DecompDiff20 is an end-to-end diffusion-based method that uses decomposed priors and validity guidance to generate the atoms and bonds of 3D ligand molecules. FLAG16 constructs a motif vocabulary by extracting common molecular fragments from the dataset; the model selects a focal motif, predicts the next motif type and attaches the new motif. DiffBP52 is a diffusion model that generates 3D molecular structures by simultaneously denoising atomic element types and 3D coordinates with an equivariant network architecture.
The SI file provides a detailed description of the data collection and preprocessing steps, together with illustrations of the network architecture, motif representation, model training procedure, and hyperparameter settings. It also reports extended results across several dimensions, including performance on multiple targets, sensitivity to motif vocabulary size, evaluation on the PoseBusters test, and analyses of discrete diffusion transition matrices. Finally, it includes a comprehensive description of the algorithm used in this work. Supplementary information is available. See DOI: https://doi.org/10.1039/d5sc02113h.
Code availability: the source code for model architecture is publicly available on our GitHub repository https://github.com/guanlueli/AMDiff.
Footnote
† Equal contributions.
This journal is © The Royal Society of Chemistry 2025