Open Access Article
Jintao
Zhu‡
a,
Zhonghui
Gu‡
b,
Jianfeng
Pei
*a and
Luhua
Lai
*abcd
aCenter for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China. E-mail: lhlai@pku.edu.cn; jfpei@pku.edu.cn
bPeking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
cBNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, China
dPeking University Chengdu Academy for Advanced Interdisciplinary Biotechnologies, Chengdu, Sichuan, China
First published on 9th April 2024
Molecular docking, a key technique in structure-based drug design, plays pivotal roles in protein–ligand interaction modeling, hit identification and optimization, in which accurate prediction of protein–ligand binding mode is essential. Conventional docking approaches perform well in redocking tasks with known protein binding pocket conformation in the complex state. However, in real-world docking scenario without knowing the protein binding conformation for a new ligand, accurately modeling the binding complex structure remains challenging as flexible docking is computationally expensive and inaccurate. Typical deep learning-based docking methods do not explicitly consider protein side chain conformations and fail to ensure the physical plausibility and detailed atomic interactions. In this study, we present DiffBindFR, a full-atom diffusion-based flexible docking model that operates over the product space of ligand overall movements and flexibility and pocket side chain torsion changes. We show that DiffBindFR has higher accuracy in producing native-like binding structures with physically plausible and detailed interactions than available docking methods. Furthermore, in the Apo and AlphaFold2 modeled structures, DiffBindFR demonstrates superior advantages in accurate ligand binding pose and protein binding conformation prediction, making it suitable for Apo and AlphaFold2 structure-based drug design. DiffBindFR provides a powerful flexible docking tool for modeling accurate protein–ligand binding structures.
000 to 108 molecules,2,3 leading to the discovery of several drugs. While the HTS libraries represent a significant advancement over traditional lab-designed ones, they encompass only a fraction of potential drug-like molecules.4 Given the challenges and expenses associated with synthesizing such a vast chemical space, computational methods for screening virtual libraries are frequently employed in drug discovery, allowing exploration of chemical spaces comprising tens of billions of molecules or even more.5–7
Structure-based virtual screening (SBVS) enables rapid and cost-effective modeling of target-molecule binding structures from large-scale compound libraries together with the evaluation of their binding affinities for identifying potential hits.8–10 Molecular docking is one of the most frequently employed techniques for SBVS, which is utilized to predict ligand binding poses, characterize protein–ligand binding strength, and identify key interactions.11,12 In general, conventional docking approaches, including AutoDock4,13 AutoDock Vina,14,15 Smina,16 Glide,17 and GOLD,18 leverage heuristic search algorithms, to explore a variety of potential ligand conformations. Scoring functions with simplified terms are utilized for fast estimation of binding affinity and priority of ligand poses.
Classical molecular docking methods describe protein–ligand interactions based on the lock-and-key model,19 wherein a rigid receptor binding pocket serves as the “lock” and the molecular docking algorithm primarily optimizes the ligand's conformation to find a complementary “key”. Such rigid receptor docking methods, for the trade-off between accuracy and computational efficiency, strive to determine the optimal and complementary binding conformation. When known complex structures are available, and ligand molecules are removed and then re-docked into the native Holo pockets, rigid receptor docking often achieves impressive success rate.20 However, in real-world docking tasks without knowing the binding conformation in advance, ligand induced pocket conformational changes may produce wrong docking results.21 Virtual screening and rational design on the unbound (Apo) or computationally modeled structures usually give unsatisfied hit rate.22–27 Without accounting for pocket flexibility, the performance of docking methods experiences drastic decrease in such cases,28 which may rule out potential hits during the early stage of drug discovery. Although AlphaFold29 is capable of accurately modeling target protein structures, traditional docking methods that overlook potential side chain flexibility perform less effectively when applied to the predicted structures.30
Currently, there are two main strategies to address the flexibility of protein pockets. The first approach involves inducing local conformational rearrangements of the target using force field or scoring function-based calculations. For instance, rDock31 allows movements of functional groups that can form hydrogen bonds including –OH and –NH3. AutoDockFR (AutoDock for Flexible Receptors)32 allows users to specify up to 14 flexible side chains in advance and samples reasonable side chain dihedral angles from a rotamer library. Despite its better performance than AutoDock Vina in cross-docking benchmarks with Apo structures, it is considerably time-consuming and requires prior knowledge of potentially critical side chains in the pocket, which limits its application in SBVS. These methods consider partial flexibility of pockets and are suitable to handle the cases with only minor conformational rearrangements mainly resulting from side chain movements. There are a few methodologies that have been carefully devised to account for full flexibility, primarily categorized into ensemble docking and induced fit docking.33 Ensemble docking aims to implicitly mimic the dynamical behavior of a receptor's binding site represented by a conformational ensemble.34 The most common implementation is straightforward by docking each ligand to multiple rigid conformations of a receptor. The conformational ensemble could be obtained either by crystallographic structures or structure models derived through computational methods such as molecular dynamics (MD) simulation35 and normal modes.36 Clearly, ensemble docking depends significantly on the diversity and structural quality of the conformational ensemble for well-defined representation of the receptor's flexibility. It has been observed to be ineffective in identifying the true receptor conformations derived from clustering.37 Instead, Schrödinger IFD-MD,38 one of the rigorous induced fit docking workflows, explicitly emulates the interactions between receptors and ligands. IFD-MD elaborately integrates pharmacophore docking, rigid receptor docking, energy-driven binding site refinement, and MD simulations for multiple iterations.38 IFD-MD requires a template pose which can be obtained either by docking a known binder to the target, or by aligning the structure to a protein homologue and grafting the post-aligned ligand coordinates. Such a knowledge-based approach adroitly leverages the available protein–ligand interaction patterns or homologous ones, which ensures the generation of a reliable initial pose. The placement often results in clashes between the ligand and the pocket, which can be resolved later in the side chain repacking and backbone minimization. Subsequently, multiple independent MD simulations and metadynamics simulation for the assessment of pose stability are run. Finally, the final poses are ranked by a composite scoring function. It is evident that this workflow burdens a significant allocation of computational resources.38,39 Despite the efforts to relax the systems, it generally fails in cases involving backbone motion.38,39
The second approach is the recently developed deep learning-based methods,40 which coarsen the representation of protein pockets by only encoding the protein backbone atoms without explicitly including side chain atoms. This kind of representation is insensitive to minor pocket backbone flexibility and side chain adaptability. Earlier works, like DeepDock,41 TankBind,42 and EDM-Dock,43 predicted pocket residue-ligand distance map, which is used to reconstruct the binding structure. Leveraging powerful equivariant neural networks like EGNN,44 geometric deep learning45 models such as EquiBind,46 LigPose,47 E3Bind,48 Uni-Mol,49 and KarmaDock50 iteratively predict the three-dimensional coordinates of ligands directly around the whole protein (blind docking) or predefined pocket. Recent SOTA blind docking method DiffDock,51 based on the diffusion generative modelling,52 employed the SE(3) equivariant neural network53 to denoise the rotation, translation, and bond torsion of ligand, and then rank poses by additional confidence model. However, these existing deep learning-based docking approaches face limitations in effectively handling protein flexibility and the generated ligand poses are often implausible.54 The generated ligand structures often contain clashes with the target and irrational bond lengths, angles, and torsion angles that lead to high intra energies. Ligand conformational optimization using tools such as RDKit alignment50,55 cannot completely alleviate ligand and protein clashes. Furthermore, ignoring the target flexibility and validity of ligand poses makes it challenging for these deep learning-based methods to capture key interactions in docking.54 Recently, building upon the methodology of optimizing ligand coordinate recycling as developed in LigPose and KarmaDock, a deep learning-based flexible docking method named FlexPose56 has extended its predictive capabilities of pocket side chain coordinates. This advancement allows for more details of interaction information in cross-docking applications. However, like LigPose and KarmaDock, FlexPose encounters the same inherent limitation. The methodological focus on fitting coordinates within Euclidean space tends to overfit the overall RMSD (Root Mean Square Deviation). Consequently, FlexPose, akin to its predecessors, is inevitably limited by conformational rationality. Overall, these inadequacies of current methods impede subsequent steps, such as post-optimization of ligands by experts based on the detailed interactions or conducting further studies through molecular dynamics simulations.
Early Apo–Holo pair analysis has shown general consensus that upon ligand binding protein pocket undergoes significant side chain conformation heterogeneity while backbone is relatively rigid in most cases.57–59 Therefore, in most cases, side chain flexibility modelling is enough for flexible docking. In this study, we developed a full-atom flexible docking model, DiffBindFR, based on the diffusion framework (Fig. 1). In the comprehensive evaluation, starting from pocket conformations with randomized side chain torsion angles, DiffBindFR outperforms state-of-the-art (SOTA) deep learning techniques and traditional docking methods. Owing to the explicit full-atom modeling of pocket residues, and its learning of joint optimization of variables within the entire system across a product space composed of torsional angles, rotations, and translations, DiffBindFR can not only accurately recover protein pocket side chain conformations, but also generate precise and highly physically plausible ligand binding poses. In cross-docking benchmark, DiffBindFR, due to its capability of full pocket side chain optimization, significantly outperforms traditional flexible docking methods like IFD-MD,38 AutoDock VinaFlex32 and rDock.31 It also gives superior performance in generating both accurate and valid binding structures compared to existing docking methods in the Apo dataset and AlphaFold2 modeled structures.
log
Pt(x(t)).60 In the docking procedure, starting from the randomly initialized binding conformation, the scores predicted by DiffBindFR are used to solve the reverse VE-SDE process60 to implement denoising sampling. With physics-based scoring function Smina16 or mixture density neural network (MDN)41 serving as confidence model, binding structures sampled by DiffBindFR can be ranked, and then the top-1 complex pose can be selected as the final prediction.
| dx = f(x,t)dt + g(t)dw | (1) |
For
,
denotes a vector-valued function called the drift coefficient of x(t), and
denotes a scalar function called the diffusion coefficient of x(t). The lack of canonical local coordinate system defined for ligand molecules, makes the drift coefficient hard to design for the ligand rotation. Consequently, the drift coefficient f(x,t) is set to be 0, and the model becomes the score-based generative model.60 The reverse diffusion running backwards in time, which is also known as the denoising process, is given by the following reverse-time SDE:
dx = −g(t)g(t)T∇x(t) log Pt(x)dt + g(t)d![]() | (2) |
To estimate ∇x(t)
log
Pt(x), we can train a score-based neural network Sθ(x(t),t) to fit it. The standard score-match loss function is as follows:
![]() | (3) |
is the precomputed weight factors.
![]() | (4) |
, translation of a ligand pose
lies in the 3D translation group
. The diffusion kernel for ligand is a Gaussian function with variance σT as follows, which is also utilized for computing the score of ligand translation ∇Pt|0(Xl(t)|Xl(0)):![]() | (5) |
As rotation of a ligand pose lies in the 3D rotation group
, IGSO(3) distribution63,64 was chosen as the diffusion kernel. In specific, rotation matrix
can be split into a unit vector
uniformly as the rotation axis and a axis-angle ω ∈ [0, π]. Consequently, the functionality of σR can be replaced by
and ω. The diffusion kernel for ligand rotation is as follows:
Pt|0(R(t)|R(0)) = R( ,ω)R(0) | (6) |
The score of rotation diffusion can be computed according to
![]() | (7) |
![]() | (8) |
. We introduced the diffusion kernel from the work of Torsional Diffusion65 to satisfy angle periodicity, and compute its score ∇Pt|0(χ(t)|χ(0)) as follows:![]() | (9) |
Torsion of ligand rotatable bonds are dealt with the same way as pocket side chains.
Following the eqn (3), the loss function is set as follows:
![]() | (10) |
The forward diffusion and reverse diffusion are both performed in the product space66 of
, corresponding to the aforementioned four kinds of transformation.
During the forward diffusion process, we would sample t ∈ [0, 1] for each pocket–ligand pair, and then utilize the defined diffusion kernel to sample each transformation. The torsions of ligand and pocket side chains are first applied to the pose, followed by translation and rotation.
The starting point of the denoising stage is a ligand conformation generated by RDKit55 and pocket side chains, with each type of transformation sampling from their σmax. According to eqn (2), we update complex pose using the predicted score for each type of transformation. After applying the translation and rotation matrix constructed from predicted score, torsion angles get updated. It is noteworthy that there exists entanglement between ligand translation/rotation and its bond torsion, ligand pose need to be re-aligned after bonds are twisted, which will lead to model-unaware structural alignment error. With the sampled pocket side chains fixed, we perform fast local energy relaxation on the pose using Smina16 for error correction, obtaining the final binding conformations. The number of the denoising steps is defined as 22, and 40 poses are sampled for each pocket–ligand pair, which takes in average 50 s when the batch size is set to 16 on a single 32 GB NVIDIA Tesla V100-SXM2 GPU card.
739 structures are used for training and 968 structures are used for validation. The dataset is named as “PDBbind time-split dataset” in the article.
The time-split of PDBbind is supported to be more strict and reasonable with the protein sequence similarity of 0.484 between test set and training&validation set, compared to CASF2016-split whose protein sequence similarity is 1.00 (ESI Table S3†). Volkov et al.68 have shown that the time-split of PDBbind is more practical and critical over artificial splits such as ligand scaffolds or protein sequence/structure similarity for the model generalization in drug repurposing, lead optimization, and virtual screening.
30 that contains 18 human GPCR complexes published after April 30, 2018. ApoBind,71 AHoJ72 and SIENA73 are utilized to search for corresponding Apo and Holo states based on queried Holo structures, thereby creating pairs for the Apo–Holo and Holo–Holo mixed cross-dock dataset. The detailed protocol for constructing the CD test set can be found in ESI Section 1.† The finalized CD test set comprises of 14
462 structural pairs for cross-docking benchmark tests.
![]() | (11) |
Subsequently, the performance of traditional and recent deep learning-based methods is evaluated for comparison. The top-1 docking poses of each method, selected based on its confidence model or scoring function, are analyzed. DiffBindFR-Smina and DiffBindFR-MDN significantly outperform other deep learning-based pocket docking methods (Fig. 3(A)), including KarmaDock with RDKit55 ligand conformation alignment (KarmaDock Align) and TankBind with predefined pocket (TankBind-Pocket). This highlights the advantage of our full-atom based model. Even when compared to the traditional rigid receptor docking methods AutoDock Vina and Glide, with the experimentally determined side chain conformations (redock), DiffBindFR-Smina achieves a marginally higher success rate without knowing the side chain conformations in the complex. It is noteworthy that on this test set, the re-docking success rates of Glide and AutoDock Vina are only 42.2% and 46.3%, respectively, which may be lower than the expectations for conventional methods in re-docking performance.15,75 Our analysis suggests that this discrepancy could stem from dataset differences and the extent of exhaustive sampling (ESI Section 10†).
We further used Rosetta76 to repack side chain conformations in these Holo structures to simulate an Apo-like state for each target protein. The docking success rate of Vina and Glide significantly decreases in these Apo-like proteins, underscoring the limitations of rigid receptor docking methods in handling side chain movements and the importance of flexible docking in virtual screening. To illustrate the challenges of flexible docking, we re-trained the KarmaDock by integrating a ResNet module77 (ESI Fig. S4†) for predicting side chain torsion angles, resulting in a new model named KarmaDock-sc Align. DiffBindFR significantly surpasses KarmaDock-sc Align in terms of side chain recovery (Fig. 3(B)). Compared to KarmaDock, the performance of KarmaDock-sc Align significantly declines (Fig. 3(A)) due to the difficulty in balancing ligand coordinate recovery with side chain torsion recovery, highlighting the complexity inherent in flexible docking. As is widely recognized, factors like the number of heavy atoms and rotatable bonds in a ligand profoundly impact the success rate of conventional docking programs.78 Hence, we examine the relationship between L-RMSD from the DiffBindFR model and various ligand characteristics, such as molecular weight, rotatable torsion bonds, and changes in solvent accessible surface area (DeltaSASA) upon binding. Contrary to traditional methods,14,17,31,47 as the difficulty of docking increases, including the increase in molecular weight, the number of rotatable bonds, and the decrease in the ligand buriedness, the L-RMSD achieved by DiffBindFR-Smina is not significantly affected (Fig. 3(C)). Additionally, DiffBindFR is able to achieve a more significant advantage in docking success rates compared to traditional methods (Fig. 3(D)).
Among blind docking methods, DiffDock shows better performance (success rate of 38%) than TankBind (16%) and EquiBind (2%), but most of their generated poses are invalid due to ignoring protein side chains (Fig. 4(A)). Ligand energy minimization significantly improves the PB-success rate of DiffDock (35%). TankBind and EquiBind also see improvements in PB-success rate with energy minimization, but still lag behind DiffDock (Fig. 4(B)). Although blind docking is a tough task for its broad searching space in the whole protein, flexible pocket docking method like DiffBindFR, denoising a chaotic side chain conformation into a well-packed conformation having valid interaction with the ligand, has much more objectives for prediction. DiffBindFR, utilizing Smina scoring function or MDN network as the confidence model to select the top-1 pose from 40 generated ones, outperforms all other deep learning-based blind docking methods and pocket docking methods. DiffBindFR-Smina and DiffBindFR-MDN demonstrate both high success rate (50.2% for DiffBindFR-Smina and 48.1% for DiffBindFR-MDN) and PB-success rate (49.1% for DiffBindFR-Smina and 44.4% for DiffBindFR-MDN), with lower penalties by Posebusters compared to other deep learning-based methods, showcasing the capability of DiffBindFR in generating accurate and physically plausible complex poses. The performance of DiffBindFR is comparable to traditional rigid receptor docking methods using known ground-truth side chain conformations for redocking. As is depicted from ESI Fig. S7,† DiffBindFR shows its effectiveness in binding site identification and pocket side chain recovery on Posebusters test set, as well. Force field optimization has minimal impact on DiffBindFR generated structures, which also demonstrates the high physical plausibility of DiffBindFR generated poses. Among other deep learning-based pocket docking methods, KarmaDock Align achieves the highest success rate (30.4%) but a very low PB-success rate (6.1%). Force field optimization of ligands rescues most poses with L-RMSD < 2 Å into good physical validity. KarmaDock Align, EDM-Dock, TankBind-pocket, Uni-Mol, and EDM-Dock, which focus on fitting the RMSD of the ligand during training, tend to ignore the intra energy of the generated poses and protein side chains, as is shown from ESI Fig. S9.† Indeed, force field optimization is not allowed in realistic docking to meet the demands of high-throughput screening.
Four specific cases from the Posebusters test set (ESI Fig. S10†), never trained or seen by DiffBindFR, are presented to highlight its superiority over other methods focusing solely on ligand coordinates while neglecting ligand conformation validity. In complexes with PDB ID 6TW5, 7PRM, 7T1D, and 7CD9, DiffBindFR successfully docks ligands into precise positions with valid conformations and recovers pocket side chains into good interaction with ligands. In contrast, KarmaDock Align, EDM-Dock, and TankBind-pocket fail to predict correct binding ligand poses, and their generated poses cannot pass the physical plausibility check of the Posebusters suite. As is shown in ESI Table S5,† ligand poses generated by EDM-Dock and TankBind-pocket exhibit both internal invalidity (including internal steric clash, bump aromatic ring, etc.) and steric clash with proteins, while KarmaDock Align, due to using RDKit for ligand pose alignment, frequently fails in reducing ligand–protein clash.
67 with target proteins in the Apo and Holo states, GPCR-AF2 set with Apo-like proteins predicted by AlphaFold2,29 DUDE27-HoloEns set with different protein Holo states and Ensemble sets featuring prominent docking targets including CDK2, EGFR and FXA. Cα RMSD of binding site backbone (within 5 Å cutoff away from crystal ligand) conformational changes in these subsets predominantly range between 0–2 Å, as shown in ESI Fig. S1.†
In these subsets, DiffBindFR-MDN and DiffBindFR-Smina achieve significantly higher PB-success rate (Table 1) than all the traditional docking methods and deep learning-based docking methods. When considering L-RMSD alone (Fig. 5), traditional rigid receptor docking methods such as Vina, Smina, LinF9,82 and Glide underperform, were compared to deep learning-based methods that use main-chain coarse-grained representations of proteins. As indicated in Fig. 5 and ESI Table S6,† the L-RMSD median for methods like TankBind-pocket, EDM-Dock, and KarmaDock Align hovers around 2 Å across the subsets, whereas for traditional rigid receptor docking methods, it even surpasses 5 Å in subsets like CDK2, EGFR, and ApoRef. However, when physical plausibility is taken into account, the PB-success rate for TankBind-pocket, EDM-Dock, and KarmaDock Align drops to levels similar to traditional rigid receptor docking methods (below 10%). Notably, in the Ensemble-FXA subset, methods such as Vina, LinF9, Smina, Gnina, and Glide perform significantly better, achieving PB-success rates of 34.4%, 36.5%, 36.2%, 39.4%, and 27.1%, respectively. Traditional flexible methods such as VinaFlex and rDock, developed for cross-dock scenarios, were also evaluated. VinaFlex shows poorer performance than rigid receptor docking methods in both L-RMSD distribution and PB-success rate, due to its reliance on predefined side chains and limitations on the number of flexible side chains. The flexible method rDock, capable of optimizing side chains conducive to hydrogen bonding, outperforms all traditional rigid receptor docking methods in L-RMSD distribution and has higher PB-success rate than TankBind-pocket, EDM-Dock, and KarmaDock Align. We observed that both traditional rigid receptor and flexible docking methods perform better in subsets like Ensemble-FXA, DUDE27-HoloEns and CASF2016, where pocket backbone conformational changes are minimal (mostly between 0–0.5 Å, as is depicted from ESI Fig. S1†). Our method DiffBindFR, leveraging a full-atom based neural network to learn additional side chain movements, marginally outperforms all other methods in accurately recalling ligand coordinates and ensuring the validity of complex poses. On the CD test set, the MDN network surpasses the Smina scoring function for pose ranking. DiffBindFR-MDN achieves state-of-the-art PB-success rate of 67.4%, 47.8%, 79.4%, 47.6%, 36.2% and 63.6% in CDK2, EGFR, FXA, ApoRef, DUDE27-HoloEns and CASF2016, respectively. Furthermore, to investigate the efficacy of various methods in the context of cross-docking scenarios involving Apo–Holo pairs, we conduct a detailed computational assessment of these methods using a subset of 660 Apo–Holo pairs from the CASF2016 dataset. In this analysis, DiffBindFR-MDN emerges as the most effective technique, demonstrating superior performance in both terms of L-RMSD distribution and PB-success rate (56.1%, as is shown from ESI Table S7†). When the Holo structures for targets of interest are available, experts prefer to use these structures for virtual screening and lead optimization through molecular docking. Thus, we have constructed the DUDE27-HoloEns test set, entirely made up of Holo–Holo pairs, to evaluate the application potential of DiffBindFR in scenarios where target proteins possess resolved Holo structures. The results show that DiffBindFR-MDN performs the best on the DUDE27-HoloEns subset, achieving a PB-success rate of 36.2% and a median L-RMSD of 2.08 Å (ESI Table S6†), with DiffBind-Smina following closely (PB-success rate of 29.5%). We find that traditional docking programs demonstrate enhanced proficiency in docking with Holo-state structures and perform better on the DUDE27-HoloEns subset and CASF2016 subset than other Apo–Holo cross-dock sets (ESI Tables S6–S9†), which could be attributed to subtle variations in conformational heterogeneity, favoring the binding of various bioactive molecules (see ESI Fig. S1 and Table S2†). Additionally, in extreme scenarios where the target protein lacks resolved Holo structures or even available crystal Apo structures, drug developers must rely on AlphaFold2 to obtain a predicted target structure as the starting point for drug design. On the GPCR-AF2 subset, serving as approximations of Apo–Holo cross-docking where the target proteins are predicted by AlphaFold2, DiffBindFR-MDN shows a slightly lower PB-success rate of 21.7% compared to DiffBindFR-Smina (22.7%). This decrease in performance is attributed to higher penalties from ligand–protein clashes.
| Method | PB-success rate | |||||||
|---|---|---|---|---|---|---|---|---|
| Ensemble-CDK2 | Ensemble-EGFR | Ensemble-FXA | ApoRef | CASF2016 | DUDE27-HoloEns | GPCR-AF2 | ||
| a Best performance in bold for the highest PB-success rate. | ||||||||
| Traditional rigid receptor docking methods | Vina | 0.079 | 0.060 | 0.344 | 0.082 | 0.294 | 0.160 | 0.136 |
| LinF9 | 0.061 | 0.119 | 0.365 | 0.089 | 0.272 | 0.175 | 0.076 | |
| Smina | 0.079 | 0.090 | 0.362 | 0.077 | 0.304 | 0.157 | 0.152 | |
| Gnina | 0.099 | 0.090 | 0.394 | 0.082 | 0.320 | 0.179 | 0.106 | |
| Glide | 0.154 | 0.090 | 0.271 | 0.091 | 0.219 | 0.172 | 0.182 | |
| Traditional flexible docking methods | VinaFlex | 0.013 | 0.000 | 0.005 | 0.015 | 0.023 | 0.034 | 0.045 |
| rDock | 0.257 | 0.134 | 0.440 | 0.157 | 0.296 | 0.220 | 0.212 | |
| Deep learning-based docking methods | TankBind-pocket | 0.100 | 0.015 | 0.110 | 0.040 | 0.123 | 0.049 | 0.015 |
| EDM-Dock | 0.051 | 0.015 | 0.009 | 0.011 | 0.064 | 0.022 | 0.00 | |
| KarmaDock Align | 0.135 | 0.045 | 0.009 | 0.047 | 0.136 | 0.034 | 0.015 | |
| DiffBindFR-Smina | 0.564 | 0.403 | 0.789 | 0.434 | 0.566 | 0.295 | 0.227 | |
| DiffBindFR-MDN | 0.674 | 0.478 | 0.794 | 0.476 | 0.636 | 0.362 | 0.212 | |
Here, we present the docking poses of various methods on two examples from the GPCR-AF2 subset (Fig. 6). The first example involves a crystal structure with the PDB ID 6RZ6, where we investigate the target protein from 6RZ6 identified as the human Cysteinyl leukotriene receptor 2.83 This receptor plays a role in regulating pro-inflammatory responses associated with allergic disorders. The ligand in this case is an antagonist, ONO-257036. We predict the AF2 structure (Apo-like) of the receptor protein and then dock the ligand molecule into this AF2 structure. Following binding sites alignment, the predicted AF2 structure exhibited a binding site (within 5 Å cutoff away from crystal ligand) RMSD of 0.73 Å when compared with the crystal structure. However, in the Apo-like AF2 predicted structure, residues R267 and L190 is found to block ONO-25703 binding to pocket, preventing traditional rigid receptor docking methods like Glide and Vina from locating the correct binding site. VinaFlex, despite its ability to leverage side chain flexibility, also fails to dock the ligand correctly. KarmaDock Align, similarly did not achieve correct docking, although it shows better L-RMSD than the methods mentioned above. In contrast, our flexible docking method, DiffBindFR, coupled with Smina and MDN confidence model can select the top-1 complex pose from the 40 poses. DiffBindFR adeptly repacks the side chain of the R267 residue, enabling the ligand to successfully dock into the correct binding position. The position of the R267 side chain predicted by DiffBindFR is somewhat different. In the crystal structure, the R267 side chain forms a hydrogen bond interaction with the Y123 side chain, whereas in the DiffBindFR-predicted structure, R267 forms a hydrogen bond interaction with the Y119 side chain, which can be attributed to the similarity in the spatial positions of the Y123 residue and the Y119 residue relative to R267.
![]() | ||
| Fig. 6 The binding poses of two cases from the GPCR-AF2 subset in the CD test set. In all panels, Holo protein and ligand are shown in grey. AF2 modeled protein structure is shown in pale green. DiffBindFR sampled ligand and pocket side chains are shown in blue. Note that DiffBindFR calculations were done using the AF2 modeled backbone structures. For each frame, the docking method used is given at the bottom left. The pocket Cα RMSD between Holo and AF2 structure within 5 Å cutoff away from crystal ligand, and ligand RMSD are reported on left top, except the two right top frames in each panel for the comparison between AF2 modeled and DiffBindFR sampled side chain conformations. (A) Human cysteinyl leukotriene receptor 2 bound to its antagonist ONO-2570366 (PDB ID: 6RZ6). The side chains of Y119, Y123, L190 and R267 are shown; (B) 5-hydroxytryptamine receptor 2A bound to its inverse agonist methiothepin (PDB ID: 6WH4). The side chain of D155 is shown. | ||
The second example involves a crystal structure with the PDB ID 6WH4, where the target protein is the human 5-HT2A serotonin receptor, which is associated with the actions mediated by psychedelics,84 and the ligand is methiothepin. Following alignment of the binding sites, the AF2-predicted structure displays a pocket RMSD of 0.65 Å when compared to the crystal structure. In the AF2 structure, the side chain of the D155 residue has vdW (van der Waals) clash (pair distance is 1.5 Å) with the ligand, leading to the failure of docking attempts by Glide and Vina. Although KarmaDock Align and the traditional flexible docking method rDock predicts the ligand position with a smaller L-RMSD, they still do not achieve successful docking. In contrast, the top-1 complex pose selected by the MDN confidence model in DiffBindFR not only accurately reproduces the ligand position but also successfully repacks the side chain of D155 residue, enabling it to form electrostatic interactions with the ligand. This example further demonstrates the capability of DiffBindFR to effectively manage protein–ligand interactions, particularly in challenging docking scenarios.
We also present four cases from ApoRef subset where DiffBindFR successfully docks ligands into the Apo pockets of crystal structures (ESI Fig. S11†), with these complexes not having appeared in the training set. The PDB IDs for these four cross-dock examples are as follows: Holo: 1ZGY, Apo: 1PRG; Holo: 2XIR, Apo: 1VR2; Holo: 3UVR, Apo: 1WFC; and Holo: 3RM6, Apo: 4EK3. In each of these instances, top-1 complex poses generated by DiffBindFR accurately recover the ligand poses, while side chains in the Apo state pockets that would otherwise clash with the ligand are also optimized. These results highlight the potential of DiffBindFR in aiding researchers to study detailed interactions in real scenarios when no complex structures are available and provide insights for further lead optimization.
The superior performance of DiffBindFR on different Apo–Holo, Holo–Holo and AF2 predicted-Holo cross-docking subsets have shown that the potential of DiffBindFR under the real circumstances for application.
On the cross-docking benchmark, CD test set, DiffBindFR has also demonstrated superior performance. Notably, previous methods that employ deep learning to characterize protein pockets through coarse-grained main-chain representations also show promise results, but they lead to a lack of detailed information regarding the interactions between side chain atoms and ligands. DiffBindFR that simulates side chains addresses this gap in deep learning methodologies. Previous research57–59 has indicated that the majority of proteins undergo minimal backbone alterations upon ligand binding, with the primary conformational changes caused by side chains. Therefore, DiffBindFR remains adept at predicting accurate docking poses in most scenarios with slight protein backbone movements. However, in cases where the pocket backbone exhibits significant flexibility (such as cryptic sites85), the docking results may not be as satisfactory. Recent benchmarking studies25–27,30,86,87 have revealed that structures modeled by AF2 tend to exhibit a pocket backbone conformation more akin to the Holo state, presumably owing to the conservative functional sites within the multiple sequence alignment (MSA). However, inaccuracies in side chain placement often lead to suboptimal virtual screening performance when compared with the Holo pocket. In this context, DiffBindFR emerges as a promising tool for refining the side chains in AF2 modeled structures, potentially enhancing enrichment in virtual screening campaigns in the absence of available Holo structures. Further investigation of this application will be in future research endeavors.
The physical validity of DiffBindFR generated complex poses, coupled with the simulated detailed three-dimensional interactions, provides users with correct interactions to facilitate further optimization. In addition, the conformation alterations predicted by DiffBindFR will significantly augment comprehension of the molecular mechanisms underlying specific actions, and better elucidate the relationship between structure and function.
, where each atom corresponds to a node. For the node representation
of pocket residue atoms, we employ one-hot encoding encompassing atom type, residue type, and whether the atom is part of the backbone. The ligand node features
include atom type, hybridization type, atomic connectivity, explicit valence, implicit valence, number of rings it belonging to, aromaticity, formal charge, partial charge, chirality type, the number of radical electrons, the number of hydrogens, and whether it is in an N-membered ring (with nitrogen ranging from 3 to 8). Furthermore, pharmacophore features such as hydrogen bond acceptor/donor, aromaticity, hydrophobicity, and positive/negative charge are integrated. The edges eij, based on the covalent bonds of ligand, incorporate features like bond type, stereochemistry, conjugation, and whether the bond is part of a ring system. Diffusion times t are encoded using a sinusoidal format and are concatenated to the scalar representations of nodes and edges. For ligand atoms, internal edges
connected by covalent bonds are pre-constructed. For pocket atoms, in addition to covalent bonds, edges
are linked between pocket atoms and their own Cα and Cβ atoms. During the forward inference of the model, edges are dynamically constructed based on the three-dimensional coordinates of all atoms. Within the ligand molecule, the graph construction uses a cutoff radius of 5 Å, and a similar cutoff is applied for the full atom graph of the pocket and directed edges from pocket to ligand atoms
. These edges, serving as non-covalent interactions, solely encode distances. Given the model's need to predict ligand translational updates, it's essential for the ligand to be aware of the entire pocket atom's position. Therefore, directed edges
from ligand to pocket atoms are dynamically constructed based on the diffusion process, with the translational noise determining the cutoff radius as 0.2σT + 5 Å. This ensures that even in high noise scenarios, where the ligand is distant from the pocket, there remains an informational interaction between the ligand and the pocket, thereby pulling the ligand closer. All edges from the heterogeneous graph are
, and their distance features utilize Gaussian radial basis for encoding.
and pocket residue atom nodes
. We harness the irreducible representations (Irreps) to encode features by spherical harmonics. As the depth of feature encoding advances, the scalar inputs evolve towards higher-order physical quantity representations. Every interaction module is constructed using the Tensor Product Layer (TPL), establishing SE(3) equivariant message-passing functions. The tensor products are realized by encoding edge vectors with spherical harmonic functions and then doing spherical tensor product of Irreps with path weights. The weights of these tensor products are derived from a transformation of node representations constituting the edge and the edge representation itself through a layer of Multi-Layer Perceptrons (MLP); these weights also constitute the primary learnable parameters at each layer. For any given submodule, the general formula for message passing to node a is:![]() | (12) |
represents Irreps of node a, which is the concatenation of scalar representation h0a and vector representation
. za is the node type of node a, and z can be any node type from the pocket node or the ligand node. Na(z) denotes the neighbour nodes of node a. Y are the spherical harmonics up to l = 2. LN is the equivariant layer normalization. ⊕ refers to normal vector addition, and ⊗ψab refers to the spherical tensor product of Irreps with path weights, with ψab = MLP(za,z)(eab,h0a,h0b) following the graph message passing paradigm.
For predicting the scores of ligand translation and rotation, we construct a node o for each ligand center and gather the message from other ligand nodes to the center. We output the final single odd and single even vectors through layer normalization for translational and rotational score prediction as follows:
![]() | (13) |
.
For both the rotatable bonds of ligands and the dihedral angles of protein residue side chains, updates for each angle are anticipated based on a consistent paradigm. We define the central axis of the rotatable bond or dihedral angle as B = (i, j), represented by a bond formed by atoms i and j. Further, the center of the rotatable bond is denoted as c. A radius graph of ligand nodes with a 4 Å cutoff is constructed to predict the torsion score of the ligand rotatable bonds.
![]() | (14) |
To satisfy the parity of dihedral angles, spherical harmonics Y2 up to l = 2 is utilized. We will employ the scalar features derived from the final obtained hc to predict the torsion angles. An analogous procedure is adopted for the torsion of pocket side chains.
Footnotes |
| † Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3sc06803j |
| ‡ These authors contributed equally to this work. |
| This journal is © The Royal Society of Chemistry 2024 |