Revealing the limits of covalent docking and advancing affinity prediction with covalent-aware multi-task learning
Abstract
Targeted covalent inhibitors (TCIs) have become an important modality in modern drug discovery, but computational tools for covalent pose prediction and quantitative affinity ranking remain underdeveloped. We constructed a large, structure- and activity-resolved benchmark to systematically evaluate covalent docking and to develop a covalent-aware drug-target affinity (DTA) prediction framework. Starting from CovalentInDB 2.0 and related structural resources, we curated 2172 high-quality covalent protein–ligand complexes spanning diverse protein classes and nine electrophilic warhead types, and used them to benchmark four docking engines (AutoDock4, CovDock in the Schrödinger Suite, GNINA and Boltz-2) in a self-docking setting. Boltz-2 shows the strongest pose-reproduction performance on our structure-resolved benchmark. However, because co-folding engines are trained on broad PDB corpora and our benchmark is also derived from PDB-resolved complexes, train-test overlap is likely; Boltz-2 results are therefore reported as a reference upper bound rather than a leakage-free estimate of prospective generalization. Across 17 covalent targets with quantitative IC50 data, we further assessed the relationship between docking scores and experimental pIC50 values and found that score-affinity correlations are generally weak and highly target dependent, with |r| < 0.2 for most target-software pairs and pronounced negative correlations for several systems. To overcome these limitations, we propose CovMTL-DTA, a covalent-aware multi-task DTA model that integrates ligand molecular graphs augmented with SMARTS-based warhead descriptors, pretrained protein sequence embeddings, cross-modal ligand–protein attention, and a task-relation module for inter-target transfer.
Trained on curated covalent ligand-target pairs, the model outperforms classical machine-learning regressors and state-of-the-art deep DTA baselines, achieving a Pearson correlation of ∼0.77 together with lower RMSE and MAE on an independent test set. In an EGFR-focused virtual screening of ∼14 000 Michael-acceptor-containing compounds, the model ranks three clinically relevant EGFR covalent inhibitors within the top 1% of the library and identifies structurally novel hits with favorable physicochemical properties. Our benchmark and model highlight both the strengths and limitations of current covalent docking and demonstrate how covalent-specific representations and multi-task learning can substantially improve affinity prediction and hit prioritization in covalent drug discovery.
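
The evaluation quantities named in this abstract (per-target Pearson r between docking scores and pIC50, RMSE and MAE of predicted affinities, and recovery of known actives in the top 1% of a ranked library) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the record layout `(target, docking_score, pic50)`, and the `fraction` parameter are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def per_target_correlation(records):
    """Group (target, docking_score, pic50) records and return r per target.

    Assumed layout (hypothetical). The sign of r depends on the scoring
    convention: many engines report more negative scores for better poses.
    """
    grouped = defaultdict(lambda: ([], []))
    for target, score, pic50 in records:
        grouped[target][0].append(score)
        grouped[target][1].append(pic50)
    return {t: pearson_r(s, p) for t, (s, p) in grouped.items()}


def top_fraction_hits(predicted, known_actives, fraction=0.01):
    """Return the known actives ranked within the top `fraction` of a library.

    `predicted` maps compound id -> predicted pIC50 (higher is better).
    """
    ranked = sorted(predicted, key=predicted.get, reverse=True)
    cutoff = max(1, math.ceil(len(ranked) * fraction))
    return sorted(set(ranked[:cutoff]) & set(known_actives))
```

Computing r separately for each target, rather than pooling all targets, matches the target-software-pair framing used above and avoids correlations that arise only from between-target differences in score scale.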
