Revealing the limits of covalent docking and advancing affinity prediction with covalent-aware multi-task learning

Abstract

Targeted covalent inhibitors (TCIs) have become an important modality in modern drug discovery, but computational tools for covalent pose prediction and quantitative affinity ranking remain underdeveloped. We constructed a large, structure- and activity-resolved benchmark to systematically evaluate covalent docking and to develop a covalent-aware drug-target affinity (DTA) prediction framework. Starting from CovalentInDB 2.0 and related structural resources, we curated 2172 high-quality covalent protein–ligand complexes spanning diverse protein classes and nine electrophilic warhead types, and used them to benchmark four docking engines (AutoDock4, CovDock in the Schrödinger Suite, GNINA and Boltz-2) in a self-docking setting. Boltz-2 shows the strongest pose-reproduction performance on our structure-resolved benchmark. However, because co-folding engines are trained on broad PDB corpora and our benchmark is also derived from PDB-resolved complexes, train-test overlap is likely; Boltz-2 results are therefore reported as a reference upper bound rather than a leakage-free estimate of prospective generalization. Across 17 covalent targets with quantitative IC50 data, we further assessed the relationship between docking scores and experimental pIC50 values and found that score-affinity correlations are generally weak and highly target-dependent, with |r| < 0.2 for most target-software pairs and even pronounced negative correlations for several systems. To overcome these limitations, we propose CovMTL-DTA, a covalent-aware multi-task DTA model that integrates ligand molecular graphs augmented with SMARTS-based warhead descriptors, pretrained protein sequence embeddings, cross-modal ligand–protein attention, and a task-relation module for inter-target transfer.
Trained on curated covalent ligand-target pairs, the model outperforms classical machine-learning regressors and state-of-the-art deep DTA baselines, achieving a Pearson correlation of ∼0.77 with reduced RMSE and MAE on an independent test set. In an EGFR-focused virtual screening of ∼14 000 Michael-acceptor-containing compounds, the model ranks three clinically relevant EGFR covalent inhibitors within the top 1% of the library and identifies structurally novel hits with favorable physicochemical properties. Our benchmark and model highlight both the strengths and limitations of current covalent docking and demonstrate how covalent-specific representations and multi-task learning can substantially improve affinity prediction and hit prioritization in covalent drug discovery.
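
The evaluation quantities named above (Pearson correlation, RMSE, MAE and a top-fraction cut of a ranked library) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names are hypothetical, the pIC50 values are toy numbers rather than data from the paper, and the ranking assumes that a higher predicted value means a better compound.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def top_fraction(scores, fraction=0.01):
    """Indices of the best-scoring fraction of a library (assumes higher is better)."""
    k = max(1, math.ceil(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy values for illustration only (not taken from the paper).
experimental_pic50 = [5.0, 6.0, 7.0, 8.0]
predicted_pic50 = [5.5, 5.5, 7.5, 7.5]

r = pearson_r(experimental_pic50, predicted_pic50)
error_rmse = rmse(experimental_pic50, predicted_pic50)
error_mae = mae(experimental_pic50, predicted_pic50)
best = top_fraction(predicted_pic50, fraction=0.5)
```

The same `pearson_r` helper applies to the docking comparison, with docking scores in place of model predictions. Because many docking scores are energy-like (lower is better), the sign convention should be checked before interpreting the sign of r or reusing `top_fraction`.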


Article information

Article type: Paper
Submitted: 22 Dec 2025
Accepted: 27 Jan 2026
First published: 04 Feb 2026

Phys. Chem. Chem. Phys., 2026, Advance Article


J. Leng, Z. Huang, L. Zheng, Y. Yang, H. Huang, Y. Gao, L. Huo, Y. Li, Z. Sun and J. Z. H. Zhang, Phys. Chem. Chem. Phys., 2026, Advance Article, DOI: 10.1039/D5CP04981D
