Quantifying the failure modes of current one-step retrosynthesis models

Suong B. A. Tran; Jihye Roh; Connor W. Coley

doi:10.1039/D6SC01323F

Suong B. A. Tran,†^a Jihye Roh†^b and Connor W. Coley*^bc

Author affiliations

* Corresponding authors

^a Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA

^b Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
E-mail: ccoley@mit.edu

^c Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA

Abstract

Computer-aided synthesis planning (CASP) automates retrosynthetic analysis, generally by recursively applying one-step retrosynthesis models within multistep search algorithms to simplify a target molecule into commercially available starting materials. Despite their utility, these tools often fail to recover literature-reported pathways. Such failures arise from two causes: either (i) the literature-reported precursor is not proposed at all or (ii) it is proposed but ranked too low to be discovered during a multistep search. In this work, we quantify the challenges that data-driven one-step retrosynthesis models face in reproducing literature-reported precursors. We first evaluate model performance using standard top-k exact-match accuracy and stratify this accuracy by product and reaction complexity, demonstrating a decrease in performance with increasing complexity. This decline is accompanied by a systematic underprediction of the number of reacting atoms and changing rings, indicating a bias toward simpler transformations, even when complex examples are included in the training data. To gain deeper insights into failure modes, we evaluate models with complementary metrics that account for incorrect stereochemistry, leaving groups, and multi-stage reactions. Overall, our work provides a quantitative analysis of how one-step retrosynthesis models fail to capture literature-reported reactions, highlighting opportunities for improving future models and providing guidance on using model predictions more effectively in prospective synthesis planning.
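As an illustration of the evaluation described in the abstract, the following is a minimal sketch of top-k exact-match accuracy stratified by a complexity bin. It is not the authors' code: the function names, the record layout, and the bin labels are hypothetical, and the SMILES strings are assumed to be canonicalized beforehand (for example with a cheminformatics toolkit), so that exact match reduces to string comparison after sorting the precursor fragments.

```python
from collections import defaultdict


def normalize(precursors):
    """Sort the '.'-separated fragments so reactant order does not matter.

    Assumes each fragment is already a canonical SMILES string.
    """
    return tuple(sorted(precursors.split(".")))


def top_k_hit(ranked_predictions, reference, k):
    """True if the reference precursor set is among the first k predictions."""
    target = normalize(reference)
    return any(normalize(p) == target for p in ranked_predictions[:k])


def stratified_top_k_accuracy(records, k):
    """Compute top-k exact-match accuracy per complexity bin.

    records: iterable of (bin_label, ranked_predictions, reference), where
    bin_label is a precomputed (hypothetical) complexity stratum.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for bin_label, predictions, reference in records:
        totals[bin_label] += 1
        hits[bin_label] += int(top_k_hit(predictions, reference, k))
    return {label: hits[label] / totals[label] for label in totals}


# Toy records for illustration only; these are not data from the paper.
records = [
    ("low", ["CCO.CC(=O)O", "CCOC(C)=O"], "CC(=O)O.CCO"),
    ("low", ["CCN", "CCO"], "CCO"),
    ("high", ["C1CCCCC1", "c1ccccc1"], "C1=CCCCC1"),
]

top1 = stratified_top_k_accuracy(records, k=1)
top2 = stratified_top_k_accuracy(records, k=2)
```

In this toy example the second "low" record is only recovered at rank 2, which mirrors failure cause (ii) in the abstract: the reference precursor is proposed but ranked below the top suggestion.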

Chemical Science
