Data augmentation in a triple transformer loop retrosynthesis model

Yves Grandjean; David Kreutter; Jean-Louis Reymond

doi:10.1039/D5DD00465A

Yves Grandjean,^a David Kreutter^a and Jean-Louis Reymond*^a

Author affiliations

* Corresponding authors

^a Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
E-mail: jean-louis.reymond@unibe.ch

Abstract

Reactions in the US Patent and Trademark Office (USPTO) dataset are biased towards a few over-represented reaction types, which potentially limits their usefulness for computer-assisted synthesis planning (CASP). To obtain an equilibrated dataset, we applied retrosynthesis templates to USPTO molecules as products (P) to generate starting materials (SM). We then used transformer T2 from our recently reported triple transformer loop (TTL) retrosynthesis model to predict reagents (R) for the SM → P reaction. Finally, we validated each reaction by requiring that TTL transformer T3 predict P from SM + R with high confidence (>95%). We generated up to 5000 reactions per template, resulting in 27.5 million validated fictive reactions covering the chemical space of the original USPTO dataset. To exemplify the use of this dataset, we demonstrate that a single-step retrosynthesis transformer model trained on a template-equilibrated subset of 1 097 374 fictive reactions outperforms the corresponding model trained on USPTO reactions only.
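The generate-then-validate loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function and parameter names (`generate_validated_reactions`, `predict_reagents`, `predict_product`) are hypothetical, the T2 and T3 transformers are represented by plain callables, and the toy stand-ins at the end exist only to make the example runnable. It also assumes that validation means T3 must return exactly P with confidence above the threshold, and that the cap of 5000 applies to validated reactions per template.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, NamedTuple, Optional, Tuple


class Reaction(NamedTuple):
    template_id: str
    starting_materials: str  # SM, e.g. a SMILES string
    reagents: str            # R
    product: str             # P


def generate_validated_reactions(
    products: Iterable[str],
    templates: Dict[str, Callable[[str], Optional[str]]],
    predict_reagents: Callable[[str, str], str],             # stands in for T2
    predict_product: Callable[[str, str], Tuple[str, float]],  # stands in for T3
    min_confidence: float = 0.95,
    max_per_template: int = 5000,
) -> List[Reaction]:
    """Apply retrosynthesis templates to products and keep validated reactions."""
    counts: Dict[str, int] = defaultdict(int)
    accepted: List[Reaction] = []
    for product in products:
        for template_id, apply_template in templates.items():
            # Assumed cap: stop collecting once a template has enough reactions.
            if counts[template_id] >= max_per_template:
                continue
            sm = apply_template(product)  # None if the template does not match
            if sm is None:
                continue
            reagents = predict_reagents(sm, product)
            predicted, confidence = predict_product(sm, reagents)
            # Assumed validation rule: forward prediction recovers P confidently.
            if predicted == product and confidence > min_confidence:
                accepted.append(Reaction(template_id, sm, reagents, product))
                counts[template_id] += 1
    return accepted


# Toy stand-ins (hypothetical, for illustration only).
toy_templates = {
    "ester_hydrolysis": lambda p: "CC(=O)OC" if p == "CC(=O)O" else None,
}


def toy_t2(sm: str, product: str) -> str:
    return "O"  # always proposes water as the reagent


def toy_t3(sm: str, reagents: str) -> Tuple[str, float]:
    return ("CC(=O)O", 0.99)  # always predicts acetic acid, confidently


reactions = generate_validated_reactions(
    ["CC(=O)O", "CCO"], toy_templates, toy_t2, toy_t3
)
```

In this toy run only the first product matches the single template, so one reaction is kept. Capping the count per template during generation is one simple way to limit over-represented reaction types; how the template-equilibrated training subset was actually drawn is not specified in the abstract.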

Article information

https://doi.org/10.1039/D5DD00465A

Article type: Paper

Submitted: 16 Oct 2025

Accepted: 21 Jan 2026

First published: 21 Jan 2026

This article is Open Access

Digital Discovery, 2026, 5, 653-661

Y. Grandjean, D. Kreutter and J. Reymond, Digital Discovery, 2026, 5, 653 DOI: 10.1039/D5DD00465A

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
