ProcedureT5: Adaptive Experimental Procedure Prediction with Data-Augmented Pre-Training and Multi-Source Data Integration

Abstract

Computer-aided synthesis planning (CASP) has shown strong potential to accelerate chemical research. However, a key challenge remains: the lack of effective automated techniques to translate computer-generated synthesis routes into executable experimental procedures, which still require extensive planning and evaluation by chemists. To address this gap, we introduce ProcedureT5, an approach that integrates chemistry-oriented pre-trained models with augmented multi-source datasets to enhance the prediction of experimental procedures across broader scenarios. Our method achieves state-of-the-art performance on the Pistachio dataset (a collection of reaction procedures derived from US patent literature), showing a 4-point increase in BLEU score and a 1.22% improvement in exact-match accuracy compared to existing methods. Additionally, we curate a small expert-annotated dataset, Orgsyn, consisting of verified organic synthesis procedures, to assess the model’s performance in more diverse applications. Fine-tuning ProcedureT5 on the Orgsyn dataset demonstrates its adaptability, yielding a BLEU score of 40.34 and an average similarity of 49.72%. This work underscores the crucial role of ProcedureT5 in bridging the gap between computational synthesis planning and practical laboratory implementation.
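
As a rough illustration of how sequence-level metrics such as those reported above can be computed, the following minimal sketch scores predicted procedures against references. It is not the authors' evaluation code: the abstract does not define the similarity measure, so the sketch assumes a token-level `difflib` ratio as a stand-in, and the function names and the action strings (`ADD`, `STIR`, `FILTER`, `DRY`) are hypothetical.

```python
from difflib import SequenceMatcher


def _normalize(text):
    """Collapse runs of whitespace so formatting differences are not counted as errors."""
    return " ".join(text.split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions identical to their reference after whitespace normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(_normalize(p) == _normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


def average_similarity(predictions, references):
    """Mean token-level similarity in [0, 1].

    Assumption: uses difflib's ratio (2 * matches / total tokens) as a stand-in;
    the paper's own similarity definition may differ.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    ratios = [
        SequenceMatcher(None, p.split(), r.split()).ratio()
        for p, r in zip(predictions, references)
    ]
    return sum(ratios) / len(ratios)


# Hypothetical action sequences, for illustration only.
predictions = ["ADD water ; STIR 1 h", "FILTER"]
references = ["ADD water ; STIR 1 h", "FILTER ; DRY"]

print(exact_match_accuracy(predictions, references))
print(average_similarity(predictions, references))
```

Exact match rewards only fully correct procedures, whereas a similarity ratio gives partial credit for sequences that share most steps, which is why the two figures are usually reported side by side.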

Article information

Article type: Paper
Submitted: 27 Dec 2025
Accepted: 07 Apr 2026
First published: 13 Apr 2026
Open Access: Creative Commons BY license

React. Chem. Eng., 2026, Accepted Manuscript

Y. Zhang, Y. Fang, H. Zhou, B. Yu, T. F. Fung, Q. Liu, C. Len and H. Gao, React. Chem. Eng., 2026, Accepted Manuscript, DOI: 10.1039/D5RE00572H

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
