Issue 5, 2025

Improving reaction prediction through chemically aware transfer learning

Abstract

Practical applications of machine learning (ML) to new chemical domains are often hindered by data scarcity. Here we show how data gaps can be circumvented by means of transfer learning that leverages chemically relevant pre-training data. Case studies are presented in which the outcomes of two classes of pericyclic reactions are predicted: [3,3] rearrangements (Cope and Claisen rearrangements) and [4 + 2] cycloadditions (Diels–Alder reactions). Using the graph-based generative algorithm NERF, we evaluate the data efficiencies achieved with different starting models that we pre-trained on datasets of different sizes and chemical scope. We show that the greatest data efficiency is obtained when the pre-training is performed on smaller datasets of mechanistically related reactions (Diels–Alder, Cope and Claisen, Ene, and Nazarov) rather than >50× larger datasets of mechanistically unrelated reactions (USPTO-MIT). These small bespoke datasets were more efficient in both low re-training and low pre-training regimes, and are thus recommended alternatives to large diverse datasets for pre-training ML models.
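The comparison described in the abstract (pre-train on one dataset, re-train on scarce target data, compare starting models) can be illustrated with a deliberately small toy sketch. This is not the authors' code and does not use NERF: a two-parameter linear model stands in for the network, and the datasets, task parameters, and function names (`train`, `mse`, `make_data`) are hypothetical choices made only for illustration. The "unrelated" source set here is 10 times larger than the "related" one, a much smaller ratio than in the study.

```python
# Toy sketch only: a two-parameter linear model stands in for the reaction-prediction
# network, and synthetic (x, y) pairs stand in for reaction datasets.

def train(weights, data, lr=0.1, epochs=50):
    """Gradient descent on squared error for y ~ w0 + w1 * x (factor of 2 absorbed into lr)."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        g0 = sum((w0 + w1 * x) - y for x, y in data) / n
        g1 = sum(((w0 + w1 * x) - y) * x for x, y in data) / n
        w0, w1 = w0 - lr * g0, w1 - lr * g1
    return (w0, w1)

def mse(weights, data):
    """Mean squared error of the linear model on a dataset."""
    w0, w1 = weights
    return sum(((w0 + w1 * x) - y) ** 2 for x, y in data) / len(data)

def make_data(intercept, slope, xs):
    """Noise-free synthetic data for a hypothetical task y = intercept + slope * x."""
    return [(x, intercept + slope * x) for x in xs]

# Hypothetical tasks: the target resembles the "related" source, not the "unrelated" one.
related_small = make_data(1.0, 2.0, [i / 4 for i in range(5)])        # 5 examples
unrelated_large = make_data(5.0, -3.0, [i / 50 for i in range(50)])   # 50 examples
target_train = make_data(0.9, 2.1, [0.0, 0.5, 1.0])                   # scarce target data
target_test = make_data(0.9, 2.1, [0.25, 0.75])

scratch = (0.0, 0.0)
starts = {
    "scratch": scratch,
    "related": train(scratch, related_small, epochs=2000),
    "unrelated": train(scratch, unrelated_large, epochs=2000),
}

# Re-train every starting model on the same few target examples with the same small budget.
errors = {
    name: mse(train(w, target_train, epochs=5), target_test)
    for name, w in starts.items()
}
for name, err in errors.items():
    print(f"{name:>9}: target test MSE = {err:.4f}")
```

In this toy setting the model pre-trained on the small related task starts close to the target solution, so a short re-training run leaves it with a lower target error than either the from-scratch model or the model pre-trained on the larger unrelated task. It says nothing quantitative about the reaction-prediction results themselves.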

Article information

Article type: Paper
Submitted: 31 Dec 2024
Accepted: 26 Mar 2025
First published: 28 Mar 2025
This article is Open Access
Creative Commons BY license

Digital Discovery, 2025, 4, 1232–1238

A. Keto, T. Guo, N. Gönnheimer, X. Zhang, E. H. Krenske and O. Wiest, Digital Discovery, 2025, 4, 1232, DOI: 10.1039/D4DD00412D

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
