Predicting reaction conditions: a data-driven perspective

Abstract

The selection of optimal reaction conditions is a critical challenge in synthetic chemistry, influencing the efficiency, sustainability, and scalability of chemical processes. While machine learning (ML) has emerged as a promising tool for predicting reaction conditions in computer-aided synthesis planning (CASP), existing approaches face significant challenges, including data quality, data sparsity, the choice of reaction representation, and method evaluation. Recent studies have suggested that these models may fail to surpass literature-derived popularity baselines, underscoring these problems. In this work, we provide a critical review of state-of-the-art ML techniques, identifying innovations that have addressed the key challenges researchers face when modelling reaction conditions. To illustrate how relevant reaction representations can improve existing models, we perform a case study of heteroaromatic Suzuki–Miyaura reactions derived from US patent (USPTO) data. Using Condensed Graph of Reaction-based inputs, we demonstrate how this alternative representation can enhance the predictive power of a model beyond popularity baselines. Finally, we propose future directions for the field beyond improving data quality, suggesting potential options to mitigate the data issues prevalent in existing literature data. This perspective aims to guide researchers in understanding and overcoming current limitations in computational reaction condition prediction.


Article information

Article type: Perspective
Submitted: 26 Apr 2025
Accepted: 30 Jul 2025
First published: 06 Aug 2025
This article is Open Access

All publication charges for this article have been paid for by the Royal Society of Chemistry
Creative Commons BY-NC license

Chem. Sci., 2025, Advance Article

M. Ball, D. Horvath, T. Kogej, M. Kabeshov and A. Varnek, Predicting reaction conditions: a data-driven perspective, Chem. Sci., 2025, Advance Article, DOI: 10.1039/D5SC03045E

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.

