Machine learning models for catalytic asymmetric reactions of simple alkenes: from enantioselectivity predictions to chemical insights

Abstract

The growing number of applications of machine learning (ML) in chemical catalysis has built considerable confidence in predicting reaction outcomes. Although ML has been applied successfully to high-throughput experimentation (HTE) datasets, extending it to the small real-world datasets prevalent in organic synthesis remains more difficult, primarily because of their imbalanced and sparse distribution. Herein, we present a new chemical reaction dataset, curated from the published literature, that exhibits class imbalance (CI) with a skewness of −1.37. The reactions in focus belong to an important class of transition metal-catalysed asymmetric transformations of alkenes, such as cyclopropanation, aziridination, and arylation. Such reactions are indispensable for the construction of three-membered structural motifs, which are versatile building blocks found in complex bioactive molecules. Mindful of the CI in the reaction outcome, measured in terms of enantiomeric excess (% ee), we employ the AttentiveFP-CI model to predict % ee. This class-imbalance-aware, graph-based model with an attention mechanism performs well, with a root mean square error (RMSE) of 9.80 ± 1.40. Across various molecular representations of these reactions (one-hot encoding (OHE), fingerprints, SMILES, and graphs) and ML algorithms (DNN, T5Chem, Transformer, and MPNN), AttentiveFP-CI emerged as the best model, distinguished by its minimal overfitting (train-test RMSE difference of 3.59, compared to up to 5.40 for other CI-aware models). When extended to other important reaction datasets such as N,S-acetylation, asymmetric hydrogenation of alkenes, and USPTO, AttentiveFP-CI gave improved predictions. Furthermore, attention visualization identifies key atoms and substructures contributing to high enantioselectivity, offering valuable chemical insights for planning the synthesis of new molecular targets. Harnessing insights derived from ML models could serve as an efficient and cost-effective approach for expedited developments in asymmetric catalysis.
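The two summary statistics quoted in the abstract, the skewness of the % ee distribution and the RMSE of the predictions, can be illustrated with a minimal sketch. It assumes the population moment coefficient of skewness; the abstract does not say which estimator the authors used. The function names and the % ee values below are hypothetical and are not taken from the paper's dataset.

```python
import math


def skewness(values):
    """Population moment coefficient of skewness, m3 / m2**1.5.

    Assumption: the paper may use a different (e.g. bias-corrected) estimator.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical % ee values: most reactions are highly selective, a few are not.
ee_observed = [99, 98, 97, 96, 95, 94, 92, 90, 85, 60, 30]
# Hypothetical model predictions for the same reactions.
ee_predicted = [97, 96, 98, 93, 94, 95, 90, 88, 80, 70, 45]

print(f"skewness of % ee: {skewness(ee_observed):.2f}")
print(f"RMSE of predictions: {rmse(ee_observed, ee_predicted):.2f}")
```

A negative skewness indicates a long tail towards low % ee, that is, a dataset dominated by highly enantioselective reactions, which is the kind of imbalance the abstract describes.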

Article information

Article type: Paper
Submitted: 04 Nov 2025
Accepted: 26 Feb 2026
First published: 07 Mar 2026
This article is Open Access
Creative Commons BY license

Digital Discovery, 2026, Advance Article

A. Hoque, N. Jain, D. Chenna and R. B. Sunoj, Digital Discovery, 2026, Advance Article, DOI: 10.1039/D5DD00483G

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
