Kinetic predictions for SN2 reactions using the BERT architecture: Comparison and interpretation

Abstract

The accurate prediction of reaction rates is integral to elucidating reaction mechanisms and designing synthetic pathways. Traditionally, kinetic parameters have been derived from activation energies obtained from quantum mechanical (QM) methods and, more recently, from machine learning (ML) approaches. Among ML methods, Bidirectional Encoder Representations from Transformers (BERT), a transformer-based model, is the state-of-the-art method for both reaction classification and yield prediction. Despite this success, it has yet to be applied to kinetic prediction. In this work, we developed a BERT model to predict experimental log k values of bimolecular nucleophilic substitution (SN2) reactions and compared its performance to the top-performing Random Forest (RF) literature model in terms of accuracy, training time, and interpretability. Both the BERT and RF models exhibit near-experimental accuracy (RMSE ≈ 1.1 log k units) on similarity-split test data. Interpretation of the predictions from both models reveals that they successfully identify key reaction centres and reproduce known electronic and steric trends. This analysis also highlights the distinct limitations of each model: RF outperformed BERT in identifying aromatic allylic effects, while BERT showed stronger extrapolation capabilities.
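
To give a sense of scale for the quantities in the abstract, the following is a minimal Python sketch and not the authors' code. It assumes the standard Eyring relation from transition state theory (transmission coefficient of 1, 298.15 K) as the link between an activation free energy and log k, which the abstract does not specify. All function names are hypothetical. It also shows how an RMSE in log k units could be computed and translated into an equivalent free-energy error.

```python
import math

# Physical constants in SI units
K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # gas constant, J/(mol*K)


def eyring_log_k(delta_g_kj_mol, temperature=298.15):
    """log10(k) from an activation free energy in kJ/mol.

    Assumption: Eyring equation with a transmission coefficient of 1.
    """
    prefactor = K_B * temperature / H
    exponent = delta_g_kj_mol * 1000.0 / (R * temperature)
    return math.log10(prefactor) - exponent / math.log(10)


def log_k_error_to_energy(error_log_k, temperature=298.15):
    """Free-energy error (kJ/mol) equivalent to an error in log10(k)."""
    return error_log_k * R * temperature * math.log(10) / 1000.0


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Illustrative values only, not data from the article.
    print(round(eyring_log_k(80.0), 2))
    print(round(log_k_error_to_energy(1.1), 2))
    print(round(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), 3))
```

Under these assumptions, an error of 1.1 in log k corresponds to roughly 6 kJ/mol in activation free energy at room temperature, which helps when comparing data-driven log k models with QM-derived barriers.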

Article information

Article type: Paper
Submitted: 11 May 2025
Accepted: 23 Dec 2025
First published: 26 Dec 2025
This article is Open Access
Creative Commons BY license

Digital Discovery, 2025, Accepted Manuscript

C. Wilson, M. Calvo, S. Zavitsanou, J. D. Somper, E. Wieczorek, T. Watts, J. Crain and F. Duarte, Digital Discovery, 2025, Accepted Manuscript, DOI: 10.1039/D5DD00192G

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
