Machine learning meets mechanistic modelling for accurate prediction of experimental activation energies

Kjell Jorner; Tore Brinck; Per-Ola Norrby; David Buttar

doi:10.1039/D0SC04896H

Kjell Jorner,^a Tore Brinck,^b Per-Ola Norrby^c and David Buttar*^a

Author affiliations

* Corresponding authors

^a Early Chemical Development, Pharmaceutical Sciences, R&D, AstraZeneca, Macclesfield, UK
E-mail: david.buttar@astrazeneca.com

^b Applied Physical Chemistry, Department of Chemistry, CBH, KTH Royal Institute of Technology, Stockholm, Sweden

^c Data Science & Modelling, Pharmaceutical Sciences, R&D, AstraZeneca, Gothenburg, Sweden

Abstract

Accurate prediction of chemical reactions in solution is challenging for current state-of-the-art approaches based on transition state modelling with density functional theory. Models based on machine learning have emerged as a promising alternative to address these problems, but these models currently lack the precision to give crucial information on the magnitude of barrier heights, influence of solvents and catalysts and extent of regio- and chemoselectivity. Here, we construct hybrid models which combine the traditional transition state modelling and machine learning to accurately predict reaction barriers. We train a Gaussian Process Regression model to reproduce high-quality experimental kinetic data for the nucleophilic aromatic substitution reaction and use it to predict barriers with a mean absolute error of 0.77 kcal mol⁻¹ for an external test set. The model was further validated on regio- and chemoselectivity prediction on patent reaction data and achieved a competitive top-1 accuracy of 86%, despite not being trained explicitly for this task. Importantly, the model gives error bars for its predictions that can be used for risk assessment by the end user. Hybrid models emerge as the preferred alternative for accurate reaction prediction in the very common low-data situation where only 100–150 rate constants are available for a reaction class. With recent advances in deep learning for quickly predicting barriers and transition state geometries from density functional theory, we envision that hybrid models will soon become a standard alternative to complement current machine learning approaches based on ground-state physical organic descriptors or structural information such as molecular graphs or fingerprints.
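To put the reported mean absolute error in kinetic terms, the sketch below converts between rate constants and activation free energies and shows how a barrier error translates into a multiplicative error in the rate. It assumes the Eyring equation with a transmission coefficient of 1 and a temperature of 298.15 K; both are illustrative assumptions, not details taken from the abstract, and the function names are hypothetical.

```python
import math

# Physical constants (exact SI values; R converted to kcal mol^-1 K^-1)
R_KCAL = 1.98720425864083e-3   # gas constant, kcal mol^-1 K^-1
K_B = 1.380649e-23             # Boltzmann constant, J K^-1
H = 6.62607015e-34             # Planck constant, J s


def barrier_from_rate(k, temperature=298.15):
    """Activation free energy (kcal/mol) from a rate constant k.

    Assumes the Eyring equation k = (k_B T / h) * exp(-dG / (R T))
    with transmission coefficient 1 (an assumption of this sketch).
    """
    return -R_KCAL * temperature * math.log(k * H / (K_B * temperature))


def rate_from_barrier(dg, temperature=298.15):
    """Inverse of barrier_from_rate: rate constant from a barrier in kcal/mol."""
    return (K_B * temperature / H) * math.exp(-dg / (R_KCAL * temperature))


def rate_error_factor(barrier_error, temperature=298.15):
    """Multiplicative error in the rate implied by a barrier error (kcal/mol)."""
    return math.exp(barrier_error / (R_KCAL * temperature))


def mean_absolute_error(predicted, observed):
    """Mean absolute error between two equal-length sequences of barriers."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


if __name__ == "__main__":
    # The MAE quoted in the abstract, expressed as a factor in the rate constant
    print(f"rate error factor at 0.77 kcal/mol: {rate_error_factor(0.77):.2f}")
```

Under these assumptions, an error of 0.77 kcal mol⁻¹ in the barrier corresponds to an error of roughly a factor of 3.7 in the rate constant at room temperature.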

This article is part of the themed collections: Celebrating five years of ChemRxiv, Most popular 2021 physical and theoretical chemistry articles, 2021 and Editor’s Choice – Graeme Day

Chemical Science
