Regio-selectivity prediction with a machine-learned reaction representation and on-the-fly quantum mechanical descriptors

Yanfei Guan; Connor W. Coley; Haoyang Wu; Duminda Ranasinghe; Esther Heid; Thomas J. Struble; Lagnajit Pattanaik; William H. Green; Klavs F. Jensen

doi:10.1039/D0SC04823B

Yanfei Guan,^a Connor W. Coley,^a Haoyang Wu,^a Duminda Ranasinghe,^a Esther Heid,^a Thomas J. Struble,^a Lagnajit Pattanaik,^a William H. Green*^a and Klavs F. Jensen*^a

Author affiliations

* Corresponding authors

^a Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
E-mail: whgreen@mit.edu, kfjensen@mit.edu

Abstract

Accurate and rapid evaluation of whether substrates can undergo the desired transformation is crucial and challenging for both human knowledge and computer predictions. Despite the potential of machine learning in predicting chemical reactivity such as selectivity, popular feature engineering and learning methods are either time-consuming or data-hungry. We introduce a new method that combines a machine-learned reaction representation with selected quantum mechanical descriptors to predict regio-selectivity in general substitution reactions. We construct a reactivity descriptor database based on ab initio calculations of 130k organic molecules, and train a multi-task constrained model to calculate the required descriptors on-the-fly. The proposed platform enhances both interpolation and extrapolation performance for regio-selectivity predictions and enables learning from small datasets with just hundreds of examples. Furthermore, the proposed protocol is demonstrated to be generally applicable to a diverse range of chemical spaces. For three general types of substitution reactions (aromatic C–H functionalization, aromatic C–X substitution, and other substitution reactions) curated from a commercial database, the fusion model achieves 89.7%, 96.7%, and 97.2% top-1 accuracy in predicting the major outcome, respectively, each using 5000 training reactions. Using predicted descriptors, the fusion model is end-to-end and requires only approximately 70 ms per reaction to predict the selectivity from reaction SMILES strings.
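As a reading aid for the top-1 accuracy figures quoted above, the following is a minimal sketch of how such a metric is commonly computed: for each reaction, the highest-scored candidate product is compared with the recorded major product. This is not the authors' code. The function name `top1_accuracy`, the site labels, and all scores are hypothetical values chosen purely for illustration.

```python
from typing import Dict, List


def top1_accuracy(scores: List[Dict[str, float]], major: List[str]) -> float:
    """Fraction of reactions whose highest-scored candidate is the major product.

    scores: one dict per reaction, mapping a candidate label to a model score.
    major:  the recorded major product label for each reaction.
    """
    if len(scores) != len(major):
        raise ValueError("scores and major must have the same length")
    if not scores:
        raise ValueError("at least one reaction is required")

    hits = 0
    for candidate_scores, true_product in zip(scores, major):
        # The top-1 prediction is the candidate with the largest score.
        predicted = max(candidate_scores, key=candidate_scores.get)
        if predicted == true_product:
            hits += 1
    return hits / len(scores)


# Hypothetical toy data (assumption): three reactions with made-up scores.
example_scores = [
    {"ortho": 0.2, "meta": 0.1, "para": 0.7},
    {"C2": 0.6, "C4": 0.4},
    {"C3": 0.55, "C5": 0.45},
]
example_major = ["para", "C2", "C5"]

accuracy = top1_accuracy(example_scores, example_major)
print(f"top-1 accuracy: {accuracy:.3f}")
```

In this toy case the first two reactions are ranked correctly and the third is not, so two of three predictions count as hits.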

This article is part of the themed collection: Editor’s Choice – Jinlong Gong

Article information

DOI: https://doi.org/10.1039/D0SC04823B
Article type: Edge Article
Submitted: 02 Sep 2020
Accepted: 19 Dec 2020
First published: 22 Dec 2020
This article is Open Access

All publication charges for this article have been paid for by the Royal Society of Chemistry

Chem. Sci., 2021, 12, 2198-2208

Y. Guan, C. W. Coley, H. Wu, D. Ranasinghe, E. Heid, T. J. Struble, L. Pattanaik, W. H. Green and K. F. Jensen, Chem. Sci., 2021, 12, 2198 DOI: 10.1039/D0SC04823B

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
