Accelerating the optimization of enzyme-catalyzed synthesis conditions via machine learning and reactivity descriptors†
Abstract
Enzyme-catalyzed synthesis reactions are important for a wide range of applications. Selecting optimal synthesis conditions accurately and rapidly is essential, yet it remains challenging for both human expertise and computational prediction. In this work, a new approach that combines a data-driven machine learning (ML) model with reactivity descriptors is developed to predict the optimal enzyme-catalyzed synthesis conditions and the reaction yield. Fourteen reactivity descriptors are constructed to describe 125 reactions, which are classified into five categories and cover different reaction mechanisms. Nineteen ML models are trained on the dataset, and the Quadratic support vector machine (SVM) model is found to exhibit the best performance. The Quadratic SVM model is then used to predict the optimal reaction conditions, i.e. those giving the highest yield among 109 200 combinations of substrate molar ratio, solvent, water content, enzyme concentration and temperature for each reaction. The proposed protocol should be generally applicable to a diverse range of chemical reactions and provides a black-box evaluation for optimizing the reaction conditions of organic synthesis reactions.
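The condition-screening step described in the abstract can be illustrated with a minimal sketch: enumerate every combination of condition levels, score each with a yield predictor, and keep the highest-scoring one. Everything specific here is an assumption made for illustration and is not taken from the paper: the level values and solvent names in `GRID`, the helper names `enumerate_conditions` and `best_condition`, and above all `toy_yield_model`, which is a hand-written stand-in for the trained Quadratic SVM and does not use the reactivity descriptors.

```python
from itertools import product

# Hypothetical condition levels, chosen only for illustration. The paper's
# actual grid (109 200 combinations per reaction) is not reproduced here.
GRID = {
    "molar_ratio": [1.0, 2.0, 3.0],
    "solvent": ["hexane", "toluene", "acetonitrile"],
    "water_content_pct": [0.0, 1.0, 5.0],
    "enzyme_mg_per_ml": [5.0, 10.0, 20.0],
    "temperature_c": [30.0, 40.0, 50.0, 60.0],
}


def enumerate_conditions(grid):
    """Yield every combination of condition levels as a dict."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def toy_yield_model(cond):
    """Stand-in for a trained yield regressor (assumption, not the paper's model).

    Returns a made-up yield in percent that peaks at one known condition,
    so the screening logic can be checked by hand.
    """
    solvent_bonus = {"hexane": 10.0, "toluene": 5.0, "acetonitrile": 0.0}
    return (
        50.0
        + solvent_bonus[cond["solvent"]]
        - 2.0 * (cond["molar_ratio"] - 2.0) ** 2
        - 1.0 * (cond["water_content_pct"] - 1.0) ** 2
        - 0.05 * (cond["enzyme_mg_per_ml"] - 10.0) ** 2
        - 0.02 * (cond["temperature_c"] - 50.0) ** 2
    )


def best_condition(grid, predict):
    """Return the condition with the highest predicted yield."""
    return max(enumerate_conditions(grid), key=predict)


n_conditions = sum(1 for _ in enumerate_conditions(GRID))
best = best_condition(GRID, toy_yield_model)
print(n_conditions, best, toy_yield_model(best))
```

In a real workflow, `predict` would wrap the fitted model and the descriptor calculation for the reaction of interest; the exhaustive search itself stays the same because the grid is small enough to enumerate.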
- This article is part of the themed collection: Mechanistic, computational & physical organic chemistry in OBC