A machine learning approach for predicting the empirical polarity of organic solvents†
Abstract
Modern organic synthesis increasingly relies on new sustainable and eco-friendly solvents, which makes it important to understand how the structure and properties of these solvents correlate with reactivity in an organic transformation, so that reaction outcomes can be rationalized and predicted. Polarity is one such solvent property and is widely considered during drug design and synthesis; however, it cannot be characterized solely by one or more physical constants such as refractive index, relative permittivity, or dipole moment. Moreover, empirical determination of the ET(30) parameter, an established benchmark for quantifying polarity, is tedious and time-consuming. Given the rapid development of cheminformatics, we have turned to computational tools to predict the empirical polarity of organic liquids efficiently from a handful of structural features. In this report, we establish a comprehensive database of a diverse set of 421 organic solvents, combining ET(30) values available in the literature with quantum chemical and other computationally calculated descriptors. This dataset was used to develop a statistically sound machine learning model with high predictive power. Among the models screened in this study, an artificial neural network performed best, with an R² value of 0.96 and a root mean square error of 1.29 on the test set. To the best of our knowledge, this is the first modelling approach to successfully predict the empirical polarity of organic solvents on such a large dataset.
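
For readers less familiar with the two test-set metrics quoted above, the following minimal sketch shows how a coefficient of determination (R²) and a root mean square error are commonly computed from measured and predicted values. It assumes the usual definition R² = 1 − SS_res/SS_tot; the exact definition used in this work may differ (for example, a squared correlation coefficient). The function names and the numerical values are hypothetical illustrations, not data from this study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted polarity values (illustration only).
measured = [30.0, 40.0, 50.0, 60.0]
predicted = [31.0, 39.0, 52.0, 58.0]

print(f"RMSE = {rmse(measured, predicted):.2f}")
print(f"R^2  = {r_squared(measured, predicted):.2f}")
```

Both metrics are evaluated on held-out test data, so they describe how well a model generalizes to solvents it was not trained on rather than how well it fits the training set.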