An accurate and interpretable deep learning model for yield prediction using hybrid molecular representations
Abstract
In recent years, imidazolium-based and pyrazolium-based ionic liquids (ILs) have proven to be efficient catalysts for CO₂ cycloaddition reactions. In the absence of co-catalysts, however, these catalysts require stringent reaction conditions, which limits their applicability. There is therefore an increasing demand for new IL catalysts capable of operating under milder conditions. Traditional approaches to designing such ILs, whether through theoretical calculations or experimental exploration, are costly and challenging. This study presents a deep learning model for predicting the yield of CO₂ cycloaddition reactions catalyzed by imidazolium-based and pyrazolium-based ILs. The model uses hybrid fingerprint features to describe the structural information of the molecules and achieves a squared correlation coefficient (R²) of 0.85. The SHapley Additive exPlanations (SHAP) technique is employed to identify the key factors influencing yield. In addition, a molecular generation scheme is established to create new IL structures. A two-step screening strategy, consisting of yield prediction with the deep learning model followed by energy barrier calculations using density functional theory (DFT), identifies 14 promising imidazolium-based ILs as potential efficient catalysts for the CO₂ cycloaddition reaction with epichlorohydrin under mild conditions. This work introduces a machine learning approach for designing imidazolium-based and pyrazolium-based IL catalysts, aimed at reducing the experimental burden and exploration costs associated with catalyst development.
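The two-step screening described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Candidate`, `two_step_screen`, `predict_yield`, and `compute_barrier`, as well as the thresholds `min_yield` and `max_barrier`, are hypothetical and introduced only for illustration. The sketch assumes that the yield model and the DFT barrier calculation are available as callables that take a structure string, and it shows only the ordering of the two filters, so that the expensive barrier calculation runs only for candidates that pass the inexpensive yield filter.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Candidate:
    """A generated IL that passed both filters (hypothetical container)."""
    structure: str          # structure string of the generated IL, e.g. SMILES
    predicted_yield: float  # output of the yield model
    barrier: float          # energy barrier from the DFT step (units set by caller)


def two_step_screen(
    structures: Iterable[str],
    predict_yield: Callable[[str], float],    # assumed wrapper around the trained model
    compute_barrier: Callable[[str], float],  # assumed wrapper around a DFT workflow
    min_yield: float,                         # hypothetical cutoff, chosen by the user
    max_barrier: float,                       # hypothetical cutoff, chosen by the user
) -> List[Candidate]:
    """Filter by predicted yield first, then by computed energy barrier."""
    selected = []
    for structure in structures:
        y = predict_yield(structure)
        if y < min_yield:
            continue  # rejected in step 1, so the costly barrier step is skipped
        b = compute_barrier(structure)
        if b <= max_barrier:
            selected.append(Candidate(structure, y, b))
    # Rank the survivors from lowest to highest barrier.
    return sorted(selected, key=lambda c: c.barrier)
```

Placing the learned yield filter first is the design choice that matters here: the number of DFT calculations scales with the number of candidates surviving step 1 rather than with the size of the generated library.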