Interpretable, Low-Compute Machine Learning Integrating Experimental and Catalytic Descriptors for Sustainable CO2 Electroreduction
Abstract
Applying machine learning (ML) sustainably to green chemistry is challenging because reaction complexity often drives the use of large, energy-intensive models. Here, we combine pre-trained models for information extraction with low-compute, interpretable shallow-learning models to deliver mechanistic insight while minimizing computational cost. Using the electrocatalytic CO2 reduction reaction (CO2RR) as a model green chemistry reaction, we used a pre-trained large language model to automatically extract 3,880 experimentally reported reaction conditions from peer-reviewed literature. We then augmented these data with relaxation energies of key CO2RR intermediates obtained from community-sourced density functional theory (DFT) calculations and ML surrogates for DFT. Across 98 random-forest binary classifiers trained on diverse feature sets, models integrating both experimental and computational descriptors consistently achieved the best performance. Because these models can be run locally, without data-center resources, they offer a computationally and environmentally sustainable route to discovery. Furthermore, interpretable ML analysis revealed mechanistic trends: selective CH3OH formation required catalysts with weak adsorption of O* and H2O*, whereas C2H4 production required catalysts that combine moderate adsorption of CO* with moderate-to-strong adsorption of O* and H2O*. The model also identified that C2H4 and CH4 are produced on catalysts with similar properties, but that the applied voltage is the major driving force separating the two, with more negative voltages favoring C2H4 production. These findings underscore the value of integrating experimental and theoretical insights into ML frameworks and demonstrate how pre-trained and interpretable ML can uncover fundamental principles governing catalytic selectivity for the sustainable production of fuels and chemicals.
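The classifier-comparison idea summarized above can be illustrated with a small, dependency-free Python sketch. It is not the authors' pipeline. The three descriptor columns (an applied potential and two adsorption energies), the labelling rule, the hyperparameters, and every function name below are hypothetical. The records are synthetic rather than drawn from the 3,880 extracted conditions. A real study would more likely use a library implementation such as scikit-learn's `RandomForestClassifier`. The hand-written forest here only exposes the mechanics: bootstrap resampling, a random feature subset at each split, Gini-impurity thresholds, and majority voting. It runs quickly on a single CPU core, which fits the low-compute argument.

```python
import random
from collections import Counter

# Hypothetical toy record layout (not the paper's schema):
# col 0 = applied potential in V ("experimental"),
# col 1 = CO* adsorption energy in eV, col 2 = O* adsorption energy in eV ("computational").
FEATURE_SETS = {"experimental": [0], "computational": [1, 2], "combined": [0, 1, 2]}

def gini(labels):
    """Gini impurity of a list of class labels."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, feature_ids):
    """Return (weighted impurity, feature, threshold) of the best split, or None."""
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring distinct values
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, depth, max_depth, n_sub, rng):
    """Grow a depth-limited tree: leaves are labels, internal nodes are tuples."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    # Each node sees only a random subset of n_sub features, as in a random forest
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:  # all candidate features are constant here
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    if not left or not right:
        return majority
    children = [
        build_tree([X[i] for i in idx], [y[i] for i in idx], depth + 1, max_depth, n_sub, rng)
        for idx in (left, right)
    ]
    return (f, t, children[0], children[1])

def predict_tree(node, row):
    while isinstance(node, tuple):  # descend until a leaf label is reached
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(X, y, n_trees=15, max_depth=5, seed=0):
    """Bagged ensemble of randomized trees, i.e. a minimal random forest."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))  # sqrt(n_features) candidates per split
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap resample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, n_sub, rng))
    return trees

def predict_forest(trees, row):
    """Majority vote over trees (an odd n_trees avoids ties for binary labels)."""
    return Counter(predict_tree(t, row) for t in trees).most_common(1)[0][0]

def make_toy_data(n, seed):
    """Synthetic records with an invented labelling rule; NOT real CO2RR data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        potential = rng.uniform(-1.6, -0.4)
        e_co = rng.uniform(-1.5, 0.0)
        e_o = rng.uniform(-1.5, 0.0)  # deliberately irrelevant to the toy label
        X.append([potential, e_co, e_o])
        y.append(int(potential < -1.0 and -0.9 < e_co < -0.3))
    return X, y

def evaluate(feature_set, X_train, y_train, X_test, y_test):
    """Train a forest on one descriptor subset and return held-out accuracy."""
    cols = FEATURE_SETS[feature_set]
    trees = fit_forest([[r[c] for c in cols] for r in X_train], y_train)
    preds = [predict_forest(trees, [r[c] for c in cols]) for r in X_test]
    return sum(p == t for p, t in zip(preds, y_test)) / len(y_test)

if __name__ == "__main__":
    X_train, y_train = make_toy_data(150, seed=0)
    X_test, y_test = make_toy_data(100, seed=1)
    for name in FEATURE_SETS:
        print(f"{name}: held-out accuracy = {evaluate(name, X_train, y_train, X_test, y_test):.2f}")
```

The invented label depends jointly on the potential and on the CO* energy. As a result, neither the experimental-only nor the computational-only column set can separate the toy classes on its own. The comparison therefore mimics, in deliberately simplified form, the motivation for combining descriptor types. The printed accuracies describe only this synthetic example and say nothing about real CO2RR selectivity.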