Interpretable, Low-Compute Machine Learning Integrating Experimental and Catalytic Descriptors for Sustainable CO2 Electroreduction

Abstract

Applying machine learning (ML) sustainably to green chemistry is challenging because reaction complexity often drives the use of large, energy-intensive models. Here, we combine pre-trained models for information extraction with low-compute, interpretable shallow-learning models to deliver mechanistic insight while minimizing computational cost. Using the electrocatalytic CO2 reduction reaction (CO2RR) as a model green chemistry reaction, we automatically extracted 3,880 experimentally reported reaction conditions from peer-reviewed literature with a pre-trained large language model and augmented these data with relaxation energies of key CO2RR intermediates obtained via community-sourced density functional theory (DFT) and ML surrogates for DFT. Training 98 random-forest binary classifiers across diverse feature sets, we find that models integrating both experimental and computational descriptors consistently achieve the best performance. Because these models can be run locally, without data-center resources, they offer a computationally and environmentally sustainable route to discovery. Furthermore, interpretable ML analysis revealed mechanistic trends: selective CH3OH formation requires catalysts with weak adsorption of O* and H2O*, whereas C2H4 production requires catalysts that combine moderate adsorption of CO* with moderate to strong adsorption of O* and H2O*. The model also identified that similar catalytic properties produce C2H4 and CH4, but that the applied voltage is the major driving force, with more negative voltages favoring C2H4 production. These findings underscore the value of integrating experimental and theoretical insights into ML frameworks and demonstrate how pre-trained and interpretable ML can uncover fundamental principles governing catalytic selectivity for sustainable production of fuels and chemicals.
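The comparison of feature sets described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: it uses a simplified pure-Python random forest (bootstrap sampling plus a random feature subset per split), a synthetic dataset, and a made-up labeling rule chosen only so that both descriptor types carry signal. The names `voltage`, `co_ads`, `fit_forest`, and `feature_sets` are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels):
    """Majority class of 0/1 labels (ties go to class 1)."""
    return int(2 * sum(labels) >= len(labels))

def build_tree(rows, labels, rng, depth=0, max_depth=4):
    # Stop at the depth limit or when the node is pure.
    if depth >= max_depth or len(set(labels)) < 2:
        return majority(labels)
    n_feat = len(rows[0])
    # Random feature subset per split (square-root rule).
    feats = rng.sample(range(n_feat), max(1, int(n_feat ** 0.5)))
    best = None
    for f in feats:
        cands = sorted({r[f] for r in rows})[1:]
        # Subsample candidate thresholds to keep the sketch fast.
        for t in cands[::max(1, len(cands) // 16)]:
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no usable threshold on the sampled features
        return majority(labels)
    _, f, t = best
    lo = [(r, y) for r, y in zip(rows, labels) if r[f] < t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[f] >= t]
    return (f, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], rng, depth + 1, max_depth),
            build_tree([r for r, _ in hi], [y for _, y in hi], rng, depth + 1, max_depth))

def predict_tree(node, row):
    # Internal nodes are tuples (feature, threshold, left, right); leaves are ints.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node

def fit_forest(rows, labels, n_trees=30, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    votes = sum(predict_tree(tree, row) for tree in forest)
    return int(2 * votes >= len(forest))

def accuracy(forest, rows, labels):
    return sum(predict_forest(forest, r) == y for r, y in zip(rows, labels)) / len(rows)

# Synthetic data: one "experimental" and one "computational" descriptor.
data_rng = random.Random(42)
data = []
for _ in range(300):
    voltage = data_rng.uniform(-2.0, 0.0)  # hypothetical applied potential (V)
    co_ads = data_rng.uniform(-1.0, 0.0)   # hypothetical CO* adsorption energy (eV)
    # Made-up rule for illustration only; NOT a result from the paper.
    label = int(voltage < -1.0 and co_ads < -0.5)
    data.append(((voltage, co_ads), label))
train, test = data[:200], data[200:]

# One binary classifier per feature set, scored on held-out rows.
feature_sets = {"experimental": [0], "computational": [1], "combined": [0, 1]}
scores = {}
for name, cols in feature_sets.items():
    x_train = [[x[c] for c in cols] for x, _ in train]
    x_test = [[x[c] for c in cols] for x, _ in test]
    forest = fit_forest(x_train, [y for _, y in train])
    scores[name] = accuracy(forest, x_test, [y for _, y in test])
print(scores)
```

Because the synthetic label depends on both descriptors, the combined feature set should score highest here; that outcome is built into the toy data and says nothing about the paper's actual results. A model of this size trains in seconds on a laptop, which is the kind of local, low-compute workflow the abstract refers to.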

Article information

Article type: Paper
Submitted: 23 Mar 2026
Accepted: 03 Jun 2026
First published: 09 Jun 2026
This article is Open Access
Creative Commons BY-NC license

Green Chem., 2026, Accepted Manuscript

B. Farris, J. J. Meckstroth and K. C. Leonard, Green Chem., 2026, Accepted Manuscript, DOI: 10.1039/D6GC01753C

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.