Accelerated discovery of M6@g-N4 catalysts for CO2 electroreduction via machine learning and DFT: Descriptor engineering and activity trend validation
Abstract
M6 metal clusters supported on graphitic nitrogen-doped graphene (g-N4) are promising candidates for efficient CO2 electroreduction (CO2RR). However, trial-and-error experiments and computationally intensive DFT calculations slow the development of high-performance catalysts. Herein, we integrate machine learning (ML) with DFT to screen and predict the CO2RR performance of M6@g-N4 catalysts, where M represents 36 transition/main group metals (excluding unstable Na/K/Ir/Hg clusters). A DFT-derived dataset covering 16 structural, electronic, and physicochemical descriptors was constructed, and 8 ML algorithms were systematically evaluated. Ridge Regression (RR) emerged as the optimal model, achieving a high coefficient of determination (R2 = 0.963) and a low root mean square error (RMSE = 0.228), together with strong robustness to multicollinearity and good interpretability. Pearson correlation and RR-based feature importance analyses revealed that Bader charge transfer, hydrogen evolution reaction (HER) competition, and CO2 structural distortion (∠OCO and C-O bond length) are the dominant activity descriptors. The top-performing CO2RR catalysts predicted by ML were Cd6@g-N4, Zn6@g-N4, and Sn6@g-N4, which were further validated by additional DFT calculations on the *CO2 → *CO reaction pathway. This work demonstrates that integrating ML with DFT provides a data-driven route to accelerate the discovery of high-performance CO2RR catalysts, offering quantitative guidance for materials design and contributing to climate-change mitigation.
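As a rough illustration of the modelling step summarized in the abstract, the sketch below fits a closed-form ridge regression and reports R2, RMSE, and Pearson correlations. It is not the authors' implementation: the data are synthetic, the three descriptor names, the sample count, the coefficients, and the penalty alpha = 1.0 are all assumptions made for the example, and only the Python standard library is used.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


def ridge_fit(X, y, alpha):
    """Ridge on centered data: solve (Xc^T Xc + alpha I) w = Xc^T yc."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    A = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) + (alpha if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    rhs = [sum(Xc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    w = solve(A, rhs)
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, intercept


def predict(X, w, intercept):
    return [intercept + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def r2_score(y, y_hat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


# Synthetic stand-in data: names, size, and coefficients are hypothetical.
rng = random.Random(0)
names = ["bader_charge", "her_competition", "oco_angle"]
true_w = [1.5, -0.8, 0.4]
X = [[rng.gauss(0, 1) for _ in names] for _ in range(30)]
y = [sum(w * x for w, x in zip(true_w, row)) + rng.gauss(0, 0.1) for row in X]

w, b = ridge_fit(X, y, alpha=1.0)
y_hat = predict(X, w, b)
print("R2 =", round(r2_score(y, y_hat), 3), "RMSE =", round(rmse(y, y_hat), 3))
for j, name in enumerate(names):
    r = pearson([row[j] for row in X], y)
    print(name, "coef =", round(w[j], 3), "pearson =", round(r, 3))
```

In practice the descriptors would be standardized before fitting so that coefficient magnitudes are comparable as importance scores, and the metrics would be computed on held-out data rather than on the training set as done here for brevity.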