Machine learning assisted approximation of descriptors (CO and OH) binding energy on Cu-based bimetallic alloys
Abstract
Data-driven machine learning (ML) methods have the potential to significantly reduce the computational and experimental cost of rapid, high-throughput screening of catalyst materials using binding energy as a descriptor. In this study, eight widely used ML models, classified as linear, kernel, and tree-based ensemble models, were evaluated for predicting the binding energies of the catalytic descriptors (CO* and OH*) on (111)-terminated Cu₃M alloy surfaces, using readily available periodic-table properties of the metals as features. Among the models tested, the extreme gradient boosting regressor (xGBR) performed best, with root mean square errors (RMSEs) of 0.091 eV and 0.196 eV for CO and OH binding energy predictions, respectively, on (111)-terminated A₃B alloy surfaces. The xGBR model also gave the highest R² scores, 0.970 and 0.890 for the CO and OH binding energies, respectively. The time taken for 25 000 fits of each model varied between 5 and 60 min on a laptop with 6 cores and 8 GB of RAM, which is negligible compared with DFT calculations. Our ML model accurately predicted the CO and OH binding energies on (111)-terminated Cu₃M alloys, with a mean absolute error (MAE) of 0.02 to 0.03 eV relative to DFT-calculated values. The ML-predicted binding energies can further be used with an ab initio microkinetic model (MKM) to efficiently screen A₃B-type bimetallic alloys for the formic acid decomposition reaction.
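The three error metrics quoted above (RMSE, MAE and R²) can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names and the binding energy values are hypothetical placeholders chosen only to show how ML predictions would be compared with DFT reference values.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical binding energies in eV (placeholders, NOT data from the study).
dft_values = [-0.50, -1.00, -1.50, -2.00]  # assumed DFT reference values
ml_values = [-0.60, -0.90, -1.60, -1.90]   # assumed ML predictions

print(f"RMSE = {rmse(dft_values, ml_values):.3f} eV")
print(f"MAE  = {mae(dft_values, ml_values):.3f} eV")
print(f"R2   = {r2_score(dft_values, ml_values):.3f}")
```

For these placeholder values every prediction is off by 0.1 eV in magnitude, so the RMSE and the MAE are both 0.1 eV and R² is 0.968. RMSE penalizes large individual errors more strongly than MAE, which is why the two are usually reported together.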