Accelerating the evaluation of crucial descriptors for catalyst screening via message passing neural network
Abstract
A priori catalyst design guidelines derived from first-principles simulations and reliable data-driven models are essential for cost-efficient catalyst discovery. Nonetheless, obtaining every property that controls catalytic activity and stability is computationally demanding because of the complex interactions among reactants, intermediates, and products at the active sites. Therefore, predictions of only the most relevant catalytic properties, or catalyst descriptors, are often used to guide new catalyst design. For upgrading biomass-derived materials to value-added chemicals via deoxygenation reactions, molybdenum carbides (Mo2C) are considered among the most active and economically viable catalysts. Unfortunately, one bottleneck for the longer-term stability of Mo2C catalysts is their susceptibility to surface oxidation, a common problem in heterogeneous catalysis that requires excess hydrogen to regenerate the active sites. By using surface dopants to tune the oxygen affinity of Mo2C surfaces, which serves as the catalyst descriptor, it is possible to design new doped Mo2C catalysts with the desired reactivity and stability. Here, we first employed periodic density functional theory (DFT) to perform 20 000 high-throughput VASP simulations of oxygen binding energies (BEO) on various pristine and doped Mo2C surfaces. The resulting binding energy database of 20 000 oxygen adsorption structures spans 7 low Miller-index surfaces, 23 d-block elements as single-atom dopants, and all possible surface terminations, dopant locations, and adsorption sites. Using this dataset, we developed a message passing neural network (MPNN) machine learning model for extremely fast BEO prediction that takes only unoptimized local adsorption geometries as inputs. The best model yields a mean absolute error of 0.176 eV for BEO with respect to the DFT-computed values.
Our results highlight the use of MPNN as an accurate and broadly applicable machine learning approach to accelerate descriptor-based catalyst discovery.
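The abstract does not specify the MPNN architecture, so the following is only a minimal, untrained sketch of the general idea, assuming a common design: atoms in an unoptimized local adsorption geometry become graph nodes, atoms closer than a distance cutoff are connected, neighbors exchange distance-dependent messages for a few steps, and a sum readout produces one scalar (here standing in for BEO). Everything named below (`ELEMENTS`, `CUTOFF`, `N_RBF`, `TinyMPNN`, the one-hot features, the Gaussian distance expansion, the random weights) is hypothetical and not taken from the paper; `mean_absolute_error` mirrors the metric reported above.

```python
import math
import random

# Hypothetical element vocabulary for one-hot node features (not the paper's featurization).
ELEMENTS = ["C", "O", "Mo", "Ni"]
CUTOFF = 3.0            # assumed neighbor cutoff in angstrom
N_RBF = 4               # assumed number of Gaussian basis functions for distances
HIDDEN = len(ELEMENTS)  # hidden size equals one-hot size to keep the sketch small


def one_hot(symbol):
    return [1.0 if e == symbol else 0.0 for e in ELEMENTS]


def rbf(distance):
    # Expand a scalar distance into Gaussians centered evenly on [0, CUTOFF].
    centers = [CUTOFF * k / (N_RBF - 1) for k in range(N_RBF)]
    return [math.exp(-((distance - c) ** 2) / 0.5) for c in centers]


def build_graph(atoms):
    # Directed edges (i, j, d) for every pair of distinct atoms closer than CUTOFF.
    edges = []
    for i, (_, pi) in enumerate(atoms):
        for j, (_, pj) in enumerate(atoms):
            if i != j:
                d = math.dist(pi, pj)
                if d < CUTOFF:
                    edges.append((i, j, d))
    return edges


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


class TinyMPNN:
    """Hypothetical, untrained MPNN: illustrates the forward pass only."""

    def __init__(self, steps=2, seed=0):
        rng = random.Random(seed)
        self.steps = steps
        self.W_msg = random_matrix(rng, HIDDEN, HIDDEN + N_RBF)
        self.W_upd = random_matrix(rng, HIDDEN, 2 * HIDDEN)
        self.w_out = [rng.gauss(0.0, 0.3) for _ in range(HIDDEN)]

    def predict(self, atoms):
        h = [one_hot(sym) for sym, _ in atoms]
        edges = build_graph(atoms)
        for _ in range(self.steps):
            agg = [[0.0] * HIDDEN for _ in atoms]
            for i, j, d in edges:
                # Message from neighbor j to atom i depends on j's state and the i-j distance.
                m = [math.tanh(v) for v in matvec(self.W_msg, h[j] + rbf(d))]
                agg[i] = [a + b for a, b in zip(agg[i], m)]
            # Update each atom from its own state concatenated with its summed messages.
            h = [[math.tanh(v) for v in matvec(self.W_upd, h[i] + agg[i])]
                 for i in range(len(atoms))]
        # Sum readout gives a permutation-invariant scalar per structure.
        return sum(sum(w * v for w, v in zip(self.w_out, hi)) for hi in h)


def mean_absolute_error(pred, ref):
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


if __name__ == "__main__":
    # Hypothetical O adsorbed near two Mo atoms and a Ni dopant (coordinates in angstrom).
    site = [("O", (0.0, 0.0, 1.2)), ("Mo", (1.6, 0.0, 0.0)),
            ("Mo", (-1.6, 0.0, 0.0)), ("Ni", (0.0, 1.5, 0.0))]
    print(TinyMPNN().predict(site))
```

Because the graph uses only interatomic distances and the readout is a sum, the prediction does not depend on atom ordering or on a rigid translation of the structure, which is one reason graph-based models suit unoptimized local geometries. In practice the weights would be fitted to DFT binding energies, and the fitted model would be judged by its MAE on held-out structures.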