An Interpretable Machine Learning Framework for Prediction of Adsorption Energies and Generative Design of Active Sites on Arbitrary Catalysts
Abstract
We present a highly interpretable and efficient machine learning framework for predictive and generative modeling of adsorption energies on surfaces using subgraph isomorphic decision trees (SIDTs). We extracted graph representations of 344,756 relaxed geometries and their associated adsorption energies from the OC20 database and used them to train a 24,777-node SIDT that achieves a median absolute error (MDAE) of 0.36 eV, a mean absolute error (MAE) of 0.54 eV, and a root-mean-square error (RMSE) of 0.82 eV. We then developed and implemented novel techniques for using SIDTs as generative models, enabling efficient catalyst optimization for arbitrary objective functions and constraints expressed in terms of the adsorption energies and prediction uncertainties of multiple adsorbates and of the catalyst structure itself. In particular, our SIDT provides substructure representations of the subdistributions of adsorption energy, rather than mere samples from those subdistributions as is common in traditional generative modeling. We show in two examples how this can be exploited for efficient and interpretable catalyst active site design. For the ammonia decomposition reaction sequence, we use our generative techniques to minimize the overall barrier height of the sequence, generating catalyst substructures predicted to decrease the overall barrier from 2.7 eV on Pt(111) to 0.4 eV. We also discuss how the accurate SIDT uncertainties and the interpretability of the SIDT can be exploited to identify regions of chemical space that need improved coverage and might be improved using active learning schemes.
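For readers unfamiliar with the three error metrics quoted above (MDAE, MAE, RMSE), the following is a minimal sketch of how they are conventionally computed from predicted and reference adsorption energies. The function name `error_metrics` and the four energy values are hypothetical and chosen only for illustration; they are not taken from the paper or from OC20.

```python
import math
import statistics


def error_metrics(predicted, reference):
    """Return (MDAE, MAE, RMSE) in the units of the inputs (eV here)."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Absolute prediction error for each adsorption energy.
    abs_errors = [abs(p - r) for p, r in zip(predicted, reference)]
    mdae = statistics.median(abs_errors)                         # median absolute error
    mae = statistics.fmean(abs_errors)                           # mean absolute error
    rmse = math.sqrt(statistics.fmean([e * e for e in abs_errors]))  # root-mean-square error
    return mdae, mae, rmse


# Hypothetical adsorption energies in eV (illustrative values only).
predicted = [-1.0, 0.5, 2.0, -0.25]
reference = [-1.25, 0.5, 0.0, -0.75]

mdae, mae, rmse = error_metrics(predicted, reference)
print(f"MDAE = {mdae:.3f} eV, MAE = {mae:.3f} eV, RMSE = {rmse:.3f} eV")
```

In this toy data a single large error (2.0 eV) leaves the median nearly untouched while pulling up the mean and, more strongly, the RMSE. An ordering MDAE < MAE < RMSE, as in the reported 0.36, 0.54, and 0.82 eV, is the usual signature of an error distribution with a tail of large errors.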
- This article is part of the themed collection: Bridging the Gap from Surface Science to Heterogeneous Catalysis Faraday Discussion