Two-dimensional interaction parameter histograms as a simple and versatile nanoporous material representation for machine learning prediction of adsorption properties
Abstract
Machine-learning (ML) adsorption models are essential for computationally screening nanoporous materials, a class spearheaded by metal-organic frameworks (MOFs). Physics-based MOF representations offer advantages for training ML adsorption models, such as compatibility with artificial training datasets, model applicability beyond MOFs, and resilience to flawed chemistry-adsorption relationships in the data. However, emerging physics-based MOF representations tend to require specialized expertise to create and/or are prone to training scalability issues. Here, we demonstrate two-dimensional interaction-parameter histograms (2D-IPHs) as physics-based MOF representations that are simple, scalable, and informative for adsorption learning. Constructing a 2D-IPH requires only statistics on the distance from each adsorption site to its closest pore-wall atom, together with that atom's interaction parameters. Demonstrating scalability, 2D-IPHs facilitated the use of a multi-million-point, multi-molecule dataset to yield a model for zero-shot prediction of adsorption isotherms for unseen small, non-polar, near-spherical molecules (R² = 0.97 - 0.99 for H2, CH4, C2H6, N2, Ar, Xe, and Kr). Demonstrating informativeness, 2D-IPHs facilitated training from multi-thousand-point, single-molecule datasets to yield models for: i) full adsorption isotherm prediction for small, high-quadrupole and non-spherical molecules (R² = 0.98 - 0.99 for CO2 and C3H8), and ii) Henry’s constant prediction for small molecules whose adsorption depends to varying degrees on dispersion and electrostatic interactions (R² = 0.76 - 0.90 for CO2, H2O, NH3, and N2). Moreover, training with 2D-IPHs tended to be robust to training dataset trimming, at least until reaching obviously data-scarce scenarios. Even in data-scarce scenarios, using 2D-IPHs with techniques such as single feature stacking (SFS) and transfer learning (TL) led to significant (even if not total) recovery in model accuracy.
Nuances regarding SFS and TL, and the practical screening performance of the models trained herein, are also discussed in this work.
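To make the construction concrete, the following is a minimal sketch, not the authors' implementation. It assumes that adsorption sites and framework atoms are given as Cartesian coordinates, that a single Lennard-Jones-like well depth per atom stands in for the "interaction parameters", and that periodic boundaries can be ignored. The function name `build_2d_iph`, the bin edges, and the normalization are all hypothetical choices made for illustration.

```python
import numpy as np

def build_2d_iph(site_xyz, atom_xyz, atom_eps, dist_edges, eps_edges):
    """Normalized 2D histogram over (distance to closest atom, that atom's parameter).

    Assumptions (not from the source): non-periodic coordinates and one scalar
    interaction parameter per framework atom.
    """
    # Pairwise distances between every site and every framework atom
    diff = site_xyz[:, None, :] - atom_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # For each site: index of the closest atom, its distance, and its parameter
    nearest = dist.argmin(axis=1)
    d_min = dist[np.arange(site_xyz.shape[0]), nearest]
    eps_near = atom_eps[nearest]

    # Bin the (distance, parameter) pairs into a 2D histogram
    hist, _, _ = np.histogram2d(d_min, eps_near, bins=[dist_edges, eps_edges])

    # Normalize so the histogram does not depend on the number of sites
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy usage with made-up coordinates and parameters
rng = np.random.default_rng(0)
sites = rng.uniform(0.0, 10.0, size=(200, 3))     # hypothetical site positions
atoms = rng.uniform(0.0, 10.0, size=(20, 3))      # hypothetical atom positions
eps = rng.choice([0.05, 0.10, 0.30], size=20)     # hypothetical well depths
iph = build_2d_iph(sites, atoms, eps,
                   dist_edges=np.linspace(0.0, 20.0, 9),
                   eps_edges=np.linspace(0.0, 0.5, 6))
```

The resulting array has a fixed shape regardless of framework size, so under these assumptions it could be flattened and passed to a standard regressor as a feature vector.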
- This article is part of the themed collection: MSDE 10th Anniversary Collection