Two-dimensional interaction parameter histograms as a simple and versatile nanoporous material representation for machine learning prediction of adsorption properties

Abstract

Machine-learning (ML) adsorption models are essential for computationally screening nanoporous materials, a class spearheaded by metal-organic frameworks (MOFs). Physics-based MOF representations offer advantages for training ML adsorption models, such as compatibility with artificial training datasets, model applicability beyond MOFs, and resilience to flawed chemistry-adsorption relationships in the data. However, emerging physics-based MOF representations tend to require specialized expertise to create and/or are prone to training scalability issues. Here, we demonstrate two-dimensional interaction-parameter histograms (2D-IPHs) as physics-based MOF representations that are simple, scalable, and informative for adsorption learning. Constructing a 2D-IPH requires only statistics on the distance from each adsorption site to its closest pore-wall atom, together with that atom's interaction parameters. Demonstrating scalability, 2D-IPHs facilitated the use of a multi-million-point, multi-molecule dataset to yield a model for zero-shot prediction of adsorption isotherms for unseen small, non-polar, near-spherical molecules (R² = 0.97-0.99 for H2, CH4, C2H6, N2, Ar, Xe, and Kr). Demonstrating informativeness, 2D-IPHs facilitated training on multi-thousand-point, single-molecule datasets to yield models for: i) full adsorption isotherm prediction for small, high-quadrupole and non-spherical molecules (R² = 0.98-0.99 for CO2 and C3H8), and ii) Henry's constant prediction for small molecules whose adsorption depends to varying degrees on dispersion and electrostatic interactions (R² = 0.76-0.90 for CO2, H2O, NH3, and N2). Moreover, training with 2D-IPHs tended to be robust to training dataset trimming, at least until obviously data-scarce scenarios were reached. Even in data-scarce scenarios, combining 2D-IPHs with techniques such as single feature stacking (SFS) and transfer learning (TL) led to significant (even if not total) recovery of model accuracy. Nuances regarding SFS and TL, and the practical screening performance of the models trained herein, are also discussed in this work.
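
As a rough illustration of the construction described in the abstract, the sketch below bins each adsorption site by two quantities: its distance to the closest pore-wall atom and an interaction parameter of that atom. It is a minimal sketch under stated assumptions, not the authors' implementation. The function name `build_2d_iph`, the use of a single Lennard-Jones-style well depth (`atom_eps`) as the interaction parameter, the bin edges, the normalization, and the omission of periodic boundary conditions are all assumptions made for illustration only.

```python
import numpy as np


def build_2d_iph(site_xyz, atom_xyz, atom_eps, dist_edges, eps_edges):
    """Hypothetical helper: 2D histogram over (nearest-atom distance, that atom's epsilon).

    Assumptions (not taken from the paper): one scalar interaction parameter per
    atom, plain Euclidean distances (no periodic images), normalized counts.
    """
    site_xyz = np.asarray(site_xyz, dtype=float)
    atom_xyz = np.asarray(atom_xyz, dtype=float)
    atom_eps = np.asarray(atom_eps, dtype=float)

    # Pairwise site-to-atom distances, shape (n_sites, n_atoms).
    diff = site_xyz[:, None, :] - atom_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    # For every site, pick the closest atom and record its distance and parameter.
    nearest = dist.argmin(axis=1)
    d_min = dist[np.arange(site_xyz.shape[0]), nearest]
    eps_near = atom_eps[nearest]

    # Bin the (distance, parameter) pairs on a fixed grid shared by all materials.
    hist, _, _ = np.histogram2d(
        d_min, eps_near, bins=[np.asarray(dist_edges), np.asarray(eps_edges)]
    )

    # Normalize so histograms from cells with different site counts are comparable.
    total = hist.sum()
    return hist / total if total > 0 else hist


# Toy example: two sites and two wall atoms along the x axis (arbitrary units).
sites = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
atoms = [[1.0, 0.0, 0.0], [4.0, 0.0, 0.0]]
eps = [0.1, 0.5]
iph = build_2d_iph(sites, atoms, eps,
                   dist_edges=[0.0, 1.5, 3.0], eps_edges=[0.0, 0.3, 0.6])
```

In this toy case both sites lie one distance unit from their nearest atom, but those atoms have different parameters, so the two sites fall in the same distance bin and in different parameter bins. Because every material is binned on the same grid, the resulting array has a fixed shape regardless of unit-cell size, which is one plausible way such a histogram could serve as a fixed-length model input.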

Article information

Article type: Paper
Submitted: 23 Feb 2026
Accepted: 31 May 2026
First published: 01 Jun 2026
This article is Open Access (Creative Commons BY-NC license)

Mol. Syst. Des. Eng., 2026, Accepted Manuscript

T. Gercina de Vilas, J. F. Fajardo-Rojas, O. Mansurov, R. Devaisher, E. Toberer and D. Gomez-Gualdron, Mol. Syst. Des. Eng., 2026, Accepted Manuscript, DOI: 10.1039/D6ME00034G

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.
