Issue 43, 2025

Density-aware active learning for materials discovery: a case study on functionalized nanoporous materials

Abstract

Machine learning algorithms often rely on large training datasets to achieve high performance. However, in domains like chemistry and materials science, acquiring such data is expensive and laborious, requiring highly trained human experts and incurring material costs. It is therefore crucial to develop strategies that minimize the size of training sets while preserving predictive accuracy. The objective is to select an optimal subset of data points from a larger pool of candidate samples, one that is sufficiently informative to train an effective machine learning model. Active learning (AL) methods, which iteratively annotate data points by querying an oracle (e.g., a scientist conducting experiments), have proven highly effective for such tasks. However, challenges remain, particularly for regression tasks, which are generally considered more complex in the AL framework because they require uncertainty estimation and have a continuous output space. In this work, we introduce density-aware greedy sampling (DAGS), an active learning method for regression that integrates uncertainty estimation with data density and is specifically designed for large design spaces (DS). We evaluate DAGS on both synthetic data and multiple real-world datasets of functionalized nanoporous materials, such as metal–organic frameworks (MOFs) and covalent-organic frameworks (COFs), for separation applications. Our results demonstrate that DAGS consistently outperforms both random sampling and state-of-the-art AL techniques in training regression models with a limited number of data points, even on datasets with a high number of features.
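To make the pool-based setting described in the abstract concrete, the following is a minimal sketch of an active learning loop that scores each unlabeled candidate by combining a novelty term with a local density term. It is not the DAGS algorithm, whose details are not given in the abstract. The function names (`knn_density`, `select_next`, `active_learning_loop`), the k-nearest-neighbour density proxy, the use of distance to the nearest labeled point as a stand-in for model uncertainty, and the product used to combine the two scores are all assumptions made for illustration.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_density(pool, k=5):
    """Hypothetical density proxy: inverse mean distance to the k nearest pool neighbours."""
    densities = []
    for i, p in enumerate(pool):
        dists = sorted(euclidean(p, q) for j, q in enumerate(pool) if j != i)
        n_neighbours = max(1, min(k, len(dists)))
        mean_dist = sum(dists[:k]) / n_neighbours
        densities.append(1.0 / (mean_dist + 1e-12))
    return densities


def select_next(pool, labeled_idx, densities):
    """Return the unlabeled index with the largest novelty * density score.

    Novelty is the distance to the nearest labeled point, used here as a
    simple stand-in for model uncertainty (an assumption of this sketch).
    """
    best_idx, best_score = None, -1.0
    for i, p in enumerate(pool):
        if i in labeled_idx:
            continue
        novelty = min(euclidean(p, pool[j]) for j in labeled_idx)
        score = novelty * densities[i]
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


def active_learning_loop(pool, oracle, n_init=2, budget=10, seed=0):
    """Query the oracle until `budget` points are labeled; return {index: label}."""
    rng = random.Random(seed)
    budget = min(budget, len(pool))
    labeled_idx = set(rng.sample(range(len(pool)), min(n_init, budget)))
    labels = {i: oracle(pool[i]) for i in labeled_idx}
    densities = knn_density(pool)  # computed once over the whole pool
    while len(labeled_idx) < budget:
        i = select_next(pool, labeled_idx, densities)
        labeled_idx.add(i)
        labels[i] = oracle(pool[i])  # the expensive step, e.g. an experiment
    return labels
```

In use, `pool` would be a list of candidate feature vectors and `oracle` a callable that returns the measured or simulated property for one candidate. A regression model would then be trained on the returned labels. A model-based uncertainty estimate could replace the distance-based novelty term without changing the loop structure.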

Article information

Article type: Paper
Submitted: 30 Jul 2025
Accepted: 26 Sep 2025
First published: 29 Sep 2025
This article is Open Access
Creative Commons BY license

Phys. Chem. Chem. Phys., 2025, 27, 23152-23165

V. Gkatsis, P. Maratos, C. Rekatsinas, G. Giannakopoulos and P. Krokidas, Phys. Chem. Chem. Phys., 2025, 27, 23152, DOI: 10.1039/D5CP02908B

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
