Issue 6, 2026

Out-of-distribution evaluation of active learning pipelines for molecular property prediction

Abstract

Active learning (AL) is widely used to reduce the amount of data needed to train machine learning models. It is especially valuable in fields where data collection is costly or time-consuming, as is the case for molecular property data. In this study, we evaluate AL for molecular property prediction, focusing on performance on out-of-distribution (OOD) data. This OOD evaluation setting mimics real-world applications but is understudied in the prior literature. We focus on predicting solvation energy from molecular structure and develop an AL framework based on prediction uncertainties derived from Evidential Deep Learning (EDL). We first train our model on an in-distribution training dataset and then progressively augment that dataset with molecules from an OOD dataset sampled from PubChem, selected either randomly or using the AL strategy. We further examine the generalization capabilities of AL by beginning with a subset of the in-distribution dataset, intentionally chosen to reduce initial diversity. Our results indicate that EDL-based AL has an advantage over random sampling. To better understand the behavior of the AL algorithm, we analyze how the similarity between the training dataset and the held-out dataset affects AL performance, and how the types of molecules selected by random sampling and by AL differ in distribution.
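The acquisition procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common Normal-Inverse-Gamma formulation of evidential regression, in which the epistemic variance of a prediction is beta / (nu * (alpha - 1)), and every name below (`epistemic_variance`, `select_batch`, `active_learning_loop`, `score_pool`) is hypothetical. The model itself is abstracted behind `score_pool`, a caller-supplied function assumed to retrain on the current training set and return one uncertainty value per pool molecule.

```python
import random


def epistemic_variance(nu, alpha, beta):
    """Epistemic variance under an assumed Normal-Inverse-Gamma output:
    Var[mu] = beta / (nu * (alpha - 1)), defined for nu > 0 and alpha > 1."""
    if nu <= 0.0 or alpha <= 1.0:
        raise ValueError("requires nu > 0 and alpha > 1")
    return beta / (nu * (alpha - 1.0))


def select_batch(uncertainty, batch_size, strategy="uncertainty", rng=None):
    """Return positions (into the pool) of the items to add next."""
    n = len(uncertainty)
    k = min(batch_size, n)
    if strategy == "random":
        rng = rng or random.Random()
        return rng.sample(range(n), k)
    if strategy == "uncertainty":
        # Most uncertain first; ties keep their original pool order.
        order = sorted(range(n), key=lambda i: uncertainty[i], reverse=True)
        return order[:k]
    raise ValueError(f"unknown strategy: {strategy!r}")


def active_learning_loop(train_ids, pool_ids, score_pool, rounds,
                         batch_size, strategy="uncertainty", rng=None):
    """Move items from the OOD pool into the training set, round by round.

    score_pool(train, pool) is a hypothetical callable that retrains the
    model on `train` and returns one uncertainty per entry of `pool`.
    """
    train, pool = list(train_ids), list(pool_ids)
    for _ in range(rounds):
        if not pool:
            break
        scores = score_pool(train, pool)
        picked = select_batch(scores, batch_size, strategy, rng)
        chosen = [pool[i] for i in picked]
        train.extend(chosen)
        chosen_set = set(chosen)
        pool = [p for p in pool if p not in chosen_set]
    return train, pool
```

Running the loop once with `strategy="uncertainty"` and once with `strategy="random"` from the same starting training set gives the two arms of the comparison; in both cases performance would be measured on a held-out OOD set that is never added to the pool.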



Article information

Article type: Paper
Submitted: 20 Oct 2025
Accepted: 30 Dec 2025
First published: 23 Jan 2026
This article is Open Access
Creative Commons BY license

RSC Adv., 2026, 16, 5281-5295


T. Yin, P. Gao, G. Panapitiya and E. G. Saldanha, RSC Adv., 2026, 16, 5281 DOI: 10.1039/D5RA08055J

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.

