SurfPro – a curated database and predictive model of experimental properties of surfactants

Abstract

Despite great industrial interest, modeling the physical properties of surfactants in water based on their molecular structure remains a challenge. A significant part of this challenge is in obtaining sufficient amounts of high-quality data. Experimentally determined properties such as the critical micelle concentration (CMC) and surface tension at CMC (γCMC) have been reported for many surfactants. However, surfactant data are scattered across many literature sources and are often reported in a form unsuitable as input for predictive models. In this work, we address this limitation by compiling the SurfPro database of surfactant properties. SurfPro consists of 1624 surfactant entries curated from 223 literature sources, containing 1395 CMC values, 972 γCMC values and more than 657 values for Γmax, C20, πCMC and Amin. However, only 647 structures have all reported properties, and for most surfactants multiple properties are missing. We trained a previously reported graph neural network architecture for single- and multi-property prediction on these incomplete data of all surfactant types in the database to accurately predict pCMC (−log10(CMC)), γCMC, Γmax and pC20. We achieved state-of-the-art performance on these four properties using an ensemble of AttentiveFP models trained on ten different folds of the training data in the multi-property setting. Finally, we leveraged the predictions and uncertainties of the ensemble model to impute all missing properties for all 977 surfactants with an incomplete set of properties. We make our curated SurfPro database, proposed test split and training datasets, the imputed database, as well as our code publicly available.
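The pCMC transform and the ensemble-based imputation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`pcmc`, `ensemble_estimate`, `impute`) are hypothetical, CMC is assumed to be given in mol/L, and the ensemble uncertainty is assumed here to be the sample standard deviation across member predictions, which the abstract does not specify.

```python
import math
from statistics import mean, stdev


def pcmc(cmc_molar):
    """Return pCMC = -log10(CMC). CMC is assumed to be in mol/L (assumption)."""
    if cmc_molar <= 0:
        raise ValueError("CMC must be positive")
    return -math.log10(cmc_molar)


def ensemble_estimate(member_predictions):
    """Combine per-model predictions into (mean, sample standard deviation).

    Using the standard deviation as the uncertainty is an assumption of this
    sketch, not a detail taken from the paper.
    """
    return mean(member_predictions), stdev(member_predictions)


def impute(measured, member_predictions):
    """Keep a measured value if present, otherwise use the ensemble mean.

    Returns (value, uncertainty, was_imputed); uncertainty is None for
    measured values because no ensemble estimate is used for them.
    """
    if measured is not None:
        return measured, None, False
    value, uncertainty = ensemble_estimate(member_predictions)
    return value, uncertainty, True


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(pcmc(1e-3))
    print(impute(None, [3.0, 3.2, 3.1]))
    print(impute(2.5, [3.0, 3.2, 3.1]))
```

In this sketch a measured property is never overwritten; only missing entries receive a predicted value together with an uncertainty, which mirrors the imputation of incomplete entries described above.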

Article information

Article type: Paper
Submitted: 13 Dec 2024
Accepted: 18 Mar 2025
First published: 19 Mar 2025
This article is Open Access
Creative Commons BY license

Digital Discovery, 2025, Advance Article

S. L. Hödl, L. Hermans, P. F. J. Dankloff, A. Piruska, W. T. S. Huck and W. E. Robinson, Digital Discovery, 2025, Advance Article, DOI: 10.1039/D4DD00393D

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
