Issue 5, 2025

Understanding the language of molecules: predicting pure component parameters for the PC-SAFT equation of state from SMILES

Abstract

A major bottleneck in developing sustainable processes and materials is a lack of property data. Recently, machine learning approaches have vastly improved previous methods for predicting molecular properties. However, these machine learning models are often not able to handle thermodynamic constraints adequately. In this work, we present a machine learning model based on natural language processing to predict pure-component parameters for the perturbed-chain statistical associating fluid theory (PC-SAFT) equation of state. The model is based on our previously proposed SMILES-to-Properties-Transformer (SPT). By incorporating PC-SAFT into the neural network architecture, the machine learning model is trained directly on experimental vapor pressure and liquid density data. Combining established physical modeling approaches with state-of-the-art machine learning methods enables high-accuracy predictions across a wide range of pressures and temperatures, while keeping the thermodynamic consistency of an equation of state like PC-SAFT. SPT_PC-SAFT demonstrates exceptional prediction accuracy even for complex molecules with various functional groups, outperforming traditional group contribution methods by a factor of four in the mean average percentage deviation. Moreover, SPT_PC-SAFT captures the behavior of stereoisomers without any special consideration. To facilitate the application of our model, we provide predicted PC-SAFT parameters of 13 279 components, making PC-SAFT accessible to all researchers.
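The accuracy comparison above is stated as a percentage deviation between predicted and experimental values. The abstract does not spell out the formula, so the following is only a minimal sketch, assuming the common definition: the mean of |predicted - experimental| / |experimental|, expressed in percent. The function name and the sample vapor pressures are hypothetical and chosen purely for illustration; they are not taken from the article.

```python
from typing import Sequence


def mean_absolute_percentage_deviation(
    predicted: Sequence[float], experimental: Sequence[float]
) -> float:
    """Return the mean of |pred - exp| / |exp| over all points, in percent.

    Assumed definition; the article may weight or group data differently.
    """
    if len(predicted) != len(experimental):
        raise ValueError("predicted and experimental must have the same length")
    if len(experimental) == 0:
        raise ValueError("at least one data point is required")
    total = 0.0
    for pred, exp in zip(predicted, experimental):
        if exp == 0:
            raise ValueError("experimental values must be non-zero")
        total += abs(pred - exp) / abs(exp)
    return 100.0 * total / len(experimental)


# Hypothetical vapor pressures in kPa, made up for illustration only.
p_experimental = [10.0, 50.0, 100.0]
p_predicted = [11.0, 48.0, 100.0]
deviation = mean_absolute_percentage_deviation(p_predicted, p_experimental)
```

Because each term is scaled by the experimental value, a metric of this kind weights low and high vapor pressures equally in relative terms, which matters when data span several orders of magnitude.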

Article information

Article type: Paper
Submitted: 14 Mar 2024
Accepted: 20 Dec 2024
First published: 29 Jan 2025
This article is Open Access under a Creative Commons BY license.

Digital Discovery, 2025, 4, 1142-1157

B. Winter, P. Rehner, T. Esper, J. Schilling and A. Bardow, Digital Discovery, 2025, 4, 1142 DOI: 10.1039/D4DD00077C

This article is licensed under a Creative Commons Attribution 3.0 Unported Licence. You can use material from this article in other publications without requesting further permissions from the RSC, provided that the correct acknowledgement is given.
