Prediction of self-diffusion coefficients via a hybrid PCP-SAFT + ANN model incorporating COSMO-SAC sigma-profile descriptors

Aliakbar Roosta; Nima Rezaei; Hamid Reza Godini

doi:10.1039/D6CP01425A

This Open Access Article is licensed under a Creative Commons Attribution 3.0 Unported Licence

DOI: 10.1039/D6CP01425A (Paper) Phys. Chem. Chem. Phys., 2026, Advance Article

Aliakbar Roosta*^ab, Nima Rezaei^a and Hamid Reza Godini^b
^aDepartment of Separation Science, School of Engineering Science, LUT University, Lappeenranta, Finland
^bDepartment of Energy and Mechanical Engineering, School of Engineering, Aalto University, Espoo, Finland. E-mail: aliakbar.roosta@aalto.fi

Received 16th April 2026, Accepted 5th June 2026

First published on 8th June 2026

Abstract

Reliable estimation of the self-diffusion coefficient is fundamental for characterizing mass transport in fluids; however, accurate prediction remains difficult because of the strong influence of thermodynamic conditions (temperature and pressure) and molecular characteristics (such as size, shape, and intermolecular forces). In this study, a hybrid predictive model is introduced that combines the PCP-SAFT equation of state with an artificial neural network (ANN) to estimate self-diffusion coefficients over a broad range of conditions. The model is developed using a dataset of 2263 experimental measurements for 67 compounds, spanning temperatures between 93.0 and 973.2 K and pressures up to 3036 bar, with self-diffusion coefficients covering nearly five orders of magnitude, from 10⁻¹² to 10⁻⁷ m² s⁻¹. To rigorously assess the predictive performance, the dataset was partitioned into 70% for training and 30% reserved for independent validation. The proposed model incorporates thermodynamic inputs, namely the density and the dimensionless residual entropy obtained from PCP-SAFT, together with molecular descriptors derived from COSMO-SAC sigma-profiles. The selected ANN architecture, comprising two hidden layers with 14 and 7 neurons, respectively, achieves R² values of 0.9937 and 0.9763 and AARD values of 8.89% and 15.89% for the training and testing datasets, respectively. Overall, the proposed framework offers a unified, reliable model for predicting diffusion behavior under diverse thermodynamic conditions.

1 Introduction

The self-diffusion coefficient is a fundamental transport property. It characterizes molecular diffusion in fluids and plays a critical role in a wide range of engineering and scientific applications, including materials science,¹ separation processes,² reaction engineering,³ energy systems,⁴ and corrosion.⁵ However, predicting self-diffusion coefficients remains challenging because of their strong dependence on temperature, pressure, intermolecular interactions, and molecular structure (e.g., size, shape, and functional groups), whose effects on transport behavior are difficult to capture in a unified predictive model.⁶

Experimental measurements of self-diffusion coefficients are often time-consuming, costly, and limited to specific compounds and conditions. As a result, there is a growing need for predictive models capable of estimating diffusion coefficients over wide thermodynamic ranges and for diverse chemical families. Conventional approaches such as theoretical^7–9 and semi-empirical models^10–12 have been developed to describe the self-diffusion behavior. While such methods can provide reasonable accuracy for specific systems, they are typically restricted to the compounds and the experimental conditions studied, limiting their applicability to new or uncharacterized compounds and their predictability outside the studied conditions.

To address these limitations, recent efforts have focused on data-driven approaches, particularly machine learning techniques such as artificial neural networks (ANNs), to establish relationships between molecular structure, thermodynamic properties, and transport behavior.^13–16

As a potential source of molecular-level input for these models, COSMO-based methods have emerged as powerful tools for describing molecular interactions through sigma-profiles, which represent the surface charge density distribution of the molecules.^17,18 These profiles can be transformed into compact molecular descriptors that capture key molecular features such as polarity, charge distribution, and hydrogen-bonding capability. When combined with machine learning models, these descriptors allow molecular-level information to be incorporated directly into the predictions.^19–21

In this work, we propose a hybrid modeling approach that integrates COSMO-SAC-derived molecular descriptors with thermodynamic properties (density and the dimensionless residual entropy) calculated from the perturbed-chain polar statistical associating fluid theory (PCP-SAFT) equation of state within an ANN framework. The developed model is designed to provide accurate predictions of self-diffusion coefficients for different chemicals across a wide range of temperatures and pressures. A notable advantage of the proposed model is its broad applicability across extensive operating conditions, including elevated temperatures (up to 973 K) and pressures (up to 3036 bar), which are seldom covered in existing studies. This capability is particularly relevant for practical applications where fluids are exposed to extreme conditions, such as high-temperature and high-pressure processes, including supercritical separation.^22,23

The remainder of this paper is structured as follows. Section 2 describes the dataset compilation, the extraction of molecular descriptors from COSMO-SAC sigma-profiles, and the implementation of the PCP-SAFT equation of state, along with the design and training of the ANN model. Section 3 presents a detailed evaluation of the model performance using statistical metrics, graphical analyses, and sensitivity analysis to identify the most influential input variables. Finally, Section 4 summarizes the main findings and outlines potential directions for future work.

2 Methods and data collection

2.1 Self-diffusion coefficient data sources

Experimental self-diffusion coefficient data were gathered from the literature across a wide spectrum of chemical families, encompassing alkanes, cycloalkanes, alcohols, polyols, aromatics, ketones, ethers, water, olefins, nitriles, amines, and halogenated species. In total, the assembled dataset includes 2263 measurements for 67 different compounds, ensuring broad chemical diversity. The collected data also cover extensive thermodynamic conditions, with temperatures between 93 and 973 K, pressures from 1 to 3036 bar, and self-diffusion coefficients from 1.89 × 10⁻¹² to 3.61 × 10⁻⁷ m² s⁻¹. Such a wide range of input data enables the model to be trained and evaluated across significantly different conditions affecting molecular diffusion. To enhance the reliability of the dataset, the data were obtained from multiple independent literature sources, thereby reducing potential biases arising from individual studies and better representing experimental variability. An overview of the compiled dataset is provided in Table 1, which summarizes the investigated compounds, the corresponding operating conditions, and the number of available data points. To develop and validate the ANN model, the dataset was divided into training and testing subsets while also considering the chemical classes, so that both subsets include a wide range of chemical types. Specifically, 43 compounds (1586 data points) were assigned to the training set, while the remaining 24 compounds (677 data points) were reserved exclusively for testing. This strategy, corresponding to approximately 30% of the data used for validation, ensures that the model is evaluated on entirely unseen compounds, providing a stringent assessment of its predictive capability and generalization performance.
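The compound-level partition described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name `split_by_compound`, the record layout, and the toy measurements are hypothetical, and only the principle (all measurements of a compound go to the same subset) is taken from the text.

```python
def split_by_compound(records, test_compounds):
    """Split measurement records so that no compound appears in both subsets.

    records: iterable of (compound, T_K, P_bar, D_m2_s) tuples (hypothetical layout).
    test_compounds: set of compound names reserved for testing.
    """
    train, test = [], []
    for rec in records:
        (test if rec[0] in test_compounds else train).append(rec)
    return train, test

# Toy records with made-up values, for illustration only.
records = [
    ("methane", 110.0, 1.0, 3.0e-9),
    ("methane", 300.0, 500.0, 9.0e-8),
    ("water", 298.15, 1.0, 2.3e-9),
    ("methanol", 298.15, 1.0, 2.4e-9),
    ("methanol", 350.0, 100.0, 5.0e-9),
]
train, test = split_by_compound(records, test_compounds={"methanol"})

# Fraction of measurements held out, using the counts reported in Table 1.
test_share = 677 / (1586 + 677)
```

With the counts in Table 1, the held-out share is 677/2263, i.e. about 29.9% of the measurements, which matches the stated 30%.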

Table 1 Summary of experimental self-diffusion coefficient datasets employed for the ANN model training and validation

No.	Name	CAS no.	No. of data	T/K	P/bar	Ref.
Train data
1	Methane	74-82-8	117	93–454	1–898.3	34–41
2	Ethane	74-84-0	54	136–454	43.6–978.5	34, 36, 37 and 39
3	Propane	74-98-6	21	112–453	14.7–500	34 and 39
4	Butane	106-97-8	9	150–451	50–500	34
5	Pentane	109-66-0	29	174–450	1–981	34 and 42–44
6	Heptane	142-82-5	47	186.1–360.6	1–981	34 and 42–44
7	Octane	111-65-9	36	248.14–383.7	1–998	34, 44 and 45
8	Decane	124-18-5	39	247.86–448	1–750	34 and 42–45
9	Undecane	1120-21-4	17	293–353	1–981	34 and 45
10	Dodecane	112-40-3	33	268.75–434.7	1–510	34, 45 and 46
11	Tetradecane	629-59-4	23	279.36–443	1–750	34, 45 and 46
12	Pentadecane	629-62-9	16	288.16–353	1–981	34 and 45
13	Hexadecane	544-76-3	31	292.68–472.5	1–996	34 and 45
14	Heptadecane	629-78-7	14	303–353	1–981	34
15	Octadecane	593-45-3	11	301.86–425.8	1	34
16	Eicosane	112-95-8	6	323.16–443.7	1	34
17	Tetracosane	646-31-1	10	322.16–423.7	1	34
18	2-Methylpentane	107-83-5	5	200–308.2	1	34
19	3-Methylpentane	96-14-0	8	200–313.2	1	34 and 47
20	2,3-Dimethylbutane	79-29-8	11	175.48–453	1–500	34
21	2,2-Dimethylbutane	75-83-2	18	262.37–450	1–600	34 and 47
22	Cyclopentane	287-92-3	12	273.16–328	1–750	34
23	Cyclohexane	110-82-7	84	281.7–393.2	1–900	34, 44–46, 48 and 49
24	Cycloheptane	291-64-5	7	288.16–348.8	1	34
25	Ethanol	64-17-5	54	173–437	1–931	34, 45 and 50–53
26	1-Propanol	71-23-8	32	212–441	1–750	34, 45 and 53
27	1-Butanol	71-36-3	11	268.16–353.2	1	34, 45, 54 and 55
28	2-Pentanol	6032-29-7	46	237.1–483.1	50–500	34 and 56
29	3-Pentanol	584-02-1	52	249.7–474.5	50–500	34 and 56
30	1-Pentanol	71-41-0	22	213–428.6	1–500	34 and 56
31	1-Hexanol	111-27-3	5	278.16–338.2	1	34 and 55
32	1-Octanol	111-87-5	9	288.16–343.2	1	34
33	Glycerol	56-81-5	35	296.8574–435.1	1	57–60
34	Benzene	71-43-2	146	279.96–373.2	1–980.7	34, 44, 48–50 and 61–64
35	Toluene	108-88-3	61	175.4286–729.2	1–997	34 and 61
36	o-Terphenyl	84-15-1	16	328.16–438.2	1	65
37	Acetone	67-64-1	20	182.86–323.2	1	34
38	Water	7732-18-5	264	273–973.2	1–976	34, 66 and 67
39	Tetrahydrofuran	109-99-9	7	180.56–308.2	1	34
40	Ethylene	74-85-1	62	123.15–348.2	20.4–810.6	34 and 36
41	Carbon disulfide	75-15-0	10	268.2–313.2	1–811	34
42	1,2-Dichloroethane	107-06-2	12	278.15–298.2	1–2795	68
43	Acetonitrile	75-05-8	64	238.2–343.2	1–3036	69

Test data
44	Ammonia	7664-41-7	18	199.2–473	1–750	34, 70 and 71
45	2-Propanol	67-63-0	10	263–360	1–500	72
46	Methanol	67-56-1	43	157–453	1–981	50–52 and 72
47	Tridecane	629-50-5	18	288.2–353	1–981	34 and 45
48	Nonane	111-84-2	38	235.5–403.2	1–990	34, 44 and 45
49	Hexane	110-54-3	72	188.5–443	1–998	34, 44, 47, 61, 72 and 73
50	Isopentane	78-78-4	24	298–328	1–2000	74
51	Bromoform	75-25-2	8	283.2–343.2	1	75
52	N,N-Dimethylacetamide	127-19-5	35	255–468	1–2000	76
53	Dimethyl ether	115-10-6	40	184.5–458	500–2000	77
54	Diiodomethane	75-11-6	24	285.7–351.3	1	78
55	Dichloromethane	75-09-2	36	186–406	1–2000	79
56	Chloroform	67-66-3	40	217–397	1–1500	79
57	Carbon tetrachloride	56-23-5	3	313.2–333.2	1	80
58	Chlorotrifluoromethane	75-72-9	60	133–433	250–2000	81
59	Bromotrifluoromethane	75-63-8	59	141–432	250–2000	81
60	Fluorobenzene	462-06-6	13	240–360	1	82
61	Iodobenzene	591-50-4	15	330–440	1	82
62	Bromobenzene	108-86-1	18	250–420	1	82
63	Trimethylamine	75-50-3	44	174–423	100–2000	83
64	N,N-Dimethylformamide	68-12-2	36	243–448	1–2000	84
65	Propylene glycol	57-55-6	4	304–318	1	85
66	Dibromomethane	74-95-3	8	285–363	1	86
67	1,2-Dibromoethane	106-93-4	11	285–400	1	86

2.2 Reference diffusion coefficient and dimensionless scaling

Rosenfeld^24,25 proposed that the transport properties, when expressed in dimensionless form, can be correlated to the residual entropy (s^res). In the case of self-diffusion, this relationship can be written as:


D^{*} = \frac{D}{D^{\mathrm{ref}}} = f\left(s^{\mathrm{res}}\right)	(1)

where D*, D and D^ref denote the dimensionless, actual, and reference self-diffusion coefficients, respectively. Residual entropy represents the deviation from ideal-gas behavior and reflects the reduction in configurational freedom due to intermolecular interactions, making it a compact thermodynamic descriptor of structural disorder and non-ideality in the fluid.²⁶ This concept has been extensively validated for simple fluids, where diffusion, viscosity, and thermal conductivity follow quasi-universal relationships when expressed in reduced form as functions of excess entropy.²⁷

However, entropy-scaling relationships are not universally exact. Deviations have been reported for complex fluids, mixtures, strongly associating systems, and systems exhibiting thermodynamic anomalies, where the coupling between the molecular structure and transport becomes more intricate. In particular, near critical regions, at elevated densities, or in systems with strong directional interactions, local structuring and fluctuations may limit the applicability of simple entropy-based scaling.²⁷

In the present work, the Rosenfeld scaling concept is employed as a physically motivated normalization framework for the self-diffusion coefficient, while the final nonlinear relationship is learned using an ANN that incorporates both thermodynamic variables and molecular descriptors.

Several formulations are suggested in the literature for defining the reference self-diffusion coefficient, including those proposed by Rosenfeld,²⁴ Chapman-Enskog,^28,29 and Bretonnet.³⁰ In addition, various empirical relationships are reported to correlate dimensionless self-diffusion (D*) with residual entropy.^31–33 However, these correlations are typically parameterized for specific compounds, and their predictive capability is restricted to the systems studied, limiting their applicability to broader chemical systems.

In this work, a generalized predictive model is developed to estimate self-diffusion coefficients for a wide variety of compounds over extended temperature and pressure ranges. First, the Rosenfeld²⁴ scaling approach is adopted to define the reference for the self-diffusion coefficient:


D^{\mathrm{ref}} = \rho_N^{-1/3} \left( \frac{R T}{M} \right)^{1/2}	(2)

where ρ_N denotes the number density of molecules (m⁻³), T (K) is the temperature, M (kg kmol⁻¹) stands for the molar mass, and R (8314.46 J K⁻¹ kmol⁻¹) is the universal gas constant. The reference diffusion coefficient has dimensions of diffusivity. The term ρ_N^(−1/3) represents a characteristic molecular length scale, while (RT/M)^(1/2) represents a characteristic molecular thermal velocity. Their product, therefore, has units of m² s⁻¹, consistent with a diffusion coefficient. In this work, D^ref is not used as an independent model for diffusion. Instead, it is used to nondimensionalize the experimental self-diffusion coefficient. The effects of the molecular structure are subsequently incorporated through PCP-SAFT-derived thermodynamic properties and COSMO-SAC molecular descriptors in the ANN framework.

With the reference value defined, the subsequent sections present the methodology for developing a generalized model that predicts the dimensionless self-diffusion coefficient (D*); the actual self-diffusion coefficient (D) is then recovered from D* using the reference value D^ref.²⁴
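The nondimensionalization can be illustrated with a short sketch. It assumes the Rosenfeld-type form D^ref = ρ_N^(−1/3) (RT/M)^(1/2) implied by the length-scale and velocity terms described above, converts a molar density in kmol m⁻³ to a number density through the Avogadro constant, and assumes a base-10 logarithm for the network output (the text only says "logarithm"). The function names and the water-like state point are illustrative choices, not values from the paper.

```python
import math

R = 8314.46          # universal gas constant, J K^-1 kmol^-1 (value quoted in the text)
N_A = 6.02214076e26  # Avogadro constant expressed per kmol

def d_ref(rho_kmol_m3, T, M):
    """Reference diffusivity in m^2 s^-1.

    Assumed form: D_ref = rho_N**(-1/3) * sqrt(R*T/M), with rho_N the number density.
    """
    rho_n = rho_kmol_m3 * N_A          # molecules per m^3
    length = rho_n ** (-1.0 / 3.0)     # characteristic molecular length, m
    velocity = math.sqrt(R * T / M)    # characteristic thermal velocity, m s^-1
    return length * velocity

def d_star(D, rho_kmol_m3, T, M):
    """Dimensionless self-diffusion coefficient D* = D / D_ref."""
    return D / d_ref(rho_kmol_m3, T, M)

def d_from_log_dstar(log10_dstar, rho_kmol_m3, T, M):
    """Recover D from a predicted log10(D*); base 10 is an assumption."""
    return (10.0 ** log10_dstar) * d_ref(rho_kmol_m3, T, M)

# Illustrative state point: liquid water near ambient conditions (approximate inputs).
ref = d_ref(rho_kmol_m3=55.3, T=298.15, M=18.015)
```

For this state point the reference value is of order 10⁻⁷ m² s⁻¹, so a liquid-phase diffusivity of order 10⁻⁹ m² s⁻¹ corresponds to a D* of order 10⁻².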

2.3 COSMO-SAC sigma-profile and molecular descriptor extraction

The COSMO-SAC sigma-profiles of all compounds considered in this study were obtained from the literature and used to derive the molecular descriptors for developing a correlation with the dimensionless self-diffusion coefficient (D*).⁸⁷ Each sigma-profile describes the distribution of surface charge density (e Å⁻²) over 51 discrete intervals spanning from −0.025 e Å⁻² to +0.025 e Å⁻². For each bin j, with surface charge density σ_j, COSMO-SAC provides the associated surface area, denoted as A_j (in Å²). The total surface area of each molecule (A_tot) is obtained by summing the contributions from all bins:


A_{\mathrm{tot}} = \sum_{j} A_j	(3)

In addition, key molecular descriptors can be derived using sigma-moments defined by eqn (4)–(8):


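M_1 = \sum_{j} A_j \sigma_j	(4)

M_2 = \sum_{j} A_j \sigma_j^{2}	(5)

M_3 = \sum_{j} A_j \sigma_j^{3}	(6)

M_1^{\mathrm{HBD}} = \sum_{j} A_j \max\left(0, -\sigma_j - \sigma^{\mathrm{HB}}\right)	(7)

M_1^{\mathrm{HBA}} = \sum_{j} A_j \max\left(0, \sigma_j - \sigma^{\mathrm{HB}}\right)	(8)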

These moments characterize different aspects of the surface charge distribution: M₁ gives the net surface charge, M₂ reflects the polarity, and M₃ describes the asymmetry. However, during model development, M₃ was found to have a negligible influence on the prediction of self-diffusion coefficients. The sensitivity analysis on the experimental data showed that M₃ contributes only approximately 3% to the prediction of the self-diffusion coefficient, which is significantly lower than the contributions from the other descriptors considered in this work; therefore, M₃ was excluded from the input variables.

M^HBD₁ and M^HBA₁ quantify hydrogen-bond donor and acceptor strengths, respectively. The threshold value σ^HB = 0.008 e Å⁻² separates nonpolar and polar surface segments.^17,18
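A minimal sketch of the descriptor extraction is given below. It assumes area-weighted moments, M_i = Σ_j A_j σ_j^i, which is consistent with the units listed for the descriptors (e for M₁, e² Å⁻² for M₂). For the hydrogen-bonding moments it assumes the common COSMO convention in which donor segments lie at σ < −σ^HB and acceptor segments at σ > +σ^HB; this sign convention and the function names are assumptions, and the sigma-profile used here is made up.

```python
SIGMA_HB = 0.008  # hydrogen-bonding threshold from the text, e/A^2

def sigma_grid(n_bins=51, lo=-0.025, hi=0.025):
    """Bin centres of the sigma-profile grid, e/A^2."""
    step = (hi - lo) / (n_bins - 1)
    return [lo + k * step for k in range(n_bins)]

def descriptors(sigma, area):
    """Five molecular descriptors from bin centres (e/A^2) and bin areas (A^2)."""
    pairs = list(zip(sigma, area))
    return {
        "A_tot": sum(area),
        "M1": sum(a * s for s, a in pairs),
        "M2": sum(a * s ** 2 for s, a in pairs),
        # Assumed convention: donors at strongly negative sigma, acceptors at positive.
        "M_HBD1": sum(a * max(0.0, -s - SIGMA_HB) for s, a in pairs),
        "M_HBA1": sum(a * max(0.0, s - SIGMA_HB) for s, a in pairs),
    }

# Made-up symmetric profile: 10 A^2 in the neutral bin, 2 A^2 at sigma = -0.012 and +0.012.
sigma = sigma_grid()
area = [0.0] * 51
area[25] = 10.0   # sigma = 0
area[13] = 2.0    # sigma = -0.012
area[37] = 2.0    # sigma = +0.012
desc = descriptors(sigma, area)
```

For this symmetric toy profile the net charge M₁ vanishes while M₂ and both hydrogen-bonding moments are positive, which illustrates why M₁ alone cannot distinguish a polar but neutral molecule from a nonpolar one.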

2.4 PCP-SAFT equation of state

Previous studies have demonstrated that the PCP-SAFT equation of state (EoS)^88,89 accurately predicts thermodynamic properties such as density and residual entropy for a wide variety of fluids. PCP-SAFT extends the original PC-SAFT EoS^90,91 by incorporating polar interactions in addition to dispersion and association effects, thereby improving accuracy for polar compounds. In this work, we use PCP-SAFT to generate these properties for all compounds considered, including nonpolar, polar, and associating systems.

The residual Helmholtz energy is expressed in eqn (9):^88,89


a^res = a^hc + a^d + a^p + a^assoc	(9)

The contributions correspond to hard-chain repulsion (hc), dispersion interactions (d), polar interactions (p), and association (assoc), respectively. For non-associating, nonpolar compounds, the model requires three parameters: segment number (m), segment diameter (σ), and dispersion energy (ε). Associating fluids require two additional parameters, the association energy and the association volume, while polar compounds are characterized using the dipole moment (μ^D).

The dimensionless form of residual entropy is calculated from the Helmholtz energy as:⁹⁰


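\frac{s^{\mathrm{res}}}{R} = -T \left( \frac{\partial a^{\mathrm{res}}}{\partial T} \right)_{\rho} - a^{\mathrm{res}} + \ln Z	(10)

where a^res is taken in its reduced (dimensionless) form and Z is the compressibility factor obtained from PCP-SAFT.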

In addition to residual entropy, the density (ρ) calculated from the EoS is included as an input variable, because molecular diffusion depends on fluid density.
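The route from a Helmholtz-energy model to the dimensionless residual entropy can be sketched numerically. The example assumes the relation s^res/R = −T(∂a^res/∂T)_ρ − a^res + ln Z, with a^res the reduced residual Helmholtz energy, as in the PC-SAFT literature. PCP-SAFT itself is not implemented here: a van der Waals fluid with arbitrary parameters stands in as the Helmholtz model, chosen only because its exact result is known and the finite-difference derivative can be checked against it.

```python
import math

R = 8314.46  # J K^-1 kmol^-1

def s_res_over_R(a_res, Z, T, rho, dT=1e-3):
    """Dimensionless residual entropy from a reduced residual Helmholtz energy.

    a_res: callable (T, rho) -> reduced residual Helmholtz energy (dimensionless).
    Z: compressibility factor at (T, rho).
    Assumed relation: s_res/R = -T*(d a_res/dT)_rho - a_res + ln Z.
    The temperature derivative uses a central finite difference.
    """
    da_dT = (a_res(T + dT, rho) - a_res(T - dT, rho)) / (2.0 * dT)
    return -T * da_dT - a_res(T, rho) + math.log(Z)

# Stand-in model (NOT PCP-SAFT): van der Waals fluid with arbitrary parameters.
A_VDW = 5.0e5   # attraction parameter, J m^3 kmol^-2 (illustrative)
B_VDW = 0.04    # covolume, m^3 kmol^-1 (illustrative)

def a_res_vdw(T, rho):
    return -math.log(1.0 - B_VDW * rho) - A_VDW * rho / (R * T)

def z_vdw(T, rho):
    return 1.0 / (1.0 - B_VDW * rho) - A_VDW * rho / (R * T)

T, rho = 400.0, 5.0  # K, kmol m^-3
value = s_res_over_R(a_res_vdw, z_vdw(T, rho), T, rho)

# For the van der Waals stand-in the exact result is ln(1 - b*rho) + ln Z.
exact = math.log(1.0 - B_VDW * rho) + math.log(z_vdw(T, rho))
```

In the van der Waals case the attractive term cancels in −T(∂a^res/∂T)_ρ − a^res, so only the excluded-volume term and ln Z remain. In an actual PCP-SAFT calculation every contribution to a^res would be differentiated, typically analytically.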

2.5 ANN structure and training approach

The self-diffusion coefficient was modeled using a feedforward ANN implemented in MATLAB (R2025a, neural network toolbox). The model combines thermodynamic variables (dimensionless form of residual entropy and density) with COSMO-SAC-derived molecular descriptors to predict diffusion behavior. The input vector consists of seven variables (Table 2), while the output corresponds to the logarithm of the dimensionless self-diffusion coefficient (D*).

Table 2 Definition of input variables used in the ANN model

	Input variables	Description
COSMO-SAC parameters	A_tot	Total surface area of each molecule (Å²)
	M₁	Index of net surface charge (e)
	M₂	Index of polarity (e² Å⁻²)
	M^HBD₁	Index of hydrogen-bond donor strength (e)
	M^HBA₁	Index of hydrogen-bond acceptor strength (e)
Thermodynamic properties	s^res/R	Dimensionless form of residual entropy
	ρ	Molar density (kmol m⁻³)

Prior to training, all inputs and outputs were linearly scaled to the range [−1, 1], which improved numerical stability and facilitated convergence. The same scaling parameters derived from the training data were applied to the testing dataset. To identify an optimal and robust network configuration, we systematically explored different architectures, activation functions, and training algorithms. Both shallow and deep network structures were examined, including single hidden layers with 10–40 neurons and two-layer configurations with varying neuron counts.
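The scaling step can be sketched as below. The key point from the text is that the minimum and maximum are taken from the training data only and then reused for the test data. The function names and the density values are illustrative; this is a plain re-implementation of a linear map to [−1, 1], not the MATLAB routine used by the authors.

```python
def fit_minmax(columns):
    """Per-variable (min, max) pairs computed from training data only."""
    return [(min(col), max(col)) for col in columns]

def apply_minmax(value, lo, hi):
    """Map value linearly so that lo -> -1 and hi -> +1."""
    return 2.0 * (value - lo) / (hi - lo) - 1.0

def invert_minmax(scaled, lo, hi):
    """Undo apply_minmax, e.g. to bring a network output back to physical units."""
    return 0.5 * (scaled + 1.0) * (hi - lo) + lo

# Made-up training densities (kmol m^-3) for a single input variable.
train_density = [2.0, 10.0, 30.0]
(lo, hi), = fit_minmax([train_density])

scaled_train = [apply_minmax(v, lo, hi) for v in train_density]
# A test point outside the training range maps outside [-1, 1].
scaled_test = apply_minmax(35.0, lo, hi)
```

Because the parameters come from the training set, a test compound whose density or descriptors exceed the training range is mapped beyond ±1, which signals extrapolation.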

Three activation functions (tansig, logsig, and poslin) were evaluated for the hidden layers, while a linear activation was consistently used in the output layer for the regression task. Training was conducted using multiple optimization algorithms, including Levenberg–Marquardt, scaled conjugate gradient, and Bayesian regularization. Model performance on the training and independent testing datasets was assessed through statistical metrics such as the coefficient of determination (R²), mean absolute error (MAE), average absolute relative deviation (AARD%), and maximum absolute relative deviation. In addition, we analyzed parity plots, error distributions, and error trends across diffusion ranges to evaluate the accuracy and robustness of the model.
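The metrics named above can be written out as a short sketch. The definitions below are the conventional ones and are assumed to match those used in the paper; the text does not state whether R² is computed on D or on the logarithm of D*, so the functions simply operate on whichever pair of lists is supplied. The sample values are made up.

```python
def ard_percent(y_exp, y_pred):
    """Absolute relative deviation of each point, in percent."""
    return [100.0 * abs(p - e) / abs(e) for e, p in zip(y_exp, y_pred)]

def aard_percent(y_exp, y_pred):
    """Average absolute relative deviation, in percent."""
    ard = ard_percent(y_exp, y_pred)
    return sum(ard) / len(ard)

def max_ard_percent(y_exp, y_pred):
    """Largest absolute relative deviation, in percent."""
    return max(ard_percent(y_exp, y_pred))

def mae(y_exp, y_pred):
    """Mean absolute error, in the units of the data."""
    return sum(abs(p - e) for e, p in zip(y_exp, y_pred)) / len(y_exp)

def r2(y_exp, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_exp = sum(y_exp) / len(y_exp)
    ss_res = sum((e - p) ** 2 for e, p in zip(y_exp, y_pred))
    ss_tot = sum((e - mean_exp) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot

# Made-up experimental and predicted diffusivities, m^2 s^-1.
d_exp = [1.0e-9, 2.0e-9, 4.0e-9]
d_pred = [1.1e-9, 1.8e-9, 4.0e-9]
```

MAE is dominated by the largest diffusivities in the set, whereas AARD weights every point by its own magnitude, so the two metrics can rank models differently when the data span several orders of magnitude.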

3 Results and discussion

In this section, the development and validation of the hybrid model, combining PCP-SAFT and an artificial neural network (ANN), are presented for predicting the self-diffusion coefficient. The predictive capability of the model is systematically evaluated through comparison with experimental data, along with statistical performance indicators. In addition, a sensitivity analysis is conducted to determine the relative importance of the input variables.

3.1 Model development and evaluation

A range of ANN architectures was initially constructed and trained. The resulting models were then ranked based on their predictive performance on the test dataset, using statistical criteria such as the coefficient of determination (R²), mean absolute error (MAE), and average absolute relative deviation (AARD). The architecture corresponding to the best overall performance was selected for further analysis.

To justify the selected ANN topology, several network architectures were evaluated, and some of them are summarized in Table 3. Networks with fewer neurons, such as 8-4 and 10-5, showed higher AARD values for both training and testing datasets, indicating insufficient flexibility to capture the nonlinear relationship between the input descriptors and the dimensionless self-diffusion coefficient. Increasing the number of neurons improved the training performance; however, architectures larger than 14-7 led to reduced testing accuracy despite lower training errors, suggesting the onset of overfitting. The 14-7 architecture provided the best compromise between model complexity and generalization capability, achieving high accuracy for the training data while maintaining the lowest testing AARD among the evaluated structures. Therefore, this topology was selected as the final ANN configuration.

Table 3 Effect of the ANN architecture on model performance

Hidden-layer neurons	Train AARD%	Test AARD%	Comment
8-4	15.36	21.84	Underfitting
10-5	11.92	19.96	Improved but less accurate
12-6	9.84	17.51	Good performance
14-7	8.89	15.89	Selected model
16-8	7.45	18.54	Slight overfitting
20-10	6.43	22.11	Overfitting

In addition, to evaluate the statistical robustness and generalization capability of the 14-7 architecture model, a repeated compound-wise validation procedure was performed. In this approach, the dataset was randomly partitioned multiple times at the compound level, ensuring that each testing set contained entirely unseen chemical species. The ANN model was retrained for each split, and the resulting performance metrics were statistically analyzed.

As shown in Table 4, the variation in model performance across different splits is relatively small. The AARD and R² values for both training and testing datasets exhibit limited standard deviations, and the corresponding 95% confidence intervals remain narrow. This indicates that the predictive performance of the model is not sensitive to the specific selection of compounds in the training or testing sets. Overall, this repeated validation analysis provides strong evidence that the hybrid PCP-SAFT + ANN model exhibits both robustness and generalizability.

Table 4 Statistical robustness of the model based on repeated compound-wise data splits, reported as mean values ±95% confidence intervals

Dataset	Training	Testing
R²	0.9937 ± 0.0038	0.9763 ± 0.0131
AARD%	8.89 ± 1.07	15.89 ± 2.08
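How a mean and a 95% confidence interval could be obtained from repeated splits is sketched below. The number of repetitions and the per-split results are not reported in the text, so the values here are made up. The interval uses a normal approximation; for a small number of repeats a Student t quantile would give a wider interval.

```python
import math
import statistics

def mean_ci95(values):
    """Mean and half-width of a normal-approximation 95% confidence interval."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)                  # sample standard deviation
    z = statistics.NormalDist().inv_cdf(0.975)     # two-sided 95% quantile
    return mean, z * sd / math.sqrt(n)

# Made-up test AARD% values from five hypothetical compound-wise splits.
test_aard = [14.0, 16.0, 15.0, 17.0, 18.0]
mean, half_width = mean_ci95(test_aard)
```

The result would be reported as mean ± half-width, in the same format as Table 4.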

Table 5 presents a statistical evaluation of the 14-7 architecture model using training and independent testing datasets. The results clearly demonstrate the high predictive accuracy of the proposed hybrid framework. For the training dataset (1586 data points), the model achieves a coefficient of determination of R² = 0.9937 along with an MAE of 9.23 × 10⁻¹⁰ m² s⁻¹ and an AARD of 8.89%. These low error values, combined with the high R², indicate that the ANN successfully captures the complex and nonlinear dependence of the self-diffusion coefficient on the selected input variables. The predictive performance remains robust when evaluated against the independent testing dataset (677 data points). In this case, the model yields R² = 0.9763, MAE = 4.32 × 10⁻¹⁰ m² s⁻¹, and AARD = 15.89%. Considering the full dataset (2263 data points), the overall performance remains consistently high, with R² = 0.9839. The model maintains reliable accuracy across a wide range of self-diffusion coefficients, spanning approximately five orders of magnitude (1.89 × 10⁻¹²–3.61 × 10⁻⁷ m² s⁻¹). This broad applicability demonstrates the robustness of the hybrid PCP-SAFT + ANN approach for predicting self-diffusion behavior in diverse chemical systems.

Table 5 Statistical evaluation of the 14-7 architecture model for self-diffusion coefficient prediction using training and testing datasets

Set	No. of data	R²	MAE (m² s⁻¹)	AARD%	Max ARD%
Train	1586	0.9937	9.23 × 10⁻¹⁰	8.89	54.38
Test	677	0.9763	4.32 × 10⁻¹⁰	15.89	61.14
Total	2263	0.9839	7.76 × 10⁻¹⁰	10.98	61.14
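As a consistency check on Table 5, the MAE and AARD of the full dataset should equal the sample-weighted averages of the training and testing values, because both metrics are plain means over data points. The short calculation below uses only numbers from the table.

```python
n_train, n_test = 1586, 677
n_total = n_train + n_test

# Sample-weighted averages of the per-set metrics from Table 5.
mae_total = (n_train * 9.23e-10 + n_test * 4.32e-10) / n_total   # m^2 s^-1
aard_total = (n_train * 8.89 + n_test * 15.89) / n_total          # percent

# The overall maximum ARD is the larger of the two per-set maxima.
max_ard_total = max(54.38, 61.14)
```

The weighted averages reproduce the tabulated totals (7.76 × 10⁻¹⁰ m² s⁻¹ and 10.98%). R² cannot be pooled in this way, since it depends on the variance of each set about its own mean.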

Fig. 1 shows the structure of the proposed hybrid framework, which integrates thermodynamic information from the PCP-SAFT EoS with a data-driven ANN model. The input layer consists of seven neurons representing key descriptors, including COSMO-SAC-derived molecular parameters, dimensionless residual entropy, and density obtained from PCP-SAFT calculations. These inputs provide both molecular-level and thermodynamic information, enabling a physically informed prediction. The proposed ANN includes two hidden layers with 14 and 7 neurons, respectively. This configuration was found to be sufficiently flexible to capture the nonlinear interactions between the descriptors without overfitting the data. The output layer contains a single neuron that predicts the logarithm of the dimensionless self-diffusion coefficient (D*), as defined in Section 2.5. The complete set of model parameters and the implementation details are provided in the SI.


	Fig. 1 Schematic representation of the hybrid PCP-SAFT + ANN framework used for self-diffusion coefficient prediction.
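The 7-14-7-1 topology can be sketched as a plain forward pass. The weights below are random placeholders, not the trained parameters (those are in the SI), and the use of tanh (MATLAB's tansig) in the hidden layers is an assumption, since the text lists the candidate activations without stating which one was selected. Inputs and output are in the scaled [−1, 1] space.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Random placeholder weights and biases for one dense layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, biases

def dense(x, layer, activation):
    """Apply one dense layer: activation(W x + b)."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Hidden layers use tanh (assumed); the output layer is linear."""
    h = x
    for layer in layers[:-1]:
        h = dense(h, layer, math.tanh)
    return dense(h, layers[-1], lambda z: z)[0]

rng = random.Random(0)
layers = [init_layer(7, 14, rng), init_layer(14, 7, rng), init_layer(7, 1, rng)]

# Count of adjustable weights and biases in the 7-14-7-1 network.
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)

# Scaled output for a scaled input vector of seven zeros (untrained network).
y_scaled = forward([0.0] * 7, layers)
```

This topology has 225 adjustable parameters (112 + 105 + 8), which is small compared with the 1586 training points.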

Fig. 2 presents the parity plot comparing the predicted and experimental values of the self-diffusion coefficient (D) for both the training and testing datasets. The close alignment of the data points along the diagonal reference line (y = x) indicates a high level of agreement between the model predictions and experimental measurements. The model maintains a strong predictive accuracy over a broad range of self-diffusion coefficients, spanning approximately five orders of magnitude (10⁻¹² to 10⁻⁷ m² s⁻¹). This wide coverage highlights the capability of the hybrid PCP-SAFT + ANN framework to reliably capture diffusion behavior for systems with significantly different transport characteristics. Data points corresponding to the training set (blue circles) are densely distributed around the parity line, confirming that the model has effectively learned the underlying nonlinear relationships between the input descriptors and the target property. More importantly, the testing dataset (green triangles), which includes compounds not involved in the training process, also follows the parity line closely. This demonstrates that the model retains its high predictive accuracy when applied to unseen data. No noticeable systematic bias or deviation is observed across the entire range of D values, further supporting the robustness and stability of the selected network architecture. Overall, the results confirm that the proposed hybrid model provides reliable and consistent predictions of self-diffusion coefficients across diverse chemical systems and thermodynamic conditions.


	Fig. 2 Parity plot comparing the estimated and experimental self-diffusion coefficients for the training (1586 points) and testing (677 points) datasets.

Fig. 3 shows the distribution of relative error residuals for both the training and testing datasets as a function of the experimental self-diffusion coefficient. The error profiles provide additional insight into the consistency and reliability of the developed model. For the training dataset, the relative errors are predominantly distributed around zero across the entire range of diffusion coefficients. The narrow spread of the data indicates that the model achieves high accuracy with minimal dispersion, confirming its ability to represent the underlying relationships within the training data. In the case of the testing dataset, a slightly broader distribution of errors is observed, which is expected for data not included during the model calibration. Nevertheless, the errors are randomly distributed around zero, and no systematic overprediction or underprediction trends are observed. This relatively low error and its random distribution indicate that the model preserves its predictive capability when applied to unseen compounds. Importantly, the absence of any noticeable bias or trend in the error magnitude with respect to the diffusion coefficient suggests that the model performance is reliable across the investigated range. Overall, the results demonstrate that the hybrid PCP-SAFT + ANN model delivers unbiased and robust predictions, with no indication of overfitting and with strong generalization across diverse chemical systems and operating conditions.


	Fig. 3 Relative prediction error (%) as a function of experimental self-diffusion coefficient for the training (top panel) and testing (bottom panel) datasets.

Fig. 4 presents the histograms of relative prediction errors for both the training and testing datasets, providing a statistical perspective on the model accuracy and error distribution. For the training dataset, the error distribution is sharply centered around zero, with 70% of the errors falling within ±10% and over 90% within ±20%. This narrow and symmetric distribution confirms the high prediction accuracy of the model and demonstrates its ability to represent the nonlinear dependence of the self-diffusion coefficient on the selected descriptors. Only a small fraction of the data exhibits larger deviations, which are mainly associated with very low diffusion coefficients, where sensitivity to input parameters and experimental uncertainty are typically higher. For the testing dataset, the error distribution remains centered close to zero, indicating that the model predictions are essentially unbiased for unseen compounds. Despite a slightly broader spread compared to the training data, 50% of the errors lie within ±10% and 70% within ±20%, which are practically acceptable error bounds. The symmetric shape of the histogram further confirms the absence of skew toward overprediction or underprediction. Overall, these quantitative error distributions demonstrate that the hybrid PCP-SAFT + ANN model achieves both high accuracy and strong generalization capability, maintaining reliable performance across a wide range of self-diffusion coefficients and thermodynamic conditions.


	Fig. 4 Histograms of relative prediction errors (%) for the self-diffusion coefficient for the training (left panel) and testing (right panel) datasets.

Fig. 5 presents the variation of the average absolute relative deviation (AARD%) across different ranges of self-diffusion coefficient for both the training and testing datasets. Each interval corresponds to one order of magnitude of D, enabling a consistent assessment of model performance across the entire diffusion range. For the training dataset, the AARD shows a clear decreasing trend with increasing self-diffusion coefficient. The highest deviations are observed at the lowest diffusion range (10⁻¹²–10⁻¹¹ m² s⁻¹); the AARD steadily declines as D increases, reaching its minimum values at the highest diffusion ranges. The increase in the relative deviation observed at very low diffusion coefficients can be attributed to both numerical sensitivity and experimental uncertainty. From a mathematical standpoint, the AARD involves normalization by the experimental value; therefore, when D is very small, even minor absolute differences between the predicted and experimental values can result in larger relative errors.


	Fig. 5 Variation of the AARD (%) of the ANN model across self-diffusion coefficient intervals for both the training and testing datasets.

In addition, low-diffusivity conditions typically correspond to high-density or highly structured fluid states, where molecular mobility is significantly restricted. Under such conditions, experimental measurements of self-diffusion coefficients are inherently more challenging and may be associated with higher uncertainty due to limitations in measurement techniques and sensitivity to temperature and pressure control.

From a modeling perspective, the calculation of residual entropy using PCP-SAFT also becomes more sensitive under these conditions.

Despite these challenges, the model maintains consistently low absolute errors across the full range of diffusion coefficients, and the observed increase in relative deviation at very low values is primarily a consequence of normalization effects and data sensitivity rather than a systematic limitation of the proposed framework. A similar trend is observed for the testing dataset, although with higher deviations, as expected for unseen data. Overall, Fig. 5 confirms that the hybrid PCP-SAFT + ANN model provides reliable predictions of the self-diffusion coefficient across the roughly five orders of magnitude covered by the dataset.
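The normalization effect discussed for Fig. 5 can be made concrete with a short Python sketch that groups points by decade of the experimental value and evaluates the AARD in each group. The binning by floor(log10 D), the function name, and the sample values are assumptions made here for illustration only; the same absolute error is deliberately applied to every point.

```python
import math
from collections import defaultdict


def aard_by_decade(predicted, experimental):
    """Group points by floor(log10(D_exp)) and return the AARD (%) per decade."""
    buckets = defaultdict(list)
    for p, e in zip(predicted, experimental):
        decade = math.floor(math.log10(e))
        buckets[decade].append(100.0 * abs(p - e) / e)
    return {dec: sum(v) / len(v) for dec, v in sorted(buckets.items())}


# Illustrative values only (m^2/s), NOT data from the study.
# Every prediction is off by the same absolute amount, 5e-13 m^2/s.
d_exp = [2.0e-12, 5.0e-12, 2.0e-10, 5.0e-10, 2.0e-8, 5.0e-8]
d_pred = [e + 5.0e-13 for e in d_exp]

result = aard_by_decade(d_pred, d_exp)
for decade, aard in result.items():
    print(f"1e{decade} to 1e{decade + 1} m^2/s: AARD = {aard:.5f}%")
```

With an identical absolute error, the relative deviation in this toy example falls by two orders of magnitude for every two decades of increase in D, which is the purely arithmetic part of the trend described above.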

Fig. 6 presents the cumulative coverage curves for the training, testing, and overall datasets as a function of the AARD threshold. This representation provides a comprehensive assessment of the predictive reliability of the hybrid PCP-SAFT + ANN model across the full range of self-diffusion coefficients. For the training dataset, the curve increases sharply at low AARD thresholds, indicating that a large fraction of the data is predicted with high accuracy. 90% of the training data fall within 20% AARD, and complete coverage is achieved below about 52% AARD. This steep rise confirms the strong fitting capability of the model. For the testing dataset, the cumulative curve exhibits a more gradual increase, reflecting a greater variability as expected for unseen data. Nevertheless, the model still demonstrates solid predictive performance, with 70% of the data within 20% AARD and over 90% within 40% AARD. This behavior highlights the ability of the model to generalize effectively across different compounds and thermodynamic conditions. The curve corresponding to the full dataset lies between the training and testing curves, as expected, and reflects the overall predictive performance of the model. All three curves approach 100% coverage at AARD values below 62%, indicating that even the largest deviations remain within acceptable bounds. Overall, this cumulative analysis confirms that the developed hybrid model provides reliable and consistent predictions of the self-diffusion coefficient, with a high proportion of results falling within practically acceptable error limits for both known and unseen systems.


	Fig. 6 Cumulative fraction of data within a given AARD threshold for the training, testing, and overall datasets.
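A cumulative coverage curve of the kind shown in Fig. 6 can be sketched in a few lines of Python. Here each point is assumed to be characterized by its individual absolute relative deviation (ARD), and the coverage at a threshold is the percentage of points whose ARD does not exceed it; the function names and sample values are hypothetical and are not data from the study.

```python
def coverage_curve(ard_percent, thresholds):
    """For each threshold, return the percentage of points with ARD <= threshold."""
    n = len(ard_percent)
    return [100.0 * sum(1 for a in ard_percent if a <= t) / n for t in thresholds]


def full_coverage_threshold(ard_percent):
    """Smallest threshold that gives 100% coverage, i.e. the maximum ARD."""
    return max(ard_percent)


# Illustrative per-point ARD values (%), NOT data from the study.
ard = [1.5, 3.0, 4.2, 7.9, 9.5, 12.0, 18.0, 22.5, 35.0, 48.0]
thresholds = [10, 20, 40, 60]

curve = coverage_curve(ard, thresholds)
for t, c in zip(thresholds, curve):
    print(f"ARD <= {t}%: {c:.0f}% of points")
print(f"complete coverage at {full_coverage_threshold(ard)}%")
```

In this representation, the threshold at which a curve reaches 100% is simply the largest individual deviation in the corresponding subset.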

Table 6 presents the predictive performance of the hybrid PCP-SAFT + ANN model for different classes of compounds, categorized based on their intermolecular interaction type. This classification enables a more detailed assessment of the model's capability across fluids with fundamentally different interaction mechanisms, including dispersion-dominated (nonpolar), dipolar (polar non-associating), and hydrogen-bonding (associating) systems.

Table 6 Statistical evaluation of the hybrid PCP-SAFT + ANN model for different intermolecular interaction classes, including nonpolar, polar and associating compounds

Set	No. of compounds	No. of data points	R²	MAE (m² s⁻¹)	AARD%	Max ARD%
Nonpolar	34	1108	0.9931	1.15 × 10⁻⁹	8.81	54.38
Polar non-associating	19	550	0.9343	4.57 × 10⁻¹⁰	16.57	58.72
Associating	14	605	0.9959	3.81 × 10⁻¹⁰	9.87	61.14

For nonpolar compounds, the model achieves a high level of accuracy, with an R² value of 0.9931 and an AARD of 8.81%. These systems are primarily governed by dispersion interactions, which are well described by PCP-SAFT. As a result, the residual entropy and density provide a consistent representation of the thermodynamic state, enabling the ANN to accurately capture the diffusion behavior across a wide range of conditions.

Similarly, associating compounds exhibit excellent predictive performance, with the highest R² value of 0.9959 and an AARD of 9.87%. This indicates that the hybrid framework successfully captures the effect of hydrogen-bonding interactions. The inclusion of association terms in PCP-SAFT, together with hydrogen-bond-related descriptors derived from COSMO-SAC, provides sufficient information to the ANN to capture the additional complexities introduced by specific intermolecular interactions.

In contrast, the performance for polar non-associating compounds is comparatively lower, with an R² value of 0.9343 and an AARD of 16.57%. This can be attributed to the more complex behavior of dipolar interactions, which are generally weaker and more sensitive to molecular orientation compared to hydrogen bonding. In such systems, the relationship between the thermodynamic properties and diffusion behavior is less direct, leading to increased variability in the data and a slightly reduced predictive accuracy. In addition, polar compounds often exhibit a wider range of molecular structures and dipole moments, which may not be fully captured by the selected descriptors.

Despite these differences, the model maintains reasonable accuracy across all categories, demonstrating its robustness and general applicability. The maximum absolute relative deviation (max ARD) remains within a similar range for all groups, indicating that extreme deviations are not systematically associated with any specific class of compounds.
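Because the AARD is a mean over individual data points, the data-point-weighted average of the class-wise AARD values in Table 6 should reproduce the overall AARD, and the same holds for the training and testing subsets. The short Python check below uses only numbers reported in this paper (Table 6, together with the training and testing sizes and AARD values); the variable and function names are introduced here for illustration.

```python
# Rows copied from Table 6: (class, compounds, data points, AARD %)
table6 = [
    ("Nonpolar", 34, 1108, 8.81),
    ("Polar non-associating", 19, 550, 16.57),
    ("Associating", 14, 605, 9.87),
]
# Training/testing subsets as reported: (subset, data points, AARD %)
split = [("Training", 1586, 8.89), ("Testing", 677, 15.89)]


def weighted_aard(points_and_aard):
    """Data-point-weighted mean of subgroup AARD values (%)."""
    total = sum(n for n, _ in points_and_aard)
    return sum(n * a for n, a in points_and_aard) / total


total_compounds = sum(row[1] for row in table6)
total_points = sum(row[2] for row in table6)
aard_from_classes = weighted_aard([(row[2], row[3]) for row in table6])
aard_from_split = weighted_aard([(row[1], row[2]) for row in split])

print(f"compounds: {total_compounds}, data points: {total_points}")
print(f"overall AARD from classes: {aard_from_classes:.2f}%")
print(f"overall AARD from train/test: {aard_from_split:.2f}%")
```

Both routes give an overall AARD of about 10.98%, so the class-wise statistics are consistent with the training and testing statistics to within rounding of the tabulated values.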

3.2 Model capability to capture the trends in the self-diffusion coefficient with temperature and pressure

Fig. 7 illustrates the pressure and temperature dependency of the self-diffusion coefficient for four representative compounds (benzene, toluene, cyclohexane, and propane) selected from the training dataset. For each system, the self-diffusion coefficient is evaluated at multiple temperature levels and over a wide range of pressures to assess the capability of the hybrid PCP-SAFT + ANN model in capturing coupled thermodynamic effects. As observed in Fig. 7, the predicted values closely follow the experimental data across all investigated conditions. The model successfully reproduces the expected physical trend, namely the decrease of the self-diffusion coefficient with increasing pressure at a given temperature. In addition, the model accurately captures the influence of temperature, where higher temperatures lead to increased diffusion coefficients due to enhanced molecular motion. The predicted curves remain smooth and physically consistent over the entire pressure range, indicating that the model provides stable and continuous predictions. The excellent agreement between predicted and experimental data, combined with the smooth curves, demonstrates the strong interpolation capability of the ANN within the training domain. Overall, the results confirm that the hybrid model effectively captures the combined effects of pressure and temperature on self-diffusion behavior.


	Fig. 7 Comparison of estimated and experimental self-diffusion coefficients for four representative compounds from the training dataset across a wide range of temperatures and pressures. Lines represent predictions, while symbols denote the literature data: benzene,^{34,44,48–50,61–64} toluene,^{34,61} cyclohexane,^{34,44–46,48,49} and propane.^{34,39}

Fig. 8 shows the predictive performance of the developed hybrid PCP-SAFT + ANN model for four representative compounds (ammonia, methanol, isopentane, and chloroform) that were not included in the training dataset. The comparison between predicted values and reported experimental data demonstrates the model's generalization capability across different chemical systems and thermodynamic conditions. As can be seen in Fig. 8, the ANN predictions closely follow the reported data over the full range of temperatures and pressures, reproducing both the magnitude and the variation of the self-diffusion coefficient. The agreement is consistent across different pressure levels, and the model successfully captures the separation between isobars at each temperature. The deviations between the predicted and experimental values are generally small and show no systematic trend. Because none of the systems presented in Fig. 8 were included in the training dataset, this validation is particularly rigorous. Overall, Fig. 8 demonstrates that the hybrid PCP-SAFT + ANN model provides accurate and reliable predictions for unseen compounds over a wide range of thermodynamic conditions, highlighting its suitability for practical applications.


	Fig. 8 Comparison of estimated and experimental self-diffusion coefficients for four representative compounds from the test dataset across a wide range of temperatures and pressures: ammonia,^{34,70,71} methanol,^{50–52,72} isopentane,^{74} and chloroform.^{79}

3.3 Relative importance analysis of input variables

To quantify the relative importance of the input variables in predicting the self-diffusion coefficient, a correlation-based sensitivity analysis was performed. This approach is particularly useful for ANN models, where the contribution of individual inputs cannot be directly inferred from the internal network structure. In this study, the Pearson correlation coefficient between each input variable and the predicted self-diffusion coefficient was calculated for the entire dataset. The absolute values of these correlation coefficients were then normalized by the sum of all absolute correlations to obtain the relative sensitivity for each input. This normalization enables a direct comparison of the contributions of the variables to the model output. The results are summarized in Fig. 9, where the bar heights indicate the relative influence of each input on the predicted self-diffusion coefficient. As shown in Fig. 9, the most influential variable is the dimensionless residual entropy (s^res/R), with a relative contribution of approximately 35%, indicating that intermolecular interactions play the dominant role in determining diffusion behavior. This is followed by A_tot (17%); the first sigma-profile moment, M₁ (14%); and the second sigma-profile moment, M₂ (12%), all of which make notable contributions to the model prediction. The hydrogen-bond donor descriptor (M^HBD₁) also shows a meaningful effect, with a relative importance of approximately 9%, while density (ρ) contributes around 8%, and the hydrogen-bond acceptor descriptor (M^HBA₁) exhibits the smallest contribution, at approximately 5%. These results highlight that thermodynamic properties and molecular surface characteristics play key roles in governing molecular diffusion. The moderate contributions of M₁ and M₂ suggest that molecular surface descriptors play relevant, though secondary, roles in influencing diffusion. Overall, the sensitivity analysis indicates that the ANN model relies primarily on residual entropy, while molecular descriptors also contribute significantly.


	Fig. 9 Contribution of input variables to the ANN predictions of self-diffusion coefficients, evaluated via correlation-based sensitivity analysis.
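The normalization used in the correlation-based sensitivity analysis can be sketched as follows: the Pearson coefficient between each input and the model output is computed, its absolute value is taken, and the result is divided by the sum over all inputs. The Python example below is a minimal illustration under these assumptions; the function names, input names, and sample values are hypothetical and do not correspond to the descriptors or data of the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def relative_importance(inputs, output):
    """Normalize |r| of each input against the sum of all |r| values (in %)."""
    abs_r = {name: abs(pearson(col, output)) for name, col in inputs.items()}
    total = sum(abs_r.values())
    return {name: 100.0 * r / total for name, r in abs_r.items()}


# Hypothetical inputs and model output, for illustration only.
inputs = {
    "input_a": [1, 2, 3, 4, 5],   # perfectly positively correlated with output
    "input_b": [2, 1, 4, 3, 5],   # partially correlated (r = 0.8)
    "input_c": [5, 4, 3, 2, 1],   # perfectly negatively correlated
}
output = [2, 4, 6, 8, 10]

importance = relative_importance(inputs, output)
for name, share in importance.items():
    print(f"{name}: {share:.1f}%")
```

Since absolute values are used, positively and negatively correlated inputs of equal strength receive the same share. The Pearson coefficient measures linear association only, so shares obtained in this way reflect the linear component of each input-output relationship.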

Table 7 presents the Pearson correlation matrix for the seven input variables used in the ANN model. The results show that most descriptors exhibit weak to moderate correlations, while some COSMO-SAC-derived descriptors show stronger correlations with each other. For example, M₂ is strongly correlated with M^HBD₁ and M^HBA₁, with correlation coefficients of 0.80 and 0.68, respectively. This is expected because these descriptors are all derived from the sigma-profile and represent related aspects of molecular surface charge distribution and hydrogen-bonding tendency.

Table 7 Pearson correlation coefficients between the input variables

	A_tot	M₁	M₂	M^HBD₁	M^HBA₁	s^res/R	ρ
A_tot	1.00	−0.68	−0.33	−0.35	−0.40	−0.59	−0.53
M₁	−0.68	1.00	0.12	0.06	0.01	0.53	0.12
M₂	−0.33	0.12	1.00	0.80	0.68	−0.23	0.55
M^HBD₁	−0.35	0.06	0.80	1.00	0.64	−0.20	0.54
M^HBA₁	−0.40	0.01	0.68	0.64	1.00	−0.09	0.77
s^res/R	−0.59	0.53	−0.23	−0.20	−0.09	1.00	0.12
ρ	−0.53	0.12	0.55	0.54	0.77	0.12	1.00

A strong correlation is also observed between M^HBA₁ and density, with a correlation coefficient of 0.77. This suggests that compounds with stronger hydrogen-bond acceptor characteristics in the present dataset tend to be associated with higher-density conditions or molecular classes. However, having correlations among the inputs does not necessarily imply redundancy, because, for example, density is a thermodynamic-state variable, while M^HBA₁ is a molecular descriptor.

Notably, the dimensionless residual entropy, s^res/R, shows weak to moderate correlations with most COSMO-SAC descriptors. Its strongest correlations are with A_tot and M₁, with coefficients of −0.59 and 0.53, respectively, while its correlations with M₂, M^HBD₁, M^HBA₁, and density are relatively weak.
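The observations drawn from Table 7 can be reproduced with a short Python check that verifies the symmetry of the matrix and lists the descriptor pairs with the strongest mutual correlation. The matrix entries are copied from Table 7; the cut-off of |r| ≥ 0.6 for a "strong" correlation and the plain-text variable labels are assumptions made here for illustration.

```python
# Labels follow the column order of Table 7 (plain-text names are illustrative).
names = ["A_tot", "M1", "M2", "M1_HBD", "M1_HBA", "s_res/R", "rho"]
r = [
    [1.00, -0.68, -0.33, -0.35, -0.40, -0.59, -0.53],
    [-0.68, 1.00, 0.12, 0.06, 0.01, 0.53, 0.12],
    [-0.33, 0.12, 1.00, 0.80, 0.68, -0.23, 0.55],
    [-0.35, 0.06, 0.80, 1.00, 0.64, -0.20, 0.54],
    [-0.40, 0.01, 0.68, 0.64, 1.00, -0.09, 0.77],
    [-0.59, 0.53, -0.23, -0.20, -0.09, 1.00, 0.12],
    [-0.53, 0.12, 0.55, 0.54, 0.77, 0.12, 1.00],
]

n = len(names)
# A correlation matrix must be symmetric with a unit diagonal.
is_symmetric = all(r[i][j] == r[j][i] for i in range(n) for j in range(n))
has_unit_diagonal = all(r[i][i] == 1.0 for i in range(n))

threshold = 0.6  # assumed cut-off for a "strong" correlation
strong = [
    (names[i], names[j], r[i][j])
    for i in range(n)
    for j in range(i + 1, n)
    if abs(r[i][j]) >= threshold
]
strong.sort(key=lambda item: -abs(item[2]))

print(f"symmetric: {is_symmetric}, unit diagonal: {has_unit_diagonal}")
for a, b, value in strong:
    print(f"{a} vs {b}: {value:+.2f}")
```

With this cut-off, no pair involving s^res/R is flagged; its largest absolute correlation (−0.59 with A_tot) falls just below the threshold.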

To further evaluate the contribution of residual entropy to the predictive capability of the model, an ablation study was performed in which the dimensionless residual entropy (s^res/R) was removed from the input set and the ANN was retrained using the same training/testing protocol.

As seen in Table 8, the resulting model exhibited a noticeable deterioration in predictive performance, particularly for the independent testing dataset, with increased AARD values and reduced R². This result confirms that residual entropy provides essential thermodynamic information that cannot be fully replaced by the remaining molecular descriptors and density alone.

Table 8 Evaluation of the contribution of residual entropy to the predictive capability of the model

Model	Training R²	Training AARD%	Testing R²	Testing AARD%
With s^res/R	0.9937	8.89	0.9763	15.89
Without s^res/R	0.9491	29.60	0.8390	51.69
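The deterioration reported in Table 8 can be quantified with a few lines of arithmetic. The Python snippet below uses only the tabulated values; the dictionary layout and the two summary measures (the drop in R² and the ratio of AARD values) are choices made here for illustration.

```python
# Values copied from Table 8: subset -> (R2, AARD %)
with_entropy = {"train": (0.9937, 8.89), "test": (0.9763, 15.89)}
without_entropy = {"train": (0.9491, 29.60), "test": (0.8390, 51.69)}

changes = {}
for subset in ("train", "test"):
    r2_full, aard_full = with_entropy[subset]
    r2_ablated, aard_ablated = without_entropy[subset]
    changes[subset] = {
        "r2_drop": r2_full - r2_ablated,          # absolute loss in R2
        "aard_ratio": aard_ablated / aard_full,   # factor by which AARD grows
    }
    print(
        f"{subset}: R2 drops by {changes[subset]['r2_drop']:.4f}, "
        f"AARD grows by a factor of {changes[subset]['aard_ratio']:.2f}"
    )
```

Removing s^res/R thus increases the AARD by a factor of roughly 3.3 for both subsets, while the loss in R² is about three times larger for the testing data (0.1373) than for the training data (0.0446).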

4 Conclusions

In this work, a hybrid modeling framework integrating the PCP-SAFT equation of state with an ANN was developed for the prediction of self-diffusion coefficients across a wide range of compounds and thermodynamic conditions. The model was trained and evaluated using a comprehensive dataset comprising 2263 experimental self-diffusion coefficient data points for 67 compounds from multiple chemical families, covering temperatures ranging from 93.0 to 973.2 K and pressures up to 3036 bar. The self-diffusion coefficients in the dataset range from 1.89 × 10⁻¹² to 3.61 × 10⁻⁷ m² s⁻¹, corresponding to approximately five orders of magnitude, ensuring a broad representation of molecular transport behavior.

Molecular descriptors derived from COSMO-SAC sigma-profiles, including the total surface area, polarity-related parameters, and hydrogen-bonding contributions, were employed alongside thermodynamic properties (dimensionless form of residual entropy and density) calculated using PCP-SAFT. These inputs enabled the ANN model to capture the underlying relationships governing diffusion behavior.

To ensure the model's robustness and generalization capability, the dataset was partitioned on a compound basis, with 45 compounds (1586 data points) used for training and 24 compounds (677 data points) reserved exclusively for independent testing. A systematic exploration of ANN architectures, activation functions, and training algorithms was conducted to identify the optimal model configuration. The selected architecture, consisting of two hidden layers with 14 and 7 neurons, demonstrated excellent predictive performance, achieving R² values of 0.9937 and 0.9763 for the training and testing datasets, respectively, along with AARD values of 8.89% and 15.89%. The largest relative deviations were observed at very low diffusion coefficients, where small absolute differences lead to amplified percentage errors.

Sensitivity analysis based on Pearson correlation coefficients indicates that the dimensionless residual entropy is the most influential variable in predicting self-diffusion coefficients, followed by total surface area and polarity-related descriptors. In contrast, hydrogen-bonding descriptors and density exhibit comparatively lower contributions. This highlights the dominant role of thermodynamic state representation, particularly residual entropy, in governing diffusion behavior. Such a finding underscores the effectiveness of integrating PCP-SAFT-derived thermodynamic properties with molecular descriptors, providing a robust basis for predictive modeling.

The strong predictive capability of the model highlights the effectiveness of integrating physically grounded thermodynamic inputs with data-driven approaches. The proposed PCP-SAFT + ANN framework provides a reliable and generalizable tool for estimating self-diffusion coefficients across a wide range of compounds and conditions. This approach can support process design, transport modeling, and simulation tasks where accurate diffusion data are required. Future work may extend this framework to multicomponent systems and incorporate it into process simulation platforms.

Author contributions

Aliakbar Roosta: conceptualization, data collection, programming, analysis, and writing – review and editing. Nima Rezaei: conceptualization, supervision, and writing – review and editing. Hamid Godini: conceptualization, supervision, and writing – review and editing.

Conflicts of interest

There are no conflicts to declare.

Data availability

No new experimental data were generated for this study.

Supplementary information related to this study is available online to support the reproducibility and transparency of the proposed model. It comprises three files: (i) dimensionless_self_diffusion_ANN_2026.mat, which contains the trained ANN model, input standardization parameters, and model configuration metadata; (ii) Dimensionless_self_diffusion_Predictor.m, a MATLAB script that uses the trained ANN model to predict the self-diffusion coefficient of a chemical from user-supplied COSMO-SAC-derived molecular descriptors, the dimensionless residual entropy, and the molar density; and (iii) COSMO_SAC_derived_molecular_descriptors.xlsx, which contains the COSMO-SAC-derived molecular descriptors for 69 chemicals. See DOI: https://doi.org/10.1039/d6cp01425a.

Acknowledgements

We acknowledge the funding received from the Research Council of Finland Academy Project Funding (3545438) that enabled this research.

Notes and references

B. A. Johnson, A. T. Castner, H. Agarwala and S. Ott, Beyond diffusion: ion and electron migration contribute to charge transport in redox-conducting metal–organic frameworks, Chem. Sci., 2025, 16, 5214–5222 RSC.
M. Mohammadi, M. Zirrahi and H. Hassanzadeh, An Analytical Model for Estimation of the Self-Diffusion Coefficient and Adsorption Kinetics of Surfactants Using Dynamic Interfacial Tension Measurements, J. Phys. Chem. B, 2020, 124, 3206–3213 CrossRef CAS PubMed.
D. S. Grebenkov and D. Krapf, Steady-state reaction rate of diffusion-controlled reactions in sheets, J. Chem. Phys., 2018, 149, 064117, DOI:10.1063/1.5041074.
A. Szczęsna-Chrzan, M. Vogler, P. Yan, G. Z. Żukowska, C. Wölke, A. Ostrowska, S. Szymańska, M. Marcinek, M. Winter, I. Cekic-Laskovic, W. Wieczorek and H. S. Stein, Ionic conductivity, viscosity, and self-diffusion coefficients of novel imidazole salts for lithium-ion battery electrolytes, J. Mater. Chem. A, 2023, 11, 13483–13492 RSC.
I. B. Obot, A. A. Bahraq and A. H. Alamri, Density functional theory and molecular dynamics simulation of the corrosive particle diffusion in pyrimidine and its derivatives films, Comput. Mater. Sci., 2022, 210, 111428 CrossRef CAS.
B. Zhang, X. Li, J. Zhang, J. Wang and H. Jin, Study on the self-diffusion coefficients of binary mixtures of supercritical water and H2, CO, CO2, CH4 confined in carbon nanotubes, Water Res., 2025, 283, 123856 CrossRef CAS PubMed.
J. Busch and D. Paschek, An OrthoBoXY-method for various alternative box geometries, Phys. Chem. Chem. Phys., 2024, 26, 2907–2914 RSC.
M. A. Hunter, B. Demir, C. F. Petersen and D. J. Searles, New Framework for Computing a General Local Self-Diffusion Coefficient Using Statistical Mechanics, J. Chem. Theory Comput., 2022, 18, 3357–3363 CrossRef CAS PubMed.
P. Ghesquière, T. Mineva, D. Talbi, P. Theulé, J. A. Noble and T. Chiavassa, Diffusion of molecules in the bulk of a low density amorphous ice from molecular dynamics simulations, Phys. Chem. Chem. Phys., 2015, 17, 11455–11468 RSC.
R. Kokubu, S. Inasawa and H. Ohashi, Validation of Shell-like Free Volume Model for Self-Diffusion Coefficients in Polymer–Solvent System with Practical Parameter Determination Method, Ind. Eng. Chem. Res., 2025, 64, 4596–4603 CrossRef CAS.
Z. Zuo, X. Lu and X. Ji, Modeling Self-Diffusion Coefficient and Viscosity of Chain-like Fluids Based on ePC-SAFT, J. Chem. Eng. Data, 2024, 69, 348–362 CrossRef CAS.
Y. Wei, Z. Dai, Y. Dong, A. Filippov, X. Ji, A. Laaksonen, F. U. Shah, R. An and H. Fuchs, Molecular interactions of ionic liquids with SiO 2 surfaces determined from colloid probe atomic force microscopy, Phys. Chem. Chem. Phys., 2022, 24, 12808–12815 RSC.
F. Zeng, R. Wan, Y. Xiao, F. Song, C. Peng and H. Liu, Predicting the Self-Diffusion Coefficient of Liquids Based on Backpropagation Artificial Neural Network: A Quantitative Structure–Property Relationship Study, Ind. Eng. Chem. Res., 2022, 61, 17697–17706 CrossRef CAS.
J. P. Allers, J. A. Harvey, F. H. Garzon and T. M. Alam, Machine learning prediction of self-diffusion in Lennard-Jones fluids, J. Chem. Phys., 2020, 153, 034102, DOI:10.1063/5.0011512.
C. J. Leverant, J. A. Greathouse, J. A. Harvey and T. M. Alam, Machine Learning Predictions of Simulated Self-Diffusion Coefficients for Bulk and Confined Pure Liquids, J. Chem. Theory Comput., 2023, 19, 3054–3062 CrossRef CAS PubMed.
J. P. Allers, F. H. Garzon and T. M. Alam, Artificial neural network prediction of self-diffusion in pure compounds over multiple phase regimes, Phys. Chem. Chem. Phys., 2021, 23, 4615–4623 RSC.
A. Klamt, COSMO-RS: from quantum chemistry to fluid phase thermodynamics and drug design, Elsevier, 2005 Search PubMed.
A. Klamt, The COSMO and COSMO-RS solvation models, Wiley Interdiscip. Rev.:Comput. Mol. Sci., 2011, 1, 699–709 CAS.
T. Nevolianis, R. A. Ahmed, A. Hellweg, M. Diedenhofen and K. Leonhard, Blind prediction of toluene/water partition coefficients using COSMO-RS: results from the SAMPL9 challenge, Phys. Chem. Chem. Phys., 2023, 25, 31683–31691 RSC.
G. Chen, Z. Song and Z. Qi, Transformer-convolutional neural network for surface charge density profile prediction: Enabling high-throughput solvent screening with COSMO-SAC, Chem. Eng. Sci., 2021, 246, 117002 CrossRef CAS.
N. Mac Fhionnlaoich, J. Zeglinski, M. Simon, B. Wood, S. Davin and B. Glennon, A hybrid approach to aqueous solubility prediction using COSMO-RS and machine learning, Chem. Eng. Res. Des., 2024, 209, 67–71 CrossRef CAS.
J. J. Suárez, I. Medina and J. L. Bueno, Diffusion coefficients in supercritical fluids: available data and graphical correlations, Fluid Phase Equilib., 1998, 153, 167–212 CrossRef.
P. N. Bartlett, D. A. Cook, M. W. George, A. L. Hector, J. Ke, W. Levason, G. Reid, D. C. Smith and W. Zhang, Electrodeposition from supercritical fluids, Phys. Chem. Chem. Phys., 2014, 16, 9202 RSC.
Y. Rosenfeld, Relation between the transport coefficients and the internal entropy of simple systems, Phys. Rev. A:At., Mol., Opt. Phys., 1977, 15, 2545–2549 CrossRef.
Y. Rosenfeld, A quasi-universal scaling law for atomic transport in simple fluids, J. Phys.: Condens. Matter, 1999, 11, 5415–5427 CrossRef CAS.
I. H. Bell, J. C. Dyre and T. S. Ingebrigtsen, Excess-entropy scaling in supercooled binary mixtures, Nat. Commun., 2020, 11, 4300 CrossRef CAS PubMed.
J. C. Dyre, Perspective: Excess-entropy scaling, J. Chem. Phys., 2018, 149, 210901, DOI:10.1063/1.5055064.
S. Chapman and T. G. Cowling, The Mathematical Theory Of Nonuniform Gases, Cambridge At The University Press, 3rd edn, 1970 Search PubMed.
J. O. Hirschfelder, C. F. Curtiss and R. B. Bird, The Molecular Theory of Gases and Liquids, John Wiley & Sons, Inc, New York, 1964 Search PubMed.
J.-L. Bretonnet, Self-diffusion coefficient of dense fluids from the pair correlation function, J. Chem. Phys., 2002, 117, 9370–9373 CrossRef CAS.
M. Hopp, J. Mele and J. Gross, Self-Diffusion Coefficients from Entropy Scaling Using the PCP-SAFT Equation of State, Ind. Eng. Chem. Res., 2018, 57, 12942–12950 CrossRef CAS.
A. Dehlouz, J.-N. Jaubert, G. Galliero, M. Bonnissel and R. Privat, Entropy Scaling-Based Correlation for Estimating the Self-Diffusion Coefficients of Pure Fluids, Ind. Eng. Chem. Res., 2022, 61, 14033–14050 CrossRef CAS.
F. J. Carmona Esteva, Y. Zhang, K. Duncheskie, E. J. Maginn and Y. J. Colón, Excess entropy scaling explains the enhanced dynamics of the ionic liquid 1-ethyl-3-methylimidazolium chloride in external electric fields, Phys. Chem. Chem. Phys., 2026, 28, 353–364 RSC.
O. Suárez-Iglesias, I. Medina, M. de los Á. Sanz, C. Pizarro and J. L. Bueno, Self-Diffusion in Molecular Fluids and Noble Gases: Available Data, J. Chem. Eng. Data, 2015, 60, 2757–2817 CrossRef.
E. B. Winn, The Temperature Dependence of the Self-Diffusion Coefficients of Argon, Neon, Nitrogen, Oxygen, Carbon Dioxide, and Methane, Phys. Rev., 1950, 80, 1024–1027 CrossRef CAS.
A. Boushehri, J. Bzowski, J. Kestin and E. A. Mason, Equilibrium and Transport Properties of Eleven Polyatomic Gases At Low Density, J. Phys. Chem. Ref. Data, 1987, 16, 445–466 Search PubMed.
C. R. Mueller and R. W. Cahill, Mass Spectrometric Measurement of Diffusion Coefficients, J. Chem. Phys., 1964, 40, 651–654 CrossRef CAS.
H. F. Vugts, A. J. H. Boerboom and J. Los, Diffusion coefficients of isotopic methane mixtures and of methane-rare-gas mixtures, Physica, 1971, 51, 311–318 CrossRef CAS.
A. Greiner-Schmid, S. Wappmann, M. Has and H.-D. Lüdemann, Self-diffusion in the compressed fluid lower alkanes: Methane, ethane, and propane, J. Chem. Phys., 1991, 94, 5643–5649 CrossRef CAS.
K. R. Harris, The density dependence of the self-diffusion coefficient of methane at −50°, 25° and 50 °C, Phys. A, 1978, 94, 448–464 CrossRef.
K. R. Harris and N. J. Trappeniers, The density dependence of the self-diffusion coefficient of liquid methane, Phys. A, 1980, 104, 262–280 CrossRef.
M. Iwahashi, Y. Yamaguchi, Y. Ogura and M. Suzuki, Dynamical Structures of Normal Alkanes, Alcohols, and Fatty Acids in the Liquid State as Determined by Viscosity, Self-Diffusion Coefficient, Infrared Spectra, and 13CNMR Spin-Lattice Relaxation Time Measurements, Bull. Chem. Soc. Jpn., 1990, 63, 2154–2158 CrossRef CAS.
D. C. Douglass and D. W. McCall, Diffusion in Paraffin Hydrocarbons, J. Phys. Chem., 1958, 62, 1102–1107 CrossRef CAS.
D. W. McCall, D. C. Douglass and E. W. Anderson, Diffusion in Liquids, J. Chem. Phys., 1959, 31, 1555–1557 CrossRef CAS.
P. S. Tofts, D. Lloyd, C. A. Clark, G. J. Barker, G. J. M. Parker, P. McConville, C. Baldock and J. M. Pope, Test liquids for quantitative MRI measurements of self-diffusion coefficient in vivo, Magn. Reson. Med., 2000, 43, 368–374 CrossRef CAS PubMed.
M. Holz, S. R. Heil and A. Sacco, Temperature-dependent self-diffusion coefficients of water and six selected molecular liquids for calibration in accurate 1H NMR PFG measurements, Phys. Chem. Chem. Phys., 2000, 2, 4740–4742 RSC.
D. W. McCall, D. C. Douglass and E. W. Anderson, Self-Diffusion in Liquids: Paraffin Hydrocarbons, Phys. Fluids, 1959, 2, 87–91 CrossRef CAS.
R. Freer and J. N. Sherwood, Diffusion in organic liquids. Part 1.—Appraisal of a gel sectioning technique and its application to self-diffusion in benzene and cyclohexane, J. Chem. Soc., Faraday Trans. 1, 1980, 76, 1021 RSC.
R. Freer and J. N. Sherwood, Diffusion in organic liquids. Part 2.—Isotope-mass effects in self-diffusion in benzene and cyclohexane, J. Chem. Soc., Faraday Trans. 1, 1980, 76, 1030 RSC.
P. A. Johnson and A. L. Babb, Self-diffusion in.Liquids. 1. Concentration Dependence in Ideal and Non-ideal Binary Solutions, J. Phys. Chem., 1956, 60, 14–19 CrossRef CAS.
G. Guevara-Carrion, C. Nieto-Draghi, J. Vrabec and H. Hasse, Prediction of Transport Properties by Molecular Simulation: Methanol and Ethanol and Their Mixture, J. Phys. Chem. B, 2008, 112, 16664–16674 Search PubMed.
N. Karger, T. Vardag and H.-D. Lüdemann, Temperature dependence of self-diffusion in compressed monohydric alcohols, J. Chem. Phys., 1990, 93, 3437–3444 Search PubMed.
S. Meckl and M. D. Zeidler, Self-diffusion measurements of ethanol and propanol, Mol. Phys., 1988, 63, 85–95 CrossRef CAS.
X. Chen, R. Hu, H. Feng, L. Chen and H.-D. Lüdemann, Intradiffusion, Density, and Viscosity Studies in Binary Liquid Systems of Acetylacetone + Alkanols at 303.15 K, J. Chem. Eng. Data, 2012, 57, 2401–2408 CrossRef CAS.
K. P. Das, A. Ceglie and B. Lindman, Microstructure of formamide microemulsions from NMR self-diffusion measurements, J. Phys. Chem., 1987, 91, 2938–2946 CrossRef CAS.
N. Karger, S. Wappmann, N. Shaker-Gaafar and H.-D. Lüdemann, The p, T – dependence of self diffusion in liquid 1-, 2- and 3-pentanol, J. Mol. Liq., 1995, 64, 211–219 CrossRef CAS.
M. I. Hrovat and C. G. Wade, NMR pulsed-gradient diffusion measurements. I. Spin-echo stability and gradient calibration, J. Magn. Reson., 1969, 1981(44), 62–75 Search PubMed.
E. O. Stejskal and J. E. Tanner, Spin Diffusion Measurements: Spin Echoes in the Presence of a Time-Dependent Field Gradient, J. Chem. Phys., 1965, 42, 288–292 CrossRef CAS.
D. J. Tomlinson, Temperature dependent self-diffusion coefficient measurements of glycerol by the pulsed N.M.R. technique, Mol. Phys., 1973, 25, 735–738 Search PubMed.
I. Chang and H. Sillescu, Heterogeneity at the Glass Transition: Translational and Rotational Self-Diffusion, J. Phys. Chem. B, 1997, 101, 8794–8801 CrossRef CAS.
M. A. Awan and J. H. Dymond, Transport Properties of Nonelectrolyte Liquid Mixtures. XI. Mutual Diffusion Coefficients for Toluene + n-Hexane and Toluene + Acetonitrile at Temperatures from 273 to 348 K and at Pressures up to 25 MPa, Int. J. Thermophys., 2001, 22, 679–700 CrossRef CAS.
H. J. Parkhurst and J. Jonas, Dense liquids. I. The effect of density and temperature on self-diffusion of tetramethylsilane and benzene-d₆, J. Chem. Phys., 1975, 63, 2698–2704 Search PubMed.
M. A. McCool, A. F. Collings and L. A. Woolf, Pressure and temperature dependence of the self-diffusion of benzene, J. Chem. Soc., Faraday Trans. 1, 1972, 68, 1489 RSC.
A. F. Collings and L. A. Woolf, Self-diffusion in benzene under pressure, J. Chem. Soc., Faraday Trans. 1, 1975, 71, 2296 Search PubMed.
M. K. Mapes, S. F. Swallen and M. D. Ediger, Self-Diffusion of Supercooled o -Terphenyl near the Glass Transition Temperature, J. Phys. Chem. B, 2006, 110, 507–511 CrossRef CAS PubMed.
H. Walderhaug and K. D. Knudsen, Aqueous Mixtures of a Trisiloxane Surfactant and Oil Studied by SANS and NMR Self-diffusion: Effect of Temperature and Oil Concentration, J. Solution Chem., 2012, 41, 367–379 CrossRef CAS.
J. H. Wang, Self-Diffusion Coefficients of Water, J. Phys. Chem., 1965, 69, 4412 CrossRef CAS.
R. Malhotra, W. E. Price, L. A. Woolf and A. J. Easteal, Thermodynamic and transport properties of 1, 2-dichloroethane, Int. J. Thermophys., 1990, 11, 835–861 CrossRef CAS.
R. L. Hurle and L. A. Woolf, Self-diffusion in liquid acetonitrile under pressure, J. Chem. Soc., Faraday Trans. 1, 1982, 78, 2233.
D. W. McCall, D. C. Douglass and E. W. Anderson, Self-Diffusion in Liquid Ammonia, Phys. Fluids, 1961, 4, 1317–1318.
D. E. O’Reilly, E. M. Peterson and C. E. Scheie, Self-diffusion in liquid ammonia and deuteroammonia, J. Chem. Phys., 1973, 58, 4072–4075.
C. D’Agostino, M. D. Mantle, L. F. Gladden and G. D. Moggridge, Prediction of binary diffusion coefficients in non-ideal mixtures from NMR data: Hexane–nitrobenzene near its consolute point, Chem. Eng. Sci., 2011, 66, 3898–3906.
K. R. Harris, Temperature and density dependence of the self-diffusion coefficient of n-hexane from 223 to 333 K and up to 400 MPa, J. Chem. Soc., Faraday Trans. 1, 1982, 78, 2265.
A. Enninghorst, Density dependence of self-diffusion in liquid pentanes and pentane mixtures, Mol. Phys., 1996, 88, 437–452.
H. S. Sandhu, Coefficient of self-diffusion in liquids using pulsed NMR techniques, J. Magn. Reson. (1969), 1975, 17, 34–40.
L. Chen, T. Groß and H.-D. Lüdemann, T,p-Dependence of Self-Diffusion in the Lower N-methylsubstituted Amides, Z. Phys. Chem., 2000, 214, 239, DOI:10.1524/zpch.2000.214.2.239.
A. Heinrich-Schramm, W. E. Price and H.-D. Lüdemann, Self-Diffusion in Compressed Dimethylether: The Influence of Dipole-Dipole Interaction and Hydrogen Bonding Upon Translational Diffusivity in Simple Fluids, Z. Naturforsch., A:Phys. Sci., 1995, 50, 145–148.
H. S. Sandhu, Self-diffusion Measurements in Pure Liquids using Spin Echoes, Can. J. Phys., 1971, 49, 1069–1072.
F. X. Prielmeier and H.-D. Lüdemann, Self diffusion in compressed liquid chloromethane, dichloromethane and trichloromethane, Mol. Phys., 1986, 58, 593–604.
R. E. Rathbun and A. L. Babb, Self-diffusion in liquids. III. Temperature dependence in pure liquids, J. Phys. Chem., 1961, 65, 1072–1074.
M. Has and H.-D. Lüdemann, Self Diffusion in Compressed Fluid CF₃Br and CF₃Cl, Z. Naturforsch., A:Phys. Sci., 1989, 44, 1210–1214.
H. Ertl and F. A. L. Dullien, Self-diffusion and viscosity of some liquids as a function of temperature, AIChE J., 1973, 19, 1215–1223.
L. Chen, T. Groß and H.-D. Lüdemann, The density dependence of self-diffusion in some simple amines, Phys. Chem. Chem. Phys., 1999, 1, 3503–3508.
M. Holz, X. Mao, D. Seiferling and A. Sacco, Experimental study of dynamic isotope effects in molecular liquids: Detection of translation-rotation coupling, J. Chem. Phys., 1996, 104, 669–679.
M. N. Rodnikova, Z. S. Idiyatullin and I. A. Solonina, Mobility of molecules of liquid diols in the temperature range of 303–318 K, Russ. J. Phys. Chem. A, 2014, 88, 1442–1444.
M. Kempka, B. Peplińska and Z. Pajak, Anisotropy of Translational Diffusion in Liquid α,ω-Dibromoalkanes, Ber. Bunsen-Ges. Phys. Chem., 1988, 92, 686–689.
R. Fingerhut, W.-L. Chen, A. Schedemann, W. Cordes, J. Rarey, C.-M. Hsieh, J. Vrabec and S.-T. Lin, Comprehensive Assessment of COSMO-SAC Models for Predictions of Fluid-Phase Equilibria, Ind. Eng. Chem. Res., 2017, 56, 9868–9884.
J. Gross, An equation-of-state contribution for polar components: Quadrupolar molecules, AIChE J., 2005, 51, 2556–2568.
J. Gross and J. Vrabec, An equation-of-state contribution for polar components: Dipolar molecules, AIChE J., 2006, 52, 1194–1204.
J. Gross and G. Sadowski, Perturbed-Chain SAFT: An Equation of State Based on a Perturbation Theory for Chain Molecules, Ind. Eng. Chem. Res., 2001, 40, 1244–1260.
J. Gross and G. Sadowski, Application of the Perturbed-Chain SAFT Equation of State to Associating Systems, Ind. Eng. Chem. Res., 2002, 41, 5510–5515.
