Open Access Article
M. Saud Ul Hassan, Kashif Liaqat* and Laura Schaefer
Energy Systems Lab, Dept. of Mechanical Engineering, Rice University, Houston, TX 77005, USA. E-mail: Kashif.Liaqat@rice.edu; ms18ig@my.fsu.edu; Laura.Schaefer@rice.edu
First published on 11th September 2025
The alarming increase in global warming, primarily driven by the rising CO2 concentration in the atmosphere, has spurred the need for technological solutions to reduce CO2 concentrations. One widely successful approach is geological sequestration, which involves pressurizing and injecting CO2 into underground rock formations. Saline aquifers, containing saltwater, are often used for this purpose due to their large storage capacity and broad availability. However, to optimize CO2 storage and reduce the risk of gas leakage, it is essential to account for capillary forces and the interfacial tension (IFT) between CO2 and brine within the formation. Traditional methods for characterizing CO2-brine IFT in saline aquifers, both experimental and theoretical, are well-documented in the literature. Experimental methods, though accurate, are labor-intensive, time-consuming, and require expensive equipment, while theoretical approaches rely on idealized models and computationally demanding simulations. Recently, machine learning (ML) techniques have emerged as a promising alternative for IFT characterization. These techniques allow models of CO2-brine IFT to be automatically “learned” from data using optimization algorithms. The literature suggests that ML can achieve superior accuracy compared to traditional theoretical methods. However, in its current state, the literature lacks a comprehensive review of these emerging methods. This work addresses that gap by offering an in-depth survey of existing machine learning techniques for IFT characterization in saline aquifers, while also introducing novel, unexplored approaches to inspire future advancements. Our comparative analysis shows that simpler ML models, such as ensemble tree-based models and small multi-layer perceptrons, may be the most accurate and practical for estimating CO2-brine IFT in saline aquifers.
Environmental significance
Addressing climate change requires effective carbon capture and storage (CCS), with geological sequestration in saline aquifers offering high potential. A key factor in CCS success is understanding interfacial tension (IFT) between CO2 and brine, which affects storage efficiency and leakage risk. Traditional IFT methods are complex and resource-heavy. This study explores machine learning (ML) as a scalable, data-driven alternative for IFT prediction. Beyond performance, it examines why ML models work, their limitations, and future challenges. By enhancing IFT modeling, this work promotes safer, more efficient CO2 storage and advances global sustainability through better climate mitigation strategies, uniting data science with environmental action.
To optimally utilize the CO2 storage capacity of saline aquifers as well as reduce the danger of CO2 leakage out of them, which can be damaging to the environment and harmful to animal and human life,11 one must consider the capillary forces at play.12,13 These forces are mainly governed by the interfacial tension (IFT) between CO2 and the host fluid, brine.13–15 Accurate characterization of CO2-brine interfacial tension in saline aquifers is thus of prime importance to the success of geologic sequestration projects.
The methods presented in the literature for characterizing CO2-brine IFT can broadly be classified into three categories: experimental, theoretical, and data-driven methods. The most widely used experimental methods to measure the CO2-brine IFT are the pendant drop method16,17 and the capillary rise method.18 The pendant drop method measures surface or interfacial tension by analyzing the shape of a drop hanging from a needle in a surrounding fluid. The droplet's profile, captured as a shadow image, is used in drop shape analysis to calculate tension based on the balance between gravity and interfacial forces. The capillary rise method determines IFT by observing the height to which a liquid rises or falls in a narrow tube when in contact with another fluid. The balance between adhesive and cohesive forces allows calculation of IFT. The experimental procedures, though inherently accurate, are prone to inaccuracies introduced through measurement noise and experimental errors. Additionally, they can be time-consuming to carry out, require expensive equipment, and demand extensive experience with the equipment.14,19 The theoretical approaches,20–23 on the other hand, are mainly based on molecular dynamics models,24 often demanding idealized conditions that are rarely satisfied closely in practice. Furthermore, they rely on computer simulations that introduce numerical inaccuracies.14,19
While experimental and theoretical methods have been invaluable to date for understanding CO2-brine IFT, their practical application in large-scale carbon storage projects remains challenging. As geologic carbon sequestration projects expand globally, there is a growing need for predictive tools that are both accurate and scalable across diverse geologic settings for carbon capture and storage.25–27 Machine learning (ML) offers a compelling alternative.28–30 By leveraging existing experimental and simulation datasets, ML models can capture complex, nonlinear relationships between fluid properties, environmental conditions, and IFT, without requiring explicit physical simplifications. ML approaches can be rapidly retrained with new data, adapted to different brine compositions, and deployed for real-time prediction, making them attractive for both research and operational decision-making. Over the past decade, interest in applying ML to CO2-brine IFT prediction has grown considerably, producing a scattered body of work across multiple disciplines. However, no comprehensive review currently exists to summarize these developments, compare methodologies, or identify open challenges. This article addresses that gap by providing the first systematic survey of ML-based approaches for IFT characterization in saline aquifers, alongside a comparative analysis to guide future research.
In Section 2, we introduce the problem of modeling CO2-brine interfacial tension in saline aquifers using data-based methods, aiming to formulate the problem in a way that accommodates various machine learning techniques. Building on this formulation, we then present an overview of various machine learning models for IFT characterization, as reviewed in Section 3 and summarized in Fig. 1. We also present novel approaches for modeling IFT as a time series using state-of-the-art sequential machine learning models, which present promising directions for future research. Throughout this section, we strive to establish a standardized mathematical notation for describing the different models, as the absence of a standardized approach in the existing literature has resulted in works that are difficult to compare. In Section 4, we provide a brief overview of how IFT relates to different physical parameters. Finally, in Section 5, we critique the current state of the literature and propose recommendations to incorporate into future research in the field.
We refer to the set of possible inputs, $\mathcal{X} \subseteq \mathbb{R}^d$, as the feature space, where $d$ is the number of features, also called the dimension of the feature space.

The CO2-brine interfacial tension in saline aquifers may generally be modeled as a function $f: \mathcal{X} \to \mathbb{R}$.† Though this function is not known analytically, one may experimentally obtain data $\mathcal{D} = \{(x_n, y_n)\}_{n=1}^{N}$, where the inputs $x_n \in \mathcal{X}$ are picked independently and $y_n = f(x_n) + \varepsilon_n$, to characterize the input–output behaviour of $f$ under noise $\varepsilon_n$, and subsequently employ machine learning algorithms to construct an approximation $h$ to $f$ from a hypothesis set $\mathcal{H}$ using $\mathcal{D}$ (Fig. 2).‡ In the following, we formally present the machine learning algorithms for CO2-brine IFT characterization reviewed in Section 3 using this problem formulation. Additionally, we introduce advanced methods for sequential data processing, which can be adapted for CO2-brine IFT modeling through a straightforward reformulation of the problem, as we later demonstrate.
Fig. 2 A depiction of linear hypotheses in one-dimensional (d = 1) and two-dimensional (d = 2) feature spaces.33 Each black circle represents a data-point in $\mathcal{D}$, and the vertical lines depict the error $\|h(x_n) - y_n\|_2$. The linear regression algorithm designs the hyperplane to be such that the error is minimal on average.
Formally, the hypothesis class of linear estimators, $\mathcal{H}_{\mathrm{lin}}$, is the set of hyperplanes in $\mathbb{R}^d$ with a surface normal $a^T$ and offset $b$ from the origin:

$$\mathcal{H}_{\mathrm{lin}} = \{h : h(x) = a^T x + b,\ a \in \mathbb{R}^d,\ b \in \mathbb{R}\}.$$

This set can be represented more succinctly by embedding $\mathcal{X}$ in $\mathbb{R}^{d+1}$ as $\tilde{x} = (x_1, \ldots, x_d, 1)$. Then

$$\tilde{\mathcal{H}}_{\mathrm{lin}} = \{\tilde{h} : \tilde{h}(\tilde{x}) = \tilde{a}^T \tilde{x},\ \tilde{a} \in \mathbb{R}^{d+1}\},$$

the set of hyperplanes in $\mathbb{R}^{d+1}$ passing through the origin, is equivalent to $\mathcal{H}_{\mathrm{lin}}$, in that a hypothesis $h(x) = a^T x + b \in \mathcal{H}_{\mathrm{lin}}$ can be converted to an equivalent hypothesis $\tilde{h}(\tilde{x}) = \tilde{a}^T \tilde{x} \in \tilde{\mathcal{H}}_{\mathrm{lin}}$, where $a = (a_1, \ldots, a_d)$, by choosing the surface normal as $\tilde{a} = (a_1, \ldots, a_d, b)$. We will use both these representations of the linear hypotheses set in the exposition below – covering linear regression, ridge regression, and support vector machine regression – depending on whichever is mathematically convenient.
Linear regression picks the hypothesis in $\tilde{\mathcal{H}}_{\mathrm{lin}}$ that minimizes the error:

$$E_{\mathrm{in}}(\tilde{a}) = \frac{1}{N}\sum_{n=1}^{N}\|\tilde{a}^T\tilde{x}_n - y_n\|_2^2 = \frac{1}{N}\|\tilde{X}\tilde{a} - y\|_2^2.$$

Here, $\|\cdot\|_2$ is the $\ell_2$ norm, and $\tilde{X}$ represents the dataset $\mathcal{D}$ embedded in $\mathbb{R}^{d+1}$, where

$$\tilde{X} = \begin{bmatrix}\tilde{x}_1^T\\ \vdots\\ \tilde{x}_N^T\end{bmatrix}, \qquad y = \begin{bmatrix}y_1\\ \vdots\\ y_N\end{bmatrix}.$$

Using standard matrix calculus, the hypothesis that minimizes $E_{\mathrm{in}}$ can be found to be given by $h_{\mathrm{lr}}(\tilde{x}) = \tilde{a}_{\mathrm{lr}}^T\tilde{x}$, where $\tilde{a}_{\mathrm{lr}}$ is given by the roots of $\nabla_{\tilde{a}} E_{\mathrm{in}}$:38

$$\tilde{a}_{\mathrm{lr}} = (\tilde{X}^T\tilde{X})^{-1}\tilde{X}^T y.$$

Here, $\tilde{X}^T\tilde{X}$ is assumed to be invertible; that is, $\det(\tilde{X}^T\tilde{X}) \neq 0$. Linear regression is one of the few machine learning algorithms where the analytical formula describing the optimal hypothesis is known: $h_{\mathrm{lr}}(\tilde{x}) = \tilde{a}_{\mathrm{lr}}^T\tilde{x}$.33 It is important to realize that this hypothesis is optimal with respect to the in-sample error, $E_{\mathrm{in}}$, while what is of interest is the out-of-sample error, $E_{\mathrm{out}}$, which is a proxy for how well the model would generalize to real-world data. However, $E_{\mathrm{out}}$ cannot be computed since $f$ is unknown, and thus one has no recourse but to work with $E_{\mathrm{in}}$. This is a theme common to all machine learning methods, but some methods, particularly linear estimators, are special in that we can often find exact bounds on the out-of-sample error for the optimal hypothesis.
The formula for $\tilde{a}_{\mathrm{lr}}$ hinges on $\tilde{X}^T\tilde{X}$ being invertible. If $\tilde{X}^T\tilde{X}$ is singular, the optimal hypothesis $h_{\mathrm{lr}}$ is undefined. An ad-hoc solution to this problem is to define the optimal hypothesis as $h_{\mathrm{rr}}(\tilde{x}) = \tilde{a}_{\mathrm{rr}}^T\tilde{x}$, where

$$\tilde{a}_{\mathrm{rr}} = (\tilde{X}^T\tilde{X} + \lambda I)^{-1}\tilde{X}^T y.$$

This technique is called ridge regression, where $\lambda$, called the regularization rate, is a hyperparameter, i.e., a parameter whose value is either determined through trial and error or by using a meta-optimization scheme, such as Particle Swarm Optimization (PSO).39 Note that ridge regression is also more numerically stable than linear regression.§ However, unlike linear regression, ridge regression is a biased estimator.40 This increase in bias is counterbalanced by a reduction in variance – a phenomenon known as the bias-variance tradeoff38,41 – making ridge regression less prone to the problem of overfitting, wherein a model fits the dataset too closely, thus compromising how well it fits the desired function $f$.42
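To make the closed-form solutions above concrete, the following NumPy sketch computes both the ordinary least-squares weights and their ridge-regularized counterpart on synthetic placeholder data; the feature layout, noise level, and regularization rate are illustrative assumptions, not values taken from any study reviewed here.

```python
import numpy as np

# Synthetic placeholder data: N samples, d = 3 features
# (loosely standing in for pressure, temperature, salinity) and an IFT-like target.
rng = np.random.default_rng(0)
N, d = 200, 3
X = rng.uniform(size=(N, d))
y = 30.0 - 10.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=N)

# Embed X in R^{d+1} by appending a constant column (absorbs the offset b).
X_tilde = np.hstack([X, np.ones((N, 1))])

# Linear regression: a_lr = (X~^T X~)^{-1} X~^T y  (solved, not explicitly inverted)
a_lr = np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ y)

# Ridge regression: a_rr = (X~^T X~ + lambda I)^{-1} X~^T y
lam = 1e-2
a_rr = np.linalg.solve(X_tilde.T @ X_tilde + lam * np.eye(d + 1), X_tilde.T @ y)

# In-sample error E_in for the ridge hypothesis.
E_in = np.mean((X_tilde @ a_rr - y) ** 2)
print(a_lr, a_rr, E_in)
```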
In classification, support vector machines (SVMs) seek the maximum-margin hyperplane that separates the set of all data points into two half-spaces. Here, margin is defined as the distance of a hyperplane to the data point(s) closest to it (called support vectors), and one can ascertain that it is inversely related to the flatness of the hyperplane.44 SVMs can be extended to regression problems,45 where the goal becomes to find a hyperplane $h(x) = a^T x + b$ that is as flat as possible and does not deviate from the targets $y_n$ by more than $\varepsilon$.46 However, as is, the problem can be infeasible if no function $h$ exists that approximates all points in $\mathcal{D}$ to within $\varepsilon$. In order to address this issue, the constraints are made "soft" through the introduction of slack variables $\xi_n^+$ and $\xi_n^-$,46 thus yielding the following constrained optimization problem:

$$\begin{aligned}
\min_{a,\,b,\,\xi^+,\,\xi^-}\quad & \tfrac{1}{2}\|a\|_2^2 + C\sum_{n=1}^{N}(\xi_n^+ + \xi_n^-)\\
\text{subject to}\quad & y_n - a^T x_n - b \le \varepsilon + \xi_n^+,\\
& a^T x_n + b - y_n \le \varepsilon + \xi_n^-,\\
& \xi_n^+,\ \xi_n^- \ge 0, \qquad n = 1, \ldots, N,
\end{aligned}$$

where the hyperparameter $C > 0$ controls the $\varepsilon$-insensitivity, i.e., the degree to which one is willing to allow data points to fall outside the $\varepsilon$ allowance (Fig. 3 – left). For details on how to solve this problem using quadratic programming, refer to ref. 46 and 47.
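In practice, the ε-insensitive regression problem above is rarely solved by hand; libraries such as scikit-learn wrap the underlying quadratic program. The sketch below fits an SVR on placeholder data, with the kernel, C, and ε values chosen purely for illustration.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Placeholder arrays standing in for (pressure, temperature, salinity) -> IFT.
rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 3))
y = 30.0 - 12.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(scale=0.4, size=300)

# epsilon sets the width of the insensitive tube; C penalizes the slack
# variables xi+ and xi- (larger C tolerates fewer violations).
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X, y)
print(model.predict(X[:5]))
```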
While previous research has explored nonlinear methods for CO2-brine IFT estimation in saline aquifers, the coverage has been restricted. In particular, advanced neural estimators like convolutional neural networks, recurrent neural networks, and transformers, which have demonstrated cutting-edge performance in time-series prediction tasks across diverse scientific and engineering domains, remain untapped in the context of CO2-brine IFT prediction in saline aquifers. In the following sections, we not only formally present the nonlinear estimators previously applied in the literature but also present some yet-to-be-applied modern time series prediction methods.
One way to obtain non-linear estimators is by transforming the input space $\mathcal{X}$ through a non-linear transform $\Psi: \mathcal{X} \to \mathcal{Z}$ and constructing a hypotheses set over the transformed space:

$$\mathcal{H}_{\Psi} = \{h : h(x) = a^T\Psi(x) + b\}.^{38,44}$$

The transformed space $\mathcal{Z}$ is commonly referred to as the feature space; and since it is related to the input space, $\mathcal{X}$, through a non-linear mapping, employing linear estimators in the feature space gives non-linear hypotheses in the input space (Fig. 3).33

Designing linear estimators in the feature space is computationally expensive if it is high-dimensional, as is usually the case.¶ Take, for example, ridge regression in a feature space $\mathcal{Z}$ of dimension $d'$. To compute the estimator, one must solve the linear system $a_{\mathrm{rr}} = (Z^T Z + \lambda I)^{-1} Z^T y$, where $Z$ is the dataset mapped into the feature space, which requires $\mathcal{O}(d'^3)$ operations. In such cases, the dual solution can often be cheaper to compute. For example, the dual solution to ridge regression is $a_{\mathrm{rr}} = (Z Z^T + \lambda I)^{-1} y$, which requires $\mathcal{O}(N^3)$ operations, where $N$ is the number of data points. Thus, for $N < d'$, the dual solution is computationally cheaper. Additionally, the entries $\langle\Psi(x_i), \Psi(x_j)\rangle$ of the matrix $Z Z^T$, known as the Gram matrix, are inner products, which can often be computed as a direct function of the inputs, thus further reducing the computational cost.48
Kernel methods capitalize on this property by defining a function $K$, called a kernel, that enables direct computation of these inner products. For instance, the quadratic transform $\Psi$ leads to the quadratic kernel $K(z_i, z_j) = \langle z_i, z_j\rangle^2$. However, it is not necessary to explicitly define the transformation $\Psi$; one can specify the kernel function $K$ directly, provided that $K$ meets Mercer's conditions.49 For example, the widely used radial basis function (RBF) kernel, $K(z_i, z_j) = \exp(-\gamma\|z_i - z_j\|_2^2)$, corresponds to a feature transform $\Psi$ that is never computed explicitly.
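A minimal sketch of the kernel trick in action, using scikit-learn's KernelRidge with an RBF kernel on synthetic placeholder data; the kernel width and regularization rate are illustrative assumptions.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(2)
X = rng.uniform(size=(300, 3))                          # placeholder inputs
y = np.sin(4 * X[:, 0]) + 0.1 * rng.normal(size=300)    # nonlinear placeholder target

# RBF kernel: K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2); alpha plays the role
# of the regularization rate lambda in the ridge formulation.
krr = KernelRidge(kernel="rbf", gamma=2.0, alpha=1e-2)
krr.fit(X, y)
print(krr.predict(X[:5]))
```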
The estimators presented so far describe all of $\mathcal{X}$ using a single rule. However, instead of using a complicated and difficult-to-interpret model to describe all of $\mathcal{X}$, an alternative is to partition $\mathcal{X}$ into subsets, and then partition those subsets into further subsets, and keep going in this fashion until the subsets are small enough that they can be described by simple models. Decision trees are machine learning methods that follow this approach to data modeling.51

Typically, decision trees are constructed in a top-down manner using a recursive partitioning strategy,52 where the input space is greedily divided into subsets.53 This process continues until the resulting subsets are sufficiently small to be represented by constant values.
While decision trees are expressive and powerful, they are also highly prone to overfitting. To control the degree of overfitting, one usually constrains the maximum depth of the tree (called top-down pruning) or removes leaves from the tree after it has been built (called bottom-up pruning). Another common strategy is to use an ensemble learning method,54 such as bagging or boosting.
Formally, bagging is an ensemble learning method where $M$ decision trees are constructed on independent subsets of the dataset $\mathcal{D}$, and at inference time, the final prediction on a given input $x \in \mathcal{X}$, $h_{\mathrm{bag}}(x)$, is obtained by averaging the predictions $h_m(x)$ from the individual trees:

$$h_{\mathrm{bag}}(x) = \frac{1}{M}\sum_{m=1}^{M}h_m(x).$$
Note that even though bagged trees are grown on independent subsets of the dataset, the inputs to the trees can still be correlated, reducing the benefit that bagging brings. The Random forests approach aims to address this problem by employing bagging along with feature bagging, wherein each tree is constructed on a subset of the dataset using only a subset of the possible feature splits.38
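As an illustration of bagging with feature sub-sampling, the sketch below fits a random forest regressor on synthetic placeholder data whose six columns only loosely mimic the commonly used IFT inputs; none of the settings are drawn from the studies reviewed here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.uniform(size=(500, 6))   # placeholders: P, T, N2, CH4, monovalent, bivalent
y = 35.0 - 15.0 * X[:, 0] + 4.0 * X[:, 4] + rng.normal(scale=0.5, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# M = 200 trees, each grown on a bootstrap sample; max_features limits the
# candidate splits per node (feature bagging).
rf = RandomForestRegressor(n_estimators=200, max_features="sqrt", random_state=0)
rf.fit(X_tr, y_tr)
print(rf.score(X_te, y_te))      # R^2 on the held-out split
```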
A multi-layer perceptron models the target as a composition of affine maps and non-linear activations:

$$\Gamma(x;\Theta) = A_L\Phi(A_{L-1}(\ldots\Phi(A_1 x + b_1)\ldots) + b_{L-1}) + b_L,$$

where $A_\ell \in \mathbb{R}^{n_\ell \times n_{\ell-1}}$, $b_\ell \in \mathbb{R}^{n_\ell}$, and $\Phi$ is a non-linear function, often referred to as the activation function.57 Common choices for the activation function include the rectified linear unit (ReLU), wavelet basis functions, and radial basis functions.

In neural networks literature, $x$ is called the input layer, and each application of $\Phi$ is called a hidden layer, where $A_\ell$ and $b_\ell$, for $\ell = 1, \ldots, L$ and $n_0 = d$, are called the parameters or weights of the $\ell$-th layer. A layer is thought of as made of neurons (Fig. 4), and the output of a layer is called a neural activation. It is common to define the depth of a network as the number of layers, $L$, and its size as the total number of neurons in those layers, $\sum_{\ell=1}^{L} n_\ell$.58
Fig. 4 (A) Visualization of a neural network with three layers: one input layer and two hidden layers. To keep the diagram simple, the connections are shown as undirected, and the input is depicted for d = 2. The white circles depict neurons, where each neuron in the hidden layers is an affine mapping followed by the nonlinearity $\Phi$. The grey circle is the output. Though a typical neural network can have several outputs, we only require a single output ($n_L = 1$), that is, the interfacial tension. (B) An example hypothesis of a single-output ReLU neural network visualized as a continuous piecewise-linear function over $\mathcal{X}$.55
The hypotheses set of neural networks is very expressive: under minor conditions on the activation function $\Phi$, it can be shown that every continuous function on a compact subset of $\mathbb{R}^d$ can be arbitrarily well-approximated by a fixed-size neural network.59 One can pose the task of approximating the IFT function, $f$, from the dataset $\mathcal{D}$, using a neural network $\Gamma$ as finding parameters $\Theta$ such that:

$$\Theta^* = \operatorname*{arg\,min}_{\Theta}\ \mathcal{L}(\Theta), \qquad \mathcal{L}(\Theta) = \mathcal{L}_{\mathrm{in}}(\Theta) + \lambda\,\mathcal{R}(\Theta),$$

where $\mathcal{L}$, known as the loss function, is composed of the in-sample loss $\mathcal{L}_{\mathrm{in}}(\Theta) = \frac{1}{N}\sum_{n=1}^{N}\|\Gamma(x_n;\Theta) - y_n\|_2^2$, measuring how well a given neural network parameterization $\Gamma(\cdot\,;\Theta)$ approximates $f$ over $\mathcal{D}$, and a regularization term $\mathcal{R}(\Theta)$, modulated by a hyperparameter $\lambda$, which constrains the possible parameterizations of $\Gamma$.58
It is important to point out here that even if $\mathcal{L}$ is a "simple" function of $\Gamma$, obtaining an analytical expression for $\Theta^*$ is generally not possible. Instead, one approximates $\Theta^*$ as $\Theta_t$ using an iterative procedure, such as gradient descent:60

$$\Theta_{t+1} = \Theta_t - \eta\,\nabla_\Theta\mathcal{L}(\Theta_t),$$

where $t \in \{0, 1, 2, \ldots\}$ identifies a step in the algorithm's execution, and $\eta > 0$ is a hyperparameter, called the step size.|| In practice, $\Theta_0$ is chosen to be random non-zero values or set using a weight initialization scheme,61 and $\nabla_\Theta\mathcal{L}$ is computed using the backpropagation algorithm,62 which automatically computes gradients using an efficient graph-based implementation of the chain rule. However, note that $\mathcal{L}$ is evaluated over all data points in $\mathcal{D}$, which might be computationally expensive. Therefore, it is common to use a variation of gradient descent called stochastic gradient descent,63 where $\mathcal{D}$ is divided into a set of batches, and $\mathcal{L}$ is computed only over one batch in a given gradient descent step. It can thus take several steps to update $\Theta$ over all of $\mathcal{D}$, and once that happens, an epoch of training is said to have been completed.58 It usually takes several epochs of training to get a good approximation to $\Theta^*$.
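To make the training loop above concrete, the following PyTorch sketch trains a small MLP with mini-batch stochastic gradient descent; the architecture, batch size, step size, and synthetic data are illustrative assumptions rather than a reproduction of any reviewed model.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder tensors standing in for the IFT dataset D = {(x_n, y_n)}.
torch.manual_seed(0)
X = torch.rand(1000, 6)
y = (35 - 15 * X[:, 0] + 4 * X[:, 4] + 0.3 * torch.randn(1000)).unsqueeze(1)

mlp = nn.Sequential(nn.Linear(6, 32), nn.ReLU(),
                    nn.Linear(32, 32), nn.ReLU(),
                    nn.Linear(32, 1))
opt = torch.optim.SGD(mlp.parameters(), lr=1e-2, weight_decay=1e-4)  # weight_decay acts as R(Theta)
loss_fn = nn.MSELoss()

loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)
for epoch in range(50):                 # one epoch = one pass over all batches
    for xb, yb in loader:
        opt.zero_grad()
        loss = loss_fn(mlp(xb), yb)     # in-sample loss on the current batch
        loss.backward()                 # backpropagation
        opt.step()                      # Theta <- Theta - eta * grad
```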
Neural networks have seen wide success in tasks such as computer vision,64,65 natural language understanding,66,67 and reasoning and control.68,69 This success has mainly been driven by an increase in compute power70 and the availability of large datasets,71–74 which have made training deep networks possible,75 thus starting the field that is now known as deep learning.76 Over the years, a number of improvements to the training of deep networks have been proposed and adopted, e.g., gradient descent with momentum,77 adaptive gradient descent,78 and regularization techniques.79 Also, different variations to the base neural network architecture, commonly known as multi-layer perceptron (MLP) or fully connected network (FCN), have been proposed and adopted to better tackle practical problems.80–82 While a complete review of these various neural net architectures is beyond the scope of this paper, a brief introduction to three particularly important neural architectures follows.
CNNs owe their design to the animal visual cortex,84 and have become a foundation of modern computer vision.85 Apart from that, CNNs, particularly 1-dimensional CNNs,86 in which the convolution kernel spans the full feature dimension (p = m), have also found immense application in modeling time series data.87 Since the CO2-brine IFT function $f$ is implicitly a function of time, one can create a sequence of points $x^{(1)}, x^{(2)}, \ldots, x^{(T)}$ such that $x^{(t)}$ represents the sample obtained at (normalized) time $t$. The dataset $\mathcal{D}$ cast as a time series $\{(x^{(t)}, y^{(t)})\}_{t=1}^{T}$ can thus be used to model $f$ with a 1D-CNN. However, to the best of our knowledge, no prior work exists investigating this approach.
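A minimal sketch of how this (to our knowledge untested) 1D-CNN formulation might look in PyTorch, assuming the IFT samples have already been ordered in time and grouped into fixed-length windows; the window length, channel counts, and kernel sizes are arbitrary illustrative choices.

```python
import torch
from torch import nn

# Suppose the time-ordered IFT data have been grouped into windows of
# T = 16 consecutive samples, each with m = 6 features (placeholder shapes).
batch, m, T = 32, 6, 16
x_seq = torch.rand(batch, m, T)          # (batch, channels, time)

cnn1d = nn.Sequential(
    nn.Conv1d(in_channels=m, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv1d(16, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),             # pool over the time axis
    nn.Flatten(),
    nn.Linear(16, 1),                    # predicted IFT for the window
)
print(cnn1d(x_seq).shape)                # torch.Size([32, 1])
```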
A recurrent neural network (RNN) processes a sequence of inputs $x^{(1)}, \ldots, x^{(T)}$ by maintaining a hidden state that is updated at every time step:

$$h^{(t)} = \zeta(x^{(t)}, h^{(t-1)}; \Theta),$$

where $h^{(t)}$ is called the hidden state of the RNN (at time step $t$). Various choices are available for the function $\zeta$; one being:

$$\zeta(x^{(t)}, h^{(t-1)}; \Theta) = \tanh(A_{hx}x^{(t)} + A_{hh}h^{(t-1)} + b_h),$$

where $A_{hx}$, $A_{hh}$, and $b_h$ make up the parameters $\Theta$ of the network. Recurrent networks are trained using backpropagation through time, which works by unrolling the compute graph through time (Fig. 6) and computing gradients using backpropagation.89 However, backpropagation through the above choice of $\zeta$ is numerically unstable, leading to gradients exploding and/or vanishing as they flow to earlier time steps, which hinders the modeling of long-term dependencies.90 One way to address the problem is to define $\zeta$ as the long short-term memory (LSTM) cell:91

$$\zeta(x^{(t)}, h^{(t-1)}; \Theta) = \tanh(c^{(t)}) \odot o^{(t)},$$

where the cell state $c^{(t)}$ and the output gate $o^{(t)}$ are computed from $x^{(t)}$, $h^{(t-1)}$, and the previous cell state $c^{(t-1)}$ through the input, forget, and output gating equations, each an affine map passed through a gating non-linearity $\sigma$. The associated weight matrices and bias vectors define the parameters of the network, and $\sigma$ is the softmax function, defined such that the $i$-th component of $\sigma(z)$ is $e^{z_i}/\sum_j e^{z_j}$.

RNNs have enjoyed immense success in processing sequential data.92,93 One can train an RNN on the sequence $\{(x^{(t)}, y^{(t)})\}_{t=1}^{T}$ defining the IFT data as well, for example, by formulating the loss at each time step $t$ to be $\|\hat{y}^{(t)} - y^{(t)}\|_2^2$, where $\hat{y}^{(t)} = A\,\zeta(x^{(t)}, h^{(t-1)}; \Theta) + b_\varsigma$, for a readout matrix $A$ and offset $b_\varsigma$, and summing this loss over all time steps. However, to the best of our knowledge, this line of work remains unexplored in the literature.
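Similarly, the recurrent formulation sketched above could be prototyped with PyTorch's built-in LSTM and a per-time-step linear read-out, as in the following hedged example; all shapes and hyperparameters are placeholders.

```python
import torch
from torch import nn

class IFTRNN(nn.Module):
    """LSTM over a sequence of samples x^(1..T); emits y_hat^(t) at every step."""
    def __init__(self, n_features=6, n_hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, n_hidden, batch_first=True)
        self.readout = nn.Linear(n_hidden, 1)   # y_hat^(t) = A h^(t) + b

    def forward(self, x):                       # x: (batch, T, n_features)
        h, _ = self.lstm(x)                     # h: (batch, T, n_hidden)
        return self.readout(h)                  # (batch, T, 1)

model = IFTRNN()
x = torch.rand(8, 20, 6)                        # placeholder sequence batch
y_hat = model(x)
loss = nn.functional.mse_loss(y_hat, torch.rand(8, 20, 1))  # per-step squared loss
loss.backward()                                 # backpropagation through time
```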
The core of the transformer architecture is the attention module, which takes a set of inputs $u^{(1)}, \ldots, u^{(T)}$, represented as rows of a matrix $U \in \mathbb{R}^{T \times m}$, and computes a weighted representation of the inputs:67

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V,$$

where $Q = UA_Q$, $K = UA_K$, and $V = UA_V$ (with $d_k$ the dimension of the keys and $A_Q$, $A_K$, $A_V$ learned projection matrices) are called queries, keys, and values, respectively.80 Intuitively, the attention module produces a representation of each input $u^{(i)}$ by weighing the value ($v_j^T$) of each input $u^{(j)}$ towards the input $u^{(i)}$ according to how much the $j$-th input's key ($k_j^T$) matches the $i$-th input's query ($q_i^T$). This may be viewed as a mechanism for allowing the network to selectively focus on the inputs.

By doing away with recurrent processing in favor of attention modules, transformers allow for faster training through parallelization. They also support a much better gradient flow through their compute graph, thus leading to better learning of long-term dependencies.67 However, despite the revolution that transformers have brought about in neural computing, the authors have not come across any publication featuring transformers to model CO2-brine IFT.
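For completeness, the scaled dot-product attention above can be written in a few lines; the projection widths below are illustrative, and the snippet is a sketch of a single attention head rather than a full transformer encoder.

```python
import torch
from torch import nn

T, m, d_k = 20, 6, 16                      # sequence length, input width, key width
U = torch.rand(T, m)                       # inputs u^(1..T) as rows

A_q, A_k, A_v = (nn.Linear(m, d_k, bias=False) for _ in range(3))
Q, K, V = A_q(U), A_k(U), A_v(U)           # queries, keys, values

# Each output row is a weighted mix of the value rows, weighted by how well
# the keys match the corresponding query.
weights = torch.softmax(Q @ K.T / d_k ** 0.5, dim=-1)   # (T, T)
attended = weights @ V                                    # (T, d_k)
print(attended.shape)
```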
| Reference | Year | Dataset size | Inputs | Method | Split | MAPE | RMSE | R2 |
|---|---|---|---|---|---|---|---|---|
| a They started with 2517 samples but reduced the sample space down to 2346 samples after removing inconsistent sample points. | ||||||||
| Li et al.102 | 2014 | 1716 samples | Pressure | Correlation | n.a. | 7.81% | 4.51 | 0.857 |
| Temperature | ||||||||
| N2 molarity | ||||||||
| CH4 molarity | ||||||||
| Na+, K+ molality | ||||||||
| Ca2+, Mg2+ molality | ||||||||
| Zhang et al.100 | 2016 | 1716 samples | Pressure | Multi-layer perceptron | Train | 2.58% | 1.47 | 0.985 |
| Temperature | ||||||||
| 70% train | N2 molarity | Val | 2.52% | 1.34 | 0.985 | |||
| 15% val | CH4 molarity | |||||||
| 15% test | Na+, K+ molality | Test | 3.39% | 2.04 | 0.970 | |||
| Ca2+, Mg2+ molality | ||||||||
| Niroomand-Toomaj et al.103 | 2017 | 378 samples | Pressure | Radial basis function network | Train | 2.10% | 1.139 | 0.994 |
| 80% train | Temperature | Val | n.a. | n.a. | n.a. | |||
| 20% test | Brine salinity | Test | 3.36% | 1.567 | 0.986 | |||
| Partovi et al.104 | 2017 | 1716 samples | Pressure | Radial basis function network w/particle swarm optimization (PSO) | Whole | 2.07% | 1.400 | 0.986 |
| Temperature | ||||||||
| N2 molarity | ||||||||
| 80% train | CH4 molarity | Adaptive neuro fuzzy inference system w/subtractive clustering | Whole | 1.96% | 1.240 | 0.989 | |
| 20% test | Na+, K+ molality | |||||||
| Ca2+, Mg2+ molality | ||||||||
| Rashid et al.105 | 2017 | 1019 samples | Pressure | Least squares support vector machine w/coupled simulated annealing (CSA) | Train | 2.08% | 1.354 | 0.986 |
| Temperature | Val | 4.03% | 2.472 | 0.950 | ||||
| Salinity | Test | 3.54% | 2.425 | 0.941 | ||||
| 75% train | Pressure | Least squares support vector machine w/coupled simulated annealing (CSA) | Train | 2.20% | n.a. | 0.986 | ||
| 75% val | Temperature | Val | 4.03% | n.a. | 0.958 | |||
| 20% test | Na+, K+ concentration | Test | 3.21% | n.a. | 0.964 | |||
| Ca2+, Mg2+ concentration | ||||||||
| SO42− concentration | ||||||||
| HCO3− concentration | ||||||||
| Kamari et al.106 | 2017 | 1716 samples | Pressure | Decision trees | Whole | 3.15% | n.a. | 0.978 |
| Temperature | ||||||||
| 80% train | N2 molarity | Least squares support vector machine | Whole | 3.75% | n.a. | 0.972 | ||
| 20% test | CH4 molarity | |||||||
| Na+, K+ molality | Gene Expression Programming107 | Whole | 9.30% | n.a. | 0.640 | |||
| Ca2+, Mg2+ molality | ||||||||
| Dehaghani Saeedi et al.108 | 2019 | 378 samples | Pressure | Stochastic gradient boosting | Train | 0.30% | 0.179 | 1.000 |
| 80% train | Temperature | Test | 1.64% | 0.947 | 0.995 | |||
| 20% test | Salinity | |||||||
| Amooie et al.13 | 2019 | 2517 samples | Pressure | Least squares support vector machine w/CSA | Train | 5.13% | 2.51 | n.a. |
| Test | 5.32% | 2.71 | n.a. | |||||
| Temperature | Radial basis function network w/PSO | Train | 6.19% | 2.85 | n.a. | |||
| Test | 6.68% | 3.38 | n.a. | |||||
| 80% train | Na+, K+ molality | Multi-layer perceptron w/Levenberg–Marquardt | Train | 3.02% | 1.66 | n.a. | ||
| Test | 3.79% | 2.09 | n.a. | |||||
| Ca2+, Mg2+ molality | Multi-layer perceptron w/Bayesian regularization | Train | 3.16% | 1.72 | n.a. | |||
| Test | 3.87% | 2.09 | n.a. | |||||
| 20% test | Critical temperature of gaseous mixture | Multi-layer perceptron w/scaled conjugate gradient | Train | 3.56% | 1.83 | n.a. | ||
| Test | 4.66% | 2.41 | n.a. | |||||
| Multi-layer perceptron w/resilient backpropagation | Train | 5.00% | 2.30 | n.a. | ||||
| Test | 5.17% | 2.61 | n.a. | |||||
| Committee machine intelligent system | Train | 2.99% | 1.69 | n.a. | ||||
| Test | 3.35% | 1.75 | n.a. | |||||
| Group method of data handling | Train | 8.53% | 3.81 | n.a. | ||||
| Test | 8.32% | 3.80 | n.a. | |||||
| Zhang et al.14 | 2020 | 2346 samplesa | Pressure | Gradient boosted trees w/shrinkage strategy & column subsampling | Train | 0.32% | 0.04 | 1 |
| Temperature | ||||||||
| 80% train | N2 molarity | |||||||
| CH4 molarity | Test | 1.71% | 0.95 | 0.993 | ||||
| 20% test | Na+, K+ molality | |||||||
| Ca2+, Mg2+ molality | ||||||||
| Zhang et al.109 | 2020 | 1716 samples | Pressure | Gaussian process regression | Train | 0.21% | 0.14 | 1 |
| Test | 7.87% | 4.06 | 0.881 | |||||
| Temperature | Multi-layer perceptron | Train | 3.65% | 1.92 | 0.973 | |||
| Test | 4.48% | 2.22 | 0.967 | |||||
| 80% train | N2 molarity | Support vector machine | Train | 1.63% | 0.91 | 0.994 | ||
| Test | 4.16% | 2.24 | 0.967 | |||||
| CH4 molarity | Ridge regression w/radial basis function | Train | 4.63% | 2.38 | 0.959 | |||
| Test | 4.41% | 2.38 | 0.967 | |||||
| 20% test | Na+, K+ molality | Decision trees | Train | 0.05% | 0.1 | 1.000 | ||
| Test | 4.77% | 2.73 | 0.951 | |||||
| Ca2+, Mg2+ molality | Random forest | Train | 1.63% | 0.92 | 0.994 | |||
| Test | 4.17% | 2.26 | 0.967 | |||||
| Adaptive boosting (Adaboost) | Train | 2.19% | 1.06 | 0.992 | ||||
| Test | 4.43% | 2.28 | 0.966 | |||||
| Gradient boosted decision tree | Train | 1.29% | 0.72 | 0.996 | ||||
| Test | 2.61% | 1.4 | 0.987 | |||||
| Gradient boosting (XGBoost) | Train | 0.56% | 0.31 | 0.999 | ||||
| Test | 2.37% | 1.28 | 0.989 | |||||
| Hosseini et al.110 | 2020 | 1716 samples | Temperature | Multi-layer perceptron w/Levenberg–Marquardt | Train | 1.23% | 0.527 | 0.991 |
| Val | 2.53% | 1.057 | 0.971 | |||||
| 80% train | Pressure | Test | 1.55% | 0.786 | 0.986 | |||
| 20% test | Salinity | Multi-layer perceptron w/Bayesian regularization | Train | 1.15% | 0.483 | 0.994 | ||
| Val | 1.70% | 0.578 | 0.983 | |||||
| Test | 1.73% | 0.680 | 0.983 | |||||
| Brine density | Multi-layer perceptron w/scaled conjugate gradient | Train | 1.15% | 0.479 | 0.993 | |||
| Val | 1.91% | 1.832 | 0.971 | |||||
| Test | 2.03% | 0.859 | 0.983 | |||||
| CO2 density | Radial basis function w/differential evolution | Train | 1.11% | 0.500 | 0.991 | |||
| Val | 1.40% | 0.598 | 0.991 | |||||
| Test | 1.06% | 0.516 | 0.995 | |||||
| Radial basis function w/particle swarm optimization | Train | 0.94% | 0.478 | 0.993 | |
| Val | 1.73% | 0.610 | 0.989 | |||||
| Test | 1.35% | 0.511 | 0.988 | |||||
| Radial basis function w/farmland fertility algorithm (FFA) | Train | 1.12% | 0.543 | 0.991 | ||||
| Val | 0.96% | 0.455 | 0.991 | |||||
| Test | 1.10% | 0.544 | 0.993 | |||||
| Liu et al.19 | 2021 | 974 samples | Pressure | Multi-layer perceptron w/back propagation | Train | n.a. | 1.799 | 0.981 |
| Temperature | Test | n.a. | 1.954 | 0.926 | ||||
| 70% train | N2 molarity | Wavelet neural network (WNN) | Train | n.a. | 3.549 | 0.993 | ||
| 30% test | CH4 molarity | Test | n.a. | 3.837 | 0.976 | |||
| Na+, K+ molality | Radial basis function | Train | n.a. | 1.112 | 0.908 | |||
| Ca2+, Mg2+ molality | Test | n.a. | 76.04 | 35.2 | |
| Optimized wavelet neural network (I-WNN) | Train | n.a. | 2.689 | 0.958 | |
| Test | n.a. | 2.782 | 0.952 | |||||
| Amar et al.111 | 2021 | 2346 samplesa | Pressure | Genetic programming for temperature ≤ 313.15 K | Train | n.a. | 4.806 | 0.943 |
| Temperature | Test | n.a. | 4.643 | 0.935 | ||||
| 80% train | N2 molarity | Genetic programming for temperature > 313.15 K | Train | n.a. | 2.193 | 0.960 | ||
| 20% test | CH4 molarity | Test | n.a. | 2.052 | 0.959 | |||
| Na+, K+ molality | Average of both models | Train | n.a. | 3.500 | 0.952 | |||
| Ca2+, Mg2+ molality | Test | n.a. | 3.348 | 0.947 | ||||
| Safaei-Farouji et al.112 | 2022 | 2184 samples | Pressure | Random forest | Train | 1.54% | 0.590 | 0.995 |
| Temperature | Test | 2.99% | 1.108 | 0.980 | ||||
| 80% train | N2 molarity | Gaussian process regression | Train | 1.56% | 0.557 | 0.994 | ||
| 20% test | CH4 molarity | Test | 3.64% | 1.282 | 0.970 | |||
| Na+, K+ molality | Radial basis function | Train | 5.05% | 1.828 | 0.951 | |||
| Ca2+, Mg2+ molality | Test | 5.01% | 1.894 | 0.946 | ||||
| Mouallem et al.113 | 2024 | 2896 samples | Pressure | Least-squares boosting | Train | 1.95% | 1.009 | 0.994 |
| Temperature | Test | 4.82% | 1.009 | 0.941 | ||||
| 80% train | Salinity of monovalent salts (NaCl, KCl, Na2HCO3, Na2SO4) | Extreme gradient boosting | Train | 1.84% | 1.115 | 0.992 | |
| 20% test | Test | 3.77% | 2.532 | 0.961 | ||||
| Salinity of bivalent salts (MgCl2, CaCl2, MgSO4) | Gradient boosting | Train | 0.93% | 0.719 | 0.997 | |
| Test | 3.38% | 2.434 | 0.964 | |||||
| Impurities CH4 and N2 | Genetic programming | Train | 9.12% | 4.583 | 0.871 | |||
| Test | 9.02% | 4.284 | 0.886 | |||||
| Artificial neural network | Train | 5.86% | 3.124 | 0.939 | ||||
| Test | 8.99% | 3.124 | 0.921 | |||||
| Gaussian process regression | Train | 6.63% | 3.416 | 0.929 | ||||
| Test | 6.78% | 3.416 | 0.925 | |||||
| Vakili-Nezhaad et al.114 | 2024 | 549 samples | Pressure | Deep neural network w/Group method of data handling (GMDH) | Train | 1.3% | n.a. | 0.99 |
| Temperature | Test | 2.95% | n.a. | 0.97 | ||||
| 80% train | ||||||||
| 20% test | ||||||||
| Mutailipu et al.115 | 2024 | 1717 samples | Pressure | Particle swarm optimization random forest (PSO-RF) | Whole | 2.68% | 1.916 | 0.967 |
| Temperature | ||||||||
| 80% train | CH4 molarity | Improved gray wolf optimization random forest (IGWO-RF) | Whole | 2.70% | 1.900 | 0.969 | ||
| 10% val | N2 molarity | |||||||
| 10% test | Bulk carbon molecules | Sparrow search algorithm random forest (SSA-RF) | Whole | 2.67% | 1.900 | 0.969 | ||
| Microporous carbon molecules | Bayesian optimization random forest (BO-RF) | Whole | 2.07% | 1.770 | 0.973 | |
| Shen et al.116 | 2024 | 1716 samples | Pressure | Extreme gradient boosting (XGBoost) | Test | 2.31% | 1.24 | 0.989 |
| Temperature | ||||||||
| 80% train | CH4 molarity | Light gradient boosting machine (LightGBM) | Test | 2.45% | 1.32 | 0.987 | ||
| 20% test | N2 molarity | |||||||
| Monovalent molality | ||||||||
| Bivalent molality | Ensemble learning | Test | 2.01% | 1.11 | 0.991 | |||
| Density differences | ||||||||
| Nsiah Turkson et al.117 | 2024 | 1570 samples | Pressure | Gradient boosting (GradBoost) | Train | 0.20% | 0.126 | 1.000 |
| Temperature | Val | 2.51% | 1.400 | 0.986 | ||||
| 70% train | CH4 molarity | Test | 2.23% | 1.227 | 0.990 | |||
| 15% val | N2 molarity | Light gradient boosting machine (LightGBM) | Train | 0.91% | 0.504 | 0.998 | ||
| 15% test | Monovalent molality | Val | 3.00% | 1.979 | 0.971 | |||
| Bivalent molality | Test | 2.66% | 1.650 | 0.982 | ||||
| Li et al.118 | 2024 | 1823 samples | Pressure | Grey wolf optimizer-back propagation neural network | Whole | 3.38% | 1.682 | 0.971 |
| Temperature | ||||||||
| 80% train | CH4 molarity | Dung beetle optimizer-back propagation neural network | Whole | 3.35% | 1.678 | 0.972 | ||
| 20% test | N2 molarity | |||||||
| Monovalent molality | Particle swarm optimization – back propagation neural network (PSO-BPNN) | Whole | 3.61% | 1.899 | 0.962 | |||
| Bivalent molality | ||||||||
| Fan et al.119 | 2025 | 1716 samples | Pressure | Multibranch structure convolutional neural network (MBCNN) | Whole | 1.34% | 1.06 | 0.992 |
| Temperature | Train | 1.05% | 0.89 | 0.994 | ||||
| 80% train | Test | 2.49% | 1.54 | 0.982 | ||||
| 20% test | CH4 molarity | Random forest regressor (RFR) | Whole | 2.09% | 1.26 | 0.983 | ||
| Train | 1.63% | 0.92 | 0.994 | |||||
| N2 molarity | Test | 3.93% | 2.13 | 0.967 | ||||
| Monovalent molality | Gene expression programming (GEP) | Whole | 13.4% | 6.54 | 0.699 | |||
| Train | 13.5% | 6.62 | 0.695 | |||||
| Test | 13.0% | 6.22 | 0.715 | |||||
| Bivalent molality | Support vector regressor (SVR) | Whole | 13.6% | 7.56 | 0.597 | |||
| Train | 13.6% | 7.59 | 0.598 | |||||
| Test | 13.5% | 7.45 | 0.590 | |||||
| Liaqat et al.120 | 2025 | 1254 samples | Pressure | Linear regression (LR) | Train | 3.88% | 2.03 | 0.94 |
| Temperature | Test | 4.25% | 2.22 | 0.94 | ||||
| 80% train | NaCl molality | Support vector machine (SVM) | Train | 0.71% | 0.42 | 0.99 | ||
| 20% test | Test | 0.97% | 0.57 | 0.99 | ||||
| Decision tree regressor (DTR) | Train | 0.28% | 0.20 | 1.00 | ||||
| Test | 1.56% | 0.85 | 0.99 | |||||
| Random forest regressor (RFR) | Train | 0.57% | 0.30 | 1.00 | ||||
| Test | 1.16% | 0.62 | 0.99 | |||||
| Multilayer perceptron (MLP) | Train | 0.73% | 0.40 | 1.00 | ||||
| Test | 0.99% | 0.52 | 0.99 | |||||
In 2017, ref. 103 proposed a radial basis function network (RBFN) – essentially an MLP with RBF activations – for estimating IFT between CO2 and brine in saline aquifers. Their model, which includes three input neurons for pressure, temperature, and brine salinity, and three hidden layers with 80 neurons each, outperformed the tanh-based MLP proposed in ref. 100. Despite being trained on a smaller dataset of 302 data points, compared to the 1202 points used in ref. 100, the RBFN demonstrated superior performance. However, it is important to note that ref. 103 utilized a different dataset than ref. 100, which complicates direct comparisons since model performance is highly dataset-dependent. We speculate that the performance gain ref. 103 achieved over ref. 100 is primarily from their use of a wider and deeper model, and less so from their use of the RBF activation over tanh.
Another RBFN model was proposed in ref. 104 in the same year, and though the performance of their model appears competitive, they only report the performance metrics aggregated over the whole dataset, and not individually for the train and test datasets, making it difficult to draw a proper comparison to other works. Along with an RBFN model, ref. 104 also proposes an adaptive neuro-fuzzy inference system (ANFIS),121 which is a hybrid of MLPs and fuzzy inference, and is useful to model complex systems. They employed Subtractive Clustering122 – a clustering algorithm to select representative data points (cluster centers) from the training data – to determine the membership functions in the fuzzy rule base of ANFIS. Based on the overall statistics provided in the paper, the ANFIS model performs better than the RBFN, as it builds on the strengths of both neural networks and fuzzy inference systems.
In 2017, ref. 105 used classical machine learning, particularly Least-Squares Support Vector Machines (LSSVM)123 – a variant of regular Support Vector Machines (SVMs) that formulates the optimization problem as a least-squares problem, which is computationally cheaper to solve than a quadratic program – to model the interfacial tension of CO2-brine systems. To this end, they developed and analyzed two LSSVM models – one with three inputs and the other with eight, as described in Table 1 – and optimized their hyperparameters using an algorithm called Coupled Simulated Annealing (see ref. 124 and 125).†† Though the predictive performance of these LSSVM models does not measure up to MLPs, it should be noted that LSSVMs are generally faster to train and less data-intensive than neural networks. Additionally, they are more interpretable and faster in terms of inference time. Ref. 106 also analyzed LSSVMs, along with other machine learning algorithms (namely decision trees and gene expression programming, see ref. 107), and their conclusion too remains that MLPs are more accurate than the classical techniques. However, their results show that decision trees also perform admirably, and as shown in the work of ref. 108, decision trees in an ensemble can even outpace MLPs. In particular, ref. 108 used an ensemble of 2707 decision trees constructed using Stochastic Gradient Boosting,126 where they used 302 data points in the training process, in contrast to the 1372 training data points used by ref. 106.
Ref. 13, in 2019, conducted an extensive study spanning seven different machine learning approaches, developing eight models in total: LSSVMs optimized with CSA; RBFNs optimized with PSO; MLPs with two hidden layers and sigmoid/tanh activations, optimized using the Levenberg–Marquardt (LM),127 Bayesian Regularization (BR),128,129 Scaled Conjugate Gradient (SCG),130 and Resilient Backpropagation (RB)131 algorithms; and models based on the Group Method of Data Handling (GMDH).132,133 GMDH is a self-organizing neural network that optimizes both structural and parametric aspects of the model. Each of these models was designed to take in the same set of five inputs – namely, pressure, temperature, molalities of Na+ and K+, molalities of Ca2+ and Mg2+, and the critical temperature of the mixture – and they were all trained on the same dataset of 2013 data samples. Based on the statistical performance reported in the paper, the authors ranked the models as follows: MLP-LM > MLP-BR > MLP-SCG > MLP-RB > LSSVM-CSA > RBF-PSO > GMDH. These results show that MLPs have the best predictive performance of all the models tested. Moreover, ref. 13 proposes a committee machine intelligent system (CMIS) – an ensemble that weights the top three performing models, namely, the MLPs optimized with the LM, BR, and SCG algorithms. The ensemble aggregates the predictions of the three MLPs and generally outdoes the individual MLPs. Later, ref. 14 took the same dataset as ref. 13 and cleaned out any inconsistent data entries. They then trained an XGBoost model,134 an ensemble of gradient-boosted decision trees optimized for scalability and efficiency on large datasets, on the cleaned dataset, with hyperparameters optimized using 5-fold cross-validation (see ref. 135) with exhaustive grid search. In line with the findings of ref. 108 regarding gradient-boosted decision tree ensembles, ref. 14 achieved remarkably low statistical error with their predictions, outperforming MLPs (Fig. 7).
Fig. 7 A comparison of methods reported in the literature. For consistency, only studies based on the 1716-sample dataset are included. Reported metrics correspond to test sets where available, or to the full dataset otherwise. Differences in feature sets, data processing, and optimization schemes contribute to some variability. Full details are provided in Table 1 and the accompanying text.
In 2020, the works of ref. 110 and 109 sought to draw a detailed statistical picture of how various machine learning algorithms perform on the task of modeling the CO2-brine IFT. The former work compares MLPs and RBFNs optimized using various methods, and the latter compares a whole range of techniques, including MLPs, SVMs, ridge regression with RBF kernels (RR-RBF), decision trees, random forests, adaptive boosting, Gaussian process regression (GPR),136,137 and gradient-boosted trees. Results from ref. 110 were obtained against a dataset of 91 points only; however, they show that RBFNs optimized with either PSO, Differential Evolution (DE),138 or the Farmland Fertility Algorithm (FFA)139 perform better than MLPs optimized with LM, BR, and SCG. Results from ref. 109, in turn, show that MLPs come second only to gradient-boosted trees. Later, in 2021, ref. 19 conducted a comparative study of neural network models, including MLPs, RBFNs, and Wavelet Neural Networks (WNNs). They concluded that MLPs with sigmoid activations perform the best, and MLPs with wavelet activations (WNNs) perform the worst. The same year, ref. 111 proposed another classical learning technique for the problem – namely, genetic programming (GP).140 To that end, they divided the dataset into two subsets: one with data points where the temperature was less than or equal to 313.15 K, and the other with data points where the temperature was greater than 313.15 K. After creating the two subsets, they trained a separate model on each subset using genetic programming. However, as with most other non-ensemble classical techniques, the performance of their GP models does not stack up to the performance that neural networks with sigmoid activations have been shown to achieve. Another work comparing different machine learning methods for IFT was published in 2022 by ref. 112. They analyzed random forests (RF), GPR, and RBFNs, and reached the conclusion that random forests perform the best.
The application of AI to IFT prediction gained significant traction in 2024, reflected in the publication of six research papers: ref. 113, 114, 115, 116, 117 and 118. Ref. 113 uses multiple machine learning algorithms for IFT estimation, including Gradient Boosting, Extreme Gradient Boosting, Least Squares Boosting, Artificial Neural Networks, and Genetic Programming. Like most previous studies, ref. 113 employed six input features: pressure, temperature, the salinity of both monovalent (NaCl, KCl, Na2HCO3, Na2SO4) and bivalent salts (MgCl2, CaCl2, MgSO4), and the presence of impurities such as CH4 and N2. Among their models, the Gradient Boosting approach demonstrated the lowest MAPE (3.38%) on the testing data, outperforming the other models, whereas the Genetic Programming model exhibited the poorest performance. The ANN model achieved a relatively high MAPE of 8.99%, which is significantly higher than in comparable studies in the literature. The discrepancy could be due to differences in the underlying dataset or a poor choice of hyperparameters. As a practical application of their models, the paper uses the predicted IFT to determine the optimal storage depth for a real carbonate saline aquifer located onshore in the UAE.
Ref. 114 proposed a novel deep learning-based approach to estimate the IFT, specifically focusing on solutions containing divalent salts (MgCl2 and CaCl2), where GMDH was used to model IFT. The proposed GMDH-based model yielded a MAPE of 2.95% for test data, demonstrating high accuracy. A key advantage of the GMDH approach is its ability to optimize the network structure automatically, thus requiring less hyperparameter tuning.
Ref. 115 used an RF model coupled with a Bayesian Optimization algorithm (BO-RF) to predict IFT. The BO-RF model was compared against three other RF models, which were optimized using Sparrow Search Algorithm (SSA-RF), Particle Swarm Optimization (PSO-RF), and Improved Grey Wolf Optimization (IGWO-RF), respectively. Among these, the BO-RF model demonstrated the best performance, achieving a MAPE of 2.07% when evaluated on the entire dataset. The predicted IFT values were then utilized to determine the CO2 sequestration capacity of saline aquifers in the Tarim Basin of Xinjiang, China.
Ref. 116 introduced heterogeneous ensemble learning to predict IFT by combining XGBoost and the Light Gradient Boosting Machine (LightGBM). The performance of the ensemble learning model was compared to the individual performances of XGBoost and LightGBM. The results showed that the ensemble learning model achieved a lower MAPE of 2.01%, compared to 2.31% for XGBoost and 2.45% for LightGBM. Ref. 117 also investigated the use of Gradient Boosting and LightGBM with a slightly smaller dataset compared to ref. 116. The gradient boosting model achieved the best performance, reporting an error of 2.23% on the test data. While the gradient boosting model in this study outperformed the individual models of Shen et al., it still underperformed compared to the ensemble learning model proposed by ref. 116.
Ref. 118 introduced a dung beetle optimization-based backpropagation neural network (DBO-BPNN) for IFT modeling. The model's performance was compared to particle swarm optimization-based BPNN (PSO-BPNN) and grey wolf optimizer-based BPNN (GWO-BPNN). DBO-BPNN achieved the best accuracy, with an error of 3.35% on the whole dataset, outperforming PSO-BPNN, which had the next best performance with an error of 3.61%. However, despite its improved accuracy, DBO-BPNN has higher computational complexity and requires a larger dataset to perform optimally, making it less suitable for IFT applications. Moreover, previous studies have demonstrated that less complex models can achieve even better results.
So far in 2025, at the time of writing, we have identified two research papers published this year on AI-driven IFT modeling. Ref. 119 introduced a multibranch convolutional neural network (MBCNN) for predicting CO2-brine IFT across varying temperature and pressure conditions. Unlike conventional single-branch machine learning models such as Random Forests and Support Vector Regression, their proposed MBCNN architecture integrates multiple convolutional layers and fully connected layers to capture inter-attribute relationships. While the MBCNN achieved a MAPE of 2.49%, outperforming the RF, GEP (Gene Expression Programming), and SVR models used for comparison in this study, previous research has demonstrated that simpler models can achieve similar performance under comparable conditions. For example, ref. 109 reported a MAPE of 2.37% using XGBoost, while ref. 116 achieved 2.31% with XGBoost and 2.01% with an ensemble learning approach combining XGBoost and LightGBM.
Deviating from the oft-used set of six input features, the work of ref. 120 in 2025 utilized three input features – temperature, pressure, and NaCl salinity – to predict IFT. The study explored a range of models, from simple linear regression to more complex architectures such as the Multilayer Perceptron (MLP), striking a balance between accuracy and interpretability. Among the five models evaluated, the Support Vector Machine (SVM) and the MLP performed the best, achieving MAPE values of 0.97% and 0.99%, respectively, on the test data. These findings demonstrate that even relatively simple ML models, given careful data processing and hyperparameter tuning, can accurately predict IFT, outperforming several more complex models examined in previous studies.
From Table 1, several trends can be noticed. Among all ML algorithms surveyed, gradient boosting consistently achieves the strongest performance metrics, such as high R2 scores, often outperforming more complex architectures such as deep neural networks in this domain. Support vector machines also show competitive performance, sometimes matching or exceeding the accuracy of gradient boosting. Among the models evaluated, gradient boosting variants (e.g., XGBoost, LightGBM) consistently show minimal signs of overfitting, with train and test metrics remaining very close in terms of R2, MAPE, and RMSE. In contrast, models such as Gaussian process regression and decision trees tend to exhibit larger discrepancies between training and testing performance, reflecting susceptibility to overfitting. Neural network-based models also show signs of overfitting in some cases.
Most studies, as seen in Table 1, focus on common input variables such as pressure, temperature, and bulk salinity. However, several important conditions remain underexplored. High-salinity brines rich in divalent ions (Ca2+, Mg2+, SO42−), which are typical of deep saline aquifers, are only sparsely represented. Similarly, datasets covering extreme pressures and temperatures relevant to supercritical CO2 storage are limited, reducing model generalizability. While some studies have included impurities such as CH4 and N2, other common impurities (e.g., H2S, O2) are rarely considered. Furthermore, most models rely solely on fluid-phase properties, leaving out potentially important features related to rock–fluid interactions, such as mineral composition and wettability.
Feature analysis not only provides interpretability by offering insights into both the model and the underlying physical process, but it also improves predictive performance by guiding feature selection. Well-chosen features eliminate redundancy and reduce the influence of irrelevant or highly correlated variables, which can otherwise introduce noise and degrade a model's generalization performance. Moreover, high-dimensional input spaces exacerbate variance and increase the risk of overfitting, particularly in data-constrained settings.
In practice, the sensitivity of a model to feature selection depends on its underlying learning mechanism. For example, Linear Regression and Least Squares models are especially vulnerable to irrelevant or collinear features, making manual feature screening and dimensionality reduction techniques such as Principal Component Analysis critical for stable and accurate predictions. SVMs are also sensitive to feature quality, as irrelevant or noisy features dilute the kernel similarity measure and reduce the model's ability to identify meaningful decision boundaries. RBFNs are particularly susceptible to the “curse of dimensionality,” since their performance depends on distance-based similarity; irrelevant or redundant features can therefore severely impair accuracy unless carefully pruned. Models based on non-linear architectures such as tree ensembles (e.g., Random Forests and Gradient Boosting) and deep neural networks are generally more robust to feature redundancy, as they can implicitly down-weight or ignore uninformative inputs. Nevertheless, even for these models, careful feature selection can improve accuracy, accelerate training, and mitigate overfitting—particularly when the dataset size is limited.
We use the most commonly used input features for IFT prediction to analyze and understand their influence. First, we plot the trend analysis of IFT with respect to each of the input features: pressure, temperature, monovalent cation molality, and bivalent cation molality. The dataset used for this analysis is obtained from the study by Li et al.118 Fig. 8 shows the trend analysis for each input feature. The IFT appears to decrease with increasing pressure. Regarding temperature, IFT increases until approximately 100 °C, after which it begins to decline. For the cations, while the overall trend suggests a direct relationship with increasing IFT, there is significant variation in IFT values for some cation concentrations. Bivalent cations show a more pronounced nonlinear effect at higher concentrations. The increase in IFT for monovalent cations tends to plateau, while for bivalent cations, it accelerates beyond 2 mol kg−1.
Fig. 8 IFT trend analysis with respect to different input features. A confidence interval (95%) around the trend line is also shown to indicate the spread of data.
Fig. 9 presents the Pearson correlation coefficients for the same set of input features. This analysis indicates that pressure is the most influential feature affecting the ML model's predictions, followed by cation molality, with bivalent cations having a more dominant impact. Temperature has the least effect on IFT. The feature importance rankings observed here align with the findings reported in the literature using methods discussed earlier.13,80,118,120 While commonly used trend analysis and feature importance methods provide some insight into the underlying physics, the interpretability of ML-based IFT modeling remains a challenge. This will be explored in greater depth in the next section.
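For readers wishing to reproduce this kind of analysis, the sketch below computes Pearson correlations between input features and IFT with pandas; the column names and synthetic values are placeholders for the actual dataset rather than the data used in the studies above.

```python
import numpy as np
import pandas as pd

# Placeholder frame standing in for an IFT dataset; substitute real data here.
rng = np.random.default_rng(4)
df = pd.DataFrame({
    "pressure": rng.uniform(5, 60, 500),
    "temperature": rng.uniform(300, 450, 500),
    "monovalent_molality": rng.uniform(0, 5, 500),
    "bivalent_molality": rng.uniform(0, 3, 500),
})
df["ift"] = 40 - 0.3 * df["pressure"] + 1.5 * df["bivalent_molality"] \
            + rng.normal(0, 1, 500)

# Pearson correlation of each input feature with IFT, ranked by magnitude.
corr = df.corr(method="pearson")["ift"].drop("ift")
print(corr.sort_values(key=abs, ascending=False))
```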
There is also a lack of a standardized, publicly available, high-quality, and expansive dataset for interfacial tension on which ML models can reliably be trained. It is crucial to recognize that machine learning methods, particularly neural networks, are highly dependent on the quality and quantity of data available. We believe that efforts to collect a comprehensive, high-quality IFT dataset – at least for saline aquifers, but ideally covering multiple underground rock formations to leverage common patterns‡‡ – would significantly advance the adoption of modern machine learning techniques for modeling interfacial tension. As shown in Table 1, the available datasets are limited in size. For studies aimed at developing and testing novel ML models for IFT prediction, we recommend adopting the most widely used dataset, as employed in ref. 100, 102, 104 and 106, as a baseline for standardized model performance comparison, and subsequently extending the analysis using larger datasets when available.
While simpler models have demonstrated strong generalization in data-limited scenarios, the potential of modern deep learning architectures for IFT prediction remains unexplored. If large, high-quality datasets with sufficient variability become available, more complex models, such as CNNs, RNNs, and transformers, could offer state-of-the-art performance, particularly when framing CO2-brine IFT prediction as a time-series modeling task. Given the increasing complexity and scale of CO2 storage projects, there is an urgent need to explore advanced architectures capable of capturing long-range dependencies and intricate parameter interactions in IFT prediction. Transformer-based models, originally developed for natural language processing, have demonstrated promising performance in diverse sequence modeling domains. Given sufficient data and computational resources, leveraging transformers could improve predictive capability and reduce uncertainty in large-scale sequestration planning. Furthermore, modern techniques that aid the training of deep neural networks also remain unexplored. For example, ReLU activations have shown great promise in improving the performance of deep networks in a variety of tasks, yet, to the best of our knowledge, they remain untested for the task of CO2-brine IFT modeling. Similarly, transfer learning,141–143 also known as domain adaptation, has proven revolutionary in several deep learning applications,66,144–147 and one could employ transfer learning for the problem at hand, too, by pre-training a network to model a general IFT function and then adapting that network to model the IFT of CO2-brine systems. However, to date, there has been no work on this approach.
Due to variations in training and evaluation datasets, as well as a lack of information on computational efficiency—such as time and memory footprint—we refrain from making definitive claims about the best model in the reviewed literature. However, based on predictive performance metrics, it seems reasonable to suggest that simpler ML models, such as gradient-boosted decision trees and support vector machines, may be the most accurate and practical for estimating CO2-brine IFT in saline aquifers. While previous studies have explored advanced and complex neural network architectures, the currently available datasets appear to be a limiting factor, preventing these models from achieving more robust performance, thus giving an advantage to simpler ML approaches.
Nonetheless, we emphasize that multilayer perceptrons (MLPs) warrant further evaluation to fully assess their potential, as they have demonstrated state-of-the-art performance in similar tasks, often surpassing classical methods like decision trees. We believe that the MLPs reviewed in this study may have been constrained by their size, and that deeper MLP architectures trained on larger datasets could potentially yield even better results.
Future work could extend this review by conducting a meta-analysis on a substantially larger, standardized CO2-brine IFT dataset collected across diverse saline aquifer conditions. Such a dataset would enable the benchmarking of more advanced architectures, such as transformers and physics-informed neural networks, which may capture complex, nonlinear relationships beyond the capabilities of current models. Lastly, we propose that efforts should be made by the research community to make source codes and datasets openly accessible. This would facilitate the practical adoption of the proposed methods and provide a foundation upon which future research can be more easily built.