Open Access Article
Isak Bengtsson*a and Patrik Johansson abc
aDepartment of Physics, Chalmers University of Technology, SE-41296, Göteborg, Sweden. E-mail: isak.bengtsson@chalmers.se
bDepartment of Chemistry – Ångström, Uppsala University, SE-75237 Uppsala, Sweden
cALISTORE-ERI, FR CNRS 3104, Hub de l'Energie, 80039 Amiens, France
First published on 19th January 2026
Organic solvents and fluorinated Li-salts are the basis of lithium-ion battery electrolytes, a formulation that has remained unchanged for decades despite significant drawbacks such as thermal instability and high vapour pressure. One alternative is ionic liquid (IL) based electrolytes. However, the mechanism(s) that govern ion transport in IL-based electrolytes, a property crucial for battery performance, are not yet fully understood. We here suggest a novel approach to model the ionic conductivity of ILs themselves: using symbolic regression (SR) to find analytical expressions derived from free volume theory (FVT). Using molecular descriptors as model inputs, we find several FVT-based models that show high correlations: R² = 0.97 and R² = 0.94 for the training and validation sets, respectively, for an experimental dataset of 22 ILs measured in-house. Moving to a significantly larger dataset, with data on 338 ILs from 125 publications, we find that our best model has a significantly higher spread in prediction accuracy but still shows appreciable performance for many ILs (R² = 0.76 and R² = 0.73 for the training and validation sets, respectively). Overall, the FVT-derived models perform best for “good” ILs, i.e. those with well-dissociated ions, and worse for ILs with strong ion–ion interactions. Using data from many publications impacts model performance, likely due to significant variations in e.g. impurities and dryness, as well as experimental set-ups and conditions.
One alternative is to increase the Li-salt concentration to ca. 3–5 M, i.e. highly concentrated electrolytes (HCEs), or to resort to ionic liquid (IL) based electrolytes.7–9 These types of electrolytes have been shown to offer several advantages, such as improved high-voltage performance, suppressed dendrite growth when using lithium metal anodes, and overall a higher level of safety.10,11
Ionic interactions have a strong influence on the electrolyte properties and the ion transport in these types of electrolytes. In conventional electrolytes, based on ca. 1 M Li-salt in organic solvents, ion transport is mainly governed by a vehicular mechanism,12 but in HCEs and IL-based electrolytes the solvation shells are less stable and not always even definable, so other conduction mechanisms come into play.13–16 Looking at IL-based electrolytes in particular, the mechanism(s) that govern ion transport are not yet fully understood. One hypothesis is that ions migrate by jumping between voids.17 This is supported by free volume theory (FVT), which has been used to describe transport phenomena in various glass formers, including ILs.18–20
According to FVT, voids appear in liquids due to thermal redistribution of a free volume vf. Given a critical volume and a factor γ, accounting for overlapping voids, molecules diffuse when a nearby void reaches a critical value. The diffusion coefficient of the molecules is related to the free volume with a temperature dependence according to
![]() | (1) |
Combining this with the Nernst–Einstein equation for conductivity:
![]() | (2) |
![]() | (3) |
, but the free volume vf will also be a function of temperature.
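For orientation, the chain of reasoning above can be written out in the commonly quoted Cohen–Turnbull form. This is a sketch under stated assumptions and not a reproduction of eqn (1)–(3): the symbols $v^*$ (critical volume), $D_0$ (a temperature-independent constant), $n$ (ion number density), $q$ (ion charge), $k_\mathrm{B}$ (Boltzmann constant) and $\sigma_0$ are introduced here for illustration only, and the $\sqrt{T}$ factor assumes the gas-kinetic velocity used in the original theory.

```latex
\begin{align}
D &= D_0 \sqrt{T}\, \exp\!\left(-\frac{\gamma v^*}{v_f}\right), \\
\sigma &= \frac{n q^2}{k_\mathrm{B} T}\, D, \\
\sigma &= \frac{\sigma_0}{\sqrt{T}}\, \exp\!\left(-\frac{\gamma v^*}{v_f}\right),
\qquad \sigma_0 = \frac{n q^2 D_0}{k_\mathrm{B}}.
\end{align}
```

In this form the pre-factor carries an explicit $T^{-1/2}$ dependence, while the exponential depends on temperature only implicitly through $v_f$.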
Experimentally and at a macroscopic level, the electrolyte is indeed commonly characterised mainly by its ionic conductivity, in many cases for simplicity rather than as an attempt to fully account for its ion transport properties. For ILs there are some rules of thumb that relate (ion) transport dynamics to the molecular structure of the constituent ions, but with claims that over 10¹⁸ ILs exist it has proven difficult to find general relations.21 Considering how crucial adequate ion transport is for battery performance, it is of considerable interest to better understand structure–property relations in ILs, as well as in electrolytes in general. Here modelling approaches become indispensable.
On this basis, we here aim to build models that relate the ionic conductivity to the molecular structures of the IL ions, as a stepping stone towards understanding the ion transport better and eventually constructing similar models for IL-based LIB electrolytes, i.e. Li-salt doped ILs.22–24 Previous modelling of ion transport in ILs can roughly be divided into two approaches: those that use molecular dynamics (MD) simulations and extract transport properties from the dynamics,25–28 and those that construct predictive, often analytical, models directly from a set of descriptors.29–32 Recently, however, there has been a significant shift towards machine learning (ML) methods, with models based on support vector machines (SVMs), random forests (RFs) and many versions of neural networks (NNs).33–36 While NN approaches often generalise well and offer very accurate property predictions, they are generally not constrained by any physical laws and require large amounts of data. When training data is scarce, NNs are prone to overfitting: they may learn the training examples extremely well, yet fail to generalise to unseen data. Modern NNs are typically over-parameterised, giving them sufficient capacity to capture not only the meaningful structure of the problem but also incidental fluctuations, noise, or rare coincidences present in the training set. This primarily degrades their ability to interpolate reliably within the domain spanned by the available data. In contrast, difficulties with extrapolation arise even for well-regularised models, simply because they are asked to predict outside the region represented in the training set. Overfitted models can therefore appear highly accurate during training while failing to capture the genuine physical trends required for robust predictive performance.
Physics-informed NNs (PINNs) address these problems by incorporating known physics in the models, which has been shown to improve predictive performance for out-of-distribution data.37,38 However, the mapping between input features and predictions remains complex. A viable modelling alternative, and the one we apply here, is to use symbolic regression (SR), a supervised ML method that tries to find the mathematical expressions that best explain the data, i.e. analytical models that are directly interpretable and furthermore can be constrained to fulfil dimensional requirements and physical laws.
For the first dataset, all ILs were of the highest available purity (99.9% or 99.5%) and used as received. The dataset included cations from the imidazolium, ammonium, pyrrolidinium and piperidinium families, combined with the anions [BF4]−, [PF6]−, [Tf]−, [FSI]− and [TFSI]−. For each IL, the ionic conductivity was measured in steps of 10 K from 298.15 to 368.15 K using dielectric spectroscopy, yielding values in the range 10⁻¹–10¹ mS cm⁻¹. For experimental details, see Nilsson-Hallen et al.39
The second dataset is more diverse, with data originating from 125 publications and ILs based on no less than 169 cations and 66 anions. Although a majority of the ILs were based on imidazolium and pyridinium cations, or other ring structures with nitrogen atom(s), the dataset also included ILs based on cations of other types, such as phosphonium, cyclopropenium and sulfonium cations. For several ILs, there was more than one publication source; in those cases only the source with the lowest reported ionic conductivities was included, on the assumption that those correspond to the purest ILs. The ionic conductivities covered a much wider range, 10⁻⁵–10² mS cm⁻¹, at temperatures between 210 and 571 K.
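The selection rule for ILs with several sources can be sketched as below. This is a minimal illustration and not the procedure used in the study: the text does not state how two data series are compared, so the sketch assumes (hypothetically) that the series with the lowest mean log10 conductivity is kept. The function name, source labels and data values are invented for the example.

```python
import math

def pick_lowest_series(series_by_source):
    """Return the source whose series has the lowest mean log10 conductivity.

    series_by_source maps a source label to a list of (T in K, sigma in mS/cm).
    Comparing mean log10 values is an assumption made for this sketch.
    """
    def mean_log_sigma(points):
        return sum(math.log10(sigma) for _, sigma in points) / len(points)

    return min(series_by_source, key=lambda src: mean_log_sigma(series_by_source[src]))

# Hypothetical data for a single IL reported by two publications.
example = {
    "source_A": [(298.15, 3.9), (308.15, 5.6)],
    "source_B": [(298.15, 3.1), (308.15, 4.7)],
}
chosen = pick_lowest_series(example)
```

A comparison of this kind is only meaningful when the series cover similar temperature ranges; otherwise the values would first need to be compared at common temperatures.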
| Descriptor | Symbol | Unit |
|---|---|---|
| Temperature | T | K |
| van der Waals (vdW) volume | V | Å³ |
| Radius of gyration | R | Å |
| Inertial shape factor | Z | µ⁻¹ Å⁻² |
| Asphericity | A | N/A |
| Eccentricity | E | N/A |
| Spherocity index | ζ | N/A |
| Norm of dipole moment | μ | C Å |
| Gini coefficient (of charge distribution) | G | N/A |
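To make two of the descriptors in the table concrete, the sketch below computes a radius of gyration and a Gini coefficient using common textbook definitions. The definitions actually used are given in the SI and are not reproduced here, so the mass weighting and the use of absolute partial charges are assumptions, and the function names are hypothetical.

```python
import math

def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration, in the length unit of coords."""
    total = sum(masses)
    centre = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    weighted_sq = sum(
        m * sum((c[k] - centre[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total)

def gini(values):
    """Gini coefficient of the absolute values: 0 means evenly spread."""
    xs = sorted(abs(v) for v in values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    rank_weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * rank_weighted / (n * total) - (n + 1) / n

# Two equal-mass atoms 2 Å apart: each lies 1 Å from the centre of mass.
rg = radius_of_gyration([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 1.0])
# Charge spread evenly over four atoms vs. concentrated on a single atom.
g_even = gini([0.25, 0.25, 0.25, 0.25])
g_localised = gini([0.0, 0.0, 0.0, 1.0])
```

With these definitions, a higher Gini coefficient corresponds to a more unevenly distributed charge, matching how the descriptor is interpreted in the discussion of the models.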
Fig. 1 A schematic illustration of an expression tree that computes the sum of two input variables and subtracts the exponential of a third one.
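The expression tree described in the caption of Fig. 1 can be sketched as a small data structure. This is an illustration only and not how PySR represents expressions internally; the class names and variable names (`x1`, `x2`, `x3`) are hypothetical. The `complexity` helper counts one unit per operator, variable and constant.

```python
import math
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class Const:
    value: float

@dataclass
class Op:
    symbol: str
    args: tuple

# Operators available to this toy tree.
FUNCS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "exp": lambda a: math.exp(a),
}

def evaluate(node, env):
    """Recursively evaluate a tree for the variable values in env."""
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Const):
        return node.value
    return FUNCS[node.symbol](*(evaluate(arg, env) for arg in node.args))

def complexity(node):
    """Count one unit per operator, variable and constant in the tree."""
    if isinstance(node, Op):
        return 1 + sum(complexity(arg) for arg in node.args)
    return 1

# (x1 + x2) - exp(x3)
tree = Op("-", (Op("+", (Var("x1"), Var("x2"))), Op("exp", (Var("x3"),))))
value = evaluate(tree, {"x1": 1.0, "x2": 2.0, "x3": 0.0})
size = complexity(tree)
```

For this tree the count is three operators plus three variables, i.e. a complexity of 6.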
The ultimate aim was to find generic functions that model the ionic conductivity as a function of temperature (T), a set of molecular descriptors (xi), and numerical constants (pi). To find expressions that are consistent with FVT and eqn (3), we constructed a template function:
![]() | (4) |
are allowed in the expressions, and we define a complexity C as the sum of the number of operators, variables and constant scalars used. To avoid overly complicated expressions, we set Cmax = 30.
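As a rough illustration of what evaluating an FVT-shaped template involves, the sketch below assumes a form with a pre-factor divided by the square root of temperature and an exponential of a negative ratio. The exact template is the one given by eqn (4); the form used here is an assumption inferred from the dimensional requirement stated later in the text, and the argument names `prefactor`, `numerator` and `denominator` are hypothetical rather than the f, g and h of the paper.

```python
import math

def template_conductivity(T, prefactor, numerator, denominator):
    """Assumed FVT-like form: prefactor / sqrt(T) * exp(-numerator / denominator)."""
    return prefactor / math.sqrt(T) * math.exp(-numerator / denominator)

def toy_model(T):
    """Hypothetical sub-expressions: constant pre-factor, exponent of -2000 K / T."""
    return template_conductivity(T, prefactor=1.0e3, numerator=2000.0, denominator=T)

sigma_300 = toy_model(300.0)
sigma_350 = toy_model(350.0)
```

With these invented numbers the exponential dominates the pre-factor's decrease with temperature, so the toy conductivity rises with temperature, which is the qualitative behaviour expected of an IL.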
Since the ionic conductivity has a strong temperature dependency, often across several orders of magnitude, it is beneficial to use a loss function that handles relative errors when computing the fitness of an expression. Therefore, we transformed our targets (y) and predictions (ŷ) to logspace before computing the root mean squared error (RMSE):
![]() | (5) |
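A log-space RMSE of the kind described above can be sketched as follows. The base of the logarithm is not stated in the surrounding text, so base 10 is an assumption here, and the function name is hypothetical.

```python
import math

def log_rmse(y_true, y_pred):
    """RMSE between log10-transformed targets and predictions (all values > 0)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(math.log10(p) - math.log10(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# A factor-of-two error costs the same at 0.01 mS/cm as at 10 mS/cm.
low = log_rmse([0.01], [0.02])
high = log_rmse([10.0], [20.0])
```

This is why the transformation suits data spanning several orders of magnitude: the loss responds to relative rather than absolute deviations.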
To further promote useful and physically motivated models, we made use of dimensional constraints. For ionic conductivities measured in mS cm⁻¹, eqn (4) must have units mS cm⁻¹ K^0.5 whilst functions f and g must divide to a dimensionless quantity. To promote dimensionally consistent expressions, we penalize those expressions that do not meet the requirements on f, g and h by adding an additional term β = 10⁴ to the loss. To avoid restricting the search space too much, however, we do allow scalar constants to have any dimensionality.
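The penalty mechanism can be sketched with a simple bookkeeping of unit exponents. This is an illustration under assumptions, not the implementation used in the study: it assumes that the pre-factor sub-expression is the quantity required to carry mS cm⁻¹ K^0.5, that the two sub-expressions in the exponential must have identical dimensions, and that β is added once per expression. The dictionary representation and all names are hypothetical.

```python
BETA = 1e4  # penalty term added to the loss of inconsistent expressions

# Assumed target dimension of the pre-factor: mS cm^-1 K^0.5
TARGET = {"mS": 1, "cm": -1, "K": 0.5}

def divide(dim_a, dim_b):
    """Dimension of a / b, dropping units whose exponents cancel."""
    keys = set(dim_a) | set(dim_b)
    out = {k: dim_a.get(k, 0) - dim_b.get(k, 0) for k in keys}
    return {k: v for k, v in out.items() if v != 0}

def penalised_loss(base_loss, prefactor_dim, numerator_dim, denominator_dim):
    """Return base_loss, plus BETA if any dimensional requirement is violated."""
    consistent = (
        prefactor_dim == TARGET
        and divide(numerator_dim, denominator_dim) == {}
    )
    return base_loss if consistent else base_loss + BETA

ok = penalised_loss(0.1, TARGET, {"K": 1}, {"K": 1})
bad = penalised_loss(0.1, TARGET, {"Å": 3}, {"K": 1})
```

Because β is several orders of magnitude larger than any realistic log-space RMSE, an inconsistent expression can only survive if no consistent alternative exists at that complexity.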
We ran PySR in a distributed setting using 32 cores for 72 hours. To ensure that the SR search does not get stuck in a local optimum of the search space, we ran 8 separate instances of the SR search in parallel. This will ideally generate a large pool of candidate expressions with various complexities. To find the model that best balances accuracy and complexity, we ran a model selection algorithm that assigns scores to the discovered expressions. By computing a cost ε for each model, the score function first selects the subset of models that have the lowest cost for a given complexity (C). Then, a second subset of n models, whose cost is equal to or at most 5% higher than the lowest cost achieved, is considered. For these n models, a score (S) was computed by taking the derivative of the negative logarithmic cost with respect to complexity:
![]() | (6) |
The highest scoring model was then considered the best. To align with our desire to promote models that are performant when it comes to capturing the temperature dependency of individual ILs, the cost was defined as the average of RMSEs computed across every IL in the dataset.
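The selection procedure described above can be sketched as follows. This is one possible reading and not the code used in the study: the derivative in the score is approximated by a backward finite difference between neighbouring complexities on the lowest-cost front, which is an assumption, and the function names and example numbers are hypothetical.

```python
import math

def mean_per_il_cost(rmse_by_il):
    """Cost of a model: the average of the RMSEs computed separately for each IL."""
    return sum(rmse_by_il.values()) / len(rmse_by_il)

def select_model(models, tolerance=0.05):
    """models: list of (complexity, cost) pairs. Returns the selected pair."""
    # Step 1: keep only the lowest-cost model at each complexity.
    best = {}
    for c, cost in models:
        if c not in best or cost < best[c]:
            best[c] = cost
    front = sorted(best.items())

    # Score S = -d(log cost)/dC, approximated by a backward finite difference.
    scores = {}
    for (c_prev, e_prev), (c, e) in zip(front, front[1:]):
        scores[c] = -(math.log(e) - math.log(e_prev)) / (c - c_prev)

    # Step 2: consider only models within `tolerance` of the lowest cost.
    lowest = min(e for _, e in front)
    candidates = [(c, e) for c, e in front
                  if e <= (1 + tolerance) * lowest and c in scores]
    if not candidates:  # e.g. only one complexity level was found
        return min(front, key=lambda m: m[1])
    return max(candidates, key=lambda m: scores[m[0]])

# Hypothetical pool of (complexity, cost) pairs from several SR runs.
pool = [(5, 1.0), (10, 0.5), (10, 0.7), (15, 0.208), (20, 0.205), (25, 0.20)]
selected = select_model(pool)
```

In this invented pool the models at C = 15, 20 and 25 all lie within 5% of the lowest cost, and the one at C = 15 is selected because the cost drops most steeply on reaching it.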
![]() | (7) |
Fig. 2 Measured ionic conductivities vs. the model predictions computed using eqn (7), found by SR applied to the in-house dataset. Orange circles indicate training data, whilst blue triangles represent validation data.
Fig. 3 Measured ionic conductivities (markers) and model predictions (solid lines) found using eqn (7) for a selection of four ILs plotted against temperature.
As for the expression that constitutes the found model (eqn (7)), we highlight that the molecular volume of the cation is larger than the norm of the dipole moment of the anion for all ILs in the dataset. This means that the exponential captures the expected positive correlation between ionic conductivity and temperature, and also suggests that ILs with larger cations have lower conductivities, which is consistent with structural diffusion. Furthermore, the model indicates that the molecular structure is important in determining the magnitude of the ionic conductivity, as molecular asymmetry is captured by the asphericity, eccentricity, and spherocity index in the pre-factor. As a higher asphericity indicates a more asymmetric ion, a higher eccentricity a more elongated ion, and a higher spherocity index a more spherical ion, the model suggests that ILs with bulkier and more asymmetric ions have lower ionic conductivities. The interplay is rather complex, though, as ILs with more elongated anions and a high charge delocalization on the cation (as measured by the Gini coefficient) appear to give higher conductivities. While the size of the dataset limits the extrapolation possible, the excellent prediction accuracy indicates that an ion-hopping mechanism is responsible for the ion transport in these ILs.
![]() | (8) |
Fig. 4 Measured ionic conductivities vs. the model predictions for the training set, computed using eqn (8) found by applying SR to the large dataset.
Fig. 5 Measured ionic conductivities vs. the model predictions for the validation set, computed using eqn (8) found by applying SR to the large dataset.
Fig. 6 Measured ionic conductivities (markers) and model predictions (solid lines) found using eqn (8) for a selection of four ILs plotted against temperature.
The expression that constitutes the model (eqn (8)) looks quite different from the one found for the in-house dataset (eqn (7)). The model indicates that cations with more elongated shapes tend to yield ILs with higher ionic conductivities, which is somewhat counter-intuitive, as one might expect asymmetric and/or bulky ions to diffuse more slowly. However, elongation is a different structural attribute from overall asymmetry, and ions that are extended along a single axis may, in fact, diffuse more readily despite not being particularly symmetric. Why only the eccentricity appears in the pre-factor is hard to say, but it is possible that our upper bound on the model complexity, Cmax = 30, does not leave much room for the SR search to explore complex expressions in both the pre-factor and the exponential. For the in-house dataset the best model found yields accurate predictions despite featuring a simple expression in the exponential, but the considered ILs were limited both in number and diversity. For the large dataset, the SR search instead finds an expression with a more complex exponential term, which suggests that the conductivity response to temperature can vary significantly between different ILs. The radii of gyration of both the cations and the anions influence this response, but with opposite signs: whilst bulkier cations lower the conductivity, larger anions increase it. The latter has previously been attributed to the fact that larger anions render weaker interionic interactions.50 The denominator in the exponential features a measure of anion asymmetry, the asphericity, combined in the same term with a description of charge delocalization in the anion, the Gini coefficient. Higher asphericity indicates a more asymmetric anion and a higher Gini coefficient corresponds to a more imbalanced charge distribution. As this term increases, the denominator becomes smaller, which corresponds to a lower ionic conductivity.
Since a more imbalanced charge distribution should make it easier for the anion to form stronger ionic interactions, thus lowering the ion mobility, it is reasonable that anions with higher Gini coefficients would give ILs with lower ionic conductivities. Overall, this suggests that both the shape and charge distribution of the anion play important roles in determining the ionic conductivity of ILs. We also emphasize that both the numerator and the denominator in the exponential remain positive for all ILs in the dataset, ensuring the expected positive correlation between temperature and ionic conductivity.
To better understand where the model fails, we categorize ILs based on both their cation and anion, averaging the RMSE (computed per IL across a temperature range, in log space) for each category (Fig. 7). The combination of a triazolium cation and a sulfonate anion shows the highest error, owing to a single IL: [4MPrTr][Tos]. In the original study providing the experimental data, the authors do note that the tosylate anion gave an IL with a significantly higher relative viscosity and lower ionic conductivity as compared to analogous ILs based on other anions, such as [NO3]− and [BF4]−. They attribute this to strong π–π interactions between the aromatic rings of the tosylate anion and the triazolium cation.51 Considering that our model predicts the ionic conductivities of other ILs in the study with much higher accuracy, it appears that it fails to capture these strong interactions. This could also explain why the model performs worse for ILs based on morpholinium, which can form strong ionic interactions as the ether groups act as dipole sites. In line with the observed error, this effect is particularly strong for polarizable anions such as dicyanamide, a pseudohalide. Similarly, both the phosphonium-carbanion and the phosphonium-halide/pseudohalide ILs report comparatively low ionic conductivities, attributed to strong association, and these are two other examples of where our model fails to make accurate predictions.52,53 Strong interactions do render poor ILs, often classified by the amount of deviation from ideal behaviour in a Walden plot, commonly attributed to the formation of neutral ion-pairs.54 Cations with long alkyl chains can have similar effects, as dispersive forces may create a mesoscopic structure with small, localized charge regions separated by neutral domains. This could be another reason why our model fails for the morpholinium group, which includes cations with alkyl chains of up to nine carbons.55
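The per-category averaging described at the start of this paragraph can be sketched as below. It is a minimal illustration: the record layout, the family labels and the RMSE values are hypothetical and do not come from the dataset.

```python
from collections import defaultdict

def mean_rmse_by_category(records):
    """Average per-IL RMSE for each (cation family, anion family) pair.

    records: iterable of (cation_family, anion_family, rmse_of_one_il).
    """
    sums = defaultdict(lambda: [0.0, 0])
    for cation, anion, rmse in records:
        acc = sums[(cation, anion)]
        acc[0] += rmse
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Hypothetical per-IL errors (log-space RMSE).
records = [
    ("imidazolium", "imide", 0.1),
    ("imidazolium", "imide", 0.3),
    ("triazolium", "sulfonate", 1.2),
]
table = mean_rmse_by_category(records)
```

A category that contains a single IL, as in the triazolium-sulfonate case discussed above, simply inherits that IL's error, so such entries should be read with the group size in mind.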
As our FVT-inspired model builds on the Nernst–Einstein relation (eqn (2)), which assumes non-interacting ions, our model should in principle, by design, be less accurate for all ILs with strong interactions. Although the SR approach allows for considerable corrections to eqn (3) using a more general ansatz (eqn (4)), it is likely quite hard to have a single model accurately capture the ionic conductivity of ILs that appear in very different regions of a Walden plot; it will be more accurate for ILs closer to the ideal Walden line. The ILs with the small lithium and potassium cations are also very far from ideal ILs; most ILs consist of bulky organic cations.
Although it might seem discouraging that the model has not managed to accurately describe the ionic conductivity of all ILs in the dataset, it is important to reflect on what this means. Clearly, searching for expressions inspired by FVT using SR does not yield a model that generalises across the entire dataset. However, the model still works very well for many ILs, with less accurate predictions primarily observed for non-ideal ones. This suggests that the SR approach can capture fundamental correlations that apply to the more idealised systems. To strengthen this belief, we explore how all eight found models of complexity C = 25 compare. Our SR implementation is inherently stochastic and there is no guarantee that different runs will yield similar models, especially if there were no clear underlying relationships in the data. Yet, looking at how the models differ in their predictions, we find that they agree to a high extent, both for very accurate predictions such as the ones for [HPy][TFSI], and for considerably less accurate examples like [4MPrTr][Tos] (Fig. 9). Instead of each SR search overfitting to different chemical motifs, the discovered models converge to make similar predictions. Importantly, this shows that our models capture fundamental correlations and is a testament to the robustness of the SR approach.
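One simple way to quantify how closely several independently found models agree is the spread of their predictions in log space for the same data point. The measure below is a hypothetical illustration and not necessarily the comparison underlying Fig. 9; the function name and the numbers are invented.

```python
import math
import statistics

def log_spread(predictions):
    """Population standard deviation of log10 predictions for one data point."""
    return statistics.pstdev([math.log10(p) for p in predictions])

# Eight hypothetical model predictions (mS/cm) for one IL at one temperature.
agreeing = log_spread([2.0, 2.1, 1.9, 2.0, 2.05, 1.95, 2.0, 2.0])
scattered = log_spread([0.2, 20.0, 2.0, 0.5, 8.0, 1.0, 4.0, 0.1])
```

A small spread together with a large error against experiment, as reported for [4MPrTr][Tos], would point to a shared limitation of the model form rather than to run-to-run noise in the search.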
To further test whether the model from the large dataset has captured the relevant physics, we cross-test it on the smaller in-house dataset and compare it with the model represented by eqn (7). Unfortunately, it does not achieve the same prediction accuracy and tends to underestimate the ionic conductivities (Fig. 8). A plausible explanation is that although the model performs best for more ideal ILs, the exposure to less ideal ILs during training has given it suboptimal performance across the board. Indeed, the loss function guiding the SR search punishes models that completely ignore a subset of ILs, and instead rewards those that display decent performance for many ILs. In particular, as discussed above, the large dataset contained several ILs with strong interionic interactions and low ionic conductivities, so it is reasonable to expect that the SR search tried to find models that adapted at least somewhat to these data, which could manifest itself as underestimations of the ionic conductivities.
Fig. 8 Measured ionic conductivities vs. the model predictions for the in-house dataset using eqn (7) (orange circles, found by SR applied to the in-house dataset), and eqn (8) (blue triangles, found by SR applied to the large dataset).
On a more general note, the difficulties of dealing with experimental data sourced from 125 different publications should not be underestimated. In many cases, the database used to construct the dataset, ILThermo, contains reported ionic conductivities that are several times, or even orders of magnitude, higher than other data for the same IL. These differences can likely be explained by variations in sample purity or dryness, which are known to have a considerable effect on the physical properties of ILs.56,57 When several publications had data on the ionic conductivity of a specific IL, our rule of always choosing the data series with the lowest reported ionic conductivities could help filter out less pure ILs, but for many ILs there is only one publication and one set of data.
Moving to a considerably larger dataset collected from the database ILThermo, SR found symbolic models that could capture the temperature dependency of many ILs, but with a relatively high spread in prediction accuracy. Grouping the ILs based on their constituent ions, we observe that the model performs better for more ideal ILs. When ionic interactions become stronger, or long alkyl chains create mesoscopic ordering with large neutral domains, the model predictions become less accurate. It appears difficult to capture the conductivity behaviour of ILs with very different amounts of ion pairing in a single model, at least when enforcing an expression consistent with FVT. It is likely that the diversity of the two datasets had an impact on the model performance even for more ideal ILs, as the best model for the larger dataset tends to underestimate the ionic conductivities of the smaller dataset and does not achieve the same performance as the first model. Allowing for a model with multiple terms, each one consistent with the expression in eqn (3), could perhaps allow the model to generalise better. This would, however, come at the expense of model complexity.
The lack of control we had over the large dataset is also believed to be a contributing factor to the spread in prediction accuracy. As even trace amounts of water or other contaminants are known to affect the ionic conductivity, the model performance should in practice be limited by the variation in sample purity. This further points towards controlled, in-house measurements as the ideal setting for SR. However, there is a strong argument for reusing data when possible, and in future studies it would be interesting to address the issues that arise from dealing with data from multiple sources.
Finally, the present models and the SR approach we have designed are useful stepping stones towards also modelling the ionic conductivities of IL-based electrolytes for LIBs, but as we must then expect strong ionic interactions caused by the introduced lithium cations, there is no simple blueprint solution.
Supplementary information (SI): the SI discusses how chemical names were parsed to SMILES strings, provides definitions for the molecular descriptors used, and contains tables with the numerical parameters found for the analytical models. See DOI: https://doi.org/10.1039/d5cp04143k.
This journal is © the Owner Societies 2026