Open Access Article
Gaopeng Ren a, Austin M. Mroz ab, Frederik Philippi a, Tom Welton a and Kim E. Jelfs *a
a Department of Chemistry, Imperial College London, White City Campus, London, W12 0BZ, UK. E-mail: k.jelfs@imperial.ac.uk
b I-X Centre for AI in Science, Imperial College London, White City Campus, London, W12 0BZ, UK
First published on 12th January 2026
Ionic liquids (ILs) are salts set apart by their low melting points and can act as highly tuneable solvents with broad application potential, for example as catalysts, in batteries, and for drug delivery. The potential chemical space of ILs is vast, with only a very small region having been explored to date. Machine learning offers a promising approach to advance into this vast space of unexplored ILs; however, existing IL databases contain limited ion diversity, constraining the performance of generative models. To address this, we introduce conditional variational autoencoders (CVAEs) and a novel ion scoring method as a conditioning factor. The ion score prioritises ions with a higher likelihood of forming low-melting-point ILs. Our CVAEs effectively generate novel and diverse cations and anions. Furthermore, we constructed a melting point prediction model to identify cation–anion pairs that are likely to yield ILs with low melting points. Visualisation of the generated ILs alongside existing ones reveals that our approach effectively expands the chemical space of ILs with novel structures. Molecular dynamics simulations further validate that 13/15 of the generated ILs possess desirable low melting points (<373 K). The associated code is available at https://github.com/fate1997/ILGen-ion.
Despite the immense potential chemical space of ILs,12 the existing chemical space of experimentally validated ILs remains remarkably small, on the order of thousands.13 This significant disparity highlights a vast unexplored territory of ILs. Given the impossibility of experimentally validating such a large number of compounds, computer-aided molecular discovery offers an alternative method for expanding the existing IL chemical space.14 Traditional approaches, including density functional theory (DFT) and molecular dynamics (MD), provide valuable insights into the atomistic behaviour of ILs and their structure–property relationships. However, these methods are computationally expensive, limiting their scalability for high-throughput screening.15,16 Thermodynamic models such as PC-SAFT and COSMO-RS17–20 have been widely used for IL property prediction, yet their generalisation to novel and exotic structures remains limited.21
In recent years, machine learning (ML) has emerged as a promising avenue for virtual screening. ML has been extensively applied to predict IL properties, including melting point,22 viscosity,23 and CO2 solubility.24 ML models offer rapid and accurate predictions, making them highly suitable for virtual screening. In the virtual screening process, it is important to construct a large initial database. One straightforward approach to generating new ion structures involves manually defining a fragment library and then combinatorially combining these fragments. Ion structures can be divided into two parts: the charged components, e.g., imidazolium for cations and carboxylate for anions, and the substituents used to functionalise the charged component, e.g., methyl and halogens. While this method can generate a substantial number of ion structures from pre-defined building groups,2,10,25,26 the resulting structures are often highly similar to the original systems, leading to limited expansion of the diversity of IL chemical space. Moreover, many existing melting point prediction models are trained exclusively on IL datasets, which are heavily biased toward low melting points, thereby limiting their effectiveness in screening applications.
Deep generative models are an alternative method to enlarge and diversify molecular chemical space, and they have been widely used in drug discovery27 and materials design.28 These models generate new samples based on training data. Thus, their performance heavily depends on the quality and quantity of the data in the available database. However, IL databases often suffer from data scarcity, impacting both unlabelled and labelled datasets. This restricts the chemical space of generated ILs and complicates the development of property-guided generative models. To alleviate the data scarcity problem, Liu et al.24,29 proposed optimisation-based methods to guide generated examples towards higher validity and desired properties. However, due to the limited number of unique cation and anion structures, their generated examples closely resemble the existing ones. Transfer learning provides another strategy to mitigate the data scarcity problem by leveraging knowledge from different but related domains. This typically involves a two-step process: training a large model on a broad database, then fine-tuning it on a smaller, related database. Beckner et al.30 applied transfer learning to expand the IL chemical space by pre-training variational autoencoders (VAEs) on the GDB-17 database (general organic compounds) and then fine-tuning them on an IL database. They demonstrated that transfer learning is an effective approach to building a generative neural network model from scarce datasets. However, they also found that the majority of the generated examples are neutral due to the large number of neutral compounds in the pre-training database (GDB-17). More recently, Chen et al.31 compiled a large ion database from PubChem32 and proposed a pre-trained model for IL property prediction.
They further pre-trained a VAE on this database and then fine-tuned it on a labelled IL database.33 Their results show that transfer learning can effectively alleviate the data scarcity problem in IL databases. However, this work does not take the melting point into consideration, and so the generated examples are not guaranteed to be low-melting-point ILs. In our previous work,34 we applied a link prediction algorithm to address the data scarcity problem and considered melting points explicitly. However, this workflow did not incorporate more ion structures other than the existing IL ions, which limited its ability to generate structurally diverse and novel ions.
Here, we aim to expand the chemical space of ILs with a specific focus on low-melting-point ILs. This requires designing ions with structures dissimilar to those of existing ILs and identifying low-melting-point ILs using a general melting point prediction model. Such an expansion is important not only for computational discovery but also for experimentalists, as it increases the likelihood of identifying novel ILs with unconventional structures and properties, thereby enabling new structure–property analyses and theoretical insights. To achieve this, we first collected large ion databases from PubChem, yielding approximately 0.9 million cations and 0.4 million anions. These PubChem ions cover the existing chemical space of IL ions; however, the vast disparity in quantity between PubChem ions (millions) and IL-specific ions (thousands) significantly reduces sample efficiency in identifying high-quality ions that readily form low-melting-point ILs. To address this and leverage prior knowledge from existing IL databases, we introduce ion scorers. These softly classify whether an ion more closely resembles general ions or the ions represented in existing ILs. We then trained conditional VAEs (CVAEs) on the general ions using these predicted ion scores as a conditioning factor. After training, we used the ion score as a condition to generate ions that were likely to form low-melting-point ILs. Subsequently, we trained a melting point prediction model on a general melting point database and applied it to identify low-melting-point cation–anion pairs with less bias toward underestimating melting points. Finally, the chemical space visualisation confirmed the effectiveness of our workflow, demonstrating that the generated ILs are clearly distinct from existing ILs. Moreover, MD simulations validate that 13 out of 15 sampled ILs exhibit low melting points (<373 K).
• PubChem ion databases: contain cations and anions extracted from PubChem.
• Collected IL database: a compilation of ILs gathered from several sources.
• General melting point database (general MPT): a comprehensive dataset of melting points, covering a wide range of cation–anion pairs from low-melting-point ILs to high-melting non-IL systems.
• IL melting point database (IL MPT): a melting point database for ILs, collected by Venkatraman et al.35
These datasets were used to train various ML models aimed at generating novel, diverse, and valid ILs. All collected data, along with the full implementation of the methods described in this paper, are available at https://github.com/fate1997/ILGen-ion.
We collected IL ions and PubChem ions to train the conditional generation models. To visualise the chemical space of ions from both PubChem and the IL dataset, we applied the Uniform Manifold Approximation and Projection (UMAP) algorithm36 using extended-connectivity fingerprints (ECFPs)37 as input features. Owing to the size of the PubChem ion datasets, a random sample of 50 000 cations and 50 000 anions was selected for plotting. As shown in Fig. 1a, the sampled PubChem ions span a broader chemical space than the ions in the IL dataset, demonstrating their potential to enrich the diversity of generated ILs. The specific composition, generation methods and criteria for the IL dataset and the PubChem ion dataset are described in Sections 2.1.1 and 2.1.2, respectively.
P values greater than 6 or less than −4, more than 6 hydrogen-bond donors, more than 11 hydrogen-bond acceptors, or more than 15 rotatable bonds, in order to exclude overly complex ions. Additional filters eliminated molecules that (1) could not be parsed by RDKit, (2) contained uncommon elements, (3) consisted of multiple components, or (4) contained unpaired electrons. These criteria were designed to ensure that the resulting ions were chemically reasonable. After filtering, the final dataset consisted of 903 585 cations and 401 474 anions.
To handle duplicate entries with inconsistent melting points, we applied a filtering rule: if the melting point values differed by more than 10 K, all duplicates were excluded; otherwise, the mean value was used. The melting point distributions for the general MPT and IL MPT datasets are shown in Fig. 1b. The IL MPT dataset comprises a greater number of compounds featuring lower melting points relative to the general MPT dataset, indicating, as expected, that ILs typically possess lower melting points than general cation–anion pairs. In this study, we aim to use the melting point prediction model to identify low-melting-point ILs from general cation–anion pairs. Therefore, relying solely on the IL-specific dataset would risk training a model that underestimates melting points. The general MPT dataset, being larger and more diverse (Fig. S3), provides a more suitable foundation for training a robust predictive model.
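The duplicate-handling rule above can be illustrated with a minimal sketch. The function name, the record layout and the reading of "differed by more than 10 K" as the spread between the largest and smallest reported value are assumptions made for illustration; they are not taken from the released code.

```python
from collections import defaultdict
from statistics import mean


def merge_duplicate_melting_points(records, tolerance_k=10.0):
    """Collapse duplicate entries for the same cation-anion pair.

    records: iterable of (pair_id, melting_point_in_K) tuples.
    Returns a dict mapping pair_id -> melting point in K.
    """
    grouped = defaultdict(list)
    for pair_id, melting_point in records:
        grouped[pair_id].append(melting_point)

    merged = {}
    for pair_id, values in grouped.items():
        # Assumption: inconsistency is judged by the max-min spread.
        if max(values) - min(values) > tolerance_k:
            continue  # inconsistent duplicates: every entry is excluded
        merged[pair_id] = mean(values)  # consistent duplicates: use the mean
    return merged


# Hypothetical records, for illustration only.
records = [
    ("pair_A", 300.0), ("pair_A", 306.0),  # spread of 6 K: averaged
    ("pair_B", 350.0), ("pair_B", 365.0),  # spread of 15 K: excluded
    ("pair_C", 410.0),                     # single entry: kept as is
]
cleaned = merge_duplicate_melting_points(records)
```

Dropping inconsistent pairs entirely, rather than picking one value, avoids training the melting point model on labels whose experimental uncertainty is known to be large.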
$$y_{\mathrm{ls}} = (1 - \alpha)\,y + \frac{\alpha}{K}, \tag{1}$$
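As a small worked example of eqn (1), the sketch below applies the smoothing to hard binary labels. It assumes the conventional reading of the symbols, with y the hard label, α the smoothing strength and K the number of classes (K = 2 for IL versus non-IL ions); the value α = 0.1 is purely illustrative and is not a setting reported here.

```python
def smooth_label(y, alpha, num_classes=2):
    """Label smoothing: y_ls = (1 - alpha) * y + alpha / K."""
    return (1.0 - alpha) * y + alpha / num_classes


# Illustrative smoothing strength (an assumption, not the paper's value).
alpha = 0.1
smoothed_positive = smooth_label(1.0, alpha)  # hard label 1 is pulled below 1
smoothed_negative = smooth_label(0.0, alpha)  # hard label 0 is pulled above 0
```

With K = 2 the two smoothed targets remain symmetric about 0.5, so a decision threshold of 0.5 keeps its meaning after smoothing.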
:20 ratio. We implemented logistic regression models with L1 regularisation using the scikit-learn Python library.57 Finally, the trained ion scorers assign a score (0–1) to ions. We set a threshold of 0.5; ions with scores above 0.5 are classified as IL ions, while those below are classified as non-IL ions (which have a low probability of forming ILs).
![]() | (2) |
We constructed a vocabulary based on the unique characters in the PubChem dataset. SMILES strings were tokenised into sequences of integers, which were then passed through an embedding layer, producing 292-dimensional vectors. Both the encoder and decoder consisted of three GRU layers with hidden dimensions of 292. The latent space dimensionality was set to 128 (excluding the dimension for the ion score). The models were trained for 100 epochs using the Adam optimiser with a learning rate of 0.0001 and a batch size of 128.
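The character-level tokenisation described above can be sketched as follows. The special tokens (`<pad>`, `<sos>`, `<eos>`), the padding scheme and the function names are assumptions added for illustration; the source states only that the vocabulary is built from the unique characters and that SMILES strings are mapped to integer sequences.

```python
# Assumed special tokens; the paper does not specify these.
SPECIAL_TOKENS = ("<pad>", "<sos>", "<eos>")


def build_vocab(smiles_list):
    """Map every unique character in the corpus to an integer index."""
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    tokens = list(SPECIAL_TOKENS) + chars
    return {token: index for index, token in enumerate(tokens)}


def encode(smiles, vocab, max_len):
    """Turn a SMILES string into a fixed-length list of integers."""
    ids = [vocab["<sos>"]] + [vocab[ch] for ch in smiles] + [vocab["<eos>"]]
    if len(ids) > max_len:
        raise ValueError("SMILES longer than max_len")
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


def decode(ids, vocab):
    """Invert encode(): stop at <eos>, skip <sos> and <pad>."""
    inverse = {index: token for token, index in vocab.items()}
    chars = []
    for index in ids:
        token = inverse[index]
        if token == "<eos>":
            break
        if token in ("<pad>", "<sos>"):
            continue
        chars.append(token)
    return "".join(chars)


# Two illustrative ions: 1,3-dimethylimidazolium and acetate.
corpus = ["Cn1cc[n+](C)c1", "CC(=O)[O-]"]
vocab = build_vocab(corpus)
encoded = encode(corpus[1], vocab, max_len=16)
```

The integer sequence is what an embedding layer would consume. Note that a purely character-level vocabulary splits two-letter element symbols such as Cl or Br into separate tokens, so the model has to learn them as character pairs.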
:20 ratio. The trained model was subsequently used to screen the generated cation–anion pairs, retaining only those predicted to have low melting points, and thus higher likelihoods of forming ILs.
| Model | Database size | RMSE (K)↓ | MAE (K)↓ | R²↑ |
|---|---|---|---|---|
| ANN (ref. 62) | 799 | 33.3 | —b | 0.54 |
| RF (ref. 35) | 2212 | 45.0 | 33.0 | 0.66 |
| KRR (ref. 41) | 2212 | 38.5 | 29.8 | 0.76 |
| Transformer CNN (ref. 63) | 3073 | 45.0 | 33.7 | 0.66 |
| GC (ref. 64) | 3080 | 37.1 | 28.8 | 0.76 |
| XGBoost [this work] | 5848 | 42.5 | 30.9 | 0.71 |
| RF [this work] | 5848 | 42.4 | 31.3 | 0.72 |
| TabPFN [this work] | 5848 | 39.4 | 29.0 | 0.75 |

a ANN, artificial neural network; RF, random forest; KRR, kernel ridge regression; CNN, convolutional neural network; XGBoost,65 a gradient boosting algorithm for decision trees; and GC, graph convolutional. ↑ indicates “higher is better”, and ↓ indicates “lower is better”.
b The metric is not reported.
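For reference, the three regression metrics reported in the table can be computed as in the sketch below. These are the standard definitions of RMSE, MAE and the coefficient of determination; the function name and the toy melting points are illustrative and are not taken from the released code or data.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired true and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    rmse = sqrt(sum(r * r for r in residuals) / n)  # penalises large errors
    mae = sum(abs(r) for r in residuals) / n        # average absolute error

    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Toy melting points in K, for illustration only.
y_true = [300.0, 350.0, 400.0, 450.0]
y_pred = [310.0, 340.0, 410.0, 440.0]
rmse, mae, r2 = regression_metrics(y_true, y_pred)
```

Because R² is normalised by the variance of the test labels, it is only loosely comparable across the rows of the table, which were evaluated on databases of different size and melting point range.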
Since our dataset includes melting points collected from general MPT databases, it avoids the inherent bias present in IL-specific datasets, which tend to be skewed toward lower melting points (Fig. 1b). ML models trained solely on IL databases are likely to underestimate the melting points of cation–anion pairs, making them less suitable for use as filters to eliminate high-melting-point candidates. In contrast, the general MPT database compiled in this work provides a more balanced and comprehensive view of cation–anion combinations. As a result, models trained on this broader dataset should exhibit reduced bias and be better suited for accurately identifying high-melting-point compounds. This characteristic is particularly valuable in large-scale virtual screening tasks, where the model must generalise well across a diverse chemical space.
After feature selection, the logistic regression models were retrained using the top 25 descriptors. The performance of the classification model was measured by accuracy, recall and the area under the receiver operating characteristic curve (ROC-AUC). Accuracy is the proportion of total correct predictions made by the model. Recall is the proportion of actual positives that were correctly identified by the model. ROC-AUC indicates how well the model distinguishes between positive and negative examples, with higher values meaning better performance. Performance metrics for both cation and anion scorers are summarised in Table 2. Despite the model's simplicity, both scorers achieved strong performance, indicating that the classification task is relatively tractable. High recall scores demonstrate the models' effectiveness in identifying IL-relevant ions. Notably, the cation scorer outperformed the anion scorer, likely due to the greater number of cation samples available during training.
| Metric | Cation | Anion |
|---|---|---|
| Accuracy↑ | 0.9147 | 0.8578 |
| ROC-AUC↑ | 0.9142 | 0.8617 |
| Recall↑ | 0.9511 | 0.9271 |
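The three classification metrics defined above can be made concrete with a short, dependency-free sketch. The ROC-AUC is computed here through its pairwise interpretation (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half); the function names and toy scores are illustrative assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives that are predicted positive."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    false_neg = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return true_pos / (true_pos + false_neg)


def roc_auc(y_true, scores):
    """Pairwise ROC-AUC: P(score of a positive > score of a negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy ion scores; label 1 = IL ion, label 0 = non-IL ion.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s > 0.5 else 0 for s in scores]  # 0.5 threshold, as in the text

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy and recall depend on the 0.5 threshold, whereas ROC-AUC depends only on how the scores rank the ions.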
After training the ion scorers, we randomly sampled 10 000 PubChem cations and anions and computed the ion scores of PubChem ions and IL ions. The resulting score distributions are presented in Fig. 4. As expected, the scorers successfully assigned higher scores to IL ions and lower scores to PubChem ions, demonstrating effective discrimination between the two classes. Notably, due to the application of label smoothing during training, the score distributions are less sharply polarised. This allows a subset of PubChem ions to receive relatively high scores, reflecting the model's capacity to recognise potentially IL-like structures beyond those present in the training data.
Fig. 4 Ion score distributions. (a) Distribution of ion scores for IL cations and PubChem cations. (b) Distribution of ion scores for IL anions and PubChem anions.
000 SMILES and assessed them using four metrics: validity, uniqueness, novelty, and reconstruction accuracy. Validity refers to the proportion of generated SMILES that can be parsed by RDKit. Uniqueness measures the fraction of unique SMILES among the valid ones. Novelty quantifies the percentage of generated SMILES not present in the training set. Reconstruction accuracy is defined as the proportion of test set SMILES that are correctly reconstructed by the model. The performance metrics are summarised in Table 3. The cation and anion generation models demonstrate high uniqueness and novelty, indicating their ability to generate diverse and previously unseen structures. However, the validity is not very high, likely due to the complexity of learning syntactic rules from highly diverse ion structures. To further assess the conditional generation capability, we sampled 5000 ions conditioned on label 0 (non-IL ions) and another 5000 on label 1 (IL ions) and calculated their ion scores. The label 1 condition is intended to bias the generation toward ions with a higher likelihood of forming an IL. The average ion scores for label 1 samples were 0.45 for both cations and anions, whereas the averages for label 0 were 0.17 and 0.20, respectively. These results confirm that the conditional generation model effectively produces ions with higher predicted relevance to ILs when guided by label 1. It is worth noting that the generated examples for label 1 do not consistently achieve very high ion scores. This may be due to class imbalance in the training data. For cations, there are 678 076 label 1 examples compared to 176 279 label 0 examples. A similar trend is observed for anions, with 290 998 label 1 examples and 102 235 label 0 examples. This imbalance makes it difficult for the CVAEs to achieve high average scores for label 1. Despite this, the use of ion scores still helps the CVAEs generate more positive examples (0.45 vs. 0.20).
| Ion type | Validity | Uniqueness | Novelty | Reconstruction accuracy |
|---|---|---|---|---|
| Cation | 77% | 100% | 99% | 71% |
| Anion | 72% | 100% | 98% | 71% |
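The validity, uniqueness and novelty metrics reported above can be sketched as follows. Two points are assumptions: novelty is computed here over the unique valid SMILES (the text does not fix the denominator), and the validity check is a toy bracket-balance test standing in for RDKit parsing, so that the sketch runs without external dependencies.

```python
def toy_is_valid(smiles):
    """Stand-in validity check: non-empty with balanced () and [].

    Assumption: in practice this would be an RDKit parse, e.g. testing
    that Chem.MolFromSmiles(smiles) is not None.
    """
    closing = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in closing:
            if not stack or stack.pop() != closing[ch]:
                return False
    return bool(smiles) and not stack


def generation_metrics(generated, training_set, is_valid=toy_is_valid):
    """Return (validity, uniqueness, novelty) as fractions in [0, 1]."""
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)

    validity = len(valid) / len(generated)
    uniqueness = len(unique) / len(valid) if valid else 0.0
    # Assumption: novelty is measured over unique valid SMILES.
    novelty = len(novel) / len(unique) if unique else 0.0
    return validity, uniqueness, novelty


# Hypothetical generated samples and training set, for illustration only.
generated = ["CC[N+](C)(C)C", "CC[N+](C)(C)C", "CC(=O)[O-]",
             "C1CC(", "CCCC[P+](C)(C)C"]
training = {"CC(=O)[O-]"}
validity, uniqueness, novelty = generation_metrics(generated, training)
```

In a real evaluation the SMILES would normally be canonicalised before the uniqueness and novelty comparisons, since one molecule can be written as many different strings.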
Upon visual inspection of the generated ions presented in Fig. S6, we observed that although the ion generation model successfully produces diverse and novel structures, some generated ions exhibit chemically unstable features. For instance, certain ions are excessively complex, contain implausible substructures (e.g., carbanions), or possess radical electrons. There are generally two strategies to address such issues: (1) pre-filtering the training data to exclude undesired structures before model training, or (2) post-filtering the generated molecules to remove invalid or implausible candidates. To better explore potential IL ions across a broad and diverse chemical space, we opted for the post-filtering approach. As described in Section 2.4, we applied several structural and chemical filters to eliminate unreasonable ions from the generated set. Representative examples of filtered ions are shown in Fig. 5. These examples demonstrate that the ion-level generation model produces diverse ion structures. For instance, the positively charged groups include rings of various sizes, ranging from 3-membered to 9-membered rings; while the negatively charged groups include carboxylates, thiocarboxylates, phosphonates, and amides. At the same time, key characteristics commonly found in ILs, such as long alkyl chains and fluorine atoms, are also present in many generated structures. This suggests that the generative model, combined with ion scoring and post-filtering, is capable of exploring a large chemical space while still capturing important structural motifs observed in real ILs.
To further assess the impact of ion scores on the CVAEs, we randomly sampled 100 cations and 100 anions from the decoder, each with ion scores of 0 and 1. Cations and anions with the same ion score were then paired to construct IL candidates. Using the trained melting point prediction model, we estimated the melting points of these ILs, and the results are shown in Fig. 3b. The predicted values indicate that ILs composed of ions with an ion score of 1 generally exhibit lower melting points compared to those formed from ions with a score of 0. This suggests that the ion scorers effectively guide the generation process toward chemical space regions more likely to correspond to low-melting-point ILs. By using ion scores as conditions, the CVAE model is able to generate ions with properties similar to those found in known ILs, increasing the likelihood of forming ILs with desirable melting behaviour. Compared to transfer learning approaches, this conditional generation method offers a softer constraint, and it does not force the generated chemical space to closely mimic the existing IL dataset.33 Instead, it allows for meaningful expansion of the IL chemical space while still preserving key characteristics of known IL ions. Overall, the cation and anion CVAEs effectively alleviate the limited diversity of the existing chemical space of ILs.
000 unique ion pairs, which were subsequently evaluated using the melting point prediction model and ranked based on their predicted melting points. The top 5000 ILs with the lowest predicted melting points were selected as the generated IL candidates. As shown in Fig. 6, our workflow effectively expands the existing IL chemical space by discovering numerous novel ILs that are chemically diverse and distinct from those currently known, while also generating ILs similar to existing ones. The divergence in chemical space observed in the generated ILs may be attributed to the fact that the collected IL database was not directly incorporated into our workflow. Instead, the use of ion scorers as a soft constraint guided the generation process toward a distinct yet chemically plausible region that is likely to form low-melting-point ILs. Compared to previous IL generation workflows,24,29 which primarily focus on ion generation using only known IL ions as inputs, our approach leverages a large pool of PubChem ions to explore a much broader chemical space, enabling the generation of novel ions beyond the scope of existing IL datasets.
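The screening step described above (form unique cation–anion pairs, predict a melting point for each, keep the lowest-ranked candidates) can be sketched as follows. The exhaustive pairing, the function names and the additive toy predictor are assumptions for illustration; in the actual workflow the predictor is the trained melting point model, and how the pairs were sampled is not shown in this section.

```python
from itertools import product


def screen_pairs(cations, anions, predict_melting_point, top_n):
    """Rank unique cation-anion pairs by predicted melting point (K)."""
    pairs = set(product(cations, anions))  # the set removes duplicate pairs
    ranked = sorted(
        pairs,
        # Sort by prediction, then by the pair itself for a stable order.
        key=lambda pair: (predict_melting_point(*pair), pair),
    )
    return ranked[:top_n]


# Hypothetical per-ion contributions used by the toy predictor below.
CATION_OFFSET = {"cat_A": 10.0, "cat_B": -20.0}
ANION_OFFSET = {"an_X": 5.0, "an_Y": -15.0}


def toy_predictor(cation, anion):
    """Stand-in for the trained model: additive offsets around 350 K."""
    return 350.0 + CATION_OFFSET[cation] + ANION_OFFSET[anion]


candidates = screen_pairs(
    ["cat_A", "cat_B", "cat_B"],  # a duplicate cation is tolerated
    ["an_X", "an_Y"],
    toy_predictor,
    top_n=2,
)
```

Ranking and truncating in this way keeps a fixed number of candidates however the predictions are distributed, in contrast to a hard cut-off such as 373 K.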
Additionally, our framework explicitly accounts for melting point during IL generation, enabling the identification of low-melting-point candidates. We computed the average melting point of the generated ion combinations via the trained melting point prediction model, finding it to be approximately 380 K. This is lower than the average melting point of the general MPT database (395 K), indicating that the CVAE models are capable of generating ions that can form lower-melting-point ion combinations even without melting point filtering. To further validate the effectiveness of our approach, we applied the MD-based workflow from our previous work34 to compute the melting points of the top 15 generated ILs with the lowest predicted values (details provided in the SI Section S6). The results show that 13 out of 15 ILs exhibit melting points below 373 K, with an average melting point of 353 K. This provides additional confirmation that our workflow can reliably identify low-melting-point ILs.
Looking ahead, promising directions include the direct generation of ion pairs guided by melting-point prediction, the integration of active learning with automated melting-point calculations or experiments, and optimisation-based strategies (e.g., Bayesian optimisation or reinforcement learning) to identify low-melting-point ILs within large chemical spaces more efficiently. We also observed structural imbalances in the PubChem ion dataset; for instance, most anions contain carboxylate functional groups, while others, such as borate-based ions, are underrepresented. This imbalance can limit the diversity of generated ions. Although ion scoring helped mitigate this issue, future work could explore additional strategies, such as data augmentation, to address dataset biases. Meanwhile, we found that several generated ions contained unstable structures. Although we attempted to remove these structures using post-filtering, this approach was not very efficient. We also tested pKa prediction models to identify unstable ions; however, this method depends heavily on the accuracy of the prediction model, and the existing models cannot give good pKa predictions on IL ions. Overall, our framework, which integrates structural scoring, conditional generation, and predictive filtering, shows promising performance in IL discovery. Furthermore, it holds promise for generalisation to other material systems, such as deep eutectic solvents and transition metal complexes.
Supplementary information (SI): including IL dataset visualisations, melting point database visualisations, feature importance analyses, ablation studies for ion scorers, chemically unstable ions, and molecular dynamics validation results. See DOI: https://doi.org/10.1039/d5sc08673f.
This journal is © The Royal Society of Chemistry 2026