Sadhana Barman and Utpal Sarkar*
Department of Physics, Assam University, Silchar-788011, Assam, India. E-mail: utpalchemiitkgp@yahoo.com
First published on 18th September 2025
Material property prediction through machine learning has emerged as a revolutionary approach for easing the difficulties of designing optimal materials for practical applications. Herein, we used a machine learning approach to screen over 11 664 solid electrolyte interphase (SEI) materials and identified potential candidates in terms of chemical stability at the molecular level, solvation energy and ease of synthesis, thereby obtaining insights for the discovery of new, effective interphase materials for lithium-ion batteries. By uncovering atomistic input features, the models predicted the chemical reactivity parameters and solvation energy with R-squared values of 0.867–0.913 (86.7–91.3%). Dipole moment, number of heteroatoms, NHOH count, heavy atom count, numbers of hydrogen bond acceptors and donors, several surface area descriptors (PEOE_VSA1, PEOE_VSA4, SMR_VSA6, SMR_VSA10, EState_VSA10, VSA_EState1, VSA_EState2), the kappa index (kappa1), and functional groups (fr_ketone, fr_alkyl_halide, fr_nitro) were identified as key factors influencing solvation energy and chemical reactivity, offering critical guidance for screening the materials. These insights enable the strategic selection of SEI materials whose chemical stability influences dendrite formation, with the potential to enhance the performance and longevity of electrochemical systems. For the identified ideal candidates with favourable SEI characteristics, the predicted property values closely align with the actual values. Their predicted solvation energy, chemical hardness, and electrophilicity index lie in the ranges of 1.433–5.677 kcal mol−1, 10.796–17.530, and 0.270–0.390, respectively, with low synthetic accessibility (SA) scores of 1.219–2.260. For non-ideal materials, the predicted solvation energy, chemical hardness, electrophilicity index and SA score lie in the ranges of 85.354–300.982 kcal mol−1, 1.820–4.005, 2.030–4.823, and 4.002–7.422, respectively, reflecting poor SEI-forming characteristics; together, these results demonstrate the model's robustness for reliable prediction.
The most intriguing finding of our work is that molecules containing fluorine, nitrogen and carbon emerge as stable SEI candidates, whereas molecules containing sulphur, oxygen, nitrogen and carbon reduce the capability to form a stable SEI. This result highlights a robust workflow that can guide the future discovery of materials through property optimization, particularly for dendrite suppression.
The practical performance of SEI layers in lithium-ion batteries (LIBs) depends not only on their solvation structure but also on a wide range of electrochemical and physical properties, such as interfacial energy with the electrodes, mechanical robustness, lithium-ion diffusivity, and low electronic conductivity. The role of the SEI in LIB performance remains a topic of active interest for the scientific community. Density functional theory (DFT)21,22 has emerged as a complementary tool for analysing these reactive molecules, filling this gap and providing a better understanding of the reaction chemistry of the SEI. Within the DFT framework, reactivity parameters such as electronegativity, chemical hardness, and electrophilicity23–28 are used for the chemical characterization of molecules.
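As a hedged illustration of how such reactivity parameters are commonly obtained, the sketch below uses one widespread conceptual-DFT convention together with Koopmans' approximation: ionization energy I ≈ −E_HOMO, electron affinity A ≈ −E_LUMO, electronegativity χ = (I + A)/2, chemical hardness η = (I − A)/2, and electrophilicity ω = χ²/(2η). Other conventions (for example η = I − A) also appear in the literature, and the convention used to build the dataset analysed here is an assumption, not something this sketch confirms. The names `Reactivity` and `reactivity_descriptors` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reactivity:
    electronegativity: float  # chi = (I + A) / 2
    hardness: float           # eta = (I - A) / 2 (one common convention, assumed here)
    electrophilicity: float   # omega = chi**2 / (2 * eta)


def reactivity_descriptors(e_homo: float, e_lumo: float) -> Reactivity:
    """Conceptual-DFT descriptors from frontier orbital energies (output in the input units).

    Assumes Koopmans' approximation: I = -E_HOMO and A = -E_LUMO.
    """
    ionization = -e_homo
    affinity = -e_lumo
    chi = (ionization + affinity) / 2.0
    eta = (ionization - affinity) / 2.0
    if eta <= 0:
        raise ValueError("LUMO must lie above HOMO for a positive hardness")
    omega = chi ** 2 / (2.0 * eta)
    return Reactivity(chi, eta, omega)


# Illustrative orbital energies in eV (hypothetical, not taken from the dataset)
r = reactivity_descriptors(e_homo=-10.0, e_lumo=0.0)
print(r)  # Reactivity(electronegativity=5.0, hardness=5.0, electrophilicity=2.5)
```

In this picture, a larger η and a smaller ω describe a more chemically inert molecule, which is the direction favoured for an ideal SEI later in the screening.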
Solvation energy is a critical parameter for battery applications because it determines the interaction between solute and solvent molecules. It is a physical property that measures the minimum work required for solvation29–34 and describes the interactions between a dissolved molecule and the surrounding solvent as the solute and solvent organize around each other. The solvation free energy corresponds to the free energy change associated with transferring a molecule between an ideal gas and the solvent at a given pressure and temperature.35–39 Predicting this thermodynamic property has long been challenging, but it has been investigated through in silico computational methods, including for complicated hydration mechanisms.40–53 Its applications extend to various chemical processes in drug delivery systems,54,55 sustainable synthesis methods,56 and the electrochemical performance of energy storage devices.57,58 Despite several breakthroughs, however, prediction accuracy has been hindered by a lack of sufficient experimental data. To date, various reliable techniques, e.g., molecular dynamics and quantum chemical simulation, have been used to predict solvation energy.59–67 In addition, some aspects of hydration complexity have been explained using the Boltzmann equation, which describes the behaviour of a solvent in an isotropic medium.68–70
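To make the gas-to-solvent transfer concrete, the sketch below applies the textbook relation ΔG_solv = −RT ln K, where K is the equilibrium ratio of the solute's concentration in solution to its concentration in the ideal gas phase. This is standard thermodynamics rather than the method used to generate the dataset (which, as described later, relies on an implicit solvent model); the function name and the example ratio are hypothetical, and the sign convention adopted in the dataset is not stated in this section.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def solvation_free_energy(partition_ratio: float, temperature: float = 298.15) -> float:
    """Delta G_solv in kcal/mol from K = c_solution / c_gas at equilibrium.

    Negative values mean that transfer from the gas phase into the solvent is favourable.
    """
    if partition_ratio <= 0:
        raise ValueError("partition ratio must be positive")
    return -R_KCAL * temperature * math.log(partition_ratio)


# A solute 1000 times more concentrated in solution than in the gas phase
print(round(solvation_free_energy(1000.0), 2))  # -4.09
```

Each factor of 10 in K shifts ΔG_solv by about 1.36 kcal mol−1 at room temperature, which gives a sense of scale for the energy ranges discussed below.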
In quantum-level solvation models, typically only small molecules are considered during calculations, and more complicated systems are underrepresented. Molecular dynamics has provided significant understanding of solvation energies, typically near 4 kcal mol−1.71 Predicting physical properties makes it possible to screen candidates for optimal material design and can also inform the design of synthetic routes.72
Data-driven approaches, considered the fourth paradigm of science, have proven efficient for predicting physical properties and are also cost-effective and time-saving.73–82 These approaches help identify the features most influential for solvation energy. For example, solvation energy has been predicted using a graph neural network, with transfer learning subsequently applied to experimental solvation energy datasets.83,84 However, the experimentally measured target values in such work usually span a small range, which is a limitation: when the variation in the target property is small, the property is relatively easy to predict compared with cases in which it varies widely.
Herein, we implemented automated machine learning (ML) models to predict the solvation free energy of SEI products using supervised learning algorithms. Instead of using calculated features as inputs, we generated atomistic input features with the RDKit module,85 and the ML models establish the correlations between these features and solvation energy. We also narrowed down our entries to find optimal structures that can be synthesized. In addition to solvation energy and the synthetic accessibility (SA) score, we selected the chemical reactivity parameters electronegativity, chemical hardness, and electrophilicity as screening criteria for SEI-forming materials.
664 datapoints from the LIBE dataset to predict solvation energy. Each SEI-forming material in the dataset is represented by a SMILES string, and the actual solvation energy values span from 1.381 to 323.507 kcal mol−1.
664 entries in the initial database. Atomistic information in the form of molecular descriptors, representing the physicochemical characteristics of each material, was extracted, and more than 200 input features significantly correlated with the chemical reactivities and solvation energy were considered. After refining the dataset, we selected the most highly correlated features (by sorting) and tuned the hyperparameters for better accuracy, which enhanced the performance of the models. Through this prediction workflow, the machine learning models revealed several highly impactful input features with prominent contributions to the targeted properties. We separated the materials with solvation energy values less than 50 kcal mol−1 (6134 entries) from those with values greater than 50 kcal mol−1 (5530 entries), classifying them as ‘desired for SEI’ and ‘unwanted for SEI’, respectively. Finally, we generated the synthetic accessibility (SA) scores of these materials and performed ‘material screening’ by optimizing the chemical hardness, electrophilicity index, solvation energy, and SA score through the Pareto filter method, which yields the best candidate materials.
The correlation coefficients between solvation energy, the top descriptors and the reactivity parameters, taken from the associated correlation matrix, are distinguished by different colours in the heat map. Colours shifting towards red signify a stronger correlation, whereas colours shifting towards blue indicate a weaker correlation between the features. The bright red diagonal boxes represent the self-correlation of each feature. This correlation heatmap is an effective tool for identifying the essential molecular descriptors for solvation energy (Table S1). The correlation coefficients relevant to our study fall within the range of −0.34 to 0.77 for the following descriptors: “electronegativity”, “chemical hardness”, “electrophilicity index”, “topological polar surface area (TPSA)”, “dipole moment”, “partial equalization of orbital electronegativity and surface area contribution of atoms in molecule (PEOE_VSA1)”, “MOE logP VSA descriptor 2 (SlogP_VSA2)”, “heavy atom count”, “number of NHs and OHs (NHOH count)”, “number of nitrogens and oxygens (NO count)”, “number of hydrogen bond acceptors (Num H acceptors)”, “number of hydrogen bond donors (Num H donors)”, and “number of hetero atoms (Num hetero atoms)”. Fig. 2(b) portrays the ten most highly correlated input features, along with the chemical reactivity parameters, chosen from the Pearson correlation heatmap in Fig. 2(a). The chemical reactivity parameters are only weakly linearly correlated with solvation energy, with correlation coefficients of 0.14, 0.24, and −0.34, consistent with nonlinear relationships; because they carry information that is not redundant with solvation energy, they provide a reliable basis for screening materials by multiobjective optimization of these properties. The negative correlation coefficient for chemical hardness indicates an inverse correlation with solvation energy.
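The feature-selection step of sorting descriptors by their Pearson correlation with the target can be sketched in plain Python. This is a minimal illustration, not the authors' code: the helper names and toy columns are hypothetical, and in practice the descriptor columns would be generated with RDKit and the correlation matrix computed with a data-analysis library.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: treated as uncorrelated (an assumption)
    return sxy / math.sqrt(sxx * syy)


def rank_features(features: Dict[str, Sequence[float]], target: Sequence[float],
                  top_k: int) -> List[Tuple[str, float]]:
    """Return the top_k features sorted by |r| with the target, strongest first."""
    scored = [(name, pearson(col, target)) for name, col in features.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)  # stable for ties
    return scored[:top_k]


# Toy columns (hypothetical values, not from the LIBE dataset)
target = [2.0, 4.0, 6.0, 8.0, 10.0]
features = {
    "TPSA": [10.0, 20.0, 30.0, 40.0, 50.0],  # rises with the target
    "hardness": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls with the target
    "noise": [1.0, 3.0, 2.0, 3.0, 1.0],      # no linear trend
}
print(rank_features(features, target, top_k=2))  # [('TPSA', 1.0), ('hardness', -1.0)]
```

Ranking by |r| keeps strongly inverse features such as chemical hardness alongside positively correlated ones, but, as noted above, a low |r| does not rule out a strong nonlinear dependence.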
Three descriptors, “TPSA”, “NO count”, and “Num H acceptors”, show correlation values of 0.77, 0.75, and 0.71, respectively. “Dipole moment” and “Num hetero atoms” exhibit slightly lower correlations, with values of 0.69 and 0.65, respectively. “NHOH count”, “Num H donors”, “PEOE_VSA1”, “SlogP_VSA2”, and “heavy atom count” correlate with solvation energy with coefficients between 0.50 and 0.58. In Fig. 3, we visualise the correlation pattern of each input feature with solvation energy; the thirteen scatter plots confirm nonlinear relationships between the individual features and the target (solvation energy). Plots a, b, c, d, g, l, and m of Fig. 3 show a continuous variation of solvation energy with these input features, whereas the remaining plots, e, f, h, i, j, and k, show a discrete relationship between solvation energy and the corresponding input features. These pairwise plots help to assess the relevance of each feature for prediction and optimization.
Interestingly, the inverse relationship between solvation energy and chemical hardness (Fig. 3c) indicates that as the solvation energy increases, the chemical hardness decreases and the molecule's reactivity increases. Molecules also become more polarised (Fig. 3d, g and l), as indicated by the direct correlation of solvation energy with the dipole moment and surface area descriptors. These direct and inverse relations guide us in finding the appropriate parameters for prediction and optimization. Based on these, we proceeded with the prediction, taking electronegativity, chemical hardness, electrophilicity, and solvation energy as targets and the remaining features as inputs.
Regression algorithms have been implemented for the accurate prediction of all properties. In our investigation, 80% of the dataset was used for training and the remaining 20% for testing. A total of forty-two ML models from Lazy Predict88 were tested (Table S2) and ranked by their R-squared values. Here, we present the top twenty-two ML models (Fig. 4) for predicting solvation energy and scrutinize their performance.
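The 80/20 hold-out split described above can be sketched as a reproducible shuffle followed by a cut. This is an illustration under stated assumptions: the helper `split_train_test` and the seed value are hypothetical (the random seed used in the study is not reported), and scikit-learn's `train_test_split` would normally be used for the same purpose.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(items: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle reproducibly, then hold out test_fraction of the items for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Applied for illustration to 11 664 entry indices
train, test = split_train_test(range(11664))
print(len(train), len(test))  # 9331 2333
```

Fixing the seed makes model comparisons such as those in Table 1 repeatable, since every model is scored on the same held-out molecules.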
Model performance was evaluated on the basis of a high coefficient of determination (R-squared) and a low root mean squared error (RMSE). The performances of the top twenty-two models, depicted in Fig. 4, are listed in Table 1.
| Serial number | Model | R-squared | RMSE (kcal mol−1) |
|---|---|---|---|
| 1 | XGB regressor | 0.83 | 12.53 |
| 2 | Extra trees regressor | 0.83 | 12.68 |
| 3 | Random forest regressor | 0.81 | 13.20 |
| 4 | Hist gradient boosting regressor | 0.81 | 13.21 |
| 5 | LGBM regressor | 0.81 | 13.44 |
| 6 | Gradient boosting regressor | 0.80 | 13.46 |
| 7 | MLP regressor | 0.79 | 13.92 |
| 8 | K neighbors regressor | 0.79 | 13.92 |
| 9 | Bagging regressor | 0.79 | 13.97 |
| 10 | Lasso lars CV | 0.71 | 16.34 |
| 11 | Lasso CV | 0.71 | 16.34 |
| 12 | SGD regressor | 0.71 | 16.34 |
| 13 | Ridge CV | 0.71 | 16.35 |
| 14 | Bayesian ridge | 0.71 | 16.35 |
| 15 | Ridge | 0.71 | 16.35 |
| 16 | Lasso lars IC | 0.71 | 16.35 |
| 17 | Linear regression | 0.71 | 16.35 |
| 18 | Transformed target regressor | 0.71 | 16.35 |
| 19 | Orthogonal matching pursuit CV | 0.71 | 16.35 |
| 20 | Elastic net CV | 0.71 | 16.35 |
| 21 | Lasso lars | 0.71 | 16.37 |
| 22 | Lasso | 0.71 | 16.37 |
Fig. 4a and b indicate that the highest performing ML model is the XGB regressor, with an R-squared value of 0.83 and the lowest RMSE of 12.53. The extra trees regressor also performed strongly, achieving an R-squared value of 0.83 and an RMSE of 12.68 (Table 1). The next four models, i.e., the random forest regressor, hist gradient boosting regressor, LGBM regressor, and gradient boosting regressor, also capture diverse datapoints with consistently high R-squared values, only slightly below those of the XGB and extra trees regressors, which reflects strong predictive power. The R-squared values of these four models lie in the range of 0.80–0.81, with RMSE values ranging from 13.20 to 13.46. For models with serial numbers 7 to 22 (Table 1), however, the R-squared values decline. This decline indicates the effectiveness of the first six models compared with the remaining ML models (MLP regressor, K neighbors regressor, bagging regressor, Lasso lars CV, Lasso CV, SGD regressor, ridge CV, Bayesian ridge, ridge, Lasso lars IC, linear regression, transformed target regressor, orthogonal matching pursuit CV, elastic net CV, Lasso lars, and Lasso), for which R-squared and RMSE span 0.71–0.79 and 13.92–16.37, respectively. The nearly identical scores of models 10–22 indicate that they capture almost the same pattern between the input features and the target property, and their lower R-squared values reflect reduced predictive power.
Overall, the extra trees regressor and XGB regressor emerged as the top-performing models among the twenty-two ML models for solvation energy. Taking this into account, we used these two models to predict electronegativity, chemical hardness, and electrophilicity index, along with solvation energy. The prediction plots for the XGB regressor and extra trees regressor models are shown in Fig. 5. The accumulation of datapoints near the diagonal line in Fig. 5 signifies good agreement between the predicted and actual values for all properties. Fig. 6 shows the residual plot and the histogram of the residuals (for the XGB regressor), which show that only a few datapoints, with small residual errors, lie slightly away from the best-fit line.
The predictions of the chemical reactivity parameters and solvation energy with the best-performing XGB regressor and extra trees regressor models achieved R-squared values of 0.867–0.913, mean absolute error (MAE) values of 0.152–4.849, and RMSE values of 0.236–10.047, as shown in Fig. 5. The distribution of the residuals of these predicted properties is displayed in Fig. 6. The close alignment of the test datapoints with the best-fit line reflects the enhanced predictive power of the models. These predictions broadened our search for optimal SEI products, with better predictability of all targeted properties.
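For reference, the three metrics quoted above (R-squared, RMSE and MAE) follow their standard definitions, sketched below in plain Python. scikit-learn's `r2_score`, `mean_squared_error` and `mean_absolute_error` compute the same quantities; the toy values here are hypothetical and not taken from the study.

```python
import math
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the units of the target (kcal/mol for solvation energy)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, also in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values (hypothetical)
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
print(r_squared(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

Because RMSE and MAE carry the units of each target, their ranges are not directly comparable across properties with different scales, whereas R-squared is dimensionless.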
The input features contributing most to the prediction of each property were identified according to their correlation coefficients, as shown in Fig. 7. Higher values signify a stronger correlation between input and output and govern the prediction accuracy.
The most important features (from the XGB regressor) are ranked by their importance values in Fig. 7, where specific surface area descriptors and functional groups show the highest importance in predicting these properties. The nitro functional group (“fr_nitro”) was the most important feature for electronegativity and electrophilicity index, with importance values of 0.3242 and 0.6010, respectively. “Ring count” and “dipole moment” were the most important features for chemical hardness and solvation energy, with importance values of 0.7318 and 0.2589, respectively. The E-state descriptor “MaxEStateIndex” and the SMR_VSA (SMR_VSA6, SMR_VSA10) and VSA_EState (VSA_EState1, VSA_EState2) descriptor series contribute to the prediction of both electronegativity and electrophilicity index. The ketone functional group (“fr_ketone”) and the “kappa1” index are important features for all reactivity parameters, with importance values of 0.0150 and 0.0129 for electronegativity, 0.0101 and 0.0181 for chemical hardness, and 0.0487 and 0.0504 for electrophilicity index. Two other important functional groups for electronegativity prediction chosen by the XGB regressor are the NH2 group (“fr_NH2”) and the alkyl halide group (“fr_alkyl_halide”).
For solvation energy, “TPSA” contributed with an importance value of 0.1853. The feature importance values of “NO count”, “Num H acceptors”, “Num hetero atoms”, “heavy atom count”, “Num H donors”, “NHOH count”, “SlogP_VSA2”, and “PEOE_VSA1” for solvation energy were 0.1710, 0.0705, 0.0702, 0.0342, and 0.0282, respectively.
Our study shows that surface area descriptors are directly correlated with the interactions that influence the chemical reactivity parameters and solvation energy. The surface chemistry of the electrolyte affects its kinetic stability; to account for this, we included correlated surface area descriptors as input features in addition to the reactivity parameters of the individual molecules mentioned above. This aids the understanding of the electrical properties (electronic insulation capability) and reactivity of the SEI, which, to some extent, helps in understanding the electrode–electrolyte structure–property relationship and may provide a basis for understanding interphase formation.
“TPSA” quantifies the polarity and potential hydrogen-bonding capacity of a molecule, based on the distribution of polar atoms in the molecule. Dipole–dipole interactions between a polar solvent and the solute strongly affect the solvation energy, resulting in a more stable structure owing to the higher solvation energy. The dipole moment likewise quantifies how polar a molecule is. Polarization plays a critical role in regulating the formation and stability of the solid electrolyte interphase (SEI) in lithium-ion batteries. Under the influence of an internal electric field, materials with high dielectric constants, such as certain separators or electrolyte components, undergo strong electron displacement polarization. This polarization modifies the local electrostatic environment at the electrode–electrolyte interface, influencing the distribution and mobility of lithium ions and coordinating species. As a result, polarization can alter the solvation structure of Li+, favouring the inclusion of anions in the solvation sheath, which leads to the formation of anion-derived SEI components such as LiF. These inorganic-rich SEI layers are typically more uniform, mechanically stable, and ionically conductive. Moreover, enhanced polarization can suppress side reactions by reducing the local electron density near the electrode surface, thereby mitigating the formation of amorphous organic oligomers and promoting a compact, low-resistance interphase. Overall, polarization-induced tuning of the interfacial environment emerges as a powerful mechanism for optimizing SEI chemistry and improving battery performance.89
ML models are often regarded as black boxes; in our study, they were interpreted using the Shapley additive explanations (SHAP)90 method. Because we focus on the solvation effect, SHAP was applied to the prediction of solvation energy, assigning each highly important input feature a numerical value that quantifies its contribution. Grounded in cooperative game theory,90 these values represent the marginal contribution of each feature averaged over the different combinations (coalitions) of features that influence the result. Fig. 8 shows the SHAP plot of the input features for the predicted solvation energy.
In Fig. 8, the vertical axis lists the important features from top to bottom in order of their contribution to predicting solvation energy. The horizontal axis gives the SHAP value, i.e., the impact of each feature on the prediction, ranging from negative to positive. Each point is one datapoint, coloured by the value of the corresponding feature: red points correspond to high feature values and blue points to low feature values. In our SHAP plot, the datapoints of each feature are distributed over both negative and positive SHAP values, and the position of the red points shows whether high values of a feature push the prediction up or down. For ‘dipole moment’, ‘TPSA’, ‘NO count’, ‘heavy atom count’, ‘NHOH count’, ‘PEOE_VSA1’, and ‘SlogP_VSA2’, most points with positive SHAP values (and some with SHAP values slightly below 0) are red, indicating that high values of these features mainly increase the predicted solvation energy. Conversely, some high feature values also have a negative impact on the target property, as shown in the SHAP plot.
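To make the cooperative-game idea concrete, the sketch below computes exact Shapley values for a single prediction by enumerating every coalition of features, using φ_i = Σ over S ⊆ N∖{i} of |S|!(n − |S| − 1)!/n! · [v(S ∪ {i}) − v(S)]. It is an illustration, not the SHAP library: "absent" features are simply replaced by a single baseline vector (an assumption), whereas the SHAP package estimates these contributions far more efficiently (for example with its tree explainer for tree ensembles). The toy model, feature values and helper names are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for one prediction by enumerating all feature coalitions.

    A coalition S is evaluated by taking the features in S from x and the rest from
    `baseline`. The cost grows as 2**n, so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition: set) -> float:
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(v: Sequence[float]) -> float:
    # Hypothetical model: two features plus an interaction term
    return 2.0 * v[0] + 3.0 * v[1] + v[0] * v[1]


phi = shapley_values(toy_model, x=[1.0, 2.0], baseline=[0.0, 0.0])
print(phi)  # [3.0, 7.0], summing to f(x) - f(baseline) = 10.0
```

The interaction term is split evenly between the two features, and the values always add up to the difference between the prediction and the baseline prediction, which is the additivity that a SHAP summary plot relies on.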
Partial dependence plots (PDPs)91 and individual conditional expectation (ICE) plots92 for the important features with respect to the predicted solvation energy are shown in Fig. 9 and 10, respectively. The PDP (Fig. 9) shows how the predicted target changes with an input feature, averaged over all input datapoints. These plots depict the dependence of the target response on the input features, revealing linear and nonlinear relationships between the inputs and the predicted property, and help to identify the input feature values that most influence the target property. The PDPs show that increasing ‘TPSA’ and ‘dipole moment’ values leads to a linear increase in solvation energy. In contrast, the ‘NO count’, ‘Num hetero atoms’, ‘Num H donors’, ‘NHOH count’, and ‘Num H acceptors’ plots show that beyond a certain point, solvation energy no longer increases with increasing values of these features. The ICE plot provides a more detailed view of how each value of an input feature influences the target in each individual sample. We have displayed the plots for the top four features with respect to solvation energy in Fig. 10, and those for the remaining features in Fig. S1. For “TPSA”, we observe (Fig. 9(b)) a linear relation with solvation energy, apart from an initial sharp decline over a very short range. For the NO count, the solvation energy increases rapidly until the NO count reaches 0.5 and saturates thereafter. The third most influential feature, ‘Num H acceptors’, also exhibits nonlinear behaviour and saturates at values greater than or equal to 6.
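The relation between ICE and PDP described above can be made explicit: an ICE curve varies one feature over a grid for a single sample while keeping its other features fixed, and the PDP is the pointwise average of these curves over all samples. The sketch below assumes a generic model callable; scikit-learn provides `sklearn.inspection.partial_dependence` and `PartialDependenceDisplay` for fitted estimators, but the helper names, toy model and data here are hypothetical.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def ice_curves(model: Model, data: Sequence[Sequence[float]],
               feature: int, grid: Sequence[float]) -> List[List[float]]:
    """One curve per sample: vary `feature` over `grid`, keep other features fixed."""
    curves = []
    for row in data:
        curve = []
        for g in grid:
            modified = list(row)
            modified[feature] = g
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(model: Model, data: Sequence[Sequence[float]],
                       feature: int, grid: Sequence[float]) -> List[float]:
    """PDP = pointwise average of the ICE curves over all samples."""
    curves = ice_curves(model, data, feature, grid)
    return [sum(column) / len(curves) for column in zip(*curves)]


def toy(v: Sequence[float]) -> float:
    # Hypothetical model whose first feature saturates at 6
    return min(v[0], 6.0) * 2.0 + v[1]


data = [[1.0, 0.0], [2.0, 10.0], [3.0, 20.0]]
grid = [0.0, 3.0, 6.0, 9.0]
print(partial_dependence(toy, data, feature=0, grid=grid))  # [10.0, 16.0, 22.0, 22.0]
```

The toy PDP rises and then plateaus once the first feature exceeds 6, loosely mimicking the saturation described for ‘Num H acceptors’; the individual ICE curves differ only by a constant offset here because the toy model has no interactions.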
Fig. 10 ICE plots of solvation energy through the extra trees regressor for refined datasets, showing contributions for the top four features.
The design of new SEIs that can reduce side reactions and enhance battery life and stability is crucial. Efficient SEI materials should be chemically and mechanically stable, prevent side reactions, not dissolve in the electrolyte, not break easily during volume expansion of the electrode, facilitate the interactions between the electrode and electrolyte, and prevent electrolyte reduction reactions while enabling the smooth flow of Li ions from the electrolyte to the electrode. The development of new SEI products can be accelerated by theoretically identifying potential candidates to guide experimental efforts, which reduces iterative refinement and shortens the time from concept to practical implementation.93–96
Molecular-level information for an individual molecule is encoded in reactivity parameters such as the HOMO and LUMO energies, electronegativity, chemical hardness, and electrophilicity, which dictate the chemical stability and reactivity of the molecule. The electrode's performance depends on the electrolyte's solvation structure, which characterizes the desolvation ability of the alkali metal ion.97 The solvation free energy of the electrolyte molecules, evaluated using an implicit solvent approach (the polarizable continuum model, PCM, proposed by Tomasi and co-workers98), is included in the dataset as an input feature. For electrochemical stability, the electrolyte should have (a) good ionic conductivity and electronic insulating properties, which facilitate ion transport and minimize self-discharge; (b) a wide electrochemical window to prevent electrolyte degradation within the working potential range; and (c) chemical inertness towards the cell separator, electrode substrate, etc.99 A multi-component system comprising salts, solvents, and additives involves a large number of interactions, which introduces additional complexity.
The SEI layer, whose formation is governed by the solvation structure, plays a significant role in battery cyclability. The stability of the electrode depends on the solvation structure of the electrolyte. The cathode, anode, and electrolyte composition also regulate the formation of the SEI layer, which should be mechanically strong and flexible to cope with the volume change (expansion/contraction) during charging/discharging.100 To minimize the loss of capacity and ions and maintain high capacity, the SEI should consist of stable, insoluble, compact compounds, since the physical properties of the electrolyte depend on the decomposed SEI components.101 Higher ion diffusivity provides additional stability and conductivity, whereas lower ion diffusivity is associated with higher resistivity, i.e., further blocking of ion movement, which results in capacity fading and a short battery lifetime, thereby limiting battery performance.102 Low ionic conductivity of the SEI layer reduces battery performance because it gives rise to high internal resistance and a lower charge–discharge rate, resulting in low coulombic efficiency,103 while SEI heterogeneities lead to reaction heterogeneity, uneven ion distribution, and dendrite growth;104 however, high interfacial energy assists stable ion deposition and suppresses dendrite formation.105
Understanding solvation behaviour is difficult for lithium-based materials because it is correlated with conductivity, stability, and reactivity. Therefore, to analyse these effects on the SEI systematically, we classified the data into high and low solvation energy groups, enabling more targeted findings. Two distinct classes were defined on the basis of solvation energy: ideal SEI (solvation energy lower than 50 kcal mol−1) and non-ideal SEI (solvation energy above 50 kcal mol−1), giving two datasets containing 6134 (ideal SEI) and 5530 (non-ideal SEI) candidates with potential implications for the SEI layer; the distribution is shown in Fig. 11. We focused on both practicability (in terms of SA score) and performance (in terms of solvation energy, chemical hardness, electronegativity, and electrophilicity index) to screen potential candidates. Specifically, for interpretability, we did not screen the molecules solely on the basis of solvation energy but also by chemical hardness, electrophilicity, and SA score. These multiple selection criteria, which include chemical reactivity parameters, reflect electrochemical stability at the molecular level and the applicability of our investigation. Molecules with a high electrophilicity index and low chemical hardness are highly reactive species, whereas molecules with a low electrophilicity index and high chemical hardness are relatively inert or chemically stable species, which are necessary for an ideal SEI. In addition, molecules with lower solvation energies tend to prevent excess SEI layer formation (ideal SEI), whereas those with higher solvation energies have non-ideal SEI-forming characteristics.
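The two-class split at 50 kcal mol−1 can be sketched as a simple threshold partition. This is an illustrative helper, not the authors' code: the function name and demo entries are hypothetical, and because the text does not state where a value of exactly 50 kcal mol−1 belongs, assigning it to the non-ideal class is an assumption.

```python
from typing import Dict, List, Tuple

THRESHOLD_KCAL = 50.0  # class boundary used in the text

Entry = Tuple[str, float]  # (SMILES, solvation energy in kcal/mol)


def split_by_solvation(entries: List[Entry]) -> Dict[str, List[Entry]]:
    """Partition entries into 'ideal' (< 50 kcal/mol) and 'non_ideal' (otherwise)."""
    classes: Dict[str, List[Entry]] = {"ideal": [], "non_ideal": []}
    for smiles, energy in entries:
        # A value of exactly 50 goes to 'non_ideal' here (assumption, see lead-in)
        key = "ideal" if energy < THRESHOLD_KCAL else "non_ideal"
        classes[key].append((smiles, energy))
    return classes


# Hypothetical entries: placeholder SMILES and energies, not dataset values
demo = [("C", 12.3), ("CC", 50.0), ("O=C=O", 87.1), ("F", 4.2)]
groups = split_by_solvation(demo)
print({k: len(v) for k, v in groups.items()})  # {'ideal': 2, 'non_ideal': 2}
```

Each class is then screened separately, so the later multi-objective filter only compares molecules within the same solvation-energy regime.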
SEI products with lower solvation energies will be useful in Li-ion batteries, as such materials are widely recognised to facilitate the formation of a stable SEI. Conversely, molecules with high solvation energies offer ample opportunities for in-depth mechanistic investigations of excess dendrite formation, which degrades battery performance. Therefore, to design the SEI, the above properties should be characterized through structure–property relationships. Fig. 11 displays the distribution of solvation energies, indicating their frequency of occurrence in our dataset. For the first class, most values lie between 20 and 50 kcal mol−1 (Fig. 11(a)), with the highest frequencies between 20 and 25 kcal mol−1. For the second class of materials, i.e., those with solvation energies greater than 50 kcal mol−1, the distribution is skewed towards higher solvation energies (50–100 kcal mol−1), as depicted in Fig. 11(b).
Fig. 11 Distribution of solvation energies of the refined dataset: (a) less than 50 kcal mol−1 and (b) greater than 50 kcal mol−1.
Synthetic accessibility analysis106 provides significant insight into how readily a theoretically computed structure can be synthesized and into its potential synthetic routes. Based on the SA score, it helps to filter the structures that can be efficiently synthesized in the laboratory. The SA score reflects the complexity of synthesis: a lower SA score indicates greater feasibility of synthesizing the structure experimentally (SA score = fragment score − complexity penalty). The calculation of the SA score is mainly based on ‘synthesis data’ from one million molecules stored in PubChem.107 Fig. 12 shows 3D plots of the structure indices (SA score, chemical hardness, and electrophilicity index) with respect to solvation energy. Candidates with favourable solvation energies may nevertheless exhibit high structural complexity or unfavourable chemical reactivity. Thus, a molecule that is difficult to synthesize (even if it is non-reactive), or easy to synthesize but reactive, is not treated as a good material even if its solvation energy is favourable. In such cases, it is important to identify candidates with not only low solvation energy but also low complexity and suitable chemical reactivity. The optimal solutions chosen are molecules that are prone to desolvation, easy to synthesize, less reactive and not very electrophilic, i.e., very stable molecules with a high probability of not forming dendrites on contact with electrodes. The non-ideal solutions have a very low desolvation probability, along with high reactivity and low stability. This methodology provides a rigorous, computationally efficient approach that balances prediction accuracy with chemical relevance, ultimately guiding the screening of ideal and non-ideal SEI molecules.
Fig. 12 3D plots of structure index: SA score, chemical hardness, and electrophilicity index with (a) high solvation energy and (b) low solvation energy.
Our method predicts property values from features derived from the molecular structures, so each prediction is directly tied to a specific structure. We then used these predicted values to identify the best molecules through Pareto optimization. Because the inputs are molecular descriptors, the predictions reflect molecular characteristics and the results can be traced back to the molecular level. The Pareto optimization then identifies the molecules with the best combined properties, bridging the numerical data and molecular-level conclusions.
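The link between structure-derived inputs and predicted properties described above can be sketched as a record that keeps each molecule identifier together with its descriptors and predictions. This is a minimal illustration, not the authors' pipeline: `Candidate`, `predict_all`, the molecule IDs and `toy_solvation_model` are hypothetical, and the toy linear model merely stands in for the trained ML models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Descriptor name -> value for one molecule (e.g. dipole moment, heavy atom count).
Features = Dict[str, float]


@dataclass
class Candidate:
    """A molecule identifier kept together with its inputs and predicted outputs."""
    mol_id: str
    features: Features
    predictions: Dict[str, float] = field(default_factory=dict)


def predict_all(
    candidates: List[Candidate],
    models: Dict[str, Callable[[Features], float]],
) -> List[Candidate]:
    """Run every property model on every molecule and store the result on that molecule."""
    for cand in candidates:
        for prop, model in models.items():
            cand.predictions[prop] = model(cand.features)
    return candidates


def toy_solvation_model(f: Features) -> float:
    """Hypothetical linear stand-in, NOT the trained model used in this work."""
    return 2.0 * f["dipole_moment"] + 0.5 * f["heavy_atom_count"]


if __name__ == "__main__":
    pool = [
        Candidate("m1", {"dipole_moment": 1.0, "heavy_atom_count": 4.0}),
        Candidate("m2", {"dipole_moment": 0.0, "heavy_atom_count": 6.0}),
    ]
    for c in predict_all(pool, {"solvation_energy": toy_solvation_model}):
        print(c.mol_id, c.predictions)
```

Storing predictions on the same record as the descriptors means any molecule later selected by the Pareto step can be inspected directly in terms of the features that produced its predicted values.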
For the structures considered here, the SA scores lie between 1.000 and 2.089 for low solvation energies (below 50 kcal mol−1) and in the range of 3.994–7.465 for high solvation energies (above 50 kcal mol−1). We searched for optimal structures that can be easily synthesized, with both low108,109 and high solvation energies, based on their reactivity parameters; however, screening the 6134 and 5530 entries is challenging. Therefore, to obtain (i) optimal structures with low solvation energies and low SA scores that are less reactive, not strongly electrophilic, and highly stable, and (ii) reactive molecules with high solvation energies and high SA scores, we implemented multi-objective optimization using the Pareto filter method110 (Fig. 12). We optimized the solvation energy together with the reactivity parameters and the respective SA scores, and finally obtained nine Pareto-optimal candidates (Table 2; low solvation energy, low SA score, inert molecules) and ten Pareto-optimal candidates (Table 3; high solvation energy, high SA score, reactive molecules). Fig. 12 shows 3D scatter plots of all data points over the optimized properties; the multicoloured dots represent all structures, and the red marks denote the Pareto-optimal solutions, which were considered as the potential structures. From these potential candidates, we selected a total of nineteen (nine ideal SEI and ten non-ideal SEI) candidates, shown in Fig. 13. Tables 2 and 3 list the predicted and actual values of the optimized properties, together with two important input features for solvation energy, namely the dipole moment and heavy atom count.
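The Pareto filter mentioned above keeps only candidates that no other candidate beats in every objective at once. The sketch below is a generic brute-force version, not the authors' implementation; the objective order (solvation energy, SA score, chemical hardness, electrophilicity index) and the two sense vectors, which encode "inert and easy to make" versus "reactive and hard to make", are assumptions for illustration, as are the candidate values in the usage block.

```python
from typing import List, Sequence

# Objective order assumed here: (solvation energy, SA score, chemical hardness,
# electrophilicity index). +1 = maximise, -1 = minimise. Both sense vectors
# are illustrative assumptions.
IDEAL_SENSES = (-1, -1, +1, -1)      # low E_solv, easy synthesis, hard (inert), weakly electrophilic
NON_IDEAL_SENSES = (+1, +1, -1, +1)  # mirror-image search for reactive, hard-to-make molecules


def dominates(a: Sequence[float], b: Sequence[float], senses: Sequence[int]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    no_worse = all(s * x >= s * y for x, y, s in zip(a, b, senses))
    better = any(s * x > s * y for x, y, s in zip(a, b, senses))
    return no_worse and better


def pareto_front(points: List[Sequence[float]], senses: Sequence[int]) -> List[int]:
    """Return the indices of the non-dominated points (brute-force O(n^2) filter)."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p, senses) for j, q in enumerate(points) if j != i)
    ]


if __name__ == "__main__":
    # Hypothetical candidates, not values from this work.
    pool = [
        (1.6, 2.1, 17.6, 0.29),
        (2.0, 2.5, 11.0, 0.40),  # worse than the first in every objective under IDEAL_SENSES
        (1.4, 1.8, 11.9, 0.38),
    ]
    print(pareto_front(pool, IDEAL_SENSES))
    print(pareto_front(pool, NON_IDEAL_SENSES))
```

The quadratic scan is adequate for illustration; for pools of several thousand entries, as here, it still runs quickly, although sorting-based non-dominated sorting algorithms scale better.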
As both tables show, the predicted values of chemical hardness, electrophilicity index, SA score, and solvation energy are closely aligned with the actual values for most candidates, demonstrating the robustness of our model and supporting the use of these predicted properties in the optimization to find the best candidates.
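Table 2 Predicted and actual properties of the nine ideal SEI candidates (low solvation energy, low SA score, inert molecules)

| Molecules | Chemical hardness | Predicted chemical hardness | Electrophilicity index | Predicted electrophilicity index | Solvation energy (kcal mol−1) | Predicted solvation energy (kcal mol−1) | SA score | Predicted SA score | Dipole moment | Heavy atom count |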
|---|---|---|---|---|---|---|---|---|---|---|
| a | 17.639 | 17.530 | 0.290 | 0.299 | 1.590 | 1.595 | 2.089 | 2.260 | 0.005 | 5 |
| b | 11.243 | 11.249 | 0.392 | 0.390 | 1.925 | 1.927 | 1.000 | 1.236 | 0.024 | 5 |
| c | 10.912 | 10.910 | 0.342 | 0.342 | 1.841 | 1.844 | 1.000 | 1.268 | 0.000 | 6 |
| d | 11.469 | 11.470 | 0.266 | 0.270 | 5.690 | 5.677 | 1.014 | 1.219 | 0.021 | 3 |
| e | 10.848 | 10.861 | 0.390 | 0.389 | 1.799 | 1.832 | 1.000 | 1.290 | 0.015 | 7 |
| f | 11.986 | 11.971 | 0.381 | 0.379 | 1.423 | 1.433 | 1.755 | 1.786 | 0.055 | 3 |
| g | 11.784 | 11.771 | 0.374 | 0.374 | 1.757 | 1.750 | 1.606 | 1.773 | 0.000 | 4 |
| h | 11.426 | 11.426 | 0.358 | 0.359 | 2.092 | 2.094 | 1.209 | 1.621 | 0.000 | 6 |
| i | 10.792 | 10.796 | 0.332 | 0.333 | 2.134 | 2.133 | 1.549 | 1.428 | 0.077 | 7 |
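
Table 3 Predicted and actual properties of the ten non-ideal SEI candidates (high solvation energy, high SA score, reactive molecules)

| Molecules | Chemical hardness | Predicted chemical hardness | Electrophilicity index | Predicted electrophilicity index | Solvation energy (kcal mol−1) | Predicted solvation energy (kcal mol−1) | SA score | Predicted SA score | Dipole moment | Heavy atom count |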
|---|---|---|---|---|---|---|---|---|---|---|
| a | 1.782 | 1.820 | 4.856 | 4.823 | 170.414 | 169.177 | 5.483 | 5.483 | 17.396 | 37 |
| b | 3.993 | 3.995 | 2.481 | 2.482 | 284.010 | 281.501 | 6.561 | 6.562 | 30.609 | 48 |
| c | 4.016 | 4.005 | 2.020 | 2.030 | 311.290 | 300.982 | 5.408 | 5.379 | 23.513 | 37 |
| d | 2.625 | 2.579 | 3.101 | 3.113 | 132.507 | 206.703 | 5.409 | 4.6985 | 27.151 | 26 |
| e | 3.177 | 3.125 | 3.404 | 3.387 | 133.762 | 121.014 | 5.812 | 6.661 | 10.281 | 47 |
| f | 3.252 | 3.253 | 3.443 | 3.439 | 229.074 | 228.776 | 3.994 | 4.002 | 16.311 | 55 |
| g | 3.665 | 3.671 | 2.880 | 2.881 | 123.888 | 122.068 | 7.465 | 7.422 | 3.972 | 50 |
| h | 3.198 | 3.190 | 3.551 | 3.558 | 85.019 | 85.354 | 5.692 | 5.628 | 2.923 | 37 |
| i | 3.891 | 3.959 | 2.072 | 2.074 | 136.440 | 125.233 | 5.924 | 6.947 | 3.374 | 46 |
| j | 3.275 | 3.274 | 3.424 | 3.416 | 85.228 | 85.869 | 5.519 | 5.533 | 3.820 | 36 |
Consider molecule ‘a’ of Table 2, which has a predicted solvation energy of 1.595 kcal mol−1, predicted chemical hardness of 17.530, predicted electrophilicity index of 0.299, and predicted SA score of 2.260. The corresponding actual values are 1.590 kcal mol−1, 17.639, 0.290, and 2.089, respectively, suggesting that the ML model effectively captures the underlying structure–property relationships. The other structures in Table 2 (molecules ‘b’ to ‘i’) have predicted solvation energies of 1.927, 1.844, 5.677, 1.832, 1.433, 1.750, 2.094, and 2.133 kcal mol−1, which are very close to their actual values. For chemical hardness, the predicted values for these molecules fall in the range of 10.796–11.971, very close to the actual range of 10.792–11.986. Similarly, the predicted electrophilicity indices of molecules ‘b’ to ‘i’ are 0.390, 0.342, 0.270, 0.389, 0.379, 0.374, 0.359, and 0.333, whereas the actual values are 0.392, 0.342, 0.266, 0.390, 0.381, 0.374, 0.358, and 0.332, respectively. The predicted SA scores lie in the range of 1.219–2.260, and the actual values lie in the range of 1.000–2.089. For the non-ideal SEI candidates, the ten molecules in Table 3 have higher actual solvation energies of 170.414, 284.010, 311.290, 132.507, 133.762, 229.074, 123.888, 85.019, 136.440, and 85.228 kcal mol−1, compared with predicted values of 169.177, 281.501, 300.982, 206.703, 121.014, 228.776, 122.068, 85.354, 125.233, and 85.869 kcal mol−1, respectively. The agreement is close for most molecules, the largest deviation being for molecule ‘d’ (132.507 vs. 206.703 kcal mol−1). A comparison of the predicted and actual chemical hardness, electrophilicity index, and SA score in Table 3 likewise shows a high degree of agreement, with the predictions closely tracking the actual values for most of the ten non-ideal candidates; the largest SA score deviations occur for molecules ‘e’ (5.812 vs. 6.661) and ‘i’ (5.924 vs. 6.947).
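The agreement between actual and predicted solvation energies for the non-ideal candidates can be quantified with simple error metrics. The sketch below copies the Table 3 solvation energy columns and computes the mean absolute error and the per-molecule relative error; the choice of these two metrics and the helper names are assumptions for illustration, not the evaluation protocol used to train or test the model.

```python
# Actual and predicted solvation energies (kcal mol−1) for molecules a–j, copied from Table 3.
TABLE3_SOLVATION = {
    "a": (170.414, 169.177),
    "b": (284.010, 281.501),
    "c": (311.290, 300.982),
    "d": (132.507, 206.703),
    "e": (133.762, 121.014),
    "f": (229.074, 228.776),
    "g": (123.888, 122.068),
    "h": (85.019, 85.354),
    "i": (136.440, 125.233),
    "j": (85.228, 85.869),
}


def mean_absolute_error(pairs) -> float:
    """Average |actual - predicted| over (actual, predicted) pairs."""
    pairs = list(pairs)
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


def relative_errors(table: dict) -> dict:
    """Per-molecule |predicted - actual| / |actual|."""
    return {m: abs(p - a) / abs(a) for m, (a, p) in table.items()}


if __name__ == "__main__":
    rel = relative_errors(TABLE3_SOLVATION)
    worst = max(rel, key=rel.get)
    print(f"MAE = {mean_absolute_error(TABLE3_SOLVATION.values()):.3f} kcal/mol")
    print(f"largest relative error: molecule {worst} ({rel[worst]:.1%})")
```

Reporting the per-molecule relative error alongside the MAE makes a single large outlier visible instead of letting it blend into an average over ten candidates.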
The input features listed for these optimal solutions (Tables 2 and 3) provide information about the polarity of the molecules: the ideal candidates are essentially non-polar (dipole moments of 0.000–0.077), whereas the non-ideal candidates are strongly polar (2.923–30.609). The heavy atom count, another contributor to solvation energy, also differs between the two sets (3–7 for the ideal candidates versus 26–55 for the non-ideal ones). Since reasonable results were obtained, this analysis provides a basis for the experimental validation of these ideal and non-ideal candidates. Structures that satisfy the required solvation energy and reactivity criteria should be prioritized, with ease of synthesis as the next criterion. The stable SEI-forming optimal candidates (Fig. 13(A)) are enriched with carbon and fluorine, whereas the non-ideal candidates (Fig. 13(B)) contain nitrogen, sulphur, oxygen, and carbon atoms; these compositional trends can guide material selection.
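The dipole-moment and heavy-atom-count comparison can be made concrete by tabulating those two columns of Tables 2 and 3 and reporting their ranges. The sketch below does only that; `feature_range` and `separated` are hypothetical helper names, and the separation check describes these nineteen candidates alone.

```python
# (dipole moment, heavy atom count) per molecule, copied from Tables 2 and 3.
IDEAL = {
    "a": (0.005, 5), "b": (0.024, 5), "c": (0.000, 6), "d": (0.021, 3), "e": (0.015, 7),
    "f": (0.055, 3), "g": (0.000, 4), "h": (0.000, 6), "i": (0.077, 7),
}
NON_IDEAL = {
    "a": (17.396, 37), "b": (30.609, 48), "c": (23.513, 37), "d": (27.151, 26),
    "e": (10.281, 47), "f": (16.311, 55), "g": (3.972, 50), "h": (2.923, 37),
    "i": (3.374, 46), "j": (3.820, 36),
}


def feature_range(group: dict, column: int) -> tuple:
    """(min, max) of one descriptor column across a group of molecules."""
    values = [row[column] for row in group.values()]
    return min(values), max(values)


def separated(column: int) -> bool:
    """True if every ideal value lies below every non-ideal value for this descriptor."""
    return feature_range(IDEAL, column)[1] < feature_range(NON_IDEAL, column)[0]


if __name__ == "__main__":
    for col, name in enumerate(("dipole moment", "heavy atom count")):
        print(name, feature_range(IDEAL, col), feature_range(NON_IDEAL, col), separated(col))
```

Within this small sample, either descriptor alone separates the two groups; that is an observation about nineteen Pareto-selected molecules, not a validated classification rule for the full dataset.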
The dataset is available on GitHub at ‘https://github.com/Sadhana-barman/Solid-electrolyte-interphase-material/tree/main’.
Estimating Aqueous Solubility Directly from Molecular Structure, J. Chem. Inf. Comput. Sci., 2004, 44, 1000–1005, DOI: 10.1021/ci034243x

This journal is © the Owner Societies 2025