Nure Alam Chowdhury ab, Leaford Nathan Adebayo Henderson ab, Samin Yaser c, Olusola Pelumi Oyeku ab, Maydenee Maydur Tresa ab, Chandra Kundu d and Jayan Thomas *abe
a Nanoscience and Technology Center, University of Central Florida, Orlando, FL 32826, USA. E-mail: Jayan.Thomas@ucf.edu
b Department of Materials Science and Engineering, University of Central Florida, Orlando, FL 32816, USA
c College of Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA
d Department of Statistics and Data Science, University of Central Florida, Orlando, FL 32816, USA
e CREOL, The College of Optics and Photonics, University of Central Florida, Orlando, FL 32816, USA
First published on 23rd July 2025
Zinc-ion batteries (ZIBs) are considered a cheaper, non-toxic and safer alternative to lithium-ion batteries (LIBs). Manganese dioxide (MnO2) is one of the most viable cathode materials for aqueous-electrolyte-based ZIBs. Adding different dopants to the MnO2 cathode material can significantly change its physical properties and electrochemical performance in ZIBs. In this study, we collected about 603 papers, from which we selected 57 published ZIB papers on doped MnO2 as a cathode material. The dataset consists of eleven features (ten input features and one target), of which six are related to battery properties and five to the elemental properties of the dopants. A Pearson correlation plot is used to investigate the correlation between features, and it shows that the electronegativity and first ionization energy of the dopant have a weak positive correlation with discharge capacity (DC). Both classification and regression are applied to the dataset using machine learning models, namely XGBoost, random forest (RF), and K-nearest neighbors. The RF model classifies DC into three predefined grades with an accuracy of 0.72. In the regression analysis, the XGBoost model predicts DC with an R2 value of 0.92. The findings of this study can be used to predict the performance of doped MnO2 before synthesizing it in the laboratory.
ZIBs encounter several problems associated with cathode materials. The most promising cathode candidates for ZIBs are manganese-based oxides,15–17 vanadium-based oxides,18–20 Prussian blue analogues,21–23 etc. One of the leading candidates among these is manganese dioxide (MnO2), owing to its good voltage, high discharge capacity, and ease of synthesis in different crystal structures, viz., α-MnO2,24 β-MnO2,25 γ-MnO2,26 ε-MnO2,27 λ-MnO2,28 and layered structures.29–31 Different crystal structures exhibit different tunnel or layer spacings, which can modify the ionic intercalation/de-intercalation behavior of the material and potentially lead to different specific capacities. However, the major problems associated with the MnO2 cathode are its low electronic conductivity (10−5 S cm−1 to 10−6 S cm−1)32 and the Jahn–Teller effect.33–35 The structure of MnO2 undergoes a phase transition during successive charging/discharging cycles, and over time this structural change can degrade the MnO2 cathode material.36 Dopants can help stabilize the crystal structure of MnO2 over a long period and can improve its performance profile. Mo-doped γ-MnO2 shows superior rate capability and cycling stability, along with enhanced diffusion kinetics of ions and electrons.36 When aluminum (Al) is doped into birnessite-type δ-MnO2, Al can prevent structural collapse by minimizing the growth of microcracks during charging and discharging.37 Experiments have confirmed that doping magnesium (Mg) into tunnel-type α-MnO2 can reduce the reaction and diffusion resistance, improve the ion diffusion coefficient, and boost the stability of the crystal structure.38
Machine learning (ML), a modern mathematical framework, can be used to read, understand, and predict the complex internal connections between data points.39–41 With the development of advanced algorithms, ML models (MLMs) can explore hidden relationships in very large datasets within a short period of time, whereas these relationships cannot be discovered exactly using traditional trial-and-error methods.39–41 MLMs have been used in designing battery materials, analyzing the chemical compositions of the anode, cathode, and separator, optimizing battery performance, and predicting the health of the battery before fabrication and testing.42–45 MLMs can suggest new super-ion conductors and can predict the performance of solid electrolytes 109 times faster than density functional theory calculations, with a mean absolute error of 0.25 eV.46 A manually curated dataset has been used to train MLMs to predict the initial discharge capacity (DC) and the DC at the end of the 20th cycle.47 In that case, the MLMs predicted the initial DC with R2 = 0.53 and the DC at the end of the 20th cycle with R2 = 0.54.47 The presence of different dopants can change the electrochemical properties of the lithium, nickel, cobalt, and manganese in various cathode materials for LIBs.48 MLMs have been applied to study the relation between the DC and the structural and elemental features of different dopants. The gradient boosting model predicted the initial DC with R2 = 0.76 and the 50th-cycle DC with R2 = 0.64.48 The ML approach can differentiate the impact of different physical properties associated with dopants on metal-oxide-based photoelectric materials.49 ML is also used in the field of supercapacitors.50 Multilayer perceptron and random forest (RF) models have been used to classify the specific capacitance into four pre-defined grades.
Recently, four ML classifier models were considered to classify the DC of TiO2 anode materials in the presence of fourteen dopants for LIBs.51 For this purpose, 316 samples and eleven features associated with different published papers were considered. The gradient boosting model achieved an accuracy of 0.79 and a specificity of 0.90 for classification.51
Even though considerable effort has been invested in predicting the effect of dopants in many important LIB cathodes, no attempt has been made to use MLMs to predict the DC of the MnO2 cathode material for ZIBs with different dopants. The aim of the present work is to collect, from 57 published papers, data on dopants that can improve the performance of the MnO2 cathode material for ZIBs. We apply both classification and regression to our dataset using different MLMs, viz., RF,51 K-nearest neighbors (KNN),52 and XGBoost,53 to classify and predict the DC of doped MnO2.
Fig. 1 The workflow of the current work. Initially, the goal of our article is presented, and then data collection, feature extraction, model training and finally feature importance are discussed.
Covariates: elemental features | Abbreviation | Covariates: publication results | Abbreviation |
---|---|---|---|
Electronegativity of doped elements | EN | Atom ratio of dopant and Mn | Ratio |
Ionic radius (in pm) of the dopant | IR | Molar mass of the molecule | MMM |
State of the dopant | State | Lowest voltage during charging and discharging (V) | LV |
Number of electrons of the dopant | NED | Highest voltage during charging and discharging (V) | HV |
First ionization energy (in kJ mol−1) | FIE | Current density (A g−1) | Current |

Response variable | Abbreviation |
---|---|
Discharge capacity (mAh g−1) | DC |
We employed the Pearson correlation matrix (PCM) to examine relevant associations among features. The PCM quantifies pairwise linear relationships between variables and enables visual interpretation through a heatmap (see Fig. 2). Among the features, a moderate negative correlation is observed between current and DC (r = −0.5), indicating that DC decreases as the current density increases. This negative correlation likely reflects the effect of polarization within the battery, a well-known phenomenon seen predominantly at high charging/discharging rates.54 On the other hand, a weak positive correlation (r = 0.17) is observed between EN and DC, indicating that higher EN is weakly associated with increased DC for ZIBs. Similarly, the weak positive correlation (r = 0.17) between FIE and DC suggests that higher FIE values may modestly increase the DC of ZIBs. This could be explained by the dopant stabilization of the MnO2 crystal structure described earlier. The increased local charge in the vicinity of the dopant ions, which is strongly correlated with the electronegativity of the ion itself, increases the electrical conductivity of the bulk material, which would also be expected to improve DC.55 This effect is also expected to improve the stability of the crystal structure through suppressed Jahn–Teller distortion; so, although battery cycle life was not a feature captured in this study, it may be worth including in a future investigation. Of note, DC is expected to show a similar trend with both EN and FIE, since both reflect the attraction exerted by the nucleus on the outer electrons of the dopant; hence, dopants with higher electronegativities are harder to ionize and have larger FIE.
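As an illustration of how each entry of the PCM is obtained, the following minimal sketch computes the Pearson coefficient for a pair of features and assembles the pairwise matrix. The function names and the toy values are hypothetical and are not taken from the dataset of this work; only the standard library is used.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Pairwise Pearson matrix for a dict mapping feature name -> values."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy values (NOT from the paper's dataset): DC falls as the
# current density rises, so the coefficient comes out negative.
toy = {
    "Current": [0.1, 0.2, 0.5, 1.0, 2.0],
    "DC": [300.0, 280.0, 240.0, 200.0, 150.0],
}
pcm = correlation_matrix(toy)
```

Because the coefficient only measures linear association, a value near zero in such a matrix does not rule out a non-linear dependence.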
Strong inter-feature correlations can be observed for some variables, for example, between FIE and EN (r = 0.74). The ratio of dopant to Mn and MMM have a high positive correlation (r = 0.72), indicating that the dopant ratio partly determines the MMM. Highly correlated variables in the dataset give a more detailed picture of how the data points interact and can help determine which variables are important for changing the physical properties of the system. Features with low correlations may not provide essential information for a linear treatment of the dataset but could be important for nonlinear predictive MLMs. Overall, the correlation matrix provides a valuable first insight into feature interdependence and can guide the choice of MLMs. Histograms of the features are shown in Fig. S2–S4 (ESI†).
We tested three different models, viz., KNN, XGBoost, and RF, on the dataset for classification. Furthermore, we built regression models with KNN, RF, and XGBoost to predict the DC values from the given features. The dataset was randomly shuffled and split into a 15% holdout test set (i.e., 90 samples) and an 85% training set (i.e., 510 samples). Training followed stratified 10-fold cross-validation, while Bayesian optimization56 was used for hyper-parameter tuning of the individual models. The dataset and code for this article can be found at the following GitHub link: https://tinyurl.com/396jsa4t.
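The splitting scheme described above can be sketched as follows. This is a minimal standard-library illustration, not the code released with the article: the helper names, the seed, and the toy grade labels are assumptions, and the total of 600 samples is inferred from the 90/510 counts quoted in the text.

```python
import random
from collections import defaultdict

def shuffled_split(n_samples, test_fraction=0.15, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into (train, test)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

def stratified_folds(indices, labels, n_folds=10, seed=0):
    """Assign indices to folds so each fold keeps roughly the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(n_folds)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal class members round-robin, continuing where the last class stopped.
        for k, i in enumerate(members):
            folds[(offset + k) % n_folds].append(i)
        offset += len(members)
    return folds

# Hypothetical labels: 600 samples spread over three DC grades (0, 1, 2).
labels = [i % 3 for i in range(600)]
train_idx, test_idx = shuffled_split(600, test_fraction=0.15, seed=42)
folds = stratified_folds(train_idx, labels, n_folds=10, seed=42)
```

In each cross-validation round, one fold serves as the validation set and the remaining nine as the training set; the holdout indices are never used during tuning.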
MLMs | Accuracy | Precision | Recall | Specificity |
---|---|---|---|---|
KNN | 0.69 | 0.68 | 0.68 | 0.68 |
XGBoost | 0.70 | 0.70 | 0.70 | 0.70 |
RF | 0.72 | 0.74 | 0.72 | 0.72 |
XGBoost demonstrates consistent and balanced performance across all metrics, achieving a score of 0.70 for accuracy, precision, recall, and specificity. This uniformity suggests that XGBoost performs equally well in identifying both positive and negative cases, while maintaining a good balance between correctly predicting true positives and avoiding false positives. On the other hand, RF outperforms XGBoost across all metrics, with an accuracy of 0.72, a precision of 0.74, a recall of 0.72, and a specificity of 0.72. These higher scores indicate that RF is more effective overall, correctly predicting a larger proportion of both positive and negative cases. Its slightly higher recall and specificity also suggest better reliability in distinguishing the DC grades. While both models are consistent and reliable, RF offers superior predictive performance, making it the more robust option in this comparison. The confusion matrices and ROC curves for the different MLMs are shown in Fig. S5 and S6 (ESI†).
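For a three-grade problem, precision, recall, and specificity are defined per grade in a one-vs-rest manner and then averaged. The sketch below shows one common way to do this from a confusion matrix. Macro averaging is an assumption here, since the averaging scheme is not stated in the text, and the labels and helper names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def _ratio(num, den):
    """Safe division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def macro_metrics(matrix):
    """Accuracy plus macro-averaged precision, recall and specificity."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    precision = recall = specificity = 0.0
    for c in range(k):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # true c, predicted other
        fp = sum(matrix[r][c] for r in range(k)) - tp   # other, predicted c
        tn = total - tp - fn - fp
        precision += _ratio(tp, tp + fp) / k
        recall += _ratio(tp, tp + fn) / k
        specificity += _ratio(tn, tn + fp) / k
    accuracy = _ratio(sum(matrix[c][c] for c in range(k)), total)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity}

# Hypothetical grade labels (NOT results from this work).
y_true = ["low", "low", "mid", "mid", "high", "high"]
y_pred = ["low", "mid", "mid", "mid", "high", "low"]
cm = confusion_matrix(y_true, y_pred, ["low", "mid", "high"])
scores = macro_metrics(cm)
```

A weighted average over grades would give different numbers when the grades are imbalanced, so the averaging choice should be reported alongside the scores.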
MLMs | Train R2 | Train adjusted R2 | Train RMSE (mAh g−1) | Train MAE (mAh g−1) | Test R2 | Test adjusted R2 | Test RMSE (mAh g−1) | Test MAE (mAh g−1) |
---|---|---|---|---|---|---|---|---|
KNN | 0.97 | 0.97 | 12.99 | 4.27 | 0.84 | 0.82 | 33.70 | 23.91 |
RF | 0.91 | 0.91 | 25.56 | 17.35 | 0.86 | 0.84 | 31.64 | 23.81 |
XGBoost | 0.97 | 0.96 | 15.47 | 9.38 | 0.92 | 0.91 | 24.39 | 16.23 |
The KNN model shows strong performance on the training data, with an R2 of 0.97 and very low error values (an RMSE of 12.99 mAh g−1 and an MAE of 4.27 mAh g−1), indicating an excellent fit. However, its performance drops noticeably on the test set, where R2 decreases to 0.84 and the error values increase significantly (an RMSE of 33.70 mAh g−1 and an MAE of 23.91 mAh g−1). This gap between training and testing results suggests that the KNN model captures noise in the training data and fails to generalize well to unseen data.
When comparing RF and XGBoost, both models show good generalization, but XGBoost clearly performs better overall. While RF achieves an R2 of 0.91 on the training set and 0.86 on the test set, XGBoost posts higher values with an R2 of 0.97 during training and 0.92 in testing. XGBoost also records lower error metrics across the board, especially on the test set, where its RMSE (24.39 mAh g−1) and MAE (16.23 mAh g−1) outperform RF's RMSE (31.64 mAh g−1) and MAE (23.81 mAh g−1). These results indicate that XGBoost not only fits the training data well but also generalizes more effectively, making it the more accurate and reliable model in comparison. From the perspective of ZIB research, these findings associated with different MLMs can be applied to determine the DC of the doped MnO2 according to the different related experimental and elemental features before synthesizing the materials in the laboratory. In this context, XGBoost would be the optimal choice for designing the doped MnO2. The comparison of actual vs. predicted DC (mAh g−1) using three different regression models can be seen in Fig. S7 (ESI†).
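The four regression metrics compared above can be computed as in the following minimal sketch. The function names and the toy values are hypothetical; the adjusted R2 uses the standard formula 1 − (1 − R2)(n − 1)/(n − p − 1), which is assumed rather than confirmed to be the one used in this work. With the 90 test samples and ten input features stated in the text, that formula maps R2 = 0.92 to about 0.91.

```python
import math

def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R2 for n_samples observations and n_features predictors."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

def regression_metrics(y_true, y_pred, n_features):
    """Return R2, adjusted R2, RMSE and MAE for paired true/predicted values."""
    n = len(y_true)
    if n != len(y_pred) or n == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    r2 = 1 - ss_res / ss_tot
    return {
        "r2": r2,
        "adjusted_r2": adjusted_r2(r2, n, n_features),
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }

# Hypothetical DC values in mAh g-1 (NOT from the paper's dataset).
toy = regression_metrics([100.0, 200.0, 300.0, 400.0],
                         [110.0, 190.0, 310.0, 390.0], n_features=1)

# Adjusted R2 implied by the reported test R2 of XGBoost (n = 90, p = 10).
xgb_test_adjusted = adjusted_r2(0.92, 90, 10)
```

Since RMSE squares the residuals before averaging, it penalizes large errors more heavily than MAE, which is why the two can rank models differently.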
EN exhibits a bifurcated effect, where both high and low values influence DC in varying directions, indicating complex interactions. IR makes an intermediate contribution to the predicted DC of ZIBs. In contrast, features like “State” exhibit SHAP values clustered close to zero, signifying a minimal effect on the output. From a practical standpoint, the XGBoost model effectively learns from the data, handles non-linear effects, and captures meaningful relationships among features, while mitigating over-fitting. The partial-dependence plots in Fig. S8 (ESI†) illustrate its ability to model non-linear patterns. We calculated the SHAP values using TreeSHAP,57,58 which inherently accounts for feature dependencies as captured by the model's structure and can be interpreted as interventional Shapley values.59 However, it is important to note that TreeSHAP primarily addresses dependencies learned by the model and may still misestimate contributions if strong correlations are not explicitly reflected in the tree splits.60 Overall, this SHAP analysis confirms that the XGBoost model's predictions are driven by features with clear physical and electrochemical significance, enhancing both the model's reliability and interpretability. The outcomes of Fig. 4 are essential to understanding the underlying factors that determine DC and can guide the selection of features for designing new doped MnO2.
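To make the notion of interventional Shapley values concrete, the sketch below computes them exactly by brute force for a small toy model. This is not TreeSHAP: it enumerates every feature subset and is only feasible for a handful of features. The toy model, its coefficients, and the background rows are hypothetical and are not fitted to the data of this work.

```python
from itertools import combinations
from math import factorial

def interventional_value(model, x, background, subset):
    """Mean prediction with features in `subset` fixed to x, the rest drawn from background rows."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)

def shapley_values(model, x, background):
    """Exact Shapley values of each feature of x (cost grows as 2**n_features)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                gain = (interventional_value(model, x, background, s | {i})
                        - interventional_value(model, x, background, s))
                phi[i] += weight * gain
    return phi

def toy_model(features):
    """Hypothetical DC model: falls with current, interacts with EN, ignores State."""
    current, en, state = features
    return 300.0 - 80.0 * current + 20.0 * en * (1.0 - current) + 0.0 * state

# Hypothetical background rows and query point: [Current, EN, State].
background = [[0.1, 1.5, 0.0], [0.5, 1.9, 1.0], [1.0, 1.6, 0.0], [2.0, 2.2, 1.0]]
x = [0.2, 2.0, 1.0]
phi = shapley_values(toy_model, x, background)
```

The contributions sum to the difference between the prediction for `x` and the mean prediction over the background rows, and a feature the model ignores (here `State`) receives a value of zero, mirroring the near-zero SHAP values described above.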
• The electrolyte acts as the bridge between the anode and the cathode, and different electrolytes can significantly change battery performance. However, our data collection did not include any electrolyte-related information.
• The anode materials were not the same across the collected studies, and the performance of the anode material can also change the DC.
• The proportions of conductive additive and binder used in electrode preparation can significantly change battery performance, and accounting for them accurately remains a challenge in the current study.
• We did not categorize the data by the phase of MnO2. In addition, even though the cathode material is fixed, it remains challenging to fully account for the particle size of the cathode material.
• The current collector or substrate is not the same for all the cells.
• At present, no external datasets match our specific feature definitions and experimental protocols. Although our internal validation approach is rigorous and supports confidence in the results, validation using independent datasets in future studies will be essential to confirm and further strengthen the generalizability of our method.
When applying MLMs to datasets, it is essential that the data be collected under the same conditions. Most of the time, it is very difficult to find common features across different papers. To resolve this problem, multiple experiments can be conducted in the same environment and under the same conditions.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5cp01218j
This journal is © the Owner Societies 2025 |