Open Access Article
Sung-Jin Kim (a)*, Song-Sae Kang (b), Kyong-Nam Pae (c), Song-Il Pak (b), Hyon-Il Jo (b) and Ryong-Jin Kim (a)
(a) Faculty of Materials Science, Kim Il Sung University, Pyongyang 497335, Democratic People's Republic of Korea. E-mail: ksj1223@163.com
(b) Faculty of Information Engineering, Pyongyang HanTokSu University of Light Industry, Pyongyang 999093, Democratic People's Republic of Korea
(c) Faculty of Physical Engineering, Kim Chaek University of Technology, Pyongyang 950003, Democratic People's Republic of Korea
First published on 13th April 2026
Efficient recovery of copper from metallurgical waste is essential for sustainable resource utilization. This study develops an interpretable machine learning framework to predict copper leaching efficiency from copper slag under oxidative sulfuric acid conditions. A multi-source dataset of 465 experimentally reported data points was compiled from peer-reviewed literature. Four algorithms (Random Forest, Support Vector Regression, XGBoost, and LightGBM) were systematically optimized using 10-fold cross-validation. XGBoost delivered the best predictive performance on the test set, with R² = 0.9794, RMSE = 3.4757, and MAE = 2.3442. SHAP-based interpretability analysis revealed that operational parameters, particularly leaching time, acid concentration, and temperature, dominate copper extraction, whereas compositional variables such as Si, S, and Al contribute little directly within the investigated dataset range. The nonlinear trends identified are consistent with shrinking-core kinetics and diffusion-controlled mechanisms. External validation using independent literature datasets confirmed robust generalization capability. The proposed framework provides quantitative guidance for process optimization and a practical tool for enhancing sustainable metal recovery from metallurgical waste.
Traditional investigations of copper slag leaching primarily rely on controlled laboratory experiments to evaluate the influence of individual process parameters.7–9 While these studies provide valuable mechanistic insights, they often focus on limited experimental ranges and isolated variable effects.10 The inherent multivariable coupling in leaching systems, including reaction–diffusion interactions and compositional heterogeneity, is difficult to fully capture using conventional regression or empirical modeling approaches.11,12 Moreover, inconsistencies in experimental conditions across published studies hinder the development of generalized predictive relationships.
Machine learning (ML) techniques have recently emerged as powerful tools for modeling complex nonlinear systems in metallurgical and hydrometallurgical processes.13–17 Algorithms such as Random Forest, gradient boosting methods, and support vector regression have demonstrated strong predictive capability in various materials-processing applications.11,18,19
Recently, Weng et al. (2025) reported a machine-learning-assisted framework for copper slag acid leaching, identifying Random Forest as the optimal predictive model and demonstrating the feasibility of integrating literature-derived data for process optimization.20 Their work represents an important step toward data-driven hydrometallurgy. Building on this foundation, the present study employs a larger dataset and an optimized predictive model. A total of 465 data points were compiled from multiple literature sources, substantially more than the 265 data points used in ref. 20. On this dataset, the optimized XGBoost model achieved R² = 0.9794, RMSE = 3.4757, and MAE = 2.3442 on the test set, compared with R² = 0.91, RMSE = 7.492, and MAE = 5.681 reported in ref. 20. Because the two studies use different datasets and test splits, this comparison is indicative rather than a direct head-to-head benchmark.
Nevertheless, several aspects remain to be further explored. The dataset size in previous studies remains relatively moderate, and interpretability analysis has largely relied on conventional feature-importance metrics and partial dependence plots.21 More importantly, systematic validation using independent literature sources not involved in model training has not been rigorously implemented. These factors may limit model generalization capability and reduce mechanistic transparency in practical applications.
To address these challenges, the present study develops an expanded and interpretable machine-learning framework for predicting copper leaching efficiency from copper slag under sulfuric acid oxidative conditions. A comprehensive multi-source dataset comprising 465 experimental data points was compiled from peer-reviewed literature, covering broad compositional and operational ranges. Four supervised learning algorithms, namely Random Forest, Support Vector Regression, XGBoost, and LightGBM, were systematically optimized using 10-fold cross-validation and comparatively evaluated.
To enhance mechanistic interpretability, SHapley Additive exPlanations (SHAP) were employed to quantify both global and local feature contributions and to elucidate nonlinear interactions among variables. Furthermore, independent datasets obtained from separate literature sources not included in model training were utilized to rigorously assess model robustness and generalization capability.
By integrating multi-source literature data harmonization, advanced ensemble learning, SHAP-based interpretation, and external validation, this work aims to establish a predictive and mechanistically meaningful framework for copper slag hydrometallurgy. The proposed approach not only improves predictive accuracy but also provides actionable insights for process optimization and preliminary industrial design.
To ensure mechanistic consistency, studies involving chloride systems, bioleaching processes, reductive environments, or multi-acid mixed systems were excluded. Only sulfuric acid leaching under oxidative conditions was considered, thereby minimizing variability in reaction pathways and maintaining physicochemical comparability across the compiled data.
After systematic screening and filtering, a total of 465 valid experimental data points were obtained. The final dataset comprises six compositional variables of copper slag (Cu, Fe, Si, S, Zn, and Al contents, wt%) and six operational parameters: particle size (PS, µm), leaching time (LT, min), oxygen pressure (OP, kPa), pulp density (PD, g L−1), temperature (T, °C), and sulfuric acid concentration (AC, mol L−1). The target variable is copper leaching efficiency (LE, %).
Although stirring speed is recognized as an important hydrodynamic parameter affecting external mass transfer and diffusion behavior during hydrometallurgical leaching processes, it was not included as an input variable in the present modeling framework. In heterogeneous leaching systems, agitation influences the thickness of the liquid boundary layer surrounding solid particles and therefore may affect the external mass transfer coefficient. However, in the literature sources used to construct the present dataset, stirring speed was not consistently reported or standardized across studies. Incorporating this variable would therefore have significantly reduced the number of usable data points and weakened the statistical representativeness of the compiled dataset.
Furthermore, most of the experimental studies included in this dataset were conducted under sufficiently vigorous agitation conditions intended to minimize external diffusion limitations and ensure homogeneous suspension of slag particles. Under such conditions, leaching kinetics are generally dominated by intrinsic reaction processes or internal diffusion within product layers rather than by external film diffusion. Consequently, the omission of stirring speed is not expected to fundamentally alter the predictive relationships captured by the machine learning models within the investigated parameter space. Nevertheless, the potential influence of hydrodynamic conditions should be considered when extrapolating the model predictions to industrial-scale reactors or systems operating under different agitation regimes.
In most of the collected studies, particle size was reported as a size interval (e.g., 38–75 µm) rather than a single representative value. To ensure numerical consistency and enable quantitative modeling, the arithmetic mean of each reported size range was adopted as the representative particle size. This treatment assumes an approximately uniform particle size distribution within the specified interval, which is a common approximation in leaching studies. Although this simplification may introduce minor uncertainty associated with intra-range distribution effects, it allows preservation of a large and diverse dataset while maintaining modeling feasibility.
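The midpoint rule described above is straightforward to automate. The Python sketch below is illustrative only: the helper name `size_range_midpoint` is hypothetical, and the source does not describe how its values were transcribed. The helper extracts the two bounds of a reported interval and returns their arithmetic mean. One-sided sieve fractions such as "<75 µm" are rejected rather than guessed, because the source does not state how such entries were handled.

```python
import re


def size_range_midpoint(label: str) -> float:
    """Representative particle size (µm) for a reported size label.

    Hypothetical helper: "38–75 µm" -> 56.5, "45 µm" -> 45.0.
    One-sided sieve cuts ("<75 µm", "-75 µm") raise ValueError, because a
    midpoint cannot be defined for them without an extra assumption.
    """
    text = label.strip()
    if text.startswith(("<", ">", "-", "−", "+")):
        raise ValueError(f"one-sided size fraction: {label!r}")
    # Extract the numeric tokens; the separator (en dash, hyphen, "to") is irrelevant.
    numbers = [float(tok) for tok in re.findall(r"\d+(?:\.\d+)?", text)]
    if len(numbers) == 1:
        return numbers[0]
    if len(numbers) != 2:
        raise ValueError(f"cannot parse size interval: {label!r}")
    low, high = sorted(numbers)
    # Arithmetic mean of the bounds, consistent with assuming a roughly
    # uniform size distribution inside the interval.
    return (low + high) / 2.0
```

For example, `size_range_midpoint("38–75 µm")` returns 56.5, which is the representative value the text's example interval would receive.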
All variables were harmonized to ensure unit consistency. Acid concentrations originally reported in g L−1 were converted into mol L−1 based on molar mass; oxygen pressure values were standardized to kPa; and temperature values were unified in °C. Where necessary, weight percentages were converted into consistent wt% representation. This harmonization procedure minimizes systematic bias arising from inconsistent reporting formats across different literature sources.
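A minimal sketch of the conversions named above follows. It assumes a molar mass of 98.079 g/mol for H2SO4 and standard pressure factors, and all function names are hypothetical. Whether a reported oxygen pressure is a total pressure or an O2 partial pressure must be checked source by source, so the sketch converts units only.

```python
H2SO4_MOLAR_MASS = 98.079  # g/mol

# Multiplicative factors that convert each pressure unit to kPa.
PRESSURE_TO_KPA = {
    "kPa": 1.0,
    "MPa": 1000.0,
    "bar": 100.0,
    "atm": 101.325,
    "psi": 6.894757,
}


def acid_to_mol_per_l(value: float, unit: str) -> float:
    """Sulfuric acid concentration in mol/L from mol/L or g/L."""
    if unit == "mol/L":
        return value
    if unit == "g/L":
        return value / H2SO4_MOLAR_MASS
    raise ValueError(f"unsupported acid unit: {unit}")


def pressure_to_kpa(value: float, unit: str) -> float:
    """Pressure in kPa; raises KeyError for an unknown unit."""
    return value * PRESSURE_TO_KPA[unit]


def temperature_to_celsius(value: float, unit: str) -> float:
    """Temperature in °C from °C, K or °F."""
    if unit == "°C":
        return value
    if unit == "K":
        return value - 273.15
    if unit == "°F":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError(f"unsupported temperature unit: {unit}")
```

For instance, 49 g/L of H2SO4 converts to roughly 0.50 mol/L, and a 2 bar oxygen pressure becomes 200 kPa.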
Entries with incomplete information for any of the selected variables were excluded to avoid uncertainty introduced by data imputation. Outliers were evaluated using the interquartile range (IQR) method. However, extreme values were retained if they corresponded to experimentally valid conditions reported in the original studies, ensuring that physically meaningful high-temperature or high-pressure conditions were not artificially removed.
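The screening rule described above can be sketched as follows. The first step flags values outside Tukey's fences. The second step keeps any flagged value that was confirmed as experimentally valid. The multiplier k = 1.5 is the conventional choice and is an assumption, because the source does not state its value. The per-row `reported_valid` flag is a hypothetical stand-in for the manual check against the original studies.

```python
import numpy as np


def iqr_outlier_mask(values, k=1.5):
    """True where a value lies outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def keep_mask(values, reported_valid, k=1.5):
    """Keep every non-outlier, plus any IQR outlier confirmed valid in its source."""
    flagged = iqr_outlier_mask(values, k)
    return ~flagged | np.asarray(reported_valid, dtype=bool)
```

As an illustration using the final-dataset quartiles in Table 1, the upper fence for oxygen pressure is 600 + 1.5 × 579 = 1468.5 kPa. The 2000 kPa high-pressure runs would therefore be flagged by the IQR rule and then retained as experimentally valid.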
Descriptive statistics of the processed dataset are summarized in Table 1, and variable distributions are illustrated in Fig. 1. The dataset exhibits broad distributions in oxygen pressure (21–2000 kPa), temperature (24–200 °C), pulp density (6.93–400 g L−1), and leaching time (5–120 min), indicating substantial variability in experimental conditions. Such diversity is expected to improve the robustness and generalization capability of the resulting machine learning models.
Table 1 Descriptive statistics of the processed dataset

| Statistic | Cu (wt%) | Fe (wt%) | Si (wt%) | S (wt%) | Zn (wt%) | Al (wt%) | PS (µm) | LT (min) | OP (kPa) | PD (g L−1) | T (°C) | AC (mol L−1) | LE (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Count | 465 | 465 | 465 | 465 | 465 | 465 | 465 | 465 | 465 | 465 | 465 | 465 | 465 |
| Mean | 1.18 | 41.47 | 12.42 | 1.88 | 4.06 | 1.91 | 69.01 | 45.41 | 364.81 | 59.63 | 127.93 | 0.99 | 64.40 |
| Std | 0.59 | 4.35 | 2.08 | 5.02 | 2.01 | 0.43 | 35.75 | 26.19 | 383.50 | 49.14 | 69.31 | 0.82 | 23.51 |
| Min | 0.64 | 20.70 | 10.12 | 0.59 | 1.20 | 1.43 | 19.00 | 5.00 | 21.00 | 6.93 | 24.00 | 0.00 | 8.00 |
| 25% | 0.64 | 41.36 | 10.12 | 0.98 | 2.02 | 1.43 | 41.50 | 25.00 | 21.00 | 10.04 | 60.00 | 0.40 | 48.51 |
| 50% | 0.64 | 41.36 | 14.23 | 0.98 | 5.60 | 2.24 | 75.00 | 45.00 | 500.00 | 100.00 | 155.00 | 0.40 | 68.35 |
| 75% | 1.84 | 43.76 | 14.23 | 1.21 | 5.60 | 2.24 | 75.00 | 60.00 | 600.00 | 100.00 | 200.00 | 2.00 | 84.03 |
| Max | 1.84 | 43.76 | 15.37 | 32.70 | 8.90 | 2.63 | 230.00 | 120.00 | 2000.00 | 400.00 | 200.00 | 2.50 | 99.00 |
For model development, the dataset was randomly divided into training (80%) and testing (20%) subsets. Hyperparameter optimization was conducted using 10-fold cross-validation to improve statistical reliability and mitigate overfitting. A fixed random seed was applied to ensure reproducibility of all modeling results. For tree-based models (RF, XGBoost, and LightGBM), raw feature values were directly used without scaling. For the SVR model, input features were standardized using Z-score normalization to enhance numerical stability and convergence performance.
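The split-and-scale protocol above can be sketched with scikit-learn as follows. This is an illustration only: the source says just that "Python-based machine learning libraries" were used, and the seed value of 42 is hypothetical. Placing the Z-score scaler inside a pipeline means that, in every cross-validation split, the scaler is fit on the training folds only. Tree-based models receive the raw features.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

SEED = 42  # hypothetical value; the source states only that a fixed seed was used


def build_models():
    """Tree models take raw features; SVR gets Z-score scaling inside a pipeline."""
    return {
        "RF": RandomForestRegressor(n_estimators=100, random_state=SEED),
        # Inside a pipeline the scaler is re-fit on the training folds of each
        # CV split, so validation folds never leak into the mean/std estimates.
        "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
    }


def split_and_cross_validate(X, y):
    """80/20 split, then 10-fold CV (R²) on the training portion only."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=SEED
    )
    cv = KFold(n_splits=10, shuffle=True, random_state=SEED)
    cv_r2 = {
        name: cross_val_score(model, X_train, y_train, cv=cv, scoring="r2").mean()
        for name, model in build_models().items()
    }
    return (X_train, X_test, y_train, y_test), cv_r2
```

With 465 samples, this split would yield 372 training and 93 test points. The held-out test subset is not touched until the final evaluation.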
Although the dataset size is moderate compared with large industrial datasets, to the best of our knowledge it represents one of the most comprehensive harmonized collections of copper slag leaching data currently available in the open literature.
The compositional variables include Cu, Fe, Si, S, Zn, and Al contents (wt%). These variables reflect the mineralogical and chemical characteristics of the slag matrix. The Cu content represents the primary recoverable metal phase and directly affects the theoretical leaching capacity. Fe content is associated with fayalite and magnetite phases, which influence slag structure and acid consumption. Si content is related to silicate network stability and may affect slag dissolution resistance. Sulfur content can indicate the presence of sulfide phases, potentially altering oxidative leaching behavior. Zn and Al are considered secondary metallic components that may influence solution chemistry and acid consumption during leaching.
The operational parameters include particle size, leaching time, oxygen pressure, pulp density, temperature, and sulfuric acid concentration. Particle size affects the available reactive surface area and internal diffusion distance. Leaching time reflects the progression of reaction kinetics. Oxygen pressure governs oxidative conditions and influences redox reactions, particularly for sulfide-containing phases. Pulp density controls the solid–liquid ratio and mass transfer behavior. Temperature affects both reaction rates and diffusion coefficients. Acid concentration determines proton availability and the dissolution driving force.
The target variable is copper leaching efficiency (%), defined as the percentage of copper dissolved relative to its initial content in the slag. This metric directly reflects process performance and recovery efficiency.
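Written out, the definition above takes the following form. It assumes that dissolved copper is determined from the leachate concentration and volume, although the source does not state how each compiled study measured it. Here $c_{\mathrm{Cu}}$ (g L−1) is the copper concentration in the leachate, $V$ (L) is the solution volume, $w_{\mathrm{Cu}}$ (wt%) is the copper content of the slag, $m_{\mathrm{slag}}$ (g) is the slag mass, and $\mathrm{PD} = m_{\mathrm{slag}}/V$:

$$\mathrm{LE} = \frac{m_{\mathrm{Cu,\,dissolved}}}{m_{\mathrm{Cu,\,initial}}}\times 100 = \frac{c_{\mathrm{Cu}}\,V}{(w_{\mathrm{Cu}}/100)\,m_{\mathrm{slag}}}\times 100 = \frac{c_{\mathrm{Cu}}}{(w_{\mathrm{Cu}}/100)\,\mathrm{PD}}\times 100$$

As a worked illustration, take the mean slag copper content from Table 1 ($w_{\mathrm{Cu}}$ = 1.18 wt%), a pulp density of 100 g L−1, and a hypothetical leachate concentration of 0.76 g L−1. These give LE = 0.76 / (0.0118 × 100) × 100 ≈ 64.4%. The last form neglects changes in solution volume and sampling losses.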
Although certain variables exhibit statistical correlations (Fig. 2), no feature elimination was performed prior to model development. The predictive accuracy of tree-based models is largely insensitive to multicollinearity, and retaining all physically meaningful variables preserves mechanistic interpretability. Feature importance and interaction effects were subsequently analyzed using SHAP to further elucidate variable contributions. Because correlated features can share attribution, the SHAP importances of strongly correlated variables should be interpreted jointly rather than in isolation.
The chosen input–output framework thus integrates intrinsic material characteristics and key thermodynamic-kinetic operating parameters, enabling development of predictive yet physically interpretable machine learning models for copper slag leaching.
In previous studies addressing copper slag leaching and related hydrometallurgical processes, RF has been reported as one of the most suitable models for capturing nonlinear relationships between compositional factors and process variables.20 Its robustness to multicollinearity and noise makes it particularly appropriate for literature-compiled datasets with heterogeneous experimental conditions. In this study, key hyperparameters including the number of trees (n_estimators) and maximum tree depth (max_depth) were optimized.
In recent years, XGBoost has been increasingly adopted in hydrometallurgical and leaching-related studies owing to its strong nonlinear learning capability and its often higher predictive accuracy than traditional ensemble models.32–34 Its ability to capture complex interactions among compositional and operational variables makes it particularly suitable for multi-factor copper slag leaching systems. In this work, hyperparameters such as maximum tree depth (max_depth), learning rate (η), and number of estimators (n_estimators) were optimized using 10-fold cross-validation.
All models were implemented using Python-based machine learning libraries. Hyperparameter optimization was performed via grid search combined with 10-fold cross-validation on the training dataset. The final predictive performance was evaluated using the independent testing subset.
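A minimal sketch of grid search with 10-fold cross-validation on the training set is shown below, using scikit-learn's `GridSearchCV`. Several elements are assumptions: the grid values are placeholders (the ranges actually searched are listed in Table S1), the seed is hypothetical, and selecting configurations by cross-validated RMSE is assumed because the source does not state the selection criterion.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

SEED = 42  # hypothetical; the source reports only that a fixed seed was used

# Placeholder grid; the search ranges actually used are listed in Table S1.
RF_GRID = {"n_estimators": [100, 300, 500], "max_depth": [None, 5, 10, 20]}


def grid_search_10fold(estimator, param_grid, X_train, y_train):
    """Exhaustive grid search scored by 10-fold CV on the training set only."""
    search = GridSearchCV(
        estimator,
        param_grid=param_grid,
        cv=KFold(n_splits=10, shuffle=True, random_state=SEED),
        scoring="neg_root_mean_squared_error",  # assumption: select by CV RMSE
        refit=True,  # refit the best configuration on the full training set
    )
    search.fit(X_train, y_train)
    # best_score_ is the negated RMSE, so flip its sign for reporting.
    return search.best_estimator_, search.best_params_, -search.best_score_
```

For example, `grid_search_10fold(RandomForestRegressor(random_state=SEED), RF_GRID, X_train, y_train)` tunes the RF. The same pattern applies to XGBoost through its scikit-learn wrapper `xgboost.XGBRegressor`, with grid keys `max_depth`, `learning_rate`, and `n_estimators`.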
By incorporating bagging-based, kernel-based, and boosting-based algorithms, the modeling framework enables systematic performance comparison and ensures robustness of the predictive conclusions.
The coefficient of determination (R²) measures the proportion of variance in the observed data explained by the model and is defined as:
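$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{1}$$

where $y_i$ and $\hat{y}_i$ are the observed and predicted copper leaching efficiencies of the $i$-th sample, $\bar{y}$ is the mean of the observed values, and $n$ is the number of samples.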
RMSE is the square root of the mean squared prediction error. It is expressed in the same units as LE and, because errors are squared, it emphasizes large deviations:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{2}$$
MAE represents the average absolute deviation between predicted and observed values:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert y_i - \hat{y}_i\right\rvert \tag{3}$$
While R² evaluates overall goodness of fit, RMSE penalizes large errors more heavily, and MAE gives the average prediction deviation in the same units as the target. Used together, these metrics provide a balanced performance assessment.
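Eqs. (1)–(3) translate directly into a few lines of NumPy. The function name below is hypothetical, and the sketch is a plain transcription of the definitions rather than the authors' evaluation code.

```python
import numpy as np


def regression_metrics(y_obs, y_pred):
    """R², RMSE and MAE as defined in Eqs. (1)-(3)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_obs - y_pred
    ss_res = np.sum(residuals ** 2)               # residual sum of squares
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)  # total sum of squares
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt(np.mean(residuals ** 2))),
        "MAE": float(np.mean(np.abs(residuals))),
    }
```

For example, `regression_metrics([50, 60, 70, 80], [52, 58, 71, 79])` gives R2 = 0.98, RMSE ≈ 1.581, and MAE = 1.5. RMSE exceeds MAE here because squaring weights the two 2-point errors more heavily than the two 1-point errors.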
To mitigate overfitting, hyperparameter optimization was conducted using 10-fold cross-validation on the training dataset. Model performance was subsequently evaluated using an independent testing subset not involved in model training or tuning. This separation ensures an unbiased estimation of predictive capability.
Beyond predictive accuracy, model interpretability was investigated using SHapley Additive exPlanations (SHAP).36 SHAP values are derived from cooperative game theory. Each feature's SHAP value is its marginal contribution to the model prediction, averaged over all possible coalitions of the other features, and the SHAP values of a sample sum to the difference between its prediction and the expected model output. Global feature importance was evaluated through mean absolute SHAP values, allowing ranking of influential variables. Additionally, SHAP dependence plots were employed to explore nonlinear relationships and interaction effects between input variables and copper leaching efficiency.
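To make the coalition-averaging definition concrete, the sketch below computes exact interventional Shapley values by brute-force enumeration of feature subsets. Absent features are filled in from a background sample. This approach is practical only for a handful of features, and the function name is hypothetical. For tree ensembles such as XGBoost, the shap library's `TreeExplainer` computes such values efficiently, and the code below illustrates only the definition, not the authors' pipeline.

```python
import itertools
import math

import numpy as np


def exact_shap(predict, x, background):
    """Exact interventional Shapley values for one sample x.

    v(S) is the mean prediction over background rows in which the features
    in S are replaced by the values in x. Cost grows as 2**M in the number
    of features M.
    """
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    m = x.shape[0]

    def value(subset):
        z = background.copy()
        if subset:
            z[:, list(subset)] = x[list(subset)]
        return float(np.mean(predict(z)))

    phi = np.zeros(m)
    for j in range(m):
        others = [k for k in range(m) if k != j]
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M! for coalitions of this size.
            weight = math.factorial(size) * math.factorial(m - size - 1) / math.factorial(m)
            for subset in itertools.combinations(others, size):
                phi[j] += weight * (value(subset + (j,)) - value(subset))
    return phi
```

Consider a linear test model with coefficients (2, −1, 0). The third feature, which the model never uses, receives a Shapley value of exactly zero. This is the sense in which a variable can end up with a zero mean absolute SHAP value. For any model, the values also sum to the prediction minus the mean background prediction.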
Partial dependence behavior was further analyzed to examine how variations in individual features influence predicted leaching efficiency while averaging out other variables. This approach enables identification of threshold effects, nonlinear transitions, and potential synergistic interactions among operational parameters.
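The "averaging out other variables" step can be sketched in a few lines. Each chosen feature is fixed at a grid value for every row, the model is applied, and the predictions are averaged. The function name is hypothetical. scikit-learn's `sklearn.inspection.partial_dependence` offers a library implementation of the same idea.

```python
import numpy as np


def partial_dependence_1d(predict, X, feature, grid):
    """Mean model prediction with column `feature` fixed at each grid value.

    All other columns keep their observed values, so their effect is
    averaged over the empirical data distribution.
    """
    X = np.asarray(X, dtype=float)
    curve = []
    for value in grid:
        X_fixed = X.copy()
        X_fixed[:, feature] = value
        curve.append(float(np.mean(predict(X_fixed))))
    return np.asarray(curve)
```

Because the grid value is imposed on every row, strongly correlated features can produce combinations that never occur in the data, such as a very short leaching time paired with conditions only reported for long runs. Partial dependence curves should therefore be read with the correlation structure in Fig. 2 in mind.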
To assess model generalization, additional validation analyses were performed under distinct operating condition subsets. This case-based evaluation examines the robustness of the selected optimal model when applied to different compositional or process regimes, thereby strengthening confidence in its practical applicability.
Through the integration of quantitative performance metrics and interpretable machine learning techniques, the evaluation framework not only identifies the most accurate predictive model but also provides mechanistic insights into the governing factors of copper slag leaching.
The copper content in the slag varies from 0.64% to 1.84%, with a mean value of 1.18% and a standard deviation of 0.59%. This indicates that the dataset encompasses both low-grade and relatively enriched copper slags. However, the first quartile coincides with the minimum and the third quartile with the maximum (Table 1), which indicates that most samples derive from a small number of distinct slag compositions. Iron content varies little relative to its mean (20.70–43.76%, mean 41.47%, standard deviation 4.35%, interquartile range 41.36–43.76%), reflecting the dominance of iron-bearing phases such as fayalite in copper slag matrices. Silicon content ranges from 10.12% to 15.37%, suggesting variability in silicate network structure that may influence dissolution resistance.
Sulfur content spans a much wider range (0.59–32.7%), although the interquartile range (0.98–1.21%) shows that most samples contain low sulfur levels and suggests that only a small fraction of samples are sulfur-rich. This variability may reflect differences in slag production routes and residual sulfide phases. Zinc (1.20–8.90%) and aluminum (1.43–2.63%) contents exhibit moderate dispersion, potentially influencing acid consumption and secondary dissolution reactions.
The violin plots in Fig. 1 reveal that several compositional variables exhibit skewed rather than symmetric, normal distributions. Such non-normality favors modeling approaches that make no distributional assumptions about the inputs, such as the tree-based ensembles used here.
Operational parameters exhibit substantial variability across studies. Particle size ranges from 19 to 230 µm (mean 69.01 µm), reflecting differences in grinding conditions and liberation levels. Leaching time varies from 5 to 120 min (mean 45.41 min), covering both early-stage and near-equilibrium dissolution regimes.
Oxygen pressure displays a particularly broad distribution (21–2000 kPa) with a high standard deviation (383.50 kPa), indicating that both atmospheric and high-pressure oxidative leaching conditions are represented. The minimum of 21 kPa corresponds approximately to the partial pressure of oxygen in air at atmospheric pressure. Temperature ranges from 24 to 200 °C (mean 127.93 °C), spanning mild to hydrothermal regimes. Pulp density varies widely (6.93–400 g L−1), suggesting diverse solid–liquid interaction intensities across the dataset. Acid concentration covers 0–2.50 mol L−1 (mean 0.99 mol L−1), indicating both dilute and moderately concentrated leaching environments.
The wide distribution ranges of these operational parameters, as visualized in Fig. 1, demonstrate substantial experimental diversity. Such variability is expected to improve the robustness of the predictive modeling framework by exposing the algorithms to a wide range of reaction regimes.
Copper leaching efficiency (LE) ranges from 8% to 99%, with a mean value of 64.40% and a standard deviation of 23.51%. The broad distribution of LE indicates that the dataset includes both low-efficiency and near-complete extraction scenarios. This diversity is advantageous for regression modeling, as it prevents bias toward a narrow operational window and improves the ability to learn nonlinear trends.
Spearman correlation coefficients between input variables and copper leaching efficiency are presented in Fig. 2. The nonparametric Spearman method was adopted due to the non-normal distribution characteristics observed in several variables.
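Spearman's coefficient is the Pearson correlation of ranks. Because Table 1 shows many repeated values (several quartiles coincide), tied observations should share their average rank. The NumPy sketch below implements exactly that. The function names are hypothetical, and `scipy.stats.spearmanr` is a library alternative.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace each rank by the average rank of all entries with the same value.
    _, inverse = np.unique(x, return_inverse=True)
    inverse = inverse.ravel()
    rank_sums = np.bincount(inverse, weights=ranks)
    counts = np.bincount(inverse)
    return (rank_sums / counts)[inverse]


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])
```

Because only ranks enter the calculation, any strictly monotonic relationship yields ρ = ±1, whether or not it is linear. This is why the measure suits skewed, non-normal variables.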
Operational parameters such as temperature, acid concentration, and leaching time generally exhibit positive correlations with leaching efficiency, consistent with reaction kinetics and thermodynamic expectations. Oxygen pressure also shows a positive association, reflecting the role of oxidative conditions in promoting copper dissolution.
Conversely, particle size tends to exhibit a negative correlation with leaching efficiency, which is consistent with surface-area-controlled and diffusion-limited dissolution mechanisms. Among compositional variables, copper content shows a positive correlation with LE, whereas high silica content may display weaker or slightly negative correlations due to increased structural stability of silicate matrices.
It is noteworthy that certain input variables exhibit intercorrelations, particularly among operational parameters. However, no extreme multicollinearity was observed that would necessitate feature elimination. Moreover, tree-based models employed in this study are inherently robust to moderate multicollinearity, and retaining all physically meaningful variables preserves interpretability for subsequent SHAP-based analysis.
Overall, the statistical analysis indicates that the compiled dataset covers wide compositional and operational regimes, with substantial variability in both input and output variables. The presence of nonlinear distributions and moderate inter-feature correlations further justifies the application of nonlinear ensemble learning algorithms.
The diversity and scale of the dataset provide a solid statistical foundation for developing generalizable and interpretable predictive models for copper slag leaching efficiency.
Fig. 3 Flowchart for model training, testing, and evaluation. R², coefficient of determination; RMSE, root mean square error; MAE, mean absolute error.
Among the four evaluated models, XGBoost exhibited the most robust and stable optimization behavior (Fig. 4). The response surface analysis revealed a broad performance plateau across varying combinations of max_depth and n_estimators, indicating limited sensitivity to moderate hyperparameter deviations. This stability reflects the effectiveness of the regularized objective function and second-order gradient optimization in controlling model complexity while preserving nonlinear learning capacity. The relatively smooth performance gradients further suggest enhanced robustness under practical tuning constraints.
In contrast, Random Forest showed diminishing performance gains beyond a threshold number of estimators, consistent with the variance-reduction mechanism inherent to bagging ensembles (Fig. S1). Although competitive in accuracy, its ability to capture complex nonlinear feature interactions remained inferior to boosting-based approaches.
SVR demonstrated pronounced sensitivity to hyperparameter selection (Fig. S2). Large values of C increased variance and overfitting risk, whereas small γ values limited nonlinear representation capability. Despite careful grid optimization, the narrow optimal region reduced robustness across parameter configurations.
LightGBM (Fig. S3) achieved rapid performance improvement with increasing tree complexity due to its leaf-wise growth strategy. However, excessive depth led to diminishing returns and elevated overfitting risk, particularly under limited data conditions, resulting in higher variance compared to XGBoost.
The hyperparameter search ranges and corresponding optimal values obtained through grid-search optimization are summarized in Table S1.
The training times of the four machine learning models under identical hardware and software conditions are summarized in Table S2, demonstrating the superior computational efficiency of LightGBM relative to XGBoost, Random Forest, and SVR.
Performance comparison across optimized models (Fig. 5) confirmed that XGBoost achieved the highest R² and the lowest RMSE and MAE on the test dataset, demonstrating superior predictive accuracy and generalization capability. Considering its consistent performance across evaluation metrics and stable convergence characteristics during optimization, XGBoost was selected as the baseline model for subsequent SHAP-based interpretability analysis. Its structural capacity to model high-order feature interactions further supports this selection for complex multi-factor leaching systems.
The predictive performance of the four optimized models is summarized in Fig. 5, where panels a–h compare the R², RMSE, and MAE values of the RF, SVR, XGBoost, and LightGBM models.
Among the evaluated algorithms, XGBoost achieved the highest R² and the lowest RMSE and MAE on the testing dataset, demonstrating superior predictive accuracy and generalization capability. SVR exhibited competitive accuracy but yielded a slightly lower R² and somewhat higher RMSE and MAE than XGBoost. Random Forest, which previous studies have reported as a relatively high-performance model for copper slag leaching, showed lower predictive performance than both XGBoost and SVR in this study. LightGBM exhibited comparatively lower predictive performance under the present dataset conditions.
These performance differences can be attributed to the structural characteristics of each algorithm. XGBoost fits complex nonlinear relationships efficiently using second-order gradient information while controlling overfitting through the regularization terms in its objective function. The resulting balance in the bias–variance trade-off contributes to its superior generalization performance. SVR can learn nonlinear patterns via the kernel trick. However, its sensitivity to hyperparameter selection and data noise makes it somewhat less stable than XGBoost. Random Forest, a bagging-based ensemble method, excels at reducing variance but may not model intricate patterns as precisely as boosting-based algorithms. Lastly, LightGBM trains rapidly owing to its histogram-based split finding, but its leaf-wise tree growth is prone to overfitting on relatively small or noisy datasets. This may have limited its ability to capture the underlying characteristics of the present data.
Given its consistently superior performance across evaluation metrics and stable convergence behavior during hyperparameter optimization, XGBoost was selected as the primary model for subsequent interpretability analysis using SHAP. The selection is further justified by its capacity to handle heterogeneous datasets and capture high-order feature interactions inherent in multi-factor leaching systems.
Fig. 6 SHAP analysis of the XGBoost model: (a) bar chart of feature importance based on mean absolute SHAP values and (b) SHAP summary (density scatter) plot.
The global importance ranking based on mean absolute SHAP values is shown in Fig. 6a, while the SHAP summary plot is illustrated in Fig. 6b. The results indicate that operational parameters exert a stronger influence on copper leaching efficiency than compositional variables.
Among all input features, leaching time, temperature, and acid concentration emerge as the most influential variables. Oxygen pressure, pulp density and particle size also exhibit substantial contributions. In contrast, compositional variables such as Zn display moderate but comparatively smaller effects.
The dominance of leaching time, temperature, and acid concentration is consistent with reaction kinetics principles, as copper dissolution in sulfuric acid systems is governed by proton availability, temperature-dependent reaction rates, and reaction time progression. The SHAP results quantitatively confirm that process conditions outweigh intrinsic compositional variability within the investigated dataset.
Interestingly, Si, S, and Al contents have zero mean absolute SHAP values in the optimized XGBoost model, indicating no measurable direct contribution to the prediction of copper leaching efficiency within the investigated dataset. This observation does not necessarily imply that these elements are chemically irrelevant. Rather, it suggests that their effects may be secondary to dominant operational parameters such as temperature, acid concentration, and leaching time.
From a metallurgical perspective, the relatively weak contributions of Zn and Al can be rationalized by their typical occurrence and behavior in copper slag systems. Zinc is often present in oxide or spinel-type phases that dissolve relatively readily under acidic oxidative conditions, meaning that its presence does not significantly control the dissolution pathway of copper-bearing phases. Aluminum, on the other hand, is commonly associated with aluminosilicate structures that remain relatively stable in sulfuric acid leaching environments, thereby exerting limited direct influence on copper extraction kinetics.
From a statistical perspective, the compositional ranges of Zn and Al within the compiled dataset are relatively narrow compared with the much broader variation of key operational parameters such as temperature, acid concentration, and leaching time. In addition, the quartiles in Table 1 show that the compositional variables cluster at a few discrete levels. A feature's SHAP value reflects how far the prediction shifts as that feature departs from its typical value. Consequently, variables with limited variability, or with only a weak association with the target, tend to receive low SHAP importance.
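The link between feature spread and SHAP importance can be made exact for an idealized case. Consider a linear model whose SHAP values are computed against the marginal (interventional) feature distribution. The XGBoost model is not linear, so this case is shown only for intuition:

$$f(\mathbf{x}) = \beta_0 + \sum_{j=1}^{M}\beta_j x_j \;\Longrightarrow\; \phi_j(\mathbf{x}) = \beta_j\left(x_j - \mathbb{E}[x_j]\right), \qquad \overline{\lvert\phi_j\rvert} = \lvert\beta_j\rvert\,\mathbb{E}\!\left[\,\lvert x_j - \mathbb{E}[x_j]\rvert\,\right]$$

Two features with the same per-unit effect (equal $\lvert\beta_j\rvert$) therefore receive global importances in proportion to their mean absolute deviations. A variable that barely varies across the dataset ranks low even if its per-unit chemical effect is large.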
Therefore, the negligible SHAP contribution of Si, S, and Al should be interpreted as an indication of limited predictive relevance within the current dataset scope rather than complete mechanistic insignificance. Future studies incorporating mineralogical phase descriptors or broader compositional variability may further clarify their roles.
The partial dependence of copper leaching efficiency on the individual input variables, derived from the SHAP analysis, is illustrated in Fig. 7. As shown in Fig. 7a, leaching time exerts the strongest positive influence, with a sharp increase in predicted efficiency within the first 60 min of reaction. Beyond approximately 60 min, however, the marginal contribution gradually diminishes, following a saturating trend consistent with the shrinking-core model. This behavior suggests that, as the reaction progresses, increasing diffusion resistance becomes rate-limiting, thereby moderating the kinetic benefit of extended leaching time.
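For reference, the textbook integrated rate expressions of the shrinking-core model for spherical particles of constant size are given below. They use fractional conversion $X = \mathrm{LE}/100$ and apparent rate constants $k_f$, $k_r$, and $k_d$. The present work does not fit these expressions to the data; they are shown only to illustrate why a product-layer diffusion regime produces the saturating shape described above:

$$X = k_f\,t \qquad \text{(liquid-film diffusion control)}$$

$$1 - (1 - X)^{1/3} = k_r\,t \qquad \text{(surface chemical reaction control)}$$

$$1 - 3(1 - X)^{2/3} + 2(1 - X) = k_d\,t \qquad \text{(product-layer diffusion control)}$$

Differentiating the diffusion-controlled form with respect to time gives

$$\frac{dX}{dt} = \frac{k_d}{2\left[(1 - X)^{-1/3} - 1\right]},$$

so the conversion rate is very high at the start and falls steadily as the product layer thickens. Equal increments of leaching time therefore yield progressively smaller gains in efficiency.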
Temperature, acid concentration, and oxygen pressure also demonstrate clear positive correlations with the predicted leaching efficiency (Fig. 7b–d). Temperature exhibits a consistently positive contribution across its entire range, reaffirming its role as a fundamental kinetic driver in the dissolution process. Acid concentration, in contrast, displays a threshold behavior: leaching efficiency increases sharply up to approximately 0.5 mol L−1, beyond which further increases yield diminishing returns. This inflection point indicates a possible shift in the rate-controlling mechanism from surface reaction dominance to diffusion-limited kinetics once sufficient proton availability is secured.
Conversely, pulp density and particle size exert negative influences on the predicted leaching efficiency (Fig. 7e–f): larger particles and higher solid-to-liquid ratios both reduce copper extraction. This agrees with fundamental hydrometallurgical principles, since larger particles offer less specific surface area and denser pulps impair mass transfer.37
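The particle-size effect follows directly from geometry: for smooth, non-porous spheres of diameter d and density ρ, the specific surface area is S = 6/(ρd), so halving the particle size doubles the area available for reaction per gram of slag. The sketch below assumes an illustrative density of 3.5 g cm−3; real slag particles are irregular and may be porous, so measured areas will differ.

```python
def specific_surface_area(d_um, density_g_cm3):
    """Geometric specific surface area (m^2/g) of smooth, non-porous spheres,
    S = 6 / (rho * d)."""
    d_m = d_um * 1e-6               # diameter: micrometres -> metres
    rho_g_m3 = density_g_cm3 * 1e6  # density: g/cm^3 -> g/m^3
    return 6.0 / (rho_g_m3 * d_m)


RHO = 3.5  # illustrative slag density in g/cm^3 (assumption, not from the dataset)
for d in (45, 75, 150):
    print(f"d = {d:3d} um -> S = {specific_surface_area(d, RHO):.4f} m^2/g")
```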
Notably, the influences of oxygen pressure, pulp density, and particle size are considerably smaller in magnitude than those of leaching time, acid concentration, and temperature. This hierarchy of feature importance underscores the predominant role of these three operational parameters in governing leaching performance.
Among the compositional variables, the Zn, Fe, and Cu contents of the feed material have negligible effects on the predicted copper leaching efficiency (Fig. 7g–i). Their limited contribution relative to the dominant operational parameters suggests that, within the compositional ranges represented in the current dataset, process conditions outweigh intrinsic material variability in determining leaching outcomes.
Based on the SHAP partial dependence analysis (Fig. 7), recommended operating ranges for the primary process parameters were identified, as summarized in SI Table S3.
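As an illustration of how operating ranges might be read from dependence data, the sketch below returns the contiguous intervals of a feature over which its SHAP contribution is positive, i.e. where the feature pushes the prediction above the model's baseline. This criterion, the function name, and the sample values are hypothetical and are not necessarily the basis of SI Table S3.

```python
def favourable_intervals(values, shap_values):
    """Contiguous intervals of a feature (after sorting by feature value)
    over which its SHAP contribution stays positive."""
    pairs = sorted(zip(values, shap_values))
    intervals, start, prev = [], None, None
    for v, s in pairs:
        if s > 0 and start is None:
            start = v                        # a positive run begins
        elif s <= 0 and start is not None:
            intervals.append((start, prev))  # the run ended at the previous point
            start = None
        prev = v
    if start is not None:                    # run continues to the last point
        intervals.append((start, prev))
    return intervals


# Hypothetical dependence data: temperature (deg C) and its SHAP values.
temps = [25, 40, 55, 70, 85, 95]
shap_t = [-8.1, -4.0, -0.5, 2.3, 5.6, 7.2]
print(favourable_intervals(temps, shap_t))
```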
Overall, the SHAP analysis indicates that copper slag leaching efficiency is governed by a combination of proton-driven surface reactions, temperature-enhanced kinetics, diffusion limitations, and, to a lesser extent, compositional resistance effects. The tree-based boosting structure can also capture second-order interactions among the key operational parameters implicitly, further supporting the model's capability to represent complex multivariate leaching dynamics. The model therefore not only achieves high predictive accuracy but also captures mechanistically meaningful nonlinear behaviors consistent with established hydrometallurgical principles.
The verification results are presented in Fig. 8. Predicted copper leaching efficiencies show strong agreement with experimentally reported values across both independent cases. The predicted-observed parity plots demonstrate close clustering around the 1:1 line, indicating minimal systematic deviation.
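The agreement statistics behind a parity plot can be computed directly. The sketch below evaluates R2, RMSE, and MAE, the metrics reported for the test set, together with the mean bias error (MBE), a simple indicator of systematic deviation from the 1:1 line. The sample values are illustrative and are not the validation data of Fig. 8.

```python
import math


def regression_metrics(y_true, y_pred):
    """R^2, RMSE, MAE and mean bias error (MBE) for predicted vs observed values."""
    n = len(y_true)
    residuals = [p - t for t, p in zip(y_true, y_pred)]  # predicted minus observed
    mean_t = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
        "MBE": sum(residuals) / n,  # > 0 means systematic over-prediction
    }


# Illustrative values only (not the validation data in Fig. 8).
observed = [42.0, 55.0, 68.0, 81.0, 90.0]
predicted = [44.0, 53.5, 69.0, 79.0, 91.5]
print(regression_metrics(observed, predicted))
```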
In both validation cases, the model correctly captures the increase in leaching efficiency with rising temperature and acid concentration, as well as the influence of particle size and oxygen pressure. The consistent prediction accuracy across distinct experimental regimes suggests that the model has learned generalizable relationships rather than memorizing specific data patterns.
Notably, the two external datasets cover operating conditions that partially differ from the dominant regions of the compiled training dataset. Despite this shift, the model maintains stable predictive behavior, indicating that it generalizes to moderately shifted conditions within physically meaningful parameter ranges.
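Whether a new operating point falls inside the training domain can be screened with a simple per-feature range check, sketched below. The function name, feature names, and values are hypothetical. A min-max box is a coarse applicability-domain test: it flags extrapolation in individual variables but cannot detect unseen combinations of in-range values.

```python
def applicability_check(train_rows, query):
    """Return the query features whose values fall outside the [min, max]
    range observed in the training rows (each row is a dict feature -> value)."""
    flagged = {}
    for feature, value in query.items():
        seen = [row[feature] for row in train_rows if feature in row]
        if not seen:
            flagged[feature] = (value, None, None)  # feature never observed
            continue
        lo, hi = min(seen), max(seen)
        if not lo <= value <= hi:
            flagged[feature] = (value, lo, hi)
    return flagged


# Hypothetical training rows and query point (feature names are illustrative).
train = [
    {"temperature_C": 25, "acid_mol_L": 0.1, "time_min": 30},
    {"temperature_C": 60, "acid_mol_L": 0.5, "time_min": 120},
    {"temperature_C": 90, "acid_mol_L": 1.0, "time_min": 240},
]
query = {"temperature_C": 110, "acid_mol_L": 0.8, "time_min": 60}
print(applicability_check(train, query))
```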
The strong performance under independent validation can be attributed to several factors. First, the compiled dataset spans wide operational domains and combines data from multiple independent studies, exposing the model to diverse leaching scenarios. Second, the use of 10-fold cross-validation during hyperparameter optimization reduces the risk of selecting hyperparameters that overfit a single data split. Third, the gradient boosting structure of XGBoost captures the nonlinear interactions and higher-order feature relationships that govern copper dissolution.
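The cross-validation logic can be sketched as follows, assuming a shuffled split: the training samples are partitioned into 10 folds, and each candidate hyperparameter set is scored by its average held-out error (RMSE here) across the folds. In practice a library routine such as scikit-learn's `KFold(n_splits=10, shuffle=True)` would typically be used; the pure-Python version below, including the hypothetical `fit_predict` callable and the trivial mean-predictor baseline, only makes the splitting logic explicit.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation.
    The first n_samples % k folds receive one extra sample."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = idx[start:start + size]
        train_idx = idx[:start] + idx[start + size:]
        yield train_idx, test_idx
        start += size


def cv_rmse(fit_predict, X, y, k=10, seed=0):
    """Average held-out RMSE over k folds. `fit_predict(X_tr, y_tr, X_te)` is a
    hypothetical callable that trains a model and returns predictions for X_te."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k, seed):
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        sq_errors = [(p - y[i]) ** 2 for p, i in zip(preds, test_idx)]
        scores.append((sum(sq_errors) / len(sq_errors)) ** 0.5)
    return sum(scores) / len(scores)


def mean_predictor(X_tr, y_tr, X_te):
    """Trivial baseline: predict the training mean for every test sample."""
    m = sum(y_tr) / len(y_tr)
    return [m] * len(X_te)


folds = list(k_fold_indices(103, k=10))  # 103 is an arbitrary example size
print([len(test) for _, test in folds])
```

In a hyperparameter search, `cv_rmse` would be evaluated once per candidate configuration and the configuration with the lowest average held-out error retained.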
While the model demonstrates promising generalization performance, hydrodynamic conditions such as stirring speed were not included as inputs because they are incompletely reported in the literature. Predictions for systems with substantially different mixing regimes should therefore be interpreted with caution.
Overall, the external validation results indicate that the developed XGBoost model is not merely a high-accuracy fitting tool but a robust and transferable predictive framework for copper slag leaching efficiency within the validated parameter ranges. This generalization capability supports its potential application in process optimization and preliminary design evaluation for hydrometallurgical operations.
Supplementary information (SI): hyperparameter optimization results (Fig. S1–S3), optimal hyperparameters for XGBoost, Random Forest, LightGBM, and SVR models (Table S1), model training time comparisons (Table S2), and recommended operating ranges for copper slag leaching parameters derived from SHAP partial dependence analysis (Table S3). See DOI: https://doi.org/10.1039/d6ra01571a.
This journal is © The Royal Society of Chemistry 2026