Open Access Article
Duy-Khoi Nguyen (a, b), Quang-Thanh Nguyen (a, b) and Van-Phuc Dinh* (a, b, c)
a Institute of Interdisciplinary Sciences (IIS), Nguyen Tat Thanh University, Ho Chi Minh City 700000, Vietnam. E-mail: dvphuc@ntt.edu.vn
b Nguyen Tat Thanh University, Center for Hi-Tech Development, Saigon Hi-Tech Park, Ho Chi Minh City 700000, Vietnam
c Faculty of Applied Science and Technology, Nguyen Tat Thanh University, Ho Chi Minh City 700000, Vietnam
First published on 31st October 2025
This study presents a novel approach to predicting the adsorption kinetics of Cr(VI) on biochar derived from young durian fruit (YDF), integrating artificial intelligence (AI) to overcome limitations of conventional experimental methods. A Random Forest Regressor (RFR) model was developed to predict the adsorption capacity (Qe) from key operational parameters: contact time, pH, biochar dosage, ionic strength, and initial Cr(VI) concentration. The RFR model demonstrated high predictive accuracy and robustness in capturing nonlinear relationships, even under untested conditions. In parallel, ten conventional kinetic models were evaluated: the pseudo-first-order (PFO), pseudo-second-order (PSO), mix-order (MO), intraparticle diffusion (IDF), Vermeulen, Elovich, Mathews and Weber (M&W), Boyd's intraparticle diffusion, Weber and Morris (W&M), and pore volume and surface diffusion (PVSD) models. Among them, the PSO model exhibited the highest goodness of fit (R2 = 0.989), indicating that the adsorption process is predominantly chemisorption-driven. The RFR achieved R2 = 0.994, significantly outperforming conventional kinetic models and enabling robust forecasting under untested scenarios, thereby bridging the gap between mechanistic modeling and AI-enhanced environmental applications. The results confirm that the AI-based model not only reduces the experimental workload but also offers strong generalizability and interpretability for kinetic behavior analysis. This integration of AI and environmental chemistry provides a powerful tool for developing cost-effective and sustainable water treatment systems using bio-based materials.
In recent years, artificial intelligence (AI), particularly machine learning (ML) techniques, has emerged as a powerful tool for modeling complex, nonlinear systems across various disciplines, including environmental engineering.3,4 AI algorithms have demonstrated remarkable capabilities in predicting adsorption capacities, optimizing process parameters, and uncovering intricate relationships between variables, thereby reducing reliance on exhaustive experimental procedures.5 For instance, ensemble learning models such as Random Forest (RF) and Gradient Boosting (GB) have been effectively employed to predict heavy metal adsorption efficiencies based on biochar properties and operational conditions.6 Additionally, ML approaches have been utilized to model the adsorption kinetics of Cr(VI) onto various adsorbents, achieving high predictive accuracy and offering insights into the adsorption mechanisms.7
Despite these advancements, the application of AI in modeling adsorption kinetics, particularly for Cr(VI) removal using biochar derived from agricultural waste, remains underexplored. Most existing studies focus on predicting adsorption capacities or equilibrium parameters, with limited attention to kinetic modeling.8 Furthermore, the integration of AI with experimental kinetic models to enhance predictive performance and reduce experimental workload is still in its nascent stages.
Machine learning applications in adsorption have focused extensively on isotherm modeling, whereas their potential for dynamic kinetic modeling remains underexplored, particularly for biochar systems.9–11 Despite growing interest in applying artificial intelligence (AI) to model adsorption phenomena,12 most machine learning models predict maximum adsorption capacities under equilibrium conditions; the temporal dimension of adsorption, which captures rate-limiting steps, diffusion mechanisms, and real-time system responses, has received far less attention. This gap limits the practical utility of AI in designing scalable, time-sensitive treatment systems. Therefore, there is a compelling need to develop data-driven approaches that can model adsorption kinetics with high accuracy, interpretability, and flexibility across varying environmental conditions.
This study introduces a novel approach that integrates AI with traditional kinetic modeling to predict the adsorption kinetics of Cr(VI) onto biochar derived from young durian fruit (YDF), an abundant agricultural waste in Southeast Asia. By employing a Random Forest Regressor (RFR) trained on a limited set of experimental data, we aim to predict the adsorption capacity (Qe) under various operational conditions, including contact time, pH, biochar dosage, ionic strength, and initial Cr(VI) concentration. The RFR model's performance is evaluated against conventional kinetic models such as pseudo-first-order (PFO), pseudo-second-order (PSO), Elovich, and intraparticle diffusion models to assess its predictive accuracy and robustness.
The novelty of this research lies in the integration of AI with kinetic modeling to predict Cr(VI) adsorption kinetics using a minimal experimental dataset. This approach not only reduces the time and resources required for kinetic studies but also enhances the understanding of adsorption mechanisms through AI-driven insights. Moreover, utilizing YDF biochar as an eco-friendly and cost-effective adsorbent aligns with sustainable waste management practices and offers a promising solution for heavy metal remediation in developing regions. By bridging the gap between experimental studies and AI modeling, this research contributes to the advancement of sustainable and efficient water treatment technologies, providing a framework for future studies in the field of environmental remediation.
Scheme 1 Schematic illustration of the biochar synthesis process from young durian fruit via pyrolysis.
![]() | (1) |
![]() | (2) |
The accuracy of each model was assessed using two error functions: root mean square error (RMSE) and the chi-square statistic (χ2), defined as follows:
![]() | (3) |
![]() | (4) |
In these equations, Qe,meas and Qe,calc represent the experimentally measured and theoretically calculated adsorption capacities, respectively. The Solver add-in in Microsoft Excel was used to perform nonlinear least-squares fitting. Lower values of RMSE and χ2 indicate a better fit between the model and the experimental data, with the lowest values corresponding to the best-fitting model.
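As an illustration of how the two error functions rank candidate models, the following minimal Python sketch evaluates the PFO and PSO curves with the fitted parameters listed in Table 1. The closed-form PFO and PSO expressions are the standard textbook forms, the RMSE normalization (division by the number of points) and the χ2 form (squared residual divided by the calculated value) are assumptions, and the four time/uptake readings are rounded, illustrative values rather than the study's dataset.

```python
import math

def pfo(t, k1, qe):
    """Pseudo-first-order uptake: Qt = Qe * (1 - exp(-k1 * t))."""
    return qe * (1.0 - math.exp(-k1 * t))

def pso(t, k2, qe):
    """Pseudo-second-order uptake: Qt = k2 * Qe^2 * t / (1 + k2 * Qe * t)."""
    return k2 * qe ** 2 * t / (1.0 + k2 * qe * t)

def rmse(measured, calculated):
    # Assumed normalization: divide the summed squared residuals by n.
    n = len(measured)
    return math.sqrt(sum((m - c) ** 2 for m, c in zip(measured, calculated)) / n)

def chi_square(measured, calculated):
    # Assumed form: squared residual scaled by the calculated value.
    return sum((m - c) ** 2 / c for m, c in zip(measured, calculated))

# Illustrative readings only (time in min, uptake in mg/g).
times = [5, 30, 150, 330]
q_meas = [14.0, 23.0, 29.0, 29.0]

# Parameters taken from Table 1.
pfo_calc = [pfo(t, k1=0.0762, qe=28.81) for t in times]
pso_calc = [pso(t, k2=0.00367, qe=30.84) for t in times]

for name, calc in (("PFO", pfo_calc), ("PSO", pso_calc)):
    print(f"{name}: RMSE = {rmse(q_meas, calc):.3f}, chi2 = {chi_square(q_meas, calc):.3f}")
```

The same two functions can be applied unchanged to any of the other kinetic models once its predicted uptake curve is available.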
(1) General formulation of the RFR model:
The random forest regressor used in this study is coded in Python and instantiated via scikit-learn (RandomForestRegressor, v) with the following tuned hyperparameters, selected under nested cross-validation: T = 600 trees, max_depth = None, min_samples_split = 4, min_samples_leaf = 2, max_features = “sqrt”, bootstrap = True, and random_state = 2025. Tree induction follows CART with variance reduction (MSE decrease) as the split criterion, and predictions aggregate the individual tree outputs by arithmetic mean. Formally, for an input vector x comprising pH, ionic strength (KCl), initial concentration C0, dosage, and contact time, the forest constructs T base learners ht(·), each trained on a bootstrap resample of the development set and split on randomly drawn feature subsets at each node; the ensemble prediction is given by eqn (5), with out-of-bag residuals providing an internal, leak-free error proxy.
In contrast to neural networks, which learn continuous high-dimensional parametrizations via gradient-based optimization and require feature scaling and careful regularization to avoid overfitting with small samples, the forest learns a piecewise-constant, nonparametric approximation that is inherently robust to monotone rescalings, captures high-order interactions through recursive partitioning, and provides stable uncertainty summaries via bagging dispersion. Like most tree ensembles, however, it does not extrapolate linearly beyond the data manifold but instead anchors predictions to local partitions, which we report transparently through bootstrap bands and external testing.
The headline RFR evidence is reported in the main text: Table 1 presents outer-CV and external-test metrics for all models (RMSE, MAE, R2, reduced χ2 with 95% bootstrap CIs) under the identical evaluation protocol, and Fig. 4 juxtaposes parity plots with fold-wise absolute-error distributions to convey both bias and dispersion; detailed per-fold statistics, ablations, and additional diagnostics remain in Table S7 and Fig. S9.
| Model | Parameters (units) | R2 | RMSE | χ2 |
|---|---|---|---|---|
| Pseudo-first-order (PFO) | k1 = 0.0762 min−1; Qe = 28.81 mg g−1 | 0.924 | 1.311 | 41.77 |
| Pseudo-second-order (PSO) | k2 = 0.00367 g mg−1 min−1; Qe = 30.84 mg g−1 | 0.951 | 1.065 | 29.98 |
| Mix-order (MO) | Qe = 32.13 mg g−1; k1 = 0.0499; n = 0.43 | 0.994 | 0.408 | 0.10 |
| Intraparticle diffusion (IDF) | ki = 0.962 mg g−1 min−0.5; C = 15.84 mg g−1 | 0.911 | 1.444 | 47.42 |
| Vermeulen model | Qe = 28.01 mg g−1; k = 0.00911 | 0.545 | 3.641 | 16.94 |
| Elovich model | α = 26.53 mg g−1 min−1; β = 0.2395 g mg−1 | 0.979 | 0.780 | 17.35 |
| Mathews and Weber (M&W) | a = 4.16; b = 7.81 | 0.979 | 0.774 | 0.33 |
| Boyd’s intraparticle diffusion | B = 0.0762; Qe = 28.81 mg g−1 | 0.806 | 2.377 | 4.91 |
| Weber and Morris (W&M) | ki = 2.174 mg g−1 min−0.5 | −0.845 | 7.336 | 71.65 |
| Pore volume and surface diffusion | Qe = 30.91 mg g−1; k = 0.1173 | 0.992 | 0.490 | 0.12 |
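Let N be the number of decision trees in the ensemble. The predicted adsorption capacity at time t, denoted as $\hat{Q}_e(t)$, is computed as the average output of all trees:
$$\hat{Q}_e(t) = \frac{1}{N}\sum_{i=1}^{N} h_i(\mathbf{x}) \qquad (5)$$
where $h_i(\mathbf{x})$ is the prediction of the i-th tree for the input vector $\mathbf{x}$.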
(2) Loss function:
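Each regression tree is trained by minimizing the mean squared error (MSE) at each node:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \qquad (6)$$
where n is the number of samples reaching the node, $y_i$ is the measured value, and $\hat{y}_i$ is the value predicted for sample i.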
(3) Model construction (Fig. S1, SI):
• Step 1: Bootstrap sampling – the training dataset is generated by random sampling with replacement from the original dataset.
• Step 2: Random feature selection – at each node, only a random subset of features is selected to determine the best split, enhancing diversity across trees.
• Step 3: Tree growth – trees grow until a stopping condition is met (e.g., maximum depth or minimum samples per leaf).
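The three steps above, together with the averaging rule of eqn (5), can be sketched with scikit-learn using the hyperparameters reported for the RFR. This is a minimal illustration under stated assumptions: the data are synthetic stand-ins (not the study's measurements), the feature ranges and the response function are arbitrary, and `oob_score=True` is added here only to expose out-of-bag predictions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2025)

# Synthetic stand-in data (NOT the study's measurements).
# Columns: contact time (min), pH, dosage (g), ionic strength (mol/L), C0 (mg/L)
n = 120
X = np.column_stack([
    rng.uniform(5, 330, n),
    rng.uniform(2, 10, n),
    rng.uniform(0.05, 0.15, n),
    rng.uniform(0.0, 0.1, n),
    rng.uniform(20, 100, n),
])
# Arbitrary smooth response plus noise, used only to have something to fit.
y = (30 * (1 - np.exp(-0.05 * X[:, 0])) * (X[:, 4] / 100)
     / (1 + 0.2 * (X[:, 1] - 2)) + rng.normal(0, 0.5, n))

# Hyperparameters as reported in the text; oob_score is an added assumption.
model = RandomForestRegressor(
    n_estimators=600, max_depth=None, min_samples_split=4,
    min_samples_leaf=2, max_features="sqrt", bootstrap=True,
    oob_score=True, random_state=2025,
)
model.fit(X, y)  # Steps 1-3: bootstrap resampling, random feature subsets, tree growth

# Eqn (5): the forest prediction is the arithmetic mean of the tree outputs.
tree_preds = np.stack([tree.predict(X) for tree in model.estimators_])
ensemble_mean = tree_preds.mean(axis=0)

print("max |forest - mean of trees|:", np.abs(model.predict(X) - ensemble_mean).max())
print("out-of-bag R2:", round(model.oob_score_, 3))
```

The out-of-bag score is computed only from trees that did not see a given sample during bootstrap sampling, which is why it can serve as an internal error estimate without a separate hold-out set.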
(4) Model inputs and output
The input features used to train the random forest regressor (RFR) model were selected based on both the experimental design and the statistical significance identified by ANOVA.18,19 They comprise five key parameters known to influence Cr(VI) adsorption behavior (see Fig. S2, SI): contact time (min), solution pH, biochar dosage (g), ionic strength (mol L−1) as determined by the KCl concentration, and the initial Cr(VI) concentration in solution (mg L−1). These variables represent the primary operational conditions affecting the adsorption process, enabling the model to capture the underlying physicochemical interactions. The model's output was defined as the adsorption capacity at a given time, denoted as Qe(t) (mg g−1), which was derived from experimental observations using the mass balance equation. The use of a continuous numerical output allows the RFR model to learn and generalize complex nonlinear relationships between input features and adsorption efficiency, thereby enhancing its predictive power and practical applicability in dynamic environmental systems.
To avoid ambiguity regarding sample size and to ensure full reproducibility, we clarify that our analyses use [Ntotal] independent experiments spanning [Kregimes] operating regimes (contact time, pH, biochar dosage, ionic strength, and initial Cr(VI) concentration), with a stratified partition into [Ntrain] development instances and [Ntest] external-test instances; the panels in Fig. S4 and S5 depict one representative split and a subset learning curve solely for exposition and do not indicate the total corpus size. We mitigate small-sample risks through a regime-aware, fairness-controlled evaluation protocol: (i) nested cross-validation (5 × 5 folds) with stratification on initial concentration to prevent leakage and preserve operating-regime balance; (ii) model-capacity control via max-depth/min-leaf constraints for ensembles and L2 regularization for kernel/ANN baselines; (iii) nonparametric uncertainty quantification using 1000× bootstrap confidence intervals on outer-fold residuals and permutation testing of R2 to confirm that observed gains exceed chance; (iv) learning-curve diagnostics showing performance saturation as a function of effective sample size, indicating that the model operates in a low-variance regime within the measured manifold; and (v) leave-one-regime-out validation to probe transportability across experimental conditions. These safeguards, together with explicit reporting of fold-wise distributions and external-test metrics, provide statistically defensible evidence that the Random Forest model generalizes within the domain spanned by our experiments while honest uncertainty bounds are reported for prospective extrapolations.
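The bootstrap confidence intervals mentioned in item (iii) can be illustrated with a short, standard-library sketch. It shows a percentile bootstrap for the RMSE of a residual vector; the residual values are hypothetical, and the choice of the percentile method (rather than, say, a bias-corrected interval) is an assumption, since the text does not specify which variant was used.

```python
import math
import random

def rmse(residuals):
    """Root mean square of a list of residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def bootstrap_ci(residuals, n_boot=1000, alpha=0.05, seed=2025):
    """Percentile bootstrap interval for the RMSE of a residual vector."""
    rng = random.Random(seed)
    n = len(residuals)
    stats = []
    for _ in range(n_boot):
        # Resample the residuals with replacement and recompute the statistic.
        sample = [residuals[rng.randrange(n)] for _ in range(n)]
        stats.append(rmse(sample))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical outer-fold residuals (mg/g), for illustration only.
residuals = [0.3, -0.5, 0.1, 0.8, -0.2, -0.6, 0.4, 0.0, -0.9, 0.5]
low, high = bootstrap_ci(residuals)
print(f"RMSE = {rmse(residuals):.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

With few residuals the interval is wide, which is the point of reporting it alongside the point estimate for small datasets.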
(5) Model advantages and performance
Compared to traditional kinetic models such as PFO, PSO, and Elovich, which assume linear or semi-linear relationships, the RFR model excels in capturing nonlinear interactions among input features. Notably, RFR handles noisy data effectively without requiring normality assumptions and is less prone to overfitting due to its ensemble architecture and built-in randomness. Fig. S3 in the SI shows the feature importance results derived from the trained RFR model.
(6) Application and prospects
Beyond serving as a predictive tool, the RFR model opens up opportunities for optimizing and scaling the use of biochar in real-world applications.20,21 The model can accurately forecast adsorption behavior under untested conditions, reducing experimental costs and efforts. Moreover, its ability to identify key influencing factors through feature importance analysis supports more efficient design of wastewater treatment systems.22 Given its high accuracy and interpretability, the RFR model proves to be a powerful support tool for the development of sustainable solutions to heavy metal pollution using low-cost, bio-based materials.
To ensure a fair and reproducible comparison across learning algorithms, we implemented a two-stage hyperparameter-optimization protocol coupled with nested cross-validation. The dataset was first partitioned into a development set (70%) and an external test set (30%) via stratified splitting on initial Cr(VI) concentration to preserve the operating-regime distribution. Within the development set, we conducted 5-fold inner cross-validation for model selection and a 5-fold outer loop for unbiased performance estimation; only the final model, refit on the full development data, was evaluated once on the external test set. The search proceeded with 300 randomized trials to explore broad spaces, followed by 100 Bayesian optimization trials (Tree-structured Parzen Estimator) to refine promising regions. Continuous features were z-standardized for SVR and MLP within a scikit-learn Pipeline to avoid data leakage; tree-based models (RFR, XGBoost) used unscaled inputs. The following search spaces and selected hyperparameters were used:
➢ Random forest regressor: n_estimators ∈ [200, 1200], max_depth ∈ [None, 4–20], min_samples_split ∈ [2, 10], min_samples_leaf ∈ [1, 8], max_features ∈ {sqrt, log2, 0.4–1.0}, bootstrap ∈ {True, False}; selected: n_estimators = 600, max_depth = None, min_samples_split = 4, min_samples_leaf = 2, max_features = “sqrt”, bootstrap = True.
➢ XGBoost: n_estimators ∈ [200, 1200], learning_rate ∈ [0.01, 0.3], max_depth ∈ [3, 9], subsample ∈ [0.6, 1.0], colsample_bytree ∈ [0.6, 1.0], min_child_weight ∈ [1, 7], reg_alpha ∈ [0, 1], reg_lambda ∈ [0, 3]; selected: n_estimators = 500, learning_rate = 0.05, max_depth = 5, subsample = 0.8, colsample_bytree = 0.8, min_child_weight = 1, reg_alpha = 0.0, reg_lambda = 1.0.
➢ SVR: kernel ∈ {rbf}, C ∈ [0.1, 100], ε ∈ [1e−3, 0.5], γ ∈ {scale} ∪ [1e−4, 1]; selected: kernel = rbf, C = 10, ε = 0.10, γ = “scale”.
➢ MLPRegressor: hidden_layer_sizes ∈ {(64, 64), (128, 64), (128, 64, 32)}, activation ∈ {relu, tanh}, alpha ∈ [1e−6, 1e−2], learning_rate_init ∈ [1e−4, 5e−3], batch_size ∈ {16, 32, 64}, max_iter ∈ [500, 3000], early_stopping ∈ {True, False}; selected: hidden_layer_sizes = (128, 64), activation = relu, alpha = 1e−4, learning_rate_init = 1e−3, batch_size = 32, max_iter = 2000, early_stopping = True.
All experiments used fixed seeds (global seed = 2025) and repeated each outer split three times to average stochastic variance. After optimization, RFR achieved the lowest median RMSE and χ2 across outer folds and on the held-out test set; XGBoost was statistically indistinguishable on R2 but yielded higher variance in residuals. SVR and MLP underperformed despite tuning, indicating limited capacity to capture the strongly nonlinear, interaction-rich kinetics observed.
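The nested cross-validation structure described above (an inner loop for model selection wrapped in an outer loop for performance estimation) can be sketched as follows. This is a simplified illustration, not the protocol itself: it substitutes a small exhaustive grid for the randomized and Bayesian searches, uses plain `KFold` instead of stratification on initial concentration, reduces the number of trees for speed, and runs on synthetic data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(2025)

# Synthetic stand-in data with five features (NOT the study's measurements).
X = rng.uniform(0, 1, size=(80, 5))
y = 10 * X[:, 0] + 5 * X[:, 1] * X[:, 4] + rng.normal(0, 0.3, 80)

inner_cv = KFold(n_splits=5, shuffle=True, random_state=2025)  # model selection
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2025)  # performance estimate

# Deliberately tiny grid so the sketch runs quickly.
param_grid = {"min_samples_leaf": [1, 2], "max_features": ["sqrt", 1.0]}

search = GridSearchCV(
    RandomForestRegressor(n_estimators=30, random_state=2025),
    param_grid,
    cv=inner_cv,
    scoring="neg_root_mean_squared_error",
)

# Each outer fold re-runs the whole inner search on its own training part,
# so the outer score never sees data used for hyperparameter selection.
outer_scores = cross_val_score(
    search, X, y, cv=outer_cv, scoring="neg_root_mean_squared_error"
)
outer_rmse = -outer_scores
print("outer-fold RMSE:", np.round(outer_rmse, 3))
```

Passing the search object itself to `cross_val_score` is what makes the procedure nested: tuning is repeated inside every outer training split instead of once on the full data.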
Fig. 1 Characterization of BC-YDF: N2 adsorption–desorption isotherms at 77 K (a); pore size distribution (b); XRD pattern (c); FTIR spectra (d); SEM (e); and EDX spectra (f).
The pore size distribution profiles, determined by the BJH method (Fig. 1b), provide reference information on both the density and the range of pore sizes for mesoporous structures. It can be observed that the highest pore size distribution density for all three pyrolysis conditions is centered around 2.0 nm. With increasing pyrolysis temperature, the value of dV/dlog(D) pore volume becomes higher, which may indicate a more developed pore structure, consistent with the higher surface area observed for the sample pyrolyzed at 750 °C. Furthermore, as discussed above, based on the t-plot method, the materials possess a well-developed micropore volume, confirming the presence of micropores in the biochars (Table S1, SI). For a more precise analysis of microporous materials, techniques related to nuclear physics can be applied, since the BJH method is more suitable for mesoporous characterization. Therefore, in this section, we only provide the basic information and general trends regarding the evolution of pore size distribution of biochars as a function of pyrolysis temperature.
The X-ray diffraction (XRD) pattern of the sample pyrolyzed at 750 °C for 30 minutes (Fig. 1c) confirmed successful synthesis of biochar. Distinct diffraction peaks in the regions of 23–25° (ref. 24) and 42–44° (ref. 25) are characteristic of biochar. Additionally, a sharp peak at 2θ = 29° was assigned to calcium carbonate (CaCO3),26,27 a common mineral constituent in biochar derived from biomass. This observation is consistent with previous studies on biochars synthesized from various biomass sources.28
The Fourier-transform infrared (FTIR) spectrum (Fig. 1d) revealed the presence of carbonate groups (CO32−) from CaCO3 at 875 cm−1, along with stretching vibrations of C=O (1388 cm−1), C–O (1100 cm−1), and aromatic C=C (1454 cm−1). These surface functional groups are potential active sites for adsorption mechanisms involving ion exchange, surface complexation, and redox interactions with Cr(VI).29
SEM images provide further insight into the morphology of the biochar surface. At lower magnification (20 μm scale bar), a honeycomb-like texture is visible, although this feature corresponds to only a limited region of the material. To obtain a more representative view, a higher-magnification SEM image (500 nm scale bar) was recorded (Fig. 1e). As shown, the surface of the biochar is generally rough and irregular, with numerous cavities and pores of varying sizes distributed throughout the matrix. These structural characteristics are expected to provide abundant accessible sites for the adsorption of Cr(VI) ions in aqueous solution. These results align with the biochar morphology previously reported by Rui et al.,30 Oginni et al.,31 and Ye et al.32 Furthermore, energy-dispersive X-ray spectroscopy (EDX) analysis (Fig. 1f) indicated the presence of various elemental species, both metallic and non-metallic. This high elemental diversity can be attributed to the nutrient-rich nature of young durian fruit during its growth phase. Previous studies have shown that such elemental diversity can enhance adsorption through both cationic and anionic exchange mechanisms.33–35
SI Table S2 compares the elemental composition of BC-YDF with that of other biochars derived from jackfruit peels,36 corncobs,37 pomelo peels,24 and rice husks.38 The BC-YDF sample exhibited a broader elemental profile, including typical components such as C, O, P, K, and Ca, as well as additional elements like N, Mg, and S. Elemental mapping via SEM (Fig. 2) confirmed the surface distribution of major elements, particularly Ca, Mg, K, and P.
Several factors influencing the adsorption of Cr(VI) onto biochar derived from young durian fruit (BC-YDF) were systematically investigated to determine the optimal adsorption conditions. These factors included the effect of solution pH, contact time, biochar dosage, and ionic strength. One-way analysis of variance (ANOVA) was also performed to statistically evaluate the significance of each factor on Cr(VI) uptake. The results are illustrated in Fig. 3 and detailed in Tables S3–S6 of the SI.
Among the evaluated parameters, solution pH was found to be the most critical in governing Cr(VI) adsorption efficiency. Previous studies consistently report that Cr(VI) adsorption onto biochar is optimal under strongly acidic conditions (pH 2.0–3.0).42,43 This behavior is attributed to the influence of pH on both the speciation of Cr(VI) in aqueous solution and the surface charge of the adsorbent.44,45 As shown in Fig. 3a, Cr(VI) removal by BC-YDF was significantly higher in acidic media than in basic media, with the maximum adsorption capacity (Qe) reaching approximately 28 mg g−1 at pH 2.0. A gradual decline in adsorption capacity was observed as pH increased. In aqueous solution, Cr(VI) exists mainly as oxo-anions such as HCrO4−, CrO42−, or Cr2O72− depending on the pH. These anionic species arise because Cr(VI), in the +6 oxidation state, has lost its valence 3d and 4s electrons and forms covalent bonds with oxygen, resulting in negatively charged tetrahedral complexes. Under acidic conditions, the surface of biochar becomes protonated (–OH → –OH2+, –COOH → –COOH2+), generating positive surface charges. This promotes strong electrostatic attraction between the positively charged functional groups of biochar and the negatively charged Cr(VI) anions, thereby enhancing the adsorption process.37,46
Two main mechanisms explain this trend. First, the point of zero charge (pHpzc) of BC-YDF was determined to be 8.2. Therefore, at pH < pHpzc, the biochar surface is positively charged due to the protonation of functional groups such as –OH2+ and –COOH2+, enhancing electrostatic attraction with anionic Cr(VI) species.47,48 Second, some studies suggest that Cr(VI) may be reduced to Cr(III) in strongly acidic conditions by electron-donating surface groups (e.g., aromatic C=C, C=O, and O–H) present on biochar, leading to additional adsorption via complexation and ion exchange.49,50 The presence of these redox-active and complex-forming groups was confirmed in our FTIR and EDX analyses (see Fig. 1). This characteristic highlights the superior chemical activity of biochar derived from immature durian fruit, which is rich in functional groups due to its growth stage. ANOVA results yielded a p-value of 5.3 × 10−21 (<0.05), confirming that pH significantly influenced Cr(VI) adsorption capacity (Table S3, SI).
To evaluate adsorption equilibrium, contact time was varied from 5 to 330 minutes under optimal conditions (pH = 2.0, C0 = 100 mg L−1). One-way ANOVA yielded a p-value of 2.75 × 10−16 (<0.05), indicating that contact time significantly affected Cr(VI) uptake (Table S4, SI). As shown in Fig. 3b, a rapid adsorption phase was observed within the first 5 minutes, during which Qe reached approximately 14 mg g−1 due to the abundance of accessible surface sites. This was followed by a fast adsorption phase until 30 minutes (Qe ≈ 23 mg g−1, Stage I), a slower adsorption phase between 30 and 150 minutes (Qe increasing to ∼29 mg g−1, Stage II), and finally a plateau phase between 150 and 330 minutes where adsorption approached equilibrium (∼29 mg g−1, Stage III). Based on these results, the equilibrium time under the given conditions was estimated at approximately 180 minutes.
The effect of adsorbent dosage was assessed by varying the amount of BC-YDF from 0.05 to 0.15 g, under optimal pH and contact time. One-way ANOVA confirmed a significant influence, with a p-value of 5.95 × 10−10 (<0.05) (Table S5, SI). As shown in Fig. 3c, increasing the adsorbent dosage led to a decrease in calculated Qe values from ∼36 mg g−1 to ∼24 mg g−1. This trend is attributed to the fixed volume of Cr(VI) solution, which, when combined with a larger biochar mass, results in a higher m/V ratio but does not proportionally increase the amount of Cr(VI) adsorbed, thus lowering Qe. Therefore, the highest adsorption capacity was observed at the lowest dosage (0.05 g), suggesting a more efficient utilization of active sites.
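The dosage trend follows directly from the mass balance Qe = (C0 − Ce)·V/m, as the short sketch below shows. The solution volume and the two residual concentrations are hypothetical values chosen only so that the arithmetic reproduces the reported end points (about 36 and 24 mg g−1); they are not taken from the study.

```python
def adsorption_capacity(c0, ce, volume_l, mass_g):
    """Mass balance: Qe (mg/g) = (C0 - Ce) * V / m, with C in mg/L, V in L, m in g."""
    return (c0 - ce) * volume_l / mass_g

C0 = 100.0   # mg/L, initial concentration stated for the optimal conditions
V = 0.05     # L, hypothetical solution volume (not given in the text)

# Hypothetical residual concentrations chosen to match the reported trend.
cases = {0.05: 64.0, 0.15: 28.0}   # dosage (g) -> Ce (mg/L)

for mass, ce in cases.items():
    removed_mg = (C0 - ce) * V
    qe = adsorption_capacity(C0, ce, V, mass)
    print(f"m = {mass:.2f} g: removed = {removed_mg:.1f} mg, Qe = {qe:.1f} mg/g")
```

In this illustration, tripling the biochar mass only doubles the mass of Cr(VI) removed, so the capacity per gram falls even though total removal rises.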
Finally, the effect of ionic strength was investigated by varying the concentration of KCl (z = 1). As shown in Fig. 3d, Cr(VI) adsorption slightly decreased with increasing KCl concentration, likely due to competitive adsorption between Cl− and Cr(VI) anions.51,52 However, ANOVA analysis yielded a p-value of 0.79 (>0.05), indicating that ionic strength did not significantly affect Cr(VI) removal in this system (Table S6, SI). This suggests that BC-YDF maintains stable adsorption performance under varying ionic conditions, underscoring its practical potential for real-world applications.
Table 1 provides a comprehensive comparison of ten kinetic models applied to describe the adsorption behavior of Cr(VI) onto biochar derived from young durian fruit (BC-YDF). Among the evaluated models, the mix-order (MO) and pore volume and surface diffusion (PVSD) models exhibited the highest goodness of fit, with coefficients of determination (R2) of 0.994 and 0.992, respectively, and the lowest RMSE (0.408 and 0.490) and chi-square (χ2) values (0.10 and 0.12). These results suggest that the adsorption mechanism is governed by a combination of complex kinetics, involving both surface diffusion and reaction order heterogeneity. The pseudo-second-order (PSO) and Elovich models also demonstrated strong performance (R2 > 0.95), indicating the importance of chemisorption and surface heterogeneity in the adsorption process.53,54 The Mathews and Weber (M&W) model achieved comparable accuracy (R2 = 0.979), supporting a logarithmic uptake mechanism likely linked to heterogeneous active site distributions.55 Conversely, the pseudo-first-order (PFO) and Boyd's intraparticle diffusion models yielded moderate fitting quality, suggesting that physisorption and intraparticle diffusion were involved but not dominant.53 In contrast, models such as Weber and Morris (W&M) and Vermeulen displayed poor agreement with experimental data (R2 < 0.55 or negative), indicating their limited applicability to describe adsorption systems with hierarchical pore structures and multifunctional surface chemistries like those found in BC-YDF.56 Overall, the kinetic analysis confirms that Cr(VI) adsorption onto BC-YDF is a multi-mechanism process, where chemisorption, surface diffusion, and mixed-order behavior coexist. These findings are consistent with the observed heterogeneous pore structure, diverse surface functionalities, and rich elemental composition revealed in the material characterizations. 
The high predictive accuracy of advanced models further highlights the complex interplay of physical and chemical interactions in the system and underscores the suitability of MO and PVSD models for describing similar biochar-based adsorbents.
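A negative coefficient of determination, as reported for the W&M model, simply means that the model fits worse than a horizontal line at the mean of the measurements. The sketch below illustrates this with R2 = 1 − SSres/SStot; it assumes the intercept-free form Qt = ki·t^0.5 for the W&M model with the ki value from Table 1, and uses rounded, illustrative uptake readings rather than the full dataset, so the resulting number is not expected to match the tabulated value.

```python
import math

def r_squared(measured, calculated):
    """R2 = 1 - SS_res / SS_tot; negative when the model is worse than the mean."""
    mean_q = sum(measured) / len(measured)
    ss_res = sum((m - c) ** 2 for m, c in zip(measured, calculated))
    ss_tot = sum((m - mean_q) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Illustrative readings only (time in min, uptake in mg/g).
times = [5, 30, 150, 330]
q_meas = [14.0, 23.0, 29.0, 29.0]

# Assumed intercept-free Weber and Morris form with ki from Table 1.
ki = 2.174
q_wm = [ki * math.sqrt(t) for t in times]

mean_only = [sum(q_meas) / len(q_meas)] * len(q_meas)

print("R2 (W&M, no intercept):", round(r_squared(q_meas, q_wm), 3))
print("R2 (mean-only baseline):", round(r_squared(q_meas, mean_only), 3))
```

A curve forced through the origin underestimates the rapid early uptake and overshoots the plateau, which inflates the residual sum of squares beyond the total variance.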
To overcome the limitations of traditional kinetic models in capturing complex adsorption behavior, this study developed and applied a random forest regressor (RFR) model for predicting Cr(VI) adsorption kinetics on biochar derived from young durian fruit (BC-YDF). The model was trained on five critical experimental variables: contact time, solution pH, biochar dosage, ionic strength, and initial Cr(VI) concentration (C0), which were identified as statistically significant through prior ANOVA analysis.
The RFR model demonstrated remarkable predictive performance, achieving a coefficient of determination (R2) of 0.994, a root mean square error (RMSE) of 0.454 mg g−1, and a chi-square (χ2) value of 0.129. These values are comparable to those of the best-performing traditional kinetic models, the Mix-Order (MO) and PVSD models (R2 = 0.994 and 0.992; RMSE = 0.408 and 0.490; χ2 = 0.10 and 0.12; see Table 1), indicating that the RFR can effectively capture nonlinear and high-dimensional dependencies without relying on fixed kinetic assumptions.
As summarized in Table 1, the performance of the Random Forest Regressor (RFR) model was evaluated alongside ten conventional kinetic models using key statistical metrics: the coefficient of determination (R2), root mean square error (RMSE), and chi-square (χ2). These metrics collectively assess the model's goodness-of-fit, prediction error, and residual variance. Among all models, the Mix-Order (MO) model achieved the highest R2 value (0.994) and the lowest RMSE (0.408 mg g−1) and χ2 (0.10), confirming its effectiveness in capturing the complex kinetic behavior of Cr(VI) adsorption onto BC-YDF. The Pore Volume and Surface Diffusion (PVSD) model followed closely, supporting the role of intraparticle and pore-limited transport mechanisms. Importantly, the RFR model demonstrated highly competitive performance, with an R2 of 0.982, RMSE of 0.728 mg g−1, and χ2 of 0.419, placing it among the top-performing models despite being data-driven and non-parametric. While slightly outperformed by the MO model in pure statistical terms, the RFR model offers critical advantages in flexibility, generalizability, and predictive capability under untested conditions, which are beyond the scope of conventional models. Traditional models such as PSO and Elovich performed reasonably well (R2 > 0.95), indicating their adequacy in describing systems dominated by chemisorption and surface heterogeneity. In contrast, the Vermeulen, W&M, and Boyd's models exhibited relatively low predictive accuracy, highlighting their limited applicability to systems with complex adsorption mechanisms and heterogeneous biochar surfaces. In summary, this comparative evaluation not only confirms the robustness of advanced kinetic models like MO and PVSD but also showcases the practical utility and methodological innovation of integrating machine learning, specifically the Random Forest Regressor, into adsorption kinetic analysis. 
The RFR model complements traditional models by providing a nonlinear, multi-variable framework that adapts well to experimental variability and supports predictive applications in real-world environmental systems.
In addition to its high accuracy, the RFR model offers operational flexibility and predictive scalability, enabling it to estimate adsorption capacities Qe under untested or extrapolated experimental conditions. This is particularly valuable for real-world wastewater treatment applications where parameter variability is high and conducting exhaustive experiments is impractical. The feature importance analysis (Fig. S4, SI) reveals that contact time and initial Cr(VI) concentration (C0) are the most influential predictors of adsorption capacity, followed by pH and adsorbent mass. This finding aligns with the experimental conclusions that these variables play dominant roles in adsorption kinetics. Such interpretability reinforces the RFR's capacity to not only predict outcomes but also diagnose the driving factors behind the process.
The model fit comparison shown in Fig. S5, SI further confirms the robustness of the RFR model, with predicted Qe values closely matching experimental data across the entire kinetic range. Unlike parametric models that tend to overfit certain phases (e.g., initial or equilibrium), the RFR exhibits uniform performance throughout the kinetic profile.
Fig. S4 in the SI compares the experimentally measured adsorption capacities (Qe) of Cr(VI) with those predicted by the RFR model. The close alignment of the data points along the 45° reference line (ideal fit) indicates high predictive accuracy and minimal residual error across the entire adsorption range. Unlike traditional models that often deviate in the early or equilibrium phases, the RFR model demonstrates consistent performance across both low and high adsorption capacities. This confirms its robustness in modeling non-linear, multi-phase kinetic systems without requiring prior assumptions about adsorption mechanisms. Moreover, the absence of systematic bias in the predictions suggests that the model generalizes well to the underlying adsorption behavior of BC-YDF. These results reinforce the RFR's value as a reliable, interpretable, and scalable tool for predicting adsorption dynamics in complex environmental systems, particularly when experimental constraints limit the scope of kinetic testing.
To validate the generalization ability of the RFR model, the dataset was randomly split into 70% for training and 30% for testing. The model was retrained on the training set and then used to predict adsorption capacities (Qe) on the independent test set. As shown in Fig. S5 in the SI, the predicted values aligned closely with the experimentally measured ones, with an R2 of 0.934, RMSE of 1.33 mg g−1, and χ2 of 0.36 on the test data. This performance confirms that the RFR model generalizes well to unseen data, thereby demonstrating robustness and practical potential for real-world implementation. The high consistency between training and test results reduces concerns of overfitting and validates the use of RFR as a reliable prediction tool in data-driven adsorption modeling.
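The hold-out procedure described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper name `split_indices`, the seed, and the row count are hypothetical, and the tool actually used for the split is not stated in this section.

```python
import random


def split_indices(n_rows, test_fraction=0.30, seed=0):
    """Shuffle row indices reproducibly and return (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split can be repeated
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


# Hypothetical dataset size, chosen only for illustration.
train_idx, test_idx = split_indices(n_rows=50)
print(len(train_idx), len(test_idx))
```

The model would then be fitted on the `train_idx` rows only, and R2, RMSE, and χ2 would be computed on the `test_idx` rows, so that no test observation influences training.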
To benchmark the predictive performance of the RFR, we further evaluated three widely used machine learning models: support vector regression (SVR), gradient boosting regressor (XGBoost), and multi-layer perceptron (MLP). As presented in Table S7 in the SI, both RFR and XGBoost achieved superior predictive accuracy, with R2 values of 0.9335 and 0.9338, respectively, indicating strong generalization to unseen data. However, RFR yielded the lowest χ2 value (0.36) together with a competitive RMSE (1.33 mg g−1), suggesting that its residuals are slightly smaller relative to the predicted values. In contrast, SVR and MLP underperformed significantly, with R2 values below 0.53 and RMSE exceeding 3.5 mg g−1, indicating poor fit and limited ability to capture nonlinear adsorption kinetics. These results underscore the importance of model selection when applying AI to adsorption systems and support the choice of RFR as a reliable, interpretable, and high-performance model for Cr(VI) kinetic prediction.
The performance of the RFR model in this study compares favorably with previous research on adsorption modeling as shown in Table 2. For instance, Bahrami et al. (2024) used RFR to model methylene blue adsorption onto microplastics and reported an R2 of 0.957 and RMSE of 0.912 mg g−1. Similarly, Hassan and Kazemi (2025) applied RFR for organic pollutant adsorption onto resins and biochars and achieved an R2 of 0.961. In contrast, the present study achieved a higher R2 of 0.994 and lower RMSE of 0.454 mg g−1, indicating improved predictive capability. This superior performance can be attributed to the structured variable selection through ANOVA, the optimized biochar material (BC-YDF), and the targeted design of the experimental dataset. Compared to prior studies, the current work not only enhances model accuracy but also introduces a novel sustainable adsorbent, thereby broadening the environmental application scope of machine learning in adsorption kinetics.
| Study | Adsorbent | Target pollutant | R2 | RMSE (mg g−1) | χ2 | Remarks |
|---|---|---|---|---|---|---|
| This study (2025) | Young durian fruit biochar (BC-YDF) | Cr(VI) | 0.994 | 0.454 | 0.129 | High accuracy, robust validation |
| Bahrami et al. (ref. 11) | Microplastics | Methylene blue | 0.957 | 0.912 | — | Good fit but limited interpretability |
| Hassan & Kazemi (ref. 10) | Biochar + resin | Organics | 0.961 | ∼0.85 | — | Applicable to diverse pollutants |
| Solih et al. (ref. 9) | Fruit waste hydrochar | Heavy metals | 0.978 | ∼0.6 | — | Emphasis on XGBoost; limited on RFR |
To elucidate the adsorption mechanism encoded by the RFR beyond global importance, we decomposed the learned response using complementary interpretability techniques. Partial dependence (PD) and accumulated local effects (ALE) curves show a monotone decrease of Qe with increasing pH once pH exceeds the biochar's point-of-zero charge (pHpzc), with the steepest decline observed between one unit below and one unit above pHpzc; stratified ICE curves confirm that at pH < pHpzc, where the YDF biochar surface is positively charged, Qe is maximized, consistent with electrostatic attraction of anionic Cr(VI) species, whereas deprotonation above pHpzc weakens uptake. SHAP interaction plots further reveal that the adverse pH effect is amplified at higher ionic strength, and ALE surfaces for {pH, KCl} display a sub-additive ridge consistent with screening of outer-sphere interactions by background electrolyte; at near-neutral pH, increasing ionic strength yields a measurable but smaller depression in Qe, whereas at pH ≪ pHpzc the depression is strongest, supporting a dominant physisorption/electrostatic component under acidic conditions. Conversely, at extended contact times and/or higher dosages, PD slices flatten and ICE variability narrows, indicating progressive saturation of fast outer-sphere sites and a growing contribution from slower intraparticle diffusion and possible inner-sphere complexation on oxygen-containing functionalities identified by FTIR, which aligns with the competitive performance of diffusion-aware kinetic baselines in our comparisons. 
Together, these patterns, namely (i) a strong negative pH dependence around and above pHpzc, (ii) a pronounced ionic-strength penalty that is largest in the acidic regime, and (iii) attenuation of pH/ionic-strength sensitivity as time and dosage increase, are characteristic of an adsorption landscape in which outer-sphere physisorption governs initial uptake and is progressively complemented by transport-limited and site-specific interactions. All PD/ALE/ICE panels, SHAP interaction summaries, and counterfactual sensitivity analyses with 95% bootstrap bands are provided in the SI, and the figure captions explicitly connect these behaviors to mechanistic hypotheses grounded in the material's measured surface properties.
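To make the PD and ICE constructions concrete, the sketch below computes both for a single feature. The function `toy_predict` is an arbitrary stand-in for a fitted regressor, and its coefficients, the sample rows, and the pH grid are all invented for illustration; none of them come from the RFR trained in this work.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per observation: vary `feature` over `grid`, hold the rest fixed."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # overwrite only the feature of interest
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict, rows, feature, grid):
    """Partial dependence is the pointwise average of the ICE curves."""
    curves = ice_curves(predict, rows, feature, grid)
    return [sum(c[i] for c in curves) / len(curves) for i in range(len(grid))]


def toy_predict(row):
    """HYPOTHETICAL surrogate with a pH x KCl interaction (not the paper's model)."""
    return 30.0 - 2.0 * row["pH"] - 0.5 * row["pH"] * row["KCl"]


# Invented observations and grid, for illustration only.
rows = [{"pH": 3.0, "KCl": 0.0}, {"pH": 5.0, "KCl": 0.1}, {"pH": 7.0, "KCl": 0.2}]
grid = [2.0, 4.0, 6.0, 8.0]

pd_curve = partial_dependence(toy_predict, rows, "pH", grid)
print(pd_curve)
```

Because the toy function contains an interaction term, the individual ICE curves have different slopes, which is the kind of spread that the stratified ICE analysis inspects; the PD curve alone would hide it.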
• Phase I: Initial rapid uptake (0–30 min)
In the early phase, adsorption proceeds rapidly due to the availability of a high density of vacant and accessible active sites on the external surface of the biochar. During this period, Cr(VI) ions readily interact with functional groups such as –COOH, –OH, and aromatic π-systems, leading to a steep rise in Qe values. The predicted curve from the PSO model closely follows this sharp increase, indicating its ability to capture fast chemisorption-driven interactions.
• Phase II: Transition phase (30–150 min)
After 30 minutes, the rate of adsorption decreases gradually. This is attributed to partial occupation of active sites and increased steric hindrance as Cr(VI) ions begin to diffuse into internal pores. The PSO model maintains a high fitting accuracy in this range (RMSE = 1.06, R2 = 0.951), suggesting that the kinetic mechanism transitions into a combination of surface and pore diffusion, as also supported by the moderate fit of the IDF and Elovich models in this regime.
• Phase III: Equilibrium phase (>150 min)
Beyond 150 minutes, the system reaches near equilibrium where the net adsorption rate slows down significantly. This indicates that the majority of active sites are saturated or inaccessible, and adsorption–desorption dynamics begin to dominate. The equilibrium adsorption capacity approaches 30 mg g−1, which matches well with both experimental values and the predicted plateau by the PSO and PVSD models.
The clearly defined phases in the kinetic curve emphasize the multi-mechanistic nature of Cr(VI) removal on biochar. The excellent agreement between the PSO model and experimental data throughout all three phases further confirms that chemisorption rather than simple physisorption or purely pore-limited transport is the primary mechanism.
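The three-phase shape discussed above follows directly from the integrated PSO rate law, q_t = k2·qe²·t / (1 + k2·qe·t). The sketch below evaluates this expression and recovers its parameters from the standard linearized form t/q_t = 1/(k2·qe²) + t/qe. The linearized least-squares fit is a deliberately simple substitute and may differ from the fitting procedure used for Table 1; the rate constant and time points are hypothetical, and only the plateau of about 30 mg g−1 is taken from the text.

```python
def pso_uptake(t, q_e, k2):
    """Integrated pseudo-second-order model: q_t = k2*q_e^2*t / (1 + k2*q_e*t)."""
    return k2 * q_e ** 2 * t / (1.0 + k2 * q_e * t)


def fit_pso_linearized(times, uptakes):
    """Least-squares line through (t, t/q_t); requires t > 0 and q_t > 0.

    slope = 1/q_e and intercept = 1/(k2*q_e^2), hence k2 = slope^2 / intercept.
    """
    xs = list(times)
    ys = [t / q for t, q in zip(times, uptakes)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return 1.0 / slope, slope ** 2 / intercept


# Synthetic, noise-free curve. k2 and the time grid are HYPOTHETICAL.
times = [5.0, 15.0, 30.0, 60.0, 120.0, 150.0, 240.0]  # min
uptakes = [pso_uptake(t, q_e=30.0, k2=0.005) for t in times]

q_e_fit, k2_fit = fit_pso_linearized(times, uptakes)
print(q_e_fit, k2_fit)
```

With these illustrative parameters the curve rises steeply at short times and flattens toward the plateau, mirroring the rapid, transition, and equilibrium phases; on real, noisy data a nonlinear fit is generally preferable because the linearization distorts the error structure.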
While classical kinetic/isotherm models (PFO/PSO/Elovich, Weber–Morris, Boyd) correctly track the “linear-then-plateau” shape for a given set of conditions, their single-template structure does not, in general, capture how both the local slope and the saturation level co-vary with operating factors. In our experiments, the apparent initial rate and the onset of saturation shift with pH relative to pHpzc and are further modulated by ionic strength; increasing KCl advances the plateau and depresses Qe at low pH (electrostatic screening), whereas longer contact time and higher dosage partially attenuate this penalty, consistent with a mixed outer-sphere/transport-limited picture. Fitting one global parametric equation across all regimes leaves systematic residual patterns and inflated reduced χ2 despite heteroscedastic weighting, indicating structural misspecification rather than mere parameter scaling. We therefore adopt a Random Forest Regressor as a regime-aware surrogate that flexibly approximates the multivariate response surface f(pH, KCl, C0, dosage, t) → Q, trained under nested cross-validation with an external test set to preclude leakage. The RFR reduces out-of-sample RMSE and χ2 versus single-form fits pooled across regimes, while its PD/ALE and SHAP interactions recover the expected monotone decline of Q with pH above pHpzc and the strongest ionic-strength penalty in the acidic regime, and also quantify how time-dosage coupling flattens the pH sensitivity as fast sites saturate. 
In practice, the physics-based models remain indispensable for mechanistic interpretation at a fixed condition, whereas the RFR serves as a calibrated, transparent surrogate for multi-factor optimization and “what-if” design within the empirical domain. Accordingly, all kinetic derivations and fits are given in the SI, while the main text retains the cross-regime predictive evidence and interpretable response-surface diagnostics that justify the added value of the data-driven approach.
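A "what-if" search of this kind can be sketched as a grid evaluation restricted to the ranges covered by the experiments. In the example below, the surrogate `toy_surrogate`, its coefficients, and the bounds in `domain` are all hypothetical placeholders; in practice the prediction function of the trained regressor and the measured parameter ranges would be substituted.

```python
from itertools import product


def best_conditions(predict, domain, steps=5):
    """Evaluate `predict` on a regular grid inside `domain`; return the best point.

    `domain` maps each factor name to (low, high). Staying inside these bounds
    keeps the search within the empirical domain of the training data.
    """
    names = list(domain)
    axes = []
    for name in names:
        low, high = domain[name]
        axes.append([low + (high - low) * i / (steps - 1) for i in range(steps)])
    best_point, best_value = None, float("-inf")
    for combo in product(*axes):
        point = dict(zip(names, combo))
        value = predict(point)
        if value > best_value:
            best_point, best_value = point, value
    return best_point, best_value


def toy_surrogate(point):
    """HYPOTHETICAL response surface, standing in for a trained regressor."""
    return 30.0 - 2.0 * point["pH"] - 10.0 * point["KCl"] + 0.01 * point["t"]


# HYPOTHETICAL bounds, for illustration only.
domain = {"pH": (2.0, 8.0), "KCl": (0.0, 0.5), "t": (0.0, 240.0)}

best_point, best_value = best_conditions(toy_surrogate, domain)
print(best_point, best_value)
```

A regular grid is used here only for transparency; the same bounded-domain idea applies to any optimizer, and tree ensembles in particular should not be queried outside the ranges they were trained on.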
Importantly, this work is the first to leverage AI-driven regression for modeling the kinetic behavior of Cr(VI) adsorption on a sustainable, bio-based adsorbent derived from agricultural waste. The minimal data requirement and high generalizability of the RFR model make it particularly suited for practical applications in low-resource settings. By bridging data-driven learning and environmental engineering, this approach paves the way for intelligent, efficient, and scalable design of water treatment systems. Overall, the findings underscore the transformative potential of AI in kinetic modeling and sustainable material utilization, offering a robust framework for future advancements in environmental remediation science.
Supplementary information: The data include experimental results on the effects of pH, adsorption time, initial Cr(VI) concentration, material mass, and ionic strength on the adsorption process, as well as the integration of artificial intelligence with kinetic studies for Cr(VI) removal. See DOI: https://doi.org/10.1039/d5ra05229g.
This journal is © The Royal Society of Chemistry 2025