Open Access Article
Thi Duyen Vu ab, Thanh Dam Nguyen c, Thi Kim Trang Pham d, Michael Berg e and Hung Viet Pham *a
aKey Laboratory of Analytical Technology for Environmental Quality and Food Safety Control, VNU University of Science, Vietnam National University, Hanoi, 334 Nguyen Trai, Thanh Xuan Ward, Hanoi 100000, Vietnam. E-mail: vietph@vnu.edu.vn
bGraduate University of Science and Technology, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet, Nghia Do Ward, Hanoi 100000, Vietnam
cFaculty of Chemistry, VNU University of Science, Vietnam National University, Hanoi, 334 Nguyen Trai, Thanh Xuan Ward, Hanoi 100000, Vietnam
dResearch Center for Environmental Technology and Sustainable Development, VNU University of Science, Vietnam National University, Hanoi, 334 Nguyen Trai, Thanh Xuan Ward, Hanoi 100000, Vietnam
eEawag, Swiss Federal Institute of Aquatic Science and Technology, Department Water Resources and Drinking Water, 8600 Dübendorf, Switzerland
First published on 9th December 2025
Groundwater quality in rapidly urbanising megacities such as Hanoi is increasingly threatened by over-extraction and widespread contamination. Despite heavy reliance on groundwater as the primary water source, its quality has rarely been assessed comprehensively and objectively. This study proposes a machine learning-based approach for evaluating groundwater quality in Van Phuc, where aquifers are affected by intensive exploitation and arsenic pollution. The Extreme Gradient Boosting (XGBoost) algorithm was employed to rank and select parameters according to their importance to overall water quality. Among the eleven input indicators, eight parameters (As, total hardness, Mn, Na, Cl−, NH4+, Fe, and F−) showed substantial contributions, with As identified as the most influential variable, whereas pH, SO42−, and total dissolved solids (TDS) contributed negligibly. Four aggregation functions were employed to compute the overall groundwater quality index (GWQI), and the National Sanitation Foundation (NSF) model yielded the most consistent and reliable classification for the study area. Application of the developed framework indicated that only one sample exhibited good water quality, while the remainder fell into fair (41.4%), marginal (20.7%), and poor (34.5%) categories. Spatial water quality patterns were closely aligned with hydrogeochemical zonation: poor-to-marginal conditions predominated within the Holocene aquifer, improving with depth across the redox transition zone, and generally achieving better quality in the Pleistocene aquifer. The proposed approach provides a transparent and transferable tool for groundwater quality assessment in stressed urban aquifers, reducing subjectivity while enhancing interpretability and supporting evidence-based water management.
Environmental significance
Groundwater in rapidly urbanising regions such as Hanoi is increasingly threatened by overextraction and arsenic contamination, posing long-term risks to public health and sustainable water management. Conventional groundwater quality indices often rely on subjective expert judgment and fail to capture nonlinear interactions among hydrochemical parameters. This study introduces a transparent and transferable machine learning-based framework using the XGBoost algorithm to objectively evaluate groundwater quality in arsenic-affected aquifers. The model identifies key parameters controlling contamination and delineates spatial patterns consistent with hydrogeochemical processes. The findings provide an interpretable, data-driven tool that enhances groundwater quality assessment, supports evidence-based management, and contributes to the broader goal of safeguarding urban water resources under environmental stress.
Hanoi, a megacity of nearly 8.7 million inhabitants in 2024, relies heavily on deep aquifers. In 2017, exploited groundwater from these aquifers reached 1 million m3 day−1, supplying roughly 70% of the city's domestic demand.6,7 Over a century of overextraction has resulted in significant drawdown, created cones of depression, and altered regional flow directions.6,8 Concurrently, widespread contamination by arsenic, iron, manganese, and ammonium has also been reported.2,9–12 The combination of depletion and contamination highlights an urgent need for quantitative, interpretable, and scalable groundwater quality assessment tools to support sustainable urban water management.
Assessing water quality, especially in complex hydrogeochemical regions, is challenging because of the volume, variability, and heterogeneity of monitoring data.13 The water quality index (WQI), initially developed in the 1960s, remains widely used to integrate multiple chemical indicators into a single score representing overall water quality status. Its simplicity and transparency make it effective for both scientific interpretation and public communication.1,13,14 However, traditional WQI models rely heavily on expert judgment and locally defined weighting schemes, which introduce subjectivity and overlook nonlinear interactions among parameters, thereby reducing the objectivity and reliability of the assessment and limiting generalizability.1,13–15
Recent advances in data science, particularly machine learning (ML), provide promising alternatives to overcome the inherent limitations of conventional water quality assessment methods. ML-based frameworks can learn nonlinear relationships directly from data, improving predictive accuracy, enhancing objectivity, and reducing uncertainty in water quality assessment.1,14,16–19 Numerous studies demonstrate that ML algorithms such as Random Forest, Gradient Boosting, Support Vector Machines, and Neural Networks outperform traditional regression and index-based methods for handling heterogeneous hydrochemical datasets.20 These models not only provide predictive accuracy but also yield interpretable feature-importance metrics that help identify key hydrochemical drivers influencing water quality. Advances in data-driven modelling have also shown that machine learning and multivariate statistical approaches can capture complex, non-linear relationships between water quality indicators in river and lake systems.21–23 Although ML models have been widely used for water quality assessment, they also have certain weaknesses. They are often trained on relatively small and site-specific datasets, which makes them vulnerable to overfitting and limits their robustness and interpretability.24,25 In addition, data-driven WQI frameworks may inherit substantial uncertainty from subjective or site-dependent choices such as indicator selection, sub-index functions, weighting schemes and aggregation rules, so quantifying and communicating model uncertainty is essential if such models are to be used in water-management decisions.26
Among these approaches, Extreme Gradient Boosting (XGBoost) has emerged as a robust and interpretable ensemble algorithm for environmental applications. Many studies have demonstrated that XGBoost provides high accuracy, models complex nonlinear relationships, limits overfitting through regularisation, and is robust to incomplete datasets.14,19,27–29 These features make it effective for complex hydrochemical datasets that are nonlinear, sparse, and influenced by coupled physical-chemical processes. Moreover, XGBoost offers feature-importance rankings that enable transparent interpretation of how each water-quality parameter influences the model output, an essential advantage for risk communication and evidence-based groundwater management.1,19,28,29
In Vietnam, the application of ML for water quality assessment has expanded rapidly in recent years to assess and forecast water quality in both surface and groundwater systems. For example, Khoi et al.,30 evaluated twelve ML algorithms for predicting the WQI of the La Buong River (Southeast Vietnam), where XGBoost outperformed others. Le et al.,31 demonstrated the utility of ML classifiers in capturing spatiotemporal variability in the Song Quao-Ca Giang water system, helping to establish a methodological foundation for subsequent WQI-focused studies. Lap et al.,32 combined feature selection and ML in the An Kim Hai irrigation network (Hai Phong) to maintain high predictive accuracy while reducing data requirements. Nguyen et al.,33 applied Bayesian model averaging with gradient-boosting regressors in Red River Delta irrigation systems, achieving R2 ≈ 0.96 with minimal input parameters. Additionally, Nguyen et al.,34 used ML to predict groundwater quality in Quang Tri's coastal aquifer, confirming the model's relevance for subsurface applications. These studies highlight the growing maturity of ML-based WQI modelling in Vietnam; however, few have focused on Hanoi's deep aquifers, where chronic overextraction and arsenic contamination converge, leaving a crucial research and management gap.
Despite growing global adoption, most ML-WQI studies still focus on surface waters or require extensive datasets that are rarely available for groundwater systems. Few have targeted groundwater in Southeast Asian megacities, where aquifers face compounded threats from urbanisation and geogenic contamination. Addressing this gap is crucial to advancing data-driven groundwater management and understanding the nonlinear interplay between hydrochemical parameters in such vulnerable aquifer systems.
In this study, we develop and apply an ML-based WQI framework to evaluate groundwater quality in Van Phuc, Thanh Tri District, Hanoi, a site representative of aquifers affected by massive abstraction and widespread groundwater contamination. The XGBoost algorithm is employed for its robustness, interpretability, and ability to model nonlinear hydrochemical interactions.14,16,19,27–29 Specifically, we aim to (1) identify key hydrochemical indicators influencing groundwater quality through feature-importance analysis; (2) compute an ML-based groundwater quality index (GWQI) to quantify groundwater quality in Van Phuc; and (3) explore spatial variation in groundwater quality in Van Phuc.
We hypothesise that ML-based feature selection will provide a more objective, transferable framework than conventional WQI approaches. The proposed method integrates environmental chemistry insight with data-driven modelling, offering a scalable, interpretable, and evidence-based tool to support sustainable groundwater management in Hanoi and comparable urban environments globally.
Over the past century, intensive groundwater abstraction from deep aquifers in Hanoi has led to substantial drawdown, reversing the flow direction at Van Phuc village. Groundwater now flows from southeast to northwest toward Hanoi at a rate of 40 m year−1, a pattern that has been sustained over the last 50–60 years.8,10,37 As a result, the Red River acts as the primary recharge source for both aquifers within a 5 km corridor, raising concern that contaminated water from the Holocene aquifers may migrate downward into the underlying, previously uncontaminated Pleistocene aquifer.8,10 The geological and hydrological conditions and geographical location make this area ideal for studying groundwater quality and biogeochemical characteristics. Our research group has been investigating biogeochemical characteristics in the study area for an extended period. However, no studies have used machine learning to assess groundwater quality and derive the WQI.
At the scale of the entire Red River Delta, large-scale surveys and probabilistic mapping by Winkel et al.,2 show that a substantial fraction of domestic tubewells abstract groundwater with As concentrations above the World Health Organisation (WHO) guideline value of 10 µg L−1, with the highest probabilities occurring along the banks of the Red River and in parts of the southwestern delta. Van Phuc is situated within one of these arsenic-affected zones and is well known for its elevated arsenic concentration in groundwater.2,10
The chemical composition of groundwater was analysed at the VNU Key Laboratory of Analytical Technology for Environmental Quality and Food Safety Control (KLATEFOS), VNU University of Science. Major cations (Na, K, Ca and Mg), as well as total iron (Fe) and manganese (Mn), were quantified using atomic absorption spectroscopy (AA-6800, Shimadzu, Japan). Total arsenic (As) concentration was determined by hydride generation atomic absorption spectrometry (HG-AAS) on the same instrument. Internal reference materials (ARS, certified by Eawag, Switzerland) were analysed concurrently for quality control. Ammonium (NH4+) and phosphate (PO43−) ions were measured using a UV-Vis spectrophotometer (UV-1800, Shimadzu, Japan). Anions (F−, Cl−, SO42−, NO2− and NO3−) were determined by ion-exchange chromatography (HIC-20A super, Shimadzu, Japan), employing the Shimadzu PIA reference material to ensure analytical reliability. All measurements were performed under calibration conditions with R2 values above 0.995. The relative standard deviation (RSD) of triplicate analyses remained below 5%, and recovery rates for reference materials consistently ranged between 90% and 110%, confirming the precision and accuracy of the analysis.
![]() | (1) |
| TDS = 0.67 × EC | (2) |
Compared to previous publications, the most challenging aspect of developing the XGBoost model in this study was handling data imbalance. Among the 29 Van Phuc groundwater samples, only one (VP25) was classified as PS = 0. This extreme imbalance between the PS = 0 and PS = 1 classes makes ML-based classification impractical. Additional PS = 0 samples could not be obtained from the study area, as the likelihood of finding new uncontaminated samples was low given the extensive contamination. Synthetic oversampling techniques such as SMOTE were also infeasible because only a single minority-class sample was available. To mitigate this issue, seven PS = 0 groundwater samples from other Hanoi sites (collected during the same sampling period) were added to the dataset used to build the model. Consequently, the full groundwater quality dataset used in this study includes 29 samples collected from monitoring wells in Van Phuc village in April 2019 and 7 additional tubewell samples from other sites in Hanoi that met all Vietnamese groundwater quality guidelines (Table S2). These additional samples were gathered during the same campaign, using identical sampling procedures, and were analysed with the same analytical methods as those collected from Van Phuc.
Consistent with the approaches of Uddin et al.,14 and Wang et al.,1 physicochemical parameters with values below the method detection limits were assigned a value of 0 prior to model training. The complete dataset (n = 36, Table S2) was then normalised using min–max scaling to minimise the influence of differing measurement ranges (e.g., high concentrations of TDS compared with trace levels of As). The dataset was subsequently divided into training and testing datasets (80/20 split), with stratified 5-fold cross-validation applied under the constraint that each training and test set included at least two samples with PS = 0. Additionally, the single PS = 0 sample from the original Van Phuc dataset was consistently allocated to the test set to evaluate model discriminability directly. Three widely used ML algorithms, namely decision tree (DT), random forest (RF), and XGBoost, were compared to identify the most suitable approach for classifying groundwater quality in the study area. For initial screening purposes, each algorithm was executed with a simple hyperparameter optimisation (Table S3). Evaluation metrics included the area under the receiver-operating characteristic curve (AUC), logloss, accuracy, sensitivity, and specificity. As shown in Table S4, the DT model proved inadequate, with the lowest AUC (0.5) and the highest logloss (0.693). The RF algorithm performed robustly, achieving a lower prediction logloss (0.299) than XGBoost (0.351), indicating strong probability calibration. However, for the specific purpose of identifying unpolluted samples without error, XGBoost was superior, achieving 100% accuracy, sensitivity, and specificity on the test set compared to slightly lower metrics for RF. Given the priority of avoiding false positives in an imbalanced groundwater quality dataset, combined with its high interpretability, XGBoost was therefore selected as the most appropriate algorithm for computing the GWQI in the Van Phuc area, Hanoi.
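The DT screening values quoted above (AUC = 0.5, logloss = 0.693) coincide with what an uninformative classifier would produce, since 0.693 ≈ ln 2. The minimal sketch below illustrates this with plain Python; the labels are hypothetical, the helper names `log_loss` and `auc` are illustrative rather than taken from the study, and the reading that the DT model returned a constant probability of 0.5 is an assumption, not something the source states.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def auc(y_true, p_pred):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test labels (PS = 0 or 1); not the study's data.
y = [0, 0, 1, 1, 1, 1, 1, 1]
constant = [0.5] * len(y)  # a model that cannot discriminate at all

baseline_logloss = log_loss(y, constant)  # equals ln 2 whatever the labels are
baseline_auc = auc(y, constant)           # every pair is tied
print(round(baseline_logloss, 3), baseline_auc)
```

Any model with a logloss below this baseline, such as the RF and XGBoost values reported above, is assigning probabilities that carry at least some information about the class.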
Hyperparameters for the XGBoost model were tuned via grid search (Table S5) across 24 200 combinations of learning rate (0.01–0.10), max_depth (1–10), subsample (0.50–0.95), colsample_bytree (0.50–0.95), and scale_pos_weight (0.25 and 1), with min_child_weight and gamma fixed at 1 and 0, respectively. The number of boosting rounds (nrounds) was fixed at 1000, with early stopping implemented after 50 rounds. Unlike Uddin et al.,14 model performance was evaluated using logloss and AUC, which are more suitable metrics for binary classification than root mean square error (RMSE).47 Optimal hyperparameters were chosen to maximise AUC and minimise logloss on the test set, while ensuring correct classification of the original PS = 0 sample.
After hyperparameter optimisation, the model was retrained on the original 29-sample Van Phuc dataset, and feature importances for the 11 indicators were extracted. These indicators were ranked by importance, and their weights were calculated using the rank order centroid (ROC) method, as described by Uddin et al.14 Sub-index values for indicators with non-zero weights were then derived using eqn (3)–(5) based on their condition as shown in Table 1.
![]() | (3) |
![]() | (4) |
![]() | (5) |
Finally, the indicator weights and sub-indices were aggregated to produce an overall GWQI score. Four functions, namely the National Sanitation Foundation (NSF) index, weighted quadratic mean (WQM), Scottish Research Development Department (SRDD) index, and West Java (WJ) index, were applied for comparison, using eqn (6)–(9) where relevant; NSF and WQM were reported by Uddin et al.,14 as performing best in their context.
![]() | (6) |
![]() | (7) |
![]() | (8) |
![]() | (9) |
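As a minimal sketch, the four aggregation functions can be written as below. The forms used are the ones commonly cited for these indices and are an assumption here: a weighted sum for NSF, the square root of the weighted sum of squared sub-indices for WQM, the squared weighted sum divided by 100 for SRDD, and a weighted product for WJ. The symbols `s` (sub-indices on a 0–100 scale) and `w` (weights summing to about 1), the function names, and the example sub-index values are all hypothetical.

```python
import math

def nsf(s, w):
    """Weighted arithmetic sum of sub-indices (assumed NSF form)."""
    return sum(si * wi for si, wi in zip(s, w))

def wqm(s, w):
    """Weighted quadratic mean of sub-indices (assumed WQM form)."""
    return math.sqrt(sum(wi * si ** 2 for si, wi in zip(s, w)))

def srdd(s, w):
    """Squared weighted sum scaled back to 0-100 (assumed SRDD form)."""
    return nsf(s, w) ** 2 / 100.0

def wj(s, w):
    """Weighted geometric product (assumed WJ form); zero if any sub-index is zero."""
    return math.prod(si ** wi for si, wi in zip(s, w))

# Weights for the eight ranked indicators; sub-indices are invented for illustration.
w = [0.3397, 0.2147, 0.1522, 0.1106, 0.0793, 0.0543, 0.0335, 0.0156]
s = [0.0, 60.0, 90.0, 95.0, 98.0, 10.0, 20.0, 80.0]

scores = {"NSF": nsf(s, w), "WQM": wqm(s, w), "SRDD": srdd(s, w), "WJ": wj(s, w)}
for name, value in scores.items():
    print(f"{name}: {value:.1f}")
```

Under these assumed forms, a single sub-index of zero forces the WJ score to zero, the SRDD score can never exceed the NSF score, and the WQM score can never fall below it. The squaring in SRDD also means an NSF score of 83 maps to roughly 69.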
In contrast, a significant proportion of the samples exhibited elevated levels of As, Fe, Mn, and NH4+, with exceedance rates of approximately 55%, 62%, 62%, and 79%, respectively, indicating widespread contamination (Fig. 2). These findings are consistent with earlier studies in Van Phuc10,35,37 and with reports of similar contamination patterns elsewhere in the Red River Delta.2,9,11,12
The widespread contamination observed in the study area, particularly the excessive As concentrations exceeding the World Health Organisation (WHO) and Vietnamese standard guideline value of 10 µg L−1 for drinking water, indicates that the groundwater is unsuitable for domestic use without prior treatment. Direct consumption of groundwater containing elevated levels of toxic metals such as As, Fe, and Mn poses serious health risks, including cancer and neurological disorders.41–45
The model was retrained on the original 29-sample dataset to extract feature importances, which determined indicator ranks and weights. Of the 11 parameters, eight exhibited non-zero importance (Table 2), encompassing all parameters that breach guideline values in at least one sample (As, Mn, Fe, NH4+, F−, and HN), whereas pH, SO42−, and TDS contributed negligibly. Among the eight retained indicators, As was associated with the most frequent exceedances and emerged as the most influential variable. Notably, NH4+ displayed relatively low importance despite its prominence as a pollutant in the study area; in contrast, a comparable model by Wang et al.,1 identified NH4+ as the primary predictor, highlighting potential dataset-specific variations. Similarly, Uddin et al.,14 reported dissolved oxygen (DOX, in summer) and molybdate reactive phosphorus (MRP, in winter) as key pollutants that were not influential in their seasonal models. The weights computed from these importance ranks are summarised in Table 2.
| Feature | Importance | Rank | ROC weight |
|---|---|---|---|
| As | 0.3994 | 1 | 0.3397 |
| HN | 0.1649 | 2 | 0.2147 |
| Mn2+ | 0.1153 | 3 | 0.1522 |
| Na+ | 0.1052 | 4 | 0.1106 |
| Cl− | 0.0840 | 5 | 0.0793 |
| NH4+ | 0.0571 | 6 | 0.0543 |
| Fe | 0.0452 | 7 | 0.0335 |
| F− | 0.0290 | 8 | 0.0156 |
| pH | 0 | — | 0 |
| SO42– | 0 | — | 0 |
| TDS | 0 | — | 0 |
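
The ROC weights in Table 2 can be reproduced from the ranks alone. The sketch below uses the standard rank order centroid formula, w_i = (1/n) Σ_{k=i..n} 1/k, with n = 8 retained indicators; the function name `roc_weights` is illustrative and not taken from the study.

```python
def roc_weights(n):
    """Rank order centroid weights for ranks 1..n (rank 1 = most important)."""
    return [sum(1.0 / k for k in range(i, n + 1)) / n for i in range(1, n + 1)]

# Indicators in the rank order given in Table 2.
ranked = ["As", "HN", "Mn2+", "Na+", "Cl-", "NH4+", "Fe", "F-"]
weights = dict(zip(ranked, roc_weights(len(ranked))))

for name, weight in weights.items():
    print(f"{name}: {weight:.4f}")
```

Because the weights depend only on rank order, the large gap in raw importance between As (0.3994) and the second-ranked indicator (0.1649) is compressed, and indicators with zero importance are excluded before n is fixed.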
Sub-index values (si) were calculated for the eight indicators, where si = 0 represents the poorest quality and si = 100 represents the highest quality. The four most frequently non-compliant parameters (As, Fe, Mn, NH4+) showed si ranges of 0–100, with mean values of 39.7 for As and 11.2 for NH4+. For F− and HN, si ranged from 0–93.0 and 0–77.4, respectively, while the remaining two indicators had si values between 74.4 and 98.7.
Composite indices were computed using the NSF, WQM, SRDD, and WJ indices. While NSF, SRDD, and WJ are popular established approaches, WQM was recently introduced by Uddin et al.14 The WJ index was unsuitable for the Van Phuc groundwater, resulting in zero scores for all samples except sample VP25. For the other indices, GWQI score ranges were NSF 16–83, WQM 36–86, and SRDD 3–69, with respective means of 44, 59, and 23 (Fig. 3). The maximum scores across all three indices were achieved for the single PS = 0 sample (83 for NSF, 86 for WQM, and 69 for SRDD). In contrast, the minima were associated with sample VP29, which exhibited the highest number of exceedances (5 out of 11): 16 for NSF, 36 for WQM, and 3 for SRDD.
Spearman's rank correlations revealed strong agreement among the three methods (r = 0.972–0.998, p < 0.001). However, index scores varied significantly according to Friedman's two-way analysis (Q2,29 = 58.0, p < 0.001): WQM generally produced higher scores than NSF (ZNSF–WQM = −3.808, p < 0.001), while SRDD yielded the lowest scores (ZSRDD–NSF = 3.808, p < 0.001; according to Wilcoxon signed-rank post-hoc with Bonferroni correction).
The most appropriate index among NSF, WQM, and SRDD was selected by evaluating the frequency of over- and under-estimates relative to compliance-based quality classes. Water-quality status was classified as: good (no exceedances; WQI 80–100), fair (1–2 exceedances; WQI 50–79), marginal (3 exceedances; WQI 30–49), and poor (≥4 exceedances; WQI 0–29).14 WQM misclassified 15 samples, 14 of which were overestimates, whereas SRDD misclassified 19 samples, 18 of which were underestimates. NSF generated the fewest misclassifications (n = 5; 2 overestimates, 3 underestimates) and was thus considered the most suitable for the study area. This selection contrasts with Uddin et al.,14 where WQM was favoured, likely due to differences between coastal waters and groundwater systems. As elaborated earlier, several parameters exceeding guideline values were deemed non-influential in Uddin et al.14
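The comparison described above can be sketched as two small classifiers, one driven by the index score and one by the number of guideline exceedances, followed by a check of whether the score-based class sits above or below the compliance-based class. The function names are hypothetical, and the handling of scores that fall between the integer class bounds (for example 79.5) is an assumption; the example values are the scores reported for samples VP25 and VP29.

```python
ORDER = ["poor", "marginal", "fair", "good"]  # worst to best

def class_from_score(score):
    """Quality class from an index score on a 0-100 scale."""
    if score >= 80:
        return "good"
    if score >= 50:
        return "fair"
    if score >= 30:
        return "marginal"
    return "poor"

def class_from_exceedances(n_exceed):
    """Compliance-based class from the number of parameters above guideline values."""
    if n_exceed == 0:
        return "good"
    if n_exceed <= 2:
        return "fair"
    if n_exceed == 3:
        return "marginal"
    return "poor"

def compare(score, n_exceed):
    """Return 'match', 'overestimate' or 'underestimate' for one sample."""
    diff = (ORDER.index(class_from_score(score))
            - ORDER.index(class_from_exceedances(n_exceed)))
    if diff > 0:
        return "overestimate"
    if diff < 0:
        return "underestimate"
    return "match"

# VP25 has no exceedances; VP29 has 5 exceedances.
print(compare(83, 0), compare(69, 0))  # NSF and SRDD scores for VP25
print(compare(16, 5), compare(36, 5))  # NSF and WQM scores for VP29
```

Counting the non-matching outcomes over all 29 samples for each index gives the misclassification totals used to choose between NSF, WQM, and SRDD.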
Of the 29 samples in Van Phuc, only one was classified as good (GWQI = 83), twelve (41.4%) as fair, and the remaining 55.2% as marginal and poor. These results underscore substantial groundwater contamination in Van Phuc, emphasising the necessity for enhanced monitoring and management strategies to safeguard public health.
From a hydrogeochemical perspective, the feature-importance ranking obtained from XGBoost is consistent with the current understanding of arsenic mobilisation in the study area. The dominant role of As as the top-ranked feature is consistent with its high exceedance frequency and toxicity. Arsenic is released from sediment into groundwater through the reductive dissolution of Fe oxyhydroxides in the aquifer systems, supported by reducing bacteria, when natural organic matter (NOM) is sufficiently available.10,12,37 The second-ranked indicator, hardness, acts as an integrated proxy for Ca–Mg carbonate equilibria and mixing between river-derived recharge and deeper Pleistocene groundwater, both of which control contaminant transport pathways.39 The high importance of Mn2+ reflects the progression of redox reactions towards more reducing conditions that may lead to the further mobilisation of As.37 Na+ and Cl− likely capture mixing between young river-derived recharge and more mineralised, partly saline or anthropogenically impacted groundwater.38,48 Although NH4+ is a widespread pollutant in the study area, its moderate importance suggests that it indirectly reflects reducing conditions, which is coherent with the release of As in the Holocene aquifer as mentioned above. The minor contributions of Fe and F− further support the role of redox-controlled mobilisation of iron oxyhydroxides and water–sediment interactions in shaping groundwater quality in Van Phuc. The negligible contributions of pH, SO42− and TDS in the model reflect their relatively limited variability within the dataset and their compliance with the guideline ranges.
Although a novel and reliable approach for assessing groundwater quality at the Van Phuc site in Hanoi has been developed, several limitations remain that warrant attention and may be addressable in future work to broaden applicability to other regions in Vietnam and globally. First, the XGBoost model was built on a relatively small and imbalanced dataset. While supplementing the dataset with additional samples from hydrogeochemically similar sites elsewhere in Hanoi to increase the minority class (PS = 0) partly mitigates this limitation, this workaround is cumbersome and may be difficult to implement for datasets from other regions. It should be emphasised that Vietnam's national technical standards for groundwater are stringent, and obtaining uncontaminated samples, i.e., samples meeting all specified technical criteria, is particularly challenging in heavily stressed aquifer systems such as Hanoi, where both geochemical processes and intensive anthropogenic exploitation impact groundwater. Among over 100 samples collected from other locations in Hanoi during the same sampling campaign at Van Phuc, only seven samples could be classified as PS = 0 for incorporation into the dataset. Although such compliant samples may be more readily available in less impacted regions beyond Hanoi, incorporating samples from geographically distant sites may introduce regional bias. Alternatively, data augmentation algorithms can be employed to generate minority-class samples artificially; however, this strategy requires sufficiently large original datasets to ensure the generation of realistic synthetic data.
A more promising approach would involve shifting from binary classification (PS = 0 versus PS = 1) to multiclass classification, wherein PS = 0 represents samples meeting all national standards, PS = 1 represents samples with one or two parameter exceedances, PS = 2 represents samples with three exceedances, and PS = 3 represents samples with four or more exceedances. Multiclass classification would reduce the severity of dataset imbalance and would eliminate the need for augmented PS = 0 samples.
A second limitation is that this assessment is based on a single sampling campaign conducted in April 2019 and does not account for seasonal or inter-annual variability in groundwater quality. Temporal fluctuations may alter redox conditions and contaminant concentrations, potentially affecting the classification of indicators. Future studies could therefore incorporate time-series monitoring to quantify seasonal variability in the GWQI and to better characterise the robustness of the framework across different hydrogeochemical conditions.
Fig. 4 Spatial distribution of groundwater quality along the Van Phuc transect based on NSF model. Each symbol represents a monitoring well positioned according to its location along the transect, colours indicate groundwater quality classes. The violet arrow represents the dominant groundwater flow direction, along which most monitoring wells are located. The letters B–E show the major hydro(geo)chemical zones as proposed by Stopelli et al.,37 (2020): B = Holocene aquifer near the river; C = Holocene aquifers; D = redox transition zone (RTZ); and E = Pleistocene aquifers.
Zone B showed consistently poor groundwater quality due to the reductive dissolution of Fe-bearing minerals, which simultaneously mobilises arsenic.37 Furthermore, recharge from surface water and the Red River may introduce additional organic matter, enhancing the aquifer's reducing conditions and promoting further As release. Consequently, groundwater in Zone B remains of poor quality and effectively acts as a pollution corridor, facilitating the migration of arsenic from the river into deeper sections of the Holocene aquifer.
Zone C was the most severely impacted area,37 with strong Fe(III) reduction and methanogenic conditions producing the highest As, Fe, and NH4+ concentrations. Despite this, Mn remained low (0–0.27 mg L−1) in most wells, suggesting secondary precipitation, which moderated water quality to marginal in most wells rather than poor as in Zone B.
Within Zone D, the redox transition zone (RTZ), groundwater quality improves markedly with depth. Shallow wells (<30 m) exhibited poor GWQI values, while deeper wells (>30 m) achieved fair quality. Along the flow path, As- and Fe-rich water from Zone C was attenuated through coprecipitation with Fe oxyhydroxides, whereas Mn minerals were reduced, increasing dissolved Mn2+ concentrations.37 The pronounced reduction in As and Fe in deeper RTZ wells significantly improved GWQI values. This finding underscores the RTZ's dual role as a hydro(geo)chemical boundary and a reactive “filter” attenuating contaminant transport and improving groundwater quality.
Groundwater quality in Zone E (Pleistocene aquifer) was generally better than in the Holocene system. Most wells displayed fair GWQI values; the only well achieving “good” status (at 38 m depth) was also located here. Stable Fe(III) oxyhydroxides in this zone35,37 serve as an efficient arsenic sink, maintaining low As concentrations and stable water quality. Nonetheless, moderate dissolved Mn and NH4+ levels indicate ongoing redox processes, which may influence long-term As dynamics, particularly under intensive groundwater abstraction.
Overall, the spatial distribution of GWQI at Van Phuc reflects the coupled processes of arsenic mobilisation and attenuation within the Holocene–Pleistocene aquifer system. Zones B and C function as As source areas with poor to marginal GWQI, Zone D acts as a transition zone where contaminant attenuation enhances water quality, and Zone E represents a relatively stable aquifer with generally better GWQI. These results highlight the necessity of integrating groundwater quality assessment with stratigraphic architecture and hydro(geo)chemical processes. Furthermore, they emphasise the importance of groundwater treatment prior to use and the urgent need for sustainable management policies, as excessive pumping of deeper aquifers could exacerbate downward contaminant migration from the Holocene into the Pleistocene aquifers.
Applying the developed approach to 29 wells in Van Phuc, the computed overall GWQI indicated that only one sample was classified as “good”. In contrast, the remaining samples were categorized as “fair” (41.4%), “marginal” (20.7%) and “poor” (34.5%). The spatial distribution of groundwater quality strongly correlated with hydrogeochemical zonation: poor to marginal water quality predominantly occurred within the Holocene aquifer, improving with depth across the redox transition zone, and generally exhibiting better quality within the Pleistocene aquifer.
These findings highlight that the proposed approach effectively reduces the subjectivity inherent in traditional methods while providing an interpretable and management-oriented framework for evaluating groundwater quality in stressed aquifer systems. Nevertheless, the present study is constrained by a limited and imbalanced dataset, as it relies on a single sampling campaign and includes a small number of good-quality samples from outside the study area to support model development. Future work should expand the spatial and temporal coverage, integrate additional redox-sensitive indicators, conduct formal uncertainty analyses and evaluate model transferability to other areas in Hanoi and comparable megacities to enhance robustness and generalizability.
Although this study focused on arsenic-contaminated aquifers, the proposed framework is generic and could be readily adapted to other contaminants of concern (e.g., fluoride, nitrate, salinity or emerging pollutants) by redefining the pollution status and indicator set. With appropriate local guidelines, the same ML-based GWQI approach could also be extended to arsenic-affected deltas elsewhere in South and Southeast Asia, thereby enhancing comparability across regions, because hydrogeochemical properties are similar across the large region shaped by the geology and palaeogeology of the southern Himalaya (e.g., West Bengal in India, the Ganges River Delta in Bangladesh, and especially the transboundary Mekong Delta).
Therefore, appropriate treatment methods are essential to reduce pollutant concentrations before groundwater in the study area is used for any purpose. Among modern mitigation approaches, sand filtration systems represent a practical, low-cost, and accessible technology capable of effectively removing contaminants, particularly heavy metals such as As, Fe, and Mn from groundwater.53–57
From the management perspective, the ML-based GWQI framework developed here could be used as a screening tool to prioritise wells for detailed monitoring and treatment, especially in the Holocene and shallow RTZ zones where poor and marginal water quality dominate. In parallel, controlling abstraction from the Pleistocene aquifer and promoting low-cost household and community-scale treatment (e.g., sand filtration) would reduce exposure risks while safeguarding the long-term integrity of the Red River Delta groundwater resources.
This journal is © The Royal Society of Chemistry 2026