Evaluating groundwater quality in an arsenic-contaminated aquifer in the Red River Delta using machine learning: a case study in Van Phuc, Hanoi, Vietnam
Abstract
Groundwater quality in rapidly urbanising megacities such as Hanoi is increasingly threatened by over-extraction and widespread contamination. Despite heavy reliance on groundwater as the primary water source, its quality has rarely been assessed comprehensively and objectively. This study proposes a machine learning-based approach for evaluating groundwater quality in Van Phuc, where aquifers are affected by intensive exploitation and arsenic pollution. The Extreme Gradient Boosting (XGBoost) algorithm was employed to rank and select parameters according to their importance to overall water quality. Of the eleven input indicators, eight (As, total hardness, Mn, Na, Cl⁻, NH₄⁺, Fe, and F⁻) contributed substantially, with As identified as the most influential variable, whereas pH, SO₄²⁻, and total dissolved solids (TDS) contributed negligibly. Four aggregation functions were used to compute the overall groundwater quality index (GWQI); of these, the National Sanitation Foundation (NSF) model yielded the most consistent and reliable classification for the study area. Application of the developed framework indicated that only one sample exhibited good water quality, while the remainder fell into the fair (41.4%), marginal (20.7%), and poor (34.5%) categories. Spatial water quality patterns were closely aligned with hydrogeochemical zonation: poor-to-marginal conditions predominated within the Holocene aquifer, quality improved with depth across the redox transition zone, and the Pleistocene aquifer generally showed better quality. The proposed approach provides a transparent and transferable tool for groundwater quality assessment in stressed urban aquifers, reducing subjectivity while enhancing interpretability and supporting evidence-based water management.
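As a rough illustration of the aggregation step, the NSF model is commonly written as a weighted arithmetic sum of parameter sub-indices, GWQI = Σ wᵢ·sᵢ, with the weights summing to one. The sketch below assumes that form and assumes the weights are obtained by normalising importance scores; the function name `nsf_index` and all numerical values are hypothetical and are not taken from the study.

```python
def nsf_index(sub_indices, importances):
    """Weighted arithmetic aggregation: sum of w_i * s_i, weights normalised to 1.

    sub_indices: parameter name -> sub-index score (assumed 0-100 scale)
    importances: parameter name -> non-negative relative importance score
    """
    total = sum(importances[name] for name in sub_indices)
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")
    return sum(
        sub_indices[name] * importances[name] / total for name in sub_indices
    )


# Hypothetical values for three of the parameters, for illustration only.
importances = {"As": 0.5, "Mn": 0.3, "Fe": 0.2}
sub_indices = {"As": 20.0, "Mn": 60.0, "Fe": 80.0}

score = nsf_index(sub_indices, importances)
print(f"Illustrative GWQI: {score:.1f}")
```

Because the weights are normalised inside the function, a parameter with a large importance score, such as As in this example, pulls the index towards its own sub-index value.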
