Integrated machine learning and positive matrix factorization for the source-specific contamination and predictive risk assessment of potentially toxic elements in multi-land-use soils around an active coal mine
Abstract
Investigating the distribution, sources, and risks of potentially toxic elements (PTEs) in mining-impacted soils is critical for effective environmental monitoring and human health protection. However, traditional assessments often fail to integrate spatial, source-oriented, and predictive approaches, limiting a comprehensive understanding. In this study, 120 soil samples were collected from five land-use types surrounding an active opencast coal mine in the Godavari Valley coalfields, India. Pollution indices revealed severe multi-metal contamination, with Co and Cd emerging as the most consistently enriched elements across land uses, while Zn showed pronounced but spatially restricted enrichment, particularly in coal mine soils. An integrated framework combining positive matrix factorization (PMF), machine learning, and geospatial analysis was developed to identify source-specific contamination patterns. A robust four-factor PMF solution identified mixed industrial-mining activities as the dominant source (∼49%) of contamination. A random forest (RF) model integrating soil properties, spatial variables, and PMF-derived source contributions demonstrated strong to moderate predictive performance (average R² = 0.82) with an average root mean square error (RMSE) of 19.6 mg kg⁻¹. Geostatistical mapping highlighted coal mines and adjacent agricultural areas as persistent contamination hotspots. Ecological risk assessment indicated Cd and Hg as the principal contributors to high ecological risks, particularly in agricultural and roadside soils. Probabilistic health risk assessment revealed unacceptable risks for the local population, with children being the most vulnerable. Cr was identified as the primary driver of carcinogenic risk, contributing ∼81% in children, while Co dominated non-carcinogenic risks, resulting in hazard indices for children approaching unacceptable thresholds across all land uses.
Our findings provide a precise and scientific framework for source-specific risk assessment to target soil remediation and environmental management in mining-impacted landscapes worldwide.
