Open Access Article
Angela Pedregal-Montes (*, a, b), Eleanor Jennings (c), Rafael Marcé (d) and Maria José Farré (a)
(a) Catalan Institute for Water Research (ICRA), Carrer Emili Grahit 101, Parc Científic i Tecnològic de la Universitat de Girona, 17003 Girona, Spain. E-mail: apedregal@icra.cat
(b) University of Girona, Plaça de Sant Domènec 3, 17004 Girona, Spain
(c) Centre for Freshwater and Environmental Studies, Dundalk Institute of Technology, A91 K584 Dundalk, Ireland
(d) Centre for Advanced Studies of Blanes (CEAB), Spanish National Research Council (CSIC), 17300 Blanes, Spain
First published on 8th April 2026
Using long-term monitoring and machine learning, this study links upstream hydrometeorology, reservoir processes, and operations to source water conditions relevant to trihalomethane (THM) formation risk at the Mediterranean Ter drinking water treatment plant (DWTP) in Spain, supplied by a three-reservoir cascade (Sau–Susqueda–Pasteral). Based on exploratory analyses, three target variables were selected as indicators of THM formation risk: dissolved organic carbon (DOC) and water temperature (WT) at the DWTP inlet, and fluorescent dissolved organic matter (fDOM) at the Susqueda withdrawal depth. Permutation importance results using Random Forest and LSTM models indicated that withdrawal-layer conditions at Susqueda dominate downstream variability: DOC was most strongly associated with extracted fDOM and other withdrawal water quality variables, whereas inlet WT was primarily controlled by Susqueda withdrawal temperature. For fDOM at Susqueda, reservoir storage volume emerged as a major driver, highlighting the influence of water availability, retention time, and stratification on DOM dynamics. Optimized LSTM models predicted the three target variables with strong validation skill (R2 and KGE > 0.8). Scenario simulations identified seasonal windows of opportunity for THM risk reduction, with selective withdrawal targeting low-fDOM or cooler layers reducing indicator-based THM formation risk at the DWTP inlet, particularly during warm stratified periods and post-summer rainfall transitions. The effectiveness of this strategy was event-dependent and constrained by reservoir levels and gate accessibility. These results highlight opportunities to reduce DBP formation risk through upstream management, supporting a shift from end-of-pipe control to multi-barrier strategies, particularly in regions facing increasing hydroclimatic stress.
Water impact
This study shows that the ability to provide safe drinking water is strongly influenced by upstream climate variability, reservoir dynamics, and operational decisions that shape disinfection by-product formation risk. The findings highlight the need to manage source waters and reservoirs as active control points, supporting more resilient and integrated strategies to safeguard water supplies under increasing hydroclimatic uncertainty.
Operational control therefore often focuses on monitoring source water precursor indicators at the DWTP inlet using dissolved organic matter (DOM) surrogates such as dissolved organic carbon (DOC), ultraviolet absorbance at 254 nm (UV254), or fluorescent DOM (fDOM), together with variables that influence formation kinetics, such as water temperature. These indicators are commonly incorporated into site-specific DBP risk tools to support operational decisions.11 However, rapid shifts in source water conditions such as temperature spikes, hydrologic events, or precursor pulses can reduce the response time available to operators and compromise water safety.12–14 A key limitation of many DBP predictive tools is that they are primarily DWTP centered and may not explicitly account for upstream drivers of DOM and water temperature, including catchment forcing, reservoir processes, and water source management.15 This gap is particularly relevant in Mediterranean regions, where droughts and intense rainfall are projected to intensify, with strong implications for DOM dynamics in rivers and reservoirs.16–18 Understanding the link between upstream controls and inlet indicators is therefore essential for more anticipatory and climate-adaptive management from source to tap.
Linking upstream forcing to source water conditions at a DWTP inlet requires methods that can represent both catchment driven inputs and managed reservoir transformations at relevant time scales.19 This is challenging because DOM dynamics reflect interacting hydrological, physical, and biogeochemical processes,20,21 while catchment DOM monitoring is typically low frequency (often monthly). Process-based hydrological and biogeochemical models can help bridge this gap by providing temporally continuous estimates of catchment DOM and discharge,22 but downstream water quality reaching the DWTP additionally reflects nonlinear interactions among meteorology, reservoir stratification and internal processing, and selective withdrawal operations that are difficult to parameterize explicitly in complex, highly managed reservoir systems.23 In this context, machine learning (ML) approaches offer a practical alternative for forecasting operationally relevant source water indicators, as they can learn empirical, potentially lagged and nonlinear relationships directly from multi-source monitoring and operational datasets without requiring explicit representation of all underlying processes.24
Previous work at the Ter DWTP showed that DOC and raw water temperature measured at the plant inlet can be used as practical indicators to classify THM formation risk using an empirical risk matrix, as raw waters are predominantly influenced by DOM and typically show low contributions from inorganic precursors (e.g., bromide).25 This framework reflects pre-treatment THM formation risk, defined as the potential for DBP formation during subsequent disinfection based on source water conditions. Building on this approach, this study examines the upstream controls, predictability, and operational leverage points governing these indicators in the Mediterranean Ter river-reservoir-DWTP continuum supplying the Barcelona metropolitan area. We compiled and analyzed an extensive spatio-temporal dataset spanning catchment forcing, reservoir water quality and stratification, reservoir storage and gate operations, and DWTP inlet monitoring. The dataset was complemented by daily upstream DOC and discharge simulations from a previously validated catchment model to represent inflow variability at appropriate temporal resolution. We first evaluated relationships among DOM proxies and THM observations to support the use of DOC as a consistent indicator, then assessed longitudinal DOC patterns to identify the most influential upstream control points along the continuum. Subsequently, we applied ML-based driver attribution and prediction to quantify the dominant hydrometeorological, water quality, and operational drivers of DOC and water temperature at the DWTP inlet, and finally tested alternative selective-withdrawal strategies to assess how operational choices could shift indicator-based THM risk classes. By focusing on source water indicators and their upstream drivers, the study provides a basis for more anticipatory, climate-adaptive source water management that can support THM risk mitigation without attempting to directly model DBP formation within the treatment process.
Reservoir water levels and releases are managed by regional water authorities to accommodate multiple uses, including hydroelectric generation, ecological flow maintenance, and recreational activities. In contrast, water quality management is primarily conducted by ATL through selective withdrawal operations at the Sau and Susqueda reservoirs, which are equipped with three and four intake levels, respectively (Fig. 1, S2 and S3). These intake structures allow operators to select withdrawal depth based on water quality conditions; however, the set of available intake levels varies with reservoir water level, which determines the accessibility of individual gates. Water is subsequently withdrawn from the bottom outlet of the Pasteral reservoir, where discharge rates can be regulated, and transported to the DWTP via a pipeline, with an approximate travel time of 12 hours to the DWTP inlet. Additional characteristics of the reservoirs are provided in Table S1. Although the three-reservoir configuration provides substantial buffering of hydrological variability and raw water quality, the system remains sensitive to extreme meteorological conditions that can alter reservoir stratification and organic matter dynamics, thereby influencing raw water quality at the DWTP inlet.26
The Ter DWTP employs a conventional treatment train that includes both pre-chlorination (combined dosing of chlorine dioxide (ClO2) and sodium hypochlorite (NaClO) to increase oxidation and disinfection capacity) and post-chlorination (NaClO). Pre-chlorination is the stage most susceptible to DBP formation, as disinfectants are applied when organic precursor concentrations in raw water are highest. Consequently, operational control focuses on limiting DBP formation by adjusting disinfectant dosage and treatment conditions in response to raw water characteristics. Previous studies have shown that, at the Ter DWTP, raw water DOM concentration and water temperature are particularly relevant factors influencing THM formation risk, as they control precursor availability and reaction kinetics under local treatment conditions.25 Due to the spatial extent of the supply network and variability in water demand, hydraulic retention times (HRT) within the distribution system range from several hours to multiple days. Therefore, to ensure compliance with drinking water regulations, which establish a maximum allowable concentration of 100 μg L−1 for total THMs at the consumer tap, ATL applies a more conservative internal operational limit of 50 μg L−1 for total THMs at the DWTP outlet.
First, exploratory analyses were conducted using long-term monitoring data to characterize the temporal variability of organic matter and water temperature at the DWTP inlet, their relationships with THM concentrations at the outlet, and their connection to upstream hydrometeorological conditions along the river-reservoir-DWTP continuum. These analyses were used to support the selection of candidate water quality indicators and to inform predictor preselection for subsequent modeling. The exploratory analysis leveraged the full monitoring record (2015–2023) to capture long-term variability and hydroclimatic extremes, whereas subsequent ML modeling was constrained to a shorter period defined by the availability of high-frequency reservoir data required to represent key upstream drivers.
Second, ML models were implemented to quantify the relative importance of upstream drivers influencing the selected indicators. Random Forest (RF) and Long Short-Term Memory (LSTM) models were trained using preselected hydrometeorological, operational, and water quality predictors, and permutation importance (PI) was applied to attribute the contribution of individual drivers. Third, LSTM models were refined for each indicator by reducing predictor sets based on driver attribution results and predictive performance. The optimized LSTM models were then used to evaluate predictability and temporal dynamics under observed conditions.
Finally, the calibrated LSTM models were applied to simulate alternative reservoir operation scenarios designed to reduce THM formation risk at the DWTP inlet. Model outputs under baseline and scenario conditions were subsequently translated into THM formation risk classes using published empirical relationships.
| Location | Variable | Temporal resolution | Source | Use in study |
|---|---|---|---|---|
| Meteorology | Air temperature, total precipitation, solar radiation | Daily | ERA5 reanalysis | Exploratory and ML |
| Ter river | DOC | Monthly | Monitoring records | Exploratory |
| Ter river | Discharge (simulated) | Daily | Process-based catchment model | Exploratory and ML |
| Ter river | DOC (simulated) | Daily | Process-based catchment model | ML |
| Reservoirs | fDOM, WT, DO, turbidity, Chl-a | Daily (aggregated from 2-min profiler data) | In situ profilers | ML |
| Reservoirs | DOC | Monthly | Monitoring records | Exploratory |
| Reservoirs | Gate operation | Event-based (gate changes) | Reservoir operational records | Exploratory and ML |
| Reservoirs | Stored volumes | Daily | Reservoir operational records | ML |
| DWTP inlet | DOC, UV254, SUVA, WT | Daily | DWTP records | Exploratory and ML |
| DWTP outlet | Total THM concentration | Weekly | DWTP records | Exploratory |

Note: exploratory analyses used the full monitoring period (1 January 2015–31 December 2023). Machine learning (ML) models were trained and evaluated over the period constrained by reservoir profiler availability (4 February 2017–30 November 2020). Abbreviations: DWTP, drinking water treatment plant; DOC, dissolved organic carbon; fDOM, fluorescent dissolved organic matter; WT, water temperature; UV254, ultraviolet absorbance at 254 nm; SUVA, specific ultraviolet absorbance; DO, dissolved oxygen; Chl-a, chlorophyll-a; THM, trihalomethane.
Meteorological forcing was characterized using daily air temperature, total precipitation, and solar radiation obtained from the ERA5 reanalysis product of the European Centre for Medium-Range Weather Forecasts.27 ERA5 provides global atmospheric data at a spatial resolution of 0.25°. Data were extracted for the grid cell encompassing the Ter reservoir system and used to represent atmospheric drivers influencing catchment processes, reservoir stratification, and water temperature dynamics.
Hydrological inputs from the Ter River were represented using daily discharge and DOC concentrations simulated by the Precipitation, Evapotranspiration, and Runoff Simulator for Solute Transport (PERSiST)28 coupled with the Integrated Catchments Model for Carbon (INCA-C),29 previously validated for the Ter basin.30 These simulations were used to represent upstream catchment inputs and to provide daily values for ML analysis. In addition, observed DOC concentrations were available at monthly resolution and were used exclusively for exploratory, continuum-scale analyses.
Reservoir dynamics were characterized using a combination of monitoring records, high-frequency profiling data, and operational information. Monthly DOC concentrations at the surface and at extracted depths were available for all reservoirs and were used for exploratory analyses. For Sau and Susqueda reservoirs, high-frequency water quality data (2-minute resolution) from profiling buoys installed near the dams were aggregated to daily resolution at the surface (0–5 m) and at extraction depths (2.5 m above and below the active gate). At each site, a YSI EXO2 multiprobe recorded turbidity (NTU), chlorophyll-a (Chl-a, mg L−1), dissolved oxygen (DO, mg L−1), water temperature (°C), and fDOM (QSU) throughout the water column. EXO fDOM can be used as a surrogate for colored DOM (CDOM) (excitation 365 ± 5 nm; emission 480 ± 40 nm), and raw fDOM data were corrected for water temperature following ref. 31. Additionally, reservoir operational data, including daily stored volume and gate operation, were used to represent storage dynamics and selective withdrawal. No profiler data were available for the Pasteral reservoir due to its small volume and short HRT (∼1 day) relative to Sau and Susqueda. Profiler data availability differed between reservoirs. Susqueda profiler data were available from 4 February 2017 to 30 November 2020, whereas Sau profiler data were available from 4 February 2017 to 1 March 2020, after which the instrument failed due to damage sustained during Storm Gloria (19–24 January 2020).32 Historical time series of selected reservoir variables are available in the SI (Fig. S2 and S3).
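As an illustration of the daily aggregation at extraction depth, the following is a minimal sketch only. It assumes that "aggregated" means an arithmetic mean and that each profiler reading carries a timestamp, a depth and a value; the function name, the input layout and the gate-depth lookup are hypothetical and are not taken from the study's processing code.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_extraction_mean(readings, gate_depth_by_day, half_window_m=2.5):
    """Average profiler readings within +/- half_window_m of the active gate, per day.

    readings: iterable of (timestamp, depth_m, value) tuples (hypothetical layout).
    gate_depth_by_day: dict mapping a date to the active gate depth in metres.
    The arithmetic mean is an assumption; the text only says "aggregated".
    """
    buckets = defaultdict(list)
    for ts, depth_m, value in readings:
        day = ts.date()
        gate = gate_depth_by_day.get(day)
        if gate is None:
            continue  # no gate record for this day: leave a gap
        if abs(depth_m - gate) <= half_window_m:
            buckets[day].append(value)
    return {day: mean(vals) for day, vals in sorted(buckets.items())}


# Toy example: two readings fall inside the band around a 20 m gate, one outside.
example_day = datetime(2019, 7, 1).date()
example_readings = [
    (datetime(2019, 7, 1, 0, 0), 18.0, 10.0),
    (datetime(2019, 7, 1, 0, 2), 21.0, 14.0),
    (datetime(2019, 7, 1, 0, 4), 30.0, 99.0),
]
example_daily = daily_extraction_mean(example_readings, {example_day: 20.0})
```

Days without a gate record are left as gaps rather than filled, which mirrors how missing profiler periods are described as gaps in the predictors.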
At the DWTP inlet, historical datasets of raw water quality variables were available daily, including DOC, UV254, specific ultraviolet absorbance (SUVA), and water temperature. At the DWTP outlet, total THM concentrations were measured at weekly resolution and used exclusively for exploratory analyses. Further methodological details for these variables are provided in the SI (Text S1).
Exploratory analyses were conducted using the full available monitoring period (1 January 2015–31 December 2023). ML models were trained and evaluated over 4 February 2017 to 30 November 2020, with some gaps. This window was defined by Susqueda profiler availability so that the Storm Gloria period was retained in the dataset; missing Sau profiler periods within the window were treated as extended gaps in the corresponding predictors. The period was selected to ensure the inclusion of high-frequency reservoir variables required to represent withdrawal-depth water quality and operational dynamics. While it captured substantial hydroclimatic variability, including extreme events, it did not fully encompass the longer-term variability observed in the full monitoring record (2015–2023). Therefore, model results should be interpreted within the range of conditions represented during the model training and evaluation period. All datasets were subjected to quality control and harmonized prior to analysis. Variables used in ML were aligned on a common daily time step; interpolation was applied only to profiler-derived variables to support daily alignment, while all other predictors were already available at daily resolution. Detailed preprocessing procedures are provided in Text S1.
To evaluate spatial patterns in organic matter variability along the continuum, DOC was used as the sole DOM proxy, as it was the only organic matter variable consistently available across all locations. DOC concentrations were compared along the river-reservoir-DWTP continuum using seasonal summaries and correlation analysis; for the Sau and Susqueda reservoirs, DOC values at both surface and extracted depths were considered.
Two ML approaches were implemented in this study: RF and LSTM neural networks. These models were selected to provide complementary perspectives on driver attribution rather than to perform exhaustive model benchmarking. RF was used as a robust and interpretable baseline method, widely applied in environmental prediction tasks due to its ability to capture nonlinear relationships and handle correlated predictors. In contrast, LSTM networks were selected to represent state-of-the-art sequence modeling approaches, capable of learning temporal dependencies and lagged relationships in time series data.
RF is a non-parametric ensemble method that constructs multiple decision trees using bootstrap samples and aggregates their predictions, and has demonstrated strong predictive performance across a range of environmental forecasting applications.34–37 LSTM networks have likewise shown strong performance in recent hydrological and water quality forecasting studies.37–41
Driver attribution analyses were performed using both RF and LSTM models, while LSTM models were used for forecasting and scenario simulations. Predictor sets were tailored to each modeled variable to reduce redundancy and model complexity. All predictors were aligned to a common daily time step and harmonized prior to modeling (see Text S1). LSTM inputs were constructed using a fixed 14-day lookback window. Different window lengths were tested, and a 14-day window yielded the best predictive performance across all target variables. This window length was consistent with the short- to medium-term dynamics of the system. Models were trained and evaluated using a chronological split, with the first 80% of the time series used for training and the remaining 20% reserved as a held-out test period. Model performance was assessed using the coefficient of determination (R2), root mean squared error (RMSE), mean absolute error (MAE), and Kling–Gupta efficiency (KGE). Reproducibility was ensured by fixing random seeds and enforcing deterministic settings. Additional details on model configuration, hyperparameter selection, and implementation settings are provided in Text S2.
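The windowing, chronological split and KGE metric described above can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes each sample pairs a day's target with the 14 preceding days of predictors, that the 80/20 split is applied to the windowed samples, and that KGE follows the original 2009 formulation (the text does not state which variant was used). All function names are hypothetical.

```python
import math
from statistics import mean, pstdev

LOOKBACK_DAYS = 14    # lookback window stated in the text
TRAIN_FRACTION = 0.8  # chronological training share stated in the text


def make_windows(features, target, lookback=LOOKBACK_DAYS):
    """Pair each day's target with the preceding `lookback` days of predictor rows.

    Assumption: the window covers the days before the prediction day, not the day itself.
    """
    X, y = [], []
    for t in range(lookback, len(features)):
        X.append(features[t - lookback:t])
        y.append(target[t])
    return X, y


def chronological_split(X, y, train_fraction=TRAIN_FRACTION):
    """Split without shuffling: earliest samples for training, latest for testing."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


def kge(obs, sim):
    """Kling-Gupta efficiency, assumed 2009 form: 1 - sqrt((r-1)^2 + (a-1)^2 + (b-1)^2)."""
    mo, ms = mean(obs), mean(sim)
    so, ss = pstdev(obs), pstdev(sim)
    cov = mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)])
    r = cov / (so * ss)   # linear correlation
    alpha = ss / so       # variability ratio
    beta = ms / mo        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Toy usage: 20 days of a single predictor give 6 windowed samples.
toy_features = [[float(i)] for i in range(20)]
toy_target = [float(i) for i in range(20)]
toy_X, toy_y = make_windows(toy_features, toy_target)
(toy_train_X, toy_train_y), (toy_test_X, toy_test_y) = chronological_split(toy_X, toy_y)
```

Keeping the split chronological means the test period always lies after the training period, so the model is never evaluated on days that precede its training data.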
Predictor relevance was quantified using PI, which evaluates the contribution of each predictor by randomly permuting its values and quantifying the resulting change in model performance.34 This approach avoids biases associated with split-based importance measures in the presence of correlated predictors.42 Importance was expressed as the percentage change in the coefficient of determination relative to the unpermuted model (ΔR2, %).
PI was computed on the chronologically held-out test period for both RF and LSTM models. For RF, each predictor was permuted repeatedly and ΔR2 values were averaged across repetitions. For LSTM, predictors were permuted repeatedly across samples in the test set while keeping the remaining predictors unchanged; permutations were applied at the sequence level to preserve within-sequence temporal structure of non-permuted inputs. Attribution results from RF and LSTM were compared to assess the robustness of driver rankings across static and sequential learning frameworks.
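A minimal sketch of sequence-level permutation importance is given below. It reflects one reading of the procedure, in which the whole within-window trajectory of a single predictor is exchanged between test samples while all other predictors stay in place. The `predict` callable, the function names and the repeat count are hypothetical, and the percentage form assumes the unpermuted R2 is positive.

```python
import random


def r2_score(obs, pred):
    """Coefficient of determination."""
    mo = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, feature_idx, n_repeats=10, seed=0):
    """Mean percentage loss in R2 when one predictor is shuffled across samples.

    X: list of sequences, each a list of per-day predictor rows.
    predict: any callable mapping X to a list of predictions (hypothetical model).
    Positive values mean the model loses skill without that predictor.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    base = r2_score(y, predict(X))
    drops = []
    for _ in range(n_repeats):
        donors = list(range(len(X)))
        rng.shuffle(donors)
        X_perm = []
        for i, donor in enumerate(donors):
            seq = [list(row) for row in X[i]]  # copy so X is not modified
            for t, row in enumerate(seq):
                row[feature_idx] = X[donor][t][feature_idx]
            X_perm.append(seq)
        drops.append(100.0 * (base - r2_score(y, predict(X_perm))) / base)
    return sum(drops) / len(drops)


# Toy check: a "model" that only reads predictor 0 on the last day of each window.
toy_pi_X = [[[float(i + t), 5.0 * i] for t in range(3)] for i in range(8)]
toy_pi_y = [seq[-1][0] for seq in toy_pi_X]


def toy_predict(X):
    return [seq[-1][0] for seq in X]
```

In this toy case, permuting predictor 1 leaves the score unchanged, whereas permuting predictor 0 reduces it, which is the behaviour the ΔR2 ranking relies on.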
Positive ΔR2 values indicate a loss of predictive skill when a predictor is permuted and therefore denote an important contributor to model performance, whereas values near zero or negative indicate negligible or unstable contributions that may arise from sampling variability and collinearity.43 As PI can be sensitive to multicollinearity among predictors, importance values may be distributed across correlated variables. Predictors were therefore interpreted primarily in terms of relative ranking and consistency across repetitions rather than absolute importance values, consistent with recommendations for correlated predictors in predictive models.43
Two operational scenarios were defined at the Susqueda reservoir, the last major operational control point upstream of the DWTP, to represent plausible selective-withdrawal strategies designed to influence THM risk indicators (DOC and water temperature) at the DWTP inlet: (i) extraction of water layers characterized by lower organic matter values, and (ii) extraction of the coolest available water layer. These scenarios were implemented by modifying the input time series of reservoir extraction depth and associated profiler-derived water quality variables while keeping all other predictors identical to baseline conditions. Scenario definition was constrained to physically accessible withdrawal layers based on observed profiler data and gate availability under prevailing reservoir water levels. Fig. S3 illustrates the Susqueda reservoir stratification, water quality, water levels, and gate operation data used to define the scenarios.
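The two scenario rules can be illustrated with a small selection function. This is a sketch under stated assumptions: daily profiler values are taken to be available at each gate depth, gate accessibility is supplied as a precomputed list, and the gate labels and dictionary layout are hypothetical rather than drawn from the study's data.

```python
def pick_layer(layers, accessible_gates, criterion):
    """Return the accessible gate whose layer minimizes the chosen variable.

    layers: dict mapping gate name to {"fdom": value, "wt": value} for one day.
    accessible_gates: gates reachable at that day's reservoir water level.
    criterion: "fdom" for the low organic matter scenario, "wt" for the coolest layer.
    Returns None when no accessible gate has profiler data (scenario gap).
    """
    candidates = [g for g in accessible_gates if g in layers]
    if not candidates:
        return None
    return min(candidates, key=lambda g: layers[g][criterion])


# Toy day with hypothetical gate labels and values.
toy_layers = {
    "gate_1": {"fdom": 30.0, "wt": 22.0},
    "gate_2": {"fdom": 25.0, "wt": 15.0},
    "gate_3": {"fdom": 27.0, "wt": 9.0},
}
```

With all three gates accessible, the two criteria select different layers in this toy example (`gate_2` for fDOM, `gate_3` for temperature); when the water level excludes `gate_3`, both fall back to `gate_2`.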
Scenario predictions were computed only for dates where the required scenario inputs were available for the preceding 14 days (LSTM lookback window); therefore, gaps in scenario trajectories reflect incomplete profiler input data rather than model instability. For each scenario, the optimized LSTM models were used to simulate DOC and water temperature at the DWTP inlet for all eligible dates. Baseline simulations corresponding to observed operational conditions were also produced for comparison. Simulated DOC and water temperature time series were subsequently translated into THM formation risk classes using empirical, expert-based relationships developed previously for the Ter DWTP.25 In this framework, DOC and water temperature were first classified into discrete levels based on predefined concentration and temperature ranges (Table S2), and combined THM formation risk classes were then assigned using a rule-based matrix linking these categories (Table S3). These risk classes represent pre-treatment THM formation risk at the DWTP inlet, defined as the potential for DBP formation during subsequent disinfection. Changes in THM formation risk under alternative scenarios were assessed by comparing simulated risk classes against baseline conditions over the simulated period.
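The two-step classification (discrete levels, then a rule-based matrix) can be sketched as follows. The thresholds and matrix entries below are purely hypothetical placeholders chosen for illustration; the actual ranges and rules are those in Tables S2 and S3, which are not reproduced here.

```python
from bisect import bisect_right

# HYPOTHETICAL values for illustration only; the real ones are in Tables S2 and S3.
DOC_EDGES = [3.0, 4.0]    # mg/L boundaries between DOC levels 0, 1, 2
WT_EDGES = [12.0, 18.0]   # degC boundaries between temperature levels 0, 1, 2
RISK_MATRIX = [           # rows: DOC level, columns: temperature level
    ["low", "low", "medium"],
    ["low", "medium", "high"],
    ["medium", "high", "high"],
]
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def level(value, edges):
    """Index of the interval that `value` falls into (a boundary value counts upward)."""
    return bisect_right(edges, value)


def risk_class(doc, wt):
    """Combine the DOC and temperature levels through the rule-based matrix."""
    return RISK_MATRIX[level(doc, DOC_EDGES)][level(wt, WT_EDGES)]


def count_reductions(baseline, scenario):
    """Number of days on which the scenario risk class is lower than baseline.

    baseline, scenario: lists of (doc, wt) pairs for the same dates.
    """
    return sum(
        RISK_ORDER[risk_class(*s)] < RISK_ORDER[risk_class(*b)]
        for b, s in zip(baseline, scenario)
    )


# Toy comparison: cooling the first day lowers its class, the second day is unchanged.
toy_baseline = [(4.5, 20.0), (2.5, 10.0)]
toy_scenario = [(4.5, 11.0), (2.5, 10.0)]
```

Because the class depends on both variables, a scenario only changes the class on days when it moves DOC or temperature across one of the level boundaries.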
The temporal evolution of raw water DOC and water temperature exhibited seasonal variability, while departures from typical cycles were apparent during periods of hydro-meteorological extremes (Fig. 2a). To place these patterns in context, atmospheric forcing (air temperature and precipitation) and hydrological and operational conditions are shown in Fig. 2b and c. Total THM concentrations measured at the DWTP outlet displayed marked temporal variability (Fig. 2a) and tended to be higher during periods of elevated DOC and/or higher raw water temperature, although the correspondence was not systematic. Pearson correlation analysis indicated positive but moderate associations between THMs and DOC (r = 0.39) and between THMs and water temperature (r = 0.25). While these correlations alone do not imply causality, they indicate that DOC and water temperature capture part of the variability associated with THM formation. In this study, these variables were used as operational indicators of pre-treatment THM formation risk at the Ter DWTP inlet, rather than direct predictors of THM concentrations.
Several periods highlighted the influence of extreme conditions on inlet water quality and potential THM formation. During Storm Gloria in early 2020, DOC increased abruptly (Fig. 2a), concurrent with intense precipitation and hydrological disturbance (Fig. 2b and c), likely reflecting enhanced mobilization and transport of catchment-derived organic matter.44 In contrast, the prolonged drought from 2021 onwards coincided with elevated water temperatures and altered DOC dynamics (Fig. 2a), with DOC increases often following rainfall events that came after extended dry periods, a response commonly reported in Mediterranean catchments.45
Finally, higher THM concentrations observed during the later part of the record (2022–2023; Fig. 2a) suggested that THM formation risk during prolonged droughts was influenced not only by upstream hydrological conditions but also by source water management decisions at the system scale. During periods of reduced inflow and elevated temperatures, the Ter DWTP relies on operational adjustments within the reservoir cascade to manage raw water quality, while overall supply reliability is supported through blending of treated surface water with desalinated water prior to storage and distribution. While blending does not alter THM formation during treatment, it may increase bromide in distributed water, promoting the formation of more toxic brominated THMs.46 Although bromide concentrations and THM speciation were not evaluated in this study, this highlights an important operational consideration for managing DBP precursors in source waters.
Correlation analyses further revealed clear spatial connectivity patterns along the continuum (Fig. S6). DOC at Ter River showed weak or negative correlations with downstream locations (Sau-extracted: r = 0.38; Sqd-extracted: r = −0.39; PST: r = −0.41; DWTP-inlet: r = −0.30), indicating limited direct propagation of upstream riverine variability. Sau exhibited only modest correlations with downstream sites (Sqd-extracted: r = 0.15; PST: r = 0.14; DWTP-inlet: r = 0.17). In contrast, Susqueda displayed strong correlations with Pasteral and the DWTP inlet (r = 0.94 and r = 0.91, respectively), consistent with PST-DWTP inlet coupling (r = 0.95) and indicating that DOC variability reaching the treatment plant is primarily controlled by the lower reservoirs.
Comparison with surface DOC values at Sau and Susqueda reservoirs (Fig. S7) showed consistently higher concentrations and greater seasonal amplitudes in the reservoir epilimnion relative to extracted waters, particularly during stratified periods. This contrast highlights the role of selective withdrawal in modulating the quantity and character of organic matter delivered downstream.50 Similar patterns were observed for surface DOC correlations (Fig. S8), although relationships were generally weaker, reinforcing the relevance of withdrawal conditions for downstream water quality.
Together, these analyses indicate that DOC variability at the DWTP inlet is strongly linked to conditions at Susqueda. This longitudinal connectivity identifies Susqueda as the key upstream control point for organic matter dynamics affecting the treatment plant. Given the availability of high-frequency monitoring at this location, fDOM measured at the extraction depth was retained as a proxy to investigate short-term organic matter dynamics. In combination with DOC and water temperature measured at the DWTP inlet, these results define the three variables selected for subsequent analysis of upstream drivers, predictability and THM formation risk.
At the DWTP inlet, DOC variability was primarily associated with water quality conditions at the Susqueda withdrawal depth, with extracted fDOM at Susqueda emerging as the most influential predictor in both RF and LSTM models (Fig. 4a). Additional contributions from DO, turbidity, water temperature and gate operation at Susqueda suggested that DOC reaching the DWTP reflects organic matter characteristics shaped within the reservoir and transmitted downstream through selective withdrawal operations. Together, these variables represent optical properties, particulate inputs, and temperature-dependent processes, which may provide complementary information beyond DOC alone, as DOM responses can be linked to internal biological and physical controls that vary with stratification and residence time in reservoirs.51,52 The higher DOC concentrations observed in the reservoirs relative to the upstream Ter River (section 3.2) further indicate a substantial autochthonous component to organic matter dynamics, consistent with the elevated importance of in-reservoir water quality variables in the LSTM model.53 In contrast, Ter River DOC and discharge showed weak or negligible contributions in the LSTM but slightly higher relevance in RF, reflecting methodological differences whereby RF captures static cross-sectional associations while LSTM emphasizes predictors that improve temporal forecasts across lag structures.
For water temperature at the DWTP inlet (Fig. 4b), PI results clearly identified water temperature at the Susqueda extraction depth as the dominant driver, particularly in the LSTM model. Air temperature contributed to a lesser extent, while other upstream hydrometeorological variables had minimal explanatory power. This pattern indicated that thermal conditions at the DWTP were largely controlled by selective withdrawal at Susqueda, with meteorological forcing indirectly embedded in the reservoir thermal structure rather than acting as a direct driver at the inlet.
At Susqueda reservoir, fDOM was modeled as an independent target variable to better understand controls on organic matter quality at this key upstream location (Fig. 4c). PI analysis highlighted reservoir storage volume as a major driver of fDOM variability, alongside withdrawal-related variables. This suggests that fDOM dynamics at Susqueda are closely linked to hydrodynamic and stratification conditions that regulate internal organic matter production, residence time, and vertical distribution.54 The importance of storage volume was consistent with findings by Mercado-Bettín et al., 2025, who reported a similar role of volume in controlling fDOM dynamics at the Sau reservoir, where water volume acted as a surrogate for in-reservoir DOM production, with lower volumes associated with reduced fDOM and higher volumes corresponding to increased and more stable values. The agreement between studies indicates that, in large managed reservoirs, water availability and storage conditions may shift DOM control from catchment-derived inputs toward internal biogeochemical processing.
Overall, RF and LSTM models yielded coherent driver rankings. RF provided an interpretable baseline of predictor relevance in a highly correlated system, while LSTM emphasized predictors that consistently improve temporal forecasts, reducing the apparent role of weaker or collinear variables. Across all target variables, Susqueda reservoir operational and withdrawal-related variables emerged as the dominant controls, underscoring the central role of reservoir management in shaping organic matter and thermal conditions that propagate to the DWTP inlet and influence THM formation risk.
It should be noted that PI estimates may be affected by multicollinearity among predictors, which can lead to shared or redistributed importance across correlated variables. Therefore, results are interpreted in terms of consistent patterns across predictors and modeling approaches rather than as precise quantitative measures of individual variable influence. This is particularly relevant in environmental systems where many drivers are interdependent.34
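The multicollinearity caveat above can be illustrated with a minimal sketch. It uses synthetic data and an ordinary least-squares model as a stand-in for the RF and LSTM models used in the study. The variable names (`sensor_a`, `sensor_b`, `unrelated`), the noise levels, and the helper functions are all assumptions made for illustration. Two predictors are noisy readings of the same underlying state, so the model splits its weight between them and each receives only part of the importance.

```python
import numpy as np


def fit_linear(X, y):
    """Fit ordinary least squares with an intercept; return a predict function."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda M: np.column_stack([M, np.ones(len(M))]) @ coef


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((y - predict(X)) ** 2)
    importance = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((y - predict(X_perm)) ** 2) - base_mse)
        importance[j] = np.mean(increases)
    return importance


# Synthetic, hypothetical data: two correlated readings of one latent state.
rng = np.random.default_rng(42)
n = 500
latent_state = rng.normal(15.0, 4.0, n)
sensor_a = latent_state + rng.normal(0.0, 1.0, n)
sensor_b = latent_state + rng.normal(0.0, 1.0, n)
unrelated = rng.normal(0.0, 1.0, n)
target = latent_state + rng.normal(0.0, 0.5, n)

# Case 1: both correlated predictors are available to the model.
X_shared = np.column_stack([sensor_a, sensor_b, unrelated])
pi_shared = permutation_importance(fit_linear(X_shared, target), X_shared, target)

# Case 2: the correlated partner is removed and the model is refitted.
X_alone = np.column_stack([sensor_a, unrelated])
pi_alone = permutation_importance(fit_linear(X_alone, target), X_alone, target)

print("with correlated partner:", pi_shared.round(2))
print("partner removed:        ", pi_alone.round(2))
```

In this sketch the importance assigned to `sensor_a` rises once `sensor_b` is dropped, even though its relationship with the target is unchanged. This is why driver rankings are better read as patterns across predictors and models than as exact per-variable magnitudes.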
For DOC at the DWTP inlet, the highest predictability was achieved using a multivariate configuration comprising ten predictors dominated by Susqueda withdrawal water quality variables (fDOM, water temperature, DO, turbidity and Chl-a), together with extracted fDOM at Sau, reservoir storage volumes, and upstream river inputs (DOC and discharge). Predictions of water temperature at the DWTP inlet required a simpler configuration, with Susqueda withdrawal temperature as the dominant predictor and air temperature providing secondary information. The strong performance obtained with this minimal configuration highlights the deterministic nature of downstream thermal dynamics once withdrawal-layer temperature is accounted for. In contrast, fDOM predictions at the Susqueda withdrawal depth required a broader predictor set to achieve optimal performance. The best results were obtained using all predictors retained from the PI analysis, with the exception of precipitation, which did not improve predictive skill, likely due to collinearity with river discharge and storage dynamics.43 This result reflects the greater complexity of organic matter quality dynamics within the reservoir, which are influenced by interacting processes.
These results demonstrated that the LSTM models effectively translated the dominant drivers identified in section 3.3 into robust, generalizable predictions, while revealing clear contrasts in the number and type of predictors required to optimally predict each target variable.
Across the simulation period, the two operational strategies did not produce uniform benefits and did not always concur. The “minimum fDOM” strategy (strategy A; Fig. 6a) tended to reduce predicted DOC relative to baseline during selected periods, whereas the “coolest layer” strategy (strategy B; Fig. 6b) more consistently reduced predicted water temperature during stratified seasons. Because the THM formation risk class depends on both DOC and water temperature, a strategy only reduced risk when it lowered the variable that mattered most at that time. As a result, some periods showed a clear benefit from one strategy but little change from the other, while in other periods both strategies produced similar outcomes, especially when baseline conditions were already close to a risk-class threshold (Fig. 6c).
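The threshold behavior described above can be sketched with a toy risk classifier. The class boundaries and the rule for combining the DOC and temperature classes (a simple sum) are hypothetical placeholders, not the values or the rule used in the study. The sketch only shows why a strategy changes the risk class when it moves a variable across a boundary and has no effect otherwise.

```python
from bisect import bisect_right

# Hypothetical class boundaries, chosen only for illustration.
DOC_BOUNDS_MG_L = (3.0, 4.0)   # splits DOC into classes 0, 1, 2
TEMP_BOUNDS_C = (12.0, 18.0)   # splits water temperature into classes 0, 1, 2


def classify(value, bounds):
    """Return the index of the interval defined by `bounds` that contains `value`."""
    return bisect_right(bounds, value)


def risk_class(doc_mg_l, temp_c):
    """Combined indicator class; summing the two classes is an assumed rule."""
    return classify(doc_mg_l, DOC_BOUNDS_MG_L) + classify(temp_c, TEMP_BOUNDS_C)


# Baseline sits just above the upper DOC boundary and above the upper temperature boundary.
baseline = risk_class(doc_mg_l=4.2, temp_c=19.0)
# Strategy A lowers DOC across its boundary, so the combined class drops.
strategy_a = risk_class(doc_mg_l=3.8, temp_c=19.0)
# Strategy B cools the water slightly but crosses no boundary, so nothing changes.
strategy_b = risk_class(doc_mg_l=4.2, temp_c=18.5)

print(baseline, strategy_a, strategy_b)
```

With these assumed numbers, only the strategy that moves the limiting variable across a class boundary lowers the combined class, which mirrors the event-dependent benefits seen in the simulations.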
The results indicate that both strategies can reduce THM formation risk relative to baseline conditions, but their effectiveness varies seasonally and depends on prevailing reservoir stratification. Two windows illustrate the potential for operational mitigation in this system. During summer–autumn 2017 and summer–autumn 2020, the simulations indicate that THM formation risk could have been reduced relative to baseline (Fig. 6c). These windows align with conditions where late summer stratification and subsequent hydro-meteorological transitions can elevate risk: DOC can increase sharply after the first post-summer rainfall events while water temperature remains relatively high, together shifting the system into higher risk categories. This pattern was consistent with seasonal behavior previously reported for DOC dynamics in the Ter catchment and with the sensitivity of Mediterranean systems to “first flush” events following dry periods.45,55 In these periods, strategy A was more effective when DOC reductions were sufficient to shift the DOC class downward, while strategy B was more effective when reducing temperature shifted the temperature class or prevented transitions to the highest-risk combination.
A key insight from the scenario analysis was that optimizing DOC and temperature simultaneously was not always possible because the importance of individual drivers can diverge seasonally. Although DOC and water temperature at the DWTP inlet were positively associated with THM concentrations at the plant and are therefore useful practical indicators of formation risk,25 their dominant upstream controls at Susqueda may oppose each other during stratification. For example, cooler withdrawal layers can at times be associated with different organic matter quality signals than surface waters, and periods that minimize fDOM at the withdrawal depth may not coincide with the coolest available layer.56 This helped explain why the two strategies diverged in some seasons (Fig. 6a and b) and highlights the value of the predictive framework for decision support. Rather than relying on a single “rule-of-thumb”, managers can evaluate the trade-off between lowering DOC-related risk and lowering temperature-related risk in real time, conditional on current stratification and gate availability.57
From an operational perspective, the results suggest a pragmatic approach. When the objective is short-term reduction of THM formation risk categories, managers could prioritize the strategy that targets the limiting component of risk at that moment. For example, strategy A could be prioritized during periods when DOC is near a class threshold and likely to increase (e.g., post-summer rainfall transitions), and strategy B during periods when temperature dominates risk (e.g., warm stratified conditions when DOC is relatively stable). However, the baseline record (Fig. S3) emphasizes that these decisions are sometimes constrained by reservoir storage dynamics, which determine the set of withdrawal options available at any given time and are influenced by system-wide release requirements.57 Therefore, the most actionable implication may be that source water quality management should be coordinated with water quantity governance.1 Maintaining storage conditions that preserve selective withdrawal flexibility (when feasible) may increase the capacity to mitigate DBP precursor export during high-risk seasons.56,58 Under increasing drought pressure and more variable inflows, integrating reservoir operations for both supply reliability and water quality may become essential to sustain risk reduction opportunities.59,60
The simulated withdrawal strategies represent operationally feasible alternatives within the Ter reservoir system, as they were based on observed profiler data and reflect physically accessible withdrawal conditions under given reservoir water levels. Their feasibility is therefore primarily determined by reservoir storage conditions, which control the availability of intake gates, while broader water management objectives influence these strategies indirectly through their effect on reservoir storage.
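How storage conditions limit the choice of withdrawal layer can be sketched as follows. The gate elevations, the minimum submergence, the profile values, and the names `Layer`, `accessible_layers`, and `choose_layer` are all hypothetical. They are not taken from the Ter system, and the sketch is only one possible way to encode the two strategies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    gate_elevation_m: float  # intake gate elevation (hypothetical datum)
    fdom: float              # fDOM at that depth (arbitrary units)
    temp_c: float            # water temperature at that depth


def accessible_layers(layers, water_level_m, min_submergence_m=2.0):
    """Keep only gates lying at least `min_submergence_m` below the water surface."""
    limit = water_level_m - min_submergence_m
    return [layer for layer in layers if layer.gate_elevation_m <= limit]


def choose_layer(layers, water_level_m, strategy):
    """Pick a layer under strategy 'min_fdom' (A) or 'coolest' (B)."""
    candidates = accessible_layers(layers, water_level_m)
    if not candidates:
        raise ValueError("no intake gate is accessible at this water level")
    keys = {"min_fdom": lambda l: l.fdom, "coolest": lambda l: l.temp_c}
    return min(candidates, key=keys[strategy])


# Hypothetical stratified profile: warm surface, low-fDOM middle, cold deep layer.
profile = [
    Layer(gate_elevation_m=100.0, fdom=12.0, temp_c=24.0),
    Layer(gate_elevation_m=80.0, fdom=9.0, temp_c=16.0),
    Layer(gate_elevation_m=60.0, fdom=14.0, temp_c=9.0),
]

# High storage: all gates are usable and the two strategies pick different layers.
high_a = choose_layer(profile, water_level_m=105.0, strategy="min_fdom")
high_b = choose_layer(profile, water_level_m=105.0, strategy="coolest")

# Low storage: only the deepest gate is usable, so both strategies coincide.
low_a = choose_layer(profile, water_level_m=75.0, strategy="min_fdom")
low_b = choose_layer(profile, water_level_m=75.0, strategy="coolest")

print(high_a.gate_elevation_m, high_b.gate_elevation_m)
print(low_a.gate_elevation_m, low_b.gate_elevation_m)
```

In this toy profile the strategies diverge at high storage and collapse to the same gate at low storage, which is one way to picture how reservoir level narrows the set of feasible withdrawal options.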
In addition, the use of DOC and water temperature as indicators is a simplified description of potential DBP formation processes. It does not explicitly account for other influencing factors such as halides, pH, or disinfectant conditions, which are typically included in DBP formation models applied within treatment plants and distribution systems.61,62 The relevance of these indicators is therefore system-specific and should be assessed for each case study, particularly in systems with higher halide concentrations or other relevant DBP precursors.
A more comprehensive assessment of uncertainty, including contributions from input data, initial conditions, model parameters, and model structure, was beyond the scope of this study but represents an important direction for future work.
Despite these limitations, the consistency between long-term exploratory analyses and model-derived drivers supports the robustness of the identified control points, highlighting the value of the framework for operational decision support under observed system conditions. This is particularly relevant given that only a limited number of recent studies have applied ML approaches to model DOM dynamics (e.g., fDOM) in reservoirs,36,63 which typically focus on surface conditions and meteorological forcing due to data limitations and the complexity of representing internal reservoir processes.
In the Ter system, this control was exerted by the Susqueda reservoir, where longitudinal connectivity and operational choices primarily governed DOM dynamics at the drinking water treatment plant inlet. Strong covariation of DOM proxies supported the use of DOC as an operationally relevant indicator, while withdrawal-depth fDOM and temperature emerged as dominant drivers, emphasizing the importance of vertical reservoir structure and selective abstraction. The high predictive skill achieved for DOC, water temperature, and fDOM using LSTM models further indicates that data-driven approaches can effectively capture the combined influence of climate variability, storage dynamics, and operations in complex, managed catchment systems.
Scenario simulations revealed that upstream operational strategies, such as selective withdrawal, can reduce indicator-based THM formation risk at the DWTP inlet, but only within specific seasonal and hydroclimatic windows. The effectiveness of these interventions was constrained by reservoir levels, stratification state, and infrastructure limitations, emphasizing that water quality objectives must be coordinated with water quantity governance under increasing hydroclimatic stress.
Beyond the Ter system, this work highlights a broader and transferable opportunity for drinking water utilities. While the empirical risk relationships and operational constraints are site-specific, the overall framework (combining multi-source monitoring data, indicator-based risk metrics, machine learning, and scenario analysis) is broadly applicable to other reservoir systems. Many treatment plants already collect long-term and increasingly high-frequency data on hydrology, reservoir conditions, and raw water quality, and the integration of these datasets with globally available climate data products enables site-specific, data-driven analyses to support anticipatory DBP formation risk management. When combined with local knowledge of infrastructure and operating constraints, such approaches offer a practical pathway to move from reactive end-of-pipe mitigation toward proactive, multi-barrier strategies that begin at the source. As climate change continues to intensify hydroclimatic variability, leveraging existing data and integrated modeling and forecasting frameworks will be essential for safeguarding drinking water quality across diverse regions.
| This journal is © The Royal Society of Chemistry 2026 |