Open Access Article
Christian Ortiz-Lopez *a, Christian Bouchard a and Manuel J. Rodriguez b
aCentre de recherche en aménagement et développement (CRAD), Université Laval, Québec, Canada. E-mail: christian.ortiz-lopez.1@ulaval.ca
bÉcole supérieur d'aménagement du territoire et de développement régional (ESAD), Université Laval, Québec, Canada
First published on 15th June 2026
The quality of raw water used for municipal water treatment is impacted during and after rainfall events and by the resultant changes in river flow. Recently, raw water quality parameters such as turbidity have been modeled and predicted using machine learning algorithms, based on environmental, hydrological, and meteorological information as input variables. Our research aims to integrate upstream water quality with river flow and watershed rainfall data into interpretable machine learning algorithms to enhance raw water turbidity predictions. Such predictions would allow water utility operators to anticipate the required adjustments during water treatment processes. First, we estimated lag-times between the upstream input variables (watershed rainfall, river flow and raw water turbidity) and the targeted output variable (downstream raw water turbidity). Then, we used the XGBoost technique to predict raw water turbidity using upstream water quality along with river flow and watershed rainfall data. Finally, the overall importance of every input variable was estimated using the SHAP (SHapley Additive exPlanations) strategy. Results showed that upstream raw water turbidity is the most important input variable, followed by river flow. The best performance metrics and visual inspection of the modeled time series showed that integrating upstream raw water quality data leads to enhanced raw water turbidity predictions. These results could open possibilities for developing and implementing regional raw water quality models that can feed weather-event water-treatment early-warning systems (WEWT-EWS). Future research could improve raw water quality prediction horizons and include interannual data.
Water impact
This research shows that integrating upstream water-quality data, river flow, and rainfall greatly improves predictions of source-water turbidity. More accurate, timely predictions can help drinking-water facilities anticipate treatment challenges, supporting real-time early-warning systems during and after rainfall events and safeguarding water quality. These models outline a clear pathway toward proactive, climate-resilient water management.
Spatiotemporal variability of source water quality parameters, such as turbidity (Tu), can affect drinking water treatment and production, and lag-times can be an issue in determining a correct and timely treatment response.7 The first lag-time is between the moment when raw water Tu starts to increase and the moment the required coagulant dosage adjustment is determined at the Drinking Water Treatment Plant (DWTP). Then, there is a second lag-time before it can be verified that this dosage adjustment has been effective in the treated or produced water. During these lag-times, non-optimal operating conditions and poor treatment performance due to insufficient consideration of changes in raw water quality may lead to higher risks to public health.8
Early warning systems (EWS) have been developed to give decision-makers timely alerts about contaminant events in surface waters.9,10 Characterizing the water contamination event is a crucial function of an EWS. Nowadays, characterization can be carried out by means of modeling and forecasting, with machine learning emerging as a cost-effective and accurate tool to model and forecast raw water quality parameters such as Tu.7,11,12 Machine learning techniques take advantage of large amounts of data produced at DWTPs and/or hydrological and meteorological meter stations to find empirical relationships.
Previous studies have reported the relatively high accuracy of machine learning in predicting crucial raw water quality parameters, such as Tu. For instance, Alizadeh et al. (2018)13 implemented artificial neural networks (ANN), extreme learning machines (ELM), and support vector regressions (SVR) to model Tu and other physical parameters in an estuary using upstream river flow as an input variable with a lag-time of up to three hours. They found that using those lag-times led to R2 between 0.89 and 0.89 and a root mean square error (RMSE) between 1.04 and 1.51 nephelometric turbidity units (NTU) during the test period. In another study, Delpla et al. (2019)9 employed ANN to predict daily Tu in source water using rainfall and antecedent soil moisture conditions (antecedent dry days), obtaining an R2 of 0.81 and a mean square error (MSE) of 1.08 NTU. Ahmed et al. (2021)14 applied several machine learning techniques such as Decision Trees (DT), k-nearest neighbour (KNN), logistic regression (LogR), ANN, and Naive Bayes (NB) to classify Tu and other physical and microbiological parameters used to calculate a water quality index in a dam reservoir. They found that DT outperformed other models and demonstrated the need to aggregate other hydrological parameters such as rainfall and river flow. Zhang et al. (2021)15 used random forest (RF) to model raw water Tu in a lake based on meteorological data such as wind field, air temperature, and rainfall, finding R2 coefficients between 0.73 and 0.90. However, they did not consider lag-times in the input variables. Adedeji et al. (2022)16 implemented SVR, ANN, RF, and XGBoost to model physical and microbiological variables such as total phosphorus, total nitrogen, suspended solids, dissolved oxygen, and fecal coliform bacteria. They used preset modeling scenarios that included environmental, hydrological and water quality input variables. In one scenario, precedent Tu measured at the same location as the output target was used as an input variable. Ortiz-Lopez et al. (2023, 2024)6,17 developed and tested a methodology to include lagged rainfall and river flow as input variables to model raw water Tu and NOM (represented by UV absorbance) using ANN, SVR, RF, and XGBoost. Performance metrics showed that XGBoost outperformed the other machine learning techniques. However, the modeling of multiple Tu peaks remains to be improved. More recently, J. Chen et al. (2025)18 predicted post-wildfire stream temperature and Tu using SVR and RF. For input variables, they used a number of antecedent moving averages of environmental, meteorological, and hydrological parameters. The best models, obtained with RF, showed R2 between 0.87 and 0.89 and RMSE between 1.77 and 2.2 NTU. Kemper et al. (2025)19 forecasted raw water Tu in three different locations using the gradient boosting (GB) technique, with ten streamflow forecasts (precedent conditions and precedent peaks). Although upstream Tu was initially considered as a possible input variable, the final models only used river flow data. Models showed relatively high performance (R2 of 0.7 and NSE between 0.48 and 0.49) when using GB. Recently, Zhang et al. (2025)20 coupled a light GB model and a long short-term memory (LSTM) model to predict chemical oxygen demand (COD) using several input variables including upstream COD. The best results were an R2 of 0.826 for the stacked model and 0.802 for the LSTM. Yang et al. (2025)21 developed a sub-daily machine learning approach for predicting eutrophication trends, also using upstream features as input variables. GB and RF models presented the best performance metrics for predicting Tu, total nitrogen and pH.
On the other hand, the use of data-driven and black-box models for decision making can be problematic, since their results cannot usually be interpreted. Several techniques have recently emerged to explain how machine learning models, such as linear models, neural networks and decision-tree-based techniques, arrive at their predictions. For instance, the SHAP strategy can estimate the global importance of every input variable in the prediction of the output variable.22 This technique was recently used in several water quality modeling studies to explain and understand modeling results.23–26 According to the literature reported above, several machine learning techniques have been used to model raw water quality parameters such as Tu using hydrological and environmental variables. These studies favored the use of variables such as river flow, rainfall, number of dry or precipitation days, and water level, among others. However, including the targeted raw water variable measured upstream as an input variable has remained little explored until now, especially in the field of drinking water production.
Our research aims to demonstrate that modeling of raw water quality parameters such as Tu (especially peaks of turbidity) can be improved by integrating watershed upstream water quality as input information. Downstream DWTP turbidity models and predictive analyses can utilize upstream Tu measurements that are continuously monitored online. As a case study, we considered a watershed where at least three municipal water treatment intakes are supplied by the same river. We used the XGBoost machine learning technique27 as a tool to estimate the empirical relationships between all input variables and the raw water Tu at the targeted intake. To verify whether adding upstream Tu to the input variables improved raw water Tu modeling, we coupled this technique with a SHAP strategy22 that estimates the overall feature importance and provides local interpretation of the decisions of the machine learning model. XGBoost is a robust and interpretable technique that aligns well with the SHAP strategy. The novelty of this study, beyond the improvement in raw water quality predictions, is the use of information produced at upstream DWTPs as decision-support aids. Timely and accurate predictions could provide improved input for developing tools that help drinking water treatment managers and operators enhance their response strategies to variations in raw water quality that affect treatment efficiency. To the best of our knowledge, this is the first study to develop raw water Tu modeling for drinking water production purposes that includes upstream raw water quality, river flow, and rainfall as input variables within the same framework.
Fig. 1 Location of case study watershed, showing drinking water treatment plants (DWTP) and intakes, flow meter stations and rain gauge stations.
In this case study, three DWTPs and intakes were considered: DWTP1 – Charny (CH), DWTP2 – Sainte-Marie (SM) and DWTP3 – Saint-Georges (SG), all of which take their raw water from the Chaudière River. The targeted raw water quality parameter is Tu, which was measured at the three intakes by online analyzers. Data were collected from three river flow meters along the Chaudière River (Q1 – Saint-Lambert-de-Lauzon, Q2 – Saint-Georges and Q3 – Saint-Martin) and from four rain gauge stations (P1 – Scott, P2 – Saint-Severin, P3 – Saint-Georges and P4 – Saint-Ludger).29,30 The data set included raw water quality, hydrological and meteorological information, with more than 4000 hourly observations from April to October 2017. Due to technical issues, an average of 10.5% of observations were lost from the Tu series. On average, 60 gaps lasting no longer than 24 hours and fewer than 5 gaps lasting no longer than 10 days were found in the Tu series. Flowmeter data and rainfall accumulation data were preprocessed by the Ministry of the Environment of the province of Quebec. We implemented several techniques to find and eliminate outliers (as described in ref. 3) and to fill in missing data, including Kalman smoothing with an ARIMA model.31 The data collection period captured the behavior of raw water quality during different dry and rainy periods (between April and October). In a Nordic country, this is the period when turbidity varies the most; during winter, early spring, and late autumn, turbidity variations are much weaker. The study period (April to October) therefore excludes winter and early spring conditions, as well as interannual variability. Raw water quality, hydrological and meteorological data were collected either hourly (precipitation) or sub-hourly (Tu and river flow) and transformed to an hourly (average) frequency.
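The last preprocessing step described above (averaging sub-hourly readings to an hourly frequency and locating gaps) can be illustrated with a minimal Python sketch. The function names `to_hourly_mean` and `missing_hours` and the sample readings are hypothetical and are not taken from the study; the outlier removal and the Kalman/ARIMA imputation are not reproduced here.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def to_hourly_mean(readings):
    """Average (timestamp, value) readings by clock hour; None values are skipped."""
    buckets = defaultdict(list)
    for ts, value in readings:
        if value is not None:
            buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}

def missing_hours(hourly, start, end):
    """List the clock hours in [start, end] that have no hourly value."""
    gaps, hour = [], start
    while hour <= end:
        if hour not in hourly:
            gaps.append(hour)
        hour += timedelta(hours=1)
    return gaps

# Hypothetical sub-hourly turbidity readings (NTU); the 02:00 hour has no valid data.
readings = [
    (datetime(2017, 5, 1, 0, 0), 4.0),
    (datetime(2017, 5, 1, 0, 15), 6.0),
    (datetime(2017, 5, 1, 1, 0), 10.0),
    (datetime(2017, 5, 1, 2, 30), None),
    (datetime(2017, 5, 1, 3, 0), 8.0),
]
hourly = to_hourly_mean(readings)
gaps = missing_hours(hourly, datetime(2017, 5, 1, 0), datetime(2017, 5, 1, 3))
```

In this toy series, the hours identified in `gaps` are the ones that an imputation step would then need to fill.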
Table 1 presents a summary of primary statistics for all the variables used in this study. Tu values observed throughout the Chaudière River were relatively low, although severe peaks did occur. Previous studies have shown that those Tu peaks (as well as peaks of other relevant raw water quality variables) are observed during and after rainfall events in the watershed.3,6 We observed that maximum river flow values occurred in the month of April, a period when snowmelt and ice jams result in spring flooding. During the summer period, peak flow rates were around 200 to 400 m3 s−1. We also noted that river flow gradually increased from upstream to downstream (Q3 to Q1), because of various tributaries flowing into the mainstem. Time series plots of river flow, precipitation and raw water Tu are shown in SI (Fig. S1–S3).
| Data | Location | Variable | Symbol | Unit | Mean | Min. | Max. | SD |
|---|---|---|---|---|---|---|---|---|
| Raw water quality | Charny DWTP | Turbidity | TuCH | NTU | 9.7 | 1.3 | 152.1 | 15.0 |
| Raw water quality | Sainte-Marie DWTP | Turbidity | TuSM | NTU | 7.4 | 0.6 | 100.0 | 13.5 |
| Raw water quality | Saint-Georges DWTP | Turbidity | TuSG | NTU | 15.1 | 4.6 | 225.1 | 20.0 |
| Hydrological | Saint-Lambert municipality | River flow | Q1 | m3 s−1 | 121.3 | 8.5 | 1580.3 | 206.2 |
| Hydrological | Saint-Georges municipality | River flow | Q2 | m3 s−1 | 58.2 | 3.7 | 959.9 | 99.4 |
| Hydrological | Saint-Martin municipality | River flow | Q3 | m3 s−1 | 39.3 | 3.2 | 608.9 | 65.4 |
| Meteorological | Scott municipality | Precipitation | P1 | mm h−1 | 0.14 | 0.00 | 24.20 | 0.93 |
| Meteorological | Saint-Severin municipality | Precipitation | P2 | mm h−1 | 0.15 | 0.00 | 19.20 | 0.80 |
| Meteorological | Saint-Georges municipality | Precipitation | P3 | mm h−1 | 0.15 | 0.00 | 18.00 | 0.81 |
| Meteorological | Saint-Ludger municipality | Precipitation | P4 | mm h−1 | 0.14 | 0.00 | 24.8 | 0.79 |
The data were randomly split into a training set with cross-validation (80%) and a testing set (20%), following the CVT approach.34 Model optimization used 10-fold cross-validation during the training phase and a grid search to find the optimal hyperparameters. After the training phase, the optimized model was tested using unseen data. The models were then evaluated using performance metrics to determine the quality of the predictions. Furthermore, predictions for Tu were compared to the observed data using scatter plots and time series graphs. Model predictive uncertainty was quantified using bootstrap prediction intervals.35,36 We resampled the training dataset 1000 times with replacement, refitted the XGBoost model for each bootstrap sample, and generated a distribution of predictions for each observation. The 5th and 95th percentiles of these bootstrap predictions defined the 90% prediction band, while the 50th percentile represented the median prediction. To obtain smooth and explainable uncertainty curves for visualization, the lower, median, and upper percentile estimates were further smoothed using a LOESS (locally weighted regression) technique, which reduced local variability and highlighted the overall trend of the uncertainty structure.
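The bootstrap procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a simple least-squares line (`fit_line`) stands in for the XGBoost model so that the example runs with the standard library alone, the data are hypothetical, and the LOESS smoothing step is omitted.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (stand-in for the XGBoost model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample with a single x value: fall back to the mean
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def percentile(values, q):
    """Linear-interpolation percentile, with q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def bootstrap_band(x_train, y_train, x_new, n_boot=1000, seed=0):
    """Return the 5th, 50th and 95th percentiles of bootstrap predictions at x_new."""
    rng = random.Random(seed)
    n = len(x_train)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        a, b = fit_line([x_train[i] for i in idx], [y_train[i] for i in idx])
        preds.append(a + b * x_new)
    return percentile(preds, 5), percentile(preds, 50), percentile(preds, 95)

# Hypothetical data: upstream turbidity (x) and downstream turbidity (y), in NTU.
x_train = [2, 4, 6, 8, 10, 15, 20, 30, 45, 60]
y_train = [3, 5, 6, 9, 11, 14, 22, 29, 47, 58]
lower, median, upper = bootstrap_band(x_train, y_train, x_new=25.0)
```

The interval `(lower, upper)` plays the role of the 90% prediction band and `median` that of the 50th-percentile curve; with a tree-based model the refit inside the loop would simply be replaced by that model's training call.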
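$$R^2 = \left[ \frac{\sum_{i=1}^{n} (O_i - \bar{O})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{n} (O_i - \bar{O})^2 \, \sum_{i=1}^{n} (P_i - \bar{P})^2}} \right]^2 \tag{1}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (O_i - P_i)^2}{\sum_{i=1}^{n} (O_i - \bar{O})^2} \tag{2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (O_i - P_i)^2} \tag{3}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| O_i - P_i \right| \tag{4}$$

where $O_i$ and $P_i$ are the observed and predicted Tu values, $\bar{O}$ and $\bar{P}$ are their respective means, and $n$ is the number of observations.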
R2 values, which range between 0 and 1, give the proportion of the total variance of the observed data that is explained by the model. Values of the NSE coefficient range between minus infinity and 1, with 1 representing a perfect fit and an efficiency of less than zero indicating that the mean value of the observed data is a better predictor than the model. The RMSE provides the standard deviation of the model prediction error. We chose RMSE because it is reliable and gives a relatively high weight to large errors. Finally, MAE was used to show the mean absolute difference between the observed and predicted values. However, this metric only provides information about the magnitude of the error and not about model validity.
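As a rough illustration of how the four metrics behave, the sketch below computes them for a small hypothetical series. It assumes that R2 is the squared Pearson correlation between observed and predicted values (which is consistent with R2 and NSE taking different values in the results); the function name `performance_metrics` and the data are illustrative only.

```python
import math

def performance_metrics(obs, pred):
    """Return R2 (squared Pearson correlation), NSE, RMSE and MAE."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))   # squared errors
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)          # variance of observations (unnormalized)
    ss_pred = sum((p - mean_pred) ** 2 for p in pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    return {
        "R2": cov ** 2 / (ss_obs * ss_pred),
        "NSE": 1 - ss_res / ss_obs,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(o - p) for o, p in zip(obs, pred)) / n,
    }

# Hypothetical turbidity values (NTU): predictions carry a constant +1 NTU bias.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 5.0, 7.0, 9.0]
scores = performance_metrics(observed, predicted)
```

With a constant bias, the squared correlation stays at 1 while NSE drops to 0.8 and both RMSE and MAE equal 1 NTU, which illustrates why reporting the error-based metrics alongside R2 is informative.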
The SHAP value ϕi of predictor i represents its influence on the prediction (upstream raw water Tu) and is calculated as a weighted summation across all possible predictor combinations, following eqn (5) (ref. 22, as adapted in ref. 40).
![]() | (5) |
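To make the "weighted summation across all possible predictor combinations" concrete, the following Python sketch computes exact Shapley values for a small toy model by enumerating every coalition of predictors. Several things here are assumptions made for illustration: the toy model and its coefficients are hypothetical, predictors absent from a coalition are replaced by a baseline value (the means from Table 1), and this brute-force enumeration is not how the SHAP library treats tree models in practice.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Predictors absent from a coalition take their baseline value
    (a simplifying assumption made for this sketch).
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical toy model of Tu_CH from [Tu_SM, Q1, P1], with one interaction term.
def toy_model(z):
    tu_sm, q1, p1 = z
    return 2.0 + 0.8 * tu_sm + 0.02 * q1 + 0.001 * tu_sm * q1 + 0.0 * p1

x = [60.0, 400.0, 5.0]         # one hypothetical high-turbidity observation
baseline = [7.4, 121.3, 0.14]  # mean Tu_SM, Q1 and P1 from Table 1
phi = shapley_values(toy_model, x, baseline)
```

The values in `phi` sum to the difference between the prediction for `x` and the prediction for the baseline, and a predictor that never changes the output (here the precipitation term) receives a value of zero.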
![]() | (6) |
| Model | Output variable | Input Q3 | Input Q2 | Input Q1 | Input P4 | Input P3 | Input P2 | Input P1 | Input TuSG | Input TuSM |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | TuSG | 4 | NA | NA | 29 | NA | NA | NA | NA | NA |
| 2 | TuSM | 14 | 15 | NA | 49 | 37 | NA | NA | 22 | NA |
| 3 | TuCH | 20 | 19 | 9 | 95 | 82 | 38 | 40 | 61 | 31 |
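A minimal Python sketch of how time-lagged inputs can be assembled for one model is given below. It assumes that the values in the table above are lag-times in hours and uses the model 2 row; the helper name `build_lagged_rows` and the synthetic series are hypothetical, and the study's own feature-construction code is not shown in the article.

```python
def build_lagged_rows(inputs, target, lags):
    """Pair target[t] with inputs[name][t - lag] for every input.

    Time steps without a complete history (t < largest lag) are skipped.
    """
    max_lag = max(lags.values())
    rows = []
    for t in range(max_lag, len(target)):
        features = {name: inputs[name][t - lag] for name, lag in lags.items()}
        rows.append((features, target[t]))
    return rows

# Lags for model 2 (Tu_SM), read from the table above and assumed to be in hours.
lags_model2 = {"Q3": 14, "Q2": 15, "P4": 49, "P3": 37, "TuSG": 22}

# Hypothetical hourly series in which each value equals its own hour index,
# so that the shift applied to every input is easy to see.
n_hours = 60
inputs = {name: list(range(n_hours)) for name in lags_model2}
target = list(range(n_hours))

rows = build_lagged_rows(inputs, target, lags_model2)
first_features, first_target = rows[0]
```

In this toy example, the first usable target hour is hour 49 (the largest lag), and each feature in `first_features` is the value observed the corresponding number of hours earlier.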
Metrics show that all models performed reasonably well (Table 3). XGBoost models outperformed the baseline MLR models. R2 and NSE metrics for the baseline MLR models are reasonably good. However, the RMSE and MAE metrics (representing errors between observed and predicted Tu values) of the MLR models indicate poorer predictive performance. In contrast, for the XGBoost models, R2 values are very high for all three models (R2 = 0.98–0.99) in both the training and testing stages, indicating that observed data variability is well captured by every model. NSE values are also very high in both stages, suggesting that the models have very good predictive skill. RMSE and MAE are higher in the testing stage than in the training stage, indicating a slight decrease in performance. We also observed that models using more input variables have lower errors in the training stage (RMSEmodel 1 > RMSEmodel 2 > RMSEmodel 3). A similar trend can be observed in the testing stage, where errors in model 2 are higher than errors in model 3; however, errors in model 1 are also lower than those in model 2. The 50th percentile in the uncertainty analysis (red line for the three models in Fig. 4) indicates a very small or null bias for all models, because it is almost perfectly aligned with the 1:1 line (gray line). The narrow width of the red band indicates that the models consistently predict similar Tu values across bootstraps, showing high confidence. On the other hand, the confidence bands get slightly wider at the highest Tu values. This likely reflects the greater variability and modeling difficulty associated with peak turbidity during or following rainfall events.
| Model | Output | Inputs | R2 (training) | R2 (testing) | NSE (training) | NSE (testing) | RMSE (training, NTU) | RMSE (testing, NTU) | MAE (training, NTU) | MAE (testing, NTU) |
|---|---|---|---|---|---|---|---|---|---|---|
| Model 1 (XGBoost) | TuSG | P4, Q3 | 0.99 | 0.99 | 0.99 | 0.99 | 0.71 | 2.00 | 0.44 | 0.73 |
| Model 2 (XGBoost) | TuSM | P4, P3, P2, Q3, Q2, TuSG | 0.99 | 0.98 | 0.99 | 0.95 | 0.68 | 2.53 | 0.43 | 1.21 |
| Model 3 (XGBoost) | TuCH | P4, P3, P2, P1, Q3, Q2, Q1, TuSG, TuSM | 0.99 | 0.99 | 0.99 | 0.99 | 0.22 | 1.33 | 0.15 | 0.58 |
| Model 1 (MLR) | TuSG | P4, Q3 | 0.98 | 0.98 | 0.96 | 0.97 | 3.92 | 3.82 | 1.32 | 1.32 |
| Model 2 (MLR) | TuSM | P4, P3, P2, P1, Q3, Q2, Q1, TuSG, TuSM | 0.91 | 0.94 | 0.83 | 0.88 | 5.43 | 4.06 | 2.73 | 2.73 |
| Model 3 (MLR) | TuCH | P4, P3, P2, P1, Q3, Q2, Q1, TuSG, TuSM | 0.84 | 0.81 | 0.7 | 0.66 | 8.25 | 8.49 | 3.97 | 3.97 |
Fig. 5 shows time series plots of predicted and observed Tu at the three DWTPs. Consistent with the performance metrics shown in Table 3, the predictions of all three models (in both the training and testing stages) are generally close to the observed data. The plots show that the three models are able to predict both low and high values throughout the year (from May to October). The best fit is observed for model 3, which has the most input variables, as discussed above. Model 2 has several predicted data points further from the observed data points, which agrees with its RMSE and MAE performance metrics. In particular, model 2 overestimated several Tu peaks, especially between June/July and mid-September (Fig. 5b), possibly because there are two dams regulating flow upstream of the Sainte-Marie water intake (see Fig. 1), potentially influencing particle settling. This remains an untested hypothesis, and we currently lack the tools to evaluate it.
Fig. 5 Time series plots of predicted and observed Tu at the three DWTPs for (a) model 1 – TuSG, (b) model 2 – TuSM, and (c) model 3 – TuCH.
Fig. 6 Interpretation using SHAP strategy of the best model 1 based on (a) input variable importance and (b) summary plot.
Fig. 7 Interpretation using SHAP strategy of the best model 2 based on (a) input variable importance and (b) summary plot.
Fig. 8 Interpretation using SHAP strategy of the best model 3 based on (a) input variable importance and (b) summary plot.
For model 2 – TuSM (Fig. 7a), we observe that river flow – Q3 is also the most important input variable, TuSG is the second most important, and precipitation – P4 is the least important (and also the furthest from the Sainte-Marie DWTP). Therefore, higher river flow – Q3 and Q2, and higher TuSG values are the most important for predicting raw water TuSM. The average impact on TuSM is 2.7 NTU for river flow – Q3 and 2.6 NTU for TuSG, whereas precipitation – P4 has an impact of 0.939 NTU. Fig. 7b shows that high values of TuSG and of river flow – Q3 and Q2 (red points) have higher SHAP values, leading to high TuSM values. In contrast, both high and low precipitation – P4 values (blue and red points) have low SHAP values, leading to low TuSM values. We also observe that many SHAP values from river flow – Q3 and Q2, and TuSG are stacked near 0. Those observations correspond to lower values of Q3, Q2 and TuSG, meaning that lower river flow and lower or zero precipitation give insufficient information for predicting high TuSM values, since they tend to pull the raw water TuSM predictions below the average.
In addition, a second group of points for river flow – Q3 is observed to stack above the zero SHAP value (Fig. 7b), thus pulling the raw water TuSM predictions above the average. This may account for the raw water TuSM peak values that are observed to be overestimated (pulled up) in Fig. 5, as some river flow – Q3 observations cause an overestimation of TuSM peak values, which decreases the model 2 performance metrics. In contrast to TuSG and river flow – Q3 and Q2, neither precipitation – P4 nor P3 contributes significantly to TuSM.
For model 3 – TuCH (Fig. 8a), we observe that TuSM (the turbidity at the closest DWTP) is the most important input variable, followed by river flow – Q1 (the closest flow meter station), TuSG, and river flow – Q2 and Q3. Therefore, higher river flow (Q3, Q2, Q1) and higher TuSG and TuSM values are the most important for predicting raw water TuCH. The least important variables are the four precipitation variables. The average impact on TuCH values is 2.9 NTU for TuSM, 2.0 NTU for river flow – Q1, and 1.6 NTU for TuSG. Fig. 8b shows that high TuSG and TuSM values (red points) have higher SHAP values, leading to high TuCH values. This is also true for Q1 and Q2, although their highest SHAP values (i.e., impact) are lower. None of the precipitation variables contributes significantly to TuCH.
Similarly to models 1 and 2, we observe that many SHAP values from river flow – Q3, Q2, Q1 and TuSG and TuSM are stacked near 0. These observations correspond to lower variable values of Q3, Q2, Q1, TuSG, and TuSM, meaning that lower river flow, lower upstream raw water Tu and lower or zero precipitation provide no information for predicting TuCH high values, since they pull the raw water TuCH predictions below the average.
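The global ranking discussed above can be illustrated with a short Python sketch. It assumes that the "average impact" reported for each input is the mean absolute SHAP value over all observations, as in the standard SHAP importance bar plot; the SHAP values below are hypothetical and are not results from the study.

```python
def mean_abs_shap(shap_rows, names):
    """Rank inputs by mean absolute SHAP value (assumed definition of average impact)."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[k]) for row in shap_rows) / n
        for k, name in enumerate(names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)

# Hypothetical SHAP values (NTU) for four observations and three inputs.
names = ["TuSM", "Q1", "P1"]
shap_rows = [
    [6.0, 2.0, 0.1],     # peak event: large positive contributions
    [-2.0, -1.5, 0.0],   # low-flow period: predictions pulled below the average
    [-1.5, -1.0, -0.2],
    [2.5, 3.5, 0.1],
]
ranking = mean_abs_shap(shap_rows, names)
```

Because absolute values are averaged, observations that pull the prediction below the average count toward an input's importance just as much as those that push it up, which is why the summary plots are needed to see the direction of each effect.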
After interpreting the global influence of input variables on raw water Tu predictions, we selected the most extreme Tu events to analyze how input variables affected the predictions of TuCH (model 3). Fig. 9 presents the interpretation using the SHAP approach for four extreme turbidity events. In this figure, the Shapley value explanation is shown on the left, and the Tu time series on the right. As shown in Fig. 9, both upstream turbidity and river flow generally had a positive impact on TuCH (except for the first event, where Q2 had a negative effect). TuSM is the input variable that has the greatest influence on TuCH predictions, except in the third event, where it is the second most important variable. The second most influential variable is either Q1 or TuSG. In contrast, precipitation shows a negative, very low (between 0 and 1.5 NTU), or negligible impact on TuCH predictions. The limited influence of precipitation on Tu predictions may be explained by the nature of the processes occurring in the watershed. River flow can act as an integrating variable of rainfall–runoff processes, as it reflects the cumulative effect of precipitation over the watershed. Furthermore, variations in flow may indicate the dynamics of particle transport into the river, whereas precipitation alone only represents the amount of rainfall. This remains an untested hypothesis, and further research is needed to clarify the low impact of precipitation on Tu predictions.
Fig. 9 Interpretation using SHAP strategy for four extreme turbidity events. Shapley value prediction explanation (left). Turbidity time series (right).
| Removed variable(s) | R2 (training) | R2 (testing) | NSE (training) | NSE (testing) | RMSE (training, NTU) | RMSE (testing, NTU) | MAE (training, NTU) | MAE (testing, NTU) |
|---|---|---|---|---|---|---|---|---|
| Q1 | 0.99 | 0.99 | 0.98 | 0.99 | 1.89 | 0.87 | 0.91 | 0.52 |
| Q2 | 0.99 | 0.99 | 0.99 | 0.99 | 0.72 | 0.66 | 0.76 | 0.43 |
| Q3 | 0.99 | 0.99 | 0.98 | 0.99 | 1.75 | 0.67 | 0.78 | 0.43 |
| P1 | 0.99 | 0.99 | 0.99 | 0.99 | 1.69 | 0.66 | 0.76 | 0.43 |
| P2 | 0.99 | 0.99 | 0.99 | 0.99 | 1.70 | 0.66 | 0.75 | 0.43 |
| P3 | 0.99 | 0.99 | 0.99 | 0.99 | 1.74 | 0.68 | 0.77 | 0.45 |
| P4 | 0.99 | 0.99 | 0.99 | 0.99 | 1.73 | 0.66 | 0.76 | 0.42 |
| TuSM | 0.99 | 0.99 | 0.97 | 0.99 | 2.35 | 1.00 | 1.06 | 1.11 |
| TuSG | 0.99 | 0.99 | 0.98 | 0.98 | 2.05 | 0.83 | 0.96 | 0.58 |
| Q1, Q2, Q3 | 0.87 | 0.89 | 0.75 | 0.98 | 7.32 | 2.01 | 2.32 | 1.11 |
| P1, P2, P3, P4 | 0.99 | 0.99 | 0.99 | 0.99 | 1.72 | 0.69 | 0.78 | 0.45 |
| TuSM, TuSG | 0.68 | 0.75 | 0.43 | 0.50 | 10.98 | 7.50 | 3.19 | 2.60 |
Based on the modeling results shown in Fig. 4, the XGBoost technique demonstrated a very high ability to predict the raw water Tu peaks in our case study. The very high R2 values show that all three models (model 1 – TuSG, model 2 – TuSM and model 3 – TuCH) can explain 98–99% of the variability of observed Tu. High values of the NSE metric also indicate very good prediction performance, showing a very low deviation between measured and modeled Tu values. Relatively low RMSE and MAE values also show low deviation between modeled and observed Tu values and are similar to or smaller than the minimum values of raw water Tu at the three respective water intakes (Table 1). However, model 2 performed slightly less well than models 1 and 3. The main implication of the results found in this study is that including upstream raw water Tu in addition to river flow as input variables enhances the prediction of raw water Tu peaks. Considering the uncertainty levels and confidence bands (Fig. 4), models 1 and 3 could be used in monitoring and decision-making systems, for example, for operational adjustments, anticipating raw water changes, and staff preparation. Since uncertainty moderately increases for higher and peak values, models 1 and 3 should be used with caution in critical and automated operational decisions. Therefore, although the models are useful as a supporting tool, their use as the sole criterion for critical operational decisions is not recommended. Better anticipation of raw water Tu peaks by DWTP managers should contribute to appropriate and timely preparation of DWTP operations.
Until now, the importance of input variables in raw water Tu modeling has remained obscured by the black-box nature of machine learning models. Moreover, the impact of the range (high or low) of input variable values on raw water Tu predictions was unknown. The SHAP strategy, however, reveals the importance of each input variable in the implemented XGBoost technique. As expected, in model 1 – TuSG (only two input variables), river flow is more correlated with raw water Tu than precipitation. Moreover, high river flow – Q3 values contributed the most (over 150 NTU, dark red points in Fig. 6b) to predicting TuSG. In model 2 – TuSM (five input variables), the most important input variable was river flow – Q3, followed by TuSG and river flow – Q2 (the difference between these two being small). In this case, although high TuSG values contributed the most (over 60 NTU, dark red points in Fig. 7b) to TuSM prediction, the stacked dark blue points to the right of 0 NTU for river flow – Q3 also contributed positively to TuSM prediction by pulling the raw water TuSM predictions above the average. In model 3 – TuCH (nine input variables), the most important input variable was TuSM, followed by river flow – Q1. In this case, high values for both TuSG and TuSM (over 60 NTU, dark red points in Fig. 8b) improved TuCH prediction. These findings indicate which input values are the most useful for predicting raw water Tu peaks, especially during and after rainfall.
Studies conducted by Ortiz-Lopez et al. (2023, 2024)6,17 to predict raw water Tu at the Charny DWTP did not consider upstream information on raw water Tu and had lower performance metrics than the current study: R2 = 0.87 and NSE = 0.75 for a Tu model in the testing stage using the SVR technique6 and R2 = 0.81 and NSE = 0.65 for a Tu model in the testing stage using the XGBoost technique.17 These findings demonstrate the relevance of including upstream information on water quality at the watershed scale, in addition to river flow and rainfall, as input variables for targeted Tu modeling, as was done in our model 3. We observed the high importance given to upstream raw water Tu by the machine learning technique, as highlighted by the SHAP strategy. An interpretability strategy, such as SHAP, is needed to reveal the impact on the raw water Tu predictions, not only of every input variable, but also of their higher or lower value ranges. Model 3 clearly outperformed the other models in this case study, based on its performance metrics. We also suggest that rainfall could be omitted as an input variable in Tu modeling, given its low importance as estimated by the SHAP strategy in the three models.
Another important aspect to consider is the feasibility of having all input variables available in real-world applications for EWS. Decision-support approaches based on predictive models that incorporate upstream Tu information require these data to be continuously supplied by the upstream DWTPs. In the case of upstream Tu, as demonstrated in the sensitivity analysis, the model is able to provide predictions when one of the two Tu inputs is missing; however, its performance decreases when both turbidity inputs are unavailable. Continuous monitoring of raw water Tu is always present in a DWTP. These data are sent continuously (online) to the SCADA (supervisory control and data acquisition) system. The system should therefore include a real-time data transmission framework to enable immediate data acquisition. In addition, continuous maintenance of sensors is required to ensure higher reliability. The same considerations apply to the flow variable measured upstream of the DWTP where Tu is to be predicted. Data transmission must be almost instantaneous, and mechanisms to correct flow measurements should be implemented. As shown in the sensitivity analysis, the omission of one or several flow series does not result in a significant loss of predictive performance in the models.
Some limitations of this study relate to the prediction time horizon. Our approach predicts raw water Tu at a specific moment using time-lagged input variables. Thus, the longest horizon at which raw water Tu can be predicted corresponds to the smallest lag-time among all input variables (4 hours for model 3, 14 hours for model 2, and 8 hours for model 1). These prediction horizons might be too short for application in early warning systems. Other modeling approaches might produce longer prediction horizons, for instance time series forecasting or more powerful time-series-based machine learning techniques such as long short-term memory (LSTM) networks. In addition, antecedent moisture and dry conditions were not considered as predictors because the related data were not available. Since previous studies have used these conditions, we recommend including them in future research. Since the Chaudière River has some dams upstream of two of the DWTPs in this study, we also recommend including an analysis of dam operations in future studies.
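The link between lag-times and the prediction horizon can be illustrated with a short sketch. The variable names and lag values below are hypothetical placeholders rather than the lag-times estimated in this study, and hourly time steps are assumed.

```python
def prediction_horizon(lags_hours):
    """Longest lead time at which every lagged input has already been
    observed: the smallest lag among the input variables."""
    return min(lags_hours.values())


def build_lagged_rows(series, lags, target):
    """Pair target[t] with each input observed `lag` steps earlier.
    The first max(lags) steps are dropped because not all lagged
    inputs exist for them."""
    max_lag = max(lags.values())
    rows = []
    for t in range(max_lag, len(target)):
        features = {name: series[name][t - lag] for name, lag in lags.items()}
        rows.append((features, target[t]))
    return rows


# Hypothetical lags in hours (illustrative values only).
lags = {"upstream_tu": 4, "river_flow": 6, "rainfall": 10}

# Twelve hypothetical hourly observations per variable.
series = {
    "upstream_tu": [float(v) for v in range(12)],
    "river_flow": [100.0 + v for v in range(12)],
    "rainfall": [0.0] * 12,
}
target = [10.0 + v for v in range(12)]

horizon = prediction_horizon(lags)
rows = build_lagged_rows(series, lags, target)
print(f"Prediction horizon: {horizon} h, usable samples: {len(rows)}")
```

In this sketch, adding an input with a short lag shortens the horizon for the whole model, because a prediction can only be issued once the most recent required input has been measured.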
Future research could improve the robustness of raw water Tu predictions by extending prediction horizons and using more complex machine learning models. More data should be collected at water intakes to enlarge the learning space for the training stages of the models. This could also open the possibility of testing a different data-splitting approach for training machine learning models. For example, Zhu et al. (2023)34 propose a block splitting approach that combines chronological and random splits. Other crucial parameters in drinking water treatment and production, such as natural organic matter (NOM), should be measured, collected and modeled. Surrogate parameters for NOM, such as UV absorbance, have already been modeled and predicted at a water intake using watershed information. Prediction of NOM at water intakes might be improved by including upstream NOM measurements as input variables in machine learning models.
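One possible reading of a block splitting approach is sketched below: the series is cut into contiguous blocks, so that chronological order is preserved within each block, and whole blocks are then randomly assigned to the training or testing set. This is an illustrative assumption and not necessarily the exact procedure of Zhu et al.; the function name, block size and test fraction are hypothetical.

```python
import random


def block_split(n_samples, block_size, test_fraction, seed=0):
    """Split sample indices into train and test sets by randomly
    assigning contiguous blocks (not individual samples) to each set."""
    blocks = [
        list(range(start, min(start + block_size, n_samples)))
        for start in range(0, n_samples, block_size)
    ]
    rng = random.Random(seed)  # fixed seed for a reproducible split
    n_test_blocks = max(1, round(test_fraction * len(blocks)))
    test_ids = set(rng.sample(range(len(blocks)), n_test_blocks))

    train, test = [], []
    for block_id, block in enumerate(blocks):
        (test if block_id in test_ids else train).extend(block)
    return train, test


# 100 hypothetical hourly samples, blocks of 10 samples, 20% for testing.
train_idx, test_idx = block_split(n_samples=100, block_size=10, test_fraction=0.2)
print(f"{len(train_idx)} training samples, {len(test_idx)} testing samples")
```

Compared with a purely random split, keeping samples in contiguous blocks limits the leakage between neighboring, highly autocorrelated observations, while the random choice of blocks lets the test set cover different parts of the record.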
Our findings could support decision-making by DWTP operators. Accurate and timely predictions of raw water Tu peaks during and after rainfall events are critical for anticipating sudden changes in raw water quality and for timely adaptation of DWTP operations, such as coagulation management. Results show prediction horizons ranging from at least 4 hours (in the predictive model with fewer input variables) to 8 hours (in the predictive model with more input variables). This is particularly relevant for watersheds in which several water intakes are supplied by the same river and are affected by frequent intense rainfall events.
This research highlights the importance of estimating lag-times when modeling a raw water parameter, such as Tu, using time series of hydrological, meteorological and other upstream raw water parameters as input variables. Identifying such lag-times between input and output variables is a critical methodological step when selecting input variables. As previously demonstrated, time-lagged input variables are better at representing the delays between changes in rainfall, river flow, upstream water quality and the targeted water quality parameter.
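As an illustration of lag-time estimation, the sketch below selects the shift that maximizes the Pearson correlation between an upstream series and a downstream series. This section does not restate the estimation method used in the study, so the cross-correlation criterion, the function names and the synthetic series are all assumptions made for the example.

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def estimate_lag(upstream, downstream, max_lag):
    """Return the lag (in time steps) that maximizes the correlation
    between upstream[t - lag] and downstream[t], and that correlation."""
    best_lag, best_r = 0, float("-inf")
    for lag in range(max_lag + 1):
        shifted_upstream = upstream[: len(upstream) - lag]
        aligned_downstream = downstream[lag:]
        r = pearson(shifted_upstream, aligned_downstream)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Synthetic upstream turbidity pulse (hypothetical values, hourly steps).
upstream_tu = [1, 1, 1, 2, 6, 12, 7, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
# Downstream series: the same pulse arriving three time steps later.
downstream_tu = [1, 1, 1] + upstream_tu[:-3]

lag, corr = estimate_lag(upstream_tu, downstream_tu, max_lag=6)
print(f"Estimated lag: {lag} time steps (r = {corr:.2f})")
```

With real data, the correlation peak is usually less sharp than in this synthetic case, so the selected lag should be checked against the hydrological travel times expected for the watershed.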
The results obtained in this study open possibilities for developing regional early warning systems in which raw water quality information from upstream water intakes could be used to predict conditions at downstream water intakes. However, this study is limited to the summer period (when turbidity varies the most) and to observations from a single year, so more observations could be required before the approach is applied in an EWS. Forthcoming work should include an interannual analysis of raw water quality and hydrometeorological information. Future research on raw water quality modeling should focus on increasing prediction horizons to provide timely and accurate input to an EWS. New developments using regional models could focus on decision support tools that help DWTP operators adjust water treatment processes according to the predicted raw water quality parameters.
Supplementary information (SI) is available. See DOI: https://doi.org/10.1039/d6ew00235h.
This journal is © The Royal Society of Chemistry 2026