Dissolved oxygen forecasting in the Mississippi River: advanced ensemble machine learning models
Abstract
Dissolved oxygen (DO) is a key water quality variable in rivers: it controls many biogeochemical processes and governs the survival of aquatic species. Accurate DO forecasting is therefore of great importance. This study proposes two models for predicting daily DO at forecast horizons from 1 day ahead to 15 days ahead in the Mississippi River, using a long-term observed dataset from the Baton Rouge station. The first, AR-RBF, applies additive regression (AR) to radial basis function (RBF) neural networks. The second, MLP-RF, stacks a multilayer perceptron (MLP) and a random forest (RF). Two input scenarios were considered: scenario A includes the mean water temperature and a fixed number of preceding DO values, whereas scenario B uses only those preceding DO values and excludes all exogenous variables. The AR-RBF and stacked MLP-RF models excel at short-term forecasting and remain sufficiently accurate for medium-term horizons of up to 15 days. For 3-day-ahead predictions, for instance, the root mean square error (RMSE) is 0.28 mg L⁻¹ and the mean absolute percentage error (MAPE) is around 2.5% in the worst case. For 15-day-ahead forecasts, RMSE remains below 0.93 mg L⁻¹ and MAPE does not exceed 8.2%, again in the worst case. Both models effectively capture the extreme values and fluctuations of DO. However, the accuracy of both models decreases as the forecast horizon lengthens, particularly in scenario B, where mean water temperature is excluded from the inputs. At the longer forecast horizons examined, AR-RBF shows a smaller bias than the stacked MLP-RF model.
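The two input scenarios and the two error metrics can be made concrete with a short sketch. The Python below is a minimal sketch, assuming daily DO and water temperature series held in NumPy arrays: it builds (input, target) pairs for an h-day-ahead forecast and computes RMSE and MAPE. The function names `make_supervised`, `rmse` and `mape` are hypothetical, the parameter `n_lags` stands in for the unstated number of preceding DO values, and using the water temperature of the forecast origin day in scenario A is an assumption, not a detail taken from the study.

```python
import numpy as np


def make_supervised(do, temp=None, n_lags=3, horizon=1):
    """Build (X, y) pairs for an h-day-ahead DO forecast (hypothetical helper).

    Scenario B: pass temp=None, so only the n_lags preceding DO values are inputs.
    Scenario A: pass a temperature series; it is appended as one extra input.
    """
    do = np.asarray(do, dtype=float)
    rows, targets = [], []
    # t is the forecast origin: the last day whose DO value is known.
    for t in range(n_lags - 1, len(do) - horizon):
        features = list(do[t - n_lags + 1 : t + 1])  # DO on days t-n_lags+1 .. t
        if temp is not None:
            # Assumption: water temperature on the origin day t is used.
            features.append(float(temp[t]))
        rows.append(features)
        targets.append(do[t + horizon])  # DO observed `horizon` days later
    return np.array(rows), np.array(targets)


def rmse(obs, pred):
    """Root mean square error, in the units of the data (mg L^-1 for DO)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def mape(obs, pred):
    """Mean absolute percentage error, in percent (observations must be nonzero)."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(100.0 * np.mean(np.abs((obs - pred) / obs)))
```

For a 15-day horizon the same helper would be called with `horizon=15`. Each sample then pairs the inputs at day t with the DO value at day t + 15, so the number of usable samples shrinks by one for every additional day of lead time.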
Compared with prior studies of DO in US rivers, the consistently robust performance of both models underscores their potential as more effective tools for predicting this essential water quality parameter.
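The two model families can likewise be sketched with NumPy and scikit-learn. This is not the authors' implementation. The classes `RBFNet` and `AdditiveRBF` and the function `make_stacked_mlp_rf` are hypothetical. The RBF network design (k-means centers, Gaussian hidden units, a ridge output layer), the shrinkage value, the MLP hidden-layer size and the linear meta-learner are all assumptions. "Additive regression" is interpreted here as forward stagewise fitting of each new RBF network to the residuals of the ensemble built so far, similar in spirit to Weka's AdditiveRegression meta-learner. The stacking step uses scikit-learn's `StackingRegressor`, which trains the meta-learner on cross-validated predictions of the MLP and RF.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class RBFNet:
    """Hypothetical minimal RBF network: k-means centers, Gaussian units, ridge output."""

    def __init__(self, n_centers=10, gamma=1.0, alpha=1e-3, seed=0):
        self.n_centers, self.gamma, self.alpha, self.seed = n_centers, gamma, alpha, seed

    def _hidden(self, X):
        Z = self.scaler_.transform(X)
        # Squared Euclidean distance from every sample to every center.
        d2 = ((Z[:, None, :] - self.centers_[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-self.gamma * d2)  # Gaussian activations

    def fit(self, X, y):
        self.scaler_ = StandardScaler().fit(X)  # inputs differ in scale (DO vs temperature)
        km = KMeans(n_clusters=self.n_centers, n_init=10, random_state=self.seed)
        self.centers_ = km.fit(self.scaler_.transform(X)).cluster_centers_
        self.out_ = Ridge(alpha=self.alpha).fit(self._hidden(X), y)
        return self

    def predict(self, X):
        return self.out_.predict(self._hidden(X))


class AdditiveRBF:
    """Assumed AR-RBF: each stage fits an RBFNet to the current residuals."""

    def __init__(self, n_stages=10, shrinkage=0.5, **rbf_kwargs):
        # rbf_kwargs are passed to RBFNet; do not include `seed`, it is set per stage.
        self.n_stages, self.shrinkage, self.rbf_kwargs = n_stages, shrinkage, rbf_kwargs

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        self.init_ = float(np.mean(y))  # stage 0: constant mean model
        residual = y - self.init_
        self.stages_ = []
        for k in range(self.n_stages):
            m = RBFNet(seed=k, **self.rbf_kwargs).fit(X, residual)
            residual = residual - self.shrinkage * m.predict(X)  # shrunken update
            self.stages_.append(m)
        return self

    def predict(self, X):
        # Prediction is the mean plus the shrunken sum of all stage predictions.
        return self.init_ + self.shrinkage * sum(m.predict(X) for m in self.stages_)


def make_stacked_mlp_rf(seed=0):
    """Assumed MLP-RF stack: MLP and RF as base learners, linear meta-learner."""
    mlp = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=seed),
    )
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    return StackingRegressor(
        estimators=[("mlp", mlp), ("rf", rf)],
        final_estimator=LinearRegression(),
        cv=5,  # meta-learner is trained on 5-fold out-of-fold predictions
    )
```

Either model is fitted with `model.fit(X, y)` on lagged inputs for a single horizon. Training a separate model for each horizon from 1 to 15 days is one plausible setup, although the abstract does not say whether direct or recursive multi-step forecasting was used. In the additive model, a smaller `shrinkage` combined with more stages usually reduces overfitting at the cost of longer training.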