Predictive modeling of BOD throughout wastewater treatment: a generalizable machine learning approach for improved effluent quality
Abstract
Biochemical oxygen demand (BOD) is one of the most sensitive and essential indicators of wastewater quality. However, current BOD detection methods require considerable effort and time. This delay results in management and operational errors during the wastewater-treatment process, which in turn lead to poor-quality effluent that poses a threat to public health and safety. Using advanced machine learning (ML) methods, we developed a generalizable BOD prediction model based on a unique, centrally integrated database from 30 wastewater-treatment plants (WWTPs) across Israel. The model relies on easily retrieved water parameters measured by on-site sensors or conventional analytical devices. Three ML algorithms were examined and compared: random forest (RF), support vector machine, and gradient tree boosting. The optimized RF model achieved the best results, with an R² of 0.91 and an RMSE of 8.58 in predicting the total BOD at different stages of the treatment process. The three key features for modeling were chemical oxygen demand, total suspended solids, and total Kjeldahl nitrogen. We then present an approach to predicting BOD in effluent, focusing on binary classification for regulatory compliance. For a prediction threshold of BOD > 9 mg L−1, a recall of 0.89 was achieved. These results demonstrate the potential of the model to serve as a generalized solution for BOD prediction in WWTPs across Israel, and possibly worldwide. The method can be used as part of a sensor for BOD monitoring and management in wastewater, effectively minimizing the time gaps between routine lab tests. The fundamental challenge addressed herein has important global relevance, especially in an era in which the demand for high-quality wastewater reuse is expected to increase dramatically.
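The evaluation metrics named above (R², RMSE, and recall at the BOD > 9 mg L−1 threshold) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names are hypothetical, and the measured and predicted values are invented purely for illustration; they are not data from the study. Recall is assumed here to mean the fraction of truly exceeding samples (measured BOD above the limit) that the predictions also flag as exceeding.

```python
import math

# Threshold quoted in the abstract (mg/L); used to turn BOD values into a binary label.
BOD_LIMIT_MG_L = 9.0


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted BOD."""
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def exceedance_recall(measured, predicted, limit=BOD_LIMIT_MG_L):
    """Share of samples with measured BOD > limit whose prediction is also > limit."""
    true_pos = sum(1 for m, p in zip(measured, predicted) if m > limit and p > limit)
    false_neg = sum(1 for m, p in zip(measured, predicted) if m > limit and p <= limit)
    total = true_pos + false_neg
    return true_pos / total if total else float("nan")


# Hypothetical effluent samples (mg/L), invented for this example only.
measured = [4.0, 12.0, 8.0, 15.0, 10.0, 6.0]
predicted = [5.0, 11.0, 10.0, 14.0, 8.0, 6.0]

print("RMSE:", round(rmse(measured, predicted), 3))
print("R^2:", round(r_squared(measured, predicted), 3))
print("Recall at BOD > 9 mg/L:", round(exceedance_recall(measured, predicted), 3))
```

Recall is a natural metric for a compliance setting because a missed exceedance (a false negative) lets non-compliant effluent pass unnoticed, whereas a false alarm only triggers an extra lab test.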