Machine learning for hours-ahead forecasts of urban air concentrations of oxides of nitrogen from univariate data exploiting trend attributes
Abstract
The extraction of multiple attributes from past hours in univariate trends of hourly oxides of nitrogen (NOx) recorded at ground-level sites substantially improves NOx hourly forecasts for at least four hours ahead without assistance from exogenous-variable inputs. The method proposed is evaluated with public datasets of hourly NOx data, compiled from 2017 to 2021, for local sites from multiple cities in central England. The dataset for each urban or roadside site considered includes more than 40 000 NOx hourly recordings. The period covered straddles the COVID-19-related lockdowns of 2020, which were associated with lower vehicle emissions that affected NOx trends at all the studied sites, with effects extending into 2021. Fifteen trend attributes are extracted from the recorded NOx trends relating to the previous twelve hours of recorded data. The attributes considered are easily calculated and include seasonal components, recent-past-hour NOx values, averages of several past hours, and differences and rates of change between selected past hours. A multi-linear regression (MLR) model and three machine-learning (ML) models are trained and cross-validated for various yearly intervals within the 2017 to 2021 period. The trained models are then applied to predict up to four hours ahead for 2020 and 2021 as separate testing subsets. The models substantially outperform autoregressive and moving average (MA) methods in their hours-ahead forecasts. Feature importance analysis of the MLR and ML models reveals the flexibility with which the models can give more weight to certain trend attributes depending upon the t + x hour being predicted.
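As an illustration of the kind of attribute extraction described in the abstract, the following minimal Python sketch derives trend attributes from the most recent twelve hours of a univariate hourly series and pairs them with a target value several hours ahead. The abstract does not list the fifteen attributes used in the study, so the specific attributes, their names, and the helper functions `trend_attributes` and `make_samples` are hypothetical choices made here for illustration only; they cover the attribute families mentioned (seasonal components, recent values, averages, differences and rates of change) but should not be read as the study's actual feature set.

```python
import math

LOOKBACK = 12  # hours of history per sample, as stated in the abstract


def trend_attributes(values, timestamps, t):
    """Return illustrative trend attributes for the hour at index t.

    values[t] is the most recent recorded value. Only values[t-11 .. t] are
    used, so every attribute is known when forecasting hours t+1 .. t+4.
    The attribute set is an assumption, not the study's list.
    """
    if t < LOOKBACK - 1:
        raise ValueError("need at least 12 hours of history")
    window = values[t - LOOKBACK + 1 : t + 1]  # oldest first, newest last
    ts = timestamps[t]
    hour_angle = 2 * math.pi * ts.hour / 24
    return {
        # seasonal components (cyclic hour encoding is an assumption)
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "weekday": ts.weekday(),
        # recent-past-hour values
        "lag0": window[-1],
        "lag1": window[-2],
        "lag2": window[-3],
        # averages of several past hours
        "mean3": sum(window[-3:]) / 3,
        "mean6": sum(window[-6:]) / 6,
        "mean12": sum(window) / 12,
        # differences between selected past hours
        "diff_1h": window[-1] - window[-2],
        "diff_3h": window[-1] - window[-4],
        # rates of change per hour over longer spans
        "rate_6h": (window[-1] - window[-7]) / 6,
        "rate_11h": (window[-1] - window[0]) / 11,
    }


def make_samples(values, timestamps, horizon):
    """Pair each attribute row with the value `horizon` hours ahead."""
    rows, targets = [], []
    for t in range(LOOKBACK - 1, len(values) - horizon):
        rows.append(trend_attributes(values, timestamps, t))
        targets.append(values[t + horizon])
    return rows, targets
```

Calling `make_samples(values, timestamps, horizon=x)` once for each x from 1 to 4 would give one supervised dataset per t + x forecast hour, which any regression model could then be fitted to. Building a separate dataset per horizon is one simple design choice among several and is assumed here rather than taken from the study.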
- This article is part of the themed collections: Topic Collection: Air Pollution & Air Quality and Artificial Intelligence and Machine Learning in Environmental Science