Emerging investigator series: a machine learning approach to quantify the impact of meteorology on tropospheric ozone in the inland southern California†
The role of meteorology in facilitating the formation and accumulation of ground-level ozone is of great theoretical and practical interest, especially given the changing global climate. In this study, we trained machine learning models on large meteorology and air quality datasets to (1) enhance the prediction of ozone levels in the South Coast Air Basin of California, (2) investigate the impact of recent meteorological shifts on ozone formation, and (3) determine the most critical factors influencing ozone exceedance hours. Random forest regression (RFR) was used to predict historical and future trends of ozone levels, and k-nearest neighbors was used as a binary classifier for ozone exceedance prediction. The models were trained on meteorology data from the Ontario and Los Angeles International Airport stations and air quality data from the Fontana, California air monitoring station, covering the period from 1994 to 2018. Upon model evaluation, the correlation of the RFR model was 0.92, and the probability of detection for ozone exceedances using k-nearest neighbors was 0.81 for the most recent years of the analysis (2014–2018). We also ran a 4 km Community Multiscale Air Quality model simulation to generate air pollution estimates over Southern California. As expected, ozone in Fontana was positively correlated with temperature. Ozone exceedance hours usually occurred when the temperature was above 25 °C and the wind was westerly (from 270°). Ozone sensitivity as a function of temperature and NOx was also examined. Observed midday troughs in hourly NOx concentrations under high temperatures suggest that most of the ambient NOx had reacted, which was also expected.
The results indicate that machine learning can support state implementation planning by complementing traditional air quality modeling, reducing simulation time, and exploiting large datasets for historical simulations and future air quality predictions.
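The two evaluation metrics quoted above can be illustrated with a minimal sketch. It assumes that the reported correlation is a Pearson coefficient between predicted and observed ozone, and that probability of detection is defined in the usual way as hits / (hits + misses). The function names and the toy values are hypothetical and are not taken from the study.

```python
from math import sqrt


def probability_of_detection(observed, predicted):
    """Fraction of observed exceedance hours that the classifier also flagged.

    Both arguments are equal-length sequences of 0/1 labels (1 = exceedance).
    Assumes the standard definition POD = hits / (hits + misses).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    hits = sum(1 for o, p in zip(observed, predicted) if o and p)
    misses = sum(1 for o, p in zip(observed, predicted) if o and not p)
    if hits + misses == 0:
        raise ValueError("no observed exceedances; POD is undefined")
    return hits / (hits + misses)


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Toy labels for eight hours (illustrative only, not data from the study).
observed_exceedance = [1, 1, 1, 1, 0, 0, 1, 0]
predicted_exceedance = [1, 1, 0, 1, 0, 1, 1, 0]
pod = probability_of_detection(observed_exceedance, predicted_exceedance)

# Toy ozone values in arbitrary units (illustrative only).
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

In this toy case four of the five observed exceedance hours are flagged. The false alarm in the sixth hour does not affect probability of detection, which is why the metric is normally read alongside a false-alarm measure.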
- This article is part of the themed collection: Emerging Investigator Series