Reproducibility and imputation of air toxics data

Hien Q. Le; Stuart A. Batterman; Robert L. Wahl

doi:10.1039/B709816B

Hien Q. Le,^a Stuart A. Batterman*^a and Robert L. Wahl^b

Author affiliations

* Corresponding authors

^a University of Michigan, Michigan, USA

^b Michigan Department of Community Health, Michigan, USA

Abstract

Ambient air quality datasets include missing data, values below method detection limits and outliers, and the precision and accuracy of the measurements themselves are often unknown. At the same time, many analyses require continuous data sequences and assume that measurements are error-free. While a variety of data imputation and cleaning techniques are available, the evaluation of such techniques remains limited. This study evaluates the performance of these techniques for ambient air toxics measurements, a particularly challenging application, and includes the analysis of intra- and inter-laboratory precision. The analysis uses an unusually complete dataset, consisting of daily measurements of over 70 species of carbonyls and volatile organic compounds (VOCs) collected over a one-year period in Dearborn, Michigan, including 122 pairs of replicates. Analysis was restricted to compounds found above detection limits in ≥20% of the samples. Outliers were detected using the Gumbel extreme value distribution. Error models for inter- and intra-laboratory reproducibility were derived from replicate samples. Imputation variables were selected using a generalized additive model, and the performance of two techniques, multiple imputation and optimal linear estimation, was evaluated for three missingness patterns. Many species were rarely detected or had very poor reproducibility. Error models developed for seven carbonyls showed median intra- and inter-laboratory errors of 22% and 25%, respectively. Better reproducibility was seen for the 16 VOCs meeting detection and reproducibility criteria. Imputation performance depended on the compound and missingness pattern. Data missing at random could be adequately imputed, but imputations for row-wise deletions, the most common type of missingness pattern encountered, were not informative. The analysis shows that air toxics data require significant efforts to identify and mitigate errors, outliers and missing observations, and that these steps are essential and should be performed prior to using these data in receptor, exposure, health and other applications.
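
The screening steps named in the abstract (the ≥20% detection criterion, replicate-based precision, and Gumbel-based outlier detection) can be illustrated with a minimal Python sketch. This is not the authors' procedure: the abstract does not give the fitting method, significance level or error-model form, so the method-of-moments Gumbel fit, the alpha threshold, the relative-difference metric and all function names below are assumptions made purely for illustration.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def detection_frequency(values, mdl):
    """Fraction of non-missing values at or above the detection limit (mdl)."""
    observed = [v for v in values if v is not None]
    if not observed:
        return 0.0
    return sum(v >= mdl for v in observed) / len(observed)


def replicate_relative_differences(pairs):
    """Absolute relative difference |a - b| / mean(a, b) per replicate pair.

    Assumed precision metric; pairs whose mean is not positive are skipped.
    """
    diffs = []
    for a, b in pairs:
        avg = (a + b) / 2.0
        if avg > 0:
            diffs.append(abs(a - b) / avg)
    return diffs


def gumbel_outliers(values, alpha=0.05):
    """Flag values in the upper tail of a Gumbel distribution.

    The distribution is fitted by the method of moments (an assumption):
    scale = s * sqrt(6) / pi, location = mean - gamma * scale.
    A value is flagged when its fitted CDF exceeds 1 - alpha.
    """
    scale = stdev(values) * math.sqrt(6) / math.pi
    location = mean(values) - EULER_GAMMA * scale
    flagged = []
    for v in values:
        cdf = math.exp(-math.exp(-(v - location) / scale))
        if cdf > 1 - alpha:
            flagged.append(v)
    return flagged
```

In use, a compound would be retained only if detection_frequency(...) is at least 0.20, and the median of replicate_relative_differences(...) gives a rough single-number summary of reproducibility. Because the outlier itself inflates the fitted scale, a moment-based fit of this kind can mask extreme values, which is one reason the threshold here should be read as illustrative only.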

Article information

DOI: https://doi.org/10.1039/B709816B
Article type: Paper
Submitted: 28 Jun 2007
Accepted: 20 Sep 2007
First published: 12 Oct 2007

J. Environ. Monit., 2007, 9, 1358-1372

Journal of Environmental Monitoring
