From the journal Environmental Science: Atmospheres

Peer review history

An analysis of degradation in low-cost particulate matter sensors

Round 1

Manuscript submitted on 27 Oct 2022
 

11-Dec-2022

Dear Dr deSouza:

Manuscript ID: EA-ART-10-2022-000142
TITLE: An analysis of degradation in low-cost particulate matter sensors

Thank you for your submission to Environmental Science: Atmospheres, published by the Royal Society of Chemistry. I sent your manuscript to reviewers and I have now received their reports which are copied below.

I have carefully evaluated your manuscript and the reviewers’ reports, and the reports indicate that major revisions are necessary.

Please submit a revised manuscript which addresses all of the reviewers’ comments. Further peer review of your revised manuscript may be needed. When you submit your revised manuscript please include a point by point response to the reviewers’ comments and highlight the changes you have made. Full details of the files you need to submit are listed at the end of this email.

Please submit your revised manuscript as soon as possible using this link:

*** PLEASE NOTE: This is a two-step process. After clicking on the link, you will be directed to a webpage to confirm. ***

https://mc.manuscriptcentral.com/esatmos?link_removed

(This link goes straight to your account, without the need to log on to the system. For your account security you should not share this link with others.)

Alternatively, you can login to your account (https://mc.manuscriptcentral.com/esatmos) where you will need your case-sensitive USER ID and password.

You should submit your revised manuscript as soon as possible; please note you will receive a series of automatic reminders. If your revisions will take a significant length of time, please contact me. If I do not hear from you, I may withdraw your manuscript from consideration and you will have to resubmit. Any resubmission will receive a new submission date.

The Royal Society of Chemistry requires all submitting authors to provide their ORCID iD when they submit a revised manuscript. This is quick and easy to do as part of the revised manuscript submission process. We will publish this information with the article, and you may choose to have your ORCID record updated automatically with details of the publication.

Please also encourage your co-authors to sign up for their own ORCID account and associate it with their account on our manuscript submission system. For further information see: https://www.rsc.org/journals-books-databases/journal-authors-reviewers/processes-policies/#attribution-id

Environmental Science: Atmospheres strongly encourages authors of research articles to include an ‘Author contributions’ section in their manuscript, for publication in the final article. This should appear immediately above the ‘Conflict of interest’ and ‘Acknowledgement’ sections. I strongly recommend you use CRediT (the Contributor Roles Taxonomy, https://credit.niso.org/) for standardised contribution descriptions. All authors should have agreed to their individual contributions ahead of submission and these should accurately reflect contributions to the work. Please refer to our general author guidelines https://www.rsc.org/journals-books-databases/author-and-reviewer-hub/authors-information/responsibilities/ for more information.

I look forward to receiving your revised manuscript.

Yours sincerely,
Dr Nønne Prisle
Associate Editor, Environmental Science: Atmospheres

************


 
Reviewer 1

This paper covers an important topic: How to determine whether a low-cost PurpleAir monitor has degraded and is no longer reporting reliable data. The authors have done a great job describing their methods in detail. My two biggest concerns are: (a) I think the authors are mistakenly conflating PM2.5 measurement errors associated with hygroscopic growth of particles at high RH with degradation of sensor electronics over time under high RH conditions. (b) Given the known limitations associated with the Plantower PMS5003 (e.g., inability to “see” many dust particles > 1 µm) and the original U.S.-wide correction equation (specifically, possible underestimation of very high wildfire smoke concentrations), how can the authors be sure that their “correction error” approach isn’t just identifying these other, non-degradation-related, performance limitations? I elaborate on these concerns in my comments below. I suspect that the authors will be able to revise the manuscript to address these concerns.

1. Page 3, first paragraph: “In addition, data from LCS are impacted by environmental variables such as relative humidity (RH), temperature (T), and dewpoint (D).” “Such models typically adjust for 1) systematic error, and 2) the dependencies of low-cost PM sensors measurements on RH, T and D.” It’s true that RH, T, and D are often useful predictors in correction models but, to be useful predictors, these variables don’t have to be the factors that are actually affecting sensor performance; these variables just have to be correlated with the variables that we know affect low-cost PM sensor performance (particle number size distribution, refractive index, and density). For example, the fact that temperature is a useful predictor in some models doesn’t necessarily mean that sensor operation is affected by temperature. I think it’s more likely that temperature is a useful predictor because it’s correlated with diurnal and/or seasonal changes in particle composition. I think a more accurate series of statements would be something like: “Several assumptions are typically made to convert light scattering into mass concentrations that can introduce error into the results. In addition, unlike reference monitors, LCS do not dry particles before measuring them, so PM concentrations reported by LCS can be biased high due to hygroscopic growth of particles at high ambient relative humidity (RH). Many research groups have developed different techniques to correct the raw LCS measurements from PM sensors. These models often include environmental variables, such as RH, temperature (T), and dewpoint (D), as predictors of the ‘true’ PM concentration.” I promise that I’m not just trying to pick on the authors’ introduction. These points relate to some of my concerns about the interpretation of the results.

2. Page 21, first paragraph: “Dust can degrade fan performance which would lead to potentially more disagreement between channels A and B of the PurpleAir sensors.” I agree with this statement, but I think dust can degrade sensor performance in several ways. Dust can also accumulate in the air flow path and optical components inside the sensor.

3. Page 21, second paragraph: “…except for sensors in hot and dry environments where the error was positively biased and increases over time by 0.08 (95% CI: 0.06, 0.09) µg/m3 per year of operation. PurpleAir sensors cannot ‘see’ dust or larger particles, which dominate in hot-dry environments, potentially explaining the disagreement between the corrected PurpleAir and reference measurements in these environments.” The authors are correct that PurpleAir sensors cannot see dust or larger particles, but I don’t understand why this would lead to a corrected PM2.5 concentration that is higher than the reference PM2.5 concentration. Jaffe et al. [DOI: 10.5194/amt-2022-265] found that raw PurpleAir PM2.5 data were underestimates of reference PM2.5 concentrations in dusty environments and that applying the U.S.-wide correction equation made the underestimation worse. The results presented by Jaffe et al. are more consistent with the PMS5003’s inability to ‘see’ many particles larger than 1 µm. Did the authors consider that wildfire smoke is also more likely to affect hot and dry environments? Could that be the reason the direction of the bias is different in hot and dry environments? My understanding is that raw PurpleAir PM2.5 data overestimate wildfire smoke concentrations and that the EPA correction approach should greatly reduce or eliminate that bias at moderate-to-high smoke concentrations, but could overcorrect (also leading to a negative bias like that observed with dust) at very high smoke concentrations (I’m thinking of the results reported by the EPA coauthors here: https://cfpub.epa.gov/si/si_public_record_report.cfm?dirEntryId=353088&Lab=CEMM).

4. Page 21, second paragraph: “RH has a major impact on PurpleAir performance, so it is not altogether surprising that degradation appears to be highest in hot and humid environments.” I agree that RH has a major impact on PurpleAir PM2.5 measurements and I agree that lots of time in a hot and humid environment could cause the electronic components inside a PurpleAir monitor to degrade more quickly, but I don’t think these two things are related as the authors suggest here. See my comment #1. RH affects PurpleAir measurements because particles take up water and grow when ambient RH is high. Particles aren’t dried before they enter the PMS5003 sensors, so PurpleAir monitors measure the ‘wet’ particles, which are bigger and scatter more light, and thus the PurpleAir can report much higher values than the ‘dry’ PM2.5 concentrations measured by a collocated reference monitor. In other words, PM2.5 concentrations reported by PurpleAir monitors are biased high at high RH because hygroscopic PM grows at high RH, not because of some effect that RH has on the sensor electronics.

5. Page 23, first paragraph: “The correction error appeared to become more and more negatively biased after 30,000 operational hours (3.5 years). However, due to the small number of sensors operating for more than 3 years, the wide confidence interval bands past 3 years casts uncertainty on the latter finding.” The authors’ dataset spanned 1 January 2017 to 20 July 2021. 3.5 years after 1 January 2017 would be the summer of 2020, when many locations in the western U.S. experienced very high concentrations of wildfire smoke. The EPA authors and Jaffe et al. found that equations with the form shown in Equation 1 can underestimate very high wildfire smoke PM2.5 concentrations (see my comment #3). Do the authors think that this phenomenon could have led to the negative bias observed in some sensors after 3.5 years of operation? Or is there enough variability in the PurpleAir monitor “start dates” that the dataset associated with 3+ years of operation isn’t dominated by sensors that came online in early 2017? Can the authors identify smoke-impacted data? If smoke-impacted data are withheld from the dataset, does this result still appear?

6. Page 25, first paragraph: “Exposure to a cumulative number of high PM2.5 measurements significantly affected the association between the normalized correction error over time.” Do the authors think that they observed this result because high levels of PM2.5 pollution are causing degradation of the sensor over time or because the sensors do a poor job of predicting very high concentrations accurately, even after the data have been corrected using Equation 1, and especially if those very high concentrations are due to windblown dust or wildfire smoke? This is a concern that I have with “Method 2” overall: How can we be sure that this method is identifying sensor degradation and not just picking out times and locations when we know that Equation 1 will do a poor job predicting the true PM2.5 concentration accurately?

7. Page 25, second paragraph: “It is not altogether surprising that the correction error increases most rapidly in hot and humid climate zones, as past evidence suggests that the performance of PurpleAir are greatly impacted by RH.” See my comment #4.

8. Page 25, second paragraph: “It is likely that this outcome increases most rapidly over time in hot and dry environments instead, because such environments also tend to be dusty.” I was confused by this sentence the first two times that I read it. I suggest that the authors clarify what they are saying here by rephrasing as something like: “It is likely that the percentage of flagged measurements increases most rapidly over time in hot and dry environments because such environments tend to be dusty.”

9. Figure 5: Please specify the units of the mean error shown on the y axis. Is this error in µg/m3 or %?

10. Page 24, second paragraph: “…environmental conditions of indoor environments (T and RH) are more regulated that outdoor environments…” ‘that’ should be ‘than’.

11. Page 5, last paragraph: “Overall, the dataset included 114,259,940 valid measurements…” 115,259,940 valid hourly measurements?

12. Page 4, third paragraph: “The cf_atm data, displayed on the PurpleAir map, are the lower measurement of PM2.5 and will be referred to as the “raw” data in this paper when making comparisons between initial and corrected datasets.” Page 9, last paragraph: “Henceforth, when describing PurpleAir measurements, we consider only the mean PM2.5 cf_1 concentrations.” These two statements about the methods seem contradictory. Can the authors please clarify when PM2.5 cf_atm concentrations were used and when PM2.5 cf_1 concentrations were used?

13. Page 11: “…i.e., 40% of subsequent measurements were degraded for at least 100 hours of operation”…After reading this paragraph, I’m not sure I understand how a PurpleAir was determined to be permanently degraded. Was a PurpleAir permanently degraded if there was any continuous 100-h period in the dataset in which 40% or more of the measurements were flagged? Was the 40% calculated on the basis of 120-s averages (I assume not because the authors said they downloaded 15-minute averages), 15-minute averages, or 1-hour averages?

14. Page 11, last paragraph: “We evaluated and plotted the correction error which is defined as the difference between the corrected measurement and corresponding reference PM2.5 measurement.” Was the correction error evaluated as an absolute error (with units of µg/m3) or as a percent error?

15. Figure 1: The “subarctic” climate zone is missing from the legend.

16. Page 4, second paragraph: “The Plantower sensor components measure 90° light scattering…” Depending on where in the path of the laser beam a particle is, scattered light can be detected over a range of angles. For example, Ouimette et al. note that, for a particle that is “centered” over the photodiode, the photodiode could see light scattered at angles ranging from 50° to 130° [DOI: 10.5194/amt-15-655-2022, see Section 2.2.4].

Reviewer 2

General Comments

1. This work represents a substantial effort to understand sensor degradation. The sample size is very large, which is good, but that presents its own issues. The authors should discuss the potential issues inherent in analyzing such large datasets.

2. It would seem that their estimate of long-term degradation would have to be an under-estimate if you consider that sensors which are functioning poorly are more likely to be removed from use. They are citing a value of 4% degradation over 3 years, but it is unclear if this considers the sensors that were removed from service (>11%). Can the authors comment on this? How does a bias towards well-functioning sensors affect the interpretation of these data?

3. The authors should speak more to the practical implications of these findings. For example, they mention that users should delete the first 20 hrs of data. But, what do these data say about how long sensors should be used in the field? 2 yrs? 3 yrs? 4 yrs? If they can’t directly speak to that, what type of further analysis would be necessary to make such an estimate?

4. The authors should state somewhere that this analysis may not be generalizable to other make/models of PM sensor.

Specific Comments

5. Pg. 4, last paragraph – What is the lower limit of detection of the sensor? Other published literature shows low-cost sensors with LOD in the range of ~5 ug/m3. Was this considered? If not, why?

6. Pg 9, 2nd paragraph, “We prioritized retaining data from reference monitors that did not rely on light scattering techniques as these instruments tend to have additional error when estimating aerosol mass.” The PurpleAir also uses light scattering, so can the authors comment more on those additional errors that might be relevant for interpreting the PurpleAir measurements.

7. Pg 11, Top of the page – what statistical test was used to test whether the distribution of PM2.5, RH, and T conditions was different for flagged vs. unflagged measurements?

8. Pg 11, 1st full paragraph – where did the value of 40% come from?

9. Pg 11, Section 2.4.2 – In the previous Section 2.4.1, the authors outline a metric for determining whether a sensor is permanently degraded compared to its co-located sensor and then suggest there is a 2nd approach using co-located reference measurements. But, this section (2.4.2) describes a way to correct for discrepancies between PurpleAir and reference methods, not for identifying degradation using some pre-determined metric. Can the authors clarify what the second method was and how it was used to identify degradation (not just correct for it)?

10. Pg 15, last sentence – the authors indicate that users should delete the first 20 hours of data. How was that specific number determined? Is it semi-quantitative (i.e. looking at the figures and drawing a line) or was there some statistical test performed? Can the authors comment on what might cause such erroneous data during those first hours?

11. Figure 5 – The authors use both “mean error” (y-axis) and “mean difference” (figure caption) to describe this data. I suggest choosing one descriptor and using it throughout.

12. Pg 21, 1st full paragraph – the data presented are “-0.92 (95% CI: -0.11, -0.75)” but the mean cannot be outside the CI range so one or more of these numbers is incorrect. These incorrect data are also shown in Table 4 and on pg 25 (top paragraph). Please update throughout with the correct numbers and adjust the discussion if necessary.


 

REVIEWER REPORT(S):
Referee: 1

Comments to the Author
This paper covers an important topic: How to determine whether a low-cost PurpleAir monitor has degraded and is no longer reporting reliable data. The authors have done a great job describing their methods in detail.

Authors: Thank you

My two biggest concerns are: (a) I think the authors are mistakenly conflating PM2.5 measurement errors associated with hygroscopic growth of particles at high RH with degradation of sensor electronics over time under high RH conditions. (b) Given the known limitations associated with the Plantower PMS5003 (e.g., inability to “see” many dust particles > 1 µm) and the original U.S.-wide correction equation (specifically, possible underestimation of very high wildfire smoke concentrations), how can the authors be sure that their “correction error” approach isn’t just identifying these other, non-degradation-related, performance limitations? I elaborate on these concerns in my comments below. I suspect that the authors will be able to revise the manuscript to address these concerns.

Authors: Thank you. You raise a good point that RH has a major influence on the Plantower sensors. We tried to address the potential RH-dependence of the calibration error in the following ways: 1) When we inspected the plot of correction error versus RH, we did not visually identify any relationship. 2) In supplementary analyses, we used various calibration approaches, ranging from simple linear models to more complex machine learning models, to correct the data from the PurpleAir sensors (Tables S3, S4). The time dependence of the error term was robust to model specification (some models corrected for RH non-linearly), leading us to believe that a time-dependent factor, degradation, was at play.

We note the following in the text:

“We evaluated and plotted the correction error, which is defined as the difference between the corrected measurement and corresponding reference PM2.5 measurement in µg/m3. In supplementary analyses, we repeat this process using nine additional correction functions ranging from simple linear regressions to more complex machine learning algorithms, some of which additionally correct for T and D, in addition to RH (Table S3), to evaluate the sensitivity of our results to the correction model used. A key concern is that some part of the correction error observed might not be due to degradation but to inadequate correction of RH or other environmental parameters. We plot correction error versus RH to visually assess if such a dependence exists. Some of the supplementary correction models used rely on non-linear corrections for RH. Research has shown that a non-linear correction equation might be more suitable to correct for PurpleAir measurements above ~ 500 μg/m3 of PM2.5 levels37. The machine learning models that we used in the supplement can identify such patterns using statistical learning. A full description of these additional models can be found in deSouza et al., (2022)25.”

We also note the following later:

“The correction derived using a regression analysis yielded the following function to derive corrected PM2.5 concentrations from the raw PurpleAir data: PM2.5,corrected = 5.92 + 0.57 × PM2.5,raw - 0.091 × RH. After correction, the Pearson correlation coefficient (R) improved slightly, from 0.88 to 0.89, and the RMSE improved significantly, from 12.5 to 6.6 μg/m3. The mean, median and maximum error observed were 3.3, 2.2, and 792.3 μg/m3, respectively (Table S3). Figure 5 displays the mean correction error across all sensors for every hour in operation. The mean error post 35,000 hours (4 years) becomes larger and is -0.45 μg/m3, compared to -0.13 μg/m3 before. A plot of correction error versus RH did not reveal any associations between the two variables (Figure S6). We note that similar results were observed when using a wide array of correction models, including models that contain both RH and T as variables, as well as more complex machine learning models that yielded the best correction results (Random Forest: R=0.99, RMSE = 2.4 μg/m3) (Table S3).”
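To make the quoted procedure concrete, here is a minimal sketch (not the authors' code; the file and column names are hypothetical, and only the coefficients come from the quoted text) of applying the linear correction and computing the correction error:

    import pandas as pd

    # Hypothetical collocated hourly dataset: raw PurpleAir PM2.5, RH, and
    # reference PM2.5 (columns: pm25_raw, rh, pm25_ref).
    df = pd.read_csv("collocated_hourly.csv")

    # Linear correction function quoted above
    df["pm25_corrected"] = 5.92 + 0.57 * df["pm25_raw"] - 0.091 * df["rh"]

    # Correction error: corrected minus reference PM2.5, in ug/m3
    df["correction_error"] = df["pm25_corrected"] - df["pm25_ref"]

    print(df["correction_error"].agg(["mean", "median", "max"]))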

1. Page 3, first paragraph: “In addition, data from LCS are impacted by environmental variables such as relative humidity (RH), temperature (T), and dewpoint (D).” “Such models typically adjust for 1) systematic error, and 2) the dependencies of low-cost PM sensors measurements on RH, T and D.” It’s true that RH, T, and D are often useful predictors in correction models but, to be useful predictors, these variables don’t have to be the factors that are actually affecting sensor performance; these variables just have to be correlated with the variables that we know affect low-cost PM sensor performance (particle number size distribution, refractive index, and density). For example, the fact that temperature is a useful predictor in some models doesn’t necessarily mean that sensor operation is affected by temperature. I think it’s more likely that temperature is a useful predictor because it’s correlated with diurnal and/or seasonal changes in particle composition. I think a more accurate series of statements would be something like: “Several assumptions are typically made to convert light scattering into mass concentrations that can introduce error into the results. In addition, unlike reference monitors, LCS do not dry particles before measuring them, so PM concentrations reported by LCS can be biased high due to hygroscopic growth of particles at high ambient relative humidity (RH). Many research groups have developed different techniques to correct the raw LCS measurements from PM sensors. These models often include environmental variables, such as RH, temperature (T), and dewpoint (D), as predictors of the ‘true’ PM concentration.” I promise that I’m not just trying to pick on the authors’ introduction. These points relate to some of my concerns about the interpretation of the results.

Authors: We thank you very much. We have changed the text in the paper to the suggested wording:

“Several assumptions are typically made to convert light scattering into mass concentrations that can introduce error into the results. In addition, unlike reference monitors, LCS do not dry particles before measuring them, so PM concentrations reported by LCS can be biased high due to hygroscopic growth of particles at high ambient relative humidity (RH). Many research groups have developed different techniques to correct the raw LCS measurements from PM sensors. These models often include environmental variables, such as RH, temperature (T), and dewpoint (D), as predictors of the ‘true’ PM concentration.”
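As an illustration of the kind of correction model described in this passage, a minimal sketch (hypothetical column names; not the correction fitted in the paper) of regressing reference PM2.5 on raw sensor readings and environmental predictors:

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical collocation dataset: raw sensor PM2.5 plus environmental
    # covariates, with reference PM2.5 as the regression target.
    df = pd.read_csv("collocated_hourly.csv")

    # RH, T, and D enter as predictors of the 'true' concentration; they need
    # only correlate with the factors that actually drive sensor error.
    model = smf.ols("pm25_ref ~ pm25_raw + rh + temp + dewpoint", data=df).fit()
    print(model.params)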

2. Page 21, first paragraph: “Dust can degrade fan performance which would lead to potentially more disagreement between channels A and B of the PurpleAir sensors.” I agree with this statement, but I think dust can degrade sensor performance in several ways. Dust can also accumulate in the air flow path and optical components inside the sensor.

Authors: Thank you. We have modified the sentence as follows:
“Dust can degrade fan performance and accumulate in the air flow path and optical components which would lead to potentially more disagreement between channels A and B of the PurpleAir sensors.”

3. Page 21, second paragraph: “…except for sensors in hot and dry environments where the error was positively biased and increases over time by 0.08 (95% CI: 0.06, 0.09) µg/m3 per year of operation. PurpleAir sensors cannot ‘see’ dust or larger particles, which dominate in hot-dry environments, potentially explaining the disagreement between the corrected PurpleAir and reference measurements in these environments.” The authors are correct that PurpleAir sensors cannot see dust or larger particles, but I don’t understand why this would lead to a corrected PM2.5 concentration that is higher than the reference PM2.5 concentration. Jaffe et al. [DOI: 10.5194/amt-2022-265] found that raw PurpleAir PM2.5 data were underestimates of reference PM2.5 concentrations in dusty environments and that applying the U.S.-wide correction equation made the underestimation worse. The results presented by Jaffe et al. are more consistent with the PMS5003’s inability to ‘see’ many particles larger than 1 µm. Did the authors consider that wildfire smoke is also more likely to affect hot and dry environments? Could that be the reason the direction of the bias is different in hot and dry environments? My understanding is that raw PurpleAir PM2.5 data overestimate wildfire smoke concentrations and that the EPA correction approach should greatly reduce or eliminate that bias at moderate-to-high smoke concentrations, but could overcorrect (also leading to a negative bias like that observed with dust) at very high smoke concentrations (I’m thinking of the results reported by the EPA coauthors here: https://cfpub.epa.gov/si/si_public_record_report.cfm?dirEntryId=353088&Lab=CEMM).

Authors: Thank you. Given the scope of this analysis, we were unable to re-do this analysis in the context of wildfire smoke. However, we have modified the results in the following manner:

“The correction error (PM2.5, corrected - PM2.5, reference) appeared to become negatively biased over time: -0.12 (95% CI: -0.13, -0.10) μg/m3 per year of operation, except for sensors in hot and dry environments where the error was positively biased and increases over time by 0.08 (95% CI: 0.06, 0.09) μg/m3 per year of operation. Wildfires often occur in hot-dry environments. Research has shown that the correction approach could overcorrect the PurpleAir measurements at very high smoke concentrations, potentially explaining the disagreement between the corrected PurpleAir and reference measurements in these environments39. We note that mean PM2.5 concentrations were highest in hot-dry environments (Table S2). In addition, the number of PM2.5 concentrations > 100 μg/m3 recorded was the highest in hot-dry environments.”

4. Page 21, second paragraph: “RH has a major impact on PurpleAir performance, so it is not altogether surprising that degradation appears to be highest in hot and humid environments.” I agree that RH has a major impact on PurpleAir PM2.5 measurements and I agree that lots of time in a hot and humid environment could cause the electronic components inside a PurpleAir monitor to degrade more quickly, but I don’t think these two things are related as the authors suggest here.

Authors: Thank you. We have modified the sentence to read as follows:

“The magnitude of the correction error bias over time appears to be highest in hot and humid environments, corresponding to -0.92 (95% CI: -1.10, -0.75) μg/m3 per year. RH has an impact on PurpleAir performance and can also cause the electronic components inside the sensors to degrade more quickly, so it is not altogether surprising that degradation appears to be highest in hot and humid environments.”

See my comment #1. RH affects PurpleAir measurements because particles take up water and grow when ambient RH is high. Particles aren’t dried before they enter the PMS5003 sensors, so PurpleAir monitors measure the ‘wet’ particles, which are bigger and scatter more light, and thus the PurpleAir can report much higher values than the ‘dry’ PM2.5 concentrations measured by a collocated reference monitor. In other words, PM2.5 concentrations reported by PurpleAir monitors are biased high at high RH because hygroscopic PM grows at high RH, not because of some effect that RH has on the sensor electronics.

Authors: Thank you. We agree with your description of how RH can impact sensor performance. The correction model form chosen in the main text, which corrects linearly for RH, may not adequately capture non-linearities at high RH values. However, we observe a similar time-dependence of the calibration error when we use other model forms, including those that account for non-linearities in RH (Tables S3 and S4 in the SI).

Please refer to our response to the first main comment.
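For readers unfamiliar with the non-linear models referenced here, a sketch of one such approach (a random forest, as named in Table S3; column names hypothetical, and not the authors' implementation):

    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("collocated_hourly.csv")  # hypothetical column names
    X = df[["pm25_raw", "rh", "temp", "dewpoint"]]
    y = df["pm25_ref"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # A tree ensemble can learn a non-linear RH dependence without a
    # pre-specified functional form, unlike a linear correction equation.
    rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    print(rf.score(X_te, y_te))  # R^2 on held-out data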

5. Page 23, first paragraph: “The correction error appeared to become more and more negatively biased after 30,000 operational hours (3.5 years). However, due to the small number of sensors operating for more than 3 years, the wide confidence interval bands past 3 years casts uncertainty on the latter finding.” The authors’ dataset spanned 1 January 2017 to 20 July 2021. 3.5 years after 1 January 2017 would be the summer of 2020, when many locations in the western U.S. experienced very high concentrations of wildfire smoke. The EPA authors and Jaffe et al. found that equations with the form shown in Equation 1 can underestimate very high wildfire smoke PM2.5 concentrations (see my comment #3). Do the authors think that this phenomenon could have led to the negative bias observed in some sensors after 3.5 years of operation? Or is there enough variability in the PurpleAir monitor “start dates” that the dataset associated with 3+ years of operation isn’t dominated by sensors that came online in early 2017? Can the authors identify smoke-impacted data? If smoke-impacted data are withheld from the dataset, does this result still appear?

Authors: We thank the reviewer for this important comment. Evaluating the performance of the PurpleAir sensors in the context of wildfire smoke is beyond the scope of the analysis in this paper. However, we do make the following note in our text:
“GCV criteria revealed that the dependence of the percentage of flagged PurpleAir measurements over time was non-linear, likely due to the non-linear relationship observed at operational times greater than 30,000 hours (3.5 years; Figure 6). However, due to the small number of measurements after this time interval, the shape of the curve after this time was uncertain, as evidenced by the wide confidence bands in this time period. The correction error appeared to become more and more negatively biased after 30,000 operational hours (3.5 years). However, due to the small number of sensors operating for more than 3 years, the wide confidence interval bands past 3 years cast uncertainty on the latter finding. A possible reason we see an increase in correction error is wildfire smoke in the summer of 2020, which potentially affected sensors deployed in January 2017. However, the wide range of start month-years of sensors operating for more than 3.5 years in our dataset suggests that this is unlikely.”

6. Page 25, first paragraph: “Exposure to a cumulative number of high PM2.5 measurements significantly affected the association between the normalized correction error over time.” Do the authors think that they observed this result because high levels of PM2.5 pollution are causing degradation of the sensor over time or because the sensors do a poor job of predicting very high concentrations accurately, even after the data have been corrected using Equation 1, and especially if those very high concentrations are due to windblown dust or wildfire smoke? This is a concern that I have with “Method 2” overall: How can we be sure that this method is identifying sensor degradation and not just picking out times and locations when we know that Equation 1 will do a poor job predicting the true PM2.5 concentration accurately?

Authors: Past research (Tryner, J., Mehaffy, J., Miller-Lionberg, D. and Volckens, J., 2020. Effects of aerosol type and simulated aging on performance of low-cost PM sensors. Journal of Aerosol Science, 150, p.105654) found that exposure to high levels of PM2.5 over extended stretches of time does cause sensor degradation, which is why we made this evaluation.

We also found:
“The cumulative number of PM2.5 measurements recorded over 50, 100 and 500 μg/m3 significantly negatively modifies the association between operational time and the correction error (Table S5), meaning that sensors that experience more high-concentration episodes are more likely to underestimate PM2.5. The increase in the negative bias of the corrected sensor data could be because the absolute magnitude of the correction error will be higher in high PM2.5 environments. When we evaluated the impact of the cumulative number of high PM2.5 measurements on the association between the normalized correction error and operation hour (hours since deployment), we found that the cumulative number of high PM2.5 measurements was not a significant effect modifier of this association (Table S6). In other words, we did not observe sensors in higher PM2.5 environments degrading faster.”
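The effect-modification test described in this passage amounts to a regression with an interaction term; a minimal sketch (hypothetical file and column names, not the authors' code):

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical per-sensor-hour table: correction error, operational hours,
    # and the cumulative count of measurements above a high-PM2.5 threshold.
    df = pd.read_csv("sensor_error_history.csv")

    # The op_hours:cum_high_pm interaction captures effect modification: a
    # significant coefficient means high-PM exposure alters the error trend.
    model = smf.ols("correction_error ~ op_hours * cum_high_pm", data=df).fit()
    print(model.summary())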

7. Page 25, second paragraph: “It is not altogether surprising that the correction error increases most rapidly in hot and humid climate zones, as past evidence suggests that the performance of PurpleAir are greatly impacted by RH.” See my comment #4.

Authors: Thank you. We have noted the many ways that RH can impact the PurpleAir sensors previously in the article as suggested.

8. Page 25, second paragraph: “It is likely that this outcome increases most rapidly over time in hot and dry environments instead, because such environments also tend to be dusty.” I was confused by this sentence the first two times that I read it. I suggest that the authors clarify what they are saying here by rephrasing as something like: “It is likely that the percentage of flagged measurements increases most rapidly over time in hot and dry environments because such environments tend to be dusty.”

Authors: Thank you. We have made the change:

“It is not altogether surprising that the correction error increases most rapidly in hot and humid climate zones, as past evidence suggests that the performance of PurpleAir sensors is greatly impacted by RH. It is surprising that this is not the case for the other degradation outcome considered in this study: the % of flagged measurements. It is likely that the percentage of flagged measurements increases most rapidly over time in hot and dry environments because such environments tend to be dusty, and dust can degrade fan performance and accumulate in the air flow path and optical components of the PurpleAir sensors, which can lead to disagreement between the two Plantower sensors. We also note that under conditions of wildfire smoke, also prevalent in hot and dry climates, the calibration error could be magnified due to under-correction of the PurpleAir data. Future work is needed to evaluate the impact of wildfire smoke on the performance of PurpleAir sensors.”

9. Figure 5: Please specify the units of the mean error shown on the y axis. Is this error in µg/m3 or %?

Authors: It is µg/m3. We have made the change to the legend.

10. Page 24, second paragraph: “…environmental conditions of indoor environments (T and RH) are more regulated that outdoor environments…” ‘that’ should be ‘than’.

Authors: Thank you. We have made this change.

11. Page 5, last paragraph: “Overall, the dataset included 114,259,940 valid measurements…” 115,259,940 valid hourly measurements?

Authors: Thank you. We have added a descriptor to indicate that these measurements are indeed hourly.

12. Page 4, third paragraph: “The cf_atm data, displayed on the PurpleAir map, are the lower measurement of PM2.5 and will be referred to as the “raw” data in this paper when making comparisons between initial and corrected datasets.” Page 9, last paragraph: “Henceforth, when describing PurpleAir measurements, we consider only the mean PM2.5 cf_1 concentrations.” These two statements about the methods seem contradictory. Can the authors please clarify when PM2.5 cf_atm concentrations were used and when PM2.5 cf_1 concentrations were used?

Authors: PM2.5 cf_1 measurements were used throughout the calculations, after we found that cf_1 agreed better with reference measurements than cf_atm.
We note in the paper:

“From the resulting dataset, we found that the Pearson correlation coefficient (R) between mean PM2.5 cf_1 and reference PM2.5 concentrations was 0.86, whereas the correlation between PM2.5 cf_atm and reference PM2.5 concentrations was 0.83. Henceforth, when describing PurpleAir measurements, we consider only the mean PM2.5 cf_1 concentrations.”

13. Page 11: “…i.e., 40% of subsequent measurements were degraded for at least 100 hours of operation”…After reading this paragraph, I’m not sure I understand how a PurpleAir was determined to be permanently degraded. Was a PurpleAir permanently degraded if there was any continuous 100-h period in the dataset in which 40% or more of the measurements were flagged? Was the 40% calculated on the basis of 120-s averages (I assume not because the authors said they downloaded 15-minute averages), 15-minute averages, or 1-hour averages?

Authors: All calculations were done using the 1-hour averages. We checked whether 40% of the hourly averages at and after a given hour were flagged. We have edited the text to make this clearer:
“For each PurpleAir sensor, at each operational hour, we evaluated the percentage of flagged hourly averages at the given hour and for all subsequent hours. We designated a PurpleAir sensor as permanently degraded if more than 40% of the current and subsequent hourly averages were flagged and the sensor operated for at least 100 hours after the current hour (Figure 4; Figure S4). In sensitivity analyses, we evaluated the number of PurpleAir sensors that would be considered ‘degraded’ for different thresholds (Figure S5). We also examined where such sensors were deployed.”
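One possible reading of this rule, as a sketch (our interpretation of the quoted text, not the authors' code):

    import numpy as np

    def permanently_degraded(flags, threshold=0.40, min_hours=100):
        # flags: 0/1 array, one entry per hourly average (1 = flagged).
        # Degraded if, at some operational hour, more than `threshold` of the
        # current and all subsequent hourly averages are flagged and at least
        # `min_hours` of operation remain after that hour.
        flags = np.asarray(flags, dtype=float)
        n = len(flags)
        for h in range(n - min_hours):
            if flags[h:].mean() > threshold:
                return True
        return False

    # Example: a sensor whose later life is mostly flagged
    print(permanently_degraded(np.r_[np.zeros(500), np.ones(400)]))  # True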
14. Page 11, last paragraph: “We evaluated and plotted the correction error which is defined as the difference between the corrected measurement and corresponding reference PM2.5 measurement.” Was the correction error evaluated as an absolute error (with units of µg/m3) or as a percent error?

Authors: The correction error was evaluated simply as the difference between the corrected and reference PM2.5 concentrations, in units of μg/m3.

The text in the Methods section has been updated as follows:

“We evaluated and plotted the correction error, which is defined as the difference between the corrected measurement and corresponding reference PM2.5 measurement in µg/m3.”

15. Figure 1: The “subarctic” climate zone is missing from the legend.

Authors: Thanks for catching this. We have updated the legend.

16. Page 4, second paragraph: “The Plantower sensor components measure 90° light scattering…” Depending on where in the path of the laser beam a particle is, scattered light can be detected over a range of angles. For example, Ouimette et al. note that, for a particle that is “centered” over the photodiode, the photodiode could see light scattered at angles ranging from 50° to 130° [DOI: 10.5194/amt-15-655-2022, see Section 2.2.4].

Authors: Thank you. We have modified this to read:

“The Plantower sensor components measure light scattering with a laser at 680 ± 10 nm wavelength 26,27 and are factory calibrated using ambient aerosol across several cities in China 20.”


Referee: 2

Comments to the Author
General Comments

1. This work represents a substantial effort to understand sensor degradation. The sample size is very large, which is good, but that presents its own issues. The authors should discuss the potential issues inherent in analyzing such large datasets.

Authors: Thank you for this comment. One of the limitations of using such a dataset is that we cannot observe user behavior. It is likely that users remove degraded PurpleAir monitors early, and therefore our sample could be biased. We have listed this limitation in the following manner:

“Using the empirically derived definition of flagged measurements, the percentage of flagged measurements, as well as the percentage of cumulative flagged measurements across the 11,932 PurpleAir sensors for every hour of operation, are plotted in Figure 3. The total number of measurements made at every hour of operation is also displayed using the right axis. The percentage of flagged measurements increases over time. After 4 years (~ 35,000 hours) of operation, the percentage of flagged measurements every hour is ~ 4%. After 4 years of operation, we observe a dramatic increase in the average percentage of flagged measurements, likely due to the small number of PurpleAir sensors operational for such long periods of time in our dataset. Note that as we rely on a crowd-sourced dataset of PurpleAir measurements, we do not have information on why users removed sensors from operation. If users removed PurpleAir sensors that displayed indications of degradation, the removal of such sensors could bias our results and could lead us to report lower degradation rates than actually occurred. We also observe a high percentage of flagged measurements during the first 20 hours of the operation of all sensors.”
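The per-hour summary quoted here can be expressed compactly; a sketch (hypothetical long-format table, not the authors' code):

    import pandas as pd

    # Hypothetical table: one row per sensor per operational hour, with a
    # boolean 'flagged' column from the channel A/B comparison.
    df = pd.read_csv("all_sensors_hourly.csv")  # sensor_id, op_hour, flagged

    by_hour = df.groupby("op_hour")["flagged"].agg(
        pct_flagged=lambda s: 100 * s.mean(),  # % flagged at each hour
        n_measurements="size",                 # sample size shrinks over time
    )
    print(by_hour.head())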

2. It would seem that their estimate of long-term degradation would have to be an under-estimate if you consider that sensors which are functioning poorly are more likely to be removed from use. They are citing a value of 4% degradation over 3 years, but it is unclear if this considers the sensors that were removed from service (>11%). Can the authors comment on this? How does a bias towards well-functioning sensors affect the interpretation of these data?

Authors: Thank you. Please refer to our response to comment 1. We have listed this as a limitation. Unfortunately, we do not know why sensors were removed from operation. Some reasons could be power or WiFi issues; sensors were not necessarily removed because they malfunctioned.

3. The authors should speak more to the practical implications of these findings. For example, they mention that users should delete the first 20 hrs of data. But, what do these data say about how long sensors should be used in the field? 2 yrs? 3 yrs? 4 yrs? If they can’t directly speak to that, what type of further analysis would be necessary to make such an estimate?

Authors: Thank you. We have made the following explicit recommendations:

“We evaluated two proposed degradation outcomes for the PurpleAir sensors over time. We observed that a large number of measurements from channels A and B of each sensor during the first 20 hours of operation were flagged (Figure 1). Some of these data might come from lab testing of the PurpleAir sensors. Our results suggest that it is important to delete the first 20 hours of data when analyzing PurpleAir measurements. We observed that the percentage of flagged measurements (where channels A and B diverged) increased linearly over time and was on average ~4% after 3 years of operation. It appeared that measurements from PurpleAir sensors are fairly robust, at least during this period. Degradation appeared to increase steeply after 4 years, from 5% to 10% in just 6 months. It thus appears that PurpleAir sensors are serviced or the Plantower sensors are replaced after ~ 4 years of operation. However, given the small number of Plantower devices operational after 4 years (< 100), further work is needed to evaluate the performance of devices aged 4 years or more. We also note that although many low-cost sensors use Plantower sensors, just like the PurpleAir sensors, our analysis may not be generalizable to these devices if they have outer shells that offer potentially more protection than the PurpleAir, or if there are other design differences that might affect instrument performance.”

And

“For outdoor sensors, we found that the climate zone in which the sensor was deployed is an important modifier of the association between the percent of flagged measurements and time. Outdoor sensors in hot-dry climates degrade the fastest, with the percentage of flagged measurements increasing by 2.09% (95% CI: 2.07%, 2.12%) every year, an order of magnitude faster than any other climate zone (Table 3). This suggests that outdoor sensors in hot-dry climates likely need to be serviced after ~ 3 years, sooner than PurpleAir sensors deployed elsewhere.”

And

“It is not altogether surprising that the correction error increases most rapidly in hot and humid climate zones, as past evidence suggests that the performance of PurpleAir sensors is greatly impacted by RH. It is surprising that this is not the case for the other degradation outcome considered in this study: the % of flagged measurements. It is likely that the percentage of flagged measurements increases most rapidly over time in hot and dry environments because such environments tend to be dusty, and dust can degrade fan performance and accumulate in the air flow path and optical components of the PurpleAir sensors, which can lead to disagreement between the two Plantower sensors. We note that under conditions of wildfire smoke, also prevalent in hot and dry climates, the calibration error could be magnified due to under-correction of the PurpleAir data. Future work is needed to evaluate the impact of wildfire smoke specifically on the performance of PurpleAir sensors.”

4. The authors should state somewhere that this analysis may not be generalizable to other make/models of PM sensor.

Authors: Thank you. We have made the following note:

“We evaluated two proposed degradation outcomes for the PurpleAir sensors over time. We observed that a large number of measurements from channels A and B of each sensor during the first 20 hours of operation were flagged (Figure 1). Some of these data might come from lab testing of the PurpleAir sensors. Our results suggest that it is important to delete the first 20 hours of data when analyzing PurpleAir measurements. We observed that the percentage of flagged measurements (where channels A and B diverged) increased linearly over time and was on average ~4% after 3 years of operation. It appears that measurements from PurpleAir sensors are fairly robust, at least during this period. Degradation appears to increase steeply after 4 years, from 5% to 10% in just 6 months. It thus appears that PurpleAir sensors might need to be serviced or the Plantower sensors replaced after ~ 4 years of operation. However, given the small number of Plantower devices operational after 4 years (< 100), further work is needed to evaluate the performance of devices aged 4 years or more. We also note that although many low-cost sensors use Plantower sensors, just like the PurpleAir sensors, our analysis may not be generalizable to these devices if they have outer shells that offer potentially more protection than the PurpleAir, or if there are other design differences that might affect instrument performance.”

Specific Comments

5. Pg. 4, last paragraph – What is the lower limit of detection of the sensor? Other published literature shows low-cost sensors with LOD in the range of ~5 ug/m3. Was this considered? If not, why?

Authors: Thank you for this note. We did not evaluate the limit of detection of the PurpleAir sensors in this paper, and we did not consider it in our analysis.

6. Pg 9, 2nd paragraph, “We prioritized retaining data from reference monitors that did not rely on light scattering techniques as these instruments tend to have additional error when estimating aerosol mass.” The PurpleAir also uses light scattering, so can the authors comment more on those additional errors that might be relevant for interpreting the PurpleAir measurements.

Authors: Thank you. We agree that PurpleAir sensors use light scattering, which is why they exhibit errors. We wanted to compare corrected PurpleAir measurements with gold-standard PM2.5 measurements to evaluate the appropriate calibration error.

7. Pg 11, Top of the page – what statistical test was used to test whether the distribution of PM2.5, RH, and T conditions was different for flagged vs. unflagged measurements?

Authors: Thank you. We note in the Methods section that we used t-tests.

8. Pg 11, 1st full paragraph – where did the value of 40% come from?

Authors: We chose this value because we deemed it reasonable. We tested the sensitivity of our results to using different thresholds.

9. Pg 11, Section 2.4.2 – In the previous Section 2.4.1, the authors outline a metric for determining whether a sensor is permanently degraded compared to its co-located sensor and then suggest there is a 2nd approach using co-located reference measurements. But, this section (2.4.2) describes a way to correct for discrepancies between PurpleAir and reference methods, not for identifying degradation using some pre-determined metric. Can the authors clarify what the second method was and how it was used to identify degradation (not just correct for it)?

Authors: The second approach was to evaluate an overall indicator of degradation, not permanent degradation. We make this clear by including the following text before Section 2.4.2:

“A limitation of using the percentage of flagged measurements as a degradation metric is that it does not account for the possibility that channels A and B might both degrade in a similar manner. Therefore, we rely on a second approach, using collocated reference monitoring measurements, to evaluate this aspect of possible degradation.”

10. Pg 15, last sentence – the authors indicate that users should delete the first 20 hours of data. How was that specific number determined? Is it semi-quantitative (i.e. looking at the figures and drawing a line) or was there some statistical test performed? Can the authors comment on what might cause such erroneous data during those first hours?

Authors: We obtained this number from Figure 1, as indicated in the text. We also indicate that some of these initial data might arise from lab testing of the PurpleAir devices before shipment, which could explain this discrepancy.

11. Figure 5 – The authors use both “mean error” (y-axis) and “mean difference” (figure caption) to describe this data. I suggest choosing one descriptor and using it throughout.

Authors: Thank you. We have modified the caption to read as follows:
“Mean error (μg/m3) calculated as the difference between the corrected PM2.5 measurements from the PurpleAir sensors and the corresponding reference PM2.5 measurements across all sensors as a function of hour of operation.”

12. Pg 21, 1st full paragraph – the data presented are “-0.92 (95% CI: -0.11, -0.75)” but the mean cannot be outside the CI range so one or more of these numbers is incorrect. These incorrect data are also shown in Table 4 and on pg 25 (top paragraph). Please update throughout with the correct numbers and adjust the discussion if necessary.

Authors: Thank you. This was a typo; we have corrected it to -0.92 (95% CI: -1.10, -0.75) throughout.




Round 2

Revised manuscript submitted on 22 Jan 2023
 

29-Jan-2023

Dear Dr deSouza:

Manuscript ID: EA-ART-10-2022-000142.R1
TITLE: An analysis of degradation in low-cost particulate matter sensors

Thank you for submitting your revised manuscript to Environmental Science: Atmospheres. I am pleased to accept your manuscript for publication in its current form. I have copied any final comments from the reviewer(s) below.

You will shortly receive a separate email from us requesting you to submit a licence to publish for your article, so that we can proceed with the preparation and publication of your manuscript.

You can highlight your article and the work of your group on the back cover of Environmental Science: Atmospheres. If you are interested in this opportunity please contact the editorial office for more information.

Promote your research, accelerate its impact – find out more about our article promotion services here: https://rsc.li/promoteyourresearch.

We will publicise your paper on our Twitter account @EnvSciRSC – to aid our publicity of your work please fill out this form: https://form.jotform.com/211263048265047

How was your experience with us? Let us know your feedback by completing our short 5 minute survey: https://www.smartsurvey.co.uk/s/RSC-author-satisfaction-energyenvironment/

By publishing your article in Environmental Science: Atmospheres, you are supporting the Royal Society of Chemistry to help the chemical science community make the world a better place.

With best wishes,

Dr Nønne Prisle
Associate Editor, Environmental Science: Atmospheres


 
Reviewer 1

I thank the authors for responding to my previous comments. I found the following typographical errors in the revised manuscript:

1. Page 2, 1st paragraph: “Federal Reference of Equivalent Monitors” should be “Federal Reference or Equivalent Monitors”

2. Page 5, 1st paragraph: “when RH” should be “when RH”




Transparent peer review

To support increased transparency, we offer authors the option to publish the peer review history alongside their article. Reviewers are anonymous unless they choose to sign their report.

We are currently unable to show comments or responses that were provided as attachments. If the peer review history indicates that attachments are available, or if you find there is review content missing, you can request the full review record from our Publishing customer services team at RSC1@rsc.org.

Find out more about our transparent peer review policy.

Content on this page is licensed under a Creative Commons Attribution 4.0 International license.