A novel statistical approach for analyzing environmental pollutant data with detection limits: atmospheric organochloride pesticide concentrations near Tibet's Namco Lake as a case study†‡
Abstract
In the analysis of pollutant data, concentrations below the analytical limit of detection (LOD) are commonly handled by substituting a constant value between zero and the LOD. However, this substitution can introduce significant bias under certain conditions. To address this issue, we derived weight expressions that eliminate the bias for lognormal and gamma data. These weights, applied to the LOD/2 substitution, can be calculated from available ranges of means, standard deviations and censoring proportions. We evaluated the weighted substitution (ωLOD/2) method using simulated datasets with censoring proportions ranging from 5% to 50% and measured atmospheric α-HCH and HCB data from Tibet's Namco Lake, comparing it against LOD/2 substitution, maximum likelihood estimation (MLE) and regression on order statistics (ROS). The results show that for small sample sizes (<160), ωLOD/2 outperforms MLE and ROS in estimating arithmetic and geometric means in most scenarios, even though MLE and ROS did not show larger bias. ROS is also currently limited to estimating summary statistics under a lognormal assumption and cannot be applied to gamma-distributed data. In addition, ωLOD/2 provides standard deviation estimates comparable to those from MLE, with biases within 5% in the majority of cases. The proposed method is therefore particularly suitable for small sample sizes. Its application to six censored atmospheric organochloride pesticide concentrations from Namco Lake further highlights its advantages in practical settings. To facilitate adoption by researchers, we developed a free web app that integrates the proposed weighting method with distribution fitting for censored data.
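The idea of weighting the LOD/2 substitution can be illustrated with a minimal Python sketch. It is not the weight expression derived in this work, which the abstract does not state. As a stand-in assumption, the sketch uses a textbook result: for lognormal data with known parameters, replacing each censored value by the conditional mean E[X | X < LOD] leaves the arithmetic mean unbiased, so an illustrative weight is ω = 2·E[X | X < LOD]/LOD. The function names, parameter values and simulation settings below are hypothetical choices for illustration only.

```python
import math
import random
from statistics import NormalDist, fmean


def conditional_mean_weight(mu, sigma, lod):
    """Illustrative weight w with w * LOD/2 == E[X | X < LOD] for lognormal X.

    Assumption: mu and sigma (log-scale parameters) are known. This is a
    stand-in for the paper's weight expressions, which are not given here.
    """
    std_normal = NormalDist()
    z = (math.log(lod) - mu) / sigma
    # Partial expectation E[X; X < LOD] divided by P(X < LOD)
    mean_below = (math.exp(mu + sigma ** 2 / 2)
                  * std_normal.cdf(z - sigma) / std_normal.cdf(z))
    return 2.0 * mean_below / lod


def simulate(mu=0.0, sigma=1.0, censor_prop=0.3, n=100, reps=2000, seed=1):
    """Compare relative bias of plain and weighted LOD/2 substitution."""
    rng = random.Random(seed)
    # Choose the LOD so that a fraction censor_prop of values falls below it
    lod = math.exp(mu + sigma * NormalDist().inv_cdf(censor_prop))
    w = conditional_mean_weight(mu, sigma, lod)
    true_mean = math.exp(mu + sigma ** 2 / 2)

    plain, weighted = [], []
    for _ in range(reps):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        # Replace censored values by LOD/2 and by w * LOD/2, respectively
        plain.append(fmean([x if x >= lod else lod / 2 for x in sample]))
        weighted.append(fmean([x if x >= lod else w * lod / 2 for x in sample]))

    return {
        "weight": w,
        "plain_rel_bias": fmean(plain) / true_mean - 1.0,
        "weighted_rel_bias": fmean(weighted) / true_mean - 1.0,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

In practice the log-scale parameters are unknown, which is presumably why the weights in this work are computed from available ranges of means, standard deviations and censoring proportions rather than from exact parameter values as in this sketch.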