David Mantilla-Calderon a, Kaiyu (Kevin) Huang a, Aojie Li a, Kaseba Chibwe a, Xiaoqian Yu b, Yinyin Ye c, Lei Liu d and Fangqiong Ling *aefg
a Department of Energy, Environmental and Chemical Engineering, Washington University in St. Louis, St. Louis, MO, USA. E-mail: fangqiong@wustl.edu
b Centre for Microbiology and Environmental Systems Science, Department of Microbiology and Ecosystem Science, Division of Microbial Ecology, University of Vienna, Vienna, Austria
c Department of Civil, Structural and Environmental Engineering, University at Buffalo, Buffalo, NY, USA
d Division of Biostatistics, Washington University in St. Louis, St. Louis, MO, USA
e Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO, USA
f Division of Biological and Biomedical Sciences, Washington University in St. Louis, St. Louis, MO, USA
g Division of Computational and Data Science, Washington University in St. Louis, St. Louis, MO, USA
First published on 4th May 2022
Background: recent applications of wastewater-based epidemiology (WBE) have demonstrated its ability to track the spread and dynamics of COVID-19 at the community level. Despite the growing body of research, quantitative synthesis of SARS-CoV-2 RNA levels in wastewater generated from studies across space and time using diverse methods has not been performed. Objective: the objective of this study is to examine the correlations between SARS-CoV-2 RNA levels in wastewater and epidemiological indicators across studies, stratified by key covariates in study methodologies. In addition, we examined the association of proportions of positive detections in wastewater samples and methodological covariates. Methods: we systematically searched the Web of Science for studies published by February 16th, 2021, performed a reproducible screening, and employed mixed-effects models to estimate the levels of SARS-CoV-2 viral RNA quantities in wastewater samples and their correlations to the case prevalence, the sampling mode (grab or composite sampling), and the wastewater fraction analyzed (i.e., solids, solid–supernatant mixtures, or supernatants/filtrates). Results: a hundred and one studies were found; twenty studies (671 biosamples and 1751 observations) were retained following a reproducible screening. The mean positivity across all studies was 0.68 (95%-CI, [0.52; 0.85]). The mean viral RNA abundance was 5244 marker copies per mL (95%-CI, [0; 16432]). The Pearson correlation coefficients between the viral RNA levels and case prevalence were 0.28 (95%-CI, [0.01; 0.51]) for daily new cases or 0.29 (95%-CI, [−0.15; 0.73]) for cumulative cases. The fraction analyzed accounted for 12.4% of the variability in the percentage of positive detections, followed by the case prevalence (9.3% by daily new cases and 5.9% by cumulative cases) and sampling mode (0.6%). 
Among observations with positive detections, the fraction analyzed accounted for 56.0% of the variability in viral RNA levels, followed by the sampling mode (6.9%) and case prevalence (0.9% by daily new cases and 0.8% by cumulative cases). While the sampling mode and fraction analyzed both significantly correlated with the SARS-CoV-2 viral RNA levels, the magnitude of the increase in positive detection associated with the fraction analyzed was larger. The mixed-effects model treating studies as random effects and case prevalence as fixed effects accounted for over 90% of the variability in SARS-CoV-2 positive detections and viral RNA levels. Interpretations: positive pooled means and confidence intervals in the Pearson correlation coefficients between the SARS-CoV-2 viral RNA levels and case prevalence indicators provide quantitative evidence that reinforces the value of wastewater-based monitoring of COVID-19. Large heterogeneities among studies in proportions of positive detections, viral RNA levels, and Pearson correlation coefficients suggest a strong demand for methods to generate data accounting for cross-study heterogeneities and more detailed metadata reporting. Large variance was explained by the fraction analyzed, suggesting sample pre-processing and fractionation as a direction that needs to be prioritized in method standardization. Mixed-effects models accounting for study level variations provide a new perspective to synthesize data from multiple studies.
Water impact: recent applications of wastewater-based epidemiology (WBE) have demonstrated its ability to track the spread and dynamics of COVID-19 at the community level. Despite the growing body of research, quantitative synthesis of SARS-CoV-2 viral RNA levels in wastewater generated from studies across space and time using diverse methods has not been performed. The meta-analysis methodology treats individual studies as members of a population of studies that all provide information on a given effect instead of drawing conclusions on exemplary studies that have shown strong positive effects. Leveraging a large sample size, meta-analysis can help move the narrative beyond statistical significance and draw attention to the magnitude, direction, and variance in effects. This study employed a meta-analysis methodology to quantitatively synthesize results among WBE studies in the first year of the COVID-19 pandemic. Positive pooled means and confidence intervals in the Pearson correlation coefficients between the SARS-CoV-2 viral RNA levels and case prevalence indicators provide quantitative evidence reinforcing the value of wastewater-based monitoring of COVID-19. Large heterogeneities among studies suggest a strong demand for experimental and computational methods to address cross-study heterogeneities. Mixed-effects models accounting for study level variations provide a new perspective to synthesize data from multiple WBE studies.
As the number of WBE studies continues to grow, study-to-study variations are often encountered; thus, the growing body of data demands attention to generalizable relationships across studies. Although WBE studies focusing on the SARS-CoV-2 virus have all been conducted during the pandemic, not all tested wastewater samples provided a measurable detection when known cases were present in the associated area, thus presenting false negatives in the detection.8–20 In addition, while positive correlations between SARS-CoV-2 wastewater-based measurements and COVID-19 cases have been described,8,9,11,21–23 the strength of the correlations may vary among studies. To better describe the advantages and limitations of WBE and make evidence-based recommendations, research synthesis efforts are needed to quantify the detection rates of SARS-CoV-2 in wastewater, its RNA abundance, and their correlations to epidemiological indicators.
Meta-analysis provides an objective, quantitative, and powerful way to synthesize findings across studies.24 Instead of drawing conclusions on exemplary studies that have shown strong positive effects, meta-analyses treat individual studies as members of a population of studies that conjunctively provide information on a given effect.25 Leveraging a large sample size, a meta-analysis can help move the narrative beyond statistical significance and draw attention to the magnitude, direction, and variance in effects.24 Furthermore, the meta-analytic approach allows us to quantitatively examine the heterogeneity among study results, thus motivating the generation of new hypotheses.26 Meta-analyses systematically synthesize large quantities of data generated from multiple primary studies to reach broad generalizations. A well-conducted meta-analysis can provide a comprehensive picture of parameters of interest and their moderators that is not attainable from an individual primary study. Using statistical models to quantify the magnitude of an effect and its heterogeneity, a meta-analysis may also identify areas that require further research.
Here, we employed a meta-analytic methodology to synthesize wastewater-based SARS-CoV-2 viral RNA abundance data published by February 16th, 2021, approximately a year after the beginning of the COVID-19 pandemic. Following the PRISMA guidelines,27 we synthesized and reported results from 1751 observations in 20 studies. We asked four fundamental questions: 1) what is the pooled proportion of positive detection of SARS-CoV-2 from wastewater samples; 2) what are the viral RNA levels of the SARS-CoV-2 virus in wastewater collectively and when subgrouped by key methodological variables; 3) what are the overall strengths of correlation between the positive detection or RNA levels of SARS-CoV-2 in wastewater and epidemiological indicators (daily and cumulative cases); and 4) how much of the variation in SARS-CoV-2 viral RNA abundance can be explained by COVID-19 cases alone? To account for study-level variations, mixed-effects models were employed to examine the correlation between SARS-CoV-2 viral RNA levels and positive detection.
Inclusion criteria | Rationales |
---|---|
C1: qPCR data were reported as quantification cycles, copy numbers per volume, genome equivalents per ml, or genome equivalents per weight | C1 provides comparable data among studies |
C2: sampling locations for raw sewage were identified as wastewater treatment plants (WWTPs), sewage collection networks, lift stations, manholes or septic tanks | C2 allows comparisons of SARS-CoV-2 viral titers and percent positivity in wastewater within and across studies |
C3: COVID-19 case records were reported for the associated locations during the sampling times | C3 allows comparisons of SARS-CoV-2 viral titers in wastewater within and across studies |
In the cases that SARS-CoV-2 measurements were performed using a multiplex qPCR/dPCR assay, the value for the variable “primer” for this specific observation was recorded by listing the primer sets employed, spaced by an underscore sign. To illustrate, if RNA levels were estimated using a duplex qPCR assay employing CDC_N1 and CDC_N2 primer sets, the value for the variable “primer” would be recorded as CDC_N1_N2. In some instances, a study may analyze multiple markers independently but report genome equivalents. A singular primer set would be recorded in the “primer” variable if it was specified in the study which primer set was used to calculate the reported genome equivalents; alternatively, the value recorded for the “primer” variable in the observation would include all the primer sets used in the study separated by a comma (e.g., CDC_N1, CDC_N2).
Epidemiological data reported as “cumulative cases” were denoted as “cumulative cases”. Cases reported as “daily new cases”, “new cases”, “positive daily test”, “new positive daily test”, or “seven-day average cases” were denoted as “daily cases”. “Hospital admissions” and “hospitalized patients” were denoted as “hospitalized cases”. All case counts were converted to prevalence, i.e., patients per 100000 inhabitants, to allow synthesis across studies.
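The conversion of case counts to prevalence can be illustrated with a minimal sketch. The function name and the example numbers are hypothetical and are not taken from any included study.

```python
def prevalence_per_100k(case_count: float, population: float) -> float:
    """Convert a raw case count to cases per 100000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return case_count / population * 100_000


# Hypothetical example: 85 daily new cases in a sewershed serving 1,700,000 people.
daily_prevalence = prevalence_per_100k(85, 1_700_000)
print(f"{daily_prevalence:.1f} cases per 100000 inhabitants")
```

Expressing every indicator on the same per-100000 scale is what allows case data from sewersheds of very different sizes to enter the same model.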
The weight of each study in the forest plot was calculated as
$$W_i = \frac{1}{V_i + T^2}$$
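As a hedged illustration of this weighting, the sketch below assumes the usual random-effects reading of the symbols, which this excerpt does not define: V_i as the within-study variance of study i and T² as the estimated between-study variance. The function name and all numbers are hypothetical.

```python
import math


def random_effects_pool(effects, within_variances, tau_squared):
    """Pool study effects using weights W_i = 1 / (V_i + T^2)."""
    weights = [1.0 / (v + tau_squared) for v in within_variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)  # standard error of the pooled effect
    relative_weights = [100.0 * w / total_weight for w in weights]  # percent, as in a forest plot
    return pooled, pooled_se, relative_weights


# Hypothetical effects and variances for three studies.
pooled, pooled_se, relative_weights = random_effects_pool(
    effects=[0.9, 0.5, 0.7],
    within_variances=[0.01, 0.04, 0.02],
    tau_squared=0.03,
)
```

Because T² is added to every study's variance, a large between-study variance pulls the weights toward equality, so no single precise study dominates the pooled estimate.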
Table 2 describes the basic characteristics of the included resources. Eighteen studies reported quantitative measurements for SARS-CoV-2 as gene copies per unit mass/volume,8,9,11,14–21,23,30,37–41 while two studies reported Ct values.10,13 Among the 18 quantitative studies, seventeen reported marker copies or genome equivalents per mL,8,9,14–21,23,30,37–41 and one study reported marker copies per gram of biomass.11 Epidemiological indicators were reported as daily cases in nine studies, ranging from 0.6 per 100000 inhabitants to 117 per 100000 inhabitants,11,13,14,16,18,19,21,39,41 cumulative cases were reported in ten studies ranging from 1.6 per 100000 inhabitants to 808 per 100000 inhabitants,8,9,14,15,18,20,30,37,40,41 active cases were reported in four studies10,21,38,40 and hospitalized cases in two studies.13,17 Among these studies, two studies reported both daily and cumulative cases,18,41 one study reported both daily and active cases,21 and one reported both cumulative cases and hospitalized cases.13 Cumulative COVID-19 cases were the most frequently reported, followed by daily, active, and hospitalized cases. SARS-CoV-2 was detected in all studies, irrespective of case prevalence levels, albeit at varying proportions of positive detections.
Author (biosamples, observations) | Country/date of sampling | Sample collection point | Sample type | Population served | Sample fraction | Viral concentration method | Type of case (mean, min, max) |
---|---|---|---|---|---|---|---|
Ahmed, W. et al. (2020)12 (nbiosample = 8, nobs = 32) | Australia | Pumping station, WWTP influent | Grab and composite | 736172 | Supernatant and suspended solids | Electronegative membrane absorption-direct RNA extraction | Cumulative cases (50, 0, 70) |
Feb–April, 2020 | Supernatant | Ultrafiltration (Centricon) | |||||
Baldovin, T. et al. (2021)13a (nbiosample = 9, nobs = 18) | Italy | Municipal sewage network | Grab | 12770–36042 | Supernatant | Ultrafiltration | Cumulative cases (169, 141, 205) |
April 23 and May 05, 2020 | Hospitalized cases (34, 30, 39) | ||||||
D'Aoust, P. M. et al. (2021)21 (nbiosample = 22, nobs = 44) | Canada | Postgrid solids | Grab and composite | 1300000 | Solids | PEG precipitation | Daily cases (117, 19, 572) |
April–June, 2020 | Primary sludge | Alum precipitation–ultrafiltration | Active cases (19, 6, 58) | ||||
Gonçalves, J. et al. (2021)10a (nbiosample = 15, nobs = 30) | Slovenia | Hospital sewage | Composite | N/A | Supernatant | Ultrafiltration | Cumulative casesb (2, 0, 4) |
June, 2020 | Active casesb (2, 0, 4) | ||||||
Gonzalez, R. et al. (2020)9 (nbiosample = 198, nobs = 594) | USA | WWTP influent | Grab and composite | 1700000 | Supernatant | Hollow fiber concentrating pipet | Cumulative cases (229, 1, 2288) |
March–May, 2020 | Adsorption–elution electronegative membrane | ||||||
Graham, K. et al. (2020)11 (nbiosample = 89, nobs = 166) | USA | WWTP influent | Composite | 1700000 | Supernatant | PEG precipitation | Daily cases (2, 1, 12) |
March–April, 2020 | Primary settling tank | Composite | Primary solids | No concentration | |||
March–July, 2020 | |||||||
Haramoto, E. et al. (2020)41 (nbiosample = 5, nobs = 36) | Japan | WWTP influent | Grab | 817192a | Supernatant and suspended solids | Electronegative membrane vortex–ultrafiltration | Cumulative cases (5, 0, 7) |
March–May, 2020 | Electronegative membrane absorption-direct RNA extraction | Daily cases (1, 0, 1.0) | |||||
Hata, A. et al. (2021)14 (nbiosample = 45, nobs = 87) | Japan | WWTP influent | Grab | 697000 | Supernatant | PEG precipitation | Daily cases (8, 0, 19) |
March–April, 2020 | Cumulative cases (15, 0, 26) | ||||||
Kitamura, K. et al. (2021)15 (nbiosample = 32, nobs = 198) | Japan | WWTP influent, municipal sewage network | Grab | N/A | Supernatant | Adsorption–elution electronegative membrane | Cumulative casesb (122, 19, 209) |
June–August, 2020 | PEG precipitation ultrafiltration | ||||||
Solids | Solid precipitation–centrifugation | ||||||
Kumar, M. et al. (2020)37 (nbiosample = 2, nobs = 6) | India | WWTP influent | Grab | N/A | Supernatant | PEG precipitation | Cumulative casesb (7793, 4912, 10674) |
May, 2020 | |||||||
Medema, G. et al. (2020)8 (nbiosample = 25, nobs = 100) | Netherlands | WWTP influent | Composite | 2800000 | Supernatant | Ultrafiltration (Centricon) | Cumulative cases (16, 0, 87) |
Feb–March, 2020 | |||||||
Miyani, B. et al. (2020)39 (nbiosample = 33, nobs = 33) | USA | Municipal sewage network | Grab | 3200000 | Supernatant and suspended solids | Adsorption–elution electropositive column filters | Daily cases (6, 4, 8) |
April–May, 2020 | |||||||
Nemudryi, A. et al. (2020)16 (nbiosample = 17, nobs = 34) | USA | WWTP influent | Composite | 49831 | Supernatant | Ultrafiltration | Daily cases (6, 0, 14) |
March–June, 2020 |
Peccia, J. et al. (2020)23 (nbiosample = 73, nobs = 226) | USA | Primary settling tank | Grab | 200000 | Solids | No concentration | Daily positive test (26, 3, 60) |
March–June, 2020 | |||||||
Randazzo, W. et al. (2020)38 (a) (nbiosample = 12, nobs = 24) | Spain | WWTP influent | Grab | 1200000 | Supernatant and suspended solids | Aluminium flocculation | Active cases (80, 1, 111) |
Feb–April, 2020 | |||||||
Randazzo, W. et al. (2020)20 (b) (nbiosample = 42, nobs = 42) | Spain | WWTP influent | Grab | 1357177 | Supernatant and suspended solids | Aluminum hydroxide adsorption–precipitation | Cumulative cases (36, 0, 140) |
March–April, 2020 | |||||||
Saguti, F. et al. (2021)17 (nbiosample = 21, nobs = 21) | Sweden | WWTP influent | Composite | 800000 | Supernatant | PS hollow fiber concentrating pipette | Newly hospitalized patients per day (9, 0, 20) |
February–July, 2020 | Adsorption–elution electropositive cartridges–ultrafiltration | ||||||
Sherchan, S. P. et al. (2020)18 (nbiosample = 7, nobs = 28) | USA | WWTP influent | Grab and composite | 290321 | Supernatant and suspended solids | Adsorption–elution electronegative membrane | Cumulative cases (808, 0, 2534) |
Jan–April, 2020 | Supernatant | Ultrafiltration | Daily cases (16, 0, 32) | ||||
Trottier, J. et al. (2020)19 (nbiosample = 7, nobs = 14) | France | WWTP influent | Composite | 470000 | Supernatant | Ultrafiltration | Daily cases (1, 0, 2) |
May–July, 2020 | |||||||
Westhaus, S. et al. (2021) (nbiosample = 9, nobs = 18) | Germany | WWTP influent | Composite | 4429500 | Supernatant | Ultrafiltration | Cumulative cases (123, 72, 220) |
April 08, 2020 | Active cases (72, 30, 174) |
a Semiquantitative studies. b Cases not normalized by 100000 inhabitants.
Correlations between COVID-19 cases and wastewater SARS-CoV-2 viral RNA levels were reported in six studies, and our analysis confirmed this. We performed linear regression on each dataset: six out of eighteen studies showed significant linear correlations between SARS-CoV-2 viral RNA levels and the respective epidemiological indicators in the study (p-value < 0.05, Table 3, Fig. S2–S4†). These six studies were conducted at WWTPs, amongst which three analyzed the solid fraction, and three analyzed the supernatant/filtrate fraction.
Author (regression on daily new COVID-19 cases per 100000 inhabitants) | Slope | R-squared | p-Value |
---|---|---|---|
D'Aoust, P. M. et al. | 0.52 | 0.51 | 1.03 × 10−7 |
Graham, K. et al. | 196.64 | 0.35 | 3.99 × 10−17 |
Sherchan, S. P. et al. | 0.03 | 0.17 | 1.44 × 10−1 |
Peccia, J. et al. | 1994.77 | 0.16 | 2.79 × 10−10 |
Hata, A. et al. | 0.16 | 0.05 | 3.85 × 10−2 |
Miyani, B. et al. | −0.20 | 0.01 | 5.10 × 10−1 |
Haramoto, E. et al. | −4.14 | 0.00 | 7.29 × 10−1 |
Trottier, J. et al. | 0.19 | 0.00 | 8.54 × 10−1 |
Nemudryi, A. et al. | −0.02 | 0.00 | 7.65 × 10−1 |
Author (regression on cumulative COVID-19 cases per 100000 inhabitants) | Slope | R-squared | p-Value |
---|---|---|---|
Gonzalez, R. et al. | 0.04 | 0.61 | 4.81 × 10−124 |
Medema, G. et al. | 14.82 | 0.40 | 1.30 × 10−9 |
Sherchan, S. P. et al. | 0.00 | 0.09 | 2.87 × 10−1 |
Haramoto, E. et al. | −4.62 | 0.08 | 9.81 × 10−2 |
Hata, A. et al. | 0.08 | 0.03 | 1.19 × 10−1 |
Westhaus, S. et al. | −0.01 | 0.01 | 6.63 × 10−1 |
Randazzo, W. et al. (b) | 0.45 | 0.01 | 5.79 × 10−1 |
Author (regression on active COVID-19 cases per 100000 inhabitants) | Slope | R-squared | p-Value |
---|---|---|---|
D'Aoust, P. M. et al. | 3.85 | 0.33 | 4.70 × 10−5 |
Randazzo, W. et al. (a) | 1.61 | 0.11 | 1.19 × 10−1 |
Westhaus, S. et al. | −0.01 | 0.01 | 7.28 × 10−1 |
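The per-study regressions summarized in Table 3 can be sketched as follows. This is a minimal ordinary least-squares illustration under the assumption that each study is fitted separately with cases as the predictor and viral RNA level as the response. The function name, study labels, and data are hypothetical. The sketch returns the slope t statistic rather than a p-value, which would follow from a two-sided t-test with n − 2 degrees of freedom in a statistics library.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit of y on x: slope, intercept, R-squared, slope t statistic."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired observations with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = max(syy - slope * sxy, 0.0)  # residual sum of squares
    r_squared = 1.0 - ss_res / syy
    slope_se = math.sqrt(ss_res / (n - 2) / sxx)
    t_stat = slope / slope_se if slope_se > 0 else math.copysign(math.inf, slope)
    return {"slope": slope, "intercept": intercept,
            "r_squared": r_squared, "t_stat": t_stat}


# Hypothetical observations per study: (cases per 100000, RNA copies per mL).
studies = {
    "study_A": ([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]),
    "study_B": ([1, 2, 3, 4, 5], [5.0, 4.0, 6.0, 5.0, 5.5]),
}
fits = {name: simple_linear_regression(x, y) for name, (x, y) in studies.items()}
```

Fitting each study on its own keeps slopes in the units of that study's measurements, which is why the slopes in Table 3 span several orders of magnitude while R-squared remains comparable across studies.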
Methodological variability was present in all steps of sample collection and analysis procedures (Fig. 2). In terms of sampling locations within a wastewater system, most studies analyzed samples collected at the WWTP (16 studies).8,9,11,12,14,16–21,23,37,38,40,41 A much smaller number of studies sampled at locations in the sewage collection network (two studies)13,39 or in-premise (one study).13 Kitamura, K. et al. examined the SARS-CoV-2 virus in wastewater at both municipal sewage network locations and WWTP influent samples.15 Saguti, F. et al. monitored WWTP influent samples and upstream locations.17 Because case counts for biosamples collected at the sewage network in Saguti, F. et al. were not provided in the publication,17 these biosamples from the upstream location were not included in the meta-analysis. Among the studies sampling at WWTPs, the service population ranged from 12770 to 3.2 million individuals, and covered regions in the Americas (nstudies = 7), Asia (nstudies = 4), Europe (nstudies = 8), and Oceania (nstudies = 1).
Fig. 2 Diagram depicting reported sample collection locations, pre-processing methodologies, and their respective annotations as sampling locations and fractions in this study.
Upon sample collection, studies showed great variability in sample pre-processing conditions, resulting in the enrichment of different wastewater fractions (Fig. 2). Supernatant/filtrate fractions were recovered in 12 studies using centrifugation between 1840 and 10000g,8,9,11,14–19,30,37,40 while two studies retrieved these fractions by filtering raw wastewater through 0.22 (ref. 13) and 0.7 μm membranes,10 respectively. Mixed supernatant and suspended solid fractions were identified in six studies where liquid wastewater samples were not subjected to any type of pre-processing. Solid fractions were retrieved in one study from influent wastewater by pellet collection after centrifugation at 1840g,15 while the remaining three studies utilizing solid fractions collected sludge samples directly from primary sedimentation tanks.11,21,23 It is important to highlight that a study may pre-process for more than one fraction (Table 2).
Once a fraction of choice was generated, a viral concentration step was usually performed prior to RNA extraction. The viral concentration protocols relied on the principles of the molecular weight cutoff achieved through ultrafiltration at 10000 Da,8,10,13,15,16,18,19,30,40 the affinity of enveloped viruses to electro-negative membranes, electro-positive membranes, or other adsorbents/flocculants such as PEG, skimmed milk, or aluminum,9,11,14,15,18,20,21,30,37–39,41 or a combination of both mechanisms sequentially.17,21,41 Some protocols did not include a concentration step and performed RNA extraction directly on the solid fraction.11,23 The methodological choices in the concentration step were highly variable, and twelve different workflows were reported. Reviews on the viral concentration methodology and method evaluation employing surrogates can be found elsewhere.30,31,42–47
Notably, the various choices in separation methods result from an underlying assumption of differential enrichment/partitioning of the viral particles within the fractions in a biosample. Therefore, we considered the fractions as subgroups in achieving pooled estimates of SARS-CoV-2 RNA levels in wastewater (section 3.2).
Notably, viral RNA levels varied largely even among studies investigating SARS-CoV-2 RNA levels in the same sample fractions, as shown by heterogeneity across studies (I²) higher than 95% in all subgroups (Fig. 5; Fig. S5†: cube-root-transformed data). We further aggregated the observations by grab/composite sampling and focused on studies that reported WWTP observations alone (Fig. S6 and S7†: cube-root-transformed data). The cross-study heterogeneity remained high (I² > 93%) even after data were aggregated in more methodologically homogeneous groups. The observed heterogeneity suggested that pandemic severity, as well as other local variables, may drive the variations in SARS-CoV-2 RNA levels among studies.
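For readers unfamiliar with the heterogeneity statistic, the sketch below assumes the common definition of I² based on Cochran's Q with inverse-variance weights; the excerpt does not state which estimator was used. The function name and inputs are hypothetical.

```python
def i_squared(effects, variances):
    """Percentage of total variation attributed to between-study heterogeneity."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    degrees_of_freedom = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - degrees_of_freedom) / q) * 100.0


# Hypothetical study means that differ by orders of magnitude with small variances.
heterogeneity = i_squared(effects=[1.0, 10.0, 100.0], variances=[0.1, 0.1, 0.1])
```

Under this definition, values above 95% arise whenever the spread between study estimates is far larger than the sampling error within each study.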
First, we built logistic regression models to explain the relationships between positive SARS-CoV-2 detection from sewage and each covariate considered. The models with the sampling mode and fraction analyzed as sole predictors explained 0.6% and 12.4% (Tjur's R-squared) of the total variability in SARS-CoV-2 positive detections, respectively (Table 4). The proportion of variances explained by daily and cumulative cases was 9.3% and 5.9%, respectively (Table 5).
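Tjur's R-squared is the difference between the mean fitted probability among positive observations and the mean fitted probability among negative observations. The sketch below is a simplified illustration for a single categorical predictor such as the fraction analyzed. It relies on the fact that, for a logistic model with an intercept and one categorical predictor, the maximum-likelihood fitted probabilities equal the observed group proportions. The function name and data are hypothetical.

```python
from collections import defaultdict


def tjur_r2_single_categorical(groups, detected):
    """Tjur's R-squared for a logistic model with one categorical predictor."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in zip(groups, detected):
        totals[group] += 1
        positives[group] += int(bool(outcome))
    # Fitted probability of each observation is its group's observed positivity.
    fitted = [positives[g] / totals[g] for g in groups]
    fitted_pos = [p for p, d in zip(fitted, detected) if d]
    fitted_neg = [p for p, d in zip(fitted, detected) if not d]
    if not fitted_pos or not fitted_neg:
        raise ValueError("need both positive and negative observations")
    return sum(fitted_pos) / len(fitted_pos) - sum(fitted_neg) / len(fitted_neg)


# Hypothetical data: 9 of 10 solid samples and 5 of 10 supernatant samples positive.
groups = ["solid"] * 10 + ["supernatant"] * 10
detected = [1] * 9 + [0] + [1] * 5 + [0] * 5
r2_fraction = tjur_r2_single_categorical(groups, detected)
```

A continuous predictor such as case prevalence would require fitting the logistic model numerically, but the final step of comparing mean fitted probabilities is the same.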
Univariate models | Binomial (nobs = 1508): coefficient [95%-CI] | Binomial: p-value | Binomial: explained variance (R-squared) | Gaussian, log-transformed RNA levels (nobs = 936): coefficient [95%-CI] | Gaussian: p-value | Gaussian: explained variance (R-squared) |
---|---|---|---|---|---|---|
Grab_composite | | | 0.01 | | | 0.07 |
Intercept | 0.75 [0.55, 0.95] | 2.1 × 10−13*** | | 2.50 [1.99, 2.98] | <2.00 × 10−16*** | |
Grab_composite: grab | −0.36 [−0.59, −0.12] | 2.84 × 10−3** | | 2.54 [1.94, 3.14] | 3.10 × 10−16*** | |
Fraction | | | 0.12 | | | 0.56 |
Intercept | 0.01 [−0.11, 0.13] | 0.87 | | 1.51 [1.25, 1.77] | <2.00 × 10−16*** | |
Fraction: solid | 2.17 [1.8, 2.56] | <2 × 10−16*** | | 7.53 [7.10, 7.96] | <2.00 × 10−16*** | |
Fraction: solid–supernatant mixture | 1.27 [0.89, 1.67] | 1.9 × 10−10*** | | 2.14 [1.56, 2.72] | 9.70 × 10−13*** | |
Term (b [95% CI] for fixed-effect terms) | Daily case model (nobs = 500, nstudies = 8): binomial, fixed-effects model | Daily: binomial, mixed-effects model | Daily: Gaussian (log-transformed titers), fixed-effects model | Daily: Gaussian (log-transformed titers), mixed-effects model | Cumulative case model (nobs = 912, nstudies = 8): binomial, fixed-effects model | Cumulative: binomial, mixed-effects model | Cumulative: Gaussian (log-transformed titers), fixed-effects model | Cumulative: Gaussian (log-transformed titers), mixed-effects model |
---|---|---|---|---|---|---|---|---|
Fixed effects | | | | | | | | |
Intercept | 0.65 [0.33, 0.96] | 3.32 [−1.35, 8.00] | 6.23 [5.63, 6.84] | 1.78 [−1.09, 4.66] | −0.03 [−0.20, 0.13] | −0.06 [−3.95, 3.83] | 1.78 [1.53, 2.04] | 1.05 [−1.25, 3.36] |
Daily new cases | 0.06 [0.04, 0.09] | 0.11 [0.05, 0.16] | 0.01 [0.001, 0.02] | 0.01 [0.003, 0.009] | — | — | — | — |
Cumulative cases | — | — | — | — | 0.001 [0.001, 0.002] | 0.005 [0.004, 0.007] | 0.0007 [0.0001, 0.0013] | 0.002 [0.001, 0.002] |
Random effects | | | | | | | | |
Study_ID (variance) | — | 23.15 | — | 17.12 | — | 29.75 | — | 10.93 |
Adjusted R2 | 0.09 | — | 0.01 | — | 0.06 | — | 0.01 | — |
Marginal R2 | — | 0.54 | — | 0.01 | — | 0.08 | — | 0.02 |
Conditional R2 | — | 0.94 | — | 0.93 | — | 0.91 | — | 0.90 |
AIC | 422 | 200 | 2579 | 1406 | 1214 | 1012 | 2381 | 1692 |
BIC | 431 | 213 | 2591 | 1422 | 1224 | 1027 | 2394 | 1709 |
Loglik | −209 | −97 | −1287 | −699 | −605 | −503 | −1188 | −842 |
Random effects (p-values) | — | <2.2 × 10−16*** | — | <2.2 × 10−16*** | — | <2.2 × 10−16*** | — | <2.2 × 10−16*** |
Next, we built linear models to examine the relationships between log-transformed viral RNA levels and each covariate. The variance in RNA levels explained by the sampling mode and fraction analyzed was 6.9% and a notable 56.0%, respectively, whereas the variance explained by daily and cumulative cases was 0.9% and 0.8%, respectively. In all these models, the roles of methodological variables and epidemiological indicators were significant (p < 0.05, Tables 4 and 5). The daily or cumulative cases and sampling mode explained comparable proportions of variance. Notably, the fraction analyzed explained dramatically more variance in viral RNA levels than any other variable.
For a mixed-effects model, we examined both the marginal R-squared, which is the proportion of variance explained by the fixed effects alone (daily or cumulative cases), and the conditional R-squared, which describes the proportion of variance explained by both the fixed and random factors (cases and the study identities respectively). Notably, mixed models exhibited conditional R-squared close to or over 0.9 for both positivity and viral RNA levels models reporting daily new cases or cumulative cases (Table 5). Thus, simultaneously considering variability across studies greatly improved our ability to explain the variation in wastewater SARS-CoV-2 measurements.
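The two quantities can be illustrated with a minimal sketch for a Gaussian random-intercept model. It assumes the variance-partitioning definition in which the total variance is the sum of the fixed-effect, random-effect, and residual variances; the excerpt does not specify the estimator, and binomial models typically replace the residual term with a distribution-specific variance (for example π²/3 on the logit scale). The function name and all variance components below are hypothetical.

```python
def mixed_model_r2(var_fixed, var_random, var_residual):
    """Return (marginal, conditional) R-squared from variance components."""
    if min(var_fixed, var_random, var_residual) < 0:
        raise ValueError("variance components must be non-negative")
    total = var_fixed + var_random + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    marginal = var_fixed / total                    # fixed effects alone
    conditional = (var_fixed + var_random) / total  # fixed plus random effects
    return marginal, conditional


# Hypothetical components in which study identity dominates the variance.
marginal_r2, conditional_r2 = mixed_model_r2(
    var_fixed=0.2, var_random=17.0, var_residual=1.3
)
```

With components like these, the marginal value stays near 0.01 while the conditional value exceeds 0.9, which mirrors the pattern described above: case prevalence alone explains little, and study identity absorbs most of the remaining variation.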
Supernatant/filtrate, solid–supernatant mixture, and solid fractions showed average detection rates of 0.53 (95%-CI [0.32; 0.75]), 0.62 (95%-CI [0.12; 1]), and 0.82 (95%-CI [0.43; 1]), respectively (Fig. 4). The fraction analyzed explained 12.4% of the variance in the proportions of positive detection and 56% of the variance in RNA levels. Solid fractions exhibited SARS-CoV-2 viral RNA levels that were orders of magnitude higher than supernatants/filtrates and solid–supernatant mixtures. This observation is in agreement with the previous literature showing enrichment of the SARS-CoV-2 genetic material in wastewater solids (i.e., primary settled solids).50 Given the higher proportion of SARS-CoV-2 viral RNA in solid fractions, workflows utilizing wastewater solids may be useful to track SARS-CoV-2 when infections remain at low levels in the sewershed (i.e., periods between peaks of infection, early warning detection, etc.). The large variance in viral RNA levels explained by the fraction analyzed and the large magnitudes in regression coefficients suggest that standardizing the fraction analyzed needs to be prioritized when researchers design monitoring efforts across multiple labs.47 The overall detection rate and those in subgroups of any sewage fraction were below one, suggesting a need for tools to maximize the chance of SARS-CoV-2 detection from sewage samples.
In our meta-analysis, large heterogeneity was detected in all effect sizes investigated (i.e., proportions of positive detections, viral RNA levels, and Pearson correlation coefficients between RNA levels and daily or cumulative cases, Fig. 3–5, Table S3†). We hypothesize that the unexplained variations in SARS-CoV-2 RNA levels detected in wastewater can be affected by study-level factors, such as COVID-19 prevalence, lags in epidemiological data reporting,51 methodological choices,46 and differences in the wastewater collection system design. Our meta-analysis found that metadata about the collection system, such as per capita water consumption, relative contributions of domestic vs. commercial/industrial water, or sewage travel times (i.e., residence times), are currently rarely reported. These collection system-level variables can affect the dilution of fecal materials and the genetic decay of the viral signal.31,52–54 To illustrate, domestic water consumption can vary significantly in different areas: a person in the city of Berlin generates on average 135 L of wastewater per day,55 while a person in Qatar generates on average 500 L of wastewater per day.56 Thus, the dilution of the fecal matter may vary largely in different wastewater systems. Another aspect is combined sewage in comparison to sanitary sewage. Rainfall can affect the dilution of fecal matter in a combined sewage system through stormwater run-off,57,58 but not in a sanitary sewage system. Even among sanitary sewage systems, the contribution of domestic waste can vary by system, ranging from as low as 30% of the total wastewater discharge to as high as near-complete dominance.59 These design differences could lead to variations in SARS-CoV-2 RNA levels even in systems where the numbers of active viral shedders were identical.
Because water usage and system design characteristics can affect SARS-CoV-2 virus measurements at the wastewater treatment plant, more detailed metadata reporting regarding the wastewater collection system is needed to better explain variations across sites. McClary-Gutierrez, J. S. et al. compiled a list of minimum reporting data for WBE applications for COVID-19,60 which can support more consistent metadata reporting across studies and facilitate the synthesis of results. Lately, it was proposed that wastewater can be viewed as an independent indicator of true prevalence, as epidemiological indicators from current reporting can be affected by under-reporting.61 Therefore, methods and tools to investigate the wastewater metagenome and derive system-level data, or bridge wastewater-based measurements to prevalence, deserve more attention.62
To address the dilution of fecal matter by various wastewater streams, some studies have normalized SARS-CoV-2 viral RNA levels by fecal strength indicators. These studies propose dividing the SARS-CoV-2 viral RNA levels by the copy numbers of pepper mild mottle virus (PMMoV),21,58,63 a diet-associated RNA virus commonly found in human feces.64 Among the qualified studies included in this meta-analysis, only one utilized normalization by PMMoV;21 thus, a meta-analysis of the effect of PMMoV normalization on correlations between wastewater SARS-CoV-2 measurements and epidemiological indicators was not included in this study. The effects of normalization techniques on the performance of regression models can be a topic of future meta-analysis efforts once studies employing such techniques become more abundant.
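The normalization described above amounts to a simple ratio. The following minimal sketch uses hypothetical concentrations and a hypothetical function name, and it assumes that dilution affects both targets equally. It shows why the ratio is insensitive to dilution by non-fecal water.

```python
import math

def normalize_by_pmmov(sars_cov2_copies_per_ml, pmmov_copies_per_ml):
    """Return the unitless ratio of SARS-CoV-2 copies to PMMoV copies."""
    if pmmov_copies_per_ml <= 0:
        raise ValueError("PMMoV concentration must be positive")
    return sars_cov2_copies_per_ml / pmmov_copies_per_ml

# Hypothetical sample: concentrations before and after a 4-fold dilution
# (for example, by stormwater). Both targets are assumed to dilute equally.
undiluted = normalize_by_pmmov(10.0, 1.0e4)
diluted = normalize_by_pmmov(10.0 / 4, 1.0e4 / 4)

print(undiluted, diluted)
print(math.isclose(undiluted, diluted))
```

The raw SARS-CoV-2 concentration drops fourfold in the diluted sample, while the normalized value is unchanged. In practice the two viruses may differ in decay and recovery, which this sketch does not model.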
It should be noted that the heterogeneity in viral RNA levels and correlations observed here may not be fully explained by the recovery efficiencies of viruses from wastewater samples during viral concentration workflows. To illustrate this complexity, we discuss two studies in which recovery efficiencies were reported. In one study, an average viral RNA level of 881 ± 633 marker copies per mL was detected when COVID-19 prevalence in the associated area was between 10 and 80 cumulative cases per 100 000 inhabitants;8 in another study, an average viral RNA level of 1.9 ± 6.0 marker copies per mL was reported within the same range of COVID-19 prevalence (10–80 cumulative cases per 100 000 inhabitants).9 After adjusting the viral RNA levels by the reported recovery efficiencies (73% and 7.7%, respectively), the adjusted copy numbers (1206 and 27 marker copies per mL, respectively) still differ by more than an order of magnitude (roughly 45-fold).
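The recovery adjustment can be reproduced as a one-line calculation. The sketch below assumes the adjustment is a simple division of the mean measured level by the fractional recovery; the function name is hypothetical, and the inputs are the mean levels and recovery efficiencies quoted in the text.

```python
def adjust_for_recovery(measured_copies_per_ml, recovery_percent):
    """Estimate the pre-concentration level from the measured level."""
    if recovery_percent <= 0:
        raise ValueError("recovery_percent must be positive")
    return measured_copies_per_ml / (recovery_percent / 100.0)

# Mean levels and recovery efficiencies quoted in the text.
study_a = adjust_for_recovery(881, 73)
study_b = adjust_for_recovery(1.9, 7.7)
fold_difference = study_a / study_b

print(round(study_a), round(study_b, 1), round(fold_difference))
```

Simple division gives about 1207 and 25 marker copies per mL. The second value differs slightly from the 27 quoted in the text, which may reflect rounding of the reported inputs (an assumption). Either way, a large gap between the two studies remains after adjustment.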
While quantifying the effects of methodological variables and collection systems remains an important ongoing research topic,46,47,65 mixed-effects models that treat “studies” as a source of random effects are useful for inference and prediction. Mixed-effects models handle a wide range of scenarios in which observations are sampled within a hierarchical structure rather than completely independently. In this study, treating studies as a source of random effects on intercepts substantially improved model quality, as seen in the improved AIC and BIC relative to the respective fixed-effects models (Table 5). The final models reached conditional R-squared values above 0.9. The mixed-effects approach allows researchers to leverage existing data from studies conducted elsewhere to build models that help explain variations in local observations.
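For concreteness, a random-intercept model of the kind described above can be written as follows. This is a generic sketch, not necessarily the exact specification fitted in this study: the symbols ($y_{ij}$, $x_{ij}$, $u_j$, and the variance terms) are notation assumed here, and the R-squared definitions are the commonly used variance-partition forms for linear mixed models.

```latex
\begin{align}
y_{ij} &= \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2), \\
R^2_{\mathrm{marginal}} &= \frac{\sigma_f^2}{\sigma_f^2 + \sigma_u^2 + \sigma_\varepsilon^2},
\qquad
R^2_{\mathrm{conditional}} = \frac{\sigma_f^2 + \sigma_u^2}{\sigma_f^2 + \sigma_u^2 + \sigma_\varepsilon^2}.
\end{align}
```

Here $i$ indexes observations within study $j$, $x_{ij}$ is a covariate such as case prevalence, $u_j$ is the study-specific shift in the intercept, and $\sigma_f^2$ is the variance of the fixed-effect predictions. Under this formulation, a conditional R-squared that is much larger than the marginal one indicates that much of the explained variance is attributed to between-study differences rather than to the covariates.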
Our study had several limitations. The most notable is the large amount of unexplained heterogeneity in positive detection, SARS-CoV-2 RNA levels, and Pearson correlations across studies. This is likely attributable to methodological differences in SARS-CoV-2 measurements, wastewater-system characteristics, the ways epidemiological data were collected and reported, and differing COVID-19 incidence at the time the studies were conducted (e.g., COVID-19 waves and case fluctuations). We therefore employed mixed-effects models to make inferences about the correlation between epidemiological indicators and viral detection/RNA levels, treating study-level variations as a source of random effects.
This systematic review and meta-analysis was performed using the Web of Science Core Collection and focused on the English-language literature. Other indexes such as PubMed, MEDLINE, and Scopus are worth examining in future research. Sources such as Europe PMC, which includes preprints, would further broaden data inclusion. As more data become available, future meta-analyses focusing on the sampling of upstream sewage streams and on comparisons of SARS-CoV-2 detection sensitivity between qPCR and dPCR may become possible. The present study may serve as a framework for future studies analyzing larger datasets.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2ew00084a
This journal is © The Royal Society of Chemistry 2022