Meng Ling,a Hanadi S. Rifai,*a Charles J. Newell,b Julia J. Azizb and James R. Gonzalesc
aUniversity of Houston, 4800 Calhoun Rd., Room N107D, Houston, TX 77204-4003, USA. E-mail: mling@mail.uh.edu; rifai@mail.uh.edu
bGroundwater Services Inc., 2211 Norfolk, Suite 1000, Houston, TX 77098-4044, USA. E-mail: jjaziz@gsi-net.com; cjnewell@gsi-net.com
cAir Force Center for Environmental Excellence, Brooks AFB, TX 78235-5363, USA. E-mail: James.Gonzales@hqafcee.brooks.af.mil
First published on 8th January 2003
An innovative methodology for improving existing groundwater monitoring plans at small-scale sites is presented. The methodology consists of three stand-alone methods: a spatial redundancy reduction method, a well-siting method for adding new sampling locations, and a sampling frequency determination method. The spatial redundancy reduction method eliminates redundant wells through an optimization process that minimizes the errors in plume delineation and the average plume concentration estimation. The well-siting method locates possible new sampling points for an inadequately delineated plume via regression analysis of plume centerline concentrations and estimation of plume dispersivity values. The sampling frequency determination method recommends the future frequency of sampling for each sampling location based on the direction, magnitude, and uncertainty of the concentration trend derived from representative historical concentration data. Although the methodology is designed for small-scale sites, it can be easily adopted for large-scale site applications. The proposed methodology is applied to a small petroleum hydrocarbon-contaminated site with a network of 12 monitoring wells to demonstrate its effectiveness and validity.
In recent years, a number of approaches have been developed to improve existing monitoring programs.1–17 Among them, data-driven approaches2,4,8–11,13–15 and simulation–optimization approaches3,6,7,16,17 are the most widely used. These approaches use two major strategies, spatial sampling analysis and temporal sampling analysis, to optimize an existing monitoring plan.
Data-driven approaches analyze historical monitoring data with statistical or geostatistical methods to assess the efficiency of an existing monitoring plan and to suggest ways to optimize it. Spatial sampling optimization is usually achieved via variogram modeling and kriging.2,4,13–15,18,19 In these methods, kriging maps, kriging weights, or kriging variance can be evaluated to determine the redundancy or importance of a sampling location. Redundant locations can then be eliminated from the monitoring network if their elimination does not cause significant information loss in plume characterization. Cameron and Hunter,4 for example, used the global kriging weight assigned to each location to identify redundant locations and the overall kriging variance to check for information loss. When a monitoring network needs to be expanded, new sampling locations are selected from the candidate locations that would yield the maximum information gain. In Rouhani,14 the information gain is the total variance reduction due to measurements at the new locations; Rouhani and Hall15 expanded this criterion to include the magnitude of the variable of interest. As an alternative to geostatistical approaches, Hudak and Loaiciga20 presented an innovative approach that uses facility location theory to expand a preexisting monitoring network.
In data-driven approaches, temporal sampling optimization is usually achieved by autocorrelation reduction,21,22 temporal variogram analysis,4,10,23 or statistical trend analysis.4,8,9,24 The first two types of approaches determine the minimum sampling interval that yields statistical independence, i.e., zero autocorrelation, between consecutive samples, thereby maximizing the information gained from each individual sample. Tuckfield,23 for example, used a higher-order autocorrelation model to construct a temporal variogram; the fitted range of the temporal variogram is an estimate of the sampling interval required to achieve zero autocorrelation. A representative trend-analysis approach for determining the lowest adequate sampling frequency is that of Ridley et al.8,9 The logic of their approach is that a slow change in concentrations can be tracked with a low sampling frequency, whereas a drastic change in concentrations should be tracked with a high sampling frequency. Another example is the “iterative thinning” used by Cameron and Hunter.4 In “iterative thinning”, the sampling frequency is reduced gradually by comparing the trend estimates obtained before and after samples are removed from a time series.
Simulation–optimization approaches usually combine a transport simulation model and an optimization algorithm to determine reduced sampling plans. Reed et al.,6 for example, presented an approach to reduce an existing monitoring network without significant loss of accuracy in contaminant mass estimation. In their approach, a numerical simulation model is used to predict the future contaminant plume; its total mass is estimated via plume interpolation based on subsets of wells; and a genetic algorithm is used to search all potential subsets for the optimal ones. Reed et al.7 built on this approach and examined the tradeoffs between sampling cost reduction and local concentration estimation errors. Herrera et al.3 combined a stochastic flow-and-transport model and a linear Kalman filter to choose the optimal sampling locations and sampling times for an existing monitoring network.
The aforementioned approaches are statistically sound and provide insight into the issue of improving existing monitoring plans. However, most of them are designed for large-scale sites. When applied to small-scale sites, i.e., sites with fewer than 20 monitoring locations and a relatively short sampling history, these techniques face several obstacles. First, the number of sampling points at small-scale sites is insufficient for deriving a dependable spatial correlation structure.25 Second, because site assessments at many small-scale sites are inadequate, the available data are often insufficient for setting up a reliable transport model, which makes simulation–optimization approaches impractical. Third, the short record of monitoring data at some sites (e.g., 4–5 years) makes it difficult to use sampling frequency determination approaches based on autocorrelation or temporal variograms. Therefore, a data-driven methodology that is more appropriate for improving existing groundwater monitoring plans at small-scale sites is needed.
This paper presents an innovative data-driven methodology specifically designed for improving groundwater monitoring plans at small-scale sites. The methodology is used to determine the minimum number of sampling locations, the optimal frequency of sampling, and the location for new monitoring wells. The methodology is illustrated with a site application to demonstrate its effectiveness and validity.
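$$\min_{P_d}\; N(P_d) \qquad (1)$$
subject to
$$DE(P_d) \le DE_{\text{specified}} \qquad (2)$$
$$CE(P_d) \le CE_{\text{specified}} \qquad (3)$$
where N(Pd) is the number of sampling locations retained in sampling plan Pd, and DEspecified and CEspecified are the maximum acceptable plume delineation and average plume concentration errors.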
For any sampling plan, Pd, the overall plume delineation error, DE, is calculated as:
(4)
Similarly, the average plume concentration error, CE, for any sampling plan, Pd, is defined as:
(5)
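The exact forms of eqns. (4) and (5) are not given in this section. Purely as an illustration, the sketch below assumes that DE is a weighted, relative mismatch between the plume areas enclosed by several concentration contours when the plume is interpolated from the full network versus from a reduced plan Pd, and that CE is the relative difference in mean gridded concentration. The function name `plume_errors` and both formulas are assumptions, not the authors' definitions. The grids are taken as already interpolated onto equal-sized cells, so cell counts stand in for areas.

```python
import numpy as np


def plume_errors(grid_full, grid_reduced, levels, weights):
    """Assumed DE/CE measures for a reduced sampling plan (illustrative only).

    grid_full    -- concentrations interpolated from all wells (equal-sized cells)
    grid_reduced -- concentrations interpolated from the reduced plan, same grid
    levels       -- contour levels, e.g. 1, 20, 100, 1000 x MCL
    weights      -- one weight per level, e.g. 4, 3, 2, 1
    """
    g0 = np.asarray(grid_full, dtype=float)
    gd = np.asarray(grid_reduced, dtype=float)
    mismatch = 0.0
    reference = 0.0
    for level, w in zip(levels, weights):
        inside_full = g0 >= level
        inside_reduced = gd >= level
        # cells inside one contour but not the other (cell count ~ area)
        mismatch += w * np.count_nonzero(inside_full ^ inside_reduced)
        reference += w * np.count_nonzero(inside_full)
    de = mismatch / reference if reference > 0 else 0.0
    # assumed CE: relative error in the mean gridded concentration
    mean_full = g0.mean()
    ce = abs(gd.mean() - mean_full) / mean_full
    return de, ce
```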
An enumeration algorithm is designed to search the solution space for the optimal sampling plan(s). First, the optimization starts with the n sampling plans that have n−1 sampling locations (i.e., only one location eliminated). If more than one of these sampling plans satisfies eqns. (2) and (3), “seed” locations are selected. The “seed” locations are the sampling locations whose elimination yields a sampling plan with DE and CE values less than fDEspecified and fCEspecified, respectively, where f is a relaxation factor used to loosen the selection criteria (e.g., f = 2). Second, all combinations of two “seed” locations are eliminated to form new sampling plans that have n−2 sampling locations. If more than one of these sampling plans satisfies eqns. (2) and (3), a new set of “seed” locations (a subset of the previously selected “seed” locations) is selected using the same rule. Third, the process is repeated, each time with one fewer location in the sampling plans, until only one sampling plan or none meets the criteria of eqns. (2) and (3). In the former case, the single remaining sampling plan is the optimal solution. In the latter case, the sampling plan from the previous step that has the smallest DE and CE values is the optimal solution.
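The seed-based enumeration can be sketched in Python as follows. `evaluate` is a hypothetical callback that returns the (DE, CE) pair for a plan given as the set of retained wells. Two details are assumptions because the text leaves them open: at each level, the new seeds are all wells that appear in any eliminated combination meeting the relaxed criteria, and "smallest DE and CE" is resolved by ranking plans on DE + CE.

```python
from itertools import combinations


def reduce_network(wells, evaluate, de_max, ce_max, f=2.0):
    """Seed-based enumeration for eliminating redundant wells (sketch).

    evaluate(plan) -> (DE, CE) for a frozenset of retained wells (hypothetical callback).
    de_max and ce_max play the role of DEspecified and CEspecified; f is the relaxation factor.
    """
    full = frozenset(wells)
    seeds = sorted(full)              # first level: every well is a candidate
    previous = [(full, 0.0, 0.0)]     # the full network has zero error by definition
    k = 1
    while len(seeds) >= k:
        feasible, new_seeds = [], set()
        for removed in combinations(seeds, k):
            plan = full - frozenset(removed)
            de, ce = evaluate(plan)
            if de <= de_max and ce <= ce_max:          # eqns. (2) and (3)
                feasible.append((plan, de, ce))
            if de < f * de_max and ce < f * ce_max:    # relaxed rule for seeds
                new_seeds.update(removed)
        if len(feasible) == 1:        # a single plan survives: it is optimal
            return feasible[0][0]
        if not feasible:              # no plan survives: fall back one level
            break
        previous, seeds, k = feasible, sorted(new_seeds), k + 1
    # assumption: "smallest DE and CE" is interpreted as the smallest DE + CE
    return min(previous, key=lambda item: item[1] + item[2])[0]
```

Because only C(s, k) combinations are examined at level k for s seeds, rather than C(n, k) for all n wells, pruning to seeds is what keeps the search small.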
The use of “seed” locations in the enumeration algorithm reduces the number of sampling plans to be examined, and the use of a relaxation factor reduces the risk that good sampling plans are dismissed prematurely. Preliminary tests with monitoring data from several sites showed that an f between 2 and 3 is sufficient to retain almost all sub-optimal sampling plans. Moreover, because the method proposed in this paper targets sites with a small number of sampling locations (10–20), the enumeration remains computationally efficient.
There are generally two types of inadequately delineated plumes: (1) plumes whose downgradient or leading edges are not captured; and (2) plumes whose lateral boundaries are not captured. This paper presents a method for locating new sampling points for the first type of plume based on the configuration and characteristics of the observed plume. Once the leading edge of the plume is captured, the second type of problem can be addressed using estimated dispersivity values. The well-siting method consists of the following four steps.
Step 1: determine the most probable plume centerline based on measured groundwater elevation contours and the observed plume. Groundwater elevation contours can be generated from elevations averaged across representative sampling events (e.g., 4 quarterly measurements in one year) so that seasonal changes in groundwater flow are accounted for. The plume centerline starts at the source and generally follows the direction of the hydraulic gradient determined from the elevation data. Although this step may involve subjective judgement and is not exact, the resulting accuracy is sufficient for well-siting purposes.
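One common way to obtain the hydraulic gradient direction used in Step 1 is to fit a plane to the event-averaged groundwater elevations by least squares. The sketch below does this; the function name, the coordinate convention (x east, y north, azimuth clockwise from north), and the planar-fit approach itself are assumptions for illustration. A planar fit gives a single site-wide direction and does not replace contouring when the flow field curves.

```python
import numpy as np


def flow_direction(x, y, heads_by_event):
    """Fit a plane to event-averaged heads; return (gradient magnitude, flow azimuth in degrees).

    x, y           -- well coordinates (e.g., feet east and north of a site datum)
    heads_by_event -- array of shape (n_events, n_wells) of groundwater elevations
    The azimuth is measured clockwise from north (+y) and points down the hydraulic gradient.
    """
    heads = np.mean(np.asarray(heads_by_event, dtype=float), axis=0)  # average across events
    A = np.column_stack([np.ones(len(heads)), np.asarray(x, float), np.asarray(y, float)])
    (_, dh_dx, dh_dy), *_ = np.linalg.lstsq(A, heads, rcond=None)
    magnitude = float(np.hypot(dh_dx, dh_dy))
    # groundwater flows opposite to the head gradient
    azimuth = float(np.degrees(np.arctan2(-dh_dx, -dh_dy)) % 360.0)
    return magnitude, azimuth
```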
Step 2: regress plume centerline concentrations against their respective distances from the source with an exponential function. This regression follows the concept of the bulk attenuation rate in natural attenuation, which assumes that plume centerline concentrations decay exponentially with distance downgradient from the source.26 Two types of data can be used for this regression: (1) data from sampling points located on or close to the centerline; and (2) data estimated at hypothetical sampling points on the centerline from plume contours. The first type of data yields more accurate results than the second, which consists only of approximate values derived from contour lines. However, at many small sites with a limited number of sampling locations, it is difficult to find monitoring wells on or close to the plume centerline, and the second type of data has to be used instead. The exponential regression of plume centerline concentrations can also be replaced with a linear regression of the logarithms of centerline concentrations against their respective distances from the source.
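The log-linear form of the Step 2 regression, together with the Step 3 solution for dL, can be sketched as below. The function names are hypothetical; the sketch fits ln C against distance by ordinary least squares and reports R² on the log scale.

```python
import math

import numpy as np


def fit_centerline_decay(distance_ft, conc_ppb):
    """Fit C = C0 * exp(-k x) by linear regression of ln C on x.

    Returns (C0, k, r_squared); r_squared is computed on the log-transformed data.
    """
    x = np.asarray(distance_ft, dtype=float)
    ln_c = np.log(np.asarray(conc_ppb, dtype=float))
    slope, intercept = np.polyfit(x, ln_c, 1)
    residuals = ln_c - (intercept + slope * x)
    r_squared = 1.0 - np.sum(residuals**2) / np.sum((ln_c - ln_c.mean()) ** 2)
    return math.exp(intercept), -float(slope), float(r_squared)


def distance_to_level(c0, k, target_ppb):
    """Distance from the source at which the fitted centerline concentration equals target_ppb."""
    return math.log(c0 / target_ppb) / k
```

Applied to the regression 1 data in Table 2 (0, 99.5, and 224.4 feet; 12656, 8025, and 78 ppb), this gives C0 ≈ 24,600 ppb, k ≈ 0.0233 ft−1, R² ≈ 0.86, and dL ≈ 365 feet for a 5 ppb target, close to the tabulated values.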
Step 3: predict the distance from the source to the leading edge of the plume (dL) and estimate plume dispersivity values from the literature. The exponential function obtained in Step 2 is used to estimate dL as the distance at which the fitted centerline concentration falls to the target concentration. The dispersivity values can then be estimated from empirical relationships between dispersivity and plume length.27,28 For example, if the plume length is Lp, the longitudinal dispersivity (αx) can be estimated as 0.1Lp based on Pickens and Grisak,29 and the transverse dispersivity (αy) as 0.33αx.30
Step 4: determine new sampling locations using the results from Step 3. One or more leading-plume-edge sampling points (depending on the width of the plume) can be located within a distance of dL + fxαx from the source along the plume centerline, where fx is a coefficient ranging from 0 to 2 that accounts for uncertainty in the regression results and in the hydrogeologic data. One or more lateral sampling points can be located at a distance of fyαy perpendicular to the plume centerline, approximately halfway between the new and the existing leading-plume-edge sampling locations. Here fy is a coefficient ranging from 4 to 8 that both incorporates estimation uncertainty and allows a sufficient distance between the plume edge and the new location.
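Steps 3 and 4 reduce to a few lines of arithmetic. The sketch assumes a straight plume centerline leaving the source at a given azimuth; the function name, its parameters, and the straight-centerline simplification are illustrative assumptions. The defaults use the 0.1Lp and 0.33αx relationships quoted in Step 3 with fx = 1 and fy = 6, and `d_existing_edge` is the distance from the source to the existing leading-plume-edge sampling point.

```python
import math


def propose_new_wells(source_xy, azimuth_deg, d_l, d_existing_edge,
                      fx=1.0, fy=6.0, long_ratio=0.1, trans_ratio=0.33):
    """Locate new leading-edge and lateral sampling points (sketch; straight centerline assumed)."""
    alpha_x = long_ratio * d_l           # longitudinal dispersivity ~ 0.1 x plume length
    alpha_y = trans_ratio * alpha_x      # transverse dispersivity
    lead_dist = d_l + fx * alpha_x       # distance of the new leading-edge point from the source
    offset = fy * alpha_y                # lateral offset from the centerline
    az = math.radians(azimuth_deg)
    ux, uy = math.sin(az), math.cos(az)  # unit vector along the centerline (clockwise from north)
    nx, ny = uy, -ux                     # unit vector perpendicular to the centerline
    sx, sy = source_xy
    lead = (sx + lead_dist * ux, sy + lead_dist * uy)
    mid = 0.5 * (lead_dist + d_existing_edge)  # halfway between new and existing edge points
    cx, cy = sx + mid * ux, sy + mid * uy
    lateral = [(cx + offset * nx, cy + offset * ny), (cx - offset * nx, cy - offset * ny)]
    return {"alpha_x": alpha_x, "alpha_y": alpha_y, "lead_distance": lead_dist,
            "lateral_offset": offset, "lead_point": lead, "lateral_points": lateral}
```

With dL = 365 feet, the leading-edge point falls 401.5 feet from the source, as in the case study; the lateral offset scales directly with the transverse-to-longitudinal ratio adopted.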
If a single iteration of the above procedure cannot fully capture the contaminant plume, e.g., because of the uncertainty involved in the estimation process, a second iteration can be performed using the monitoring results from the new sampling points located in the first iteration. Once the plume has been captured to its intended concentration level, guidance from AFCEE31 can be used to complete the monitoring network. For example, a “sentry” well can be placed between the leading-plume-edge well and the nearest downgradient receptor.
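The iteratively reweighted least squares (IRLS) robust regression is a type of weighted least squares regression, and its regression coefficients can be obtained from

$$\mathbf{b} = (\mathbf{X}^{\prime}\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^{\prime}\mathbf{W}\mathbf{Y} \qquad (7)$$

where b is the vector of regression coefficients (intercept and slope), X is the design matrix (a column of ones and the sampling times), W is the diagonal matrix of weights, which is recomputed from the residuals at each iteration, and Y is the vector of observed concentrations.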
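The weighting function used in the IRLS regression is not specified in this section. The sketch below assumes Huber weights (tuning constant 1.345) with a residual scale estimated from the median absolute deviation, which is one common choice; the function name and defaults are illustrative. Each iteration solves eqn. (7) for the current weights and then recomputes the weights from the residuals.

```python
import numpy as np


def irls_slope(t_years, conc, k=1.345, max_iter=100, tol=1e-10):
    """Robust linear trend C = b0 + b1*t via IRLS with Huber weights (assumed weighting).

    Each pass solves b = (X'WX)^-1 X'WY, then updates the diagonal weights W from the
    residuals, using a MAD-based estimate of the residual scale.
    Returns (b0, b1); b1 is the slope in concentration units per year.
    """
    t = np.asarray(t_years, dtype=float)
    y = np.asarray(conc, dtype=float)
    X = np.column_stack([np.ones_like(t), t])
    w = np.ones_like(y)
    b = np.zeros(2)
    for _ in range(max_iter):
        XtW = X.T * w                    # equals X'W for a diagonal W
        b_new = np.linalg.solve(XtW @ X, XtW @ y)
        converged = np.allclose(b_new, b, rtol=0.0, atol=tol)
        b = b_new
        if converged:
            break
        r = y - X @ b
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        if scale == 0.0:                 # exact fit for most points: nothing to downweight
            break
        u = np.abs(r) / (k * scale)
        w = np.minimum(1.0, 1.0 / np.maximum(u, 1e-12))  # Huber weights
    return b[0], b[1]
```

With one gross outlier in an otherwise linear series, the ordinary least-squares slope can be pulled far from the underlying trend, while the IRLS slope stays close to it.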
The slope estimated from the IRLS robust regression provides the magnitude and direction (i.e., positive or negative) of the concentration trend. For decision purposes, three thresholds are defined for the magnitude: Low (e.g., 1 MCL year−1), Medium (e.g., 2 MCL year−1), and High (e.g., 4 MCL year−1). These thresholds are defined according to the hydrogeologic characteristics and plume conditions of the site and are used later to form decision rules for the sampling frequency determination.
For regression analysis of “noisy” data (e.g., when robust regression is used), standard analytical methods for evaluating confidence limits may not be available or may apply only approximately when the sample size is large. In such cases, Bootstrap procedures33 provide a practical estimate of the confidence limits of the regression coefficients. In short, the Bootstrap method repeatedly draws random samples from the observed data with replacement and evaluates the statistic of interest on each generated sample. The Bootstrap procedure as applied in this study includes: (1) generating a large number of random samples (e.g., 1000) with replacement from the monitoring data observed at a sampling location; (2) performing the IRLS robust regression on each generated sample to obtain a slope estimate; and (3) computing the bias-corrected, adjusted (BCa) confidence limits36 for the slope estimate. The uncertainty of the trend is then determined as follows: for a positive slope, the uncertainty is <5% if the lower 95% confidence limit is greater than 0, between 5% and 10% if the lower 95% limit is less than 0 but the lower 90% limit is greater than 0, and >10% if the lower 90% confidence limit is less than 0; for a negative slope, the same rules are applied to the upper 95% and 90% confidence limits, with “greater than 0” replaced by “less than 0”.
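The three Bootstrap steps and the uncertainty rule can be sketched as below. This is an assumption-laden illustration: it resamples (time, concentration) pairs, uses an ordinary least-squares slope as a stand-in estimator (a robust IRLS slope function can be passed as `slope_fn` instead), and computes one-sided BCa limits with a jackknife estimate of the acceleration. The names `bca_limits` and `trend_uncertainty` are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def ols_slope(t, c):
    """Ordinary least-squares slope; a stand-in for the IRLS robust slope."""
    return float(np.polyfit(t, c, 1)[0])


def bca_limits(t, c, slope_fn=ols_slope, n_boot=1000, alphas=(0.05, 0.10), seed=0):
    """Return {alpha: (lower, upper)} one-sided BCa limits at confidence 1 - alpha."""
    t = np.asarray(t, dtype=float)
    c = np.asarray(c, dtype=float)
    n = len(t)
    rng = np.random.default_rng(seed)
    theta = slope_fn(t, c)

    # (1)-(2): resample (time, concentration) pairs with replacement and refit the slope
    boot = []
    while len(boot) < n_boot:
        idx = rng.integers(0, n, size=n)
        if np.ptp(t[idx]) == 0:          # a trend needs at least two distinct times
            continue
        boot.append(slope_fn(t[idx], c[idx]))
    boot = np.asarray(boot)

    # (3): bias correction z0 (clipped so inv_cdf stays finite) ...
    nd = NormalDist()
    p_below = float(np.clip(np.mean(boot < theta), 0.5 / n_boot, 1.0 - 0.5 / n_boot))
    z0 = nd.inv_cdf(p_below)
    # ... and acceleration a from the leave-one-out (jackknife) slopes
    jack = np.array([slope_fn(np.delete(t, i), np.delete(c, i)) for i in range(n)])
    d = jack.mean() - jack
    denom = 6.0 * float(np.sum(d**2)) ** 1.5
    a = float(np.sum(d**3)) / denom if denom > 0 else 0.0

    def bca_quantile(z):
        return nd.cdf(z0 + (z0 + z) / (1.0 - a * (z0 + z)))

    limits = {}
    for alpha in alphas:
        lower = np.quantile(boot, bca_quantile(nd.inv_cdf(alpha)))
        upper = np.quantile(boot, bca_quantile(nd.inv_cdf(1.0 - alpha)))
        limits[alpha] = (float(lower), float(upper))
    return limits


def trend_uncertainty(slope, limits):
    """Map the slope sign and one-sided limits to the <5%, 5-10%, >10% classes."""
    lo95, hi95 = limits[0.05]
    lo90, hi90 = limits[0.10]
    if slope > 0:
        if lo95 > 0:
            return "<5%"
        return "5-10%" if lo90 > 0 else ">10%"
    if hi95 < 0:
        return "<5%"
    return "5-10%" if hi90 < 0 else ">10%"
```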
The uncertainty and direction of the concentration trend are combined to classify the concentration trend qualitatively into six categories: Decreasing (negative slope with uncertainty <5%), Probably Decreasing (negative slope with uncertainty between 5% and 10%), Stable (negative slope with uncertainty >10% and COV < 1), No Trend (positive slope with uncertainty >10%, or negative slope with uncertainty >10% and COV ≥ 1), Probably Increasing (positive slope with uncertainty between 5% and 10%), and Increasing (positive slope with uncertainty <5%). The COV introduced above is a modified form of the coefficient of variation, defined as the range of the data divided by its mean.
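The six-way classification and the range-based COV translate directly into code. The sketch below is illustrative; the function names and the string labels for the uncertainty classes are assumptions, and a slope of exactly zero is treated as non-negative, so it can never be classed as Stable.

```python
def range_cov(conc):
    """Modified coefficient of variation used here: range of the data divided by its mean."""
    return (max(conc) - min(conc)) / (sum(conc) / len(conc))


def qualitative_trend(slope, uncertainty, cov):
    """Combine slope direction, uncertainty class ("<5%", "5-10%", ">10%"), and COV."""
    if uncertainty == "<5%":
        return "Increasing" if slope > 0 else "Decreasing"
    if uncertainty == "5-10%":
        return "Probably Increasing" if slope > 0 else "Probably Decreasing"
    # uncertainty > 10%: only a negative slope with low scatter counts as Stable
    if slope < 0 and cov < 1:
        return "Stable"
    return "No Trend"
```

For instance, MW-3 in the first period (negative slope, uncertainty >10%, COV = 0.84) classifies as Stable, and MW-4 (COV = 2.03) as No Trend, matching the results table.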
These qualitative concentration trends are combined with the magnitude of the concentration trend to form a set of decision rules for determining the sampling frequency (Fig. 1). For example, if the qualitative trend is Probably Decreasing and the magnitude of the trend is between Low and Medium, the recommended sampling frequency for the location is annual. Biennial sampling can also be recommended if the qualitative trend is Stable, Probably Decreasing, or Decreasing, and the maximum concentration in the data is less than the specified cleanup level (e.g., the MCL). The end-point sampling frequency in this approach can be quarterly, semiannual, annual, or biennial. The recommended sampling frequency should be subjected to further review to ensure continued compliance with regulatory and remedial objectives. The logic behind this decision process can be summarized as follows: (1) a higher rate of change in concentrations should be tracked with a higher sampling frequency; (2) an increasing trend is of more concern than a decreasing trend; and (3) flexibility should be available for adjusting the monitoring frequency over the life of a remedial process. These decision rules are in agreement with approaches suggested by the US EPA.37
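Because the full decision matrix of Fig. 1 is not reproduced in the text, the sketch below only encodes the parts stated explicitly: binning the slope magnitude against the Low/Medium/High thresholds, the biennial rule for low-concentration locations, and a biennial default for locations where every result is below the detection limit (as in the results table). All other cells must be supplied by the user as a `matrix` dictionary transcribed from Fig. 1; the single example entry is the Probably Decreasing, Low-to-Medium case quoted above. The function names and the half-open bin boundaries are assumptions.

```python
def magnitude_class(slope, low, medium, high):
    """Bin |slope| against the Low/Medium/High thresholds (half-open bins assumed)."""
    m = abs(slope)
    if m < low:
        return "Below Low"
    if m < medium:
        return "Low-Medium"
    if m < high:
        return "Medium-High"
    return "Above High"


def recommend_frequency(trend, magnitude, max_conc, cleanup_level, matrix, all_nondetect=False):
    """Apply the explicitly stated rules, then look up the user-supplied Fig. 1 matrix."""
    if all_nondetect:
        return "Biennial"
    if trend in {"Stable", "Probably Decreasing", "Decreasing"} and max_conc < cleanup_level:
        return "Biennial"
    try:
        return matrix[(trend, magnitude)]
    except KeyError:
        raise KeyError(f"no Fig. 1 entry supplied for {trend!r}, {magnitude!r}") from None


# only the cell quoted in the text; the remaining cells would come from Fig. 1
example_matrix = {("Probably Decreasing", "Low-Medium"): "Annual"}
```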
Fig. 1 Decision matrix for determining the frequency of sampling.
Fig. 2 Site plan. Groundwater elevation contours were drawn with January 1999 data.
Petroleum hydrocarbon contamination was first discovered in 1991 during the excavation of several petroleum product storage tanks. Field observations at the time of tank removal and subsequent soil sampling investigations indicated the presence of hydrocarbons in the tank backfill material and in the native soil surrounding the tank basin. A total of 11 monitoring wells (MW-1 through MW-11) were installed between 1991 and 1992 at on-site (7 wells) and off-site (4 wells) locations (Fig. 2). Based on the data observed at the time of tank decommissioning, the plume length was estimated to be approximately 230 feet. Free product that accumulated in wells MW-1 and MW-2 was removed by manual bailing during 1991 and by passive bailers between 1992 and 1999. One additional downgradient off-site well (MW-12) was installed in 1997; its results, although inconclusive, indicated that an accidental spill of petroleum hydrocarbons may have affected this location. No active remediation was undertaken at the site. A risk assessment was performed for the site, and benzene was identified as the contaminant of concern. In general, a steadily decreasing concentration trend is evident between 1991 and 1999 (Fig. 3). Monitoring results and natural attenuation parameter measurements also indicated that natural attenuation is remediating the contaminant plume. Sampling was terminated in January 1999 after it had been demonstrated that the plume was stable and that the risk-based target level for benzene of 470 ppb had been met in the downgradient monitoring wells.
Fig. 3 Benzene concentrations and groundwater elevations at an upgradient well (MW-7), two inside-plume wells (MW-3 and MW-5), and a downgradient well (MW-11).
Although the depth of the lower silty sand aquifer is not known (it exceeds 13 feet), the aquifer can still be approximated as a single-layer aquifer in which measured concentrations are assumed to be vertically mixed. There are two reasons for this assumption. First, the contaminant is a petroleum hydrocarbon, which forms a light non-aqueous phase liquid (LNAPL); vertically, the source is therefore located around the fluctuating water table. Second, no obvious downward vertical hydraulic gradient was observed at this site, meaning that the downward movement of contaminants is due primarily to vertical dispersion and the effects of recharge.
Monitoring data from a total of 20 sampling events were available for analysis. The first 8 sampling events were performed between February 1991 and December 1992, on average quarterly, followed by a break in sampling of about two years. From October 1994 to January 1999, another 12 sampling events were performed, on average twice per year. Only eight sampling events were selected for the spatial redundancy reduction analysis; all other sampling events included fewer than 10 monitoring wells and were not selected because they do not have enough sampling points for delineating the plume by interpolation. Although the risk-based benzene target concentration was 470 ppb, a standard target concentration of 5 ppb (the MCL of benzene) was used in this case study to model the worst-case scenario potentially applicable to off-site areas. The addition of new sampling locations for the inadequately delineated plume was also illustrated based on the 5 ppb target concentration. For sampling frequency determination, the first 8 sampling events and the last 12 sampling events were analyzed separately, so that the results from the two periods could be compared to further demonstrate the validity of this method.
Fig. 4 Benzene plume contours (in ppb) for sampling event 13. Sampling locations are marked by open triangles.
| Case | Well(s) to be eliminated | Delineation error (DE) | Concentration error (CE) | “Seed” location? |
|---|---|---|---|---|
| One well eliminated | MW-1 | 11% | 6.9% | |
| | MW-2 | 8.5% | 7.1% | Yes |
| | MW-3 | 4.0% | 4.1% | Yes |
| | MW-4 | 9.4% | 6.0% | Yes |
| | MW-5 | 12.3% | 15.0% | |
| | MW-6 | 5.2% | 4.8% | Yes |
| | MW-7 | 31.8% | 204.4% | |
| | MW-8 | 14.8% | 1.8% | |
| | MW-9 | 18.7% | 4.7% | |
| | MW-10 | 51.3% | 260.3% | |
| | MW-11 | 26.8% | 11.3% | |
| | MW-12 | 33.9% | 16.4% | |
| Two wells eliminated | MW-2 and MW-3 | 11.8% | 2.4% | |
| | MW-2 and MW-4 | 13.9% | 1.8% | |
| | MW-2 and MW-6 | 17.2% | 4.8% | |
| | MW-3 and MW-4 | 11.0% | 9.7% | |
| | MW-3 and MW-6 | 8.2% | 8.1% | |
| | MW-4 and MW-6 | 15.1% | 10.1% | |

a The results are based on contours of four concentration levels: 1 MCL, 20 MCL, 100 MCL, and 1000 MCL; the weights used for these concentration levels are 4, 3, 2, and 1, respectively; the MCL of benzene is 5 ppb; plume interpolation was performed on a grid of equal-sized cells using natural neighbor interpolation.
Fig. 5 Addition of proposed new sampling locations into the existing network. Groundwater elevation contours were drawn using averaged data between 1991 and 1999.
The regression results are listed in Table 2. For comparison, two regressions were derived: one using representative data obtained by averaging concentrations from all 20 sampling events, and another using representative data obtained by averaging concentrations from 4 sampling events performed between 1995 and 1996, the period just before MW-12 was installed. The two regressions give similar values of dL: 365 feet for the former and 380 feet for the latter. Using the estimated plume length of 365 feet (delineated to 5 ppb) and the relationship of Pickens and Grisak,29 the longitudinal dispersivity (αx) is taken to be 36.5 feet. The transverse dispersivity is calculated to be 11 feet, assuming that the transverse dispersivity is 33% of the longitudinal dispersivity.30 The two coefficients, fx and fy, are assumed to be 1 and 6, respectively. Consequently, the distance from the new leading-plume-edge well to the source is 401.5 feet, and the distance from the new lateral wells to the plume centerline is 66 feet. The approximate positions of these proposed sampling locations are illustrated in Fig. 5. The proposed leading-plume-edge well lies only slightly farther downgradient than MW-12. However, concentrations at MW-12 were much higher than 5 ppb, and even higher than those at MW-11, indicating the potential for another source near this location.
| | Well ID | Distance from source (x, feet) | Regression 1 concentrationsa (y, ppb) | Regression 2 concentrationsb (y, ppb) |
|---|---|---|---|---|
| Regression data | MW-2 | 0 | 12656 | 18365 |
| | MW-5 | 99.5 | 8025 | 9953 |
| | MW-11 | 224.4 | 78 | 113 |
| Regression parameters | | | y = 24610 exp(−0.0233x), R² = 0.86 | y = 33731 exp(−0.0232x), R² = 0.88 |
| Distance from source for 5 ppb (feet) | | | 365 | 380 |

a Concentrations used in regression 1 are based on the average concentrations between 1991 and 1999. b Concentrations used in regression 2 are based on the average concentrations between 1995 and 1996.
| Period | Well ID | IRLS slope (ppb year−1) | Uncertainty of the trend | Sample size | Qualitative trend | Frequency |
|---|---|---|---|---|---|---|
| First 8 sampling events (1991–1992) | MW-1 | Not analyzed (no dissolved-phase concentration was measured) | | | | |
| | MW-2 | Not analyzed (no dissolved-phase concentration was measured) | | | | |
| | MW-3 | −2102 | >10% | 8 | Stable (COV = 0.84) | Quarterly |
| | MW-4 | −245 | >10% | 8 | No trend (COV = 2.03) | Quarterly |
| | MW-5 | −1527 | >10% | 8 | No trend (COV = 1.60) | Quarterly |
| | MW-6 | −169 | >10% | 7 | Stable (COV = 0.87) | Quarterly |
| | MW-7 | −0.25 | 5–10% | 7 | Probably decreasing | Biennial |
| | MW-8 | −194 | <5% | 7 | Decreasing | Semiannual |
| | MW-9 | 0.03 | >10% | 7 | No trend (COV = 2.55) | Annual |
| | MW-10 | All below detection limit (1 ppb) | | | | Biennial |
| | MW-11 | Not analyzed (only 3 records; at least 6 records are required for analysis) | | | | |
| | MW-12 | Not analyzed (monitoring well not installed during this period) | | | | |
| Last 12 sampling events (1994–1999) | MW-1 | −1829 | <5% | 11 | Decreasing | Semiannual |
| | MW-2 | −1902 | <5% | 11 | Decreasing | Semiannual |
| | MW-3 | −377 | >10% | 9 | No trend (COV = 1.26) | Quarterly |
| | MW-4 | −0.64 | >10% | 7 | No trend (COV = 1.96) | Annual |
| | MW-5 | −1353 | <5% | 9 | Decreasing | Semiannual |
| | MW-6 | 69 | >10% | 9 | No trend (COV = 1.13) | Quarterly |
| | MW-7 | All below detection limit (1 ppb) | | | | Biennial |
| | MW-8 | All below detection limit (1 ppb) | | | | Biennial |
| | MW-9 | All below detection limit (1 ppb) | | | | Biennial |
| | MW-10 | All below detection limit (1 ppb) | | | | Biennial |
| | MW-11 | −16 | >10% | 8 | No trend (COV = 1.61) | Semiannual |
| | MW-12 | −719 | <5% | 6 | Decreasing | Semiannual |

a The thresholds for the magnitude of the trend are Low (5 ppb year−1), Medium (10 ppb year−1), and High (20 ppb year−1); the uncertainty in the trend is estimated with 1000 Bootstrap runs.
The validity of these sampling frequency recommendations can also be judged by comparing the results between the two time periods. For example, MW-5 receives a quarterly recommendation for the first period because of its fluctuating concentrations, whereas its steadily decreasing concentrations in the second period lead to a semiannual recommendation (see Fig. 3, MW-5). For MW-7 through MW-10, concentrations below the detection limit throughout the second period indicate that biennial sampling would be adequate for characterizing the concentration trend. In addition, the two most downgradient monitoring wells, MW-11 and MW-12, are sampled semiannually, which provides an added measure of protection against unexpected plume behavior.
Although the methodology as proposed is designed for small-scale sites, its basic strategies are applicable to large-scale sites. The two criteria used in the spatial redundancy reduction analysis can be applied to large-scale sites when combined with more powerful optimization algorithms such as genetic algorithms. The logic of the sampling frequency determination method is generally applicable and can be implemented with other statistical methods. For example, fitting the concentration data with a quadratic function instead of a linear function may in some cases provide a better description of the concentration trend. For a complicated three-dimensional aquifer, provided that enough vertical monitoring data are available, a plume volume delineation criterion and a plume mass criterion can be developed to facilitate the redundancy reduction process.
This journal is © The Royal Society of Chemistry 2003