Sharon Squire*a, Michael H. Ramseyb and Michael J. Gardnerc
aEnvironmental Geochemistry Research Group, T. H. Huxley School of Environment, Earth Science and
Engineering, Imperial College of Science, Technology and Medicine, London, UK SW7 2BP
bCentre for Environmental Research, School of Chemistry, Physics and Environmental Science, University of Sussex, Falmer, Brighton, UK BN1 9QJ
cWRc-NSF, Henley Road, Medmenham, Marlow, Buckinghamshire, UK SL7 2HD
First published on 7th January 2000
The fitness-for-purpose of a sampling protocol to spatially delineate a region of contamination has been assessed for the first time by use of a collaborative trial in sampling (CTS), conducted on a synthetic reference sampling target (RST). This trial employed the RST to show the agreement of one participant’s estimate of the extent and intensity of contamination with the ‘true’ value and with the estimates of other participants, when they were all using the same nominal protocol. The collaborative trial showed the performance of the protocol when it was applied in any of its four equally probable orientations. Nine samplers each independently collected soil samples using a herringbone sampling protocol, applied in two randomly selected orientations. Test portions of the samples were then chemically analysed using a single analytical system and the resulting ‘hot spot’ of contamination was spatially delineated using two independent methodologies. This spatial extent of contamination was compared with the dimensions of the true hot spot to score the participants, based on a novel adaptation of the International Harmonised Protocol. The value of the score was derived from a weighted sum of the false negative and false positive areas in each participant’s delineation. Within- and between-sampler variations were used to assess the performance of the sampling protocol both for the spatial delineation and for the estimation of contaminant concentration at particular sampling locations. The sampling protocol investigated in this CTS was found to be fit-for-purpose on this relatively simple RST. For a single sampling location situated on a hot spot, sampling repeatability was estimated as 60.08%, and sampling reproducibility as 85.79%. This uncertainty contrasts with the sampling reproducibility of 3.77% for a single sampling location situated on the background population of uncontaminated soil.
This difference is partially due to a variation in the soil heterogeneity between the contaminated and uncontaminated sample populations. Sampling bias was not significant for either samplers or the sampling protocol, although such a bias may have been masked by the heterogeneity of the sampling target.
Applied to sampling, the above approach requires a number of participants (called samplers) to take two sets of samples from a target using various interpretations of the same sampling protocol. Each sample is then analysed in duplicate, under randomised repeatability conditions, and hierarchical analysis of variance (ANOVA) is used to decide whether within-sampler and between-sampler precision are within a specified fitness-for-purpose criterion.2 Chemical analysis under repeatability conditions is required to avoid confusing analytical and sampling variations. The measurements of concentration are treated with ANOVA to estimate precision (as standard deviations) between-samplers ($s_2$), within-samplers ($s_1$) and between analytical duplicates ($s_0$). The within-sampler variation is also called the sampling repeatability standard deviation3 ($s_1 = s_{r(s)}$) and refers to one sampler using the same procedure and equipment over a short period of time. The reproducibility standard deviation is derived from the sum of squares of the within- and between-sampler standard deviations ($s_{R(s)} = \sqrt{s_1^2 + s_2^2}$) and refers to measurements made on a single or composite sample, collected by different participants using the same sampling protocol. The reproducibility standard deviation represents the uncertainty in measuring the mean concentration of an analyte using the selected protocol. If the uncertainty is found to be too large for particular investigations (i.e., not fit-for-purpose) then modifications to the protocol would be required, e.g., collecting composite rather than single samples.4
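The hierarchical ANOVA described above can be sketched for a balanced design as follows. This is a minimal illustration rather than the software used in the trial: the function name `nested_anova`, the data layout and the numbers in `example` are hypothetical, the classical (non-robust) estimator is assumed, and negative variance estimates are set to zero, which is a common convention that the text does not state.

```python
from math import sqrt
from statistics import mean

def nested_anova(data):
    """Balanced nested ANOVA (classical estimator, assumed here).

    data[i][j][k] is the k-th analytical duplicate of the j-th sample
    taken by sampler i. Returns (s0, s1, s2, s_R) as standard deviations:
    analytical, within-sampler, between-sampler and reproducibility.
    """
    n_samplers = len(data)
    n_samples = len(data[0])
    n_analyses = len(data[0][0])

    sample_means = [[mean(sample) for sample in sampler] for sampler in data]
    sampler_means = [mean(row) for row in sample_means]
    grand_mean = mean(sampler_means)

    # Sums of squares at each level of the hierarchy.
    ss0 = sum((x - sample_means[i][j]) ** 2
              for i, sampler in enumerate(data)
              for j, sample in enumerate(sampler)
              for x in sample)
    ss1 = n_analyses * sum((sample_means[i][j] - sampler_means[i]) ** 2
                           for i in range(n_samplers)
                           for j in range(n_samples))
    ss2 = n_samples * n_analyses * sum((m - grand_mean) ** 2 for m in sampler_means)

    # Mean squares.
    ms0 = ss0 / (n_samplers * n_samples * (n_analyses - 1))
    ms1 = ss1 / (n_samplers * (n_samples - 1))
    ms2 = ss2 / (n_samplers - 1)

    # Variance components; negative estimates are truncated at zero (assumption).
    var0 = ms0
    var1 = max(0.0, (ms1 - ms0) / n_analyses)
    var2 = max(0.0, (ms2 - ms1) / (n_samples * n_analyses))

    return sqrt(var0), sqrt(var1), sqrt(var2), sqrt(var1 + var2)

# Hypothetical data: 3 samplers x 2 samples x 2 analytical duplicates.
example = [
    [[100, 102], [110, 108]],
    [[120, 118], [126, 128]],
    [[90, 92], [96, 94]],
]
s0, s1, s2, s_R = nested_anova(example)
```

For the hypothetical `example` data the variance components work out to 2, 23 and 216 (in squared concentration units), so the between-sampler term dominates the reproducibility standard deviation.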
The above methodology assesses the protocol in terms of precision, but makes no estimate of bias arising from the sampling methodology. The existence of this bias is a contentious issue, being questioned by some authors,5 but recognised by others.6 Such bias is usually difficult to estimate with respect to the true concentration of a contaminant within contaminated land, as the true concentration is never known. An alternative reference point for the estimation of the sampling bias is the consensus value from a substantial number of measurements made by different protocols4 and/or independent samplers.2 Previous applications of this methodology have taken no account of the spatial variability of the analyte in question, which is a parameter often required for assessing potentially contaminated areas for remediation.
The present study therefore used a synthetic reference sampling target (RST), comprising a single hot spot of known concentration and position, to act as a reference value against which to assess the performance of the sampling process.7 This collaborative trial in sampling (CTS), in addition to the objectives of previous trials, allows the first estimates of the bias from sampling to be obtained, these being traceable to a known mass of pure analyte. Such biases could arise from several causes, such as contamination from the sampling tools, inappropriate handling or selective sampling.8 A new scoring method, based on the true hot spot characteristics, was required to assess the fitness-for-purpose of the sampling protocol to spatially delineate an area of contamination. The results of the CTS are processed to provide an assessment of the sampling protocol, and its application by each participant, in the form of a score derived from a novel adaptation of the International Harmonised Protocol.9
The objectives of this study were therefore to determine: (1) whether it is possible to use a spatially resolved CTS to judge the fitness-for-purpose of a particular sampling protocol (e.g., herringbone pattern, n = 25); (2) whether the variation in spatial delineation by samplers was greater within-sampler or between-samplers; and (3) the measurement uncertainty caused by the precision and bias of the sampling and analytical methods at two selected sampling locations, where one location is on the hot spot and the other is on the background population of uncontaminated soil. The methodology for the estimation of spatial uncertainty from the regions of soil classified as contaminated will be described in a subsequent paper.
Fig. 1 Diagram of the sampling target showing the true hot spot location and four possible herringbone pattern orientations with sample locations.
Nine organisations, listed in the acknowledgements (five university departments and four commercial organisations), sent samplers to the site sequentially over a period of 3 months, between October 1997 and January 1998. The samplers’ aim for the project was given to each participant 1 month before the first participant commenced sampling. This aim was to spatially delineate regions of soil containing 171 μg g−1 of barium. It was intended that the samplers should collect soil samples using a common protocol specified by the organisers. The organisers would then analyse the soils and use the results to arrive at an estimate of the location of the area of contamination, using two different methodologies. This spatial delineation step was not required of the participants in this trial so as to maintain the independence of a sampling proficiency test, which was also being undertaken by the same participants. Participants visited the site independently and did not observe any other participant during the sampling exercise. Holes left from sampling were closed to remove any visible trace of the sampling that might affect later participants.
Participants were asked to use the equipment provided by the organisers, although it was optional as to how much of the equipment was used. This allowed participants to use their own judgement. In this way, the CTS was intended to give a realistic picture of the usual practice of samplers in interpreting a sampling protocol. Some of the interpretation of each participant was recorded using a video camera to make comparisons between the participants’ sampling techniques.
The soil samples from all participants were collected by the organisers for sample preparation and chemical analysis at Imperial College. The soil samples were dried at 65 °C, then disaggregated to liberate the natural grain size using a pestle and mortar. The soil size fraction passing through a 2 mm stainless steel sieve was ground in a chrome–steel pot within a swing mill to a grain size of <75 μm, to produce the laboratory sample. Analytical test portions of the laboratory samples were digested in a mixture of nitric, perchloric and hydrofluoric acids11 and analysed by ICP-AES for barium. This analytical method was chosen because it performed acceptably for reference materials when judged against their certified reference values, and it was sufficiently rapid and inexpensive. All the samples from the collaborative trial in sampling were analysed in randomised order within nine analytical batches. Analytical quality control procedures were used to determine analytical precision and bias, and to test whether there were any significant differences in the quality of measurements between the batches.
Certified reference materials (NIST 2709 and 2711) were analysed in duplicate, at random positions between each batch, to estimate analytical bias. House reference materials (HRM 1 and HRM 2) and a special house reference material (HRM 32) spiked with BaSO4 were analysed at random positions within each batch to estimate between-batch precision. The BaSO4 used in the preparation of the HRM 32 reference material was the same as that used to prepare the RST. Measurements were corrected for Ba where significant concentrations were found in the reagent blanks. Analytical duplicates were used to estimate analytical precision. Sample duplicates were collected with a separation distance of 20 cm at eight sample locations to represent potential surveying error. The analysis of variance (ANOVA) method was used to estimate the measurement uncertainty across the whole site, and for two different sampling locations.
The variances to be described for sampler number 5, protocol orientation C (abbreviated as 5C), were typical of all the participants’ results. The component standard deviations measured using robust analysis of variance (ANOVA)13 for this single sampler using a single sampling protocol were $s_{\text{geochemical}}$ = 1.10 μg g−1, $s_{\text{sampling}}$ = 1.96 μg g−1 and $s_{\text{analysis}}$ = 1.53 μg g−1. The measurement uncertainty ($s_{\text{meas}}$) was calculated as the square root of the sum of squares of the sampling and analytical standard deviations, giving 2.49 μg g−1 (1 s). The expanded uncertainty for 95% confidence, U, was 4.88 μg g−1 (1.96 s). This gives a relative measurement uncertainty, U, of 3.34%, expressed relative to the mean Ba concentration of 146 μg g−1.
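The arithmetic behind these figures can be checked with a short sketch. The function name `measurement_uncertainty` and its parameter names are illustrative choices, not taken from the paper; the inputs are the component standard deviations and mean concentration quoted above.

```python
from math import sqrt

def measurement_uncertainty(s_sampling, s_analysis, mean_conc, coverage=1.96):
    """Combine sampling and analytical standard deviations in quadrature.

    Returns (s_meas, U, U_rel): the measurement standard deviation, the
    expanded uncertainty (coverage * s_meas) and the expanded uncertainty
    as a percentage of the mean concentration.
    """
    s_meas = sqrt(s_sampling ** 2 + s_analysis ** 2)
    expanded = coverage * s_meas
    relative_pct = 100 * expanded / mean_conc
    return s_meas, expanded, relative_pct

# Values quoted in the text for sampler 5, orientation C (ug/g).
s_meas, U, U_rel = measurement_uncertainty(1.96, 1.53, 146.0)
print(round(s_meas, 2), round(U, 2), round(U_rel, 2))
```

Carried through without intermediate rounding, the expanded uncertainty comes out at about 4.87 μg g−1; the quoted 4.88 μg g−1 follows if $s_{\text{meas}}$ is first rounded to 2.49 μg g−1. The relative value rounds to 3.34% either way.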
The RST used in this investigation contained a large proportion (92% by area) of relatively homogeneous background concentrations of uncontaminated soil. The majority of the sampling duplicates were therefore collected away from the hot spot. This resulted in a low value of measurement uncertainty (typically U of 3.34%), as there was very little small-scale variability between the sample duplicates in this homogeneous soil (154 ± 11 μg g−1 at 95% confidence). Therefore, this estimate of uncertainty is not simply a characteristic of the sampling and analysis procedures, but also of the site heterogeneity. This single estimate of measurement uncertainty can best be considered as a lower limit of measurement uncertainty that applies to such relatively homogeneous sites. Applying this value to all sample locations (including those on a hot spot) is therefore considered to underestimate the measurement uncertainty at sample locations within areas of higher geochemical variability.
The two sampling locations (numbered 3 and 13) were sampled twice by all nine samplers (once in each protocol orientation). Each sample was analysed once for Ba by ICP-AES and all of the measurements interpreted using classical analysis of variance (ANOVA) to estimate precision (as standard deviations) under sampling repeatability ($s_{r(s)} = s_1$) and reproducibility ($s_{R(s)} = \sqrt{s_1^2 + s_2^2}$) conditions. The symbols $s_1$ and $s_2$ refer to the within- and between-group standard deviations. Classical, rather than robust, statistics were applied in this case as there was no intention of focusing on the main population and down-weighting outlying values.
The ISO definition of analytical bias is the difference between the expectation of the test results and an accepted reference value.3 Sampling bias can therefore be defined, by analogy, as the difference between the mean of the population of sampling measurements and the assigned value of the sampling target. The assigned value for the RST was derived from the spiked concentration of barium sulfate added to the soil. The confidence limits (at 95% confidence) for the assigned value were based on the standard deviation of the measured concentration results. The mean concentrations for locations 3 and 13 for each sampler were compared with the respective assigned concentration value to estimate the sampling bias. The consensus mean was also compared with the assigned mean in order to determine if the protocol gave rise to an overall bias in concentration.
Location 13 is situated within Ring 4 of the hot spot and has an assigned concentration of 468 ± 451 μg g−1 at 95% confidence.7 The large uncertainty on this value was introduced inadvertently by the heterogeneous mixing in this zone of the hot spot. None of the measured Ba concentrations from the nine participants differed significantly from the assigned value for this location (as shown in Table 1). Even if the −6% analytical bias is allowed for, none of the measurements shows a significant bias against the assigned value. The large heterogeneity of Ba within this hot spot ring (RSD of 85%) made identifying sampling bias difficult using the assigned value at this location. This indicates the need for a more homogeneous sampling target, or for concentrations within the hot spot that are much higher above the background population. The performance of the sampling protocol was judged against the consensus value. The mean Ba concentration over all the participants (504 ± 376 μg g−1 at 95% confidence) was found to be not significantly different from the assigned value. The uncertainty of a single sampler identifying a single sampling location (sampling repeatability) was estimated at 60.08% at 95% confidence (using the equation $196 \times s_{\text{within}}/\bar{x}$, where $\bar{x}$ is the mean measured concentration at the location). Similarly, the uncertainty of multiple samplers identifying a single sampling location (sampling reproducibility) was estimated as 85.79% at 95% confidence. There was no statistically significant difference in within-sampler variance compared with between-sampler variance for location 13. For site investigations requiring lower uncertainty, such as those where the mis-classification of the land could cause unacceptable financial losses, one way of reducing this uncertainty would be the collection of larger or composite samples.
Table 1 Location 13 concentration/μg g−1

Participant number | Sample 1 | Sample 2 | Average concentration/μg g−1 | Absolute bias(a)/μg g−1 |
---|---|---|---|---|
1 | 472 | 869 | 671 | 203 |
2 | 832 | 641 | 737 | 269 |
3 | 255 | 318 | 287 | −182 |
4 | 146 | 256 | 201 | −267 |
5 | 393 | 294 | 344 | −125 |
6 | 790 | 629 | 710 | 242 |
7 | 398 | 576 | 487 | 19 |
8 | 748 | 374 | 561 | 93 |
9 | 592 | 484 | 538 | 70 |

(a) Where the assigned concentration is 468 ± 451 μg g−1 at 95% confidence.
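The repeatability and reproducibility figures quoted for location 13 can be reproduced from the duplicate measurements in Table 1 with a classical one-way ANOVA. The sketch below is an illustration under stated assumptions: the function name `sampling_precision` is hypothetical, the relative uncertainty is taken as 196 times the standard deviation divided by the grand mean, and a negative between-sampler variance estimate would be set to zero.

```python
from math import sqrt

# Duplicate Ba measurements (ug/g) at location 13, one pair per participant (Table 1).
location_13 = [(472, 869), (832, 641), (255, 318), (146, 256), (393, 294),
               (790, 629), (398, 576), (748, 374), (592, 484)]

def sampling_precision(pairs, coverage=1.96):
    """Classical one-way ANOVA for a balanced design with two samples per sampler."""
    n_samplers = len(pairs)
    values = [x for pair in pairs for x in pair]
    grand_mean = sum(values) / len(values)

    # Within-sampler mean square: each duplicate pair contributes (a - b)^2 / 2
    # with one degree of freedom.
    ms_within = sum((a - b) ** 2 / 2 for a, b in pairs) / n_samplers

    # Between-sampler mean square from the sampler means (2 samples per sampler).
    sampler_means = [(a + b) / 2 for a, b in pairs]
    ms_between = 2 * sum((m - grand_mean) ** 2 for m in sampler_means) / (n_samplers - 1)

    var_within = ms_within
    var_between = max(0.0, (ms_between - ms_within) / 2)  # truncated at zero (assumption)

    s_repeat = sqrt(var_within)
    s_reprod = sqrt(var_within + var_between)
    return {
        "mean": grand_mean,
        "repeatability_pct": 100 * coverage * s_repeat / grand_mean,
        "reproducibility_pct": 100 * coverage * s_reprod / grand_mean,
    }

result_13 = sampling_precision(location_13)
print({name: round(value, 2) for name, value in result_13.items()})
```

Run on the Table 1 values, this gives a grand mean of about 504 μg g−1, a repeatability of about 60.08% and a reproducibility of about 85.79%, in agreement with the figures in the text.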
Location 3 is situated in the background population and has an assigned concentration of 154 ± 11 μg g−1 at 95% confidence. None of the nine participants measured statistically different concentrations of Ba from that assigned for location 3 (as shown in Table 2). No significant difference between the consensus of the nine participants and the assigned value was evident, even when the −6% analytical bias was taken into account. The performance of the sampling protocol judged against the consensus value showed the mean measured concentration (145 ± 2.59 μg g−1 at 95% confidence) to be not significantly different from the assigned concentration. The uncertainty of a single sampler quantifying the concentration at a single sampling location (sampling repeatability) was estimated at 3.77% at 95% confidence (using the equation $196 \times s_{\text{within}}/\bar{x}$, where $\bar{x}$ is the mean measured concentration at the location). For multiple samplers (sampling reproducibility) this uncertainty was the same (3.77%) as there was no extra variance between-samplers. These results contrast with those found in analogous situations encountered in collaborative trials in chemical analysis, where inter-laboratory variations tend to be greater than those within a laboratory. This is due to the relatively homogeneous background population of Ba and to all samples being analysed within one laboratory. It can therefore be concluded that the protocol is fit for estimating the background concentrations of barium to within 3.77% of the consensus concentration at this location.
Table 2 Location 3 concentration/μg g−1

Participant number | Sample 1 | Sample 2 | Average concentration/μg g−1 | Absolute bias(a)/μg g−1 |
---|---|---|---|---|
1 | 144 | 148 | 146 | −8 |
2 | 145 | 147 | 146 | −8 |
3 | 145 | 143 | 144 | −10 |
4 | 142 | 145 | 143.5 | −10.5 |
5 | 147 | 144 | 145.5 | −8.5 |
6 | 143 | 152 | 147.5 | −6.5 |
7 | 143 | 144 | 143.5 | −10.5 |
8 | 145 | 145 | 145 | −9 |
9 | 143 | 147 | 145 | −9 |

(a) Where the assigned concentration is 154 ± 11 μg g−1 at 95% confidence.
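The bias column for the background location and the comparison with the assigned value can be reproduced with a short sketch. Two points are assumptions made for illustration, because the text does not state exactly how either was applied: a mean is treated as not significantly different when it lies within the 95% confidence limits of the assigned value, and the −6% analytical bias is corrected multiplicatively. The names `bias_table`, `consensus` and `corrected` are hypothetical.

```python
# Duplicate Ba measurements (ug/g) at location 3, one pair per participant.
location_3 = [(144, 148), (145, 147), (145, 143), (142, 145), (147, 144),
              (143, 152), (143, 144), (145, 145), (143, 147)]

ASSIGNED = 154.0         # assigned concentration, ug/g
HALF_WIDTH = 11.0        # 95% confidence half-width of the assigned value, ug/g
ANALYTICAL_BIAS = -0.06  # relative analytical bias quoted in the text

def bias_table(pairs, assigned, half_width):
    """Per-participant average, bias against the assigned value, and a simple
    within-limits flag (assumed acceptance rule, not taken from the paper)."""
    rows = []
    for participant, (first, second) in enumerate(pairs, start=1):
        average = (first + second) / 2
        bias = average - assigned
        rows.append({
            "participant": participant,
            "average": average,
            "bias": bias,
            "within_limits": abs(bias) <= half_width,
        })
    return rows

rows = bias_table(location_3, ASSIGNED, HALF_WIDTH)

# Consensus mean over all participants (balanced design, so the mean of the
# participant averages equals the mean of all measurements).
consensus = sum(row["average"] for row in rows) / len(rows)

# Assumed multiplicative correction for the analytical bias.
corrected = consensus / (1 + ANALYTICAL_BIAS)
```

Under these assumptions every participant average lies within the limits of the assigned value, the consensus mean is about 145 μg g−1, and correcting for the analytical bias moves it to about 154 μg g−1.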
$$\text{‘Excess cost’} = a(E - i) + b(T - i) \qquad (1)$$
Fig. 2 Schematic diagram showing the false positive and false negative delineations of a hot spot, which are factors influencing the spatial scoring system for the CTS.
Scores for this trial were produced that ranged upwards from zero. A score of zero indicates perfect spatial delineation with no excess cost. A larger score reflects greater ‘excess cost’. The fitness-for-purpose (FFP) criterion of this trial has been set at a score of ⩽3 based on professional judgement. Fig. 3 shows that when the measured area of the hot spot is equal to the assigned area, only a 40% overlap area is required to achieve a satisfactory score (i.e., ⩽3) from equation 1. A score better than required (e.g., 1) could be achieved when the measured hot spot area is the same size as the assigned area, with an overlap of 80%. For site investigations requiring less spatial precision an FFP score of 5 may be acceptable. Such a score could be achieved with the measured hot spot size being three times that of the assigned, with an overlap of 40% for these particular values of a and b.
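The scoring rule of equation 1 can be sketched as a small function. The reading of the symbols used here is an assumption, since they are not defined in this section: E is taken as the measured (estimated) hot spot area, T as the assigned (true) area and i as their overlap, so that E − i is the false positive area and T − i the false negative area. Areas are expressed in units of the assigned area and the overlap as a fraction of the assigned area. The weights a = 1 and b = 4 are hypothetical values chosen only because, under these assumptions, they reproduce the three worked cases quoted above; the actual weights are not given here.

```python
def excess_cost(measured_area, assigned_area, overlap_area, a, b):
    """Weighted sum of false positive and false negative areas (equation 1).

    measured_area  -> E, area delineated as contaminated (assumed meaning)
    assigned_area  -> T, area of the true hot spot (assumed meaning)
    overlap_area   -> i, area common to both (assumed meaning)
    """
    if overlap_area > min(measured_area, assigned_area):
        raise ValueError("overlap cannot exceed either area")
    false_positive = measured_area - overlap_area
    false_negative = assigned_area - overlap_area
    return a * false_positive + b * false_negative

# Hypothetical weights, inferred only from the worked cases in the text.
A_WEIGHT, B_WEIGHT = 1.0, 4.0

same_size_40 = excess_cost(1.0, 1.0, 0.4, A_WEIGHT, B_WEIGHT)  # equal areas, 40% overlap
same_size_80 = excess_cost(1.0, 1.0, 0.8, A_WEIGHT, B_WEIGHT)  # equal areas, 80% overlap
triple_40 = excess_cost(3.0, 1.0, 0.4, A_WEIGHT, B_WEIGHT)     # 3x area, 40% overlap
print(round(same_size_40, 2), round(same_size_80, 2), round(triple_40, 2))
```

With these illustrative weights the three cases score 3, 1 and 5, matching the thresholds described in the text. A larger weight on the false negative term would reflect a higher cost attached to leaving contaminated soil unclassified than to classifying clean soil as contaminated.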
Fig. 3 Graphical demonstration of the fitness-for-purpose scoring system used for the CTS, derived from equation 1. The score for participants varies with the area and percentage overlap measured in comparison to the assigned hot spot. Participants achieving an FFP score of ⩽3 were classed as satisfactory in this CTS. Where the measured hot spot is the same area as the assigned and has an overlap of 40%, a satisfactory score can be achieved.
The results of the spatial delineation of the CTS data using linear interpolation, shown as the solid line in Fig. 4, indicate a greater extent of between-orientation variability (in rows) than within-orientation variability (in columns). Protocol design orientation A shows the greatest variability in spatial delineation between-samplers out of all orientations. Comparisons of the hot spot hits (against the accepted values for each individual sampling location) indicate the varying delineations to be partially a result of heterogeneity within the outer two rings of the hot spot. Participant number 4, orientation A (4A), is one such example, which showed no evidence of contamination (<171 μg g−1 of Ba) at 2 out of 3 locations within the hot spot. The fitness-for-purpose score for each organisation’s sampling designs (given in Fig. 4) was calculated using equation 1. These scores showed that the linear interpolation method indicates satisfactory performance (scores ⩽3) for all but one instance (4A).
Fig. 4 Spatial delineation of hot spots based on measurements made by participants in the CTS. The solid line is delineation based on linear interpolation. When compared with the assigned location of the hot spot (dashed line), the performance scores (given below each map) are mainly satisfactory (score of ⩽3) with one exception (4A). A second method of interpolation (based on joining the nearest uncontaminated sampling locations) shows a satisfactory performance for all participants. This indicates that the protocol is fit for the specified purpose of spatially delineating a single hot spot of contamination with minimal misclassification.
Triangulation was also used to define the edge of the hot spot using the measurements from each participant, to judge the possible effect of this method on the score. The triangulation method assumes a ‘worst case’ scenario in which the soil is contaminated right up to the nearest uncontaminated sampling location (<171 μg g−1 Ba). The advantage of this methodology is that it does not make any assumptions about the spatial distribution of the barium between the sampling locations. The triangulation results were similar to those of the linear interpolation with the exception of case 4A, which was found to be fit-for-purpose in this instance. Performance scores for triangulation were, on average, 25% higher than for linear interpolation, primarily because of a greater proportion of ‘false positive’ classifications.
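The difference between the two delineation methods can be illustrated in one dimension, along the line joining a contaminated and an uncontaminated sampling location. This is a simplified sketch and not the procedure used by the organisers: the function names, the 5 m spacing and the two concentrations are hypothetical, and only the 171 μg g−1 threshold comes from the text.

```python
THRESHOLD = 171.0  # ug/g Ba, contamination threshold quoted in the text

def linear_boundary(pos_hot, conc_hot, pos_clean, conc_clean, threshold=THRESHOLD):
    """Position where a straight-line concentration profile between two
    sampling locations crosses the threshold."""
    if not (conc_hot >= threshold > conc_clean):
        raise ValueError("expected one location at or above and one below the threshold")
    fraction = (conc_hot - threshold) / (conc_hot - conc_clean)
    return pos_hot + fraction * (pos_clean - pos_hot)

def worst_case_boundary(pos_hot, pos_clean):
    """'Worst case' edge: contamination assumed to extend right up to the
    nearest uncontaminated sampling location."""
    return pos_clean

# Hypothetical pair of neighbouring locations 5 m apart.
edge_linear = linear_boundary(0.0, 251.0, 5.0, 151.0)
edge_worst = worst_case_boundary(0.0, 5.0)
print(edge_linear, edge_worst)
```

In this hypothetical case the interpolated edge falls 4 m from the contaminated location, whereas the worst case edge is placed at 5 m. The worst case edge is never closer to the contaminated location than the interpolated one, which is consistent with the larger proportion of ‘false positive’ area reported for triangulation.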
All but one of the protocol designs (Participant 4, design orientation A) had a fitness-for-purpose score of ⩽3, indicating that the herringbone sampling protocol was fit for the purpose of identifying the true hot spot location and dimensions on this RST, with minimal misclassification. There was no significant difference in within-sampler scores compared with between-sampler scores using one-way analysis of variance. However, Fig. 4 shows that a particular protocol orientation does tend to produce a distinctive shape of the measured hot spot. This sampling target was very simple in design when compared with typical contaminated land investigations. The hot spot was perfectly circular and the site was perfectly square and flat, with no obstacles such as building foundations, mounds and trees. This closely corresponds with the idealised model assumed in the theoretical testing of this sampling protocol (Ferguson).17 A potentially more informative approach in the future would be to perform a CTS on a more realistic site with irregular hot spots and typical obstacles such as buildings, trees and topographic irregularities. The approach would then allow assessments of such protocols in more realistic circumstances.
This journal is © The Royal Society of Chemistry 2000