Michael Thompson
School of Biological and Chemical Sciences, Birkbeck University of London, Malet Street, London WC1E 7HX, UK. E-mail: m.thompson@bbk.ac.uk
The normal distribution, under its popular name ‘the bell curve’, has attracted adverse criticism in recent years, owing mainly to its prominent appearance in some well-publicised books on socio-economic topics. A number of conclusions in these fields, ranging from the questionable to the spectacularly incorrect, have been drawn from an ill-considered use of the bell curve as a statistical tool. Those application sectors do not directly impinge on chemical measurement, but the toxic fallout has been widespread and is likely to bias the reading public against inferences based on the normal distribution. It is therefore essential that analytical chemists be able to recognise appropriate and inappropriate uses of the normal distribution, and to defend their decisions adequately when working alongside those unfamiliar with measurement and statistics.
More recently, a best-selling book, The Black Swan,2 largely about econometrics, appeared to heap indiscriminate scorn on the bell curve. Chapter 15, for instance, is entitled The Bell Curve, That Great Intellectual Fraud, and the foreword begins ‘Forget everything you heard in college statistics or probability theory’. The trouble with this lurid kind of writing is that it engenders a widespread and broadly-targeted scorn for the normal distribution in general, not just where it is misapplied.
As an outcome of this publicity the normal distribution, as a basis for inference, has become broadly suspect—indeed you might almost say politically incorrect—in the minds of a substantial proportion of the population. That’s not likely to affect analytical chemists directly, but it could have a pernicious effect upon end-users of analytical data who are not statistically-minded—manufacturers, health workers, enforcement agencies, lawyers, politicians—by undermining their confidence in inferences based on the normal distribution and thereby laying sound decisions open to undue criticism.
Analytical chemists are most likely to be concerned with the application of the normal distribution to variation in the measurement process itself. Chemical measurement is nearly always undertaken to inform a decision such as “should we take action on this possible non-compliance?”. We need to ask whether inferences supporting such a decision are sound and that, of course, depends amongst other things on whether the normal distribution is both appropriate and correctly used. The answer depends to a degree on the conditions under which replicated measurements are made. Different conditions of replication—repeatability, run-to-run, and reproducibility—have to be considered separately, as shown below (see also AMC Technical Briefs no. 70 (ref. 3)).
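As a hedged illustration of why these conditions of replication must be treated separately (this is not a worked example from the brief; the true values, run structure and all numbers below are invented), a small simulation can show how a within-run (repeatability) variance component and a run-to-run component are estimated from nested replicate data:

```python
import random
import statistics

random.seed(1)

# Invented scenario: true value 50, repeatability sd 1.0 within a run,
# plus an independent run-to-run effect with sd 2.0.
TRUE, S_R, S_RUN = 50.0, 1.0, 2.0
runs = []
for _ in range(200):                    # 200 analytical runs
    run_bias = random.gauss(0, S_RUN)   # run-to-run component
    runs.append([TRUE + run_bias + random.gauss(0, S_R) for _ in range(4)])

n = len(runs[0])                        # replicates per run
run_means = [statistics.mean(r) for r in runs]

# Pooled within-run variance estimates the repeatability component.
s2_within = statistics.mean(statistics.variance(r) for r in runs)

# The variance of run means contains s2_within / n plus the run component.
s2_between_means = statistics.variance(run_means)
s2_run = max(s2_between_means - s2_within / n, 0.0)

print(f"repeatability sd ≈ {s2_within ** 0.5:.2f}")  # near the simulated 1.0
print(f"run-to-run sd    ≈ {s2_run ** 0.5:.2f}")     # near the simulated 2.0
```

The point of the sketch is that the spread of single-run replicates (repeatability) badly understates the spread seen between runs, so the choice of replication condition changes the uncertainty attached to a result.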
In addition, but crucially, variation in replicate measurements on a single object must be sharply distinguished from variation in composition among different objects of the same type. For instance, Fig. 1 shows the variation of the measured results for copper in a single sediment reference material analysed in 158 successive runs of a procedure in one laboratory. The distribution does not differ significantly from the normal distribution. Fig. 2 shows in contrast the results for copper obtained from the analysis of 49305 different samples of stream sediment, a distribution that deviates grossly from a normal distribution (and, for those who might be wondering, it also deviates grossly from a lognormal distribution!). This strongly skewed behaviour is often observed in distributions of concentration of trace constituents in collections of objects of the same kind.
Fig. 2 Results for copper in 49305 samples of sediment, taken at an average density of one per square mile, from the whole of England and Wales. Results above 100 mg kg−1 not shown.
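The contrast between the two figures can be sketched with simulated data (an illustrative assumption only: the parameters, and the choice of a lognormal shape for the survey, are invented rather than fitted to the stream-sediment results):

```python
import random
import statistics

random.seed(2)

# Replicate measurements on ONE object: roughly normal (cf. Fig. 1).
replicates = [random.gauss(25.0, 1.5) for _ in range(500)]

# Concentrations ACROSS many objects of the same kind: strongly
# right-skewed, simulated here as lognormal (cf. Fig. 2).
survey = [random.lognormvariate(3.0, 0.8) for _ in range(5000)]

def skewness(x):
    """Sample skewness (population-sd normalisation)."""
    m, s = statistics.mean(x), statistics.pstdev(x)
    return sum((v - m) ** 3 for v in x) / (len(x) * s ** 3)

print(f"replicates skew ≈ {skewness(replicates):+.2f}")  # near zero
print(f"survey skew     ≈ {skewness(survey):+.2f}")      # strongly positive
```

A near-zero skewness is consistent with the normal model for replicate measurement error, while the large positive skewness flags a distribution for which normal-based inference about individual samples would be misleading.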
For analysts, perhaps the most familiar encounter with the normal assumption under repeatability conditions is in finding confidence limits for a mean of several results (and related calculations such as p-values for significance tests) via the t-distribution. This usage arises in questions such as ‘does this instrument need recalibration?’, ‘are my results biased?’ or ‘does this driver show a prohibited level of ethanol?’. Here we are on much firmer ground, because means derived from non-normal distributions (unless they are really weird) will tend towards the normal, even when based on small numbers of observations. Even so, we should be wary of over-interpreting small probabilities in the tails of the normal distribution. The purpose of significance testing is to warn us against making unsound inferences, not to calculate tiny probabilities from insufficient data. So with four results, 95% confidence limits (or p-values down to about 0.05) will be fairly safe. A p-value lower than 0.01, however, should be regarded as providing no better than an order-of-magnitude indication of probability, if that. (And, of course, you have to remember what p-values mean exactly, but that’s another story!).
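Both points can be sketched numerically (the four ‘results’ below are invented example data, not taken from the text): first, that means of even four draws from a markedly skewed parent are already much more symmetric than the parent, and second, how t-based 95% confidence limits are computed for a mean of four results.

```python
import random
import statistics

random.seed(3)

# A deliberately skewed parent distribution (exponential, skewness = 2).
parent = [random.expovariate(1.0) for _ in range(20000)]

# Means of four observations: skewness falls towards zero (here to about 1),
# which is why t-based limits for small-n means are reasonably safe.
means4 = [statistics.mean(random.sample(parent, 4)) for _ in range(5000)]

def skewness(x):
    m, s = statistics.mean(x), statistics.pstdev(x)
    return sum((v - m) ** 3 for v in x) / (len(x) * s ** 3)

print(f"parent skew     ≈ {skewness(parent):.2f}")
print(f"means-of-4 skew ≈ {skewness(means4):.2f}")

# 95% confidence limits for a mean of four results via the t-distribution.
results = [10.2, 10.5, 9.9, 10.4]   # invented example data
m = statistics.mean(results)
s = statistics.stdev(results)
t_crit = 3.182                      # t(0.975, df = 3), tabulated value
half = t_crit * s / len(results) ** 0.5
print(f"mean = {m:.2f}, 95% CI = ({m - half:.2f}, {m + half:.2f})")
```

Note that the wide critical value (3.182 rather than 1.96) already reflects the small sample; the caution in the text is about trusting far smaller tail probabilities than the data can support.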
The proficiency test provider must cope with this variety of outcomes using experience and judgement in the selection of the statistical tools best suited to find a consensus, which may be a median, robust mean or mode. Accredited proficiency testing schemes, of course, will have access to statistical experts who can make the appropriate decisions and defend their choices. For participants in a proficiency test, however, the question of normality of the results does not arise—the interpretation of z-scores does not depend on an assumption that the participants’ results in a round follow the normal distribution (see AMC Technical Briefs no. 74 (ref. 4)).
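A minimal sketch of why a robust consensus is used, and how participants’ z-scores follow from it (all values invented; the median is only one of the consensus choices mentioned above, and the fitness-for-purpose standard deviation σp is an assumed scheme parameter):

```python
import statistics

# Hypothetical proficiency-test round with one gross outlier (invented data).
results = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]

mean_all = statistics.mean(results)     # dragged upwards by the outlier
consensus = statistics.median(results)  # robust to the outlier
sigma_p = 0.3                           # assumed fitness-for-purpose sd

# z-score for each participant: (result - consensus) / sigma_p.
z_scores = [(x - consensus) / sigma_p for x in results]
print(f"mean = {mean_all:.2f}, median consensus = {consensus:.2f}")
print("z-scores:", [f"{z:+.1f}" for z in z_scores])
```

The simple mean is pulled well away from the body of the results by a single outlier, whereas the median consensus leaves the outlying participant with a large z-score — which is exactly the signal the scheme is meant to produce.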
In some statistical applications to chemical measurement it may be necessary to test for deviation from normality, and AMC Technical Briefs no. 82 (ref. 5) in this issue of Analytical Methods provides an account of the available methods.
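As one illustrative possibility (not necessarily among the methods covered in ref. 5), a Jarque–Bera-style statistic checks for deviation from normality through the sample skewness and kurtosis; under normality it behaves approximately as chi-squared with 2 degrees of freedom, so large values signal non-normality:

```python
import random

random.seed(5)

def jarque_bera(x):
    """Jarque-Bera statistic from sample skewness and kurtosis."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((v - m) ** 2 for v in x) / n
    skew = sum((v - m) ** 3 for v in x) / (n * s2 ** 1.5)
    kurt = sum((v - m) ** 4 for v in x) / (n * s2 ** 2)
    # Approximately chi-squared(2) under the normal hypothesis.
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

normal_data = [random.gauss(0, 1) for _ in range(500)]
skewed_data = [random.lognormvariate(0, 1) for _ in range(500)]

print(f"JB, normal sample: {jarque_bera(normal_data):.1f}")  # typically small
print(f"JB, skewed sample: {jarque_bera(skewed_data):.1f}")  # very large
```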
This journal is © The Royal Society of Chemistry 2017