Analytical Methods Committee, AMCTB No. 74

Received 19th May 2016

First published on 5th July 2016

z-Scores were devised to provide a transparent but widely applicable scoring system for participants in proficiency tests for analytical laboratories. The essential idea is to provide an appropriate scaling of the difference between a participant’s result and the ‘assigned value’ for the concentration of the analyte. Interpretation of a z-score is straightforward, but some aspects need careful attention to avoid misconception. Over time, several related scores have been devised to cope with an increasingly diverse range of applications. The main types of score have recently been codified in ISO 13528 (2015).

Proficiency tests are regular interlaboratory studies designed to identify a noteworthy inaccuracy in any participant's result. Wherever possible, results are converted into scores, the purpose of which is to provide a basis for instigating remedial action where necessary. Initially there was a diversity of scoring methods based on different arbitrary transformations of the result. However, it was soon evident that a single straightforward scoring method would allow analysts to interpret a score uniformly across different test materials, analytes, concentration ranges, and measurement principles, even across different proficiency testing schemes.

The proliferation of proficiency testing in the wake of accreditation, however, created the need for some small variations on the z-scoring theme to cope with different applications. As a result, there are now z-scores, z′-scores, zeta (ζ) scores, z_{L}-scores, D-scores and E_{n} scores. Successive authors and documents have used old names for new meanings, and new names and symbols for old meanings. That's confusing. So let's have a quick look at the current state of play, as laid down in ISO 13528 (2015).^{4}

The z-score is calculated as z = (x − x_{pt})/σ_{pt}, where x is the participant's result, x_{pt} is the assigned value and σ_{pt} is the standard deviation for proficiency testing (SDPT). The assigned value is the provider's best available estimate of the true quantity value, often a consensus of the participants' results. An assumption underlying the z-score is that the uncertainty of the assigned value is negligible in comparison with that of the participant's result. The SDPT (originally called the ‘target value’) is best taken as the standard uncertainty that is regarded as optimally fit for purpose in the relevant sector (see AMCTB No. 68), and it must be known to the participants in advance. The ISO standard recognises other options for evaluating the SDPT, but all have one or more practical shortcomings.
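As an illustration, the z-score can be computed as below. This is a minimal Python sketch, not code from ISO 13528: the function names are hypothetical, the z′-score (which adds the standard uncertainty of the assigned value, u(x_{pt}), to the denominator for cases where it is not negligible) is assumed to follow the form given in ISO 13528, and the classification limits of 2 and 3 are the commonly used conventions rather than something stated in this brief. The data are invented.

```python
import math


def z_score(x: float, x_pt: float, sigma_pt: float) -> float:
    """z = (x - x_pt) / sigma_pt; assumes u(x_pt) is negligible."""
    if sigma_pt <= 0:
        raise ValueError("sigma_pt must be positive")
    return (x - x_pt) / sigma_pt


def z_prime_score(x: float, x_pt: float, sigma_pt: float, u_x_pt: float) -> float:
    """z' (assumed ISO 13528 form): the denominator is inflated by u(x_pt)."""
    return (x - x_pt) / math.sqrt(sigma_pt ** 2 + u_x_pt ** 2)


def classify(score: float) -> str:
    """Commonly used limits (an assumption here): 2 and 3."""
    a = abs(score)
    if a <= 2.0:
        return "satisfactory"
    if a < 3.0:
        return "questionable"
    return "unsatisfactory"


# Hypothetical round: assigned value 10.0, SDPT 0.5 (same units as the results)
x_pt, sigma_pt = 10.0, 0.5
for x in (10.4, 11.2, 8.3):
    z = z_score(x, x_pt, sigma_pt)
    print(f"x = {x}: z = {z:+.2f} ({classify(z)})")

# Same questionable result, allowing for u(x_pt) = 0.2 via z'
print(f"z' for x = 11.2: {z_prime_score(11.2, x_pt, sigma_pt, 0.2):+.2f}")
```

For the questionable result (x = 11.2), z is 2.4; allowing for u(x_{pt}) = 0.2 through z′ lowers the score to about 2.2, which shows why z′ is only needed when the uncertainty of the assigned value is not negligible.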

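The zeta score, ζ = (x − x_{pt})/√(u(x)² + u(x_{pt})²), uses the participant's reported standard uncertainty u(x) together with the standard uncertainty of the assigned value. Its magnitude increases either as the deviation from the assigned value increases or as the reported uncertainty gets smaller, so a large zeta score can indicate a large error, an underestimated uncertainty, or both. This ambiguity leaves the zeta score open to improper manipulation, should participants choose to reduce their score by overstating u(x).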
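The manipulation risk is easy to demonstrate. In the minimal sketch below (invented numbers, hypothetical function name), one result is scored twice, once with a realistic u(x) and once with an overstated one. The deviation from the assigned value is unchanged, yet the zeta score falls from about 2.9 to about 1.0.

```python
import math


def zeta_score(x: float, u_x: float, x_pt: float, u_x_pt: float) -> float:
    """zeta = (x - x_pt) / sqrt(u(x)^2 + u(x_pt)^2), using standard uncertainties."""
    return (x - x_pt) / math.sqrt(u_x ** 2 + u_x_pt ** 2)


# Hypothetical participant: same result, two different claimed uncertainties
x, x_pt, u_x_pt = 11.2, 10.0, 0.1
honest = zeta_score(x, u_x=0.4, x_pt=x_pt, u_x_pt=u_x_pt)
inflated = zeta_score(x, u_x=1.2, x_pt=x_pt, u_x_pt=u_x_pt)
print(f"zeta with u(x) = 0.4: {honest:+.2f}")
print(f"zeta with u(x) = 1.2: {inflated:+.2f}")
```

The score alone cannot tell a provider whether a small |ζ| reflects an accurate result or an inflated uncertainty, which is the ambiguity described above.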

E_{n} is essentially similar to the zeta score but replaces the standard uncertainties with expanded uncertainties, U = ku, usually with a coverage factor k = 2. E_{n} scores are therefore about half of the corresponding zeta scores, so a value outside ±1 is usually taken as questionable. E_{n} is used more in calibration laboratories than in analytical laboratories.
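To see why E_{n} is roughly half of ζ, the sketch below computes both for one invented result, assuming the expanded uncertainties are obtained from the standard uncertainties with the same coverage factor k = 2. With a common k, E_{n} is exactly ζ/k; in practice different coverage factors may be used for the result and the assigned value, so the ratio is only approximately one half. Function names are hypothetical.

```python
import math


def en_score(x: float, U_x: float, x_pt: float, U_x_pt: float) -> float:
    """E_n = (x - x_pt) / sqrt(U(x)^2 + U(x_pt)^2), using expanded uncertainties."""
    return (x - x_pt) / math.sqrt(U_x ** 2 + U_x_pt ** 2)


def zeta_score(x: float, u_x: float, x_pt: float, u_x_pt: float) -> float:
    """Same structure as E_n, but with standard uncertainties."""
    return (x - x_pt) / math.sqrt(u_x ** 2 + u_x_pt ** 2)


k = 2.0  # assumed coverage factor, applied to both uncertainties
x, u_x, x_pt, u_x_pt = 11.2, 0.4, 10.0, 0.1
zeta = zeta_score(x, u_x, x_pt, u_x_pt)
en = en_score(x, k * u_x, x_pt, k * u_x_pt)
print(f"zeta = {zeta:.3f}, E_n = {en:.3f}, ratio = {en / zeta:.3f}")
print("E_n outside +/-1" if abs(en) > 1 else "E_n within +/-1")
```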

It is essential to emphasise that interpreting z-scores by reference to the standard normal distribution does not assume that the participants' results in a round are normally distributed. That is a common misconception among statisticians and regulators unfamiliar with proficiency testing. The interpretation of z-scores relies instead on the idea that, if all the laboratories performed similarly and exactly in accordance with the requirement set by the assigned value and the SDPT, their results would be approximately normally distributed with mean x_{pt} and standard deviation σ_{pt}. z-Scores would then follow a normal distribution with zero mean and unit standard deviation. Notice that this does not assume that the actual participant results are normally distributed, only that idealised performance from all participants would have led to a standard normal distribution of scores. So, over time, z-scores compare a participant with the proficiency testing (PT) provider's criterion of good performance.
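The argument can be checked numerically. The sketch below simulates a hypothetical round in which every laboratory performs exactly to the requirement, drawing each result from a normal distribution with mean x_{pt} and standard deviation σ_{pt} (invented values), and confirms that the resulting z-scores behave like a standard normal variable. It illustrates the idealised reference distribution only; it says nothing about how real participants' results are distributed. The function name is hypothetical.

```python
import random
import statistics


def simulate_ideal_round(n_labs, x_pt, sigma_pt, seed=1):
    """z-scores for a hypothetical round in which every lab performs ideally:
    each result is drawn from a normal distribution with mean x_pt and SD sigma_pt."""
    rng = random.Random(seed)
    results = [rng.gauss(x_pt, sigma_pt) for _ in range(n_labs)]
    return [(x - x_pt) / sigma_pt for x in results]


z = simulate_ideal_round(200_000, x_pt=10.0, sigma_pt=0.5)
std_normal = statistics.NormalDist()  # N(0, 1), the reference for z-scores

frac_2 = sum(abs(v) > 2 for v in z) / len(z)
frac_3 = sum(abs(v) > 3 for v in z) / len(z)
print(f"mean z = {statistics.fmean(z):+.3f}, sd z = {statistics.stdev(z):.3f}")
print(f"|z| > 2: observed {frac_2:.4f}, expected {2 * (1 - std_normal.cdf(2)):.4f}")
print(f"|z| > 3: observed {frac_3:.4f}, expected {2 * (1 - std_normal.cdf(3)):.4f}")
```

Under idealised performance, roughly 1 score in 20 falls outside ±2 and roughly 1 in 400 outside ±3, which is the basis for treating such scores as signals.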

A simple and effective long-term view for a participant is provided by plotting successive z-scores on a control chart based on a zero mean and unit standard deviation, either a Shewhart chart or, better still, a zone chart (J-chart) (see AMCTB Nos. 12 and 16).
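The Shewhart option can be sketched as follows. This is a minimal illustration under stated assumptions, not a procedure taken from the cited briefs: the chart has centre 0, warning limits at ±2 and action limits at ±3, and the rules applied (a single score beyond ±3, or two successive scores beyond the same warning limit, is an action signal) are common control-chart conventions assumed here. Function names and the z-score history are hypothetical.

```python
def beyond_warning(z):
    """+1 if z is above the upper warning limit (+2), -1 if below -2, else 0."""
    if z > 2:
        return 1
    if z < -2:
        return -1
    return 0


def shewhart_signals(z_scores):
    """Check successive z-scores against a chart with centre 0, warning limits
    at +/-2 and action limits at +/-3 (assumed conventional rules)."""
    signals = []
    for i, z in enumerate(z_scores):
        side = beyond_warning(z)
        if abs(z) >= 3:
            signals.append((i, "action: beyond an action limit"))
        elif side != 0 and i > 0 and beyond_warning(z_scores[i - 1]) == side:
            signals.append((i, "action: two successive results beyond the same warning limit"))
        elif side != 0:
            signals.append((i, "warning"))
    return signals


# Hypothetical sequence of z-scores from successive rounds
history = [0.4, -1.1, 2.3, 2.6, 0.2, -3.4, 1.0]
for round_index, message in shewhart_signals(history):
    print(f"round {round_index + 1}: z = {history[round_index]:+.1f} -> {message}")
```

Here the third round gives a warning, the fourth an action signal (two successive scores above +2) and the sixth an action signal (below −3). A zone chart would instead accumulate weighted zone scores across rounds; that scheme is not shown.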

Michael Thompson (Birkbeck, University of London)

This Technical Brief was prepared for the Analytical Methods Committee and approved on 14/05/16.

- Analytical Methods Committee, Analyst, 1992, 117, 97–104.
- M. Thompson and R. Wood, Pure Appl. Chem., 1993, 65, 2123–2144.
- M. Thompson, S. L. R. Ellison and R. Wood, Pure Appl. Chem., 2006, 78, 145–196.
- ISO 13528:2015, Statistical methods for use in proficiency testing by interlaboratory comparison.

This journal is © The Royal Society of Chemistry 2016