Benchmarking for Analytical Methods: The Horwitz Curve

October 1, 2005
Yvan Vander Heyden, Desiré L. Massart, Johanna Smeyers-Verbeke

LCGC Europe

Volume 18, Issue 10, pages 528–531

Analytical chemists are concerned with the quality of their methods and results. An important question in this context is whether the precision of a newly developed and validated method is up to standard. In other words: is the precision of the newly developed method comparable to what could be expected? This article looks at how the Horwitz equation can answer this. It also describes the results of an extensive study involving 10000 laboratories which indicates that the relative reproducibility approximately doubles for every 100-fold decrease in concentration and that, surprisingly, it does not depend on the type of material or method.

The Horwitz curve indicates the precision to be expected of a newly developed method as a function of the concentration of the analyte. It is named after W. Horwitz, a respected statistician, now retired from the FDA and very active in the AOAC (the Association of Official Analytical Chemists). Before describing the Horwitz curve, let us consider the different types of precision. Precision is a measure of the size of random errors. It describes the dispersion of the results around their mean and is therefore expressed as the standard deviation of the measurement results. Precision can be determined at several levels. Repeatability is measured under repeatability conditions, meaning that the operator, the instrument and the laboratory are the same, and the time interval is kept short. These are the most favourable conditions possible and they yield the best precision (i.e., the smallest standard deviation).

Reproducibility is measured under "conditions where test results are obtained with the same method on identical test material in different laboratories with different operators using different equipment."1 It takes into account many more sources of variation than repeatability does, so these are the least favourable precision conditions that can occur when studying the precision of a method. Reproducibility can be determined only through interlaboratory method performance studies, colloquially known as collaborative trials.

Intermediate situations occur and give rise to an intermediate precision. They take into account more within-laboratory variations than when the precision is measured under repeatability conditions, such as the additional variation due to the measurements being performed over a longer period of time. The intermediate precision can then be seen as a measure of long-term precision in a given laboratory.

A fourth somewhat different level is the determination of robustness (sometimes also called ruggedness). It measures to what extent a procedure is affected by small, deliberate variations introduced in the procedure. If one or more of these variations are found to be responsible for a significant difference in the results, the procedure must be adapted and more strictly controlled. If not, the method is considered robust, but the variations still lead to less precise measurements and robustness can therefore be seen as a measure of the intermediate precision or the reproducibility that might be expected.

ISO uses the symbol r for repeatability and R for reproducibility. Repeatability and reproducibility are measured as the repeatability standard deviation, sr, and the reproducibility standard deviation, sR. For the intermediate precision ISO proposes the symbol I( ), with additional symbols inside the parentheses referring to the intermediate precision conditions. In this way sI(TO), for example, means that the intermediate precision includes variability due to the time elapsed between measurements as well as due to the operator.

Reproducibility as a Function of Concentration

Horwitz et al.2 initially examined the results of a few thousand interlaboratory collaborative studies on various commodities, ranging in concentration from a few percent (salt in foods) to the ppb (ng/g) level (aflatoxin M1 in foods), and also including studies on, for example, drug formulations, antibiotics in feeds and pesticide residues. They concluded that the predicted RSDR (%) as a function of concentration is approximated by the following relationship:

$$ RSD_R(\%) = 2^{(1 - 0.5 \log_{10} C)} \qquad (1) $$

where C is the concentration expressed as a dimensionless fraction (for example, for a concentration of 1 μg/g, C = 10⁻⁶ g/g). In this context the predicted RSDR% is sometimes also written as σH, where the H stands for Horwitz. Equation 1 still holds for the 10000 interlaboratory studies that have been evaluated up to now.3 It states that σH approximately doubles for every 100-fold decrease in concentration, starting at 2% for C = 1. This means, for instance, that when a purity check is performed by determining the concentration of the main component, a relative reproducibility standard deviation of 2% should be expected. Table 1 shows some other expected values of σH. The graphical representation of Equation 1 is referred to as the Horwitz curve and is shown in Figure 1.
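The doubling rule can be checked numerically. The following is a minimal sketch, assuming the form RSDR(%) = 2^(1 − 0.5 log10 C) implied by the description above (2% at C = 1, doubling for every 100-fold decrease); the function name `horwitz_rsd_percent` is chosen here for illustration only.

```python
import math


def horwitz_rsd_percent(c: float) -> float:
    """Predicted RSD_R in percent for a dimensionless mass fraction c (0 < c <= 1)."""
    if not 0 < c <= 1:
        raise ValueError("c must be a mass fraction in the interval (0, 1]")
    return 2 ** (1 - 0.5 * math.log10(c))


# Print the prediction for every 100-fold step in concentration.
for exponent in (0, -2, -4, -6, -8):
    c = 10.0 ** exponent
    print(f"C = 10^{exponent}: predicted RSD_R = {horwitz_rsd_percent(c):.1f} %")
```

Under this assumption the predicted value runs through 2, 4, 8, 16 and 32% as C goes from 1 down to 10⁻⁸ in 100-fold steps.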

Figure 1: Relative reproducibility standard deviation RSDR as a function of concentration.

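Equation (1) can be rewritten as

$$ RSD_R(\%) = 2\,C^{-0.1505} \qquad (2) $$

or as

$$ s_R = 0.02\,C^{0.8495} \qquad (3) $$

where sR is expressed in the same dimensionless units as C.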
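The algebra behind these alternative forms is short. The derivation below is a sketch that assumes Equation 1 has the form RSDR(%) = 2^(1 − 0.5 log10 C) and that sR is expressed in the same dimensionless units as C; the exponents are rounded to four decimals.

```latex
\begin{align*}
RSD_R(\%) &= 2^{\,1 - 0.5\log_{10} C}
           = 2 \cdot 2^{-0.5\log_{10} C} \\
          &= 2 \cdot C^{-0.5\log_{10} 2}
           \approx 2\,C^{-0.1505}
           && \text{since } a^{\log_{10} x} = x^{\log_{10} a} \\
s_R &= \frac{RSD_R(\%)}{100}\, C
     \approx 0.02\,C^{\,1 - 0.1505}
     = 0.02\,C^{0.8495}
\end{align*}
```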

Benchmarking with the HORRAT

A conclusion of Horwitz's study is that the RSDR only depends on the concentration and not on the nature of the analyte, the test material or the analytical method. The ratio between sR, the reproducibility obtained, and σH, the one expected from Equations 1–3, is sometimes called the HORRAT (short for Horwitz ratio).

Horwitz concludes that sR values are suspect when they exceed by more than a factor of 2 what is expected from Equations 1–3, that is, when the HORRAT is larger than 2. When sR is much smaller than σH, it should be suspected that the collaborative study was not performed correctly and that it gives values of sR that are too optimistic; the conditions under which the study was performed and the statistical calculations should then be reviewed. When sR is larger than σH by more than a factor of 2, the method performs worse than could be hoped.
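The benchmark can be sketched in a few lines. The example below assumes the Horwitz prediction RSDR(%) = 2^(1 − 0.5 log10 C) and compares relative standard deviations in percent. The upper limit of 2 comes from the text; the lower limit of 0.5 is an assumption made here to give "much smaller" a number, and the function names are illustrative.

```python
import math


def horwitz_rsd_percent(c: float) -> float:
    """Predicted RSD_R in percent for a dimensionless mass fraction c."""
    return 2 ** (1 - 0.5 * math.log10(c))


def horrat(observed_rsd_percent: float, c: float) -> float:
    """Ratio of the observed RSD_R to the Horwitz prediction at concentration c."""
    return observed_rsd_percent / horwitz_rsd_percent(c)


def assess(h: float, lower: float = 0.5, upper: float = 2.0) -> str:
    """Classify a HORRAT value; `lower` is an assumed threshold, not from the text."""
    if h > upper:
        return "worse than expected"
    if h < lower:
        return "suspiciously good: review the study"
    return "consistent with the Horwitz prediction"


# Example: a collaborative study finds RSD_R = 20 % at 1 ug/g (C = 1e-6).
h = horrat(20.0, 1e-6)
print(f"HORRAT = {h:.2f}: {assess(h)}")
```

In this example the predicted value is 16%, so the HORRAT is 1.25 and the result is in line with what other studies at this concentration level have achieved.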

Table 1: Predicted relative reproducibility standard deviation for some concentrations (in %).

It follows that the participants in an interlaboratory study can use the Horwitz curve to benchmark their result against those obtained in many other such studies at a similar concentration level. One of the more remarkable aspects of the Horwitz curve is its generality. It probably appears strange to the analyst that the reproducibility and (as we will see later) the repeatability do not depend on the analytical method used. One of the reasons is that the method being investigated is studied intensively before an interlaboratory study is undertaken, so that as many sources of variation as possible are kept under control by including the necessary specifications in the procedure to be followed. Thus, methods that are difficult to control, that operate too close to their detection limits, and so on, will not be subjected to interlaboratory studies.

Repeatability as a Function of Concentration

Another interesting result of the Horwitz study is that the corresponding repeatability measure (RSDr) is generally one-half to two-thirds of the reproducibility measure RSDR. These repeatability figures are based on the repeatability of the laboratories participating in the interlaboratory studies that formed the basis of Horwitz's study. They are determined under the strict rules of ISO or the AOAC/IUPAC protocols for such studies. Repeatabilities determined by individual laboratories outside such studies often tend to underestimate the variation and therefore yield too optimistic results.
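The one-half to two-thirds rule translates directly into an expected range for RSDr. This is a minimal sketch, again assuming the Horwitz prediction RSDR(%) = 2^(1 − 0.5 log10 C); the function names are illustrative.

```python
import math


def horwitz_rsd_percent(c: float) -> float:
    """Predicted RSD_R in percent for a dimensionless mass fraction c."""
    return 2 ** (1 - 0.5 * math.log10(c))


def expected_repeatability_range(c: float) -> tuple[float, float]:
    """Range of RSD_r (%) corresponding to one-half to two-thirds of RSD_R."""
    rsd_R = horwitz_rsd_percent(c)
    return rsd_R / 2, 2 * rsd_R / 3


low, high = expected_repeatability_range(1e-6)
print(f"Expected RSD_r at 1 ug/g: {low:.1f} to {high:.1f} %")
```

A within-laboratory RSDr well below the lower end of this range may be a sign that the repeatability was estimated under conditions more favourable than those of a formal interlaboratory study.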

Extensions of the Horwitz Curve

In a later study Horwitz3 showed that, for very low concentrations, the estimates of sR are somewhat better (i.e., lower) than expected from the above equations. He concluded that for such concentrations σH is constant at about 1/3 C. Thompson4 came to a similar conclusion, in the sense that he found an invariant σH of about 1/4 to 1/5 C below 10 ppb. Moreover, he also found a divergence from the Horwitz equation at high C (more than 13%) and proposed the following equation:

Equations (4) and (5) appear to have gained less general acceptance than Equations 1–3.
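To illustrate how such a modified function behaves, the sketch below assumes the piecewise form commonly attributed to Thompson: a constant relative standard deviation of 22% at very low concentrations, the Horwitz function in the middle range, and a square-root dependence at high concentrations. The breakpoints (1.2 × 10⁻⁷ and 0.138) and the coefficients are assumptions taken from that commonly quoted form, not values recoverable from the text above, and the function name `thompson_sigma` is hypothetical.

```python
def thompson_sigma(c: float) -> float:
    """Predicted reproducibility SD, in the same dimensionless units as c.

    Assumed piecewise form (breakpoints and coefficients are assumptions):
      c < 1.2e-7           -> 0.22 * c
      1.2e-7 <= c <= 0.138 -> 0.02 * c**0.8495  (Horwitz function)
      c > 0.138            -> 0.01 * c**0.5
    """
    if not 0 < c <= 1:
        raise ValueError("c must be a mass fraction in the interval (0, 1]")
    if c < 1.2e-7:
        return 0.22 * c
    if c <= 0.138:
        return 0.02 * c ** 0.8495
    return 0.01 * c ** 0.5


for c in (1e-9, 1e-6, 1e-2, 1.0):
    print(f"C = {c:g}: predicted relative SD = {100 * thompson_sigma(c) / c:.1f} %")
```

Under these assumptions the relative standard deviation no longer keeps rising below about 0.1 ppm, and at C = 1 it is 1% rather than the 2% given by the Horwitz curve.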

Uncertainty and the Horwitz Curve

Uncertainty is defined by Eurachem as "a parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand".5 There are different ways of characterizing the dispersion as we will explain in one of the following columns. In this column, we will assume that the parameter in question is a multiple of a standard deviation. It is preferable that this should be a reproducibility standard deviation, although the intermediate precision and the repeatability can also be used in certain cases.

The idea is to create an interval around the analytical result such that there is a 95% certainty that the true value is encompassed in it. This interval, known as the expanded uncertainty, is obtained as result ± 2sR. The value of σH derived from the Horwitz curve can be used as a best guess when sR is not known (yet). When a given maximum level of uncertainty is required for some application, for which the analytical method still has to be developed, the analyst can use σH to evaluate the probability that the method will prove fit for its purpose.
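The use of σH as a first guess can be sketched as follows. The example assumes the Horwitz prediction in its absolute form, σH = 0.02 C^0.8495 written here as (RSDR%/100) · C, and uses the reported result itself as the concentration at which σH is evaluated, which is a simplification. The function names are illustrative.

```python
import math
from typing import Optional


def horwitz_sigma(c: float) -> float:
    """Predicted reproducibility SD (same dimensionless units as c)."""
    rsd_percent = 2 ** (1 - 0.5 * math.log10(c))
    return rsd_percent / 100 * c


def expanded_interval(result: float, s_R: Optional[float] = None,
                      k: float = 2.0) -> tuple[float, float]:
    """Return result -/+ k*s_R; fall back on the Horwitz guess if s_R is unknown."""
    if s_R is None:
        s_R = horwitz_sigma(result)
    half_width = k * s_R
    return result - half_width, result + half_width


def relative_expanded_uncertainty_percent(c: float, k: float = 2.0) -> float:
    """Expanded uncertainty as a percentage of the concentration c."""
    return 100 * k * horwitz_sigma(c) / c


low, high = expanded_interval(1e-6)  # result of 1 ug/g, s_R not yet known
print(f"Interval: {low:.2e} to {high:.2e}")
print(f"Relative expanded uncertainty: "
      f"{relative_expanded_uncertainty_percent(1e-6):.0f} %")
```

For a result of 1 μg/g this gives an interval of roughly 0.68 to 1.32 μg/g, or a relative expanded uncertainty of about 32%. If the application requires, say, 20% at this level, the analyst knows in advance that a method performing like the typical method in Horwitz's data set is unlikely to be fit for purpose.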

Column editor Desiré L. Massart is an emeritus professor at the Vrije Universiteit Brussel, Belgium and performs research on chemometrics in process analysis and its use in the detection of counterfeiting products or illegal manufacturing processes.

Johanna (An) Smeyers-Verbeke is a professor at the Vrije Universiteit Brussel, Belgium and is head of the department of analytical chemistry and pharmaceutical technology.

Yvan Vander Heyden is a professor at the same university and heads a research group on chemometrics and separation science.

References

1. ISO 5725-1:1994, Accuracy (trueness and precision) of measurement methods and results – Part 1.

2. W. Horwitz, L.R. Kamps and K.W. Boyer, J. Assoc. Off. Anal. Chem., 63, 1344 (1980).

3. W. Horwitz, J. Assoc. Off. Anal. Chem., 86, 109 (2003).

4. M. Thompson, AMC Technical Brief no. 17 (2004). http://www.rsc.org/pdf/amc/brief17.pdf

5. Eurachem, Quantifying uncertainty in analytical measurement, 2nd ed. (2000). http://www.measurementuncertainty.org