Benchmarking for Analytical Methods: The Horwitz Curve

October 1, 2005

By Desiré L. Massart, Johanna Smeyers-Verbeke and Yvan Vander Heyden

LCGC Europe

Volume 18

Issue 10

Pages: 528–531

Analytical chemists are concerned with the quality of their methods and results. An important question in this context is whether the precision of a newly developed and validated method is up to standard. In other words: is the precision of the newly developed method comparable to what could be expected? This article looks at how the Horwitz equation can answer this. It also describes the results of an extensive evaluation of some 10000 interlaboratory studies, which indicates that the relative reproducibility standard deviation approximately doubles for every 100-fold decrease in concentration and that, surprisingly, it does not depend on the type of material or method.

The Horwitz curve gives an indication of the precision to be expected of a newly developed method as a function of the concentration of the analyte. It is named after W. Horwitz, a respected statistician, now retired from the FDA and very active in the AOAC, the Association of Official Analytical Chemists. Before describing the Horwitz curve, let us consider some different types of precision. Precision is a measure of the size of random errors. It measures the dispersion around the mean result and therefore requires the calculation of the standard deviation of the measurement results. Precision can be determined at several levels. Repeatability is measured under repeatability conditions, meaning that the operator, the instrument and the laboratory are the same, and the time interval is kept short. These are the most favourable conditions possible and they yield the best precision (i.e., the smallest standard deviation).

Reproducibility is measured under reproducibility conditions, defined as "conditions where test results are obtained with the same method on identical test material in different laboratories with different operators using different equipment."¹ It takes into account many more sources of variation than repeatability does. These are the least favourable conditions that can occur when studying the precision of a method. Reproducibility can be determined only with inter-laboratory method performance studies, colloquially known as collaborative trials.

Intermediate situations occur and give rise to an intermediate precision. They take into account more within-laboratory variations than when the precision is measured under repeatability conditions, such as the additional variation due to the measurements being performed over a longer period of time. The intermediate precision can then be seen as a measure of long-term precision in a given laboratory.

A fourth somewhat different level is the determination of robustness (sometimes also called ruggedness). It measures to what extent a procedure is affected by small, deliberate variations introduced in the procedure. If one or more of these variations are found to be responsible for a significant difference in the results, the procedure must be adapted and more strictly controlled. If not, the method is considered robust, but the variations still lead to less precise measurements and robustness can therefore be seen as a measure of the intermediate precision or the reproducibility that might be expected.

ISO uses the symbol r for repeatability and R for reproducibility. Repeatability and reproducibility are measured as the repeatability standard deviation, s_r, and the reproducibility standard deviation, s_R. For the intermediate precision ISO proposes the symbol s_{I( )}, with additional symbols inside the parentheses referring to the intermediate precision conditions. In this way s_{I(TO)}, for example, means that the intermediate precision includes variability due to the time elapsed between measurements as well as due to the operator.

Reproducibility as a Function of Concentration

Horwitz et al.² initially examined results of a few thousand interlaboratory collaborative studies on various commodities ranging in concentration from a few percent (salt in foods) to the ppb (ng/g) level (aflatoxin M1 in foods) but also including studies on, for example, drug formulations, antibiotics in feeds and pesticide residues. They concluded that the predicted RSD_R (%) as a function of concentration is approximated by the following relationship:

RSD_R (%) = 2^(1 – 0.5 log_10 C)    (1)

where C is the concentration expressed as a dimensionless fraction (for example for a concentration of 1 μg/g, C = 10^–6 g/g). In this context the predicted RSD_R% is sometimes also written as σ_H, where the H stands for Horwitz. Equation 1 still holds for the 10000 interlaboratory studies that have been evaluated up to now.³ It states that σ_H approximately doubles for every 100-fold decrease in concentration, starting at 2% for C = 1. This means, for instance, that when a purity check is performed by determining the concentration of the main component, a relative reproducibility standard deviation of 2% should be expected. Table 1 shows some other expected values of σ_H. The graphical representation of Equation 1 is referred to as the Horwitz curve and is shown in Figure 1.
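As a minimal sketch, predicted values of the kind listed in Table 1 can be recomputed from the relationship in its commonly quoted form, RSD_R (%) = 2^(1 – 0.5 log_10 C). The function name and the selection of concentration levels below are illustrative assumptions, not taken from the original table.

```python
import math


def horwitz_rsd_percent(c: float) -> float:
    """Predicted RSD_R in percent for a concentration c given as a mass fraction."""
    if not 0 < c <= 1:
        raise ValueError("c must be a dimensionless mass fraction in (0, 1]")
    return 2 ** (1 - 0.5 * math.log10(c))


# Illustrative concentration levels (assumed, not the original Table 1 rows).
levels = [("100%", 1.0), ("1%", 1e-2), ("0.01%", 1e-4), ("1 ppm", 1e-6), ("1 ppb", 1e-9)]

for label, c in levels:
    print(f"{label:>6}: {horwitz_rsd_percent(c):5.1f}%")
```

Each 100-fold dilution doubles the prediction, so the first four levels give 2, 4, 8 and 16%, and the 1 ppb level gives roughly 45%.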

Figure 1: Relative reproducibility standard deviation RSD_R as a function of concentration.

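Equation (1) can be rewritten as

RSD_R (%) = 2 C^(–0.1505)    (2)

or as

s_R = 0.02 C^(0.8495)    (3)

where s_R is expressed, like C, as a mass fraction.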
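The step from the exponential form to the power-law forms can be sketched as follows, assuming base-10 logarithms and using the identity a^(log x) = x^(log a). The rounded exponents 0.1505 and 0.8495 are the values usually quoted for the Horwitz relationship.

```latex
\begin{align*}
\mathrm{RSD}_R(\%) &= 2^{\,1 - 0.5\log_{10} C}
                    = 2 \cdot 2^{-0.5\log_{10} C} \\
                   &= 2 \cdot C^{-0.5\log_{10} 2}
                    \approx 2\,C^{-0.1505}, \\[4pt]
s_R &= \frac{\mathrm{RSD}_R(\%)}{100}\,C \approx 0.02\,C^{0.8495}.
\end{align*}
```

Here s_R is the absolute reproducibility standard deviation, expressed in the same dimensionless units as C.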

Benchmarking with the HORRAT

A conclusion of Horwitz's study is that the RSD_R only depends on the concentration and not on the nature of the analyte, the test material or the analytical method. The ratio between s_R, the reproducibility obtained, and σ_H, the one expected from Equations 1–3, is sometimes called the HORRAT (short for Horwitz ratio).

Horwitz concludes that s_R values are suspect when they exceed what is expected from Equations 1–3 by more than a factor of 2, that is, when the HORRAT is larger than 2. When s_R is much smaller than σ_H it should be suspected that the collaborative study was not performed correctly and that it gives values of s_R that are too optimistic. The conditions under which the study was performed and the statistical calculations should be reviewed. When s_R is larger than σ_H by more than a factor of 2, the method performs worse than could be hoped.
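The benchmarking rule can be sketched in a few lines, with both the observed and the predicted value expressed as relative standard deviations in percent. The function names are hypothetical. The upper limit of 2 comes from the text, while the lower limit of 0.5 is an assumption, since the text only says "much smaller".

```python
import math


def horwitz_rsd_percent(c: float) -> float:
    """Predicted RSD_R in percent for a concentration c given as a mass fraction."""
    return 2 ** (1 - 0.5 * math.log10(c))


def horrat(observed_rsd_percent: float, c: float) -> float:
    """Ratio of the observed RSD_R (%) to the Horwitz prediction (%)."""
    return observed_rsd_percent / horwitz_rsd_percent(c)


def interpret(ratio: float, upper: float = 2.0, lower: float = 0.5) -> str:
    """Classify a HORRAT value; `lower` is an assumed cut-off, not from the text."""
    if ratio > upper:
        return "worse than expected"
    if ratio < lower:
        return "suspiciously good: review the study"
    return "in line with expectation"


# Example: a collaborative trial at 1 ppm (C = 1e-6) that found RSD_R = 20%.
ratio = horrat(20.0, 1e-6)
print(f"HORRAT = {ratio:.2f}: {interpret(ratio)}")
```

In this example the prediction at 1 ppm is 16%, so an observed 20% gives a HORRAT of 1.25 and raises no concern.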

Table 1: Predicted relative reproducibility standard deviation for some concentrations (in %).

It follows that the participants in an interlaboratory study can use the Horwitz curve to benchmark their result against those obtained in many other such studies at a similar concentration level. One of the more remarkable aspects of the Horwitz curve is its generality. It probably appears strange to the analyst that the reproducibility and (as we will see later) the repeatability do not depend on the analytical method used. One of the reasons is that the method being investigated is studied intensively before an interlaboratory study is undertaken, so that as many sources of variation as possible are kept under control by including the necessary specifications in the procedure to be followed. Thus, methods that are difficult to control, operate too close to detection limits, etc., will not be subjected to interlaboratory studies.

Repeatability as a Function of Concentration

Another interesting result of the Horwitz study is that the corresponding repeatability measure (RSD_r) is generally one-half to two-thirds of the reproducibility measure RSD_R. These repeatability figures are based on the repeatability of the laboratories participating in the interlaboratory studies that formed the basis of Horwitz's study. They are determined under the strict rules of ISO or the AOAC/IUPAC protocols for such studies. Repeatabilities determined by individual laboratories outside such studies often tend to underestimate the variation and therefore yield too optimistic results.
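As a rough illustration, the range in which RSD_r would be expected at a given concentration can be derived from the Horwitz prediction for RSD_R. This is a sketch that simply applies the one-half and two-thirds factors mentioned above, and the function names are hypothetical.

```python
import math


def horwitz_rsd_percent(c: float) -> float:
    """Predicted RSD_R in percent for a concentration c given as a mass fraction."""
    return 2 ** (1 - 0.5 * math.log10(c))


def expected_repeatability_range(c: float) -> tuple[float, float]:
    """Lower and upper expected RSD_r (%), taken as 1/2 and 2/3 of the predicted RSD_R."""
    rsd_R = horwitz_rsd_percent(c)
    return 0.5 * rsd_R, (2.0 / 3.0) * rsd_R


low, high = expected_repeatability_range(1e-6)  # 1 ppm
print(f"Expected RSD_r at 1 ppm: {low:.1f}% to {high:.1f}%")
```

A within-laboratory RSD_r well below the lower bound would be a hint that the in-house estimate is too optimistic.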

Extensions of the Horwitz Curve

In a later study Horwitz³ showed that, for very low concentrations, the estimates of s_R are somewhat better (i.e., lower) than expected from the above equations. He concluded that for such concentrations σ_H is constant at about 1/3 C. Thompson⁴ came to a similar conclusion, in the sense that he finds an invariant σ_H of about 1/4 to 1/5 C below 10 ppb. Moreover, he also found a divergence from the Horwitz equation at high C (more than 13%) and proposed the following equation:

Equations (4) and (5) appear to have gained less general acceptance than Equations 1–3.

Uncertainty and the Horwitz Curve

Uncertainty is defined by Eurachem as "a parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand".⁵ There are different ways of characterizing the dispersion as we will explain in one of the following columns. In this column, we will assume that the parameter in question is a multiple of a standard deviation. It is preferable that this should be a reproducibility standard deviation, although the intermediate precision and the repeatability can also be used in certain cases.

The idea is to create an interval around the analytical result such that there is a 95% certainty that the true value is encompassed in it. This interval, known as the expanded uncertainty, is obtained as result ± 2s_R. The value of σ_H derived from the Horwitz curve can be used as a best guess when s_R is not known (yet). When a given maximum level of uncertainty is required for some application, for which the analytical method still has to be developed, the analyst can use σ_H to evaluate the probability that the method will prove fit for its purpose.
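A minimal sketch of this use of σ_H, assuming that s_R is not yet known and is replaced by the Horwitz prediction: the function names, the coverage factor argument and the fitness check are illustrative assumptions, and the check is only a crude screen, not a formal uncertainty evaluation.

```python
import math


def horwitz_rsd_percent(c: float) -> float:
    """Predicted RSD_R in percent for a concentration c given as a mass fraction."""
    return 2 ** (1 - 0.5 * math.log10(c))


def expanded_uncertainty_interval(result: float, k: float = 2.0) -> tuple[float, float]:
    """Interval result +/- k * s_R, with s_R estimated from the Horwitz prediction.

    `result` is the measured concentration as a mass fraction.
    """
    s_R = horwitz_rsd_percent(result) / 100.0 * result
    return result - k * s_R, result + k * s_R


def plausibly_fit_for_purpose(c: float, max_relative_u_percent: float, k: float = 2.0) -> bool:
    """True if k * sigma_H (%) does not exceed the required relative expanded uncertainty."""
    return k * horwitz_rsd_percent(c) <= max_relative_u_percent


low, high = expanded_uncertainty_interval(1e-6)  # a result of 1 ug/g
print(f"Interval: {low:.2e} to {high:.2e}")
print("Fit for a 25% requirement:", plausibly_fit_for_purpose(1e-6, 25.0))
```

At 1 μg/g the predicted σ_H is 16%, so the interval spans roughly ±32% of the result and a requirement of 25% relative expanded uncertainty would probably not be met.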

Column editor Desiré L. Massart is an emeritus professor at the Vrije Universiteit Brussel, Belgium, and performs research on chemometrics in process analysis and its use in the detection of counterfeit products or illegal manufacturing processes.

Johanna (An) Smeyers-Verbeke is a professor at the Vrije Universiteit Brussel, Belgium and is head of the department of analytical chemistry and pharmaceutical technology.

Yvan Vander Heyden is a professor at the same university and heads a research group on chemometrics and separation science.

References

1. ISO 5725-1:1994, Accuracy (trueness and precision) of measurement methods and results – Part 1.

2. W. Horwitz, L.R. Kamps and K.W. Boyer, J. Assoc. Off. Anal. Chem., 63, 1344 (1980).

3. W. Horwitz, J. Assoc. Off. Anal. Chem., 86, 109 (2003).

4. M. Thompson, AMC Technical Brief no. 17 (2004). http://www.rsc.org/pdf/amc/brief17.pdf

5. Eurachem, Quantifying uncertainty in analytical measurement, 2nd ed. (2000). http://www.measurementuncertainty.org
