Limitations of the application of the Horwitz equation

Article in TrAC Trends in Analytical Chemistry · December 2006


DOI: 10.1016/j.trac.2006.11.002

2 authors:

Thomas P J Linsinger Ralf D Josephs


European Commission Bureau International des Poids et Mesures

All content following this page was uploaded by Thomas P J Linsinger on 09 October 2017.



Trends in Analytical Chemistry, Vol. 25, No. 11, 2006 Trends

Thomas P.J. Linsinger, Ralf D. Josephs

Thomas P.J. Linsinger*: EC-JRC, Institute for Reference Materials and Measurements (IRMM), Retieseweg 111, B-2440 Geel, Belgium
Ralf D. Josephs: Bureau International des Poids et Mesures (BIPM), Pavillon de Breteuil, F-92312 Sèvres Cedex, France
* Tel.: +32 14 571 956; Fax: +32 14 571 548; E-mail: thomas.linsinger@ec.europa.eu

We revisit the basic assumptions of the Horwitz equation using the example of mycotoxin assays. Prediction intervals from the Horwitz equation often span one order of magnitude. Including this variation in calculation of Horrat values would lead to a range of values exceeding 2, which is often used as a criterion to assess interlaboratory comparisons, so we question the suitability of Horrat value for this purpose. In addition, available analytical data show significant improvement in reliability over time, which casts serious doubts on the applicability of the Horwitz equation for current analytical methods.

We discuss the use of the Horwitz equation in the analytical laboratory, and conclude that it is not suitable for estimating uncertainties, as required by ISO 17025.

The Horwitz equation can be a valuable summary of historical data of analytical performance. However, it should not be used as a performance criterion due to:
• shortcomings of the basic model;
• uncertainty in the values determined using it; and,
• its incompatibility with accepted methods for the determination of measurement uncertainty, as required by ISO 17025.

We recommend that, instead of using the Horwitz equation, there should be a proper identification of all components of uncertainty of measurement and reasonable estimation, as stipulated by the “Guide to the Expression of Uncertainty in Measurements” (GUM).
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Horrat value; Horwitz equation; Interlaboratory comparison; ISO 17025; Measurement uncertainty; Quality assurance

1. Introduction

In 1980, Horwitz et al. published an evaluation of 1000 interlaboratory comparisons that led them to conclude that there is a fixed relationship between analyte level and reproducibility standard deviation (RSDR) [1,2]. According to this analysis, the relationship between RSDR and the analyte level c is:

$$\mathrm{RSD}_R\,[\%] = 2^{(1 - 0.5 \log_{10} c)} \qquad (1)$$

irrespective of the kind of analyte, matrix or method. Equation (1) has been widely used to assess the quality of interlaboratory comparisons using the Horrat value, which gives a comparison of the actual precision measured with the precision predicted by the Horwitz equation for a measuring method at that particular level of analyte and is calculated as:

$$\mathrm{Horrat} = \mathrm{RSD}_{R,\,\mathrm{measured}} \,/\, \mathrm{RSD}_{R,\,\mathrm{Horwitz\ predicted}} \qquad (2)$$

A Horrat value of 1 indicates satisfactory interlaboratory precision, whereas a value of 2 indicates unsatisfactory precision (i.e. one that is too variable for most analytical purposes or where the variation obtained is greater than that expected for the type of method employed according to Horwitz).

Furthermore, the values predicted from the Horwitz equation have been used in legislation to set acceptance limits for analytical methods (e.g. [3]). It has even been suggested recently to use the results from the Horwitz equation as an estimation of measurement uncertainties [4].

Subsequent analysis of more datasets by Horwitz himself and others [5–7] frequently showed significant deviations from the values predicted by the original Horwitz function.

In this article, we will discuss some basic assumptions of the Horwitz equation and will compare results from mycotoxin interlaboratory studies spread over more than 30 years. Furthermore, we will point to the error made by using regression parameters in the prediction of future events. Finally, we will show why use of the Horwitz equation as a performance criterion contradicts modern practice in quality management and is not compliant with guides to the estimation of measurement uncertainties, such as the Eurachem Guide “Quantifying Uncertainty in Analytical Measurement” [8] and the “Guide to the Expression of Uncertainty in Measurements” (GUM) [9].
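As a minimal sketch of Equations (1) and (2), the two helper functions below evaluate the Horwitz prediction and the Horrat value. The function names are illustrative and do not come from the paper, and the analyte level c is taken as a dimensionless mass fraction (for example 1e-8 for 10 µg/kg).

```python
import math


def horwitz_rsd_percent(c: float) -> float:
    """Equation (1): predicted RSD_R in percent for a mass fraction c.

    c is dimensionless, e.g. 1e-6 for 1 mg/kg or 1e-8 for 10 ug/kg.
    """
    if c <= 0:
        raise ValueError("mass fraction must be positive")
    return 2.0 ** (1.0 - 0.5 * math.log10(c))


def horrat(rsd_measured_percent: float, c: float) -> float:
    """Equation (2): measured RSD_R divided by the Horwitz prediction."""
    return rsd_measured_percent / horwitz_rsd_percent(c)


if __name__ == "__main__":
    # Print the Horwitz prediction for a few illustrative levels.
    for label, c in [("1 mg/kg", 1e-6), ("10 ug/kg", 1e-8), ("0.1 ug/kg", 1e-10)]:
        print(f"{label}: predicted RSD_R = {horwitz_rsd_percent(c):.1f} %")
```

Because the prediction depends only on c, the same measured RSDR gives a different Horrat value at every concentration, which is why the uncertainty of the prediction matters for any criterion built on it.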


Mycotoxin data have been used because the original RSDR values are available in the literature and more subsequent data are available for this group of analytes. However, this does not limit the generality of our conclusions, as data for other analyte groups show similar scatter.

2. Mathematical limitations of the Horwitz equation

In his original paper and in some of the follow-up papers, Horwitz demonstrated that the concentration level is the most important variable in explaining the reproducibility of interlaboratory-comparison studies. However, he never stated that all variation can be explained by this parameter, which is a crucial prerequisite for using the equation as a performance criterion.

Database 1 from Horwitz's paper on mycotoxin assays [5] was transferred to Statistica 7.0 in order to demonstrate this difference. A regression line of RSDR versus $2^{(1-0.5\log_{10} c)}$ was calculated (Fig. 1). The regression parameters were calculated and are shown in Table 1.

Table 1. Regression parameters of the mycotoxin data from [5]

Parameter                                       Value
Slope ± s                                       1.291 ± 0.087
Intercept ± s                                   4.66 ± 3.01
Coefficient of determination (r^2)              0.218
Standard error of the estimation (s_yx)         26.80

Statistical regression reveals that data are homoscedastic and confirms the former statement of Horwitz et al. [5] that the mycotoxin assays in the period 1968–1992 show a higher variability than predicted from the Horwitz equation. The slope of the regression line RSDR vs. $2^{(1-0.5\log_{10} c)}$ is significantly different from 1 on a 95%-confidence level (two-sided t-test), while the intercept is not significantly different from zero. This difference is also visible in Fig. 1, where the line from the classical Horwitz equation is completely outside the 95%-confidence band of the regression of the mycotoxin assays.

However, more revealing is the coefficient of determination of 0.22. This indicates that only 22% of the variance of the RSDR can be explained by the Horwitz equation. Although concentration certainly influences the reproducibility, nearly 80% of the variance remains independent of $2^{(1-0.5\log_{10} c)}$ and is not explained by the Horwitz equation. This poor fit is also reflected in the very broad prediction bands in Fig. 1.

This does not matter as long as the evaluation stopped at the regression (i.e. if the conclusion had been that the parameter “concentration” significantly influences comparability of results). However, the Horwitz equation has been abused to predict RSDR values and to set performance criteria. To illustrate the fallacy of setting regression equal to prediction, RSDR values for three different concentrations were calculated based on the regression parameters shown in Table 1. Furthermore,

[Figure 1: scatter plot “RSDR for mycotoxin assays”. y-axis: RSDR, from -50 to 300; x-axis: 2^(1-0.5 log c), from 0 to 90, with the corresponding concentrations marked at 1 mg/kg, 100 µg/kg, 10 µg/kg, 1 µg/kg and 100 ng/kg.]

Figure 1. Mycotoxin data from [5]. Solid line: regression line for mycotoxins assays; inner dotted line: 95%-confidence interval of the regression
line; outer dotted line: 95%-prediction interval for new observations; dashed line: regression line of the traditional Horwitz equation.
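The lines and bands named in the caption of Figure 1 can be sketched as follows. The slope and intercept are the values reported in Table 1; on the axes of Figure 1 the traditional Horwitz line is simply y = x. The band function implements the usual half-width of a prediction interval for a simple linear regression; its inputs (the fitted x-values and the Student t quantile) are supplied by the caller, and the demonstration values at the bottom are made up purely to exercise the formula, since the individual data points of [5] are not reproduced here.

```python
import math

# Regression parameters reported in Table 1 (mycotoxin data from [5]).
SLOPE = 1.291
INTERCEPT = 4.66


def horwitz_x(c: float) -> float:
    """x-axis of Figure 1: 2^(1 - 0.5 log10 c) for a dimensionless mass fraction c."""
    return 2.0 ** (1.0 - 0.5 * math.log10(c))


def regression_rsd_percent(c: float) -> float:
    """RSD_R (%) on the solid regression line of Figure 1."""
    return INTERCEPT + SLOPE * horwitz_x(c)


def prediction_half_width(x0, xs, s_yx, t_crit):
    """Half-width of the prediction band for one new observation at x0.

    xs     : x-values used in the fit
    s_yx   : standard error of the estimate
    t_crit : two-sided Student t quantile for len(xs) - 2 degrees of freedom,
             passed in by the caller (not computed in this sketch)
    """
    n = len(xs)
    x_mean = sum(xs) / n
    sxx = sum((xi - x_mean) ** 2 for xi in xs)
    return t_crit * s_yx * math.sqrt((n + 1) / n + (x0 - x_mean) ** 2 / sxx)


if __name__ == "__main__":
    for c in (1e-10, 1e-9, 1e-8):
        # The Horwitz line is y = x on these axes.
        print(f"c = {c:g}: regression {regression_rsd_percent(c):.0f} %, "
              f"Horwitz line {horwitz_x(c):.0f} %")
    # Hypothetical x-values, only to show how the band function is called.
    demo_xs = [10.0, 20.0, 30.0, 40.0, 50.0]
    print(prediction_half_width(30.0, demo_xs, s_yx=26.80, t_crit=3.182))
```

The half-width can never fall below t_crit times s_yx, so with a standard error of the estimate of 26.80 the band stays wide at every concentration regardless of how many points are fitted.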


the half-width of the 95%-prediction intervals (PIs) for the respective RSDR values were calculated according to Equation (3):

$$\mathrm{PI} = t_{0.95,\,n-2} \cdot s_{yx} \cdot \sqrt{\frac{n+1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}} \qquad (3)$$

$s_{yx}$    standard error of the estimation
$n$    number of data
$x$    x-value for which the prediction interval is estimated
$\bar{x}$    average x-value of the regression line
$x_i$    individual x-values of the regression line

These prediction limits are the limits between which an RSDR for a given concentration can be expected. Predicted RSDR values and their respective prediction intervals for three different concentrations are shown in Table 2.

Table 2. Calculated RSDR and their confidence limits for mycotoxins

Concentration                  RSDR, predicted ± PI (%)
1 · 10^-10 (0.1 µg/kg)         87 ± 53
1 · 10^-9 (1 µg/kg)            63 ± 52
1 · 10^-8 (10 µg/kg)           46 ± 52

By definition, PIs cannot be higher than their corresponding RSDR values because this would result in negative RSDR values. The occurrence of such cases (e.g., Table 2 for 10 µg/kg) clearly demonstrates the mathematical limitations of the Horwitz approach.

In general, severe deviations from the predicted RSDR value can occur, as shown in Table 2. For a concentration of 15 µg/kg, more than 1 out of 20 intercomparisons will have an RSDR of more than twice the predicted value. Transformed into the Horrat value, 1 out of 20 intercomparisons will have a Horrat value >2, which is generally regarded as unsatisfactory. The situation is even worse for higher concentrations because more than 12% of intercomparisons can be expected to have Horrat values >2 at concentrations above 200 µg/kg.

The situation is apparently better at lower concentrations (e.g. 0.1 µg/kg), where the upper 95%-prediction limit is “only” a factor of 1.6 above the predicted value. However, 0.1 µg/kg corresponds to an x-value of 64 in Fig. 1 and deviations from the predicted curve start to become particularly severe at this concentration level. Only at concentrations below 5.4 µg/kg is the 95%-prediction interval smaller than twice the RSDR predicted. This means that only below this concentration is it statistically valid to state that Horrat values >2 are unsatisfactory. Horrat values >2 can be expected for all higher concentrations. However, the usefulness of predicting an RSDR of 87 ± 52% (0.1 µg/kg) can be doubted.

While the prediction error makes the use of Horrat values to judge the quality of intercomparisons doubtful, it invalidates its use for setting performance criteria, as the predictive uncertainty should be taken into consideration. Two examples demonstrate this:
• rather than stating “at a mass fraction of 10 µg/kg, RSDR should be 46% or better”, the correct statement should be “RSDR should be 46 ± 53%”; and, similarly,
• for a concentration of 200 µg/kg, the correct statement should be “RSDR should be 31 ± 52%”.

The limitations of this approach are obvious from the prediction bands, as shown in these examples. It furthermore suggests that “anything goes”, a view that is unlikely to be taken by any legislator. If this is why the prediction intervals have not been used so far, one should not forget that not mentioning the uncertainty of the prediction does not eliminate it; it is just invisible. The Horwitz equation therefore does not allow method performance to be predicted with any reasonable certainty.

3. Methodological limitations of the Horwitz equation

In addition to the mathematical limitations, the Horwitz approach disregards other crucial prerequisites. Analytes, matrices, methods and time are considered as irrelevant for method reproducibility [10]. The approach is derived from regression analysis of many interlaboratory comparison studies, but is counter-intuitive and is contradicted by everyday experience in the laboratory.

3.1. Influence of analytes

It is a widespread observation that the kind of analyte affects repeatability and reproducibility of a measurement procedure. This is even the case for closely related analytes, such as trace metals. Evaluation of 13 proficiency tests of trace metals in water showed that the reproducibility of As is worse than that of Cd, even if the concentration levels of Cd are on average one order of magnitude below those of As [11]. Horwitz himself discussed the influence of the analyte [12], but this finding was afterwards neglected. Interestingly enough, Horwitz concluded in his paper on mycotoxins [5] that mycotoxin assays are less reproducible than other assays.

Apart from the intrinsic problems connected with some analytes, there can be another influence of the analyte; namely, that the concentration levels differ from analyte to analyte. Proficiency tests focus on analytes in typical concentrations. For the mycotoxins used in [5], this means that concentration levels for aflatoxins were


significantly below those of other mycotoxins, such as deoxynivalenol or zearalenone. Worse precision in the determination of aflatoxins can therefore be an intrinsic problem of aflatoxin determination rather than a concentration effect. This potential correlation between analyte type and analyte concentration casts further doubt on the applicability of the model.

3.2. Influence of methods

As was explicitly pointed out in [5], there is a difference in reproducibility between the various methods. Not surprisingly, measurements based on liquid chromatography (LC) tended to have a better reproducibility than those based on thin-layer chromatography (TLC), which in turn were more reproducible than enzyme-linked immunosorbent assay (ELISA) measurements. This shows that, as expected, the method clearly influences the precision of results, which is in strong contradiction with the basic assumption of the model.

In general, ELISA and TLC methods are used as screening tests for mycotoxins because of poorer performance in comparison with LC or GC methods [13]. A notorious problem of ELISA-based methods is cross-reactivity of analog compounds, resulting in large scatter and significant overestimation. For example, it is well known that ELISA methods used for the determination of the mycotoxin deoxynivalenol (DON) cannot, in principle, distinguish between the naturally occurring mycotoxins DON, 3-acetyl-DON, 15-acetyl-DON, and 3,15-diacetyl-DON, because the mycotoxin antibodies are designed against 3,7,15-triacetyl-DON. These findings were underpinned by interlaboratory studies showing significantly higher results for ELISA methods, when compared with LC or GC results [14].

3.3. Influence of time

The passage of time should not in itself change the accuracy of measurements but analytical equipment changes with time. The combined effect of all these influences can be summarized in the variable “time”. This influence will typically be more pronounced in new fields of analysis, be it new analytes or new concentrations. Improvement in trace analysis can therefore be expected to be larger than for long-established bulk analytical techniques.

The mycotoxin assays based on LC, TLC and ELISA evaluated in [5] span the period 1968–1992, i.e. nearly 25 years. It would be disappointing if the performance of laboratories had not improved over this time. Time can influence reproducibility of assays in many ways. One influence is certainly the availability of reliable reference materials, as shown in [15]. The lack and the quality of calibrator and matrix reference materials were certainly a problem in the early days of mycotoxin analysis. Agreement between various laboratories might therefore have been expected to improve over time.

A second factor is the development of analytical equipment and methodologies. Looking at the database from 1993 [5], a vast number of interlaboratory studies was based on TLC and ELISA, which are today considered as semi-quantitative assays and screening tests at best. It might be expected that the wider use of LC and GC, the development of columns of better quality and more reliable standards would result in an improvement of the reproducibility over time, as also shown by Whitaker et al. [16]. This assumption is confirmed by statistical data. While in 1993, Horwitz et al. [5] concluded that mycotoxin assays tended to show reproducibilities above those predicted by the Horwitz equation, Thompson et al. showed that interlaboratory comparisons after 1997 have significantly better reproducibility than expected from the Horwitz equation, thus refuting an earlier statement that reproducibility does not improve over time [17].

A third factor is the improvement in analytical equipment itself. This improvement results frequently in higher signals that can be determined more accurately. Examples for this are modern diode-array detectors that are significantly more sensitive than older models. For example, this increased sensitivity of instrumentation can be exploited by narrowing the wavelength range that is measured, thus decreasing the influence of interferences. A major improvement in mycotoxin analysis was achieved by the technical advances in the field of LC-mass spectrometry (MS). In recent years very accurate, sensitive and robust LC-tandem mass spectrometry (MS2) methods with electrospray ionization (ESI) or atmospheric pressure chemical ionization (APCI) have been developed for a variety of mycotoxins. These methods do not require any derivatization of the analytes, as GC methods do, and matrix effects are considerably reduced using the multiple reaction monitoring (MRM) mode [18].

In addition, the development of immuno-affinity columns or MycoSep columns for several mycotoxins improved sample clean-up [13,19]; for example, a recent interlaboratory study on aflatoxin B1, B2, G1, G2 mycotoxins in hazelnut paste by immuno-affinity column clean-up with LC-fluorescence detection (FLD) using post-column bromination demonstrated substantially improved RSDRs in the range 6.1–7.0% for total aflatoxins and 7.3–7.8% for aflatoxin B1 at mass fraction levels of 4.0–11.8 µg/kg total aflatoxins [20] (predicted about 45%). Another example of significant improvement in method performance in the field of mycotoxin analysis was demonstrated in a project for the production of certified reference materials (CRMs) for aflatoxin M1 (AfM1) in milk powders [21]. The characterization study on AfM1 in milk powders by different immuno-affinity column clean-ups with LC-FLD methods resulted in even better RSDRs in the range 5.9–4.8% [21] for AfM1 at mass fraction levels of 0.11–0.44 µg/kg (predicted >70%), respectively.
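To put the reported figures side by side with Equation (1), the sketch below computes an upper bound on the Horrat value for the two studies quoted above. The RSDR and mass-fraction ranges are taken from the text; pairing the highest reported RSDR with the smallest Horwitz prediction in the range is an assumption of this sketch, made because the text does not say which RSDR belongs to which level. The predictions computed here come from the original Horwitz equation and need not coincide with the “predicted” figures quoted in the text, whose basis is not restated there.

```python
import math

UG_PER_KG = 1e-9  # 1 ug/kg expressed as a dimensionless mass fraction


def horwitz_rsd_percent(c: float) -> float:
    """Original Horwitz prediction of RSD_R (%) for a mass fraction c."""
    return 2.0 ** (1.0 - 0.5 * math.log10(c))


# (label, reported RSD_R range in %, mass-fraction range in ug/kg), as quoted in the text.
STUDIES = [
    ("total aflatoxins in hazelnut paste [20]", (6.1, 7.0), (4.0, 11.8)),
    ("aflatoxin M1 in milk powder [21]", (4.8, 5.9), (0.11, 0.44)),
]


def largest_horrat(rsd_range, level_range_ug_per_kg):
    """Upper bound on the Horrat value over the stated ranges.

    Assumption: the highest reported RSD_R is compared with the smallest
    Horwitz prediction found at the ends of the level range.
    """
    smallest_prediction = min(
        horwitz_rsd_percent(level * UG_PER_KG) for level in level_range_ug_per_kg
    )
    return max(rsd_range) / smallest_prediction


if __name__ == "__main__":
    for label, rsd_range, levels in STUDIES:
        print(f"{label}: Horrat at most {largest_horrat(rsd_range, levels):.2f}")
```

Even with this deliberately unfavourable pairing, both studies stay far below a Horrat value of 1, which is the point the section makes about methods improving over time.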


4. Measurement uncertainty and the Horwitz equation

As described above, the predicted RSDR value from the Horwitz equation has been prescribed as a criterion for method performance [3]. This means using it as an estimate for measurement uncertainty, as explicitly suggested by Massart et al. [4]. However, this practice is not in line with the GUM [9] and infringes more than one current quality-management practice.

Especially, the contradiction with the estimation of uncertainties of measurements as demanded in section 5.4.6.2 of ISO 17025 [22] is of utmost importance. The basic idea of any uncertainty estimation is to gather information on a particular measurement. In an ideal case, all uncertainty contributions for the particular measurement are evaluated. If measurement uncertainty is evaluated from method-validation data (e.g., as suggested by the Eurachem Guide [8]), the actual measurement must be linked to the average method performance using the normal quality-assurance tools such as quality-control charts. Using the Horwitz equation as an estimate for RSDR means that there is no link whatsoever between the actual measurement and the uncertainty estimation. This means that, even if one accepts RSDR values as estimates for uncertainties, the values from the Horwitz equation are most likely not to be accurate estimates for the measurements in question.

The ideal solution would be identification of all components of uncertainty of measurement and reasonable estimation, as stipulated by the aforementioned standards and guides [8,9,22]. In the field of mycotoxin analysis, considerable improvement in method performances was demonstrated in the project for the production of CRMs for AfM1 in milk powders [21], as already mentioned above. Within the frame of this project, expanded measurement uncertainties for the determination of AfM1 were also assessed for each laboratory by summing the combined standard uncertainties of sample mass, common calibrant, external calibration, precision and recovery at AfM1 mass-fraction levels of 0.1 µg/kg and 0.4 µg/kg. Relative expanded uncertainties in the range 7.7–23.7% (n = 7) and 6.8–19.7% (n = 8) for the lower and higher mass-fraction levels were calculated. The relative expanded uncertainties are significantly better than anticipated from the Horwitz equation and, above all, the approach complies with the requirements of the GUM [9].

It might be argued that using the values from the Horwitz equation constitutes an “expert judgment”, as explicitly foreseen in the GUM. However, the GUM clearly states that uncertainty evaluation requires “detailed knowledge of the measurand and the measurement” (3.4.8). Using data from other methods for other matrices and analytes is not in line with this requirement. Furthermore, it is difficult to argue that, if expected RSDR values are somewhere between 11% and 115%, assuming a value of 63% is equivalent to an expert judgment.

Another incongruity in using the Horwitz equation as an estimate for measurement uncertainty concerns the demand of ISO 17025 that a laboratory must make efforts to determine the goal of the analysis of its customer. The result of a measurement always includes the measurement uncertainty, be it explicit or implicit. It is therefore assumed that the customer knows how accurate the results need to be. Using an average performance, such as the RSDR from the Horwitz equation, corresponds to what is usual, but not to what is necessary and therefore complies with neither ISO 17025 nor other current approaches for judging laboratory performance. Thompson et al. [23] decidedly dismissed the practice of normalizing z-scores to the average performance in the recent IUPAC protocol for proficiency testing, as it does not take into consideration customer needs. This affects even more RSDR values from the Horwitz equation, which correspond to outdated average laboratory performance, as shown above, and which are unrelated to current customer needs so they are no longer valid.

5. Conclusion

We have demonstrated that the fit of the individual data from the Horwitz equation is rather poor. The correlation is not good enough to use the Horwitz equation as a predictive model and confidence levels can exceed the expected values by more than a factor of 2.

In addition, there are technical reservations about the Horwitz equation (i.e. its assumptions that RSDR values do not depend on analyte, matrix, method and/or time are mistaken, as demonstrated by recent interlaboratory studies). The Horwitz equation does not make allowances for the improvement of analytical methods and techniques, so using the Horwitz equation is benchmarking against outdated standards and leads to complacency with results that are not currently state-of-the-art.

Nevertheless, the Horwitz equation is a useful tool to summarize historical measurement performance but the unreliability of its results for a specific problem in question, the disagreement with modern quality management and the need of uncertainty estimation make it unsuitable as a criterion for method performance. Instead of using the Horwitz equation, measurement uncertainties should be identified and estimated according to the GUM approach [9], which is consistent with ISO 17025 [22].


In view of the movement away from specifying particular analytical methods towards specifying method performance criteria, the Codex Committee on Methods of Analysis and Sampling of the Codex Alimentarius Commission is discussing a fitness-for-purpose approach to evaluating methods of analysis [24,25]. This approach would be based on an uncertainty function constructed from precision data inter alia, and is to be judged against the Horwitz equation. This kind of uncertainty function should be reconsidered, because the Horwitz equation does not provide an accurate, traceable and state-of-the-art judgment, as we have shown above.

Acknowledgement

The authors were very saddened to hear about the death of William Horwitz while the present paper was being submitted. An eminent chemist and administrator at the Food and Drug Administration, he was the recipient of many prestigious awards for his work in analytical chemistry and was for many years Executive Director of the Association of Official Analytical Chemists (now AOAC International). We wish to acknowledge his important contribution to analytical food chemistry and his excellent work in the field of food standards.

We also thank Susanna Linsinger for transferring the data from Horwitz's publication into a spreadsheet.

References

[1] W. Horwitz, L.R. Kamps, K.W. Boyer, J. Assoc. Off. Anal. Chem. 63 (1980) 1344.
[2] W. Horwitz, Anal. Chem. 54 (1982) 67A.
[3] European Commission, Commission Decision 2002/657/EC implementing Council Directive 96/23/EC, Off. J. Eur. Commun. L221 (2002) 8.
[4] D.L. Massart, J. Smeyers-Verbeke, Y. Van der Heyden, LC-GC Eur. 10 (2005) 528.
[5] W. Horwitz, R. Albert, S. Nesheim, J. AOAC Int. 76 (1993) 461.
[6] W. Horwitz, R. Albert, J. AOAC Int. 79 (1996) 589.
[7] M. Thompson, Analyst (Cambridge, UK) 125 (2000) 385.
[8] S.L.R. Ellison, M. Rosslein, A. Williams (Editors), EURACHEM/CITAC Guide Quantifying Uncertainty in Analytical Measurement, 2nd Edition, Eurachem, 2000 (http://www.eurachem.ul.pt/guides/QUAM2000-1.pdf).
[9] International Organization for Standardization (ISO), ISO Guide to the Expression of Uncertainty in Measurements, ISO, Geneva, Switzerland, 1995.
[10] W. Horwitz, R. Albert, J. AOAC Int. 89 (1996) 1095.
[11] W. Kandler, Aufbau und Betrieb eines Kontrollprobensystems zur Qualitätssicherung in der Wasseranalytik, PhD Thesis, Vienna University of Technology, Austria, 1999.
[12] W. Horwitz, J. AOAC Int. 84 (2001) 919.
[13] R. Krska, S. Baumgartner, R.D. Josephs, Fresenius J. Anal. Chem. 371 (2001) 285.
[14] R.D. Josephs, R. Schuhmacher, R. Krska, Food Addit. Contam. 18 (2004) 417.
[15] R. Schuhmacher, R. Krska, J. Weingaertner, M. Grasserbauer, Fresenius J. Anal. Chem. 359 (1997) 510.
[16] T. Whitaker, W. Horwitz, R. Albert, S. Nesheim, J. AOAC Int. 79 (1996) 476.
[17] M. Thompson, J.L. Philip, J. AOAC Int. 80 (1997) 676.
[18] F. Berthiller, R. Schuhmacher, G. Buttinger, R. Krska, J. Chromatogr. A 1062 (2005) 209.
[19] R.D. Josephs, R. Krska, Fresenius J. Anal. Chem. 369 (2001) 469.
[20] H.Z. Senyuva, J. Gilbert, J. AOAC Int. 88 (2005) 526.
[21] R.D. Josephs, R. Koeber, T.P.J. Linsinger, A. Bernreuther, F. Ulberth, H. Schimmel, Anal. Bioanal. Chem. 378 (2004) 1190.
[22] International Organization for Standardization (ISO), ISO 17025, General requirements for the competence of testing and calibration laboratories, ISO, Geneva, Switzerland, 2005.
[23] M. Thompson, S. Ellison, R. Wood, Pure Appl. Chem. 78 (2006) 145.
[24] ALINORM 05/28/23, Report of the 26th session of the Codex Committee on Methods of Analysis and Sampling 2005, 28th session of the Codex Alimentarius Commission, Joint FAO/WHO Food Standards Programme, Rome, Italy, 2005 (www.codexalimentarius.net/download/report/636/al28_23e.pdf).
[25] CX/MAS 05/26/4, Proposed draft recommendations on the fitness-for-purpose approach to evaluating methods of analysis, 26th session of the Codex Committee on Methods of Analysis and Sampling, Codex Alimentarius Commission, Joint FAO/WHO Food Standards Programme, Budapest, Hungary, 2005 (ftp://ftp.fao.org/codex/ccmas26/ma26_04e.pdf).

Trends in Analytical Chemistry, Vol. 26, No. 7, 2007 Correspondence

Correspondence
Limitations of the application of
the Horwitz Equation: A rebuttal
Michael Thompson

1. Introduction erable to transform the Equation to show the reproduc-


^H ¼ 0:02c0:8495 .
ibility standard deviation itself, r
I must take issue with the paper ‘‘Limitations of the
application of the Horwitz Equation’’ [1].
The actual limitations of the Equation have been
known to the analytical community for at least 10 years. 2. The statistics
It is still regarded as a useful tool in appropriate cir-
cumstances (i.e. within its known limitations). In the statistical section of the paper, we see simple
However, the critique is founded on an unfortunate regression applied to unsuitable data. First, the depen-
use of regression and a misreading of the literature, so it dent variable (the percentage-reproducibility relative
has the potential to cause confusion for legislators, enforcement agencies, accreditation agencies, proficiency-test providers and practicing analytical scientists. I believe that the 'problems' discussed in the paper are insubstantial and can be seen to be so if we moderate with a little common sense the particular approach to the estimation of uncertainty that seems to underlie the paper. It would be a great loss if the Horwitz function were incorrectly thought to be discredited. Chemical measurement is a difficult enough task and, to conduct it efficiently, we need access to all of the relevant tools that are available.

For the benefit of readers unfamiliar with the topic, the Horwitz Equation generalizes statistics derived from collaborative trials (interlaboratory method-performance studies) in the food-analysis sector. It originated from the empirical observation that reproducibility (interlaboratory) relative standard deviations ($\mathrm{RSD}_R\% = 100\,\hat{\sigma}_R/c$) tended to be around 4% when the analyte-mass fraction was $c = 0.01$ (that is, at a concentration of 1% by mass). Moreover, $\mathrm{RSD}_R\%$ tended to double approximately for every reduction in analyte concentration by a factor of 100. This relationship can be expressed mathematically as $\mathrm{RSD}_R\% = 2^{(1 - 0.5\log_{10} c)}$, although, in many ways it is pref-

standard deviation, $\mathrm{RSD}_R\% = 100\,\hat{\sigma}_R/c$) is constrained non-negative on the low side and contains many outliers on the high side. Second, the independent variable is also a function of the concentration $c$ (i.e. we are considering the regression of $100\,\hat{\sigma}_R/c$ against $2^{(1 - 0.5\log_{10} c)}$), which would automatically give rise to a meaningless correlation even if the $\hat{\sigma}_R$ data were completely unrelated to $c$.

A further problem is that the dataset itself tends to prejudice the outcome: much of the data fall in a concentration range where the Horwitz function is already known to be inapplicable and would not be used. No reliable information about the Horwitz function could therefore be derived from this exercise, and, indeed, the outcome cannot be reconciled with the data (e.g., the calculated prediction limits for $\mathrm{RSD}_R\%$ manifestly do not represent the data: the lower limit is below zero for about half of its range while none of the data is below zero. $\mathrm{RSD}_R\%$ is non-negative by definition.) However, the paper treats the resulting problems as shortcomings of the Horwitz Equation, rather than of the statistical model.
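As a minimal sketch of the Horwitz relationship quoted in the letter's introduction, $\mathrm{RSD}_R\% = 2^{(1 - 0.5\log_{10} c)}$, the following Python snippet evaluates the predicted reproducibility RSD for a few mass fractions. The function name `horwitz_rsd_percent` and the chosen concentrations are illustrative assumptions, not part of the original correspondence.

```python
import math


def horwitz_rsd_percent(c: float) -> float:
    """Predicted reproducibility RSD_R (in %) for a mass fraction c.

    c is dimensionless (e.g. 0.01 for 1% by mass) and must lie in (0, 1].
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must be a mass fraction in (0, 1]")
    return 2.0 ** (1.0 - 0.5 * math.log10(c))


# Illustrative mass fractions: 1% by mass, 100 ppm and 1 ppm.
for c in (1e-2, 1e-4, 1e-6):
    print(f"c = {c:g}: predicted RSD_R% = {horwitz_rsd_percent(c):.1f}")
```

At $c = 0.01$ the function returns about 4%, and each hundredfold reduction in concentration doubles the prediction, which matches the two empirical observations described in the text.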
3. Properties of the Horwitz function

The paper condemns the Horwitz function on the incorrect grounds that it is used unthinkingly by analytical chemists to predict uncertainties.

School of Biological and Chemical Sciences, Birkbeck College (University of London), Malet Street, London WC1E 7HX, UK
E-mail: M.Thompson@bbk.ac.uk

0165-9936/$ - see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.trac.2007.05.008 659
Correspondence Trends in Analytical Chemistry, Vol. 26, No. 7, 2007

To clear up the resulting confusion, we need to be absolutely clear about the previously published knowledge of the Horwitz function, which can be summarized as follows.
1. In the food analysis sector (to which its use is largely restricted), the Horwitz function is a remarkably good predictor of the trend of reproducibility standard deviations in the concentration range of roughly $10^{-8}$ to $10^{-1}$ mass fraction (i.e. from 10 ppb to 10% by mass (the "Horwitz region")).
2. Individual observations within the Horwitz region show considerable deviations from the trend. About half of this variation is the result of random chance brought about by the small number (in statisticians' terms) of laboratories participating in collaborative trials. The remainder is due to systematic effects, arising from the use of specific measurement technologies, and, to some extent, where methods have been used at concentrations near their limits of detection (LODs) where $\mathrm{RSD}_R\%$ values are high by definition.
3. At concentrations below about $10^{-8}$, the trend of results deviates from the Horwitz function and follows a relative standard deviation of 0.20–0.25. (In this range the Horwitz function would predict standard deviations that, if realized, would render analysis futile: the results would be mostly below the LOD (e.g., at a concentration of $10^{-9}$, the function predicts an $\mathrm{RSD}_R\%$ of 45%).)
4. At concentrations above $10^{-1}$, the trend of the results is again for smaller $\mathrm{RSD}_R\%$ than predicted by the function.
5. Some specific measurement technologies can buck the trend in the Horwitz region by showing systematically better or worse standard deviations than predicted.
6. When laboratories are required to achieve a performance better than predicted by the Horwitz function, they can do it.

Given all of the above, only a very naïve analytical chemist would use the Horwitz function as the paper perceives it to be used, namely:
(a) to define fitness for purpose without reference to the customer's requirements;
(b) to estimate uncertainties;
(c) to predict relative standard deviations in specific instances; and,
(d) outside the Horwitz region.

4. The Horwitz function and uncertainty

In contrast with using the Horwitz function for prediction, using it to prescribe standard deviations and uncertainties is not only useful, it is justified rationally. There is a cogent argument that shows that, within appropriate application sectors (e.g., food analysis), the function represents an evolution of analytical methods towards fitness for purpose. Methods that are unnecessarily accurate, and therefore unduly expensive, tend to be discarded in favor of cheaper ones. Those that are not accurate enough, and therefore too frequently lead to financial penalties due to incorrect decisions, are also discarded and replaced by more accurate methods. The outcome is an uncontrived general drift towards fitness for purpose, which is manifested in the food sector (and in the appropriate concentration range) as ... the Horwitz function.

Even without this rationale, it is perfectly justifiable to use the Horwitz function (or, indeed, any function agreed among the stakeholders) as a performance criterion in proficiency testing. It is used there to define in advance an appropriate standard uncertainty for a participant in the proficiency test, not to predict the uncertainty that a participant delivers. This is exclusively the concern of the proficiency-test provider, its advisory committee and its customers, and is not a metrological principle. Naturally, the provider will use a criterion that is relevant to the application sector.

In the same way, the Horwitz function could be a valid way of expressing fitness for purpose in routine analysis, if it were found to be relevant after expert consideration and consultation in that sector. Again, this is not a metrological principle: it is a way of prescribing, not describing, the uncertainty of data.

A similar consideration applies to the use of "Horrat" values to assess the performance of analytical methods in the food sector (the Horrat being the ratio of the observed RSD% to that predicted by the Horwitz function). An average Horrat of 2 from a collaborative trial strongly suggests that the method will not provide results that are fit for purpose. A single value that large could occasionally arise by random chance from a satisfactory method. However, collaborative trials demand a minimum of five test materials, and the probability of the mean of five or more results showing such a high Horrat by chance is very small.

Reiterating a previous statement, only the naïve would use the Horwitz function directly to estimate uncertainty, even in the Horwitz region and certainly not outside it. However, the function does have an important and critical bearing on uncertainty estimates. Metrologists rightly object to analytical scientists using the reproducibility standard deviation $\hat{\sigma}_R$ from a collaborative trial as an unqualified estimate of standard uncertainty. Different laboratories will vary in their precision performance, some better and some worse than the trend indicated by $\hat{\sigma}_R$. More importantly, $\hat{\sigma}_R$ does not include possible error contributions from method bias. This amounts to saying that $\hat{\sigma}_R$ itself underestimates uncertainty, at least in principle. However, in practice, $\hat{\sigma}_R$ interpolated directly from relevant collaborative trials more often exceeds uncertainty estimates based on single-laboratory validation. This is because cause-and-effect ("bottom-up") uncertainty models are notoriously unable to incorporate the variations of unknown origin that give rise to the ubiquitous between-laboratory effect. If the cause cannot be identified, or is not even known to exist, the effect cannot be estimated a priori. In short, most laboratories using the "bottom-up" approach underestimate their uncertainty.

As a consequence, an uncertainty estimate less than $\hat{\sigma}_R$ must be suspect prima facie, unless the laboratory can demonstrate that it has taken exceptional measures to reduce error, as would happen, for example, in a national reference laboratory, so it is always a useful cross-check to compare uncertainty estimates with an interpolated value of $\hat{\sigma}_R$ derived from relevant collaborative trials. However, occasionally, there are no relevant collaborative trial statistics available from which to infer a value for $\hat{\sigma}_R$. In the food-analysis sector, at least, it is worth comparing the estimated standard uncertainty with the known trend of $\hat{\sigma}_R$, which, in the Horwitz region, is $\hat{\sigma}_H$. If the uncertainty estimate is smaller than $\hat{\sigma}_H$, it is possibly incorrect and should at least be further investigated. In such an investigation, z-scores from a proficiency test might help.

Reference

[1] T.P.J. Linsinger, R.D. Josephs, Trends Anal. Chem. 25 (2006) 1125.
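The cross-check proposed at the end of Section 4 of the letter, together with the Horrat ratio it mentions, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the hard limits used for the "Horwitz region" (the letter gives them only as rough bounds of $10^{-8}$ to $10^{-1}$ mass fraction), and the example numbers are all hypothetical, and a flagged result means "investigate further", not "wrong".

```python
import math

# Approximate "Horwitz region" in mass fraction, as quoted in the letter.
# Treating these rough bounds as hard limits is an assumption of this sketch.
HORWITZ_REGION = (1e-8, 1e-1)


def horwitz_rsd_percent(c: float) -> float:
    """Horwitz-predicted reproducibility RSD_R (in %) at mass fraction c."""
    return 2.0 ** (1.0 - 0.5 * math.log10(c))


def horwitz_sigma(c: float) -> float:
    """sigma_H expressed as a mass fraction: (RSD_R% / 100) * c."""
    low, high = HORWITZ_REGION
    if not low <= c <= high:
        raise ValueError("c lies outside the Horwitz region")
    return horwitz_rsd_percent(c) / 100.0 * c


def horrat(observed_rsd_percent: float, c: float) -> float:
    """Ratio of the observed RSD% to the Horwitz-predicted RSD%."""
    return observed_rsd_percent / horwitz_rsd_percent(c)


def uncertainty_needs_review(u: float, c: float) -> bool:
    """True if a standard uncertainty u (mass fraction) is smaller than sigma_H."""
    return u < horwitz_sigma(c)


# Hypothetical example: an analyte at 1 ppm (c = 1e-6).
c = 1e-6
print(f"sigma_H = {horwitz_sigma(c):.2e}")
print(f"Horrat for an observed RSD of 32% = {horrat(32.0, c):.2f}")
print(f"u = 1.0e-7 needs review: {uncertainty_needs_review(1.0e-7, c)}")
```

Raising an error outside the Horwitz region reflects the letter's point that the function should not be used there; a real implementation might instead substitute a sector-specific criterion.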

http://www.elsevier.com/locate/trac 661

Correspondence

Reply to Professor Michael Thompson's rebuttal

Thomas P.J. Linsinger1,*, Ralf D. Josephs2

1 EC-JRC, Institute for Reference Materials and Measurements (IRMM), Retieseweg 111, B-2440 Geel, Belgium
2 Bureau International des Poids et Mesures (BIPM), Pavillon de Breteuil, F-92312 Sèvres Cedex, France
* Corresponding author. Tel.: +32 14 571 956; Fax: +32 14 571 548; E-mail: thomas.linsinger@ec.europa.eu

1. Introduction

We greatly appreciate Professor Thompson giving an overview of the application of the Horwitz Equation today. His contribution is most welcome because it is the appropriate time for discussion of uncertainty estimations and the role and limitations of the Horwitz Equation, especially in the field of food analysis regarding method-performance criteria. In his reaction to our paper [1], Prof. Thompson raises two main arguments:
(1) our statistical approach is wrong, which invalidates our conclusions; and,
(2) the Horwitz Equation can be used as a fitness-for-purpose criterion, provided it is not used naïvely.

2. Our statistical evaluation

We would like to point out that we used the same data that have been previously evaluated by Horwitz et al. [2]. As we explain in our publication, we used mycotoxin data, as the individual $\mathrm{RSD}_R$ values of the intercomparisons are available in the original publication by Horwitz et al., and as more subsequent data are available for this group of analytes. Shortcomings of the dataset itself therefore also affect the original evaluation. Published evaluations for other analytes (e.g., polychlorinated biphenyls (PCBs)) give only graphical representations that show similar scatter. This indicates that our results for mycotoxins are more generally valid. Regarding our evaluation, we believe that Prof. Thompson's comments strengthen, not weaken, our reservations.
(a) The data used (including the outliers) have been regarded as "technically valid" in the publication by Horwitz et al. [2], and other published collections of intercomparisons also show outliers. Removal of outliers does not significantly alter the result of our statistical evaluation. In fact, presence of outliers is to be expected, as mere screening methods (e.g., ELISA) have also been included in the dataset.
(b) Relative standard deviations (RSDs) are not only non-negative by definition, but are also expected to be below 50%. The calculation for RSD assumes normally-distributed data. For RSDs above 50%, there would be a significant probability of obtaining results below 0, as about 95% of results are expected to be in a range of $\bar{x} \pm 2s$, so data from intercomparisons that show RSDs above 50% cannot follow normal distributions, and the RSD values for these intercomparisons are therefore meaningless.

However, such data were used in the original publications, despite the fact that there are these reasons to question the validity of several individual data points. As we have used the same dataset and evaluation principles as the original publication, reservations against our evaluation apply equally to the original evaluation by Horwitz et al.

In his criticism of our statistical approach, Prof. Thompson overlooks that we use statistics only to quantify the obviously large scatter around the regression line and no statistical treatment can decrease this scatter. We are therefore convinced that our statistical evaluation is fit for its purpose, namely to give an impression of the variance around the regression line and thus the low reliability of the estimate from the Horwitz Equation for an individual method.
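The argument in point (b) above can be made concrete with a short worked sketch. Assuming results that follow a normal distribution with a positive mean and a standard deviation equal to the stated RSD times the mean, the probability of a result below zero is $\Phi(-1/\mathrm{RSD})$, where $\Phi$ is the standard normal cumulative distribution function. The function name `prob_negative_result` and the RSD values tried are illustrative assumptions.

```python
import math


def prob_negative_result(rsd_percent: float) -> float:
    """P(result < 0) for normally distributed results with the given RSD (%).

    Assumes a positive mean; the mean then sits 100 / rsd_percent standard
    deviations above zero, so the probability is Phi(-100 / rsd_percent).
    """
    if rsd_percent <= 0.0:
        raise ValueError("rsd_percent must be positive")
    z = 100.0 / rsd_percent  # distance from zero to the mean, in units of s
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Illustrative RSD values, in percent.
for rsd in (25.0, 50.0, 100.0):
    print(f"RSD = {rsd:.0f}%: P(result < 0) = {prob_negative_result(rsd):.4f}")
```

At an RSD of 50% the lower end of the $\bar{x} \pm 2s$ range is exactly zero, and roughly 2% of normally distributed results would be negative; the fraction grows quickly for larger RSDs.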

662 0165-9936/$ - see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.trac.2007.05.009

3. Use of the Horwitz function as fitness-for-purpose criterion and for uncertainty evaluation

Prof. Thompson proposes that the Horwitz Equation is a good fitness-for-purpose criterion. Meaningful evaluation of the fitness for purpose of a method requires a priori definition of performance criteria, which will depend on the purpose of analysis (e.g., protection of human health, or product pricing). In contrast, the Horwitz Equation was developed as an empirical description of $\mathrm{RSD}_R$ values and was only afterwards adopted as a fitness-for-purpose criterion. The observation that accepted analytical methods tend to fulfill the Horwitz criterion is therefore a circular argument.

Prof. Thompson argues that the Horwitz function is an appropriate benchmark for uncertainty estimations. It is of course true that even the most rigorous single-laboratory validation (which, by the way, can also follow a "top-down" approach) needs confirmation of the result, especially confirmation of absence of any effects unaccounted for. The proper way to do this is participation in intercomparisons (as also suggested by Prof. Thompson) or measurement of certified reference materials. If the laboratory result agrees with the target value within the respective uncertainties, the laboratory's uncertainty estimation was correct, regardless of whether this uncertainty was lower or higher than the reproducibility standard deviation. There is therefore no reason to treat uncertainties lower than the reproducibility standard deviation with special suspicion.

4. Inappropriate use of the Horwitz Equation in uncertainty estimations

We do not share Prof. Thompson's optimism that the relationship between the Horwitz Equation and measurement uncertainty estimation will be readily understood. It is clear that uncertainty is the statistical parameter related to a laboratory's analytical measurement results. $\hat{\sigma}_R$ and $\hat{\sigma}_H$ are statistical parameters and functions related to interlaboratory studies. There is a relationship between the two but it is not as straightforward as suggested by the Horwitz Equation.

Our own experience in giving training courses on the subject and in dealing with other laboratories, as well as the scarcity of uncertainty data in the analytical literature, clearly shows that the estimation of measurement uncertainty is not yet very well established and there is great incentive to make inappropriate use of seemingly easy tools for this problem. This is possibly the reason why the estimate from the Horwitz Equation is regularly put forward as the estimate for a reproducibility standard deviation, to be used in turn as an estimate of the measurement uncertainties of results obtained by this method. In addition, one has to bear in mind that the guidelines for uncertainty estimations (e.g., the ISO Guide to the Expression of Uncertainty in Measurement [3]) strongly emphasize that knowledge of one's own measurement is a prerequisite for uncertainty estimation. In our opinion, using reproducibility data from other laboratories, maybe even from other methods, does not fulfill this requirement. Furthermore, we do not see any reason why a laboratory that finds, after having validated a method and having shown the applicability of validation data to real samples, that its uncertainty estimate agrees with published reproducibility data would want to replace its uncertainty estimation with this published reproducibility. Careful expert consideration will therefore eliminate the need to use the Horwitz Equation rather than support its use in measurement uncertainty estimations.

We agree with Prof. Thompson that chemical measurements are not only difficult but they are also expensive and occur in large quantities, especially in the food sector, so experts should avoid prescribing generalized fitness-for-purpose approaches based on unsuitable models and historical data, and should rather rely on thoroughly-validated methods with corresponding uncertainties cross-checked against application-specific criteria.

In our view, Prof. Thompson has not refuted our original conclusions: that the RSDs predicted from the Horwitz Equation cannot be used as an accurate benchmark for measurement uncertainties for a particular method. Furthermore, current practice of quality management and uncertainty estimation is at odds with any use of the Horwitz Equation other than as a summary of historical method performance.

References

[1] T.P.J. Linsinger, R.D. Josephs, Trends Anal. Chem. 25 (2006) 1125.
[2] W. Horwitz, R. Albert, S. Nesheim, J. AOAC Int. 76 (1993) 461.
[3] International Organization for Standardization (ISO), ISO Guide to Expression of Uncertainty in Measurements, ISO, Geneva, Switzerland, 1995.


