
DETECTION STATISTICS PRIMER © 2007 Bruce L. Rosenberg

Abstract

Although the terrorist attacks of September 11th, 2001 have initiated dramatic action on a
number of fronts, there is a need for continuing measured, patient research and development and
scientific test and evaluation in the field of aviation security. Central to aviation security is the
use of statistics to evaluate performance of detection devices. This document briefly explains
detection statistics. It follows a logical progression from hypothesis testing in 2x2 tables,
through marginal counts in testing of screening devices, to the signal-to-noise probabilities used
in determining Receiver Operating Characteristic (ROC) curves. Using two figures, it explains
how detection and false alarm probability curves are produced and combined into an ROC
curve. The human-behavior names given to the four cells in a 2x2 decision table should
aid students' comprehension. A thorough understanding of the statistical concepts herein is
essential for the evaluation of the performance of screening devices.

Introduction

The collection of readings on ROC curves and related issues in human signal detection edited by
Swets, [1], is the basis for much of the material herein. The purpose of this paper is to explain
the underlying rationale for statistical methods used in the analysis of baggage screening device
test data. This document provides a consistent terminology for communicating common concepts
underlying different statistical approaches to the same material. A thorough understanding of
the material herein will provide the tools to approach real-world analysis problems that often
deviate from those in traditional statistics texts.

The Null Hypothesis

Table I shows the null hypothesis testing approach. In each of the four possible outcome cells,
equivalent behavior in a human decision-maker is given a name (SKEPTIC, etc.). Alpha (α) is
the probability of making a Type I error, of saying there is a significant effect when none exists
(a BELIEVER’s error). A typical value for α in analysis of experimental data is 0.05 or one
chance in twenty of saying an effect exists when in fact there is none. Beta (β) is the probability
of a Type II error, of saying there is no effect when in reality one exists (a SKEPTIC’s error). A
typical value for β in analysis of experimental data is 0.10 or one chance in ten of saying there is
no effect when in fact one exists. The choice of names in Table I for the four human decision-
making behaviors is unique to this paper. It clearly labels the outcomes and should aid
students' comprehension.

A frequent source of confusion is the logic of the “null hypothesis,” or H(0). The null
hypothesis states that there is no difference between the control and treatment conditions in an
experiment. It states that their measured values are the same and a “null” or zero difference
exists. If you can reject the null hypothesis (at a given level of alpha significance) then you
accept the alternate hypothesis that the treatment condition differs from the control. Negation of
a negative becomes a positive.
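The reject/accept logic above can be sketched as a minimal one-sample z-test. This is an illustrative assumption, not part of the original text: the function name, means, and sample sizes below are hypothetical.

```python
from statistics import NormalDist

def reject_null(sample_mean, control_mean, sigma, n, alpha=0.05):
    """Return True if H(0) (no difference from control) is rejected at level alpha."""
    z = (sample_mean - control_mean) / (sigma / n ** 0.5)
    # Two-sided test: p-value is the chance of a |z| this large under H(0).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

# A treatment mean far from the control mean rejects H(0)...
print(reject_null(10.5, 10.0, sigma=1.0, n=100))   # True
# ...while a tiny difference does not.
print(reject_null(10.02, 10.0, sigma=1.0, n=100))  # False
```

Rejecting H(0) here corresponds to the BELIEVER/PROVER row of Table I; failing to reject corresponds to the SKEPTIC/DISPROVER row.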

When determining error probabilities, Leach [2], p. 39, teaches us that the null distribution
should be used for determination of the alpha error, whereas the alternate distribution should be


used for determination of the beta error. In [3], Navarro states that the hypothesis testing
approach is preferred to the confidence band approach because it allows estimating both the
alpha and the beta errors.

The term (1-β) gives the statistical “power” of the test, which is the probability of detecting an
effect if one is truly present. It will be shown that (1-β) and α are the probabilities whose z
scores determine d' for the ROC curve.
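This relationship can be sketched numerically. The `d_prime` helper below is a hypothetical illustration using Python's standard-library normal distribution, not code from the original paper:

```python
from statistics import NormalDist

def d_prime(p_detect, p_false_alarm):
    """d' is the difference between the z scores of P(d) = 1-beta and P(fa) = alpha."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(p_detect) - z(p_false_alarm)

# With the typical alpha = 0.05 and beta = 0.10 cited in the text:
print(round(d_prime(0.90, 0.05), 2))  # 2.93
```

That is, a test with the conventional 0.90 power and 0.05 significance level corresponds to a separation of about 2.93 standard deviations.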

2x2 Contingency Table With Event Counts

In [1], page 652, Elliott presents notation and a 2x2 table for signal detection analysis. Based on
the counts of events falling into the four cells, she defines the probability that the receiver reports
a signal when a signal is present (Psn(A), a true positive) and the probability that the receiver
reports a signal when only noise is present (Pn(A), a false positive). TABLE II shows an alarm-
by-threat contingency table that is consistent with the arrangement in the hypothesis-testing table
(TABLE I). The counts in TABLE II are represented as: C(alarm|threat), C(no alarm|threat),
C(alarm|no threat), and C(no alarm|no threat). C(alarm|threat) is to be read as, “the count of
events in which the system alarmed, given a threat was present.”

Following [1], the probability of a true positive (1-β, TABLE I) is defined as:
Psn(A) = C(alarm|threat)/(C(alarm|threat)+C(no alarm|threat)). (1)
The “sn” in Psn(A) above indicates the presence of signal plus noise.

Further, the probability of a false positive (α in TABLE I) is defined as:

Pn(A) = C(alarm|no threat)/(C(alarm|no threat)+C(no alarm|no threat)). (2)


The “n” in Pn(A) above indicates the presence of noise alone. The two denominators of the
above probability definitions are the column totals shown in TABLE II of C(threat) and C(no
threat), respectively.
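Equations (1) and (2) can be sketched directly in code. The counts below are hypothetical test data, not results from any actual screening trial:

```python
# Hypothetical event counts for the four cells of TABLE II.
counts = {
    ("alarm", "threat"): 90,
    ("no alarm", "threat"): 10,
    ("alarm", "no threat"): 30,
    ("no alarm", "no threat"): 70,
}

# Equation (1): Psn(A) = C(alarm|threat) / C(threat), the left column total.
p_sn = counts[("alarm", "threat")] / (
    counts[("alarm", "threat")] + counts[("no alarm", "threat")])

# Equation (2): Pn(A) = C(alarm|no threat) / C(no threat), the right column total.
p_n = counts[("alarm", "no threat")] / (
    counts[("alarm", "no threat")] + counts[("no alarm", "no threat")])

print(p_sn, p_n)  # 0.9 0.3
```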

Contingency Table Showing Probabilities

Elliott [1] then writes: “By means of the computed values of Psn(A) and Pn(A), the appropriate
value of d' may be read from the table.” TABLE III shows probabilities using Elliott’s notation.

In evaluating performance of a detection device, all necessary information is contained in the top
two cells Psn(A) and Pn(A) that denote the probabilities of true detections and false alarms,
respectively. A detection device’s cumulative probability distributions for Psn(A) and Pn(A) are
used to compute its ROC curve.

Receiver Operating Characteristic Curves

ROC theory assumes that the distributions for the null case (alarm with no threat present) and
alternate case (alarm with threat present) are both Gaussian with equal standard deviations. The


threat being present (signal plus noise) shifts the mean of the alternate distribution; but does not
change its spread (variance). The difference between the two distributions is d' (spoken as "d
prime") standard deviations. To be consistent with current FAA practice, in the rest of this
document, P(d) is used instead of Psn(A) and P(fa) is used instead of Pn(A).

In [4], ROC information was determined by table look-up. To provide more accuracy and
flexibility than table look-up, a spreadsheet was programmed to generate idealized probability of
detection (P(d) = 1-β), probability of false alarm (P(fa) = α), and d' (the difference between the
two). Fig. 1 shows the alternate, P(d), and the null, P(fa), cumulative probability distributions
for a d' of 1.0. The curve to the left represents the probability of detection, P(d). The parallel
curve to the right represents the probability of false alarm, P(fa). The X-axis shows the
normalized Z score in standard deviation units.

Moving to the right along the X-axis is equivalent to increasing the gain of a receiver. As the
gain is increased, the probability of detection increases along with the probability of false alarm.
The Y-axis shows the detection or false alarm probability. The value of d' for Fig. 1 is 1.0
standard deviation. Three horizontal lines of length corresponding to a d' value of 1.0 standard
deviation are drawn between the detection and the false alarm curves to illustrate the fact that d'
is a constant for these two curves. Because the curves have identical cumulative Gaussian
distributions and equal standard deviations, they differ only as to their means, i.e., they are
exactly the same shape and only shifted along the X-axis by the difference between their means.
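The construction of Fig. 1's pair of curves can be sketched as follows, assuming the equal-variance Gaussian model stated above. The shift convention (detection curve to the left of the false alarm curve) follows the figure:

```python
from statistics import NormalDist

# With the detection curve shifted left by d' standard deviations,
# P(d)(z) = Phi(z + d') and P(fa)(z) = Phi(z), where Phi is the
# standard normal cumulative distribution function.
dprime = 1.0
phi = NormalDist().cdf

for z in (-1.5, -0.5, 0.5):
    p_d = phi(z + dprime)   # probability of detection at this gain setting
    p_fa = phi(z)           # probability of false alarm at the same setting
    print(f"z = {z:+.1f}: P(d) = {p_d:.3f}, P(fa) = {p_fa:.3f}")
```

At z = -0.5 this reproduces the figure's marked values of roughly 0.7 and 0.3, and the horizontal separation between the two curves is d' = 1.0 at every probability level.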

Fig. 2 shows ten spreadsheet-generated ROC curves; one for each of ten values of d'. Each ROC
curve was derived from replotting the data from pairs of cumulative curves like the pair shown in
Fig. 1. Instead of Fig. 1’s probability versus Z-score axes, Fig. 2 plots detection probability
versus false alarm probability. Here is how the replotting works. Fig. 1 shows two vertical
arrows on the Z-score = -0.5 grid line pointing to circles on the two curves. The upper arrow
points to a Y value of 0.7 on the detection probability curve. The lower arrow points to a Y
value of 0.3 on the false alarm probability curve. These two points correspond to a single point
on the d' = 1.0 curve of Fig. 2 (shown in bold). This point, (0.3, 0.7) is shown enclosed by a
circle. Thus, an ROC curve is the locus of all points defined by corresponding points on the pair
of cumulative probability curves. Higher values of d' mean better screening device performance.
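The replotting described above can be sketched as follows; `roc_points` is a hypothetical helper, not the original spreadsheet:

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal cumulative distribution

def roc_points(d_prime, z_values):
    """For each gain setting z, pair the false alarm and detection
    probabilities from the two cumulative curves into one ROC point."""
    return [(phi(z), phi(z + d_prime)) for z in z_values]

# The worked example in the text: z = -0.5 on the d' = 1.0 curve
# yields approximately the circled point (0.3, 0.7) of Fig. 2.
p_fa, p_d = roc_points(1.0, [-0.5])[0]
print(round(p_fa, 2), round(p_d, 2))  # 0.31 0.69
```

Sweeping z across its full range traces out one complete ROC curve; repeating the sweep for each value of d' produces a family of curves like Fig. 2.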

Conclusion

The collection of real world event counts is the starting point in evaluation of screening device
performance. This primer explains the logical progression from hypothesis testing, to event
counts, to computation of probabilities, to production of ROC curves. Throughout this process,
one must take care to avoid errors and inappropriate statistical techniques. Rarely do real world
detection and false alarm data have Gaussian distributions and equal variances. There are ways
to perform ROC curve analyses in the presence of deviations from the ideal. Many statistical
techniques are available to the investigator. The universal use of personal computers and
availability of a variety of statistical packages reduces drudgery of computation but requires
increasing sophistication on the part of the analyst. It is hoped that this primer on the statistics
involved in analyzing the performance of screening devices will help further the cause of a
secure aviation system.


TABLE I. Hypothesis Testing 2 x 2 Table

                               In reality, an effect exists    In reality, no effect exists
                               (reject H(0))                   (accept H(0))
                               For probabilities, use the      For probabilities, use the
                               alternate hypothesis            null hypothesis
                               distribution                    distribution

The judgment is that an        PROVER correctly affirms        BELIEVER wrongly affirms
effect exists                  1-β (power)                     α error (typical α = 0.05)

The judgment is that no        SKEPTIC wrongly denies          DISPROVER correctly denies
effect exists                  β error (typical β = 0.10)      1-α

TABLE II. Alarm-By-Threat Event Counts

                               In reality,                     In reality,                     Row Totals
                               a threat is present             no threat is present

The system alarms              True Positive                   False Positive                  C(alarm)
                               C(alarm|threat)                 C(alarm|no threat)

The system does not alarm      False Negative                  True Negative                   C(no alarm)
                               C(no alarm|threat)              C(no alarm|no threat)

Column Totals                  C(threat)                       C(no threat)                    C(total)


TABLE III. Alarm-By-Threat Probabilities

                               In reality,                     In reality,
                               a threat is present             no threat is present

The system alarms              True Positive                   False Positive
                               Psn(A) = 1 - β                  Pn(A) = α

The system does not alarm      False Negative                  True Negative
                               Psn(CA) = β                     Pn(CA) = 1 - α

[Figure: two parallel cumulative Gaussian curves, P(d) on the left and P(fa) on the right,
separated horizontally by d' = 1.0 at three marked heights. X-axis: Z Score (Standard
Deviation Units), -2.5 to 2.5; Y-axis: Probability of Detection and False Alarm, 0 to 0.9.]

Fig. 1. Cumulative Gaussian Distributions, d' = 1.0

[Figure: family of ten ROC curves, one for each d' in 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.5, 2.0,
2.5, and 3.0, with the d' = 1.0 curve in bold and the point (0.3, 0.7) circled. X-axis:
PROBABILITY OF FALSE ALARM (P(fa)), 0.00 to 1.00; Y-axis: PROBABILITY OF
DETECTION (P(d)), 0.00 to 1.00.]

Fig. 2. Idealized ROC Curves for d' = 0.1 to 3.0


References

[1] P. B. Elliott, "Appendix 1 - Tables of d' (d prime)" in Signal Detection and Recognition by
Human Observers: Contemporary Readings, J. Swets, Ed., John Wiley & Sons, NY, 1964, pp.
651-658.

[2] C. Leach, Introduction to Statistics: A Nonparametric Approach for the Social Sciences,
John Wiley & Sons, NY, 1979.

[3] J. Navarro, D. Becker, B. Kenna, and C. Kossack, "A general protocol for operational
testing and evaluation of bulk explosive systems", in Proceedings 1st International Symp. on
Explosive Detection Technology, November 13-15, 1991, DOT/FAA/CT-92/11, May 1992, pp.
347-367.

[4] T. McGhee and J. Connelly, "Developmental test and evaluation of three commercial x-ray
explosives detection devices", Final Report DOT/FAA/AR-97/12,I, June 1997.

About The Author

Bruce L. Rosenberg has a Master's degree in experimental psychology and a Bachelor's degree
in psychology with minors in statistics and mathematics, and electronics training in the US Air
Force. He is a Life Member of the Institute of Electrical and Electronics Engineers (IEEE). He
has a strong technical background in testing of advanced electronic systems, statistics,
programming, and electronic circuit design. He held a Senior Test Engineer position supporting
Federal Aviation Administration (FAA) Aviation Security Laboratory projects (now under the
Department of Homeland Security). From 1969 to 1995, he served as Senior Engineering
Research Psychologist at the FAA W. J. Hughes Technical Center, Atlantic City Airport, Pomona,
NJ. He performed over 40 T&E and R&D studies, designed over 20 test and evaluation protocols
(including questionnaires, surveys, and debriefings), coded over 10 major software applications
for system testing and data analysis, authored over 100 technical reports, taught 14 college-level
courses, and patented 4 inventions.
