The Statistical Sign Test

Author(s): W. J. Dixon and A. M. Mood


Reviewed work(s):
Source: Journal of the American Statistical Association, Vol. 41, No. 236 (Dec., 1946), pp. 557-566
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2280577 .
Accessed: 19/02/2013 17:39

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp.
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of
content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms
of scholarship. For more information about JSTOR, please contact support@jstor.org.

American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journal
of the American Statistical Association.

http://www.jstor.org

This content downloaded on Tue, 19 Feb 2013 17:39:22 PM


All use subject to JSTOR Terms and Conditions
THE STATISTICAL SIGN TEST*
W. J. DIXON
University of Oregon
A. M. MOOD
Iowa State College

This paper presents and illustrates a simple statistical test for judging whether one of two materials or treatments is better than the other. The data to which the test is applied consist of paired observations on the two materials or treatments. The test is based on the signs of the differences between the pairs of observations.
It is immaterial whether all the pairs of observations are comparable or not. However, when all the pairs are comparable, there are more efficient tests (the t test, for example) which take account of the magnitudes as well as the signs of the differences. Even in this case, the simplicity of the sign test makes it a useful tool for a quick preliminary appraisal of the data.
In this paper the results of previously published work on the sign test have been included, together with a table of significance levels and illustrative examples.

INTRODUCTION

In experimental investigations, it is often desired to compare two materials or treatments under various sets of conditions. Pairs of observations (one observation for each of the two materials or treatments) are obtained for each of the separate sets of conditions. For example, in comparing the yield of two hybrid lines of corn, A and B, one might have a few results from each of several experiments carried out under widely varying conditions. The experiments may have been performed on different soil types, with different fertilizers, and in different years with consequent variations in seasonal effects such as rainfall, temperature, amount of sunshine, and so forth. It is supposed that both lines appeared equally often in each block of each experiment so that the observed yields occur in pairs (one yield for each line) produced under quite similar conditions.
The above example illustrates the circumstances under which the sign test is most useful:
(a) There are pairs of observations on two things being compared.
(b) Each of the two observations of a given pair arose under similar conditions.
(c) The different pairs were observed under different conditions.
This last condition generally makes the t test invalid. If this were not the case (that is, if all the pairs of observations were comparable), the t test would ordinarily be employed unless there were other reasons, for example, obvious non-normality, for not using it.

* This paper is an adaptation of a memorandum submitted to the Applied Mathematics Panel by the Statistical Research Group, Princeton University. The Statistical Research Group operated under a contract with the Office of Scientific Research and Development, and was directed by the Applied Mathematics Panel of the National Defense Research Committee.

Even when the t test is the appropriate technique many statisticians like to use the sign test because of its extreme simplicity. One merely counts the number of positive and negative differences and refers to a table of significance values. Frequently the question of significance may be settled at once by the sign test without any need for calculations.
It should be pointed out that, strictly speaking, the methods of this paper are applicable only to the case in which no ties in paired comparisons occur. In practice, however, even when ties would not occur if measurements were sufficiently precise, ties do occur because measurements are often made only to the nearest unit or tenth of a unit, for example. Such ties should be included among the observations with half of them being counted as positive and half negative.
Finally, it is assumed that the differences between paired observations are independent, that is, that the outcome of one pair of observations is in no way influenced by the outcome of any other pair.

PROCEDURE

Let A and B represent two materials or treatments to be compared. Let $x$ and $y$ represent measurements made on A and B. Let the number of pairs of observations be $n$. The $n$ pairs of observations and their differences may be denoted by:

$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n)$$

and

$$x_1 - y_1,\ x_2 - y_2,\ \ldots,\ x_n - y_n.$$

The sign test is based on the signs of these differences. The letter $r$ will be used to denote the number of times the less frequent sign occurs. If some of the differences are zero, half of them will be given a plus sign and half a minus sign.
As an example of the type of data for which the sign test is appropriate, we may consider the following yields of two hybrid lines of corn obtained from several different experiments. In this example $n = 28$ and $r = 7$.

If there is no difference in the yielding ability of the two lines, the positive and negative signs should be distributed by the binomial distribution with $p = \tfrac{1}{2}$. The null hypothesis here is that each difference has a probability distribution (which need not be the same for all differences) with median equal to zero. This null hypothesis will obtain, for instance, if each difference is symmetrically distributed about a mean of zero, although such symmetry is not necessary. The null hypothesis will be rejected when the numbers of positive and negative signs differ significantly from equality.

YIELDS OF TWO HYBRID LINES OF CORN

Experiment   Yield of   Yield of   Sign of
Number          A          B         x-y

    1          47.8       46.1        +
               48.6       50.1        -
               47.6       48.2        -
               43.0       48.6        -
               42.1       43.4        -
               41.0       42.9        -

    2          28.9       38.8        -
               29.0       31.1        -
               27.4       28.0        -
               28.1       27.5        +
               28.0       28.7        -
               28.3       28.8        -
               26.4       26.3        +
               26.8       26.1        +

    4          40.8       41.3        -
               39.8       40.8        -
               42.2       42.0        +
               41.4       42.5        -

    5          38.9       39.1        -
               39.0       39.4        -
               37.5       37.3        +

    6          36.8       37.5        -
               35.9       37.3        -
               33.6       34.0        -

    7          39.2       40.1        -
               39.1       42.6        -

    8          33.3       32.4        +
               30.6       31.7        -
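
The counting step can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name `sign_counts` is hypothetical, the yields are transcribed from the table above, and ties (none occur in these data) are split half plus and half minus as described in the Introduction.

```python
from fractions import Fraction

# (yield of A, yield of B) pairs transcribed from the table of corn yields.
# The value 39.1 in the pair (38.9, 39.1) is an assumed reading of the printed entry.
PAIRS = [
    (47.8, 46.1), (48.6, 50.1), (47.6, 48.2), (43.0, 48.6), (42.1, 43.4), (41.0, 42.9),
    (28.9, 38.8), (29.0, 31.1), (27.4, 28.0), (28.1, 27.5), (28.0, 28.7), (28.3, 28.8),
    (26.4, 26.3), (26.8, 26.1),
    (40.8, 41.3), (39.8, 40.8), (42.2, 42.0), (41.4, 42.5),
    (38.9, 39.1), (39.0, 39.4), (37.5, 37.3),
    (36.8, 37.5), (35.9, 37.3), (33.6, 34.0),
    (39.2, 40.1), (39.1, 42.6),
    (33.3, 32.4), (30.6, 31.7),
]


def sign_counts(pairs):
    """Return (n, r): the number of pairs and the count of the less frequent sign.

    Zero differences are split half plus and half minus, so r may be a
    half-integer when the number of ties is odd.
    """
    plus = sum(1 for x, y in pairs if x > y)
    minus = sum(1 for x, y in pairs if x < y)
    ties = len(pairs) - plus - minus
    half = Fraction(ties, 2)
    return len(pairs), min(plus + half, minus + half)


n, r = sign_counts(PAIRS)
print(n, r)  # prints: 28 7
```

The result agrees with the values $n = 28$ and $r = 7$ quoted in the text.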

Table 1 gives the critical values of $r$ for the 1, 5, 10, and 25 per cent levels of significance. A discussion of how these values are computed may be found in the appendix. A value of $r$ less than or equal to that in the table is significant at the given per cent level.
Thus in the example above where $n = 28$ and $r = 7$, there is significance at the 5% level, as shown by Table 1. That is, the chances are at most 1 in 20 of obtaining a value of $r$ equal to or less than 8 when there is no real difference in the yields of the two lines of corn. It is concluded, therefore, at the 5% level of significance, that the two lines have different yields.
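
For readers who want to check a table entry, the following Python sketch computes the two-sided level of an observed $r$ and the corresponding critical value directly from the binomial distribution with $p = \tfrac{1}{2}$. The function names `two_sided_level` and `critical_r` are hypothetical, and the sketch assumes the two-sided convention used for Table 1.

```python
from math import comb


def two_sided_level(n, r):
    """Chance, when p = 1/2, that the less frequent sign occurs r times or fewer.

    Both tails are counted. For even n and r = n/2 the middle term would be
    counted twice, so the result is capped at 1.
    """
    lower_tail = sum(comb(n, i) for i in range(r + 1))
    return min(1.0, 2 * lower_tail / 2 ** n)


def critical_r(n, alpha):
    """Largest r whose two-sided level does not exceed alpha, or None if there is none."""
    best = None
    for r in range(n // 2 + 1):
        if two_sided_level(n, r) <= alpha:
            best = r
        else:
            break
    return best


print([critical_r(28, a) for a in (0.01, 0.05, 0.10, 0.25)])  # prints: [6, 8, 9, 10]
print(round(two_sided_level(28, 7), 4))                       # prints: 0.0125
```

For $n = 28$ the sketch reproduces the Table 1 row (6, 8, 9, 10), and it places the observed $r = 7$ at a level of about 1.25%, comfortably inside the 5% level.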
In general, there are no values of $r$ which correspond exactly to the levels of significance 1, 5, 10, 25 per cent. The values given are such that they result in a level of significance as close as possible to, but not exceeding, 1, 5, 10, 25 per cent. Thus the test is a little more strict, on the average, than the level of significance which is indicated. For small samples the test is considerably more strict in some cases. For example, the value of $r$ for $n = 12$ for the 10 per cent level of significance actually corresponds to a per cent level less than 5.

TABLE 1
TABLE OF CRITICAL VALUES OF r FOR THE SIGN TEST

     Per Cent Level of            Per Cent Level of
       Significance                 Significance

  n   1   5  10  25        n   1   5  10  25

  1   -   -   -   -       51  15  18  19  20
  2   -   -   -   -       52  16  18  19  21
  3   -   -   -   0       53  16  18  20  21
  4   -   -   -   0       54  17  19  20  22
  5   -   -   0   0       55  17  19  20  22

  6   -   0   0   1       56  17  20  21  23
  7   -   0   0   1       57  18  20  21  23
  8   0   0   1   1       58  18  21  22  24
  9   0   1   1   2       59  19  21  22  24
 10   0   1   1   2       60  19  21  23  25

 11   0   1   2   3       61  20  22  23  25
 12   1   2   2   3       62  20  22  24  25
 13   1   2   3   3       63  20  23  24  26
 14   1   2   3   4       64  21  23  24  26
 15   2   3   3   4       65  21  24  25  27

 16   2   3   4   5       66  22  24  25  27
 17   2   4   4   5       67  22  25  26  28
 18   3   4   5   6       68  22  25  26  28
 19   3   4   5   6       69  23  25  27  29
 20   3   5   5   6       70  23  26  27  29

 21   4   5   6   7       71  24  26  28  30
 22   4   5   6   7       72  24  27  28  30
 23   4   6   7   8       73  25  27  28  31
 24   5   6   7   8       74  25  28  29  31
 25   5   7   7   9       75  25  28  29  32

 26   6   7   8   9       76  26  28  30  32
 27   6   7   8  10       77  26  29  30  32
 28   6   8   9  10       78  27  29  31  33
 29   7   8   9  10       79  27  30  31  33
 30   7   9  10  11       80  28  30  32  34

 31   7   9  10  11       81  28  31  32  34
 32   8   9  10  12       82  28  31  33  35
 33   8  10  11  12       83  29  32  33  35
 34   9  10  11  13       84  29  32  33  36
 35   9  11  12  13       85  30  32  34  36

 36   9  11  12  14       86  30  33  34  37
 37  10  12  13  14       87  31  33  35  37
 38  10  12  13  14       88  31  34  35  38
 39  11  12  13  15       89  31  34  36  38
 40  11  13  14  15       90  32  35  36  39

 41  11  13  14  16       91  32  35  37  39
 42  12  14  15  16       92  33  36  37  39
 43  12  14  15  17       93  33  36  38  40
 44  13  15  16  17       94  34  37  38  40
 45  13  15  16  18       95  34  37  38  41

 46  13  15  16  18       96  34  37  39  41
 47  14  16  17  19       97  35  38  39  42
 48  14  16  17  19       98  35  38  40  42
 49  15  17  18  19       99  36  39  40  43
 50  15  17  18  20      100  36  39  41  43

For $n > 100$, approximate values of $r$ may be found by taking the nearest integer less than $\tfrac{1}{2}n - k\sqrt{n}$, where $k = 1.3, 1, .82, .58$ for the 1, 5, 10, 25 per cent values respectively. A closer approximation to the values of $r$ is obtained from $\tfrac{1}{2}(n-1) - k\sqrt{n+1}$ and the more exact values of $k$: 1.2879, .9800, .8224, .5752.

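
The closer large-sample approximation given in the note to Table 1, $\tfrac{1}{2}(n-1) - k\sqrt{n+1}$, can be sketched in Python as below. The function name `approx_critical_r` and the dictionary `K_EXACT` are hypothetical, and "nearest integer less than" is interpreted here as the largest integer strictly below the computed value.

```python
from math import ceil, sqrt

# The "more exact" values of k quoted in the note to Table 1, keyed by per cent level.
K_EXACT = {1: 1.2879, 5: 0.9800, 10: 0.8224, 25: 0.5752}


def approx_critical_r(n, percent):
    """Approximate critical r from (n - 1)/2 - k * sqrt(n + 1)."""
    value = 0.5 * (n - 1) - K_EXACT[percent] * sqrt(n + 1)
    return ceil(value) - 1  # largest integer strictly below value


print([approx_critical_r(100, pc) for pc in (1, 5, 10, 25)])  # prints: [36, 39, 41, 43]
```

At $n = 100$ the approximation returns the same values as the last row of Table 1, which gives some reassurance before it is applied to larger samples.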
The critical values of $r$ in Table 1 for the various levels of significance were computed for the cases where either the +'s or -'s occur a significantly small number of times. Sometimes the interest may be in only one of the signs. For example, in testing two treatments, A and B, A may be identical with B except for certain additions which can only have the effect of improving B. In this case one would be interested only in whether the deficiency of minus signs (for differences in the direction A minus B) were significant or not. In cases of this kind the per cent levels of significance in Table 1 would be divided by two. Thus, 8 minus signs in a sample of 28 would correspond to the 2.5% level of significance.
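
The halving rule for a one-sided question can be checked with a short Python sketch. The function name `one_sided_level` is hypothetical; it simply sums the lower binomial tail for $p = \tfrac{1}{2}$.

```python
from math import comb


def one_sided_level(n, k):
    """Chance of k or fewer minus signs among n when plus and minus are equally likely."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n


print(round(one_sided_level(28, 8), 4))  # prints: 0.0178
```

With 8 minus signs among 28 the one-sided chance is about 1.8%, which does not exceed 2.5%, while 9 minus signs would give a chance above 2.5%.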
SIZE OF SAMPLE

Even though there is no real difference, a sample of four or even five with all signs alike will occur by chance more than 5% of the time. Four signs alike will occur by chance 12.5% of the time and five signs alike will occur by chance 6.25% of the time. Therefore, at the 5% level of significance, it is necessary to have at least six pairs of observations even if all signs are alike before any decision can be made.
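
The arithmetic behind these figures is brief enough to sketch in Python. The function names below are hypothetical; the only assumption is that each sign is plus or minus with probability $\tfrac{1}{2}$, independently.

```python
def chance_all_alike(n):
    """Chance that n signs are all plus or all minus when each sign is equally likely."""
    return 2 * 0.5 ** n


def smallest_decisive_n(alpha):
    """Smallest n for which n signs all alike is significant at level alpha."""
    n = 1
    while chance_all_alike(n) > alpha:
        n += 1
    return n


print(chance_all_alike(4), chance_all_alike(5), smallest_decisive_n(0.05))
# prints: 0.125 0.0625 6
```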
As in most statistical work, more reliable results are obtained from a larger number of observations. One would not ordinarily use the sign test for samples as small as 10 or 15, except for rough or preliminary work.
The question may be raised as to the minimum sample size necessary to detect a given difference in two materials. Suppose that in an indefinitely large number of observations 30% +'s and 70% -'s are to be expected and that we wish the sample to be large enough to detect this difference at the 1% level of significance. Although no sample, however large, will make it absolutely certain that a significant difference will be found, the sample size can be chosen to make the probability of finding a significant result as near to certainty as is desired. In Table 2, this probability has been chosen as 95%; the minimum values of $n$ (sample size) and the corresponding critical values of $r$ to insure a decision 95% of the time are given for various actual percentages $p_0$ and levels of significance $\alpha$.
The sign test merely measures the significance of departures from a 50-50 distribution. If the signs are actually distributed 45-55, then the departure from 50-50 is not likely to be significant unless the sample is quite large. Table 2 shows that if the signs are actually distributed 45-55, then one must take samples of 1,297 pairs in order to get a significant departure from a 50-50 distribution at the 5% level of significance. The number 1,297 is selected to give the desired significance 95% of the time; that is, if a large number of samples of 1,297 each were drawn from a 45-55 distribution, then 95% of those samples could be expected to indicate a significant departure (at the 5% level) from a 50-50 distribution.

TABLE 2
MINIMUM VALUES OF n NECESSARY TO FIND SIGNIFICANT DIFFERENCES 95% OF THE TIME FOR VARIOUS GIVEN PROPORTIONS

                         n                              r
  p_0       α=1%     5%    10%    25%      α=1%     5%    10%    25%

.45(.55)   1,777  1,297  1,080    780       833    612    512    373
.40(.60)     442    327    267    193       193    145    119     87
.35(.65)     193    143    118     86        78     59     49     37
.30(.70)     106     79     67     47        39     30     26     19
.25(.75)      66     49     42     32        22     17     15     12
.20(.80)      44     35     28     21        13     11      9      7
.15(.85)      32     23     18     14         8      6      5      4
.10(.90)      24     17     13     11         5      4      3      3
.05(.95)      15     12     11      6         2      2      2      1

The italicized values are approximate. The maximum error is about 5 for the value of $n$, and 2 for the value of $r$. The values of $n$ and $r$ for 5% were taken from MacStewart (reference 1) who gives a table of values of $n$ and $r$ for a range of confidence coefficients (the above table uses only 95%) and a single value $\alpha = 5\%$.

Of course, in practice one would not do any testing if he knew in advance the expected distribution of signs (that it was 45-55, for example). The practical significance of Table 2 is of the following nature: In comparing two materials one is interested in determining whether they are of about equal or of different value. Before the investigation is begun, a decision must be made as to how different the materials must be in order to be classed as different. Expressed in another way, how large a difference may be tolerated in the statement that "the two materials are of about equal value"? This decision, together with Table 2, determines the sample size. If one is interested in detecting a difference so small that the signs may be distributed 45-55, he must be prepared to take a very large sample. If, however, one is interested only in detecting larger differences (for example, differences represented by a 70-30 distribution of signs), a smaller sample will suffice.
In many investigations, the sample size can be left undetermined, and only as much data accumulated as is needed to arrive at a decision. In such cases, the sign test could be used in conjunction with methods of sequential analysis. These methods provide a desired amount of information with the minimum amount of sampling on the average. A complete exposition of the theory and practice of sequential analysis may be found in references 3 and 4.

MODIFICATIONS OF THE SIGN TEST

When the data are homogeneous (measurements are comparable between pairs of observations), the sign test can be used to answer questions of the following kind:
1. Is material A better than B by P per cent?
2. Is material A better than B by Q units?
The first question would be tested by increasing the measurement on B by P per cent and comparing the results with the measurements on A. Thus, let

$$(x_1, y_1),\ (x_2, y_2),\ (x_3, y_3),\ \text{etc.}$$

be pairs of measurements on A and B, and suppose one wished to test the hypothesis that the measurements, $x$, on A were 5% higher than the measurements, $y$, on B. The sign test would simply be applied to the signs of the differences

$$x_1 - 1.05y_1,\ x_2 - 1.05y_2,\ x_3 - 1.05y_3,\ \text{etc.}$$

In the case of the second question the sign test would be applied to the differences

$$x_1 - (y_1 + Q),\ x_2 - (y_2 + Q),\ x_3 - (y_3 + Q),\ \text{etc.}$$

In either case, if the resulting distribution of signs is not significantly different from 50-50, the data are not inconsistent with a positive answer to the question. Usually there will be a range of values of P (or Q) which will produce a non-significant distribution of signs. If one determines such a range, using the 5% level of significance for example, then that range will be a 95% confidence interval for P (or Q).
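
One way to find the range of Q described above is sketched below in Python. It is an illustration, not the authors' procedure: the data are hypothetical, the function names are invented for this example, and the sketch relies on the observation that the sign test on $x_i - (y_i + Q)$ is non-significant exactly when more than $R$ of the differences $x_i - y_i$ lie on each side of Q, where $R$ is the critical value for the chosen level. It assumes no ties.

```python
from math import comb


def critical_r(n, alpha):
    """Largest r with two-sided level <= alpha under p = 1/2, or None if there is none."""
    best = None
    for r in range(n // 2 + 1):
        if 2 * sum(comb(n, i) for i in range(r + 1)) / 2 ** n <= alpha:
            best = r
        else:
            break
    return best


def sign_interval_for_q(xs, ys, alpha=0.05):
    """Open interval (low, high) of Q values giving a non-significant sign test."""
    d = sorted(x - y for x, y in zip(xs, ys))
    n = len(d)
    r = critical_r(n, alpha)
    if r is None:
        return None  # sample too small for any result to be significant
    # Q must exceed the (r+1)-th smallest difference and fall below the
    # (r+1)-th largest, so that more than r signs fall on each side.
    return d[r], d[n - r - 1]


# Hypothetical measurements on 12 pairs, in whole units.
ys = [50, 52, 47, 49, 51, 48, 53, 50, 49, 46, 52, 51]
xs = [53, 50, 58, 57, 56, 62, 47, 59, 51, 63, 56, 58]
print(sign_interval_for_q(xs, ys))  # prints: (2, 11)
```

For these made-up data any Q strictly between 2 and 11 units leaves the signs non-significant at the 5% level, so that range would serve as the confidence interval for Q.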
Even when the data are not homogeneous, it may be possible to frame questions of the above kind, or it may be possible to change the scales of measurement so that such questions would be meaningful.

MATHEMATICAL APPENDIX

A. Assumptions. Let observations on two materials or treatments A and B be denoted by $x$ and $y$, respectively. It is assumed that for any pair of observations $(x_i, y_i)$ there is a probability $p$ $(0 < p < 1)$ that $x_i > y_i$ $(i = 1, 2, \ldots, n)$; $p$ is assumed to be unknown.¹ It is also assumed that the $n$ pairs of observations $(x_i, y_i)$ $(i = 1, 2, \ldots, n)$ are independent; i.e., the outcome (+ or -) for $(x_i, y_i)$ is independent of the outcome for $(x_j, y_j)$ $(i \neq j)$.
B. The Observations. The purpose of obtaining observations $(x_i, y_i)$ is to make an inference regarding $p$. The observed quantity upon which an inference is to be based is $r$, the number of +'s or -'s (whichever occur in fewer numbers) obtained from $n$ paired observations $(x_i, y_i)$. On the basis of the assumption above it follows that the probability of obtaining exactly $r$ as the minimum number of +'s or -'s is:

$$\binom{n}{r}\left[p^{r}(1-p)^{n-r} + p^{n-r}(1-p)^{r}\right], \qquad r = 0, 1, 2, \ldots, \tfrac{n-1}{2};\ n \text{ odd}; \qquad r = 0, 1, 2, \ldots, \tfrac{n-2}{2};\ n \text{ even};$$

$$\binom{n}{n/2}\, p^{n/2}(1-p)^{n/2}, \qquad r = \tfrac{n}{2};\ n \text{ even}.$$
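
As a check on the distribution of $r$ stated in this section, the following Python sketch evaluates it and confirms that the probabilities over all admissible values of $r$ add to one. The function name `prob_r` is hypothetical.

```python
from math import comb


def prob_r(n, r, p):
    """Probability that the less frequent sign occurs exactly r times among n pairs."""
    if 2 * r == n:
        # n even and both signs occur equally often: a single term.
        return comb(n, r) * p ** r * (1 - p) ** r
    # Otherwise either sign may be the less frequent one.
    return comb(n, r) * (p ** r * (1 - p) ** (n - r) + p ** (n - r) * (1 - p) ** r)


for n, p in ((7, 0.3), (8, 0.3)):
    total = sum(prob_r(n, r, p) for r in range(n // 2 + 1))
    print(n, round(total, 12))  # prints 1.0 for both n
```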

C. The Inference. In the sign test the hypothesis being tested is that $p = \tfrac{1}{2}$; in other words that the distributions of the differences $x_i - y_i$ $(i = 1, 2, \ldots, n)$ have zero medians. For the more general tests discussed under Modifications of the Sign Test, the hypothesis is that the differences $x_i - f(y_i)$ $(i = 1, 2, \ldots, n)$ have zero medians. The function $f(y)$ may be $Py$ or $Q + y$ (where $P$ and $Q$ are the constants mentioned in that section) or any other function appropriate for comparison with $x$ in the problem at hand.
The hypothesis that $p = \tfrac{1}{2}$ is tested by dividing the possible values of $r$ into two classes and accepting or rejecting the hypothesis according as $r$ falls in one or the other class. The classes are chosen so as to make small (say $\le \alpha$) the chance of rejecting the hypothesis when it is true and also to make small the chance of accepting the hypothesis when it is untrue. It can be shown that in a certain sense, the best set of rejection values for $r$ is $0, 1, \ldots, R$ where $R$ depends on $\alpha$ and $n$. $R$ can be determined by solving for $R$ = maximum $i$ in the inequality:

$$\sum_{j=0}^{i}\binom{n}{j}\left(\tfrac{1}{2}\right)^{n} = I_{1/2}(n-i,\ i+1) \le \tfrac{1}{2}\alpha$$

where $I_x(a, b)$ is the incomplete beta function. Table 1 was computed in this way.
¹ An additional assumption is that the probability that $A_i = B_i$ is 0; thus the probability that $B_i > A_i$ is $(1-p)$.

D. Sample Sizes. When the sample size is small the sign test is likely to reject the hypothesis, $p = \tfrac{1}{2}$, only if $p$ is near zero or one. If $p$ is near, but not equal to $\tfrac{1}{2}$, the test is likely to reject the hypothesis, $p = \tfrac{1}{2}$, only when the sample is large.
The sample size required to reject the hypothesis $p = \tfrac{1}{2}$ at the $\alpha$ level of significance, $100\lambda\%$ of the time, may be determined by finding the largest $i$ and smallest $n$ which satisfy:

$$\sum_{j=0}^{i}\binom{n}{j}\left(\tfrac{1}{2}\right)^{n} \le \tfrac{1}{2}\alpha$$

and

$$\sum_{j=0}^{i}\binom{n}{j}\, p^{j}(1-p)^{n-j} \ge \lambda.$$

$n$ and $i$ are given in Table 2 for various values of $p$ and $\alpha$; $\lambda$ was taken to be .95 in all cases. The tabular values for $1-p$ are the same as those for $p$ because of the symmetry of the binomial distribution.
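
The search described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, $p$ is taken as the probability of the less frequent sign ($p < \tfrac{1}{2}$), only the lower tail is counted toward the chance of rejection, and the search is capped at an arbitrary `n_max`. Plain floating-point arithmetic is used, so the sketch is suited to the smaller entries of Table 2 rather than samples in the thousands.

```python
from math import comb


def critical_r(n, alpha):
    """Largest i with sum_{j<=i} C(n,j) (1/2)^n <= alpha/2, or None if there is none."""
    best = None
    for i in range(n // 2 + 1):
        if 2 * sum(comb(n, j) for j in range(i + 1)) / 2 ** n <= alpha:
            best = i
        else:
            break
    return best


def lower_tail(n, i, p):
    """Chance of i or fewer occurrences of a sign that has probability p."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(i + 1))


def minimum_sample(p, alpha, lam=0.95, n_max=500):
    """Smallest n (with its critical value) that rejects p = 1/2 at least lam of the time."""
    for n in range(1, n_max + 1):
        i = critical_r(n, alpha)
        if i is not None and lower_tail(n, i, p) >= lam:
            return n, i
    return None


print(minimum_sample(0.05, 0.25))  # prints: (6, 1)
print(minimum_sample(0.10, 0.25))  # prints: (11, 3)
```

Both results agree with the 25% column of Table 2 for $p = .05$ and $p = .10$.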
E. Efficiency of the Sign Test. Let $z = x - y$. Assume $z$ is normally distributed with mean $a$ and variance $\sigma^2$. The probability of obtaining a + on a particular $z_i$ is:

$$p = \frac{1}{\sqrt{2\pi}}\int_{-a/\sigma}^{\infty} e^{-\frac{1}{2}u^{2}}\,du.$$

An estimate of $p$ involving only the signs of $z_i$ $(i = 1, 2, \ldots, n)$ yields an estimate of $(a/\sigma)$. Cochran (reference 2) has shown that in large samples the variance of this estimate of $(a/\sigma)$ is $2\pi pq\, e^{(a/\sigma)^2}/n$, where $q = 1 - p$. We shall denote $a/\sigma$ by $c$.
The efficiency of an estimate based on $n$ independent observations is defined as the limit (as $n \to \infty$) of the ratio of the variance of an efficient estimate to that of the given estimate. An efficient estimate of $c$ is:

$$\frac{t}{\sqrt{n}}$$

where $t$ is Student's $t$ and $\bar{z} = \sum z_i/n$.


The variance of this estimate is $1/(n-2)$; thus the efficiency, $E$, of the sign test is $e^{-c^2}/2\pi pq$. If $c = 0$, then $p = \tfrac{1}{2}$ and the efficiency is $2/\pi = 63.7\%$.
The preceding discussion pertains to large values of $n$; for small values of $n$, the efficiency is a little better than 63.7%. Computations were made for several smaller values of $n$, namely, for $n$ = 18, 30, 44 pairs of observations at the 10% level of significance. It was found that the sign test using 18 pairs of observations is approximately equivalent to the t test using 12 pairs of observations; for 30 pairs the equivalent t test requires between 20 and 21 pairs; and for 44 pairs the equivalent t test requires between 28 and 29 pairs. Cochran shows that the efficiency of $r/n$ for estimating $c$ decreases as $|c|$ increases.
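
The large-sample efficiency $E = e^{-c^2}/2\pi pq$ can be evaluated numerically with the Python sketch below. The function name `efficiency` is hypothetical, and $p$ is computed from $c$ through the normal integral of this section using the standard error function.

```python
from math import erf, exp, pi, sqrt


def efficiency(c):
    """Large-sample efficiency of the sign test when a/sigma = c."""
    p = 0.5 * (1 + erf(c / sqrt(2)))  # probability of a + sign for this c
    q = 1 - p
    return exp(-c * c) / (2 * pi * p * q)


print(round(efficiency(0.0), 3))  # prints: 0.637
for c in (0.5, 1.0, 2.0):
    print(c, efficiency(c) < efficiency(0.0))  # prints True for each c
```

The value at $c = 0$ is $2/\pi$, and the computed efficiency falls as $|c|$ grows, in line with Cochran's result quoted above.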
REFERENCES

[1] W. MacStewart, "A note on the power of the sign test," Annals of Mathematical Statistics, Vol. 12 (1941), pp. 236-238.
[2] W. G. Cochran, "The efficiencies of the binomial series test of significance of a mean and of a correlation coefficient," Journal Royal Statistical Society, Vol. C, Part I (1937), pp. 69-73.
[3] A. Wald, "Sequential method of sampling for deciding between two courses of action," Journal American Statistical Association, Vol. 40 (1945), pp. 277-306.
[4] Statistical Research Group, Columbia University, Sequential Analysis of Statistical Data: Applications (1945), Columbia University Press, New York.
