On: 22 October 2014, At: 20:16
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House,
37-41 Mortimer Street, London W1T 3JH, UK
To cite this article: W. Alan Nicewander & James M. Price (1997) A Consonance Criterion for Choosing Sample Size, The
American Statistician, 51:4, 311-317
Taylor & Francis makes every effort to ensure the accuracy of all the information (the Content) contained
in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no
representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the
Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and
are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and
should be independently verified with primary sources of information. Taylor and Francis shall not be liable for
any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever
or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of
the Content.
This article may be used for research, teaching, and private study purposes. Any substantial or systematic
reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any
form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://
www.tandfonline.com/page/terms-and-conditions
A Consonance Criterion for Choosing Sample Size
W. Alan NICEWANDER and James M. PRICE
Sample sizes determined by the proposed criterion ensure that subjectively important parameter estimates will be statistically significant; alternatively, subjectively trivial values will be excluded from confidence intervals for those parameters. Formulas are given for simple mean differences, multiple correlations, and mean comparison methods in multiple group settings, with suggestions for extension to larger, factorial designs. Although no assumptions are necessary about the true values of unknown parameters or a desired level of power, adequate power is provided for important results. Only central distributions are required for either the exact iterative solutions given or their closed form approximations.

KEY WORDS: Effect size; Mean difference; Power;

methods the consonance criterion produces formulas that are similar, or identical, to some power-based methods, especially the recent work of Harris and Quade (1992) and Willan (1994). Following an explication of the consonance criterion in the case of two-sample differences, we present generalizations to more complex situations, approximation formulas for all cases presented, power computations, and a discussion of the other proposals just mentioned.

2. THE CONSONANCE CRITERION

As used in this inquiry, consonance means bringing into agreement the values of sample statistics and the statistical statements based upon them. As such, our use of the term is quite similar to Gabriel's (1969) usage in the context of multiple comparison techniques. In what follows we assume that observations are normal and independent, with a common population standard deviation $\sigma$, and that samples are
Downloaded by [141.214.17.222] at 20:16 22 October 2014
© 1997 American Statistical Association. The American Statistician, November 1997, Vol. 51, No. 4
be the sample estimate of the effect size, where

$$s_p^2 = \frac{s_1^2 + s_2^2}{2} \qquad (3)$$

is a pooled unbiased estimate of the (assumed common) variance, and $s_1^2$ and $s_2^2$ are the unbiased variance estimates for the two samples. It is easy to show that

$$|t| = d\sqrt{n/2} \qquad (4)$$

is the value of the sample $t$ for testing the hypothesis that the group means are equal. In order to find, a priori, the value of $n$ for which one will always reject $H_0$ whenever $d$ equals or exceeds $\delta_c$, one must adjust $n$ iteratively until

$$\delta_c\sqrt{n/2} = t_{1-\alpha/2;\,2(n-1)} \qquad (5)$$

where $t_{1-\alpha/2;\,2(n-1)}$ is the two-tailed critical value of $t$ for $2(n-1)$ df at the $\alpha$ level of significance. Equivalently, the consonant sample size can be found by solving, iteratively, the implicit equation

$$n = \frac{2\,t^2_{1-\alpha/2;\,2(n-1)}}{\delta_c^2} \qquad (6)$$

or equivalently

(9)

If, as before, $\delta_c$ is substituted for $d$ in Equation (8), then an equation identical to (5) is obtained; further simplification leads to Equation (6), and one can solve for sample sizes such that if the (standardized) mean difference $d$ is $\delta_c$ or greater, then the value of zero will fall outside the $(1-\alpha)100\%$ confidence interval for $(\mu_1 - \mu_2)$.

Stein (1945; Steel and Torrie 1960, pp. 86-87) proposed a somewhat similar method for sample size determination that is based on setting the absolute width of a confidence interval to some a priori value. Such an interval will have a desired absolute precision (say, ± 5 mm), but may also include 0 as a possible value; the present method employs a specified relative level of precision to guarantee exclusion of 0.

For the sake of simplicity and brevity only the hypothesis testing approach is detailed in the following sections. However, a derivation based on the estimation aspect will yield identical formulas.

3. GENERALIZATIONS TO MULTIPLE-GROUP EXPERIMENTS

where $\mu$ is the grand mean (see, e.g., Cohen 1988, p. 277; Scheffé 1959, pp. 63-64). This configuration of population
means represents the minimum departure from $H_0$ such that at least one pair of population means differs by $\delta_c\sigma$. As such, this arrangement yields the minimum important value for the noncentrality parameter of the appropriate sampling distribution. If one calculates the sample size under the assumptions of the Tang Condition, then the value of $n$ produced will be an upper bound to the sample size actually needed to produce a specified level of power.

The consonance criterion makes no assumptions about population means or desired power. Aside from the later comparisons of $n$ and power produced by different methods, we will assume that the configuration of sample means is as described by (10) when population means are replaced by corresponding sample means and the common variance $\sigma^2$ is replaced by $MS_{error}$.

3.1 Scheffé's Method of Simultaneous Confidence Intervals

Given an experiment with $J \ge 2$ treatment conditions in a single factor layout, let

statistic $q$ is given by

$$q = \frac{\bar{X}_L - \bar{X}_S}{\sqrt{MS_{error}/n}} = d_{\max}\sqrt{n} \qquad (15)$$

where $MS_{error}$ is the mean-square error from a corresponding one-way ANOVA. Given this relationship between maximum sample effect size $d_{\max}$ and the studentized range statistic $q$, it is obvious from previously developed logic that the value of $n$ that will guarantee rejection of the hypothesis of equality of the population means when $d_{\max} \ge \delta_c$ must satisfy

$$\delta_c\sqrt{n} = q_{1-\alpha;\,J;\,J(n-1)} \qquad (16)$$

where $q_{1-\alpha;\,J;\,J(n-1)}$ is the $1-\alpha$ percentile point of the sampling distribution of $q$. An alternative implicit formula for $n$ is

$$n = \frac{q^2_{1-\alpha;\,J;\,J(n-1)}}{\delta_c^2}$$
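The iterative search described for Equation (5) in the two-sample case can be sketched in a few lines. The Python below is only an illustration under stated assumptions, not the authors' procedure: it assumes a two-tailed test with equal group sizes, and the helper names `t_cdf` and `consonant_n_two_groups` are hypothetical. To stay within the standard library it evaluates the central $t$ distribution by Simpson's rule, and it uses the fact that $\delta_c\sqrt{n/2}$ reaches the critical value $t_{1-\alpha/2;\,2(n-1)}$ exactly when the $t$ distribution function evaluated at $\delta_c\sqrt{n/2}$ is at least $1-\alpha/2$.

```python
import math
from statistics import NormalDist


def t_cdf(t, df, steps=2000):
    """Central Student t CDF for t >= 0, via Simpson's rule on the density."""
    # Log of the normalizing constant; lgamma avoids overflow for large df.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3


def consonant_n_two_groups(delta_c, alpha=0.05):
    """Smallest per-group n with delta_c*sqrt(n/2) >= t_{1-alpha/2; 2(n-1)}."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # The known-variance value 2*z^2/delta_c^2 is a lower bound, because the
    # t critical value always exceeds the normal one.
    n = max(2, math.ceil(2 * z * z / delta_c ** 2))
    # Step upward until the sample t at d = delta_c reaches significance.
    while t_cdf(delta_c * math.sqrt(n / 2), 2 * (n - 1)) < 1 - alpha / 2:
        n += 1
    return n


for delta_c in (0.10, 0.50, 1.00):
    print(delta_c, consonant_n_two_groups(delta_c))
```

Under these assumptions, with $\alpha = .05$ the search returns 770, 32, and 9 for $\delta_c$ = .10, .50, and 1.00, which agrees with the $J = 2$ sample sizes reported in Table 2. The upward search is valid because the left side of (5) increases with $n$ while the critical value decreases.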
known) pattern of differences among the population means.

One might reasonably be concerned about the actual power associated with using a consonant sample size instead of the standard value. Table 2 shows the power levels associated with the Scheffé and Tukey methods (under the Tang Condition) as a function of values of $\delta_c$, the number of groups $J$, the (unknown) true value of $\delta$, and the sample sizes given in Table 1. Again, SAS's PROBF function was used to integrate the appropriate noncentral $F$ distribution for each of the Scheffé cases. Because of the problems associated with a noncentral studentized range distribution (Hochberg and Tamhane 1987) a noncentral $F$ distribution was used to compute lower bounds to power for the studentized range-determined sample sizes; these values should be considered quite conservative.

Notice that a number of power lower bounds are enclosed in boxes. These boxed values are the lower bounds to power for true values of $\delta$ that are equal to or greater than $\delta_c$. Ordinarily, one is most interested in the performance of a statistical test only for such values of $\delta$ because values that

Unlike the exact solutions the approximations employ $n$ only on the left-hand side, yielding simple, closed form solutions. Note that in the case of consonant samples for $J$ samples using Tukey's method, the approximate sample sizes are based on the distribution of the range, which is given as the last row (infinite $n$) of the tabled distribution of the studentized range statistic $q$.

It has been our experience, and that of Harris and Quade (1992), that the approximate solutions for the consonance criterion sample sizes typically differ by only 1 or 2 (and occasionally 3) from the exact solution. Kupper and Hafner (1989) noted similar empirical discrepancies in power-based approximation formulas, and offered tables for correcting the value of $n$ so derived. Guenther (1981), also in a paper discussing power-based sample size determination, suggested a correction to a similar approximate solution for $n$ in the two-group case that can be used in conjunction with the consonance-based sample sizes for two-group experiments, namely, increase the computed approximate $n$
Table 2. Conditional Lower Bounds to Power for Consonance Criterion Sample Sizes for Selected Values of $J$ and $\delta$

                                                  True $\delta$
 J  n               .10         .25         .50         .75         1.00        1.25        1.50

$\delta_c$ = .10
 2  770             .50         .99         .99         .99         .99         .99         .99
 3  1,200 (1,096)   .58 (.54)   .99 (.99)   .99 (.99)   .99 (.99)   .99 (.99)   .99 (.99)   .99 (.99)
 5  1,899 (1,490)   .69 (.57)   .99 (.99)   .99 (.99)   .99 (.99)   .99 (.99)   .99 (.99)   .99 (.99)
 7  2,519 (1,739)   .76 (.57)   .99 (.99)   .99 (.99)   .99 (.99)   .99 (.99)   .99 (.99)   .99 (.99)
 9  3,102 (1,928)   .82 (.57)   .99 (.99)   .99 (.99)   .99 (.99)   .99 (.99)   .99 (.99)   .99 (.99)

$\delta_c$ = .50
 2  32              .07         .17         .50         .84         .98         .99         .99
 3  49 (47)         .07 (.07)   .18 (.16)   .58 (.53)   .92 (.88)   .99 (.99)   .99 (.99)   .99 (.99)
 5  77 (60)         .07 (.06)   .20 (.16)   .69 (.56)   .97 (.92)   .99 (.99)   .99 (.99)   .99 (.99)
 7  102 (70)        .07 (.06)   .22 (.16)   .76 (.57)   .99 (.93)   .99 (.99)   .99 (.99)   .99 (.99)
 9  125 (78)        .07 (.06)   .23 (.16)   .82 (.57)   .99 (.94)   .99 (.99)   .99 (.99)   .99 (.99)

$\delta_c$ = 1.00
 2  9               .05         .08         .17         .32
 3  14 (12)         .05 (.05)   .08 (.08)   .19 (.16)   .38 (.33)   .62 (.54)   .82 (.75)   .94 (.89)
 5  20 (16)         .05 (.05)   .08 (.07)   .21 (.15)   .47 (.32)   .76 (.56)   .94 (.79)   .99 (.93)
 7  26 (18)         .05 (.05)   .08 (.07)   .21 (.15)   .47 (.32)   .76 (.56)   .94 (.79)   .99 (.93)
 9  32 (20)         .05 (.05)   .08 (.07)   .23 (.15)   .52 (.32)

NOTE: All values of $n$ are based on the consonance criterion, and are shown in Table 1. For $J > 2$, sample sizes are based on both the omnibus $F$ test and the studentized range statistic (values of $n$ in parentheses). All conditional lower bounds to power are based on rejecting the hypothesis tested by the omnibus $F$ test, that is, power values in parentheses are for sample sizes determined from the studentized range statistic, but power was evaluated in terms of rejecting the omnibus $F$ test (see text).
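As a rough check on the $J = 2$ rows of Table 2, two-group power can be approximated with the normal distribution. The sketch below is a shortcut under stated assumptions, not the noncentral $F$ integration (SAS PROBF) used for the table: it treats the common variance as known, assumes a two-tailed test with equal group sizes, and the function name `approx_power_two_groups` is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power_two_groups(n, delta, alpha=0.05):
    """Normal-approximation power of a two-tailed, two-sample test.

    n is the per-group sample size and delta the true standardized
    mean difference; the common variance is treated as known.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = delta * math.sqrt(n / 2)  # expected value of the test statistic
    # Probability of landing in either rejection region.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


# Tabled lower bounds for J = 2, delta_c = .50, n = 32 (from Table 2).
tabled = {0.10: 0.07, 0.25: 0.17, 0.50: 0.50, 0.75: 0.84, 1.00: 0.98}
for delta, value in tabled.items():
    print(delta, round(approx_power_two_groups(32, delta), 3), value)
```

For $n = 32$ the approximation stays within about .02 of the tabled values. At $\delta = \delta_c = .50$ it gives roughly .52 rather than .50, because the normal critical value is slightly smaller than the $t$ critical value on which the consonant sample size is based.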
by $0.25z^2_{1-\alpha/2}$ observations. This modification gives values that correspond roughly to the empirical discrepancies mentioned above, and it has the advantage of automatically adjusting for the level of $\alpha$ and the nature of the test (one- or two-tailed). Whether this modification can be improved and/or extended to multiple-group techniques is an open question.

7. COMPARISON WITH OTHER PROPOSALS

As mentioned at the outset, in the simplest case, involving the simple difference between two sample means, the formulas produced by satisfying the consonance criterion are similar or identical to those produced by others. This is due, in part, to similarities or identities among those other approaches. For instance, both Guenther (1981) and Kupper and Hafner (1989) make use of the formula (modified for consistency with the present notation)

$$n \ge 2\left[(z_{1-\alpha/2} + z_{1-\beta})/\delta\right]^2 \qquad (18)$$

which determines sample size for a two-tailed test of significance for two independent samples at the $\alpha$ level of significance with power equal to $1-\beta$ (exact for known common variance and approximate otherwise). Harris and Quade's (1992, p. 46) formula for this situation is the special case of (18) that occurs when their minimally important difference significant (or MIDS) criterion (that the optimal power for a test of significance is .50) is adopted and $z_{1-\beta} = 0$. Willan (1994), working in the context of controlled management trials, derives a similar formula based on setting power at .50 for the value of (our) $\delta$ that represents the researcher's point of indifference between two competing treatment regimens.

The expression in (18) is not, in itself, hard to derive from a power-based standpoint, and seemingly quite a few others have done so because each paper just mentioned credits a different set of predecessors for its origin. The present paper arrives at (18) by dropping all reference to power in the derivation, relying instead on a requirement of assured significance or certain exclusion of zero from a confidence interval when $|d| \ge \delta_c$. This represents one major difference between the present paper and those of Harris and Quade (1992) and Willan (1994).

A second major difference with the work of Harris and Quade (1992) is our approach to multiple sample settings. Although they concentrate on single degree of freedom methods, Harris and Quade point out, quite correctly, that their MIDS criterion may produce values of $n$ in an ANOVA setting that are larger than necessary for specific (i.e., a priori) contrasts among means, and they recommend methods for reducing $n$ until 50% power is achieved for those contrasts and the chosen method of familywise control of $\alpha$ (p. 40).

The present paper gives simple formulas for tests of mean contrasts by both Scheffé's and Tukey's methods if an a priori approach is taken; these formulas may readily be extended to other criteria for significance. If, however, the more traditional a posteriori approach is appropriate, then we recommend setting $n$ according to (14) or its approximation. Under the sample Tang Condition, if the omnibus $F$ test is significant, then at least one test of pairwise mean differences will also be significant by both Scheffé's and Tukey's methods.

Willan's (1994) alternative approach for choosing $n$ in clinical management trials represents yet another departure
from the single degrees of freedom work previously cited. As the author states (p. 212), the objective of management trials is aimed at deciding which treatment should be used[,] as opposed to explanatory trials, which are conducted to determine whether a difference in treatments exists at all. The management trial approach is in the tradition of Wald (1947), in which one desires strong evidence for one or the other of two distinct hypotheses. As such, Willan's $\delta$ (for his symmetric case, p. 215) is half the value used in the present paper. Substituting this value into the present formulas (and recognizing that his $n$ is the total number of observations, not $n$ per sample) leads to Willan's equation (6). Blind use of the present formulas in the management trial setting would lead to dramatic underestimates (by a factor of 8) relative to Willan's criterion. The approaches taken by all the other works cited in the present paper, as well as our approach, lead to values that would be appropriate for explanatory trials only, without such adjustments.

8. SUMMARY AND DISCUSSION

The investigations that led to the consonance criterion were inspired by a recent paper in which Schmidt (1992) discussed situations in which the values of $n$ were such that the sample estimate of $\delta$ had to exceed $\delta_c$ by 20% or more in order for $H_0$ to be rejected. As previously stated, such results can be vexing (embarrassing?) to both the researcher and the statistical consultant. The consonance criterion was proposed because it determines a value of $n$ such that rejection of $H_0$ is assured whenever the largest sample effect size $d_{\max}$ equals or exceeds some critical effect size $\delta_c$. Formulas for determining this value of $n$ were proposed and used to construct tables of sample sizes and minimal traditional power for simple experiments. A striking feature of the sample size table is how large these sample sizes are for effect sizes as small as .10, and how small they are for effect sizes as large as 1.0. This range of values emphasizes an observation made not long ago by Tukey (1986):

With a reasonable amount of data, things of size $5\sigma$ are nearly trivial to find-anyone should be able to find them-whereas things of $0.05\sigma$ can be nearly impossible to find, once we face the presence of systematic error as well as those very nice errors whose effects come down like $\sigma/\sqrt{n}$ (p. 76).

A double bonus of the generalizations to multiple sample situations is that choosing $n$ for the ANOVA's omnibus $F$ test assures rejection of at least one a posteriori pairwise mean comparison, by either Scheffé's or Tukey's method.

One question that will surely arise in connection with the consonance criterion is, "What advantage does this criterion for determining $n$ have over older methods in which one specifies values of $\alpha$, $\delta_c$, and power, and then calculates sample size based on some assumed pattern of differences among the population treatment means?" For one thing a value of power for a particular configuration of unknown population means does not have to be specified in order to do the calculations. Second, the sample sizes given by the consonance criterion deliver their promise of certain rejection of $H_0$ whenever one or more pairs of sample means differ by $\delta_c$ or more, under any pattern of differences in the population means. The consulting statistician can tell clients, "I have determined a sample size such that if something important happens in your experiment (as measured by $d_{\max}$ relative to $\delta_c$), it will be statistically significant." Finally, the approximate solutions, based on the central normal, chi-square, and range distributions, have closed form solutions that are trivial to compute; even the iterative exact solutions are easier to find than those involving noncentral distributions. This simplicity makes the present method easier to present to students or clients, and may prove appealing to those who need such formulas only occasionally.

With a little reflection it is easy to see that the consonance criterion may be used for determining $n$ in other situations. For instance, main effect and cell means tests in multifactor designs may be handled by substituting the appropriate degrees of freedom in Equations (14) and (17) or their approximations. Although derived from a different standpoint, Harris and Quade's (1992) formulas for Pearson's $r$ and $\chi^2$ are also such extensions, which may be further generalized. As an illustration, suppose one wanted to test the multiple correlation coefficient for relating several independent variables to some dependent variable. The consonance criterion sample size in this situation is the value of $n$ that guarantees the rejection of $H_0$: $\rho^2 = 0$ if the (squared) sample multiple correlation exceeds some critical size, say $\rho_c^2$. It is easy to show that the exact implicit equation for the consonance sample size and the (known-variance) approximation are, respectively,

$$n = \frac{p\,F_{1-\alpha;\,p;\,n-p-1}\,(1-\rho_c^2)}{\rho_c^2} + p + 1$$

and

where $p$ is the number of independent (predictor) variables. For example, given five independent variables, and given that a (squared) multiple correlation of .4 is considered important, the exact formula yields a sample size of $n = 58$, with an approximate sample size of 56.

Cohen (1992) recounts his seemingly futile efforts over three decades to make sample size selection easier for the everyday research worker, through simpler (and fewer) formulas and improved tables. Like him, we hope that our proposed methods will help to reverse the negative answer to Sedlmeier and Gigerenzer's (1989) title question.

[Received September 1993. Revised September 1996.]

REFERENCES

Cohen, J. (1988), Statistical Power Analysis for the Behavioral Sciences (2nd ed.), New York: Academic Press.
Cohen, J. (1992), A Power Primer, Psychological Bulletin, 112, 155-159.
Gabriel, K. R. (1969), Simultaneous Test Procedures-Some Theory of Multiple Comparisons, Annals of Mathematical Statistics, 40, 224-250.
Guenther, W. G. (1981), Sample Size Formulas for Normal Theory T Tests, The American Statistician, 35, 243-244.
Harris, R. J., and Quade, D. (1992), The Minimally Important Difference Significant Criterion for Sample Size, Journal of Educational Statistics, 17, 27-49.
Hochberg, Y., and Tamhane, A. C. (1987), Multiple Comparison Procedures, New York: Wiley.
Kupper, L. L., and Hafner, K. B. (1989), How Appropriate are Popular Sample Size Formulas?, The American Statistician, 43, 101-105.
Pearson, E. S., and Hartley, H. O. (1951), Charts of the Power Function of the Analysis of Variance Tests, Derived from the Non-Central F-Distribution, Biometrika, 38, 112-130.
Scheffé, H. A. (1959), The Analysis of Variance, New York: Wiley.
Schmidt, F. L. (1992), What do Data Really Mean?, American Psychologist, 47, 1173-1180.
Sedlmeier, P., and Gigerenzer, G. (1989), Do Studies of Statistical Power have an Effect on the Power of Studies?, Psychological Bulletin, 105, 309-316.
Steel, R. G. D., and Torrie, J. H. (1960), Principles and Procedures of Statistics, New York: McGraw-Hill.
Stein, C. (1945), A Two-Sample Test for a Linear Hypothesis Whose Power is Independent of Variance, Annals of Mathematical Statistics, 16, 243-258.
Tang, P. C. (1938), unpublished manuscript.
Tukey, J. W. (1986), Sunset Salvo, The American Statistician, 40, 72-76.
Wald, A. (1947), Sequential Analysis (1973 reprint by Dover Publications, New York).
Willan, A. R. (1994), Alternative Approach for Analyzing Management Trials, Controlled Clinical Trials, 15, 211-219.