[Chapter-opening photograph: frog deformities]
14 Designing experiments
Two types of investigations are carried out in biology: observational and experimental. In an experimental study, the researcher assigns treatments to units or subjects so that differences in response can be compared. In an observational study, on the other hand, nature does the assigning of treatments to subjects. The researcher has no influence over which subjects receive which treatment.

What's so important about the distinction? Whereas observational studies can identify associations between treatment and response variables, properly designed experimental studies can identify the causes of these associations.

How do we best design an experiment to get the most information possible out of it? The short answer is that we must design to eliminate bias and to reduce the influence of sampling error. The present chapter outlines the basics of how to accomplish this feat. We also briefly discuss how to design an observational study: by taking the best features of experimental design and incorporating as many of them as possible. Finally, we discuss how to plan the sample size needed in an experiment or observational study.

14.1 Why do experiments?

In an experimental study, there must be at least two treatments and the experimenter (rather than nature) must assign them to units or subjects. The crucial advantage of experiments derives from the random assignment of treatments to units. Random assignment, or randomization, minimizes the influence of confounding variables (Interleaf 4), allowing the experimenter to isolate the effects of the treatment variable.

Confounding variables

Studies in biology are usually carried out with the aim of deciding how an explanatory variable or treatment affects a response variable. How are injury rates of cats with "high-rise syndrome" affected by the number of stories fallen? What is the effect of marine reserves on fish biomass? How does the use of supplemental oxygen affect the probability of surviving an ascent of Mount Everest? The easiest way to address these questions is with an observational study, that is, to gather measurements of both variables of interest on a set of subjects and estimate the association between them. If the two variables are correlated or associated, then one may be the cause of the other.

The limitation of the observational approach is that, by itself, it cannot distinguish between two completely different reasons behind an association between an explanatory variable X and a response variable Y. One possibility is that X really does cause a response in Y. For example, taking supplemental oxygen might increase the chance of survival during a climb of Mount Everest. The other possibility is that the explanatory variable X has no effect at all on the response variable Y; they are associated only because other variables affect both X and Y at the same time. For example, the use of supplemental oxygen might just be a benign indicator of a greater overall preparedness of the climbers who use it, and greater preparedness rather than oxygen use is the real cause of the enhanced survival. Variables (like preparedness) that distort the causal relationship between the measured variables of interest (oxygen use and survival) are called confounding variables. Recall from Interleaf 4, for example, that ice cream consumption and violent crime are correlated, but neither is the cause of the other. Instead, increases in both ice cream consumption and crime are caused by higher temperatures. Temperature is a confounding variable in this example.

A confounding variable is a variable that masks or distorts the causal relationship between measured variables in a study.

The presence of confounding variables biases the estimate of the causal relationship between measured explanatory and response variables, sometimes even reversing the apparent effect of one on the other. For example, observational studies have indicated that breast-fed babies have lower weight at six and 12 months of age compared with formula-fed infants (Interleaf 4). But an experimental study using randomization found that mean infant weight was actually higher in breast-fed babies at six months of age and was not less than that in formula-fed babies at 12 months (Kramer et al. 2002). The observed relationship between breast feeding and infant growth was confounded by unmeasured variables such as the socioeconomic status of the parents.

With an experiment, random assignment of treatments to participants allows researchers to tease apart the effects of the explanatory variable from those of confounding variables. With random assignment, no confounding variables will be associated with treatment except by chance. For example, if women who choose to breast-feed their babies have a different average socioeconomic background than women who choose to feed their infants formula, randomly assigning the treatments "breast feeding" and "formula feeding" to women in an experiment will break this connection, roughly equalizing the backgrounds of the two treatment groups. In this case, any resulting difference between groups in infant weight (beyond chance) must be caused by treatment.

Experimental artifacts

Unfortunately, experiments themselves might inadvertently create artificial conditions that distort cause and effect. Experiments should be designed to minimize artifacts.

An experimental artifact is a bias in a measurement produced by unintended consequences of experimental procedures.

For example, experiments conducted on aquatic birds have shown that their heart rates drop sharply when they are forcibly submerged in water, compared with individuals remaining above water. The drop in heart rate has been interpreted as an oxygen-saving response. Later studies using improved technology showed that voluntary dives do not produce such a large drop in heart rate (Kanwisher et al. 1981). This finding suggested that a component of the heart rate response in forced dives was induced by the stress of being forcibly dunked underwater, rather than the dive itself. The experimental conditions introduced an artifact that for a while went unrecognized.

To prevent artifacts, experimental studies should be conducted under conditions that are as natural as possible. A potential drawback is that more natural conditions might introduce more sources of variation, reducing power and precision. Observational studies can provide important insight into what is the best setting for an experiment.
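The effect of randomization on confounding can be seen in a toy simulation (ours, not from the book; all numbers are invented). A hidden confounder drives both who "chooses" the treatment and the response, creating a spurious treatment effect in the observational comparison; random assignment breaks that link.

```python
import random
import statistics

random.seed(0)

def response(status):
    # Toy model (invented numbers): the response depends only on the
    # confounder "status"; the treatment itself has NO effect.
    return 10 + 5 * status + random.gauss(0, 1)

# Observational study: the confounder drives both the "choice" of
# treatment and the response.
obs_treated, obs_control = [], []
for _ in range(10_000):
    status = random.random()        # confounder (e.g., socioeconomic status)
    if status > 0.5:                # high-status subjects choose the treatment
        obs_treated.append(response(status))
    else:
        obs_control.append(response(status))

# Experiment: treatment assigned by coin flip, independent of the confounder.
exp_treated, exp_control = [], []
for _ in range(10_000):
    status = random.random()
    if random.random() < 0.5:       # randomization
        exp_treated.append(response(status))
    else:
        exp_control.append(response(status))

obs_diff = statistics.mean(obs_treated) - statistics.mean(obs_control)
exp_diff = statistics.mean(exp_treated) - statistics.mean(exp_control)
# obs_diff is large (confounded); exp_diff is close to the true effect, zero.
```

Randomization does not remove the variation contributed by the confounder; it only removes the confounder's correlation with treatment, which is exactly the point made in the text.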

14.2 Lessons from clinical trials

The gold standard of experimental designs is the clinical trial, an experimental study in which two or more treatments are assigned to human participants. The art of clinical trials has been refined because the cost of making a mistake with human participants is so high. Experiments on nonhuman subjects are simply called "laboratory experiments" or "field experiments," depending on where they take place. Experimental studies in all areas of biology have been greatly informed by procedures used in clinical trials.

A clinical trial is an experimental study in which two or more treatments are applied to human participants.

Before we dig into the logic of the main components of experimental design, let's look at the clinical trial in Example 14.2, which incorporates many of these components.

EXAMPLE 14.2  Reducing HIV transmission

Transmission of the HIV-1 virus via sex workers contributes to the rapid spread of AIDS in Africa. How can this transmission be reduced? In laboratory experiments, the spermicide nonoxynol-9 had shown in vitro activity against HIV-1 (shown schematically at the right). This finding motivated a clinical trial by van Damme et al. (2002), who tested whether a vaginal gel containing the chemical would reduce female sex workers' risk of acquiring the disease. Data were gathered on a volunteer sample of 765 HIV-free sex workers in six clinics in Asia and Africa. Two gel treatments were assigned randomly to women at each clinic. One gel contained nonoxynol-9, and the other contained a placebo (an inactive compound that participants could not distinguish from the treatment of interest). Neither the participants nor the researchers making observations at the clinics knew who had received the treatment and who had received the placebo. (A system of numbered codes kept track of who got which treatment.) By the end of the experiment, 59 of 376 women in the nonoxynol-9 group (15.7%) were HIV-positive (Table 14.2-1), compared with 45 of 389 women in the placebo group (11.6%). Thus, the odds of contracting HIV-1 were slightly higher in the nonoxynol-9 group than in the placebo group, which was the opposite of the expected result. The reason seems to be that repeated use of nonoxynol-9 causes tissue damage that leads to higher risk.

TABLE 14.2-1  Results of the clinical trial in Example 14.2 (n is the number of subjects).

              Nonoxynol-9              Placebo
Clinic        n     Number infected    n     Number infected
Abidjan       78    0                  84    5
Bangkok       26    0                  25    0
Cotonou       100   12                 103   10
Durban        94    42                 93    30
Hat Yai 2     22    0                  25    0
Hat Yai 3     56    5                  59    0
Total         376   59                 389   45

Design components

A good experiment is designed with two objectives in mind:
• To reduce bias in estimating and testing treatment effects
• To reduce the effects of sampling error

The most significant elements in the design of the clinical trial in Example 14.2 addressed these two objectives. To reduce bias, the experiment included the following elements.
1. A simultaneous control group: the study included both the treatment of interest and a control group (the women receiving the placebo).
2. Randomization: treatments were randomly assigned to women at each clinic.
3. Blinding: neither the participants nor the clinicians knew which women were assigned which treatment.

To reduce the effects of sampling error, the experiment included these elements.
1. Replication: the study was carried out on multiple independent participants.
2. Balance: the number of women was nearly equal in the two groups at every clinic.
3. Blocking: participants were grouped according to the clinic they attended, yielding multiple repetitions of the same experiment in different settings (i.e., "blocks").

The goal of experimental design is to eliminate bias and to reduce sampling error when estimating and testing the effects of one variable on another.

In Section 14.3, we discuss the virtues of the three main strategies used to reduce bias, namely simultaneous controls, randomization, and blinding. In Section 14.4, we explain the strategies used to reduce the effects of sampling error, namely replication, balance, and blocking. As usual, we assume throughout that units or subjects have been randomly sampled from the population of interest.
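The infection percentages and the direction of the effect in Example 14.2 can be recomputed directly from the totals row of Table 14.2-1 (a quick Python check; the variable names are ours):

```python
# Totals from Table 14.2-1 of the nonoxynol-9 clinical trial.
n_treat, hiv_treat = 376, 59        # nonoxynol-9 group
n_placebo, hiv_placebo = 389, 45    # placebo group

risk_treat = hiv_treat / n_treat            # proportion infected, about 0.157
risk_placebo = hiv_placebo / n_placebo      # about 0.116

# Odds of contracting HIV-1 in each group, and their ratio.
odds_treat = hiv_treat / (n_treat - hiv_treat)
odds_placebo = hiv_placebo / (n_placebo - hiv_placebo)
odds_ratio = odds_treat / odds_placebo      # > 1: odds higher with the treatment
```

An odds ratio above 1 confirms the surprising direction of the result: the treated group fared worse than the placebo group.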

14.3 How to reduce bias

We have seen how confounding variables in observational studies can bias the estimated effects of an explanatory variable on a response variable. The following experimental procedures are meant to eliminate bias.

Simultaneous control group

A control group is a group of subjects who are treated like all of the experimental subjects, except that the control group does not receive the treatment of interest.

A control group is a group of subjects who do not receive the treatment of interest but who otherwise experience similar conditions as the treated subjects.

In an uncontrolled experiment, a group of subjects are treated in some way and then measured to see how they have responded. Lacking a control group for comparison, such a study cannot determine whether the treatment of interest is the cause of any of the observed changes. There are several possible reasons for this, including the following:

• Sick human participants selected for a medical treatment may tend to "bounce back" toward their average condition regardless of any effect of the treatment (Interleaf 6).
• Stress and other impacts associated with administering the treatment (such as surgery or confinement) might themselves produce a response separate from the effect of the treatment of interest.
• The health of human participants often improves after treatment merely because of their expectation that the treatment will have an effect. This phenomenon is known as the placebo effect (Interleaf 6).

The solution to all of these problems is to include a control group of subjects measured for comparison. The treatment and control subjects should be tested simultaneously, or in random order, to ensure that any temporal changes in experimental conditions do not affect the outcome.

The appropriate control group will depend on the circumstance. Here are some examples:

• In clinical trials, either a placebo or the currently accepted treatment should be provided, such as in Example 14.2. A placebo is an inactive treatment that subjects cannot distinguish from the main treatment of interest.
• In experiments requiring intrusive methods to administer treatment, such as injections, surgery, restraint, or confinement, the control subjects should be perturbed in the same way as the other subjects, except for the treatment itself, as far as ethical considerations permit. The "sham operation," in which surgery is carried out without the experimental treatment itself, is an example. Sham operations are very rare in human studies, but they are more common in animal experiments.
• In field experiments, applying a treatment of interest may physically disturb the plots receiving it and the surrounding areas, perhaps by the researchers trampling. Ideally, the same disturbance should be applied to the control plots.

Often it is desirable to have more than one control group. For example, two control groups, where one is a harmless placebo and the other is the best existing treatment, may be used in a study so that the total effect of the treatment and the improvement of the new treatment over the old may both be measured. However, using resources for multiple controls might reduce the power of the study to test its main hypotheses.

Randomization

Once the treatments have been chosen, the researcher should randomize their assignment to units or subjects in the sample. Randomization means that treatments are assigned to units at random, such as by flipping a coin. Chance rather than conscious or unconscious decision determines which units end up receiving the treatment of interest and which receive the control. A completely randomized design is an experimental design in which treatments are assigned to all units by randomization.

Randomization is the random assignment of treatments to units in an experimental study.

The virtue of randomization is that it breaks the association between possible confounding variables and the explanatory variable, allowing the causal relationship between the explanatory and response variables to be assessed. Randomization doesn't eliminate the variation contributed by confounding variables, only their correlation with treatment. It ensures that variation from confounding variables is spread more evenly between the different treatment groups, and so it creates no bias. If randomization is done properly, any remaining influence of confounding variables occurs by chance alone, which statistical methods can account for.

Randomization should be carried out using a random process. The following steps describe one way to assign treatments randomly:
1. List all n subjects, one per row, in a computer spreadsheet.
2. Use the computer to give each individual a random number.
3. Assign treatment A to those subjects receiving the lowest numbers and treatment B to those with the highest numbers.
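The numbered steps above translate directly into code. A minimal sketch (ours, not the book's; Python's pseudorandom numbers stand in for the spreadsheet's random-number column):

```python
import random

def randomize_two_treatments(subjects, seed=None):
    """Number the subjects at random, then give treatment A to the half
    with the lowest random numbers and treatment B to the rest."""
    rng = random.Random(seed)
    numbered = sorted((rng.random(), s) for s in subjects)  # steps 1-2
    half = len(subjects) // 2
    return {s: ("A" if rank < half else "B")                # step 3
            for rank, (_, s) in enumerate(numbered)}

assignment = randomize_two_treatments(["s1", "s2", "s3", "s4", "s5", "s6"], seed=42)
```

The essential point is that the assignment is determined by the random numbers, not by the researcher, so any confounding variable is associated with treatment only by chance.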

Experimental unit   1    2    3    4    5    6    7    8
Random number       11   18   87   55   76   70   90   4
Treatment           A    A    B    A    B    B    B    A

FIGURE 14.3-1  A procedure for randomization. Each of eight subjects was assigned a number between 0 and 99 that was drawn at random by a computer. Treatment A (colored red) was assigned to the four subjects with the lowest random numbers, whereas treatment B (gold) was assigned to the rest.

This process is demonstrated in Figure 14.3-1, where eight subjects are assigned to two treatments, A and B.

Other ways of assigning treatments to subjects are almost always inferior, because they do not eliminate the effects of confounding variables. For example, the following methods can lead to problems:

• Assign treatment A to all patients attending one clinic and treatment B to all patients attending a second clinic. (Problem: All of the other differences between the two clinics become confounding variables. If one clinic is better than the other in general, then the difference in clinic quality would show up as a difference in treatments.)
• Assign treatments to human participants alphabetically. (Problem: This might inadvertently group individuals having the same nationality or gender, creating unwanted differences between treatments in health histories and genetic variables.)

It is important to use a computer random-number generator or random-number tables to assign individuals randomly to treatments. "Haphazard" assignment, in which the researcher chooses a treatment while trying to make it random, has repeatedly been shown to be non-random and prone to bias.²

2. What do you do if, by chance, the first four of eight units are assigned treatment A and the last four are assigned treatment B, yielding the arrangement AAAABBBB? Some biologists recommend randomizing again to ensure the interspersion of treatments. Randomizing only once is valid, however, unless the first four units are different somehow from the last four.

Blinding

The process of concealing information from participants and researchers about which of them receive which treatments is called blinding. Blinding prevents participants and researchers from changing their behavior, consciously or unconsciously, based on their knowledge of which treatment they were receiving or administering. For example, a researcher who believes that acupuncture helps alleviate back pain might unconsciously interpret a patient's report of pain differently if the researcher knows the patient was assigned the acupuncture treatment instead of a placebo. This might explain why studies that have shown acupuncture has a significant effect on back pain are limited to those without blinding (Ernst and White 1998). Studies implementing blinding have not found that acupuncture has an ameliorating effect on back pain.

In a single-blind experiment, participants are unaware of the treatment they have been assigned. This requires that the treatments be indistinguishable to the participants, a particular necessity in experiments involving humans. Single-blinding prevents participants from responding differently according to their knowledge of their treatment. This is not much of a concern in nonhuman studies.

In a double-blind experiment, the researchers administering the treatments and measuring the response are also unaware of which subjects are receiving which treatments. This prevents researchers who are interacting with the subjects from behaving differently toward them according to their treatments. Researchers sometimes have pet hypotheses, and they might treat experimental subjects in different ways depending on their hopes for the outcome. Moreover, many response variables are difficult to measure and require some subjective interpretation, which makes the results prone to a bias in favor of the researchers' wishes and expectations. Finally, researchers are naturally more interested in the treated subjects than the control subjects, and this increased attention can itself result in improved response. Reviews of medical studies have revealed that studies carried out without double-blinding exaggerated treatment effects by 16% on average, compared with studies carried out with double-blinding (Juni et al. 2001).

Blinding is the process of concealing information from participants (sometimes including researchers) about which individuals receive which treatment.

Experiments on nonhuman subjects are also prone to bias from lack of blinding. Bebarta et al. (2003) reviewed 290 two-treatment experiments carried out on animals or on cell lines. They found that the odds of detecting a positive effect of treatment were more than threefold higher in studies without blinding than in studies with blinding.³ Blinding can be incorporated into experiments on nonhuman subjects by using coded tags that identify the subject to a "blind" observer without revealing the treatment (and then the observer measures units from different treatments in random order).

3. This result might overestimate the effects of a lack of blinding, because experiments without blinding also tended to lack other beneficial design elements, such as randomization (Bebarta et al. 2003).

14.4 How to reduce the influence of sampling error

Assuming we have designed our experiment to minimize sources of bias, there is still the problem of detecting any treatment effects against the background of variation between individuals ("noise") caused by other variables. Such variability creates

sampling error in the estimates, reducing precision and power. How then can sampling error be minimized?

One way to reduce noise is to make the experimental conditions constant: fix the temperature, humidity, and other environmental conditions, for example, and include only participants who are of the same age, sex, genotype, and so on. In field experiments, however, highly constant experimental conditions might not be feasible. Constant conditions might not be desirable, either. By limiting the conditions of an experiment, we also limit the generality of the results: the conclusions might apply only under the conditions tested and not more broadly. Until recently, a significant source of bias in medical practice stemmed from the fact that many clinical trials of the effects of medical treatments were carried out only on men, yet the treatments were subsequently applied to women as well (e.g., McCarthy 1994).

In this section, we review replication, balance, and blocking, the three main statistical design procedures used to minimize the effects of sampling error. We also review a strategy to reduce the effect of noise by using extreme treatments.

Replication

Because of variability, replication (the repetition of every treatment on multiple, independent experimental units) is essential. Without replication, we would not know whether response differences were due to the treatments or just chance differences between the treatments caused by other factors. Studies that use more units (i.e., that have larger sample sizes) will have smaller standard errors and a higher probability of getting the correct answer from a hypothesis test. Larger samples give more information, and more information gives better estimates and more powerful tests.

Replication is the application of every treatment to multiple, independent experimental units.

Replication is not just about the number of plants or animals used. True replication depends on the number of independent units in the experiment. An "experimental unit" is the independent unit to which treatments are assigned. Figure 14.4-1 shows three hypothetical experiments designed to compare the effects of two fertilizer treatments on plant growth. The lack of replication is obvious in the first design (the top panel), because there is only one plant per treatment. You won't see many published experiments like it.

[Figure 14.4-1 panels: "Two pots" (one pot per treatment); "Two chambers" (Chamber 1 and Chamber 2, one treatment per chamber); "Eight replicates" (treatments interspersed among eight pots).]

FIGURE 14.4-1  Three experimental designs used to compare plant growth under two fertilizer treatments (indicated by the shading of the pots). The upper ("two pots") and middle ("two chambers") designs are unreplicated.

The lack of replication is less obvious in the second design (the middle panel of Figure 14.4-1). Although there are multiple plants per treatment in the second design, all plants in one treatment are confined to one chamber and all plants in the other treatment are confined to another chamber. If there are environmental differences between the chambers (e.g., differences in light conditions or humidity) beyond those stemming from the treatment itself, then plants in the same chamber will be more similar in their responses than plants in different chambers, apart from any effects of the treatment. The chamber, not the plant, is the experimental unit in a test of fertilizer effects. Because there are only two chambers, one per treatment, the experiment is unreplicated.

Only the third design (the bottom panel) in Figure 14.4-1 is properly replicated, because here treatments have been randomly assigned to individual plants. A give-away indicator of replication in the third design is the interspersion of experimental units assigned different treatments, which is an expected outcome of randomization. Such interspersion is lacking in the two-chamber design (the middle panel in Figure 14.4-1), which is a clear sign of a replication problem.

An experimental unit might be a single animal or plant if individuals are randomly sampled and assigned treatments independently. Or, an experimental unit might be made up of a batch of individual organisms treated as a group, such as a field plot containing multiple individuals, a cage of animals, a household, a petri dish, or a family. Multiple individual organisms belonging to the same unit (e.g., plants in the same plot, bacteria in the same dish, members of the same family, and so on) should be considered together as a single replicate if they are likely to be more similar on average to each other than to individuals in separate units (apart from the effects of treatment).

Correctly identifying replicates in an experiment is crucial to planning its design and analyzing the results. Erroneously treating the single organism as the independent replicate when the chamber (Figure 14.4-1) or field plot is the experimental unit will lead to calculations of standard errors and P-values that are too small. This is pseudoreplication, as discussed in Interleaf 2.

From the standpoint of reducing sampling error, more replication is always better. As proof, examine the formula for the standard error of the difference between two sample mean responses to two treatments, Ȳ1 and Ȳ2:

    SE(Ȳ1 − Ȳ2) = sqrt[ s_p^2 (1/n1 + 1/n2) ],

where s_p^2 is the pooled sample variance.
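The standard-error formula for the difference between two treatment means is easy to explore numerically (a sketch, ours; the pooled variance s_p^2 is simply set to 1 here):

```python
import math

def se_diff(pooled_var, n1, n2):
    # Standard error of the difference between two treatment means:
    # sqrt( s_p^2 * (1/n1 + 1/n2) )
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# More replication shrinks the standard error...
se_n10 = se_diff(1.0, 5, 5)       # 10 units in total
se_n20 = se_diff(1.0, 10, 10)     # 20 units in total
# ...and for a fixed total (here 20 units), a balanced split beats 19 + 1.
se_lopsided = se_diff(1.0, 19, 1)
```

The last comparison previews the point made in the discussion of balance: with 20 units total, 1/n1 + 1/n2 is 0.2 for a 10 + 10 split but about 1.05 for a 19 + 1 split.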

The symbols n1 and n2 refer to the number of experimental units, or replicates, in each of the two treatments. Based on this equation, increasing n1 and n2 reduces the standard error, increasing precision. Increased precision yields narrower confidence intervals and more powerful tests of the difference between means. On the other hand, increasing sample size also has costs in terms of time, money, and even lives. We discuss how to plan a sufficient sample size in more detail later in this chapter.

Balance

A study design is balanced if all treatments have the same sample size. Conversely, a design is unbalanced if there are unequal sample sizes between treatments.

In a balanced experimental design, all treatments have equal sample size.

Balance is a second way to reduce the influence of sampling error on estimation and hypothesis testing. To appreciate this, look again at the equation for the standard error of the difference between two treatment means (given in the previous subsection). For a fixed total number of experimental units, n1 + n2, the standard error is smallest when the quantity

    (1/n1 + 1/n2)

is smallest, which occurs when n1 and n2 are equal. Convince yourself that this is true by plugging in some numbers. For example, if the total number of units is 20, the quantity 1/n1 + 1/n2 is 0.2 when n1 = n2 = 10, but it is 1.05 when n1 = 19 and n2 = 1. With better balance, the standard error is much smaller.

To estimate the difference between two groups, we need precise estimates of the means of both groups. With an unbalanced design, we may know the mean of one group with great precision, but this does not help us much if we have very little information about the other group that we're comparing it with. Balance allocates our sampling effort in the optimal way.

Nevertheless, the precision of an estimate of a difference between groups still increases with larger sample sizes, even if the sample size is increased in only one of the two groups. But for a fixed total number of subjects, the optimal allocation is to have an equal number in each group.

Balance has other benefits, which we discuss elsewhere in the book. For example, the methods based on the normal distribution for comparing population means are most robust to departures from the assumption of equal variances when designs are balanced or nearly so (see Chapters 12 and 15).

Blocking

Blocking is an experimental design technique used to account for extraneous variation. It divides the experimental units into groups, or blocks, that share common features. Within blocks, treatments are assigned randomly to experimental units. Blocking essentially repeats the same completely randomized experiment multiple times, once for each block, as shown schematically in Figure 14.4-2. Differences between treatments are evaluated only within blocks. In this way, much of the variation arising from differences between blocks is accounted for and won't reduce the power of the study.

FIGURE 14.4-2  An experimental design incorporating blocking to test effects of fertilizer on plant growth (see Figure 14.4-1). Shading of the pots indicates which fertilizer treatment each plant received. Chambers might differ in unknown ways and add unwanted noise to the experiment. To remove the effects of such variation, carry out the same completely randomized experiment separately within each chamber. In this design, each chamber represents one block.

The women participating in the nonoxynol-9 HIV study discussed in Example 14.2 were grouped according to the clinic they attended. This made sense because there were age differences between women attending different clinics as well as differences in condom usage and sexual practices, all of which are likely to affect HIV transmission rates (van Damme et al. 2002). Blocking removes the variation in response among clinics, allowing more precise estimates and more powerful tests of the treatment effects.

Blocking is the grouping of experimental units that have similar properties. Within each block, treatments are randomly assigned to experimental units.

The paired design for two treatments (Chapter 12) is an example of blocking. In a paired design, both of two treatments are applied to each plot or other experimental unit representing a block. The difference between the two responses made on each block is the measure of the treatment effect.

The randomized block design is analogous to the paired design, but it can have more than two treatments, as shown in Example 14.4A.

The randomized block design is like a paired design but for more than two treatments.
I Ihili 1/1,/, !lIIW In I (hili till I"'hlt 11 I (II mpllng rror 437
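The standard-error argument above is easy to check numerically. The following is a minimal sketch (the function name is ours), assuming the usual two-sample formula in which the standard error of a difference between means scales with the square root of 1/n1 + 1/n2:

```python
import math

def se_multiplier(n1, n2):
    """sqrt(1/n1 + 1/n2): the factor by which group sizes scale the
    standard error of a difference between two sample means."""
    return math.sqrt(1 / n1 + 1 / n2)

# Twenty units in total, allocated three different ways:
print(round(se_multiplier(10, 10), 3))  # 0.447 (balanced: smallest SE)
print(round(se_multiplier(15, 5), 3))   # 0.516
print(round(se_multiplier(19, 1), 3))   # 1.026 (badly unbalanced)
```

For a fixed total number of units, the balanced allocation minimizes 1/n1 + 1/n2, so it gives the narrowest confidence interval for the difference between means.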

EXAMPLE 14.4A Holey waters

The compact size of water-filled tree holes, which can harbor diverse aquatic insect larvae, makes them useful microcosms for ecological experiments. Srivastava and Lawton (1998) made artificial tree holes from plastic that mimicked buttress tree holes of European beech trees (see image on right). They placed the plastic holes next to trees in a forest in southern England to examine how the amount of decaying leaf litter present in the holes affected the number of insect eggs deposited (mainly by mosquitoes and hover flies) and the survival of the larvae emerging from those eggs. Leaf litter is the source of all nutrients in these holes, so increasing the amount of litter might result in more food for the insect larvae. There were three different treatments. In one treatment (LL), a low amount of leaf litter was provided. In a second treatment (HH), a high level of debris was provided. In the third treatment (LH), litter amounts were initially low but were then made high after eggs had been deposited. A randomized block design was used in which artificial tree holes were laid out in trios (blocks). Each block consisted of one LL tree hole, one HH tree hole, and one LH tree hole. The location of each treatment within a block was randomized, as shown in Figure 14.4-3.

FIGURE 14.4-3 Schematic of the randomized block design used in the tree-hole study in Example 14.4A. Each block of three tree holes was placed next to its own beech tree in the woods. Within blocks, the three treatments were randomly assigned to tree holes.

As in the paired design, treatment effects in a randomized block design are measured by differences between treatments exclusively within blocks, a strategy that minimizes the influence of variation among blocks.

In the randomized block design, each treatment is applied once to every block. By accounting for some sources of sampling variation, such as the variation among blocks, blocking can make differences between treatments stand out. In Chapter 18, we discuss in greater detail how to analyze data from a randomized block design.

Blocking is worthwhile if units within blocks are relatively homogeneous, apart from treatment effects, and units belonging to different blocks vary because of environmental or other differences. For example, blocks can be made up of any of these units:

• Field plots experiencing similar local environmental conditions
• Animals from the same litter
• Aquaria located on the same side of the room
• Patients attending the same clinic
• Runs of an experiment executed on the same day

One potential drawback to blocking might occur if the effects of one treatment contaminate the effects of the other in the same block. For example, watering one half of a block might raise the soil humidity of the adjacent, unwatered half. Experiments should be designed carefully to minimize contamination.

Extreme treatments

Treatment effects are easiest to detect when they are large. Small differences between treatments are difficult to detect and require larger samples, whereas larger treatment differences are more likely to stand out against random variability within treatments. Therefore, one strategy to enhance the probability of detecting differences in an experiment is to include extreme treatments. Example 14.4B shows why this might be.

EXAMPLE 14.4B Plastic hormones

Bisphenol-A, or BPA, is an estrogenic compound found in plastics widely used to line food and drink containers and in dental sealants. Human daily exposures are typically in the range of 0.5-1 μg/kg body weight (Gray et al. 2004). Sakaue et al. (2001) measured sperm production of 13-week-old male rats exposed to fixed daily doses of BPA between 0 and 2000 μg/kg body weight for six days. The results are shown in a dose-response curve in Figure 14.4-4.

This experiment included doses much higher than the typical doses faced by humans at risk, a strategy that enhanced the ability to detect an effect of BPA. For example, Figure 14.4-4 shows that there was a much larger difference in mean sperm production between the 0 and 2000 μg/kg groups than between the 0 and 0.002 μg/kg treatments. If the experimenter were to design a study to compare just one of these doses with the control, using 2000 μg/kg would yield the most power, because this dose showed the largest difference in mean sperm production relative to the control.
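Assigning treatments at random within each block, as in the tree-hole study, takes only a few lines of code. This sketch is ours, not from the text; only the three treatment labels come from Example 14.4A, and the block count and seed are arbitrary:

```python
import random

treatments = ["LL", "HH", "LH"]   # low, high, and low-then-high leaf litter
n_blocks = 5

random.seed(1)                    # fixed seed so the layout is reproducible
layout = {}
for block in range(1, n_blocks + 1):
    order = treatments[:]         # every treatment appears exactly once per block
    random.shuffle(order)         # randomize positions within the block
    layout[block] = order

for block, order in layout.items():
    print(f"Block {block}: {order}")
```

Shuffling a fresh copy of the treatment list for each block captures the defining constraint of a randomized block design: randomization happens within, not across, blocks.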

FIGURE 14.4-4 A dose-response curve showing the results of an experiment measuring the rates of sperm production of male rats exposed to fixed daily doses of bisphenol-A (BPA) (Sakaue et al. 2001). Symbols are the mean ±1 SE.

A larger dose, or stronger treatment, can increase the probability of detecting a response. But be aware that the effects of a treatment do not always scale up with the magnitude of a treatment. The effects of a large dose may be qualitatively different from those of a smaller, more realistic dose. Still, as a first step, extreme treatments can be a very good way to detect whether one variable has any effect at all on another variable.

Experiments with more than one factor

Up to now, we have considered only experiments that focus on measuring the effects of a single factor. A factor is a single treatment variable whose effects are of interest to the researcher. However, many experiments in biology investigate more than one factor, because answering two questions from a single experiment rather than just one makes more efficient use of time, supplies, and other costs.

Another reason to consider experiments with multiple factors is that the factors might interact. When operating together, the factors might have synergistic or inhibitory effects not seen when each factor is tested alone. For example, human activity has driven global increases in atmospheric CO2 and temperature, as well as increased nitrogen deposition and precipitation. Increases in all of these factors have been shown to stimulate plant growth by experimental studies in which each treatment variable was examined separately. But what are the effects of these factors in combination? The only way to answer this is to design experiments in which more than one factor is manipulated simultaneously. If the climate variables interact when influencing plant growth, then their joint effects can be very different from their separate effects (Shaw et al. 2002).

A factor is a single treatment variable whose effects are of interest to the researcher.

The factorial design is the most common experimental design used to investigate more than one treatment variable, or factor, at the same time. In a factorial design, every combination of treatments from two (or more) treatment variables is investigated.

An experiment having a factorial design investigates all treatment combinations of two or more variables. A factorial design can measure interactions between treatment variables.

The main purpose of a factorial design is to evaluate possible interactions between treatment variables. An interaction between two explanatory variables means that the effect of one variable on the response depends on the state of a second variable. Example 14.5 illustrates an interaction in a factorial design.

EXAMPLE 14.5 Lethal combination

Frog populations are declining everywhere, spawning research to identify the causes. Relyea (2003) looked at how a moderate dose (1.6 mg/l) of a commonly used pesticide, carbaryl (Sevin), affected bullfrog tadpole survival. In particular, the experiment asked how the effect of carbaryl depended on whether a native predator, the red-spotted newt, was also present. The newt was caged and could cause no direct harm, but it emitted visual and chemical cues that are known to affect tadpoles. The experiment was carried out in 10-liter tubs, each containing 10 tadpoles. The four combinations of pesticide treatment (carbaryl vs. water only) and predator treatment (present or absent) were randomly assigned to tubs. For each combination of treatments, there were four replicate tubs. The effects on tadpole survival are displayed in Figure 14.5-1.

FIGURE 14.5-1 Interaction between the effects of the pesticide (carbaryl) and predator (red-spotted newt) treatments on tadpole survival. Each point gives the fraction of tadpoles in a tub that survived. Lines connect mean survival in the two pesticide treatments, separately for each predator treatment.

The tub, not the individual tadpole, is the experimental unit, because tadpoles sharing the same tub are not independent. The results showed that survival was high, except when the pesticide was applied together with the predator; neither treatment

alone had much effect (Figure 14.5-1). Thus, the two treatments, predation and pesticide, seem to have interacted; that is, the effect of one variable depends on the state of the other variable. An experiment investigating the effects of the pesticide only would have measured little effect at this dose. Similarly, an experiment investigating the effect of the predator only would not have seen an effect on survival.

An interaction between two (or more) explanatory variables means that the effect of one variable depends upon the state of the other variable.

A factorial design can still be worthwhile even if there is no interaction between explanatory variables. In this case, there are efficiency advantages because the same experimental units can be used to measure the effect of two (or more) variables simultaneously.

What if you can't do experiments?

Experimental studies are not always feasible, in which case we must fall back upon observational studies. Observational studies can be very important, because they detect patterns and can help generate hypotheses. The best observational studies incorporate all of the features of experimental design used to minimize bias (e.g., simultaneous controls and blinding) and the impact of sampling error (e.g., replication, balance, blocking, and even extreme treatments), except for one: randomization. Randomization is out of the question because, in an observational study, the researcher does not assign treatments to subjects. Instead, the subjects come as they are.

Match and adjust

Without randomization, minimizing bias resulting from confounding variables is the greatest challenge of observational studies. Two types of strategies are used to limit the effects of confounding variables on a difference between treatments in a controlled observational study. One strategy, commonly used in epidemiological studies, is matching. With matching, every individual in the target group with a disease or other health condition is paired with a corresponding healthy individual who has the same measurements for known confounding variables, such as age, weight, sex, and ethnic background (Bland and Altman 1994).

Matching is often used when designing case-control studies. Recall from Chapter 9 that in a case-control study, exposure to one or more possible causal factors is compared between a sample of individuals having a disease (the cases) and a second sample of participants not having the disease (the controls). Matching ensures that the cases and controls are otherwise similar. For example, Dziekan et al. (2000) investigated possible causes of a hospital outbreak of antibiotic-resistant Staphylococcus aureus. The 67 infected patients were each paired with a control individual matched for age, sex, hospital admission date, and admission department.

Matching reduces bias by limiting the contribution of suspected confounding variables to differences between treatments. Unlike randomization in an experiment, matching in an observational study does not account for all confounding variables, only those explicitly used to match participants. Thus, while matching reduces bias, it does not eliminate bias. Matching also reduces sampling error by grouping experimental units into similar pairs, analogous to blocking in experimental studies. It is with such a matched case-control design that the link between smoking and lung cancer was convincingly demonstrated.

With matching, every individual in the treatment group is paired with a control individual having the same or closely similar values for the suspected confounding variables.

In a weaker version of this approach, a comparison group is chosen that has a frequency distribution of measurements for each confounding variable that is similar to that of the treatment group, but no pairing takes place. For example, attention deficit/hyperactivity disorder (ADHD) is often treated with stimulants, such as amphetamines. Biederman et al. (2009) carried out an observational study to examine the psychiatric impacts later in life of stimulant treatment. A sample of ADHD youths who had been treated with stimulants was compared with a control sample of untreated ADHD individuals that was similar to the treated group in the distribution of ages, sex, ethnic background, sensorimotor function, other psychiatric conditions, and IQ.

The second strategy used to limit the effects of confounding variables in a controlled observational study is adjustment, in which statistical methods such as analysis of covariance (Chapter 18) are used to correct for differences between treatment and control groups in suspected confounding variables. For example, LaCroix et al. (1996) compared the incidence of cardiovascular disease between two groups of older adults: those who walked more than four hours per week and those who walked less than one hour per week. The ages of the adults were not identical in the two groups, and this could affect the results. To compensate, the authors examined the relationship between cardiovascular disease and age within each exercise group, so that they could compare the predicted disease rates in the two groups for adults of the same age. This approach is discussed in more detail in Chapter 18.

Choosing a sample size

A key part of planning an experiment or observational study is to decide how many independent units or participants to include. There is no point in conducting a study whose sample size is too small to detect expected treatment effects. Equally, there

is no point in making an estimate if the confidence interval for the estimate is expected to be extremely broad because of a small sample size. Using too many participants is also undesirable, because each replicate costs time and money, and adding one more might put yet another individual in harm's way, depending on the risks. If the treatment is unsafe, as the spermicide nonoxynol-9 appears to be (Example 14.2), then we want to injure as few people or animals as possible in coming to this conclusion. Ethics boards and animal-care committees require researchers to justify the sample sizes for proposed experiments. How is the decision made? Here in Section 14.7, we answer this question for two objectives: when the goal is to achieve a predetermined level of precision of an estimate of treatment effect, or when the goal is to achieve predetermined power in a test of the null hypothesis of no treatment effect. We focus here on techniques for studies that compare the means of two groups. Formulas to help plan experiments for some other kinds of data are given in the Quick Formula Summary (Section 14.9).

An important part of planning an experiment or observational study is choosing a sample size that will give sufficient power or precision.

Plan for precision

A frequent goal of studies in biology is to estimate the magnitude of the treatment effect as precisely as possible. Planning for precision involves choosing a sample size that yields a confidence interval of expected width. Typically, we hope to set the bounds as narrowly as we can afford.

By way of example, let's develop a plan for a two-treatment comparison of means. Let the unknown population mean of the response variable be μ1 in the treatment group of interest and μ2 in the control group. When the results are in, we will compute the sample means Ȳ1 and Ȳ2 and use them to calculate a 95% confidence interval for μ1 - μ2, the difference between the population means of the treatment and control groups. To simplify matters somewhat, we will assume that the sample sizes in both treatments are the same number, n. Let's also assume that the measurement in both populations is normally distributed and has the same standard deviation, σ.

In this case, a 95% confidence interval for μ1 - μ2 will take the form

Ȳ1 - Ȳ2 ± margin of error,

where "margin of error" is half the width of the confidence interval. Planning for precision involves deciding in advance how much uncertainty we can tolerate. Once we've decided that, then the sample size needed in each group is approximately

n ≈ 8 (σ / margin of error)².

This formula is derived from the 2SE rule of thumb that was introduced in Section 4.3.² According to this formula, a larger sample size is needed if σ, the standard deviation within groups, is large than if it is small. Additionally, a larger sample size is needed to achieve a high precision (a narrow confidence interval) than to achieve a low precision.

2. The margin of error is approximately twice the standard error (2SE) of the difference between sample means.

A major challenge in planning sample size is that key factors, like σ, are not known. Typically, a researcher makes an educated guess for these unknown parameters based on pilot studies or previous investigations. (If no information is available, then consider carrying out a small pilot study first, before attempting a large experiment.)

For example, let's plan an experiment to measure the effect of diet on the eye span of male stalk-eyed flies (Example 11.2). The planned experiment will randomly place individual fly larvae into cups containing either corn or spinach. The target parameter is the difference between mean eye spans in the two diet treatments, μ1 - μ2. Assume that we would like to obtain a 95% confidence interval for this difference whose expected margin of error is 0.1 mm (i.e., the desired full width of the confidence interval is 0.2 mm). How many male flies should be used in each treatment to achieve this goal?

Our sample estimate for σ was about 0.4, based on the sample of nine individuals in Example 11.2. Using these values gives

n ≈ 8 (0.4 / 0.1)² = 128.

This is the sample size in each treatment, so the total number of male flies would be 256. At this point, we would need to decide whether this sample size is feasible in an experiment. If not, then there might be no point in carrying out the experiment. Alternatively, we could revisit the desired width of the 95% confidence interval. That is, could we be satisfied with a higher margin of error? If so, then we should decide on this new width and then recalculate n.

After all this planning, imagine that the experiment is run and we now have our data. Will the confidence interval we calculate have the precision we planned for? There are two reasons that it probably won't. First, 0.4 was just an educated guess for the value of σ to help our planning, and it was based on only nine individuals. The true value of σ in the population might be larger or smaller. Second, even if we were lucky and the true value of σ really is close to 0.4, the within-treatment standard deviation s from the experiment will not equal 0.4 because of sampling error. The resulting confidence interval will be narrower or wider accordingly. The probability that the width of the resulting confidence interval is less than or equal to the desired width is only about 0.5. To increase the probability of obtaining a confidence interval no wider than the desired interval width, we would need an even larger sample size.

Figure 14.7-1 shows the general relationship between the expected precision of the 95% confidence interval and n, the sample size in each of two groups. The variable on the vertical axis is standardized as margin of error divided by σ. The effect of sample size from n = 2 to n = 20 is shown.

FIGURE 14.7-1 Expected precision of the 95% confidence interval for the difference between two treatment means depending on sample size n in each treatment. The vertical axis is given in standardized units, (margin of error)/σ. We calculated the expected confidence interval using the t-distribution, rather than with the 2SE approximation.

The graph shows that very small sample sizes lead to very wide interval estimates of the difference between treatment means. More data gives better precision. Notice also that interval precision initially declines rapidly with increasing sample size (e.g., from n = 2 to n = 10), but it then declines more slowly (e.g., from n = 10 to n = 20). Precision is 0.63 at n = 20, but it drops to 0.40 by n = 50, to 0.28 by n = 100, and to 0.20 by n = 200. Thus, we get diminishing returns by increasing the sample size past a certain point.

Plan for power

Next we consider choosing a sample size based on a desired probability of rejecting a false null hypothesis; that is, planning a sample based on a desired power. Imagine, for example, that we want to test the following hypotheses on the effect of diet on eye span in stalk-eyed flies.

H0: μ1 - μ2 = 0.
HA: μ1 - μ2 ≠ 0.

The null hypothesis is that diet has no effect on mean eye span. The power of this test is the probability of rejecting H0 if it is false. Planning for power involves choosing a sample size that would have a high probability of rejecting H0 if the true absolute value of the difference between the means, |μ1 - μ2|, is at least as great as a specified value D. The value for D won't be the true difference between the means; it is just the minimum we care about. By specifying a value for D in a sample size calculation, we are deciding that we aren't much interested in rejecting the null hypothesis of no difference if |μ1 - μ2| is smaller than D.

A conventional power to aim for is 0.80. That is, if H0 is false, we aim to demonstrate that it is false in 80% of the experiments (the other 20% of experiments would fail to reject H0 even though it is false). If we aim for a power of 0.80 and a conventional significance level of α = 0.05, then a quick approximation to the planned sample size n in each of two groups is

n ≈ 16 (σ / D)²

(Lehr 1992). This formula assumes that the two populations are normally distributed and have the same standard deviation (σ), which we are forced to assume is known. A more exact formula is provided in the Quick Formula Summary (Section 14.9), which also allows you to choose other values for power and significance level.

For a given power and significance level, a larger sample size is needed when the standard deviation σ within groups is large, or if the minimum difference that we wish to detect is small.

Let's return to our experiment to test the effect of diet on the eye span of male stalk-eyed flies. We would like to reject H0 at α = 0.05 with probability 0.80 if the absolute value of the difference between means were truly D = |μ1 - μ2| = 0.2 mm. How many males should be used in each treatment?

Let's assume again that σ = 0.4. Using this value in the equation for power gives

n = 16 (0.4 / 0.2)² = 64.

This is the number in each treatment, so the total number of males needed in the experiment would be 128.

These power calculations assume that we know the standard deviation (σ), which is stretching the truth. For this and other reasons, we must always view the results of power calculations with a great deal of caution. The calculations provide useful guidelines, but they do not give infallible answers.

We have explored only the sample sizes needed to compare the means of two groups, but similar methods are available for other kinds of statistical comparisons as well. Sample sizes for desired precision and power are available for one- and two-sample means, proportions, and odds ratios in the Quick Formula Summary (Section 14.9). A variety of computer programs are available to calculate sample sizes when planning for power and precision. A good place to start investigating these programs is http://www.divms.uiowa.edu/-rlenth/Power/.

Plan for data loss

The methods given here in Section 14.7 for planning sample sizes refer to sample sizes still available at the end of the experiment. But some experimental individuals may die, leave the study, or be lost between the start and the end of the study. The starting sample sizes should be made even larger to compensate.

Summary

• In an experimental study, the researcher assigns treatments to subjects.
• The purpose of an experimental study is to examine the causal relationship between an explanatory variable, such as treatment, and a response variable.
• The virtue of experiments is that the effects of treatment can be isolated by randomization from the effects of confounding variables.

• A confounding variable masks or distorts the causal relationship between an explanatory variable and a response variable in a study.
• A clinical trial is an experimental study involving human participants.
• Experiments should be designed to minimize bias and limit the effects of sampling error.
• Bias in experimental studies is reduced by the use of controls, by randomizing the assignment of treatments to experimental units, and by blinding.
• In a completely randomized experiment, treatments are assigned to experimental units by randomization. Randomization reduces the bias caused by confounding variables by making nonexperimental variables equal (on average) between treatments.
• The effect of sampling error in experimental studies is reduced by replication, by blocking, and by balanced designs.
• A randomized block design is like a paired design but for more than two treatments.
• The use of extreme treatments can increase the power of the experiment to detect a treatment effect.
• Observational studies should employ as many of the strategies of experimental studies as possible to minimize bias and limit the effect of sampling error.
• Although randomization is not possible in observational studies, the effects of confounding variables can be reduced by matching and by adjusting for differences between treatments in known confounding variables.
• A factorial design is used to investigate the interaction between two or more treatment variables. The factorial design includes all possible combinations of the treatment variables.
• When planning an experiment, the number of experimental units to include can be chosen so as to achieve the desired width of confidence interval for the difference between treatment means.
• Alternatively, the number of experimental units to include when planning an experiment can be chosen so that the probability of rejecting a false H0 (power) is high for a specified magnitude of the difference between treatment means.
• Compensate for possible data loss when planning sample sizes for an experiment.

Quick Formula Summary

Planning for precision

Planned sample size for a 95% confidence interval of a proportion

What is it for? To set the sample size of a planned experiment to achieve approximately a specified half-width ("margin of error") of a 95% confidence interval for a proportion p.

What does it assume? The population proportion p is not close to zero or one, and n is large.

Formula: n ≈ 4p(1 - p) / (margin of error)², where p is the proportion being estimated and "margin of error" is the half-width of the confidence interval for the proportion p. For the most conservative scenario, set p = 0.50 when calculating n. The symbol ≈ stands for "is approximately equal to."

Planned sample size for a 95% confidence interval of a log-odds ratio

What is it for? To set the sample size n in each of two groups for a planned experiment to achieve approximately a specified half-width ("margin of error") of a 95% confidence interval for a log-odds ratio, ln(OR).

What does it assume? Sample size n is the same in both groups.

Formula: n ≈ [4 / (margin of error)²] (1/p1 + 1/(1 - p1) + 1/p2 + 1/(1 - p2)), where "margin of error" is the half-width of the confidence interval for ln(OR), and p1 and p2 are the probabilities of success in the two treatment groups.

Planned sample size for a 95% confidence interval of the difference between two proportions

What is it for? To set the sample size n in each of two groups for a planned experiment to achieve approximately a specified half-width ("margin of error") of a 95% confidence interval for a difference between two proportions, p1 - p2. This is an alternative approach to the log-odds ratio to compare the proportion of success between the two groups.
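The sample-size approximations from Section 14.7 and this summary can be collected into a few helper functions. This is a sketch under the same assumptions as the formulas (95% confidence, equal group sizes, known σ); the function names are ours:

```python
def n_for_mean_difference(sigma, margin_of_error):
    """Per-group n for a 95% CI of a difference between two means:
    n ~ 8 * (sigma / margin of error)^2."""
    return 8 * (sigma / margin_of_error) ** 2

def n_for_power(sigma, d):
    """Per-group n for 80% power at alpha = 0.05 (Lehr's rough rule):
    n ~ 16 * (sigma / D)^2."""
    return 16 * (sigma / d) ** 2

def n_for_proportion(p, margin_of_error):
    """n for a 95% CI of a single proportion: n ~ 4 p (1 - p) / margin^2."""
    return 4 * p * (1 - p) / margin_of_error ** 2

# Stalk-eyed fly planning values used in the text:
print(round(n_for_mean_difference(0.4, 0.1)))  # 128 flies per diet treatment
print(round(n_for_power(0.4, 0.2)))            # 64 flies per diet treatment

# Most conservative proportion scenario (p = 0.50), margin of error 0.05:
print(round(n_for_proportion(0.5, 0.05)))      # 400
```

As the text cautions, these are planning guidelines only: σ is an educated guess, and the starting sample sizes should be inflated further to allow for data loss.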