HYPOTHESES
HYPOTHESIS TESTING
Errors in Hypothesis Testing
The Probability of a Type I Error
The Probability of a Type II Error
Why We Don't Accept the Null Hypothesis
HOW TO DO SCIENCE
SUMMARY
[...] operational definition [...] researcher will measure [...] list of [...]
Together, these [...] manner. [...]
An operational definition, of course, has to [...] participants that they
will [...] understand the definition [...]
[...] operational definition of [...] speaking in public [...] find it
especially stressful. [...] correct responses on a 25-question [...] a lecture.
[...] study entirely.
There are numerous other potential confounds; each can threaten the
internal validity of a study. We'll discuss more threats to internal validity
in later chapters, especially chapters 5 and 6.
External Validity
Another important goal of research projects is external validity. [...]
[...] experimenter might affect how long the participant is willing to tolerate the
pain of the ice water. For instance, male participants may withstand the
pain for a longer period of time in front of a male experimenter than in
front of a female. Also, male participants may feel honor-bound to sustain
pain longer than would female participants. How should the researcher
deal with these problems? Should both male and female experimenters be
used? The sex of the experimenter could be balanced across the control
and experimental groups, and the researcher could also make certain that
half of the people in each group are tested by a member of the same sex
and half by a member of the opposite sex. Should both male and female
participants be involved in the study, or should it be limited to only one
sex? Using both male and female participants and male and female
experimenters increases the external validity of the experiment, but
complicates the design of the study and requires more participants and
more [...]
[...] of the population. The larger the population represented by the sample
of participants, the greater the external validity of the study. An effective
procedure is to identify participants from a population by random
selection. In random selection, all members of the population are equally
likely to be chosen. This procedure maximizes the probability that the
sample is representative of the population, as long as a sufficient number
of participants are chosen. Choosing five people from 5,000 possible
participants is not likely to yield a sample that is representative of the
population.
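The point about sample size can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population of 5,000 scores is made up (drawn from a normal distribution with mean 100 and standard deviation 15), and `spread_of_sample_means` is a hypothetical helper, not something from the text. It shows how much sample means bounce around when only five people are selected compared with 500.

```python
import random
from statistics import mean, pstdev

rng = random.Random(7)  # fixed seed only so the sketch is repeatable

# Hypothetical population of 5,000 scores (assumed mean 100, SD 15).
population = [rng.gauss(100, 15) for _ in range(5000)]

def spread_of_sample_means(n, draws=500):
    """Randomly select n members many times and return how widely
    the resulting sample means vary (their standard deviation)."""
    means = [mean(rng.sample(population, n)) for _ in range(draws)]
    return pstdev(means)

small = spread_of_sample_means(5)    # five people per sample
large = spread_of_sample_means(500)  # five hundred people per sample

print(f"Spread of sample means with n = 5:   {small:.2f}")
print(f"Spread of sample means with n = 500: {large:.2f}")
```

The means of five-person samples vary far more than the means of 500-person samples, which is one way to see why a very small random sample is unlikely to represent the population well.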
More often than not, participants are not selected randomly; instead,
they come from a readily available pool of potential volunteers, such as
college students. It is very common for researchers to solicit volunteers
from introductory psychology classes. This type of sample is called a
convenience sample (or an accidental sample). In convenience sampling,
participants are not randomly chosen, but instead happen to be in the
right place at the right time. Once a group of volunteers has been
identified, the participants are assigned to different experimental conditions
(the different levels of the independent variable). The most common way
of assigning the participants to the conditions is by random assignment.
Random assignment is the use of a procedure (perhaps as simple as
flipping a coin) such that each participant is equally likely to be assigned to
any of the conditions. Notice how this differs from random selection.
Random selection describes how participants are chosen from the
population; random assignment describes how participants are assigned to
experimental conditions.
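The difference between random selection and random assignment can be sketched in a few lines of code. Everything named here is an assumption for illustration: the population of 5,000 labeled people, the sample size of 40, and the helper functions `random_selection` and `random_assignment` are hypothetical. Note also that the assignment step shuffles the participants and deals them out in turn, a balanced variant of the coin flip described above that keeps the groups equal in size.

```python
import random

def random_selection(population, n, rng):
    """Choose n members so that every member of the population
    is equally likely to be chosen."""
    return rng.sample(population, n)

def random_assignment(participants, conditions, rng):
    """Shuffle the participants, then deal them out to the conditions
    in turn, so each participant is equally likely to land in any one."""
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
population = [f"person_{i}" for i in range(5000)]  # hypothetical population

# Step 1: random selection decides WHO is in the study.
sample = random_selection(population, 40, rng)

# Step 2: random assignment decides WHICH CONDITION each participant gets.
groups = random_assignment(sample, ["control", "experimental"], rng)

print({condition: len(members) for condition, members in groups.items()})
```

With a convenience sample, step 1 would be replaced by whoever volunteers; step 2 can still be carried out in exactly the same way.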
Does convenience sampling automatically reduce external validity? It
depends on the research. If a researcher is investigating the political
concerns of 18- to 22-year-olds, then using only college students of that age
range will limit the external validity of the study; the results cannot
be [...]
HYPOTHESES
The other half of the fun in research is learning new things by testing
your ideas. Suppose that a researcher is interested in the relationship
between summer programs and the intelligence of grade-school children.
In particular, this researcher wishes to know whether those who participate in a summer program where students can pick from among a number of intellectual topics are smarter than most people. This is the research
question. On the basis of this question, the researcher forms one or more
hypotheses (or predictions). In this case, the researcher may hypothesize
that the IQ scores of students in the summer program will be higher than
those of the population in general. This is the researcher's hypothesis.
To be precise, two hypotheses are involved because there are two sides
to every question: what the researcher expects and what the researcher
does not expect. One of these hypotheses is called the null hypothesis
(represented by H0), and the other is called the alternative (or research)
hypothesis (represented by H1 or sometimes Ha). The null hypothesis is
the prediction that there is no difference between the groups being compared. We would expect the null hypothesis to be correct if the population
from which the sample is taken is the same as the population with which it
is being compared. In our example, if the students in the summer program
are actually a representative sample of the general population, the students' IQ scores will be roughly equivalent to the IQ scores of the general
population. The null hypothesis is typically what the researcher does not
expect to find; a researcher does not usually predict the null hypothesis.
The alternative hypothesis is the prediction the researcher makes
about the results of the research. It states that there is a difference between
the scores of the groups being compared. In other words, it states that the
sample is not representative of that particular population's scores, but
instead better represents some other population's scores. There are two [...]
[...] group will score higher or lower. This is called a two-tailed hypothesis.
To clarify why it is said to be two-tailed, consider the normal curve in
figure 3.1. In the middle of the curve is the population mean; in the case of
the IQ example, that would be 100. If a sample mean (an average of the
sample members' IQ scores) were much higher than 100, it would fall far
to the right of the mean, up in the positive tail of the distribution. If a
sample mean were much lower than 100, it would fall far to the left of the
mean, down in the negative tail of the distribution. If a researcher simply
predicts that a sample mean will be different from the population mean
and does not predict whether it will be higher or lower, the researcher is
predicting that it will fall in one of the two tails of the distribution. Thus,
an alternative hypothesis that does not predict the direction of the
difference is called a two-tailed hypothesis.
[Figure 3.1: normal curve of IQ scores; axis values 55, 70, 100, 115, 130, 145]
As you may have guessed, if the researcher predicts the direction of [...]
HYPOTHESIS TESTING
Although scientific research is designed to determine if the alternative hypothesis is supportable, hypothesis testing actually involves testing the null hypothesis, not the alternative hypothesis. If the difference
between the groups being compared is so large that the difference is
unlikely to have been caused by chance, then the groups being compared
are unlikely to represent the same population and the null hypothesis is
rejected. If the null hypothesis is rejected, the alternative hypothesis is supported. On the other hand, if the difference between the groups is so small
that it could easily have occurred simply by chance, we
fail to reject the null hypothesis. If the null hypothesis is not rejected, the
alternative hypothesis cannot be supported.
In our example, the researcher has predicted that the mean IQ scores
of summer-program students will be greater than the population mean of
100. This is a one-tailed alternative hypothesis. The null hypothesis is
always that there is no difference between the groups being compared. In
this case, the null hypothesis is that the sample mean will be no different
from the general population mean. If we collect our data and find a mean
that is greater than 100 (the mean IQ for the general population) by more
than could reasonably be expected by chance, then we can reject the null
hypothesis. When we do this, we are saying that the null hypothesis is
wrong. Because we have rejected the null hypothesis and because the
sample mean is greater than the population mean, as was predicted, we
support our alternative hypothesis. In other words, the evidence suggests
that the sample of summer-program students represents a population
that scores higher on the IQ test than the general population.
On the other hand, if we collect our data and the mean IQ score does
not differ from the population mean by more than could reasonably be
expected by chance, then we fail to reject the null hypothesis and also fail
to support our alternative hypothesis.
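The decision described above can be sketched as a small calculation. The text does not name a specific statistical test, so this sketch assumes a one-tailed z test, a population standard deviation of 15 for IQ scores, and α = .05; the sixteen IQ scores are invented for illustration, and `one_tailed_z_test` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_tailed_z_test(scores, pop_mean=100.0, pop_sd=15.0, alpha=0.05):
    """Test whether the sample mean is greater than the population mean
    by more than could reasonably be expected by chance."""
    n = len(scores)
    standard_error = pop_sd / sqrt(n)          # spread of sample means
    z = (mean(scores) - pop_mean) / standard_error
    p_value = 1.0 - NormalDist().cdf(z)        # chance of a mean this high under H0
    return z, p_value, p_value < alpha

# Hypothetical IQ scores for 16 summer-program students.
scores = [112, 105, 118, 99, 121, 108, 115, 103,
          110, 117, 96, 109, 114, 107, 120, 102]

z, p_value, reject = one_tailed_z_test(scores)
decision = "reject H0" if reject else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}: {decision}")
```

If the p value falls below the chosen α, the null hypothesis is rejected and the alternative hypothesis is supported; otherwise we fail to reject the null hypothesis.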
[...] may conclude that two populations differ when in fact they do not.
Another possible error is to find no difference in a study when a difference between the populations truly exists.
For any research problem there are two possibilities: either the
null hypothesis is correct and there is no difference between the populations, or the null hypothesis is false and there is a difference between
the populations. The researcher, however, never knows the truth. Look
at figure 3.2. Along the top is the truth (which the researcher can never
know), and along the left side are the researcher's two decision choices:
to reject the null hypothesis or fail to reject it. This allows four possible
outcomes: two ways for the researcher to be correct, and two ways to be
wrong.
A Type I error is to find a difference between the groups being
compared that does not truly exist in the population. Regardless of how
well designed a study might be, a difference is sometimes detected
between sample groups that does not reflect an actual difference in the
populations. For example, we might find that the mean IQ score of our
sample of summer-program students is higher than that of the general
population. But perhaps our sample of summer-program students just
happened to be bright students, and there truly isn't a difference between
the IQ scores of the overall population of summer-program students and
the general public. Because a difference was identified that does not truly
exist, we have made a Type I error. To the extent that the results of a study
have immediate ramifications (for instance, if important changes to the
curriculum are made on the basis of IQ scores), Type I errors can be very
serious indeed.
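A simulation can make the idea of a Type I error concrete. This is a sketch under stated assumptions, not a procedure from the text: it assumes the null hypothesis is true (every sample really comes from a population with mean 100 and standard deviation 15), uses a one-tailed z test with α = .05, and counts how often the test nonetheless finds a "difference." The function `type_i_error_rate` and all of its parameters are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def type_i_error_rate(trials=4000, n=25, pop_mean=100.0, pop_sd=15.0,
                      alpha=0.05, seed=1):
    """Draw many samples from a population in which H0 is true and
    return the proportion of samples that wrongly lead to rejecting H0."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1.0 - alpha)  # one-tailed cutoff
    standard_error = pop_sd / sqrt(n)
    false_rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        z = (mean(sample) - pop_mean) / standard_error
        if z > critical_z:
            false_rejections += 1  # a difference was "found" that does not exist
    return false_rejections / trials

rate = type_i_error_rate()
print(f"Proportion of false rejections: {rate:.3f}")
```

Because every sample here comes from the null population, each rejection is a Type I error, and over many repetitions their proportion should land close to the chosen α.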
Figure 3.2 [decision table; surviving labels: Reject H0, Fail to reject H0, Type I error (α)]
[...] chance of our sample mean being drawn from that population. Having
chosen α = .05, we then reject H0 and support H1. The 5% of the
distribution that is shaded in figure 3.3 represents α and is called the region of
rejection. If a score falls within the region of rejection, the null hypothesis
is rejected.
Figure 3.3 Region of rejection for a one-tailed hypothesis
A researcher collects information on family size and concludes, on the basis of the
data, that Midwestern families are larger than the average family in the United States.
However, unbeknownst to the researcher, the sample includes several unusually large
families, and in reality, Midwestern families are no larger than the national average. What
type of error was made?
In our example, the alternative hypothesis was one-tailed. With a
one-tailed hypothesis, the region of rejection lies at one end of the
distribution. For a two-tailed hypothesis, the region of rejection is split equally
between the two tails: 2.5% in one tail and 2.5% in the other tail when α
= .05 (figure 3.4). If our sample mean is so great that it falls in the top 2.5%
of the sampling distribution for the general public, or it is so small that it
falls in the bottom 2.5% of the sampling distribution, then we infer that
there was only a 5% chance (5% because the two regions of rejection add
up to 5% of the distribution) of our sample mean being drawn from that
population. Having chosen α = .05, we then reject H0 and support H1.
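The regions of rejection can be expressed as cutoff values on the standard normal curve. This is a minimal sketch, assuming the test statistic is a z score and that a one-tailed prediction points to the upper tail (as in the IQ example); `rejection_cutoffs` and `in_region_of_rejection` are hypothetical helper names.

```python
from statistics import NormalDist

def rejection_cutoffs(alpha=0.05, tails=1):
    """Return the z cutoff(s) that mark the region of rejection."""
    std_normal = NormalDist()
    if tails == 1:
        # All of alpha sits in the upper tail (assumed direction).
        return (std_normal.inv_cdf(1.0 - alpha),)
    # Two-tailed: alpha is split equally between the two tails.
    half = alpha / 2.0
    return (std_normal.inv_cdf(half), std_normal.inv_cdf(1.0 - half))

def in_region_of_rejection(z, alpha=0.05, tails=1):
    """True if a z score falls inside the region of rejection."""
    cutoffs = rejection_cutoffs(alpha, tails)
    if tails == 1:
        return z > cutoffs[0]
    lower, upper = cutoffs
    return z < lower or z > upper

print("one-tailed cutoff:", round(rejection_cutoffs(0.05, 1)[0], 3))
print("two-tailed cutoffs:", [round(c, 3) for c in rejection_cutoffs(0.05, 2)])
print("z = 1.8, one-tailed:", in_region_of_rejection(1.8, tails=1))
print("z = 1.8, two-tailed:", in_region_of_rejection(1.8, tails=2))
```

The same z score can fall inside the region of rejection for a one-tailed hypothesis and outside it for a two-tailed hypothesis, because splitting α between two tails pushes each cutoff farther from the mean.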
Figure 3.4
The Probability of a Type II Error
The probability of making a Type II error is called beta (β). Beta is a
measure of the likelihood of not finding a difference that truly exists. The
opposite of β is called power and is calculated as 1 - β. Power is the
likelihood of finding a true difference. In general, researchers want to
design [...]
[Figure 3.5: distributions labeled H0 and H1] [...] of summer-program students if they do score significantly higher than the general public.
The darker shaded area labeled "alpha" is the top 5% of the null
hypothesis distribution. If our sample's mean is so large that it falls
within the top 5% of the null hypothesis distribution, then we say that it is
unlikely to belong to that distribution, and we reject the null hypothesis.
The lighter shaded area of the alternative hypothesis distribution (to
the left of alpha) represents beta. This is the probability of making a Type
II error. If a mean is too small to land in the region of rejection, but
actually does belong to a separate population, then it will fall within the beta
region. The researchers will fail to reject H0, even though it is false, and
thus will make a Type II error. Whenever possible, a researcher attempts
to increase the power of a project, in order to increase the likelihood of
rejecting a false null hypothesis.
Consider figure 3.5 again. Power can be increased by reducing beta
(imagine moving the line delineating beta and alpha to the left), and beta
can be reduced by increasing alpha (moving the line delineating beta and
alpha to the left). Often, however, increasing alpha is not a realistic
option. Only rarely will a researcher (and perhaps more importantly, the
researcher's colleagues) trust the results of a study where alpha is
greater than .05. A relatively simple way for a researcher to increase the
power of a study is to increase the sample size. The larger the size of the
samples in a study, the easier it is to find a significant difference using
statistical tests. Statistically, a small difference may indicate a significant
difference if many participants were involved in the study. If the same small
difference is based on only a few participants, the statistical test is more
likely to suggest that the results could have happened just by chance.
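The relationship among alpha, beta, power, and sample size can be worked through numerically. The sketch below rests on assumptions that are not in the text: a one-tailed z test, a population standard deviation of 15, and a true summer-program mean of 105 (a made-up value for the alternative hypothesis distribution). The helper `power_one_tailed` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_one_tailed(true_mean, n, pop_mean=100.0, pop_sd=15.0, alpha=0.05):
    """Power (1 - beta) of a one-tailed z test when the sample really
    comes from a population whose mean is true_mean."""
    standard_error = pop_sd / sqrt(n)
    # The line dividing beta from alpha, expressed as a sample mean.
    critical_mean = pop_mean + NormalDist().inv_cdf(1.0 - alpha) * standard_error
    # Beta: the part of the alternative distribution left of that line.
    beta = NormalDist(true_mean, standard_error).cdf(critical_mean)
    return 1.0 - beta

# Larger samples: the same 5-point difference becomes easier to detect.
for n in (10, 25, 50, 100):
    print(f"n = {n:3d}: power = {power_one_tailed(105.0, n):.3f}")

# Larger alpha: beta shrinks, so power grows (at the cost of more Type I errors).
print(f"alpha = .10, n = 25: power = {power_one_tailed(105.0, 25, alpha=0.10):.3f}")
```

Under these assumptions, power climbs steadily as the sample size grows, which matches the point that increasing the sample size is usually a more acceptable route to power than raising alpha.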
[...] of the population [...]
[...] confound in our study caused our results to come out differently than
expected. Any of these reasons and more could cause a Type II error and
make us fail to reject the null hypothesis when it is false. Of course, there
is also the possibility that we failed to reject the null hypothesis because
the null hypothesis is actually true. How can we tell if the null hypothesis
is true or if we have made a Type II error? We can't, and for this reason it
is risky to accept the null hypothesis as true when no difference is
detected. Similarly, it is risky to predict no difference between our sample
and the population. If we find no difference, we cannot know by this one
study if it is because our prediction was accurate or because we made a
Type II error.
If a researcher [...]
[...] significantly greater than the population mean by chance. However, neither can be totally confident that his or her explanation for the
results is correct. Rejection of the null hypothesis and support of the alternative hypothesis lend confidence to the results found, but not to the
explanation given. The explanation may or may not be correct; it is vulnerable to all of the subjectivity, wishful thinking, and faulty reasoning
that humans are heir to. The best explanation will emerge only after other
carefully designed investigations have been conducted.
scores be
HOW TO DO SCIENCE
Conducting scientific research, like any project, involves a series of
steps. In general, the steps are the same for any scientific research project;
only the specifics differ from project to project. These steps are outlined in
figure 3.6.
The first step is to identify the topic to be studied. At first this can be
somewhat difficult, not because there are so few topics to choose from,
but because there are so many.
One way to begin is to think about courses you have had in psychology and related fields. What courses were your favorites? Thumb
Figure 3.6 The steps in conducting scientific research
[circular diagram; surviving step labels: Identify a topic, Design the study (step 4), Collect data, Analyze data, Communicate results]
SUMMARY
Conducting scientific research involves being precise about what is
studied and how it is studied so that confounds can be avoided. An [...]
[...] Conducting research is not a linear task, but instead tends to be
circular; the results of one study affect the way in which the next study is
designed and interpreted.
alternative (or research) hypothesis
beta (β)
confound
convenience (or accidental) sample
demand characteristics
experimenter effect
external validity
extraneous variables
internal validity
null hypothesis
one-tailed hypothesis
population
power
random assignment
random selection
region of rejection
reliability
sample
sampling distribution
significant difference
statistical analysis
two-tailed hypothesis
Type I error
Type II error
validity
operational definition
EXERCISES
1. A researcher wishes to look at the effect of stress on fidgeting. What
terms need to be operationally defined? What are some possible
operational definitions?
2. What is the difference between reliability and validity? Can a study
be valid and not reliable? Can a study provide reliable data but not
valid data?
3. Suppose I have conducted a study in which participants were asked to
perform a mood induction task that created a happy, sad, or neutral
mood. The participants were then asked to complete a questionnaire
about their sense of wellness. Why might demand characteristics be a
problem in this study? How could demand characteristics affect the
results?
4. A researcher wants to create a random sample of students at Smart U.
A friend suggests that the researcher walk across campus and
approach every third person she encounters. Is this a random sample?
If not, what type of sample is it? Can you develop a procedure to
create a random sample?
5. What is the difference between a null hypothesis and an alternative
hypothesis?
6. I want to investigate the effect of chocolate on mood. One group of
participants eats a chocolate bar before completing a mood scale; the other
participants complete the mood scale without first eating chocolate. I