
The Research Process

RELIABILITY AND VALIDITY
  Internal Validity and Confounds
  External Validity
HYPOTHESES
HYPOTHESIS TESTING
  Errors in Hypothesis Testing
  The Probability of a Type I Error
  The Probability of a Type II Error
  Why We Don't Accept the Null Hypothesis
HOW TO DO SCIENCE
SUMMARY

The scientific method is the process by which scientists, including


psychologists, collect information and draw conclusions about their disciplines. In this method, observations are made systematically and objectively, so that the results will be as meaningful as possible.
When using the scientific method in psychology, the researcher often
tries to determine the effect of some factor on some type of behavior. In
other words, the researcher wants to know if a change in an independent
variable will cause a change in a dependent variable.
It is important to be precise and concrete when designing a study
using the scientific method. This precision and clarity allows the
researcher to more readily foresee pitfalls, ambiguity, and confounds that
could render the results meaningless.
One important way to avoid confounds and ambiguity in research is
by carefully defining all of the important concepts. Perhaps a researcher is
interested in the effect of stress on work efficiency. The researcher plans to study this effect by inducing stress in half of the participants and then measuring all of the participants' performance on some task. The first step is to define the terms "stress" and "work efficiency." Dictionary definitions are not precise enough for a researcher's needs. What is required is an operational definition, a definition that tells the reader exactly what was done to produce a phenomenon or to measure some variable. In this example, the researcher needs to explain exactly how stress will be induced in one group of participants and how performance will be measured. The researchers may intend to induce stress by telling participants that they will be videotaped as they give an impromptu speech; this information about the videotaped speech is the operational definition of stress in this experiment. The researcher must also give an operational definition of work efficiency; for example, the researcher will measure work efficiency as the number of anagrams from a list of 10 that are solved during a 3-minute interval. This operational definition of work efficiency tells what task was performed, for how long it was performed, and how the measurement was made. Together, these operational definitions specify precisely what will be done in the study.

An operational definition should be written so clearly that a person who is not involved in the investigation can understand the definition and use it in the same manner. An operational definition also has to be appropriate for the participants; telling participants that they will give an impromptu speech will serve as an operational definition of stress only if the speakers would find speaking in public especially stressful.

Which of the following is the most complete operational definition?
a. Learning defined as the amount of information retained after a lecture.
b. Learning defined as the number of correct responses on a 25-question test.
c. Learning defined as studying the information presented in a lecture.
d. Learning defined as a relatively permanent change in behavior that occurs because of experience.

RureBrLrTY AND ValtoITY


Reliability is a key concept in research. Just as a reliable vehicle will
start each time the ignition key is turned, a reliable measure is consistent.
In other words, different researchers who use the same procedure to measure the same phenomenon should obtain the same results if the procedure is reliable. When used with comparable participants, a reliable
operational definition should yield similar results each time.
Validity is the extent to which a measurement technique measures
what it purports to measure. An operational definition is likely to yield
valid results if it corresponds closely to what is to be measured. Thus,
measuring work efficiency by the number of anagrams completed in 3
minutes may be a valid measure if the results are meant to generalize to
work based on written language. The same measurement technique
would probably be an invalid measure if it were meant to generalize to
physical labor because anagram solving and physical labor are not very
closely related.

Internal Validity and Confounds


A specific type of validity that is important in scientific research is
internal validity. Internal validity is the extent to which the design of an
experiment ensures that the independent variable, and not some other
variable or variables, caused the measured difference in the dependent
variables. In other words, an internally valid study has no problems that
would confound the results. A confound, as described in chapter 1, is a
factor that yields alternative explanations for a study's results and thus
limits its internal validity. Internal validity is maximized by eliminating
confounds. Experienced researchers automatically watch for some common confounds and design their studies so that these confounds are
avoided or controlled. For example, an inexperienced researcher may
wish to compare performance on a simple task under two temperature
conditions: warm and cool. One research assistant is responsible for the
cool condition, and another is responsible for the warm condition. If performance is found to be better in the cooler condition, it might be because
the temperature had an effect on behavior, or it might be that the research
assistants affected the participants' behavior in some manner. Perhaps
one research assistant was more neatly dressed than the other and the
participants with the neater assistant took the project more seriously. This

would be a confound called an experimenter effect. An experienced


researcher might foresee this problem and avoid it by using only one
assistant or by keeping both assistants but having each collect half their
data in the warm condition and half in the cool condition.
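The counterbalancing fix described above, having each assistant collect half of the data in each temperature condition, can be sketched as a session schedule built in advance. This is only an illustration; the assistant names are invented.

```python
import itertools
import random

assistants = ["assistant_A", "assistant_B"]  # hypothetical names
conditions = ["warm", "cool"]

# Pair every assistant with every condition, so any experimenter
# effect is spread evenly across both temperature conditions.
schedule = list(itertools.product(assistants, conditions))
random.shuffle(schedule)  # randomize the order of the sessions

for assistant, condition in schedule:
    print(f"{assistant} runs a session in the {condition} condition")
```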
Researchers must also ensure that the study is not confounded by
demand characteristics. Demand characteristics are the cues participants

use to determine what is expected of them in a study. Suppose that to


study the effect of mood on sense of wellness, a researcher induces either
a positive or a negative mood and then asks the participant some questions about how healthy he or she feels. A participant in this study might
very well perceive that the researcher expects mood to affect the
responses and may try to help the researcher by responding as the
researcher expects. To avoid this problem, the researcher would want to
take special steps to dissociate the two parts of the study, perhaps by having a confederate act as if the questions about health are for a different

study entirely.
There are numerous other potential confounds; each can threaten the
internal validity of a study. We'll discuss more threats to internal validity
in later chapters, especially chapters 5 and 6.

External Validity
Another important goal of research projects is external validity. External validity is the generalizability of the results of an investigation


beyond the specific participants, measures, and site of the research. For
example, a study with results that generalize to all English-speaking
adults has greater external validity than a study with results that generalize to English-speaking college students. There is no rule of thumb, however, about how externally valid a study needs to be. Many useful
research ideas come from studies with little external validity. Any investigation needs to have some external validity, though; an experiment with
results irrelevant beyond the particular participants in the study is of little
or no value.
The controls needed to create an internally valid study can sometimes
limit the external validity of the study. For example, suppose an investigator wishes to research the effect of hypnosis on pain tolerance. In an
experimental group, each participant will be hypnotized and given the
suggestion that he or she cannot feel pain; then, each participant will submerge his or her arm in a bucket of ice water. Participants in the control
group will not be hypnotized, but each person will also submerge an arm
in the ice water. The dependent variable is the length of time that each
participant keeps his or her arm in the water.

The investigator is aware that factors other than the independent


variables could possibly affect the outcome of this study; these are called
extraneous variables. In the present case, the sex of the experimenter and
the sex of the participants are extraneous variables. The sex of the experi-

menter might affect how long the participant is willing to tolerate the
pain of the ice water. For instance, male participants may withstand the
pain for a longer period of time in front of a male experimenter than in
front of a female. Also, male participants may feel honor-bound to sustain
pain longer than would female participants. How should the researcher
deal with these problems? Should both male and female experimenters be


used? The sex of the experimenter could be balanced across the control and experimental groups, and the researcher could also make certain that half of the people in each group are tested by a member of the same sex and half by a member of the opposite sex. Should both male and female participants be involved in the study, or should it be limited to only one sex? Using both male and female participants and male and female experimenters increases the external validity of the experiment, but complicates the design of the study and requires more participants and more time for the study to be conducted. There is no correct answer to this problem. Some researchers will choose greater external validity, while others will opt for a simpler, quicker study.
The external validity of a study can also be affected by the manner in which the participants are selected for the project. In research, we talk about selecting a sample of participants from a larger population. A population is all of the organisms (usually people, sometimes animals) to which the researcher wishes to be able to generalize the research results. A sample is a subset of the population; the goal is for the sample to represent the population. The larger the population represented by the sample of participants, the greater the external validity of the study. An effective procedure is to identify participants from a population by random selection. In random selection, all members of the population are equally likely to be chosen. This procedure maximizes the probability that the sample is representative of the population, as long as a sufficient number of participants is chosen. Choosing five people from 5,000 possible participants is not likely to yield a sample that is representative of the population.

More often than not, participants are not selected randomly; instead, they come from a readily available pool of potential volunteers, such as college students. It is very common for researchers to solicit volunteers from introductory psychology classes. This type of sample is called a convenience sample (or an accidental sample). In convenience sampling, participants are not randomly chosen, but instead happen to be in the right place at the right time. Once a group of volunteers has been identified, the participants are assigned to different experimental conditions (the different levels of the independent variable). The most common way of assigning the participants to the conditions is by random assignment. Random assignment is the use of a procedure, perhaps as simple as flipping a coin, such that each participant is equally likely to be assigned to any of the conditions. Notice how this differs from random selection. Random selection describes how participants are chosen from the population; random assignment describes how participants are assigned to experimental conditions.
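The distinction between random selection and random assignment can be made concrete in a few lines of Python. This is an illustrative sketch, not a procedure from the text; the population size and condition names are invented.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Random selection: every member of the population is equally
# likely to be chosen for the sample.
population = list(range(5000))           # hypothetical participant IDs
sample = random.sample(population, 50)   # 50 is far safer than choosing 5

# Random assignment: each selected participant is equally likely
# to end up in any condition (here, a virtual coin flip).
groups = {"experimental": [], "control": []}
for participant in sample:
    condition = random.choice(["experimental", "control"])
    groups[condition].append(participant)

print(len(sample), len(groups["experimental"]), len(groups["control"]))
```

Selection determines who gets into the study at all; assignment determines which condition each selected person experiences.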
Does convenience sampling automatically reduce external validity? It depends on the research. If a researcher is investigating the political concerns of 18- to 22-year-olds, then using only college students of that age range will limit the external validity of the study; the results cannot be generalized to 18- to 22-year-olds who do not attend college. On the other hand, research into physiological or perceptual processes, which are assumed to be pretty much the same whether an individual is in college

or not, would be likely to have reasonable external validity even if the


participants were exclusively college students. Finally, the external validity of a study is open to testing. We simply repeat the work in a different
context to see if the results can be generalized.
Careful and precise planning is necessary when conducting research
by the scientific method. Only by planning ahead and thinking critically
can a researcher avoid design flaws and make choices that will maximize
a study's internal and external validity. Actually, designing projects
devoid of confounds can be something of a brain teaser; for me, it makes
up half the fun of doing research.

HYPOTHESES
The other half of the fun in research is learning new things by testing
your ideas. Suppose that a researcher is interested in the relationship
between summer programs and the intelligence of grade-school children.
In particular, this researcher wishes to know whether those who participate in a summer program where students can pick from among a number of intellectual topics are smarter than most people. This is the research
question. On the basis of this question, the researcher forms one or more
hypotheses (or predictions). In this case, the researcher may hypothesize
that the IQ scores of students in the summer program will be higher than
those of the population in general. This is the researcher's hypothesis.
To be precise, two hypotheses are involved because there are two sides
to every question: what the researcher expects and what the researcher
does not expect. One of these hypotheses is called the null hypothesis
(represented by Hs), and the other is called the alternative (or research)
hypothesis (represented by Hr or sometimes Ha). The null hypothesis is
the prediction that there is no difference between the groups being compared. We would expect the null hypothesis to be correct if the population
from which the sample is taken is the same as the population with which it
is beirig compared. In our example, if the students in the summer program
are actlrally a representative sample of the general population, the students'IQ scores will be roughly equivalent to the IQ scores of the general
population. The null hypothesis is typically what the researcher does not
expect to,find; a researcher does not usually predict the null hypothesis.
The alternative hypothesis is the prediction the researcher makes
about the results of the research. It states that there is a difference between
the scores of the groups being compared. In other words, it states that the
sample is not representative of that particular population's scores, but
instead better represents some other population's scores. There are two


types of alternative hypotheses. In one type, the researcher simply predicts that the two groups being compared will differ, but does not predict the direction of that difference; the researcher does not predict which group will score higher or lower. This is called a two-tailed hypothesis. To clarify why it is said to be two-tailed, consider the normal curve in figure 3.1. In the middle of the curve is the population mean; in the case of the IQ example, that would be 100. If a sample mean (an average of the sample members' IQ scores) were much higher than 100, it would fall far to the right of the mean, up in the positive tail of the distribution. If a sample mean were much lower than 100, it would fall far to the left of the mean, down in the negative tail of the distribution. If a researcher simply predicts that a sample mean will be different from the population mean and does not predict whether it will be higher or lower, the researcher is predicting that it will fall in one of the two tails of the distribution. Thus, an alternative hypothesis that does not predict the direction of the difference is called a two-tailed hypothesis.

Figure 3.1 The normal distribution of IQ scores

As you may have guessed, if the researcher predicts the direction of the difference (for example, if the researcher predicts that the mean IQ of college students will be higher than the population mean), this is a one-tailed hypothesis. The researcher predicts in which tail of the distribution the sample mean is expected to fall. In our example, the alternative hypothesis is that the students in the summer program will have IQ scores greater than those of the general population. (What would the two-tailed alternative hypothesis be? What would the other one-tailed hypothesis be?)

A researcher hypothesizes that a sample of families from the Midwest differs in size from the national average family size. What are the null and alternative hypotheses?


HYPOTHESIS TESTING
Although scientific research is designed to determine if the alternative hypothesis is supportable, hypothesis testing actually involves testing the null hypothesis, not the alternative hypothesis. If the difference
between the groups being compared is so large that the difference is
unlikely to have been caused by chance, then the groups being compared
are unlikely to represent the same population and the null hypothesis is
rejected. If the null hypothesis is rejected, the alternative hypothesis is supported. On the other hand, if the difference between the groups is so small
that the difference is not unlikely to have occurred simply by chance, we
fail to reject the null hypothesis. If the null hypothesis is not rejected, the
alternative hypothesis cannot be supported.
In our example, the researcher has predicted that the mean IQ scores
of summer-program students will be greater than the population mean of
100. This is a one-tailed alternative hypothesis. The null hypothesis is
always that there is no difference between the groups being compared. In
this case, the null hypothesis is that the sample mean will be no different
from the general population mean. If we collect our data and find a mean
that is greater than 100 (the mean IQ for the general population) by more
than could reasonably be expected by chance, then we can reject the null
hypothesis. When we do this, we are saying that the null hypothesis is
wrong. Because we have rejected the null hypothesis and because the
sample mean is greater than the population mean, as was predicted, we
support our alternative hypothesis. In other words, the evidence suggests
that the sample of summer-program students represents a population
that scores higher on the IQ test than the general population.
On the other hand, if we collect our data and the mean IQ score does
not differ from the population mean by more than could reasonably be
expected by chance, then we fail to reject the null hypothesis and also fail
to support our alternative hypothesis.
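The decision rule just described can be sketched as a one-tailed z test on a sample mean. The numbers below (a sample mean of 106, a sample of 36, and the conventional IQ standard deviation of 15) are assumptions for illustration, not data from the text.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 100, 15          # general-population IQ parameters (sigma assumed)
sample_mean, n = 106.0, 36   # hypothetical summer-program sample

# Standard error of the mean, and how far the sample mean lies
# above the population mean in standard-error units.
se = sigma / sqrt(n)          # 15 / 6 = 2.5
z = (sample_mean - mu) / se   # 6 / 2.5 = 2.4

# One-tailed p value: the probability of a sample mean this high
# or higher when the null hypothesis is true.
p = 1 - NormalDist().cdf(z)

alpha = 0.05
if p < alpha:
    print(f"z = {z:.2f}, p = {p:.4f}: reject H0, support H1")
else:
    print(f"z = {z:.2f}, p = {p:.4f}: fail to reject H0")
```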

Errors in Hypothesis Testing

Researchers carefully design their studies so that they answer their research questions by either supporting or failing to support their alternative hypotheses. However, because researchers are not omniscient, it is possible to reject the null hypothesis when it really is true. A researcher may conclude that two populations differ when in fact they do not. Another possible error is to find no difference in a study when a difference between the populations truly exists.
For any research problem there are two possibilities: either the null hypothesis is correct and there is no difference between the populations, or the null hypothesis is false and there is a difference between the populations. The researcher, however, never knows the truth. Look

at figure 3.2. Along the top is the truth (which the researcher can never know), and along the left side are the researcher's two decision choices: to reject the null hypothesis or to fail to reject it. This allows four possible outcomes: two ways for the researcher to be correct, and two ways to be incorrect.
The two ways to be correct are straightforward. First, the researcher can reject the null hypothesis when, in reality, it is false; that is, the researcher finds a true difference between the groups being compared. Second, the researcher might fail to reject the null hypothesis when, in fact, the null hypothesis is true. In this case, the researcher would not detect a difference between the groups being compared, and, in reality, there is no difference between the groups.
The two possible errors are to reject the null hypothesis when it is true (a Type I error) and to fail to reject the null hypothesis when it is false (a Type II error). A Type I error is to find a difference between the groups being compared that does not truly exist in the population. Regardless of how well designed a study might be, a difference is sometimes detected between sample groups that does not reflect an actual difference in the populations. For example, we might find that the mean IQ score of our sample of summer-program students is higher than that of the general population. But perhaps our sample of summer-program students just happened to be bright students, and there truly isn't a difference between the IQ scores of the overall population of summer-program students and the general public. Because a difference was identified that does not truly exist, we have made a Type I error. If the results of a study have immediate ramifications (for instance, if important changes to the curriculum are made on the basis of IQ scores), Type I errors can be very serious indeed.

Figure 3.2 The four possible research outcomes

                                      THE TRUTH
                            The null hypothesis   The null hypothesis
                            is true               is false
THE        Reject H0        Type I error (α)      Correct decision
DECISION   Fail to
           reject H0        Correct decision      Type II error



The Type II error is to fail to detect a difference between the sample groups when a difference truly exists between the populations. We would have made a Type II error if our sample of summer-program students did
not have a mean IQ score significantly greater than the mean IQ for the
general population when, in fact, the population of summer-program students did have a higher IQ than the general population. Our study would
have failed to detect a difference that actually exists. This can happen for
a number of reasons. Perhaps our sample included the less intelligent of
the summer-program students. Perhaps our IQ test was administered in a
nonstandard way that caused greater variation in the scores than if it had
been conducted in the standard way. Still another possibility is that we
included too few students in our sample to detect the difference.
Type II errors are often seen as less serious than Type I errors. If a difference truly exists but is not identified in one research project, continued research is likely to detect the difference. On the other hand, a Type I error is seen as something to be avoided. The results of applied research affect policy and practice in many areas of our life, such as education, medicine, and government. The results of basic research further our body of knowledge and move along the development and advancement of theory that affects applied research. Researchers set their standards high to avoid making Type I errors, to avoid finding differences between comparison groups that don't actually exist in the populations. We need to keep the odds high that advances in research and any changes in policy or practice are based on real results, not erroneous results.
An analogy with the U.S. justice system may clarify the significance of Type I and Type II errors. Consider the case of a person accused of a crime. The null hypothesis is that an accused person is innocent; the accused person is no different from the general population. The alternative hypothesis is that the accused is guilty; the accused person is different from the general population, a deviant. In the United States, it is considered a more serious error to convict an innocent person than to acquit a guilty person; that is, it is more serious to find a difference that does not exist (a Type I error) than to fail to find a difference that really is there (a Type II error).


The Probability of a Type I Error


The probability of making a Type I error is called alpha (α). The acceptable alpha level is typically chosen by the researcher; in the social and behavioral sciences, it has traditionally been set at .05. In other words, researchers in the social and behavioral sciences are willing to accept a 5% risk of making a Type I error. With alpha set at .05, a difference between the groups that is large enough for us to reject the null hypothesis will occur by chance only 5 times out of 100 when the null hypothesis is true. A difference this large is said to be a significant difference.
Let's consider our summer-program example again. The normal distribution in figure 3.3 represents the sampling distribution of IQ scores in the general public. (The sampling distribution is the distribution of sample means, as opposed to a distribution of individual scores.) If the null hypothesis is true, the mean IQ score for the sample of summer-program students will be included as part of this distribution. However, if the alternative hypothesis (that the population mean IQ of the summer-program students is greater than the mean for the general population) is correct, the mean for our sample better represents a different distribution.
To determine whether the population mean of the summer-program students is greater than or equal to the population mean of the general public, we compare our sample mean to the population mean of the general public. If our sample mean is so great that it falls in the top 5% of the sampling distribution for the general public, we infer that there was only a 5% chance of our sample mean being drawn from that population. Having chosen α = .05, we then reject H0 and support H1. The 5% of the distribution that is shaded in figure 3.3 represents α and is called the region of rejection. If a score falls within the region of rejection, the null hypothesis is rejected.
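The claim that a true null hypothesis is wrongly rejected only about 5 times out of 100 can be checked by simulation. This sketch is not from the text; it repeatedly draws samples from a population in which the null hypothesis is true and counts how often the sample mean lands in the region of rejection.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)
mu, sigma, n = 100, 15, 25    # null population and sample size (assumed)

# Top-5% boundary of the sampling distribution of the mean.
cutoff = NormalDist(mu, sigma / sqrt(n)).inv_cdf(0.95)

trials = 10_000
type_i = 0
for _ in range(trials):
    # Draw a sample from the population in which H0 is true.
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    if fmean(sample) > cutoff:   # mean falls in the region of rejection
        type_i += 1

print(f"Type I error rate: {type_i / trials:.3f}")  # close to alpha = .05
```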

Figure 3.3 The region of rejection for a one-tailed hypothesis

A researcher collects information on family size and concludes, on the basis of the
data, that Midwestern families are larger than the average family in the United States.
However, unbeknownst to the researcher, the sample includes several unusually large
families, and in reality, Midwestern families are no larger than the national average. What
type of error was made?

In our example, the alternative hypothesis was one-tailed. With a one-tailed hypothesis, the region of rejection lies at one end of the distribution. For a two-tailed hypothesis, the region of rejection is split equally between the two tails: 2.5% in one tail and 2.5% in the other tail when α = .05 (figure 3.4). If our sample mean is so great that it falls in the top 2.5% of the sampling distribution for the general public, or if it is so small that it falls in the bottom 2.5% of the sampling distribution, then we infer that there was only a 5% chance (5% because the two regions of rejection add up to 5% of the distribution) of our sample mean being drawn from that population. Having chosen α = .05, we then reject H0 and support H1.

Figure 3.4 Regions of rejection for a two-tailed hypothesis
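The boundaries of these regions of rejection can be computed directly from the standard normal distribution. The sketch below uses Python's standard library; the cutoffs it reports (about ±1.96 for a two-tailed test and about 1.64 for a one-tailed test at α = .05) are standard values, not figures from the text.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution: mean 0, sd 1

# Two-tailed test: split alpha equally between the two tails.
lower = z.inv_cdf(alpha / 2)        # about -1.96
upper = z.inv_cdf(1 - alpha / 2)    # about +1.96

# One-tailed test: put all of alpha in a single tail.
one_tailed = z.inv_cdf(1 - alpha)   # about +1.64

print(f"two-tailed cutoffs: {lower:.2f}, {upper:.2f}")
print(f"one-tailed cutoff:  {one_tailed:.2f}")
```

Notice that the one-tailed cutoff is less extreme than the two-tailed cutoff, which is why a directional prediction makes a difference in the predicted tail easier to detect.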
The Probability of a Type II Error

The probability of making a Type II error is called beta (β). Beta is a measure of the likelihood of not finding a difference that truly exists. The opposite of β is called power and is calculated as 1 - β. Power is the likelihood of finding a true difference. In general, researchers want to design studies that are high in power and have a low β. However, β, α, and power are interconnected, as an examination of figure 3.5 makes clear.
In figure 3.5, the distribution on the left represents the distribution of sample means when the null hypothesis is correct. The distribution on the right represents the distribution of sample means when the alternative hypothesis is correct. In terms of our summer-program example, the distribution on the left is the distribution of mean IQ scores for the general public; this distribution would include the sample of summer-program students if they are not significantly different from the general public. The distribution of sample means on the right represents the mean IQ scores of summer-program students if they do score significantly higher than the general public.

Figure 3.5 A representation of power, beta, and alpha
The darker shaded area labeled "alpha" is the top 5% of the null hypothesis distribution. If our sample's mean is so large that it falls within the top 5% of the null hypothesis distribution, then we say that it is unlikely to belong to that distribution, and we reject the null hypothesis. The lighter shaded area of the alternative hypothesis distribution (to the left of alpha) represents beta. This is the probability of making a Type II error. If a mean is too small to land in the region of rejection, but actually does belong to a separate population, then it will fall within the beta region. The researchers will fail to reject H0, even though it is false, and thus will make a Type II error. Whenever possible, a researcher attempts to increase the power of a project, in order to increase the likelihood of rejecting a false null hypothesis.
Consider figure 3.5 again. Power can be increased by reducing beta (imagine moving the line delineating beta and alpha to the left), and beta can be reduced by increasing alpha (moving the line delineating beta and alpha to the left). Often, however, increasing alpha is not a realistic option. Only rarely will a researcher (and perhaps more importantly, the researcher's colleagues) trust the results of a study where alpha is greater than .05. A relatively simple way for a researcher to increase the power of a study is to increase the sample size. The larger the size of the samples in a study, the easier it is to find a significant difference using statistical tests. Statistically, a small difference may indicate a significant difference if many participants were involved in the study. If the same small difference is based on only a few participants, the statistical test is more likely to suggest that the results could have happened just by chance.
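The point that larger samples yield more power can be demonstrated with a short simulation. Everything here (a true mean of 106 under the alternative hypothesis, a standard deviation of 15) is an invented illustration; power is estimated as the proportion of simulated studies that correctly reject a false null hypothesis.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)
mu0, sigma = 100, 15   # null-hypothesis population
mu1 = 106              # assumed true mean under the alternative hypothesis

def estimated_power(n, alpha=0.05, trials=2000):
    # Region of rejection for a one-tailed test at this sample size.
    cutoff = NormalDist(mu0, sigma / sqrt(n)).inv_cdf(1 - alpha)
    hits = 0
    for _ in range(trials):
        sample = [random.gauss(mu1, sigma) for _ in range(n)]
        if fmean(sample) > cutoff:   # correctly rejects a false H0
            hits += 1
    return hits / trials             # estimate of power = 1 - beta

for n in (10, 25, 100):
    print(f"n = {n:3d}: estimated power = {estimated_power(n):.2f}")
```

With the same true difference, the estimated power climbs steadily as the sample size grows, which is exactly the design lever the passage recommends.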

Why We Don't Accept the Null Hypothesis


You may be wondering why we keep saying that we fail to reject the null hypothesis, instead of simply stating that we accept the null hypothesis. If we reject the null hypothesis, we know it is because our finding was relatively unlikely to occur by chance alone. But, if we do not reject the null hypothesis, what does that mean? The null hypothesis says there is no significant difference between our sample mean and the population mean. If we do not reject the null hypothesis, does that mean that our sample's scores are equal to the population's scores? Not necessarily. By failing to reject the null hypothesis, we have failed to find a significant difference, but that does not mean we have found an equality. There can be a number of reasons for failing to find a difference, that is, for failing to reject the null hypothesis. It could be because we made a Type II error. Perhaps our method of data collection was not sensitive enough to detect the difference, or we needed a larger sample to detect the difference consistently. Perhaps, simply by chance, our sample was such that its mean was not significantly different from the population's score, or perhaps a confound in our study caused our results to come out differently than expected. Any of these reasons and more could cause a Type II error and
make us fail to reject the null hypothesis when it is false. Of course, there
is also the possibility that we failed to reject the null hypothesis because
the null hypothesis is actually true. How can we tell if the null hypothesis
is true or if we have made a Type II error? We can't, and for this reason it
is risky to accept the null hypothesis as true when no difference is
detected. Similarly, it is risky to predict no difference between our sample
and the population. If we find no difference, we cannot know by this one
study if it is because our prediction was accurate or because we made a Type II error.
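The risk of mistaking a Type II error for a true null hypothesis can be made concrete with a small simulation. In this illustrative sketch the population numbers (a null mean of 100, a standard deviation of 15, and a true mean of 106) are invented, not taken from the text:

```python
import random
from statistics import NormalDist, mean

random.seed(42)
z_crit = NormalDist().inv_cdf(0.95)     # one-tailed cutoff for alpha = .05

# Invented, IQ-like numbers: H0 says the mean is 100 (sd 15),
# but the sampled population's true mean is actually 106.
true_mu, null_mu, sigma, n = 106, 100, 15, 10
trials, failures = 2000, 0
for _ in range(trials):
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    z = (mean(sample) - null_mu) / (sigma / n ** 0.5)
    if z < z_crit:       # mean does not land in the region of rejection
        failures += 1    # Type II error: H0 is false but we fail to reject it
print(failures / trials)  # a substantial proportion, despite the real difference
```

With only ten participants per sample, a large share of these studies fail to reject a null hypothesis that is genuinely false, which is exactly why non-rejection cannot be read as proof of equality.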
If a researcher does reject the null hypothesis, how much does that support the alternative hypothesis? Support for the alternative hypothesis
means that the identified difference was so large as to be unlikely to have
occurred by chance. If the difference didn't occur by chance, why did it
occur? Explaining the difference is the researcher's task. One researcher
may believe that summer-program students are smarter than the general
public, while another researcher may think that the summer program
serves to increase students' IQ scores. On the basis of their beliefs, both of
these researchers are likely to predict that the mean IQ score for a sample
of summer-program students will be greater than 100. Suppose that both
researchers collect and analyze some data, and both find results that are
consistent with their predictions. Each researcher can be sure that only 5
times out of 100 would the mean of the summer-program students' IQ scores be significantly greater than the population mean by chance. However, neither can be totally confident that his or her explanation for the results is correct. Rejection of the null hypothesis and support of the alternative hypothesis lend confidence to the results found, but not to the explanation given. The explanation may or may not be correct; it is vulnerable to all of the subjectivity, wishful thinking, and faulty reasoning that humans are heir to. The best explanation will emerge only after other carefully designed investigations have been conducted.
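The "5 times out of 100" figure can be checked with a simple simulation. In this sketch (the sample size and trial count are arbitrary choices), samples are drawn from a population in which the null hypothesis really is true, so any rejection occurs purely by chance:

```python
import random
from statistics import NormalDist, mean

random.seed(1)
z_crit = NormalDist().inv_cdf(0.95)      # one-tailed cutoff for alpha = .05

# Here the null hypothesis is actually true: population mean 100, sd 15.
null_mu, sigma, n, trials = 100, 15, 25, 4000
rejections = 0
for _ in range(trials):
    sample = [random.gauss(null_mu, sigma) for _ in range(n)]
    z = (mean(sample) - null_mu) / (sigma / n ** 0.5)
    if z >= z_crit:       # lands in the region of rejection by chance alone
        rejections += 1   # a Type I error
print(rejections / trials)  # hovers near .05, i.e. near alpha
```

The long-run rejection rate stays close to alpha, which is the sense in which a researcher "can be sure" that chance alone produces a significant result only about 5 times in 100.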

How To Do Science
Conducting scientific research, like any project, involves a series of
steps. In general, the steps are the same for any scientific research project,
only the specifics differ from project to project. These steps are outlined in

figure 3.6.
The first step is to identify the topic to be studied. At first this can be
somewhat difficult, not because there are so few topics to choose from,
but because there are so many.
One way to begin is to think about courses you have had in psychology and related fields. What courses were your favorites? Thumb

Figure 3.6 The steps in conducting scientific research. Step 1: identify a topic. Step 2: learn about the topic. Step 3: form a hypothesis. Step 4: design the study. Step 5: collect data. Step 6: analyze data. Step 7: draw conclusions. Step 8: communicate results.
through your old introductory psychology textbook; which chapters did you find most fascinating? Another approach is to consult with faculty or other students who are conducting research. Often, there are more projects that need to be done than any single person has time to address. Choose a topic to research that you find particularly interesting; research is a time-consuming task and can become tedious, especially if you aren't curious about the questions you have asked.
Once you have identified a topic, the second step is to learn about what has already been done in the area. The library is your primary source for this information. The results of previous research on your topic may be found in whole books or in book chapters. Research journals publish descriptions of individual research projects, but they also publish review articles, which describe the results of many projects. (For more information about journal articles and literature searches, see appendix C.) Courses and textbooks can also serve as sources of information about an area. Less frequently considered, but often very worthwhile, is actual correspondence with the experts and researchers in a field. These individuals can provide valuable information about details and nuances of their work that would be unavailable elsewhere.
aPPear during your
The impetus for specific research projects may
study doubtful and
one
of
review of the area. Perhaps you find the reiults
phenomparticular
a
detect
wish to replicate it. Maybe you'd like to try to
to
combine
decide
might
You
enon under a different set of circumstarr.ur.
a
series of
in
next
the
ideas from two different studies or to cond'uct
discover
to
is
experiences
projects. The only way to learn from others'
what others have done'
The third step is to focus on a specific research question and form a hypothesis. This entails narrowing your focus from a general area of research to a specific question that you want to answer. Your predicted answer to the research question is the hypothesis. Hypotheses can be derived from theory or from previous research, or may simply reflect curiosity about a topic.
Perhaps you have been learning about the research conducted on eating disorders and, during the same time period, you learned about operant conditioning in one of your psychology classes. You might wonder if operant conditioning can be used to modify eating behaviors by using positive reinforcement to increase eating. A hypothesis needs to be precisely stated in a testable manner; therefore, this hypothesis needs to be honed some. Perhaps it develops into the following statement: participants who receive positive reinforcement contingent upon eating will consume more food than do participants who receive no positive reinforcement. As you learn more about research, you'll see that the research question and hypothesis guide how a study is designed.
The fourth step involves designing your study so that the results will either support or refute your hypothesis. This is when you decide exactly how you will make your observations. Here is where you operationally define your terms. Continuing with our example, the terms "positive reinforcement contingent upon eating" and "consume more food" need to be operationally defined. Maybe positive reinforcement will be defined as complimentary statements about the participant's hair and clothing made within one second after the participant eats a potato chip. Having thus defined the food as potato chips, we might measure the consumption of potato chips in grams. The research design implied in this hypothesis involves an experimental group (which receives the positive reinforcement) and a control group. Many other types of research designs can be used to test hypotheses; each has its own advantages and disadvantages. The choice of research design often reflects a balance between the benefits and pitfalls of the design, the practical concerns of the particular situation, and personal preference.
Many other specific decisions about your study must also be made during this stage. Who will be the participants in your study? How many will you need? Will they be tested together, in small groups, or individually? How will the potato chips be presented? Will the same experimenter interact with all of the participants? Where and when will this experiment take place? How long will it take? Should the participants be asked to refrain from eating for some amount of time prior to the study? When will the participants be told the true purpose of the study? As the questions are answered and the experiment begins to take shape, it is important to keep a wary eye out for potential confounds. The challenge is to design your study so that the results either clearly support or clearly do not support your hypothesis.
The fifth step in conducting scientific research is to actually make your observations and collect your data according to the procedures prescribed in your research design. Here is where attention to detail during the design stage pays off. It is often unwise to change the procedures after a study is underway, as this makes it more difficult to interpret the results. However, even the most experienced researchers are occasionally
surprised by problems that arise while the data are being collected, and
sometimes this means scrapping the project and redesigning the study.
Surprises are not necessarily bad, though, for with every surprise comes a
bit of new information and perhaps the seed of new research efforts.
In the sixth step, the data that have been collected are summarized
and analyzed to determine whether the results support the hypothesis.
This process is called statistical analysis. By using statistical analysis, you
can determine how likely or unlikely it is that your results are due to
chance and with how much confidence you can state that your results
reflect reality.
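As one illustration of such an analysis, a randomization (permutation) test estimates how often a difference as large as the observed one would arise by chance alone. The gram values below are invented, and the textbook does not prescribe this particular test; it is simply one stdlib-only way to sketch the idea for the chip experiment:

```python
import random
from statistics import mean

random.seed(7)

# Invented grams of potato chips consumed by each participant:
reinforced = [32, 41, 38, 45, 36, 40, 44, 39]   # experimental group
control = [28, 30, 35, 27, 33, 31, 29, 34]      # control group
observed = mean(reinforced) - mean(control)     # 8.5 grams

# Permutation test: if group labels did not matter (the null hypothesis),
# how often would shuffled labels yield a difference at least this large?
pooled = reinforced + control
trials, extreme = 10000, 0
for _ in range(trials):
    random.shuffle(pooled)
    if mean(pooled[:8]) - mean(pooled[8:]) >= observed:
        extreme += 1
p_value = extreme / trials
print(observed, p_value)   # a small p suggests the difference is not due to chance
```

A p-value below the chosen alpha (conventionally .05) would lead the researcher to reject the null hypothesis and conclude that the reinforced group ate significantly more.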

The seventh step involves interpreting the results of the statistical analyses and drawing conclusions about your hypotheses. Here you
determine the implications of your results in relation to the topic you
focused on in step 1.
Finally, the eighth step is to communicate your research results to others. In psychology, this is done in a number of ways, including conference
presentations and publications. Psychology conferences are an important
venue for presenting research. Many, but not all, conferences review submitted projects and allow the authors of the best projects to present their
work. Other conferences allow all members of the sponsoring organization
to present research. Also, numerous student conferences allow undergraduates to present research to their peers from other institutions. All of these
types of conferences provide excellent opportunities to gain up-to-date
information, to meet with people who are researching an area in which you
are interested, and to become enthused and inspired to conduct research.
Probably the most prestigious way to communicate research results is
by publishing an article in a scholarly journal. Other researchers review
the proposed article and provide the journal editor and author with feedback about how to improve the project and/or manuscript; they also provide their opinion about whether the article should be published.


(Appendix C relays more information about this topic.) A published
research article has been read by a number of professionals and typically
(but not always) represents excellent research work.

Regardless of whether a research project results in a publication or presentation, doing research inevitably provides the researcher with new
information, new insights, or simply new questions. Then the cycle begins
again, as a new research project begins to grow in the researcher's mind.

Summary

Conducting scientific research involves being precise about what is studied and how it is studied so that confounds can be avoided. An important step is to carefully define all important terms using operational definitions. Operational definitions differ from dictionary definitions in
that they describe the exact procedures used to produce a phenomenon or
measure some variable.
A good study not only has clearly defined terms but also provides consistent results. The production of consistent results is called reliability. As important as reliability is validity. Validity is the extent to which a measurement tool or technique measures what it purports to measure. A study that is not valid and/or is not reliable is of no use to the researcher.
When a study is designed well, so as to provide reliable and valid
data for which there is only one explanation, then the study is said to
have good internal validity. Internal validity can be threatened by the
existence of confounds such as experimenter effects or demand characteristics. If the results of a study may be generalized beyond the original set of participants, it is said to have strong external validity.
One way to increase the external validity of a study is by choosing a
sample carefully. Random selection maximizes the probability that the
sample is representative of the population. However, convenience sampling is used more often and is typically followed by random assignment
of participants to different experimental conditions.
Much research in psychology and the other sciences is based on
hypothesis testing. The null hypothesis states that there is no effect of the
independent variable on the dependent variable. If two groups are being
compared, the null hypothesis states that there will be no significant difference between the two groups.
The alternative hypothesis is typically the researcher's prediction.
The researcher might predict that the independent variable will cause an
increase in the dependent variable; that would then be the alternative
hypothesis. This particular example would be a one-tailed hypothesis,
because it predicts the direction of the difference. A two-tailed hypothesis
predicts a difference but does not predict its direction.
The researcher actually tests the null hypothesis. If the researcher
finds strong enough evidence, he or she will reject the null hypothesis
and thus support the alternative hypothesis. Without strong evidence, the
researcher fails to reject the null hypothesis.
The null hypothesis is rejected, or is not rejected, on the basis of probabilities. If the probability is strong enough, the researcher will reject the
null hypothesis. If in reality the null hypothesis is true, however, the
researcher has made a Type I error. If the researcher fails to reject the null
hypothesis when it is actually false, the researcher has made a Type II
error. Researchers never know if they have made one of these errors, but
they do take measures to reduce the probability of doing so. The most straightforward way to reduce the probability of a Type II error without increasing the probability of a Type I error is to increase the number of participants in the study.

Conducting research is not a linear task, but instead tends to be circular; the results of one study affect the way in which the next study is designed and interpreted.

Important Terms and Concepts


alpha (α)
alternative (or research) hypothesis
beta (β)
confound
convenience (or accidental) sample
demand characteristics
experimenter effect
external validity
extraneous variables
internal validity
null hypothesis
one-tailed hypothesis
operational definition
population
power
random assignment
random selection
region of rejection
reliability
sample
sampling distribution
significant difference
statistical analysis
two-tailed hypothesis
Type I error
Type II error
validity

Exercises

1. A researcher wishes to look at the effect of stress on fidgeting. What terms need to be operationally defined? What are some possible operational definitions?

2. What is the difference between reliability and validity? Can a study be valid and not reliable? Can a study provide reliable data but not valid data?

3. Suppose I have conducted a study in which participants were asked to perform a mood induction task that created a happy, sad, or neutral mood. The participants were then asked to complete a questionnaire about their sense of wellness. Why might demand characteristics be a problem in this study? How could demand characteristics affect the results?

4. A researcher wants to create a random sample of students at Smart U. A friend suggests that the researcher walk across the campus and approach every third person she encounters. Is this a random sample? If not, what type of sample is it? Can you develop a procedure to create a random sample?

5. What is the difference between a null hypothesis and an alternative hypothesis?

6. I want to investigate the effect of chocolate on mood. One group of participants eats a chocolate bar before completing a mood scale; the other participants complete the mood scale without first eating chocolate.
