
Sampling and Sampling Distributions

USING STATISTICS @ Oxford Cereals

7.1 TYPES OF SAMPLING METHODS
    Simple Random Samples
    Systematic Samples
    Stratified Samples
    Cluster Samples

7.2 EVALUATING SURVEY WORTHINESS
    Survey Errors
    Ethical Issues

7.3 SAMPLING DISTRIBUTIONS

7.4 SAMPLING DISTRIBUTION OF THE MEAN
    The Unbiased Property of the Sample Mean
    Standard Error of the Mean
    Sampling from Normally Distributed Populations
    Sampling from Non-Normally Distributed Populations: The Central Limit Theorem

7.5 SAMPLING DISTRIBUTION OF THE PROPORTION

7.6 (CD-ROM TOPIC) SAMPLING FROM FINITE POPULATIONS

EXCEL COMPANION TO CHAPTER 7
    E7.1 Creating Simple Random Samples (Without Replacement)
    E7.2 Creating Simulated Sampling Distributions

In this chapter, you learn:

• To distinguish between different sampling methods
• The concept of the sampling distribution
• To compute probabilities related to the sample mean and the sample proportion
• The importance of the Central Limit Theorem
CHAPTER SEVEN  Sampling and Sampling Distributions

Using Statistics @ Oxford Cereals

Oxford Cereals fills thousands of boxes of cereal during an eight-hour shift. As the plant operations manager, you are responsible for monitoring the amount of cereal placed in each box. To be consistent with package labeling, boxes should contain a mean of 368 grams of cereal. Because of the speed of the process, the cereal weight varies from box to box, causing some boxes to be underfilled and others overfilled. If the process is not working properly, the mean weight in the boxes could vary too much from the label weight of 368 grams to be acceptable.

Because weighing every single box is too time-consuming, costly, and inefficient, you must take a sample of boxes. For each sample you select, you plan to weigh the individual boxes and calculate a sample mean. You need to determine the probability that such a sample mean could have been randomly selected from a population whose mean is 368 grams. Based on your analysis, you will have to decide whether to maintain, alter, or shut down the process.

In Chapter 6, you used the normal distribution to study the distribution of download times for the OurCampus! Web site. In this chapter, you need to make a decision about the cereal-filling process, based on a sample of cereal boxes. You will learn different methods of sampling and about sampling distributions and how to use them to solve business problems.

7.1 TYPES OF SAMPLING METHODS

In Section 1.3, a sample was defined as the portion of a population that has been selected for analysis. Rather than selecting every item in the population, statistical sampling procedures focus on collecting a small representative group of the larger population. The results of the sample are then used to estimate characteristics of the entire population. There are three main reasons for selecting a sample:

• Selecting a sample is less time-consuming than selecting every item in the population.
• Selecting a sample is less costly than selecting every item in the population.
• An analysis of a sample is less cumbersome and more practical than an analysis of the entire population.

The sampling process begins by defining the frame. The frame is a listing of items that make up the population. Frames are data sources such as population lists, directories, or maps. Samples are drawn from frames. Inaccurate or biased results can result if a frame excludes certain portions of the population. Using different frames to generate data can lead to opposite conclusions.

After you select a frame, you draw a sample from the frame. As illustrated in Figure 7.1, there are two kinds of samples: the nonprobability sample and the probability sample.
FIGURE 7.1 Types of samples

Types of Samples Used
    Nonprobability Samples: Judgment Sample, Quota Sample, Chunk Sample, Convenience Sample
    Probability Samples: Simple Random Sample, Systematic Sample, Stratified Sample, Cluster Sample

In a nonprobability sample, you select the items or individuals without knowing their probabilities of selection. Thus, the theory that has been developed for probability sampling cannot be applied to nonprobability samples. A common type of nonprobability sampling is convenience sampling. In convenience sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to sample. In many cases, participants are self-selected. For example, many companies conduct surveys by giving visitors to their Web site the opportunity to complete survey forms and submit them electronically. The responses to these surveys can provide large amounts of data quickly and inexpensively, but the sample consists of self-selected Web users (see p. 8). For many studies, only a nonprobability sample such as a judgment sample is available. In a judgment sample, you get the opinions of preselected experts in the subject matter. Some other common procedures of nonprobability sampling are quota sampling and chunk sampling. These are discussed in detail in specialized books on sampling methods (see reference 1).

Nonprobability samples can have certain advantages, such as convenience, speed, and low cost. However, their lack of accuracy due to selection bias and the fact that the results cannot be generalized more than offset these advantages. Therefore, you should use nonprobability sampling methods only for small-scale studies that precede larger investigations.

In a probability sample, you select the items based on known probabilities. Whenever possible, you should use probability sampling methods. Probability samples allow you to make unbiased inferences about the population of interest. In practice, it is often difficult or impossible to take a probability sample. However, you should work toward achieving a probability sample and acknowledge any potential biases that might exist. The four types of probability samples most commonly used are simple random, systematic, stratified, and cluster samples. These sampling methods vary in their cost, accuracy, and complexity.

Simple Random Samples

In a simple random sample, every item from a frame has the same chance of selection as every other item. In addition, every sample of a fixed size has the same chance of selection as every other sample of that size. Simple random sampling is the most elementary random sampling technique. It forms the basis for the other random sampling techniques.

With simple random sampling, you use n to represent the sample size and N to represent the frame size. You number every item in the frame from 1 to N. The chance that you will select any particular member of the frame on the first selection is 1/N.

You select samples with replacement or without replacement. Sampling with replacement means that after you select an item, you return it to the frame, where it has the same probability of being selected again. Imagine that you have a fishbowl containing N business cards. On the first selection, you select the card for Judy Craven. You record pertinent information and replace the business card in the bowl. You then mix up the cards in the bowl and select the second card. On the second selection, Judy Craven has the same probability of being selected again, 1/N. You repeat this process until you have selected the desired sample size, n. However, usually you do not want the same item to be selected again.

Sampling without replacement means that once you select an item, you cannot select it again. The chance that you will select any particular item in the frame (for example, the business card for Judy Craven) on the first draw is 1/N. The chance that you will select any card not previously selected on the second draw is now 1 out of N - 1. This process continues until you have selected the desired sample of size n.
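The two selection schemes can be contrasted in a short Python sketch. This is only an illustration under stated assumptions: the frame of 20 numbered cards, the seed, and the function names are hypothetical and do not come from the text.

```python
import random

def draw_with_replacement(frame, n, rng):
    # Every draw is made from the full frame, so each item keeps a 1/N
    # chance on every draw and the same item can appear more than once.
    return [rng.choice(frame) for _ in range(n)]

def draw_without_replacement(frame, n, rng):
    # Remove each selected item so it cannot be drawn again: the chance
    # is 1/N on the first draw, 1/(N - 1) on the second, and so on.
    remaining = list(frame)
    sample = []
    for _ in range(n):
        index = rng.randrange(len(remaining))
        sample.append(remaining.pop(index))
    return sample

rng = random.Random(2024)        # hypothetical seed, for repeatability
frame = list(range(1, 21))       # hypothetical frame: N = 20 numbered cards
with_repl = draw_with_replacement(frame, 5, rng)
without_repl = draw_without_replacement(frame, 5, rng)
print(with_repl, without_repl)
```

A pseudorandom number generator plays the role of the well-mixed fishbowl here; the without-replacement sample never contains a repeated card, whereas the with-replacement sample may.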
Regardless of whether you have sampled with or without replacement, "fishbowl" methods of sample selection have a major drawback: the difficulty of thoroughly mixing the cards and randomly selecting the sample. As a result, fishbowl methods are not very useful. You need to use less cumbersome and more scientific methods of selection.

One such method uses a table of random numbers (see Table E.1) for selecting the sample. A table of random numbers consists of a series of digits listed in a randomly generated sequence (see reference 8). Because the numeric system uses 10 digits (0, 1, 2, . . . , 9), the chance that you will randomly generate any particular digit is equal to the probability of generating any other digit. This probability is 1 out of 10. Hence, if you generate a sequence of 800 digits, you would expect about 80 to be the digit 0, 80 to be the digit 1, and so on. In fact, those who use tables of random numbers usually test the generated digits for randomness prior to using them. Table E.1 has met all such criteria for randomness. Because every digit or sequence of digits in the table is random, the table can be read either horizontally or vertically. The margins of the table designate row numbers and column numbers. The digits themselves are grouped into sequences of five in order to make reading the table easier.

To use such a table instead of a fishbowl for selecting the sample, you first need to assign code numbers to the individual members of the frame. Then you generate the random sample by reading the table of random numbers and selecting those individuals from the frame whose assigned code numbers match the digits found in the table. You can better understand the process of sample selection by examining Example 7.1.

EXAMPLE 7.1  SELECTING A SIMPLE RANDOM SAMPLE BY USING A TABLE OF RANDOM NUMBERS

A company wants to select a sample of 32 full-time workers from a population of 800 full-time employees in order to collect information on expenditures concerning a company-sponsored dental plan. How do you select a simple random sample?

SOLUTION  The company decides to conduct an email survey. Assuming that not everyone will respond to the survey, you need to send more than 32 surveys to get the necessary 32 responses. Assuming that 8 out of 10 full-time workers will respond to such a survey (that is, a response rate of 80%), you decide to send 40 surveys, because 32/0.80 = 40.

The frame consists of a listing of the names and email addresses of all N = 800 full-time employees taken from the company personnel files. Thus, the frame is an accurate and complete listing of the population. To select the random sample of 40 employees from this frame, you use a table of random numbers. Because the population size (800) is a three-digit number, each assigned code number must also be three digits so that every full-time worker has an equal chance of selection. You assign a code of 001 to the first full-time employee in the population listing, a code of 002 to the second full-time employee in the population listing, and so on, until a code of 800 is assigned to the Nth full-time worker in the listing. Because N = 800 is the largest possible coded value, you discard all three-digit code sequences greater than 800 (that is, 801 through 999 and 000).
To select the simple random sample, you choose an arbitrary starting point from the table of random numbers. One method you can use is to close your eyes and strike the table of random numbers with a pencil. Suppose you used this procedure and you selected row 06, column 05, of Table 7.1 (which is extracted from Table E.1) as the starting point. Although you can go in any direction, in this example, you read the table from left to right, in sequences of three digits, without skipping.

TABLE 7.1 Using a Table of Random Numbers

                               Column
        00000  00001  11111  11112  22222  22223  33333  33334
Row     12345  67890  12345  67890  12345  67890  12345  67890
01      49280  88924  35779  00283  81163  07275  89863  02348
02      61870  41657  07468  08612  98083  97349  20775  45091
03      43898  65923  25078  86129  78496  97653  91550  08078
04      62993  93912  30454  84598  56095  20664  12872  64647
05      33850  58555  51438  85507  71865  79488  76783  31708
06      97340  03364  88472  04334  63919  36394  11095  92470
07      70543  29776  10087  10072  55980  64688  68239  20461
08      89382  93809  00796  95945  34101  81277  66090  88872
09      37818  72142  67140  50785  22380  16703  53362  44940
10      60430  22834  14130  96593  23298  56203  92671  15925
11      82975  66158  84731  19436  55190  69229  28661  13675
12      39087  71938  40355  54324  08401  26299  49420  59208
13      55700  24586  93247  32596  11865  63397  44251  43189
14      14156  23991  78643  75912  83832  32768  18928  57010
15      32166  53251  70654  92827  63491  04233  33825  69662
16      23236  73751  31888  81718  06546  83246  47651  04877
17      45794  26926  15130  82455  78305  55058  52551  47182
18      09893  20505  14225  68514  46427  56788  96297  78822
19      54382  74598  91499  14523  68479  27686  46162  83554
20      94750  89923  37089  20048  80336  94598  26940  36858
21      10297  34135  53140  33340  42050  82341  44104  82949
22      85157  47954  32979  26515  57600  40881  12250  13142
23      11100  02340  12860  74691  96644  89439  28107  25815
24      36871  50775  30592  57143  17381  68856  25853  35041
25      23913  48357  63308  16090  51690  54607  72407  55538

Begin selection at row 06, column 5.

Source: Partially extracted from The Rand Corporation, A Million Random Digits with 100,000 Normal Deviates (Glencoe, IL: The Free Press, 1955) and displayed in Table E.1 in Appendix E.

The individual with code number 003 is the first full-time employee in the sample (row 06 and columns 05-07), the second individual has code number 364 (row 06 and columns 08-10), and the next sequence read is 884. Because the highest code for any employee is 800, you discard the number 884. Individuals with code numbers 720, 433, 463, 363, 109, 592, 470, and 705 are selected third through tenth, respectively (the sequences 919 and 941, read along the way, are also discarded because they exceed 800).

You continue the selection process until you get the required sample size of 40 full-time employees. During the selection process, if any three-digit coded sequence repeats, you include the employee corresponding to that coded sequence again as part of the sample if you are sampling with replacement. You discard the repeating coded sequence if you are sampling without replacement.
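The reading procedure of Example 7.1 can be traced in a short Python sketch. The function name and its parameters are hypothetical, introduced only for illustration; the digit strings are rows 06 and 07 of Table 7.1 with the spaces between the five-digit groups removed.

```python
def read_codes(digits, start_col, width, max_code, n, replacement=False):
    """Read fixed-width codes from a string of random digits, left to right."""
    selected = []
    pos = start_col - 1                  # table columns are numbered from 1
    while len(selected) < n and pos + width <= len(digits):
        code = int(digits[pos:pos + width])
        pos += width
        if code == 0 or code > max_code:
            continue                     # no frame member carries this code
        if not replacement and code in selected:
            continue                     # discard repeats when sampling without replacement
        selected.append(code)
    return selected

row_06 = "9734003364884720433463919363941109592470"
row_07 = "7054329776100871007255980646886823920461"

first_ten = read_codes(row_06 + row_07, start_col=5, width=3, max_code=800, n=10)
print(first_ten)  # [3, 364, 720, 433, 463, 363, 109, 592, 470, 705]
```

The sequences 884, 919, and 941 are skipped because they exceed 800, which reproduces the first ten selections listed in the example. To draw all 40 codes, you would pass further rows of the table in the same way.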

Systematic Samples

In a systematic sample, you partition the N items in the frame into n groups of k items, where

    k = N / n

You round k to the nearest integer. To select a systematic sample, you choose the first item to be selected at random from the first k items in the frame. Then, you select the remaining n - 1 items by taking every kth item thereafter from the entire frame.

If the frame consists of a listing of prenumbered checks, sales receipts, or invoices, a systematic sample is faster and easier to take than a simple random sample. A systematic sample is also a convenient mechanism for collecting data from telephone books, class rosters, and consecutive items coming off an assembly line.

To take a systematic sample of n = 40 from the population of N = 800 employees, you partition the frame of 800 into 40 groups, each of which contains 20 employees. You then select a random number from the first 20 individuals and include every twentieth individual after the first selection in the sample. For example, if the first random number you select is 008, your subsequent selections are 028, 048, 068, 088, 108, . . . , 768, and 788.
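The systematic selection rule can be sketched in Python as follows. The function name and the optional start argument are assumptions made for illustration; passing start=8 reproduces the selections in the paragraph above.

```python
import random

def systematic_sample(N, n, start=None, rng=None):
    """Return the item numbers of a systematic sample from a frame of N items."""
    k = round(N / n)                      # group size k = N/n, rounded to an integer
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, k)         # random start among the first k items
    codes = [start + i * k for i in range(n)]
    # When k is rounded up, the last few codes can run past the end of the
    # frame; this sketch simply drops them.
    return [code for code in codes if code <= N]

selections = systematic_sample(800, 40, start=8)
print(selections[:6], selections[-2:])  # [8, 28, 48, 68, 88, 108] [768, 788]
```

Note that Python's round() resolves exact halves to the nearest even integer, which can differ from ordinary rounding when N/n ends in .5.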
Although they are simpler to use, simple random sampling and systematic sampling are generally less efficient than other, more sophisticated, probability sampling methods. Even greater possibilities for selection bias and lack of representation of the population characteristics occur when using systematic samples than with simple random samples. If there is a pattern in the frame, you could have severe selection biases. To overcome the potential problem of disproportionate representation of specific groups in a sample, you can use either stratified sampling methods or cluster sampling methods.

Stratified Samples

In a stratified sample, you first subdivide the N items in the frame into separate subpopulations, or strata. A stratum is defined by some common characteristic, such as gender or year in school. You select a simple random sample within each of the strata and combine the results from the separate simple random samples. Stratified sampling is more efficient than either simple random sampling or systematic sampling because you are ensured of the representation of items across the entire population. The homogeneity of items within each stratum provides greater precision in the estimates of underlying population parameters.

EXAMPLE 7.2  SELECTING A STRATIFIED SAMPLE

A company wants to select a sample of 32 full-time workers from a population of 800 full-time employees in order to estimate expenditures from a company-sponsored dental plan. Of the full-time employees, 25% are managers and 75% are nonmanagerial workers. How do you select the stratified sample in order for the sample to represent the correct percentage of managers and nonmanagerial workers?

SOLUTION  If you assume an 80% response rate, you need to send 40 surveys to get the necessary 32 responses. The frame consists of a listing of the names and email addresses of all N = 800 full-time employees included in the company personnel files. Because 25% of the full-time employees are managers, you first separate the population frame into two strata: a subpopulation listing of all 200 managerial-level personnel and a separate subpopulation listing of all 600 full-time nonmanagerial workers. Because the first stratum consists of a listing of 200 managers, you assign three-digit code numbers from 001 to 200. Because the second stratum contains a listing of 600 nonmanagerial workers, you assign three-digit code numbers from 001 to 600.

To collect a stratified sample proportional to the sizes of the strata, you select 25% of the overall sample from the first stratum and 75% of the overall sample from the second stratum. You take two separate simple random samples, each of which is based on a distinct random starting point from a table of random numbers (Table E.1). In the first sample, you select 10 managers from the listing of 200 in the first stratum, and in the second sample, you select 30 nonmanagerial workers from the listing of 600 in the second stratum. You then combine the results to reflect the composition of the entire company.
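The proportional allocation in Example 7.2 can be sketched in Python. The function name, the stratum labels, and the seed are hypothetical, and a pseudorandom generator stands in for the table of random numbers.

```python
import random

def proportional_stratified_sample(strata, total_n, rng):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    N = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        n_h = round(total_n * len(items) / N)   # stratum share of the overall sample
        sample[name] = rng.sample(items, n_h)   # without replacement within the stratum
    return sample

strata = {
    "managers": list(range(1, 201)),        # codes 001 to 200
    "nonmanagers": list(range(1, 601)),     # codes 001 to 600
}
rng = random.Random(42)                     # hypothetical seed
sample = proportional_stratified_sample(strata, 40, rng)
print(len(sample["managers"]), len(sample["nonmanagers"]))  # 10 30
```

With strata whose sizes do not divide evenly, the rounded stratum sample sizes may not add up exactly to the overall sample size, so an adjustment step would be needed in that case.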

Cluster Samples

In a cluster sample, you divide the N items in the frame into several clusters so that each cluster is representative of the entire population. Clusters are naturally occurring designations, such as counties, election districts, city blocks, households, or sales territories. You then take a random sample of one or more clusters and study all items in each selected cluster. If clusters are large, a probability-based sample taken from a single cluster is all that is needed.

Cluster sampling is often more cost-effective than simple random sampling, particularly if the population is spread over a wide geographic region. However, cluster sampling often requires a larger sample size to produce results as precise as those from simple random sampling or stratified sampling. A detailed discussion of systematic sampling, stratified sampling, and cluster sampling procedures can be found in reference 1.
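A minimal Python sketch of cluster selection follows. The territory names, the item labels, and the seed are invented for illustration only; the point is that clusters, not individual items, are drawn at random, and every item in a chosen cluster is studied.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every item in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical frame: five sales territories with four customers each.
clusters = {
    f"territory-{t}": [f"customer-{t}-{i}" for i in range(1, 5)]
    for t in "ABCDE"
}
rng = random.Random(5)                      # hypothetical seed
selected = cluster_sample(clusters, 2, rng)
for name, items in selected.items():
    print(name, items)
```

Compare this with the stratified approach, which draws some items from every subgroup rather than all items from a few subgroups.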

Learning the Basics

7.1 For a population containing N = 902 individuals, what code number would you assign for
a. the first person on the list?
b. the fortieth person on the list?
c. the last person on the list?

7.2 For a population of N = 902, verify that by starting in row 05, column 1 of the table of random numbers (Table E.1), you need only six rows to select a sample of n = 60 without replacement.

7.3 Given a population of N = 93, starting in row 29 of the table of random numbers (Table E.1), and reading across the row, select a sample of n = 15
a. without replacement.
b. with replacement.

Applying the Concepts

7.4 For a study that consists of personal interviews with participants (rather than mail or phone surveys), explain why simple random sampling might be less practical than some other sampling methods.

7.5 You want to select a random sample of n = 1 from a population of three items (which are called A, B, and C). The rule for selecting the sample is: Flip a coin; if it is heads, pick item A; if it is tails, flip the coin again; this time, if it is heads, choose B; if it is tails, choose C. Explain why this is a probability sample but not a simple random sample.

7.6 A population has four members (called A, B, C, and D). You would like to select a random sample of n = 2, which you decide to do in the following way: Flip a coin; if it is heads, the sample will be items A and B; if it is tails, the sample will be items C and D. Although this is a random sample, it is not a simple random sample. Explain why. (If you did Problem 7.5, compare the procedure described there with the procedure described in this problem.)

7.7 The registrar of a college with a population of N = 4,000 full-time students is asked by the president to conduct a survey to measure satisfaction with the quality of life on campus. The following table contains a breakdown of the 4,000 registered full-time students, by gender and class designation:

                  Class Designation
Gender     Fr.     So.     Jr.     Sr.     Total
...        ...     ...     500     480     2,200
...        ...     ...     400     380     1,800
Total      ...     ...     900     860     4,000

You intend to take a probability sample of n = ... and project the results from the sample to the entire population of full-time students.
a. If the frame available from the registrar's files is an alphabetical listing of the names of all N = 4,000 registered full-time students, what type of sample could you take? Discuss.
b. What is the advantage of selecting a simple random sample in (a)?
c. What is the advantage of selecting a systematic sample in (a)?
d. If the frame available from the registrar's files is a listing of the names of all N = 4,000 registered full-time students compiled from eight separate alphabetical lists, based on the gender and class designation breakdowns shown in the class designation table, what type of sample should you take? Discuss.
e. Suppose that each of the N = 4,000 registered full-time students lived in one of the 20 campus dormitories. Each dormitory contains four floors, with 50 beds per floor, and therefore accommodates 200 students. It is college policy to fully integrate students by gender and class designation on each floor of each dormitory. If the registrar is able to compile a frame through a listing of all student occupants on each floor within each dormitory, what type of sample should you take? Discuss.

7.8 Prenumbered sales invoices are kept in a sales journal. The invoices are numbered from 0001 to 5000.
a. Beginning in row 16, column 1, and proceeding horizontally in Table E.1, select a simple random sample of 50 invoice numbers.
b. Select a systematic sample of 50 invoice numbers. Use the random numbers in row 20, columns 5-7, as the starting point for your selection.
c. Are the invoices selected in (a) the same as those selected in (b)? Why or why not?

7.9 Suppose that 5,000 sales invoices are separated into four strata. Stratum 1 contains 50 invoices, stratum 2 contains 500 invoices, stratum 3 contains 1,000 invoices, and stratum 4 contains 3,450 invoices. A sample of 500 sales invoices is needed.
a. What type of sampling should you do? Why?
b. Explain how you would carry out the sampling according to the method stated in (a).
c. Why is the sampling in (a) not simple random sampling?

7.2 EVALUATING SURVEY WORTHINESS

Surveys are often used to collect samples. Nearly every day, you read or hear about survey or opinion poll results in newspapers, on the Internet, or on radio or television. To identify surveys that lack objectivity or credibility, you must critically evaluate what you read and hear by examining the worthiness of the survey. First, you must evaluate the purpose of the survey, why it was conducted, and for whom it was conducted. An opinion poll or a survey conducted to satisfy curiosity is mainly for entertainment. Its result is an end in itself rather than a means to an end. You should be skeptical of such a survey because the result should not be put to further use.

The second step in evaluating the worthiness of a survey is to determine whether it was based on a probability or nonprobability sample (as discussed in Section 7.1). You need to remember that the only way to make correct statistical inferences from a sample to a population is through the use of a probability sample. Surveys that use nonprobability sampling methods are subject to serious, perhaps unintentional, biases that may render the results meaningless, as illustrated in the following example from the 1948 U.S. presidential election.

In 1948, major pollsters predicted the outcome of the U.S. presidential election between Harry S. Truman, the incumbent president, and Thomas E. Dewey, then governor of New York, as going to Dewey. The Chicago Tribune was so confident of the polls' predictions that it printed its early edition based on the predictions rather than waiting for the ballots to be counted.

An embarrassed newspaper and the pollsters it had relied on had a lot of explaining to do. Why were the pollsters so wrong? Intent on discovering the source of the error, the pollsters found that their use of a nonprobability sampling method was the culprit (see reference 7). As a result, polling organizations adopted probability sampling methods for future elections.

Survey Error

Even when surveys use random probability sampling methods, they are subject to potential errors. There are four types of survey errors:

• Coverage error
• Nonresponse error
• Sampling error
• Measurement error

Good survey research design attempts to reduce or minimize these various types of survey error, often at considerable cost.

Coverage Error. The key to proper sample selection is an adequate frame. Remember, a frame is an up-to-date list of all the items from which you will select the sample. Coverage error occurs if certain groups of items are excluded from this frame so that they have no chance of being selected in the sample. Coverage error results in a selection bias. If the frame is inadequate because certain groups of items in the population were not properly included, any random probability sample selected will provide an estimate of the characteristics of the frame, not the actual population.
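The last point can be illustrated with a small Python simulation. Every number below is invented for the illustration and describes no real survey: a population of 10,000 items of which only 3,000 appear in the frame, with a "yes" rate of 70% inside the frame and 40% outside it.

```python
import random

# Hypothetical population: 1 = "yes", 0 = "no".
in_frame = [1] * 2100 + [0] * 900      # 3,000 items listed in the frame (70% yes)
excluded = [1] * 2800 + [0] * 4200     # 7,000 items missing from the frame (40% yes)
population = in_frame + excluded

rng = random.Random(11)                # hypothetical seed
sample = rng.sample(in_frame, 500)     # a proper random sample, but only of the frame

frame_rate = sum(in_frame) / len(in_frame)            # 0.70
population_rate = sum(population) / len(population)   # 0.49
sample_rate = sum(sample) / len(sample)

print(f"sample {sample_rate:.3f}  frame {frame_rate:.3f}  population {population_rate:.3f}")
```

Even though the sample is drawn at random, its "yes" rate tracks the frame rather than the population, and taking a larger sample from the same frame would not close that gap.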

Nonresponse Error. Not everyone is willing to respond to a survey. In fact, research has shown that individuals in the upper and lower economic classes tend to respond less frequently to surveys than do people in the middle class. Nonresponse error arises from the failure to collect data on all items in the sample and results in a nonresponse bias. Because you cannot always assume that persons who do not respond to surveys are similar to those who do, you need to follow up on the nonresponses after a specified period of time. You should make several attempts to convince such individuals to complete the survey. The follow-up responses are then compared to the initial responses in order to make valid inferences from the survey (reference 1).

The mode of response you use affects the rate of response. The personal interview and the telephone interview usually produce a higher response rate than does the mail survey, but at a higher cost. The following is a famous example of coverage error and nonresponse error.
In 1936, the magazine Literary Digest predicted that Governor Alf Landon of Kansas would receive 57% of the votes in the U.S. presidential election and overwhelmingly defeat President Franklin D. Roosevelt's bid for a second term. However, Landon was soundly defeated when he received only 38% of the vote. Such a large error in a poll conducted by a well-known source had never occurred before. As a result, the prediction devastated the magazine's credibility with the public, eventually causing it to go bankrupt. Literary Digest thought it had done everything right. It had based its prediction on a huge sample size, 2.4 million respondents, out of a survey sent to 10 million registered voters. What went wrong? There are two answers: selection bias and nonresponse bias.

To understand the role of selection bias, you need some historical background. In 1936, the United States was still suffering from the Great Depression. Not accounting for this, Literary Digest compiled its frame from such sources as telephone books, club membership lists, magazine subscriptions, and automobile registrations (reference 2). Inadvertently, it chose a frame primarily composed of the rich and excluded the majority of the voting population, who, during the Great Depression, could not afford telephones, club memberships, magazine subscriptions, and automobiles. Thus, the 57% estimate for the Landon vote may have been very close to the frame but certainly not the total U.S. population.

Nonresponse error produced a possible bias when the huge sample of 10 million registered voters produced only 2.4 million responses. A response rate of only 24% is far too low to yield accurate estimates of the population parameters without some way of ensuring that the 7.6 million individual nonrespondents have similar opinions. However, the problem of nonresponse bias was secondary to the problem of selection bias. Even if all 10 million registered voters in the sample had responded, this would not have compensated for the fact that the composition of the frame differed substantially from that of the actual voting population.

Sampling Error. A sample is selected because it is simpler, less costly, and more efficient. However, chance dictates which individuals or items will or will not be included in the sample. Sampling error reflects the variation, or "chance differences," from sample to sample, based on the probability of particular individuals or items being selected in the particular samples.

When you read about the results of surveys or polls in newspapers or magazines, there is often a statement regarding a margin of error, such as "the results of this poll are expected to be within ±4 percentage points of the actual value." This margin of error is the sampling error. You can reduce sampling error by taking larger sample sizes, although this also increases the cost of conducting the survey.
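The effect of sample size on sampling error can be seen in a small Python simulation. The population below is hypothetical: 10,000 simulated box weights centered on the 368-gram label weight, with a standard deviation of 15 grams that is assumed purely for illustration, as are the seed and the function name.

```python
import random
import statistics

def spread_of_sample_means(population, n, reps, rng):
    """Standard deviation of the means of `reps` random samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

rng = random.Random(1)                                      # hypothetical seed
population = [rng.gauss(368, 15) for _ in range(10_000)]    # assumed box weights, in grams

small = spread_of_sample_means(population, 25, 200, rng)    # samples of 25 boxes
large = spread_of_sample_means(population, 400, 200, rng)   # samples of 400 boxes
print(f"n = 25: {small:.2f} g    n = 400: {large:.2f} g")
```

The means of the larger samples cluster much more tightly around the population mean, which is the sense in which a larger sample reduces sampling error. It does nothing, however, to reduce coverage, nonresponse, or measurement error.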

Measurement Error. In the practice of good survey research, you design a questionnaire with the intention of gathering meaningful information. But you have a dilemma here: getting meaningful measurements is often easier said than done. Consider the following proverb:

    A person with one watch always knows what time it is;
    A person with two watches always searches to identify the correct one;
    A person with ten watches is always reminded of the difficulty in measuring time.

Unfortunately, the process of measurement is often governed by what is convenient, not what is needed. The measurements you get are often only a proxy for the ones you really desire. Much attention has been given to measurement error that occurs because of a weakness in question wording (reference 3). A question should be clear, not ambiguous. Furthermore, in order to avoid leading questions, you need to present them in a neutral manner.

Three sources of measurement error are ambiguous wording of questions, the halo effect, and respondent error. As an example of ambiguous wording, in November 1993, the U.S. Department of Labor reported that the unemployment rate in the United States had been underestimated for more than a decade because of poor questionnaire wording in the Current Population Survey. In particular, the wording had led to a significant undercount of women in the labor force. Because unemployment rates are tied to benefit programs such as state unemployment compensation, survey researchers had to rectify the situation by adjusting the questionnaire wording.

The "halo effect" occurs when the respondent feels obligated to please the interviewer. Proper interviewer training can minimize the halo effect.

Respondent error occurs as a result of an overzealous or underzealous effort by the respondent. You can minimize this error in two ways: (1) by carefully scrutinizing the data and calling back those individuals whose responses seem unusual and (2) by establishing a program of random callbacks in order to determine the reliability of the responses.

Ethical Issues

Ethical considerations arise with respect to the four types of potential errors that can occur when designing surveys that use probability samples: coverage error, nonresponse error, sampling error, and measurement error. Coverage error can result in selection bias and becomes an ethical issue if particular groups or individuals are purposely excluded from the frame so that the survey results are more favorable to the survey's sponsor. Nonresponse error can lead to nonresponse bias and becomes an ethical issue if the sponsor knowingly designs the survey so that particular groups or individuals are less likely than others to respond. Sampling error becomes an ethical issue if the findings are purposely presented without reference to sample size and margin of error so that the sponsor can promote a viewpoint that might otherwise be truly insignificant. Measurement error becomes an ethical issue in one of three ways: (1) a survey sponsor chooses leading questions that guide the responses in a particular direction; (2) an interviewer, through mannerisms and tone, purposely creates a halo effect or otherwise guides the responses in a particular direction; or (3) a respondent willfully provides false information.

Ethical issues also arise when the results of nonprobability samples are used to form conclusions about the entire population. When you use a nonprobability sampling method, you need to explain the sampling procedures and state that the results cannot be generalized beyond the sample.

Applying the Concepts

7.10 "A survey indicates that the vast majority of college students own their own personal computers." What information would you want to know before you accepted the results of this survey?

7.11 A simple random sample of n = 300 full-time employees is selected from a company list containing the names of all N = 5,000 full-time employees in order to evaluate job satisfaction.
a. Give an example of possible coverage error.
b. Give an example of possible nonresponse error.
c. Give an example of possible sampling error.
d. Give an example of possible measurement error.

7.12 Business professor Thomas Callarman traveled to China more than a dozen times from 2000 to 2005. He warns people about believing everything they read about surveys conducted in China and gives two specific reasons. Callarman stated, "First, things are changing so rapidly that what you hear today may not be true tomorrow. Second, the people who answer the surveys may tell you what they think you want to hear, rather than what they really believe" (T. E. Callarman, "Some Thoughts on China," Decision Line, March, 2006, pp. 1, 43-44).
a. List the four types (or categories) of survey error discussed in this section.
b. Which categories best describe the types of survey error discussed by Professor Callarman?

7.13 The gourmet foods industry is expected to exceed $62 billion in sales by the year 2009. A survey conducted by Packaged Facts indicates that one-fifth of American adults consider themselves "gourmet consumers" ("Galloping Gourmet," The Progressive Grocer, January 7, 2006, pp. 80-81). What additional information would you want to know before you accepted the results of the survey?

7.14 Only 10% of Americans rated their financial situation as "excellent," according to a Gallup Poll taken April 10-13, 2006. However, 41% rated their financial situation as "good," while 37% said "only fair" and 12% "poor" (J. M. Jones, "Americans More Worried About Meeting Basic Financial Needs," The Gallup Poll, galluppoll.com, April 25, 2006). What additional information would you want to know before you accepted the results of the survey?

7.15 Researchers studied repeat purchases from two online grocers. Valid responses from 1,150 customers indicated that 28.7% placed no further orders in the following 12 months, 35.4% placed 1-10 orders, and 35.8% placed 11 or more orders (K. K. Boyer and G. T. M. Hult, "Customer Behavior in an Online Ordering Application: A Decision Scoring Model," Decision Sciences, December, 2005, pp. 569-598). What additional information would you want to know before you accepted the results of this study?

7.16 A study investigating the effects of CEO succession on the stock performance of large publicly held corporations also investigated the demographics of the newly announced CEOs. The mean and standard deviation of the new CEO's age were 53.3 and 5.97, respectively. The mean and standard deviation of the number of years the new CEO had been with the firm were 20.1 and 12.6, respectively. 93.6% of the new CEOs held college degrees, 30.4% held MBAs, and 3.2% held doctorates (J. C. Rhim, J. V. Peluchette, and I. Song, "Stock Market Reactions and Firm Performance Surrounding CEO Succession: Antecedents of Succession and Successor Origin," Mid-American Journal of Business, Spring 2006, pp. 21-30). What additional information would you want to know before you accepted the results of this study?

7.3 SAMPLING DISTRIBUTIONS
In many applications, you want to make statistical inferences that use statistics calculated from samples to estimate the values of population parameters. In the next two sections, you will learn about how the sample mean (the statistic) is used to estimate a population mean (a parameter) and how the sample proportion (the statistic) is used to estimate the population proportion (a parameter). Your main concern when making a statistical inference is drawing conclusions about a population, not about a sample. For example, a political pollster is interested in the sample results only as a way of estimating the actual proportion of the votes that each candidate will receive from the population of voters. Likewise, as plant operations manager for Oxford Cereals, you are only interested in using the sample mean calculated from a sample of cereal boxes for estimating the mean weight contained in a population of boxes.
In practice, you select a single random sample of a predetermined size from the population. The items included in the sample are determined through the use of a random number generator, such as a table of random numbers (see Section 7.1 and Table E.1) or by using Microsoft Excel (see pages 281-282).
Hypothetically, to use the sample statistic to estimate the population parameter, you should examine every possible sample of a given size that could occur. A sampling distribution is the distribution of the results if you actually selected all possible samples.

7.4 SAMPLING DISTRIBUTION OF THE MEAN
In Chapter 3, several measures of central tendency, including the mean, median, and mode, were discussed. Undoubtedly, the mean is the most widely used measure of central tendency. The sample mean is often used to estimate the population mean. The sampling distribution of the mean is the distribution of all possible sample means if you select all possible samples of a certain size.

The Unbiased Property of the Sample Mean
The sample mean is unbiased because the mean of all the possible sample means (of a given sample size, n) is equal to the population mean, $\mu$. A simple example concerning a population of four administrative assistants demonstrates this property. Each assistant is asked to type the same page of a manuscript. Table 7.2 presents the number of errors. This population distribution is shown in Figure 7.2.

TABLE 7.2 Number of Errors Made by Each of Four Administrative Assistants

Administrative Assistant    Number of Errors
Ann                         $X_1 = 3$
Bob                         $X_2 = 2$
Carla                       $X_3 = 1$
Dave                        $X_4 = 4$

FIGURE 7.2 Number of errors made by a population of four administrative assistants (horizontal axis: Number of Errors)

When you have the data from a population, you compute the mean by using Equation (7.1).

POPULATION MEAN
The population mean is the sum of the values in the population divided by the population size, N.

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} \tag{7.1}$$

You compute the population standard deviation, $\sigma$, using Equation (7.2):

POPULATION STANDARD DEVIATION

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \tag{7.2}$$

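Thus, for the data of Table 7.2,

$$\mu = \frac{3 + 2 + 1 + 4}{4} = 2.5 \text{ errors}$$

and

$$\sigma = \sqrt{\frac{(3 - 2.5)^2 + (2 - 2.5)^2 + (1 - 2.5)^2 + (4 - 2.5)^2}{4}} = 1.12 \text{ errors}$$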

If you select samples of two administrative assistants with replacement from this population, there are 16 possible samples ($N^n = 4^2 = 16$). Table 7.3 lists the 16 possible sample outcomes. If you average all 16 of these sample means, the mean of these values, $\mu_{\bar{X}}$, is equal to 2.5, which is also the mean of the population, $\mu$.

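TABLE 7.3 All 16 Samples of n = 2 Administrative Assistants from a Population of N = 4 Administrative Assistants When Sampling with Replacement

Sample   Administrative Assistants   Sample Outcomes   Sample Mean
1        Ann, Ann                    3, 3              $\bar{X}_{1} = 3$
2        Ann, Bob                    3, 2              $\bar{X}_{2} = 2.5$
3        Ann, Carla                  3, 1              $\bar{X}_{3} = 2$
4        Ann, Dave                   3, 4              $\bar{X}_{4} = 3.5$
5        Bob, Ann                    2, 3              $\bar{X}_{5} = 2.5$
6        Bob, Bob                    2, 2              $\bar{X}_{6} = 2$
7        Bob, Carla                  2, 1              $\bar{X}_{7} = 1.5$
8        Bob, Dave                   2, 4              $\bar{X}_{8} = 3$
9        Carla, Ann                  1, 3              $\bar{X}_{9} = 2$
10       Carla, Bob                  1, 2              $\bar{X}_{10} = 1.5$
11       Carla, Carla                1, 1              $\bar{X}_{11} = 1$
12       Carla, Dave                 1, 4              $\bar{X}_{12} = 2.5$
13       Dave, Ann                   4, 3              $\bar{X}_{13} = 3.5$
14       Dave, Bob                   4, 2              $\bar{X}_{14} = 3$
15       Dave, Carla                 4, 1              $\bar{X}_{15} = 2.5$
16       Dave, Dave                  4, 4              $\bar{X}_{16} = 4$
                                                       $\mu_{\bar{X}} = 2.5$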

Because the mean of the 16 sample means is equal to the population mean, the sample mean is an unbiased estimator of the population mean. Therefore, although you do not know how close the sample mean of any particular sample selected comes to the population mean, you are at least assured that the mean of all the possible sample means that could have been selected is equal to the population mean.
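
The enumeration of all 16 samples can be reproduced with a short script. The following is a minimal Python sketch, not part of the original text; the variable names are illustrative assumptions. It lists every ordered sample of size 2 drawn with replacement from the four error counts and compares the mean of the sample means with the population mean.

```python
from itertools import product
from statistics import mean

# Population of error counts for the four administrative assistants
errors = {"Ann": 3, "Bob": 2, "Carla": 1, "Dave": 4}
population = list(errors.values())
population_mean = mean(population)

n = 2  # sample size
# Every ordered sample of size n drawn with replacement (N**n samples)
samples = list(product(population, repeat=n))
sample_means = [mean(sample) for sample in samples]
mean_of_sample_means = mean(sample_means)

print("number of samples:", len(samples))
print("population mean:", population_mean)
print("mean of all sample means:", mean_of_sample_means)
```

Changing `n` to 3 enumerates the 64 samples of size 3 in the same way, and the two means should again agree.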

Standard Error of the Mean
Figure 7.3 illustrates the variation in the sample means when selecting all 16 possible samples. In this small example, although the sample means vary from sample to sample, depending on which two administrative assistants are selected, the sample means do not vary as much as the individual values in the population. That the sample means are less variable than the individual values in the population follows directly from the fact that each sample mean averages together all the values in the sample. A population consists of individual outcomes that can take on a wide range of values, from extremely small to extremely large. However, if a sample contains an extreme value, although this value will have an effect on the sample mean, the effect is reduced because the value is averaged with all the other values in the sample. As the sample size increases, the effect of a single extreme value becomes smaller because it is averaged with more values.

FIGURE 7.3 Sampling distribution of the mean, based on all possible samples containing two administrative assistants (horizontal axis: Mean Number of Errors). Source: Data are from Table 7.3.

The value of the standard deviation of all possible sample means, called the standard error of the mean, expresses how the sample means vary from sample to sample. Equation (7.3) defines the standard error of the mean when sampling with replacement or without replacement (see page 254) from large or infinite populations.

STANDARD ERROR OF THE MEAN
The standard error of the mean, $\sigma_{\bar{X}}$, is equal to the standard deviation in the population, $\sigma$, divided by the square root of the sample size, n.

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \tag{7.3}$$

Therefore, as the sample size increases, the standard error of the mean decreases by a factor equal to the square root of the sample size.
You can also use Equation (7.3) as an approximation of the standard error of the mean when the sample is selected without replacement if the sample contains less than 5% of the entire population. Example 7.3 computes the standard error of the mean for such a situation. (See the section 7.6.pdf file on the Student CD-ROM that accompanies this book for the case in which more than 5% of the population is contained in a sample selected without replacement.)

EXAMPLE 7.3 COMPUTING THE STANDARD ERROR OF THE MEAN
Returning to the cereal-filling process described in the Using Statistics scenario on page 252, if you randomly select a sample of 25 boxes without replacement from the thousands of boxes filled during a shift, the sample contains far less than 5% of the population. Given that the standard deviation of the cereal-filling process is 15 grams, compute the standard error of the mean.

SOLUTION Using Equation (7.3) with n = 25 and $\sigma$ = 15, the standard error of the mean is

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{25}} = \frac{15}{5} = 3$$

The variation in the sample means for samples of n = 25 is much less than the variation in the individual boxes of cereal (that is, $\sigma_{\bar{X}} = 3$ while $\sigma = 15$).

Sampling from Normally Distributed Populations
Now that the concept of a sampling distribution has been introduced and the standard error of the mean has been defined, what distribution will the sample mean, $\bar{X}$, follow? If you are sampling from a population that is normally distributed with mean, $\mu$, and standard deviation, $\sigma$, regardless of the sample size, n, the sampling distribution of the mean is normally distributed, with mean, $\mu_{\bar{X}} = \mu$, and standard error of the mean, $\sigma_{\bar{X}}$.
In the simplest case, if you take samples of size n = 1, each possible sample mean is a single value from the population because

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1}{1} = X_1$$

Therefore, if the population is normally distributed, with mean, $\mu$, and standard deviation, $\sigma$, the sampling distribution of $\bar{X}$ for samples of n = 1 must also follow the normal distribution, with mean $\mu_{\bar{X}} = \mu$ and standard error of the mean $\sigma_{\bar{X}} = \sigma/\sqrt{1} = \sigma$. In addition, as the sample size increases, the sampling distribution of the mean still follows a normal distribution, with $\mu_{\bar{X}} = \mu$, but the standard error of the mean decreases, so that a larger proportion of sample means are closer to the population mean. Figure 7.4 on page 266 illustrates this reduction in variability, in which 500 samples of sizes n = 1, 2, 4, 8, 16, and 32 were randomly selected from a normally distributed population. From the polygons in Figure 7.4, you can see that, although the sampling distribution of the mean is approximately¹ normal for each sample size, the sample means are distributed more tightly around the population mean as the sample size increases.

¹Remember that "only" 500 samples out of an infinite number of samples have been selected, so that the sampling distributions shown are only approximations of the true distributions.

To further examine the concept of the sampling distribution of the mean, consider the Using Statistics scenario described on page 252. The packaging equipment that is filling 368-gram boxes of cereal is set so that the amount of cereal in a box is normally distributed with a mean of 368 grams. From past experience, you know the population standard deviation for this filling process is 15 grams.
If you randomly select a sample of 25 boxes from the many thousands that are filled in a day and the mean weight is computed for this sample, what type of result could you expect? For example, do you think that the sample mean could be 368 grams? 200 grams? 365 grams?
The sample acts as a miniature representation of the population, so if the values in the population are normally distributed, the values in the sample should be approximately normally distributed. Thus, if the population mean is 368 grams, the sample mean has a good chance of being close to 368 grams.

FIGURE 7.4 Sampling distributions of the mean from 500 samples of sizes n = 1, 2, 4, 8, 16, and 32 selected from a normal population
How can you determine the probability that the sample of 25 boxes will have a mean below 365 grams? From the normal distribution (Section 6.2), you know that you can find the area below any value X by converting to standardized Z values:

$$Z = \frac{X - \mu}{\sigma}$$

In the examples in Section 6.2, you studied how any single value, X, differs from the mean. Now, in this example, the value involved is a sample mean, $\bar{X}$, and you want to determine the likelihood that a sample mean is below 365. In Equation (7.4), to find the Z value, you substitute $\bar{X}$ for X, $\mu_{\bar{X}}$ for $\mu$, and $\sigma_{\bar{X}}$ for $\sigma$.

FINDING Z FOR THE SAMPLING DISTRIBUTION OF THE MEAN
The Z value is equal to the difference between the sample mean, $\bar{X}$, and the population mean, $\mu$, divided by the standard error of the mean, $\sigma_{\bar{X}}$.

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \tag{7.4}$$

To find the area below 365 grams, from Equation (7.4),

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{365 - 368}{\frac{15}{\sqrt{25}}} = \frac{-3}{3} = -1.00$$

The area corresponding to Z = -1.00 in Table E.2 is 0.1587. Therefore, 15.87% of all the possible samples of 25 boxes have a sample mean below 365 grams.
The preceding statement is not the same as saying that a certain percentage of individual boxes will have less than 365 grams of cereal. You compute that percentage as follows:

$$Z = \frac{X - \mu}{\sigma} = \frac{365 - 368}{15} = \frac{-3}{15} = -0.20$$

The area corresponding to Z = -0.20 in Table E.2 is 0.4207. Therefore, 42.07% of the individual boxes are expected to contain less than 365 grams. Comparing these results, you see that many more individual boxes than sample means are below 365 grams. This result is explained by the fact that each sample consists of 25 different values, some small and some large. The averaging process dilutes the importance of any individual value, particularly when the sample size is large. Thus, the chance that the sample mean of 25 boxes is far away from the population mean is less than the chance that a single box is far away.
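
The comparison between a sample mean and a single box can be checked numerically. The sketch below is an illustration only, not part of the original text; it assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`, and uses it in place of the printed normal table. All variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

mu = 368.0      # population mean fill weight, in grams
sigma = 15.0    # population standard deviation, in grams
n = 25          # boxes per sample
cutoff = 365.0  # weight of interest, in grams

# Sample mean: standardize with the standard error sigma / sqrt(n)
standard_error = sigma / sqrt(n)
z_sample_mean = (cutoff - mu) / standard_error
p_sample_mean_below = NormalDist(mu, standard_error).cdf(cutoff)

# Single box: standardize with the population standard deviation
z_single_box = (cutoff - mu) / sigma
p_single_box_below = NormalDist(mu, sigma).cdf(cutoff)

print(f"sample mean: Z = {z_sample_mean:.2f}, P = {p_sample_mean_below:.4f}")
print(f"single box:  Z = {z_single_box:.2f}, P = {p_single_box_below:.4f}")
```

The only difference between the two calculations is the divisor used to standardize, which is why the sample mean is so much less likely than a single box to fall below 365 grams.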
Examples 7.4 and 7.5 show how these results are affected by using different sample sizes.

EXAMPLE 7.4 THE EFFECT OF SAMPLE SIZE n ON THE COMPUTATION OF $\sigma_{\bar{X}}$
How is the standard error of the mean affected by increasing the sample size from 25 to 100 boxes?

SOLUTION If n = 100 boxes, then using Equation (7.3) on page 264:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{100}} = \frac{15}{10} = 1.5$$

The fourfold increase in the sample size from 25 to 100 reduces the standard error of the mean by half, from 3 grams to 1.5 grams. This demonstrates that taking a larger sample results in less variability in the sample means from sample to sample.

EXAMPLE 7.5 THE EFFECT OF SAMPLE SIZE n ON THE CLUSTERING OF MEANS IN THE SAMPLING DISTRIBUTION
If you select a sample of 100 boxes, what is the probability that the sample mean is below 365 grams?

SOLUTION Using Equation (7.4) on page 266,

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{365 - 368}{\frac{15}{\sqrt{100}}} = \frac{-3}{1.5} = -2.00$$

From Table E.2, the area less than Z = -2.00 is 0.0228. Therefore, 2.28% of the samples of 100 boxes have means below 365 grams, as compared with 15.87% for samples of 25 boxes.
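
To see the effect of sample size in Examples 7.4 and 7.5 side by side, the calculation can be wrapped in two small helper functions. This is a sketch under stated assumptions: the function names `standard_error` and `prob_sample_mean_below` are hypothetical, and `statistics.NormalDist` (Python 3.8+) stands in for Table E.2.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sigma, n):
    """Standard error of the mean, Equation (7.3): sigma / sqrt(n)."""
    return sigma / sqrt(n)


def prob_sample_mean_below(cutoff, mu, sigma, n):
    """P(sample mean < cutoff) when the sampling distribution is normal."""
    z = (cutoff - mu) / standard_error(sigma, n)  # Equation (7.4)
    return NormalDist().cdf(z)


mu, sigma, cutoff = 368.0, 15.0, 365.0
results = {}
for n in (25, 100):
    results[n] = prob_sample_mean_below(cutoff, mu, sigma, n)
    print(f"n = {n:3d}: standard error = {standard_error(sigma, n):.1f} g, "
          f"P(mean < {cutoff:.0f}) = {results[n]:.4f}")
```

Adding further sample sizes to the loop shows the same pattern: each fourfold increase in n halves the standard error and pushes the probability further toward zero.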

Sometimes you need to find the interval that contains a fixed proportion of the sample means. You need to determine a distance below and above the population mean containing a specific area of the normal curve. From Equation (7.4) on page 266,

$$Z = \frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}}$$

Solving for $\bar{X}$ results in Equation (7.5).

FINDING $\bar{X}$ FOR THE SAMPLING DISTRIBUTION OF THE MEAN

$$\bar{X} = \mu + Z\frac{\sigma}{\sqrt{n}} \tag{7.5}$$

Example 7.6 illustrates the use of Equation (7.5).

EXAMPLE 7.6 DETERMINING THE INTERVAL THAT INCLUDES A FIXED PROPORTION OF THE SAMPLE MEANS
In the cereal-fill example, find an interval symmetrically distributed around the population mean that will include 95% of the sample means based on samples of 25 boxes.

SOLUTION If 95% of the sample means are in the interval, then 5% are outside the interval. Divide the 5% into two equal parts of 2.5%. The value of Z in Table E.2 corresponding to an area of 0.0250 in the lower tail of the normal curve is -1.96, and the value of Z corresponding to a cumulative area of 0.975 (that is, 0.025 in the upper tail of the normal curve) is +1.96. The lower value of $\bar{X}$ (called $\bar{X}_L$) and the upper value of $\bar{X}$ (called $\bar{X}_U$) are found by using Equation (7.5):

$$\bar{X}_L = 368 + (-1.96)\frac{15}{\sqrt{25}} = 368 - 5.88 = 362.12$$

$$\bar{X}_U = 368 + (1.96)\frac{15}{\sqrt{25}} = 368 + 5.88 = 373.88$$

Therefore, 95% of all sample means based on samples of 25 boxes are between 362.12 and 373.88 grams.
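
The interval in Example 7.6 can also be computed directly from Equation (7.5). The following Python sketch is illustrative only; the function name `symmetric_interval` is an assumption, and `NormalDist().inv_cdf` (Python 3.8+) replaces the table lookup, so the Z value it uses is slightly more precise than the tabled 1.96.

```python
from math import sqrt
from statistics import NormalDist


def symmetric_interval(mu, sigma, n, proportion):
    """Interval around mu containing `proportion` of the sample means."""
    tail = (1.0 - proportion) / 2.0          # area left in each tail
    z = NormalDist().inv_cdf(1.0 - tail)     # about 1.96 when proportion = 0.95
    half_width = z * sigma / sqrt(n)         # Z times the standard error
    return mu - half_width, mu + half_width  # Equation (7.5) with -Z and +Z


lower, upper = symmetric_interval(mu=368.0, sigma=15.0, n=25, proportion=0.95)
print(f"95% of sample means fall between {lower:.2f} and {upper:.2f} grams")
```

Calling the function with a different `proportion`, such as 0.90, or a different `n` shows how the interval widens or narrows.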

Sampling from Non-Normally Distributed Populations: The Central Limit Theorem
Thus far in this section, only the sampling distribution of the mean for a normally distributed population has been considered. However, in many instances, either you know that the population is not normally distributed or it is unrealistic to assume that the population is normally distributed. An important theorem in statistics, the Central Limit Theorem, deals with this situation.

THE CENTRAL LIMIT THEOREM
The Central Limit Theorem states that as the sample size (that is, the number of values in each sample) gets large enough, the sampling distribution of the mean is approximately normally distributed. This is true regardless of the shape of the distribution of the individual values in the population.

What sample size is large enough? A great deal of statistical research has gone into this issue. As a general rule, statisticians have found that for many population distributions, when the sample size is at least 30, the sampling distribution of the mean is approximately normal. However, you can apply the Central Limit Theorem for even smaller sample sizes if the population distribution is approximately bell shaped. In the uncommon case in which the distribution is extremely skewed or has more than one mode, you may need sample sizes larger than 30 to ensure normality.
Figure 7.5 illustrates the application of the Central Limit Theorem to different populations. The sampling distributions from three different continuous distributions (normal, uniform, and exponential) for varying sample sizes (n = 2, 5, 30) are displayed.

FIGURE 7.5 Sampling distribution of the mean for different populations for samples of n = 2, 5, and 30 (Panel A: Normal Population; Panel B: Uniform Population; Panel C: Exponential Population; horizontal axes: Values of X)

In each of the panels, because the sample mean has the property of being unbiased, the mean of any sampling distribution is always equal to the mean of the population.
Panel A of Figure 7.5 shows the sampling distribution of the mean selected from a normal population. As mentioned earlier in this section, when the population is normally distributed, the sampling distribution of the mean is normally distributed for any sample size. (You can measure the variability by using the standard error of the mean, Equation 7.3, on page 264.)
Panel B of Figure 7.5 depicts the sampling distribution from a population with a uniform (or rectangular) distribution (see Section 6.4). When samples of size n = 2 are selected, there is a peaking, or central limiting, effect already working. For n = 5, the sampling distribution is bell shaped and approximately normal. When n = 30, the sampling distribution looks very similar to a normal distribution. In general, the larger the sample size, the more closely the sampling distribution will follow a normal distribution. As with all cases, the mean of each sampling distribution is equal to the mean of the population, and the variability decreases as the sample size increases.
Panel C of Figure 7.5 presents an exponential distribution (see Section 6.5). This population is extremely right-skewed. When n = 2, the sampling distribution is still highly right-skewed but less so than the distribution of the population. For n = 5, the sampling distribution is slightly right-skewed. When n = 30, the sampling distribution looks approximately normal. Again, the mean of each sampling distribution is equal to the mean of the population, and the variability decreases as the sample size increases.

VISUAL EXPLORATIONS Exploring Sampling Distributions
Use the Visual Explorations Two Dice Probability procedure to observe the effects of simulated throws on the frequency distribution of the sum of the two dice. Open the Visual Explorations.xla macro workbook on the text's CD (see Appendix D) and select Visual Explorations > Two Dice Probability (Excel 97-2003) or Add-Ins > Visual Explorations > Two Dice Probability (Excel 2007). The procedure produces a worksheet that contains an empty frequency distribution table and histogram and a floating control panel.
Click the Tally button to tally a set of throws in the frequency distribution table and histogram. Optionally, use the spinner buttons to adjust the number of throws per tally (round). Click the Help button for more information about this simulation. Click Finish when you are done with this exploration.

[Screenshot: Two Dice Probability worksheet showing the frequency distribution table and histogram of the sum of two dice, with the floating control panel]

Using the results from the normal, uniform, and exponential statistical distributions, you can reach the following conclusions regarding the Central Limit Theorem:
• For most population distributions, regardless of shape, the sampling distribution of the mean is approximately normally distributed if samples of at least 30 are selected.
• If the population distribution is fairly symmetric, the sampling distribution of the mean is approximately normal for samples as small as 5.
• If the population is normally distributed, the sampling distribution of the mean is normally distributed, regardless of the sample size.
The Central Limit Theorem is of crucial importance in using statistical inference to draw conclusions about a population. It allows you to make inferences about the population mean without having to know the specific shape of the population distribution.
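
A small simulation can make these conclusions concrete for a right-skewed population. The sketch below is an illustration, not the procedure used to draw the book's figures: it assumes an exponential population with rate 1 (so its mean and standard deviation are both 1), an arbitrary fixed seed, and 5,000 simulated samples per sample size. The `skewness` helper is a hypothetical name for the average cubed standardized deviation, which is near 0 for a symmetric distribution.

```python
import random
from math import sqrt


def skewness(values):
    """Average cubed standardized deviation (about 0 for symmetric data)."""
    m = sum(values) / len(values)
    sd = sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)


rng = random.Random(2024)  # fixed seed so the run is repeatable (arbitrary)
num_samples = 5000         # number of simulated samples per sample size

results = {}
for n in (2, 5, 30):
    # Each entry is the mean of one simulated sample of size n
    sample_means = [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(num_samples)
    ]
    center = sum(sample_means) / num_samples
    spread = sqrt(sum((m - center) ** 2 for m in sample_means) / num_samples)
    results[n] = (center, spread, skewness(sample_means))
    print(f"n = {n:2d}: mean = {center:.3f}, "
          f"std dev = {spread:.3f} (sigma/sqrt(n) = {1 / sqrt(n):.3f}), "
          f"skewness = {results[n][2]:.2f}")
```

In a run of this kind, the mean of the sample means stays near the population mean, the spread tracks the standard error, and the skewness shrinks toward 0 as n grows, which is the pattern the Central Limit Theorem describes.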

Learning the Basics

7.17 Given a normal distribution with $\mu$ = 100 and $\sigma$ = 10, if you select a sample of n = 25, what is the probability that $\bar{X}$ is
a. less than 95?
b. between 95 and 97.5?
c. above 102.2?
d. There is a 65% chance that $\bar{X}$ is above what value?

7.18 Given a normal distribution with $\mu$ = 50 and $\sigma$ = 5, if you select a sample of n = 100, what is the probability that $\bar{X}$ is
a. less than 47?
b. between 47 and 49.5?
c. above 51.1?
d. There is a 35% chance that $\bar{X}$ is above what value?

Applying the Concepts

7.19 For each of the following three populations, indicate what the sampling distribution for samples of 25 would consist of:
a. Travel expense vouchers for a university in an academic year
b. Absentee records (days absent per year) in 2006 for employees of a large manufacturing company
c. Yearly sales (in gallons) of unleaded gasoline at service stations located in a particular county

7.20 The following data represent the number of days absent per year in a population of six employees of a small company:

1 3 6 7 9 10

a. Assuming that you sample without replacement, select all possible samples of n = 2 and construct the sampling distribution of the mean. Compute the mean of all the sample means and also compute the population mean. Are they equal? What is this property called?
b. Repeat (a) for all possible samples of n = 3.
c. Compare the shape of the sampling distribution of the mean in (a) and (b). Which sampling distribution has less variability? Why?
d. Assuming that you sample with replacement, repeat (a) through (c) and compare the results. Which sampling distributions have the least variability, those in (a) or (b)? Why?

7.21 The diameter of Ping-Pong balls manufactured at a large factory is approximately normally distributed, with a mean of 1.30 inches and a standard deviation of 0.04 inch. If you select a random sample of 16 Ping-Pong balls,
a. what is the sampling distribution of the mean?
b. what is the probability that the sample mean is less than 1.28 inches?
c. what is the probability that the sample mean is between 1.31 and 1.33 inches?
d. The probability is 60% that the sample mean will be between what two values, symmetrically distributed around the population mean?

7.22 The U.S. Census Bureau announced that the median sales price of new houses sold in March 2006 was $224,200, while the mean sales price was $279,100 (www.census.gov/newhomesales, April 26, 2006). Assume that the standard deviation of the prices is $90,000.
a. If you select samples of n = 2, describe the shape of the sampling distribution of $\bar{X}$.
b. If you select samples of n = 100, describe the shape of the sampling distribution of $\bar{X}$.
c. If you select a random sample of n = 100, what is the probability that the sample mean will be less than $250,000?

7.23 Time spent using email per session is normally distributed with $\mu$ = 8 minutes and $\sigma$ = 2 minutes. If you select a random sample of 25 sessions,
a. what is the probability that the sample mean is between 7.8 and 8.2 minutes?
b. what is the probability that the sample mean is between 7.5 and 8 minutes?
c. If you select a random sample of 100 sessions, what is the probability that the sample mean is between 7.8 and 8.2 minutes?
d. Explain the difference in the results of (a) and (c).

7.24 The amount of time a bank teller spends with each customer has a population mean, $\mu$, of 3.10 minutes and standard deviation, $\sigma$, of 0.40 minute. If you select a random sample of 16 customers,
a. what is the probability that the mean time spent per customer is at least 3 minutes?
b. there is an 85% chance that the sample mean is less than how many minutes?
c. What assumption must you make in order to solve (a) and (b)?
d. If you select a random sample of 64 customers, there is an 85% chance that the sample mean is less than how many minutes?

7.25 The New York Times reported (L. J. Flynn, "Tax Surfing," The New York Times, March 25, 2002, p. C10) that the mean time to download the home page for the Internal Revenue Service (IRS), www.irs.gov, was 0.8 second. Suppose that the download time was normally distributed, with a standard deviation of 0.2 second. If you select a random sample of 30 download times,
a. what is the probability that the sample mean is less than 0.75 second?
b. what is the probability that the sample mean is between 0.70 and 0.90 second?
c. the probability is 80% that the sample mean is between what two values, symmetrically distributed around the population mean?
d. the probability is 90% that the sample mean is less than what value?

7.26 The article discussed in Problem 7.25 also reported that the mean download time for the H&R Block Web site, www.hrblock.com, was 2.5 seconds. Suppose that the download time for the H&R Block Web site was normally distributed, with a standard deviation of 0.5 second. If you select a random sample of 30 download times,
a. what is the probability that the sample mean is less than 2.75 seconds?
b. what is the probability that the sample mean is between 2.70 and 2.90 seconds?
c. the probability is 80% that the sample mean is between what two values, symmetrically distributed around the population mean?
d. the probability is 90% that the sample mean is less than what value?

7.5 SAMPLING DISTRIBUTION OF THE PROPORTION
Consider a categorical variable that has only two categories, such as the customer prefers your brand or the customer prefers the competitor's brand. Of interest is the proportion of items belonging to one of the categories; for example, the proportion of customers that prefers your brand. The population proportion, represented by $\pi$, is the proportion of items in the entire population with the characteristic of interest. The sample proportion, represented by p, is the proportion of items in the sample with the characteristic of interest. The sample proportion, a statistic, is used to estimate the population proportion, a parameter. To calculate the sample proportion, you assign the two possible outcomes scores of 1 or 0 to represent the presence or absence of the characteristic. You then sum all the 1 and 0 scores and divide by n, the sample size. For example, if, in a sample of five customers, three preferred your brand and two did not, you have three 1s and two 0s. Summing the three 1s and two 0s and dividing by the sample size of 5 gives you a sample proportion of 0.60.

SAMPLE PROPORTION

$$p = \frac{X}{n} = \frac{\text{Number of items having the characteristic of interest}}{\text{Sample size}} \tag{7.6}$$

The sample proportion, p, takes on values between 0 and 1. If all individuals possess the characteristic, you assign each a score of 1, and p is equal to 1. If half the individuals possess the characteristic, you assign half a score of 1 and assign the other half a score of 0, and p is equal to 0.5. If none of the individuals possesses the characteristic, you assign each a score of 0, and p is equal to 0.
While the sample mean, $\bar{X}$, is an unbiased estimator of the population mean, $\mu$, the statistic p is an unbiased estimator of the population proportion, $\pi$. By analogy to the sampling distribution of the mean, the standard error of the proportion, $\sigma_p$, is given in Equation (7.7).

STANDARD ERROR OF THE PROPORTION

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} \tag{7.7}$$

If you select all possible samples of a certain size, the distribution of all possible sample proportions is referred to as the sampling distribution of the proportion. The sampling distribution of the proportion follows the binomial distribution, as discussed in Section 5.3. However, you can use the normal distribution to approximate the binomial distribution when $n\pi$ and $n(1 - \pi)$ are each at least 5 (see Section 6.6 on the CD-ROM). In most cases in which inferences are made about the proportion, the sample size is substantial enough to meet the conditions for using the normal approximation (see reference 1). Therefore, in many instances, you can use the normal distribution to estimate the sampling distribution of the proportion.
Substituting p for $\bar{X}$, $\pi$ for $\mu$, and $\sqrt{\pi(1 - \pi)/n}$ for $\sigma/\sqrt{n}$ in Equation (7.4) on page 266 results in Equation (7.8).

FINDING Z FOR THE SAMPLING DISTRIBUTION OF THE PROPORTION

$$Z = \frac{p - \pi}{\sqrt{\frac{\pi(1 - \pi)}{n}}} \tag{7.8}$$

To illustrate the sampling distribution of the proportion, suppose that the manager of the local branch of a savings bank determines that 40% of all depositors have multiple accounts at the bank. If you select a random sample of 200 depositors, the probability that the sample proportion of depositors with multiple accounts is less than 0.30 is calculated as follows: Because $n\pi = 200(0.40) = 80 \geq 5$ and $n(1 - \pi) = 200(0.60) = 120 \geq 5$, the sample size is large enough to assume that the sampling distribution of the proportion is approximately normally distributed. Using Equation (7.8),

$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} = \frac{0.30 - 0.40}{\sqrt{\dfrac{(0.40)(0.60)}{200}}} = \frac{-0.10}{\sqrt{\dfrac{0.24}{200}}} = \frac{-0.10}{0.0346} = -2.89$$

Using Table E.2, the area under the normal curve less than -2.89 is 0.0019. Therefore, the probability that the sample proportion is less than 0.30 is 0.0019, a highly unlikely event. This means that if the true proportion of successes in the population is 0.40, less than one-fifth of 1% of the samples of n = 200 would be expected to have sample proportions of less than 0.30.
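The savings bank calculation can be reproduced in a few lines of Python. This is a minimal sketch, assuming the standard library's `statistics.NormalDist` as a stand-in for Table E.2; the variable names are illustrative.

```python
import math
from statistics import NormalDist

pi = 0.40   # population proportion of depositors with multiple accounts
n = 200     # sample size
p = 0.30    # sample proportion of interest

sigma_p = math.sqrt(pi * (1 - pi) / n)   # Equation (7.7)
z = (p - pi) / sigma_p                   # Equation (7.8)
prob = NormalDist().cdf(z)               # area under the standard normal curve below z

print(f"sigma_p = {sigma_p:.4f}, Z = {z:.2f}, P(p < 0.30) = {prob:.4f}")
```

Because the code keeps the unrounded Z rather than -2.89, the computed probability may differ slightly from the table value in the later decimal places.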

Learning the Basics

7.27 In a random sample of 64 people, 48 are classified as "successful."
a. Determine the sample proportion, p, of "successful" people.
b. If the population proportion is 0.70, determine the standard error of the proportion.

7.28 A random sample of 50 households was selected for a telephone survey. The key question asked was, "Do you or any member of your household own a cellular telephone with a built-in camera?" Of the 50 respondents, 15 said yes and 35 said no.
a. Determine the sample proportion, p, of households with cellular telephones with built-in cameras.
b. If the population proportion is 0.40, determine the standard error of the proportion.

7.29 The following data represent the responses (Y for yes and N for no) from a sample of 40 college students to the question "Do you currently own shares in any stocks?"

NNYNNYNYNYNNYNYYNNNY
NYNNNNYNNYYNNNYNNYNN

a. Determine the sample proportion, p, of college students who own shares of stock.
b. If the population proportion is 0.30, determine the standard error of the proportion.

Applying the Concepts

7.30 A political pollster is conducting an analysis of sample results in order to make predictions on election night. Assuming a two-candidate election, if a specific candidate receives at least 55% of the vote in the sample, then that candidate will be forecast as the winner of the election. If you select a random sample of 100 voters, what is the probability that a candidate will be forecast as the winner when
a. the true percentage of her vote is 50.1%?
b. the true percentage of her vote is 60%?
c. the true percentage of her vote is 49% (and she will actually lose the election)?
d. If the sample size is increased to 400, what are your answers to (a) through (c)? Discuss.

7.31 You plan to conduct a marketing experiment in which students are to taste one of two different brands of soft drink. Their task is to correctly identify the brand tasted. You select a random sample of 200 students and assume that the students have no ability to distinguish between the two brands. (Hint: If an individual has no ability to distinguish between the two soft drinks, then each brand is equally likely to be selected.)
a. What is the probability that the sample will have between 50% and 60% of the identifications correct?
b. The probability is 90% that the sample percentage is contained within what symmetrical limits of the population percentage?
c. What is the probability that the sample percentage of correct identifications is greater than 65%?
d. Which is more likely to occur: more than 60% correct identifications in the sample of 200 or more than [...] correct identifications in a sample of 1,000? Explain.

7.32 A study of women in corporate leadership was conducted by Catalyst, a New York research organization. The study concluded that slightly more than 1[...] of corporate officers at Fortune 500 companies are women (C. Hymowitz, "Women Put Noses to the Grindstone, and Miss Opportunities," The Wall Street Journal, February [...] 2004, p. B1). Suppose that you select a random sample of 200 corporate officers, and the true proportion held by women is 0.15.
a. What is the probability that in the sample, less than 1[...] of the corporate officers will be women?
b. What is the probability that in the sample, between 1[...] and 17% of the corporate officers will be women?
c. What is the probability that in the sample, between [...] and 20% of the corporate officers will be women?
d. If a sample of 100 is taken, how does this change your answers to (a) through (c)?

7.33 The NBC hit comedy Friends was TiVo's most popular show during the week of April 18-24, [...]. According to the Nielsen ratings, 29.7% of TiVo owners in the United States either recorded Friends or watched it live ("Prime-Time Nielsen Ratings," USA Today, [...] 28, 2004, p. 3D). Suppose you select a random sample of 50 TiVo owners.
a. What is the probability that more than half the people in the sample watched or recorded Friends?
b. What is the probability that less than 25% of the people in the sample watched or recorded Friends?
c. If a random sample of size 500 is taken, how does this change your answers to (a) and (b)?

7.34 According to Gallup's annual poll on [...] finances, while most U.S. workers reported living comfortably now, many expected a downturn in their lifestyle when they stop working. Approximately half said they had enough money to live comfortably now and expected to do so in the future (J. M. Jones, "Only Half of Non-Retirees Expect to be Comfortable in Retirement," The Gallup
[...] May 2, 2006). If you select a random sample of 200 U.S. workers,
a. what is the probability that the sample will have between 45% and 55% who say they have enough money to live comfortably now and expect to do so in the future?
b. the probability is 90% that the sample percentage will be contained within what symmetrical limits of the population percentage?
c. the probability is 95% that the sample percentage will be contained within what symmetrical limits of the population percentage?

7.35 According to the National Restaurant Association, [...] of fine-dining restaurants have instituted policies restricting the use of cell phones ("Business Bulletin," The Wall Street Journal, June 1, 2000, p. A1). If you select a random sample of 100 fine-dining restaurants,
a. what is the probability that the sample has between 15% and 25% that have established policies restricting cell phone use?
b. the probability is 90% that the sample percentage will be contained within what symmetrical limits of the population percentage?
c. the probability is 95% that the sample percentage will be contained within what symmetrical limits of the population percentage?
d. Suppose that in January 2007, you selected a random sample of 100 fine-dining restaurants and found that 31 had policies restricting the use of cell phones. Do you think that the population percentage has changed?

7.36 An article (P. Kitchen, "Retirement Plan: To Keep [...]ing," Newsday, September 24, 2003) discussed the retirement plans of Americans ages 50 to 70 who were employed full time or part time. Twenty-nine percent of the [...] said that they did not intend to work for pay at all. If you select a random sample of 400 Americans ages 50 to 70 employed full time or part time,
a. what is the probability that the sample has between 25% and 30% who do not intend to work for pay at all?
b. If a current sample of 400 Americans ages 50 to 70 employed full time or part time has 35% who do not intend to work for pay at all, what can you infer about the population estimate of 29%? Explain.
c. If a current sample of 100 Americans ages 50 to 70 employed full time or part time has 35% who do not intend to work for pay at all, what can you infer about the population estimate of 29%? Explain.
d. Explain the difference in the results in (b) and (c).

7.37 The IRS discontinued random audits in 1988. Instead, the IRS conducts audits on returns deemed questionable by its Discriminant Function System (DFS), a complicated and highly secretive computerized analysis system. In an attempt to reduce the proportion of "no-change" audits (that is, audits that uncover that no additional taxes are due), the IRS only audits returns the DFS scores as highly questionable. The proportion of no-change audits has risen over the years and is currently approximately 0.25 (T. Herman, "Unhappy Returns: IRS Moves to Bring Back Random Audits," The Wall Street Journal, June 20, 2002, p. A1). Suppose that you select a random sample of 100 audits. What is the probability that the sample has
a. between 24% and 26% no-change audits?
b. between 20% and 30% no-change audits?
c. more than 30% no-change audits?

7.38 Referring to Problem 7.37, the IRS announced that it planned to resume totally random audits in 2002. Suppose that you select a random sample of 200 totally random audits and that 90% of all the returns filed would result in no-change audits. What is the probability that the sample has
a. between 89% and 91% no-change audits?
b. between 85% and 95% no-change audits?
c. more than 95% no-change audits?

7.6 (CD-ROM Topic) SAMPLING FROM FINITE POPULATIONS

In this section, sampling without replacement from finite populations is considered. For further discussion, see section 7.6.pdf on the Student CD-ROM that accompanies this book.

In this chapter, you studied four common sampling methods: simple random, systematic, stratified, and cluster. You also studied the sampling distribution of the sample mean, the Central Limit Theorem, and the sampling distribution of the sample proportion. You learned that the sample mean is an unbiased estimator of the population mean, and the sample proportion is an unbiased estimator of the population proportion. By observing the mean weight in a sample of cereal boxes filled by Oxford Cereals, you were able to reach conclusions concerning the mean weight in the population of cereal boxes. In the next five chapters, the techniques of confidence intervals and tests of hypotheses commonly used for statistical inference are discussed.

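Population Mean

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} \qquad (7.1)$$

Population Standard Deviation

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \qquad (7.2)$$

Standard Error of the Mean

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \qquad (7.3)$$

Finding Z for the Sampling Distribution of the Mean

$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}}} \qquad (7.4)$$

Finding $\bar{X}$ for the Sampling Distribution of the Mean

$$\bar{X} = \mu + Z\frac{\sigma}{\sqrt{n}} \qquad (7.5)$$

Sample Proportion

$$p = \frac{X}{n} \qquad (7.6)$$

Standard Error of the Proportion

$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} \qquad (7.7)$$

Finding Z for the Sampling Distribution of the Proportion

$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} \qquad (7.8)$$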
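The equations for the sample mean translate directly into code. The sketch below is illustrative only: it uses the 368-gram mean and 15-gram standard deviation quoted for the Oxford Cereals filling process, while the sample size of 25, the sample mean of 365, and the function names are assumptions made for the example.

```python
import math
from statistics import NormalDist

def standard_error_of_mean(sigma, n):
    """Equation (7.3): sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def z_for_sample_mean(x_bar, mu, sigma, n):
    """Equation (7.4): (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / standard_error_of_mean(sigma, n)

def x_bar_for_z(z, mu, sigma, n):
    """Equation (7.5): mu + z * sigma / sqrt(n)."""
    return mu + z * standard_error_of_mean(sigma, n)

mu, sigma, n = 368, 15, 25   # n = 25 is an assumed sample size
z = z_for_sample_mean(365, mu, sigma, n)

print(standard_error_of_mean(sigma, n))   # 15 / sqrt(25)
print(z, NormalDist().cdf(z))             # Z and P(sample mean < 365)
print(x_bar_for_z(z, mu, sigma, n))       # recovers the sample mean from Z
```

Equations (7.4) and (7.5) are inverses of each other, which is why the last line returns the sample mean that was fed into the first.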

Central Limit Theorem 268
cluster 257
cluster sample 257
convenience sampling 253
coverage error 259
frame 252
judgment sample 253
measurement error 260
nonprobability sample 253
nonresponse bias 259
nonresponse error 259
probability sample 253
sampling distribution 262
sampling distribution of the mean 262
sampling distribution of the proportion 273
sampling error 260
sampling with replacement 253
sampling without replacement 254
selection bias 259
simple random sample 253
standard error of the mean 264
standard error of the proportion 273
strata 256
stratified sample 256
systematic sample 256
table of random numbers 254
unbiased 262

Checking Your Understanding

7.39 Why is the sample mean an unbiased estimator of the population mean?

7.40 Why does the standard error of the mean decrease as the sample size, n, increases?

7.41 Why does the sampling distribution of the mean follow a normal distribution for a large enough sample size, even though the population may not be normally distributed?

7.42 What is the difference between a probability distribution and a sampling distribution?

7.43 Under what circumstances does the sampling distribution of the proportion approximately follow the normal distribution?

7.44 What is the difference between probability and nonprobability sampling?

7.45 What are some potential problems with using "fishbowl" methods to select a simple random sample?

7.46 What is the difference between sampling with replacement versus without replacement?

7.47 What is the difference between a simple random sample and a systematic sample?

7.48 What is the difference between a simple random sample and a stratified sample?

7.49 What is the difference between a stratified sample and a cluster sample?

Applying the Concepts

7.50 An industrial sewing machine uses ball bearings that are targeted to have a diameter of 0.75 inch. The lower and upper specification limits under which the ball bearing can operate are 0.74 inch (lower) and 0.76 inch (upper). Past experience has indicated that the actual diameter of the ball bearings is approximately normally distributed, with a mean of 0.753 inch and a standard deviation of 0.004 inch. If you select a random sample of 25 ball bearings, what is the probability that the sample mean is
a. between the target and the population mean of 0.753?
b. between the lower specification limit and the target?
c. greater than the upper specification limit?
d. less than the lower specification limit?
e. The probability is 93% that the sample mean diameter will be greater than what value?

7.51 The fill amount of bottles of a soft drink is normally distributed, with a mean of 2.0 liters and a standard deviation of 0.05 liter. If you select a random sample of 25 bottles, what is the probability that the sample mean will be
a. between 1.99 and 2.0 liters?
b. below 1.98 liters?
c. greater than 2.01 liters?
d. The probability is 99% that the sample mean will contain at least how much soft drink?
e. The probability is 99% that the sample mean will contain an amount that is between which two values (symmetrically distributed around the mean)?

7.52 An orange juice producer buys all his oranges from a large orange grove that has one variety of orange. The amount of juice squeezed from each of these oranges is approximately normally distributed, with a mean of 4.70 ounces and a standard deviation of [...] ounce. Suppose that you select a sample of 25 oranges.
a. What is the probability that the sample mean will be at least 4.60 ounces?
b. The probability is 70% that the sample mean will be contained between what two values symmetrically distributed around the population mean?
c. The probability is 77% that the sample mean will be greater than what value?

7.53 In his management information systems textbook, Professor David Kroenke raises an interesting point: "If 98% of our market has Internet access, do we have a responsibility to provide non-Internet materials to that other 2%?" (D. M. Kroenke, Using MIS, Upper Saddle River, NJ: Prentice Hall, 2007, p. 29a.) Suppose that 98% of the customers in your market have Internet access and you select a random sample of 500 customers. What is the probability that the sample has
a. greater than 99% with Internet access?
b. between 97% and 99% with Internet access?
c. less than 97% with Internet access?

7.54 Mutual funds reported strong earnings in the first quarter of 2006. Especially strong growth occurred in mutual funds consisting of companies focusing on Latin America. This population of mutual funds earned a mean return of 15.9% in the first quarter (M. Skala, "Banking on the World," Christian Science Monitor, www.csmonitor.com, April 10, 2006). Assume that the returns for the Latin America mutual funds were distributed as a normal random variable, with a mean of 15.9 and a standard deviation of 20. If you selected a random sample of 10 funds from this population, what is the probability that the sample would have a mean return
a. less than 0 (that is, a loss)?
b. between 0 and 6?
c. greater than 10?

7.55 Mutual funds reported strong earnings in the first quarter of 2006. The population of mutual funds focusing on Europe had a mean return of 13.3% during this time. Assume that the returns for the Europe mutual funds were distributed as a normal random variable, with a mean of 13.3 and a standard deviation of 12. If you select an individual fund from this population, what is the probability that it would have a return
a. less than 0 (that is, a loss)?
b. between 0 and 6?
c. greater than 10?
If you selected a random sample of 10 funds from this population, what is the probability that the sample would have a mean return
d. less than 0 (that is, a loss)?
e. between 0 and 6?
f. greater than 10?
g. Compare your results in parts (d) through (f) to (a) through (c).
h. Compare your results in parts (d) through (f) to Problem 7.54 (a) through (c).

7.56 Political polling has traditionally used telephone interviews. Researchers at Harris Black International Ltd. have argued that Internet polling is less expensive, faster, and offers higher response rates than telephone surveys. Critics are concerned about the scientific reliability of this approach (The Wall Street Journal, April 13, 1999). Even

amid this strong criticism, Internet polling is becoming more and more common. What concerns, if any, do you have about Internet polling?

7.57 A survey sponsored by The American Dietetic Association and the agri-business giant ConAgra found that 53% of office workers take 30 minutes or less for lunch each day. Approximately 37% take 30 to 60 minutes, and 10% take more than an hour ("Snapshots," usatoday.com, April 26, 2006).
a. What additional information would you want to know before you accepted the results of the survey?
b. Discuss the four types of survey errors in the context of this survey.
c. One of the types of survey errors discussed in part (b) should have been measurement error. Explain how the root cause of measurement error in this survey could be the halo effect.

7.58 As part of a mediation process overseen by a federal judge to end a lawsuit that accuses Cincinnati, Ohio, of decades of discrimination against African Americans, surveys were done on how to improve Cincinnati police-community relations. One survey was sent to the 1,020 members of the Cincinnati police force. The survey included a cover in which the chief of police and president of the Fraternal Order of Police encouraged participation. Respondents could either return a hard copy of the survey or complete the survey online. To the researchers' dismay, only 158 surveys were completed ("Few Cops Fill Out Survey," The Cincinnati Enquirer, August 22, 2001, p. B3).
a. What type of errors or biases should the researchers be especially concerned with?
b. What step(s) should the researchers take to try to overcome the problems noted in (a)?
c. What could have been done differently to improve the survey's worthiness?

7.59 Connecticut shoppers spend more on women's clothing than do shoppers in any other state, according to a survey conducted by MapInfo. The mean spending per household in Connecticut was $975 annually ("Snapshots," usatoday.com, April 17, 2006).
a. What other information would you want to know before you accepted the results of this survey?
b. Suppose that you wished to conduct a similar survey for the geographic region you live in. Describe the population for your survey.
c. Explain how you could minimize the chance of coverage error in this type of survey.
d. Explain how you could minimize the chance of nonresponse error in this type of survey.
e. Explain how you could minimize the chance of sampling error in this type of survey.
f. Explain how you could minimize the chance of measurement error in this type of survey.

7.60 According to Dr. Sarah Beth Estes, sociology professor at the University of Cincinnati, and Dr. Jennifer Glass, sociology professor at the University of Iowa, working women who take advantage of family-friendly schedules can fall behind in wages. More specifically, the sociologists report that in a study of 300 working women who had children and returned to work and opted for flextime, telecommuting, and so on, these women had pay raises that averaged between 16% and 26% less than other workers ("Study: 'Face Time' Can Affect Moms' Raises," The Cincinnati Enquirer, August 28, 2001, p. A1).
a. What other information would you want to know before you accepted the results of this study?
b. If you were to perform a similar study in the geographic area where you live, define a population, frame, and sampling method you could use.

7.61 (Class Project) The table of random numbers is an example of a uniform distribution because each digit is equally likely to occur. Starting in the row corresponding to the day of the month in which you were born, use the table of random numbers (Table E.1) to take one digit at a time.
Select five different samples each of n = 2, n = 5, and n = 10. Compute the sample mean of each sample. Develop a frequency distribution of the sample means for the results of the entire class, based on samples of sizes n = 2, n = 5, and n = 10.
What can be said about the shape of the sampling distribution for each of these sample sizes?

7.62 (Class Project) Toss a coin 10 times and record the number of heads. If each student performs this experiment five times, a frequency distribution of the number of heads can be developed from the results of the entire class. Does this distribution seem to approximate the normal distribution?

7.63 (Class Project) The number of cars waiting in line at a car wash is distributed as follows:

Number of Cars    Probability
0                 0.25
1                 0.40
2                 0.20
3                 0.10
4                 0.04
5                 0.01

You can use the table of random numbers (Table E.1) to select samples from this distribution by assigning numbers as follows:
1. Start in the row corresponding to the day of the month in which you were born.
2. Select a two-digit random number.
3. If you select a random number from 00 to 24, record a length of 0; if from 25 to 64, record a length of 1; if from
65 to 84, record a length of 2; if from 85 to 94, record a length of 3; if from 95 to 98, record a length of 4; if 99, record a length of 5.
Select samples of n = 2, n = 5, and n = 10. Compute the mean for each sample. For example, if a sample of size 2 results in the random numbers 18 and 46, these would correspond to lengths of 0 and 1, respectively, producing a sample mean of 0.5. If each student selects five different samples for each sample size, a frequency distribution of the sample means (for each sample size) can be developed from the results of the entire class. What conclusions can you reach concerning the sampling distribution of the mean as the sample size is increased?

7.64 (Class Project) Using Table E.1, simulate the selection of different-colored balls from a bowl as follows:
1. Start in the row corresponding to the day of the month in which you were born.
2. Select one-digit numbers.
3. If a random digit between 0 and 6 is selected, consider the ball white; if a random digit is a 7, 8, or 9, consider the ball red.
Select samples of n = 10, n = 25, and n = 50 digits. In each sample, count the number of white balls and compute the proportion of white balls in the sample. If each student in the class selects five different samples for each sample size, a frequency distribution of the proportion of white balls (for each sample size) can be developed from the results of the entire class. What conclusions can you reach about the sampling distribution of the proportion as the sample size is increased?

7.65 (Class Project) Suppose that step 3 of Problem 7.64 uses the following rule: "If a random digit between 0 and 8 is selected, consider the ball to be white; if a random digit of 9 is selected, consider the ball to be red." Compare and contrast the results in this problem and those in Problem 7.64.
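The table-of-random-numbers procedure in Problem 7.63 can also be mimicked in software. The sketch below is one possible approach, not the textbook's method: it assumes Python's `random` module in place of Table E.1, and the function names, the seed, and the choice of 1,000 samples per sample size are all illustrative.

```python
import random
import statistics

# Upper two-digit cutoff -> number of cars, following the assignment in Problem 7.63.
CUTOFFS = [(24, 0), (64, 1), (84, 2), (94, 3), (98, 4), (99, 5)]

def cars_waiting(two_digit_number):
    """Map a random number from 00 to 99 to a number of cars."""
    if not 0 <= two_digit_number <= 99:
        raise ValueError("expected a number from 0 to 99")
    for upper, cars in CUTOFFS:
        if two_digit_number <= upper:
            return cars

def simulated_sample_means(n, samples, rng):
    """Draw `samples` samples of size n and return the list of sample means."""
    return [
        statistics.mean(cars_waiting(rng.randrange(100)) for _ in range(n))
        for _ in range(samples)
    ]

rng = random.Random(7)  # arbitrary seed so the run is repeatable
for n in (2, 5, 10):
    means = simulated_sample_means(n, samples=1000, rng=rng)
    # Center and spread of the simulated sampling distribution for this n.
    print(n, round(statistics.mean(means), 3), round(statistics.stdev(means), 3))
```

Plotting a histogram of `means` for each sample size is a natural next step for comparing the shapes of the three distributions.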

Managing the Springville Herald

Continuing its quality improvement effort first described in the Chapter 6 "Managing the Springville Herald" case, the production department of the newspaper has been monitoring the blackness of the newspaper print. As before, blackness is measured on a standard scale in which the target value is 1.0. Data collected over the past year indicate that the blackness is normally distributed, with a mean of 1.005 and a standard deviation of 0.10.

EXERCISE
SH7.1 Each day, 25 spots on the first newspaper printed are chosen, and the blackness of the spots is measured. Assuming that the distribution has not changed from what it was in the past year, what is the probability that the mean blackness of the spots is
a. less than 1.0?
b. between 0.95 and 1.0?
c. between 1.0 and 1.05?
d. less than 0.95 or greater than 1.05?
e. Suppose that the mean blackness of today's sample of 25 spots is 0.952. What conclusion can you make about the blackness of the newspaper based on this result? Explain.

Web Case

Apply your knowledge about sampling distributions in this Web Case, which reconsiders the Oxford Cereals Using Statistics scenario.

The advocacy group Consumers Concerned About Cereal Cheaters (CCACC) suspects that cereal companies, including Oxford Cereals, are cheating consumers by packaging cereals at less than labeled weights. Visit the organization's home page at www.prenhall.com/Springville/ConsumersConcerned.htm (or open the ConsumersConcerned.htm file in the text CD's Web Case folder), examine their claims and supporting data, and then answer the following:

1. Are the data collection procedures that the CCACC uses to form its conclusions flawed? What procedures could the group follow to make their analysis more rigorous?
2. Assume that the two samples of five cereal boxes (one sample for each of two cereal varieties) listed on the CCACC Web site were collected randomly by organization members. For each sample, do the following:
a. Calculate the sample mean.
b. Assume that the standard deviation of the process is 15 grams and a population mean of 368 grams. Calculate the percentage of all samples for each process that would have a sample mean less than the value you calculated in (a).
c. Again, assuming that the standard deviation is 15 grams, calculate the percentage of individual boxes of cereal that would have a weight less than the value you calculated in (a).
3. What, if any, conclusions can you form by using your calculations about the filling processes of the two different cereals?
4. A representative from Oxford Cereals has asked that the CCACC take down its page discussing shortages in Oxford Cereals boxes. Is that request reasonable? Why or why not?
5. Can the techniques discussed in this chapter be used to prove cheating in the manner alleged by the CCACC? Why or why not?

1. Cochran, W. G., Sampling Techniques, 3rd ed. (New York: Wiley, 1977).
2. Crossen, C., "Deja Vu: Fiasco in 1936 Survey Brought Science to Election Polling," The Wall Street Journal, October 2, 2006, B1.
3. Gallup, G. H., The Sophisticated Poll-Watchers Guide (Princeton, NJ: Princeton Opinion Press, 1972).
4. Goleman, D., "Pollsters Enlist Psychologists in Quest for Unbiased Results," The New York Times, September 7, 1993, pp. C1, C11.
5. Levine, D. M., P. Ramsey, and R. Smidt, Applied Statistics for Engineers and Scientists Using Microsoft Excel and Minitab (Upper Saddle River, NJ: Prentice Hall, 2001).
6. Microsoft Excel 2007 (Redmond, WA: Microsoft Corp., 2007).
7. Mosteller, F., et al., The Pre-Election Polls of 1948 (New York: Social Science Research Council, 1949).
8. Rand Corporation, A Million Random Digits with 100,000 Normal Deviates (New York: The Free Press, 1955).
E7.1 CREATING SIMPLE RANDOM SAMPLES (WITHOUT REPLACEMENT)

You create simple random samples (without replacement) by using the PHStat2 Random Sample Generation procedure. (There are no basic Excel commands or features to create a simple random sample.)

Open to the worksheet that contains the data to be sampled and select PHStat → Sampling → Random Sample Generation. In the Random Sample Generation dialog box (shown below), enter the Sample Size and click Select values from range. Enter the cell range of the data to be sampled as the Values Cell Range, click First cell contains label, and click OK. A new simple random sample appears on a new worksheet.

[Dialog box: Random Sample Generation]

E7.2 CREATING SIMULATED SAMPLING DISTRIBUTIONS

You create simulated sampling distributions by first using the ToolPak Random Number Generation procedure to create a worksheet of all the random samples. Then you add formulas to compute the sample means and other appropriate measures for each sample. You can also use the PHStat2 Sampling Distributions Simulation procedure, which does both of these tasks for you and optionally creates a histogram.

Using PHStat2 Sampling Distributions Simulation

Select PHStat → Sampling → Sampling Distributions Simulation. In the Sampling Distributions Simulation dialog box (shown below), enter values for the Number of Samples and the Sample Size. Click one of the distribution options and then enter a title as the Title and click OK. To create a histogram of the sample means, click Histogram before clicking OK.

[Dialog box: Sampling Distributions Simulation]

If you want to use the Discrete option, first open to a worksheet that contains a table of X and P(X) values and then select this procedure. Then select Discrete and enter that table range as the X and P(X) Values Cell Range.

Using ToolPak Random Number Generation

Select Tools → Data Analysis. From the list that appears in the Data Analysis dialog box, select Random Number Generation and click OK. In the Random Number Generation dialog box (shown below), enter the number of samples as the Number of Variables and enter the sample size of each sample as the Number of Random Numbers. Select the type of distribution from the Distribution drop-down list and make entries in the Parameters area, as necessary. (The contents of this area vary according to the distribution chosen.) Click New Worksheet Ply and then click OK.

[Dialog box: Random Number Generation]

To create a histogram from the set of sample means for your simulation, enter a formula that uses the AVERAGE function in a row below the cell range that contains the samples created by the procedure. Then use the techniques for creating frequency distributions and histograms discussed in the Excel Companion to Chapter 2 to create your histogram.

EXAMPLE 100 Samples of Sample Size 30 from a Uniformly Distributed Population

Basic Excel Select Tools → Data Analysis. From the list that appears in the Data Analysis dialog box, select Random Number Generation and click OK. In the Random Number Generation dialog box, enter 100 as the Number of Variables and enter 30 as the Number of Random Numbers. Select Uniform from the Distribution drop-down list, click New Worksheet Ply, and then click OK.

PHStat2 Select PHStat → Sampling → Sampling Distributions Simulation. In the procedure dialog box, enter 100 as the Number of Samples and 30 as the Sample Size. Click Uniform and then enter a title as the Title and click OK.
