Chapter 7

Sampling and Sampling Distributions


Simple Random Sampling
Point Estimation
Introduction to Sampling Distributions
The Law of Large Numbers
The Central Limit Theorem
Sampling Distribution of $\bar{x}$
Sampling Distribution of $\bar{p}$

1
Statistical Inference

The purpose of statistical inference is to obtain
information about a population from information
contained in a sample.

A population is the set of all the elements of interest.

A sample is a subset of the population.

2
Statistical Inference

The sample results provide only estimates of the
values of the population characteristics.

With proper sampling methods, the sample results
can provide good estimates of the population
characteristics.

A parameter is a numerical characteristic of a
population.

3
Simple Random Sampling:
Finite Population

Finite populations are often defined by lists
such as:
Organization membership roster
Credit card account numbers
Inventory product numbers

A simple random sample of size n from a finite
population of size N is a sample selected such that
each possible sample of size n has the same
probability of being selected.

4
Simple Random Sampling:
Finite Population

Replacing each sampled element before selecting
subsequent elements is called sampling with
replacement.

Sampling without replacement is the procedure
used most often.

In large sampling projects, computer-generated
random numbers are often used to automate the
sample selection process.
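
The difference between the two procedures can be sketched in Python. This is a minimal illustration, not part of the original slides: the population of 900 numbered IDs, the seed, and the variable names are assumptions chosen for the example.

```python
import random

# Hypothetical finite population: IDs 1..900 (N = 900).
population = list(range(1, 901))

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Sampling without replacement: each ID can appear at most once.
sample_without = rng.sample(population, k=30)

# Sampling with replacement: the same ID may be drawn more than once.
sample_with = rng.choices(population, k=30)

print(sorted(sample_without))
print(sorted(sample_with))
```

`random.sample` never returns the same element twice, which matches sampling without replacement; `random.choices` draws each element independently, which matches sampling with replacement.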

5
Simple Random Sampling:
Infinite Population

Infinite populations are often defined by an
ongoing process whereby the elements of the
population consist of items generated as
though the process would operate indefinitely.

A simple random sample from an infinite population
is a sample selected such that the following conditions
are satisfied:
Each element selected comes from the same
population.
Each element is selected independently.

6
Simple Random Sampling:
Infinite Population

In the case of infinite populations, it is impossible to
obtain a list of all elements in the population.

The random number selection procedure cannot be
used for infinite populations.

7
Point Estimation

In point estimation we use the data from the sample
to compute a value of a sample statistic that serves
as an estimate of a population parameter.

We refer to $\bar{x}$ as the point estimator of the population
mean $\mu$.

The sample standard deviation $s$ is the
point estimator of the population standard deviation $\sigma$.

The sample proportion $\bar{p}$ is the point estimator of the
population proportion $p$.

Notation: $\bar{p}$ (also commonly written $\hat{p}$)
8
Sampling Error
When the expected value of a point estimator is equal
to the population parameter, the point estimator is said
to be unbiased.
The absolute value of the difference between an
unbiased point estimate and the corresponding
population parameter is called the sampling error.
Sampling error is the result of using a subset of the
population (the sample), and not the entire
population.
Statistical methods can be used to make probability
statements about the size of the sampling error.

9
Sampling Error
The sampling errors are:
$|\bar{x} - \mu|$ for the sample mean
$|s - \sigma|$ for the sample standard deviation
$|\bar{p} - p|$ for the sample proportion

10
Example: Victoria University Toronto
Victoria University receives 900 applications annually from
prospective students. The application form contains
a variety of information, including the individual's
scholastic aptitude test (SAT) score and whether or not
the individual desires on-campus housing.

11
Example: Victoria University
The director of admissions would like to know the
following information:
the average SAT score for the 900 applicants, and
the proportion of applicants that want to live on campus.

12
Example: Victoria University
We will now look at two alternatives for obtaining the
desired information:
Conducting a census of all 900 applicants
Selecting a sample of 30 applicants, using Excel

13
Conducting a Census
If the relevant data for all 900 applicants were
in the university's database, the population parameters
of interest could be calculated using the formulas
presented in the Descriptive Numbers chapter.
We will assume for the moment that conducting a
census is practical in this example.

14
Conducting a Census
Population Mean SAT Score
$\mu = \frac{\sum x_i}{900} = 990$

Population Standard Deviation for SAT Score
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 80$

Population Proportion Wanting On-Campus Housing
$p = \frac{648}{900} = .72$

15
Simple Random Sampling
Now suppose that the necessary data on the
current year's applicants were not yet entered in the
university's database.
Furthermore, the Director of Admissions must obtain
estimates of the population parameters of interest for
a meeting taking place in a few hours.
She decides a sample of 30 applicants will be used.
The applicants were numbered, from 1 to 900, as
their applications arrived.

16
Simple Random Sampling:
Using a Random Number Table

Taking a Sample of 30 Applicants
Because the finite population has 900 elements, we
will need 3-digit random numbers to randomly
select applicants numbered from 1 to 900.
We will use the last three digits of the 5-digit
random numbers in the third column of the
textbook's random number table, and continue
into the fourth column as needed.

17
Simple Random Sampling:
Using a Random Number Table
Taking a Sample of 30 Applicants
The numbers we draw will be the numbers of the
applicants we will sample unless
the random number is greater than 900 or
the random number has already been used.
We will continue to draw random numbers until
we have selected 30 applicants for our sample.
(We will go through all of column 3 and part of
column 4 of the random number table, encountering
in the process five numbers greater than 900 and
one duplicate, 835.)
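
The selection rule above (skip numbers over 900 and duplicates, and keep drawing until 30 applicants are selected) can be sketched as follows. The function name and seed are hypothetical, and `rng.randint(0, 999)` stands in for reading three digits from the printed table; treating 000 as invalid is an assumption, since applicants are numbered from 1.

```python
import random

def draw_sample_ids(population_size, sample_size, rng):
    """Draw 3-digit numbers until sample_size distinct valid IDs are collected."""
    selected = []
    skipped = 0
    while len(selected) < sample_size:
        number = rng.randint(0, 999)  # stand-in for three digits read from the table
        out_of_range = number == 0 or number > population_size
        if out_of_range or number in selected:
            skipped += 1              # number exceeds 900, is 000, or was already used
            continue
        selected.append(number)
    return selected, skipped

ids, skipped = draw_sample_ids(900, 30, random.Random(7))
print(ids)
print("numbers skipped:", skipped)
```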

18
Simple Random Sampling:
Using a Random Number Table
Use of Random Numbers for Sampling

3-Digit Random Number    Applicant Included in Sample
744                      No. 744
436                      No. 436
865                      No. 865
790                      No. 790
835                      No. 835
902                      Number exceeds 900
190                      No. 190
836                      No. 836
. . . and so on
19
Simple Random Sampling:
Using a Random Number Table

Sample Data

No.   Random Number   Applicant        SAT Score   Live On-Campus
1     744             Conrad Harris    1025        Yes
2     436             Enrique Romero   950         Yes
3     865             Fabian Avante    1090        No
4     790             Lucila Cruz      1120        Yes
5     835             Chan Chiang      930         No
.     .               .                .           .
.     .               .                .           .
30    498             Emily Morse      1010        No

20
Simple Random Sampling:
Using a Computer
Taking a Sample of 30 Applicants
Computers can be used to generate random
numbers for selecting random samples.
For example, Excel's function
= RANDBETWEEN(1,900)
can be used to generate random integers between
1 and 900, inclusive.
Then we choose the 30 applicants corresponding
to 30 distinct generated random numbers as our sample.

21
Point Estimation
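$\bar{x}$ as Point Estimator of $\mu$
$\bar{x} = \frac{\sum x_i}{n} = \frac{29{,}910}{30} = 997$

$s$ as Point Estimator of $\sigma$
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{163{,}996}{29}} = 75.2$

$\bar{p}$ as Point Estimator of $p$
$\bar{p} = 20/30 \approx .67$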

Note: Different random numbers would have
identified a different sample, which would have
resulted in different point estimates.
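
The three point estimates can be checked directly from the totals quoted on the slide. The sketch below uses only those totals; the variable names are chosen for illustration.

```python
import math

n = 30
sum_x = 29_910        # sum of the 30 sampled SAT scores (from the slide)
sum_sq_dev = 163_996  # sum of squared deviations from the sample mean (from the slide)
yes_count = 20        # sampled applicants wanting on-campus housing

x_bar = sum_x / n                    # sample mean
s = math.sqrt(sum_sq_dev / (n - 1))  # sample standard deviation, divisor n - 1
p_bar = yes_count / n                # sample proportion

print(f"x_bar = {x_bar:.0f}, s = {s:.1f}, p_bar = {p_bar:.3f}")
```

This prints a sample mean of 997, a sample standard deviation of 75.2, and a sample proportion of 0.667 (20/30).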
22
Summary of Point Estimates
Obtained from a Simple Random Sample

Population Parameter                                      Parameter Value   Point Estimator                                         Point Estimate
$\mu$ = Population mean SAT score                         990               $\bar{x}$ = Sample mean SAT score                       997
$\sigma$ = Population std. deviation for SAT score        80                $s$ = Sample std. deviation for SAT score               75.2
$p$ = Population proportion wanting campus housing        .72               $\bar{p}$ = Sample proportion wanting campus housing    .67

23
Sampling Distribution of $\bar{x}$
Process of Statistical Inference

A population has mean $\mu = ?$
A simple random sample of $n$ elements is selected from the population.
The sample data provide a value for the sample mean $\bar{x}$.
The value of $\bar{x}$ is used to make inferences about the value of $\mu$.

24
Law of Large Numbers
As the number of randomly drawn observations in a
sample increases, the sample mean $\bar{x}$ tends to get closer
and closer to the population mean $\mu$.

This is the law of large numbers. It is valid for any
population with a finite mean.
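
A small simulation illustrates the idea. It is a sketch under assumed conditions: the population is a fair six-sided die (mean 3.5), and the seed and checkpoints are arbitrary.

```python
import random

rng = random.Random(0)
population_mean = 3.5  # mean of a fair six-sided die (assumed population)

checkpoints = {10, 100, 1_000, 10_000, 100_000}
running_total = 0
means = {}
for n in range(1, 100_001):
    running_total += rng.randint(1, 6)  # one more random observation
    if n in checkpoints:
        means[n] = running_total / n    # sample mean after n observations

for n in sorted(means):
    print(f"n = {n:>6}: sample mean = {means[n]:.4f}")
```

For small n the sample mean can sit noticeably away from 3.5; only for large n does it settle close to the population mean.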

25
Law of Large Numbers (cont'd)
Note: We often intuitively expect predictability over a
few random observations, but that expectation is wrong. The law of
large numbers applies only to really large numbers.
[Image: Settlers of Catan]

26
What is a Sampling Distribution?
The sampling distribution of a statistic is the
distribution of all possible values taken by the
statistic when all possible samples of a fixed size n
are taken from the population. It is a theoretical
idea; we do not actually build it.

The sampling distribution of a statistic is the
probability distribution of that statistic.
27
Sampling Distribution of Sample Mean
We take many random samples of a given size n
from a population with mean $\mu$ and standard
deviation $\sigma$.

Some sample means will be above the population
mean $\mu$ and some will be below, making up the
sampling distribution.

28
Sampling distribution of sample mean

[Figure: histogram of some sample averages, illustrating the sampling distribution of $\bar{x}$]

29
For any population with mean $\mu$ and standard deviation $\sigma$:

The mean of the sampling distribution of $\bar{x}$ is equal to the
population mean $\mu$.

The standard deviation of the sampling distribution is
$\sigma/\sqrt{n}$, where $n$ is the sample size.

[Figure: sampling distribution of $\bar{x}$, with standard deviation $\sigma/\sqrt{n}$]

30
Sampling Distribution of $\bar{x}$
The sampling distribution of $\bar{x}$ is the probability
distribution of all possible values of the sample
mean $\bar{x}$.
Expected Value of $\bar{x}$

$E(\bar{x}) = \mu$

where:
$\mu$ = the population mean

31
Mean of sample mean
Mean of a sampling distribution of $\bar{x}$
There is no tendency for a sample mean to fall
systematically above or below $\mu$, even if the distribution
of the raw data is skewed. Thus, the sample mean
is an unbiased estimate of the
population mean: it will be correct on average in
many samples.

32
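Sampling Distribution of $\bar{x}$
Standard Deviation of $\bar{x}$

Finite Population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
Infinite Population: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

A finite population is treated as being
infinite if $n/N < .05$.
$\sqrt{(N - n)/(N - 1)}$ is the finite population correction factor.

$\sigma_{\bar{x}}$ is referred to as the
standard error of the mean.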
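
The two formulas and the n/N rule can be combined in one small helper. This is a sketch; the function name and the optional `N` argument are illustrative choices, not something the slides define.

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    The finite population correction is applied only when a population
    size N is given and n/N is at least .05, following the rule above.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N >= 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

# sigma = 80 and N = 900 come from the Victoria University example.
print(round(standard_error_mean(80, 30, N=900), 2))   # n/N < .05, no correction
print(round(standard_error_mean(80, 100, N=900), 2))  # n/N > .05, correction applied
```

With n = 30 the sample is under 5% of the population, so the result is simply 80/√30, about 14.6. With n = 100 the correction factor shrinks the standard error from 8.0 to about 7.55.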

33
Standard deviation of sample mean
Standard deviation of a sampling distribution of $\bar{x}$

The standard deviation of the sampling distribution is
smaller than the standard deviation of the population by
a factor of $\sqrt{n}$. Averages are less variable than
individual observations. Also, the results of large
samples are less variable than the results of small
samples.

34
Form of the Sampling Distribution of $\bar{x}$
When the population has a normal distribution, the
sampling distribution of $\bar{x}$ is normally distributed
for any sample size.

In most applications, the sampling distribution of $\bar{x}$
can be approximated by a normal distribution
whenever the sample size is 30 or more.

In cases where the population is highly skewed or
outliers are present, samples of size 50 may be
needed.

35
For Normally Distributed Populations
When a variable in a population is normally
distributed, the sampling distribution of the sample
mean for all possible samples of size n is also
normally distributed.

If the population is $N(\mu, \sigma)$,

then the sample mean's distribution is $N(\mu, \sigma/\sqrt{n})$.

36
For Normally Distributed Populations

[Figure: population distribution compared with the sampling distribution of $\bar{x}$]

37
The Central Limit Theorem

Central Limit Theorem:

When randomly sampling from any population with
mean $\mu$ and standard deviation $\sigma$,
when $n$ is large enough,
the sampling distribution of $\bar{x}$ is approximately
normal: $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$.
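
The theorem can be illustrated by simulation. The sketch below assumes a strongly right-skewed exponential population with mean 1 and standard deviation 1; the sample size, number of repetitions, and seed are arbitrary choices for the example.

```python
import random
import statistics

rng = random.Random(1)
n = 30               # size of each sample
num_samples = 5_000  # number of repeated samples (arbitrary)

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("std of sample means: ", round(statistics.stdev(sample_means), 3))
print("sigma / sqrt(n):     ", round(1 / n ** 0.5, 3))
```

The sample means centre near the population mean of 1, and their spread is close to σ/√n ≈ 0.183, even though the population itself is far from normal.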

38
The Central Limit Theorem

[Figure: four panels showing a population with a strongly skewed distribution and the sampling distributions of $\bar{x}$ for n = 2, n = 10, and n = 25 observations]

39
IQ scores: Population vs. Sample
In a large population of adults, the mean IQ is 112 with standard
deviation 20. Suppose 200 adults are randomly selected for a
market research campaign.
The distribution of the sample mean IQ is:
A) Exactly normal, mean 112, standard deviation 20
B) Approximately normal, mean 112, standard deviation 20
C) Approximately normal, mean 112, standard deviation 1.414
D) Approximately normal, mean 112, standard deviation 0.1
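
One way to check the options is to compute the standard error σ/√n for the values given in the question. This is a quick arithmetic sketch; the variable names are illustrative.

```python
import math

sigma = 20  # population standard deviation of IQ (from the question)
n = 200     # number of adults sampled

standard_error = sigma / math.sqrt(n)
print(round(standard_error, 3))  # 1.414
```

Because n = 200 is large, the central limit theorem makes the sampling distribution approximately normal, centred at 112 with this standard error.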

40
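Sampling Distribution of $\bar{x}$ for SAT Scores

$E(\bar{x}) = 990$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{30}} = 14.6$

[Figure: sampling distribution of $\bar{x}$, centered at $E(\bar{x}) = 990$]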

41
Sampling Distribution of $\bar{x}$ for SAT Scores

What is the probability that a
simple random sample of 30 applicants
will provide an estimate of the
population mean SAT score that is within +/-10 of
the actual population mean $\mu$?
In other words, what is the probability that $\bar{x}$ will be
between 980 and 1000?

42
Sampling Distribution of $\bar{x}$ for SAT Scores

Step 1: Calculate the z-value at the
upper endpoint of the interval.
$z = (1000 - 990)/14.6 = .68$
Step 2: Find the area under the curve to the left of the
upper endpoint.
$P(z \le .68) = .7517$

Note: Make sure to standardize (z) using the
standard deviation for the sampling distribution!
43
Sampling Distribution of $\bar{x}$ for SAT Scores

Cumulative Probabilities for
the Standard Normal Distribution
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
. . . . . . . . . . .
.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
. . . . . . . . . . .

44
Sampling Distribution of $\bar{x}$ for SAT Scores

[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 14.6$; the area to the left of $\bar{x} = 1000$ is .7517 (mean at 990)]

45
Sampling Distribution of $\bar{x}$ for SAT Scores

Step 3: Calculate the z-value at the
lower endpoint of the interval.
$z = (980 - 990)/14.6 = -.68$
Step 4: Find the area under the curve to the left of the
lower endpoint.
$P(z \le -.68) = .2483$

Note: Make sure to standardize (z) using the
standard deviation for the sampling distribution!
46
Sampling Distribution of $\bar{x}$ for SAT Scores

[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 14.6$; the area to the left of $\bar{x} = 980$ is .2483 (mean at 990)]

47
Sampling Distribution of $\bar{x}$ for SAT Scores

Step 5: Calculate the area under the curve
between the lower and upper endpoints
of the interval.
$P(-.68 \le z \le .68) = P(z \le .68) - P(z \le -.68)$
$= .7517 - .2483$
$= .5034$
The probability that the sample mean SAT score will
be between 980 and 1000 is:

$P(980 \le \bar{x} \le 1000) = .5034$
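
The five steps can be reproduced with Python's standard library instead of a printed table. This is a sketch using `statistics.NormalDist`; the variable names are illustrative, and the second calculation rounds z to two decimals only to mimic the table lookup.

```python
import math
from statistics import NormalDist

mu, sigma, n = 990, 80, 30
standard_error = sigma / math.sqrt(n)  # about 14.6

# Direct calculation on the sampling distribution of the sample mean.
sampling_dist = NormalDist(mu, standard_error)
prob_exact = sampling_dist.cdf(1000) - sampling_dist.cdf(980)

# Table-style calculation: round z to two decimals first, as in Steps 1 to 5.
z = round((1000 - mu) / standard_error, 2)
prob_table = NormalDist().cdf(z) - NormalDist().cdf(-z)

print(round(prob_exact, 4), round(prob_table, 4))
```

The unrounded calculation gives about .506, slightly above the table-based value of roughly .503; the gap comes only from rounding z to .68 before the lookup.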

48
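Sampling Distribution of $\bar{x}$ for SAT Scores

[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 14.6$; the area between $\bar{x} = 980$ and $\bar{x} = 1000$ is .5034 (mean at 990)]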

49
Relationship Between the Sample Size
and the Sampling Distribution of $\bar{x}$
Suppose we select a simple random sample of
100 applicants instead of the 30 originally considered.
$E(\bar{x}) = \mu$ regardless of the sample size. In our
example, $E(\bar{x})$ remains at 990.
Whenever the sample size is increased, the standard
error of the mean $\sigma_{\bar{x}}$ is decreased. With the increase
in the sample size to n = 100, the standard error of the
mean is decreased to:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{100}} = 8.0$

Note: Strictly speaking, the finite population correction should be used here,
because n/N = 100/900 exceeds .05. However, this does not affect the key result.

50
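Relationship Between the Sample Size
and the Sampling Distribution of $\bar{x}$

[Figure: two sampling distributions of $\bar{x}$ centered at $E(\bar{x}) = 990$: with n = 100, $\sigma_{\bar{x}} = 8$; with n = 30, $\sigma_{\bar{x}} = 14.6$]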

51
Relationship Between the Sample Size
and the Sampling Distribution of $\bar{x}$
Recall that when n = 30,
$P(980 \le \bar{x} \le 1000) = .5034$.
We follow the same steps to solve for $P(980 \le \bar{x} \le 1000)$
when n = 100 as we showed earlier when n = 30.
Now, with n = 100, $P(980 \le \bar{x} \le 1000) = .7888$.
Because the sampling distribution with n = 100 has a
smaller standard error, the values of $\bar{x}$ have less
variability and tend to be closer to the population
mean than the values of $\bar{x}$ with n = 30.
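
The effect of the sample size can be seen by repeating the calculation for several values of n. This is a sketch: the helper name `prob_within` is hypothetical, the finite population correction is ignored (as on the slides), and n = 400 is added only for illustration.

```python
import math
from statistics import NormalDist

def prob_within(mu, sigma, n, margin):
    """P(mu - margin <= sample mean <= mu + margin) under the normal approximation."""
    dist = NormalDist(mu, sigma / math.sqrt(n))
    return dist.cdf(mu + margin) - dist.cdf(mu - margin)

for n in (30, 100, 400):
    print(f"n = {n:>3}: P(980 <= x_bar <= 1000) = {prob_within(990, 80, n, 10):.4f}")
```

The probability rises with n because the standard error 80/√n shrinks, concentrating the sampling distribution around 990.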

52
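Relationship Between the Sample Size
and the Sampling Distribution of $\bar{x}$

[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 8$; the area between $\bar{x} = 980$ and $\bar{x} = 1000$ is .7888 (mean at 990)]
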
53
Sampling with Categorical Variables

Many studies collect data on categorical
variables, such as race or occupation of a
person, the make of a car, etc.

The parameters of interest in these settings are
population proportions.

The statistic used to estimate a population
proportion is the sample proportion.

54
Sampling Distribution of $\bar{p}$
Making Inferences about a Population Proportion

A population has proportion $p = ?$
A simple random sample of $n$ elements is selected from the population.
The sample data provide a value for the sample proportion $\bar{p}$.
The value of $\bar{p}$ is used to make inferences about the value of $p$.

55
Sampling Distribution of $\bar{p}$
The sampling distribution of $\bar{p}$ is the probability
distribution of all possible values of the sample
proportion $\bar{p}$.

Expected Value of $\bar{p}$

$E(\bar{p}) = p$

where:
$p$ = the population proportion

56
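Sampling Distribution of $\bar{p}$
Standard Deviation of $\bar{p}$

Finite Population: $\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} \sqrt{\frac{N - n}{N - 1}}$
Infinite Population: $\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}}$

$\sigma_{\bar{p}}$ is referred to as the
standard error of the proportion.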
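
Both formulas can be written as one helper. This is a sketch: the function name and the optional `N` argument are illustrative, and the example values p = .72, n = 30, N = 900 are taken from the Victoria University example.

```python
import math

def standard_error_proportion(p, n, N=None):
    """Standard error of the sample proportion.

    If a population size N is supplied, the finite population
    correction factor sqrt((N - n) / (N - 1)) is applied.
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

print(round(standard_error_proportion(0.72, 30), 3))         # infinite-population formula
print(round(standard_error_proportion(0.72, 30, N=900), 3))  # with the correction factor
```

For n = 30 out of N = 900 the correction changes the result only slightly (about .082 versus about .081), which is why small sampling fractions are usually treated as the infinite-population case.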

57
Sampling Distribution of $\bar{p}$
The sampling distribution of a sample proportion $\bar{p} = X/n$, where $X$ is the
number of sample elements with the characteristic of interest, is
approximately normal (normal approximation of a binomial
distribution) when the sample size is large enough.

58
Form of the Sampling Distribution of $\bar{p}$
The sampling distribution of $\bar{p}$ can be approximated
by a normal distribution whenever the sample size
is large.

The sample size is considered large whenever these
conditions are satisfied:

$np \ge 5$ and $n(1 - p) \ge 5$
Sampling Distribution of $\bar{p}$
Example: Victoria University
Recall that 72% of the
prospective students applying
to Victoria University desire
on-campus housing.
What is the probability that
a simple random sample of 30 applicants will provide
an estimate of the population proportion of applicants
desiring on-campus housing that is within plus or
minus .05 of the actual population proportion?
60
Sampling Distribution of $\bar{p}$
For our example, with n = 30 and p = .72,
the normal distribution is an acceptable
approximation because:
$np = 30(.72) = 21.6 \ge 5$
and
$n(1 - p) = 30(.28) = 8.4 \ge 5$
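
The large-sample check is easy to automate. This is a sketch; the function name and the `threshold` parameter are illustrative, with the default of 5 taken from the conditions above.

```python
def normal_approx_ok(n, p, threshold=5):
    """Return True when both n*p and n*(1 - p) are at least the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(normal_approx_ok(30, 0.72))  # 21.6 and 8.4 both pass
print(normal_approx_ok(30, 0.90))  # n(1 - p) is about 3, so the check fails
```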

61
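Sampling Distribution of $\bar{p}$

$E(\bar{p}) = .72$
$\sigma_{\bar{p}} = \sqrt{\frac{.72(1 - .72)}{30}} = .082$

[Figure: sampling distribution of $\bar{p}$, centered at $E(\bar{p}) = .72$]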

62
Sampling Distribution of $\bar{p}$
Step 1: Calculate the z-value at the upper endpoint of
the interval.
$z = (.77 - .72)/.082 = .61$
Step 2: Find the area under the curve to the left of the
upper endpoint.
$P(z \le .61) = .7291$

63
Sampling Distribution of $\bar{p}$
Cumulative Probabilities for
the Standard Normal Distribution
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
. . . . . . . . . . .
.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
. . . . . . . . . . .

64
Sampling Distribution of $\bar{p}$

[Figure: sampling distribution of $\bar{p}$ with $\sigma_{\bar{p}} = .082$; the area to the left of $\bar{p} = .77$ is .7291 (mean at .72)]

65
Sampling Distribution of $\bar{p}$
Step 3: Calculate the z-value at the lower endpoint of
the interval.
$z = (.67 - .72)/.082 = -.61$
Step 4: Find the area under the curve to the left of the
lower endpoint.
$P(z \le -.61) = .2709$

66
Sampling Distribution of $\bar{p}$

[Figure: sampling distribution of $\bar{p}$ with $\sigma_{\bar{p}} = .082$; the area to the left of $\bar{p} = .67$ is .2709 (mean at .72)]

67
Sampling Distribution of $\bar{p}$
Step 5: Calculate the area under the curve between
the lower and upper endpoints of the interval.
$P(-.61 \le z \le .61) = P(z \le .61) - P(z \le -.61)$
$= .7291 - .2709$
$= .4582$
The probability that the sample proportion of applicants
wanting on-campus housing will be within +/-.05 of the
actual population proportion is:

$P(.67 \le \bar{p} \le .77) = .4582$
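
The same result can be obtained without a table. This is a sketch using `statistics.NormalDist` under the normal approximation described above; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n, margin = 0.72, 30, 0.05
standard_error = math.sqrt(p * (1 - p) / n)  # about .082

# Normal approximation to the sampling distribution of the sample proportion.
sampling_dist = NormalDist(p, standard_error)
prob = sampling_dist.cdf(p + margin) - sampling_dist.cdf(p - margin)

print(round(prob, 4))
```

The unrounded calculation gives about .458, in line with the table-based .4582; the small difference comes from rounding the standard error and z before the lookup.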

68
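Sampling Distribution of $\bar{p}$

[Figure: sampling distribution of $\bar{p}$ with $\sigma_{\bar{p}} = .082$; the area between $\bar{p} = .67$ and $\bar{p} = .77$ is .4582 (mean at .72)]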

69
Readings
Textbook:
Chapter 7

70
Random Numbers

71
