Sampling Distributions

Chapter 7
Sampling and Sampling Distributions

Simple Random Sampling
Point Estimation
Introduction to Sampling Distributions
The Law of Large Numbers
The Central Limit Theorem
Sampling Distribution of $\bar{x}$
Sampling Distribution of $\bar{p}$

Statistical Inference
The purpose of statistical inference is to obtain
information about a population from information
contained in a sample.
A population is the set of all the elements of interest.
A sample is a subset of the population.

Statistical Inference
The sample results provide only estimates of the
values of the population characteristics.
With proper sampling methods, the sample results
can provide good estimates of the population
characteristics.
A parameter is a numerical characteristic of a
population.

Simple Random Sampling: Finite Population
Finite populations are often defined by lists
such as:
Organization membership roster
Credit card account numbers
Inventory product numbers
A simple random sample of size n from a finite
population of size N is a sample selected such that
each possible sample of size n has the same
probability of being selected.

Finite Population
Replacing each sampled element before selecting
subsequent elements is called sampling with
replacement.
Sampling without replacement is the procedure
used most often.
In large sampling projects, computer-generated
random numbers are often used to automate the
sample selection process.
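The difference between sampling with and without replacement can be sketched in Python with the standard `random` module (an assumption of this sketch; the slides themselves use a random number table and Excel). The seed is arbitrary and only makes the run repeatable.

```python
import random

N = 900   # population size: elements numbered 1..N
n = 30    # sample size

rng = random.Random(2024)  # arbitrary seed, fixed only for reproducibility

# Without replacement: each element can appear at most once.
without_repl = rng.sample(range(1, N + 1), k=n)

# With replacement: each draw is put back, so an element may repeat.
with_repl = rng.choices(range(1, N + 1), k=n)

print(len(set(without_repl)))  # always 30
print(len(set(with_repl)))     # 30 or fewer, depending on repeats
```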
Infinite Population
Infinite populations are often defined by an
ongoing process whereby the elements of the
population consist of items generated as
though the process would operate indefinitely.
A simple random sample from an infinite population
is a sample selected such that the following conditions
are satisfied:
Each element selected comes from the same
population.
Each element is selected independently.

Infinite Population
In the case of infinite populations, it is impossible to
obtain a list of all elements in the population.
The random number selection procedure cannot be
used for infinite populations.

Point Estimation
In point estimation we use the data from the sample
to compute a value of a sample statistic that serves
as an estimate of a population parameter.
We refer to $\bar{x}$ as the point estimator of the population
mean $\mu$.
The sample standard deviation s is the
point estimator of the population standard deviation $\sigma$.
The sample proportion $\bar{p}$ is the point estimator of the
population proportion p. Notation: $\bar{p}$ (sample) estimates p (population).

Sampling Error
When the expected value of a point estimator is equal
to the population parameter, the point estimator is said
to be unbiased.
The absolute value of the difference between an
unbiased point estimate and the corresponding
population parameter is called the sampling error.
Sampling error is the result of using a subset of the
population (the sample), and not the entire
population.
Statistical methods can be used to make probability
statements about the size of the sampling error.

Sampling Error
The sampling errors are:
$|\bar{x} - \mu|$ for sample mean
$|s - \sigma|$ for sample standard deviation
$|\bar{p} - p|$ for sample proportion
Example: Victoria University Toronto
Victoria University receives
900 applications annually from
prospective students. The
application form contains
a variety of information
including the individual's
scholastic aptitude test (SAT) score and
whether or not
the individual desires on-campus housing.

Example: Victoria University
The director of admissions
would like to know the
following information:
the average SAT score for
the 900 applicants, and
the proportion of
applicants that want to live on campus.

We will now look at two
alternatives for obtaining the
desired information.
Conducting a census of the
entire 900 applicants
Selecting a sample of 30
applicants, using Excel

Conducting a Census
If the relevant data for the entire 900 applicants were
in the university's database, the population parameters
of interest could be calculated using the formulas
presented in the Descriptive Numbers chapter.
We will assume for the moment that conducting a
census is practical in this example.

Conducting a Census
Population Mean SAT Score
$\mu = \dfrac{\sum x_i}{900} = 990$
Population Standard Deviation for SAT Score
$\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{900}} = 80$
Population Proportion Wanting On-Campus Housing
$p = \dfrac{648}{900} = .72$
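A minimal Python sketch of the census formulas above, assuming the data are held as two lists. The four-applicant population is hypothetical toy data chosen so the arithmetic is easy to check; it is not the 900-applicant data. Note that every census formula divides by N.

```python
import math

def population_parameters(scores, wants_housing):
    """Census formulas: mean, standard deviation (divide by N), proportion."""
    N = len(scores)
    mu = sum(scores) / N
    sigma = math.sqrt(sum((x - mu) ** 2 for x in scores) / N)
    p = sum(wants_housing) / N   # True counts as 1, False as 0
    return mu, sigma, p

# Hypothetical toy population of 4 applicants (not the real 900).
scores = [950, 1000, 1030, 980]
housing = [True, False, True, True]
mu, sigma, p = population_parameters(scores, housing)
print(mu, round(sigma, 1), p)   # 990.0 29.2 0.75
```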
Simple Random Sampling
Now suppose that the necessary data on the
current year's applicants were not yet entered in the
university's database.
Furthermore, the Director of Admissions must obtain
estimates of the population parameters of interest for
a meeting taking place in a few hours.
She decides a sample of 30 applicants will be used.
The applicants were numbered, from 1 to 900, as
their applications arrived.

Using a Random Number Table
Taking a Sample of 30 Applicants
Because the finite population has 900 elements, we
will need 3-digit random numbers to randomly
select applicants numbered from 1 to 900.
We will use the last three digits of the 5-digit
random numbers in the third column of the
textbook's random number table, and continue
into the fourth column as needed.

The numbers we draw will be the numbers of the
applicants we will sample unless
the random number is greater than 900 or
the random number has already been used.
We will continue to draw random numbers until
we have selected 30 applicants for our sample.
(We will go through all of column 3 and part of
column 4 of the random number table, encountering
in the process five numbers greater than 900 and
one duplicate, 835.)
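The table-scanning rule can be sketched in Python as follows. Only the first eight 3-digit numbers come from the slides; the remaining draws are simulated with `random.randint` as a stand-in for the rest of the textbook table, and the function name `select_applicants` is hypothetical.

```python
import random

def select_applicants(draws, N=900, n=30):
    """Scan 3-digit random numbers in order and keep the first n usable ones.

    A number is skipped if it lies outside 1..N (e.g. 902, or 000)
    or if it has already been selected (a duplicate).
    """
    selected = []
    for d in draws:
        if 1 <= d <= N and d not in selected:
            selected.append(d)
            if len(selected) == n:
                break
    return selected

# First eight numbers as listed on the slides, then simulated stand-ins.
first_draws = [744, 436, 865, 790, 835, 902, 190, 836]
rng = random.Random(7)  # arbitrary seed
more_draws = [rng.randint(0, 999) for _ in range(200)]  # 3-digit values 000..999

sample = select_applicants(first_draws + more_draws)
print(sample[:7])   # [744, 436, 865, 790, 835, 190, 836]
```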
Use of Random Numbers for Sampling
3-Digit          Applicant Included
Random Number    in Sample
744              No. 744
436              No. 436
865              No. 865
790              No. 790
835              No. 835
902              Number exceeds 900
190              No. 190
836              No. 836
. . . and so on

Sample Data
No.   Random Number   Applicant        SAT Score   Live On-Campus
1     744             Conrad Harris    1025        Yes
2     436             Enrique Romero   950         Yes
3     865             Fabian Avante    1090        No
4     790             Lucila Cruz      1120        Yes
5     835             Chan Chiang      930         No
.     .               .                .           .
.     .               .                .           .
30    498             Emily Morse      1010        No

Using a Computer
Computers can be used to generate random
numbers for selecting random samples.
For example, Excel's function
=RANDBETWEEN(1,900)
can be used to generate random numbers between
1 and 900.
Then we choose the 30 applicants corresponding
to 30 generated random numbers as our sample.
Because RANDBETWEEN can return the same number more
than once, any duplicate is discarded and another number
is generated until 30 different applicants are selected.

Point Estimation
$\bar{x}$ as Point Estimator of $\mu$
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{29{,}910}{30} = 997$
$s$ as Point Estimator of $\sigma$
$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{163{,}996}{29}} = 75.2$
$\bar{p}$ as Point Estimator of $p$
$\bar{p} = 20/30 = .67$
Note: Different random numbers would have
identified a different sample, which would have
resulted in different point estimates.

Summary of Point Estimates
Obtained from a Simple Random Sample
Population Parameter                                 Parameter Value   Point Estimator                                        Point Estimate
$\mu$ = Population mean SAT score                    990               $\bar{x}$ = Sample mean SAT score                      997
$\sigma$ = Population std. deviation for SAT score   80                $s$ = Sample std. deviation for SAT score              75.2
$p$ = Population proportion wanting campus housing   .72               $\bar{p}$ = Sample proportion wanting campus housing   .67
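The point estimates and their sampling errors can be rechecked in Python from the totals given above. This is only arithmetic on the slide's numbers; note the n - 1 divisor for s, and that 20/30 is about .667.

```python
import math

# Census values (population parameters) from the example.
mu, sigma, p = 990, 80, 0.72

# Sample summaries for the 30 sampled applicants, as given above.
n = 30
sum_x = 29_910          # sum of the 30 SAT scores
sum_sq_dev = 163_996    # sum of squared deviations from the sample mean
yes_housing = 20        # sampled applicants wanting on-campus housing

# Point estimates (n - 1 divisor for the sample standard deviation).
x_bar = sum_x / n
s = math.sqrt(sum_sq_dev / (n - 1))
p_bar = yes_housing / n

# Sampling errors: absolute distance between estimate and parameter.
sampling_errors = {
    "mean": abs(x_bar - mu),
    "standard deviation": abs(s - sigma),
    "proportion": abs(p_bar - p),
}

print(round(x_bar, 1), round(s, 1), round(p_bar, 2))   # 997.0 75.2 0.67
for name, err in sampling_errors.items():
    print(f"{name}: {err:.3f}")
```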
Process of Statistical Inference
Population with mean $\mu$ = ?
A simple random sample of n elements is selected from the population.
The sample data provide a value for the sample mean $\bar{x}$.
The value of $\bar{x}$ is used to make inferences about the value of $\mu$.

Law of Large Numbers
As the number of randomly drawn observations in a
sample increases, the sample mean $\bar{x}$ gets closer
and closer to the population mean $\mu$.
This is the law of large numbers. It is valid for any
population with a finite mean.
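A small simulation illustrates the law of large numbers. It assumes a hypothetical population, a fair six-sided die with mean 3.5, and records the running sample mean at a few sample sizes; the seed is arbitrary.

```python
import random

rng = random.Random(1)   # arbitrary seed
mu = 3.5                 # mean of a fair six-sided die (hypothetical population)

running_total = 0
checkpoints = {}
for i in range(1, 100_001):
    running_total += rng.randint(1, 6)          # one roll, values 1..6
    if i in (10, 100, 1_000, 10_000, 100_000):
        checkpoints[i] = running_total / i       # sample mean after i rolls

for n, x_bar in checkpoints.items():
    print(f"n = {n:>6}: x_bar = {x_bar:.4f}, |x_bar - mu| = {abs(x_bar - mu):.4f}")
```

Over a handful of rolls the mean can be far from 3.5; only the large checkpoints are reliably close.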
Law of Large Numbers (cont'd)
Note: We often intuitively expect predictability over a
few random observations, but this expectation is wrong.
The law of large numbers applies only when the number
of observations is really large.
[Figure: Settlers of Catan]

What is a Sampling Distribution?
The sampling distribution of a statistic is the
distribution of all possible values taken by the
statistic when all possible samples of a fixed size n
are taken from the population. It is a theoretical
idea; we do not actually build it.
The sampling distribution of a statistic is the
probability distribution of that statistic.

Sampling Distribution of Sample Mean
We take many random samples of a given size n
from a population with mean $\mu$ and standard
deviation $\sigma$.
Some sample means will be above the population
mean $\mu$ and some will be below, making up the
sampling distribution.

Sampling distribution of sample mean
[Figure: histogram of some sample averages, showing the sampling distribution of $\bar{x}$]

For any population with mean $\mu$ and standard deviation $\sigma$:
The mean of the sampling distribution is equal to the
population mean $\mu$.
The standard deviation of the sampling distribution is
$\sigma/\sqrt{n}$, where n is the sample size.
[Figure: sampling distribution of $\bar{x}$, with standard deviation $\sigma/\sqrt{n}$]
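Although the sampling distribution is a theoretical idea, it can be approximated by simulation. This sketch assumes a hypothetical normal population with the SAT example's $\mu$ = 990 and $\sigma$ = 80, draws 10,000 samples of size n = 30, and compares the mean and standard deviation of the sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
import math
import random
import statistics

MU, SIGMA, n = 990, 80, 30   # hypothetical normal population, sample size
rng = random.Random(42)      # arbitrary seed

# One sample mean per simulated sample of size n.
sample_means = [
    statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
    for _ in range(10_000)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(round(mean_of_means, 1), "vs mu =", MU)
print(round(sd_of_means, 1), "vs sigma/sqrt(n) =", round(SIGMA / math.sqrt(n), 1))
```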
The sampling distribution of $\bar{x}$ is the probability
distribution of all possible values of the sample
mean $\bar{x}$.
Expected Value of $\bar{x}$
$E(\bar{x}) = \mu$
where:
$\mu$ = the population mean

Mean of sample mean
Mean of a sampling distribution of $\bar{x}$
There is no tendency for a sample mean to fall
systematically above or below $\mu$, even if the distribution
of the raw data is skewed. Thus, the sample mean $\bar{x}$
is an unbiased estimator of the population mean $\mu$: it
will be correct on average over many samples.

Standard Deviation of $\bar{x}$
Finite Population: $\sigma_{\bar{x}} = \sqrt{\dfrac{N-n}{N-1}}\left(\dfrac{\sigma}{\sqrt{n}}\right)$
Infinite Population: $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
A finite population is treated as being
infinite if n/N < .05.
$\sqrt{(N-n)/(N-1)}$ is the finite population correction factor.
$\sigma_{\bar{x}}$ is referred to as the
standard error of the mean.
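The two standard error formulas, together with the n/N < .05 rule of thumb, can be written as one Python function (the name `standard_error_mean` is hypothetical). The SAT numbers show the case where the correction is skipped (n = 30) and a case where it applies (n = 100).

```python
import math

def standard_error_mean(sigma, n, N=None):
    """Standard error of x_bar.

    N=None means an infinite population. For a finite population the
    finite population correction is applied unless n/N < 0.05.
    """
    se = sigma / math.sqrt(n)
    if N is not None and n / N >= 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

se30 = standard_error_mean(80, 30, N=900)    # n/N = .033: no correction
se100 = standard_error_mean(80, 100, N=900)  # n/N = .111: correction applied
print(round(se30, 1), round(se100, 2))       # 14.6 7.55
```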
Standard deviation of sample mean
Standard deviation of a sampling distribution of $\bar{x}$
The standard deviation of the sampling distribution is
smaller than the standard deviation of the population by
a factor of $\sqrt{n}$. Averages are less variable than
individual observations. Also, the results of large
samples are less variable than the results of small
samples.

Form of the Sampling Distribution of $\bar{x}$
When the population has a normal distribution, the
sampling distribution of $\bar{x}$ is normally distributed
for any sample size.
In most applications, the sampling distribution of $\bar{x}$
can be approximated by a normal distribution
whenever the sample size is 30 or more.
In cases where the population is highly skewed or
outliers are present, samples of size 50 may be
needed.

For Normally Distributed Populations
When a variable in a population is normally
distributed, the sampling distribution of the sample
mean for all possible samples of size n is also
normally distributed.
If the population is $N(\mu, \sigma)$,
then the sample mean's distribution is $N(\mu, \sigma/\sqrt{n})$.

For Normally Distributed Populations
[Figure: the population distribution and the narrower sampling distribution of $\bar{x}$]

Central Limit Theorem:
When randomly sampling from any population with
mean $\mu$ and standard deviation $\sigma$,
when n is large enough,
the sampling distribution of $\bar{x}$ is approximately
normal: $\bar{x} \sim N(\mu, \sigma/\sqrt{n})$.

[Figure: a population with a strongly skewed distribution, and the sampling distribution of $\bar{x}$ for n = 2, n = 10, and n = 25 observations]
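The central limit theorem can be watched in a simulation. As an assumption, the strongly skewed population here is an exponential distribution with mean 1; in theory the mean of n such observations has skewness $2/\sqrt{n}$. The skewness of the simulated sample means should shrink toward 0 (the value for a normal distribution) as n grows from 2 to 25.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z-score."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values, m)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

rng = random.Random(3)   # arbitrary seed
reps = 20_000            # simulated samples per sample size

skew_by_n = {}
for n in (2, 10, 25):
    # Hypothetical skewed population: exponential with mean 1.
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    skew_by_n[n] = skewness(means)

for n, sk in skew_by_n.items():
    print(f"n = {n:>2}: skewness of x_bar = {sk:.2f} (theory {2 / n ** 0.5:.2f})")
```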
IQ scores: Population vs. Sample
In a large population of adults, the mean IQ is 112 with standard
deviation 20. Suppose 200 adults are randomly selected for a
market research campaign.
The distribution of the sample mean IQ is:
A) Exactly normal, mean 112, standard deviation 20
B) Approximately normal, mean 112, standard deviation 20
C) Approximately normal, mean 112, standard deviation 1.414
D) Approximately normal, mean 112, standard deviation 0.1
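With n = 200 and the population's shape not stated, the central limit theorem gives an approximately normal sampling distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The arithmetic below checks that value, which matches option C.

```python
import math

mu, sigma, n = 112, 20, 200
se = sigma / math.sqrt(n)    # standard deviation of the sample mean
print(mu, round(se, 3))      # 112 1.414
```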
Sampling Distribution of $\bar{x}$ for SAT Scores
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{80}{\sqrt{30}} = 14.6$
$E(\bar{x}) = 990$
[Figure: sampling distribution of $\bar{x}$, centered at $E(\bar{x}) = 990$]

What is the probability that a
simple random sample of 30 applicants
will provide an estimate of the
population mean SAT score that is within +/-10 of
the actual population mean $\mu$?
In other words, what is the probability that $\bar{x}$ will be
between 980 and 1000?
Step 1: Calculate the z-value at the
upper endpoint of the interval.
$z = (1000 - 990)/14.6 = .68$
Step 2: Find the area under the curve to the left of the
upper endpoint.
$P(z \le .68) = .7517$
Note: Make sure to standardize (z) using the
standard deviation for the sampling distribution!

Cumulative Probabilities for
the Standard Normal Distribution
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
. . . . . . . . . . .
.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
. . . . . . . . . . .
[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 14.6$; area = .7517 to the left of 1000 (mean 990)]

Step 3: Calculate the z-value at the
lower endpoint of the interval.
$z = (980 - 990)/14.6 = -.68$
Step 4: Find the area under the curve to the left of the
lower endpoint.
$P(z \le -.68) = 1 - P(z \le .68) = .2483$
Note: Make sure to standardize (z) using the
standard deviation for the sampling distribution!

[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 14.6$; area = .2483 to the left of 980 (mean 990)]

Step 5: Calculate the area under the curve
between the lower and upper endpoints
of the interval.
$P(-.68 \le z \le .68) = P(z \le .68) - P(z \le -.68)$
$= .7517 - .2483$
$= .5034$
The probability that the sample mean SAT score will
be between 980 and 1000 is:
$P(980 \le \bar{x} \le 1000) = .5034$
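Steps 1 to 5 can be reproduced with Python's `statistics.NormalDist` (standard library, Python 3.8 or later). With the unrounded standard error the probability is about .506; rounding z to .68, as the printed table requires, gives about .503, so the slide's .5034 differs only by table rounding.

```python
import math
from statistics import NormalDist

mu, sigma, n = 990, 80, 30
se = sigma / math.sqrt(n)              # about 14.6
sampling_dist = NormalDist(mu, se)     # normal sampling distribution of x_bar

# P(980 <= x_bar <= 1000) with the unrounded standard error
p_exact = sampling_dist.cdf(1000) - sampling_dist.cdf(980)

# Same probability using z rounded to two decimals, as in the table lookup
z = round((1000 - mu) / se, 2)         # 0.68
p_table = NormalDist().cdf(z) - NormalDist().cdf(-z)

print(round(p_exact, 4), round(p_table, 4))
```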
[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 14.6$; area = .5034 between 980 and 1000 (mean 990)]

Relationship Between the Sample Size
and the Sampling Distribution of $\bar{x}$
Suppose we select a simple random sample of
100 applicants instead of the 30 originally considered.
$E(\bar{x}) = \mu$ regardless of the sample size. In our
example, $E(\bar{x})$ remains at 990.
Whenever the sample size is increased, the standard
error of the mean $\sigma_{\bar{x}}$ is decreased. With the increase
in the sample size to n = 100, the standard error of the
mean is decreased to:
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{80}{\sqrt{100}} = 8.0$
Note: Strictly speaking, the finite population correction should be used here,
because n/N = 100/900 = .11 exceeds .05. However, this does not affect the key result.

[Figure: sampling distributions of $\bar{x}$ for n = 100 ($\sigma_{\bar{x}} = 8$) and n = 30 ($\sigma_{\bar{x}} = 14.6$), both centered at $E(\bar{x}) = 990$]

Recall that when n = 30,
$P(980 \le \bar{x} \le 1000) = .5034$.
We follow the same steps to solve for $P(980 \le \bar{x} \le 1000)$
when n = 100 as we showed earlier when n = 30.
Now, with n = 100, $z = (1000 - 990)/8 = 1.25$, and
$P(980 \le \bar{x} \le 1000) = .8944 - .1056 = .7888$.
Because the sampling distribution with n = 100 has a
smaller standard error, the values of $\bar{x}$ have less
variability and tend to be closer to the population
mean than the values of $\bar{x}$ with n = 30.
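The effect of the larger sample can be checked the same way. The helper `prob_within` is hypothetical and, like the calculation above, ignores the finite population correction; with exact normal probabilities it gives about .506 for n = 30 and about .789 for n = 100.

```python
import math
from statistics import NormalDist

def prob_within(mu, sigma, n, margin):
    """P(mu - margin <= x_bar <= mu + margin) for a normal sampling distribution."""
    dist = NormalDist(mu, sigma / math.sqrt(n))
    return dist.cdf(mu + margin) - dist.cdf(mu - margin)

for n in (30, 100):
    print(n, round(prob_within(990, 80, n, 10), 4))
```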
[Figure: sampling distribution of $\bar{x}$ with $\sigma_{\bar{x}} = 8$; area = .7888 between 980 and 1000 (mean 990)]

Sampling with Categorical Variables
Many studies collect data on categorical
variables, such as race or occupation of a
person, the make of a car, etc.
The parameters of interest in these settings are
population proportions.
The statistic used to estimate a population
proportion is the sample proportion.

Making Inferences about a Population Proportion
Population with proportion p = ?
A simple random sample of n elements is selected from the population.
The sample data provide a value for the sample proportion $\bar{p}$.
The value of $\bar{p}$ is used to make inferences about the value of p.

The sampling distribution of $\bar{p}$ is the probability
distribution of all possible values of the sample
proportion $\bar{p}$.
Expected Value of $\bar{p}$
$E(\bar{p}) = p$
where:
p = the population proportion

Standard Deviation of $\bar{p}$
Finite Population: $\sigma_{\bar{p}} = \sqrt{\dfrac{N-n}{N-1}}\sqrt{\dfrac{p(1-p)}{n}}$
Infinite Population: $\sigma_{\bar{p}} = \sqrt{\dfrac{p(1-p)}{n}}$
$\sigma_{\bar{p}}$ is referred to as the
standard error of the proportion.
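A sketch of the standard error of the proportion in Python, mirroring the formulas above; the function name `standard_error_proportion` is hypothetical, and the n/N < .05 rule decides whether the correction factor is used. For the housing example, n/N = 30/900 is below .05, so the infinite-population formula applies.

```python
import math

def standard_error_proportion(p, n, N=None):
    """Standard error of p_bar; applies the finite population correction if n/N >= .05."""
    se = math.sqrt(p * (1 - p) / n)
    if N is not None and n / N >= 0.05:
        se *= math.sqrt((N - n) / (N - 1))
    return se

se_p = standard_error_proportion(0.72, 30, N=900)
print(round(se_p, 3))   # 0.082
```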
The sampling distribution of a sample proportion $\bar{p} = X/n$, where X is the
number of sampled elements with the characteristic of interest, is
approximately normal (normal approximation of a binomial
distribution) when the sample size is large enough.

Form of the Sampling Distribution of $\bar{p}$
The sampling distribution of $\bar{p}$ can be approximated
by a normal distribution whenever the sample size
is large.
The sample size is considered large whenever these
conditions are satisfied:
$np \ge 5$ and $n(1 - p) \ge 5$

Recall that 72% of the
prospective students applying
to Victoria University desire
on-campus housing.
What is the probability that
a simple random sample of 30 applicants will provide
an estimate of the population proportion of applicants
desiring on-campus housing that is within plus or
minus .05 of the actual population proportion?

For our example, with n = 30 and p = .72,
the normal distribution is an acceptable
approximation because:
$np = 30(.72) = 21.6 \ge 5$
and
$n(1 - p) = 30(.28) = 8.4 \ge 5$
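The large-sample check is a one-line condition. In this sketch the function name is hypothetical, and the second call uses a hypothetical p = .90 to show a case that fails, since n(1 - p) = 3.

```python
def normal_approx_ok(n, p):
    """Large-sample rule: np >= 5 and n(1 - p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5

print(normal_approx_ok(30, 0.72))   # True: 21.6 and 8.4
print(normal_approx_ok(30, 0.90))   # False: n(1 - p) is about 3
```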
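Sampling Distribution of $\bar{p}$
$\sigma_{\bar{p}} = \sqrt{\dfrac{.72(1 - .72)}{30}} = .082$
$E(\bar{p}) = .72$
[Figure: sampling distribution of $\bar{p}$, centered at $E(\bar{p}) = .72$]

Step 1: Calculate the z-value at the upper endpoint of
the interval.
$z = (.77 - .72)/.082 = .61$
Step 2: Find the area under the curve to the left of the
upper endpoint.
$P(z \le .61) = .7291$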
[Figure: sampling distribution of $\bar{p}$ with $\sigma_{\bar{p}} = .082$; area = .7291 to the left of .77 (mean .72)]

Step 3: Calculate the z-value at the lower endpoint of
the interval.
$z = (.67 - .72)/.082 = -.61$
Step 4: Find the area under the curve to the left of the
lower endpoint.
$P(z \le -.61) = 1 - P(z \le .61) = .2709$

[Figure: sampling distribution of $\bar{p}$ with $\sigma_{\bar{p}} = .082$; area = .2709 to the left of .67 (mean .72)]

Step 5: Calculate the area under the curve between
the lower and upper endpoints of the interval.
$P(-.61 \le z \le .61) = P(z \le .61) - P(z \le -.61)$
$= .7291 - .2709$
$= .4582$
The probability that the sample proportion of applicants
wanting on-campus housing will be within +/-.05 of the
actual population proportion is:
$P(.67 \le \bar{p} \le .77) = .4582$
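The same probability can be computed directly with Python's `statistics.NormalDist`. With the unrounded standard error the result is about .458, which agrees with the table-based .4582 to three decimals.

```python
import math
from statistics import NormalDist

p, n, margin = 0.72, 30, 0.05
se = math.sqrt(p * (1 - p) / n)     # about .082
dist = NormalDist(p, se)            # normal sampling distribution of p_bar
prob = dist.cdf(p + margin) - dist.cdf(p - margin)
print(round(prob, 4))
```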
[Figure: sampling distribution of $\bar{p}$ with $\sigma_{\bar{p}} = .082$; area = .4582 between .67 and .77 (mean .72)]

Readings
Textbook:
Chapter 7
Random Numbers
