
Chapter 7 Sampling Distribution

Recall that the population mean represents the average of all individuals or things under
study. Typically, however, not all individuals can be measured. Instead, only a small
subset of all individuals is available to us, and the average response based on this sample,
$\bar{x}$, is used to estimate the population mean, $\mu$. An issue of fundamental importance is
how well the sample mean, $\bar{x}$, estimates the population mean, $\mu$. If the sample mean is
$\bar{x} = 23$, we estimate that the population mean is 23, but generally this estimate will be
wrong. What is needed is a method for assessing the precision of this estimate. A key
component in addressing this problem is the notion of a sampling distribution.


7.1 Population and Sampling Distribution
- The population distribution is the probability distribution of the population data.

Suppose there are only five students in an advanced statistics class and the midterm
scores are
70 78 80 80 95

Let X denote the score of a student. The frequency distribution and the probability
distribution of the scores are

x     f    P(X = x)
70    1    1/5 = 0.2
78    1    1/5 = 0.2
80    2    2/5 = 0.4
95    1    1/5 = 0.2


- The probability distribution of a sample statistic is called its sampling
distribution.


Sampling distribution of $\bar{X}$
- The probability distribution of $\bar{X}$ is called its sampling distribution. It lists the
various values that $\bar{X}$ can assume and the probability of each value of $\bar{X}$.





Example 7.1: For the midterm scores above, list all possible samples of three scores that
can be selected without replacement. Calculate the sample mean $\bar{X}$ for each sample and
construct the sampling distribution of $\bar{X}$.

Solution:
Suppose we assign A, B, C, D and E to the scores of five students so that
A = 70, B = 78, C = 80, D = 80, E = 95


All possible samples and their means when the sample size is 3
Sample   Scores in the sample   $\bar{x}$
ABC 70, 78, 80 76.00
ABD 70, 78, 80 76.00
ABE 70, 78, 95 81.00
ACD 70, 80, 80 76.67
ACE 70, 80, 95 81.67
ADE 70, 80, 95 81.67
BCD 78, 80, 80 79.33
BCE 78, 80, 95 84.33
BDE 78, 80, 95 84.33
CDE 80, 80, 95 85.00



Sampling distribution of $\bar{X}$ when the sample size is 3
$\bar{x}$   f   Relative frequency
76.00 2 2/10=0.2
76.67 1 1/10=0.1
79.33 1 1/10=0.1
81.00 1 1/10=0.1
81.67 2 2/10=0.2
84.33 2 2/10=0.2
85.00 1 1/10=0.1
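
The enumeration in Example 7.1 can also be checked by machine. The following is a minimal sketch, not part of the original notes: the function name `sampling_distribution` and the use of exact fractions are illustrative choices. It lists every sample of size 3 drawn without replacement and tabulates the relative frequency of each sample mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Midterm scores of the five students, labelled A-E as in the solution.
scores = {"A": 70, "B": 78, "C": 80, "D": 80, "E": 95}

def sampling_distribution(population, n):
    """Return {sample mean: probability} over all samples of size n
    drawn without replacement (every sample is equally likely)."""
    samples = list(combinations(population, n))
    means = [Fraction(sum(population[k] for k in s), n) for s in samples]
    counts = Counter(means)
    return {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

dist = sampling_distribution(scores, 3)
for sample_mean, prob in dist.items():
    print(f"{float(sample_mean):6.2f}  {float(prob):.1f}")
```

Exact fractions are used so that sample means such as 230/3 are grouped correctly instead of being compared as rounded decimals.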











7.2 Sampling and nonsampling errors

- Sampling error is the difference between the value of the sample statistic and the
value of the corresponding population parameter.

- In the case of the mean, sampling error = $\bar{x} - \mu$, assuming that the sample is
random and no nonsampling error has been made.

- Nonsampling error is the error that occurs in the selection, recording and
tabulation of data.


Example 7.2: Reconsider the data in Example 7.1. Suppose we take a random sample of
three scores from this population and that this sample includes the scores 70, 80 and 95.
Calculate the sampling error.

Solution:

$\mu = \frac{70 + 78 + 80 + 80 + 95}{5} = 80.60$

$\bar{x} = \frac{70 + 80 + 95}{3} = 81.67$

Sampling error = $\bar{x} - \mu = 81.67 - 80.60 = 1.07$

Now suppose that, when we select the above-mentioned sample, we mistakenly record the
second score as 82 instead of 80. Calculate the nonsampling error.

$\bar{x} = \frac{70 + 82 + 95}{3} = 82.33$

Nonsampling error = incorrect $\bar{x}$ - correct $\bar{x}$ = 82.33 - 81.67 = 0.66
Sampling error = 1.07









7.3 Mean and Standard Deviation of $\bar{X}$

- The mean of the sampling distribution of $\bar{X}$ is always equal to the mean of the
population. Thus, $\mu_{\bar{X}} = \mu$.

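- For a sample of size n, if the sampling is done from a finite population (of size N),
the standard deviation of $\bar{X}$ is given by

$$\sigma_{\bar{X}} =
\begin{cases}
\dfrac{\sigma}{\sqrt{n}} & \text{if } \dfrac{n}{N} \le 0.05 \text{ or sampling is done with replacement} \\[2ex]
\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}} & \text{if } \dfrac{n}{N} > 0.05 \text{ and sampling is done without replacement}
\end{cases}$$

and if the sampling is done from an infinite population, we have

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$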



Remark

1. $\sqrt{\frac{N-n}{N-1}}$ is called the finite population correction factor, and
$\sqrt{\frac{N-n}{N-1}} \approx 1$ when N is large and $\frac{n}{N} < 0.05$.

2. The value of $\sigma_{\bar{X}}$ decreases as n increases.
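
Both properties in this section can be verified on the five midterm scores from Section 7.1. The sketch below is an illustration only and its variable names are assumptions. It computes the mean and standard deviation of all ten sample means directly and compares them with the population mean and with the formula that includes the finite population correction factor (here n/N = 3/5 > 0.05 and sampling is without replacement).

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

population = [70, 78, 80, 80, 95]  # midterm scores from Section 7.1
N, n = len(population), 3

mu = mean(population)        # population mean
sigma = pstdev(population)   # population standard deviation (divides by N)

# Means of all C(5, 3) = 10 equally likely samples drawn without replacement.
sample_means = [mean(s) for s in combinations(population, n)]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)

# Formula with the finite population correction factor.
sd_formula = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

print("population mean:", mu, " mean of sample means:", round(mean_of_means, 4))
print("sd by enumeration:", round(sd_of_means, 4), " sd by formula:", round(sd_formula, 4))
```

The two standard deviations agree, which is what the corrected formula predicts for sampling without replacement from a small finite population.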















Example 7.3: The mean wage per hour for all 5000 employees working at a large
company is RM27.50 and the standard deviation is RM3.70. Let $\bar{X}$ be the mean wage
per hour for a random sample of certain employees selected from this company. Find the
mean and standard deviation of $\bar{X}$ for a sample size of
(a) 30 (b) 75 (c) 300

Solution:
$N = 5000$, $\mu = 27.50$, $\sigma = 3.70$
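
One way to work through the three cases numerically is sketched below. The helper `sd_of_sample_mean` is a hypothetical name introduced for this illustration, and it assumes sampling without replacement, so the finite population correction factor is applied whenever n/N > 0.05.

```python
from math import sqrt

def sd_of_sample_mean(sigma, n, N=None):
    """Standard deviation of the sample mean.

    The finite population correction factor is applied only when a finite
    population size N is given and n/N > 0.05 (sampling without replacement
    is assumed in this sketch).
    """
    sd = sigma / sqrt(n)
    if N is not None and n / N > 0.05:
        sd *= sqrt((N - n) / (N - 1))
    return sd

N, mu, sigma = 5000, 27.50, 3.70  # values given in Example 7.3

for n in (30, 75, 300):
    print(f"n = {n:3d}: n/N = {n / N:.3f}, mean = {mu:.2f}, "
          f"sd = {sd_of_sample_mean(sigma, n, N):.3f}")
```

Only the sample of 300 exceeds 5% of the population, so it is the only case in which the correction factor changes the result.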






































7.4 Shape of the sampling distribution of $\bar{X}$

- The shape of the sampling distribution of $\bar{X}$ relates to the following two cases.
1. The population from which samples are drawn has a normal distribution.
2. The population from which samples are drawn does not have a normal
distribution.


7.4.1 Sampling from a normally distributed population
- If the population from which the samples are drawn is normally distributed with
mean $\mu$ and standard deviation $\sigma$, then the sampling distribution of the sample
mean, $\bar{X}$, will also be normally distributed with the following mean and
standard deviation, irrespective of the sample size:
$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.
- That means, if $X \sim N(\mu, \sigma^2)$, then
$\bar{X} \sim N\left(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\right)$.



Example 7.4: In a recent SAT, the mean score for all examinees was 1020.
Assume that the distribution of SAT scores of all examinees is normal with a mean of
1020 and a standard deviation of 153. Let $\bar{X}$ be the mean SAT score of a random
sample of certain examinees. Calculate the mean and standard deviation of $\bar{X}$ and
describe the shape of its sampling distribution when the sample size is
(a) 16 (b) 50 (c) 1000

Solution:
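Let $\mu$ be the mean of SAT scores of all examinees and
$\sigma$ be the standard deviation of SAT scores of all examinees.

$\mu = 1020$ and $\sigma = 153$

Because the population is normally distributed, the sampling distribution of $\bar{X}$ is
normal for each of the three sample sizes.

a) The mean and standard deviation of $\bar{X}$ are
$\mu_{\bar{X}} = \mu = 1020$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{16}} = 38.250$

b) The mean and standard deviation of $\bar{X}$ are
$\mu_{\bar{X}} = \mu = 1020$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{50}} = 21.637$

c) The mean and standard deviation of $\bar{X}$ are
$\mu_{\bar{X}} = \mu = 1020$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{153}{\sqrt{1000}} = 4.838$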






7.4.2 Sampling from a population that is NOT normally distributed

Central Limit Theorem
- For a relatively large sample size, the sampling distribution of $\bar{X}$ is
approximately normal, regardless of the distribution of the population under
consideration. The mean and standard deviation of the sampling distribution of
$\bar{X}$ are $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.

- That means, for any distribution of X, if n is large,
$\bar{X} \sim N\left(\mu_{\bar{X}} = \mu,\ \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\right)$, approximately.

Remark
1. When $n \ge 30$, the shape of the sampling distribution of $\bar{X}$ is approximately
normal irrespective of the shape of the population distribution.

2. The mean of $\bar{X}$ is $\mu_{\bar{X}} = \mu$.
3. The standard deviation of $\bar{X}$ is $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.
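
The central limit theorem can be illustrated with a small simulation. The sketch below is not from the original notes: it assumes, purely for illustration, an exponential population with rate 1 (skewed to the right, with mean 1 and standard deviation 1), draws many samples of size 30, and compares the mean and standard deviation of the simulated sample means with the theoretical values.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population for illustration: exponential with rate 1, which is
# skewed to the right and has mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
n = 30             # sample size
replicates = 5000  # number of simulated samples

# One sample mean per simulated sample of size n.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(replicates)
]

print("mean of sample means:", round(mean(sample_means), 3), "vs mu =", mu)
print("sd of sample means:  ", round(pstdev(sample_means), 3),
      "vs sigma/sqrt(n) =", round(sigma / sqrt(n), 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the assumed population is strongly skewed; the simulation only approximates the theoretical values, so small differences are expected.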

Example 7.5: The mean rent paid by all tenants in a large city is RM1550 with a standard
deviation of RM225. However, the population distribution of rents for all tenants in this
city is skewed to the right. Calculate the mean and standard deviation of $\bar{X}$ and describe
the shape of its sampling distribution when the sample size is
(a) 30 (b) 100

Solution:
Although the population distribution of rents paid by all tenants is not normal, in each
case the sample size is large ($n \ge 30$). Hence, the central limit theorem can be applied to
infer the shape of the sampling distribution of $\bar{X}$, which is approximately normal in
both cases.

a) Let $\bar{X}$ be the mean rent paid by a sample of 30 tenants. Then the mean and standard
deviation of $\bar{X}$ are
$\mu_{\bar{X}} = \mu = 1550$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{225}{\sqrt{30}} = 41.079$

b) Let $\bar{X}$ be the mean rent paid by a sample of 100 tenants. Then the mean and standard
deviation of $\bar{X}$ are
$\mu_{\bar{X}} = \mu = 1550$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{225}{\sqrt{100}} = 22.5$




7.5 Application of the sampling distribution of $\bar{X}$

Example 7.6: Assume that the weights of all packages of a certain brand of cookies are
normally distributed with a mean of 32 ounces and a standard deviation of 0.3 ounce.
Find the probability that the mean weight, $\bar{X}$, of a random sample of 20 packages of this
brand of cookies will be between 31.8 and 31.9 ounces.

Solution:
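
A numerical check of this probability can be sketched as follows, using `statistics.NormalDist` from the Python standard library; the variable names are illustrative. Because the population is normal, the sample mean is treated as normal with mean 32 and standard deviation 0.3 divided by the square root of 20.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 32.0, 0.3, 20  # values given in Example 7.6

# The population is normal, so the sample mean is normal with
# mean mu and standard deviation sigma / sqrt(n).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

prob = xbar_dist.cdf(31.9) - xbar_dist.cdf(31.8)
print(f"sd of sample mean = {xbar_dist.stdev:.4f}")
print(f"P(31.8 < sample mean < 31.9) = {prob:.4f}")
```

An answer obtained from a printed standard normal table may differ slightly in the last decimal place, because z-values are rounded to two decimals before the table lookup.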





































Example 7.7: According to CardWeb, consumers in the United States owed an average
of $7868 on their credit cards in 2004. Suppose the shape of the probability distribution
of the current credit card debts of all consumers in the United States is unknown but its
mean is $7868 and the standard deviation is $2160. Let $\bar{x}$ be the mean credit card debt
of a random sample of 81 US consumers.
a) What is the probability that the mean of the current credit card debts for this
sample is within $440 of the population mean?
b) What is the probability that the mean of the current credit card debts for this
sample is lower than the population mean by $320 or more?

Solution:
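
Both parts can be checked numerically with the sketch below, which relies on the central limit theorem because the sample size of 81 is large. It uses `statistics.NormalDist` from the Python standard library; the names `prob_a` and `prob_b` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 7868.0, 2160.0, 81  # values given in Example 7.7

# n = 81 is large, so by the central limit theorem the sample mean is
# approximately normal with mean mu and standard deviation sigma / sqrt(n).
xbar_dist = NormalDist(mu, sigma / sqrt(n))

# (a) sample mean within $440 of the population mean
prob_a = xbar_dist.cdf(mu + 440) - xbar_dist.cdf(mu - 440)

# (b) sample mean lower than the population mean by $320 or more
prob_b = xbar_dist.cdf(mu - 320)

print(f"sd of sample mean = {xbar_dist.stdev:.0f}")
print(f"(a) {prob_a:.4f}")
print(f"(b) {prob_b:.4f}")
```

Part (a) is a two-sided interval around the population mean, while part (b) is a single lower tail, which is why only one call to `cdf` is needed there.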































7.6 Population and Sample Proportions

- The population and sample proportions, denoted by $p$ and $\hat{p}$, respectively, are
calculated as
$p = \frac{X}{N}$ and $\hat{p} = \frac{x}{n}$,
where
N = total number of elements in the population
n = total number of elements in the sample
X = number of elements in the population that possess a specific characteristic
x = number of elements in the sample that possess a specific characteristic.

Example 7.8: Suppose a total of 789,654 families live in a city and 563,282 of them own
homes. A sample of 240 families is selected from this city, and 158 of them own homes.
Find the proportion of families who own homes in the population and in the sample.

Solution:
N = population size = 789,654
X = families in the population who own homes = 563,282
The proportion of all families in this city who own homes is
$p = \frac{X}{N} = \frac{563282}{789654} = 0.71$

Now, suppose a sample of 240 families is taken from this city and 158 of them are
homeowners. Then,
n = sample size = 240
x = families in the sample who own homes = 158

The sample proportion is
$\hat{p} = \frac{x}{n} = \frac{158}{240} = 0.66$


As in the case of the mean, the difference between the sample proportion and the
corresponding population proportion gives the sampling error, assuming that the sample
is random and no nonsampling error has been made. That is, in the case of the proportion,
Sampling error = $\hat{p} - p$.

For Example 7.8,
Sampling error = 0.66 - 0.71 = -0.05



7.7 Mean, Standard Deviation and Shape of the sampling distribution of $\hat{p}$

- Sampling distribution of the sample proportion, $\hat{p}$
- The probability distribution of $\hat{p}$ is called its sampling distribution. It gives the
various values that $\hat{p}$ can assume and their probabilities.


Example 7.9: Boe Consultant Associates has five employees. The following table gives
the names of these five employees and information concerning their knowledge of
statistics.

Name Knows Statistics
Ally, A yes
John, B no
Susan, C no
Lee, D yes
Tom, E yes


Solution:
If we define the population proportion, p, as the proportion of employees who know
statistics, then $p = 3/5 = 0.6$.

Now, suppose we draw all possible samples of three employees each and, for each
sample, we compute the proportion of employees who know statistics.

All possible samples of size 3 and the values of $\hat{p}$ for each sample
Sample   $\hat{p}$
ABC      1/3
ABD      2/3
ABE      2/3
ACD      2/3
ACE      2/3
ADE      3/3
BCD      1/3
BCE      1/3
BDE      2/3
CDE      2/3


Sampling distribution of $\hat{p}$ when the sample size is 3
$\hat{p}$   $P(\hat{p})$
0.33 3/10=0.3
0.67 6/10=0.6
1.00 1/10=0.1
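
The table for Example 7.9 can be reproduced by enumerating the samples, as in the sketch below. The dictionary `knows` encodes the yes/no column of the employee table, and all names are illustrative. The last lines also compute the mean of the sampling distribution so that it can be compared with the population proportion of 3/5.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# True means the employee knows statistics (table in Example 7.9).
knows = {"A": True, "B": False, "C": False, "D": True, "E": True}

n = 3
samples = list(combinations(knows, n))

# Sample proportion for each sample, kept as an exact fraction.
p_hats = [Fraction(sum(knows[k] for k in s), n) for s in samples]

dist = {p: Fraction(c, len(samples)) for p, c in sorted(Counter(p_hats).items())}
for p_hat, prob in dist.items():
    print(f"p_hat = {float(p_hat):.2f}  P = {float(prob):.1f}")

# Mean of the sampling distribution, to compare with p = 3/5.
mean_p_hat = sum(p * prob for p, prob in dist.items())
print("mean of p_hat =", float(mean_p_hat))
```

The computed mean equals the population proportion exactly, which illustrates the first property stated in this section.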


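- The mean of the sampling distribution of $\hat{p}$ is always equal to the population
proportion. Thus, $\mu_{\hat{p}} = p$.
- The standard deviation of $\hat{p}$ is given by

$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$, if $\frac{n}{N} \le 0.05$

and

$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \sqrt{\frac{N-n}{N-1}}$, if $\frac{n}{N} > 0.05$

where $q = 1 - p$.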


Central Limit Theorem for Sample Proportion:
- According to the central limit theorem, the sampling distribution of $\hat{p}$ is
approximately normal for a sufficiently large sample size. In the case of a
proportion, the sample size is considered to be large if np and nq are both greater
than 5, that is, if np > 5 and nq > 5.

- That means, if np > 5 and nq > 5,
$\hat{p} \sim N\left(\mu_{\hat{p}} = p,\ \sigma_{\hat{p}}^2 = \frac{pq}{n}\right)$, approximately.


Example 7.10: According to a survey by Conference Board, 50% of Americans are
satisfied with their jobs. Assume that this result is true for the current population of
Americans. Let $\hat{p}$ be the proportion of Americans in a random sample of 1000 who are
satisfied with their jobs. Find the mean and standard deviation of $\hat{p}$ and describe the
shape of its sampling distribution.

Solution:
Let p be the proportion of all Americans who are satisfied with their jobs. Then,
$p = 0.5$, $q = 0.5$

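The mean of the sampling distribution of $\hat{p}$ is
$\mu_{\hat{p}} = p = 0.5$

The standard deviation of $\hat{p}$ is
$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.5 \times 0.5}{1000}} = 0.0158$

Because np = 1000(0.5) = 500 and nq = 1000(0.5) = 500 are both greater than 5, the
sampling distribution of $\hat{p}$ is approximately normal.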
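
The arithmetic in Example 7.10, together with the large-sample check np > 5 and nq > 5, can be sketched in a few lines. The variable names are illustrative, and the sketch assumes that the sample is a small fraction of the population, so no finite population correction factor is applied.

```python
from math import sqrt

p, n = 0.5, 1000  # values given in Example 7.10
q = 1 - p

mean_p_hat = p              # mean of the sampling distribution
sd_p_hat = sqrt(p * q / n)  # no correction factor: n/N is assumed <= 0.05

# Rule of thumb from this section: approximately normal if np > 5 and nq > 5.
approx_normal = n * p > 5 and n * q > 5

print(f"mean = {mean_p_hat}, sd = {sd_p_hat:.4f}, "
      f"np = {n * p:.0f}, nq = {n * q:.0f}, approx. normal: {approx_normal}")
```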




7.8 Applications of the Sampling Distribution of $\hat{p}$

When we conduct a study, we usually take only one sample and make all decisions or
inferences on the basis of the results of that one sample. We use the concepts of the mean,
standard deviation, and shape of the sampling distribution of $\hat{p}$ to determine the
probability that the value of $\hat{p}$ computed from one sample falls within a given interval.

Example 7.11: According to an Associated Press poll, circumstances such as income,
education, and marital status affect whether or not Americans feel satisfied with their
lives. In this poll conducted during August 16-18, 2004, 38% of adult Americans said
that they were very satisfied with the way things were going in their lives at that time.
Suppose this result is true for the current population of adult Americans. Let $\hat{p}$ be the
proportion in a random sample of 1000 adult Americans who will say that they are very
satisfied with the way things are going in their lives at this time. Find the probability that
the value of $\hat{p}$ is between 0.40 and 0.42.

Solution:
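
A numerical check of this probability is sketched below, using the normal approximation for the sample proportion from Section 7.7 and `statistics.NormalDist` from the Python standard library. The variable names are illustrative, and no finite population correction factor is applied because the sample is a tiny fraction of the population.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.38, 1000  # values given in Example 7.11
q = 1 - p

# np = 380 and nq = 620 are both greater than 5, so the sample proportion
# is treated as approximately normal with mean p and sd sqrt(pq / n).
p_hat_dist = NormalDist(p, sqrt(p * q / n))

prob = p_hat_dist.cdf(0.42) - p_hat_dist.cdf(0.40)
print(f"sd of sample proportion = {p_hat_dist.stdev:.4f}")
print(f"P(0.40 < sample proportion < 0.42) = {prob:.4f}")
```

A table-based answer may differ slightly in the last decimal place because z-values are rounded to two decimals before the lookup.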
