
The Sampling Distribution of the Mean

Note: A prerequisite for this chapter is the chapter on the normal distribution.

Sampling and the Sampling Error

Inferential statistics is about drawing conclusions about population parameters from
sample statistics. We will concentrate on inferring the population mean from the sample
mean. When we take a random sample from a population and calculate the mean of this
sample, it will almost never be exactly equal to the true population mean. The difference
between the sample mean and the true population mean is called the sampling error. The
sampling error is purely due to chance (the randomness involved in picking the sample).
We now wish to address an important question: how variable can the mean of a sample be?

The Sampling Distribution of the Mean

If we are given a normal distribution, we understand how to calculate the probability that
a value picked at random lies between two limits (ref …). The section on the normal
distribution gave several examples of such calculations. This section shows how this is
relevant to questions like the ones asked in Section ___. We will see that given
observations that do not necessarily follow a normal distribution, when it comes to
answering questions about the means of samples we can still use the theory of the normal
distribution.

This is possible because of an important result in probability theory called the Central
Limit Theorem. The theorem is easy enough to understand and what it says is the
following:

Given a population which has an arbitrary distribution (it need not be
normal - it could be skewed arbitrarily), suppose we take all possible
samples of a certain fixed size, and look at the means of all the
samples. Then, the means are (approximately) normally distributed.*

One point needs to be made: the size of the sample should not be small. In practice, the
sampling distribution for samples of size 30 or so will provide an excellent
approximation to a true normal distribution.
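
The theorem can be checked informally by simulation. The following is a minimal sketch, not part of the original text: it assumes an exponential population (chosen here only because it is strongly skewed), draws many samples of size 30, and compares the spread of the sample means with the familiar 68% and 95% figures for a normal distribution. The population, the seed and the number of samples are all illustrative assumptions.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable (arbitrary choice)

SAMPLE_SIZE = 30      # the "30 or so" mentioned in the text
NUM_SAMPLES = 5000    # illustrative; taking "all possible samples" is not feasible


def sample_mean(n):
    """Mean of one random sample of size n from a skewed (exponential) population."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


means = [sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

centre = statistics.fmean(means)
spread = statistics.stdev(means)

# For a normal distribution about 68% of values lie within 1 standard
# deviation of the mean and about 95% within 2.
within_1 = sum(abs(m - centre) <= spread for m in means) / NUM_SAMPLES
within_2 = sum(abs(m - centre) <= 2 * spread for m in means) / NUM_SAMPLES

print(f"within 1 sd: {within_1:.3f}, within 2 sd: {within_2:.3f}")
```

If the two printed fractions come out close to 0.68 and 0.95, the sample means are behaving much like a normal distribution even though the individual observations are far from normal.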

Definition

The sampling distribution of the mean is the distribution of the means of samples of a
fixed size taken from a population.

* The actual statement of the Central Limit Theorem is of course more involved, but this is
basically what it does say. See ref…

Since we may assume that the sampling distribution of the mean is a normal distribution,
it is natural to ask what the mean and the standard deviation of this distribution are. The
answer is given by the following result:

For a population with mean µ and standard deviation σ, the sampling
distribution of the mean for samples of size n has mean equal to µ and
standard deviation $\sigma / \sqrt{n}$.

Definition

The standard deviation of the sampling distribution is called the standard error. It is
denoted $\sigma_{\bar{x}}$ and is given by the formula

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Notice that the standard error decreases as the size of the sample increases and is
proportional to the standard deviation of the original population. This makes sense if you
think about it: what the standard error measures is the variability of the sample means. If
you take samples of a very large size, the means vary less from one sample to the next.
And if the original population has a large variability, the variability between the sample
means is correspondingly larger.
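
The behaviour described above can be tabulated directly. This is a small illustrative sketch: the function name `standard_error` is hypothetical, and the value σ = 40 is borrowed from Example 1 purely for the demonstration.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)


sigma = 40  # population standard deviation (assumed value, as in Example 1)
for n in (1, 4, 25, 100):
    print(f"n = {n:3d}  standard error = {standard_error(sigma, n):.1f}")
```

Because of the square root, quadrupling the sample size only halves the standard error.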

z values for the sampling distribution are defined as before. The only difference is that
the variable whose distribution we are considering is $\bar{x}$ and the standard deviation
of this distribution is $\sigma_{\bar{x}}$. So the z value for the sampling distribution is
given by

$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$

All this is relevant because in practice we are really interested in saying something about
the mean of a sample. The example below should illustrate.

Example 1. Assume that we know that the population of hypertensive people has a mean
diastolic value of 120 and a standard deviation of 40. What is the probability of finding a
random sample of 25 people with a mean diastolic value less than 100?

If we had taken all possible samples of size 25 from this population, we know that the
means of these samples form a normal distribution with mean 120 and standard deviation
8.

How did we get the 8? The standard deviation of our hypothetical sampling distribution
(the standard error $\sigma_{\bar{x}}$) is

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{40}{\sqrt{25}} = \frac{40}{5} = 8$$

Our problem has now reduced to a question that we know how to answer: Given a normal
distribution with mean 120 and standard deviation of 8, what is the proportion of cases
less than 100? The z value is -2.5.

How? The z value is given by

$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{100 - 120}{8} = \frac{-20}{8} = -2.5$$

What is the area to the left of z = -2.5 for the standard normal distribution? Looking up
the normal table shows that the answer is 0.0062 (Check this!). So the probability of
finding a random sample of 25 people with a mean diastolic value less than 100 is 0.62%.
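
The table lookup can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `prob_sample_mean_below` is hypothetical, and the calculation assumes, as the text does, that the sampling distribution may be treated as normal.

```python
import math
from statistics import NormalDist


def prob_sample_mean_below(limit, mu, sigma, n):
    """Return (z, P(sample mean < limit)), treating the sampling distribution as normal."""
    std_error = sigma / math.sqrt(n)      # standard error of the mean
    z = (limit - mu) / std_error          # z value for the sampling distribution
    return z, NormalDist().cdf(z)         # area to the left of z


z, p = prob_sample_mean_below(limit=100, mu=120, sigma=40, n=25)
print(f"z = {z:.2f}, probability = {p:.4f}")   # z = -2.50, probability = 0.0062
```

For a question of the form "more than" rather than "less than", the required area lies to the right of z, so the probability is one minus the value returned here.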

Exercise. A population of hypertensive people has a mean diastolic value of 120 and a
standard deviation of 40. Find the probability that a random sample of 30 hypertensive
people has a mean diastolic value less than 110.
(z is -1.37. The probability is 0.0853)

Exercise. A population of asthmatics is known to have a mean PFR of 100 with a
standard deviation of 40. What is the probability that a sample of 10 asthmatics has a
mean PFR more than 150?
(z is 3.95. The probability is less than 0.0005)

Looking Ahead

The reader will have noticed that in the examples above we knew the values of µ and σ
for the population. In practice this is almost never true! When the population mean µ is
not known the solution is simple. We know from statistical theory that the sample mean
$\bar{x}$ is a good estimator for µ (see the next section on Estimation). So when the value
of µ is not known we simply use the sample mean $\bar{x}$ in its place. From this one might
guess that one could do the same with the standard deviation - if σ is not known, use the
sample standard deviation s in its place. But while s is a good estimator for σ, the
procedure we used to calculate probabilities above needs a small modification when the
sample size is small (less than 30 or so) - if we use s in place of σ we have to use a
modification of the normal distribution called the t distribution.
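
Why a modification is needed can be seen in a rough simulation. The sketch below is an illustration only, not the method of the next chapter: it assumes a normal population with µ = 0 and σ = 1, replaces σ by s in the z formula, and counts how often the resulting ratio falls outside ±1.96, which would happen about 5% of the time if the ratio were exactly standard normal. The function name, seed, sample sizes and number of trials are arbitrary choices.

```python
import math
import random
import statistics

random.seed(2)  # arbitrary seed for repeatability


def fraction_beyond(n, trials=5000, cutoff=1.96, mu=0.0, sigma=1.0):
    """Fraction of samples whose (x_bar - mu) / (s / sqrt(n)) lies outside +/- cutoff."""
    count = 0
    for _ in range(trials):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        s = statistics.stdev(sample)               # sample standard deviation
        ratio = (x_bar - mu) / (s / math.sqrt(n))  # z formula with s in place of sigma
        if abs(ratio) > cutoff:
            count += 1
    return count / trials


small = fraction_beyond(n=5)
large = fraction_beyond(n=60)
print(f"n = 5: {small:.3f}   n = 60: {large:.3f}   (normal theory predicts about 0.050)")
```

With the small sample size the fraction comes out noticeably larger than 0.05, while with the larger sample it is close to it. That extra spread in small samples is what the t distribution accounts for.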

But before we do this, we look at the basic idea of estimation.

Review Questions

• What is sampling error?

• What is standard error?
