

Case Study #1: Postponement of Death Theory



Introduction

If an event of particular significance is upcoming in one's life, is it within an elderly person's ability to actually postpone death until the event has passed? For instance, each birthday can hold greater and greater significance for the aged. We often have special celebrations for those who are fortunate enough to live to 80, 85, 90, even 100 years. Is it possible that the anticipation of such celebrations can be incorporated into a will to live for those who are very old? This
postponement of death theory is the context for our first case study. Several data sets will be
explored in this first case, but consider the following scenario for our first inquiry:

Case Question 1A

During the last week of May 2014, there were 18 published obituaries in the
Lufkin/Nacogdoches newspapers for persons who were over 70 years old when they died. If
there is any true significance attached to the postponement of death theory, then one would
expect an increased rate of death shortly after a significant date such as a birthday. Of the 18
obituaries, 6 people died within three months after having a birthday. Is the fraction observed in
this sample suggestive of the postponement theory?


Population and Sample

During the first few class meetings of the semester we discussed that the words population and
sample are key in statistical studies. For Case Question 1A,

- The population of interest is taken to be all people in the Lufkin/Nacogdoches area that
were over 70 years old when they died.

- The sample consists of 18 published obituaries in the two main local newspapers for one
week in late May. All 18 people in the sample were over 70 years old when they died.

Recall, to make sound statistical decisions based on collected data, our sample needs to be as
representative of the population of interest as possible. Discuss and think on the following
questions:

- Is it believable that the collected sample is a random sample?
- Do you think the sample collected is representative of the population of interest listed
above?
- What difficulties might we face in collecting a random sample? Could the population be
sampled in a different way that might be better than what was done in Case Question 1A?

You may very well have doubts about the representative nature of the collected sample. Despite
this, we often are faced with having to work with samples such as this one under the umbrella of
disclaimers. We issue such a disclaimer or assumption now: the following analysis will
assume the 18 published obituaries are representative of the targeted population. If this
assumption could be proven to be grossly untrue, then any statistical significance that we may
attach to the results could be in question.

It is important to question the representative nature of samples. It is also important before data
collection to consider the best possible way that we could reasonably sample our population.
Despite such considerations, statisticians often have to work with less than perfect samples. This
is just a realistic feature of data analysis. So, caution is advised when working with samples that
could be argued to be unrepresentative. In the current case, we may have some doubts, but they
may not be so grave as to nullify all that follows.

Random Variable, Parameter and Statistic of Interest

Go back and carefully read how the data is presented in Case Question 1A. What we know
about the 18 people that died is whether or not they died within three months after having a
birthday. Six people passed away during the three months which followed a birthday. The other
12 did not. The information that each person contributes to the sample is essentially either a "yes" or a "no":

- Yes (Success): The person did die during the three months which followed a birthday.
- No (Failure): The person did NOT die during the three months which followed a
birthday.

Data such as this is said to have come from a Bernoulli trial. Bernoulli trials are investigations in which the resulting data has only two possible outcomes. We call the two possible outcomes "success" and "failure" even though there is no positive or negative connotation attached to those two words. Bernoulli trials result in 1s and 0s. The 1s are attached to the outcome "success" and the 0s are attached to the "failures." Label the success as the feature you are interested in studying. In this case, a success will be associated with death during the three months following a birthday. Data recorded as yes/no, up/down, left/right, on/off, in/out, etc. are examples of Bernoulli trials.

We have 18 Bernoulli trials in Case Question 1A. Imagine that each person in the sample was assigned either a 1 or a 0 based on whether or not they were a "successful" Bernoulli trial. This type of assignment is an example of a random variable.

- A random variable is an assignment (one that obeys the laws of what in mathematics we call a function) in which each experimental result is assigned a meaningful number.


For instance, one person from the list of obituaries died on May 29, 2014, and they had just had a birthday on May 13. This person is assigned a 1 (a success). Random variables are often given notation such as X, Y or Z. Sometimes, if the same random variable is observed multiple times, we use subscripts in our notation, such as X1, X2, X3, etc. For us, let's denote our random variable of interest as X, where X is either a 1 or a 0 based on whether the person from the list of obituaries was assigned a success or a failure when considering them as a Bernoulli trial. We have 18 obituaries, so our random variables will be denoted X1, X2, ..., X18. Specifically,

Xi = 1 if the i-th person is a "success"
Xi = 0 if the i-th person is a "failure"

for i = 1, 2, ..., 18. Notice that our random variable has a finite number of possible outcomes (only
two). Random variables that have a finite or countably infinite number of possible outcomes are
called discrete. Some random variables have an uncountably infinite number of possible
outcomes. These types of random variables are called continuous. We will contrast discrete and
continuous random variables more later, but for now, the important thing is that you know the
definition of each type.
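The 0/1 coding just described is easy to make concrete. Below is a minimal Python sketch; the ordering of 1s and 0s is invented for illustration (only the totals, six successes among 18 trials, come from Case Question 1A):

```python
# Hypothetical 0/1 coding of the 18 obituaries: a 1 marks a death within
# three months after a birthday (a "success"), a 0 marks all other deaths.
# The ordering below is made up for illustration; only the total of six
# successes is given in Case Question 1A.
x = [1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0]

n = len(x)           # number of Bernoulli trials
successes = sum(x)   # total number of 1s
print(n, successes)  # -> 18 6
```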

Recall from the first week of class that parameters are numerical characteristics of populations
while statistics are numerical characteristics of samples. In our population of all people in the
Lufkin/Nacogdoches area that were over 70 years old when they died, the parameter of interest is
the proportion of people that died during the three months which followed a birthday. This
proportion is of extreme importance. All the statistical analysis contained in Case Study #1 is
associated with a population proportion. Do not forget this. We will denote the population
proportion by p:

Let p = the proportion of all people in the Lufkin/Nacogdoches area that were over 70 years old
and died during the three months which followed a birthday.

Notice how specifically the parameter is defined. This is important. Loosely describing
parameters and statistics can lead to confusion and improper interpretation. You should strive in
your own problem solving to always list the parameter of interest and to very specifically
describe it in writing.

A statistic is a feature of a sample. In our sample, we know the proportion of people that died
during the three months which followed a birthday. This value was given to us in the statement
of the Case Question.

Let p̂ = the proportion of people in our collected sample that were over 70 years old and died during the three months which followed a birthday.

Do not read beyond this point until you see the difference between p and p̂. The value of p is unknown. Despite this, we will try to reach a conclusion about p. We will use the value of p̂ (which is known) to infer something about p. The statistic p̂ is called the sample proportion. In other words, we will use the statistic (p̂) to estimate the parameter (p) and ultimately, we will use other features of the statistic to formulate a final conclusion about p. This process of using a statistic and its features to draw a conclusion about a corresponding parameter is called statistical inference. In Case Study #1, we want to make a statistical inference about a population proportion, p.


Hypotheses To Be Tested

So, just what conclusion are we trying to reach about p in Case Question 1A? Well, we are
trying to see if the postponement of death theory is substantiated by our data! That's our main
inquiry. We should believe this theory is substantiated ONLY if our data indicates it. We
should place the burden of proof on the data and begin our investigation with the hypothesis that
the postponement of death theory is not suggested by the data. If an investigator is trying to
establish the relevance of the theory, he or she should NOT start off assuming the theory is true.
That would be terrible science. Instead, we should test the theory by taking data and then letting
the data tell us if the weight of evidence is so large that the theory is believable.

Now, if the postponement theory isn't suggested by the data, then it would make sense that the deaths occur randomly throughout the year. Since we are looking at the three months after a birthday, and three months is one quarter (or 25%) of a year, it makes sense that if the postponement theory isn't correct for our population, then p = .25. That is, if the postponement theory isn't correct for our population, the fraction of deaths in the three months following a birthday is 25%. This is called the null hypothesis in our statistical investigation. We denote it like this:

H0: p = .25 (the population proportion is 25%)

On the other hand, if the postponement theory is substantiated, then this would mean that it
would be more likely for a death to occur right after a birthday. Specifically, if the
postponement theory is relevant for our population, then we would expect more than 25% of
deaths to occur in the three months following a birthday. Proponents of the postponement theory
want us to believe this. The burden of proof is on them to show that the data suggests that their
hypothesis is indeed correct. We will call the hypothesis that is trying to be substantiated the
alternative hypothesis. We denote it like this:

HA: p > .25 (the population proportion is greater than 25%)

Test Statistic and Sampling Distribution

We've previously noted that we want to use the statistic p̂ in order to make a statistical inference about p. We want to use p̂ in order to test H0 vs. HA. Now, from the information given in Case Question 1A, we can easily calculate that p̂ = 6/18. Of course, to the nearest percentage point, this means that p̂ = 33%. The question is this: Is the fact that p̂ = 33% sufficient evidence to reject H0 and claim that the postponement theory is substantiated? Should we reject H0 and decide to believe HA on the basis of the statement that p̂ = 33%?

Well, it is true that p̂ > .25. But, 33% may not seem much larger than 25%. Additionally, p̂ was calculated on the basis of only 18 deaths. So, we have SOME evidence for HA on the basis that p̂ > .25. But, is this evidence convincing enough to reject H0 and claim that the postponement theory is substantiated? If so, we would call the evidence presented by the sample
statistically significant. We should immediately state that just because evidence is statistically
significant does not mean that the results are biologically significant or environmentally
significant or psychologically significant, etc. We will discuss this point more later. Suffice it to say that the statistical significance or non-significance of a result should always be cross-examined with the significance from other perspectives. But this is a course in statistics, so we are learning what it means for results to be statistically significant.

In order to know whether or not p̂ = 6/18 is a statistically significant result that is indicative of HA, we must ask ourselves two very important questions:

- What other values of p̂ could we have obtained in other potential samples?
- How likely are these values of p̂ if the null hypothesis is true?

In particular, we need to know just how likely it is to observe p̂ = 6/18 if the postponement theory is not true for our population. If we could calculate that p̂ = 6/18 is a very likely occurrence under the assumption that the postponement theory is not true, then our sample has indicated evidence to retain H0. That is, we wouldn't reject it based on the observed data, and the results gathered from the sample are not statistically significant in regards to the postponement theory.

However, what would it mean if we were to calculate that p̂ = 6/18 is very unlikely if H0 is presumed correct? It would mean that p̂ = 6/18 would be a very rare event to observe if, in fact, H0 is legitimate. This would make us wonder why, if H0 is correct, we saw a rare occurrence. This rare occurrence would cast doubt on the truth of H0 and would ultimately lead to H0's rejection in favor of HA.

The above paragraphs are motivation for what statisticians call a p-value. The concept of p-value is one of the most important ideas in all of statistical science. We will repeatedly use this concept and calculate p-values. Your understanding of them will round into form as we progress through more and more case studies. For now, reflect on the logic of the above paragraphs and the following definition.

p-value: The chance of observing the value of the statistic from your sample (or one more
extreme) if, in fact, the null hypothesis is true.

Again, this definition and associated calculation will be reinforced all throughout the course.
Learn the definition now as your instructor will surely utilize the concept of p-value repeatedly.

If we answer the questions in the two bullets above, then we will have all the information necessary to calculate a p-value. Once we have calculated a p-value, we will only be one step away from making a conclusion about Case Question 1A. Also, if we answer the two questions posed in the bullets, we will be able to construct what is called a sampling distribution. Sampling distributions can be used for hypothesis tests as well as other statistical procedures that we will see in subsequent cases. In fact, one really can't do much of any statistical inference about population parameters without sampling distributions.

sampling distribution: a description or list of the possible outcomes of a statistic along with the likelihoods of these outcomes.

Sampling distributions describe the behavior of a statistic across repeated samples. In general,
we will only collect one sample. But, knowledge of the sampling distribution will allow us to
compare our sample with the other possible samples that we could have seen! This is powerful.
This is what will allow us to know whether we've seen a statistically significant result by comparing our statistic to other theoretical statistics that we might have seen. In this way, we will know whether what we observed is rare or commonplace.

Go back and look at the boxed definition of sampling distribution above. For our current problem, we could already complete one part of the sampling distribution for p̂. We could certainly at this point make a description or list of the possible values of p̂. They are:

0/18, 1/18, 2/18, 3/18, 4/18, 5/18, 6/18, 7/18, 8/18, 9/18, 10/18, 11/18, 12/18, 13/18, 14/18, 15/18, 16/18, 17/18, 18/18.

The sampling distribution for p̂ will be complete if we can calculate the likelihoods of these
outcomes. Admittedly, calculating these likelihoods is the more challenging part of
constructing any sampling distribution.
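Getting ahead of ourselves slightly, here is a Python sketch of that completed sampling distribution; it uses the binomial mass function that the next section develops, so the formula inside the loop is a preview, not yet explained:

```python
from math import comb

n, p = 18, 0.25  # sample size and the null-hypothesis proportion

# Likelihood of each possible value of p-hat = y/18 under the null hypothesis,
# computed with the binomial mass function introduced in the next section.
sampling_dist = {y: comb(n, y) * p**y * (1 - p)**(n - y) for y in range(n + 1)}

for y in (0, 6, 18):
    # y = 6 (our observed count) has probability about 0.1436;
    # y = 18 is so unlikely it prints as 0.0000 at four decimals.
    print(f"p-hat = {y}/18: probability {sampling_dist[y]:.4f}")
```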

The Binomial Distribution and Elementary Probability Ideas

Recall our random variable:

Xi = 1 if the i-th person is a "success"
Xi = 0 if the i-th person is a "failure".

Our statistic p̂ can be written p̂ = (X1 + X2 + ... + Xn)/n, where we know that the total number of Bernoulli trials is n = 18. The information provided in Case Question 1A gave us the observed value of X1 + X2 + ... + X18, which is 6. That is, we know the total number of successes is six in our sample of n = 18 total trials.

A set of Bernoulli trials is called independent if the outcome of any one trial (success or failure) doesn't alter the likelihood of success or failure on any other trial. It seems appropriate to label our 18 Bernoulli trials as independent when considering that the date of one person's death doesn't generally alter the date of someone else's death. Now, there are unfortunate situations where there are simultaneous deaths of several people due to an accident of some sort, but there was no indication of this among the 18 obituaries in the sample.

A set of Bernoulli Trials is said to have come from a binomial experiment if
- There are n independent Bernoulli trials all of which have the same likelihood of
resulting in success
- The experimenter has interest in the total number of successes among the n trials.

The data collected from the obituaries can reasonably be assumed to have come from a Binomial
experiment:

- We have n = 18 independent Bernoulli trials. The time of each death, which determines whether the trial is classified as a success or failure, can reasonably be assumed to be independent from person to person. If the null hypothesis is true, then the chance of each trial resulting in a success can be claimed to be p = .25. This likelihood is reasonable to apply to each trial.
- We are interested in the total number of successes, since our statistic, p̂ = (X1 + X2 + ... + Xn)/n, utilizes this total.

Let Y = X1 + X2 + ... + Xn. Then, from the above argument, we state that Y is a binomial random variable with parameters n = 18 and p = .25. A binomial random variable counts the total number of successes in a binomial experiment. The probabilities associated with binomial random variables can be obtained from the formula for the binomial mass function. All discrete random variables have mass functions, and it is the job of a mass function to provide the probabilities associated with all possible outcomes of the random variable. The formula for the binomial mass function is

[n! / (y!(n - y)!)] p^y (1 - p)^(n - y).

This formula gives the chance of exactly y successes among the n trials. So, if Y is a binomial random variable with parameters n and p, then we denote the probability of exactly y successes by the notation P(Y = y). Here, the capital P denotes probability, the capital Y is the random variable of interest, and the lowercase y is the particular numerical outcome of the random variable that is relevant in the calculation. The exclamation point (!) indicates the operation of factorial, which is simply the successive product of all integers from the value preceding the symbol down to 1. For instance, 6! = (6)(5)(4)(3)(2)(1) = 720.

Applying the binomial mass function to our case study, suppose Y is a binomial random variable with parameters n = 18 and p = .25. Then, the chance of exactly 6 successes among the 18 trials is

P(Y = 6) = [18! / (6! 12!)] (.25)^6 (.75)^12 ≈ .1436.


Since the binomial mass function is quite prevalent in probability and statistics, it is a good idea to get used to making calculations by hand using the formula. However, after a bit of practice, the popular software Excel (as well as other computer programs) can make the calculations more expeditious. The binomial mass function in Excel can be invoked by clicking in any particular cell in the spreadsheet and then typing the following command:

=BINOM.DIST(y, n, p, FALSE)

For instance, clicking in a cell in Excel and typing =BINOM.DIST(6, 18, .25, FALSE) and then hitting the Enter key will produce the value .143564 in the cell. Replacing the word FALSE with the word TRUE will calculate the cumulative probability P(Y ≤ y), rather than the individual probability P(Y = y). Try it for our case study and you'll see that P(Y ≤ 6) = .861015.
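For readers working outside Excel, the same two numbers can be reproduced in Python. The helper `binom_dist` below is a hypothetical stand-in that mimics the FALSE (individual) and TRUE (cumulative) modes of BINOM.DIST; it is not a library function:

```python
from math import comb

def binom_dist(y, n, p, cumulative):
    """Mimic Excel's BINOM.DIST: P(Y = y) when cumulative is False,
    and the cumulative probability P(Y <= y) when cumulative is True."""
    pmf = lambda k: comb(n, k) * p**k * (1 - p)**(n - k)
    return sum(pmf(k) for k in range(y + 1)) if cumulative else pmf(y)

print(round(binom_dist(6, 18, 0.25, False), 6))  # -> 0.143564
print(round(binom_dist(6, 18, 0.25, True), 6))   # -> 0.861015
```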

Looking back at the binomial mass function, we can see that it is made up of three parts:

- n! / (y!(n - y)!) counts the number of ways in which we could observe y successes among n trials. Often, we use the symbol C(n, y) to denote n! / (y!(n - y)!), and this symbol is read "n choose y." If you have n distinguishable objects, and you want to choose y of them to put in a set, then the number of ways to do this is n choose y.
- The second piece of the binomial mass function is p^y, and it incorporates the fact that we need exactly y successes, each of which has probability p.
- The final piece of the binomial mass function is (1 - p)^(n - y), and it incorporates the fact that if we need exactly y successes, then this means that there are n - y failures. Also, since the chance of a success is p, then the chance of failure is 1 - p.

This last point introduces the important complement rule of probability. For our relevant binomial random variable, notice that the list of possible outcomes for Y is {0, 1, 2, ..., 18}. We will denote this complete list of possibilities as S and write S = {0, 1, 2, ..., 18}. The calculation P(Y ≤ 6) = .861015 only involves a subset of S; namely, E = {0, 1, 2, 3, 4, 5, 6}. The set E is an example of what is called an event. The complement rule of probability states that for any event E, we have that P(not E) = 1 - P(E).

In our example, not E consists of the following values:

not E = {7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18}.


The chance of this event, by the complement rule, is 1 - .861015 = .138985. Clearly, another way to write not E is Y ≥ 7. So, the complement rule tells us that P(Y ≥ 7) ≈ .1390. Notice, for future use, that it would seem to make sense that P(Y ≥ 6) = P(Y = 6) + P(Y ≥ 7). That is, to get P(Y ≥ 6), just add in the one additional probability to the calculation already made. Doing this would give us the result that P(Y ≥ 6) ≈ .1436 + .1390 = .2826. The legitimacy of this calculation stems from the axioms of probability, of which there are three. The mathematical subject of probability, like geometry, is based on a set of axioms which are regarded as universal truths needing no proof. Many scientific theories or branches of science begin with the scientific community accepting some set of axioms. If you accept the axioms, you will accept the theorems and results which stem from them. If you don't accept the axioms, then the entire branch of science may be held in question. Deciding on axioms has historically been challenging, controversial and time consuming. Scientists didn't generally come to an agreement on the probability axioms until 1933.
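That arithmetic is easy to check numerically. One caveat: the exact tail probability is about .28255; the figure .2826 above comes from adding the two individually rounded values. The helper `binom_pmf` below is a throwaway sketch, not a library function:

```python
from math import comb

def binom_pmf(y, n=18, p=0.25):
    # Probability of exactly y successes in n trials (binomial mass function)
    return comb(n, y) * p**y * (1 - p)**(n - y)

p_at_most_6 = sum(binom_pmf(y) for y in range(7))   # P(Y <= 6) = .861015
p_at_least_7 = 1 - p_at_most_6                      # complement rule: P(Y >= 7)
p_at_least_6 = binom_pmf(6) + p_at_least_7          # P(Y >= 6), by additivity

print(round(p_at_least_7, 4))  # -> 0.139
# Exact tail: 0.2825 at four decimals (the text's .2826 adds rounded terms).
print(round(p_at_least_6, 4))  # -> 0.2825
```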

Looking back at our calculations, it makes sense that for any event E, we should claim that P(E) ≥ 0 and that for the event S we should claim that P(S) = 1. These are, in fact, the first two axioms. First, probability is never negative. Second, we are certain to observe one of the possible outcomes in the set that exhausts all the potential possibilities. The third and final probability axiom is the one that posed the biggest challenge to scientists in terms of its acceptance.

A few paragraphs back, we looked at the event that just contained a single value, {6}. Notice that the value 6 does not appear in the event not E = {7, 8, 9, ..., 18}. When two events (or sets) don't share any values (elements) in common, we call them mutually exclusive. When we have a group of events and each and every pair that could be chosen from the group is mutually exclusive, then we call the entire group of events pairwise mutually exclusive. The third axiom of probability says that if two events, say A and B, are mutually exclusive, then P(A or B) = P(A) + P(B). This is the axiom that we used to compute P(Y ≥ 6) ≈ .1436 + .1390 = .2826.

But actually, the third axiom of probability applies to more than two sets. In fact, it applies to any number of sets that are pairwise mutually exclusive (that was the part that took so long to agree upon historically). It is of utmost importance that you realize that the third axiom applies ONLY to events that are pairwise mutually exclusive.

Axiom 3: If E1, E2, E3, ... is a collection of pairwise mutually exclusive sets (all of which are subsets of a larger set S that contains all possible outcomes of the experiment in question), then

P(E1 or E2 or E3 or ...) = P(E1) + P(E2) + P(E3) + ...

Make sure you realize that the ellipsis (...) means that we could have three events, ten events, one thousand events, etc. It doesn't matter. As long as all the events in question are pairwise mutually exclusive, Axiom 3 applies. But it is ONLY good for calculating probabilities that involve the notion of one set or another or another, etc. If you need to calculate P(A and B), then we may need other probability rules that will emerge later.
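Axiom 3 can be illustrated with our binomial random variable: the singleton events {0}, {1}, ..., {18} are pairwise mutually exclusive and together exhaust S, so their probabilities must add to P(S) = 1. A quick Python check:

```python
from math import comb

n, p = 18, 0.25

# The singletons {0}, {1}, ..., {18} are pairwise mutually exclusive and
# their union is S, so by Axiom 3 their probabilities sum to P(S) = 1.
total = sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(n + 1))
print(round(total, 10))  # -> 1.0
```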

Case Question 1A Concluding Decision

The question in Case Study 1A can be answered by using the development presented to this
point. The work above has outlined the statistical procedure known as a hypothesis test for a
proportion. We have two competing hypotheses and the data will let us know which is most
plausible. As a recap, our two hypotheses are

H0: p = .25 (the population proportion is 25%)
HA: p > .25 (the population proportion is greater than 25%)

The focal question is regarding the level of evidence for the postponement of death theory. Since this theory is under scrutiny, we should not presume it is true. We should assume it is false until the data suggests otherwise, if at all. This is why H0 and HA are set up in the manner that they are. What the experimenter is hoping to establish is generally placed in the alternative hypothesis.

The parameter being tested is p, the population proportion. It was estimated by the sample proportion p̂ = 6/18. In order to know whether or not this result is suggestive of rejecting H0, we needed to obtain a sampling distribution for our statistic. The sampling distribution describes the behavior of p̂ in repeated samples. Even though we are unlikely to observe these theoretical other samples, knowledge of the sampling distribution provides a context for our observed value of p̂ = 6/18. When we assume H0 is true, the distribution of a statistic used in a hypothesis test is more specifically called the null distribution. The relevant null distribution for Case Study 1A is the binomial distribution with n = 18 and p = .25.

Notice that large values of p̂ (and in turn, large values of Y = X1 + X2 + ... + Xn) are indicative of rejecting H0 and deciding HA is the most relevant hypothesis based on the evidence in the data. Since large values indicate rejection, the p-value associated with our hypothesis test is the chance that we observe p̂ = 6/18 or a value larger. We calculate this chance using the null distribution. If you take a moment to reflect on the definition of p-value, you will recall the phrase "or more extreme" appears. Here, "more extreme" is interpreted as "larger than." In general, "more extreme" is interpreted in light of the alternative hypothesis (notice the > sign in HA).

So, our p-value is P(p̂ ≥ 6/18). Of course, the only way that p̂ can be greater than or equal to 6/18 is if Y ≥ 6. We make this calculation using the null distribution, which is binomial with n = 18 and p = .25. We know from previous work that this probability is P(Y ≥ 6) ≈ .2826.
Now, we are at the point of decision. The p-value is a percentile. Specifically, the p-value we obtained is the upper 28th percentile of the null distribution. This means that if H0 is true, we can expect to see our sample proportion, or one larger, 28% of the time. Does this 28% seem overly rare to you? Probably not. While we all inherently have different personal interpretations of the word "rare," statisticians that are involved in statistical inference problems generally don't label p-values as rare until they fall in the lower or upper 5%-10% of the null distribution. This will be discussed more in other case studies.

Since our p-value of 28% isn't particularly rare, we don't have overwhelming evidence to reject the null hypothesis. You can think of a p-value as a barometer of sorts. The lower the p-value, the more evidence exists in the data to reject H0 and instead conclude that HA is most statistically reasonable. If the p-value isn't low, then you have observed data that is commonplace if, in fact, H0 is true. This is what has happened to us. We didn't observe a low p-value, so we don't have sufficient evidence to reject H0. So, since we don't reject H0, we will retain it as the most reasonable conclusion. Therefore, we retain the null hypothesis that p = .25. That is, the claim that the population proportion is 25% can't be refuted. It appears we do NOT have sufficient data-driven evidence to believe the postponement of death theory in our population.
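The entire test can be collected into a short Python sketch. The 5% cutoff below is the conventional threshold mentioned earlier, not a value dictated by the case itself, and the exact tail probability (about .28255) differs from the .2826 in the text only because the latter added rounded terms:

```python
from math import comb

def upper_tail_p_value(successes, n, p0):
    """P(Y >= successes) under the null distribution Binomial(n, p0):
    the p-value for testing H0: p = p0 against HA: p > p0."""
    return sum(comb(n, y) * p0**y * (1 - p0)**(n - y)
               for y in range(successes, n + 1))

p_value = upper_tail_p_value(6, 18, 0.25)
print(round(p_value, 4))  # -> 0.2825, roughly 28%

# Using the conventional 5% threshold discussed in the text:
print("reject H0" if p_value < 0.05 else "retain H0")  # -> retain H0
```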

For a visual look at our null distribution and p-value, look at the following chart made in Excel.

[Chart: the binomial null distribution with n = 18 and p = .25. The horizontal axis shows the possible values of Y (0 through 18); the vertical axis shows the probability of each value. A vertical line marks the observed value Y = 6.]

Can you tell where the rare values of Y are from this picture? If the null hypothesis is true, it would be very rare to observe Y ≥ 11. In fact, Excel calculations can show that P(Y ≥ 11) is less than 1%. The probabilities for Y = 11, 12, ..., 18 are so small that Excel plots them right on the horizontal axis. These probabilities aren't actually zero, but they are so close that the plotting symbol is right on the axis. The vertical line represents the value of Y that we observed in our sample. Look at it in the context of the entire null distribution. Does it appear in the rare regions (what are called "tails") of this distribution? No. So, the null hypothesis wasn't rejected.
However, if 11, 12 or more of the 18 obituaries had been during the three months which followed a birthday, then our data would have been suggestive of the postponement of death theory. In these cases, the p-value would have been less than 1% (very rare). If we had observed that rare of an event, we would have been forced to wonder why we saw something so rare if in fact H0 is true. Observing this rare event would have led to a change in what we could presume reasonable. It would have led to our abandoning H0 and concluding the data is indicative of HA instead. The lower the p-value, the more evidence we have for HA.


Concepts and Terms To Review from Case Question 1A:

- Population
- Sample
- Bernoulli trial
- Random variable
- Discrete random variable
- Continuous random variable
- Parameter
- Statistic
- Population proportion
- Sample proportion
- Statistical inference
- Estimate
- Test of hypothesis
- Null hypothesis
- Alternative hypothesis
- Statistically significant
- p-value
- Sampling distribution
- Independent trials
- Binomial random variable
- Binomial mass function
- Cumulative probability
- Complement rule
- Event
- Axioms of probability
- Mutually exclusive
- Pairwise mutually exclusive
- Null distribution


Despite the small data set from Lufkin/Nacogdoches not substantiating the postponement of death theory, the concept has garnered considerable attention throughout the world. The ideas that someone can postpone their death until the passing of a birthday, a family reunion, or a reconciliation with a family member have all been studied, and published reports on each exist in the scientific literature. One of the most discussed papers on this subject is Phillips (1972), "Deathday and Birthday: An Unexpected Connection." In his essay, Phillips looked at obituaries published in a newspaper from Salt Lake City, Utah. His data structure was slightly different from that of Case Question 1A. Phillips looked at the percentage of people that died in the three months prior to their birthday. Among 747 deaths, only 60 of them fell within this three-month window. If deaths are occurring randomly during the year, we would expect 25% of people to pass away in the three months prior to their birthday. Notice that 60 of 747 is only 8%.

Case Question 1B

A total of 747 obituaries were collected from a Salt Lake City newspaper. These obituaries were
scattered throughout the period of one year. Among these 747 obituaries, only 60 people died in
the three months prior to their birthday. Is this decrease from what would be expected if the
deaths were randomly occurring throughout the year statistically significant? That is, do the Salt
Lake City obituaries provide statistical evidence for the postponement of death theory?

Population and Sample
On the surface, Case Question 1B is quite similar to Case Question 1A. The data is collected in
a slightly different way, but the issue still boils down to whether or not the disproportionate (8%
vs. 25%) fraction of deaths in a particular time interval is substantial enough to lend credence to
the postponement theory. The population of interest here could reasonably be taken to be all
people who died in the greater Salt Lake City area. Similar to the Lufkin/Nacogdoches data, all
of the people featured in the obituaries were within one region of the United States.
Extrapolation to all of Utah, or to all of the United States, would seem to be a stretch since Salt
Lake City newspapers don't tend to contain birth and death announcements from around the
country.

The sample is different from the Lufkin/Nacogdoches data in several ways. First, there is no
minimum age used to define the deaths. In the Lufkin/Nacogdoches data, all 18 of the deaths
pertain to people over 70 years old when they died. This isn't the case in the Salt Lake City data.
Secondly, the data is taken over a longer time interval: all of the people in the
Lufkin/Nacogdoches data set died within one week of each other, while the Salt Lake City
obituaries span a full year. Third, and possibly most central to the mathematics in this section,
the Salt Lake City sample contains 747 people, whereas the Lufkin/Nacogdoches sample was
small: only 18 people.

Like with Case Question 1A, we should ponder the assumption that the 747 deaths that make up
the sample constitute a random sample of deaths across a year from the greater Salt Lake City
area. Some of the same issues that arose in assuming a representative sample in the
Lufkin/Nacogdoches area may be relevant in the Salt Lake City data as well. Going forward, we
will assume the 747 obituaries represent a random sample of all deaths during one year in the
greater Salt Lake City area.

Random Variable, Parameter and Statistic of Interest

Like Case Question 1A, each person in the sample can be represented by a Bernoulli Trial. For
i = 1, 2, ..., 747 define

    X_i = 1 if the i-th person is a "success"
    X_i = 0 if the i-th person is a "failure".

Here, a success represents a person dying within the three months prior to their birthday. A
failure corresponds to the person dying at some other time during the year. In corresponding
fashion, our relevant parameter and statistic are p and p̂ as defined below.

Let p = the proportion of all people in the greater Salt Lake City area that die within the three
months prior to their birthday.

Let p̂ = the proportion of people in our collected sample that died within the three months prior
to their birthday.

If the deaths occur in random fashion throughout the year, then p = 0.25. The data provided in
Case Question 1B tells us that among the 747 Bernoulli Trials, 60 of them were successes.
Namely, we have observed that

    Σ_{i=1}^{747} X_i = 60.

Hypotheses To Be Tested

At this point, we have defined a population of interest and identified the collected sample.
Additionally, we have a random variable of interest and, similar to Case Question 1A, we will
use the sample proportion (p̂) to estimate the population proportion (p). Our statistical inference
procedure now begins with assuming that the postponement of death theory is not true and
forces the data to present sufficient evidence to the contrary. Proponents of the postponement
theory will point to the fact that p̂ = 8% in their argument. Is this sufficient evidence? We
should not presume it is; instead, we should test to see if this small value of p̂ is statistically
significant. If the postponement of death theory is correct, then based on the way that the data
are collected and the way that p is defined, we would expect p̂ < .25. Therefore, our null and
alternative hypotheses for Case Question 1B are

    H_0: p = .25 (the population proportion is 25%)
    H_A: p < .25 (the population proportion is less than 25%).

Make sure you realize that although we are testing the same postponement theory, the data has
been collected in a different way in Case Question 1B. This necessitates the alternative
hypothesis being lower tailed (notice the less than symbol), whereas H_A was upper tailed in
Case Question 1A.

Sampling Distribution

In order to know whether or not p̂ = 60/747 is a statistically significant result that is indicative
of H_A, we must ask ourselves two very important questions:

- What other values of p̂ could we have obtained in other potential samples?
- How likely are these values of p̂ if the null hypothesis is true?

These are the same two questions posed in Case Question 1A and they are universal questions
that are asked in any statistical inference problem that involves a hypothesis test. Answering
these questions in Case 1A led to considering Σ_{i=1}^{18} X_i as a test statistic and the
corresponding null distribution was presented as Binomial. If we again assume that the times at
which the people died in Salt Lake City are independent, then our current test statistic,
Σ_{i=1}^{747} X_i, is also Binomial.

For Case Question 1A, the first bullet above was answered with the list

    0/18, 1/18, 2/18, 3/18, ..., 17/18, 18/18.

The list of possible values for p̂ in Case 1B is much, much longer:

    0/747, 1/747, 2/747, 3/747, ..., 59/747, 60/747, 61/747, ..., 746/747, 747/747.

This long list has the potential to cause problems. First of all, let's establish that we once again
have a binomial experiment:

- We have n = 747 independent Bernoulli trials. The time of each death that determines
whether the trial is classified as a success or failure can reasonably be assumed to be
independent from person to person. If the null hypothesis is true, then the chance of each
trial resulting in a success can be claimed to be p = .25. This likelihood is reasonable to
apply to each trial.
- We are interested in the total number of successes since our statistic,
p̂ = (Σ_{i=1}^{n} X_i)/n, utilizes this total.

Let Y = Σ_{i=1}^{n} X_i. Then, from the above argument we state that Y is a binomial random
variable with parameters n = 747 and p = .25. Applying the binomial mass function to Case 1B,
the chance of exactly 60 successes among the 747 trials is

    P(Y = 60) = [747! / (60! 687!)] (.25)^60 (.75)^687.

Dealing with exponents and factorial calculations this large can pose problems with accuracy
even on modern hand-held calculators. The value of 60! is larger than an 8 followed by 81
zeroes. Dealing with values this large should be done with extreme caution. This computational
concern will give us a chance to examine the features of the binomial distribution when the
value of n is large.
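This concern is easy to sidestep in software. As an illustrative sketch (Python shown here, though the text's calculations use Excel), exact integer arithmetic computes P(Y = 60) without ever forming the huge factorials on their own:

```python
from math import comb

# Exact binomial mass: P(Y = 60) when Y ~ Binomial(n = 747, p = .25).
# Python's comb() returns an exact arbitrary-precision integer, so the
# enormous factorials in 747!/(60! 687!) never have to be evaluated
# separately on a fixed-precision calculator.
n, y, p = 747, 60, 0.25
prob = comb(n, y) * p**y * (1 - p)**(n - y)
print(prob)  # an astronomically small probability
```

The same idea (exact binomial coefficients, or working with logarithms) is what statistical software does internally when n is large.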

When calculating the p-value for the hypothesis test in Case 1A, we were able to create a graph
of the sampling distribution for Σ_{i=1}^{18} X_i. Recall, a sampling distribution is a
description or list of the possible outcomes of a statistic along with the likelihoods of these
outcomes. The graph of the sampling distribution included all of the possible values of Y, which
were 0, 1, 2, 3, ..., 18.

The graph of the sampling distribution for Σ_{i=1}^{747} X_i must include all of the integers
from 0 to 747. Excel is able to generate this plot and it is printed below. Notice that the
horizontal axis only includes the values of Y from 125 to 250. This is because the probabilities
for all other values of Y are so small that they are effectively zero. So, to focus on the shape of
the plot, these values were excluded. Just think of the plot extending to the left and the right of
what you actually see, but hugging the horizontal axis very, very near zero out towards 0 in the
left tail and out towards 747 in the right tail.

[Figure: Sampling distribution of Y for n = 747, p = .25; horizontal axis shows the possible
values of Y from 125 to 250, vertical axis shows probability.]
The Normal Approximation to the Binomial Distribution

The shape of this sampling distribution is unmistakable. The sampling distribution looks like a
bell-shaped curve. Now, make sure you understand: the plot above consists of just 748 isolated
points. That is, the sampling distribution is graphically represented by the 748 points (or
diamonds that Excel uses). But, because there are so many points represented in the graph, it has
the appearance of being a smooth curve. This fact can help us resolve the question in Case 1B.

The exact sampling distribution of Σ_{i=1}^{747} X_i is binomial with a very large value of
n (n = 747).
From the picture above, it appears as though this exact sampling distribution could be accurately
approximated by a smooth curve. What we will do is replace the exact sampling distribution
with this approximate smooth curve and then, our computational difficulties discussed earlier
will be completely resolved. Finally, we will be in a position to easily calculate the
(approximate) p-value for our hypothesis test and resolve the question from Case 1B.

Recall that random variables that have a finite or countably infinite number of possible outcomes
are called discrete. Random variables that have an uncountably infinite number of possible
outcomes are called continuous. The binomial is a discrete random variable, but when n is
large, it can be approximated by the continuous random variable known as the normal random
variable.

When calculating probabilities associated with the binomial random variable (or any other
discrete random variable, for that matter), we can just plug appropriate values into a mass
function. Recall that all discrete random variables have mass functions and it is the job of a mass
function to provide the probabilities associated with all possible outcomes of the random
variable. The formula for the binomial mass function is

    P(Y = y) = [n! / (y!(n - y)!)] p^y (1 - p)^(n - y).

This formula gives the chance of exactly y successes among the n trials.

Probability associated with continuous random variables must come from what is known as a
density function. All continuous random variables have density functions. It is the job of a
density function to provide all the probabilities associated with outcomes of the continuous
random variable. However, density functions achieve this goal in a different manner than mass
functions do for discrete random variables. Probability associated with continuous random
variables is calculated as area under the density curve, rather than by plugging in to the density
function. The chart below compares and contrasts how to find P(a ≤ X ≤ b) for the two types of
random variables.

Type of Random Variable | Relevant Function | How to Find P(a ≤ X ≤ b)
Discrete | Mass Function | Plug in all values between (and including) a and b into the mass function and add up the results.
Continuous | Density Function | Find the area under the graph of the density function over the interval [a, b].
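The contrast in the chart can be sketched in a few lines of code (Python for illustration; the distributions and bounds below are hypothetical examples, not part of the case data):

```python
import math

def binom_mass(y, n, p):
    # Binomial mass function: chance of exactly y successes in n trials
    return math.comb(n, y) * p**y * (1 - p)**(n - y)

# Discrete row of the chart: P(3 <= Y <= 5) for Y ~ Binomial(10, 0.5)
# is a SUM of mass-function values.
discrete = sum(binom_mass(y, 10, 0.5) for y in range(3, 6))

def normal_cdf(x, mu, sigma):
    # Area under the normal density to the left of x, via the error function
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Continuous row of the chart: P(80 <= X <= 100) for X ~ N(90, 8)
# is an AREA, found by subtracting two cumulative areas.
continuous = normal_cdf(100, 90, 8) - normal_cdf(80, 90, 8)
print(discrete, continuous)
```

The discrete answer is assembled by addition of individual masses; the continuous answer never plugs in single points at all, only boundaries.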


Characteristics of Normal Random Variables

The normal random variable and its associated normal density function are the most popular
continuous random variable and density function in all of probability and statistics. The density
function for a normal random variable has the following features:

- All normal density functions are symmetric around a value known as μ. We call the
value of μ the mean of the normal random variable.
- All normal density functions are unimodal, meaning that the density function has one
peak. This one peak, or mode, is at μ.
- All normal density functions have their width, or fatness, controlled by a value known
as σ. We call the value of σ the standard deviation of the normal random variable.

The terms "mean" and "standard deviation" will be refined and repeated in future cases. For
now, what is important is that you understand that the mean of a random variable is a measure of
location or center. It is the long-run average outcome that one would see if many, many, many
outcomes of the random variable were observed. The standard deviation is a measure of the
spread inherent in the outcomes of the random variable. The larger the standard deviation, the
more spread out the outcomes of a random variable. If a random variable has a very small
standard deviation, then the outcomes of the random variable will tightly cluster rather than be
widely dispersed. One other important characteristic of normal random variables is that it is
quite rare to observe an outcome of a normal random variable that is more than three standard
deviations above or below the mean.

The following display illustrates the effect of μ and σ on the graph of the normal density
function. In each case, the mean controls where the bell is located and the standard deviation
affects the girth, or width, of the curve. Notice that when μ = 0, the curve with σ = 1.5 is fatter,
or wider, than the curve with σ = 1. When μ = 2, the normal curve has located itself two units to
the right of the other two curves, but because σ = 1/2, this is the thinnest of the three curves
plotted. Normal density curves exist for all values of μ and for all positive values of σ
(−∞ < μ < ∞, σ > 0).

Previously, we have introduced the three axioms of probability. The second of these states that
we are certain to observe one of the possible outcomes in the set that exhausts all the potential
possibilities for an experiment. This axiom translates to continuous random variables in such a
way that the total area under every density curve is equal to 1, representing 100% of the possible
outcomes of the random variable. So, even though the three normal curves in the display have
different locations and spreads, they all have the same total area underneath their curves.


Despite there being a different normal curve for each value of μ and for all positive values of σ,
the normal curve with μ = 0 and σ = 1 is really the only one important for calculations and
statistical inference. This is because any probability (area under the curve) calculation that is
required for an arbitrary normal curve can be converted to a problem with the same answer that
uses this standard normal curve. The standard normal random variable is the normal
random variable with μ = 0 and σ = 1. This conversion process is called standardizing a
normal random variable and is summarized below.

Standardization Theorem for a Single Normal Random Variable: If X is a normal random
variable with mean μ and standard deviation σ, then Z = (X − μ)/σ is a standard normal
random variable.


Calculations With Normal Random Variables

We know that for a continuous random variable X, finding P(a ≤ X ≤ b) requires that we find
the area under the graph of the density function over the interval [a, b]. Finding areas under
density curves is generally a calculus problem. For normal random variables, these calculus
problems have been solved by others and placed in tables for our use. Alternatively, modern
computer software such as Excel can calculate these areas for us.

It is important to understand the general philosophy that standardizing a random variable
involves subtracting the mean and dividing the result by the standard deviation. To solidify this
philosophy, we will briefly describe how to make calculations with normal random variables
using the standardization theorem. Once this philosophy is understood, then the quickest way to
make the calculations is in Excel. Going straight to the Excel code without an understanding of
the standardization process is dangerous and should be avoided. Standardization is a concept
that will emerge again in Case Studies 2A, 2B and 3A.

As an example of how to perform calculations with normal random variables, consider that the
height of a mature pine tree in the East Texas region could be modeled with a normal random
variable having mean μ = 90 and standard deviation σ = 8. What is the meaning of the phrase
"could be modeled with"? What we are thinking about here is the entire population of pine trees
(say, of one particular species, like loblolly) in East Texas. We have no hope of measuring the
heights of all the mature pine trees in East Texas. But if we could, we might surmise that the
average height would be 90 feet and that the heights would crowd around the value 90 in such
a way that values near 90 would be quite popular to see. Then as we moved away from 90
feet, the values would become less and less frequent, yet in a symmetric way. By this we mean
that a tree over 100 feet tall could be imagined to be just as likely to observe as a tree that is 80
feet or less. Finally, since it is rare to see observations more than three standard deviations away
from the mean when dealing with normal random variables, we might imagine that mature trees
above 114 feet or below 66 feet are quite rare to observe around East Texas. All of these facts
could be combined into a model, the model being the normal curve with mean 90 and standard
deviation 8.

Next, suppose with this model in place, we were asked for the chance that a mature pine tree in
East Texas grows to exceed 105 feet. How could this calculation be made? The answer is: we
need to calculate an area under the density curve. The area we need to obtain is shaded and
shown below.

[Figure: Normal density curve with mean 90 and standard deviation 8; the area to the right of
105 is shaded.]

Let the continuous random variable X represent the height of a mature pine tree in the East
Texas region. Then, we can write X ~ N(90, 8) to represent the fact that our model for X is
normal with mean μ = 90 and standard deviation σ = 8. We need to calculate P(X > 105). This
is done by using the standardization theorem. Once the random variable X is standardized, we
denote the resulting random variable by Z. The capital letter Z will be reserved notation just for
standard normal random variables (Z ~ N(0, 1)). The constant 105 is also standardized and then
we can use tables or software to finalize the calculations. The process is illustrated below.

    P(X > 105) = P( (X − 90)/8 > (105 − 90)/8 )
               = P(Z > 1.88).

A standard normal curve with the area to the right of z = 1.88 shaded is below. By using the
standardization theorem, we are assured that the area to the right of 105 when X ~ N(90, 8) is
the same as the area to the right of 1.88 when Z ~ N(0, 1). When using tables to calculate areas
under the standard normal curve, it is important to know which area the table provides. The
most common type of standard normal table provides a cumulative probability. That is, the
table provides P(Z ≤ z) for a constant z. Since it is quite rare to observe an outcome more than
three standard deviations from the mean when working with normal random variables, these
tables are typically given for all values of z such that −3 ≤ z ≤ 3.

[Figure: Standard normal density curve; the area to the right of z = 1.88 is shaded.]


Using a cumulative probability table for Z, we find that P(Z ≤ 1.88) = .9699. Thus, by the
complement rule,

    P(Z > 1.88) = 1 − P(Z ≤ 1.88) = 1 − .9699 = .0301.

We conclude that the chance of a mature pine tree in the East Texas region exceeding 105 feet is
approximately 3%.
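The same calculation can be reproduced in a few lines of code (a sketch; the cumulative normal probability comes from the error function rather than a printed table):

```python
import math

def normal_cdf(x, mu, sigma):
    # Cumulative probability P(X <= x) for X ~ N(mu, sigma)
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# P(X > 105) for tree heights X ~ N(90, 8), by the complement rule
p_tall = 1 - normal_cdf(105, 90, 8)
print(round(p_tall, 4))  # roughly .03
```

Note the code never standardizes explicitly; subtracting the mean and dividing by the standard deviation inside `normal_cdf` is exactly the standardization theorem at work.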

Using a cumulative probability table for Z, we can:

- Find P(Z ≤ z) by looking up the value of z in the margin; the desired probability is in
the body of the table.
- Find P(Z > z) by looking up the value of z in the margin, finding the probability in the
body of the table and then applying the complement rule.
- Find P(z_1 ≤ Z ≤ z_2) by looking up the values of z_1 and z_2 in the margin, finding
the two probabilities in the body of the table and then subtracting the smaller probability
from the larger probability.

You should practice each of the three calculations with a variety of values for z until you are
confident.

Excel can make all necessary normal distribution calculations. Placing the cursor in any cell in
an Excel spreadsheet and typing the code

=NORM.DIST(value, mean, std deviation, TRUE)

will return the probability that X ≤ value when X is a normal random variable having the values
of the mean and standard deviation indicated in your code. For instance, typing either

=NORM.DIST(105, 90, 8, TRUE) or
=NORM.DIST(1.88, 0, 1, TRUE)

produces a value of 97%. Remember that these two probabilities (shaded in the two pictures
above) are equivalent because of the standardization theorem. The complement rule can then be
applied to retrieve the 3% solution seen before.

To recap, the act of typing =NORM.DIST(z, 0, 1, TRUE) in an Excel spreadsheet is equivalent
to looking up a value of z in a cumulative standard normal table. Note, however, that Excel can
make the calculation for any normal random variable provided you type in the appropriate mean
and standard deviation. That is, Excel is aware of the standardization theorem and correctly
performs all necessary calculus. One last comment is in order before returning to the Salt Lake
City data: For continuous random variables, since there are uncountably many outcomes, one
does not have to worry about the difference between using ≤ vs. < or ≥ vs. >. Since we are
focused on area under curves, the inclusion or omission of one single value does not alter the
desired area. Be aware that there is a difference between using ≤ vs. < or ≥ vs. > with
discrete random variables (such as the binomial) and one must be particularly careful about the
inclusion or exclusion of single values when working in discrete cases.
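This caution about discrete random variables can be made concrete with a small sketch (Python; the n = 10, p = 0.5 example is hypothetical, chosen only for illustration):

```python
import math

def binom_mass(y, n, p):
    # Binomial mass function: chance of exactly y successes in n trials
    return math.comb(n, y) * p**y * (1 - p)**(n - y)

n, p = 10, 0.5
# For a DISCRETE random variable, P(Y <= 5) and P(Y < 5) differ by the
# single mass at y = 5, which is not negligible.
p_le = sum(binom_mass(y, n, p) for y in range(6))   # P(Y <= 5)
p_lt = sum(binom_mass(y, n, p) for y in range(5))   # P(Y < 5)
print(p_le - p_lt)  # equals the single mass P(Y = 5)
```

For a continuous random variable, the analogous difference would be the area above a single point, which is zero.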


Approximation to the Sampling Distribution and P-value Calculation

We now know that the normal distribution can approximate a binomial distribution when the
number of trials (n) is large. To determine whether the data from Salt Lake City is suggestive
of the postponement theory, we were faced with a binomial distribution with n = 747 and
p = 0.25. The last step before we can calculate a p-value for our hypothesis test associated with
the Salt Lake City data is to determine which normal curve should be used to approximate our
binomial null distribution. We know that all normal random variables are categorized by their
mean and standard deviation, so the way we achieve the proper approximation is to pick the
normal curve that has the same mean and standard deviation as the binomial random variable in
use. This is a simple task and is based on the following binomial random variable facts:

Mean and Standard Deviation of a Binomial Random Variable: A binomial random variable
based on n trials and probability of success p has mean μ = np and standard deviation
σ = √(np(1 − p)).
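As a quick sketch of these facts applied to the Salt Lake City null distribution (Python for illustration):

```python
import math

# Mean and standard deviation of a Binomial(n = 747, p = .25) random variable
n, p = 747, 0.25
mu = n * p                           # np = 186.75
sigma = math.sqrt(n * p * (1 - p))   # sqrt(np(1-p)), about 11.83
print(mu, round(sigma, 2))
```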

Thus, a binomial random variable with n = 747 and p = 0.25 has mean μ = 186.75 and standard
deviation σ = 11.83. The normal random variable that approximates this binomial random
variable should have the same mean and standard deviation. Therefore, our approximate
sampling distribution for Σ_{i=1}^{747} X_i is N(186.75, 11.83). Next, recall the definition of
p-value:

p-value: The chance of observing the value of the statistic from your sample (or one more
extreme) if, in fact, the null hypothesis is true.

Our null and alternative hypotheses for Case Question 1B are

    H_0: p = .25 (the population proportion is 25%)
    H_A: p < .25 (the population proportion is less than 25%).

Additionally, the data collected in the Salt Lake City obituaries produced 60 successes. The
most extreme evidence for H_A would be zero successes. If this had occurred, then no one in
the sample would have died within the three months prior to their birthday. Everyone in the
sample would have "postponed" death until after their birthday. So, extreme evidence for the
alternative hypothesis in this case is provided by low values of X = Σ_{i=1}^{747} X_i. By this
logic, our p-value is P(X ≤ 60).

We know that the distribution of X is approximately N(186.75, 11.83) and so our approximate
p-value is (using standardization)

    P(X ≤ 60) = P( (X − 186.75)/11.83 ≤ (60 − 186.75)/11.83 )
              = P(Z ≤ −10.71).

It is very rare to observe a value of a standard normal random variable outside the bounds
−3 ≤ z ≤ 3. So, the z-statistic of −10.71 that appears in our p-value calculation is incredibly
uncommon. While the value of z = −10.71 is so rare that it is off the charts in normal tables
listed in textbooks, Excel gives the approximate p-value to be

    =NORM.DIST(-10.71, 0, 1, TRUE) = 4.57 × 10^(−27).



Case Question 1B Concluding Decision

Our calculated p-value is very, very small, indeed! So, if the null hypothesis in Case Study 1B
is true, our sample produced an astronomically rare event. What we observed in the Salt Lake
City data is so rare that we can no longer reasonably cling to the null hypothesis. If H_0 is true,
then why would we see such an incredibly rare outcome? Remember, the smaller the p-value,
the more evidence for H_A. We have a very, very small p-value in Case Study 1B. Therefore,
the evidence is strong that we should reject the null hypothesis and claim that the postponement
of death theory is suggested by the Salt Lake City data. That is, assuming that the Salt Lake
City data was representative of the targeted population, we can no longer reasonably claim
p = 0.25. Instead, under the assumption that our data is representative of the population, the
evidence collected points strongly to p < 0.25.
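As a cross-check (a sketch in Python; this exact calculation is not part of the original analysis), the p-value P(X ≤ 60) can also be computed directly from the binomial mass function rather than from the normal approximation:

```python
from math import comb

# Exact binomial p-value P(X <= 60) for X ~ Binomial(n = 747, p = .25),
# obtained by summing the mass function over y = 0, 1, ..., 60.
n, p = 747, 0.25
p_value = sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(61))

# Like the Excel normal approximation (4.57 x 10^-27), the exact p-value
# is astronomically small, so the decision to reject H0 is unchanged.
print(p_value)
```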




Concepts and Terms To Review from Case Question 1B:


Population, Sample, Random variable, Parameter, Statistic, Population proportion, Sample
proportion, Null hypothesis, Alternative hypothesis, Lower tailed test, Upper tailed test,
Sampling distribution, Binomial random variable, Discrete random variable, Continuous
random variable, Normal random variable, Density function, Mean of a random variable,
Standard deviation of a random variable, Unimodal, Standardizing, Standard normal random
variable, Cumulative probability, Complement rule, Mean and standard deviation of a binomial
random variable, p-value


Statistical Follow-Up to Case 1B: An Introduction to Confidence Intervals

In addressing Case Questions 1A and 1B we developed a statistical inference procedure called a
hypothesis test. When conducting a hypothesis test, we set up a null and an alternative
hypothesis. The null hypothesis is the hypothesis that the data do not indicate a change from the
status quo, or the manner in which things have occurred in nature to this point. The alternative
hypothesis is a hypothesis of change; it is often the hypothesis that the experimenter is hoping to
substantiate.


We assume that the null hypothesis is true as we progress through the testing procedure. We
decide on a test statistic, the value of which we can calculate from our observed data set. The
test statistic is some summary of the random variables central to the statistical problem, and so
the test statistic has a probability distribution. This distribution describes the behavior of the
outcomes for the test statistic in repeated samples. The distribution of a test statistic when the
null hypothesis is assumed true is called the null distribution. From the null distribution, we
can use the definition of p-value to ascertain how likely or unlikely our sample results are if
H_0 is true. If the p-value is low, we have evidence for rejection of H_0.

Aside from the hypothesis test procedure, another very popular statistical inference technique is
the creation of a confidence interval for the parameter of interest. The confidence interval
provides the experimenter with a reasonable range of guesses for the parameter based on the
data collected. Each confidence interval begins with the choice of a confidence coefficient. In
this section, we will choose this coefficient to be 95% (0.95). Reasons for this choice will be
discussed in Case Study #2, but recall that statisticians involved in statistical inference problems
generally don't label p-values as rare until they fall in the lower or upper 5%-10% of the null
distribution. When this statement is applied to the concept of confidence intervals, it can be
interpreted as saying that we typically choose confidence coefficients to be around 90%-95%.
Another popular choice is 99%.

Let's create a 95% confidence interval for a population proportion in the context of Case
Study 1B. Once we look specifically at the data from Case 1B, we will be in a position to
generalize the confidence interval procedure and develop a formula for a large sample
confidence interval for a population proportion.

In Case 1B, our test statistic was the total number of successes observed in the sample, which
was denoted as X. The approximate null distribution of X was N(186.75, 11.83). The first step
in constructing a 95% confidence interval is to determine the 95% most frequent outcomes of
the test statistic by looking for the central 95% portion of the null distribution. This is shaded in
the figure below. Notice that this leaves 2.5% of the area under the density curve in each tail of
the null distribution. Further, notice the null distribution is centered over the mean, which is
186.75.




If the null hypothesis is true, and we were able to repeatedly sample the target population, then
the value of the test statistic X would fall in the shaded region 95% of the time. So, what are the
boundaries of this shaded region? This is a key question, one that is critical to discovering a
formula for our confidence interval.

While practicing how to make normal distribution calculations from a table or Excel, one
problem we could have encountered is to find P(−1.96 ≤ Z ≤ 1.96). The answer to this
probability question is 95%. Try it using Excel or a table.

This means that 95% of the time, the outcome of a standard normal random variable falls within
the interval (−1.96, 1.96). But, it has an even grander interpretation than this! Because of the
standardization theorem, the fact that P(−1.96 ≤ Z ≤ 1.96) = .95 means that 95% of the time,
the outcome of any normal random variable falls within 1.96 standard deviations of its mean.
This is very important to remember.
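This 1.96 fact can be verified numerically (a sketch using the error function in place of a table):

```python
import math

def normal_cdf(z):
    # Cumulative probability P(Z <= z) for the standard normal distribution
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Central area P(-1.96 <= Z <= 1.96): difference of two cumulative areas
central = normal_cdf(1.96) - normal_cdf(-1.96)
print(round(central, 3))  # about 0.95
```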

In developing the hypothesis test procedure for Case 1B, we approximated a binomial random
variable X with a normal random variable having the same mean and standard deviation. Since
the mean of a binomial random variable is np and the standard deviation is √(np(1 − p)), we
can standardize X in the following way:

    Z = (X − np) / √(np(1 − p)).

When n is large, this quantity is approximately standard normal. Because of this, we can
substitute (X − np)/√(np(1 − p)) for Z in the expression P(−1.96 ≤ Z ≤ 1.96) = .95. This
substitution is the second step in arriving at an expression for a confidence interval for the
population proportion, p. When this substitution is made we obtain

    P( −1.96 ≤ (X − np)/√(np(1 − p)) ≤ 1.96 ) = .95.

The inequalities inside the parenthesis can be algebraically rearranged. This algebraic
rearrangement will not change the truth of the probability statement; instead, it will simply make
the expression look different. Rearranging the above expression can produce

( ) ( ) 1 1
1.96 1.96 .95
| |

| s s + =
|
\ .
p p p p
X X
P p
n n n n
.

For the curious reader, the algebra required to obtain the second expression from the first is:
multiply all three parts of the inequality by sqrt( np(1-p) ), subtract X from each resulting part,
divide everything you obtain by -n (reversing the inequality signs, since you divided by a
negative value), and finally simplify using sqrt( np(1-p) )/n = sqrt( p(1-p)/n ).

Essentially what has been done is that the p in the np term from the numerator has been
algebraically isolated in the center of the inequality. Isolating parameters in inequalities like this
is the mathematical procedure by which confidence interval formulas can be determined. The
isolation of the parameter through the use of algebra is the third and final step in developing
confidence interval formulas.

Despite the fact that the value of p is isolated in the center of our inequality, it unfortunately
also appears in the endpoints. That is, p still appears in the term on the left and on the right in
the string of inequalities. This does not always happen when statisticians derive confidence
interval formulas. We will see in Case 2 that once the parameter is isolated, it does not appear in
the endpoints. However, when the parameter does appear in the endpoints, the typical resolution
is to replace the parameter by its estimating statistic in order to complete the formula.

We know that the population proportion p is estimated by the sample proportion p̂ = X/n. So,
replacing p by p̂ we finally reach the expression for a large sample confidence interval for a
population proportion:

Large Sample 95% Confidence Interval for a Population Proportion: Estimating a
population proportion, p, by collecting data on a large number of independent Bernoulli trials can
be done using the expression

p̂ ± 1.96 sqrt( p̂(1-p̂)/n ).

Here, the sample proportion is given by p̂ = X/n where X is the total number of successes
observed among the n Bernoulli trials.

When the formula above is applied to the Salt Lake City data set, we get

60/747 ± 1.96 sqrt( (60/747)(687/747) / 747 ).

After some brief calculator work, we conclude that the 95% confidence interval for the
proportion of people in the greater Salt Lake City area that die within the three months prior to
their birthday is (.061, .100). Based on the sample of 747 Salt Lake City obituaries, we believe
that a reasonable range of guesses for the proportion of people that die within the three months
prior to their birthday is 6.1% to 10.0%. We say that we are 95% confident that the value of p
is between .061 and .100.
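The "brief calculator work" can also be sketched in Python. Only the counts 60 and 747 come from the case; everything else is the formula above:

```python
import math

x, n = 60, 747      # successes and total Bernoulli trials from the Salt Lake City obituaries
p_hat = x / n       # sample proportion

# margin of error for the large sample 95% confidence interval
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(f"({lower:.3f}, {upper:.3f})")  # prints (0.061, 0.100)
```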

Remember that p is a parameter. Thus, it is an unknown constant that we will never know
exactly unless we observe EVERY member of the population. For this reason, we do not speak
of a 95% probability of p being between two values. Such a phrase is just as nonsensical as
saying that there is a 95% chance that 5 is between 6 and 7, or that there is a 95% chance that 5
is between 1 and 2. Either the true, unknown value of p is or is not between .061 and .100.
Based on our sample, it is reasonable to estimate p to be between .061 and .100. Since p is a
constant, we refrain from using the word "probability" in reference to p. Thus, the necessity for
a new word, that word being "confident." The confidence that we have actually isn't in the
two numbers that comprise our confidence interval. Interpreting a confidence interval is never
about you or about the results that you personally obtained from your sample. Instead, the
confidence is in the mathematical procedure that has been outlined above.

To get a better handle on the interpretation of a confidence interval, imagine that 100 friends
each had collected a data set of the same size as the Salt Lake City data set. Then "95%
confident" means that if all 100 friends follow the procedure outlined above using their
sample data, then we could expect 95% of our friends to generate a confidence interval that
includes the true, unknown value of p. This leaves 5% of our friends with an interval
that does not include the true, unknown value of p. This is as far as the interpretation can go,
because p is unknown (we are trying to estimate it). We won't know which friends trapped
p in their interval and which friends did not. However, despite not knowing who got it and who
didn't, we are confident that about 95% did!
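This thought experiment can even be simulated. In the sketch below we pretend, for illustration only, to know the true p (the assumed value .08 is mine, chosen close to the Salt Lake City sample proportion) and repeatedly draw samples of size n = 747, counting how often the resulting intervals trap the true p:

```python
import math
import random

random.seed(1)
true_p, n, trials = 0.08, 747, 1000   # assumed "true" proportion; .08 is illustrative only

trapped = 0
for _ in range(trials):
    # simulate one friend's sample of n Bernoulli trials
    x = sum(random.random() < true_p for _ in range(n))
    p_hat = x / n
    margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    # did this friend's interval trap the true p?
    if p_hat - margin <= true_p <= p_hat + margin:
        trapped += 1

coverage = trapped / trials
print(coverage)  # typically close to 0.95
```

Each pass through the loop plays the role of one friend; across many friends, roughly 95% of the intervals trap the true value, which is exactly what "95% confident" refers to.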


Concepts and Terms To Review from the Follow-Up to Case Question 1B:

Hypothesis test
Test statistic
Null distribution
p-value
Confidence interval
Confidence coefficient
Large sample confidence interval for a population proportion
Interpretation of a confidence interval
