
Week 7: Lecture 2

Parametric Hypothesis Testing


Hypothesis testing is a methodology that allows an experimenter to assess the
plausibility or credibility of a specific statement. There are two broad methodologies -
parametric and non-parametric testing. Parametric testing is characterised by the construction
of simple estimators of population parameters such as the sample mean and involves either
making strong distributional assumptions or using large samples. In non-parametric testing, no
assumptions or very minimal assumptions are required, and no parameters require estimation.

A. Basic concepts: An Analogy with the Justice System.

The Jury scenario involves a defendant sitting in the dock of a high court accused of
murder. The defendant has pleaded not guilty. Evidence (or data), such as motive, results from
lie detection testing, DNA and other forensic information is gathered and put in front of the
Jury. Based on this evidence the Jury needs to answer one basic question: is the defendant
guilty? This is a binary situation that can be expressed via two competing hypotheses. The
alternative hypothesis (Ha), as the name suggests, is the opposite of the null hypothesis (H0),
so the answer to the above question can be formalised by deciding which of these hypotheses
is true given the evidence:

H0: The defendant is not guilty.


Ha: The defendant is guilty.

This question can never be answered with 100% certainty: acceptance or rejection of
the null hypothesis is not error free. The Jury system is not foolproof and there are well-
documented cases of miscarriages of justice (the Guildford Four were wrongly convicted in
1974 of planting bombs in various pubs in Guildford and Woolwich. Their convictions were
quashed in 1989. On February 9, 2005, British Prime Minister Tony Blair issued a public
apology to the Guildford Four for the "miscarriages of justice they had suffered").

Such errors can occur because defendants may not be telling the truth when questioned
under oath and/or the evidence is imperfect. A motive may have been found, but that does not
mean he or she definitely did commit murder. The defendant may have passed a lie detector test,
but given the imperfections of this test that does not mean he or she is innocent. The defendant's
fingerprints may have been found at the murder scene, but the defendant may have had a
legitimate reason for being in this location at a time different from the time of death. There
may be DNA evidence against the defendant but even this has a small error rate associated
with it.

This uncertainty can be formalised in the following way. At the end of the trial one of two
outcomes will have happened:

1. The jury produces a guilty verdict or


2. The jury produces a not guilty verdict.

The following table summarises whether these verdicts are a good or bad outcome.
Scenario 2 will be a good outcome if the defendant really was innocent. Scenario 1 will be a
good outcome if the defendant really did carry out the murder. However, if the defendant was
a liar and did actually kill somebody, then under scenario 2 that person will walk free from
court and so the public remain exposed to the horrors that this person is capable of. Here the
null hypothesis was accepted by the Jury even though it was false. This is called a Type II
error. Then again, if the defendant was not a liar and did not actually kill somebody, then
under scenario 1 he or she will be sent to jail with the public remaining exposed to danger
because the real murderer is still at large (not to mention the injustice of removing liberties
from an innocent person). Here the null hypothesis was rejected by the Jury even though it was
true. This is called a Type I error.

                 H0 is true       H0 is false
H0 is accepted   No error         Type II error
H0 is rejected   Type I error     No error

Given the inability to remove Type I and Type II errors, Judges instruct Jurors to come to
a decision based on the legal concept of the balance of probabilities: the truth can never be
known with certainty. In terms of the above errors, this legal concept can be formalised as the
Judge saying only reject the null hypothesis if it is highly implausible given the evidence or
data presented (i.e. if the evidence or data has very little affinity with the null hypothesis). That
is, only reject the null hypothesis if the chances of making a Type I error are small. The
probability of making a Type I error is called the significance level and is given the symbol α.

All that is now needed is a working definition of implausibility: how small should α be?
A commonly used value for α is 0.05. So the null hypothesis is considered implausible if the
probability of it being true is no more than 5%. This is not such a bad definition as it
corresponds to the chances of getting between 4 and 5 heads in a row when tossing a balanced
coin. Other commonly used values for the significance level are 0.1 and 0.01. (Physicists, when
recently trying to prove the presence of the "God particle", used an α value much less than
0.01).

Related to the significance level is the concept of power. The power of a test is defined to
be the probability of the null hypothesis being rejected when it is false and so

Power = 1 - probability of making a Type II error

Power depends on the sample size and assumptions made. Tests based on the normal
distribution are more powerful than those that are not, provided the data is actually normally
distributed.
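To make the power calculation concrete, here is a minimal sketch in Python (not part of the module's Excel workflow; the true mean of 88 is purely an assumed value for illustration). It computes the power of a one-sided Z test of H0: μ ≥ 90 when σ² = 20, n = 25 and α = 0.05, anticipating the oxygen purity example in section C.

from math import sqrt
from scipy.stats import norm

mu0, var, n, alpha = 90.0, 20.0, 25, 0.05   # null mean, population variance, sample size, significance level
mu_true = 88.0                              # assumed true mean under Ha (hypothetical)
se = sqrt(var / n)                          # standard deviation of the sample mean
x_crit = mu0 + norm.ppf(alpha) * se         # reject H0 when the sample mean falls below this
power = norm.cdf((x_crit - mu_true) / se)   # P(reject H0 | mu = mu_true)
print(round(power, 2))                      # about 0.72, so a Type II error probability of about 0.28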

A decision criterion is needed to decide whether to accept or reject the null hypothesis.
One criterion often used is based on the p-value. The p-value for a particular null hypothesis H0
based on the observed data (evidence) is defined to be the probability of obtaining that
observed data (or evidence) set or worse when the null hypothesis is true. A worse data set (or
set of evidence) is one that is less consistent with or has less affinity with the null hypothesis.

A p-value smaller than α = 0.05 reveals that if the null hypothesis H0 is true, then the
chances of observing the kind of data actually collected (or worse) is less than 5 in 100. If the
null hypothesis is true, it is unlikely or implausible that a Jury sees the kind of evidence actually
presented to them in court. So the Jury, by deduction, could only conclude that the null
hypothesis is implausible. On the other hand, a p-value in excess of α = 0.05 reveals that if the
null hypothesis H0 is true, then the chances of observing the kind of data actually collected (or
worse) is at least 5 in 100. In other words, if the null hypothesis is true then it is not at all
unlikely that a Jury sees the kind of evidence presented to them in court. Consequently, the null
hypothesis is a plausible statement and should be accepted.

One decision criterion is therefore: only reject the null hypothesis if it is implausible, i.e.
only when the p-value is less than the significance level. This illustration reveals that the p-
value can also be loosely interpreted as the probability of the null hypothesis being true, with
the significance level defining what is implausible.

Another decision criterion is based on the test statistic. A test statistic measures the size
of the discrepancy between the observed evidence (or data) seen by the Jury and the null
hypothesis. If this discrepancy is large enough, i.e. if the test statistic is large enough, the null
hypothesis is rejected. A critical value for the test statistic is chosen so that if the test statistic
exceeds the critical value, the null hypothesis is rejected. To ensure that this criterion comes to
the same conclusion as that based on the p-value, the critical value must be defined in a specific
way. The critical value for a test statistic is that value that makes the probability of making a
Type I error small, i.e. that makes the probability of making a Type I error no more than the
significance level α.
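As a small illustration that the two decision criteria agree, the following Python sketch (a minimal example with a hypothetical lower-tailed Z test statistic) applies both rules side by side.

from scipy.stats import norm

alpha = 0.05
z_stat = -1.21                 # hypothetical Z test statistic for a lower-tailed test
p_value = norm.cdf(z_stat)     # P(Z <= Z*), about 0.113
z_crit = norm.ppf(alpha)       # critical value, about -1.64
print(p_value < alpha)         # False: the p-value criterion accepts H0
print(z_stat < z_crit)         # False: the critical value criterion also accepts H0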

B. A Research Engineer.

Surprisingly, research or industrial engineers often find themselves faced with the same
scenarios that Jurors face. Consider a scenario in which an industrial engineer is in charge of a
chemical distillation process where the engineer needs to control the oxygen purity at an
average of 90% by deciding on the percentage of hydrocarbons to be present in the main
condenser (hydrocarbon content affects the oxygen content). To do this the engineer must
collect evidence or data by setting the hydrocarbon content at a certain value (for example
1.2%) and then measuring the oxygen content of the distilled compound.

However, the engineer will face uncertainty about the true or population average oxygen
content (just like a juror is uncertain about whether the defendant lied when asked if he was
guilty). The engineer cannot know the population mean because that would involve measuring
oxygen content from this refinery operating at 1.2% hydrocarbon over the plant's entire life.
But the engineer needs to control the oxygen content now, i.e. needs to know the true average
content now when using 1.2% hydrocarbons. So, the best the engineer can do is to collect a
sample of observations, over a short time interval, from this population. But the average from
this sample may or may not correspond to the true population average. Just as the guilt of a
defendant is unsure to a Juror, so the population mean oxygen content is not known with
certainty to the engineer.

The engineer therefore needs to hypothesize about the population mean, μ. Given the need
in this illustration to control the process at an average oxygen content of 90% the null and
alternative hypotheses would take the form: either the population or true average is 90% or it
is not

H0: μ = 90 versus Ha: μ ≠ 90


This is an example of a two-sided hypothesis because the alternative allows for the
population average to be either side of that specified under the null hypothesis. If in turn the
industrial engineer considers an implausible hypothesis as one that has at most a 5% chance of
occurrence, then α = 0.05 (the engineer wishes to limit the chances of making a Type I error to
at most 5%).

In general, it is possible to form three different types of competing hypothesis:

H0: μ = μ0 versus Ha: μ ≠ μ0   A two-sided hypothesis.

H0: μ ≤ μ0 versus Ha: μ > μ0   A one-sided hypothesis.
H0: μ ≥ μ0 versus Ha: μ < μ0   A one-sided hypothesis.

for a specified value of the true population mean, μ0 (in the above illustration μ0 = 90%). In the
one-sided versions, allowance is made for the population average to be either below or above
(but not either side of) μ0.

C. The Z Test Statistic.

This should only be used when the population variance σ² is known. It should also only
be used when certain other assumptions or conditions are met. As we saw in week 5, if the
parent population is normally distributed (in this example if oxygen content is normally
distributed), then the means calculated from numerous random samples of size n also have a
normal distribution with a mean equal to the parent population mean (μ) but a standard
deviation equal to the parent population standard deviation divided by the square root of the
sample size. This is illustrated in the top two graphs below. The top graph shows the parent
population that is described by the normal distribution with a population mean of μ0 = 90 (the
peak point of the distribution) and a population variance of σ² = 20. Notice the parent
population is drawn as if the null hypothesis is true because it peaks at μ0 = 90. The variable X
on the horizontal axis is the oxygen content. The second figure shows the distribution for the
sample means x̄ when calculated from samples of size n = 25 (note x̄ is now shown on the
horizontal axis). Notice the distributions for x̄ are drawn as normal distributions (bell shaped
and symmetric around the mean) because if the parent population is normal, any average
calculated from a sample of data taken from this population will also be normally distributed.
This is true no matter how small the sample size is.

The blue bell shaped distribution is scaled to the top horizontal axis and shows that the
distribution of the sample means is much tighter than that of the parent population because the
sample mean has a variance of σ²/n = 20/25 = 0.8 (which is much less than 20, the variance
of the parent population from which the sample was taken). The black bell shaped
distribution is also the distribution of the sample means but re-scaled along the bottom
horizontal axis (so it is not really wider than the blue distribution).
[Figure: the parent population of oxygen content X, normal with mean 90 and variance 20 (top), and the much tighter distribution of the sample means x̄ for samples of size n = 25, shown in blue against the parent scale and in black against its own re-scaled axis (bottom).]

In summary the Z test can only be used if:

1. The population variance is known and


2. The sample mean has a normal distribution. This will be so if either the population from
which the sample was taken is normally distributed (even when the sample size is very small)
or, if the population is not normally distributed, the sample mean is computed from a large
sample taken from the population (as then the central limit theorem guarantees that x̄ will be
normally distributed). For practical purposes a large sample corresponds to n > 30.
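The Z statistic defined in this section can be sketched in a few lines of Python (offered only as a cross-check on the Excel work; the function name z_statistic is an illustrative choice).

from math import sqrt

def z_statistic(xbar, mu0, var, n):
    # Z* = sqrt(n) * (sample mean - hypothesised mean) / population standard deviation
    return sqrt(n) * (xbar - mu0) / sqrt(var)

print(round(z_statistic(88.92, 90, 20, 25), 2))   # -1.21, matching the worked example below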

i. A one-sided problem.

The sheet "Oxygen Purity" in Excel file "Data Examples" contains an actual sample of
data collected by the engineer from the distillation column (i.e. collected from the population)
obtained when the hydrocarbon content was at 1.2%. A total of n = 25 observations were taken.
The sample mean is x̄ = 88.92% and the sample standard deviation is 4.28%. Suppose it is also
known that the population variance is σ² = 20. Consider the one-sided hypothesis

H0: μ ≥ μ0 versus Ha: μ < μ0

where μ0 = 90% and the engineer views an implausible outcome as one with at most a 5%
chance of happening (α = 0.05). The discrepancy between the sample data set and the null
hypothesis is then measured through the Z test statistic

Z* = √n (x̄ - μ0) / σ

Clearly the discrepancy is smallest when x̄ = μ0 as this gives a test statistic of Z* = 0.


Such a value indicates that the sample mean coincides exactly with the hypothesised value of
the population mean μ0. The discrepancy between the sample data set and the null hypothesis
then increases as the value of this test statistic decreases. This is illustrated in the line graph
below where there is more discrepancy between the sample mean calculated from data set II
and the population mean value specified by H0 than between the sample mean calculated from
data set I and the population mean value specified by H0. Given the null hypothesis is of the
form μ ≥ μ0, worse data sets than data set II have values of the test statistic in the bold region
and so

p-value = P(Z ≤ Z*)

[Line diagram: the test statistic for data set I lies closer to 0 than that for data set II; "worse" sample averages calculated from worse data sets than data set II have values of the test statistic in the bold region to the left of Z*.]

Measuring discrepancy in the specific way given by Z* is no coincidence as illustrated


in the figure below. It helps quantify the p-value and the critical value. Recall that standardising
the sample mean x̄ is achieved by subtracting the mean value for x̄ and dividing through by its
standard deviation. Thus Z*, as defined above, is the standardised value of the sample mean
provided the null hypothesis is true (i.e. provided the population mean is μ0). If x̄ is normally
distributed then Z* has a standard normal distribution. The third graph below shows this
standardised normal distribution, where Z is shown on the bottom horizontal axis and the
unstandardized sample means on the top horizontal axis. The standardised value of x̄ = 88.92%
is

Z* = √25 (88.92 - 90) / √20 = -1.21
This Z test statistic is shown by the red square on the horizontal axis of the bottom
graph (corresponding to the sample mean shown by the red square on the horizontal axis of the
middle graph). Because Z* follows a standard normal distribution, P(Z ≤ Z*) or the p-value can
be obtained from the Z tables introduced in earlier lectures or values from that table can be
looked up in Excel using

=(1- NORM.S.DIST(ABS(Z*),TRUE))

where ABS(Z*) = |Z*| is the absolute value of Z*. Inserting the Z test statistic value into this
formula in Excel gives

=(1- NORM.S.DIST(ABS(-1.21),TRUE)) = 0.113

This probability equals the dotted shaded area under the distribution to the left of the
test statistic in the third graph below. Thus, the p-value = 0.113 is the probability that the
engineer could collect a sample of n = 25 observations that produces a sample mean of 88.92
or less (i.e. or worse) when the population mean really is that described by H0 (i.e. when μ =
90 or more). That is, there is an 11.35% chance of observing sample averages that are that far
below the hypothesised true mean of μ = 90 or more. This is above the definition of
implausibility (a probability of at most α = 0.05), meaning it is plausible to see sample means
this far from the hypothesised true mean of 90% or more. So the engineer actually getting a
sample average x̄ = 88.92 would not be surprising to him/her and so does not provide sufficient
evidence to reject the view that the population average is 90% or more. The engineer has
observed x̄ = 88.92 and this is plausible (could realistically have occurred) if the population
mean is at least 90. The only way the engineer could then logically explain getting x̄ = 88.92
is to accept the null hypothesis and reject the alternative that the true mean is less than 90
(which would make getting the sample average as low as 88.92 less likely).

Put differently, the p-value = 0.1135 is the probability of the null hypothesis being true,
i.e. the probability that the true population mean is 90 or more. As this probability is above α =
0.05, the engineer considers this a plausible hypothesis and so the null hypothesis is accepted.

Another criterion is based on the critical value. This is worked out from the significance
level. Simply identify α in the main body of the Z table and read off the corresponding Z value.
Denote this value by Zα. Zα is called the critical value. There is then a (100α) percent chance
of the Z test statistic being less than (or equal to) Zα. So the null hypothesis is only rejected if
the test statistic is less than this critical value, i.e. if Z* < Zα. Zα can also be found in Excel
using

=NORM.S.INV(α)

Using α = 0.05 gives

=NORM.S.INV(0.05) = -1.64

This is shown in the third graph below by the vertical red line. Thus Zα = -1.64. With
Z* = -1.21, Z* > Zα and so the engineer accepts the null hypothesis. This is because observing
Z* = -1.21 has more than an α = 0.05 chance of occurring (it is not an implausible value) and
because Z* was constructed using a population mean of 90, it follows logically that μ = 90 or
more is also a plausible set of values.

[Figure: the standard normal distribution with the test statistic Z* = -1.21 marked, the critical value -1.64 shown by a vertical red line, and the lower-tail p-value of 0.113 shaded.]
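The whole of this one-sided test can be cross-checked outside Excel. A minimal Python sketch, with scipy's norm.cdf and norm.ppf standing in for NORM.S.DIST and NORM.S.INV:

from math import sqrt
from scipy.stats import norm

z = sqrt(25) * (88.92 - 90) / sqrt(20)   # Z* = -1.21
p_value = norm.cdf(z)                    # P(Z <= Z*), about 0.113
z_alpha = norm.ppf(0.05)                 # critical value Zα, about -1.64
print(z < z_alpha)                       # False, so H0: μ ≥ 90 is accepted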
ii. Another one-sided problem.

Consider the one-sided hypothesis

H0: μ ≤ μ0 versus Ha: μ > μ0

where μ0 = 87% and the engineer views an implausible outcome as one with at most a 5%
chance of happening (α = 0.05). The discrepancy between the sample data set and the null
hypothesis is again measured through the Z test statistic

Z* = √n (x̄ - μ0) / σ

Clearly the discrepancy is smallest when x̄ = μ0 as this gives a test statistic of Z* = 0.


Such a value indicates that the sample mean coincides exactly with the hypothesised value of
the population mean μ0. The discrepancy between the sample data set and the null hypothesis
then increases as the value of this test statistic increases. This is illustrated in the line graph
below where there is more discrepancy between the sample mean calculated from data set II
and the population mean value specified by H0 than between the sample mean calculated from
data set I and the population mean value specified by H0. Given the null hypothesis is of the
form μ ≤ μ0, worse data sets than data set II have values of the test statistic in the bold region
and so

p-value = P(Z ≥ Z*)

[Line diagram: the test statistic for data set I lies closer to 0 than that for data set II; "worse" sample averages calculated from worse data sets than data set II have values of the test statistic in the bold region to the right of Z*.]

The standardised value of x̄ = 88.92% is

Z* = √25 (88.92 - 87) / √20 = 2.15
This Z test statistic is shown by the red square on the horizontal axis of the bottom
graph (corresponding to the sample mean shown by the red square on the horizontal axis of the
middle graph). Because Z* follows a standard normal distribution, P(Z ≥ Z*) can be obtained
from the Z tables introduced in earlier lectures or values from that table can be looked up in
Excel using

=(1- NORM.S.DIST(ABS(Z*),TRUE))

where ABS(Z*) = |Z*| is the absolute value of Z*. Inserting the Z test statistic value into this
formula in Excel gives

=(1- NORM.S.DIST(ABS(2.15),TRUE)) = 0.016.

This probability equals the dotted shaded area under the distribution to the right of the
test statistic in the third graph below. Thus the p-value = 0.016 is the probability that the
engineer could collect a sample of n = 25 observations that produces a sample mean of 88.92
or more (i.e. or worse) when the population mean really is that described by H0 (i.e. when μ =
87 or less). That is, there is only a 1.6% chance of observing sample averages that are that far
above the hypothesised true mean of μ = 87 or less. This is below the definition of
implausibility (a probability of at most α = 0.05), meaning it is implausible to see sample means
this far above the hypothesised true mean of 87% or less. So the engineer actually getting a
sample average x̄ = 88.92 would be surprising to him/her and so does provide sufficient
evidence to reject the view that the population average is 87% or less. The engineer has
observed x̄ = 88.92 and this is implausible (should not have occurred) if the population mean
is no more than 87. The only way the engineer could then logically explain getting x̄ = 88.92
is to reject the null hypothesis and accept the alternative that the true mean is above 87 (which
would make getting the sample average as high as 88.92 more likely).

Put differently, the p-value = 0.016 is the probability of the null hypothesis being true,
i.e. the probability that the true population mean is 87 or less. As this probability is below α =
0.05, the engineer considers this an implausible hypothesis and so the null hypothesis is rejected.

For this one-sided test simply identify 1-α in the main body of the Z table and read off
the corresponding Z value. Denote this value by Z1-α. Z1-α is called the critical value. There is
then a (100α) percent chance of the Z test statistic being more than Z1-α. So the null hypothesis
is only rejected if the test statistic is more than this critical value, i.e. if Z* > Z1-α. Z1-α can also
be found in Excel using

=NORM.S.INV(1-α)

Using α = 0.05 gives

=NORM.S.INV(0.95) = 1.64

This is shown in the third graph below by the vertical red line. Thus Z1-α = 1.64. With
Z* = 2.15, Z* > Z1-α and so the engineer rejects the null hypothesis. This is because observing
Z* = 2.15 has less than an α = 0.05 chance of occurring (it is therefore an implausible value)
and because Z* was constructed using a population mean of 87, it follows logically that μ = 87
or less is also not a plausible set of values.
[Figure: the parent population centred at 87 (top), the distribution of the sample means for n = 25 (middle), and the standard normal distribution (bottom) with the test statistic Z* = 2.15 marked, the critical value 1.64 shown by a vertical red line, and the upper-tail p-value of 0.016 shaded. Bottom axis: Z or student t value.]
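This upper-tailed test can be cross-checked the same way; a minimal Python sketch with scipy standing in for the Excel functions:

from math import sqrt
from scipy.stats import norm

z = sqrt(25) * (88.92 - 87) / sqrt(20)   # Z* = 2.15
p_value = 1 - norm.cdf(z)                # P(Z >= Z*), about 0.016
z_crit = norm.ppf(0.95)                  # critical value Z1-α, about 1.64
print(z > z_crit)                        # True, so H0: μ ≤ 87 is rejected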
iii. Two-sided problems

Consider the two-sided hypothesis testing problem

H0: μ = μ0 versus Ha: μ ≠ μ0

where μ0 = 90% and the engineer views an implausible outcome as one with at most a 5%
chance of happening (α = 0.05). The discrepancy between the sample data set and the null
hypothesis can be measured through the Z test statistic

Z* = √n (x̄ - μ0) / σ

Clearly the discrepancy is smallest when x̄ = μ0 as this gives a test statistic of Z* = 0.
Such a value indicates that the sample mean coincides exactly with the hypothesised value μ0
of the population mean. The discrepancy between the sample data set and the null hypothesis
then increases as the absolute value of this test statistic increases (|Z*| is used to denote the
absolute value of Z*). This is illustrated in the line graph below where there is more
discrepancy between the sample mean calculated from data set II and the population mean
value specified by H0 than between the sample mean calculated from data set I and the
population mean value specified by H0. Given the null hypothesis is of the form μ = μ0, worse
data sets than data set II have values of the test statistic in the bolder regions and so

p-value = P(Z ≥ |Z*|) + P(Z ≤ -|Z*|)

[Line diagram: the test statistic for data set I lies closer to 0 than that for data set II; "worse" sample averages calculated from worse data sets than data set II have values of the test statistic in the bold regions beyond -|Z*| and |Z*|.]

The standardised value of x̄ = 88.92% is

Z* = √25 (88.92 - 90) / √20 = -1.21

This Z test statistic is shown by the red square on the horizontal axis of the bottom
graph (corresponding to the sample mean shown by the red square on the horizontal axis of the
middle graph). Because the standard normal distribution is symmetric around zero, it follows
that

p-value = P(Z ≥ |Z*|) + P(Z ≤ -|Z*|) = 2P(Z ≥ |Z*|)

and because Z* follows a standard normal distribution this probability can be obtained from the
Z tables introduced in earlier lectures or values from that table can be looked up in Excel using

=2*(1- NORM.S.DIST(ABS(Z*),TRUE))

Inserting the Z test statistic value into this formula in Excel gives

=2*(1- NORM.S.DIST(ABS(-1.21),TRUE)) = 0.227

Half this probability equals the dotted shaded area under the distribution to the left of
the test statistic in the third graph below. As such, 0.113 gives the chances that the engineer
could collect and observe a sample of n = 25 observations that produce a sample mean of x̄ =
88.92 (that corresponds to Z* = -1.21) or less when the population mean is that described by the
H0 (i.e. when μ = 90). Less here means further below the hypothesised mean and so worse.
Because the normal distribution is symmetric it also follows that 0.113 is also the chances of
Z being more than 1.21, which corresponds to x̄ = 91.08 or more. As such, 0.113 gives the chances
that the engineer could collect and observe a sample of n = 25 observations that produce a
sample mean of x̄ = 91.08 (that corresponds to Z* = 1.21) or more when the population mean
is that described by the H0 (i.e. when μ = 90). More here means further above the
hypothesised mean and so worse.

Thus the p-value = 2*0.113 = 0.227 is the probability that you could collect and observe a
sample of n = 25 observations that produces a sample mean outside the range 88.92 to 91.08
when the population mean is that described by the H0 (i.e. when μ = 90). This is the sum of the
two shaded areas in the third graph below. That is, there is a 22.7% chance of observing sample
averages that are this far above or below the hypothesised true mean of 90. This is well above
the definition of implausibility (a probability of at most α = 0.05), meaning it is plausible to see
sample means this far or more above and below the hypothesised true mean of 90%. So the
engineer actually getting a sample average x̄ = 88.92 would not be surprising to him/her and
so does not provide sufficient evidence to reject the view that the population average is 90%.
The engineer has observed x̄ = 88.92 and this is plausible (could have occurred) if the
population mean is 90. The only way the engineer could then logically explain getting x̄ =
88.92 is to accept the null hypothesis and reject the alternative that the true mean is different
from 90 (which would make getting the sample average different from 88.92 less likely).

The same conclusion can be obtained using the critical values. For this two-sided test
first identify α/2 in the main body of the Z table and read off the corresponding Z value. Denote
this value by Zα/2. Zα/2 is called the lower critical value. Next identify 1-α/2 in the main body
of the Z table and read off the corresponding Z value. Denote this value by Z1-α/2. Z1-α/2 is called
the upper critical value. There is then a (100α) percent chance of the Z test statistic being
more than Z1-α/2 or less than Zα/2. So the null hypothesis is only rejected if the test statistic is
less than the lower critical value or more than the upper critical value, i.e. if Z* < Zα/2 or
Z* > Z1-α/2. Zα/2 can also be found in Excel using

=NORM.S.INV(α/2)

Using α = 0.05 gives

=NORM.S.INV(0.025) = -1.96

Because the standard normal distribution is symmetric around zero, it follows that Z1-α/2 = 1.96.
These critical values are shown in the third graph below by the vertical red lines.
With Z* = -1.21, Zα/2 < Z* < Z1-α/2 and so the engineer accepts the null hypothesis. This is because
observing Z* = -1.21 or less has more than an α/2 = 0.025 chance of occurring (it is therefore a
plausible value) and indeed observing Z* = 1.21 or more has more than an α/2 = 0.025 chance
of occurring. Because Z* was constructed using a population mean of 90, it follows logically
that μ = 90 is also a plausible value. Notice the engineer looks at either side of the distribution
because the alternative hypothesis states the population mean is different from 90 (could be
more or less than 90).
[Figure: the parent population of X centred at 90 (top), the distribution of the sample means for n = 25 (middle), and the standard normal distribution (bottom) with the test statistic Z* = -1.21 marked, the critical values -1.96 and 1.96 shown by vertical red lines, and tail areas of 0.113 shaded on each side. Bottom axis: Z or student t value.]
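The two-sided test can also be cross-checked with a short Python sketch (scipy in place of the Excel functions):

from math import sqrt
from scipy.stats import norm

z = sqrt(25) * (88.92 - 90) / sqrt(20)       # Z* = -1.21
p_value = 2 * (1 - norm.cdf(abs(z)))         # two-sided p-value, about 0.227
lo, hi = norm.ppf(0.025), norm.ppf(0.975)    # critical values -1.96 and 1.96
print(z < lo or z > hi)                      # False, so H0: μ = 90 is accepted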
D. The t Test Statistic.

This test statistic should be used when the population variance is not known. The
procedure is then to use the sample standard deviation in place of the population standard
deviation in the calculation of the test statistic. This leads to a new test statistic called the t test
statistic

t* = √n (x̄ - μ0) / s

When the sample size is small (n ≤ 30), and as shown in the last lecture, this test statistic
has a student t distribution with v = n - 1 degrees of freedom if and only if the sample from
which x̄ and s are calculated comes from a normal distribution. But when the sample size is
large this test statistic will also have a student t distribution with n - 1 degrees of freedom even
if the sample was taken from a non-normal population. When n is larger than 30 the t
distribution is very close to the standard normal distribution.
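How quickly the t distribution approaches the standard normal can be seen from a quick Python sketch (purely illustrative, using scipy):

from scipy.stats import norm, t

for df in (5, 11, 30, 100):
    print(df, round(t.ppf(0.95, df), 3))   # upper 5% point of t: 2.015, 1.796, 1.697, 1.660
print(round(norm.ppf(0.95), 3))            # 1.645, the standard normal limit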

i. A one-sided problem.

You are an engineer responsible for safety at a power generating plant and need to
replace some pipework. To prevent failure, the steel pipework must have a tensile strength of
at least 2.99 MPa. You have sourced a manufacturer who claims their 1.25Cr pipes meet this
criterion. You test n = 12 specimens supplied by the manufacturer - see sheet "Alloy Hardness"
in Excel file "Data Examples" for the results of these tests. The sample mean is x̄ = 2.8758
MPa and the sample standard deviation is 0.2569 MPa. To provide an answer to the above
problem the following null hypotheses:

H0: μ ≥ μ0 versus Ha: μ < μ0

are appropriate, where μ0 = 2.99. Suppose the engineer views an implausible outcome as one
with at most a 10% chance of happening (α = 0.1). The discrepancy between the sample data
set and the null hypothesis is then measured through the t test statistic

t* = √n (x̄ - μ0) / s

Given the null hypothesis is of the form μ ≥ μ0, the p-value is given by

p-value = P(t ≤ -|t*|)

If x̄ is normally distributed then t* has a student t distribution with v = n - 1 = 12 - 1 =
11 degrees of freedom. Given the small sample size, x̄ will only be normally distributed if
hardness itself is normally distributed. Under this assumption, the third graph below shows this
t distribution, where t is shown on the bottom horizontal axis and the unstandardized sample
means on the top horizontal axis. The standardised value of x̄ = 2.8758 is

t* = √12 (2.8758 - 2.99) / 0.2569 = -1.539
This t test statistic is shown by the red square on the horizontal axis of the bottom graph
(corresponding to the sample mean shown by the red square on the horizontal axis of the middle
graph). Because t* follows a student t distribution under the normality assumption, this p-value
can be (approximately) obtained from the t table introduced in the last lecture.

Reading along the 11 degrees of freedom row, the two numbers closest to the t test
statistic value are highlighted in blue. Reading the probabilities shown along the top row of these
two columns shows that the probability of observing these t values or less is between 0.05 and
0.10. The p-value is the probability of getting these t test statistic values or less (or worse) and
so the p-value is also somewhere between 0.05 and 0.1. Its exact value can be found in Excel
using

=(1- T.DIST(ABS(t*),v,TRUE))

where ABS(t*) = |t*| is the absolute value of t* and v is the degrees of freedom. Inserting the t
test statistic and v values into this formula in Excel gives

=(1- T.DIST(ABS(-1.539),12-1,TRUE)) = 0.076

This probability equals the dotted shaded area under the distribution to the left of the
test statistic in the third graph below. Thus the p-value = 0.076 is the probability that the
engineer could receive a sample of n = 12 observations that produces a sample mean of 2.8758
or less (i.e. or worse) when the population mean really is that described by H0 (i.e. when μ =
2.99 or more). That is, there is a 7.6% chance of observing sample averages that are that far
below the hypothesised true mean of μ = 2.99. This is below the definition of implausibility
(a probability of at most α = 0.1), meaning it is implausible to see sample means this far below
the hypothesised true mean of 2.99 MPa or more. So the engineer actually getting a sample
average x̄ = 2.8758 would be surprising to him/her and so does provide sufficient evidence to
reject the view that the population average is 2.99 MPa. The engineer has observed x̄ = 2.8758
and this is implausible (should not have occurred) if the population mean is 2.99 or more. The
only way the engineer could then logically explain getting x̄ = 2.8758 is to reject the null
hypothesis and accept the alternative that the true mean is below 2.99 (which would make
getting the sample average as low as 2.8758 much more likely).

Put differently, the p-value = 0.076 is the probability of the null hypothesis being true,
i.e. the probability that the true population mean is 2.99 or more. As this probability is below
α = 0.1, the engineer considers this an implausible hypothesis and so the null hypothesis is
rejected.

Another criterion is based on the critical value. This is worked out from the significance
level. Simply identify α along the top row of the student t table and read off the corresponding
t value where this column intersects the row with 11 degrees of freedom. Denote this value by
tα. tα is called the critical value, and in this case tα = -1.363. There is then a (100α) percent
chance of the t test statistic being less than tα. So the null hypothesis is only rejected if the test
statistic is less than this critical value, i.e. if t* < tα. tα can also be found in Excel using

=T.INV(α,v)

Using α = 0.1 gives

=T.INV(0.1,11) = -1.363

This is shown in the third graph below by the vertical red line. Thus tα = -1.363. With
t* = -1.539, t* < tα and so the engineer rejects the null hypothesis. This is because observing
t* = -1.539 has less than an α = 0.1 chance of occurring (it is an implausible value) and because
t* was constructed using a population mean of 2.99, it follows logically that μ = 2.99 or more
is also an implausible set of values.
[Figure: the parent population centred at 2.99 (top), the distribution of the sample means for n = 12 (middle), and the t distribution with 11 degrees of freedom (bottom) with the test statistic t* = -1.539 marked, the critical value -1.363 shown by a vertical red line, and the lower-tail p-value of 0.076 shaded. Bottom axis: Z or student t value.]
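This one-sided t test can be cross-checked in Python, with scipy's t.cdf and t.ppf standing in for T.DIST and T.INV:

from math import sqrt
from scipy.stats import t

tstat = sqrt(12) * (2.8758 - 2.99) / 0.2569   # t* = -1.539
p_value = t.cdf(tstat, 11)                    # P(t <= t*), about 0.076
t_alpha = t.ppf(0.10, 11)                     # critical value tα, about -1.363
print(tstat < t_alpha)                        # True, so H0: μ ≥ 2.99 is rejected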
ii. Another one-sided problem.

You are an engineer responsible for safety at a power generating plant and need to
replace some pipework. The steel pipework develops cracking issues when bending into shape
if the tensile strength reaches 2.75 MPa or more. You have sourced a manufacturer who claims
their 1.25Cr pipes meet this criterion. You test n = 12 specimens supplied by the manufacturer
- see sheet "Alloy Hardness" in Excel file "Data Examples" for the results of these tests. The
sample mean is x̄ = 2.8758 MPa and the sample standard deviation is 0.2569 MPa. To provide
an answer to the above problem the following null hypotheses:

H0: μ ≤ μ0 versus Ha: μ > μ0

are appropriate, where μ0 = 2.75. Suppose the engineer views an implausible outcome as
one with at most a 1% chance of happening (α = 0.01). The discrepancy between the sample
data set and the null hypothesis is then measured through the t test statistic

t* = √n (x̄ - μ0) / s

Given the null hypothesis is of the form μ ≤ μ0, the p-value is given by

p-value = P(t ≥ |t*|)

If x̄ is normally distributed then t* has a student t distribution with v = n - 1 = 12 - 1 =
11 degrees of freedom. Given the small sample size, x̄ will only be normally distributed if
hardness itself is normally distributed. Under this assumption, the third graph below shows this
t distribution, where t is shown on the bottom horizontal axis and the unstandardized sample
means on the top horizontal axis. The standardised value of x̄ = 2.8758 MPa is

t* = √12 (2.8758 - 2.75) / 0.2569 = 1.697

This t test statistic is shown by the red square on the horizontal axis of the bottom graph
(corresponding to the sample mean shown by the red square on the horizontal axis of the middle
graph). Because t* follows a student t distribution under the normality assumption, this p-value
can be (approximately) obtained from the t table introduced in the last lecture.

Reading along the 11 degrees of freedom row, the two numbers closest to the t test
statistic value are highlighted in blue. Reading the probabilities shown along the top row of these
two columns shows that the probability of observing these t values or more (worse) is between
1-0.9 = 0.1 and 1-0.95 = 0.05. This is the p-value for this type of null hypothesis and so the p-
value is somewhere between these limits. Its exact value can be found in Excel using

=(1-T.DIST(ABS(t*),v,TRUE))

where ABS(t*) = |t*| is the absolute value of t* and v is the degrees of freedom. Inserting the t
test statistic and v values into this formula in Excel gives

=(1-T.DIST(ABS(1.697),12-1,TRUE)) = 0.059

This probability equals the area under the distribution to the right of the test statistic in
the third graph below. Thus the p-value = 0.059 is the probability that the engineer could
receive a sample of n = 12 observations that produces a sample mean of 2.8758 or more (i.e.
or worse) when the population mean really is that described by H0 (i.e. when μ = 2.75 or less).
That is, there is a 5.9% chance of observing sample averages that are that far above the
hypothesised true mean of μ = 2.75 or less. This is above the definition of implausibility (a
probability of at most α = 0.01), meaning it is plausible to see sample means this far from the
hypothesised true mean of 2.75 MPa. So the engineer actually getting a sample average x̄ =
2.8758 would not be surprising to him/her and so does not provide sufficient evidence to reject
the view that the population average is 2.75 MPa or less. The engineer has observed x̄ =
2.8758 and this is plausible (could realistically have occurred) if the population mean is 2.75
or less. The only way the engineer could then logically explain getting x̄ = 2.8758 is to accept
the null hypothesis and reject the alternative that the true mean is above 2.75.

Put differently, the p-value = 0.059 is the probability of the null hypothesis being true,
i.e. the probability that the true population mean is 2.75 or less. As this probability is above
α = 0.01, the engineer considers this a plausible hypothesis and so the null hypothesis is
accepted.

Another criterion is based on the critical value. This is worked out from the significance
level. Simply identify 1-α along the top row of the student t table and read off the corresponding
t value where this column intersects the row with 11 degrees of freedom. Denote this value by
t1-α. t1-α is called the critical value, and in this case t1-α = 2.72. There is then a (100α) percent
chance of the t test statistic being more than t1-α. So the null hypothesis is only rejected if the
test statistic is more than this critical value, i.e. if t* > t1-α. t1-α can also be found in Excel
using

=T.INV(1-α,v)

Using α = 0.01 gives

=T.INV(0.99,11) = 2.72

This is shown in the third graph below by the vertical red line. Thus t1-α = 2.72. With
t* = 1.697, t* < t1-α and so the engineer accepts the null hypothesis. This is because observing
t* = 1.697 or more has more than an α = 0.01 chance of occurring (it is a plausible value) and
because t* was constructed using a population mean of 2.75, it follows logically that μ = 2.75
or less is also a plausible set of values.
[Figure: the parent population centred at 2.75 (top), the distribution of the sample means for n = 12 (middle), and the t distribution with 11 degrees of freedom (bottom) with the test statistic t* = 1.697 marked, the critical value 2.72 shown by a vertical red line, and the upper-tail p-value of 0.059 shaded. Bottom axis: Z or student t value.]
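Again a minimal Python cross-check (scipy in place of the Excel functions):

from math import sqrt
from scipy.stats import t

tstat = sqrt(12) * (2.8758 - 2.75) / 0.2569   # t* = 1.697
p_value = 1 - t.cdf(tstat, 11)                # P(t >= t*), about 0.059
t_crit = t.ppf(0.99, 11)                      # critical value t1-α, about 2.72
print(tstat > t_crit)                         # False, so H0: μ ≤ 2.75 is accepted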
iii. A two-sided problem.

You are an engineer responsible for safety at a power generating plant and need to
replace some pipework. A manufacturer claims their 1.25Cr pipes have a hardness of 2.99 MPa.
You test n = 12 specimens supplied by the manufacturer to try and verify their claim - see
sheet "Alloy Hardness" in Excel file "Data Examples" for the results of these tests. The sample
mean is x̄ = 2.8758 MPa and the sample standard deviation is 0.2569 MPa. To provide an
answer to the above problem the following null hypotheses:

H0: μ = μ0 versus Ha: μ ≠ μ0

are appropriate, where μ0 = 2.99. Suppose the engineer views an implausible outcome as
one with at most a 5% chance of happening (α = 0.05). The discrepancy between the sample
data set and the null hypothesis is then measured through the t test statistic

t* = √n (x̄ - μ0) / s

If x̄ is normally distributed then t* has a student t distribution with v = n - 1 = 12 - 1 =
11 degrees of freedom. Given the small sample size, x̄ will only be normally distributed if
hardness itself is normally distributed. Given the null hypothesis is of the form μ = μ0 and
because the t distribution is symmetric around zero, the p-value is given by

p-value = P(t ≥ |t*|) + P(t ≤ -|t*|) = 2P(t ≥ |t*|)

Under this normality assumption, the third graph below shows this t distribution, where
t is shown on the bottom horizontal axis and the unstandardized sample means on the top
horizontal axis. The standardised value of x̄ = 2.8758 is

t* = √12 (2.8758 - 2.99) / 0.2569 = -1.539

This t test statistic is shown by the red square on the horizontal axis of the bottom graph
(corresponding to the sample mean shown by the red square on the horizontal axis of the middle
graph). Because t* follows a student t distribution under the normality assumption, this
probability can be (approximately) obtained from the t table introduced in the last lecture.
Reading along the 11 degrees of freedom row, the two numbers closest to the t test
statistic value are highlighted in blue. Reading the probabilities shown along the top row of these
two columns shows that the probability of observing these t values or less is between 0.1 and
0.05. With a two-sided hypothesis (allowing the population mean to be either below or above
2.99) we also need to allow for the probability of t* exceeding 1.539, which given the symmetric
nature of the t distribution is also between 0.1 and 0.05. Thus the p-value is between 2*0.1 and
2*0.05, i.e. between 0.2 and 0.1. Its exact value can be found in Excel using

=2*(1- T.DIST(ABS(t*),v,TRUE))

where ABS(t*) = |t*| is the absolute value of t* and v is the degrees of freedom. Inserting the t
test statistic and v values into this formula in Excel gives

=2*(1- T.DIST(ABS(-1.539),12-1,TRUE)) = 0.152

Thus the p-value = 0.152 is the probability that you could collect and observe a sample
of n = 12 observations that produces a sample mean outside the range 2.8758 to 3.104 when
the population mean is that described by the H0 (i.e. when μ = 2.99). This is the sum of the two
shaded areas in the third graph below. That is, there is a 15.2% chance of observing sample
averages that are this far above or below the hypothesised true mean of 2.99. This is well above
the definition of implausibility (a probability of at most α = 0.05), meaning it is plausible to see
sample means this far or more above and below the hypothesised true mean of 2.99 MPa. So
the engineer actually getting a sample average x̄ = 2.8758 would not be surprising to him/her
and so does not provide sufficient evidence to reject the view that the population average is
2.99 MPa. The engineer has observed x̄ = 2.8758 and this is plausible (could have occurred)
if the population mean is 2.99. The only way the engineer could then logically explain getting
x̄ = 2.8758 is to accept the null hypothesis and reject the alternative that the true mean is
different from 2.99 (which would make getting the sample average different from 2.8758 less
likely).
Put differently, the p-value = 15.2% is the probability of the null hypothesis being true,
i.e. the probability that the true population mean is 2.99. As this probability is above α = 0.05,
the engineer considers this a plausible hypothesis and so the null hypothesis is accepted.

The same conclusion can be obtained using the critical values. For this two-sided test
first identify α/2 along the top row of the t table and then locate the t value that intersects this
column and the row with 11 degrees of freedom. Denote this value by tα/2. tα/2 is called the
lower critical value. Next identify 1-α/2 along the top row of the t table and then locate the t
value that intersects this column and the row with 11 degrees of freedom. Denote this value by
t1-α/2. t1-α/2 is called the upper critical value. There is then a (100α) percent chance of the t test
statistic being more than t1-α/2 or less than tα/2. So the null hypothesis is only rejected if the
test statistic is less than the lower critical value or more than the upper critical value, i.e. if t*
< tα/2 or t* > t1-α/2. tα/2 can also be found in Excel using

=T.INV(α/2,v)

Using α = 0.05 gives

=T.INV(0.025,11) = -2.20

Because the student t distribution is symmetric around zero, it follows that t1-α/2 = 2.20.
These critical values are shown in the third graph below by the vertical red lines.
With t* = -1.539, tα/2 < t* < t1-α/2 and so the engineer accepts the null hypothesis. This is because
observing t* = -1.539 or less has more than an α/2 = 0.025 chance of occurring (it is therefore
a plausible value) and indeed observing t* = 1.539 or more has more than an α/2 = 0.025 chance
of occurring (it is also a plausible value). Because t* was constructed using a population mean
of 2.99, it follows logically that μ = 2.99 is also a plausible value. Notice the engineer looks at
either side of the distribution because the alternative hypothesis states the population mean is
different from 2.99 (could be more or less than 2.99).
[Figure: the parent population centred at 2.99 (top), the distribution of the sample means for n = 12 (middle), and the t distribution with 11 degrees of freedom (bottom) with the test statistic t* = -1.539 marked, the critical values -2.20 and 2.20 shown by vertical red lines, and tail areas of 0.076 shaded on each side. Bottom axis: Z or student t value.]
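Finally, the two-sided t test can be cross-checked with the same kind of Python sketch (scipy standing in for T.DIST and T.INV):

from math import sqrt
from scipy.stats import t

tstat = sqrt(12) * (2.8758 - 2.99) / 0.2569   # t* = -1.539
p_value = 2 * (1 - t.cdf(abs(tstat), 11))     # two-sided p-value, about 0.152
lo, hi = t.ppf(0.025, 11), t.ppf(0.975, 11)   # critical values -2.20 and 2.20
print(tstat < lo or tstat > hi)               # False, so H0: μ = 2.99 is accepted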
