Probability, Statistics and Random Processes: Hypothesis Testing-1

Probability, Statistics and Random Processes (IC 210)
Hypothesis Testing-1
Reference: Introductory Statistics by Prem S. Mann, Chapter 9 (available on Moodle)
Inferential Statistics
Researchers use inferential statistics to address two broad goals:
1. Estimate the value of population parameters
2. Hypothesis testing
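Statistics:
1. Model: $X_i \sim N(\mu, \sigma^2),\ i = 1, 2, \dots, n$, iid.
2. Estimation: $\bar{x}$ estimates $\mu$, and $s^2$ estimates $\sigma^2$
3. Hypothesis test: $\mu = \mu_0$, $\sigma^2 = \sigma_0^2$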
Hypothesis testing
The purpose of hypothesis testing is to determine whether there is
enough statistical evidence in favor of a certain belief about a parameter.
For example:
A soft-drink company may claim that, on average, its cans contain 12
ounces of soda. A government agency may want to test whether or not
such cans do contain, on average, 12 ounces of soda. Here we are to
test a hypothesis about the population mean μ.
According to a survey, 75% of the total charitable contributions in
2008 were given by individuals. An economist wants to check whether this
percentage still holds this year. Here we are to test a hypothesis
about the population proportion p.
Hypothesis testing
Hypothesis testing is designed to detect significant differences:
differences that did not occur by random chance.
In the one-sample case, we compare a random sample (from a large group)
to a population.
We compare a sample statistic to a population parameter to see if there
is a significant difference.
Nonstatistical Hypothesis Testing
A criminal trial is an example of hypothesis testing without the
statistics.
Based on the available evidence, the judge or jury will make one
of two possible decisions:
1. The defendant is innocent or not guilty
2. The defendant is guilty
At the outset of the trial, the person is presumed not guilty. The
prosecutor's efforts are to prove that the person has committed
the crime and, hence, is guilty.
In statistics, "the person is not guilty" is called the null hypothesis,
and "the person is guilty" is called the alternative hypothesis.
The null hypothesis is denoted by H0:
H0: The person is not guilty
The alternative hypothesis is denoted by H1:
H1: The person is guilty
At the beginning of the trial it is assumed that the person is not
guilty. The null hypothesis is usually the hypothesis that is assumed to
be true to begin with.
In statistics, the null hypothesis states that a given claim
(or statement) about a population parameter is true.
Therefore, convicting the defendant is called rejecting the null
hypothesis in favor of the alternative hypothesis. That is, the
jury is saying that there is enough evidence to conclude that
the defendant is guilty (i.e., there is enough evidence to
support the alternative hypothesis).
Example: soft drink
A soft-drink company claims that, on average, its cans contain
12 ounces of soda. In reality, this claim may not be true.
However, we will initially assume that the company's claim
is true (that is, the company is not guilty of cheating and
lying).
To test the claim of the soft-drink company, the null
hypothesis is that the company's claim is true.
H0: μ = 12 ounces
The null hypothesis can also be written as μ ≥ 12 ounces, because the
company's claim will still be true in that case.
H1: μ < 12 ounces
How do we judge the plausibility of the null hypothesis?
The sample mean should be plausible under the sampling distribution
of the mean.
[Figure: sampling distribution p(X̄) with three observed sample means
marked as implausible, fairly plausible, and highly plausible]
The further the observed value is from the mean of the expected
distribution, the more significant the difference.
Plausibility of the null hypothesis
The plausibility of the null hypothesis is judged by computing the
probability p of observing a sample mean that is at least as
deviant from the population mean as the value we have observed.
[Figure: sampling distribution p(X̄) with the tail area p shaded]
Plausibility of the null hypothesis
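This computation is simplified by converting to z-scores:
$$ z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} $$
Under the assumption of normality, we can determine
this probability from a standard normal table.
[Figure: standard normal density p(z) with the tail area p beyond the observed z shaded]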
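As a rough illustration of the z-score conversion, the sketch below standardizes a sample mean and looks up the tail probability with Python's standard library instead of a printed normal table. The numbers (sample mean 11.9, σ = 0.3, n = 36) are hypothetical values chosen for the soda example and do not come from the slides; the function names are likewise only illustrative.

```python
from statistics import NormalDist


def z_score(sample_mean, mu0, sigma, n):
    """Standardize a sample mean under H0: z = (x_bar - mu0) / (sigma / sqrt(n))."""
    standard_error = sigma / n ** 0.5
    return (sample_mean - mu0) / standard_error


def left_tail_probability(z):
    """Probability of a z-score at least this far below zero if H0 is true."""
    return NormalDist().cdf(z)


# Hypothetical values (not from the slides): claimed mean 12 oz, sigma 0.3 oz,
# and a sample of 36 cans averaging 11.9 oz.
z = z_score(sample_mean=11.9, mu0=12.0, sigma=0.3, n=36)
p = left_tail_probability(z)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p here would mean the observed sample mean is implausible under the null hypothesis.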
Two Types of Error (in nonstatistical example)
The person has not committed the crime but is declared
guilty. In this case, the court has made an error by punishing
an innocent person. In statistics, this kind of error is
called a type I or an α (alpha) error.
The person has committed the crime but, because of
lack of evidence, is declared not guilty. In this case, the court
has committed an error by setting a guilty person free. In statistics,
this kind of error is called a type II or a β (beta) error.
Two Types of Error (statistical example)
A type I error will occur when H0 is actually true (that is, the cans
do contain, on average, 12 ounces of soda), but it just happens that
we draw a sample with a mean that is much less than 12 ounces
and we wrongfully reject the null hypothesis H0.
The value of α, called the significance level of the test, represents
the probability of making a type I error. In other words, α is the
probability of rejecting the null hypothesis when in fact it is true.
α = P(H0 is rejected | H0 is true)
Note: the size of the rejection region depends on the value assigned
to α.
Two Types of Error (statistical example)
A type II error will occur when the null hypothesis is actually false
(that is, the soda contained in all cans, on average, is less than
12 ounces), but it happens by chance that we draw a sample with
a mean that is close to or greater than 12 ounces and we
wrongfully fail to reject H0.
The value of β represents the probability of making a type II error.
It represents the probability that H0 is not rejected when H0 is
false.
β = P(H0 is not rejected | H0 is false)
The value of 1 - β is called the power of the test. It represents
the probability of not making a type II error.
Two Types of Error

Jury Trial (H0: Innocent)
                 Actual Situation
Verdict          Innocent      Guilty
Innocent         Correct       Error
Guilty           Error         Correct

Hypothesis Test
                 Actual Situation
Decision         H0 True                             H0 False
Accept H0        Correct (1 - α)                     Type II Error (β), False Negative
Reject H0        Type I Error (α), False Positive    Power (1 - β)
Type I and Type II Errors
Type I error (false rejection error): rejecting a true null hypothesis.
Its probability is α.
Type II error (false acceptance error): failing to reject a false null
hypothesis. Its probability is β.

                              Actual Situation
Researcher's Decision         Null Hypothesis is True             Null Hypothesis is False
Accept the Null Hypothesis    p(accept H0 | H0 true) = 1 - α      p(accept H0 | H0 false) = β
Reject the Null Hypothesis    p(reject H0 | H0 true) = α          p(reject H0 | H0 false) = 1 - β (power)

The two error probabilities α and β are inversely related. Decreasing one
increases the other, for a fixed sample size.
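The trade-off between the two error probabilities can be sketched numerically. The example below assumes a left-tailed z-test with known σ and uses hypothetical values (claimed mean 12, true mean 11.9, σ = 0.3, n = 36) that are not part of the slides; it only illustrates that shrinking α raises β when the sample size is held fixed.

```python
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha):
    """beta for a left-tailed z-test of H0: mu = mu0 against H1: mu < mu0."""
    se = sigma / n ** 0.5
    z_crit = NormalDist().inv_cdf(alpha)   # negative critical value
    cutoff = mu0 + z_crit * se             # reject H0 when the sample mean < cutoff
    # beta = P(sample mean >= cutoff) when the true mean is mu_true
    return 1.0 - NormalDist(mu_true, se).cdf(cutoff)


# Hypothetical scenario: the cans really contain 11.9 oz on average.
betas = {
    alpha: type_ii_error(mu0=12.0, mu_true=11.9, sigma=0.3, n=36, alpha=alpha)
    for alpha in (0.10, 0.05, 0.01)
}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}  power = {1 - beta:.3f}")
```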
Note
By rejecting H0, we are saying that the difference between
the value of μ stated in H0 and the value of x̄ obtained from
the sample is too large to have occurred because of
sampling error alone. Consequently, this difference is real.
By not rejecting H0, we are saying that the difference
between the value of μ stated in H0 and the value of x̄
obtained from the sample is small and may have
occurred because of sampling error alone.
Tailed Tests
Two-tailed hypothesis test: A hypothesis test in which the region of
rejection falls equally within both tails of the sampling distribution.
One-tailed hypothesis test: A hypothesis test in which the alternative
is stated in such a way that the probability of making a Type I error is
entirely in one tail of a sampling distribution.
Right-tailed test: A one-tailed test in which the sample outcome is
hypothesized to be at the right tail of the sampling distribution.
Left-tailed test: A one-tailed test in which the sample outcome is
hypothesized to be at the left tail of the sampling distribution.
Note: Whether a test is two-tailed or one-tailed is determined by the
sign in the alternative hypothesis (≠ for two-tailed, < for left-tailed,
> for right-tailed).
Two-Tailed Tests
Example: According to a survey conducted in 2008, the backpacks of a
sample of sixth graders in schools weighed an average of 18.4
pounds. A magazine wants to check whether or not this
mean has changed since that survey.
H0: the mean weight has not changed, μ = 18.4
H1: the mean weight has changed, μ ≠ 18.4
Right-tailed test
Example: The average price of homes in New Jersey was
$461,216 in 2007. Suppose a real estate researcher wants to
check whether the current mean price of homes in this town is
higher than $461,216.
H0: μ = $461,216
H1: μ > $461,216
Left-tailed test
Example: The company claims that its soft-drink cans, on
average, contain 12 ounces of soda. However, if these cans
contain less than the claimed amount of soda, then the company
can be accused of cheating. Suppose a consumer agency wants
to test whether the mean amount of soda per can is less than 12
ounces.
H0: μ = 12 ounces (the mean is equal to 12 ounces)
H1: μ < 12 ounces (the mean is less than 12 ounces)
One-tail vs. Two-tail Test
Hypothesis tests
Type I and type II errors
Type I error: H0 rejected, when H0 is true.
Type II error: H0 not rejected, when H0 is false.
Significance level: α is the probability of committing a
Type I error.
[Figure: one-sided test with rejection area α in one tail; two-sided test with rejection area α/2 in each tail]
Example: Metal Cylinder Production
The machine that produces metal cylinders is set to
make cylinders with a diameter of 50 mm.
The two-sided hypotheses of interest are
H0: μ = 50 versus
HA: μ ≠ 50
where the null hypothesis states that the machine is
calibrated correctly.
Example: Car Fuel Efficiency
A manufacturer claims that its cars achieve an average of
at least 35 miles per gallon in highway driving.
The one-sided (left-tailed test) hypotheses of interest
are
H0: μ ≥ 35 versus
H1: μ < 35
The null hypothesis states that the manufacturer's
claim regarding the fuel efficiency of its cars is correct.
Approaches to Hypothesis Testing
There are two approaches to test whether
the sample mean supports the alternative
hypothesis (H1):
1. The rejection region method
2. The p-value method
The Rejection Region Method
The rejection region is a range of values such that if
the test statistic falls within that range, the null
hypothesis is rejected in favour of the alternative
hypothesis.
Steps in rejection region method
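1. Construct appropriate hypotheses.
2. Determine the test statistic to be used.
3. Determine the critical value.
4. Compare the test statistic with the critical value. Reject
the null hypothesis if the test statistic falls in the rejection
region (for a right-tailed test, if it is greater than the critical
value; for a left-tailed test, if it is less than the critical value).
5. Make an appropriate conclusion.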
Calculating Test Statistics
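For one-sample tests, use the Z test statistic if the population is
normal and σ is known, or if the sample size is large.
For one-sample tests, use the t statistic if the population standard
deviation σ is not known, in particular if the sample size is small
(less than 30).
$$ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}, \qquad s_{\bar{X}} = \frac{s_X}{\sqrt{N}} $$
$$ z_c = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} $$
Example value from the slide: $z_c = 1.80$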
Procedure
First we find the critical value(s) of z from the normal
distribution table for the given significance level.
Then we find the value of the test statistic z for the observed
value of the sample statistic.
Finally we compare these two values and make a decision.
Remember, if the test is one-tailed, there is only one critical
value of z, and it is obtained by using the value of α, which gives
the area in the left or right tail of the normal distribution curve
depending on whether the test is left-tailed or right-tailed,
respectively. However, if the test is two-tailed, there are two
critical values of z, and they are obtained by using an area of α/2 in
each tail of the normal distribution curve.
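The table lookup in this procedure can be mimicked in code. The following is a minimal sketch using the inverse CDF of the standard normal distribution from Python's standard library; the function name and the "left"/"right"/"two" labels are illustrative choices, not notation from the slides.

```python
from statistics import NormalDist


def critical_values(alpha, tail):
    """Critical z value(s) for a test at significance level alpha.

    tail is "left", "right", or "two" (labels chosen for this sketch).
    """
    std_normal = NormalDist()
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)        # area alpha in the left tail
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)    # area alpha in the right tail
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)      # area alpha/2 in each tail
        return (-z, z)
    raise ValueError("tail must be 'left', 'right', or 'two'")


for tail in ("left", "right", "two"):
    values = ", ".join(f"{z:.3f}" for z in critical_values(0.05, tail))
    print(f"{tail:>5}-tailed, alpha = 0.05: {values}")
```

For α = 0.05 this gives roughly ±1.645 for the one-tailed tests and ±1.960 for the two-tailed test, matching the usual table entries.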
Hypothesis Setups for Testing a Mean (μ)
Hypothesis Setups for Testing a Proportion (p)
Problem: A used car dealer says that the mean price of a 1995
Ford F-150 Super Cab is at least $16,500. You suspect this claim is
incorrect and find that a random sample of 14 similar vehicles has a
mean price of $15,700 and a standard deviation of $1250. Is there
enough evidence to reject the dealer's claim at α = 0.05?
Solution:
The claim is "the mean price is at least $16,500."
H0: μ ≥ $16,500 (Claim) and H1: μ < $16,500
Because the test is a left-tailed test, the whole level of significance
α = 0.05 lies in the left tail.
There are d.f. = 14 - 1 = 13 degrees of freedom and the critical value
(from the table) is t = -1.771.
The rejection region is t < -1.771. Using the t-test, the standardized
test statistic is:
$$ t_0 = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{15{,}700 - 16{,}500}{1250/\sqrt{14}} \approx -2.39 $$
Since t0 < t, we reject H0.
[Figure: t distribution with the rejection region t < -1.771 and the test statistic t0 marked]
Because t0 is in the rejection region, you should decide to
reject the null hypothesis. There is enough evidence at the 5% level of
significance to reject the claim that the mean price of a 1995 Ford F-150
Super Cab is at least $16,500.
Example: An industrial company claims that the mean pH
level of the water in a nearby river is 6.8. You randomly
select 19 water samples and measure the pH of each. The
sample mean and standard deviation are 6.7 and 0.24,
respectively.
Is there enough evidence to reject the
company's claim at α = 0.05? Assume the population is
normally distributed.
The claim is "the mean pH level is 6.8." So, the null and alternative
hypotheses are:
H0: μ = 6.8 (Claim) and Ha: μ ≠ 6.8
Because the test is a two-tailed test, the level of significance α = 0.05
is split into 0.025 in each tail.
There are d.f. = 19 - 1 = 18 degrees of freedom and the critical values are
-t = -2.101 and t = 2.101. The rejection regions are t < -2.101 and t >
2.101. Using the t-test, the standardized test statistic is:
$$ t_0 = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{6.7 - 6.8}{0.24/\sqrt{19}} \approx -1.82 $$
[Figure: t distribution with the rejection regions t < -2.101 and t > 2.101 and the test statistic t0 marked]
Because t0 is not in the rejection region, you should decide
not to reject the null hypothesis. There is not enough evidence at the 5%
level of significance to reject the claim that the mean pH is 6.8.
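The two t-test examples above follow the same arithmetic, which can be checked with a short sketch. The critical values are copied from the worked solutions (they come from a t table, not from code), and the function name is only illustrative.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, sample_sd, n):
    """Standardized test statistic t0 = (x_bar - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (sample_sd / sqrt(n))


# Used-car example: left-tailed test, d.f. = 13, table value t = -1.771.
t_car = t_statistic(sample_mean=15_700, mu0=16_500, sample_sd=1_250, n=14)
reject_car = t_car < -1.771

# River pH example: two-tailed test, d.f. = 18, table values t = +/-2.101.
t_ph = t_statistic(sample_mean=6.7, mu0=6.8, sample_sd=0.24, n=19)
reject_ph = abs(t_ph) > 2.101

print(f"car: t0 = {t_car:.2f}, reject H0: {reject_car}")
print(f"pH:  t0 = {t_ph:.2f}, reject H0: {reject_ph}")
```

Only the comparison rule changes between the two cases: a one-sided inequality for the left-tailed test and an absolute-value comparison for the two-tailed test.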
t distribution table
Probability Values
Z statistic (obtained): The test statistic
computed by converting a sample statistic
(such as the mean) to a Z score. The
formula for obtaining Z varies from test to
test.
P value: The probability associated with the
obtained value of Z.
The p-Value Approach
In this procedure, we find a probability value such that a
given null hypothesis is rejected for any α (significance level)
greater than this value and is not rejected for any α less
than this value.
In this approach, we calculate the p-value for the test,
which is defined as the smallest level of significance at
which the given null hypothesis is rejected.
Using this p-value, we state the decision. If we have a
predetermined value of α, then we compare the value of p
with α and make a decision.
Probability Values
Alpha (α): The level of probability at which
the null hypothesis is rejected. It is
customary to set alpha at the .05, .01, or .001
level.
Example: Normal Body Temperature
What is normal body temperature? Is it actually
37.6°C (on average)?
State the null and alternative hypotheses
H0: μ = 37.6°C
Ha: μ ≠ 37.6°C
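Example: Normal Body Temp (cont)
Data: random sample of n = 18 normal body temps
37.2  36.4  36.8  36.6  38.0  37.4
37.6  37.0  37.2  38.2  36.8  37.6
37.4  36.1  38.7  36.2  37.2  37.5
Summarize data with a test statistic
Variable       n    Mean    SD     SE      |t0|   P
Temperature    18   37.22   0.68   0.161   2.38   0.029
$$ t_0 = \frac{\text{sample mean} - \text{null value}}{\text{standard error}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$
The sample mean is below the null value, so t0 itself is negative
(about -2.38); the table reports its magnitude.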
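The summary row can be reproduced from the 18 readings with Python's `statistics` module. This is only a sketch of the arithmetic; the variable names are illustrative, and the p-value is left to the t table since the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# The 18 body temperatures (degrees C) listed in the example.
temps = [
    37.2, 36.4, 36.8, 36.6, 38.0, 37.4,
    37.6, 37.0, 37.2, 38.2, 36.8, 37.6,
    37.4, 36.1, 38.7, 36.2, 37.2, 37.5,
]
mu0 = 37.6                        # null value from H0

n = len(temps)
x_bar = mean(temps)               # sample mean
s = stdev(temps)                  # sample standard deviation (n - 1 divisor)
se = s / sqrt(n)                  # standard error of the mean
t0 = (x_bar - mu0) / se           # test statistic

print(f"n = {n}, mean = {x_bar:.2f}, SD = {s:.2f}, SE = {se:.3f}, t0 = {t0:.2f}")
```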
STUDENT'S t DISTRIBUTION TABLE

Degrees of    Two-tailed probability (p value)
freedom       0.10      0.05      0.01
1             6.314     12.706    63.657
5             2.015     2.571     4.032
10            1.813     2.228     3.169
17            1.740     2.110     2.898
20            1.725     2.086     2.845
24            1.711     2.064     2.797
25            1.708     2.060     2.787
∞             1.645     1.960     2.576

The 0.05 column puts an area of 0.025 in each tail.
Example: Normal Body Temp (cont)
Find the p-value
Df = n - 1 = 18 - 1 = 17
From t table: t17,.025 = 2.11, so the rejection regions are
t < -2.11 and t > +2.11
Calculated |t0| = 2.38
Since |t0| > t, reject the null hypothesis
p-value = 0.029
Example: Normal Body Temp (cont)
Decide whether or not the result is statistically
significant based on the p-value
Using α = 0.05 as the level of significance criterion,
the results are statistically significant because
0.029 is less than 0.05. In other words, we can reject
the null hypothesis.
Report the Conclusion
We can conclude, based on these data, that the
mean temperature in the human population
does not equal 37.6°C.
Example using p-value
We want to see whether our data confirm a specific
hypothesis
Example: NYC Blackout Baby Boom
Data is births per day from two weeks in August 1966
Test against usual birth rate in NYC (430 births/day)
1. Formulate your hypotheses:
Need a Null Hypothesis and an Alternative Hypothesis
2. Calculate the test statistic:
Test statistic summarizes the difference between data
and your null hypothesis
3. Find the p-value for the test statistic:
How probable is your data if the null hypothesis is true?
Null and Alternative Hypotheses
Null Hypothesis (H0):
no effect or no change in the population
Alternative hypothesis (Ha):
real difference or real change in the population
If there is a large discrepancy between data and null
hypothesis, then we will reject the null hypothesis
NYC dataset: μ = mean birth rate in Aug. 1966
Null hypothesis is that the blackout has no effect on birth
rate, so August 1966 should be the same as any
other month
H0: μ = 430 (usual birth rate for NYC)
Ha: μ ≠ 430
Test Statistic
The test statistic measures the difference between
the observed data and the null hypothesis
How many standard deviations is our observed
sample value from the hypothesized value?
$$ T = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} $$
For our birth rate dataset, the observed sample mean
is 433.6 and our hypothesized mean is 430
Assume population variance σ² = sample variance s²
p-value
The p-value is the probability of observing a sample value at least as
extreme as ours if our null hypothesis is true
If the null hypothesis is true, then the test statistic T follows
a standard normal distribution
[Figure: standard normal density with tail areas prob = 0.367 below T = -0.342 and prob = 0.367 above T = 0.342]
If our alternative hypothesis was one-sided
(Ha: μ > 430), then our p-value would be 0.367
Since our alternative hypothesis was two-sided, our p-value is the sum
of both tail probabilities (0.734)
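The tail probabilities for the birth-rate example can be checked against the standard normal CDF. This is a minimal sketch that takes the reported test statistic T = 0.342 as given; the function name is illustrative, and small differences from the slide's 0.367 and 0.734 are expected from table rounding.

```python
from statistics import NormalDist


def normal_p_values(t):
    """Right-tail and two-sided p-values for a standard normal test statistic."""
    std_normal = NormalDist()
    right_tail = 1 - std_normal.cdf(t)             # for Ha: mu > mu0
    two_sided = 2 * (1 - std_normal.cdf(abs(t)))   # for Ha: mu != mu0
    return right_tail, two_sided


one_sided, two_sided = normal_p_values(0.342)
print(f"one-sided p = {one_sided:.3f}, two-sided p = {two_sided:.3f}")
```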
Statistical Significance
Is test statistic T = 0.342 statistically significant?
If the p-value is smaller than α, we say the difference is
statistically significant at level α
The α-level is also used as a threshold for rejecting the
null hypothesis (most common α = 0.05)
If the p-value < α, we reject the null hypothesis that
there is no change or difference
The p-value = 0.734 for the NYC data, so we cannot
reject the null hypothesis at α-level of 0.05
Difference between null hypothesis and our data is not
statistically significant
Data do not support the idea that there was a
different birth rate than usual for the first two weeks
of August, 1966
Tests and Intervals
There is a close connection between confidence
intervals and two-sided hypothesis tests
A 100·C% confidence interval contains likely values
for a population parameter, like the population mean μ
Interval is centered around the sample mean
Width of interval is a multiple of the standard error σ/√n
An α-level hypothesis test rejects the null hypothesis
that μ = μ0 if the test statistic T has a p-value less
than α
Tests and Intervals
If our confidence level C is equal to 1 - α, where α is
the level of the hypothesis test, then we have the
following connection between tests and intervals:
A two-sided hypothesis test rejects the null
hypothesis (μ = μ0) if our hypothesized value μ0
falls outside the confidence interval for μ
So, if we have already calculated a confidence interval
for μ, then we can test any hypothesized value μ0 just
by checking whether or not μ0 is in the interval!
Example: NYC blackout baby boom
Births per day from two weeks in August 1966
Difference between our sample mean and the
population mean μ0 = 430 had a p-value of 0.734, so
we did not reject the null hypothesis at α-level of 0.05
We could have also calculated a 100(1-α)% = 95%
confidence interval
Since our hypothesized μ0 = 430 is within our interval
of likely values, we do not reject the null hypothesis.
If the hypothesis was μ0 = 410, then we would reject it!
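The interval itself is not given in the text, but a rough version can be sketched from the reported numbers. The standard error below is not stated in the slides: it is backed out from the reported T = 0.342 via T = (x̄ - μ0)/SE, which is an assumption of this sketch, as is the use of the normal critical value 1.96.

```python
from statistics import NormalDist

x_bar, mu0, t_reported = 433.6, 430.0, 0.342

# Assumption: SE implied by the reported test statistic, T = (x_bar - mu0) / SE.
se = (x_bar - mu0) / t_reported

z_star = NormalDist().inv_cdf(0.975)               # about 1.96 for 95% confidence
ci = (x_bar - z_star * se, x_bar + z_star * se)


def rejects(mu_hypothesized, interval):
    """Two-sided test via the interval: reject if the value falls outside it."""
    low, high = interval
    return not (low <= mu_hypothesized <= high)


print(f"95% CI: ({ci[0]:.1f}, {ci[1]:.1f})")
print("reject mu0 = 430:", rejects(430, ci))
print("reject mu0 = 410:", rejects(410, ci))
```

Under these assumptions, 430 lies inside the interval and 410 lies outside it, in line with the two conclusions stated above.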
Example: Hypothesis Test for Calcium
Let μ be the mean calcium intake for people below the
poverty line
Null hypothesis is that calcium intake for people below the
poverty line is not different from the RDA: μ = μ0 = 850 mg/day
Two-sided alternative hypothesis: μ ≠ 850 mg/day
To calculate the test statistic, we need to know the
population standard deviation of daily calcium intake.
From a previous study, we know σ = 188 mg
Need p-value: if μ = 850, what is the probability we get a
sample mean as extreme as (or more extreme than) 747?
p-value for Calcium
We have a two-sided alternative, so the p-value includes standard
normal probabilities on both sides:
[Figure: standard normal density with tail areas prob = 0.010 below T = -2.32 and prob = 0.010 above T = 2.32]
Looking up the probability in the table, we see that the two-sided
p-value is 0.010 + 0.010 = 0.02
Since the p-value is less than 0.05, we can reject the null
hypothesis
Conclusion: people below the poverty line have significantly (at the
α = 0.05 level) lower calcium intake than the RDA
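The calcium calculation can be sketched as follows. The sample size is not stated in the text; n = 18 is an assumption here, chosen because it reproduces the reported T = -2.32 with σ = 188 and a sample mean of 747.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma = 747.0, 850.0, 188.0
n = 18   # assumption: not given in the text, inferred from the reported T = -2.32

t = (x_bar - mu0) / (sigma / sqrt(n))       # z-type test statistic, sigma known
p_value = 2 * NormalDist().cdf(-abs(t))     # two-sided: both tails

print(f"T = {t:.2f}, two-sided p-value = {p_value:.3f}")
print("reject H0 at alpha = 0.05:", p_value < 0.05)
```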
Confidence Interval for Calcium
Alternatively, we calculate a confidence interval for
the calcium intake of people below the poverty line
Use confidence level 100·C = 100(1-α) = 95%
95% confidence level means critical value Z* = 1.96
Since our hypothesized value μ0 = 850 mg is not in
the 95% confidence interval, we can reject that
hypothesis right away!
Cautions about Hypothesis Tests
Statistical significance does not necessarily mean
real significance
Lack of significance does not necessarily mean that
the null hypothesis is true
If sample size is large, even very small differences can have
a low p-value
If sample size is small, there could be a real difference, but
we are not able to detect it
Many assumptions went into our hypothesis tests
Presence of outliers, low sample sizes, etc. make our
assumptions less realistic
We will try to address some of these problems next class
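The caution about sample size can be made concrete with a small sketch. All numbers below are hypothetical (a fixed difference of 0.5 units with σ = 10); they show how the same small difference moves from clearly non-significant to highly significant as n grows.

```python
from statistics import NormalDist


def two_sided_p(sample_mean, mu0, sigma, n):
    """Two-sided p-value of a z-test with known sigma."""
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical: observed mean 100.5 against mu0 = 100 with sigma = 10.
p_values = {n: two_sided_p(100.5, 100.0, 10.0, n) for n in (25, 400, 10_000)}
for n, p in p_values.items():
    print(f"n = {n:>6}: p-value = {p:.4g}")
```

The difference being tested is identical in every row; only the sample size changes, which is why a low p-value alone says nothing about practical importance.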
