
Probability, Statistics and Random Processes
IC 210

Hypothesis Testing-1
Reference: Introductory Statistics by Prem S. Mann, Chapter 9 (available on Moodle)

Inferential Statistics

Researchers use inferential statistics to address two broad goals:
Estimate the value of population parameters
Hypothesis testing

Statistics:
1. Model: $X_i \sim N(\mu, \sigma^2),\ i = 1, 2, \ldots, n$, iid.
2. Estimation: $\hat{\mu} = \bar{x},\ \hat{\sigma}^2 = s^2$
3. Hypothesis test: $\mu = \mu_0,\ \sigma^2 = \sigma_0^2$

Hypothesis testing
The purpose of hypothesis testing is to determine whether there is
enough statistical evidence in favor of a certain belief about a parameter.
For Example:
A soft-drink company may claim that, on average, its cans contain 12
ounces of soda. A government agency may want to test whether or not
such cans do contain, on average, 12 ounces of soda. Here we are to
test a hypothesis about the population mean μ.
According to some survey, 75% of the total charitable contributions in
2008 were given by individuals. An economist wants to check if this
percentage is still true for this year. Here we are to test a hypothesis
about the population proportion p.

Hypothesis testing

Hypothesis testing is designed to detect significant differences: differences that did not occur by random chance.

In the one-sample case, we compare a random sample (from a large group) to a population.

We compare a sample statistic to a population parameter to see if there is a significant difference.

Nonstatistical Hypothesis Testing

A criminal trial is an example of hypothesis testing without the statistics.
Based on the available evidence, the judge or jury will make one of two possible decisions:

1. The defendant is innocent or not guilty
2. The defendant is guilty
At the outset of the trial, the person is presumed not guilty. The prosecutor's efforts are to prove that the person has committed the crime and, hence, is guilty.

Nonstatistical Hypothesis Testing


In statistics, "the person is not guilty" is called the null hypothesis, and "the person is guilty" is called the alternative hypothesis.
The null hypothesis is denoted by H0:
H0: The person is not guilty
The alternative hypothesis is denoted by H1:
H1: The person is guilty
At the beginning of the trial it is assumed that the person is not guilty. The null hypothesis is usually the hypothesis that is assumed to be true to begin with.

Nonstatistical Hypothesis Testing

In statistics, the null hypothesis states that a given claim (or statement) about a population parameter is true.
Therefore, convicting the defendant is called rejecting the null
hypothesis in favor of the alternative hypothesis. That is, the
jury is saying that there is enough evidence to conclude that
the defendant is guilty (i.e., there is enough evidence to
support the alternative hypothesis).

Example: soft drink

A soft-drink company claims that, on average, its cans contain 12 ounces of soda. In reality, this claim may not be true.

However, we will initially assume that the company's claim is true (that is, the company is not guilty of cheating and lying).
To test the claim of the soft-drink company, the null hypothesis is that the company's claim is true.

H0: μ = 12 ounces

The null hypothesis can also be written as μ ≥ 12 ounces, because the company's claim will still be true.
H1: μ < 12 ounces

How do we judge the plausibility of the null hypothesis?

The sample mean should be plausible under the sampling distribution of the mean.

[Figure: sampling distribution $p(\bar{X})$ of the sample mean, with three observed sample means marked as highly plausible, fairly plausible, and implausible.]

The further the observed value is from the mean of the expected distribution, the more significant the difference.

Plausibility of the null hypothesis

The plausibility of the null hypothesis is judged by computing the probability p of observing a sample mean that is at least as deviant from the population mean as the value we have observed.

[Figure: sampling distribution $p(\bar{X})$ with the tail area p marked beyond the observed sample mean.]

Plausibility of the null hypothesis

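This computation is simplified by converting to z-scores.
Under the assumption of normality, we can determine this probability from a standard normal table.

$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$

[Figure: standard normal density $p(z)$ with the tail area p marked beyond the observed z.]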
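As a rough illustration of the z-score conversion, the sketch below standardizes a sample mean with z = (x̄ − μ0) / (σ/√n) and reads the left-tail probability from the standard normal distribution. The soda-can numbers (x̄ = 11.85, σ = 0.5, n = 36) and the function name `z_score` are hypothetical and are not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_score(sample_mean, mu0, sigma, n):
    """Standardize a sample mean under H0: z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


# Hypothetical soda-can numbers, chosen only for illustration.
z = z_score(sample_mean=11.85, mu0=12.0, sigma=0.5, n=36)

# Probability of a sample mean at least this far below mu0 when H0 is true.
p_left = NormalDist().cdf(z)

print(round(z, 2), round(p_left, 4))
```

The standard library's `NormalDist` plays the role of the standard normal table mentioned on the slide.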

Two Types of Error (in nonstatistical example)


The person has not committed the crime but is declared guilty. In this case, the court has made an error by punishing an innocent person. In statistics, this kind of error is called a type I or an α (alpha) error.

The person has committed the crime but, because of lack of evidence, is declared not guilty. In this case, the court has committed an error by setting a guilty person free. In statistics, this kind of error is called a type II or a β (beta) error.

Two Types of Error (statistical example)
A type I error will occur when H0 is actually true (that is, the cans do contain, on average, 12 ounces of soda), but it just happens that we draw a sample with a mean which is much less than 12 ounces and we wrongfully reject the null hypothesis H0.

The value of α, called the significance level of the test, represents the probability of making a type I error. In other words, α is the probability of rejecting the null hypothesis when in fact it is true.
α = P(H0 is rejected | H0 is true)

Note: the size of the rejection region depends on the value assigned to α.
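One way to see what α = P(H0 is rejected | H0 is true) means is to simulate it. The sketch below repeatedly draws samples from a population in which H0 really holds and counts how often a left-tailed z-test rejects. The population values (μ0 = 12, σ = 0.5), the sample size n = 36, and the function name are assumptions made for illustration only.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_one_error_rate(mu0, sigma, n, alpha, trials, seed=0):
    """Fraction of samples drawn under a true H0 that a left-tailed z-test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(alpha)  # left-tail critical value (negative)
    se = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        if z < z_crit:  # test statistic falls in the rejection region
            rejections += 1
    return rejections / trials


# Hypothetical population: H0 is true (mu = 12), sigma assumed known.
rate = type_one_error_rate(mu0=12.0, sigma=0.5, n=36, alpha=0.05, trials=5000)
print(rate)  # should land close to alpha
```

The observed rejection rate fluctuates around α from run to run; it is an estimate, not an exact value.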

Two Types of Error (statistical example)
A type II error will occur when the null hypothesis is actually false (that is, the soda contained in all cans, on average, is less than 12 ounces), but it happens by chance that we draw a sample with a mean that is close to or greater than 12 ounces and we wrongfully fail to reject H0.
The value of β represents the probability of making a type II error. It represents the probability that H0 is not rejected when H0 is false.
β = P(H0 is not rejected | H0 is false)

The value of 1 − β is called the power of the test. It represents the probability of not making a type II error.

Two Types of Error

Jury trial (H0: innocent)

                 Actual situation
Verdict          Innocent      Guilty
Innocent         Correct       Error
Guilty           Error         Correct

Hypothesis test

                 Actual situation
Decision         H0 true                             H0 false
Accept H0        Correct (1 − α)                     Type II error (β), false negative
Reject H0        Type I error (α), false positive    Power (1 − β)

Type I and Type II Errors


Type I error (false rejection error): rejecting a true null hypothesis; its probability is α.
Type II error (false acceptance error): failing to reject a false null hypothesis; its probability is β.

                              Actual situation
Researcher's decision         Null hypothesis is true           Null hypothesis is false
Accept the null hypothesis    p(accept H0 | H0 true) = 1 − α    p(accept H0 | H0 false) = β
Reject the null hypothesis    p(reject H0 | H0 true) = α        p(reject H0 | H0 false) = 1 − β (power)

The two error probabilities, α and β, are inversely related: decreasing one increases the other, for a fixed sample size.
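The trade-off between the two error probabilities can be checked numerically. The sketch below computes β for a left-tailed z-test at several values of α with the sample size held fixed. The true mean of 11.8 ounces, σ = 0.5, n = 36, and the function name `beta_left_tailed` are hypothetical choices, not values from the slides.

```python
from math import sqrt
from statistics import NormalDist


def beta_left_tailed(mu0, mu1, sigma, n, alpha):
    """P(fail to reject H0: mu = mu0) when the true mean is mu1 < mu0 (left-tailed z-test)."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    z_crit = std_normal.inv_cdf(alpha)  # reject H0 when z < z_crit
    # Under the true mean mu1, z = (x_bar - mu0) / se has mean (mu1 - mu0) / se and sd 1.
    shift = (mu1 - mu0) / se
    power = std_normal.cdf(z_crit - shift)
    return 1.0 - power


# Hypothetical scenario: cans truly average 11.8 ounces, the claim is 12.
betas = {
    alpha: beta_left_tailed(mu0=12.0, mu1=11.8, sigma=0.5, n=36, alpha=alpha)
    for alpha in (0.10, 0.05, 0.01)
}
for alpha, beta in betas.items():
    print(alpha, round(beta, 3), round(1 - beta, 3))  # alpha, beta, power
```

Under these assumptions, lowering α from 0.10 to 0.01 raises β, which is the inverse relationship stated above.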

Note
By rejecting H0, we are saying that the difference between the value of μ stated in H0 and the value of x̄ obtained from the sample is too large to have occurred because of sampling error alone. Consequently, this difference is real.
By not rejecting H0, we are saying that the difference between the value of μ stated in H0 and the value of x̄ obtained from the sample is small and it may have occurred because of sampling error alone.

Tailed Tests

Two-tailed hypothesis test: A hypothesis test in which the region of rejection falls equally within both tails of the sampling distribution.

One-tailed hypothesis test: A hypothesis test in which the alternative is stated in such a way that the probability of making a Type I error is entirely in one tail of the sampling distribution.

Right-tailed test: A one-tailed test in which the sample outcome is hypothesized to be at the right tail of the sampling distribution.

Note: Whether a test is two-tailed or one-tailed is determined by the sign in the alternative hypothesis (≠ for two-tailed, < for left-tailed, > for right-tailed).

Two-Tailed Tests

Example: According to a survey conducted in 2008, a sample of sixth graders in schools weighed an average of 18.4 pounds. Some magazine wants to check whether or not this mean has changed since that survey.
H0: the mean weight has not changed, μ = 18.4
H1: the mean weight has changed, μ ≠ 18.4

Right-tailed test
Example: The average price of homes in New Jersey was $461,216 in 2007. Suppose a real estate researcher wants to check whether the current mean price of homes in this town is higher than $461,216.
H0: μ = $461,216
H1: μ > $461,216

Left-tailed test
Example: The company claims that their soft-drink cans, on
average, contain 12 ounces of soda. However, if these cans
contain less than the claimed amount of soda, then the company
can be accused of cheating. Suppose a consumer agency wants
to test whether the mean amount of soda per can is less than 12
ounces.
H0: μ = 12 ounces (the mean is equal to 12 ounces)
H1: μ < 12 ounces (the mean is less than 12 ounces)

One-tail vs. Two-tail Test

Hypothesis tests
Type I and type II errors
Type I error: H0 rejected, when H0 is true.
Type II error: H0 not rejected, when H0 is false.
Significance level: α is the probability of committing a Type I error.
One-sided test: the whole area α lies in a single tail of the sampling distribution.
Two-sided test: an area of α/2 lies in each tail.

Example: Metal Cylinder Production

The machine that produces metal cylinders is set to make cylinders with a diameter of 50 mm.

The two-sided hypotheses of interest are
H0: μ = 50 versus
HA: μ ≠ 50
where the null hypothesis states that the machine is calibrated correctly.

Example: Car Fuel Efficiency

A manufacturer claims that its cars achieve an average of at least 35 miles per gallon in highway driving.

The one-sided (left-tailed test) hypotheses of interest are
H0: μ ≥ 35 versus
H1: μ < 35

The null hypothesis states that the manufacturer's claim regarding the fuel efficiency of its cars is correct.

Approaches to Hypothesis Testing

There are two approaches to test whether the sample mean supports the alternative hypothesis (H1):
The rejection region method
The p-value method

The Rejection Region Method

The rejection region is a range of values such that if the test statistic falls within that range, the null hypothesis is rejected in favour of the alternative hypothesis.

Steps in rejection region method

Construct appropriate hypotheses
Determine the test statistic to be used
Determine the critical value
Compare the test statistic with the critical value. Reject the null hypothesis if the test statistic falls in the rejection region (for a right-tailed test, if the former is greater than the latter).
Make an appropriate conclusion.

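Calculating Test Statistics

For one-sample tests, use the Z test statistic if the population is normal, σ is known, or if the sample size is large.
For one-sample tests, use the t statistic if the population standard deviation is not known or if the sample size is small (less than 30).

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}, \qquad s_{\bar{X}} = \frac{s_x}{\sqrt{N}}$$

$$z_c = \frac{\bar{X} - \mu}{\sigma_{\bar{x}}}$$

[Figure labels from the slide: $\bar{X} = 265$, $z_c = 1.80$.]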

Procedure
First we find the critical value(s) of z from the normal
distribution table for the given significance level.
Then we find the value of the test statistic z for the observed
value of the sample statistic.
Finally we compare these two values and make a decision.
Remember, if the test is one-tailed, there is only one critical value of z, and it is obtained by using the value of α, which gives the area in the left or right tail of the normal distribution curve depending on whether the test is left-tailed or right-tailed, respectively. However, if the test is two-tailed, there are two critical values of z, and they are obtained by using an area of α/2 in each tail of the normal distribution curve.
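The procedure above can be sketched in a few lines. The helper names `z_critical_values` and `reject_h0` are hypothetical, and `NormalDist().inv_cdf` stands in for the normal distribution table. The value z = 1.80 is borrowed from the earlier slide label purely as an input; the slide does not say which tail or α it was used with.

```python
from statistics import NormalDist


def z_critical_values(alpha, tail):
    """Critical value(s) of z for a 'left', 'right', or 'two' tailed test."""
    std_normal = NormalDist()
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)        # area alpha in the left tail
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)    # area alpha in the right tail
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)      # area alpha/2 in each tail
        return (-z, z)
    raise ValueError("tail must be 'left', 'right', or 'two'")


def reject_h0(z, alpha, tail):
    """True if the observed z falls in the rejection region."""
    crit = z_critical_values(alpha, tail)
    if tail == "left":
        return z < crit[0]
    if tail == "right":
        return z > crit[0]
    return z < crit[0] or z > crit[1]


print(z_critical_values(0.05, "two"))
print(reject_h0(1.80, 0.05, "right"), reject_h0(1.80, 0.05, "two"))
```

The same observed z can lead to different decisions depending on the tail of the test, because a two-tailed test places only α/2 in each tail.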

Hypothesis Setups for Testing a Mean (μ)

Hypothesis Setups for Testing a Proportion (p)

Problem: A used car dealer says that the mean price of a 1995 Ford F-150 Super Cab is at least $16,500. You suspect this claim is incorrect and find that a random sample of 14 similar vehicles has a mean price of $15,700 and a standard deviation of $1250. Is there enough evidence to reject the dealer's claim at α = 0.05?

Solution:
The claim is "the mean price is at least $16,500."
H0: μ ≥ $16,500 (claim) and H1: μ < $16,500

Because the test is a left-tailed test, the level of significance is α = 0.05. There are d.f. = 14 − 1 = 13 degrees of freedom, and the critical value (from the table) is t = −1.771.
The rejection region is t < −1.771. Using the t-test, the standardized test statistic is:

$$t_0 = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{15{,}700 - 16{,}500}{1250/\sqrt{14}} \approx -2.39$$

Since t0 < t, we reject H0.

[Figure: t distribution showing the rejection region t < −1.771 and the standardized test statistic t0.]
Because t0 is in the rejection region, you should decide to reject the null hypothesis. There is enough evidence at the 5% level of significance to reject the claim that the mean price of a 1995 Ford F-150 Super Cab is at least $16,500.

Example: An industrial company claims that the mean pH level of the water in a nearby river is 6.8. You randomly select 19 water samples and measure the pH of each. The sample mean and standard deviation are 6.7 and 0.24, respectively. Is there enough evidence to reject the company's claim at α = 0.05? Assume the population is normally distributed.

The claim is "the mean pH level is 6.8." So, the null and alternative hypotheses are:
H0: μ = 6.8 (claim) and Ha: μ ≠ 6.8
The test is two-tailed, with level of significance α = 0.05. There are d.f. = 19 − 1 = 18 degrees of freedom, and the critical values are −t = −2.101 and t = 2.101. The rejection regions are t < −2.101 and t > 2.101. Using the t-test, the standardized test statistic is:

$$t_0 = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{6.7 - 6.8}{0.24/\sqrt{19}} \approx -1.82$$

[Figure: t distribution showing the rejection regions t < −2.101 and t > 2.101 and the standardized test statistic t0.]
Because t0 is not in the rejection region, you should decide not to reject the null hypothesis. There is not enough evidence at the 5% level of significance to reject the claim that the mean pH is 6.8.
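The arithmetic in the two worked t-test examples (the used-car prices and the river pH) can be reproduced with a short sketch. The function name `t_statistic` is hypothetical; the critical values −1.771 and ±2.101 are copied from the slides rather than computed, since the Python standard library has no t-distribution table.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, s, n):
    """One-sample t statistic: (x_bar - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / sqrt(n))


# Used-car example: left-tailed test, d.f. = 13, critical value from the slide.
t_car = t_statistic(sample_mean=15_700, mu0=16_500, s=1_250, n=14)
reject_car = t_car < -1.771

# River pH example: two-tailed test, d.f. = 18, critical values from the slide.
t_ph = t_statistic(sample_mean=6.7, mu0=6.8, s=0.24, n=19)
reject_ph = abs(t_ph) > 2.101

print(round(t_car, 2), reject_car)
print(round(t_ph, 2), reject_ph)
```

The first test rejects H0 and the second does not, matching the conclusions stated in the two examples.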

t distribution table

Probability Values
Z statistic (obtained): The test statistic computed by converting a sample statistic (such as the mean) to a Z score. The formula for obtaining Z varies from test to test.
P value: The probability associated with the obtained value of Z.

The p-Value Approach


In this procedure, we find a probability value such that a given null hypothesis is rejected for any α (significance level) greater than this value and is not rejected for any α less than this value.
In this approach, we calculate the p-value for the test, which is defined as the smallest level of significance at which the given null hypothesis is rejected.
Using this p-value, we state the decision. If we have a predetermined value of α, then we compare the value of p with α and make a decision.

Probability Values

Alpha (α): The level of probability at which the null hypothesis is rejected. It is customary to set alpha at the .05, .01, or .001 level.

Example: Normal Body Temperature


What is normal body temperature? Is it actually 37.6 °C (on average)?
State the null and alternative hypotheses:
H0: μ = 37.6 °C
Ha: μ ≠ 37.6 °C

Example: Normal Body Temp (cont.)

Data: random sample of n = 18 normal body temps (°C)
37.2  36.8  38.0  37.6  37.2  36.8  37.4  38.7  37.2
36.4  36.6  37.4  37.0  38.2  37.6  36.1  36.2  37.5

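Summarize data with a test statistic

Variable       n     Mean     SD      SE       |t0|    P
Temperature    18    37.22    0.68    0.161    2.38    0.029

$$t_0 = \frac{\text{sample mean} - \text{null value}}{\text{standard error}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

Because the sample mean is below the null value, t0 itself is negative (−2.38); the table reports its magnitude.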
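The summary row can be recomputed from the 18 temperatures listed on the previous slide. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# The 18 body temperatures (in °C) listed on the data slide.
temps = [37.2, 36.4, 36.8, 36.6, 38.0, 37.4, 37.6, 37.0, 37.2,
         38.2, 36.8, 37.6, 37.4, 36.1, 38.7, 36.2, 37.2, 37.5]

mu0 = 37.6                # null value from H0
n = len(temps)
x_bar = mean(temps)       # sample mean
s = stdev(temps)          # sample standard deviation (n - 1 in the denominator)
se = s / sqrt(n)          # standard error of the mean
t0 = (x_bar - mu0) / se   # test statistic; negative because x_bar < mu0

print(n, round(x_bar, 2), round(s, 2), round(se, 3), round(t0, 2))
```

The computed statistic is negative, and its magnitude agrees with the 2.38 reported in the summary.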

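STUDENT'S t DISTRIBUTION TABLE

                      Two-tailed probability (p value)
Degrees of freedom    0.10      0.05      0.01
1                     6.314     12.706    63.657
5                     2.015     2.571     4.032
10                    1.813     2.228     3.169
17                    1.740     2.110     2.898
20                    1.725     2.086     2.845
24                    1.711     2.064     2.797
25                    1.708     2.060     2.787
∞                     1.645     1.960     2.576

The area in each tail is half the two-tailed probability, so the 0.05 column gives the critical values for an upper-tail area of 0.025.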

Example: Normal Body Temp (cont.)

Find the p-value
df = n − 1 = 18 − 1 = 17
From the t table: t(17, .025) = 2.11
Calculated |t0| = 2.38, p-value = 0.029
Since |t0| > t, reject the null hypothesis.

[Figure: t distribution with rejection regions t < −2.11 and t > +2.11; t0 falls in the rejection region.]

Example: Normal Body Temp (cont.)

Decide whether or not the result is statistically significant based on the p-value
Using α = 0.05 as the level of significance criterion, the results are statistically significant because 0.029 is less than 0.05. In other words, we can reject the null hypothesis.

Report the Conclusion

We can conclude, based on these data, that the mean temperature in the human population does not equal 37.6 °C.
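The p-value of 0.029 on the slide presumably came from statistical software. As a rough cross-check, the sketch below integrates the Student t density numerically with Simpson's rule to obtain a two-sided p-value for |t0| = 2.38 with 17 degrees of freedom. The function names are hypothetical, and numerical integration is a stand-in chosen here because the Python standard library has no t-distribution CDF.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_t_p_value(t0, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t0|]; steps must be even."""
    b = abs(t0)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3      # P(0 < T < |t0|)
    return 2 * (0.5 - central_area)   # both tails beyond |t0|


p_value = two_sided_t_p_value(2.38, df=17)
print(round(p_value, 3))
print(p_value < 0.05)  # compare with the significance level alpha = 0.05
```

As a sanity check, feeding in the table's critical value 2.110 should return a two-sided probability very close to 0.05.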

Example using p-value

We want to see whether our data confirm a specific hypothesis
Example: NYC Blackout Baby Boom
Data are births per day from two weeks in August 1966
Test against usual birth rate in NYC (430 births/day)
1. Formulate your hypotheses:
Need a Null Hypothesis and an Alternative Hypothesis
2. Calculate the test statistic:
Test statistic summarizes the difference between data and your null hypothesis
3. Find the p-value for the test statistic:
How probable is your data if the null hypothesis is true?

Null and Alternative Hypotheses

Null Hypothesis (H0): no effect or no change in the population
Alternative hypothesis (Ha): real difference or real change in the population
If there is a large discrepancy between data and null hypothesis, then we will reject the null hypothesis
NYC dataset: μ = mean birth rate in Aug. 1966
Null hypothesis is that the blackout has no effect on birth rate, so August 1966 should be the same as any other month
H0: μ = 430 (usual birth rate for NYC)
Ha: μ ≠ 430

Test Statistic

The test statistic measures the difference between the observed data and the null hypothesis
How many standard deviations is our observed sample value from the hypothesized value?

$$T = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

For our birth rate dataset, the observed sample mean is 433.6 and our hypothesized mean is 430

Assume population variance σ² = sample variance s²

p-value

The p-value is the probability of observing a sample value at least as extreme as ours if the null hypothesis is true
If the null hypothesis is true, then the test statistic T follows a standard normal distribution

[Figure: standard normal density with tail areas prob = 0.367 below T = −0.342 and prob = 0.367 above T = 0.342.]

If our alternative hypothesis were one-sided (Ha: μ > 430), then our p-value would be 0.367

Since our alternative hypothesis was two-sided, our p-value is the sum of both tail probabilities (0.734)
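The tail probabilities for the birth-rate statistic can be checked against the standard normal distribution. This sketch takes T = 0.342 as given on the slide and uses the standard library's `NormalDist`; small differences from the slide's 0.367 and 0.734 are expected from table rounding.

```python
from statistics import NormalDist

T = 0.342  # test statistic reported on the slide
std_normal = NormalDist()

# One-sided alternative (Ha: mu > 430): area in the upper tail only.
upper_tail = 1 - std_normal.cdf(T)

# Two-sided alternative (Ha: mu != 430): area in both tails beyond |T|.
two_sided = 2 * (1 - std_normal.cdf(abs(T)))

print(round(upper_tail, 3), round(two_sided, 3))
```

Doubling the one-tail area works here because the standard normal density is symmetric about zero.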

Statistical Significance

Is the test statistic T = 0.342 statistically significant?

If the p-value is smaller than α, we say the difference is statistically significant at level α
The α-level is also used as a threshold for rejecting the null hypothesis (most common: α = 0.05)
If the p-value < α, we reject the null hypothesis that there is no change or difference
The p-value = 0.734 for the NYC data, so we cannot reject the null hypothesis at an α-level of 0.05
Difference between null hypothesis and our data is not statistically significant
Data do not support the idea that there was a different birth rate than usual for the first two weeks of August, 1966

Tests and Intervals

There is a close connection between confidence intervals and two-sided hypothesis tests
A 100·C% confidence interval contains likely values for a population parameter, like the population mean μ
Interval is centered around the sample mean
Width of interval is a multiple of σ/√n
An α-level hypothesis test rejects the null hypothesis that μ = μ0 if the test statistic T has a p-value less than α

Tests and Intervals

If our confidence level C is equal to 1 − α, where α is the level of the hypothesis test, then we have the following connection between tests and intervals:
A two-sided hypothesis test rejects the null hypothesis (μ = μ0) if our hypothesized value μ0 falls outside the confidence interval for μ

So, if we have already calculated a confidence interval for μ, then we can test any hypothesized value μ0 just by checking whether or not μ0 is in the interval!

Example: NYC blackout baby boom

Births per day from two weeks in August 1966

Difference between our sample mean and the population mean μ0 = 430 had a p-value of 0.734, so we did not reject the null hypothesis at an α-level of 0.05
We could have also calculated a 100(1 − α)% = 95% confidence interval:

[Equation: 95% confidence interval for μ; the values are not recoverable from the extracted slide.]

Since our hypothesized μ0 = 430 is within our interval of likely values, we do not reject the null hypothesis.
If the hypothesis were μ0 = 410, then we would reject it!

Example Hypothesis Test for Calcium

Let μ be the mean calcium intake for people below the poverty line
Null hypothesis is that calcium intake for people below the poverty line is not different from the RDA: μ0 = 850 mg/day

Two-sided alternative hypothesis: μ ≠ 850 mg/day

To calculate the test statistic, we need to know the population standard deviation of daily calcium intake.
From a previous study, we know σ = 188 mg

Need p-value: if μ0 = 850, what is the probability we get a sample mean as extreme as (or more extreme than) 747?

p-value for Calcium

We have a two-sided alternative, so the p-value includes standard normal probabilities on both sides:

[Figure: standard normal density with tail areas prob = 0.010 below T = −2.32 and prob = 0.010 above T = 2.32.]

Looking up the probability in the table, we see that the two-sided p-value is 0.010 + 0.010 = 0.02
Since the p-value is less than 0.05, we can reject the null hypothesis

Conclusion: people below the poverty line have significantly (at an α = 0.05 level) lower calcium intake than the RDA

Confidence Interval for Calcium

Alternatively, we calculate a confidence interval for the calcium intake of people below the poverty line
Use confidence level 100·C = 100(1 − α) = 95%
95% confidence level means critical value Z* = 1.96

Since our hypothesized value μ0 = 850 mg is not in the 95% confidence interval, we can reject that hypothesis right away!
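The link between the interval and the test can be sketched for the calcium example. The slides give the sample mean (747), σ = 188, and the null value 850, but the extracted text does not state the sample size; n = 18 below is an assumption, chosen because it reproduces the slide's test statistic of about −2.32. The interval is x̄ ± Z*·σ/√n.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0, sigma = 747.0, 850.0, 188.0
n = 18  # ASSUMPTION: not stated in the extracted slides; consistent with T = -2.32

std_normal = NormalDist()
se = sigma / sqrt(n)

# 95% confidence interval: x_bar +/- z_star * sigma / sqrt(n)
z_star = std_normal.inv_cdf(0.975)  # about 1.96
ci = (x_bar - z_star * se, x_bar + z_star * se)

# Two-sided z-test of H0: mu = mu0
T = (x_bar - mu0) / se
p = 2 * std_normal.cdf(-abs(T))

reject_by_interval = not (ci[0] <= mu0 <= ci[1])
reject_by_p = p < 0.05

print(tuple(round(v, 1) for v in ci), round(T, 2), round(p, 3))
print(reject_by_interval, reject_by_p)
```

Both routes reach the same decision, which is the connection between 95% intervals and two-sided tests at α = 0.05 described earlier.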

Cautions about Hypothesis Tests

Statistical significance does not necessarily mean real significance

Lack of significance does not necessarily mean that the null hypothesis is true

If sample size is large, even very small differences can have a low p-value

If sample size is small, there could be a real difference, but we are not able to detect it

Many assumptions went into our hypothesis tests

Presence of outliers, low sample sizes, etc. make our assumptions less realistic
We will try to address some of these problems next class
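The caution about sample size can be made concrete with a small sketch. It holds a tiny observed difference fixed (0.02 standard deviations, a hypothetical value) and shows how the two-sided z-test p-value changes as n grows. The function name `two_sided_p` and all numbers are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(diff, sigma, n):
    """Two-sided z-test p-value for an observed difference x_bar - mu0 = diff."""
    z = diff / (sigma / sqrt(n))
    return 2 * NormalDist().cdf(-abs(z))


# Same tiny difference, increasingly large samples (hypothetical values).
p_values = {n: two_sided_p(diff=0.02, sigma=1.0, n=n) for n in (25, 2_500, 250_000)}
for n, p in p_values.items():
    print(n, p)
```

With the small sample the difference is nowhere near significant, while with the very large sample the same difference yields a tiny p-value even though its practical importance has not changed.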
