
Chapter 11

Experimental Design and Analysis of Variance

McGraw-Hill/Irwin

Copyright 2011 by The McGraw-Hill Companies, Inc. All rights reserved.

Experimental Design and Analysis of Variance
11.1 Basic Concepts of Experimental Design
11.2 One-Way Analysis of Variance

11.1 Basic Concepts of Experimental Design

We have considered only one way of collecting and comparing data:
Using independent random samples

Often data is collected as the result of an experiment
To systematically study how one or more factors (variables) influence the variable that is being studied

Experimental Design

In an experiment, there is strict control over the factors (independent variables) contributing to the experiment
The values or levels of the factors are called treatments
The objective is to compare and estimate the effects of different treatments on the response variable

The different treatments are assigned to objects (the test subjects) called experimental units

When a treatment is applied to more than one experimental unit, the treatment is being replicated

Independent Variable - Qualitative (sometimes called a factor)
Response Variable - Quantitative
Fixed for ANOVA
Treatments - possible values of the qualitative (independent) variable
Experimental unit - source of the response

Experimental Design

A designed experiment is an experiment where the analyst controls which treatments are used and how they are applied to the experimental units
Example: An oil company wishes to study how three different gasoline types (A, B, and C) affect the mileage of a midsized car.
Qualitative/Independent Variable: Gasoline
Response Variable: Mileage
Treatments: Gasoline Type (A, B, and C)
Experimental Units: Midsized Cars

Experimental Design

In a completely randomized experimental design, independent random samples of experimental units are assigned to each of the treatments

For example, suppose three experimental units are to be assigned to each of five treatments
For a completely randomized experimental design, randomly pick three experimental units for one treatment, randomly pick three different experimental units from those remaining for the next treatment, and so on
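The assignment procedure described above can be sketched in Python. This is only an illustration under stated assumptions: the unit labels 1 to 15, the treatment names T1 to T5, and the function name are hypothetical and do not come from the slides.

```python
import random


def completely_randomized_assignment(units, treatments, per_treatment, seed=None):
    """Randomly assign `per_treatment` distinct units to each treatment."""
    if len(units) < per_treatment * len(treatments):
        raise ValueError("not enough experimental units for this design")
    rng = random.Random(seed)
    shuffled = list(units)
    # Shuffling once and slicing is equivalent to repeatedly picking
    # units at random, without replacement, from those remaining.
    rng.shuffle(shuffled)
    assignment = {}
    for k, treatment in enumerate(treatments):
        start = k * per_treatment
        assignment[treatment] = shuffled[start:start + per_treatment]
    return assignment


# Hypothetical example: 15 units, five treatments, three units per treatment.
units = list(range(1, 16))
treatments = ["T1", "T2", "T3", "T4", "T5"]
assignment = completely_randomized_assignment(units, treatments, per_treatment=3, seed=11)
for treatment, assigned in assignment.items():
    print(treatment, sorted(assigned))
```

Every unit ends up in exactly one treatment group, and each group has the same size, which is what the completely randomized design requires.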


Experimental Design

Once the experimental units are assigned and the experiment is performed, a value of the response variable is observed for each experimental unit

Obtain a sample of values for the response variable for each treatment

Example: Battery Testing

Suppose you wish to determine which of three brands of AA battery (Energizer, Eveready, and Tiger) lasts the longest when used in a remote controlled car. You have 30 cars, so you assign 10 to each battery brand.
Determine the following:
Independent Variable: Battery brand
Response Variable: Battery life (how long the battery lasts)
Treatments: Energizer, Eveready, Tiger
Experimental Units: Remote controlled cars

Gasoline Mileage Case

North American Oil Company is attempting to develop a reasonably priced gasoline that will deliver improved gasoline mileages. As part of its development process, the company would like to compare the effects of three types of gasoline (A, B and C) on gasoline mileage. To test the three types of gasoline, the company assigned 5 cars to each type of gasoline and measured the mileages.

11.2 One-Way Analysis of Variance

Single-factor ANOVA (relates the response variable to only one independent variable)

Objective is to estimate and compare the effects of the different treatments on the response variable.
Given p treatments on a response variable, we try to estimate the differences between the treatment means μi.

ANOVA

Want to study the effects of all p treatments on a response variable

For each treatment, find the mean and standard deviation of all possible values of the response variable when using that treatment
For treatment i, find treatment mean μi

One-way analysis of variance estimates and compares the effects of the different treatments on the response variable

By estimating and comparing the treatment means μ1, μ2, ..., μp
One-way analysis of variance, or one-way ANOVA

ANOVA Notation

p is the total number of treatments
i is the label of a treatment (ex: A, B, C)
ni denotes the size of the sample randomly selected for treatment i
xij is the jth value of the response variable using treatment i
x̄i is the average of the sample of ni values for treatment i

x̄i is the point estimate of the treatment mean μi

si is the standard deviation of the sample of ni values for treatment i

si is the point estimate for the treatment (population) standard deviation σi
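For reference, the two sample statistics named in the notation can be written out explicitly. These are the usual definitions of a sample mean and a sample standard deviation; the slides name the quantities but do not display the formulas, so treat this as a supporting sketch.

```latex
\[
\bar{x}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij},
\qquad
s_i = \sqrt{\frac{\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)^2}{n_i-1}}
\]
```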

Gasoline Mileage Case

p = 3
i = A, B, C
nA = nB = nC = 5

Type A        Type B        Type C
xA1 = 34.0    xB1 = 35.3    xC1 = 33.3
xA2 = 35.0    xB2 = 36.5    xC2 = 34.0
xA3 = 34.3    xB3 = 36.4    xC3 = 34.7
xA4 = 35.5    xB4 = 37.0    xC4 = 33.0
xA5 = 35.8    xB5 = 37.6    xC5 = 34.9

Gasoline Mileage Case

The mean of a sample is the point estimate for the corresponding treatment mean
x̄A = 34.92 mpg estimates μA
x̄B = 36.56 mpg estimates μB
x̄C = 33.98 mpg estimates μC

Gasoline Mileage Case

The standard deviation of a sample is the point estimate for the corresponding treatment standard deviation
sA = 0.7662 mpg estimates σA
sB = 0.8503 mpg estimates σB
sC = 0.8349 mpg estimates σC
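The point estimates for the Gasoline Mileage Case can be recomputed from the fifteen mileages listed earlier. The sketch below uses only Python's standard `statistics` module; the dictionary layout and variable names are illustrative choices, not part of the slides.

```python
from statistics import mean, stdev

# Mileages (mpg) from the Gasoline Mileage Case, five cars per gasoline type.
mileage = {
    "A": [34.0, 35.0, 34.3, 35.5, 35.8],
    "B": [35.3, 36.5, 36.4, 37.0, 37.6],
    "C": [33.3, 34.0, 34.7, 33.0, 34.9],
}

# Sample mean and sample standard deviation (n - 1 divisor) per treatment.
sample_means = {g: mean(x) for g, x in mileage.items()}
sample_sds = {g: stdev(x) for g, x in mileage.items()}

for g in mileage:
    print(f"Type {g}: mean = {sample_means[g]:.2f} mpg, s = {sample_sds[g]:.4f} mpg")
```

`statistics.stdev` divides by n - 1, which matches the sample standard deviation used as the point estimate of each treatment standard deviation.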

One-Way ANOVA Assumptions

1. Completely randomized experimental design
Assume that a sample has been selected randomly for each of the p treatments on the response variable using a completely randomized experimental design

2. Constant variance
The p populations of values of the response variable (associated with the p treatments) all have the same variance

One-Way ANOVA Assumptions

3. Normality
The p populations of values of the response variable all have normal distributions

4. Independence
The samples of experimental units are randomly selected, independent samples

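H0: μ1 = μ2 = ... = μp
Ha: At least two of μ1, μ2, ..., μp differ
If H0 is rejected in favor of Ha, there is evidence that the factor is related to the response variable.
Because the treatments are randomly assigned in a designed experiment, this relationship can be read as an effect of the treatment on the mean response.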

One-Way ANOVA Assumptions

To make sure that unequal variances will not be a problem:

Take the same sample size per treatment
Check the different sample standard deviations
General Rule: The one-way ANOVA results will be approximately correct if the largest sample standard deviation is no more than twice the smallest sample standard deviation.

Gasoline Mileage Case

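The standard deviation of a sample is the point estimate for the corresponding treatment standard deviation
sA = 0.7662 mpg estimates σA
sB = 0.8503 mpg estimates σB
sC = 0.8349 mpg estimates σC
The largest sample standard deviation (0.8503) is less than twice the smallest (0.7662), so the general rule is satisfied.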
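The general rule for the constant-variance assumption can be checked with a few lines of Python. This is a minimal sketch using the mileages from the Gasoline Mileage Case; the names `ratio` and `rule_ok` are illustrative.

```python
from statistics import stdev

# Mileages (mpg) from the Gasoline Mileage Case.
mileage = {
    "A": [34.0, 35.0, 34.3, 35.5, 35.8],
    "B": [35.3, 36.5, 36.4, 37.0, 37.6],
    "C": [33.3, 34.0, 34.7, 33.0, 34.9],
}

sds = {g: stdev(x) for g, x in mileage.items()}

# General rule: largest sample standard deviation is at most
# twice the smallest sample standard deviation.
ratio = max(sds.values()) / min(sds.values())
rule_ok = ratio <= 2

print(f"largest s / smallest s = {ratio:.3f}; rule satisfied: {rule_ok}")
```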

Testing for Significant Differences Between Treatment Means

Are there any statistically significant differences between the sample (treatment) means?
The null hypothesis is that the means of all p treatments are the same

H0: μ1 = μ2 = ... = μp

The alternative is that some (or all, but at least two) of the p treatments have different effects on the mean response

Ha: at least two of μ1, μ2, ..., μp differ

Testing for Significant Differences Between Treatment Means

Compare the between-treatment variability to the within-treatment variability

Between-treatment variability is the variability of the sample means from sample to sample

Ex: Variability between x̄A, x̄B, x̄C

Within-treatment variability is the variability of the measurements (that is, the values) within each sample

Ex: Variability between x̄A and xA1, xA2, ..., xA5

Comparing Between-Treatment Variability and Within-Treatment Variability

Partitioning the Total Variability in the Response

Total Variability = Between-Treatment Variability + Within-Treatment Variability
Total Sum of Squares = Treatment Sum of Squares + Error Sum of Squares
SSTO = SST + SSE

\[
\sum_{i=1}^{p}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}\right)^2
= \sum_{i=1}^{p} n_i\left(\bar{x}_i-\bar{x}\right)^2
+ \sum_{i=1}^{p}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)^2
\]

Mean Squares

The treatment mean square is

\[
MST = \frac{SST}{p-1}
\]

The error mean square is

\[
MSE = \frac{SSE}{n-p}
\]

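Gasoline Mileage Case

\[
SST = \sum_{i=1}^{p} n_i\left(\bar{x}_i-\bar{x}\right)^2
= n_A\left(\bar{x}_A-\bar{x}\right)^2 + n_B\left(\bar{x}_B-\bar{x}\right)^2 + n_C\left(\bar{x}_C-\bar{x}\right)^2
\]
\[
= 5(34.92-35.153)^2 + 5(36.56-35.153)^2 + 5(33.98-35.153)^2 = 17.0493
\]

\[
SSE = \sum_{i=1}^{p}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}_i\right)^2
= \sum_{j=1}^{n_A}\left(x_{Aj}-\bar{x}_A\right)^2 + \sum_{j=1}^{n_B}\left(x_{Bj}-\bar{x}_B\right)^2 + \sum_{j=1}^{n_C}\left(x_{Cj}-\bar{x}_C\right)^2 = 8.028
\]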
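The sums of squares for the Gasoline Mileage Case can be recomputed directly from the data. The sketch below assumes the fifteen mileages listed earlier and also checks the partition SSTO = SST + SSE; the variable names are illustrative.

```python
# Mileages (mpg) from the Gasoline Mileage Case.
mileage = {
    "A": [34.0, 35.0, 34.3, 35.5, 35.8],
    "B": [35.3, 36.5, 36.4, 37.0, 37.6],
    "C": [33.3, 34.0, 34.7, 33.0, 34.9],
}

all_values = [x for xs in mileage.values() for x in xs]
grand_mean = sum(all_values) / len(all_values)            # overall mean of all n values
means = {g: sum(xs) / len(xs) for g, xs in mileage.items()}

# Treatment sum of squares: between-treatment variability.
sst = sum(len(xs) * (means[g] - grand_mean) ** 2 for g, xs in mileage.items())

# Error sum of squares: within-treatment variability.
sse = sum((x - means[g]) ** 2 for g, xs in mileage.items() for x in xs)

# Total sum of squares, computed independently of SST and SSE.
ssto = sum((x - grand_mean) ** 2 for x in all_values)

print(f"grand mean = {grand_mean:.3f}")
print(f"SST = {sst:.4f}, SSE = {sse:.3f}, SSTO = {ssto:.4f}")
```

Computing SSTO separately, rather than as SST + SSE, makes the partition a check on the arithmetic instead of a definition.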

F Test for Difference Between Treatment Means

Suppose that we want to compare p treatment means
The null hypothesis is that all treatment means are the same:

H0: μ1 = μ2 = ... = μp

The alternative hypothesis is that they are not all the same:

Ha: at least two of μ1, μ2, ..., μp differ

F Test for Difference Between Treatment Means

Define the F statistic:

\[
F = \frac{MST}{MSE} = \frac{SST/(p-1)}{SSE/(n-p)}
\]

The p-value is the area under the F curve to the right of F, where the F curve has p - 1 numerator and n - p denominator degrees of freedom
The critical values are based on the F distribution

F Test for Difference Between Treatment Means

Reject H0 in favor of Ha at the α level of significance if
F > Fα, or if
p-value < α
Fα is based on p - 1 numerator and n - p denominator degrees of freedom

Gasoline Mileage Case

Computing the F statistic

\[
F = \frac{MST}{MSE} = \frac{SST/(p-1)}{SSE/(n-p)} = \frac{17.0493/(3-1)}{8.028/(15-3)} = 12.74
\]

To test H0 at α = 0.05, we use F0.05 with

Numerator: p - 1 = 3 - 1 = 2
Denominator: n - p = 15 - 3 = 12
F0.05 = 3.89

Since F = 12.74 > F0.05 = 3.89, we reject H0
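The F test for the Gasoline Mileage Case can be sketched with the standard library alone. One assumption is important: the two helper functions below use a closed form for the F distribution that holds only when the numerator degrees of freedom equal 2 (here p - 1 = 2). For other degrees of freedom a statistics library would be the usual route. The helper names are hypothetical.

```python
# Sums of squares and sizes from the Gasoline Mileage Case.
sst, sse = 17.0493, 8.028
p, n = 3, 15

df1, df2 = p - 1, n - p        # numerator and denominator degrees of freedom
mst = sst / df1                # treatment mean square
mse = sse / df2                # error mean square
f_stat = mst / mse


def f_sf_df1_2(f, df2):
    """P(F > f) for an F distribution with 2 numerator df.

    ASSUMPTION: closed form valid only when the numerator df is exactly 2.
    """
    return (1 + 2 * f / df2) ** (-df2 / 2)


def f_crit_df1_2(alpha, df2):
    """Upper-alpha critical value for an F distribution with 2 numerator df."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


alpha = 0.05
p_value = f_sf_df1_2(f_stat, df2)
f_crit = f_crit_df1_2(alpha, df2)
reject_h0 = f_stat > f_crit    # equivalent to p_value < alpha

print(f"MST = {mst:.4f}, MSE = {mse:.3f}, F = {f_stat:.2f}")
print(f"p-value = {p_value:.4f}, F crit = {f_crit:.4f}, reject H0: {reject_h0}")
```

The two rejection rules agree by construction: F exceeds the critical value exactly when the area to the right of F is smaller than α.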

Excel Output: ANOVA Test

Anova: Single Factor

SUMMARY
Groups   Count   Sum     Average   Variance
Type A   5       174.6   34.92     0.587
Type B   5       182.8   36.56     0.723
Type C   5       169.9   33.98     0.697

Compute the standard deviations (the square roots of the variances) and check the general rule

Excel Output: ANOVA Test

ANOVA (alpha at 0.05)
Source of Variation   SS         df   MS       F         P-value   F crit
Between Groups        17.0493    2    8.5246   12.7424   0.0011    3.8853
Within Groups         8.028      12   0.669
Total                 25.07733   14

Interpretation: Since the p-value (0.0011) is well below 0.05, we have very strong evidence that gasoline type affects mean mileage, that is, that at least two of the gasoline types have different mean mileages.

F Test for Difference Between Treatment Means

From the F test, we can conclude that at least two of the treatment means differ. But how do we know which ones differ?
We compare two means at a time. (Pairwise Comparison)

Pairwise Comparisons, Individual Intervals

Tukey simultaneous 100(1 - α)% confidence interval for μi - μh:

\[
\left(\bar{x}_i-\bar{x}_h\right) \pm q_\alpha\sqrt{\frac{MSE}{m}}
\]

qα is the upper α percentage point of the studentized range for p and (n - p) from Table A.9
m denotes the common sample size
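The Tukey intervals for the Gasoline Mileage Case can be sketched as follows. The studentized range point is an assumption here: q0.05 = 3.77 for p = 3 and n - p = 12 is the value found in standard studentized range tables, and it is not printed on the slides. The sample means, MSE, and common sample size are taken from the case.

```python
from itertools import combinations
from math import sqrt

# Sample means (mpg), MSE, and common sample size from the Gasoline Mileage Case.
means = {"A": 34.92, "B": 36.56, "C": 33.98}
mse = 0.669
m = 5

# ASSUMPTION: q_0.05 for p = 3 treatments and n - p = 12 df, read from a
# standard studentized range table; this number does not appear in the slides.
q_alpha = 3.77

half_width = q_alpha * sqrt(mse / m)

# One simultaneous interval per pair of treatments (i, h) for mu_i - mu_h.
intervals = {}
for i, h in combinations(means, 2):
    diff = means[i] - means[h]
    intervals[(i, h)] = (diff - half_width, diff + half_width)

for (i, h), (lo, hi) in intervals.items():
    verdict = "means differ" if lo > 0 or hi < 0 else "no significant difference"
    print(f"mu_{i} - mu_{h}: [{lo:.3f}, {hi:.3f}]  ({verdict})")
```

An interval that excludes zero indicates that the two treatment means differ at the simultaneous 95% confidence level.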

Pairwise Comparisons, Individual Intervals

If the sample sizes of the two treatment means are unequal:

\[
\left(\bar{x}_i-\bar{x}_h\right) \pm \frac{q_\alpha}{\sqrt{2}}\sqrt{MSE\left(\frac{1}{n_i}+\frac{1}{n_h}\right)}
\]

Confidence Intervals for Treatment Means

A point estimate of the treatment mean μi is the sample mean x̄i of that treatment
We can also make a confidence interval for each treatment mean with a confidence level of 100(1 - α)%

\[
\bar{x}_i \pm t_{\alpha/2}\sqrt{\frac{MSE}{n_i}}
\]

tα/2 is based on n - p degrees of freedom
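A confidence interval for each treatment mean in the Gasoline Mileage Case can be sketched as below. The t point is an assumption: t0.025 = 2.179 for 12 degrees of freedom is the standard t-table value and is not printed on the slides.

```python
from math import sqrt

# Sample means (mpg), sample sizes, and MSE from the Gasoline Mileage Case.
means = {"A": 34.92, "B": 36.56, "C": 33.98}
sizes = {"A": 5, "B": 5, "C": 5}
mse = 0.669

# ASSUMPTION: t_{alpha/2} for alpha = 0.05 and n - p = 12 degrees of freedom,
# read from a standard t table; this number does not appear in the slides.
t_half_alpha = 2.179

ci = {}
for g, xbar in means.items():
    half_width = t_half_alpha * sqrt(mse / sizes[g])
    ci[g] = (xbar - half_width, xbar + half_width)

for g, (lo, hi) in ci.items():
    print(f"95% CI for mu_{g}: [{lo:.3f}, {hi:.3f}]")
```

The interval uses MSE, the pooled estimate of the common variance, rather than the individual sample variance, which is why the degrees of freedom are n - p.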

Hypothesis Testing Between Treatment Means

H0: μi - μh = 0
Ha: μi - μh ≠ 0
This test tells us whether the two treatment means are equal or different.

Test statistic:

\[
t = \frac{\bar{x}_i-\bar{x}_h}{\sqrt{MSE\left(\dfrac{1}{n_i}+\dfrac{1}{n_h}\right)}}
\]

Hypothesis Testing Between Treatment Means

\[
\text{Critical value} = \frac{q_\alpha}{\sqrt{2}}, \qquad r = p,\ v = n - p
\]

Rejection Rule: If the absolute value of the test statistic is greater than the critical value, reject H0.
If we reject H0, we conclude that the two treatment means are not equal.

Hypothesis Testing Between Treatment Means

Tukey simultaneous comparison t-values (d.f. = 12)

                 Type C   Type A   Type B
                 33.98    34.92    36.56
Type C   33.98
Type A   34.92   1.82
Type B   36.56   4.99     3.17

critical values for experimentwise error rate:
0.05   2.67
0.01   3.56
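The pairwise t-values reported for the Gasoline Mileage Case can be reproduced from the sample means and MSE. In this sketch the critical values 2.67 and 3.56 are copied from the output above rather than computed, and the function name is illustrative.

```python
from math import sqrt

# Sample means (mpg), sample sizes, and MSE from the Gasoline Mileage Case.
means = {"C": 33.98, "A": 34.92, "B": 36.56}
sizes = {"C": 5, "A": 5, "B": 5}
mse = 0.669

# Critical values for the experimentwise error rate, as reported in the output.
critical = {0.05: 2.67, 0.01: 3.56}


def pairwise_t(i, h):
    """Absolute pairwise test statistic for H0: mu_i - mu_h = 0."""
    se = sqrt(mse * (1 / sizes[i] + 1 / sizes[h]))
    return abs(means[i] - means[h]) / se


pairs = [("A", "C"), ("B", "C"), ("B", "A")]
t_values = {pair: pairwise_t(*pair) for pair in pairs}

for (i, h), t in t_values.items():
    reject = t > critical[0.05]
    print(f"Type {i} vs Type {h}: t = {t:.2f}, reject H0 at 0.05: {reject}")
```

With these critical values, Type B differs from both Type A and Type C at the 0.05 level, while Type A and Type C are not significantly different.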

Hypothesis Testing Between Treatment Means

p-values for pairwise t-tests

                 Type C   Type A   Type B
                 33.98    34.92    36.56
Type C   33.98
Type A   34.92   .0942
Type B   36.56   .0003    .0081
