ANOVA1

Statistics Micro Mini
Multi-factor ANOVA
January 5-9, 2008 Beth Ayers
January 7, 2009 morning session
Thursday Sessions
ANOVA
One-way ANOVA
Two-way ANOVA
ANCOVA
Within-subject
Between-subject
Repeated measures
MANOVA
etc.
What is ANOVA?
ANalysis Of VAriance
Partitions the observed variance based on explanatory variables
Compares the partitions to test the significance of the explanatory variables
Some Terminology
Between-subjects design: each subject participates in one and only one group
Within-subjects design: the same group of subjects serves in more than one treatment
  Subject is now a factor
Mixed design: a study which has both between- and within-subject factors
Repeated measures: general term for any study in which multiple measurements are taken on the same subject
  Can be either multiple treatments or several measurements over time
4
ANOVA
Use variances and variance-like quantities to study the equality or non-equality of population means
So, although it is analysis of variance, we are actually analyzing means, not variances
There are other methods which analyze the variances between groups
ANOVA
Typical exploratory analysis includes
Tabulation of the number of subjects in each experimental group
Side-by-side box plots
Statistics about each group
  At least mean and standard deviation; can include 5-number summary and information on skewness
Table of means for each experimental group
Notation
If we have k groups, denote the means of the groups as $\mu_1, \mu_2, \ldots, \mu_k$
Student i in group j has observation
\[ y_{ij} = \mu_j + \varepsilon_{ij} \]
where the $\varepsilon_{ij}$ are independent and distributed $N(0, \sigma^2)$
Can combine this and say subjects from group j have distribution $N(\mu_j, \sigma^2)$
With random assignment, the sample mean for any treatment group is representative of the population mean for that group
7
Assumptions
1. The errors $\varepsilon_{ij}$ are normally distributed
2. Across the conditions, the errors have equal spread. Often referred to as equal variances.
  Rule of thumb: the assumption is met if the largest variance is less than twice the smallest variance
  If the variances are unequal, a correction is needed!! The usual correction is to test at $\alpha/2$.
3. The errors are independent of each other

8
Checking the assumptions

Use the residuals, which are the estimates of $\varepsilon_{ij}$
1. Normality: look at a normal probability plot
2. Equal variance: look at a residual versus fitted plot
3. Independence: hard to check, often assumed from study design
For mild violations of the assumptions, there are options for correction
When the assumptions are not met, the p-value is simply wrong!!
9
One-way ANOVA
One-way ANOVA is used when
Only testing the effect of one explanatory variable
Each subject has only one treatment or condition
  Thus a between-subjects design
Used to test for differences among two or more independent groups

Gives the same results as a two-sample t-test if the explanatory variable has 2 levels
10
Hypothesis Testing
H0: $\mu_1 = \mu_2 = \cdots = \mu_k$
H1: the $\mu$s are not all equal
The alternative hypothesis H1: $\mu_1 \neq \cdots \neq \mu_k$ is wrong!
The null hypothesis is called the overall null and is the hypothesis tested by ANOVA
If the overall null is rejected, must do more specific hypothesis testing to determine which means are different, often referred to as contrasts
11
Terminology
The sample variance is the sum of the squared deviations from the mean divided by the degrees of freedom
\[ s^2 = \frac{\sum_i (x_i - \bar{x})^2}{N - 1} \]
A mean square (MS) is a variance-like quantity calculated as SS/df
\[ MS = \frac{SS}{df} \]
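The definitions above can be checked numerically. The following is a minimal sketch using Python's standard library; the five typing speeds are hypothetical values chosen for illustration and do not come from the slides.

```python
from statistics import mean, variance

# Hypothetical typing speeds (words per minute); not data from the slides.
x = [42.0, 47.5, 51.0, 44.5, 50.0]

x_bar = mean(x)
ss = sum((xi - x_bar) ** 2 for xi in x)  # sum of squared deviations (SS)
df = len(x) - 1                          # degrees of freedom (N - 1)
ms = ss / df                             # mean square = SS / df

# statistics.variance uses the same N - 1 denominator,
# so it should agree with the mean square computed by hand.
print(ss, df, ms, variance(x))
```

Here the sample variance is itself a mean square: the SS of the observations around their mean, divided by N - 1.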
12
One-way ANOVA
In one-way ANOVA we work with two mean square quantities
MSwithin: the mean square within-groups
MSbetween: the mean square between-groups
\[ MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}} \]
\[ MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}} \]
13
Within vs. Between
14
One-way ANOVA
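For each individual group we have
\[ \frac{SS_i}{df_i} = \frac{\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}{n_i - 1} \]
So the estimate of MSwithin is
\[ MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}} = \frac{\sum_{i=1}^{k} SS_i}{\sum_{i=1}^{k} (n_i - 1)} = \frac{\sum_{i=1}^{k} SS_i}{N - k} \]
And the estimate of MSbetween is
\[ MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}} = \frac{\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2}{k - 1} \]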
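As an illustration of the two formulas, the sketch below computes SSwithin, SSbetween, and the corresponding mean squares for a small made-up data set. The three groups and their values are hypothetical; only the formulas come from the slides.

```python
# Hypothetical scores for k = 3 groups; not data from the slides.
groups = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0, 9.0],
    [9.0, 10.0, 11.0],
]

def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

k = len(groups)                                  # number of groups
N = sum(len(g) for g in groups)                  # total sample size
grand_mean = avg([x for g in groups for x in g])

# Within-groups: add up each group's SS_i around its own mean.
ss_within = sum(sum((x - avg(g)) ** 2 for x in g) for g in groups)
df_within = N - k

# Between-groups: weighted squared deviations of group means from the grand mean.
ss_between = sum(len(g) * (avg(g) - grand_mean) ** 2 for g in groups)
df_between = k - 1

ms_within = ss_within / df_within
ms_between = ss_between / df_between
print(ss_within, df_within, ms_within)
print(ss_between, df_between, ms_between)
```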
15
Mean Squares
What do these values mean?
MSwithin is considered a true estimate of $\sigma^2$ that is unaffected by whether the null or alternative hypothesis is true
MSbetween is considered a good estimate of $\sigma^2$ only when the null hypothesis is true
  If the alternative is true, values of MSbetween tend to be inflated
Thus, we can look at the ratio of the two mean square values to evaluate the null hypothesis
16
Testing the Hypothesis

The F-test looks at the variation among the group means relative to the variation within the sample
\[ F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{SS_{\text{between}} / df_{\text{between}}}{SS_{\text{within}} / df_{\text{within}}} = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)} \]

The F-statistic tends to be larger if the alternative hypothesis is true than if the null hypothesis is true
The test statistic F has an F(k-1, N-k) distribution
17
What does the F ratio tell us?

$F = MS_{\text{between}} / MS_{\text{within}}$
The denominator is always an estimate of $\sigma^2$ (under both the null and alternative hypotheses)
The numerator is either another estimate of $\sigma^2$ (under the null) or is inflated (under the alternative)
If the null is true, values of F are close to 1
If the alternative is true, values of F are larger
How large F must be to count as significant depends on the degrees of freedom
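The behavior of the F ratio can be seen in a small simulation. This is a rough sketch under assumed settings (three groups of 20, $\sigma = 5$, and population means invented for illustration); none of these numbers come from the slides.

```python
import random

def avg(values):
    return sum(values) / len(values)

def f_statistic(groups):
    """F = MS_between / MS_within for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = avg([x for g in groups for x in g])
    ss_between = sum(len(g) * (avg(g) - grand) ** 2 for g in groups)
    ss_within = 0.0
    for g in groups:
        m = avg(g)
        ss_within += sum((x - m) ** 2 for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def simulate_mean_f(group_means, sigma=5.0, n=20, reps=500, seed=1):
    """Average F over repeated samples drawn from N(mu_j, sigma^2)."""
    rng = random.Random(seed)
    fs = []
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in group_means]
        fs.append(f_statistic(groups))
    return avg(fs)

f_null = simulate_mean_f([50.0, 50.0, 50.0])  # null true: all means equal
f_alt = simulate_mean_f([50.0, 50.0, 55.0])   # alternative true: one mean differs
print(round(f_null, 2), round(f_alt, 2))
```

Under the null the average F should land near 1, while under the alternative the inflated numerator pushes it well above 1.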

18
The ANOVA table

When running an ANOVA, statistical packages will return an ANOVA table summarizing the SS, MS, df, F-statistic, and p-value

Source                       SS                     df                     MS          F                    Sig
Group (Treatment, between)   SSbetween              dfbetween              MSbetween   MSbetween/MSwithin   p-value
Residual (Error, within)     SSwithin               dfwithin               MSwithin
Total                        SSbetween + SSwithin   dfbetween + dfwithin
19
Example
Suppose we want to know if typing speed varies across majors
Use 4 majors: Biology, Business, English, and Mathematics

H0: typing speed is the same for students of all majors
  H0: $\mu_{\text{Bio}} = \mu_{\text{Business}} = \mu_{\text{Eng}} = \mu_{\text{Math}}$
H1: typing speed varies across the majors
  H1: at least one of the means is different
20
Box plots
21
Summary
The largest variance is less than twice the smallest variance ($38.8 < 2 \times 20.1 = 40.2$). Use $\alpha = 0.05$.

Major         n_i   Mean   Variance
Biology       25    45.3   24.7
Business      25    47.6   25.4
English       25    55.6   38.8
Mathematics   25    45.1   20.1
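The rule-of-thumb check on the variances can be written out in a few lines. A minimal sketch, using the four variances from the summary table; the variable names are chosen here for illustration.

```python
# Group variances from the summary table.
variances = {"Biology": 24.7, "Business": 25.4, "English": 38.8, "Mathematics": 20.1}

largest = max(variances.values())
smallest = min(variances.values())
ratio = largest / smallest

# Rule of thumb: equal-variance assumption is acceptable if
# the largest variance is less than twice the smallest.
rule_met = largest < 2 * smallest
print(f"largest/smallest = {ratio:.2f}, rule of thumb met: {rule_met}")
```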
22
Degrees of Freedom
How many groups do we have?
What is the sample size?
Using these values:
What is dfwithin?
What is dfbetween?
23
Degrees of Freedom
How many groups do we have?
  There are k = 4 groups: Biology, English, Business, and Mathematics
What is the sample size?
  There are N = 100 students
Using these values,
What is dfbetween?
  k - 1 = 4 - 1 = 3
What is dfwithin?
  N - k = 100 - 4 = 96
24
Sample Output
Source                       SS        df   MS       F        Sig
Group (Treatment, between)   1807.49    3   602.50   22.091   0.000
Residual (Error, within)     2618.20   96    27.17
Total                        4425.69   99

Our estimate of $\sigma^2$ is 27.17
The numerator MS = 602.5 appears to be highly inflated
25
Results
F-statistic = 22.1
P-value: < 0.0005
Conclusion: the average words per minute differs for at least one of the majors
To make stronger statements, need to do further testing
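The ANOVA quantities can be approximately recovered from the summary table alone. The sketch below assumes only the group sizes, means, and variances reported in the summary; because those values are rounded to one decimal place, the results are close to, but not identical to, the software output.

```python
# Summary statistics from the typing-speed example
# (order: Biology, Business, English, Mathematics).
n = [25, 25, 25, 25]
means = [45.3, 47.6, 55.6, 45.1]
variances = [24.7, 25.4, 38.8, 20.1]

k = len(n)
N = sum(n)
df_between = k - 1
df_within = N - k

grand_mean = sum(ni * mi for ni, mi in zip(n, means)) / N

# SS_i = (n_i - 1) * s_i^2, summed over groups.
ss_within = sum((ni - 1) * vi for ni, vi in zip(n, variances))
ss_between = sum(ni * (mi - grand_mean) ** 2 for ni, mi in zip(n, means))

ms_within = ss_within / df_within
ms_between = ss_between / df_between
f_stat = ms_between / ms_within

print(f"df = ({df_between}, {df_within})")
print(f"MS_within = {ms_within:.2f}, MS_between = {ms_between:.2f}, F = {f_stat:.2f}")
```

With equal group sizes, MSwithin is simply the average of the four group variances.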
26
27
Further Analysis
If H0 is rejected, we conclude that not all the $\mu$s are equal
Would like to make statements about where there are differences

Can use planned or unplanned comparisons (or contrasts)
  Planned comparisons are interesting comparisons decided on before analysis
  Unplanned comparisons occur after seeing the results
Be careful not to go fishing for results
28
Contrasts
A simple contrast hypothesis compares two population means
H0: $\mu_1 = \mu_5$
A complex contrast hypothesis has multiple population means on either side

H0: $(\mu_1 + \mu_2)/2 = \mu_3$
H0: $(\mu_1 + \mu_2)/2 = (\mu_3 + \mu_4 + \mu_5)/3$
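A contrast can be written as a set of coefficients that sum to zero. The sketch below applies the complex contrast $(\mu_1 + \mu_2)/2 = \mu_3$ to the typing-speed summary statistics, taking Biology and Business as groups 1 and 2 and English as group 3. The standard-error formula $\sqrt{MS_{\text{within}} \sum_i c_i^2 / n_i}$ is the usual one for one-way ANOVA contrasts; it is not stated in the slides, and MSwithin is taken here as the pooled variance from the summary table, so treat this as an assumption-laden illustration.

```python
import math

means = {"Biology": 45.3, "Business": 47.6, "English": 55.6, "Mathematics": 45.1}
variances = {"Biology": 24.7, "Business": 25.4, "English": 38.8, "Mathematics": 20.1}
n = 25  # subjects per group

# Contrast coefficients for H0: (mu_Bio + mu_Bus)/2 = mu_Eng; they must sum to 0.
coef = {"Biology": 0.5, "Business": 0.5, "English": -1.0, "Mathematics": 0.0}
assert abs(sum(coef.values())) < 1e-12

# With equal group sizes, the pooled variance is the average group variance.
ms_within = sum(variances.values()) / len(variances)

estimate = sum(coef[g] * means[g] for g in means)
std_error = math.sqrt(ms_within * sum(c ** 2 / n for c in coef.values()))
t_stat = estimate / std_error  # compare to a t distribution with N - k = 96 df

print(f"estimate = {estimate:.2f}, SE = {std_error:.3f}, t = {t_stat:.2f}")
```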
29
Planned Comparisons
Most statistical packages allow you to enter custom planned contrast hypotheses
The p-values are only valid under strict conditions
  The conditions maintain the Type-1 error rate across the whole experiment
Computer packages assume that you have checked the assumptions of the ANOVA test
30
Conditions for Planned Comparisons

Contrasts are selected before looking at the results; they are planned, not post-hoc
Must be ignored if the overall null is not rejected!
Each contrast is based on information independent of the other contrasts

The number of planned comparisons must not be more than the corresponding degrees of freedom (k-1 in one-way ANOVA)
31
Unplanned Comparisons
What if we notice a possibly interesting difference when looking at the results?
Can do comparisons, but need to adjust the $\alpha$-level to control the Type-1 error rate
One common method is to use Tukey's simultaneous confidence intervals to compare any and all pairs of group population means
  This procedure takes multiple comparisons into consideration to preserve the $\alpha$ level
32
Other Options
Bonferroni correction for the number of comparisons done
Dunnett's test
Scheffé procedure
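The Bonferroni correction is the simplest of these to compute by hand. As a sketch, assume all pairwise comparisons among the k = 4 majors are of interest and the overall level is $\alpha = 0.05$; each individual comparison is then tested at $\alpha$ divided by the number of comparisons.

```python
from math import comb

k = 4          # number of groups (majors)
alpha = 0.05   # overall Type-1 error rate to preserve

n_comparisons = comb(k, 2)              # all pairs of groups: k(k-1)/2
alpha_per_test = alpha / n_comparisons  # Bonferroni-adjusted level

print(n_comparisons, round(alpha_per_test, 5))
```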
33
Tukey's Multiple Comparisons for previous example
34
Conclusions
In the table on the previous page,
1 = Biology, 2 = Business, 3 = English, 4 = Mathematics
Biology, Business, and Mathematics are all significantly different from English
There are no other significant differences
35
Additional sample output

Below is the same output from a different software package
36
Comparison to Regression
Sample regression output
Which major is our baseline?
37
F-statistic = 22.1, p-value < 0.0005
This is the same F-statistic and p-value as the ANOVA on slide 25
At least one of the explanatory variables is important; this corresponds to the rejection of the null: at least one of the means differs
38
Note that Biology is the baseline and 45.3 is the mean WPM for Biology students
Note that Business and Mathematics are not significant
This agrees with the post-hoc comparisons: neither Business nor Mathematics is significantly different from Biology, but English is
To make further conclusions, we will need to look at multiple comparisons, such as the previous Tukey intervals
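The link between the regression output and the group means can be made explicit. With indicator (dummy) coding and Biology as the baseline, the fitted intercept equals the baseline group mean and each coefficient equals that group's mean minus the baseline mean. The sketch below derives those values from the summary means rather than from a fitted model, so it illustrates the relationship and does not reproduce the regression output itself.

```python
# Group means (words per minute) from the summary table.
means = {"Biology": 45.3, "Business": 47.6, "English": 55.6, "Mathematics": 45.1}
baseline = "Biology"

intercept = means[baseline]

# Each dummy-variable coefficient is the difference from the baseline mean.
coefficients = {
    major: round(m - intercept, 1)
    for major, m in means.items()
    if major != baseline
}

print(intercept, coefficients)
```

The large English coefficient and the small Business and Mathematics coefficients match the pattern of significance described above.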
39
Regression
The conclusions about the overall null hypothesis will be the same
In regression, we can make statements comparing groups to the baseline

To make more conclusive statements, more analysis is needed
ANOVA with either planned or post-hoc comparisons will do the same thing and is often easier
40
One-way ANOVA Power

Two different SAT prep courses charge $1200 for a two-month course. An (unethical) experiment would be to randomize students into one of the two courses or no course
What information is needed to calculate power for this one-way ANOVA?
Sample size
Within-group variance ($\sigma^2$)
Estimated or minimally interesting outcome means for each group
41
Estimate of $\sigma^2$
Based on previous years, we know that 95% of the student scores on SATs fall between 900 and 1500
A 95% range spans about 4 standard deviations, so $\sigma = (1500 - 900)/4 = 150$
$\sigma^2 = 150^2$
42
Minimally interesting outcome

What is the minimal average benefit, in points gained, that would justify the program?
The minimally interesting outcome is based on previous knowledge
For this example we'll try several different values
43
sd[treatment]
Different applets will define things slightly differently. Find an applet you understand.
For the applet I will show you, they require sd[treatment]. From their definition this is calculated as
\[ \text{sd[treatment]} = \sqrt{\frac{\sum_{i=1}^{k} (\mu_i - \bar{\mu})^2}{k - 1}} \]
where $\mu_i$ is the ith group mean, $\bar{\mu}$ is the average of the group means, and k = the number of groups
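The two inputs to the power applet can be computed as follows. This is a sketch under an assumed scenario: the no-course group gains 0 points and both prep courses gain 50 points. That configuration of group means is a guess made for illustration; the slides state only $\sigma$ and the size of the effect.

```python
from statistics import stdev

# Within-group SD from the 95% range of SAT scores (900 to 1500).
sigma = (1500 - 900) / 4

# Assumed mean gains: no course, course A, course B (hypothetical configuration).
effect = 50.0
group_means = [0.0, effect, effect]

# statistics.stdev divides by k - 1, matching the sd[treatment] definition.
sd_treatment = stdev(group_means)

print(sigma, round(sd_treatment, 2))
```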
Ready to go to power applet

44
Calculating the power

Let $\sigma = 150$, n = 50, effect = 50 points
Power = 0.3811

Power = 0.6772

Power = 0.9367

Power = 0.1245
45
Calculating the power

Power = 0.7276

Power = 0.9622

Power = 0.997

Power = 0.2294
46
Moving past One-way ANOVA

What if we have two categorical explanatory variables?
What if we have categorical and quantitative explanatory variables?
What if subjects have more than one treatment?
What if there is more than one response variable?
And many other combinations
47
Two-way ANOVA
Suppose we now have two categorical explanatory variables
Is there a significant X1 effect?
Is there a significant X2 effect?
Are there significant interaction effects?
If X1 has k levels and X2 has m levels, then the analysis is often referred to as a "k by m ANOVA" or "k x m ANOVA"
48
Terminology
If the interaction is significant, the model is called an interaction model
If the interaction is not significant, the model is called an additive model
Explanatory variables are often referred to as factors
49
Assumptions
The assumptions are the same as in one-way ANOVA
1. The errors $\varepsilon_{ij}$ are normally distributed
2. Across the conditions, the errors have equal spread. Often referred to as equal variances.
3. The errors are independent of each other
50
Two-way ANOVA
Two-way (or multi-way) ANOVA is an appropriate analysis method for a study with a quantitative outcome and two (or more) categorical explanatory variables. The assumptions are Normality, equal variance, and independent errors.
51
Results
Results are again displayed in an ANOVA table
The table has one line for each term in the model. For a model with two factors, there is one line for each factor and one line for the interaction, plus a line for the error and a line for the total. See next page.
52
The ANOVA table

Source        SS   df           MS   F   Sig
Factor 1           k-1
Factor 2           m-1
Interaction        (k-1)(m-1)
Error              N-k*m        *
Total              N-1

The MS(error), denoted by * in the above table, is the true estimate of $\sigma^2$
The MS in each row is that row's SS/df
The F-statistic is the MS/MS(error)
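The degrees of freedom column can be filled in mechanically once k, m, and N are known. A minimal sketch follows; the function name `two_way_df` and the illustrative values k = 4, m = 2, N = 100 are choices made here, not part of the slides' software output.

```python
def two_way_df(k, m, n_total):
    """Degrees of freedom for each row of a k by m ANOVA table with interaction."""
    return {
        "Factor 1": k - 1,
        "Factor 2": m - 1,
        "Interaction": (k - 1) * (m - 1),
        "Error": n_total - k * m,
        "Total": n_total - 1,
    }

df = two_way_df(k=4, m=2, n_total=100)
for source, value in df.items():
    print(f"{source:12s} {value}")

# The model and error rows must add up to the total row.
assert df["Factor 1"] + df["Factor 2"] + df["Interaction"] + df["Error"] == df["Total"]
```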
53
Exploratory Analysis
Table of means
Interaction or profile plots
  An interaction plot is a way to look at outcome means for two factors simultaneously
  A plot with parallel lines suggests an additive model
  A plot with non-parallel lines suggests an interaction model
  Note that an interaction plot should NOT be the deciding factor in whether or not to run an interaction model
54
Example
Continuing with the previous example, suppose we'd like to add gender as an explanatory variable
X1: Major, 4 levels
X2: Gender, 2 levels
Response: words per minute typed
We will fit a 4 by 2 ANOVA
55
Table of Means and Counts

Means
              Male   Female   Overall
Biology       45.5   45.2     45.4
Business      48.6   46.9     47.6
English       55.3   55.9     55.6
Mathematics   45.6   44.6     45.1
Overall       48.9   47.9     48.4

Note, this table should also include the standard error of each of the means.

Counts
              Male   Female
Biology       14     11
Business      10     15
English       14     11
Mathematics   12     13
56
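The "Overall" column of the means table is a count-weighted average of the male and female cell means. The sketch below checks this, taking the male counts as 14, 10, 14, 12 and the female counts as 11, 15, 11, 13 for Biology, Business, English, and Mathematics; that assignment of counts to columns is inferred from the table layout and is an assumption.

```python
# (male mean, female mean) and (male count, female count) per major.
cell_means = {
    "Biology": (45.5, 45.2),
    "Business": (48.6, 46.9),
    "English": (55.3, 55.9),
    "Mathematics": (45.6, 44.6),
}
cell_counts = {
    "Biology": (14, 11),
    "Business": (10, 15),
    "English": (14, 11),
    "Mathematics": (12, 13),
}

overall = {}
for major, (mean_m, mean_f) in cell_means.items():
    n_m, n_f = cell_counts[major]
    # Weighted average: each cell mean counts in proportion to its sample size.
    overall[major] = (n_m * mean_m + n_f * mean_f) / (n_m + n_f)

for major, value in overall.items():
    print(f"{major:12s} {value:.1f}")
```

Because the cells are unbalanced, the overall mean for a major is generally not the simple average of its two cell means.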
Interaction plots
57
Interaction plots
There are two ways to do an interaction plot. Both are legitimate. Ease of interpretation is the final criterion for choosing between them.
If one explanatory variable has more levels than the other, interpretation is often easier if the explanatory variable with more levels defines the x-axis
If one explanatory variable is quantitative but has been categorized and the other is categorical, interpretation is often easier if the categorized quantitative variable defines the x-axis.
  Example: age, 20-29, 30-39, 40-49, etc.
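What "parallel lines" means numerically can be seen from the cell means of the typing example. If the lines in an interaction plot were exactly parallel, the male-minus-female gap would be identical for every major. The sketch below computes those gaps; how much variation in the gaps counts as an interaction is a question for the F-test, not for this calculation.

```python
# (male mean, female mean) in words per minute, from the table of means.
cell_means = {
    "Biology": (45.5, 45.2),
    "Business": (48.6, 46.9),
    "English": (55.3, 55.9),
    "Mathematics": (45.6, 44.6),
}

# Gap between the two gender lines at each level of major.
gaps = {major: round(male - female, 1) for major, (male, female) in cell_means.items()}

# Perfectly parallel lines would give a spread of 0.
spread = round(max(gaps.values()) - min(gaps.values()), 1)

print(gaps, spread)
```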
58
Results
Typical output:
The last column contains the p-values

Always check the interaction first!
If the interaction is not significant, rerun without it
59
Results
Updated results
Now we can interpret the main effects. We can see that major is significant but that gender is not.
60
61
Notes
If the interaction is significant, do not check the main effects.
The main effects should always be kept if the interaction is significant.
Note that due to the groups of students, you will see vertical lines in the residual versus predicted plot. This is because all students with a particular combination of the factors have the same predicted value.
62
Example 2
Using the same variables, let's look at a different outcome
63
Table of Means Example 2

              Male   Female   Overall
English       45.3   60.0     51.8
Mathematics   41.8   50.0     46.1
Overall       41.3   49.8     51.2
64
Typical SPSS Exploratory Analysis
65
Interaction plots Example 2
66
Results Example 2
Results
Note that the interaction is significant

In this case both main effects are also significant; however, since the interaction is significant, we would keep them even if they were not
67
Example 2
68
Example 2
69
Example 3
Again, using the same variables, let's look at a different outcome
70
Table of Means Example 3

              Male   Female   Overall
English       54.8   62.1     58.1
Mathematics   52.0   48.4     50.1
Overall       51.3   51.1     58.0
71
Interaction Plots Example 3
72
Results Example 3
Results
In this case, the interaction and major are significant, but gender is not. Since the interaction is significant, leave gender in the model.
73
Example 3
74
Example Ginkgo for Memory

A study was performed to test the memory effects of the herbal medicine Ginkgo biloba in healthy people. Subjects received a daily dosage (placebo, 120 mg, or 250 mg) for two months. Subjects also received one of two types of mnemonic training. All subjects were given a memory test before the study and again at the end. The response variable is the difference (after minus before) in memory test scores. There were 18 subjects randomly assigned to each combination of levels.
75
76
77
SPSS ANOVA output

Conclusions?
78
ANOVA output
Conclusions?
79
Estimated Profile Plot
80
Post-hoc Comparisons
Since there are only two levels of training and there is a significant training effect, we don't need multiple comparisons for training
81
Residual plot
No problems
82
Further Analysis
If there had been an interaction, we could create a table indicating which differences were significant
83
ANCOVA
Analysis of Covariance
At least one quantitative and one categorical explanatory variable
In general, the main interest is the effect of the categorical variable, and the quantitative variable is considered to be a control variable
It is a blending of regression and ANOVA
84
Example
Suppose that we have two different math tutors and would like to compare performance on the final math test
We also have time on tutor and would like to use that as another explanatory variable
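One way to write the model for this example is sketched below. The symbols are assumptions introduced here for illustration: $\text{tutor}_i$ is a 0/1 indicator for which tutor student i used, $\text{time}_i$ is that student's time on tutor, and the $\beta$s are regression coefficients. The slides do not give the model in this form.

```latex
\[
\text{score}_i = \beta_0 + \beta_1\,\text{tutor}_i + \beta_2\,\text{time}_i
               + \beta_3\,(\text{tutor}_i \times \text{time}_i) + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)
\]
```

Under this coding, $\beta_1$ is the difference between tutors at zero time on tutor, $\beta_2$ is the slope for time, and $\beta_3$ is the interaction, which lets the slope differ between tutors. Dropping the $\beta_3$ term gives the additive model with a common slope.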
85
86
Compare Regression and ANCOVA

Regression
ANCOVA
87
Compare Regression and ANOVA

Note that the p-value for the interaction is the same in both models
The interaction is not significant, so drop it and rerun
88
Compare Regression and ANOVA

Regression
ANCOVA
89
