
Statistics Micro Mini

Multi-factor ANOVA

January 5-9, 2008 Beth Ayers

January 7, 2009 morning session

Thursday Sessions
ANOVA
One-way ANOVA
Two-way ANOVA
ANCOVA
Within-subject
Between-subject
Repeated measures
MANOVA
etc.


What is ANOVA?
ANalysis Of VAriance
Partitions the observed variance based on explanatory variables
Compares the partitions to test the significance of the explanatory variables


Some Terminology
Between-subjects design: each subject participates in one and only one group
Within-subjects design: the same group of subjects serves in more than one treatment
Subject is now a factor

Mixed design: a study which has both between- and within-subject factors
Repeated measures: general term for any study in which multiple measurements are taken on the same subject
Can be either multiple treatments or several measurements over time

ANOVA
Use variances and variance-like quantities to study the equality or non-equality of population means
So, although it is analysis of variance, we are actually analyzing means, not variances
There are other methods which analyze the variances between groups


ANOVA
Typical exploratory analysis includes
Tabulation of the number of subjects in each experimental group
Side-by-side box plots
Statistics about each group
At least mean and standard deviation; can include 5-number summary and information on skewness

Table of means for each experimental group


Notation
If we have k groups, denote the means of the groups as:
$\mu_1, \mu_2, \ldots, \mu_k$

Student i in group j has observation

$$y_{ij} = \mu_j + \varepsilon_{ij}$$

where the $\varepsilon_{ij}$ are independent, distributed $N(0, \sigma^2)$
Can combine this and say subjects from group j have distribution $N(\mu_j, \sigma^2)$

With random assignment, the sample mean for any treatment group is representative of the population mean for that group

Assumptions
1. The errors $\varepsilon_{ij}$ are normally distributed
2. Across the conditions, the errors have equal spread. Often referred to as equal variances.
Rule of thumb: the assumption is met if the largest variance is less than twice the smallest variance
If the variances are unequal, a correction is needed! This is usually $\alpha/2$.
3. The errors are independent from each other



Checking the assumptions


Use the residuals, which are the estimates of the $\varepsilon_{ij}$

1. Normality: look at a normal probability plot
2. Equal variances: look at a residual versus fitted plot
3. Independence: hard to check, often assumed from study design

For mild violations of the assumptions, there are options for correction

When the assumptions are not met the p-value is simply wrong!!

One-way ANOVA
One-way ANOVA is used when
Only testing the effect of one explanatory variable
Each subject has only one treatment or condition
Thus a between-subjects design

Used to test for differences among two or more independent groups


Gives the same results as the pooled (equal-variance) two-sample t-test if the explanatory variable has 2 levels

Hypothesis Testing
$H_0: \mu_1 = \mu_2 = \ldots = \mu_k$
$H_1$: the $\mu$'s are not all equal
The alternative hypothesis $H_1: \mu_1 \neq \mu_2 \neq \ldots \neq \mu_k$ is wrong!
The null hypothesis is called the overall null and is the hypothesis tested by ANOVA
If the overall null is rejected, we must do more specific hypothesis testing to determine which means are different, often referred to as contrasts


Terminology
The sample variance is the sum of the squared deviations from the mean divided by the degrees of freedom

$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$

A mean square (MS) is a variance-like quantity calculated as SS/df

$$MS = \frac{SS}{df}$$
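The definition above can be sketched in a few lines of Python. This is only an illustration: the function names and the `scores` list are hypothetical, not taken from the slides.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean (the SS)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def mean_square(values):
    """SS divided by its degrees of freedom (N - 1), i.e. the sample variance."""
    return sum_of_squares(values) / (len(values) - 1)


scores = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up data for illustration only
ss = sum_of_squares(scores)
df = len(scores) - 1
ms = mean_square(scores)
print(f"SS = {ss:.1f}, df = {df}, MS = {ms:.3f}")
```

For a single sample, the mean square is exactly the usual sample variance, which is why the slides call MS a "variance-like" quantity.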


One-way ANOVA
In one-way ANOVA we work with two mean square quantities
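$MS_{\text{within}}$: the mean square within-groups
$MS_{\text{between}}$: the mean square between-groups

$$MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}}$$

$$MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}}$$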


Within vs. Between


One-way ANOVA
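For each individual group we have

$$\frac{SS_i}{df_i} = \frac{\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}{n_i - 1}$$

So the estimate of MSwithin is

$$MS_{\text{within}} = \frac{SS_{\text{within}}}{df_{\text{within}}} = \frac{\sum_{i=1}^{k} SS_i}{\sum_{i=1}^{k} (n_i - 1)} = \frac{\sum_{i=1}^{k} SS_i}{N - k}$$

And the estimate of MSbetween is

$$MS_{\text{between}} = \frac{SS_{\text{between}}}{df_{\text{between}}} = \frac{\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2}{k - 1}$$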

Mean Squares
What do these values mean?
$MS_{\text{within}}$ is considered a true estimate of $\sigma^2$ that is unaffected by whether the null or alternative hypothesis is true
$MS_{\text{between}}$ is considered a good estimate of $\sigma^2$ only when the null hypothesis is true
If the alternative is true, values of $MS_{\text{between}}$ tend to be inflated

Thus, we can look at the ratio of the two mean square values to evaluate the null hypothesis

Testing the Hypothesis


The F-test looks at the variation among the group means relative to the variation within the sample

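$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{SS_{\text{between}} / df_{\text{between}}}{SS_{\text{within}} / df_{\text{within}}} = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}$$

The F-statistic tends to be larger if the alternative hypothesis is true than if the null hypothesis is true
Under the null hypothesis, the test statistic F has an $F(k-1, N-k)$ distribution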
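As a rough sketch of how the pieces of the F ratio fit together, the function below computes the two sums of squares, their degrees of freedom, and F from raw data. The function name `one_way_anova` and the toy data are hypothetical; a statistical package would also report the p-value, which is omitted here.

```python
def one_way_anova(groups):
    """Return (SS_between, df_between, SS_within, df_within, F) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between: squared distance of each group mean from the grand mean, weighted by n_i
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within: squared distance of each observation from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, df_between, ss_within, df_within, f_stat


toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]   # made-up data, three groups of three
ss_b, df_b, ss_w, df_w, f_stat = one_way_anova(toy_groups)
print(f"SS_between = {ss_b:.2f} (df {df_b}), SS_within = {ss_w:.2f} (df {df_w}), F = {f_stat:.2f}")
```

In the toy data each group has the same spread but the third group sits well above the other two, so the between-group mean square is much larger than the within-group mean square.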


What does the F ratio tell us?


$F = MS_{\text{between}} / MS_{\text{within}}$
The denominator is always an estimate of $\sigma^2$ (under both the null and alternative hypotheses)
The numerator is either another estimate of $\sigma^2$ (under the null) or is inflated (under the alternative)
If the null is true, values of F are close to 1
If the alternative is true, values of F are larger

How large F must be to count as significant depends on the degrees of freedom
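The behaviour of the F ratio can be illustrated with a small simulation. The sketch below is not from the slides: the group means, the standard deviation of 5, the group size of 25, and the seed are all assumed values chosen for illustration. It repeatedly simulates a four-group experiment and averages the resulting F values, once with all population means equal and once with one mean shifted.

```python
import random


def f_statistic(groups):
    """F = MS_between / MS_within for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def average_f(group_means, sigma, n, reps, rng):
    """Average F over `reps` simulated experiments with n subjects per group."""
    total = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in group_means]
        total += f_statistic(groups)
    return total / reps


rng = random.Random(2009)   # fixed seed so the run is repeatable
null_avg = average_f([45, 45, 45, 45], sigma=5, n=25, reps=2000, rng=rng)
alt_avg = average_f([45, 45, 55, 45], sigma=5, n=25, reps=2000, rng=rng)
print(f"average F when all means are equal: {null_avg:.2f}")
print(f"average F when one mean differs:    {alt_avg:.2f}")
```

With equal means the average F should land near 1; with one shifted mean it should be far larger, which is the inflation of the numerator described above.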



The ANOVA table


When running an ANOVA, statistical packages will return an ANOVA table summarizing the SS, MS, df, F-statistic, and p-value

                             SS                     df                     MS          F                    Sig
Group (Treatment, between)   SSbetween              dfbetween              MSbetween   MSbetween/MSwithin   P-value
Residual (Error, within)     SSwithin               dfwithin               MSwithin
Total                        SSbetween + SSwithin   dfbetween + dfwithin


Example
Suppose we want to know if typing speed varies across majors

Use 4 majors: Biology, Business, English, and Mathematics

H0: typing speed is the same for students of all majors
$H_0: \mu_{Bio} = \mu_{Business} = \mu_{Eng} = \mu_{Math}$

H1: typing speed varies across the majors
H1: at least one of the means is different

Box plots


Summary
The largest variance is less than twice the smallest variance ($38.8 < 2 \times 20.1 = 40.2$).
Use $\alpha = 0.05$.

Major         n_i   Mean   Variance
Biology       25    45.3   24.7
Business      25    47.6   25.4
English       25    55.6   38.8
Mathematics   25    45.1   20.1
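The rule-of-thumb check on the variances can be written out as a short sketch. The variances are the ones in the summary table; the variable names are only illustrative.

```python
# Group variances from the summary table
variances = {"Biology": 24.7, "Business": 25.4, "English": 38.8, "Mathematics": 20.1}

largest = max(variances.values())
smallest = min(variances.values())
ratio = largest / smallest

# Rule of thumb from the Assumptions slide: largest variance < 2 x smallest variance
rule_met = ratio < 2
print(f"largest/smallest variance = {ratio:.2f}, rule of thumb met: {rule_met}")
```

The ratio is just under 2, so the equal-variance assumption is treated as met here, though only narrowly.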


Degrees of Freedom
How many groups do we have? What is the sample size? Using these values:
What is dfwithin?
What is dfbetween?


Degrees of Freedom
How many groups do we have?
There are k = 4 groups: Biology, English, Business, and Mathematics

What is the sample size?
There are N = 100 students

Using these values,
What is dfbetween?
$k - 1 = 4 - 1 = 3$

What is dfwithin?
$N - k = 100 - 4 = 96$

Sample Output
                             SS        df   MS       F        Sig
Group (Treatment, between)   1807.49   3    602.50   22.091   0.000
Residual (Error, within)     2618.20   96   27.27
Total                        4425.69   99

Our estimate of $\sigma^2$ is 27.27 (= 2618.20 / 96)


The numerator MS = 602.5 and appears to be highly inflated
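The arithmetic behind the sample output can be retraced from the SS and df columns alone. The sketch below uses only numbers printed in the slides; the variable names are illustrative. It also cross-checks the residual mean square against the group variances from the Summary slide, since with equal group sizes MS within is the average of the group variances.

```python
# SS and df as printed in the sample output
ss_between, df_between = 1807.49, 3
ss_within, df_within = 2618.20, 96

ms_between = ss_between / df_between
ms_within = ss_within / df_within      # the estimate of sigma^2
f_stat = ms_between / ms_within

# Cross-check: with equal n per group, MS_within is the mean of the group variances
group_variances = [24.7, 25.4, 38.8, 20.1]
pooled_variance = sum(group_variances) / len(group_variances)

print(f"MS_between = {ms_between:.2f}, MS_within = {ms_within:.2f}, F = {f_stat:.3f}")
print(f"average of the (rounded) group variances = {pooled_variance:.2f}")
```

The residual mean square computes to about 27.27, and dividing 602.50 by it reproduces the reported F of 22.091. The average of the rounded group variances, 27.25, agrees with that value up to rounding.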

Results
F-statistic = 22.1
P-value: < 0.0005
Conclusion: the average words per minute differs for at least one of the majors
To make stronger statements, we need to do further testing


Checking the assumptions


Further Analysis
If H0 is rejected, we conclude that not all the $\mu$'s are equal

Would like to make statements about where there are differences


Can use planned or unplanned comparisons (or contrasts)
Planned comparisons are interesting comparisons decided on before analysis
Unplanned comparisons occur after seeing the results
Be careful not to go fishing for results

Contrasts
A simple contrast hypothesis compares two population means
$H_0: \mu_1 = \mu_5$

A complex contrast hypothesis has multiple population means on either side

$H_0: (\mu_1 + \mu_2)/2 = \mu_3$
$H_0: (\mu_1 + \mu_2)/2 = (\mu_3 + \mu_4 + \mu_5)/3$
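A complex contrast can be written as a set of weights that sum to zero. The sketch below applies one to the typing-speed example from earlier in the deck. The choice of contrast (Biology and Business averaged versus English) is purely hypothetical and was not tested in the slides. The standard-error formula, $\sqrt{MS_{\text{within}} \sum_i c_i^2 / n_i}$, is the usual one for a contrast under the equal-variance assumption.

```python
import math

# Group means and group size from the typing-speed example
means = {"Biology": 45.3, "Business": 47.6, "English": 55.6, "Mathematics": 45.1}
n_per_group = 25
ms_within = 2618.20 / 96          # residual mean square from the sample output

# Hypothetical contrast: H0: (mu_Bio + mu_Business) / 2 = mu_Eng
weights = {"Biology": 0.5, "Business": 0.5, "English": -1.0, "Mathematics": 0.0}
assert abs(sum(weights.values())) < 1e-12   # contrast weights must sum to zero

estimate = sum(weights[g] * means[g] for g in means)
std_error = math.sqrt(ms_within * sum(w ** 2 / n_per_group for w in weights.values()))
t_stat = estimate / std_error     # compared against a t distribution with df_within = 96

print(f"estimate = {estimate:.2f}, SE = {std_error:.3f}, t = {t_stat:.2f}")
```

The resulting p-value would only be valid if the contrast had been chosen before seeing the data and the other conditions for planned comparisons hold.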


Planned Comparisons
Most statistical packages allow you to enter custom planned contrast hypotheses
The p-values are only valid under strict conditions
The conditions maintain Type-1 error rate across the whole experiment

Computer packages assume that you have checked the assumptions of the ANOVA test

Conditions for Planned Comparisons


Contrasts are selected before looking at the results; they are planned, not post-hoc
Must be ignored if the overall null is not rejected!

Each contrast is based on information that is independent of the other contrasts

The number of planned comparisons must not be more than the corresponding degrees of freedom (k-1 in one-way ANOVA)


Unplanned Comparisons
What if we notice a possible interesting difference when looking at the results?
Can do comparisons, but need to adjust the $\alpha$-level to control for Type-1 error
One common method is to use Tukey's simultaneous confidence intervals to compare any and all pairs of group population means
This procedure takes multiple comparisons into consideration to preserve the $\alpha$ level

Other Options
Bonferroni correction for the number of comparisons done
Dunnett's test
Scheffé procedure
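As a minimal sketch of the Bonferroni idea, the overall $\alpha$ is divided by the number of comparisons actually made. The example below assumes all pairwise comparisons among the four majors of the typing example; the variable names are illustrative.

```python
from itertools import combinations

majors = ["Biology", "Business", "English", "Mathematics"]
alpha = 0.05

# All unordered pairs of groups: k * (k - 1) / 2 comparisons
pairs = list(combinations(majors, 2))
alpha_per_comparison = alpha / len(pairs)

print(f"{len(pairs)} pairwise comparisons, each tested at alpha = {alpha_per_comparison:.4f}")
```

Each individual comparison is then judged against the smaller threshold, which keeps the chance of any false rejection across the whole family at or below the original $\alpha$.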


Tukey's Multiple Comparisons for previous example


Conclusions
In the table on the previous page,
1 = Biology, 2 = Business, 3 = English, 4 = Mathematics

Biology, Business, and Mathematics are all significantly different from English
There are no other significant differences


Additional sample output


Below is the same output from a different software package


Comparison to Regression
Sample regression output
Which major is our baseline?


Comparison to Regression
F-statistic = 22.1, p-value < 0.0005
This is the same F-statistic and p-value as in the ANOVA sample output (slide 25)

At least one of the explanatory variables is important; this corresponds to the rejection of the overall null, i.e. at least one of the means differs


Comparison to Regression
Note that Biology is the baseline and 45.3 is the mean WPM for Biology students
Note that Business and Mathematics are not significant
This agrees with the post-hoc comparisons: neither Business nor Mathematics is significantly different from Biology, but English is
To make further conclusions, we will need to look at multiple comparisons, such as the previous Tukey intervals


Regression
The conclusions about the overall null hypothesis will be the same

In regression can make statements comparing groups to baseline


To make more conclusive statements, more analysis is needed
ANOVA with either planned or post-hoc comparisons does the same thing and is often easier

One-way ANOVA Power


Two different SAT prep courses charge $1200 for a two month course. An (unethical) experiment would be to randomize students into one of the two courses or take no course
What information is needed to calculate power for this one-way ANOVA?
Sample size
Within group variance ($\sigma^2$)
Estimated or minimally interesting outcome means for each group

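Estimate of $\sigma^2$
Based on previous years, we know that 95% of the student scores on SATs fall between 900 and 1500
About 95% of a normal distribution lies within 2 standard deviations of the mean, so this range spans about $4\sigma$
$\sigma = (1500 - 900)/4 = 150$
$\sigma^2 = 150^2$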


Minimally interesting outcome


What is the minimal average benefit, in points gained, that would justify the program?
The minimally interesting outcome is based on previous knowledge

For this example we'll try several different values


sd[treatment]
Different applets will define things slightly differently. Find an applet you understand.

For the applet I will show you, they require sd[treatment]. From their definition this is calculated as

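$$\text{sd[treatment]} = \sqrt{\frac{\sum_{i=1}^{k} (\mu_i - \bar{\mu})^2}{k - 1}}$$

where $\mu_i$ is the ith group mean, $\bar{\mu}$ is the average of the group means, and k = the number of groups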

Ready to go to power applet
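The sd[treatment] quantity is simply the sample standard deviation of the group means. A small sketch follows; the function name and the example means are hypothetical. The means assume a pattern in which both courses raise the average score by 50 points over no course.

```python
import math


def sd_treatment(group_means):
    """Standard deviation of the group means, with k - 1 in the denominator."""
    k = len(group_means)
    mean_of_means = sum(group_means) / k
    return math.sqrt(sum((m - mean_of_means) ** 2 for m in group_means) / (k - 1))


# Assumed pattern of mean gains: no course, course A, course B
assumed_gains = [0, 50, 50]
sd = sd_treatment(assumed_gains)
print(f"sd[treatment] = {sd:.2f}")
```

Only the differences among the means matter: adding the same constant to every group mean leaves sd[treatment] unchanged.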



Calculating the power


Let $\sigma = 150$, n = 50, effect = 50 points
Power = 0.3811

Let $\sigma = 150$, n = 100, effect = 50 points
Power = 0.6772

Let $\sigma = 150$, n = 50, effect = 100 points
Power = 0.9367

Let $\sigma = 150$, n = 50, effect = 25 points
Power = 0.1245

Calculating the power


Let $\sigma = 100$, n = 50, effect = 50 points
Power = 0.7276

Let $\sigma = 100$, n = 100, effect = 50 points
Power = 0.9622

Let $\sigma = 100$, n = 50, effect = 100 points
Power = 0.997

Let $\sigma = 100$, n = 50, effect = 25 points
Power = 0.2294
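Power can also be approximated by simulation rather than with an applet. The sketch below is a Monte Carlo approximation, not the applet's method, and it rests on several assumptions: n is the number of students per group, "effect = 50 points" means both courses raise the mean by 50 points over no course, and the critical value is estimated from simulated null experiments instead of taken from an F table. All function names, the seed, and the number of repetitions are illustrative.

```python
import random


def f_statistic(groups):
    """F = MS_between / MS_within for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulate_f(group_means, sigma, n, rng):
    """Simulate one experiment (n normal observations per group) and return its F."""
    groups = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in group_means]
    return f_statistic(groups)


def estimate_power(group_means, sigma, n, alpha=0.05, reps=4000, seed=1):
    """Monte Carlo estimate of the power of the one-way ANOVA F-test."""
    rng = random.Random(seed)
    # Step 1: empirical critical value from experiments where the null is true
    null_means = [0.0] * len(group_means)
    null_fs = sorted(simulate_f(null_means, sigma, n, rng) for _ in range(reps))
    critical = null_fs[int((1 - alpha) * reps)]
    # Step 2: fraction of experiments under the alternative that exceed it
    rejections = sum(simulate_f(group_means, sigma, n, rng) > critical for _ in range(reps))
    return rejections / reps


# Assumed scenario: sigma = 150, n = 50 per group, both courses add 50 points
power = estimate_power([0, 50, 50], sigma=150, n=50)
print(f"estimated power = {power:.2f}")
```

Under these assumptions the estimate should come out in the neighbourhood of the first value quoted from the applet, with some simulation noise. Increasing `reps` reduces that noise at the cost of run time.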

Moving past One-way ANOVA


What if we have two categorical explanatory variables?
What if we have categorical and quantitative explanatory variables?
What if subjects have more than one treatment?
What if there is more than one response variable?
And many other combinations


Two-way ANOVA
Suppose we now have two categorical explanatory variables
Is there a significant X1 effect?
Is there a significant X2 effect?
Are there significant interaction effects?
If X1 has k levels and X2 has m levels, then the analysis is often referred to as a "k by m ANOVA" or "k x m ANOVA"


Terminology
If the interaction is significant, the model is called an interaction model
If the interaction is not significant, the model is called an additive model

Explanatory variables are often referred to as factors


Assumptions
The assumptions are the same as in One-way ANOVA
1. The errors $\varepsilon_{ij}$ are normally distributed
2. Across the conditions, the errors have equal spread. Often referred to as equal variances.
3. The errors are independent from each other


Two-way ANOVA
Two-way (or multi-way) ANOVA is an appropriate analysis method for a study with a quantitative outcome and two (or more) categorical explanatory variables. The assumptions are Normality, equal variance, and independent errors.


Results
Results are again displayed in an ANOVA table
There will be one line for each term in the model. For a model with two factors, we will have one line for each factor and one line for the interaction. We will also have a line for the error and the total. See next page.


The ANOVA table


              SS   df           MS   F   Sig
Factor 1           k-1
Factor 2           m-1
Interaction        (k-1)(m-1)
Error              N-k*m        *
Total              N-1

The MS(error), denoted by * in the above table, is the true estimate of $\sigma^2$
The MS in each row is that row's SS/df
The F-statistic is the MS/MS(error)
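The degrees of freedom in the table can be tabulated with a short helper. The function name is hypothetical, and the values k = 4, m = 2, N = 100 are illustrative (they correspond to four majors, two genders, and 100 students).

```python
def two_way_df(k, m, n_total):
    """Degrees of freedom for each line of a k by m ANOVA table with interaction."""
    return {
        "Factor 1": k - 1,
        "Factor 2": m - 1,
        "Interaction": (k - 1) * (m - 1),
        "Error": n_total - k * m,
        "Total": n_total - 1,
    }


df = two_way_df(k=4, m=2, n_total=100)
for term, value in df.items():
    print(f"{term:<12}{value}")
```

The degrees of freedom for the two factors, the interaction, and the error always add up to the total, N - 1, which is a quick check on any ANOVA table.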

Exploratory Analysis
Table of means
Interaction or profile plots
An interaction plot is a way to look at outcome means for two factors simultaneously
A plot with parallel lines suggests an additive model
A plot with non-parallel lines suggests an interaction model
Note that an interaction plot should NOT be the deciding factor in whether or not to run an interaction model

Example
Continuing with the previous example, suppose we'd like to add gender as an explanatory variable
X1: Major, 4 levels
X2: Gender, 2 levels
Response: words per minute typed
We will fit a 4 by 2 ANOVA

Table of Means and Counts


Means:

              Male   Female   Overall
Biology       45.5   45.2     45.4
Business      48.6   46.9     47.6
English       55.3   55.9     55.6
Mathematics   45.6   44.6     45.1
Overall       48.9   47.9     48.4

Note, this table should also include the standard error of each of the means.

Counts:

              Male   Female
Biology       14     11
Business      10     15
English       14     11
Mathematics   12     13

Interaction plots


Interaction plots
There are two ways to do an interaction plot. Both are legitimate. Ease of interpretation is the final criterion for which to do.
If one explanatory variable has more levels than the other, interpretation is often easier if the explanatory variable with more levels defines the x-axis
If one explanatory variable is quantitative but has been categorized and the other is categorical, interpretation is often easier if the categorized quantitative variable defines the x-axis.
Example: age, 20-29, 30-39, 40-49, etc.

Results
Typical output:

The last column contains the p-values

Always check the interaction first!
If the interaction is not significant, rerun without it

Results
Updated results

Now we can interpret the main effects. We can see that major is significant but that gender is not.


Checking the assumptions


Notes
If the interaction is significant, do not check the main effects. The main effects should always be kept if the interaction is significant. Note that due to the groups of students, you will see vertical lines in the residual versus predicted plot. This is due to the fact that all students with a particular combination of the factors will have the same predicted value.

Example 2
Using the same variables, let's look at a different outcome


Table of Means Example 2


              Male   Female   Overall
Biology       37.9   45.8     41.2
Business      39.9   45.0     43.0
English       45.3   60.0     51.8
Mathematics   41.8   50.0     46.1
Overall       41.3   49.8     51.2
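One informal way to read a table of cell means for interaction is to compare the gap between the two genders across majors: in an additive model the gap would be roughly constant. The sketch below does this for the Example 2 cell means; the variable names are illustrative, and this is only an exploratory look, not a substitute for the interaction test.

```python
# Cell means from the Example 2 table
male = {"Biology": 37.9, "Business": 39.9, "English": 45.3, "Mathematics": 41.8}
female = {"Biology": 45.8, "Business": 45.0, "English": 60.0, "Mathematics": 50.0}

# Female minus male difference within each major
gap = {major: round(female[major] - male[major], 1) for major in male}
for major, value in gap.items():
    print(f"{major:<12}{value:>6.1f}")

largest_gap_major = max(gap, key=gap.get)
print(f"largest gap: {largest_gap_major}")
```

The gaps range from about 5 to about 15 points, which corresponds to non-parallel lines in an interaction plot. Whether that departure is more than noise is decided by the interaction line of the ANOVA table.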


Typical SPSS Exploratory Analysis


Interaction plots Example 2


Results Example 2
Results

Note that the interaction is significant


In this case both main effects are also significant; however, since the interaction is significant, we would keep them even if they were not

Example 2


Example 2


Example 3
Again, using the same variables, let's look at a different outcome


Table of Means Example 3


              Male   Female   Overall
Biology       47.9   47.2     47.6
Business      50.2   48.1     49.0
English       54.8   62.1     58.1
Mathematics   52.0   48.4     50.1
Overall       51.3   51.1     58.0


Interaction Plots Example 3


Results Example 3
Results

In this case, the interaction and major are significant, but gender is not. Since the interaction is significant, leave gender in the model.

Example 3


Example Ginkgo for Memory


A study was performed to test the memory effects of the herbal medicine Ginkgo biloba in healthy people. Subjects received a daily dosage (placebo, 120 mg, 250 mg) for two months. Subjects also received one of two types of mnemonic training. All subjects were given a memory test before the study and again at the end. The response variable is the difference (after - before) in memory test scores. There were 18 subjects randomly assigned to each combination of levels.

Exploratory Analysis


Exploratory Analysis


SPSS ANOVA output


Conclusions?


ANOVA output
Conclusions?


Estimated Profile Plot


Post-hoc Comparisons
Since there are only two levels of training and there is a significant training effect, we don't need multiple comparisons for training


Residual plot
No problems


Further Analysis
If there had been an interaction, we could create a table indicating which differences were significant


ANCOVA
Analysis of Covariance
At least one quantitative and one categorical explanatory variable
In general, the main interest is the effects of the categorical variable, and the quantitative variable is considered to be a control variable
It is a blending of regression and ANOVA


Example
Suppose that we have two different math tutors and would like to compare performance on the final math test
We also have time on tutor and would like to use that as another explanatory variable


Exploratory Analysis


Compare Regression and ANCOVA


Regression

ANCOVA


Compare Regression and ANCOVA


Note that the p-value for the interaction is the same in both models
The interaction is not significant, so drop it and rerun


Compare Regression and ANCOVA


Regression

ANCOVA

