Jeremy Sumner
Maths and Physics, University of Tasmania
Useful resources
http://rtutorialseries.blogspot.com.au/
http://www.r-tutor.com/elementary-statistics/analysis-variance
http://www.statmethods.net/stats/anova.html
http://www.r-bloggers.com/one-way-analysis-of-variance-anova/
http://www.stat.columbia.edu/martin/W2024/R3.pdf
DATA:
ANOVA in a nutshell
ANOVA basics
Time to coagulation by Diet
NULL: $\mu_A = \mu_B = \mu_C = \mu_D$
Groups normal about their mean
ALT: $\mu_i \neq \mu_j$ for some $i, j$
Boxes: within group variation
Averages: between group variation
Model: $y_{ij} = \mu_i + \text{random}$
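DATA assumptions
$y_{ij} = \mu_i + \epsilon_{ij} = \mu + \alpha_i + \epsilon_{ij}$
$y_{ij}$ is the $j$th sample from the $i$th group
$\mu$ is the grand mean
$\mu_i = \mu + \alpha_i$ are the group means
$\epsilon_{ij} \sim N(0, \sigma^2)$

Group   Data                                  Dist
1       $y_{11}, y_{12}, \ldots, y_{1N_1}$    $N(\mu_1, \sigma^2)$
2       $y_{21}, y_{22}, \ldots, y_{2N_2}$    $N(\mu_2, \sigma^2)$
3       $y_{31}, y_{32}, \ldots, y_{3N_3}$    $N(\mu_3, \sigma^2)$
...     ...                                   ...
k       $y_{k1}, y_{k2}, \ldots, y_{kN_k}$    $N(\mu_k, \sigma^2)$

Summary statistics
Sample means: $\bar{y}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} y_{ij}$
Sample variances: $s_i^2 = \frac{1}{N_i - 1} \sum_{j=1}^{N_i} (y_{ij} - \bar{y}_i)^2$

Group   Size     Mean          Variance
1       $N_1$    $\bar{y}_1$   $s_1^2$
2       $N_2$    $\bar{y}_2$   $s_2^2$
3       $N_3$    $\bar{y}_3$   $s_3^2$
...     ...      ...           ...
k       $N_k$    $\bar{y}_k$   $s_k^2$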
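The sample means and variances defined above can be computed directly. The slides work in R; the following is a minimal Python sketch using only the standard library. The data values are an assumption: they are the classic blood coagulation dataset (four diets, 24 observations), chosen because its group means match the diet A to D estimates quoted later in these slides. The names `coagulation` and `summarise` are illustrative only.

```python
from statistics import mean, variance

# Assumed data: coagulation time by diet (classic four-diet dataset).
# These values are not printed in the slides themselves.
coagulation = {
    "A": [62, 60, 63, 59],
    "B": [63, 67, 71, 64, 65, 66],
    "C": [68, 66, 71, 67, 68, 68],
    "D": [56, 62, 60, 61, 63, 64, 63, 59],
}


def summarise(groups):
    """Return {group: (N_i, sample mean, sample variance)}.

    statistics.variance uses the N_i - 1 denominator, matching s_i^2.
    """
    return {g: (len(ys), mean(ys), variance(ys)) for g, ys in groups.items()}


summary = summarise(coagulation)
for group, (size, ybar, s2) in summary.items():
    print(f"{group}: size={size} mean={ybar:.2f} variance={s2:.3f}")
```

Each row of the printed output corresponds to one row of the Size / Mean / Variance table.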
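$F = \frac{\text{MSGroups}}{\text{MSE}} = 1$?

$\text{MSE} = \frac{\text{SSE}}{N - k} = \frac{\text{total variation around group means}}{\text{\# data points} - \text{\# of means computed}} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{N_i} (y_{ij} - \bar{y}_i)^2}{N - k} = s_p^2$

$\text{MSGroups} = \frac{\text{SS Groups}}{k - 1} = \frac{\sum_{i=1}^{k} N_i (\bar{y}_i - \bar{y})^2}{k - 1} = \frac{\sum_{i,j} (\bar{y}_i - \bar{y})^2}{k - 1}$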
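Generic table

Source    df        SS          MS
Between   $k - 1$   SS Groups   SS Groups $/ (k - 1)$
Within    $N - k$   SSE         SSE $/ (N - k)$
Total     $N - 1$   SST

SS Groups $= \sum_i N_i (\bar{y}_i - \bar{y})^2$
SSE $= \sum_i (N_i - 1) s_i^2$
SST $= \sum_{i,j} (y_{ij} - \bar{y})^2$
$F = \frac{\text{MSGroups}}{\text{MSE}}$ (*** marks a highly significant F in R output)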
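The entries of the generic table can be filled in by hand from the formulas for SS Groups, SSE and SST. The following is a minimal Python sketch, not the R workflow the slides use. The data are an assumption (the classic four-diet coagulation dataset, which is consistent with the group means quoted in these slides), and `anova_table` is a hypothetical helper name.

```python
# Assumed data: coagulation time by diet (not printed in the slides).
coagulation = {
    "A": [62, 60, 63, 59],
    "B": [63, 67, 71, 64, 65, 66],
    "C": [68, 66, 71, 67, 68, 68],
    "D": [56, 62, 60, 61, 63, 64, 63, 59],
}


def anova_table(groups):
    """Compute the one-way ANOVA decomposition for a dict of samples."""
    all_y = [y for ys in groups.values() for y in ys]
    n_total, k = len(all_y), len(groups)
    grand_mean = sum(all_y) / n_total
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}

    # Between-group variation: group means around the grand mean.
    ss_groups = sum(len(ys) * (means[g] - grand_mean) ** 2
                    for g, ys in groups.items())
    # Within-group variation: observations around their own group mean.
    sse = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    # Total variation: observations around the grand mean.
    sst = sum((y - grand_mean) ** 2 for y in all_y)

    ms_groups = ss_groups / (k - 1)
    mse = sse / (n_total - k)
    return {
        "df_between": k - 1, "df_within": n_total - k, "df_total": n_total - 1,
        "ss_groups": ss_groups, "sse": sse, "sst": sst,
        "ms_groups": ms_groups, "mse": mse, "F": ms_groups / mse,
    }


table = anova_table(coagulation)
for name, value in table.items():
    print(f"{name:>10}: {value:.3f}")
```

A useful check on any implementation is that SS Groups + SSE equals SST, mirroring the df column where $(k - 1) + (N - k) = N - 1$.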
ANOVA as regression
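Consider a regression model on the 4 diets:
$\text{time} = \beta_0\,(\text{diet A}) + \beta_1\,(\text{diet B}) + \beta_2\,(\text{diet C}) + \beta_3\,(\text{diet D})$
(diet A) is the indicator function: = 0 or 1
In R: $\beta_0 = \mu_A$, $\beta_1 = (\mu_B - \mu_A)$, $\beta_2 = (\mu_C - \mu_A)$, $\beta_3 = (\mu_D - \mu_A)$
R's default treatment coding replaces the (diet A) indicator with an intercept shared by all diets, which is why the remaining coefficients are differences from diet A rather than group means.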
Regression style outputs
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.100e+01  1.183e+00  51.554  < 2e-16 ***
DietB        5.000e+00  1.528e+00   3.273 0.003803 **
DietC        7.000e+00  1.528e+00   4.583 0.000181 ***
DietD       -3.333e-15  1.449e+00   0.000 1.000000
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
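The Estimate, Std. Error and t value columns can be recovered from group means and the pooled variance alone, which makes the link between ANOVA and regression concrete. The sketch below is in Python rather than R and rests on two assumptions: the data are the classic four-diet coagulation dataset (not printed in the slides), and diet A is the baseline, as in the output above. `treatment_coefficients` is a hypothetical helper name.

```python
import math

# Assumed data: coagulation time by diet (not printed in the slides).
coagulation = {
    "A": [62, 60, 63, 59],
    "B": [63, 67, 71, 64, 65, 66],
    "C": [68, 66, 71, 67, 68, 68],
    "D": [56, 62, 60, 61, 63, 64, 63, 59],
}


def treatment_coefficients(groups, baseline):
    """Return {name: (estimate, standard error)} under treatment coding."""
    n_total = sum(len(ys) for ys in groups.values())
    k = len(groups)
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    sse = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    mse = sse / (n_total - k)  # pooled variance s_p^2
    n_base = len(groups[baseline])

    # Intercept is the baseline group mean.
    rows = {"(Intercept)": (means[baseline], math.sqrt(mse / n_base))}
    for g, ys in groups.items():
        if g == baseline:
            continue
        # Each other coefficient is a difference of two group means.
        se = math.sqrt(mse * (1 / n_base + 1 / len(ys)))
        rows["Diet" + g] = (means[g] - means[baseline], se)
    return rows


coefs = treatment_coefficients(coagulation, "A")
for name, (est, se) in coefs.items():
    print(f"{name:<12}{est:>8.3f}{se:>8.3f}{est / se:>9.3f}")
```

With the assumed data this gives estimates 61, 5, 7 and 0, in agreement with the table. The tiny DietD value -3.333e-15 in the R output is floating-point noise around an exact zero.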
[Figure: Comparison. Panels: Temperature matters; Humidity matters; Same matters]
$y_{ij}^{(\lambda)} = \begin{cases} (y_{ij}^{\lambda} - 1)/\lambda, & \lambda \neq 0 \\ \log(y_{ij}), & \lambda = 0 \end{cases}$
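This power transformation (commonly known as the Box-Cox transformation) is simple to apply by hand. A minimal Python sketch follows; `box_cox` is an illustrative name, and the check that observations are positive is an added assumption, since the logarithm and fractional powers require it.

```python
import math


def box_cox(y, lam):
    """Apply the power transformation to a single positive observation."""
    if y <= 0:
        raise ValueError("the transformation requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


# The lambda = 0 case is the limit of the general formula as lambda -> 0,
# so a very small lambda should give almost the same value as the log.
print(box_cox(5.0, 0))
print(box_cox(5.0, 1e-8))
```

Choosing $\lambda = 1$ leaves the data unchanged up to a shift, while $\lambda = 0.5$ and $\lambda = 0$ correspond to square-root and log transforms respectively.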
Analysis of Medians:
Kruskal-Wallis: rank based, average rank for each group,
variation in these rank-averages is analysed
Mood's Median test: contingency table, greater than grand
median? less than? $\chi^2$ test
Both assume groups have the same shaped distribution, and
Kruskal-Wallis is more powerful than Mood's
Pairwise Wilcoxon: assumes roughly symmetric distribution,
rank based, Holm adjusted p-values
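To make "variation in rank-averages" concrete, the Kruskal-Wallis statistic can be computed by hand: pool the data, rank it (ties share an average rank), and compare the rank sums per group. The following Python sketch uses the standard $H$ formula with the usual tie correction; it returns only the statistic, not a p-value. The helper names are illustrative, and the coagulation data are an assumed dataset not printed in the slides.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic."""
    pooled = [y for ys in groups for y in ys]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    h, start = 0.0, 0
    for ys in groups:
        rank_sum = sum(ranks[start:start + len(ys)])
        h += rank_sum ** 2 / len(ys)
        start += len(ys)
    h = 12 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n_total ** 3 - n_total))


# Assumed data: coagulation time by diet A, B, C, D.
diets = [
    [62, 60, 63, 59],
    [63, 67, 71, 64, 65, 66],
    [68, 66, 71, 67, 68, 68],
    [56, 62, 60, 61, 63, 64, 63, 59],
]
print(kruskal_wallis_h(diets))
```

Under the null, $H$ is compared with a $\chi^2$ distribution on $k - 1$ degrees of freedom; that step is left out here because it needs a distribution function beyond the standard library.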
Weak assumptions (eg. non-parametric) vs strong assumptions (or model)

                 Weak assumptions
# parameters     many (!?)
p-values         large
power            low
fit              good
Type I rate      low
Type II rate     high
Sample size      large
Bias             low
Variance         high
Factorial ANOVA
You should always double check that your data satisfies the
assumptions of the method you are applying.
The more you can assume the better, as you can use a more
powerful test and hence reduce Type II error
For ANOVA there is a sequence of assumptions across groups:
normal with identical variance . . . normal without identical
variance . . . not normal but same shape . . . completely nuts
The equal means null hypothesis is a good start, but if it's
false we always want to know more; this is where contrasts
come in.
Multiple tests lead to an increased chance of Type I error.
Contrasts are great, but p-values must be corrected for
multiple testing, AND don't use the data to suggest contrasts.
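Both points about multiple testing can be illustrated numerically. The Python sketch below shows how quickly the chance of at least one Type I error grows with the number of tests (assuming, for simplicity, that the tests are independent), and implements the Holm step-down adjustment mentioned earlier for pairwise Wilcoxon tests. The function names and the example p-values are hypothetical.

```python
def familywise_error(alpha, m):
    """Chance of at least one false positive in m independent tests."""
    return 1 - (1 - alpha) ** m


def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Four diets give 6 pairwise comparisons.
print(familywise_error(0.05, 6))

# Hypothetical raw p-values, for illustration only.
print(holm_adjust([0.01, 0.04, 0.03]))
```

With 6 comparisons at the 5% level the familywise rate is already about 26%, which is why unadjusted pairwise p-values should not be read at face value.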