Goals
Overview of key elements of hypothesis testing
Review of common one- and two-sample tests
Introduction to ANOVA
Hypothesis Testing
The intent of hypothesis testing is to formally examine two
opposing conjectures (hypotheses), H0 and HA
These two hypotheses are mutually exclusive and
exhaustive so that one is true to the exclusion of the
other
We accumulate evidence - collect and analyze sample
information - for the purpose of determining which of
the two hypotheses is true and which of the two
hypotheses is false
P-values
Calculate a test statistic in the sample data that is
relevant to the hypothesis being tested
After calculating a test statistic we convert this to a P-value by comparing its value to the distribution of test
statistics under the null hypothesis
The P-value measures how likely a test statistic at least as extreme as the
observed one is under the null hypothesis
P-value ≤ α ⇒ Reject H0 at level α
P-value > α ⇒ Do not reject H0 at level α
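The decision rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming the test statistic is standard normal under H0; the observed value 2.3 and the helper names `two_sided_p_value` and `decide` are hypothetical and not taken from the slides.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1): chance, under H0, of a statistic at least this extreme."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the P-value is at most alpha."""
    return "Reject H0" if p_value <= alpha else "Do not reject H0"

z_observed = 2.3  # hypothetical test statistic, chosen only for illustration
p = two_sided_p_value(z_observed)
decision = decide(p, alpha=0.05)
print(f"z = {z_observed}, P-value = {p:.4f}: {decision}")
```

Note that alpha is fixed before looking at the data; the P-value is the only quantity computed from the sample.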
When To Reject H0
Level of significance, α: specified before an experiment to define the
rejection region
Rejection region: set of all test statistic values for which H0 will be
rejected
[Figure: rejection regions for a one-sided test and a two-sided test, each with α = 0.05]
Some Notation
In general, critical values for an α-level test are denoted as:
One-sided test: $X_{\alpha}$
Two-sided test: $X_{\alpha/2}$
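As a concrete instance of this notation, the sketch below computes the one-sided and two-sided critical values for a standard normal test statistic. The choice of the normal distribution and the helper name `upper_critical_value` are assumptions made for illustration; for a t-test the same idea applies with t quantiles.

```python
from statistics import NormalDist

def upper_critical_value(tail_area):
    """Return the point with `tail_area` probability to its right under N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - tail_area)

alpha = 0.05
z_one_sided = upper_critical_value(alpha)        # z_alpha: all of alpha in one tail
z_two_sided = upper_critical_value(alpha / 2.0)  # z_{alpha/2}: alpha split over two tails
print(f"one-sided: {z_one_sided:.3f}, two-sided: +/-{z_two_sided:.3f}")
```

The two-sided critical value is larger because only alpha/2 of the probability is placed in each tail.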
                    H0 True                                  H0 False
Do Not Reject H0    Correct Decision (1 - α)                 Incorrect Decision: Type II Error (β)
Reject H0           Incorrect Decision: Type I Error (α)     Correct Decision (1 - β)
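The two error rates can be estimated by simulation. The sketch below is a rough illustration, assuming a two-sided z-test of H0: mu = 0 with known sigma; the sample size, effect size, and the function name `rejection_rate` are hypothetical choices, not values from the slides.

```python
import random
from statistics import NormalDist, mean

def rejection_rate(true_mean, n=25, sigma=1.0, alpha=0.05, trials=4000, seed=1):
    """Fraction of simulated samples in which a two-sided z-test rejects H0: mu = 0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = mean(sample) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

type_i_rate = rejection_rate(true_mean=0.0)  # H0 true: estimates alpha
power = rejection_rate(true_mean=0.5)        # H0 false: estimates 1 - beta
print(f"Type I error rate ~ {type_i_rate:.3f}, power ~ {power:.3f}")
```

When H0 is true the rejection rate should sit near alpha; when H0 is false the rejection rate is the power, and one minus it estimates the Type II error rate.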
                                            Gaussian            Non-Gaussian                 Binomial
Compare one group to a hypothetical value   One sample t-test   Wilcoxon Test                Binomial Test
Compare two paired groups                   Paired t-test       Wilcoxon Test                McNemar's Test
Compare two unpaired groups                 Two sample t-test   Wilcoxon-Mann-Whitney Test   Chi-Square or Fisher's Exact Test
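One Sample
Statistic: $T = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Critical value: $t_{\alpha, n-1}$ (df = n - 1)
Two Sample
Statistic: $T = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{m} + \frac{1}{n}}}$
Critical value: $t_{\alpha, m+n-2}$ (df = m + n - 2)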
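The one-sample and pooled two-sample t statistics can be computed directly from their definitions. The sketch below is illustrative only: the measurements are hypothetical, the function names are not from the slides, and the two-sample version assumes equal variances (pooled standard deviation s_p).

```python
from math import sqrt
from statistics import mean, variance

def one_sample_t(x, mu0):
    """Return (T, df) for H0: mu = mu0, with T = (xbar - mu0) / (s / sqrt(n))."""
    n = len(x)
    return (mean(x) - mu0) / sqrt(variance(x) / n), n - 1

def two_sample_pooled_t(x, y):
    """Return (T, df) for H0: mu_x = mu_y, assuming equal variances (pooled s_p)."""
    m, n = len(x), len(y)
    sp2 = ((m - 1) * variance(x) + (n - 1) * variance(y)) / (m + n - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1.0 / m + 1.0 / n)), m + n - 2

# Hypothetical measurements, chosen only for illustration.
group_x = [5.1, 4.9, 5.6, 5.8, 5.3]
group_y = [4.2, 4.8, 4.5, 4.1, 4.9]

t1, df1 = one_sample_t(group_x, mu0=5.0)
t2, df2 = two_sample_pooled_t(group_x, group_y)
print(f"one sample: T = {t1:.3f} on {df1} df; two sample: T = {t2:.3f} on {df2} df")
```

Each statistic is then compared with the t critical value on the stated degrees of freedom, taken from a table or a statistics package.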
Non-Parametric Alternatives
Wilcoxon Test: non-parametric analog of one sample t-test
One-sample test of a proportion:
$z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$
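The z statistic for a single proportion can be sketched as follows. This assumes the large-sample normal approximation and a two-sided alternative; the counts (60 successes in 100 trials) and the function name `proportion_z_test` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Return (z, two-sided P-value) for H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1.0 - p0) / n)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = proportion_z_test(successes=60, n=100, p0=0.5)  # hypothetical counts
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

The standard error uses p0 rather than the sample proportion because the statistic is evaluated under the null hypothesis.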
Confidence Intervals
Confidence interval: an interval of plausible values for
the parameter being estimated, where the degree of plausibility
is specified by a confidence level
General form:
$\bar{x} \pm \text{critical value} \times \text{se}$
$\bar{x} - \bar{y} \pm t_{\alpha, m+n-2} \, s_p \sqrt{\frac{1}{m} + \frac{1}{n}}$
Interpreting a 95% CI
We calculate a 95% CI for a hypothetical sample mean to be
between 20.6 and 35.4. Does this mean there is a 95%
probability the true population mean is between 20.6 and 35.4?
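NO! The correct interpretation relies on the long-run frequency
interpretation of probability: if the sampling were repeated many times,
about 95% of the intervals constructed this way would contain the true
population mean. Any single interval either contains it or does not.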
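The long-run reading can be checked by simulation. The sketch below repeatedly draws samples from a population with a known mean, builds a 95% interval each time, and counts how often the interval covers the true mean. It assumes a normal population with known sigma (so a z interval is used); the population values and the function name `coverage` are hypothetical.

```python
import random
from statistics import NormalDist, mean

def coverage(true_mean=28.0, sigma=10.0, n=30, level=0.95, repeats=4000, seed=7):
    """Fraction of simulated z intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sigma / n ** 0.5  # critical value times standard error
    hits = 0
    for _ in range(repeats):
        xbar = mean([rng.gauss(true_mean, sigma) for _ in range(n)])
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / repeats

observed_coverage = coverage()
print(f"fraction of intervals covering the true mean: {observed_coverage:.3f}")
```

The 95% describes the procedure across repeated samples, not the probability that one particular interval contains the mean.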
One-Way ANOVA
Simplest case is for One-Way (Single Factor) ANOVA
The outcome variable is the variable you're comparing
The factor variable is the categorical variable being used to
define the groups
- We will assume k samples (groups)
It is called "one-way" because each value is classified in exactly one
way
ANOVA easily generalizes to more factors
Assumptions of ANOVA
Independence
Normality
Homogeneity of variances (aka,
Homoscedasticity)
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
Motivating ANOVA
A quantitative trait was measured in individuals randomly sampled
from a population
Genotyping of a single SNP
AA: 82, 83, 97
AG: 83, 78, 68
GG: 38, 59, 55
Rationale of ANOVA
Basic idea is to partition total variation of the
data into two sources
1. Variation within levels (groups)
2. Variation between levels (groups)
If H0 is true the standardized variances are equal
to one another
The Details
Our Data:
AA: 82, 83, 97
AG: 83, 78, 68
GG: 38, 59, 55
Let $x_{ij}$ denote the jth observation from the ith level
$\bar{x}_{..} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \frac{x_{ij}}{N}$
$\bar{x}_{..} = \frac{82 + 83 + 97 + 83 + 78 + 68 + 38 + 59 + 55}{9} = 71.4$
SST
$\sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{..})^2$
Sum of squared deviations about the grand mean across all N observations
SSTG
$\sum_{i=1}^{k} n_i (\bar{x}_{i.} - \bar{x}_{..})^2$
Sum of squared deviations for each group mean about the grand mean
SSTE
$\sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i.})^2$
Sum of squared deviations for all observations within each group from that
group mean, summed across all groups
In Our Example
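SST = $\sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{..})^2$ = 2630.2
SSTG = $\sum_{i=1}^{k} n_i (\bar{x}_{i.} - \bar{x}_{..})^2$ = 2124.2
SSTE = $\sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i.})^2$ = 506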
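The three sums of squares for the genotype data can be recomputed with a short script. This is a sketch using only the nine observations listed in the slides; the variable names are illustrative.

```python
from statistics import mean

# Trait values by genotype, as listed in the slides.
groups = {
    "AA": [82, 83, 97],
    "AG": [83, 78, 68],
    "GG": [38, 59, 55],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Total: every observation about the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_values)
# Between groups: each group mean about the grand mean, weighted by group size.
sstg = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
# Within groups: every observation about its own group mean.
sste = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

print(f"SST = {sst:.1f}, SSTG = {sstg:.1f}, SSTE = {sste:.1f}")
```

The between-group and within-group pieces add up to the total, which is the partition of variation that ANOVA relies on.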
In Our Example
[Figure: observations by genotype with the group means x̄1., x̄2., x̄3. and the grand mean x̄.. marked, illustrating SST, SSTG, and SSTE]
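$F = \frac{MSTG}{MSTE} = \frac{1062.1}{84.3} = 12.59$
where MSTG = SSTG / (k - 1) = 2124.2 / 2 and MSTE = SSTE / (N - k) = 506 / 6
If H0 is true, MSTG and MSTE are expected to be equal, so F should be close to 1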
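The F statistic for the genotype data can be reproduced as below. The P-value line is a convenience that applies only because there are k = 3 groups: with exactly 2 numerator degrees of freedom, the upper tail of the F distribution has the closed form (1 + 2F/d)^(-d/2), where d is the error df. For other numbers of groups a statistics package would be needed; the variable names here are illustrative.

```python
from statistics import mean

groups = [[82, 83, 97], [83, 78, 68], [38, 59, 55]]  # AA, AG, GG from the slides

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = mean([x for g in groups for x in g])

# Mean squares: sums of squares divided by their degrees of freedom.
mstg = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
mste = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n_total - k)
f_stat = mstg / mste

# Closed-form upper-tail probability, valid only when the numerator df is 2.
df_error = n_total - k
p_value = (1.0 + 2.0 * f_stat / df_error) ** (-df_error / 2.0)

print(f"MSTG = {mstg:.1f}, MSTE = {mste:.1f}, F = {f_stat:.2f}, P-value = {p_value:.4f}")
```

A large F means the variation between group means is large relative to the variation within groups, which is evidence against H0.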
ANOVA Table
Source of Variation   df      Sum of Squares   MS               F
Group                 k - 1   SSTG             SSTG / (k - 1)   [SSTG / (k - 1)] / [SSTE / (N - k)]
Error                 N - k   SSTE             SSTE / (N - k)
Total                 N - 1   SST
Non-Parametric Alternative
Kruskal-Wallis Rank Sum Test: non-parametric analog to ANOVA
In R, kruskal.test()
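To show what the rank-based statistic involves, here is a minimal Python sketch of the Kruskal-Wallis H statistic with the usual tie correction, applied to the genotype data. It is an illustration of the textbook formula, not a description of how `kruskal.test()` is implemented; the function name `kruskal_wallis_h` is hypothetical, and the P-value line assumes the chi-square approximation with 2 degrees of freedom (three groups), where the upper tail is exp(-H/2).

```python
from math import exp

def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted(x for g in groups for x in g)
    n_total = len(pooled)
    rank_of = {}   # value -> midrank (average of the 1-based positions it occupies)
    tie_term = 0   # sum of t^3 - t over groups of tied values
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2.0
        tie_term += t ** 3 - t
        i = j
    rank_part = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * rank_part - 3.0 * (n_total + 1)
    return h / (1.0 - tie_term / (n_total ** 3 - n_total))

groups = [[82, 83, 97], [83, 78, 68], [38, 59, 55]]  # AA, AG, GG from the slides
h_stat = kruskal_wallis_h(groups)
p_value = exp(-h_stat / 2.0)  # chi-square upper tail, valid for df = 2 only
print(f"H = {h_stat:.3f}, approximate P-value = {p_value:.4f}")
```

Because only ranks enter the calculation, the result does not depend on the normality assumption, although with three observations per group the chi-square approximation is rough.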