
Nonparametric Statistics

Timothy C. Bates
tim.bates@ed.ac.uk

Parametric Statistics 1

Assume data are drawn from samples with a certain distribution (usually normal).
Compute the likelihood that groups are related/unrelated or the same/different given that underlying model.
Examples: t-test, Pearson's correlation, ANOVA.

Parametric Statistics 2

Assumptions of parametric statistics:

1. Observations are independent
2. Your data are normally distributed
3. Variances are equal across groups
   (can be modified to cope with unequal σ²)

Non-parametric Statistics?

Non-parametric statistics do not assume any underlying distribution.
They estimate the distribution AND compute the probability that your groups are related/the same or unrelated/different.

Nonparametric

"No parameters": model structure is not specified a priori, but is instead determined from the data.
The data are parameterised by the analysis.
AKA: distribution-free

Non-parametric Statistics

Assumptions of non-parametric statistics:

1. Observations are independent

Non-parametric Statistics?

Non-parametric statistics do not assume any underlying distribution.
Estimating or modelling this distribution from the data reduces their power to detect effects.
So never use them unless you have to.

Why use a Non-parametric Statistic?

Very small samples (<20 replicates):
high probability of violating the assumption of normality,
which leads to spurious Type-I (false alarm) errors.

Why use a Non-parametric Statistic?

Outliers more often lead to spurious Type-I (false alarm) errors in parametric statistics.
Non-parametric statistics reduce data to an ordinal rank, which reduces the impact or leverage of outliers (see the R sketch below).
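A minimal R sketch of this point (the data are made up purely for illustration): converting raw scores to ranks caps how far an outlier can sit from the rest of the sample, so a rank-based correlation is less distorted by it.

# Hypothetical data: y tracks x, but one extreme outlier is added
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2, 3, 3, 5, 6, 6, 8, 9, 9, 100)      # last value is an outlier

rank(y)                          # the outlier becomes just "10th of 10"
cor(x, y)                        # Pearson's r, pulled around by the outlier
cor(x, y, method = "spearman")   # Spearman's rho, computed on ranks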

Error

Type-I error: false alarm for a bogus effect
  (reject the null hypothesis when it is really true)

Type-II error: miss a real effect
  (fail to reject the null hypothesis when it is really false)

Type-III error: :-)
  (lazy, incompetent, or willful ignorance of the truth)

Power

1 − β (where β is the Type-II error rate): the probability of detecting a real effect.

Non-parametric Choices

Data type?
  discrete → χ²
  continuous → Question?
    Association → Spearman's Rank
    Different central value → Number of groups?
      two groups → Mann-Whitney U / Wilcoxon Rank Sum
      more than 2 → Kruskal-Wallis test
    Difference in σ² → Brown-Forsythe

Non-parametric Choices (with parametric equivalents)

Data type?
  discrete → χ² (no parametric alternative)
  continuous → Question?
    Association → Spearman's Rank (like Pearson's r)
    Different central value → Number of groups?
      two groups → Mann-Whitney U / Wilcoxon Rank Sum (like Student's t)
      more than 2 → Kruskal-Wallis test (like ANOVA)
    Difference in σ² → Brown-Forsythe (like the F-test)

Chi-Squared (χ²)

χ² tests the null hypothesis that observed events occur with an expected frequency.
  e.g. H0: this six-sided die is fair
  Expect all 6 outcomes to occur equally often.
In large samples the test statistic is distributed as χ².

Assumptions:
  Observations are independent
  Outcomes are mutually exclusive
  Sample is not small
    Small samples require an exact test, i.e., the binomial test (see the R sketch below).
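A minimal sketch of the exact alternative for small samples, using R's built-in binom.test (the counts here are hypothetical): with only 10 tosses the χ² approximation is unreliable, so the proportion of heads is tested exactly.

# Hypothetical small sample: 8 heads out of 10 tosses of a supposedly fair coin
binom.test(x = 8, n = 10, p = 0.5)   # exact binomial test of H0: P(heads) = 0.5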

Chi-Squared (χ²) formula

χ² = the sum, over all categories, of each squared difference between the observed and expected frequencies, divided by the expected frequency:

χ² = Σ (O − E)² / E

(A worked R version appears below.)
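A small R sketch of the formula itself, with made-up counts for a six-sided die (150 rolls, so 25 expected per face), checked against R's built-in chisq.test:

# Hypothetical observed counts for faces 1..6 of a die rolled 150 times
obs <- c(22, 24, 29, 21, 26, 28)
exp <- rep(150 / 6, 6)          # expected count per face if the die is fair

sum((obs - exp)^2 / exp)        # chi-squared statistic, computed by hand
chisq.test(obs)                 # same statistic, with df and p-value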

χ² and contingency tables

χ² essentially tests whether each cell in a contingency table has its expected value.
In a 2-way table, the expected value of a cell is its row total × column total ÷ grand total, i.e. the value implied by independence of the two variables (see the R sketch below).
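A minimal sketch of a contingency-table test in R (the 2 × 2 counts are invented purely for illustration):

# Hypothetical 2 x 2 table: treatment vs control, improved vs not improved
tab <- matrix(c(30, 10,
                20, 20),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("treatment", "control"),
                              outcome = c("improved", "not improved")))

chisq.test(tab)           # tests independence of group and outcome
chisq.test(tab)$expected  # expected cell counts (row total x column total / grand total)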

Example: coin toss

Random sample of 100 tosses of a coin believed to be fair.
We observed 45 heads and 55 tails.
Is the coin fair?

Coin toss

If H0 is true, our test statistic is drawn from a χ² distribution with df = 1.

χ² = (45 − 50)²/50 + (55 − 50)²/50 = 0.5 + 0.5 = 1

χ²(1) = 1, p > 0.3

Coin toss χ² in R

chisq.test(c(45, 55), p = c(.5, .5))

Chi-squared test for given probabilities
X-squared = 1, df = 1, p-value = 0.3173

Spearman Rank test (ρ, rho)

Named after Charles Spearman.
A non-parametric measure of correlation.
Assesses how well an arbitrary monotonic function describes the relationship between two variables:
  Does not require the relationship to be linear
  Does not require interval measurement

Spearman Rank test (ρ, rho)

Mathematically, it is simply a Pearson's r computed on ranked data. With no tied ranks it reduces to:

ρ = 1 − 6 Σ d² / (n(n² − 1))

  d = difference in rank of a given pair
  n = number of pairs

Alternative test: Kendall's tau (τ). See the R sketch below.
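A minimal R sketch (hypothetical paired data): both tests are run through cor.test, which handles the ranking internally.

# Hypothetical paired observations
x <- c(3, 5, 1, 6, 7, 2, 4)
y <- c(4, 6, 2, 7, 8, 1, 3)

cor.test(x, y, method = "spearman")   # Spearman's rho
cor.test(x, y, method = "kendall")    # Kendall's tau, the alternative mentioned above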

Mann-Whitney U

AKA: Wilcoxon rank-sum test (Mann & Whitney, 1947; Wilcoxon, 1945).
A non-parametric test for a difference in the medians of two independent samples.

Assumptions:
  Samples are independent
  Observations can be ranked (ordinal or better)

Mann-Whitney U

U tests the difference in the medians of two independent samples:

U = n1·n2 + n1(n1 + 1)/2 − R

  n1 = number of observations in sample 1
  n2 = number of observations in sample 2
  R = sum of the ranks of sample 1 (by convention, the lower-ranked sample)

(A short R sketch of this calculation follows.)
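A minimal R sketch of this calculation on made-up data, checked against R's built-in wilcox.test (whose W statistic uses a complementary rank-sum convention):

# Hypothetical independent samples
a <- c(12, 15, 14, 11, 18)
b <- c(20, 23, 19, 24, 25)

r  <- rank(c(a, b))                  # ranks of the pooled data
R1 <- sum(r[1:length(a)])            # rank sum of sample a (the lower-ranked sample here)
U  <- length(a) * length(b) + length(a) * (length(a) + 1) / 2 - R1

U                                    # U as defined above
wilcox.test(a, b)                    # R's W equals n1*n2 - U under these conventions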

Mann-Whitney U or t-test?

Should you use it over the t-test?

Yes, if you have a very small sample (<20), where central limit assumptions are not met:
  it is less prone to Type-I errors (spurious significance) due to outliers.
Possibly, if your data are inherently ordinal.
Otherwise, probably not.

But it does not in fact handle comparisons of samples whose variances differ very well.
  (Use an unequal-variance t-test on the rank data instead; see the sketch below.)
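A minimal sketch of that suggestion (hypothetical data): rank-transform the pooled observations, then run Welch's unequal-variance t-test on the ranks.

# Hypothetical samples with very different spreads
g1 <- c(10, 11, 12, 13, 14, 60)
g2 <- c(30, 31, 29, 32, 28, 33)

r <- rank(c(g1, g2))                           # pooled ranks
t.test(r[1:length(g1)], r[-(1:length(g1))])    # Welch t-test (var.equal = FALSE is the default)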

Aesop: Mann-Whitney U Example

Suppose that Aesop is dissatisfied with his classic experiment, in which one tortoise was found to beat one hare in a race.
He decides to carry out a significance test to discover whether the results could be extended to tortoises and hares in general.

Aesop 2: Mann-Whitney U

He collects a sample of 6 tortoises and 6 hares, and makes them all run his race. The order in which they reach the finishing post (their rank order) is as follows:

tort = c(1, 7, 8, 9, 10, 11)
hare = c(2, 3, 4, 5, 6, 12)

The original tortoise still goes at warp speed, and the original hare is still lazy, but the others run truer to stereotype.

Aesop 3: Mann-Whitney U

wilcox.test(tort, hare)
Wilcoxon W = 25, p-value = 0.31

Tortoises are not faster (but neither are hares).

tort = c(1, 7, 8, 9, 10, 11)   (n2 = 6)
hare = c(2, 3, 4, 5, 6, 12)    (n1 = 6, R1 = 32)

Aesop 4: Mann-Whitney U

Wilcoxon W = 25, p-value = 0.31
Tortoises are not faster (but neither are hares).

Welch Two Sample t-test
t = 1.1355, df = 10, p-value = 0.28
Alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval: −2.25 to 6.91
Sample estimates: mean of x = 7.6, mean of y = 5.3

Power comparison with continuous normal data

tort = c(1, 74, 79, 81, 100, 121)
hare = c(4, 9, 16, 17, 18, 144)

Wilcoxon: W = 25, p = 0.31

t.test(tort, hare, var.equal = TRUE)
t(10) = 1.5, p = 0.16

Wilcoxon signed-rank test (related samples)

Same idea as the Mann-Whitney U, generalised to matched samples.
Equivalent to the non-independent (paired) samples t-test; see the R sketch below.
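A minimal R sketch (hypothetical before/after scores for the same six subjects):

# Hypothetical matched-pairs data: the same 6 subjects measured twice
before <- c(10, 12, 9, 14, 11, 13)
after  <- c(11, 14, 12, 18, 16, 19)

wilcox.test(before, after, paired = TRUE)   # Wilcoxon signed-rank test
t.test(before, after, paired = TRUE)        # the parametric equivalent, for comparison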

Kruskal-Wallis

Non-parametric one-way analysis of variance by ranks (named after William Kruskal and W. Allen Wallis).
Tests equality of medians across groups.
It is an extension of the Mann-Whitney U test to 3 or more groups.
Does not assume a normal population.
Assumes population variances among the groups are equal.
A short R sketch follows.
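A minimal R sketch (three hypothetical groups):

# Hypothetical scores for three independent groups
g1 <- c(27, 30, 25, 29, 31)
g2 <- c(20, 22, 19, 24, 21)
g3 <- c(33, 35, 30, 36, 34)

kruskal.test(list(g1, g2, g3))   # H statistic, df = number of groups - 1, and p-value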
