Definitions
Parametric Tests
Statistical tests that involve assumptions about, or estimation of, population parameters (what we've been learning).
Nonparametric Tests
Also known as distribution-free tests.
Statistical tests that do not rely on assumptions about distributions or on parameter estimates (what we're going to be learning).
More Definitions
The chi-square (χ²) test is a nonparametric test that is used to test hypotheses about distributions of frequencies across categories of data. This is different from what we've been learning:
Then: averages, scales
Now: frequencies, categories
[Figure: cross of two Pp parents (Pp × Pp), with offspring genotypes PP, Pp, Pp, pp]
Flowers
I grow 120 of these plants from seed. The resulting colors of flowers are as follows:
Observed:
Pink 75
White 20
Red 25
So, if I planted 120 seeds, I'd expect this set of colored flowers.
Expected:
Pink 60
White 30
Red 30
If my hypothesis is true (50%, 25%, 25%), how likely is it that I could get this difference between my actual distribution and my expected distribution of colored flowers?
The χ² test is used to determine whether this probability is < α, in which case the hypothesis is rejected, or > α, in which case the hypothesis is not rejected.
Expected frequencies (N × hypothesized proportion):
Pink 120(.50) = 60
White 120(.25) = 30
Red 120(.25) = 30
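The expected counts are just the sample size multiplied by each hypothesized proportion. A minimal Python sketch of that step follows; the function name `expected_counts` and the dictionary layout are illustrative choices, not something taken from the slides.

```python
def expected_counts(n, proportions):
    """Return expected frequencies E = n * p for each category."""
    total = sum(proportions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return {category: n * p for category, p in proportions.items()}


# Hypothesized color distribution from the slides: 50% pink, 25% white, 25% red
hypothesis = {"Pink": 0.50, "White": 0.25, "Red": 0.25}
expected = expected_counts(120, hypothesis)

for color, e in expected.items():
    print(color, e)
```

Because the expected counts are built from proportions that sum to 1, they always add up to the sample size, here 120.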
$$\chi^2_o = \sum \frac{(O - E)^2}{E}$$
As the differences between the Os and Es get bigger, χ² gets bigger. Since we are only interested in rejecting H0 if the differences between the obtained frequencies and the expected frequencies are greater than expected by chance, the rejection region is in the upper tail.
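To see why only the upper tail matters, it may help to compute the statistic for data that fit the hypothesis progressively worse. The sketch below assumes a helper named `chi_square` (my own name), and the middle data set `[63, 28, 29]` is made up purely for illustration; only `[75, 20, 25]` comes from the flower example.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [60, 30, 30]

perfect_fit = chi_square([60, 30, 30], expected)  # no differences at all
small_gap = chi_square([63, 28, 29], expected)    # hypothetical, mild misfit
large_gap = chi_square([75, 20, 25], expected)    # the flower data

print(perfect_fit, round(small_gap, 2), round(large_gap, 2))
```

Since every term is a squared difference divided by a positive count, the statistic can never fall below 0, and any departure from the expected counts, in either direction, pushes it upward.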
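Finding χ²c
Table E.1 has the tabled values. Which df?
df = k - 1, where k is the number of categories. Why?
If you have 3 categories, only the counts in 2 of them are free to vary: once those two counts and the total are known, the third is fixed.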
Critical values of χ² by df and α:

df     .050     .025     .010     .005
 1    3.841    5.024    6.635    7.879
 2    5.991    7.378    9.210   10.597
 3    7.815    9.348   11.345   12.838
 4    9.488   11.143   13.277   14.860
 5   11.070   12.832   15.086   16.750
 6   12.592   14.449   16.812   18.548
 7   14.067   16.013   18.475   20.278
 8   15.507   17.535   20.090   21.955
 9   16.919   19.023   21.666   23.589
10   18.307   20.483   23.209   25.188
11   19.675   21.920   24.725   26.757
12   21.026   23.337   26.217   28.300
13   22.362   24.736   27.688   29.819
14   23.685   26.119   29.141   31.319
15   24.996   27.488   30.578   32.801
16   26.296   28.845   32.000   34.267
17   27.587   30.191   33.409   35.718
18   28.869   31.526   34.805   37.156
19   30.144   32.852   36.191   38.582
20   31.410   34.170   37.566   39.997
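Looking up a critical value is a two-step process: turn the number of categories into df, then read the table at the chosen α. The sketch below copies only the first three rows of the table to stay short; `CRITICAL_VALUES` and `critical_value` are hypothetical names, and the default α of .05 is an assumption.

```python
# Critical values copied from the table above (outer key: df, inner key: alpha).
# Only df = 1 to 3 are included to keep the sketch short.
CRITICAL_VALUES = {
    1: {0.050: 3.841, 0.025: 5.024, 0.010: 6.635, 0.005: 7.879},
    2: {0.050: 5.991, 0.025: 7.378, 0.010: 9.210, 0.005: 10.597},
    3: {0.050: 7.815, 0.025: 9.348, 0.010: 11.345, 0.005: 12.838},
}


def critical_value(k, alpha=0.05):
    """Critical value for a goodness-of-fit test with k categories."""
    df = k - 1  # the last count is fixed once the other k - 1 and the total are known
    return CRITICAL_VALUES[df][alpha]


# Three flower colors give df = 2
flower_critical = critical_value(3)
print(flower_critical)
```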
Calculating χ²o

                  Pink    White    Red     Sum
Observed (O)       75      20      25      120
Expected (E)       60      30      30      120
O - E              15     -10      -5        0
(O - E)²          225     100      25
(O - E)² / E     3.75    3.33     .83

Σ(O - E) = 0, always.

Components of χ²
χ²o = 3.75 + 3.33 + .83 = 7.91
Interpretation
Since we reject H0, the geneticist's hypothesis does not fit the data. The population distribution across the three categories is probably different from .50 pink, .25 white, .25 red.
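The whole flower example can be traced end to end in a few lines. This is a sketch under one assumption the slides do not state for this example: α = .05, which with df = 2 gives the tabled critical value 5.991. Variable names are illustrative.

```python
observed = {"Pink": 75, "White": 20, "Red": 25}
proportions = {"Pink": 0.50, "White": 0.25, "Red": 0.25}
n = sum(observed.values())

# One (O - E)^2 / E component per color
components = {}
for color, o in observed.items():
    e = n * proportions[color]
    components[color] = (o - e) ** 2 / e

chi_sq_obs = sum(components.values())
df = len(observed) - 1
chi_sq_crit = 5.991  # tabled value for df = 2; alpha = .05 is an assumption
reject_h0 = chi_sq_obs > chi_sq_crit

for color, value in components.items():
    print(f"{color:<6}{value:6.2f}")
print(f"chi-square = {chi_sq_obs:.2f}, df = {df}, reject H0: {reject_h0}")
```

Summing the unrounded components gives about 7.92, slightly above the 7.91 obtained by adding the rounded values 3.75, 3.33, and .83. Either figure exceeds 5.991, which is consistent with the decision to reject H0.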
Breakfast Answers
2) Use α = .05
3) df = 1; χ² distribution with 1 df
4) χ²c for α = .05, df = 1, is 3.84. Decision rule: reject H0 if χ²o > 3.84
5) Calculations: E(Post's) = E(Kellogg's) = 100(.5) = 50

Cereal        O      E     O - E   (O - E)²   (O - E)² / E
Post's        57     50      7        49          0.98
Kellogg's     43     50     -7        49          0.98
Sum          100    100      0                    1.96 = χ²o
Since χ²o < χ²c (1.96 < 3.84), we retain H0. The manufacturers cannot claim that more people prefer Post's.
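The same decision can be reached by comparing a probability with α instead of comparing the statistic with a critical value. The sketch below relies on a fact not stated in the slides: with 1 df, a χ² variable is a squared standard normal, so its upper-tail probability can be computed with `math.erfc` from the Python standard library. This shortcut only holds for df = 1.

```python
import math

observed = [57, 43]  # Post's, Kellogg's
expected = [50, 50]  # 100 * .5 each under H0

chi_sq_obs = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With 1 df: P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi_sq_obs / 2))

alpha = 0.05
reject_h0 = p_value < alpha
print(f"chi-square = {chi_sq_obs:.2f}, p = {p_value:.3f}, reject H0: {reject_h0}")
```

The probability comes out well above .05, which agrees with retaining H0 from the critical-value comparison.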
Same test.
Contingency Tables
          Left   Right
Early      80     45
Late       55     50
Stating H0 & H1
When two or more groups are being compared, H0 states that the population distributions across all categories are the same. H1 states that the population distributions differ.
H0: Early and late contact mothers do not differ in how they hold their neonates.
H1: Early and late contact mothers hold their neonates differently.
OR
H0: Group membership and distribution across categories are unrelated.
H1: Group membership and distribution across categories are related.
OR
H0: Time of first contact and how neonates are held are not related.
H1: Time of first contact and how neonates are held are related.
OR
H0: Time of first contact and how neonates are held are independent.
H1: Time of first contact and how neonates are held are dependent / correlated / related.
$$\chi^2_o = \sum \frac{(O - E)^2}{E}$$
Two differences:
1) Calculation of expected frequencies
2) Calculation of df
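Expected frequencies = N × P, where P is the probability of landing in a cell if the two variables are independent:
$$E = N \cdot \frac{\text{row sum}}{N} \cdot \frac{\text{column sum}}{N} = \frac{(\text{row sum})(\text{column sum})}{N}$$
$$E(\text{early}, \text{right}) = \frac{(125)(95)}{230} = 51.6$$
$$E(\text{late}, \text{right}) = \frac{(105)(95)}{230} = 43.4$$

Expected Frequencies
               Left    Right    Row Sums
Early          73.4    51.6       125
Late           61.6    43.4       105
Column Sums    135      95      N = 230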
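The expected frequencies can be generated for every cell at once from the row and column sums. A minimal sketch, assuming the observed counts are stored as a list of rows (early, late) with columns (left, right); `expected_table` is a hypothetical helper name.

```python
def expected_table(observed):
    """Expected counts under independence: row sum * column sum / N."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)
    return [[r * c / n for c in col_sums] for r in row_sums]


# Rows: early, late contact; columns: left, right hold
observed = [[80, 45],
            [55, 50]]
expected = expected_table(observed)

for label, row in zip(["Early", "Late"], expected):
    print(label, [round(e, 1) for e in row])
```

The expected table keeps the same row and column sums as the observed table, which is what "the marginals are fixed" means in practice.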
χ² Analyses
df in χ² Independence Tests
df = (# rows - 1)(# columns - 1). Why?
Remember that the marginals are fixed in χ² independence tests.
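Example: in a 2 × 3 table with row sums 100 and 100 and column sums 50, 60, and 90, only (2 - 1)(3 - 1) = 2 cells are free to vary.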
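One way to convince yourself of the df rule is to rebuild a whole table from its marginals plus the free cells. The sketch below does this for the 2 × 2 contact data, where a single cell is enough; `fill_2x2` and `df_independence` are illustrative names, not part of the slides.

```python
def fill_2x2(top_left, row_sums, col_sums):
    """Complete a 2 x 2 table from one cell plus the fixed marginals."""
    top_right = row_sums[0] - top_left
    bottom_left = col_sums[0] - top_left
    bottom_right = row_sums[1] - bottom_left
    return [[top_left, top_right], [bottom_left, bottom_right]]


def df_independence(n_rows, n_cols):
    """Degrees of freedom for a chi-square test of independence."""
    return (n_rows - 1) * (n_cols - 1)


# Early/left = 80 is the only cell supplied; the marginals do the rest
table = fill_2x2(80, row_sums=(125, 105), col_sums=(135, 95))
print(table, df_independence(2, 2))
```

Only one count had to be supplied, which matches df = (2 - 1)(2 - 1) = 1 for a 2 × 2 table.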
Expected Frequencies
          Left    Right
Early     73.4    51.6
Late      61.6    43.4

(Observed - Expected)²
          Left    Right
Early     43.56   43.56
Late      43.56   43.56

$$\chi^2_o = \sum \frac{(O - E)^2}{E}$$
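Putting the pieces together for the contact data, the statistic is the same sum as before, just taken over every cell of the table. This sketch assumes α = .05 for the critical value (the slides do not state α for this example) and uses unrounded expected frequencies.

```python
# Rows: early, late contact; columns: left, right hold
observed = [[80, 45],
            [55, 50]]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
n = sum(row_sums)

chi_sq_obs = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_sums[i] * col_sums[j] / n  # expected count under independence
        chi_sq_obs += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
chi_sq_crit = 3.841  # tabled value for df = 1; alpha = .05 is an assumption
reject_h0 = chi_sq_obs > chi_sq_crit

print(f"chi-square = {chi_sq_obs:.2f}, df = {df}, reject H0: {reject_h0}")
```

With unrounded expected frequencies the statistic is about 3.18; working from the rounded table values (43.56 divided by 73.4, 51.6, 61.6, and 43.4, each rounded to two decimals) gives 3.14. Both fall below 3.841, so under the assumed α of .05 the decision rule would retain H0.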
Overview of χ²
χ²: a nonparametric test applied to categorical, frequency data. The relevant probability distribution is the χ² distribution.
A family of distributions varying in df
Positively skewed, with minimum = 0
Skew decreases as df increases
Center of distribution and critical values increase as df increases
Example from the contact data: χ²o = 3.14
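These properties can be checked numerically. The sketch below uses two standard facts that the slides do not state: a χ² distribution has mean equal to its df and skewness equal to the square root of 8/df. The critical values are copied from the α = .05 column of the table for a handful of df.

```python
import math

# alpha = .05 critical values copied from the table for a few df
crit_05 = {1: 3.841, 2: 5.991, 5: 11.070, 10: 18.307, 20: 31.410}

means = []
skews = []
for df, crit in crit_05.items():
    mean = df                     # the mean of a chi-square distribution equals its df
    skewness = math.sqrt(8 / df)  # shrinks toward 0 as df grows
    means.append(mean)
    skews.append(skewness)
    print(f"df={df:>2}  mean={mean:>2}  skewness={skewness:.2f}  critical(.05)={crit}")
```

Reading down the output, the center and the critical value both climb with df while the skewness falls, which is the pattern described above.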
Rejection region is in the upper tail.
Decision rule: reject H0 if χ²o > χ²c
Two forms:
Goodness-of-fit
used to determine whether an obtained distribution fits a hypothetical one.
Independence
used to test whether two categorical variables are related
used to test whether two or more samples differ in their distribution across categories