
Chi-square Basics

The Chi-square distribution


Positively skewed but becomes
symmetrical with increasing degrees of
freedom
Mean = k where k = degrees of freedom
Variance = 2k
Assuming a normally distributed dataset and sampling a single $z^2$ value at a time:
$\chi^2(1) = z^2$
If more than one: $\chi^2(n) = \sum_{i=1}^{n} z_i^2$
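A minimal simulation sketch of the definition above: each chi-square value is built by summing k squared standard normal draws, and the sample mean and variance are compared with k and 2k. The seed, the choice k = 3, the number of draws, and the helper name `sample_chi_square` are all illustrative assumptions, not part of the slides.

```python
import random
import statistics

def sample_chi_square(k, rng):
    """Draw one chi-square(k) value as the sum of k squared standard normal z scores."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(0)  # fixed seed, chosen arbitrarily so the run is repeatable
k = 3                   # degrees of freedom (illustrative choice)
draws = [sample_chi_square(k, rng) for _ in range(20000)]

sample_mean = statistics.fmean(draws)
sample_variance = statistics.variance(draws)

# Both estimates should land near the theoretical values k and 2k.
print(f"mean     ~ {sample_mean:.2f} (theory: {k})")
print(f"variance ~ {sample_variance:.2f} (theory: {2 * k})")
```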

Why used?
Chi-square analysis is primarily used to
deal with categorical (frequency) data
We measure the goodness of fit between
our observed outcome and the expected
outcome for some variable
With two variables, we test in particular
whether they are independent of one
another using the same basic approach.

One-dimensional
Suppose we want to know how people in a
particular area will vote in general and go
around asking them.
             Republican   Democrat   Other
Count            20          30        10

How will we go about seeing what's really going on?

Hypothesis: Dems should win district


Solution: chi-square analysis to determine
if our outcome is different from what would
be expected if there was no preference
$\chi^2 = \sum \frac{(O - E)^2}{E}$

            Republican   Democrat   Other
Observed        20          30        10
Expected        20          20        20

Plug in to formula
$\chi^2 = \frac{(20-20)^2}{20} + \frac{(30-20)^2}{20} + \frac{(10-20)^2}{20} = 10$

$\chi^2(2) = 10$
Critical value: $\chi^2_{.05}(2) = 5.99$
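The calculation above can be sketched in a few lines of Python. The function name `chi_square_gof` is a hypothetical helper, and the critical value 5.99 is copied from the slide rather than computed.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [20, 30, 10]                  # Republican, Democrat, Other
n = sum(observed)                        # 60 respondents in total
expected = [n / len(observed)] * len(observed)  # no preference: 20 per category

statistic = chi_square_gof(observed, expected)
df = len(observed) - 1                   # categories minus one
critical_05 = 5.99                       # chi-square critical value at .05, df = 2 (from the slide)
reject_h0 = statistic > critical_05

print(f"chi-square({df}) = {statistic:.2f}, reject H0: {reject_h0}")
```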

Reject H0
The district will probably vote democratic
However

Conclusion
Note that all we can really conclude is that our data differ from the outcome expected under the no-preference hypothesis
Although it would appear that the district will vote Democratic, we can only conclude that people were not responding by chance
Regardless of which category held which frequency, we'd have come up with the same result
In other words, it is a non-directional test regardless of the prediction
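A small sketch of the non-directional point: shuffling the same three frequencies (20, 30, 10) across the categories always yields the same statistic when every expected count is 20. The helper name `chi_square_gof` and the rounding step are illustrative assumptions.

```python
from itertools import permutations

def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

expected = [20, 20, 20]  # no-preference expectation for 60 respondents

# Compute the statistic for every assignment of the three counts to the three parties.
statistics_seen = {
    round(chi_square_gof(assignment, expected), 10)
    for assignment in permutations([20, 30, 10])
}

# A single distinct value means the test cannot tell which party was ahead.
print(statistics_seen)
```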

More complex
What do stats kids do with their free time?

           TV    Nap   Worry   Stare at Ceiling
Males      30    40     20          10
Females    20    30     40          10

Is there a relationship between gender and what the stats kids do with their free time?
           TV    Nap   Worry   Stare at Ceiling   Total
Males      30    40     20          10             100
Females    20    30     40          10             100
Total      50    70     60          20             200

Expected = (Ri*Cj)/N
Example for males TV: (100*50)/200 = 25

                TV        Nap       Worry     Stare at Ceiling   Total
Males (E)     30 (25)   40 (35)    20 (30)       10 (10)          100
Females (E)   20 (25)   30 (35)    40 (30)       10 (10)          100
Total           50        70         60            20             200
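The expected counts in the table above follow from the row and column totals. A minimal sketch, assuming the observed table is stored as a list of rows; the function name `expected_counts` is hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

observed = [
    [30, 40, 20, 10],  # Males:   TV, Nap, Worry, Stare at Ceiling
    [20, 30, 40, 10],  # Females: TV, Nap, Worry, Stare at Ceiling
]

expected = expected_counts(observed)
for label, row in zip(["Males", "Females"], expected):
    print(label, row)
```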

df = (R-1)(C-1)
R = number of rows
C = number of columns

Interpretation
$\chi^2(3) = 10.10$
Critical value: $\chi^2_{.05}(3) = 7.82$

Reject H0, there is some relationship between gender and how stats students spend their free time
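The statistic and degrees of freedom reported here can be reproduced with a short sketch. The function name `chi_square_independence` is a hypothetical helper, and the critical value 7.82 is taken from the slide rather than computed.

```python
def chi_square_independence(table):
    """Return (statistic, df) for a test of independence on a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            expected_count = row_totals[i] * col_totals[j] / n
            statistic += (observed_count - expected_count) ** 2 / expected_count
    df = (len(table) - 1) * (len(table[0]) - 1)  # (R - 1)(C - 1)
    return statistic, df

observed = [
    [30, 40, 20, 10],  # Males
    [20, 30, 40, 10],  # Females
]

statistic, df = chi_square_independence(observed)
critical_05 = 7.82  # chi-square critical value at .05, df = 3 (from the slide)
reject_h0 = statistic > critical_05
print(f"chi-square({df}) = {statistic:.2f}, reject H0: {reject_h0}")
```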

Other
An important point about the non-directional nature of the test: the chi-square test by
itself cannot speak to specific hypotheses
about the way the results would come out
Because of this, it is not well suited to ordinal data, since it ignores the ordering of the categories

Assumptions
Normality
Rule of thumb: every expected frequency should be at least 5

Inclusion of non-occurrences
Must include all responses, not just the positive ones

Independence
Not that the variables are independent or related (that's what the
test can be used for), but rather, as with our t-tests, that the
observations (data points) don't have any bearing on one
another.

To help with the last two, make sure that your N equals
the total number of people who responded
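Two of these checks are easy to automate. The sketch below flags any cell whose expected frequency falls under the rule-of-thumb minimum of 5 and confirms that the table total equals the number of respondents. The helper names and the `respondents` variable are illustrative assumptions; the data are the free-time table from earlier.

```python
def expected_counts(table):
    """Expected cell counts under independence: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def cells_below_minimum(expected, minimum=5):
    """List (row, column, expected) for every cell under the rule-of-thumb minimum."""
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < minimum
    ]

observed = [
    [30, 40, 20, 10],  # Males
    [20, 30, 40, 10],  # Females
]
respondents = 200  # number of people who responded (the table's grand total)

small_cells = cells_below_minimum(expected_counts(observed))
n_matches = sum(sum(row) for row in observed) == respondents

print("cells with expected < 5:", small_cells)
print("N equals number of respondents:", n_matches)
```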

Measures of Association

Contingency coefficient
Phi
Cramér's phi
Odds Ratios
Kappa

These were discussed in 5700
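As a brief reminder of how two of these relate to the chi-square statistic, here is a sketch using the standard textbook formulas (not given in these slides, so treat them as an assumption): the contingency coefficient C = sqrt(chi2 / (chi2 + N)) and Cramér's phi (also called Cramér's V) = sqrt(chi2 / (N * min(R - 1, C - 1))). The inputs are the values from the free-time example.

```python
import math

def contingency_coefficient(chi_square, n):
    """Pearson's contingency coefficient (standard formula, assumed here)."""
    return math.sqrt(chi_square / (chi_square + n))

def cramers_phi(chi_square, n, n_rows, n_cols):
    """Cramér's phi / V (standard formula, assumed here)."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))

chi_square = 10.10   # statistic reported for the gender by free-time table
n = 200              # total number of respondents
n_rows, n_cols = 2, 4

c_coef = contingency_coefficient(chi_square, n)
v_coef = cramers_phi(chi_square, n, n_rows, n_cols)
print(f"contingency coefficient = {c_coef:.3f}")
print(f"Cramér's phi            = {v_coef:.3f}")
```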
