Why used?
Chi-square analysis is primarily used to
deal with categorical (frequency) data
We measure the goodness of fit between
our observed outcome and the expected
outcome for some variable
With two variables, we test in particular
whether they are independent of one
another using the same basic approach.
One-dimensional
Suppose we want to know how people in a
particular area will vote in general and go
around asking them.
Republican   Democrat   Other
    20          30        10
$\chi^2 = \sum \frac{(O - E)^2}{E}$
            Republican   Democrat   Other
Observed        20          30        10
Expected        20          20        20
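Plug in to formula
$\chi^2 = \frac{(20 - 20)^2}{20} + \frac{(30 - 20)^2}{20} + \frac{(10 - 20)^2}{20} = 0 + 5 + 5 = 10$
$\chi^2(2) = 10$
Critical value: $\chi^2_{.05}(2) = 5.99$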
Reject H0
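The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `chi_square` and the variable names are made up for the example, and the critical value 5.99 is simply copied from the slide rather than computed.

```python
# Observed counts from the poll and the expected counts used on the slide
# (60 respondents split evenly across the three options).
observed = {"Republican": 20, "Democrat": 30, "Other": 10}
expected = {"Republican": 20, "Democrat": 20, "Other": 20}


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories (hypothetical helper)."""
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)


statistic = chi_square(observed, expected)
df = len(observed) - 1      # number of categories minus one
critical_value = 5.99       # alpha = .05, df = 2, taken from the slide

reject_h0 = statistic > critical_value
print(f"chi2({df}) = {statistic:.2f}, reject H0: {reject_h0}")
```

Swapping the Republican and Democrat counts gives the same statistic, since each term depends only on the squared deviation.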
The district will probably vote Democratic
However
Conclusion
Note that all we can really conclude is that our
data differ from the outcome expected
under a given situation (here, equal preference for each option)
Although it would appear that the district will vote
Democratic, really we can only conclude they were not
responding by chance
Regardless of the position of the frequencies, we'd
have come up with the same result
In other words, it is a non-directional test regardless
of the prediction
More complex
What do stats kids do with their free time?
          TV   Nap   Worry   Stare at Ceiling
Males     30   40     20          10
Females   20   30     40          10
          TV   Nap   Worry   Stare at Ceiling   Total
Males     30   40     20          10             100
Females   20   30     40          10             100
Total     50   70     60          20             200
Expected = (Ri*Cj)/N
Example for males TV: (100*50)/200 = 25
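The expected-frequency rule can be applied to every cell at once. The sketch below is only an illustration: `expected_counts` is a hypothetical helper name, and the counts are the male and female free-time frequencies from the table.

```python
# Observed counts (columns: TV, Nap, Worry, Stare at Ceiling).
observed = [
    [30, 40, 20, 10],   # Males
    [20, 30, 40, 10],   # Females
]


def expected_counts(table):
    """Expected count per cell: (row total * column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
print(expected[0])   # expected counts for Males
print(expected[1])   # expected counts for Females
```

Because both row totals are 100, the two rows of expected counts come out identical.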
              TV        Nap       Worry     Stare at Ceiling   Total
Males (E)     30 (25)   40 (35)   20 (30)   10 (10)            100
Females (E)   20 (25)   30 (35)   40 (30)   10 (10)            100
Total         50        70        60        20                 200
df = (R-1)(C-1)
R = number of rows
C = number of columns
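Interpretation
$\chi^2(3) = 10.10$
Critical value: $\chi^2_{.05}(3) = 7.82$
Because 10.10 exceeds 7.82, we reject H0: free-time activity and gender are not independent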
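The full two-variable calculation, including df = (R-1)(C-1), can be sketched as follows. The function name `chi_square_independence` is invented for this example, and the critical value 7.82 is copied from the slide rather than derived.

```python
# Observed counts (columns: TV, Nap, Worry, Stare at Ceiling).
observed = [
    [30, 40, 20, 10],   # Males
    [20, 30, 40, 10],   # Females
]


def chi_square_independence(table):
    """Return (statistic, df) for a test of independence (hypothetical helper)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected = (Ri * Cj) / N
            statistic += (o - e) ** 2 / e

    df = (len(table) - 1) * (len(table[0]) - 1)     # (R - 1)(C - 1)
    return statistic, df


statistic, df = chi_square_independence(observed)
critical_value = 7.82   # alpha = .05, df = 3, taken from the slide
print(f"chi2({df}) = {statistic:.2f}, reject H0: {statistic > critical_value}")
```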
Other
An important point about the non-directional
nature of the test: the chi-square test by
itself cannot speak to specific hypotheses
about the way the results would come out
Not useful for ordinal data because of this
Assumptions
Normality
Rule of thumb is that we need an expected frequency
of at least 5 in each cell
Inclusion of non-occurrences
Must include all responses, not just the positive ones
Independence
Not that the variables are independent or related (that's what the
test can be used for), but rather, as with our t-tests, that the
observations (data points) don't have any bearing on one
another.
To help with the last two, make sure that your N equals
the total number of people who responded
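Two of these checks are easy to automate. The sketch below is an assumption-laden illustration: the helper names `small_expected_cells` and `counts_match_respondents` are hypothetical, the threshold of 5 is the rule of thumb stated above, and `n_respondents = 200` is the table total from the free-time example.

```python
def small_expected_cells(table, minimum=5):
    """List (row, column, expected) for cells whose expected count is below `minimum`."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n
            if e < minimum:
                flagged.append((i, j, e))
    return flagged


def counts_match_respondents(table, n_respondents):
    """True when the table total equals the number of people who responded."""
    return sum(sum(row) for row in table) == n_respondents


observed = [
    [30, 40, 20, 10],   # Males
    [20, 30, 40, 10],   # Females
]
n_respondents = 200     # assumed: each respondent counted exactly once

print(small_expected_cells(observed))
print(counts_match_respondents(observed, n_respondents))
```

A table total larger than the number of respondents suggests someone was counted more than once. A smaller total suggests some responses (the non-occurrences) were left out.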
Measures of Association
Contingency coefficient
Phi
Cramér's Phi
Odds Ratios
Kappa
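As a rough illustration of two of the measures listed above, the sketch below computes the contingency coefficient and Cramér's phi (often written as Cramér's V) for the free-time table. The formulas are the standard textbook ones, C = sqrt(chi2 / (chi2 + N)) and V = sqrt(chi2 / (N * min(R-1, C-1))). They are not given on the slides, and the function names are hypothetical.

```python
import math

# Observed counts (columns: TV, Nap, Worry, Stare at Ceiling).
observed = [
    [30, 40, 20, 10],   # Males
    [20, 30, 40, 10],   # Females
]


def chi_square_statistic(table):
    """Chi-square statistic for a contingency table, plus N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = sum(
        (o - row_totals[i] * col_totals[j] / n) ** 2 / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, o in enumerate(row)
    )
    return statistic, n


def contingency_coefficient(table):
    """C = sqrt(chi2 / (chi2 + N))."""
    chi2, n = chi_square_statistic(table)
    return math.sqrt(chi2 / (chi2 + n))


def cramers_phi(table):
    """V = sqrt(chi2 / (N * min(R - 1, C - 1)))."""
    chi2, n = chi_square_statistic(table)
    k = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi2 / (n * k))


print(f"Contingency coefficient: {contingency_coefficient(observed):.3f}")
print(f"Cramér's phi: {cramers_phi(observed):.3f}")
```

Unlike the chi-square statistic itself, these values do not grow simply because N is larger, so they are easier to compare across tables.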