Pearsonian Chi-Square Test

Chi-Square Goodness-of-Fit Test
(Pearsonian Chi-square Test)

Large Sample Test
Example
Two companies, A and B, have recently
conducted aggressive advertising campaigns to
maintain and possibly increase their respective
shares of the market for fabric softener. These
two companies enjoy a dominant position in the
market. Before the advertising campaigns began,
the market share of company A was 45%,
whereas company B had 40% of the market.
Other competitors accounted for the remaining
15%.
Example
To determine whether these market shares changed
after the advertising campaigns, a marketing analyst
solicited the preferences of a random sample of 200
customers of fabric softener. Of the 200 customers,
102 indicated a preference for company A's product,
82 preferred company B's fabric softener, and the
remaining 16 preferred the products of one of the
competitors. Can the analyst infer at the 5%
significance level that customer preferences have
changed from their levels before the advertising
campaigns were launched?
Example
We compare market shares before and after the advertising
campaigns to see if there is a difference (i.e. whether the
advertising changed the market shares). We hypothesize values
for the parameters equal to the market shares before the
campaigns. That is,
H0: p1 = .45, p2 = .40, p3 = .15
The alternative hypothesis is a denial of the null. That is,
H1: At least one pi is not equal to its specified value

Example
Test Statistic
If the null hypothesis is true, we would expect the number of customers
selecting brand A, brand B, and other to be 200 times the proportions
specified under the null hypothesis. That is,
e1 = 200(.45) = 90
e2 = 200(.40) = 80
e3 = 200(.15) = 30
In general, the expected frequency for each cell is given by
ei = npi
This expression is derived from the formula for the expected value of a
binomial random variable.
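The expected-frequency rule ei = npi can be checked with a few lines of Python. This is only a sketch: the dictionary and variable names are illustrative choices, not part of the original slides; the shares and counts are the ones given in the example.

```python
# Hypothesized (pre-campaign) market shares from H0 and the sample size.
null_shares = {"A": 0.45, "B": 0.40, "Other": 0.15}
n = 200

# Observed counts reported by the 200 surveyed customers.
observed = {"A": 102, "B": 82, "Other": 16}

# Expected frequency for each cell under H0: e_i = n * p_i.
expected = {brand: n * p for brand, p in null_shares.items()}

for brand in null_shares:
    deviation = observed[brand] - expected[brand]
    print(f"{brand:>5}: observed={observed[brand]:3d}  "
          f"expected={expected[brand]:5.1f}  deviation={deviation:+.1f}")
```

The deviations sum to zero because the observed and expected counts both add up to n.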
Example
If the expected frequencies and the observed frequencies are
quite different, we would conclude that the null hypothesis is
false, and we would reject it.
However, if the expected and observed frequencies are similar,
we would not reject the null hypothesis.
The test statistic measures the similarity of the expected and
observed frequencies.
Is Brand Loyalty Related to Buyer's Age?
The retail analyst for a marketing firm wants to know if
different customer groups prefer one brand over another.
She looks at data from 600 sales.
In particular, she feels that the brand Under Armour
might appeal more to younger customers.
The more established brands (Nike and Adidas) might be
capturing the older-customer market.
Is Brand Loyalty Related to Buyer's Age?
1. Determine whether the two classifications (age
and brand name) are dependent at the 5%
significance level.
2. Discuss how the findings from the test for
independence can be used.
Goodness of fit test for
multinomial population
Goodness of fit
1) The multinomial distribution can be thought of as a generalization
of the binomial distribution to more than two (say, k) categories.
2) Hypothesis to be tested:
H0: p1 = p10, p2 = p20, ... , pk = pk0
H1: at least one inequality, i.e. pi ≠ pi0 for some i
3) k = no. of groups or classes; n = total frequency (large);
α = level of significance
Goodness of fit
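4) Test statistic for testing H0:
$\chi^2_{obs} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{k-1}$ (approximately, under H0)
$O_i$ : Observed frequency
$E_i = n p_{i0}$ : Expected frequency under H0
Caution: Expected frequencies should be 5 or more for all categories
5) Reject H0 if $\chi^2_{obs} > \chi^2_{\alpha;\,k-1}$
Otherwise, do not reject H0.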
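Steps 4) and 5) can be sketched in Python as follows. The function name `chi_square_gof` and the constant name are hypothetical, introduced only for illustration. The data are the fabric-softener counts from the opening example, and the critical value 5.991 is the df = 2, 5% table value quoted elsewhere in this deck. The conclusion printed for the market-share data is computed here; the slides do not state it.

```python
def chi_square_gof(observed, null_probs):
    """Return (statistic, df, expected) for Pearson's goodness-of-fit test."""
    if len(observed) != len(null_probs):
        raise ValueError("observed and null_probs must have the same length")
    if abs(sum(null_probs) - 1.0) > 1e-9:
        raise ValueError("null probabilities must sum to 1")
    n = sum(observed)
    # Expected frequency under H0 for each category: E_i = n * p_i0.
    expected = [n * p for p in null_probs]
    # Large-sample caution: every expected count should be 5 or more.
    if min(expected) < 5:
        raise ValueError("an expected frequency is below 5")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df, expected


# Market-share example: company A, company B, other competitors.
stat, df, expected = chi_square_gof([102, 82, 16], [0.45, 0.40, 0.15])

CRITICAL_5PCT_DF2 = 5.991  # table value for alpha = .05, df = 2
print(f"chi-square = {stat:.3f}, df = {df}")
print("reject H0" if stat > CRITICAL_5PCT_DF2 else "do not reject H0")
```

Raising an error when an expected count falls below 5 is one possible design choice; a warning, or merging sparse categories, would also be reasonable.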
Restaurant Food Quality
Last year the management at a restaurant surveyed
its patrons to rate the quality of its food. The results
were as follows:
Excellent   15%
Good        30%
Fair        45%
Poor        10%
Based on this and other survey results, management
made changes to the menu.
BUSINESS STATISTICS | Jaggia, Kelly

This Year's Results
This year, the management surveyed 250 patrons,
asking the same questions about food quality. Here
are the results:
Excellent    46
Good         83
Fair        105
Poor         16
We want to know if the results agree with those from
last year, or if there has been a significant change.

Methodology
Compute an expected frequency for each
category and compare it to what we actually
observe.
Compute the difference between what was
observed and expected for each category.
If the results this year are consistent with last
year, these differences will be relatively small.
LO 12.1
The ei (Expected Frequencies)
We first compute the expected counts based
on the survey of 250 restaurant patrons.
If the survey is consistent with last year's
results, we expect e1 = p1(250) = 0.15(250) =
37.5 responses to be in the Excellent
category.
There actually were o1 = 46, a bit more than
expected.
Computing the Deviations
In the first category e1 = 37.5 and o1 = 46, so we get
(o1 - e1) = ___.
In the third category, which holds the Fair responses, e3 =
p3(250) = .45(250) = 112.5.
There are 105 of these responses in the survey, so
we compute (o3 - e3) = 105 - 112.5 = ___.
Standardizing the Deviations
We square each deviation and divide it by the corresponding
expected frequency: (oi - ei)^2 / ei.
The Chi-Square Test
$\chi^2_{df} = \sum_{i} \frac{(o_i - e_i)^2}{e_i}$
df = k-1, where k is the number of categories
oi = observed frequency for category i
ei = expected frequency for category i
The Critical Value (at α = .05)
With k = 4 categories, df = k - 1 = 3, and the critical value is 7.815.
The Restaurant Example
We reject H0 if the test statistic is greater than the critical value.
The Restaurant Example
Response    Percentage   Observed     Expected
Category    Last Year    This Year    Out of 250    ( oi - ei )   ( oi - ei )^2 / ei
                         ( oi )       ( ei )
Excellent      15%          46           37.5           8.5            1.927
Good           30%          83           75.0           8.0            0.853
Fair           45%         105          112.5          -7.5            0.500
Poor           10%          16           25.0          -9.0            3.240
TOTAL         100%         250          250.0           0.0            6.520
Since the computed test statistic of 6.520 is less than the
critical value of 7.815, we do not reject H0.
The changes did not produce a statistically significant
response at the 5% level.
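Written out term by term, the restaurant calculation looks like this (a worked restatement of the figures in the table, with df = 4 - 1 = 3):

```latex
\chi^2_{3} = \frac{(46-37.5)^2}{37.5} + \frac{(83-75)^2}{75}
           + \frac{(105-112.5)^2}{112.5} + \frac{(16-25)^2}{25}
           = 1.927 + 0.853 + 0.500 + 3.240 = 6.520 < 7.815
```

The Poor category contributes almost half of the total, even though its raw deviation (-9.0) is close in size to the others, because its expected frequency is the smallest.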
Chi-Square Test of Independence
The Chi-Square (χ²) Test of Independence is used
to determine whether two factors or traits
(qualitative characteristics) are related to one
another.
Introductory Case
Does the brand of compression garment
purchased depend on the customer's age?
                            Brand Name
Age Group             Under Armour   Nike   Adidas
Under 35 years             174        132      90
35 years and older          54         72      78
LO 12.2
Introductory Case: Notation
We use the notation oij to denote the observed
frequency in row i of column j.
Similarly, eij is the expected frequency in row i of
column j.
Under the independence assumption, the
expected frequency per cell is:
eij = (Row i total)(Column j total)/Sample Size
Introductory Case:
The Chi-Square Statistic
We apply the chi-square test statistic in a similar
manner as in the goodness-of-fit test. The
formula is as follows:
$\chi^2_{df} = \sum_{i} \sum_{j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$,
where df = (rows - 1)(columns - 1).
Introductory Case:
Computing Expected Frequencies
                            Brand Name               Row
Age Group          Under Armour   Nike   Adidas    Totals
Under 35 years          174        132      90       396
35 years and up          54         72      78       204
Column Totals           228        204     168       600
For row 1 and column 1, the expected frequency, e11, is
(396)(228)/600 = 150.48.
For row 1 and column 2, the expected frequency, e12, is
(396)(204)/600 = _____.
For e13, we calculate (396)(___)/600 = _____.
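The rule eij = (Row i total)(Column j total)/Sample Size can be applied to every cell at once. The sketch below uses plain Python; the variable names are illustrative and not taken from the slides.

```python
# Observed counts: rows are age groups, columns are brands.
brands = ["Under Armour", "Nike", "Adidas"]
observed = {
    "Under 35 years": [174, 132, 90],
    "35 years and older": [54, 72, 78],
}

# Marginal totals and overall sample size.
row_totals = {age: sum(counts) for age, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())

# e_ij = (row i total)(column j total) / sample size
expected = {
    age: [row_totals[age] * col_total / n for col_total in col_totals]
    for age in observed
}

for age, row in expected.items():
    cells = ", ".join(f"{b}: {e:.2f}" for b, e in zip(brands, row))
    print(f"{age}: {cells}")
```

Each row of expected frequencies sums to its observed row total, and likewise for columns, which is a quick sanity check on the arithmetic.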
Introductory Case:
Expected Frequencies and Deviations
                            Brand Name               Row
Age Group          Under Armour   Nike   Adidas    Totals
Under 35 years         150.48    134.64   110.88    396.00
35 years and up         77.52     69.36    57.12    204.00
Column Totals          228.00    204.00   168.00    600.00
The deviations ( oij - eij ) are:
                            Brand Name
Age Group          Under Armour   Nike   Adidas
Under 35 years          23.52     -2.64   -20.88
35 years and up        -23.52      2.64    20.88
Introductory Case: Squared Deviations
We square each deviation and divide by the
respective expected frequency. These values
are shown in the following table.
                            Brand Name
Age Group          Under Armour   Nike   Adidas
Under 35 years           3.68      0.05     3.93
35 years and up          7.14      0.10     7.63
The standardized, squared deviations sum to
22.53, the value of the test statistic.
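As a check on the value of the test statistic, the whole calculation can be sketched as one small function. The name `chi_square_independence` is hypothetical and the implementation is a plain-Python illustration, not a library routine.

```python
def chi_square_independence(table):
    """Return (statistic, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, o_ij in enumerate(row):
            # Expected count under independence for cell (i, j).
            e_ij = row_totals[i] * col_totals[j] / n
            statistic += (o_ij - e_ij) ** 2 / e_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Rows: under 35, 35 and older. Columns: Under Armour, Nike, Adidas.
brand_by_age = [[174, 132, 90], [54, 72, 78]]
stat, df = chi_square_independence(brand_by_age)
print(f"chi-square = {stat:.2f}, df = {df}")
```

Summing unrounded cell contributions gives the same 22.53 as adding the rounded table entries, so rounding does not affect the result here.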
Introductory Case:
Summarizing the Example
Competing Hypotheses:
H0: brand choice does not depend on the customer's age
HA: brand choice depends on the customer's age
The test statistic is calculated using:
$\chi^2_{df} = \sum_{i} \sum_{j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$,
where df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2.
The critical value is 5.991 at the 5% significance level.
Introductory Case:
Summarizing the Example
We reject H0 because the value of the test
statistic is larger than the critical value:
22.53 > 5.991. Therefore, age and brand name
are not independent of one another.
Theoretical Background
Two characters, R and C, are independent if: P(R ∩ C) =
P(R)P(C)
R has r categories R1, R2, ..., Rr
C has c categories C1, C2, ..., Cc
Then P(Ri ∩ Cj) = P(Ri)P(Cj)
for i = 1, 2, ..., r and j = 1, 2, ..., c
Let the probability that an observation is classified into
the ith row be $p_{i\cdot}$.
Let the probability that an observation is classified into
the jth column be $p_{\cdot j}$.
Let $p_{ij}$ be the probability that an observation is classified
into the ith row and jth column.
Hypotheses:
H0: $p_{ij} = p_{i\cdot}\,p_{\cdot j}$ for all i,j combinations
H1: $p_{ij} \neq p_{i\cdot}\,p_{\cdot j}$ for at least one i,j combination

The Test Statistic & its Basis
Let $f_{i\cdot}$ be the number of sample observations
in the ith row.
Let $f_{\cdot j}$ be the number of sample observations in
the jth column.
Let n be the total number of sample
observations (total frequency).
Let $O_{ij}$ be the number of observations belonging
to both the ith row and the jth column.
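The Test Statistic & its Basis
$p_{i\cdot}$ is estimated by $\hat{p}_{i\cdot} = f_{i\cdot}/n$
$p_{\cdot j}$ is estimated by $\hat{p}_{\cdot j} = f_{\cdot j}/n$
$p_{ij}$ is estimated (under H0) by $\hat{p}_{i\cdot}\,\hat{p}_{\cdot j}$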
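The Test Statistic
The chi-square test statistic for independence is:
$\chi^2_{obs} = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \sum_{i} \sum_{j} \frac{(O_{ij} - n\hat{p}_{i\cdot}\hat{p}_{\cdot j})^2}{n\hat{p}_{i\cdot}\hat{p}_{\cdot j}}$
where $E_{ij} = n\hat{p}_{i\cdot}\hat{p}_{\cdot j} = \frac{f_{i\cdot}\,f_{\cdot j}}{n}$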
Critical Region
Reject H0 if and only if
$\chi^2_{obs} > \chi^2_{\alpha;\,(r-1)(c-1)}$
r: no. of rows; c: no. of columns
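The degrees of freedom (r-1)(c-1) follow from a standard counting argument, sketched here as a supplement to the slides. The rc cell probabilities sum to 1, leaving rc - 1 free. Estimating the row marginals uses up r - 1 parameters and the column marginals c - 1.

```latex
\text{df} = (rc - 1) - \big[(r - 1) + (c - 1)\big] = rc - r - c + 1 = (r - 1)(c - 1)
```

For a 2 x 3 table such as the brand-by-age example this gives (2 - 1)(3 - 1) = 2.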

An Example: Chi-Square Test for
Independence
In one large factory, 100 employees were judged to be highly
successful and another 100 marginally successful. All workers
were asked, "Which do you find more important to you
personally, the money you are able to take home or the
satisfaction you feel from doing the job?" In the first group,
49% found the money more important, but in the second group
53% responded that way. Test the null hypothesis that job
performance and job motivation are independent using the .01
level of significance.
An Example: Chi-Square Test for Independence
State the research hypothesis.
Are job performance and job motivation
independent?
State the statistical hypotheses.
H0: Job performance and job motivation are independent.
H1: Job performance and job motivation are not independent.
An Example:
Set the decision rule.
α = .01
df = (c - 1)(r - 1)
   = (2 - 1)(2 - 1)
   = (1)(1) = 1
$\chi^2_{crit} = 6.64$
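Calculate the test statistic.
Observed frequencies, with expected frequencies in parentheses:
               High Success   Marginal Success   Total
Money             49 (51)          53 (51)        102
Satisfaction      51 (49)          47 (49)         98
Total               100              100          200
$f_e = \frac{(\text{row total})(\text{column total})}{\text{overall total}} = \frac{(102)(100)}{200} = 51$
$f_e = \frac{(\text{row total})(\text{column total})}{\text{overall total}} = \frac{(98)(100)}{200} = 49$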
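Calculate the test statistic.
               High Success   Marginal Success   Total
Money             49 (51)          53 (51)        102
Satisfaction      51 (49)          47 (49)         98
Total               100              100          200
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
$\chi^2 = \frac{(49-51)^2}{51} + \frac{(51-49)^2}{49} + \frac{(53-51)^2}{51} + \frac{(47-49)^2}{49}$
$= .08 + .08 + .08 + .08$
$= 0.32$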
Decide if your result is significant.
Retain H0, 0.32 < 6.64
Interpret your results.
There is no significant evidence at the .01 level that job
performance and job motivation are related.
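The factory example can be reproduced from the stated percentages as a rough check. In this sketch the variable names are illustrative, and the critical value 6.64 is simply the one used in the slides for α = .01 and df = 1.

```python
# Group sizes and the share of each group that chose "money".
group_sizes = {"High Success": 100, "Marginal Success": 100}
money_share = {"High Success": 0.49, "Marginal Success": 0.53}

# Convert percentages to observed counts (rows: Money, Satisfaction).
money = [round(group_sizes[g] * money_share[g]) for g in group_sizes]
satisfaction = [group_sizes[g] - m for g, m in zip(group_sizes, money)]
observed = [money, satisfaction]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (f_o - f_e)^2 / f_e over all four cells, with
# f_e = (row total)(column total) / overall total.
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / n
        chi_sq += (f_o - f_e) ** 2 / f_e

CRITICAL_1PCT_DF1 = 6.64  # critical value used in the slides
print(f"chi-square = {chi_sq:.2f}")
print("reject H0" if chi_sq > CRITICAL_1PCT_DF1 else "retain H0")
```

The individual cell contributions are about .078 and .082; the slides round each to .08, which gives the same 0.32 total.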
