
Chi-Square Goodness-of-Fit Test

(Pearsonian Chi-square Test)


Large Sample Test
Example
Two companies, A and B, have recently
conducted aggressive advertising campaigns to
maintain and possibly increase their respective
shares of the market for fabric softener. These
two companies enjoy a dominant position in the
market. Before the advertising campaigns began,
the market share of company A was 45%,
whereas company B had 40% of the market.
Other competitors accounted for the remaining
15%.
Example
To determine whether these market shares changed
after the advertising campaigns, a marketing analyst
solicited the preferences of a random sample of 200
customers of fabric softener. Of the 200 customers,
102 indicated a preference for company A's product,
82 preferred company B's fabric softener, and the
remaining 16 preferred the products of one of the
competitors. Can the analyst infer at the 5%
significance level that customer preferences have
changed from their levels before the advertising
campaigns were launched?
Example
We compare market shares before and after the advertising
campaigns to see whether there is a difference (i.e., whether the
campaigns changed the market shares). We hypothesize values for
the parameters equal to the before-campaign market shares. That is,
H0: p1 = .45, p2 = .40, p3 = .15 (p1: company A, p2: company B, p3: other competitors)

The alternative hypothesis is a denial of the null. That is,

H1: At least one pi is not equal to its specified value


Example
Test Statistic
If the null hypothesis is true, we would expect the number of customers
selecting brand A, brand B, and other to be 200 times the proportions
specified under the null hypothesis. That is,
e1 = 200(.45) = 90
e2 = 200(.40) = 80
e3 = 200(.15) = 30
In general, the expected frequency for each cell is given by
ei = npi

This expression is derived from the formula for the expected value of a
binomial random variable.
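As a quick check of ei = npi, here is a minimal Python sketch for the market-share example. The dictionary keys and variable names are illustrative, not part of the original slides.

```python
# e_i = n * p_i: expected counts if the pre-campaign shares (H0) still hold
n = 200  # customers sampled
p0 = {"A": 0.45, "B": 0.40, "Other": 0.15}  # hypothesized proportions under H0

expected = {brand: n * p for brand, p in p0.items()}
for brand, e in expected.items():
    print(f"{brand}: expected {e:.0f} customers")
```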
Example
If the expected frequencies and the observed frequencies are
quite different, we would conclude that the null hypothesis is
false, and we would reject it.

However, if the expected and observed frequencies are similar,
we would not reject the null hypothesis.

The test statistic measures the similarity of the expected and
observed frequencies.
Is Brand Loyalty Related to Buyers' Age?

The retail analyst for a marketing firm wants to know if
different customer groups prefer one brand over another.
She looks at data from 600 sales.
In particular, she feels that the brand Under Armour
might appeal more to younger customers.
The more established brands (Nike and Adidas) might be
capturing the older-customer market.
1. Determine whether the two classifications (age
and brand name) are dependent at the 5%
significance level.

2. Discuss how the findings from the test for
independence can be used.

Goodness of fit test for
multinomial population
Goodness of fit
1) The multinomial distribution can be thought of as a generalization
of the binomial distribution to more than two (say, k) categories.

2) Hypotheses to be tested:
H0: p1 = p10, p2 = p20, ..., pk = pk0
H1: pi ≠ pi0 for at least one i

3) k = number of groups (classes); n = total frequency (large);
α = level of significance
Goodness of fit
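4) Test statistic for testing H0:

$$\chi^2_{obs} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{k-1} \quad \text{(approximately, under } H_0\text{)}$$

Oi: observed frequency
Ei = n pi0: expected frequency

Caution: the chi-square approximation requires every expected frequency to be 5 or more.

5) Reject H0 if $\chi^2_{obs} > \chi^2_{\alpha;\,k-1}$.
Otherwise, do not reject H0.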
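The five steps above can be sketched as one Python function. This is an illustrative helper, not a library API: `chi_square_gof` and `CHI2_CRIT` are hypothetical names, and the critical values are the usual upper-tail chi-square table entries for df = 1 to 4 only (in practice they could be looked up with `scipy.stats.chi2.ppf`). The usage line applies it to the earlier market-share example.

```python
# Hypothetical helper (not a library API): chi-square goodness-of-fit test.
# Upper-tail chi-square table values, keyed by (alpha, df); df 1-4 only.
CHI2_CRIT = {
    (0.05, 1): 3.841, (0.05, 2): 5.991, (0.05, 3): 7.815, (0.05, 4): 9.488,
    (0.01, 1): 6.635, (0.01, 2): 9.210, (0.01, 3): 11.345, (0.01, 4): 13.277,
}


def chi_square_gof(observed, p0, alpha=0.05):
    """Test H0: p_i = p_i0 for all i; return (statistic, df, critical value, reject)."""
    if len(observed) != len(p0):
        raise ValueError("observed and p0 must have the same length")
    if abs(sum(p0) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    n = sum(observed)
    expected = [n * p for p in p0]  # E_i = n * p_i0
    if min(expected) < 5:  # the caution from the slide
        raise ValueError("an expected frequency is below 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    crit = CHI2_CRIT[(alpha, df)]  # KeyError if (alpha, df) is not tabulated
    return stat, df, crit, stat > crit  # reject H0 if stat > critical value


# Market-share example: 102 prefer A, 82 prefer B, 16 prefer other brands
stat, df, crit, reject = chi_square_gof([102, 82, 16], [0.45, 0.40, 0.15])
print(f"chi2 = {stat:.3f}, df = {df}, critical value = {crit}, reject H0: {reject}")
```

For the market-share data the statistic is (102 - 90)²/90 + (82 - 80)²/80 + (16 - 30)²/30 ≈ 1.600 + 0.050 + 6.533 = 8.183, which exceeds 5.991, so H0 would be rejected at the 5% level. Raising an error when an expected count falls below 5 is a deliberately strict choice; a softer version could warn instead.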
Restaurant Food Quality
Last year the management at a restaurant surveyed
its patrons to rate the quality of its food. The results
were as follows:

| Excellent | Good | Fair | Poor |
|---|---|---|---|
| 15% | 30% | 45% | 10% |

Based on this and other survey results, management
made changes to the menu.

BUSINESS STATISTICS | Jaggia, Kelly


This Year's Results
This year, the management surveyed 250 patrons,
asking the same questions about food quality. Here
are the results:

| Excellent | Good | Fair | Poor | Total |
|---|---|---|---|---|
| 46 | 83 | 105 | 16 | 250 |

We want to know if the results agree with those from
last year, or if there has been a significant change.



Methodology
Compute an expected frequency for each
category and compare it to what we actually
observe.

Compute the difference between what was
observed and expected for each category.

If the results this year are consistent with last
year, these differences will be relatively small.

LO 12.1
The ei (Expected Frequencies)
We first compute the expected counts based
on the survey of 250 restaurant patrons.

If the survey is consistent with last year's
results, we expect e1 = p1(250) = 0.15(250) =
37.5 responses to be in the Excellent
category.

There actually were o1 = 46, a bit more than
expected.

Computing the Deviations
In the first category e1 = 37.5 and o1 = 46, so we get
(o1 - e1) = 46 - 37.5 = 8.5.

In the third category, which are Fair responses, e3 =
p3(250) = .45(250) = 112.5.

There are 105 of these responses in the survey, so
we compute (o3 - e3) = 105 - 112.5 = -7.5.
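A short Python sketch of the deviations for all four categories (variable names are illustrative). Because both the observed and the expected frequencies add up to n = 250, the deviations always sum to zero, which is why the TOTAL row of the restaurant results table shows 0.0 in the deviation column.

```python
# Deviations o_i - e_i for the restaurant survey
last_year = [0.15, 0.30, 0.45, 0.10]  # Excellent, Good, Fair, Poor (H0 proportions)
observed = [46, 83, 105, 16]          # this year's counts

n = sum(observed)                                    # 250 patrons
expected = [n * p for p in last_year]                # e_i = p_i * 250
deviations = [o - e for o, e in zip(observed, expected)]
print(deviations)

# sum(o_i) = sum(e_i) = n, so the deviations cancel out (up to rounding)
print(f"sum of deviations: {sum(deviations):.6f}")
```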

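Standardizing the Deviations
Each deviation is squared and divided by its expected frequency, $(o_i - e_i)^2/e_i$, so that a deviation is judged relative to the count expected in its category.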

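The Chi-Square Test
The test statistic sums the standardized squared deviations over all categories:

$$\chi^2_{df} = \sum_{i} \frac{(o_i - e_i)^2}{e_i}$$

df = k - 1, where k is the number of categories
oi = observed frequency for category i
ei = expected frequency for category i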

The Critical Value (at α = .05)
With k = 4 categories, df = 3 and the critical value is $\chi^2_{.05;\,3} = 7.815$.

The Restaurant Example
Decision rule: reject H0 if the value of the
test statistic is greater than the critical value.

The Restaurant Example

| Response Category | Percentage Last Year | Observed This Year (oi) | Expected Out of 250 (ei) | (oi - ei) | (oi - ei)²/ei |
|---|---|---|---|---|---|
| Excellent | 15% | 46 | 37.5 | 8.5 | 1.927 |
| Good | 30% | 83 | 75.0 | 8.0 | 0.853 |
| Fair | 45% | 105 | 112.5 | -7.5 | 0.500 |
| Poor | 10% | 16 | 25.0 | -9.0 | 3.240 |
| TOTAL | 100% | 250 | 250 | 0.0 | 6.520 |

Since the computed test statistic of 6.520 is less than the
critical value of 7.815, we do not reject H0.
The changes did not produce a statistically significant
response at the 5% level.
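The whole restaurant table can be reproduced in a few lines of Python. This is a sketch; the 7.815 critical value is taken from the text rather than computed, and the variable names are illustrative.

```python
# Reproduce the restaurant results table: (o_i - e_i)^2 / e_i per category
categories = ["Excellent", "Good", "Fair", "Poor"]
last_year = [0.15, 0.30, 0.45, 0.10]
observed = [46, 83, 105, 16]
CRITICAL_05_DF3 = 7.815  # alpha = .05, df = k - 1 = 3, as stated in the text

n = sum(observed)
total = 0.0
for name, p, o in zip(categories, last_year, observed):
    e = n * p
    contribution = (o - e) ** 2 / e
    total += contribution
    print(f"{name:<10} o={o:>4} e={e:>6.1f} o-e={o - e:>5.1f} (o-e)^2/e={contribution:.3f}")

df = len(categories) - 1
print(f"chi2 = {total:.3f} with df = {df}; reject H0: {total > CRITICAL_05_DF3}")
```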

Chi-Square Test of Independence
The Chi-Square (χ²) Test of Independence is used
to determine whether two factors or traits
(qualitative characteristics) are related to one
another.
Introductory Case
Does the brand of compression garment
purchased depend on the customer's age?

Brand Name

| Age Group | Under Armour | Nike | Adidas |
|---|---|---|---|
| Under 35 years | 174 | 132 | 90 |
| 35 years and older | 54 | 72 | 78 |

LO 12.2
Introductory Case: Notation
We use the notation oij to denote the observed
frequency in row i of column j.

Similarly, eij is the expected frequency in row i of
column j.

Under the independence assumption, the
expected frequency per cell is:
eij = (Row i total)(Column j total)/Sample Size
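A minimal Python sketch of this rule, using the brand-by-age counts. The list layout (rows = age groups; columns = Under Armour, Nike, Adidas) is an assumption of this sketch.

```python
# e_ij = (row i total)(column j total) / n, assuming brand and age are independent
observed = [
    [174, 132, 90],  # Under 35 years: Under Armour, Nike, Adidas
    [54, 72, 78],    # 35 years and older
]
row_totals = [sum(row) for row in observed]        # 396, 204
col_totals = [sum(col) for col in zip(*observed)]  # 228, 204, 168
n = sum(row_totals)                                # 600

expected = [[r * c / n for c in col_totals] for r in row_totals]
for row in expected:
    print(" ".join(f"{e:7.2f}" for e in row))
```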

Introductory Case:
The Chi-Square Statistic
We apply the chi-square test statistic in a similar
manner as in the goodness-of-fit test. The
formula is as follows:

$$\chi^2_{df} = \sum_{i}\sum_{j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$

where df = (rows - 1)(columns - 1).

Introductory Case:
Computing Expected Frequencies

| Age Group | Under Armour | Nike | Adidas | Row Totals |
|---|---|---|---|---|
| Under 35 years | 174 | 132 | 90 | 396 |
| 35 years and up | 54 | 72 | 78 | 204 |
| Column Totals | 228 | 204 | 168 | 600 |

For row 1 and column 1, the expected frequency, e11, is
(396)(228)/600 = 150.48.

For row 1 and column 2, the expected frequency, e12, is
(396)(204)/600 = 134.64.

For e13, we calculate (396)(168)/600 = 110.88.


Introductory Case:
Expected Frequencies and Deviations

| Age Group | Under Armour | Nike | Adidas | Row Totals |
|---|---|---|---|---|
| Under 35 years | 150.48 | 134.64 | 110.88 | 396.00 |
| 35 years and up | 77.52 | 69.36 | 57.12 | 204.00 |
| Column Totals | 228.00 | 204.00 | 168.00 | 600.00 |

The deviations (oij - eij) are:

| Age Group | Under Armour | Nike | Adidas |
|---|---|---|---|
| Under 35 years | 23.52 | -2.64 | -20.88 |
| 35 years and up | -23.52 | 2.64 | 20.88 |

Introductory Case: Squared Deviations
We square each deviation and divide by the
respective expected frequency. These values
are shown in the following table.

| Age Group | Under Armour | Nike | Adidas |
|---|---|---|---|
| Under 35 years | 3.68 | 0.05 | 3.93 |
| 35 years and up | 7.14 | 0.10 | 7.63 |

The standardized, squared deviations sum to
22.53, the value of the test statistic.

Introductory Case:
Summarizing the Example
Competing Hypotheses:
H0: brand choice does not depend on the customer's age
HA: brand choice depends on the customer's age

The test statistic is calculated using:

$$\chi^2_{df} = \sum_{i}\sum_{j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$

where df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2.

The critical value is 5.991 at the 5% significance level.

Introductory Case:
Summarizing the Example
We reject H0 because the value of the test
statistic is larger than the critical value:
22.53 > 5.991. Therefore, age and brand name
are not independent of one another.
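Putting the pieces together, here is a hedged Python sketch of the full test of independence for the brand-by-age table. `chi_square_independence` is a hypothetical helper written for this illustration, and the 5.991 critical value is the one given in the text; `scipy.stats.chi2_contingency` offers a library version of the same calculation.

```python
def chi_square_independence(observed):
    """Hypothetical helper: Pearson chi-square statistic and df for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (o - e) ** 2 / e               # standardized squared deviation
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


brand_by_age = [[174, 132, 90],  # Under 35: Under Armour, Nike, Adidas
                [54, 72, 78]]    # 35 and older
stat, df = chi_square_independence(brand_by_age)
CRITICAL_05_DF2 = 5.991  # from the text: alpha = .05, df = 2
print(f"chi2 = {stat:.2f}, df = {df}, reject H0: {stat > CRITICAL_05_DF2}")
```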

Theoretical Background
Two characteristics, R and C, are independent if P(R ∩ C) =
P(R)P(C).

Suppose R has r categories R1, R2, ..., Rr
and C has c categories C1, C2, ..., Cc.
Then independence means P(Ri ∩ Cj) = P(Ri)P(Cj)
for i = 1, 2, ..., r and j = 1, 2, ..., c.
Theoretical Background

Let pi. be the probability that an observation is classified into
the ith row.
Let p.j be the probability that an observation is classified into
the jth column.
Let pij be the probability that an observation is classified
into the ith row and jth column.
Theoretical Background

Hypotheses:
H0: pij = pi.p.j for all i, j combinations
H1: pij ≠ pi.p.j for at least one i, j combination


The Test Statistic & its Basis

Let fi. be the number of sample observations in the ith row.
Let f.j be the number of sample observations in the jth column.
Let n be the total number of sample observations (total frequency).
Let Oij be the number of observations belonging to both the ith row and the jth column.
The Test Statistic & its Basis

pi. is estimated by $\hat{p}_{i.} = f_{i.}/n$

p.j is estimated by $\hat{p}_{.j} = f_{.j}/n$

Under H0, pij is estimated by $\hat{p}_{i.}\hat{p}_{.j}$
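The estimation step can be checked with exact fractions: for every cell, $n\hat{p}_{i.}\hat{p}_{.j}$ equals $f_{i.}f_{.j}/n$, which is the familiar (row total)(column total)/n rule. This Python sketch reuses the brand-by-age counts purely as illustrative data.

```python
from fractions import Fraction

# Illustrative data: the brand-by-age counts (rows: age groups, columns: brands)
observed = [[174, 132, 90], [54, 72, 78]]
n = sum(sum(row) for row in observed)
f_row = [sum(row) for row in observed]        # f_i.
f_col = [sum(col) for col in zip(*observed)]  # f_.j
p_row = [Fraction(f, n) for f in f_row]       # p-hat_i. = f_i. / n
p_col = [Fraction(f, n) for f in f_col]       # p-hat_.j = f_.j / n

# Each set of estimated marginal probabilities sums to 1
assert sum(p_row) == 1 and sum(p_col) == 1

# E_ij = n * p-hat_i. * p-hat_.j reduces exactly to f_i. * f_.j / n
for i, fi in enumerate(f_row):
    for j, fj in enumerate(f_col):
        assert n * p_row[i] * p_col[j] == Fraction(fi * fj, n)
print("n * p_i. * p_.j == f_i. * f_.j / n for every cell")
```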
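The Test Statistic

The chi-square test statistic for independence is:

$$\chi^2_{obs} = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \sum_{i}\sum_{j} \frac{(O_{ij} - n\hat{p}_{i.}\hat{p}_{.j})^2}{n\hat{p}_{i.}\hat{p}_{.j}}$$

where $E_{ij} = n\hat{p}_{i.}\hat{p}_{.j} = \dfrac{f_{i.}\,f_{.j}}{n}$.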
Critical Region
Reject H0 if and only if

$$\chi^2_{obs} > \chi^2_{\alpha;\,(r-1)(c-1)}$$

r: no. of rows; c: no. of columns


An Example: Chi-Square Test for
Independence
In one large factory, 100 employees were judged to be highly
successful and another 100 marginally successful. All workers
were asked, "Which do you find more important to you
personally, the money you are able to take home or the
satisfaction you feel from doing the job?" In the first group,
49% found the money more important, but in the second group
53% responded that way. Test the null hypothesis that job
performance and job motivation are independent using the .01
level of significance.
An Example: Chi-Square Test for Independence

State the research hypothesis.
Are job performance and job motivation
independent?
State the statistical hypotheses.
H0: Job performance and job motivation are independent.
H1: Job performance and job motivation are not independent.
An Example:

Set the decision rule.
α = .01
df = (c - 1)(r - 1) = (2 - 1)(2 - 1) = (1)(1) = 1
$\chi^2_{crit} = 6.64$
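Where does 6.64 come from? A chi-square variable with 1 degree of freedom is the square of a standard normal variable Z, so P(χ² > c) = P(|Z| > √c) and the upper-α critical value is $z_{\alpha/2}^2$. The Python sketch below uses only the standard library (`statistics.NormalDist`, Python 3.8+). It also checks the df = 2 value 5.991 used in the brand example, since with 2 degrees of freedom the upper-tail probability is $e^{-c/2}$.

```python
import math
from statistics import NormalDist

# df = 1: chi-square(1) is Z**2, so P(chi2 > c) = P(|Z| > sqrt(c)) = alpha gives c = z_{alpha/2}**2
alpha = 0.01
z = NormalDist().inv_cdf(1 - alpha / 2)  # about 2.576
crit_df1 = z ** 2
print(f"df = 1, alpha = .01: {crit_df1:.3f}")  # tabled as 6.635; the slide quotes 6.64

# df = 2: the upper tail is P(chi2 > c) = exp(-c / 2), so c = -2 ln(alpha)
crit_df2_05 = -2 * math.log(0.05)
print(f"df = 2, alpha = .05: {crit_df2_05:.3f}")
```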
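An Example: Chi-Square Test for Independence

Calculate the test statistic. Expected frequencies are shown in parentheses.

| | High Success | Marginal Success | Total |
|---|---|---|---|
| Money | 49 (51) | 53 (51) | 102 |
| Satisfaction | 51 (49) | 47 (49) | 98 |
| Total | 100 | 100 | 200 |

$$f_e = \frac{(\text{row total})(\text{column total})}{\text{overall total}} = \frac{(102)(100)}{200} = 51$$

$$f_e = \frac{(\text{row total})(\text{column total})}{\text{overall total}} = \frac{(98)(100)}{200} = 49$$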
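$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(49 - 51)^2}{51} + \frac{(51 - 49)^2}{49} + \frac{(53 - 51)^2}{51} + \frac{(47 - 49)^2}{49}$$

$$\chi^2 \approx .08 + .08 + .08 + .08 = 0.32$$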
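A Python sketch of the same 2 × 2 calculation (variable names are illustrative). It applies the plain Pearson statistic used on these slides; note that some software, for example `scipy.stats.chi2_contingency` with its default `correction=True`, applies Yates' continuity correction when df = 1 and would report a smaller value.

```python
# Job performance (columns) vs. job motivation (rows), alpha = .01
observed = [[49, 53],   # Money: high success, marginal success
            [51, 47]]   # Satisfaction
row_totals = [sum(row) for row in observed]        # 102, 98
col_totals = [sum(col) for col in zip(*observed)]  # 100, 100
n = sum(row_totals)                                # 200

chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n  # 51 for Money cells, 49 for Satisfaction
        chi2 += (fo - fe) ** 2 / fe             # Pearson statistic, no continuity correction

CRITICAL_01_DF1 = 6.64  # decision rule from the slides
print(f"chi2 = {chi2:.2f}; reject H0: {chi2 > CRITICAL_01_DF1}")
```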
An Example: Chi-Square Test for Independence

Decide if your result is significant.
Retain H0, since 0.32 < 6.64.
Interpret your results.
At the .01 level, the data provide no evidence that job performance
and job motivation are related; we cannot reject independence.

