
2.13 Chi-squared Test and Proportion Test

Six Sigma Black Belt and Green Belt


Week 2
2008-06-10
Objectives

• Review the concepts of attribute data


• Introduce the basic concepts of
Chi-squared - Test for Independence
• Link Chi-squared to the DMAIC Roadmap
• Demonstrate application of Chi-squared

2008-06-10 © SKF Group Slide 1 SKF (Group Six Sigma) 2.13 Chi-squared - Proportion Test
Using Statistical Tools to analyse Data

• First determine the type of data


– Continuous (variables) or discrete (attribute)

• Refer to the 'Hypothesis Test Matrix' to determine the appropriate tool
• Use hypothesis testing to determine statistical significance

The Data Type is important in choosing a
Hypothesis Test

Y = f(X1, X2, X3, ..., Xn)


Y: Attribute or Variable?      X’s: Attribute or Variable?

By knowing and controlling the X’s, we reduce the variability in Y.
We validate X’s and Y’s with hypothesis testing.

Analysis Matrix (Single X – Single Y)

                      X Data
                      Attribute                           Variable
Y Data   Attribute    Proportion Defective, Chi-squared   Logistic Regression
         Variable     Means/Medians Test, t-Test, ANOVA   Regression

Hypothesis Tests Attribute Data

Attribute Data
  Proportion
    1 pop.            Z Test
    2 pop.            Z Test
    2 or more pop.    χ2 Test
  Independence        χ2 Test

Chi-squared Test

Attribute Data

Anything that can be classified


• Pass / Fail
• Category / Type
• 2 or more categories

Discrete
• Rating Scales
• Scale Groupings

Insight into
Y = f(X1, X2, X3, ..., Xn)
When to use Chi-squared

• A population is characterised by attribute data


Y = f(X)
• Responses and factors are attribute
• Each member of the population can be uniquely classified
according to a pair or set of attributes
• Each category of an attribute is mutually exclusive and
exhaustive
Chi-squared is used to compare two or more group proportions.
Are they the same or are they different?

Example

The Personnel Department wants to see if there is a link


between age (old and young) and whether that person
gets hired.

What’s the Y ? _____________ Type of Data ? ______________

What’s the X ? _____________ Type of Data ? ______________

What type of tool would you use ? ________________________

Example: Answers

The Personnel Department wants to see if there is a link
between age (old and young) and whether that person
gets hired.

What’s the Y ? Got Hired      Type of Data ? Attribute

What’s the X ? Age            Type of Data ? Attribute

What type of tool would you use ? Chi-squared

The Hypothesis

The Chi-squared Test for Independence takes independence as the
null hypothesis, therefore:
H0: Data is Independent (Not Related)
(where Age & Hiring practices are independent)

HA: Data is Dependent (Related)
(where Age & Hiring practices are dependent)

The p-value is the probability of seeing data at least as far from
independence as ours if H0 were true. A small p-value is therefore
evidence against the null hypothesis.

Hypothesis for Example

Let’s walk through our example ...

Assume we wanted to determine if age and hiring practices are
dependent or independent.
Therefore our hypotheses are stated as follows ...
H0: Age and Hiring Practices are independent
HA: Age and Hiring Practices are dependent

Contingency Table

           Hired   Not Hired   Total
Old        30      150         180
Young      45      230         275
Total      75      380         455

Do Hiring Decisions depend on Age?

Step #1

We must develop an Observed Frequency Table by breaking our
attribute variables into different levels:
Age: Old & Young
Hiring Practices: Hired & Not Hired
We then collect data to perform the analysis.

Example:   Hired   Not Hired
Old        30      150
Young      45      230

Step #2

Calculate Column & Row Totals

Example:   Hired   Not Hired   Total
Old        30      150         180
Young      45      230         275
Total      75      380         455

Step #3

• What would it look like if these factors were really independent?
• Develop an expected frequency table.

Example: Hired Not Hired

Old

Young

How do we do that?

Step #3 Continued

• What would it look like if these factors were really independent?
• Develop an expected frequency table.

Example:   Hired             Not Hired   Total
Old        (75/455) × 180                180
Young                                    275
Total      75                380         455

Cell’s expected frequency is:

$$E = \frac{(\text{Column Total}) \times (\text{Row Total})}{\text{Grand Total}}$$
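
The expected-frequency rule can be sketched in a few lines of Python. This is only an illustration using the hiring counts from the slides; the function name `expected_counts` is made up for this sketch and is not part of any tool used in the course.

```python
def expected_counts(observed):
    """Expected cell counts under independence:
    (row total) * (column total) / (grand total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed counts: rows = Old, Young; columns = Hired, Not Hired
observed = [[30, 150], [45, 230]]
expected = expected_counts(observed)

for label, row in zip(["Old", "Young"], expected):
    print(label, [round(value, 2) for value in row])
```

The unrounded values are 29.67, 150.33, 45.33 and 229.67, which are the figures MINITAB prints later in the deck; the hand-worked slides carry them to one decimal place.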
Step #3 Continued

We would expect the quantity of Old and Hired to be 29.6 if
the two factors were really independent.

Example:   Hired   Not Hired   Total
Old        29.6    150.3       180
Young      45.3    229.7       275
Total      75      380         455

You finish the table!


Step #4

Subtract the expected value from the observed (O-E)

Example:   Hired          Not Hired   Total
Old        30-29.6=0.4    -0.3        180
Young      -0.3           0.3         275
Total      75             380         455

You finish the table!


Step #5

Square the Differences (O-E)^2

Example:   Hired              Not Hired   Total
Old        (.4)×(.4)=.16      .09         180
Young      .09                .09         275
Total      75                 380         455

You finish the table!


Step #6

Compute the Relative Squared Differences (O-E)^2 / E

Example:   Hired                Not Hired   Total
Old        .16 / 29.6 = .005    .0006       180
Young      .002                 .0004       275
Total      75                   380         455

You finish the table!


Step #7

The Chi-squared Statistic (χ2)

• Measures the difference between the observed and
expected counts in this way:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

Step #8

The sum of the relative squared differences is Chi-squared.

Find Chi-squared on the distribution to determine significance.

[Figure: Chi-squared distribution curve; the p-value is the area under the curve to the right of the Chi-squared statistic]

Example: Chi-squared (χ2) = .005+.002+.0006+.0004 = .008

With 1 degree of freedom, the p-value is far more than 0.05,
so we can’t reject the H0 hypothesis.
Conclusion: there is no evidence that Hiring Practices depend on Age.
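
Steps 4 to 8 can be reproduced without rounding the intermediate values. The sketch below uses only the Python standard library and relies on the fact that, for 1 degree of freedom, the area to the right of the statistic equals `erfc(sqrt(χ2 / 2))`; the variable names are illustrative.

```python
import math

# Observed counts: rows = Old, Young; columns = Hired, Not Hired
observed = [[30, 150], [45, 230]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # expected count
        chi2 += (o - e) ** 2 / e                         # relative squared difference

# Degrees of freedom = (rows - 1) * (columns - 1) = 1 for a 2x2 table.
# For 1 degree of freedom the upper-tail area has a closed form.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"Chi-Sq = {chi2:.3f}, P-Value = {p_value:.3f}")
```

Without intermediate rounding the statistic is about 0.007 rather than the hand-worked .008, and the p-value is about 0.93; the conclusion does not change.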

Another Example ...

Example: Hired Not Hired

Old 45 135

Young 45 230

What Decision Would You Make?


Determine which group proportions are different.
Determine why the group proportions are different.
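
One way to approach this second example is to look at the hiring proportion in each group and then run the same test. The sketch below is a standard-library illustration only; the p-value shortcut with `erfc` is valid for 1 degree of freedom, and the names are made up for this example.

```python
import math

# Observed counts: rows = Old, Young; columns = Hired, Not Hired
observed = [[45, 135], [45, 230]]
groups = ["Old", "Young"]

# Proportion hired in each age group
hired_rate = {g: row[0] / sum(row) for g, row in zip(groups, observed)}

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = sum(
    (o - row_totals[i] * col_totals[j] / n) ** 2 / (row_totals[i] * col_totals[j] / n)
    for i, row in enumerate(observed)
    for j, o in enumerate(row)
)
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 degree of freedom

for g in groups:
    print(f"{g}: {hired_rate[g]:.1%} hired")
print(f"Chi-Sq = {chi2:.2f}, P-Value = {p_value:.3f}")
```

Here 25% of the Old group is hired against roughly 16% of the Young group, and the p-value falls below 0.05, so the proportions would be judged different at that significance level.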

Multiple Categories for each Attribute

                       Attribute Y
                  B1   B2   B3   B4   B5   . . .
Attribute X   A1
              A2
              A3
              A4
              A5
              .
              .
              .

More complex relationships can be studied

Chi-squared Assumptions

1. The sample is representative of the population or process.

2. The underlying population is binomial for discrete data.

3. The expected count is greater than or equal to 5 for each cell.
If less than 5, additional data will be needed.
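
Assumption 3 is easy to check before running the test. The sketch below computes the smallest expected count in a table; the `small_sample` table is hypothetical data invented for illustration, and `min_expected_count` is not a function from any statistics package.

```python
def min_expected_count(observed):
    """Smallest expected cell count under independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return min(r * c / n for r in row_totals for c in col_totals)


hiring = [[30, 150], [45, 230]]   # counts from the hiring example
small_sample = [[3, 7], [2, 18]]  # hypothetical table with too little data

for name, table in [("hiring", hiring), ("small_sample", small_sample)]:
    smallest = min_expected_count(table)
    verdict = "OK" if smallest >= 5 else "collect more data"
    print(f"{name}: smallest expected count = {smallest:.2f} ({verdict})")
```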

Analysing the Data in MINITAB

Y variable

X variable

Analysing the Data in MINITAB

Chi-Square Test: Hired, Not Hired

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

         Hired  Not Hired  Total
    1       30        150    180
         29.67     150.33
         0.004      0.001

    2       45        230    275
         45.33     229.67
         0.002      0.000

Total       75        380    455

Chi-Sq = 0.007, DF = 1, P-Value = 0.932

Note: The observed and expected counts are the same values
you calculated a moment ago.

What Decision Would You Make?

Exercise - 1

The government performed a study to determine if there is
a relationship between geographic location and political
party preferences. The results are below:

              East   West
Republican     590    434
Democrat       232   1199
No Pref         45     83

Is there any relationship between geographic location
and political party preference?
How could you use this information if you were planning a
political campaign?

Exercise - 2

Is there a relationship between parental influence and
having a felon record?
Data is below:

                 Felon Record   No Felon Record
Single Mother              52               135
Single Father              30                80
Joint Parents              19                98

What are your conclusions?

Exercise - 3

Is there a relationship between on time delivery and the
shipment provider?
Data is below:

               On-Time   Late
Fed-Ex              70     11
UPS                 46      9
Post Office         57     14

What are your conclusions?

Exercise - 1: Answer

Chi-Square Test: East, West

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

           East     West  Total
    1       590      434   1024
         343.71   680.29
        176.479   89.165

    2       232     1199   1431
         480.32   950.68
        128.382   64.864

    3        45       83    128
          42.96    85.04
          0.096    0.049

Total       867     1716   2583

Chi-Sq = 459.035, DF = 2, P-Value = 0.000

There is a relationship between geographic location and political party preferences.

Exercise - 2: Answer

Chi-Square Test: Felon Record, No Felon Record

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

          Felon   No Felon
         Record     Record  Total
    1        52        135    187
          45.62     141.38
          0.892      0.288

    2        30         80    110
          26.84      83.16
          0.373      0.120

    3        19         98    117
          28.54      88.46
          3.191      1.030

Total       101        313    414

Chi-Sq = 5.894, DF = 2, P-Value = 0.053


At the 0.05 significance level, we cannot conclude that there is a relationship between parental influence and having a felon record (the p-value of 0.053 is borderline).

Exercise - 3: Answer

Chi-Square Test: On Time, Later

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

        On Time    Later  Total
    1        70       11     81
          67.70    13.30
          0.078    0.399

    2        46        9     55
          45.97     9.03
          0.000    0.000

    3        57       14     71
          59.34    11.66
          0.092    0.469

Total       173       34    207

Chi-Sq = 1.039, DF = 2, P-Value = 0.595

There is no evidence of a relationship between on time delivery and the shipment provider.
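
The three exercise answers can be cross-checked without MINITAB. The sketch below is a standard-library illustration: `chi_squared` is a helper written for this example, and the p-value shortcut `exp(-χ2 / 2)` is valid only for 2 degrees of freedom, which is what all three 3×2 tables have.

```python
import math


def chi_squared(observed):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


exercises = {
    "Exercise 1 (location vs party)": [[590, 434], [232, 1199], [45, 83]],
    "Exercise 2 (parents vs record)": [[52, 135], [30, 80], [19, 98]],
    "Exercise 3 (provider vs on-time)": [[70, 11], [46, 9], [57, 14]],
}

results = {}
for name, table in exercises.items():
    stat, df = chi_squared(table)
    p_value = math.exp(-stat / 2)  # upper-tail area, valid for df = 2 only
    results[name] = (stat, df, p_value)
    print(f"{name}: Chi-Sq = {stat:.3f}, DF = {df}, P-Value = {p_value:.3f}")
```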

Marketing Application

Determine whether or not beer preferences change with gender and
age so that we can establish target markets accordingly. Below are
survey results of a sample of the population.

[Figure: survey results tables, one by Age and one by Gender]

What are your recommendations?

Chi-squared Summary

• Discrete data are commonly used to analyse process
performance in service applications
• Non-significant differences between two or more groups
keep you from chasing ghosts
• Significant differences between group proportions can be
detected
• A low p-value indicates that root causes need to be
identified for the significant differences between groups
• Examine the Chi-squared contribution of each cell to determine
which groups are different
• Consider whether the size of the "statistically significant"
difference is actually important to your business
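
Examining the contribution of each cell can be sketched as follows, using the Exercise 1 data. This is an illustration only; `cell_contributions` and the label lists are names chosen for this example.

```python
def cell_contributions(observed):
    """Per-cell values of (O - E)^2 / E; their sum is the Chi-squared statistic."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [
        [(o - r * c / n) ** 2 / (r * c / n) for o, c in zip(row, col_totals)]
        for row, r in zip(observed, row_totals)
    ]


rows = ["Republican", "Democrat", "No Pref"]
cols = ["East", "West"]
observed = [[590, 434], [232, 1199], [45, 83]]

contrib = cell_contributions(observed)

# Rank the cells from largest to smallest contribution
ranked = sorted(
    ((contrib[i][j], rows[i], cols[j]) for i in range(len(rows)) for j in range(len(cols))),
    reverse=True,
)
for value, row_label, col_label in ranked:
    print(f"{row_label:<10} {col_label:<5} {value:8.3f}")
```

The Republican and Democrat cells dominate the total, while the No Pref cells contribute almost nothing, which points to where the group proportions differ.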

Proportion Test

Confidence Interval Estimate for Proportion
Example

• A distribution centre QA manager wants a 95% confidence interval
estimate of the proportion of products that have a history of being
picked in the wrong quantity.
• A sample of 2,000 SKUs (Stock Keeping Units) contained 250 SKUs
that had at least one error in quantity picked.
1. What proportion of all SKUs probably have had quantity errors?
2. Is this proportion higher than the Company average of 11% of SKUs
with errors in quantity picked?
3. After improvement actions have been implemented, a new sample of
450 SKUs is taken, which contained 45 SKUs that had at least one
error in quantity picked. Is this significantly better than in the
sample above?
Confidence Interval - Solution with MINITAB

Rule of Thumb:
Use the normal approximation
when np ≥ 10 and n(1 – p) ≥ 10.
In our case (p = 250/2000 = 0.125 and n = 2000), it’s OK.
The QA Manager can estimate with 95% confidence
that the proportion of SKUs having errors is
between 11.05% and 13.95%.

2008-06-10 © SKF Group Slide 40 SKF (Group Six Sigma) 2.13 Chi-squared - Proportion Test
The Same Solution – But manually

$$p_s = 250/2000 = 0.125$$

$$\sqrt{\frac{p_s (1 - p_s)}{n}} = 0.007395$$

$$Z = 1.96$$

Formula for Confidence Interval (approximating Normal Distribution):

$$p \pm z \sqrt{\frac{p(1 - p)}{n}}$$

0.125 + 1.96(0.007395) = 0.1395 = 13.95%

0.125 - 1.96(0.007395) = 0.1105 = 11.05%
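
The manual calculation translates directly into code. The sketch below is a minimal standard-library version of the normal-approximation interval; `proportion_ci` is a name chosen for this example, and z = 1.96 is the 95% value used on the slide.

```python
import math


def proportion_ci(defectives, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    p = defectives / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - z * standard_error, p + z * standard_error


low, high = proportion_ci(250, 2000)
print(f"95% CI: {low:.2%} to {high:.2%}")
```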

2008-06-10 © SKF Group Slide 41 SKF (Group Six Sigma) 2.13 Chi-squared - Proportion Test
Is this Proportion Higher than
Company Average of 11%?

• We will use the 1 Proportion Test to check this:

The percentage of errors in this sample is significantly higher
than the Company average of 11% (the p-value is small: 0.016).
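
The reported p-value can be reproduced with a one-sided Z test based on the normal approximation. Which method the MINITAB dialog used is not visible in the text, so treating it as a normal-approximation test with the alternative "greater than" is an assumption of this sketch; the function name is also illustrative.

```python
import math


def one_proportion_z_test(defectives, n, p0):
    """One-sided test of H0: p <= p0 against HA: p > p0 (normal approximation)."""
    p_hat = defectives / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # uses the hypothesised proportion
    z = (p_hat - p0) / standard_error
    p_value = 0.5 * math.erfc(z / math.sqrt(2))    # area to the right of z
    return z, p_value


z, p_value = one_proportion_z_test(250, 2000, 0.11)
print(f"Z = {z:.2f}, P-Value = {p_value:.3f}")
```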

Are Proportions from Two Samples Different?

Is the proportion from the sample after improvement different
from the first one?
• We will use the 2 Proportion Test to find out:

A p-value of 0.117 means that we can’t reject the
hypothesis that the 2 sample proportions are equal.
However, we can increase the power of the test by choosing a one-sided instead of a two-sided test.
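
A two-sided 2 Proportion Test can be sketched with the normal approximation. The version below uses separate (unpooled) variance estimates for the two samples; that choice is an assumption made because it reproduces the p-value quoted on the slide, and the function name is illustrative.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for the difference of two proportions (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error


z = two_proportion_z(250, 2000, 45, 450)

# Two-sided p-value: area in both tails beyond |z|
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"Z = {z:.2f}, two-sided P-Value = {p_two_sided:.3f}")
```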

How to use One-sided Test?

H0: the first proportion is the same as or smaller than the second one

HA: the first proportion (250/2000 = 0.125) is greater than
the second one (45/450 = 0.10)

Change "Alternative" from the last slide to "greater than".

After clicking OK on each window,
we get a session window with a new
(smaller) p-value of 0.059.

A p-value of 0.059 is very close
to the significance level of 0.05.
What should we do now?

Improve Power of Test with more Samples

Let’s assume that we increase the second sample from 450 to
600, and find a total of 60 defective SKUs (note that we kept
the same proportion of 0.10).
Repeat the test:
The p-value is below the significance level of 0.05.

Finally, we can say that the
proportion of defective SKUs
is better after improvements.
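
The effect of the larger second sample can be seen by running the one-sided test for both sample sizes. As before, this is a normal-approximation sketch with an unpooled standard error, assumed because it matches the quoted 0.059; the 600-SKU sample is the hypothetical one described above.

```python
import math


def one_sided_two_proportion_p(x1, n1, x2, n2):
    """P-value for HA: first proportion > second proportion (unpooled, normal approx.)."""
    p1, p2 = x1 / n1, x2 / n2
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / standard_error
    return 0.5 * math.erfc(z / math.sqrt(2))  # area to the right of z


p_450 = one_sided_two_proportion_p(250, 2000, 45, 450)
p_600 = one_sided_two_proportion_p(250, 2000, 60, 600)  # hypothetical larger sample

print(f"n2 = 450: P-Value = {p_450:.3f}")
print(f"n2 = 600: P-Value = {p_600:.3f}")
```

The observed proportions are identical in both runs; only the sample size changes, which shrinks the standard error and pushes the p-value under 0.05.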
Summary

• Hypothesis Testing and Confidence Interval Testing
validate the X’s as stated in the Null and Alternative
Hypotheses
• Follow the steps to ensure a valid conclusion
• Remember - sample size is critical; LARGER sample sizes
result in narrower confidence intervals.
• Repeat the test when the sample size is small, if possible.
