
2001 ConceptFlow 0

Review of Analyze
Analyze Phase Deliverables
A prioritized list of potential sources of variation
Variation component studies
Measurement analysis on the x's
Data collected to validate sources
Graphical and statistical analysis of data
P-values establishing the level of significance and the associated probability
Correlation and regression analysis to determine variable relationships
Reduced list of potential key input variables that affect the output(s)
Updated control charts, process map and FMEA
Results to date (compared to baseline)
Define → Measure → Analyze → Improve → Control
Analyze: statistically links the key input variables with the key output variable
Analyze Week Topics
Review of Measure Week
Central Limit Theorem
Confidence Intervals
Introduction to Hypothesis Testing
Hypothesis Testing
  Means
  Variance
  Proportion
  Chi-Square
Analysis of Variance (ANOVA)
Variation Components
Correlation and Simple Regression
Multiple Regression
Wrap-up and Deliverables
Central Limit Theorem Defined
If a variable x has an unknown distribution with mean μ and standard deviation σ, then the sampling distribution of the sample mean x̄, for samples of size n, will
(1) have a mean $\mu_{\bar{x}} = \mu$,
(2) have a standard deviation $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, where σ is the standard deviation of the individuals,
(3) tend to be normal as the sample size becomes large (n > 30 is a common rule of thumb for unknown distributions).
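The simulation below sketches all three CLT results using a strongly skewed population. It assumes a hypothetical exponential population with μ = σ = 10 (an illustrative choice, not data from this course). It draws 5000 samples of size n = 36 and compares the mean and standard deviation of the sample means with μ and σ/√n.

```python
import random
import statistics

MU = SIGMA = 10.0  # hypothetical exponential parent: its mean and SD are both 10


def draw_exponential(rng):
    """One observation from a strongly right-skewed (exponential) population."""
    return rng.expovariate(1 / MU)


def sample_means(n, reps, rng):
    """Return `reps` sample means, each averaging `n` individual draws."""
    return [statistics.fmean(draw_exponential(rng) for _ in range(n)) for _ in range(reps)]


rng = random.Random(42)
n = 36
means = sample_means(n, reps=5000, rng=rng)

print(f"mean of sample means: {statistics.fmean(means):.2f}  (CLT: mu = {MU})")
print(f"SD of sample means:   {statistics.stdev(means):.2f}  (CLT: sigma/sqrt(n) = {SIGMA / n ** 0.5:.2f})")
```

A histogram of `means` would also look close to normal, even though the individual draws are heavily skewed.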
Standard Error of the Mean
SE Mean $= \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
where
$\sigma_{\bar{x}}$ = standard error of the mean
σ = standard deviation of the individual scores
n = sample size for the mean
(Figure: distribution of sample averages compared with the population of individuals.)
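Here is a worked example with hypothetical numbers chosen only for illustration. Suppose individual readings have σ = 10 and each reported value is the average of n = 25 readings:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{25}} = \frac{10}{5} = 2
```

Averaging 25 readings cuts the spread of the reported value by a factor of 5. Because of the square root, halving the standard error requires four times as many readings. This is how the CLT is used to reduce measurement variation.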
Central Limit Theorem Objectives
By the end of this module the participant should be able to:
Discuss the Central Limit Theorem (CLT) and demonstrate its results using a practical example
Discuss the implications of the Central Limit Theorem in statistical analysis
Describe how to apply the Central Limit Theorem to reduce measurement variation
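A Graphical View of Confidence Intervals
A 95% confidence level means that if the sampling were repeated many times, approximately 95 out of every 100 intervals constructed this way would contain the true population parameter.
(Figure: confidence intervals around several sample means, compared with the population mean.)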
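The repeated-sampling meaning of "95 out of 100" can be checked by simulation. This is a minimal sketch under stated assumptions: a hypothetical normal population with μ = 50 and σ = 10, σ treated as known so that z-based intervals apply, and samples of n = 30.

```python
import random
import statistics
from statistics import NormalDist


def coverage_rate(mu, sigma, n, reps, conf=0.95, seed=1):
    """Fraction of z-based intervals (sigma assumed known) that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95% confidence
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


rate = coverage_rate(mu=50, sigma=10, n=30, reps=2000)
print(f"{rate:.1%} of the 2000 intervals contain the population mean")
```

Any single interval either contains μ or it does not. The 95% describes the long-run behavior of the procedure.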
Population Versus Sample
(Figure: a sample drawn as a subset within the entire population.)
Population Parameters
μ = population mean
σ = population standard deviation
Sample Statistics
x̄ = sample mean
s = sample standard deviation
If we only pull samples, do we ever know the true population parameters?
Estimating Confidence Intervals (CIs)
Parametric confidence intervals in most cases take the general form:
CI = Sample Statistic ± Margin of Error
Margin of Error = K × Measure of Variability
where
Statistic = mean, variance, proportion, etc. computed from the sample
K = confidence factor, a constant based on a statistical distribution (e.g., z or t)
Confidence intervals reflect the sample-to-sample variation of our point estimates.
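The general form maps directly onto code. This sketch builds a CI for the mean in which K comes from the standard normal distribution. That assumes σ is known, for example from historical data. When σ must be estimated from a small sample, a t-based K is the usual choice instead. The readings and σ = 0.25 are hypothetical.

```python
from statistics import NormalDist, fmean


def z_interval_for_mean(data, sigma, conf=0.95):
    """CI = sample mean ± K * sigma / sqrt(n), with K from the standard normal."""
    n = len(data)
    k = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # confidence factor K
    margin = k * sigma / n ** 0.5                 # K * measure of variability
    xbar = fmean(data)                            # sample statistic
    return xbar - margin, xbar + margin


# Hypothetical readings; sigma assumed known from historical data.
readings = [9.8, 10.2, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3, 9.9, 10.1]
low, high = z_interval_for_mean(readings, sigma=0.25)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```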
Confidence Interval and Central Limit Theorem
(Figure: frequency histograms labeled "Population" and "Sample" over values from 30 to 100, and a normal curve from -4 to +4 showing the areas within ±1, ±2 and ±3 standard errors: 68.27%, 95.45% and 99.73%.)
Approximately 95% of all sample means fall within two standard errors (more precisely, ±1.96) of the population mean.
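The percentages under the normal curve can be reproduced from the standard normal distribution. This sketch uses only Python's standard-library `NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

for k in (1, 2, 3):
    inside = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within ±{k} SE: {inside:.2%}")
# within ±1 SE: 68.27%
# within ±2 SE: 95.45%
# within ±3 SE: 99.73%

# The exact multiplier that leaves 2.5% in each tail:
print(f"95% multiplier: {std_normal.inv_cdf(0.975):.3f}")  # 1.960
```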
Confidence Interval Objectives
By the end of this module participants should be able to:
Discuss the role of confidence intervals in statistical analysis
Discuss the meaning of confidence intervals in practical terms
Calculate confidence intervals for the mean, standard deviation, proportion and other derived parameters such as Cp and Pp
What is Hypothesis Testing?
In hypothesis testing, relatively small samples are used to answer questions about population parameters (inferential statistics).
There is always a chance that the selected sample is not representative of the population. Therefore, there is always a chance that the conclusion reached is wrong (alpha and beta risks).
Under some assumptions, inferential statistics lets us estimate how likely it is to get such an unusual sample by chance. The p-value quantifies this: it is the probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed.
Process Flow of a Hypothesis Test
1. Define the problem and state objectives
2. State a null hypothesis (H₀)
3. State the alternative hypothesis (Hₐ)
4. Establish the significance level (α)
5. Collect sample data
6. Calculate the test statistic and/or p-value
7. DECIDE: What does the evidence suggest? Reject H₀ or fail to reject H₀?
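The seven steps can be traced in one small example. This sketch runs a two-sided 1-sample Z-test on a hypothetical problem: is the mean fill weight different from a 500 g target? The fill weights, the target and the assumption that σ = 4 g is known are all illustrative.

```python
from statistics import NormalDist, fmean


def one_sample_z_test(data, target, sigma, alpha=0.05):
    """Two-sided 1-sample Z-test of H0: mu == target vs Ha: mu != target (sigma known)."""
    n = len(data)
    z = (fmean(data) - target) / (sigma / n ** 0.5)  # step 6: test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))    # step 6: two-sided p-value
    decision = "reject H0" if p_value < alpha else "fail to reject H0"  # step 7
    return z, p_value, decision


# Steps 1-4 (hypothetical): H0: mu = 500 g, Ha: mu != 500 g, alpha = 0.05, sigma = 4 g known.
# Step 5: collect sample data (hypothetical fill weights in grams).
fills = [503, 498, 505, 501, 507, 499, 504, 502, 506, 500]
z, p, decision = one_sample_z_test(fills, target=500, sigma=4)
print(f"z = {z:.2f}, p = {p:.4f} -> {decision}")
```

Here the sample mean is 502.5 g and p is just under 0.05. This shows how a borderline result depends on the α chosen in step 4.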
Forming a Hypothesis
Null Hypothesis (H₀)
  No difference / no change
  Factor not statistically significant
  Population follows a normal distribution
Alternative Hypothesis (Hₐ)
  Difference / change occurred
  Factor statistically significant
  Population does not follow a normal distribution
Assume H₀ to be true until proven otherwise. The burden of proof rests with Hₐ.
α (Alpha): A Simplified Perspective
The null hypothesis (H₀) is assumed true, e.g., a defendant is assumed innocent.
The prosecuting attorney must provide evidence beyond reasonable doubt that this assumption is not true.
Reasonable doubt = α (significance level)
Alpha (α) and Beta (β) Risk
α-risk
  Risk of finding a difference when there really isn't one
  Type I error or producer's risk
β-risk
  Risk of not finding a difference when there really is one
  Type II error or consumer's risk
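Sensitivity
$\delta / \sigma$, where δ = size of the difference and σ = standard deviation
The relative magnitude or size of the difference being tested, expressed in standard deviations
Called test sensitivity
(Figure: two distributions with means μ₁ and μ₂ separated by δ/σ.)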
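Sensitivity δ/σ, sample size and α together determine β. This sketch computes power (1 − β) for a two-sided 1-sample Z-test using the standard normal-approximation formula. The choice of n = 20 and the δ/σ values are hypothetical.

```python
from statistics import NormalDist


def z_test_power(delta_over_sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided 1-sample Z-test against a true shift of delta/sigma."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = delta_over_sigma * n ** 0.5  # true shift measured in standard errors
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


for d in (0.25, 0.5, 1.0):
    print(f"delta/sigma = {d}: power with n = 20 is {z_test_power(d, 20):.2f}")
```

When δ/σ = 0 the "power" collapses to α, the Type I error rate. As δ/σ grows, β shrinks, so larger differences are easier to detect with the same sample size.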
The α-β Relationship in Hypothesis Testing
| Decision \ Truth | H₀ true | Hₐ true |
| Fail to reject H₀ | Correct decision (confidence level = 1 − α) | Type II error (β-risk or false negative; consumer's risk) |
| Reject H₀ | Type I error (α-risk or false positive; producer's risk) | Correct decision (power = 1 − β) |
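Test Statistic and p-value Graphical View
(Figure: sampling distribution centered at μ₀, showing the critical value with the α-risk area beyond it, and the observed value of the test statistic with the p-value area beyond it.)
If the p-value is smaller than α (equivalently, the observed test statistic lies beyond the critical value), reject H₀.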
Hypothesis Testing Introduction Objectives
By the end of this module participants should be able to:
Discuss the hypothesis testing process
Recognize α and β risks and how they affect hypothesis testing
Discuss how the p-value is used for decision making
Relate the hypothesis testing process to real-world examples
Comparison of Means: 4 Scenarios
1. Single Mean Comparison
   One sample vs. target, σ is known (μ vs. target value)
2. Single Mean Comparison
   One sample vs. target, σ is NOT known (μ vs. target value)
Comparison of Means: 4 Scenarios
3. Two Sample Comparison
   Two independent samples compared to each other (μ₁ vs. μ₂)
4. Paired Comparison
   The difference (δ) between two paired samples (μ_d vs. target)
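Scenario 4 reduces to a 1-sample test on the paired differences. This sketch computes that t statistic for hypothetical before/after cycle times on the same six parts. It only produces the statistic and its degrees of freedom. Judging significance still requires a t-table (or software) for n − 1 degrees of freedom.

```python
from statistics import fmean, stdev


def paired_t_statistic(before, after):
    """Paired comparison as a 1-sample t statistic on the differences d = after - before.

    Tests H0: mu_d = 0; compare |t| with a t-table value for n - 1 degrees of freedom.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = fmean(diffs) / (stdev(diffs) / n ** 0.5)
    return t, n - 1


# Hypothetical cycle times (minutes) for the same six parts before and after a change.
before = [12.1, 11.8, 12.5, 12.0, 12.3, 11.9]
after = [11.6, 11.5, 12.0, 11.8, 11.7, 11.6]
t, df = paired_t_statistic(before, after)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

Pairing removes part-to-part variation from the comparison. Running an independent 2-sample test on these data would ignore that structure.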
Hypothesis Testing of Means - Roadmap
Comparing Means
  1 factor
    1 sample, σ known: 1-sample Z-test
    1 sample, σ not known: 1-sample t-test
    2 samples, independent: 2-sample t-test
    2 samples, paired: paired t-test
    2 or more samples: one-way ANOVA
  2 factors: two-way ANOVA
  3 or more factors: ANOVA GLM
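The roadmap is a small decision tree, so it can be encoded as a lookup. The function name and arguments below are hypothetical. This is a teaching sketch that mirrors the branches of the roadmap, not a feature of MINITAB.

```python
def choose_means_test(factors, samples=1, sigma_known=False, paired=False):
    """Return the test the means roadmap points to (hypothetical helper)."""
    if factors >= 3:
        return "ANOVA GLM"
    if factors == 2:
        return "Two-way ANOVA"
    # One factor: branch on the number of samples.
    if samples == 1:
        return "1-sample Z-test" if sigma_known else "1-sample t-test"
    if samples == 2:
        return "Paired t-test" if paired else "2-sample t-test"
    return "One-way ANOVA"


print(choose_means_test(1, samples=2, paired=True))
```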
Means Hypothesis Testing Objectives
By the end of this module participants should be able to:
Choose the appropriate test for a given problem regarding a population mean
Perform hypothesis tests of the mean
Design and apply hypothesis tests of the mean on projects
Comparison of Variance: 3 Scenarios
1. Single Variance Comparison
   One population standard deviation compared to a target value (σ vs. target value)
2. Two Sample Comparison
   Variances of two independent populations compared to each other (σ₁² vs. σ₂²)
Comparison of Variance: 3 Scenarios
3. More than Two Sample Comparison
   Variances of more than two independent populations compared to each other (σ₁² vs. σ₂² vs. σ₃²)
Hypothesis Testing of Variation - Roadmap
Comparing Variances
  1 sample: 1 Variance test; Descriptive Statistics
  2 samples: 2 Variance test (F-test, Levene's test)
  More than 2 samples: Test for Equal Variance (Bartlett's test, Levene's test)
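The two simplest branches of this roadmap rest on short formulas. This sketch computes the 1 Variance chi-square statistic and the 2-sample F ratio for hypothetical measurements from two production lines, with a target σ of 0.2 assumed for illustration. The statistics are then compared with chi-square or F table values at the stated degrees of freedom. Both tests assume normally distributed data. Levene's test is the usual alternative when that assumption is doubtful.

```python
from statistics import variance


def one_variance_chi_square(data, target_sigma):
    """1 Variance test statistic: chi2 = (n - 1) * s^2 / sigma0^2, with n - 1 df."""
    n = len(data)
    return (n - 1) * variance(data) / target_sigma ** 2, n - 1


def two_variance_f(sample1, sample2):
    """2 Variance F statistic: F = s1^2 / s2^2, with (n1 - 1, n2 - 1) df."""
    return variance(sample1) / variance(sample2), (len(sample1) - 1, len(sample2) - 1)


# Hypothetical measurements from two production lines; target sigma assumed to be 0.2.
line_a = [4.1, 3.9, 4.3, 4.0, 4.2]
line_b = [4.4, 3.6, 4.5, 3.8, 4.2]

chi2, dof = one_variance_chi_square(line_a, target_sigma=0.2)
f_ratio, f_dof = two_variance_f(line_b, line_a)
print(f"1 Variance: chi2 = {chi2:.2f} on {dof} df")
print(f"2 Variance: F = {f_ratio:.2f} on {f_dof} df")
```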
Variation Hypothesis Testing Objectives
By the end of this module participants should be able to:
Choose the appropriate test of variance for a given problem
Perform hypothesis tests of variance
Design and apply hypothesis tests of variance on projects

Comparison of Proportion: 2 Scenarios
1. Single Proportion Comparison
   One population proportion compared to a target value (P vs. target value)
2. Two Sample Comparison
   Proportions of two independent populations compared to each other (P₁ vs. P₂)
Hypothesis Testing of Proportion - Roadmap
Comparing Proportions
  1 sample: 1 Proportion test
  2 samples: 2 Proportion test
  More than 2 samples: Chi-Square test
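For the 2-sample branch, here is one common textbook form of the test: a two-sided z-test that pools the two proportions under H₀. Software defaults (pooled versus unpooled estimates, exact methods) may differ. The defect counts from two shifts are hypothetical.

```python
from statistics import NormalDist


def two_proportion_z_test(defects1, n1, defects2, n2):
    """Two-sided test of H0: P1 == P2 using the pooled normal approximation."""
    p1, p2 = defects1 / n1, defects2 / n2
    pooled = (defects1 + defects2) / (n1 + n2)  # common proportion assumed under H0
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical: 30 defectives out of 400 on shift 1, 50 out of 400 on shift 2.
z, p = two_proportion_z_test(30, 400, 50, 400)
print(f"z = {z:.2f}, p = {p:.4f}")
```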
Proportion Hypothesis Testing Objectives
By the end of this module participants should be able to:
Choose the appropriate test of proportion for a given problem
Perform hypothesis tests of proportion
Determine sample size for 1 proportion and 2 proportion hypothesis testing
Design and apply hypothesis tests of proportion on projects
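For the sample-size objective, here is the normal-approximation formula for a two-sided 1-proportion test. It gives the n needed to detect a shift from a hypothesized p₀ to a true p₁ at a chosen α and power. Software that uses exact binomial methods may return a somewhat different n. The values p₀ = 0.50 and p₁ = 0.60 are hypothetical.

```python
import math
from statistics import NormalDist


def n_for_one_proportion(p0, p1, alpha=0.05, power=0.80):
    """Sample size for a two-sided 1-proportion test (normal approximation)
    to detect a shift from the hypothesized p0 to the true value p1."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    numerator = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))
    return math.ceil((numerator / (p1 - p0)) ** 2)  # round up to a whole sample


print(n_for_one_proportion(0.50, 0.60))  # 194
```

Detecting smaller shifts, or demanding higher power, drives n up quickly because the shift appears squared in the denominator.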
What Are Chi-Square Tools?
Chi-Square Goodness-of-Fit Test
  To test whether a particular distribution (model) is a good fit for a population
Chi-Square Test for Association
  To test whether a relationship between two attribute variables exists
Both of these tools use the chi-square distribution.
Chi-Square Statistic
$$\chi^2 = \sum_{j=1}^{g} \frac{\left(f_{o,j} - f_{e,j}\right)^2}{f_{e,j}}$$
where $f_o$ and $f_e$ are the observed and expected frequencies, respectively, summed over the g cells.
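In the test for association, each expected frequency is (row total × column total) / grand total. The statistic above is then summed over every cell. This sketch uses hypothetical pass/fail counts for three suppliers. The result is compared with a chi-square table value for (rows − 1) × (columns − 1) degrees of freedom.

```python
def chi_square_association(table):
    """Chi-square statistic for a two-way table of observed counts f_o.

    Expected count f_e = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / grand
            chi2 += (f_o - f_e) ** 2 / f_e
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Hypothetical counts: rows = suppliers A, B, C; columns = pass, fail.
observed = [[40, 10], [35, 15], [25, 25]]
chi2, dof = chi_square_association(observed)
print(f"chi2 = {chi2:.2f} on {dof} degrees of freedom")
```

Every expected count in this table is at least 5, which satisfies the usual rule of thumb for the approximation.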
The Chi-Square Distribution
Measure of the difference between observed counts and expected counts
Observations must be independent
Works best with 5 or more expected observations in each cell
Cells may be combined to pool observations
(Figure: chi-square distribution for various degrees of freedom (n = 2, 4, 6, 10); x-axis: χ² value, from 0 to about 20; y-axis: value of the χ² distribution, from 0 to 0.5.)
Chi-Square Hypothesis Testing Objectives
By the end of this module the participants should be able to:
Formulate appropriate hypotheses for Chi-Square tests
Apply the Chi-Square Goodness-of-Fit Test to practical problems
Apply the Chi-Square Test for Association to practical problems
What is ANOVA?
Hypothesis test for MEANS
Uses two components of variance:
  Within variance (no change)
  Between variance (after a change)
Uses the F-distribution to test the variance components
Comprehensive test for significance
Backbone test statistic for subsequent complex analysis
When to Use ANOVA
Variables Road Map (variables data)
  1 mean (1 sample): 1-sample t-test
  2 means (2 samples): 2-sample t-test, paired comparisons, Tukey's quick test
  2+ means (2 or more samples): ANOVA
ANOVA is used to test two or more means.
Working With the ANOVA Data
ANOVA data analysis will determine:
  Total process variance
  Within-factor variance
    Variation due to noise
    Technology focus
  Between-factor variance
    Variation due to factor change
    Process focus
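The partition of total variation into within-factor (noise) and between-factor (factor change) parts can be computed directly. This sketch is a one-way ANOVA on hypothetical readings from three machines with four parts each. The F ratio is then compared with an F-table value for the returned degrees of freedom.

```python
from statistics import fmean


def one_way_anova(groups):
    """Split total variation into between-group (factor) and within-group (noise) parts."""
    values = [x for g in groups for x in g]
    grand_mean = fmean(values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {"SS_between": ss_between, "SS_within": ss_within,
            "df": (df_between, df_within), "F": f_ratio}


# Hypothetical output readings from three machines (the factor), four parts each.
machines = [[20, 22, 21, 23], [25, 27, 26, 26], [21, 20, 22, 21]]
result = one_way_anova(machines)
print(result)
```

SS_between + SS_within equals the total sum of squares around the grand mean. A large F says the between-machine variation is large relative to the noise.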
ANOVA Objectives
By the end of this module, the participant should be able to:
Explain how ANOVA works
Interpret an ANOVA table
Determine significant effects
Perform a residual analysis
Determine if data is normal
Test groups of data for equal variances
Run main effects plots
What is a Variation Component Study?
A variation component study combines techniques from familiar areas:
  Shewhart control chart model
  Rational subgrouping
  Measurement systems analysis
  Graphical multi-vari charts
  Analysis of variance (ANOVA) methods
This type of study partitions the potential sources of variation within a process so the researcher knows where to work first.
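Quantifying the components is one step beyond the F test. This sketch uses the standard balanced one-way random-effects estimates. The within component is MS_within. The between component is (MS_between − MS_within) / n, truncated at zero. It assumes a balanced design and that the groups (here, machines) are a random sample. The data are hypothetical.

```python
from statistics import fmean, variance


def variance_components(groups):
    """Balanced one-way random-effects estimates of the variance components.

    within  = MS_within (pooled within-group variance)
    between = (MS_between - MS_within) / n, truncated at zero
    """
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    ms_within = fmean(variance(g) for g in groups)
    ms_between = n * variance([fmean(g) for g in groups])
    between = max((ms_between - ms_within) / n, 0.0)
    total = between + ms_within
    return {"between": between, "within": ms_within,
            "pct_between": 100 * between / total}


# Hypothetical: 3 randomly chosen machines, 4 parts measured on each.
machines = [[20, 22, 21, 23], [25, 27, 26, 26], [21, 20, 22, 21]]
comps = variance_components(machines)
print(comps)
```

Here most of the variance sits between machines, which suggests where to work first.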
Crossed Versus Nested Studies
Crossed study: subjects are not unique to one group
  Group 1: Subjects 1, 2, 3
  Group 2: Subjects 1, 2, 3
  ...
  Group k: Subjects 1, 2, 3
Nested study: subjects are unique to one group
  Group 1: Subjects 1, 2, 3
  Group 2: Subjects 4, 5, 6
  ...
  Group k: Subjects 16, 17, 18
Variation Component Studies Objectives
By the end of this module participants should be able to:
Design appropriate sampling plans for variation component studies
Recognize whether data is crossed, nested or both, and model the scenarios using ANOVA
Analyze studies
  Graphically
  With control charts
  Using ANOVA methods
Provide estimates of variation components (quantify)
Provide guidance/direction for process improvement


Correlation Coefficient
(Figure: three X-Y scatter plots illustrating r = -1.0 (perfect negative correlation), r = +1.0 (perfect positive correlation) and r = 0.0 (no correlation).)
Correlation and Regression
Correlation measures how much linear association exists between two variables
Regression provides an equation describing the nature of the relationship
Correlations: Shelf Space, Sales
Pearson correlation of Shelf Space and Sales = 0.978
p-value = 0.000
Regression Analysis: Sales versus Shelf Space
The regression equation is Sales = - 4711 + 10.1 Shelf Space
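Pearson's r, as reported in the output above, is the covariance of x and y divided by the product of their spreads. This sketch computes it from scratch on hypothetical data that mimic the three scatter-plot patterns. Python 3.10+ also provides `statistics.correlation`, which returns the same coefficient.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [10, 15, 20, 25, 30]                 # hypothetical X values
rising = [2 * v + 5 for v in x]          # perfectly linear, increasing
falling = [100 - 2 * v for v in x]       # perfectly linear, decreasing
flat = [73, 75, 71, 75, 73]              # no linear trend
for label, y in (("rising", rising), ("falling", falling), ("flat", flat)):
    print(label, round(pearson_r(x, y), 3))
```

r measures only linear association. A strong curved relationship can still give r near 0.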
Types of Regression
Simple Linear Regression
  Single regressor (x) variable, such as x₁, and a model linear with respect to the coefficients
Multiple Linear Regression
  Multiple regressor (x) variables, such as x₁, x₂, x₃, and a model linear with respect to the coefficients
Simple Non-Linear Regression
  Single regressor (x) variable, such as x, and a model non-linear with respect to the coefficients
Multiple Non-Linear Regression
  Multiple regressor (x) variables, such as x₁, x₂, x₃, and a model non-linear with respect to the coefficients
Method of Least Squares
Objective: find the line that minimizes the sum of squares of the residuals.
Residual = $Y - \hat{Y}$; residuals are the errors of prediction.
(Figure: regression plot of Sales (about 1000 to 2000) versus Shelf Space (about 550 to 650), showing the fitted regression line and the residuals.)
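For a single predictor, the least-squares line has a closed form. The slope is S_xy / S_xx, and the intercept is ȳ − slope · x̄. This sketch fits it to hypothetical shelf-space/sales pairs, which are illustrative and not the data behind the slide. It also shows that the residuals of a least-squares fit with an intercept sum to zero.

```python
from statistics import fmean


def least_squares_line(x, y):
    """Intercept and slope that minimize the sum of squared residuals (Y - Y_hat)^2."""
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical shelf space (x) and sales (y) observations.
shelf = [550, 575, 600, 625, 650]
sales = [900, 1200, 1350, 1600, 1950]
b0, b1 = least_squares_line(shelf, sales)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(shelf, sales)]
print(f"Sales = {b0:.1f} + {b1:.2f} Shelf Space")
print("sum of residuals:", round(sum(residuals), 6))
```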
Correlation and Simple Regression Objectives
By the end of this module the participant should be able to:
Measure the strength of correlation between two variables
Determine if a correlation coefficient is statistically significant
Perform simple linear regression including polynomial regression
Perform model diagnostics and validate assumptions
Use a regression model to predict the value of a response variable for a given value of the predictor


What is Multiple Regression?
A procedure for establishing the relationship between a continuous response variable and two or more independent variables
The multiple regression equation can be used to predict a response based on the values of the predictor variables
The multiple regression equation takes the form
$Y = f(x_1, x_2, x_3, \ldots)$

Types of Multiple Regression
Multiple Linear Regression
  Multiple regressor (x) variables, such as x₁, x₂, x₃, and a model linear with respect to the coefficients
Multiple Non-Linear Regression
  Multiple regressor (x) variables, such as x₁, x₂, x₃, and a model non-linear with respect to the coefficients
This module focuses on multiple linear regression using the general least squares method.
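General least squares for a linear model amounts to solving for the coefficient vector that minimizes the squared residuals of a design matrix. This sketch uses NumPy's `numpy.linalg.lstsq` on simulated data. The true coefficients (2.0, 1.5, -0.8, 0.3) and the noise level are hypothetical choices, so the fitted values can be checked against known answers.

```python
import numpy as np


def fit_multiple_linear(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 3))  # hypothetical predictors x1, x2, x3
y = 2.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.1, size=50)
coef = fit_multiple_linear(X, y)
print(np.round(coef, 2))
```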

Predictor Variable Selection
What combination of predictor variables is best for the regression model?
Three options in MINITAB:
  Stepwise: a procedure that adds and removes variables in the regression model to produce a useful subset of predictors
  Best Subsets: a procedure that identifies the best-fitting regression models that can be constructed with one predictor, two predictors, three predictors, and so on
  Regression: once the best model is selected, use Regression to get more detailed diagnostics
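The idea behind Best Subsets can be sketched as an exhaustive search. For each model size, fit every combination of predictors and keep the one with the highest R². This illustration is hypothetical and not MINITAB's implementation. It ranks only by R²; comparing models of different sizes normally calls for criteria such as adjusted R². The simulated response depends on x1 and x3 only.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y):
    """R-squared of a least-squares fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)


def best_subsets(X, y, names):
    """For each model size, return the predictor subset with the highest R-squared."""
    best = {}
    for size in range(1, X.shape[1] + 1):
        scored = ((r_squared(X[:, list(cols)], y), cols)
                  for cols in combinations(range(X.shape[1]), size))
        r2, cols = max(scored)
        best[size] = ([names[c] for c in cols], round(float(r2), 3))
    return best


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))  # hypothetical predictors x1, x2, x3
y = 3 * X[:, 0] + 2 * X[:, 2] + rng.normal(0, 0.5, size=40)
result = best_subsets(X, y, ["x1", "x2", "x3"])
for size, (chosen, r2) in result.items():
    print(size, chosen, r2)
```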
Multiple Regression Objectives
By the end of this module participants should be able to:
Determine, for a given response variable, the key process input variables from a set of multiple input variables
Perform multiple linear regression for a given response variable using several input variables
Perform model diagnostics and validate assumptions
Use a regression model to predict the value of a response variable for given values of the predictor variables


Analyze Phase Deliverables
Prepare and deliver a 10-minute presentation that discusses the following project status items:
Week 1 deliverables summarized and updated
Revised problem statement reflecting an increased understanding of the problem
Detailed process map revised
Additional sources of variation quantified and prioritized
Use and display data to identify and verify the vital few factors
  Sampling plan
  Graphical analysis and interpretation of data
  Correlation and regression analysis
  Confidence interval for Y metric(s)
  Hypothesis statement(s), null hypothesis and alternative hypothesis
  MINITAB hypothesis test output, p-value and interpretation
Project management report (Gantt chart, timelines, milestones, critical path)
Any red flags with the project or project scope, and recommendations to resolve them
Next steps
Signed approval of the report-out by the Project Champion
Appendix
Non-Parametric Hypothesis Testing Roadmap
Non-Parametric Tests
  Binomial (dichotomous)
  Independent samples: Mann-Whitney U (t-test analog)
  Dependent samples: Wilcoxon signed-rank (paired t-test analog)
  3 or more levels, independent: Kruskal-Wallis H (one-way ANOVA analog)
  3 or more levels, dependent: Friedman two-way ANOVA (repeated-measures ANOVA analog)
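The Mann-Whitney U statistic works on ranks instead of raw values, which is why it needs no normality assumption. This sketch pools both samples, ranks them with tied values sharing their average rank, and computes U from the rank sum of the first sample. It returns the smaller of U₁ and U₂, which is compared with a table critical value. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..N, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based; ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Smaller of U1 and U2 for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)


print(mann_whitney_u([1, 2, 2], [2, 3, 4]))
```

U₁ counts how often a value from the first sample exceeds one from the second, with ties counted as one half. Small U values therefore indicate that one sample tends to lie above the other.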
Trademarks and Service Marks
Six Sigma is a federally registered trademark of Motorola, Inc.
Breakthrough Strategy is a federally registered trademark of Six Sigma Academy.
VISION. FOR A MORE PERFECT WORLD is a federally registered trademark of Six Sigma Academy.
ESSENTEQ is a trademark of Six Sigma Academy.
FASTART is a trademark of Six Sigma Academy.
Breakthrough Design is a trademark of Six Sigma Academy.
Breakthrough Lean is a trademark of Six Sigma Academy.
Design with the Power of Six Sigma is a trademark of Six Sigma Academy.
Legal Lean is a trademark of Six Sigma Academy.
SSA Navigator is a trademark of Six Sigma Academy.
SigmaCALC is a trademark of Six Sigma Academy.
iGrafx is a trademark of Micrografx, Inc.
SigmaTRAC is a trademark of DuPont.
MINITAB is a trademark of Minitab, Inc.
