
1

INSE 6220 -- Week 12


Advanced Statistical Approaches to Quality

Acceptance Sampling
Final Exam Review

Dr. A. Ben Hamza, Concordia University


2

Acceptance Sampling
Why Acceptance Sampling and not 100% Inspection?
Testing is destructive
The cost of 100% inspection is high
100% inspection is not feasible (it requires too much time)
The vendor has an excellent quality history
Advantages and Disadvantages of Sampling
Advantages
Less expensive
Reduced damage
Reduces the amount of inspection error
Disadvantages
Risk of accepting bad lots and rejecting good lots
Less information generated
Requires planning and documentation
3

Acceptance Sampling
Problem:
A lot (shipment) is received.
A sample is taken from the lot.
Some quality characteristic of the units in the sample is inspected.
On the basis of this inspection information, the lot is sentenced: accepted or
rejected.

Types of sampling plans
Classification by data type: variables and attributes
Classification by the number of samples required for a decision:
Single-sampling plans
Double-sampling plans
Multiple-sampling plans
Sequential-sampling plans
4

Lot formation
Lots should be homogeneous.
Larger lots are preferred over smaller ones.
Lots should be conformable to materials-handling systems
used in both supplier and consumer facilities.
Random Sampling: the units selected for inspection should be chosen at random, so that the sample is representative of the lot.

Single Sampling plan


A lot of size N is submitted for inspection.
Single sampling plan defines:
Sample size, n
Acceptance number, c
Operating Characteristic (OC) Curve
Measures the performance of a sampling plan.
The OC curve plots the probability of accepting the lot versus the lot fraction defective.
It shows the probability that a submitted lot with a given fraction defective will be accepted or rejected.
5

OC Curve
If the lot size N is large, the number of defectives d in a random sample of size n follows a binomial distribution with parameters n and p:

$$P(d = k) = \frac{n!}{k!(n-k)!}\, p^k (1-p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n$$

The probability of acceptance is $P(d \le c)$:

$$P(\text{Accept lot}) = P_a(p) = P(d \le c) = \sum_{k=0}^{c} \frac{n!}{k!(n-k)!}\, p^k (1-p)^{n-k}$$

The OC curve plots the probability of accepting the lot, $P_a$, versus the lot fraction defective, $p$ (the true proportion nonconforming).

Example: An apple producer has 500 baskets of apples, each containing 20 apples (10,000 apples in the lot). A buyer inspects 10 of the apples and accepts the lot if 2 or fewer are bruised. That is, n = 10, N = 10,000, c = 2. Suppose 20% of the apples are bruised; what is the probability of accepting such a lot?
Solution: Let d be the number of bruised apples in the sample. Calculate

$$P_a(0.2) = P(d \le 2) = \sum_{k=0}^{2} \binom{10}{k} (0.2)^k (0.8)^{10-k} \approx 0.678$$
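A quick MATLAB check of this result (a minimal sketch, assuming the Statistics Toolbox function binocdf is available):

>> n = 10; c = 2; p = 0.2;
>> Pa = binocdf(c, n, p)    % P(d <= 2) for d ~ Binomial(10, 0.2); returns 0.6778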
6

OC Curve: the OC curve plot for n = 10 and c = 2

$$P_a = P(d \le c) = \sum_{k=0}^{c} \frac{n!}{k!(n-k)!}\, p^k (1-p)^{n-k}$$

>> p = 0:.001:1;
>> c = 2; n = 10;
>> plot(p, binocdf(c, n, p));   % P(d <= c) as a function of p
>> xlabel('p'); ylabel('Probability of Acceptance');
>> title('OC curve for n=10 and c=2');

[Figure: OC curve for n = 10 and c = 2; the probability of acceptance decreases from 1 at p = 0 toward 0 as p grows.]
7

Operating Characteristic (OC) Curve


N = 10,000, n = 89, c = 2

p (lot fraction defective)   Pa = P(accepting the lot)
0.005                        0.9897
0.010                        0.9397
0.015                        0.8502
0.020                        0.7366
0.025                        0.6153
0.030                        0.4985
0.035                        0.3936
8

Effect of n and c on OC curve


The ideal OC curve places high probability on accepting good lots (p close to 0) and low probability on accepting bad lots.

n: the higher the sample size, the closer the OC curve approaches the ideal.

c: the lower the acceptance number, the tighter the plan.

Changing the acceptance number c alone does not significantly change the shape of the OC curve.
9

Main Points for Final Exam


What to Study
Some topics are more important than others.
Spend your time on the right material.
Don't waste time on topics we haven't emphasized in class.
How to Prepare for the Final Exam
Focus on the main topics (Lectures 8 to 12).
Make a list of your problem areas.
Eliminate any problems not mentioned on the lecture slides.
Keep the class notes as a guideline.
Redo the examples in the lecture notes.
Work on relevant exercises at the end of the chapters.
Bring a calculator on the day of the exam.
Final Exam Coverage: Lectures 8 to 12 and Assignment 2
10

Principal Component Analysis (PCA)

Steps in PCA: #1 Calculate the adjusted data set

Subtract the mean from each dimension (column):

$$X_{ij} = \text{Data}_{ij} - M_j, \qquad M_j = \frac{1}{n} \sum_{k=1}^{n} \text{Data}_{kj}$$

where i indexes the observations (rows, i = 1, ..., n), j indexes the dimensions (columns, j = 1, ..., p), and M_j is the column-wise mean of dimension j. The adjusted data set X is the n × p data set minus the matrix of column means M.
11

Principal Component Analysis (PCA)

Steps in PCA: #2 Calculate the covariance matrix S from the adjusted data set X

$$S_{ij} = \mathrm{cov}(X_i, X_j)$$

Note: since the means of the dimensions in the adjusted data set X are 0, the p × p covariance matrix can simply be written as

$$S = \frac{X^T X}{n-1}$$
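A minimal MATLAB sketch of steps #1 and #2, using a small hypothetical data set (the numbers are illustrative only):

>> Data = [2.5 2.4; 0.5 0.7; 2.2 2.9; 1.9 2.2; 3.1 3.0];  % hypothetical 5 x 2 data set
>> n = size(Data, 1);
>> X = Data - repmat(mean(Data), n, 1);   % step #1: subtract the column-wise means M
>> S = (X' * X) / (n - 1);                % step #2: covariance of the adjusted data
>> norm(S - cov(Data))                    % ~0: agrees with the built-in covariance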


12

Principal Component Analysis (PCA)


Steps in PCA: #3 Calculate the eigenvectors and eigenvalues of S

[Figure: eigendecomposition of a matrix A into its eigenvalues and eigenvectors.]

If some eigenvalues are 0 or very small, we can essentially discard those eigenvalues and the corresponding eigenvectors, hence reducing the dimensionality of the new basis.

The eigenvalues λ_j are used to calculate the percentage of total variance v_j explained by each component j:

$$v_j = \frac{\lambda_j}{\sum_{i=1}^{p} \lambda_i} \times 100\%$$
13

Principal Component Analysis (PCA)


Steps in PCA: #4 Transform the data set to the new basis

$$Z = XA$$

where:
Z is the transformed data set
A is the matrix whose columns are the eigenvectors of S
X is the adjusted data set

Note that if eigenvectors were discarded in step #3, the dimension of the new data set Z is less than that of X.

To recover X from Z: A is orthogonal, therefore A^{-1} = A^T, so

$$X = ZA^{-1} = ZA^T, \qquad x_i = Az_i = \sum_{k=1}^{p} z_{ik}\, a_k$$
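Continuing the sketch from step #2, steps #3 and #4 can be carried out as follows (a minimal illustration; X and S are the matrices computed above):

>> [A, D] = eig(S);                           % step #3: columns of A are eigenvectors of S
>> [lambda, idx] = sort(diag(D), 'descend');  % order components by decreasing eigenvalue
>> A = A(:, idx);
>> v = 100 * lambda / sum(lambda)             % percent of total variance per component
>> Z = X * A;                                 % step #4: transform to the new basis
>> Xrec = Z * A';                             % recover X: A is orthogonal, so inv(A) = A'
>> norm(X - Xrec)                             % ~0 up to round-off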


Example 14

Consider the eigenvector matrix.

[Figures: scatter plot of the PC2 vs. PC1 coefficients, and the corresponding biplot.]
15

Introduction to Regression Analysis


Regression analysis is used to:
Predict the value of a dependent variable based on the value of at least one
independent variable
Explain the impact of changes in an independent variable on the dependent
variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to predict or explain the dependent variable

The simple linear regression model:

$$Y = \beta_0 + \beta_1 X + \epsilon$$

where Y is the dependent variable, β0 is the population Y intercept, β1 is the population slope coefficient, X is the independent variable, and ε is the random error term; β0 + β1X is the linear component and ε is the random error component.

16

Example
Car Plant Electricity Usage
The manager of a car plant wishes to investigate how the plant's electricity usage depends upon the plant's production. The linear model y = β0 + β1x will allow a month's electrical usage to be estimated as a function of the month's production.
17

Linear Regression Model: Least Squares Estimation


The least squares criterion minimizes the sum of squared errors

$$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

The least squares estimators of β0 and β1, say β̂0 and β̂1, must satisfy

$$\frac{\partial L}{\partial \beta_0}\bigg|_{\hat\beta_0,\hat\beta_1} = -2\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0$$

$$\frac{\partial L}{\partial \beta_1}\bigg|_{\hat\beta_0,\hat\beta_1} = -2\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)\, x_i = 0$$

Simplifying these two equations yields the least squares normal equations

$$n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \qquad \hat\beta_0 \sum_{i=1}^{n} x_i + \hat\beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i x_i$$

The solution to the normal equations gives the least squares estimates

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x, \qquad \hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n} x_i y_i - n\bar x \bar y}{\sum_{i=1}^{n} x_i^2 - n(\bar x)^2}$$

with standard errors

$$\mathrm{s.e.}(\hat\beta_0) = \hat\sigma\sqrt{\frac{1}{n} + \frac{(\bar x)^2}{S_{xx}}}, \qquad \mathrm{s.e.}(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{S_{xx}}}$$
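A minimal MATLAB sketch of these estimators on a small hypothetical data set (the numbers are illustrative only):

>> x = [1 2 3 4 5]; y = [1.2 1.9 3.2 3.8 5.1];   % hypothetical data
>> n = length(x);
>> Sxx = sum(x.^2) - n * mean(x)^2;
>> Sxy = sum(x .* y) - n * mean(x) * mean(y);
>> b1 = Sxy / Sxx                                % slope estimate
>> b0 = mean(y) - b1 * mean(x)                   % intercept estimate
>> polyfit(x, y, 1)                              % built-in check: returns [b1 b0]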
18

Linear Regression and Analysis of Variance


The total sum of squares of the observed y values is a measure of the total variability in the response:

$$\sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 + \sum_{i=1}^{n} (y_i - \hat y_i)^2$$

$$SS_T = SS_R + SS_E$$

And the sample correlation coefficient is given by:

$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \hat\beta_1 \sqrt{\frac{S_{xx}}{SS_T}}$$

Analysis of Variance for Testing Significance of Regression

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F0
Regression            SS_R             1                    MS_R          MS_R / MS_E
Error or residual     SS_E             n − 2                MS_E
Total                 SS_T             n − 1

$$SS_R = \hat\beta_1 S_{xy} = \frac{S_{xy}^2}{S_{xx}}, \qquad SS_T = S_{yy} = \sum_{i=1}^{n} (y_i - \bar y)^2, \qquad SS_E = SS_T - SS_R$$
Example: Car Plant Electricity Usage
19

The estimates of the slope parameter and the intercept parameter:

$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = 0.49883$$

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x = \frac{34.15}{12} - (0.49883)\,\frac{58.62}{12} = 0.4090$$

The fitted regression line:

$$\hat y = \hat\beta_0 + \hat\beta_1 x = 0.409 + 0.499x$$

The estimate of the error variance is

$$\hat\sigma^2 = \frac{SS_E}{n-2} = 0.0299, \qquad \hat\sigma = \sqrt{0.0299} = 0.1729$$
20

Coefficient of Determination R2
The total variability in the dependent variable, the total sum of squares

$$SS_T = \sum_{i=1}^{n} (y_i - \bar y)^2$$

can be partitioned into the variability explained by the regression line, the regression sum of squares

$$SS_R = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 = \frac{S_{xy}^2}{S_{xx}} = \hat\beta_1 S_{xy}$$

and the variability about the regression line, the error sum of squares

$$SS_E = \sum_{i=1}^{n} (y_i - \hat y_i)^2$$

The proportion of the total variability accounted for by the regression line is the coefficient of determination

$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_E}{SS_T}$$

which takes a value between zero and one.

Car Plant Electricity Usage: $R^2 = \dfrac{SS_R}{SS_T} = \dfrac{1.2124}{1.5115} = 0.802$
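These quantities can be verified quickly in MATLAB from the numbers reported above (a minimal sketch):

>> SST = 1.5115; SSR = 1.2124; n = 12;   % car plant example values
>> SSE = SST - SSR                       % 0.2991
>> R2 = SSR / SST                        % 0.802
>> sigma2 = SSE / (n - 2)                % 0.0299, the error variance estimate
>> sigma = sqrt(sigma2)                  % 0.1729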
21

Confidence Interval: Inferences on the Slope Parameter β1

A 100(1 − α)% confidence interval for the slope β1 is

$$\hat\beta_1 \pm t_{\alpha/2,\, n-2}\; \mathrm{s.e.}(\hat\beta_1), \qquad \mathrm{s.e.}(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{S_{xx}}}$$
22
Example 23
Example 24
ONE-WAY ANOVA
25

ANOVA model:
Let y_ij be a random variable denoting the j-th observation taken under the i-th treatment. An ANOVA is based on the following linear model:

$$y_{ij} = \mu + \tau_i + \epsilon_{ij} = \mu_i + \epsilon_{ij}$$

ANOVA assumptions:

$$\epsilon_{ij} \sim N(0, \sigma^2) \quad \forall i$$

In words: we assume that the errors (residuals) come from a normal distribution with mean zero and constant variance (unaffected by treatments).

ANOVA hypotheses (fixed effects model):
An ANOVA tests the following hypotheses:

$$H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$$
$$H_1: \text{at least one } \tau_i \neq 0$$

which is the same as:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_a$$
$$H_1: \text{at least one pair of means differs}$$
26

Let y_i· represent the total of the observations under the i-th treatment and ȳ_i· represent the average of the observations under the i-th treatment. Similarly, let y_·· represent the grand total of all observations and ȳ_·· represent the grand mean of all observations. Expressed mathematically:

$$y_{i\cdot} = \sum_{j=1}^{n} y_{ij}, \qquad \bar y_{i\cdot} = y_{i\cdot}/n, \qquad i = 1, 2, \ldots, a$$

$$y_{\cdot\cdot} = \sum_{i=1}^{a} \sum_{j=1}^{n} y_{ij}, \qquad \bar y_{\cdot\cdot} = y_{\cdot\cdot}/N$$

where N = an is the total number of observations. Thus, the dot subscript notation implies summation over the subscript that it replaces.

We are interested in testing the equality of the a treatment means μ1, μ2, ..., μa. This is equivalent to testing the hypotheses

$$H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$$
$$H_1: \text{at least one } \tau_i \neq 0$$

Thus, if the null hypothesis is true, each observation consists of the overall mean μ plus a realization of the random error component ε_ij. This is equivalent to saying that all N observations are taken from a normal distribution with mean μ and variance σ². Therefore, if the null hypothesis is true, changing the levels of the factor has no effect on the mean response.
ONE-WAY ANOVA
27

In a one-way ANOVA, the overall variance is decomposed into two components:

a) Variation BETWEEN samples
b) Variation WITHIN samples

We factorize as follows (y_ij is an observation of Y_ij):

$$y_{ij} - \bar y_{\cdot\cdot} = (\bar y_{i\cdot} - \bar y_{\cdot\cdot}) + (y_{ij} - \bar y_{i\cdot})$$
OVERALL = BETWEEN + WITHIN

Then it is fairly easy to show that (see the literature)

$$\sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar y_{\cdot\cdot})^2 = n \sum_{i=1}^{a} (\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 + \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar y_{i\cdot})^2$$
TOTAL = BETWEEN + WITHIN

The shorthand for the above is: SS_T = SS_Treatments + SS_E

The total number of observations is N = an. It can also be shown that the degrees of freedom decompose accordingly:

TOTAL: an − 1    BETWEEN: a − 1    WITHIN: a(n − 1)
28

ONE-WAY ANOVA
The previous sums of squares are totals within their respective categories. Using the degrees of freedom, the between and within sums of squares can be converted to mean squares as follows:

$$MS_{Treatments} = \frac{SS_{Treatments}}{a-1}, \qquad MS_E = \frac{SS_E}{a(n-1)}$$

MS_E is ALWAYS an unbiased estimate of σ², because E(MS_E) = σ².

Note: in order to estimate SS_E, n must be greater than 1 (otherwise the degrees of freedom would be zero, resulting in division by zero when we try to calculate MS_E). So to estimate the residual sum of squares, we must replicate the experiment!

MS_Treatments is an unbiased estimate of σ² ONLY IF the null hypothesis is true. If not, it is BIASED, because

$$E(MS_{Treatments}) = \sigma^2 + \frac{n \sum_{i=1}^{a} \tau_i^2}{a-1}$$

If H0 is true, both of the above are unbiased estimators of the same population variance, and therefore their ratio has an F-distribution. In symbols:

$$\frac{MS_{Treatments}}{MS_E} \sim F \text{ with } a-1 \text{ and } a(n-1) \text{ degrees of freedom}$$
29

The expected value of the treatment sum of squares is

$$E(SS_{Treatments}) = (a-1)\sigma^2 + n \sum_{i=1}^{a} \tau_i^2$$

Now if the null hypothesis is true, each τ_i is equal to zero and

$$E\left(\frac{SS_{Treatments}}{a-1}\right) = \sigma^2$$

If the alternative hypothesis is true, then

$$E\left(\frac{SS_{Treatments}}{a-1}\right) = \sigma^2 + \frac{n \sum_{i=1}^{a} \tau_i^2}{a-1}$$

The ratio MS_Treatments = SS_Treatments/(a − 1) is called the mean square for treatments. Thus, if H0 is true, MS_Treatments is an unbiased estimator of σ², whereas if H1 is true, MS_Treatments estimates σ² plus a positive term that incorporates variation due to the systematic difference in treatment means (refer to the textbook for more details).
30

We can also show that the expected value of the error sum of squares is E(SS_E) = a(n − 1)σ². Therefore, the error mean square

$$MS_E = \frac{SS_E}{a(n-1)}$$

is an unbiased estimator of σ² regardless of whether or not H0 is true.
31

Efficient computing formulas for the sums of squares may be obtained by expanding and simplifying the definitions of SS_Treatments and SS_T. This yields:

Definition
The sums of squares computing formulas for the analysis of variance with equal sample sizes in each treatment are

$$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{n} y_{ij}^2 - \frac{y_{\cdot\cdot}^2}{N}$$

and

$$SS_{Treatments} = \sum_{i=1}^{a} \frac{y_{i\cdot}^2}{n} - \frac{y_{\cdot\cdot}^2}{N}$$

The error sum of squares is obtained by subtraction as

$$SS_E = SS_T - SS_{Treatments}$$

The computations for this test procedure are usually summarized in tabular form as shown below. This is called an analysis of variance (or ANOVA) table.

Source of Variation   Sum of Squares    Degrees of Freedom   Mean Square      F0
Treatments            SS_Treatments     a − 1                MS_Treatments    MS_Treatments / MS_E
Error                 SS_E              a(n − 1)             MS_E
Total                 SS_T              an − 1
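A minimal MATLAB sketch of these computing formulas on a small hypothetical data set (rows are treatments, columns are replicates; the numbers are illustrative only):

>> y = [10 12 11 9; 14 15 13 16; 9 8 10 11];         % hypothetical: a = 3 treatments, n = 4 replicates
>> [a, n] = size(y); N = a * n;
>> SST  = sum(y(:).^2) - sum(y(:))^2 / N;
>> SSTr = sum(sum(y, 2).^2) / n - sum(y(:))^2 / N;   % from the treatment totals y_i.
>> SSE  = SST - SSTr;
>> F0 = (SSTr / (a - 1)) / (SSE / (a * (n - 1)))
>> p  = 1 - fcdf(F0, a - 1, a * (n - 1))             % reject H0 when p < alpha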
Checking Assumptions: Residual Analysis
32

The residual is $e_{ij} = y_{ij} - \bar y_{i\cdot}$; that is, the difference between an observation and the corresponding factor-level (treatment) average. The residuals for the hardwood percentage experiment are shown in Table 3-8 of the textbook.
33

Design of Experiments: Statistical Analysis


The analysis of variance (ANOVA) can be extended to handle the two-factor factorial experiment. Let the two factors be denoted A and B, with a levels of factor A and b levels of factor B. The experiment is replicated n times. The model is

$$y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \epsilon_{ijk}, \qquad i = 1, 2, \ldots, a; \quad j = 1, 2, \ldots, b; \quad k = 1, 2, \ldots, n$$
34

Statistical Analysis
The corresponding degrees of freedom decomposition is

$$abn - 1 = (a-1) + (b-1) + (a-1)(b-1) + ab(n-1)$$

This decomposition is usually summarized in an analysis of variance table such as the one shown in Table 12-3.
35

Statistical Analysis
The ANOVA is usually performed with computer software, although simple computing formulas for the sums of squares may be obtained easily. The computing formulas for these sums of squares follow.

$$SS_T = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} y_{ijk}^2 - \frac{y_{\cdot\cdot\cdot}^2}{abn}$$

Main effects:

$$SS_A = \sum_{i=1}^{a} \frac{y_{i\cdot\cdot}^2}{bn} - \frac{y_{\cdot\cdot\cdot}^2}{abn}, \qquad SS_B = \sum_{j=1}^{b} \frac{y_{\cdot j\cdot}^2}{an} - \frac{y_{\cdot\cdot\cdot}^2}{abn}$$

Interaction:

$$SS_{AB} = \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{y_{ij\cdot}^2}{n} - \frac{y_{\cdot\cdot\cdot}^2}{abn} - SS_A - SS_B$$

Error:

$$SS_E = SS_T - SS_A - SS_B - SS_{AB}$$
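A minimal MATLAB sketch of these computing formulas on a tiny hypothetical data set stored as y(i, j, k) (the numbers are illustrative only):

>> y = cat(3, [10 14; 12 16], [11 15; 13 17]);   % hypothetical a = 2, b = 2, n = 2 data
>> [a, b, n] = size(y); T = sum(y(:));
>> C = T^2 / (a*b*n);                            % correction term y...^2/(abn)
>> SST  = sum(y(:).^2) - C;
>> SSA  = sum(sum(sum(y, 3), 2).^2) / (b*n) - C; % from the y_i.. totals
>> SSB  = sum(sum(sum(y, 3), 1).^2) / (a*n) - C; % from the y_.j. totals
>> yij  = sum(y, 3);                             % cell totals y_ij.
>> SSAB = sum(yij(:).^2) / n - C - SSA - SSB;
>> SSE  = SST - SSA - SSB - SSAB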
The 2k Factorial Design
36

The 2² Design

The geometry of the 2² design is shown in Fig. 12-17a. Note that the design can be represented geometrically as a square with the 2² = 4 runs forming the corners of the square. Fig. 12-17b shows the 4 runs in a tabular format often called the test matrix.

− and + denote the low and high levels of a factor, respectively.
Low and high are arbitrary terms.
Geometrically, the four runs form the corners of a square.
Factors can be quantitative or qualitative, although their treatment in the final model will be different.
The letters (1), a, b, and ab represent the totals of all n observations taken at these design points.
37

The main effects and the interaction are estimated from the treatment totals:

$$A = \bar y_{A^+} - \bar y_{A^-} = \frac{a + ab}{2n} - \frac{b + (1)}{2n} = \frac{1}{2n}\left[a + ab - b - (1)\right]$$

$$B = \bar y_{B^+} - \bar y_{B^-} = \frac{b + ab}{2n} - \frac{a + (1)}{2n} = \frac{1}{2n}\left[b + ab - a - (1)\right]$$

$$AB = \frac{ab + (1)}{2n} - \frac{a + b}{2n} = \frac{1}{2n}\left[ab + (1) - a - b\right]$$

The quantities in brackets in the above equations are called contrasts. For example, the A contrast is

$$\text{Contrast}_A = a + ab - b - (1)$$
38

In these equations, the contrast coefficients are always either +1 or −1. A table of plus and minus signs, such as Table 12-7, can be used to determine the sign of each treatment total for a particular contrast.

Let k be the number of factors. Then the effects and the sums of squares for A, B, and AB are obtained as follows:

$$\text{Effect} = \frac{\text{Contrast}}{n\, 2^{k-1}}$$

$$SS = \frac{(\text{Contrast})^2}{n\, 2^k} = n\, 2^{k-2}\, (\text{Effect})^2$$
39

Example 12-6
40

Analysis of Variance
We can compute the factor effect estimates as follows (using Effect = Contrast/(n 2^{k−1}) and SS = (Contrast)²/(n 2^k) = n 2^{k−2}(Effect)²):

$$A = \frac{1}{2n}[a + ab - b - (1)] = \frac{1}{2(4)}[96.1 + 161.1 - 59.7 - 64.4] = \frac{133.1}{8} = 16.64$$

$$B = \frac{1}{2n}[b + ab - a - (1)] = \frac{1}{2(4)}[59.7 + 161.1 - 96.1 - 64.4] = \frac{60.3}{8} = 7.54$$

$$AB = \frac{1}{2n}[ab + (1) - a - b] = \frac{1}{2(4)}[161.1 + 64.4 - 96.1 - 59.7] = \frac{69.7}{8} = 8.71$$

$$SS_{Total} = \sum_{i=1}^{2} \sum_{j=1}^{2} \sum_{k=1}^{n} y_{ijk}^2 - \frac{y_{\cdot\cdot\cdot}^2}{4n}$$

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F0       P-value
Bit Size (A)          1107.226         1                    1107.226      185.25   1.17 × 10⁻⁸
Speed (B)             227.256          1                    227.256       38.03    4.82 × 10⁻⁵
AB                    303.631          1                    303.631       50.80    1.20 × 10⁻⁵
Error                 71.723           12                   5.977
Total                 1709.836         15

How to decide on significant and non-significant factors? Compare F0 with the critical value $f_{\alpha,\,1,\,2^k(n-1)}$ at α = 0.05: here $f_{0.05,\,1,\,12} = 4.7472$, so all three effects are significant.
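The effect estimates above can be reproduced in MATLAB directly from the treatment totals (a minimal sketch; t1 stands for the total (1)):

>> n = 4;                                      % replicates in this example
>> t1 = 64.4; a = 96.1; b = 59.7; ab = 161.1;  % treatment totals (1), a, b, ab
>> A  = (a + ab - b - t1) / (2*n)              % 16.64
>> B  = (b + ab - a - t1) / (2*n)              % 7.54
>> AB = (ab + t1 - a - b) / (2*n)              % 8.71
>> SSA = (a + ab - b - t1)^2 / (n * 2^2)       % 1107.23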
41

Regression Model and Residual Analysis


The fitted regression model in coded units is

$$\hat y = \bar y + \frac{A}{2} x_1 + \frac{B}{2} x_2 + \frac{AB}{2} x_1 x_2$$

At x1 = −1 and x2 = −1:

$$\hat y = 23.83 + \frac{16.64}{2}(-1) + \frac{7.54}{2}(-1) + \frac{8.71}{2}(-1)(-1) = 16.1$$

Residuals at this design point:

$$e_1 = 18.2 - 16.1 = 2.1, \qquad e_2 = 18.9 - 16.1 = 2.8, \qquad e_3 = 12.9 - 16.1 = -3.2, \qquad e_4 = 14.4 - 16.1 = -1.7$$

[Figures: normal probability plot of the residuals, residual plot, and interaction plot.]

Analysis Procedure for Factorial Experiments
42

Analysis Procedure for Factorial Designs


1. Estimate the factor effects
2. Form the preliminary model
3. Test for significance of factor effects
4. Analyze residuals
5. Refine the model, if necessary
6. Interpret results

The 2^k Design for k ≥ 3 Factors

Cube plots are often useful visual displays of experimental results.
43

Example: The main effects may be estimated using the corresponding contrast equations. The effect of A, for example, is

$$A = \frac{1}{4n}\left[a + ab + ac + abc - b - c - bc - (1)\right] = \frac{1}{4(2)}\left[22 + 27 + 23 + 30 - 20 - 21 - 18 - 16\right] = \frac{27}{8} = 3.375$$

and the sum of squares for A is given by

$$SS_A = \frac{(\text{Contrast}_A)^2}{n\, 2^k} = \frac{(27)^2}{2(8)} = 45.5625$$
44

Formulas for 2-level factorial experiments with k factors

$$\text{Coefficient} = \frac{\text{Effect}}{2}, \qquad \mathrm{s.e.}(\text{Effect}) = \sqrt{\frac{\hat\sigma^2}{n\, 2^{k-2}}}, \qquad \mathrm{s.e.}(\text{Coefficient}) = \frac{1}{2}\sqrt{\frac{\hat\sigma^2}{n\, 2^{k-2}}}$$

$$t\text{-ratio} = \frac{\text{Effect}}{\mathrm{s.e.}(\text{Effect})} = \frac{\text{Coefficient}}{\mathrm{s.e.}(\text{Coefficient})}$$

where σ̂² is the mean square error and 2^k(n − 1) is the residual degrees of freedom. Then

$$p\text{-value} = 2\,\bigl(1 - \mathrm{cdf}(\texttt{'t'}, |t\text{-ratio}|, 2^k(n-1))\bigr)$$

Data analysis using t-ratios:

Term   Effect   Coefficient   s.e.(Effect)   t-ratio   p-value
A
B
C
AB
AC
BC
ABC
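A minimal MATLAB sketch of one row of this table (the MSE and effect values are hypothetical placeholders):

>> k = 3; n = 2;                        % design size
>> MSE = 5.5; Effect = 3.375;           % hypothetical mean square error and effect estimate
>> seEffect = sqrt(MSE / (n * 2^(k-2)));
>> tratio = Effect / seEffect
>> df = 2^k * (n - 1);                  % residual degrees of freedom
>> pvalue = 2 * (1 - tcdf(abs(tratio), df))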
