Table 1: Statistical methods and their purposes

| Statistical Method | Purpose |
|---|---|
| Statistical distributions | Basic material for statistical tests. Used to characterize a population based upon a sample. |
| Hypothesis testing | Decide whether data under investigation indicate that elements of concern are the same or different. |
| Analysis of variance | Determine significance of factors and models; decompose observed variation into constituent elements. |
| Regression modeling | Understand relationships, determine process margin, and optimize process. |
| Categorical modeling | Use when the result or response is discrete (such as very rough, rough, or smooth). Understand relationships, determine process margin, and optimize process. |
| Statistical process control | Determine if the system is operating as expected. |

Typical applications of these methods in semiconductor manufacturing include:

- Control charts for x̄ (for wafer or lot mean) and s (for within-wafer uniformity) based on the distribution of each parameter; control limits and control rules are based on the distribution and time patterns in the parameter.
- ANOVA on x̄ and/or s over sets of equipment, fabs, or vendors.
- Regression of an end-of-line measurement as the response against in-line measurements as factors.
- ANOVA, categorical modeling, and correlation to find the most influential factors explaining the difference between good and bad material.
- One-sample comparisons on limited data; single-lot factorial experiment designs; response surface models over a wider range of settings looking for opportunities for optimization, or over a narrower range of settings to establish the process window.
- Two-sample comparisons run over time to ensure that a change is robust, works over multiple tools, and incurs no subtle defects.
Semiconductor technology development and manufacturing are often concerned with both continuous parameters (e.g., thin-film thicknesses, electrical performance parameters of transistors) and discrete parameters (e.g., defect counts and yield). In this section, we begin with a brief review of the fundamental probability distributions typically encountered in semiconductor manufacturing, as well as the sampling distributions that arise when one calculates statistics based on multiple measurements (3). An understanding of these distributions is crucial to understanding hypothesis testing, analysis of variance, and the other inference and statistical analysis methods discussed in later sections.
Descriptive Statistics
The sample mean of $n$ measurements $x_i$ is

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad (1)$$

and the sample variance is

$$s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad (2)$$

The normal (Gaussian) probability density function with mean $\mu$ and standard deviation $\sigma$ is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \qquad (3)$$

It is often convenient to use the normalization $z = (x - \mu)/\sigma$, so that $z \sim N(0, 1)$:

$$f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2} \qquad (4)$$

For example, suppose oxide thickness is distributed as $x \sim N(\mu, \sigma^2)$ with $\mu = 100$ and $\sigma^2 = 10$. The probability that a single measurement falls between 105 and 120 is

$$\Pr(105 \le x \le 120) = \int_{105}^{120} \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx \qquad (5)$$

$$= \Pr\left(\frac{105-\mu}{\sigma} \le \frac{x-\mu}{\sigma} \le \frac{120-\mu}{\sigma}\right)$$

$$= \Pr(z_l \le z \le z_u) = \int_{z_l}^{z_u} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2} dz \qquad (6)$$

$$= \int_{-\infty}^{z_u} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2} dz - \int_{-\infty}^{z_l} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2} dz$$

$$= \Phi(z_u) - \Phi(z_l) = \Phi(6.325) - \Phi(1.581) = 0.0569$$
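This calculation is easy to check numerically. A minimal sketch using SciPy, with the example values μ = 100 and σ² = 10 taken from the text above:

```python
# Probability that x ~ N(mu, sigma^2) falls in [105, 120],
# following Eqs. (5)-(6): standardize, then use the normal CDF Phi.
from math import sqrt
from scipy.stats import norm

mu, sigma = 100.0, sqrt(10.0)   # example parameters from the text

z_l = (105 - mu) / sigma        # 1.581
z_u = (120 - mu) / sigma        # 6.325
p = norm.cdf(z_u) - norm.cdf(z_l)
print(f"Pr(105 <= x <= 120) = {p:.4f}")  # ~0.0569
```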
When the outcome of a trial is binary (e.g., a die or wafer is either good or bad), the binomial distribution applies. The probability of observing exactly $x$ successes in $n$ trials, each with success probability $p$, is

$$f(x; p, n) = \binom{n}{x} p^x (1-p)^{n-x} \qquad (7)$$

where the number of combinations, n choose x, is

$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!} \qquad (8)$$
[Figure: Binomial probability mass function for the number of surviving wafers; probability on the vertical axis (0 to 0.4), wafers surviving on the horizontal axis (0 to 25).] For the example shown, the probability that the number of surviving wafers observed is three or fewer is $\sum_{x=0}^{3} f(x) = 0.0038$.
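The binomial tail computation can be sketched in a few lines; the parameters n and p below are hypothetical stand-ins, since the example's exact values are only partially recoverable from the text:

```python
# Binomial pmf, Eq. (7): f(x; p, n) = C(n, x) p^x (1 - p)^(n - x).
from scipy.stats import binom

n, p = 25, 0.9              # hypothetical: 25 wafers, 90% per-wafer survival
print(binom.pmf(20, n, p))  # probability that exactly 20 wafers survive
print(binom.cdf(3, n, p))   # Pr(x <= 3): the tail sum of f(x) over x = 0..3
```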
When events such as particle or defect counts occur at a known average rate $\lambda$, the Poisson distribution applies:

$$f(x; \lambda) = \frac{e^{-\lambda} \lambda^x}{x!} \qquad (9)$$

More generally, the mean and variance of a random variable are defined as

$$\mu_x = E\{x\} = \int_{-\infty}^{\infty} x f(x)\,dx = \sum_{i} x_i \,pr(x_i) \qquad (10)$$

$$\sigma_x^2 = \mathrm{Var}\{x\} = \int_{-\infty}^{\infty} (x - E\{x\})^2 f(x)\,dx = \sum_{i} (x_i - E\{x_i\})^2 \,pr(x_i) \qquad (11)$$

where the integral forms apply to continuous distributions and the sums to discrete distributions. For two random variables $x$ and $y$, the covariance and correlation are

$$\sigma_{xy} = \mathrm{Cov}\{x, y\} = E\{(x - E\{x\})(y - E\{y\})\} = E\{xy\} - E\{x\}E\{y\} \qquad (12)$$

$$\rho_{xy} = \mathrm{Corr}\{x, y\} = \frac{\mathrm{Cov}\{x, y\}}{\sqrt{\mathrm{Var}\{x\}\mathrm{Var}\{y\}}} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \qquad (13)$$
The corresponding sample estimates are the sample covariance

$$s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \qquad (14)$$

and the sample correlation coefficient

$$r_{xy} = \frac{s_{xy}}{s_x s_y} \qquad (15)$$
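A short sketch of these sample statistics with NumPy; the data arrays are hypothetical illustrations:

```python
# Sample statistics per Eqs. (1), (2), (14), and (15).
import numpy as np

x = np.array([99.2, 101.5, 100.3, 98.7, 100.9])  # e.g., oxide thicknesses
y = np.array([1.10, 1.22, 1.18, 1.05, 1.20])     # e.g., a correlated response

xbar = x.mean()                                   # Eq. (1)
s2_x = x.var(ddof=1)                              # Eq. (2), n - 1 divisor
s_xy = np.sum((x - xbar) * (y - y.mean())) / (len(x) - 1)   # Eq. (14)
r_xy = s_xy / np.sqrt(s2_x * y.var(ddof=1))                 # Eq. (15)
print(xbar, s2_x, s_xy, r_xy)
```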
Sampling Distributions
Suppose we sample $n = 5$ wafers from a furnace run, measure the oxide thickness on each, and compute the mean oxide thickness $\bar{T} = \frac{1}{5}(T_1 + T_2 + \cdots + T_5)$. We now have two key questions: what distribution does $\bar{T}$ follow, and what is the probability that $a \le \bar{T} \le b$? If the individual measurements are normally distributed, $T_i \sim N(\mu, \sigma^2)$, then the sample mean is also normally distributed:

$$\bar{T} \sim N(\mu, \sigma^2/n) \qquad (16)$$

where $\mu_{\bar{T}} = \mu_T = \mu$ by the definition of the mean, and the variance of the sample mean is

$$\sigma_{\bar{T}}^2 = \sigma^2/n \qquad (17)$$

Continuing the oxide thickness example with $\mu = 100$ and $\sigma^2 = 10$,

$$\Pr(105 \le \bar{T} \le 120) = \Pr\left(\frac{105 - 100}{\sqrt{10}/\sqrt{5}} \le \frac{\bar{T} - \mu}{\sigma/\sqrt{n}} \le \frac{120 - 100}{\sqrt{10}/\sqrt{5}}\right) = \Pr(z_l \le z \le z_u)$$

so that the probability can again be evaluated with the standard normal distribution.

The sample variance

$$s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1)$$

is governed by the chi-square distribution:

$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1} \qquad (18)$$

that is, $s^2 \sim \sigma^2 \chi^2_{n-1}/(n-1)$. If $y_1 \sim \chi^2_u$ and similarly $y_2 \sim \chi^2_v$, then the random variable $Y = \frac{y_1/u}{y_2/v} \sim F_{u,v}$, the $F$ distribution with $u$ and $v$ degrees of freedom.

When the population standard deviation is not known and must be estimated by $s$, the normalized sample mean follows the $t$ distribution. For $x_i \sim N(\mu, \sigma^2)$,

$$\frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

whereas

$$\frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{(\bar{x} - \mu)/(\sigma/\sqrt{n})}{\left[\dfrac{(n-1)s^2/\sigma^2}{n-1}\right]^{1/2}} \sim t_{n-1} \qquad (19)$$
Finally, consider samples drawn from two normal populations, $x_i \sim N(\mu_x, \sigma_x^2)$ for $i = 1, \ldots, n$ and $w_i \sim N(\mu_w, \sigma_w^2)$ for $i = 1, \ldots, m$. Then the ratio of the sample variances follows an $F$ distribution:

$$\frac{s_x^2 / \sigma_x^2}{s_w^2 / \sigma_w^2} \sim F_{n-1,\, m-1} \qquad (20)$$
These sampling distributions lead directly to confidence intervals. Since $f(\bar{x}) \sim N(\mu, \sigma^2/n)$, a $1 - \alpha$ confidence interval for the mean when $\sigma$ is known is

$$\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \qquad (21)$$

often written as

$$\mu = \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \qquad (22)$$

For the oxide thickness example ($\sigma^2 = 10$, $n = 5$) with $\alpha = 0.05$, we have $\alpha/2 = \Pr(z \ge \delta_{\bar{T}}/(\sigma/\sqrt{n})) = \Pr(1.95996 \le z)$, so the half-width of the 95% confidence interval is

$$\delta_{\bar{T}} = 1.95996\,\sigma/\sqrt{n} = 2.7718 \qquad (23)$$

When $\sigma$ is unknown and estimated by $s$, the $t$ distribution is used instead:

$$\mu = \bar{x} \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}} \qquad (24)$$

Similarly, a confidence interval for the variance follows from the chi-square distribution:

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\, n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\, n-1}} \qquad (25)$$
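The three interval computations can be sketched with SciPy; the measurement array below is a hypothetical stand-in for the five-wafer example:

```python
# Confidence intervals for the mean (Eqs. 22 and 24) and the variance
# (Eq. 25), using the oxide thickness example sigma^2 = 10, n = 5.
import numpy as np
from scipy import stats

T = np.array([98.9, 101.2, 100.4, 99.5, 102.1])  # hypothetical measurements
n, alpha = len(T), 0.05

# Known sigma: z interval; half-width 1.95996*sigma/sqrt(n) = 2.7718
sigma = np.sqrt(10.0)
delta = stats.norm.ppf(1 - alpha / 2) * sigma / np.sqrt(n)

# Unknown sigma: t interval based on the sample standard deviation s
s = T.std(ddof=1)
delta_t = stats.t.ppf(1 - alpha / 2, n - 1) * s / np.sqrt(n)

# Chi-square interval for the variance
lo = (n - 1) * s**2 / stats.chi2.ppf(1 - alpha / 2, n - 1)
hi = (n - 1) * s**2 / stats.chi2.ppf(alpha / 2, n - 1)
print(delta, delta_t, (lo, hi))
```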
In hypothesis testing, we pose a null hypothesis $H_0$ and an alternative hypothesis $H_1$, and accept or reject $H_0$ based on a statistic computed from the data. Two types of error are possible. The probability of a Type I error (rejecting $H_0$ when it is in fact true) is

$$\alpha = \Pr(\text{Type I error}) = \Pr(\text{reject } H_0 \mid H_0 \text{ is true}) = \int_{x^*}^{\infty} f_0(x)\,dx \qquad (26)$$

where $x^*$ is the critical value and $f_0$ is the distribution of the statistic under $H_0$; we accept $H_0$ when $x_i < x^*$. Conversely, the probability of a Type II error is

$$\beta = \Pr(\text{Type II error}) = \Pr(\text{accept } H_0 \mid H_1 \text{ is true}) = \int_{-\infty}^{x^*} f_1(x)\,dx \qquad (27)$$

For example, suppose $H_0$: $f_0(x) \sim N(\mu_0, \sigma)$. Then $H_0$ is accepted at significance level $\alpha$ when the sample mean $\bar{x}$ falls within the acceptance region

$$\mu_0 - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \bar{x} \le \mu_0 + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \qquad (28)$$

This acceptance region is the basis for control limits around the target $\mu_0$:

$$CL = \mu_0 \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \qquad (29)$$
If the true mean shifts by an amount $\delta$, the probability $\beta$ of failing to detect the shift (that is, of the sample mean remaining inside the acceptance region) is

$$\beta = \Phi\left(z_{\alpha/2} - \frac{\delta\sqrt{n}}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \frac{\delta\sqrt{n}}{\sigma}\right) \qquad (30)$$

[Figure: Operating characteristic curves of $\beta$ versus the size of the mean shift, for sample sizes n = 2, 3, 5, 10, 20, 30, and 40.]
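A sketch of the β computation of Eq. (30) for one point on an operating characteristic curve; the one-sigma shift below is an assumed illustration value:

```python
# Type II error rate for an xbar test with control limits
# mu0 +/- z_{alpha/2} * sigma / sqrt(n), per Eqs. (26)-(30).
import numpy as np
from scipy.stats import norm

mu0, sigma, n, alpha = 100.0, np.sqrt(10.0), 5, 0.05  # example values
z = norm.ppf(1 - alpha / 2)
se = sigma / np.sqrt(n)

delta = 1.0 * sigma   # hypothetical true shift of one sigma
# beta: probability that xbar still falls inside the acceptance region
beta = norm.cdf(z - delta / se) - norm.cdf(-z - delta / se)
print(f"beta at shift {delta:.2f}: {beta:.3f}")
```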
Suppose we wish to compare the mean of a parameter over two groups of wafers, A and B, with $n_A = 10$ and $n_B = 10$. Then the variance of the difference in sample means is

$$\mathrm{Var}\{\bar{y}_B - \bar{y}_A\} = \frac{\sigma^2}{n_A} + \frac{\sigma^2}{n_B} = \sigma^2\left(\frac{1}{n_A} + \frac{1}{n_B}\right) \qquad (31)$$

assuming a common population variance $\sigma^2$, so that the standard deviation of the difference is

$$s_{B-A} = \sigma\sqrt{\frac{1}{n_A} + \frac{1}{n_B}} \qquad (32)$$

The test statistic

$$z_0 = \frac{\bar{y}_B - \bar{y}_A}{\sigma\sqrt{1/n_A + 1/n_B}} \qquad (33)$$

can then be evaluated using the standard normal distribution, $\Pr(z > z_0)$.
The disadvantage of the above method is that it depends on knowing the population standard deviation $\sigma$. If such information is indeed available, using it will certainly improve the ability to detect a difference. In the second approach, we assume that our 10-wafer samples are again drawn by random sampling from an underlying normal population, but in this case we do not assume that we know a priori what the population variance is. Instead, we must build an internal estimate of the variance. First, we estimate the variance of each group individually:
$$s_A^2 = \frac{1}{n_A - 1} \sum_{i=1}^{n_A} (y_{A,i} - \bar{y}_A)^2 \qquad (34)$$

and similarly for $s_B^2$. These are then pooled into a single estimate of the common variance:

$$s^2 = \frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2} \qquad (35)$$

The test statistic becomes

$$t_0 = \frac{\bar{y}_B - \bar{y}_A}{s\sqrt{1/n_A + 1/n_B}} \qquad (36)$$

which is compared against the $t$ distribution with $n_A + n_B - 2$ degrees of freedom.
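A sketch of the pooled two-sample test of Eqs. (34)-(36); the wafer data are synthetic, and SciPy's ttest_ind with equal_var=True performs the same pooled-variance test as a cross-check:

```python
# Pooled two-sample t test, Eqs. (34)-(36).
import numpy as np
from scipy import stats

yA = np.random.default_rng(0).normal(100.0, 2.0, 10)  # hypothetical group A
yB = np.random.default_rng(1).normal(101.5, 2.0, 10)  # hypothetical group B

nA, nB = len(yA), len(yB)
s2 = ((nA - 1) * yA.var(ddof=1) + (nB - 1) * yB.var(ddof=1)) / (nA + nB - 2)
t0 = (yB.mean() - yA.mean()) / np.sqrt(s2 * (1 / nA + 1 / nB))  # Eq. (36)
p = 2 * stats.t.sf(abs(t0), nA + nB - 2)

# Same test via SciPy (equal-variance assumption matches the pooling above)
t_sp, p_sp = stats.ttest_ind(yB, yA, equal_var=True)
print(t0, p, t_sp, p_sp)
```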
In an analysis of variance over $k$ treatment groups, the total number of measurements is $N = n_1 + n_2 + \cdots + n_k$, or simply $N = nk$ if all $k$ treatments consist of the same number of samples, $n$.

In the first step, we estimate the within-group variation. The sum of squared deviations within treatment $t$ is

$$SS_t = \sum_{j=1}^{n_t} (y_{tj} - \bar{y}_t)^2 \qquad (37)$$

with the corresponding variance estimate

$$s_t^2 = SS_t / \nu_t, \quad \nu_t = n_t - 1 \qquad (38)$$

The within-group (residual) variance pooled over all treatments is

$$s_R^2 = \frac{\nu_1 s_1^2 + \nu_2 s_2^2 + \cdots + \nu_k s_k^2}{\nu_1 + \nu_2 + \cdots + \nu_k} = \frac{SS_R}{N - k} = \frac{SS_R}{\nu_R} \qquad (39)$$

In the second step, we want an estimate of between-group variation. We will ultimately be testing the hypothesis $\mu_1 = \mu_2 = \cdots = \mu_k$, that is, that the population distributions of all treatments share a common mean.

[Figure: Population distributions for three treatments, with sample means $\bar{y}_A$, $\bar{y}_B$, and $\bar{y}_C$.]

The estimate of the between-group variance is

$$s_T^2 = \sum_{t=1}^{k} n_t (\bar{y}_t - \bar{y})^2 / (k - 1) = \frac{SS_T}{\nu_T} \qquad (40)$$

If the treatment means truly differ, $s_T^2$ estimates $\sigma^2 + \sum_{t=1}^{k} n_t \tau_t^2 / (k - 1)$, where $\tau_t = \mu_t - \mu$ is the effect of treatment $t$; if they do not, both $s_T^2$ and $s_R^2$ estimate $\sigma^2$, and their ratio follows an $F$ distribution, since $s_T^2 / s_R^2 \sim F_{k-1,\, N-k}$.

We can also express the total variation (total deviation sum of squares from the grand mean, $SS_D$) observed in the data as

$$SS_D = \sum_{t=1}^{k} \sum_{i=1}^{n_t} (y_{ti} - \bar{y})^2 \qquad (41)$$

with

$$s_D^2 = \frac{SS_D}{\nu_D} = \frac{SS_D}{N - 1} \qquad (42)$$
Table 3: Structure of the Analysis of Variance Table, for Single Factor (Treatment) Case

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F ratio | Pr(F) |
|---|---|---|---|---|---|
| Between treatments | SS_T | ν_T = k − 1 | s_T² | s_T²/s_R² | p_T |
| Within treatments | SS_R | ν_R = N − k | s_R² | | |
| Total | SS_D = SS_T + SS_R | ν_D = ν_T + ν_R = N − 1 | s_D² | | |
One can additionally account for the variation due to the average, $SS_A = N\bar{y}^2$, so that the total sum of squares is $SS = SS_A + SS_D = SS_A + SS_T + SS_R$. In Table 3, $k$ is the number of treatments and $N$ is the total number of measurements. The Pr(F) value shown in the table is $\Pr(F_{\nu_T, \nu_R} > s_T^2/s_R^2 \mid \mu_1 = \cdots = \mu_k)$, that is, the probability that an $F$ ratio at least this large would be observed if all treatment means were in fact equal.
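A sketch of the one-way ANOVA computation of Table 3 on hypothetical treatment data, with scipy.stats.f_oneway as a cross-check:

```python
# One-way ANOVA: build SS_T, SS_R, and the F ratio of Table 3.
import numpy as np
from scipy import stats

groups = [np.array([10.1, 9.8, 10.3]),   # hypothetical treatment data
          np.array([10.6, 10.9, 10.4]),
          np.array([9.9, 10.0, 10.2])]

k = len(groups)
N = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()

ss_t = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)  # Eq. (40) numerator
ss_r = sum(((g - g.mean()) ** 2).sum() for g in groups)       # pooled within, Eq. (39)
F = (ss_t / (k - 1)) / (ss_r / (N - k))
p = stats.f.sf(F, k - 1, N - k)          # the Pr(F) column of Table 3

print(F, p)
print(stats.f_oneway(*groups))           # cross-check
```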
The analysis rests on the model $y_{ti} = \mu + \tau_t + \epsilon_{ti}$, with residuals assumed to be independent and normally distributed:

$$\epsilon_{ti} = y_{ti} - \hat{y}_{ti} \sim N(0, \sigma^2) \qquad (44)$$

When the material being processed can be divided into homogeneous blocks, the blocking factor can be separated out, and the total sum of squares decomposes as

$$SS = SS_A + SS_B + SS_T + SS_R \qquad (47)$$

where $b$ is the number of blocking groups, $k$ is the number of treatments, and the total number of measurements is $N = bk$. If real block or treatment differences exist, the corresponding mean squares estimate inflated variances:

$$s_B^2 \text{ estimates } \sigma^2 + k \sum_{i=1}^{b} \beta_i^2 / (b - 1) \qquad (48)$$

$$s_T^2 \text{ estimates } \sigma^2 + \sum_{t=1}^{k} n_t \tau_t^2 / (k - 1) \qquad (49)$$
Table 4: Structure of the analysis of variance table for the randomized block (blocked single-factor) case

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F ratio | Pr(F) |
|---|---|---|---|---|---|
| Average | SS_A = bk ȳ² | ν_A = 1 | | | |
| Between blocks | SS_B = k Σ_i (ȳ_i − ȳ)² | ν_B = b − 1 | s_B² | s_B²/s_R² | p_B |
| Between treatments | SS_T = b Σ_t (ȳ_t − ȳ)² | ν_T = k − 1 | s_T² | s_T²/s_R² | p_T |
| Residuals | SS_R | ν_R = (b − 1)(k − 1) | s_R² | | |
| Total | SS | ν = N = bk | s_D² | | |
With both treatment and block effects, the observations are modeled as

$$y_{ti} = \mu + \tau_t + \beta_i + \epsilon_{ti} \qquad (50)$$

and each observation can be decomposed into the corresponding estimates:

$$y_{ti} = \bar{y} + (\bar{y}_t - \bar{y}) + (\bar{y}_i - \bar{y}) + (y_{ti} - \bar{y}_t - \bar{y}_i + \bar{y}) \qquad (51)$$
Table 5: Structure of the analysis of variance table for the two-factor case with interaction, with m replicates per cell

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F ratio | Pr(F) |
|---|---|---|---|---|---|
| Between levels of factor 1 | SS_T = b Σ_t (ȳ_t − ȳ)² | ν_T = k − 1 | s_T² | s_T²/s_E² | p_T |
| Between levels of factor 2 | SS_B = k Σ_i (ȳ_i − ȳ)² | ν_B = b − 1 | s_B² | s_B²/s_E² | p_B |
| Interaction | SS_I | ν_I = (k − 1)(b − 1) | s_I² | s_I²/s_E² | p_I |
| Error (replication) | SS_E | ν_E = bk(m − 1) | s_E² | | |
| Total | SS_D | ν_D = bkm − 1 | s_D² | | |
$$\text{Adj. } R^2 = 1 - \frac{SS_R / \nu_R}{SS_D / \nu_D} = 1 - \frac{s_R^2}{s_D^2} = 1 - \frac{\text{Mean Square of Residual}}{\text{Mean Square of Total}} \qquad (52)$$
Interactions are estimated in an analogous fashion; for example, $\text{Interaction } AB = \frac{1}{2}(y_{A(B+)} - y_{A(B-)})$, where $y_{A(B+)}$ is the effect of factor A evaluated with factor B held at its high level and $y_{A(B-)}$ with B at its low level. When replicated runs are available, the $g$ individual estimates of variance $s_i^2$ can be pooled:

$$s^2 = \frac{\nu_1 s_1^2 + \nu_2 s_2^2 + \cdots + \nu_g s_g^2}{\nu_1 + \nu_2 + \cdots + \nu_g} \qquad (53)$$

Because each effect is the difference between two averages, each taken over eight runs,

$$\mathrm{Var}\{\text{Effect}_A\} = \mathrm{Var}\{\bar{y}_{A+}\} + \mathrm{Var}\{\bar{y}_{A-}\} = \frac{\sigma^2}{8} + \frac{\sigma^2}{8} = \frac{\sigma^2}{4} \qquad (54)$$
Table 6: A 2³ factorial experimental design

| Experiment Condition Number | Factor A | Factor B | Factor C | Measured Result (Yield) |
|---|---|---|---|---|
| 1 | − | − | − | y₁ |
| 2 | + | − | − | y₂ |
| 3 | − | + | − | y₃ |
| 4 | + | + | − | y₄ |
| 5 | − | − | + | y₅ |
| 6 | + | − | + | y₆ |
| 7 | − | + | + | y₇ |
| 8 | + | + | + | y₈ |
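Effects are then computed as differences of averages. A sketch using a design matrix in the standard order of the table above, with hypothetical yield values:

```python
# Main effects from a 2^3 factorial: Effect = mean(y at +) - mean(y at -).
import numpy as np

# Design matrix in the standard order of Table 6 (-1 = low, +1 = high)
A = np.array([-1, +1, -1, +1, -1, +1, -1, +1])
B = np.array([-1, -1, +1, +1, -1, -1, +1, +1])
C = np.array([-1, -1, -1, -1, +1, +1, +1, +1])
y = np.array([60, 72, 54, 68, 52, 83, 45, 80], float)  # hypothetical yields

for name, col in [("A", A), ("B", B), ("C", C), ("AB", A * B)]:
    effect = y[col > 0].mean() - y[col < 0].mean()
    print(f"Effect {name}: {effect:+.2f}")
```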
[Figure: Yield versus Factor A at the low (−) and high (+) levels of Factor B, for two cases (Case 1 and Case 2), illustrating the absence or presence of an AB interaction.]
Consider, for example, measurements of oxide thickness taken at $n_M$ sites on each of $n_W$ wafers.

[Figure: Measured oxide thickness (approximately 1010 to 1040 Å) plotted against wafer number.]

A nested variance model for measurement $j$ on wafer $i$ is

$$Y_{ij} = \mu + W_i + M_{j(i)} \qquad (55)$$

where

$$W_i \sim N(0, \sigma_W^2) \text{ for } i = 1, \ldots, n_W, \qquad M_{j(i)} \sim N(0, \sigma_M^2) \text{ for } j = 1, \ldots, n_M \qquad (56)$$

are random variables drawn from the distributions of wafer-to-wafer variations and of measurements taken within the $i$th wafer, respectively. In this case, the total variation in oxide thickness combines the measurement variance $\sigma_M^2$ with the wafer-to-wafer variance $\sigma_W^2$:

$$\sigma_T^2 = \sigma_W^2 + \sigma_M^2 \qquad (58)$$

The variance of the observed wafer averages, however, retains a share of the measurement variance:

$$\sigma_{\bar{W}}^2 = \sigma_W^2 + \frac{\sigma_M^2}{n_M} \qquad (57)$$

The measurement variance is estimated by pooling the within-wafer sample variances,

$$s_M^2 = \frac{1}{n_W} \sum_{i=1}^{n_W} \left[ \frac{\sum_{j=1}^{n_M} (Y_{ij} - \bar{Y}_i)^2}{n_M - 1} \right] \qquad (59)$$

where $\bar{Y}_i$ is the within-wafer average

$$\bar{Y}_i = \frac{1}{n_M} \sum_{j=1}^{n_M} Y_{ij} \qquad (60)$$

The variance of the wafer averages is estimated as

$$s_{\bar{W}}^2 = \frac{1}{n_W - 1} \sum_{i=1}^{n_W} (\bar{Y}_i - \bar{Y})^2 \qquad (61)$$

where the grand mean is

$$\bar{Y} = \frac{1}{n_W n_M} \sum_{i=1}^{n_W} \sum_{j=1}^{n_M} Y_{ij} \qquad (62)$$

Combining Eqs. (57), (59), and (61), the wafer-to-wafer variance is estimated as

$$s_W^2 = s_{\bar{W}}^2 - \frac{s_M^2}{n_M} \qquad (63)$$
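A sketch of the variance-component estimates of Eqs. (59)-(63) on synthetic data with known σ_W² and σ_M²:

```python
# Variance components for the nested wafer/measurement model, Eqs. (59)-(63).
import numpy as np

rng = np.random.default_rng(7)
nW, nM = 20, 9
true_W, true_M = 4.0, 1.0                # hypothetical sigma_W^2, sigma_M^2
Y = 1025 + rng.normal(0, np.sqrt(true_W), (nW, 1)) \
         + rng.normal(0, np.sqrt(true_M), (nW, nM))

Ybar_i = Y.mean(axis=1)                  # Eq. (60): wafer averages
s2_M = Y.var(axis=1, ddof=1).mean()      # Eq. (59): pooled measurement variance
s2_Wbar = Ybar_i.var(ddof=1)             # Eq. (61): variance of wafer averages
s2_W = s2_Wbar - s2_M / nM               # Eq. (63): wafer-to-wafer variance
print(s2_M, s2_W)
```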
In regression modeling we express a response as a function of one or more factor settings, for example

$$y^{(j)} = \beta_0 + \beta_1 x^{(j)} + \epsilon^{(j)} \qquad (65)$$

where each setting $x^{(j)}$ is chosen within the design region $[x_{min}, x_{max}]$. In the case of a single model coefficient, the F test degenerates into the t test, $F_{1,n} = t_n^2$, so a t test can be used to evaluate the significance of the model coefficient. The least squares fit minimizes the residual sum of squares

$$SS_{min} = SS_R = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (67)$$
In the analysis above, we have assumed that the values
for x i have been selected at random and are thus unlikely to
be replicated. In many cases, it may be possible to repeat the
experiment at particular values, and doing so gives us the
opportunity to decompose the residual error into two
contributions. The residual sum of squares SS R can be
broken into a component SS L due to lack of fit and a
component SS E due to pure error or replication error:
$$SS_R = SS_L + SS_E \qquad (68)$$

An important issue is the estimate of the experimental error. If we assume that the model structure $y = b_0 + b_1 x + \epsilon$ is adequate, the residual mean square provides the estimate

$$s^2 = SS_R / (n - 1) \qquad (69)$$

where the degrees of freedom shown correspond to a single model coefficient. With replicated points, the pure error mean square $s_E^2 = SS_E / \nu_E$ instead provides an estimate of experimental error that does not depend on the adequacy of the model.
Table 7: Structure of the analysis of variance table for a single factor response surface regression*

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F ratio | Pr(F) |
|---|---|---|---|---|---|
| Model | SS_M | ν_M = 1 (number of model coefficients) | s_M² | s_M²/s_R² | p |
| Residual | SS_R | ν_R = n − ν_M | s_R² | | |
| Total | SS | ν = n | s_T² | | |

* The degrees of freedom in the model are shown for the case when only one model coefficient is used (strictly linear response).
Table 8: Structure of the analysis of variance table for a single factor response surface regression, in the case of replication of experimental design points.

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F ratio | Pr(F) |
|---|---|---|---|---|---|
| Model | SS_M | ν_M = 2 | s_M² = SS_M/ν_M | s_M²/s_E² | p_M |
| b₀ | SS_0 | ν_0 = 1 | s_0² = SS_0 | s_0²/s_E² | p_0 |
| b₁ | SS_1 | ν_1 = 1 | s_1² = SS_1 | s_1²/s_E² | p_1 |
| Residual | SS_R | ν_R = n − ν_M | s_R² | | |
| Lack of fit | SS_L | ν_L = ν_R − ν_E | s_L² | s_L²/s_E² | Pr(lack of fit) |
| Pure error | SS_E | ν_E = m | s_E² | | |
| Total | SS | ν = n | | | |
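A sketch of the lack-of-fit decomposition behind Table 8, fitting a straight line to data with replicated settings (all values hypothetical):

```python
# Lack-of-fit test for a straight-line fit with replicated x settings,
# decomposing SS_R into SS_L (lack of fit) + SS_E (pure error), Eq. (68).
import numpy as np
from scipy import stats

x = np.repeat([1.0, 2.0, 3.0, 4.0], 3)   # replicated design points
y = 2.0 + 1.5 * x + 0.4 * x**2 + np.random.default_rng(3).normal(0, 0.3, x.size)

b1, b0 = np.polyfit(x, y, 1)             # least squares line
resid = y - (b0 + b1 * x)
ss_r = np.sum(resid**2)                  # Eq. (67)

# Pure error: deviations of replicates about their cell means
ss_e = sum(((y[x == xv] - y[x == xv].mean()) ** 2).sum() for xv in np.unique(x))
ss_l = ss_r - ss_e                       # Eq. (68)

m = len(np.unique(x))                    # distinct settings
n = x.size
df_e, df_l = n - m, m - 2                # two model coefficients (b0, b1)
F = (ss_l / df_l) / (ss_e / df_e)
print(F, stats.f.sf(F, df_l, df_e))      # the Pr(lack of fit) entry of Table 8
```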
[Figure: Common experimental design geometries over Factor 1, Factor 2, and Factor 3, including (c) the Box-Behnken design.]
CATEGORICAL MODELING

BIBLIOGRAPHY

1. C. J. Spanos, Statistical process control in semiconductor manufacturing, Proc. IEEE, 80 (6): 819-830, 1992.
2. J. B. Keats and D. C. Montgomery (eds.), Statistical Applications in Process Control, New York: Marcel Dekker, 1996.
3. A. Madansky, Prescriptions for Working Statisticians, Berlin: Springer-Verlag, 1988.
4. D. M. H. Walker, Yield Simulation for Integrated Circuits, Norwell, MA: Kluwer, 1987.
5. D. C. Montgomery, Introduction to Statistical Quality Control, New York: Wiley, 1985.