ITK 226 2 Statistics

Dicky Dermawan
www.dickydermawan.net78.net
dickydermawan@gmail.com
In statistics we are concerned with methods for designing and
evaluating experiments to obtain information about practical
problems.
In most cases the inspection of each item of a population would
be too expensive, time-consuming, or even impossible. Hence
a few samples are drawn at random, and from this inspection
conclusions about the population are inferred.
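POPULATION versus SAMPLE
Size: large number $N$ (population); small number $n$ (sample)
Mean (population): $\mu = \sum_{j=1}^{N} x_j f(x_j)$
Average (sample): $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$
Variance (population): $\sigma^2 = \sum_{j=1}^{N} (x_j - \mu)^2 f(x_j)$
Variance (sample): $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$
Probability function/density $f(x)$ (population); relative frequency function (sample)
Distribution function $F(x)$ (population); cumulative frequency function (sample)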
Sample of 100 Values of the Splitting Tensile Strength (lb/in^2)
320 380 340 410 380 340 360 350 320 370
350 340 350 360 370 350 380 370 300 420
370 390 390 440 330 390 330 360 400 370
320 350 360 340 340 350 350 390 380 340
400 360 350 390 400 350 360 340 370 420
420 400 350 370 330 320 390 380 400 370
390 330 360 380 350 330 360 300 360 360
360 390 350 370 370 350 390 370 370 340
370 400 360 350 380 380 360 340 330 370
340 360 390 400 370 410 360 400 340 360
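The tabulated sample can be summarized numerically. The following is a minimal Python sketch, using only the standard library, that computes the sample average, the sample variance (with the n - 1 divisor), and the absolute, relative, and cumulative frequencies of the 100 values. The variable names are illustrative and are not part of the original slides.

```python
from collections import Counter
from itertools import accumulate
from statistics import mean, variance

# The 100 splitting tensile strength values (lb/in^2), row by row.
strength = [
    320, 380, 340, 410, 380, 340, 360, 350, 320, 370,
    350, 340, 350, 360, 370, 350, 380, 370, 300, 420,
    370, 390, 390, 440, 330, 390, 330, 360, 400, 370,
    320, 350, 360, 340, 340, 350, 350, 390, 380, 340,
    400, 360, 350, 390, 400, 350, 360, 340, 370, 420,
    420, 400, 350, 370, 330, 320, 390, 380, 400, 370,
    390, 330, 360, 380, 350, 330, 360, 300, 360, 360,
    360, 390, 350, 370, 370, 350, 390, 370, 370, 340,
    370, 400, 360, 350, 380, 380, 360, 340, 330, 370,
    340, 360, 390, 400, 370, 410, 360, 400, 340, 360,
]

n = len(strength)
y_bar = mean(strength)       # sample average
s2 = variance(strength)      # sample variance, divisor n - 1

counts = Counter(strength)
classes = sorted(counts)                   # distinct observed values
abs_freq = [counts[c] for c in classes]    # absolute frequency
rel_freq = [f / n for f in abs_freq]       # relative frequency
cum_abs = list(accumulate(abs_freq))       # cumulative absolute frequency
cum_rel = [c / n for c in cum_abs]         # cumulative relative frequency

print(f"n = {n}, average = {y_bar:.1f}, variance = {s2:.1f}")
for row in zip(classes, abs_freq, rel_freq, cum_abs, cum_rel):
    print("{:>4} {:>3} {:>5.2f} {:>4} {:>5.2f}".format(*row))
```

The four frequency columns correspond to the absolute, relative, cumulative absolute, and cumulative relative frequency functions of the sample. The average is 364.7 lb/in^2, and the most frequent value is 360 lb/in^2, observed 16 times.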
[Figure: absolute frequency vs. tensile strength (lb/in^2), 300 to 440]
[Figure: relative frequency vs. tensile strength (lb/in^2), 300 to 440]
[Figure: cumulative absolute frequency vs. tensile strength (lb/in^2), 300 to 440]
[Figure: cumulative relative frequency vs. tensile strength (lb/in^2), 300 to 440]
Min
Lower Quartile
Middle Quartile = Median
Upper Quartile
Interquartile range
Max
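These quantities are the ingredients of a box-and-whisker plot. As a sketch, they can be computed for the tensile strength sample with Python's `statistics.quantiles`. The frequency dictionary below is a tally of the 100 tabulated values, and the default "exclusive" quantile method is an assumption; other quartile conventions can give slightly different results on other data.

```python
from statistics import quantiles

# Tally of the 100 tensile strength values: {value: absolute frequency}.
freq = {300: 2, 320: 4, 330: 6, 340: 11, 350: 14, 360: 16, 370: 15,
        380: 8, 390: 10, 400: 8, 410: 2, 420: 3, 440: 1}

# Expand the tally back into a sorted sample of 100 values.
data = sorted(v for v, f in freq.items() for _ in range(f))

q1, median, q3 = quantiles(data, n=4)   # lower, middle, upper quartile
summary = {
    "min": data[0],
    "lower quartile": q1,
    "median": median,
    "upper quartile": q3,
    "interquartile range": q3 - q1,
    "max": data[-1],
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

For this sample the quartiles are 350, 360, and 380 lb/in^2, so the interquartile range is 30 lb/in^2.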
DOX 6E Montgomery 10
Experimental error
Hypothesis testing: null hypothesis, alternative hypothesis
Type I error: rejecting a true null hypothesis
Type II error: failing to reject a false null hypothesis
One-tail test vs two-tail test
Confidence level = 1 - significance level
P-value
Confidence interval
So far we have regarded the values y1, y2, ..., yn of a sample as n observed
values of a single random variable Y. We may equally well regard these n
values as single observations of n random variables Y1, Y2, ..., Yn that have the
same distribution and are independent.
If Y1, ..., Yn are independent normal random variables, each of
which has mean $\mu$ and variance $\sigma^2$, then the random
variable
$$\bar{Y} = \frac{1}{n}\,(Y_1 + Y_2 + Y_3 + \dots + Y_n)$$
is normal with mean $\mu$ and variance $\sigma^2/n$, and the random
variable
$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$$
is normal with mean 0 and variance 1.
The confidence interval for $\mu$ is
$$\mathrm{CONF}\left\{ \bar{y} - Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{y} + Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \right\}$$
A vendor submits lots of fabric to a textile manufacturer.
The manufacturer wants to know if the lot average breaking
strength exceeds 200 psi. If so, she wants to accept the lot.
Past experience indicates that a reasonable value for the
variance of breaking strength is 100 (psi)^2.
Four specimens are randomly selected, and the average
breaking strength observed is $\bar{y} = 214$ psi.
The hypotheses to be tested are:
$$H_0: \mu = 200 \qquad H_1: \mu > 200$$
This is a one-sided alternative hypothesis.
The value of the test statistic is:
$$Z_0 = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} = \frac{214 - 200}{10/\sqrt{4}} = 2.80$$
If the confidence level of 95% is chosen, i.e. probability of type I error $\alpha = 0.05$, we
find $Z_\alpha = Z_{0.05} = 1.645$.
Since $Z_0 = 2.80 > 1.645$, the difference is significant: H0 is rejected and we conclude that
the lot average breaking strength exceeds 200 psi.
Thus, we accept the lot.
The interval $\bar{y} \pm 1.645\,\sigma/\sqrt{n}$ is 205.8 to 222.2 psi. Its lower limit, 205.8,
is the one-sided 95% lower confidence bound for $\mu$ (read as a two-sided interval it has
90% confidence). Clearly, 200 is outside the interval.
The P-value is 0.0026.
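The numbers in the fabric example can be checked with a few lines of Python. This is a minimal sketch using the standard library's `statistics.NormalDist`; the variable names are illustrative, and the two-sided interval at the end is added only for comparison.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 200.0           # hypothesized lot mean (psi)
sigma = sqrt(100.0)   # known standard deviation (psi)
n = 4                 # number of specimens
y_bar = 214.0         # observed average (psi)
alpha = 0.05          # probability of type I error

std_norm = NormalDist()           # standard normal distribution
se = sigma / sqrt(n)              # standard error of the average
z0 = (y_bar - mu0) / se           # test statistic
z_crit = std_norm.inv_cdf(1 - alpha)   # one-sided critical value
reject = z0 > z_crit
p_value = 1 - std_norm.cdf(z0)    # one-sided P-value

lower_bound = y_bar - z_crit * se              # one-sided 95% lower bound
z_two = std_norm.inv_cdf(1 - alpha / 2)
ci95 = (y_bar - z_two * se, y_bar + z_two * se)   # two-sided 95% interval

print(f"Z0 = {z0:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
print(f"P-value = {p_value:.4f}")
print(f"one-sided lower bound = {lower_bound:.1f}")
print(f"two-sided 95% interval = ({ci95[0]:.1f}, {ci95[1]:.1f})")
```

The limits 205.8 and 222.2 come from 214 ± 1.645(5), which uses the one-sided critical value; the two-sided 95% interval uses 1.960 and is slightly wider, about 204.2 to 223.8. In both cases 200 lies below the interval.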
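If the variance is unknown, the procedure is the same as before, but we use:
- the t distribution instead of the normal distribution
- the sample standard deviation S instead of $\sigma$
The test statistic is
$$t_0 = \frac{\bar{y} - \mu_0}{S/\sqrt{n}}$$
at (n-1) degrees of freedom.
The confidence interval is
$$\mathrm{CONF}\left\{ \bar{y} - t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{y} + t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}} \right\}$$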
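As an illustration of the unknown-variance case, the sketch below computes the t statistic and the t-based confidence interval from raw data. The four breaking strengths are hypothetical values invented for this example (the slides give only the average), and the critical value t_{0.025,3} = 3.182 is taken from a standard t table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t0, degrees of freedom) for testing H0: mu = mu0."""
    n = len(sample)
    t0 = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
    return t0, n - 1

def t_confidence_interval(sample, t_crit):
    """Two-sided interval y_bar +/- t_crit * S / sqrt(n)."""
    n = len(sample)
    half_width = t_crit * stdev(sample) / sqrt(n)
    return mean(sample) - half_width, mean(sample) + half_width

# Hypothetical breaking strengths (psi); NOT data from the slides.
sample = [206.0, 212.0, 218.0, 220.0]

t0, dof = one_sample_t(sample, mu0=200.0)
T_CRIT = 3.182   # t_{0.025, 3} from a t table
low, high = t_confidence_interval(sample, T_CRIT)

print(f"t0 = {t0:.3f} with {dof} degrees of freedom")
print(f"95% confidence interval: ({low:.1f}, {high:.1f})")
```

Because S replaces the known sigma, the interval is based on the wider t distribution, which matters most when n is small.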
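If Variance Known
The test statistic for comparing two means is
$$Z_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
The confidence interval is
$$\mathrm{CONF}\left\{ \bar{y}_1 - \bar{y}_2 - Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{y}_1 - \bar{y}_2 + Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right\}$$
Here $\bar{y}_1 - \bar{y}_2$ is normal with variance $\sigma_{\bar{y}_1 - \bar{y}_2}^2 = \sigma_1^2/n_1 + \sigma_2^2/n_2$,
just as a single average $\bar{y}$ is normal with variance $\sigma_{\bar{y}}^2 = \sigma^2/n$.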
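If Variance Unknown, but $\sigma_1^2 = \sigma_2^2$
The test statistic is
$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
where the pooled variance is
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{(n_1 - 1) + (n_2 - 1)}$$
and the degrees of freedom are $\nu = n_1 + n_2 - 2$.
Choose a confidence level, usually 95%, then find the critical t value at the associated degrees of
freedom, i.e. $t_{\alpha/2,\nu}$.
If $|t_0| > t_{\alpha/2,\nu}$, we have enough reason to reject the null hypothesis and conclude that the
two methods differ significantly.
Alternatively, calculate the P-value, i.e. the smallest significance level at which the null hypothesis would be rejected.
Or set a confidence interval and reject the null hypothesis if 0 is not included in the interval.
Compared with the known-variance case, the variance $\sigma^2/n$ of a normal average is replaced by the
estimate $S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$ for $\bar{y}_1 - \bar{y}_2$, and the normal distribution by the
t distribution with $n_1 + n_2 - 2$ degrees of freedom.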
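If Variance Unknown, $\sigma_1^2 \ne \sigma_2^2$
The test statistic is
$$t_0 = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
with degrees of freedom
$$\nu = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$$
Compared with the known-variance case, the variance $\sigma^2/n$ of a normal average is replaced by the
estimate $\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}$ for $\bar{y}_1 - \bar{y}_2$, and the normal distribution by the
t distribution with $\nu$ degrees of freedom.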
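The unequal-variance statistic and its approximate degrees of freedom are easy to get wrong by hand, so a small sketch may help. The function below works from summary statistics; the numbers passed to it are hypothetical and serve only to exercise the formula.

```python
from math import sqrt

def unequal_variance_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t0, nu) for two samples with unknown, unequal variances.

    var1 and var2 are the sample variances S1^2 and S2^2.
    """
    a = var1 / n1    # S1^2 / n1
    b = var2 / n2    # S2^2 / n2
    t0 = (mean1 - mean2) / sqrt(a + b)
    nu = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t0, nu

# Hypothetical summary statistics, not taken from the slides.
t0, nu = unequal_variance_t(mean1=10.0, var1=4.0, n1=10,
                            mean2=8.0, var2=16.0, n2=10)
print(f"t0 = {t0:.3f}, nu = {nu:.2f}")

# With equal sample variances and equal sample sizes, nu reduces to
# n1 + n2 - 2, the degrees of freedom of the pooled test.
_, nu_equal = unequal_variance_t(0.0, 1.0, 10, 0.0, 1.0, 10)
print(f"nu for equal variances = {nu_equal:.1f}")
```

In practice the computed degrees of freedom are usually not an integer and are rounded down before looking up the critical t value.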
Tension bond strength of portland cement
mortar is an important characteristic of the
product. An engineer is interested in
comparing the strength of a modified
formulation, in which polymer latex
emulsions have been added during mixing, to
the strength of the unmodified mortar. He
collected 10 observations (Table 2.1).
Plot the dot diagram.
Plot the box-and-whisker plot.
Are the two formulations really different?
Or is the observed difference perhaps the
result of sampling fluctuation, so that the two
formulations are really identical?
Blocking is a design technique used to improve the precision with which the
comparisons among the factors of interest are made. Often blocking is used to reduce
or eliminate the variability transmitted from nuisance factors, i.e.
factors that may influence the experimental response but in which we are not
interested.
The term block refers to a relatively homogeneous experimental unit, and the block
represents a restriction on complete randomization because the treatment
combinations are only randomized within the block. Blocking is carried out by making
comparisons within matched pairs of experimental material.
The confidence interval based on paired analysis is usually much narrower than that
from the independent analysis. This illustrates the noise reduction property of
blocking.
Statistical model for complete randomization:
$$y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2; \quad j = 1, 2, \dots, n_i$$
with $(2n - 2)$ degrees of freedom when $n_1 = n_2 = n$.
Statistical model with blocking:
$$y_{ij} = \mu_i + \beta_j + \varepsilon_{ij}, \qquad i = 1, 2; \quad j = 1, 2, \dots, n$$
with only $(n - 1)$ degrees of freedom, where $n$ is the number of pairs.
The analysis uses the paired differences:
$$d_j = y_{1j} - y_{2j}$$
The test statistic:
$$t_0 = \frac{\bar{d}}{S_d/\sqrt{n}}$$
The confidence interval for 2-sided test:
$$\bar{d} \pm t_{\alpha/2,\,n-1}\,\frac{S_d}{\sqrt{n}}$$
Consider a hardness testing machine that presses a rod with a pointed tip into a
metal specimen with a known force. Two different tips are available for this machine,
and it is suspected that one tip produces different hardness readings than the other.
The test could be performed as follows: a number of metal specimens could
be randomly selected. Half are tested by tip 1 and the other half by tip 2.
However, the metal specimens might be cut from different bar stock that was not exactly
the same in hardness. To protect against this possibility, an alternative
experimental design should be considered: divide each specimen into two parts and
randomly assign each tip to one part of each specimen.
- Use the paired data to determine a 95% confidence interval for the difference
- What if we use pooled or independent analysis?
Specimen Tip 1 Tip 2
1 7 6
2 3 3
3 3 5
4 4 3
5 8 8
6 3 2
7 2 4
8 9 9
9 5 4
10 4 5
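Both questions can be answered directly from the table. The sketch below runs the paired analysis and, for comparison, the pooled (independent) analysis on the same data. The critical values t_{0.025,9} = 2.262 and t_{0.025,18} = 2.101 are taken from a standard t table; variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

tip1 = [7, 3, 3, 4, 8, 3, 2, 9, 5, 4]
tip2 = [6, 3, 5, 3, 8, 2, 4, 9, 4, 5]
n = len(tip1)   # number of specimens (pairs)

# Paired analysis: work with the within-specimen differences.
d = [a - b for a, b in zip(tip1, tip2)]
d_bar, s_d = mean(d), stdev(d)
t0_paired = d_bar / (s_d / sqrt(n))
T_9 = 2.262                          # t_{0.025, 9}, n - 1 = 9
half_paired = T_9 * s_d / sqrt(n)
ci_paired = (d_bar - half_paired, d_bar + half_paired)

# Pooled analysis: treat the two columns as independent samples.
sp = sqrt(((n - 1) * variance(tip1) + (n - 1) * variance(tip2)) / (2 * n - 2))
se_pooled = sp * sqrt(1 / n + 1 / n)
t0_pooled = (mean(tip1) - mean(tip2)) / se_pooled
T_18 = 2.101                         # t_{0.025, 18}, 2n - 2 = 18
half_pooled = T_18 * se_pooled
ci_pooled = (d_bar - half_pooled, d_bar + half_pooled)

print(f"paired: t0 = {t0_paired:.2f}, CI = {d_bar:.2f} +/- {half_paired:.2f}")
print(f"pooled: t0 = {t0_pooled:.2f}, CI = {d_bar:.2f} +/- {half_pooled:.2f}")
```

The paired interval is roughly -0.10 ± 0.86, while the pooled interval is roughly -0.10 ± 2.18. Neither analysis finds a significant difference between the tips, but the paired interval is far narrower even though it has only 9 degrees of freedom instead of 18.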
In some experiments it is the comparison of variability in the data that is important.
For example, in chemical laboratories, we may wish to compare the variability of two
analytical methods.
Unlike the tests on means, the procedures for tests on variances are rather sensitive
to the normality assumption.
Suppose we wish to test the hypothesis whether or not the variance of a normal
population equals a constant, viz. $\sigma_0^2$. The test statistic is:
$$\chi_0^2 = \frac{SS}{\sigma_0^2} = \frac{(n-1)S^2}{\sigma_0^2}$$
The appropriate distribution for $\chi_0^2$ is the chi-square distribution with (n-1) degrees of
freedom. The confidence interval for $\sigma^2$ is
$$\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}$$
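As a sketch of how the chi-square statistic and interval are evaluated, the code below applies them to the ten tip 1 hardness readings from the earlier example. The hypothesized variance of 4.0 is a hypothetical value chosen only for illustration, and the chi-square critical values for 9 degrees of freedom (19.02 and 2.70) are taken from a standard table rather than computed.

```python
from statistics import variance

def chi_square_statistic(sample, sigma0_sq):
    """Return (n - 1) * S^2 / sigma0^2 for testing H0: sigma^2 = sigma0^2."""
    n = len(sample)
    return (n - 1) * variance(sample) / sigma0_sq

def variance_confidence_interval(sample, chi2_upper, chi2_lower):
    """Interval for sigma^2; chi2_upper is the alpha/2 point and
    chi2_lower the 1 - alpha/2 point, both with n - 1 degrees of freedom."""
    ss = (len(sample) - 1) * variance(sample)   # corrected sum of squares
    return ss / chi2_upper, ss / chi2_lower

readings = [7, 3, 3, 4, 8, 3, 2, 9, 5, 4]   # tip 1 hardness readings
SIGMA0_SQ = 4.0                      # hypothetical hypothesized variance
CHI2_UPPER, CHI2_LOWER = 19.02, 2.70   # chi-square table, 9 d.f., alpha = 0.05

chi0 = chi_square_statistic(readings, SIGMA0_SQ)
reject = chi0 > CHI2_UPPER or chi0 < CHI2_LOWER
low, high = variance_confidence_interval(readings, CHI2_UPPER, CHI2_LOWER)

print(f"chi0^2 = {chi0:.2f}, reject H0: {reject}")
print(f"95% confidence interval for sigma^2: ({low:.2f}, {high:.2f})")
```

The interval is not symmetric about S^2 because the chi-square distribution is skewed.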
Suppose we wish to test equality of the variances of two normal populations.
If independent random samples of size n1 and n2 are taken from populations 1 and 2,
respectively, the test statistic for
$$H_0: \sigma_1^2 = \sigma_2^2 \qquad H_1: \sigma_1^2 \ne \sigma_2^2$$
is the ratio of the sample variances:
$$F_0 = \frac{S_1^2}{S_2^2}$$
The appropriate distribution for $F_0$ is the F distribution with $(n_1 - 1)$ numerator degrees
of freedom and $(n_2 - 1)$ denominator degrees of freedom. The null hypothesis would be
rejected if $F_0 > F_{\alpha/2,\,n_1-1,\,n_2-1}$ or if $F_0 < F_{1-\alpha/2,\,n_1-1,\,n_2-1}$.
The confidence interval for $\sigma_1^2/\sigma_2^2$ is
$$\frac{S_1^2}{S_2^2}\,F_{1-\alpha/2,\,n_2-1,\,n_1-1} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_1^2}{S_2^2}\,F_{\alpha/2,\,n_2-1,\,n_1-1}$$
Note:
$$F_{1-\alpha/2,\,n_1-1,\,n_2-1} = \frac{1}{F_{\alpha/2,\,n_2-1,\,n_1-1}}$$
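To illustrate the F test, the sketch below compares the variances of the tip 1 and tip 2 hardness readings from the earlier example, treating the two columns as independent samples (an assumption made only for this illustration, since the data were collected in pairs). The critical value F_{0.025,9,9} = 4.03 is taken from a standard F table; the lower critical value follows from the reciprocal relation in the note.

```python
from statistics import variance

tip1 = [7, 3, 3, 4, 8, 3, 2, 9, 5, 4]
tip2 = [6, 3, 5, 3, 8, 2, 4, 9, 4, 5]

s1_sq, s2_sq = variance(tip1), variance(tip2)   # sample variances
f0 = s1_sq / s2_sq                              # test statistic

F_UPPER = 4.03           # F_{0.025, 9, 9} from an F table
F_LOWER = 1 / F_UPPER    # F_{0.975, 9, 9} via the reciprocal relation

reject = f0 > F_UPPER or f0 < F_LOWER
# Both samples have 9 degrees of freedom, so the same two critical
# values appear in the confidence interval for sigma1^2 / sigma2^2.
ci = (f0 * F_LOWER, f0 * F_UPPER)

print(f"F0 = {f0:.3f}, reject H0: {reject}")
print(f"95% confidence interval for the variance ratio: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval contains 1, which agrees with not rejecting the hypothesis of equal variances.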
Probability plotting is a graphical technique for determining whether sample data
conform to a hypothesized distribution based on a subjective visual examination of
the data.
To construct a probability plot, the observations in the sample are first ranked from
smallest to largest. That is, the sample $y_1, y_2, \dots, y_n$ is arranged as
$y_{(1)}, y_{(2)}, \dots, y_{(n)}$, where $y_{(1)}$ is the smallest observation and $y_{(n)}$ the largest.
The ordered observations $y_{(j)}$ are then plotted against their observed cumulative
frequency (j-0.5)/n.
The cumulative frequency scale has been arranged so that if the hypothesized
distribution adequately describes the data, the plotted points will fall approximately
along a straight line. Judging whether they do is usually subjective.
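The construction can be sketched in Python for the case where the hypothesized distribution is normal (an assumption of this example). Instead of using specially scaled paper, each cumulative frequency (j-0.5)/n is converted to the corresponding standard normal quantile, so that normal data plot as a straight line on ordinary axes. The sample is the first row of the tensile strength table; the function name is illustrative.

```python
from statistics import NormalDist

def normal_plot_points(sample):
    """Return (y_(j), (j - 0.5)/n, z_j) for each ordered observation,
    where z_j is the standard normal quantile of (j - 0.5)/n."""
    ordered = sorted(sample)            # rank from smallest to largest
    n = len(ordered)
    std_norm = NormalDist()
    points = []
    for j, y in enumerate(ordered, start=1):
        p = (j - 0.5) / n               # observed cumulative frequency
        points.append((y, p, std_norm.inv_cdf(p)))
    return points

# First row of the tensile strength table (lb/in^2).
sample = [320, 380, 340, 410, 380, 340, 360, 350, 320, 370]

for y, p, z in normal_plot_points(sample):
    print(f"{y:>4} {p:>5.2f} {z:>6.2f}")
```

Plotting the first column against the third and checking for a roughly straight line is the visual test described above.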
An experiment is a test or a series of tests.
Experiments are used widely in the engineering world:
- Process characterization & optimization
- Evaluation of material properties
- Product design & development
- Component & system tolerance determination
All experiments are designed experiments; some are poorly designed, some are well-designed.
Randomization
- Running the trials in an experiment in random order
- Notion of balancing out effects of lurking variables
Replication
- Sample size (improving precision of effect estimation, estimation of error or background noise)
- Replication versus repeat measurements? (see page 13)
Blocking
- Dealing with nuisance factors
Best-guess experiments
- Used a lot
- More successful than you might suspect, but there are disadvantages
One-factor-at-a-time (OFAT) experiments
- Sometimes associated with the scientific or engineering method
- Devastated by interaction, also very inefficient
Statistically designed experiments
- Based on Fisher's factorial concept
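Randomization and replication are simple to put into practice. The sketch below builds a replicated list of runs and shuffles it into a random run order; the treatment labels, the number of replicates, and the function name are hypothetical.

```python
import random

def randomized_run_order(treatments, replicates, seed=None):
    """Return every treatment repeated `replicates` times, in random order."""
    runs = [t for t in treatments for _ in range(replicates)]
    rng = random.Random(seed)   # a seed makes the order reproducible
    rng.shuffle(runs)
    return runs

# Hypothetical example: three treatments, three replicates each.
order = randomized_run_order(["A", "B", "C"], replicates=3, seed=1)
for run_number, treatment in enumerate(order, start=1):
    print(run_number, treatment)
```

Recording the seed keeps the run sheet reproducible while still letting chance, not the experimenter, decide the order.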
