
Stat 110, Lecture 11

Sampling Distributions, Estimation,
and Hypothesis Testing (I)

bheavlin@stat.stanford.edu
Statistics

[Diagram: a taxonomy of statistics by how much data is available]
  No data           → Probability
  Some data         → Inferential Statistics: sampling distributions,
                      hypothesis testing, estimation
  Way too much data → Descriptive Statistics


Some topics

• that n–1 issue
• normal approximation to binomial
• p-values are uniform!
• classic sampling distributions based on the normal


That n–1 issue in the denominator

Population: {1,2,3}, sampled with replacement.

Population mean = 2
Population variance = [(1–2)² + (2–2)² + (3–2)²]/3
                    = [1 + 0 + 1]/3 = 2/3
The denominator N = 3 is the population size.

All samples of size n = 2:

  sample   ave   var(n)   var(n–1)
  1,1      1.0   0.00     0.00
  1,2      1.5   0.25     0.50
  1,3      2.0   1.00     2.00
  2,1      1.5   0.25     0.50
  2,2      2.0   0.00     0.00
  2,3      2.5   0.25     0.50
  3,1      2.0   1.00     2.00
  3,2      2.5   0.25     0.50
  3,3      3.0   0.00     0.00
  ave      2.0   3/9=1/3  6/9=2/3

Pattern: a=b in 3 samples, each with var(n)=0; |a–b|=1 in 4 samples,
each with var(n)=1/4; |a–b|=2 in 2 samples, each with var(n)=1.
So the average of var(n) is (3×0 + 4×1/4 + 2×1)/9 = 1/3, which
underestimates the population variance 2/3, while the average of
var(n–1) is exactly 2/3. (The variance of the nine sample averages
is 1/3.)
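A quick check of this table (a minimal Python sketch, not from the
original slides, assuming numpy is available): enumerate all nine
equally likely samples and average the two variance estimates.

from itertools import product
import numpy as np

pop = [1, 2, 3]
vn, vn1 = [], []
for a, b in product(pop, repeat=2):      # all 9 with-replacement samples
    xbar = (a + b) / 2
    ss = (a - xbar)**2 + (b - xbar)**2   # numerator of the sample variance
    vn.append(ss / 2)                    # divide by n = 2
    vn1.append(ss / 1)                   # divide by n - 1 = 1
print(np.mean(vn))   # 0.333... : biased low relative to 2/3
print(np.mean(vn1))  # 0.666... : matches the population variance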


Random sampling

If n elements are selected from a population in such a way that every
set of n elements in the population has an equal probability of being
selected, the n elements are said to be a random sample.


Residuals

Residual(i) = X(i) – X̄

Σi Residual(i) = 0, so the residuals live in n–1 dimensions.

[Figure: residuals when n=2. Plotting x1 – ave against x2 – ave, all
points fall on the line x2 – ave = –(x1 – ave), a 1-dimensional
subspace.]


The population of averages

[Figure: stem-and-leaf plots of the population {1,2,3} (variance = 2/3)
and of the averages of samples of n=2, the sampling distribution of
averages of two (variance = 1/3, std error = √(1/3) = 0.58).]

The precision effect: averages are less variable than the population
they are drawn from.

The standard deviation of a sampling distribution is the corresponding
statistic’s standard error.
The sampling distributions of the variance and standard deviation

[Figure: two stem-and-leaf plots. Left: for with-replacement sampling
from {1,2,3}, the sampling distribution of the variance when n=2,
taking the values 0 (3/9), 0.5 (4/9), and 2 (2/9), with mean = 2/3.
Right: the sampling distribution of the standard deviation when n=2,
taking the values 0, 0.71, and 1.41.]


Why n–1?

Lemma: if {X(i)} are independent with mean μ and variance σ²,
then Var(Σi X(i)) = n σ².

Numerator of the sample variance, writing z(i) = X(i)–μ:

Σi [X(i)–X̄]² = Σi [(X(i)–μ) – (X̄–μ)]² ≡ Σi (z(i)–z̄)²
  = Σi [(1–1/n) z(i) – Σj≠i z(j)/n]²
  = Σi (1–1/n)² z(i)² + Σi [Σj≠i z(j)/n]²
      – 2 Σi Σj≠i (1–1/n) z(i) z(j)/n
Why n–1, continued…

Expectation of the numerator of the sample variance:

E( Σi [X(i)–X̄]² )
  = Σi (1–1/n)² E(z(i)²) + Σi E([Σj≠i z(j)/n]²)
      – 2 Σi Σj≠i (1–1/n) E(z(i) z(j))/n
  = Σi (1–1/n)² σ² + Σi E([Σj≠i z(j)]²)/n²    (cross terms vanish:
                                               E(z(i))=0, independence)
  = n(1–1/n)² σ² + Σi (n–1)σ²/n²              (lemma: var of sums)
  = n(1–1/n)² σ² + n(n–1)σ²/n²
  = ((n–1)²/n)σ² + ((n–1)/n)σ² = σ²×((n–1)/n)×((n–1)+1)
  = σ²×(n–1)
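The algebra can be sanity-checked by simulation (a minimal sketch,
assuming numpy; the normal distribution here is just a convenience,
since the result needs only independence):

import numpy as np

rng = np.random.default_rng(0)
n, sigma, reps = 5, 2.0, 200_000
x = rng.normal(0.0, sigma, size=(reps, n))
ss = ((x - x.mean(axis=1, keepdims=True))**2).sum(axis=1)  # Σi [X(i)-X̄]²
print(ss.mean())           # ≈ 16 = (n-1)σ²
print((n - 1) * sigma**2)  # 16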
Comments on why n–1

E{ Σi [X(i)–X̄]²/(n–1) } = σ², so the sample variance is an unbiased
estimator of σ².

This unbiasedness does not depend on assuming a Gaussian or other
distribution, just independence, E(X)=μ, and Var(X)=σ².

Σi [X(i)–X̄] = 0, so the n residuals live in n–1 linear dimensions.
This dimensionality issue shows up in much theory, e.g. combining
variance estimates.


Four sides of the same coin

1. If X, Y are independent,
   Var(X+Y) = Var(X) + Var(Y).
2. {X(i)} independent, each with variance σ²:
   Var(Σi X(i)) = n σ².
3. {X(i)} independent, each with variance σ²:
   Var(X̄) = σ²/n.
4. If {X(i)} are independent with mean μ and variance σ², the sample
   variance (with n–1 in the denominator) is unbiased.


Side 5!

How do you get the weights of an apple and an orange?

C. Weigh the apple: Wa, with Var(Wa) = σ².
   Weigh the orange: Wo, with Var(Wo) = σ². (sniff)

D. Weigh the apple+orange together: W+.
   Weigh the apple–orange difference: W–.
   Then (W+ + W–)/2 = apple and (W+ – W–)/2 = orange, with
   Var((W+ ± W–)/2) = (1/2)² × (σ² + σ²) = σ²/2.
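A simulation version of the weighing trick (a hedged sketch; the true
fruit weights here are made-up numbers):

import numpy as np

rng = np.random.default_rng(1)
sigma, reps = 1.0, 200_000
apple, orange = 3.0, 2.0                        # hypothetical true weights
# Design C: weigh the apple by itself
wa = apple + rng.normal(0, sigma, reps)
# Design D: weigh the sum and the difference, then solve for the apple
wplus  = apple + orange + rng.normal(0, sigma, reps)
wminus = apple - orange + rng.normal(0, sigma, reps)
apple_d = (wplus + wminus) / 2
print(wa.var())       # ≈ σ² = 1.0
print(apple_d.var())  # ≈ σ²/2 = 0.5: two weighings, both fruits, half the variance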
The sampling distribution of p-values

Assume an underlying continuous distribution with CDF FY, and let
W = FY(Y). Then

P(W ≤ w) = P(Y ≤ FY⁻¹(w)) = FY(FY⁻¹(w)) = w,

so W is uniform on (0,1).

[Figure: w = CDFY(y) plotted against y, rising from 0 to 1.]
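To see the uniformity empirically, simulate data under a null
hypothesis and look at the resulting p-values (a minimal sketch using
scipy’s one-sample t-test; any continuous test statistic would do):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
reps, n = 10_000, 20
# Data generated under the null (the true mean really is 0)
pvals = np.array([stats.ttest_1samp(rng.normal(0, 1, n), 0.0).pvalue
                  for _ in range(reps)])
print(np.mean(pvals < 0.05))  # ≈ 0.05
print(np.mean(pvals < 0.50))  # ≈ 0.50, as uniformity predicts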


An application: “Fisher’s p-value combination method”

Two inconclusive experiments give p-values of 0.06 and 0.08. Can these
p-values be combined into an overall p-value?

If pv is uniform(0,1), then –ln(pv) is exponential with mean 1. The
sum of two independent exponentials(λ=1) has a gamma(α=2, λ=1)
distribution.

–ln(0.06) – ln(0.08) = 2.81 + 2.53 = 5.34
P(gamma(α=2, λ=1) ≥ 5.34) = 0.03 = the combined p-value

[Figure: gamma(α=2) density with the 3% upper tail beyond 5.34 shaded.]
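The same computation with scipy (a sketch; scipy’s gamma takes the
shape α directly, with scale = 1/λ = 1 by default):

import numpy as np
from scipy import stats

p1, p2 = 0.06, 0.08
x = -np.log(p1) - np.log(p2)        # 5.34: sum of two exponential(1)'s under the null
print(stats.gamma.sf(x, a=2))       # ≈ 0.030, the combined p-value
# Equivalent classical form: 2x is chi-square with 4 df
print(stats.chi2.sf(2 * x, df=4))   # same ≈ 0.030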
Normal approximation to binomial

Let {z(i)} be independent Bernoulli(p).
Then S(n) = Σi≤n z(i) has a binomial(n, p) distribution, and the
central limit theorem applies:
• μ = E(S(n)) = np,  σ² = Var(S(n)) = np(1–p)

The CLT is a better approximation when:
• 0 ≤ np ± 2×√(np(1–p)) ≤ n, or
• np ≥ 4 and n(1–p) ≥ 4


Improving the binomial-normal approximation

Let S denote a binomial(n, p) and Z a standard normal.

lower tail:
P(S ≤ a) ≈ P( Z ≤ (a+0.5–np)/√(np(1–p)) )
upper tail:
P(S ≥ b) ≈ P( Z ≥ (b–0.5–np)/√(np(1–p)) )
interval:
P(a ≤ S ≤ b) ≈ P( (a–0.5–np)/√(np(1–p)) ≤ Z ≤ (b+0.5–np)/√(np(1–p)) )
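A short scipy check of these formulas against the exact binomial,
using the n=13, p=0.3 case tabulated on the next slide (a sketch,
assuming scipy is available):

import numpy as np
from scipy import stats

n, p = 13, 0.3
mu, sd = n * p, np.sqrt(n * p * (1 - p))     # 3.9 and 1.652
for k in range(4):
    exact     = stats.binom.cdf(k, n, p)
    corrected = stats.norm.cdf((k + 0.5 - mu) / sd)
    naive     = stats.norm.cdf((k - mu) / sd)
    print(k, round(exact, 3), round(corrected, 3), round(naive, 3))
# k=3 row: exact 0.421, corrected 0.404, naive 0.293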


Continuity correction for n=13, p=0.3

  k   bin pdf   bin CDF   cont’ty corrected   naïve normal   binCDF – pdf/2
  0   0.010     0.010     0.020               0.009          0.005
  1   0.054     0.064     0.073               0.040          0.037
  2   0.139     0.202     0.198               0.125          0.133
  3   0.218     0.421     0.404               0.293          0.312
  4   0.234     0.654     0.642               0.524          0.537
  5   0.180     0.835     0.834               0.747          0.744
  6   0.103     0.938     0.942               0.898          0.886
  7   0.044     0.982     0.985               0.970          0.960
  8   0.014     0.996     0.997               0.993          0.989
  9   0.003     0.999     1.000               0.999          0.998

[Figure: two panels over k = 0 to 15. Top: binomial vs the uncorrected
normal approximation. Bottom: binomial vs the continuity-corrected
normal approximation, which tracks the binomial CDF much more closely.]


Combining (pooling) standard deviations

1. Suppose we want to combine a few standard deviations into a single
   estimate, e.g. an S-chart centerline, etc.
2. Because the sample variances are unbiased estimates, it makes sense
   to average them.
3. Suppose the different standard deviations have different associated
   sample sizes. It makes sense to give those with larger sample sizes
   (n) more weight.
4. In fact, the correct weights are the respective (n–1) values of
   each variance:

   s1+2² = [ (n1–1)s1² + (n2–1)s2² ] / [ (n1–1) + (n2–1) ]

   a weighted RMS of s1 and s2, also called the pooled variance.
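A pooled-standard-deviation helper (a minimal sketch; the test numbers
are the two group standard deviations from the overetch example later
in this lecture):

import numpy as np

def pooled_sd(s, n):
    """Pool standard deviations s with sample sizes n, weighting each s² by its df = n - 1."""
    s, n = np.asarray(s, float), np.asarray(n, float)
    df = n - 1
    return np.sqrt(np.sum(df * s**2) / np.sum(df)), int(np.sum(df))

sp, df = pooled_sd([15.53, 10.09], [5, 5])
print(sp, df)   # ≈ 13.09 with 8 degrees of freedom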
df

• Degrees of freedom (df): the (n–1) in the denominator of the sample
  variance formula. This is the essential sample size of a standard
  deviation or variance.
• For a standard deviation estimated by a weighted RMS, the
  denominator, e.g. (n1–1)+(n2–1), is the associated degrees of
  freedom.
• Chi-square: for normally distributed data, (n–1)s²/σ² has a
  chi-square distribution with (n–1) degrees of freedom (df).
  For two independent chi-squares χ²1 and χ²2 with ν1 and ν2 df:
  χ²1 + χ²2 is chi-square with ν1+ν2 df.
• So the sampling distribution of the sample variance is proportional
  to a chi-square distribution.
Making the chi-square useful

Example 7.12: n=10. The specification is s ≤ 0.05 oz; the true
σ = 0.03 oz. What is the probability of observing s ≤ 0.05 oz?

P(s≤0.05) = P(s²≤0.0025) = P((n–1)s² ≤ 9×0.0025)
  = P((n–1)s²/σ² ≤ 9×0.0025/0.0009) = P( χ²ν=9 ≤ 25 )
  = 0.997

n=5: P(s≤0.05) = … = P( χ²ν=4 ≤ 11.11 ) = 0.975
n=4: P(s≤0.05) = … = P( χ²ν=3 ≤ 8.333 ) = 0.960
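These probabilities come straight from the chi-square CDF (a scipy
sketch):

from scipy import stats

sigma2, spec = 0.03**2, 0.05
for n in (10, 5, 4):
    q = (n - 1) * spec**2 / sigma2            # 25, 11.11, 8.333
    print(n, round(stats.chi2.cdf(q, df=n - 1), 3))
# prints 0.997, 0.975, 0.960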


A more realistic question

n=10, s=0.03 oz. What would a reasonable range for σ be?

0.95 = P( χ²ν=9(0.025) ≤ (n–1)s²/σ² ≤ χ²ν=9(0.975) )
     = P( 1/χ²ν=9(0.975) ≤ σ²/[(n–1)s²] ≤ 1/χ²ν=9(0.025) )
     = P( (n–1)s²/χ²ν=9(0.975) ≤ σ² ≤ (n–1)s²/χ²ν=9(0.025) )
     = P( 9×0.0009/19.0228 ≤ σ² ≤ 9×0.0009/2.70039 )
     = P( 0.00043 ≤ σ² ≤ 0.00300 )
     = P( 0.0206 ≤ σ ≤ 0.0548 )
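The same interval via chi-square quantiles (a scipy sketch):

import numpy as np
from scipy import stats

n, s = 10, 0.03
df = n - 1
lo_q, hi_q = stats.chi2.ppf([0.025, 0.975], df)       # 2.70039, 19.0228
print(np.sqrt([df * s**2 / hi_q, df * s**2 / lo_q]))  # ≈ [0.0206, 0.0548]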
Confidence intervals

Based on this data, n=10 and s=0.03 oz, the statement
0.95 = P( 0.00043 ≤ σ² ≤ 0.00300 )
     = P( 0.0206 ≤ σ ≤ 0.0548 )
is called a 95 percent confidence interval.

One-sided 95% confidence intervals:
0.95 = P( σ ≤ 0.0494 )
0.95 = P( σ ≥ 0.0219 )


Confidence intervals

“Based on this data, there is a 95 percent chance that the (true)
value of σ is between 0.0206 and 0.0548 ounces.”

“The range of σ that is consistent with the observed data is between
0.0206 and 0.0548 ounces.”

[Figure: a horseshoe tossed at a stake, the stake labeled “true,
unknown value of σ”.] In confidence intervals, the interval
(horseshoe) is what is random, not the parameter (stake).
3 confidence intervals for σ

1. Exact, based on chi-square:
   (0.0206 to 0.0548), exact confidence = 0.95

2. Normal approximation based on moments, s×[1 ± 2/√(2ν)]:
   (0.0159 to 0.0441), exact confidence = 0.90

3. Normal approximation of the reciprocal, (1/s)×[1 ± 2/√(2ν)],
   i.e. s/[1 ± 2/√(2ν)]:
   (0.0204 to 0.0568), exact confidence = 0.96

The approximations are ±2 standard errors.
Tukey’s dilemma

Rule of thumb: to determine a standard deviation to one significant
digit (10% precision, 95% confidence), you need n ≈ 200. (The standard
error of s is roughly s/√(2ν), so 2/√(2ν) ≤ 0.10 requires ν ≈ 200.)

Also: 20% precision, 95% confidence, needs n ≈ 50.

But 200 df is enough data to detect non-normality.
[Photo: John W. Tukey]


Student’s t

z: normal with mean 0 and standard deviation 1.
χ²: an independent chi-square with ν degrees of freedom.

t = z/√(χ²/ν) is distributed as a Student’s t distribution with
ν degrees of freedom.

[Figure: the standard normal and t densities plotted in z-score units
from –5 to 5; the t-dist has heavier tails. Photo: William Gosset,
“Student”.]
A process monitoring problem

n=4, s=0.03 oz, μ is either 0 (on target) or +0.02 oz from target.
Degrees of freedom = 3. The specification requires |x̄| ≤ 0.03. What
is the probability of observing this?

P( –0.03 ≤ x̄ ≤ +0.03 ) = P( –.03–μ ≤ x̄–μ ≤ .03–μ )
  = P( (–.03–μ)/(s/√n) ≤ (x̄–μ)/(s/√n) ≤ (.03–μ)/(s/√n) )
  = P( (–.03–μ)/(s/√n) ≤ t ≤ (.03–μ)/(s/√n) )
  = P( (–.03–μ)/0.015 ≤ t ≤ (.03–μ)/0.015 )
  = P( –2.00 ≤ t ≤ +2.00 | μ=0.00 ) = 0.861, or
  = P( –3.33 ≤ t ≤ +0.67 | μ=+0.02 ) = 0.701
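Both probabilities follow from the t CDF with 3 degrees of freedom
(a scipy sketch; as on the slide, s is treated as given):

from scipy import stats

n, s = 4, 0.03
se = s / n**0.5                   # 0.015
for mu in (0.00, 0.02):
    lo, hi = (-0.03 - mu) / se, (0.03 - mu) / se
    print(mu, round(stats.t.cdf(hi, df=3) - stats.t.cdf(lo, df=3), 3))
# ≈ 0.861 at μ = 0 and ≈ 0.701 at μ = +0.02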
Operating Characteristic

n=4, ν=3, s=0.03 oz.

[Figure: P(|x̄| ≤ 0.03) as a function of the true μ, for μ from –0.10
to +0.10; the curve peaks near 0.86 at μ=0 and falls toward 0 as μ
moves away from target.]
More realistic formulation

n=4, s=0.03 oz, x̄=+0.02 oz from target. From this data, what is a
reasonable range for μ?

0.95 = P( tν=3(0.025) ≤ t ≤ tν=3(0.975) )
     = P( tν=3(0.025) ≤ (x̄–μ)/(s/√n) ≤ tν=3(0.975) )
     = P( tν=3(0.025)×(s/√n) ≤ x̄–μ ≤ tν=3(0.975)×(s/√n) )
     = P( x̄–tν=3(0.975)×(s/√n) ≤ μ ≤ x̄–tν=3(0.025)×(s/√n) )
     = P( x̄–tν=3(0.975)×(s/√n) ≤ μ ≤ x̄+tν=3(0.975)×(s/√n) )
     = P( 0.02–3.182×(0.03/2) ≤ μ ≤ 0.02+3.182×(0.03/2) )
     = P( –0.0277 ≤ μ ≤ 0.0677 )
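The interval in a few lines of scipy (a sketch):

from scipy import stats

n, s, xbar = 4, 0.03, 0.02
tq = stats.t.ppf(0.975, df=n - 1)        # 3.182
se = s / n**0.5
print(xbar - tq * se, xbar + tq * se)    # ≈ -0.0277, 0.0677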
overetch yield experiment (reprised)

  lot   split            01-12   13-24   delta
  1     clearout 01-12   75      68      7
  2     clearout 01-12   45      61      –16
  3     clearout 01-12   81      79      2
  4     clearout 01-12   78      87      –9
  5     clearout 01-12   57      77      –20

mean  = –7.20                95% confidence interval
stdev = 11.52                = –7.2 ± 2.776×11.52/√5
t(df=4, 0.975) = 2.776       = –21.50 to +7.10
The ladder of uncertainty

To know the uncertainty of an average, we need to know/estimate the
underlying standard deviation.

When we do not know the population standard deviation, we need to
estimate it, so we must also cope with the uncertainty in the standard
deviation.

To estimate the uncertainty of the standard deviation, we need to
know/estimate the 3rd and 4th moments.

[Ladder diagram: averages → standard deviations → 3rd, 4th moments]

Student’s t cuts and contains the ladder of uncertainty.
One- vs Two-Sample Problems

one-sample problem
• Example: the 5 “split” lots, each paired to its personal control
  group.
• Ultimately depends on the deltas, one column of values.

two-sample problem
• One group’s values do not pair to any particular subset of the
  second group’s values.


Imagine as 10 different lots

  lot   etch++        lot   control
  1     75            6     68
  2     45            7     61
  3     81            8     79
  4     78            9     87
  5     57            10    77
  mean  = 67.2              74.4
  stdev = 15.53             10.09

What might random noise look like?
d = difference = 67.2 – 74.4 = –7.2

pooled stdev² = [4×15.53² + 4×10.09²] / [(5–1) + (5–1)]
              = 1372/8 = 171.5
pooled stdev  = 13.09 with df = 8
two-sample confidence interval

Var( x̄1 – x̄2 ) = σ²(1/n1 + 1/n2),
estimated by sp²(1/n1 + 1/n2) ≡ sd²,
where sp = 13.09, so sd = 8.2825.

0.95 = P( tν=8(0.025) ≤ t ≤ tν=8(0.975) )
     = P( tν=8(0.025) ≤ (d–δ)/sd ≤ tν=8(0.975) )
     = P( tν=8(0.025)×sd ≤ d–δ ≤ tν=8(0.975)×sd )
     = P( d–tν=8(0.975)×sd ≤ δ ≤ d–tν=8(0.025)×sd )
     = P( d–tν=8(0.975)×sd ≤ δ ≤ d+tν=8(0.975)×sd )
     = P( –7.2–2.306×8.2825 ≤ δ ≤ –7.2+2.306×8.2825 )
     = P( –26.3 ≤ δ ≤ 11.9 )
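The same computation from the raw data (a numpy/scipy sketch; scipy’s
pooled-variance two-sample t-test, stats.ttest_ind, gives the matching
test statistic):

import numpy as np
from scipy import stats

etch = np.array([75, 45, 81, 78, 57])
ctrl = np.array([68, 61, 79, 87, 77])
d = etch.mean() - ctrl.mean()                              # -7.2
sp2 = (4 * etch.var(ddof=1) + 4 * ctrl.var(ddof=1)) / 8    # 171.5
sd = np.sqrt(sp2 * (1/5 + 1/5))                            # 8.2825
tq = stats.t.ppf(0.975, df=8)                              # 2.306
print(d - tq * sd, d + tq * sd)                            # ≈ -26.3, 11.9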
paired vs two-sample

paired                           two-sample
• mean diff = –7.2               • mean diff = –7.2
• n=5, df=4, s=11.52             • n1+n2=10, df=8, s=13.09
• stderr² multiplier             • stderr² multiplier
  = (1/5) = 0.2                    = (1/5 + 1/5) = 0.4
• 95% confidence interval        • 95% confidence interval
  –7.2 ± 14.3                      –7.2 ± 19.1


paired vs two-sample

paired
– fewer degrees of freedom
+ potentially smaller s when the pairing “means something”
+ smaller stderr² multiplier, by 2×

two-sample
+ more degrees of freedom
+ less design complexity by not pairing
– stderr² multiplier 2× larger


Key tests:

              one-sample    two-sample
  means       paired t      two-sample t
  variances   chi-square    F distribution


F-distribution (Fisher, Snedecor)

Let χ²1 and χ²2 be independent chi-squares with ν1 and ν2 degrees of
freedom, i.e. two variances. Then

F = (χ²1/ν1)/(χ²2/ν2) = s1²/s2²

is distributed as F with numerator degrees of freedom ν1 and
denominator degrees of freedom ν2. Denoted F(ν1, ν2).
A small experiment

n1 = 8: s1 measures baseline variability; degrees of freedom ν1 = 7.
n2 = 5: s2 measures variability later; degrees of freedom ν2 = 4.

What is the probability of a 3× increase if the true stdevs are
(a) the same, (b) differ by 2×?

P( (s2/s1) > 3 ) = P( (s2²/s1²) > 9 )
  = P( (s2²/σ2²)/(s1²/σ1²) > 9×(σ1²/σ2²) )
  = P( F(ν2,ν1) > 9/(σ2²/σ1²) )
  = P( F(4,7) > 9   | σ2/σ1 = 1 ) = 0.0068
  = P( F(4,7) > 9/4 | σ2/σ1 = 2 ) = 0.164

[Figure: P((s2/s1) > 3) plotted as a function of the true ratio σ2/σ1.]
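Both tail probabilities via scipy’s F distribution (a sketch; the
numerator df is ν2 = 4 because s2² sits on top):

from scipy import stats

print(stats.f.sf(9, 4, 7))      # ≈ 0.0068: a 3× increase is rare if σ2 = σ1
print(stats.f.sf(9 / 4, 4, 7))  # ≈ 0.164: much less rare if σ2 = 2σ1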
Reciprocal property

F(ν1,ν2) = (χ²1/ν1)/(χ²2/ν2) = s1²/s2²

The pth quantile of F(ν1,ν2) equals the reciprocal of the (1–p)th
quantile with the degrees of freedom reversed:

F(ν1,ν2)(p) = 1/F(ν2,ν1)(1–p)

[Figure: the F(ν1,ν2) and F(ν2,ν1) densities, each with a 26.2% tail
shaded to illustrate the matching quantiles.]


perhaps more realistically…

n1 = 8, s1 = 0.03 oz estimates baseline variability.
n2 = 5, s2 = 0.05 oz estimates variability again.
If ρ = σ2/σ1, what range of ρ is consistent with the data?

0.95 = P( F7,4(0.025) ≤ (s1²/σ1²)/(s2²/σ2²) ≤ F7,4(0.975) )
     = P( F7,4(0.025) ≤ (σ2²/σ1²)/(s2²/s1²) ≤ F7,4(0.975) )
     = P( F7,4(0.025)×(s2²/s1²) ≤ σ2²/σ1² ≤ F7,4(0.975)×(s2²/s1²) )
     = P( (s2²/s1²)/F4,7(0.975) ≤ σ2²/σ1² ≤ (s2²/s1²)×F7,4(0.975) )
     = P( (5/3)²/5.52 ≤ σ2²/σ1² ≤ (5/3)²×9.07 )
     = P( 0.5 ≤ σ2²/σ1² ≤ 25.2 ) = P( 0.71 ≤ σ2/σ1 ≤ 5.02 )
     = P( 0.71 ≤ ρ ≤ 5.02 )
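The interval computed directly (a scipy sketch, using the reciprocal
property from the previous slide for the lower endpoint):

import numpy as np
from scipy import stats

s1, s2 = 0.03, 0.05
ratio2 = (s2 / s1)**2                      # (5/3)²
lo = ratio2 / stats.f.ppf(0.975, 4, 7)     # divide by F4,7(0.975) = 5.52
hi = ratio2 * stats.f.ppf(0.975, 7, 4)     # multiply by F7,4(0.975) = 9.07
print(np.sqrt([lo, hi]))                   # ≈ [0.71, 5.02]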
