
Stat 110, Lecture 11

Sampling Distributions, Estimation,
and Hypothesis Testing (I)

bheavlin@stat.stanford.edu
Statistics

[Diagram: a taxonomy of statistics by how much data is available]
  No data           → Probability
  Some data         → Inferential Statistics: sampling distributions,
                      hypothesis testing, estimation
  Way too much data → Descriptive Statistics


Some topics

• that n–1 issue
• normal approximation to binomial
• p-values are uniform!
• classic sampling distributions based on the normal


That n–1 issue in the denominator

Population: {1,2,3}, sampled with replacement.

Population mean = 2
Population variance = [(1–2)² + (2–2)² + (3–2)²]/3
                    = [1 + 0 + 1]/3 = 2/3
The denominator N = 3 is the population size.

All samples of size n = 2:

  sample   ave   var(n)   var(n–1)
  1,1      1.0   0.00     0.00
  1,2      1.5   0.25     0.50
  1,3      2.0   1.00     2.00
  2,1      1.5   0.25     0.50
  2,2      2.0   0.00     0.00
  2,3      2.5   0.25     0.50
  3,1      2.0   1.00     2.00
  3,2      2.5   0.25     0.50
  3,3      3.0   0.00     0.00
  ave      2.0   3/9=1/3  6/9=2/3

Pattern: a=b in 3 samples, each with var(n)=0; |a–b|=1 in 4 samples,
each with var(n)=1/4; |a–b|=2 in 2 samples, each with var(n)=1.
So the average of var(n) is (3×0 + 4×1/4 + 2×1)/9 = 1/3, which
underestimates the population variance 2/3, while the average of
var(n–1) is exactly 2/3. (The variance of the nine sample averages
is 1/3.)
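A quick check of this table (a minimal Python sketch, not from the
original slides, assuming numpy is available): enumerate all nine
equally likely samples and average the two variance estimates.

from itertools import product
import numpy as np

pop = [1, 2, 3]
vn, vn1 = [], []
for a, b in product(pop, repeat=2):      # all 9 with-replacement samples
    xbar = (a + b) / 2
    ss = (a - xbar)**2 + (b - xbar)**2   # numerator of the sample variance
    vn.append(ss / 2)                    # divide by n = 2
    vn1.append(ss / 1)                   # divide by n - 1 = 1
print(np.mean(vn))   # 0.333... : biased low relative to 2/3
print(np.mean(vn1))  # 0.666... : matches the population variance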


Random sampling

If n elements are selected from a population in such a way that every
set of n elements in the population has an equal probability of being
selected, the n elements are said to be a random sample.


Residuals

Residual(i) = X(i) – X̄

Σi Residual(i) = 0, so the residuals live in n–1 dimensions.

[Figure: residuals when n=2. Plotting x1 – ave against x2 – ave, all
points fall on the line x2 – ave = –(x1 – ave), a 1-dimensional
subspace.]


The population of averages

[Figure: stem-and-leaf plots of the population {1,2,3} (variance = 2/3)
and of the averages of samples of n=2, the sampling distribution of
averages of two (variance = 1/3, std error = √(1/3) = 0.58).]

The precision effect: averages are less variable than the population
they are drawn from.

The standard deviation of a sampling distribution is the corresponding
statistic’s standard error.
The sampling distributions of the variance and standard deviation

[Figure: two stem-and-leaf plots. Left: for with-replacement sampling
from {1,2,3}, the sampling distribution of the variance when n=2,
taking the values 0 (3/9), 0.5 (4/9), and 2 (2/9), with mean = 2/3.
Right: the sampling distribution of the standard deviation when n=2,
taking the values 0, 0.71, and 1.41.]


Why n–1?

Lemma: if {X(i)} are independent with mean μ and variance σ²,
then Var(Σi X(i)) = n σ².

Numerator of the sample variance, writing z(i) = X(i)–μ:

Σi [X(i)–X̄]² = Σi [(X(i)–μ) – (X̄–μ)]² ≡ Σi (z(i)–z̄)²
  = Σi [(1–1/n) z(i) – Σj≠i z(j)/n]²
  = Σi (1–1/n)² z(i)² + Σi [Σj≠i z(j)/n]²
      – 2 Σi Σj≠i (1–1/n) z(i) z(j)/n
Why n–1, continued…

Expectation of the numerator of the sample variance:

E( Σi [X(i)–X̄]² )
  = Σi (1–1/n)² E(z(i)²) + Σi E([Σj≠i z(j)/n]²)
      – 2 Σi Σj≠i (1–1/n) E(z(i) z(j))/n
  = Σi (1–1/n)² σ² + Σi E([Σj≠i z(j)]²)/n²    (cross terms vanish:
                                               E(z(i))=0, independence)
  = n(1–1/n)² σ² + Σi (n–1)σ²/n²              (lemma: var of sums)
  = n(1–1/n)² σ² + n(n–1)σ²/n²
  = ((n–1)²/n)σ² + ((n–1)/n)σ² = σ²×((n–1)/n)×((n–1)+1)
  = σ²×(n–1)
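The algebra can be sanity-checked by simulation (a minimal sketch,
assuming numpy; the normal distribution here is just a convenience,
since the result needs only independence):

import numpy as np

rng = np.random.default_rng(0)
n, sigma, reps = 5, 2.0, 200_000
x = rng.normal(0.0, sigma, size=(reps, n))
ss = ((x - x.mean(axis=1, keepdims=True))**2).sum(axis=1)  # Σi [X(i)-X̄]²
print(ss.mean())           # ≈ 16 = (n-1)σ²
print((n - 1) * sigma**2)  # 16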
Comments on why n–1

E{ Σi [X(i)–X̄]²/(n–1) } = σ², so the sample variance is an unbiased
estimator of σ².

This unbiasedness does not depend on assuming a Gaussian or other
distribution, just independence, E(X)=μ, and Var(X)=σ².

Σi [X(i)–X̄] = 0, so the n residuals live in n–1 linear dimensions.
This dimensionality issue shows up in much theory, e.g. combining
variance estimates.


Four sides of the same coin

1. If X, Y are independent,
   Var(X+Y) = Var(X) + Var(Y).
2. {X(i)} independent, each with variance σ²:
   Var(Σi X(i)) = n σ².
3. {X(i)} independent, each with variance σ²:
   Var(X̄) = σ²/n.
4. If {X(i)} are independent with mean μ and variance σ², the sample
   variance (with n–1 in the denominator) is unbiased.


Side 5!

How do you get the weights of an apple and an orange?

C. Weigh the apple: Wa, with Var(Wa) = σ².
   Weigh the orange: Wo, with Var(Wo) = σ². (sniff)

D. Weigh the apple+orange together: W+.
   Weigh the apple–orange difference: W–.
   Then (W+ + W–)/2 = apple and (W+ – W–)/2 = orange, with
   Var((W+ ± W–)/2) = (1/2)² × (σ² + σ²) = σ²/2.
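A simulation version of the weighing trick (a hedged sketch; the true
fruit weights here are made-up numbers):

import numpy as np

rng = np.random.default_rng(1)
sigma, reps = 1.0, 200_000
apple, orange = 3.0, 2.0                        # hypothetical true weights
# Design C: weigh the apple by itself
wa = apple + rng.normal(0, sigma, reps)
# Design D: weigh the sum and the difference, then solve for the apple
wplus  = apple + orange + rng.normal(0, sigma, reps)
wminus = apple - orange + rng.normal(0, sigma, reps)
apple_d = (wplus + wminus) / 2
print(wa.var())       # ≈ σ² = 1.0
print(apple_d.var())  # ≈ σ²/2 = 0.5: two weighings, both fruits, half the variance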
The sampling distribution of p-values

Assume an underlying continuous distribution with CDF FY, and let
W = FY(Y). Then

P(W ≤ w) = P(Y ≤ FY⁻¹(w)) = FY(FY⁻¹(w)) = w,

so W is uniform on (0,1).

[Figure: w = CDFY(y) plotted against y, rising from 0 to 1.]
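To see the uniformity empirically, simulate data under a null
hypothesis and look at the resulting p-values (a minimal sketch using
scipy’s one-sample t-test; any continuous test statistic would do):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
reps, n = 10_000, 20
# Data generated under the null (the true mean really is 0)
pvals = np.array([stats.ttest_1samp(rng.normal(0, 1, n), 0.0).pvalue
                  for _ in range(reps)])
print(np.mean(pvals < 0.05))  # ≈ 0.05
print(np.mean(pvals < 0.50))  # ≈ 0.50, as uniformity predicts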


An application: “Fisher’s p-value combination method”

Two inconclusive experiments give p-values of 0.06 and 0.08. Can these
p-values be combined into an overall p-value?

If pv is uniform(0,1), then –ln(pv) is exponential with mean 1. The
sum of two independent exponentials(λ=1) has a gamma(α=2, λ=1)
distribution.

–ln(0.06) – ln(0.08) = 2.81 + 2.53 = 5.34
P(gamma(α=2, λ=1) ≥ 5.34) = 0.03 = the combined p-value

[Figure: gamma(α=2) density with the 3% upper tail beyond 5.34 shaded.]
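The same computation with scipy (a sketch; scipy’s gamma takes the
shape α directly, with scale = 1/λ = 1 by default):

import numpy as np
from scipy import stats

p1, p2 = 0.06, 0.08
x = -np.log(p1) - np.log(p2)        # 5.34: sum of two exponential(1)'s under the null
print(stats.gamma.sf(x, a=2))       # ≈ 0.030, the combined p-value
# Equivalent classical form: 2x is chi-square with 4 df
print(stats.chi2.sf(2 * x, df=4))   # same ≈ 0.030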
Normal approximation to binomial

Let {z(i)} be independent Bernoulli(p).
Then S(n) = Σi≤n z(i) has a binomial(n, p) distribution, and the
central limit theorem applies:
• μ = E(S(n)) = np,  σ² = Var(S(n)) = np(1–p)

The CLT is a better approximation when:
• 0 ≤ np ± 2×√(np(1–p)) ≤ n, or
• np ≥ 4 and n(1–p) ≥ 4


Improving the binomial-normal approximation

Let S denote a binomial(n, p) and Z a standard normal.

lower tail:
P(S ≤ a) ≈ P( Z ≤ (a+0.5–np)/√(np(1–p)) )
upper tail:
P(S ≥ b) ≈ P( Z ≥ (b–0.5–np)/√(np(1–p)) )
interval:
P(a ≤ S ≤ b) ≈ P( (a–0.5–np)/√(np(1–p)) ≤ Z ≤ (b+0.5–np)/√(np(1–p)) )
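A short scipy check of these formulas against the exact binomial,
using the n=13, p=0.3 case tabulated on the next slide (a sketch,
assuming scipy is available):

import numpy as np
from scipy import stats

n, p = 13, 0.3
mu, sd = n * p, np.sqrt(n * p * (1 - p))     # 3.9 and 1.652
for k in range(4):
    exact     = stats.binom.cdf(k, n, p)
    corrected = stats.norm.cdf((k + 0.5 - mu) / sd)
    naive     = stats.norm.cdf((k - mu) / sd)
    print(k, round(exact, 3), round(corrected, 3), round(naive, 3))
# k=3 row: exact 0.421, corrected 0.404, naive 0.293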


Continuity correction for n=13, p=0.3

  k   bin pdf   bin CDF   cont’ty corrected   naïve normal   binCDF – pdf/2
  0   0.010     0.010     0.020               0.009          0.005
  1   0.054     0.064     0.073               0.040          0.037
  2   0.139     0.202     0.198               0.125          0.133
  3   0.218     0.421     0.404               0.293          0.312
  4   0.234     0.654     0.642               0.524          0.537
  5   0.180     0.835     0.834               0.747          0.744
  6   0.103     0.938     0.942               0.898          0.886
  7   0.044     0.982     0.985               0.970          0.960
  8   0.014     0.996     0.997               0.993          0.989
  9   0.003     0.999     1.000               0.999          0.998

[Figure: two panels over k = 0 to 15. Top: binomial vs the uncorrected
normal approximation. Bottom: binomial vs the continuity-corrected
normal approximation, which tracks the binomial CDF much more closely.]


Combining (pooling) standard deviations

1. Suppose we want to combine a few standard deviations into a single
   estimate, e.g. an S-chart centerline, etc.
2. Because the sample variances are unbiased estimates, it makes sense
   to average them.
3. Suppose the different standard deviations have different associated
   sample sizes. It makes sense to give those with larger sample sizes
   (n) more weight.
4. In fact, the correct weights are the respective (n–1) values of
   each variance:

   s1+2² = [ (n1–1)s1² + (n2–1)s2² ] / [ (n1–1) + (n2–1) ]

   a weighted RMS of s1 and s2, also called the pooled variance.
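A pooled-standard-deviation helper (a minimal sketch; the test numbers
are the two group standard deviations from the overetch example later
in this lecture):

import numpy as np

def pooled_sd(s, n):
    """Pool standard deviations s with sample sizes n, weighting each s² by its df = n - 1."""
    s, n = np.asarray(s, float), np.asarray(n, float)
    df = n - 1
    return np.sqrt(np.sum(df * s**2) / np.sum(df)), int(np.sum(df))

sp, df = pooled_sd([15.53, 10.09], [5, 5])
print(sp, df)   # ≈ 13.09 with 8 degrees of freedom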
df

• Degrees of freedom (df): the (n–1) in the denominator of the sample
  variance formula. This is the essential sample size of a standard
  deviation or variance.
• For a standard deviation estimated by a weighted RMS, the
  denominator, e.g. (n1–1)+(n2–1), is the associated degrees of
  freedom.
• Chi-square: for normally distributed data, (n–1)s²/σ² has a
  chi-square distribution with (n–1) degrees of freedom (df).
  For two independent chi-squares χ²1 and χ²2 with ν1 and ν2 df:
  χ²1 + χ²2 is chi-square with ν1+ν2 df.
• So the sampling distribution of the sample variance is proportional
  to a chi-square distribution.
Making the chi-square useful

Example 7.12: n=10. The specification is s ≤ 0.05 oz; the true
σ = 0.03 oz. What is the probability of observing s ≤ 0.05 oz?

P(s≤0.05) = P(s²≤0.0025) = P((n–1)s² ≤ 9×0.0025)
  = P((n–1)s²/σ² ≤ 9×0.0025/0.0009) = P( χ²ν=9 ≤ 25 )
  = 0.997

n=5: P(s≤0.05) = … = P( χ²ν=4 ≤ 11.11 ) = 0.975
n=4: P(s≤0.05) = … = P( χ²ν=3 ≤ 8.333 ) = 0.960
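These probabilities come straight from the chi-square CDF (a scipy
sketch):

from scipy import stats

sigma2, spec = 0.03**2, 0.05
for n in (10, 5, 4):
    q = (n - 1) * spec**2 / sigma2            # 25, 11.11, 8.333
    print(n, round(stats.chi2.cdf(q, df=n - 1), 3))
# prints 0.997, 0.975, 0.960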


A more realistic question

n=10, s=0.03 oz. What would a reasonable range for σ be?

0.95 = P( χ²ν=9(0.025) ≤ (n–1)s²/σ² ≤ χ²ν=9(0.975) )
     = P( 1/χ²ν=9(0.975) ≤ σ²/[(n–1)s²] ≤ 1/χ²ν=9(0.025) )
     = P( (n–1)s²/χ²ν=9(0.975) ≤ σ² ≤ (n–1)s²/χ²ν=9(0.025) )
     = P( 9×0.0009/19.0228 ≤ σ² ≤ 9×0.0009/2.70039 )
     = P( 0.00043 ≤ σ² ≤ 0.00300 )
     = P( 0.0206 ≤ σ ≤ 0.0548 )
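The same interval via chi-square quantiles (a scipy sketch):

import numpy as np
from scipy import stats

n, s = 10, 0.03
df = n - 1
lo_q, hi_q = stats.chi2.ppf([0.025, 0.975], df)       # 2.70039, 19.0228
print(np.sqrt([df * s**2 / hi_q, df * s**2 / lo_q]))  # ≈ [0.0206, 0.0548]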
Confidence intervals

Based on this data, n=10 and s=0.03 oz, the statement
0.95 = P( 0.00043 ≤ σ² ≤ 0.00300 )
     = P( 0.0206 ≤ σ ≤ 0.0548 )
is called a 95 percent confidence interval.

One-sided 95% confidence intervals:
0.95 = P( σ ≤ 0.0494 )
0.95 = P( σ ≥ 0.0219 )


Confidence intervals

“Based on this data, there is a 95 percent chance that the (true)
value of σ is between 0.0206 and 0.0548 ounces.”

“The range of σ that is consistent with the observed data is between
0.0206 and 0.0548 ounces.”

[Figure: a horseshoe tossed at a stake, the stake labeled “true,
unknown value of σ”.] In confidence intervals, the interval
(horseshoe) is what is random, not the parameter (stake).
3 confidence intervals for σ

1. Exact, based on chi-square:
   (0.0206 to 0.0548), exact confidence = 0.95

2. Normal approximation based on moments, s×[1 ± 2/√(2ν)]:
   (0.0159 to 0.0441), exact confidence = 0.90

3. Normal approximation of the reciprocal, (1/s)×[1 ± 2/√(2ν)],
   i.e. s/[1 ± 2/√(2ν)]:
   (0.0204 to 0.0568), exact confidence = 0.96

The approximations are ±2 standard errors.
Tukey’s dilemma

Rule of thumb: to determine a standard deviation to one significant
digit (10% precision, 95% confidence), you need n ≈ 200. (The standard
error of s is roughly s/√(2ν), so 2/√(2ν) ≤ 0.10 requires ν ≈ 200.)

Also: 20% precision, 95% confidence, needs n ≈ 50.

But 200 df is enough data to detect non-normality.
[Photo: John W. Tukey]


Student’s t

z: normal with mean 0 and standard deviation 1.
χ²: an independent chi-square with ν degrees of freedom.

t = z/√(χ²/ν) is distributed as a Student’s t distribution with
ν degrees of freedom.

[Figure: the standard normal and t densities plotted in z-score units
from –5 to 5; the t-dist has heavier tails. Photo: William Gosset,
“Student”.]
A process monitoring problem

n=4, s=0.03 oz, μ is either 0 (on target) or +0.02 oz from target.
Degrees of freedom = 3. The specification requires |x̄| ≤ 0.03. What
is the probability of observing this?

P( –0.03 ≤ x̄ ≤ +0.03 ) = P( –.03–μ ≤ x̄–μ ≤ .03–μ )
  = P( (–.03–μ)/(s/√n) ≤ (x̄–μ)/(s/√n) ≤ (.03–μ)/(s/√n) )
  = P( (–.03–μ)/(s/√n) ≤ t ≤ (.03–μ)/(s/√n) )
  = P( (–.03–μ)/0.015 ≤ t ≤ (.03–μ)/0.015 )
  = P( –2.00 ≤ t ≤ +2.00 | μ=0.00 ) = 0.861, or
  = P( –3.33 ≤ t ≤ +0.67 | μ=+0.02 ) = 0.701
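Both probabilities follow from the t CDF with 3 degrees of freedom
(a scipy sketch; as on the slide, s is treated as given):

from scipy import stats

n, s = 4, 0.03
se = s / n**0.5                   # 0.015
for mu in (0.00, 0.02):
    lo, hi = (-0.03 - mu) / se, (0.03 - mu) / se
    print(mu, round(stats.t.cdf(hi, df=3) - stats.t.cdf(lo, df=3), 3))
# ≈ 0.861 at μ = 0 and ≈ 0.701 at μ = +0.02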
Operating Characteristic

n=4, ν=3, s=0.03 oz.

[Figure: P(|x̄| ≤ 0.03) as a function of the true μ, for μ from –0.10
to +0.10; the curve peaks near 0.86 at μ=0 and falls toward 0 as μ
moves away from target.]
More realistic formulation

n=4, s=0.03 oz, x̄=+0.02 oz from target. From this data, what is a
reasonable range for μ?

0.95 = P( tν=3(0.025) ≤ t ≤ tν=3(0.975) )
     = P( tν=3(0.025) ≤ (x̄–μ)/(s/√n) ≤ tν=3(0.975) )
     = P( tν=3(0.025)×(s/√n) ≤ x̄–μ ≤ tν=3(0.975)×(s/√n) )
     = P( x̄–tν=3(0.975)×(s/√n) ≤ μ ≤ x̄–tν=3(0.025)×(s/√n) )
     = P( x̄–tν=3(0.975)×(s/√n) ≤ μ ≤ x̄+tν=3(0.975)×(s/√n) )
     = P( 0.02–3.182×(0.03/2) ≤ μ ≤ 0.02+3.182×(0.03/2) )
     = P( –0.0277 ≤ μ ≤ 0.0677 )
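The interval in a few lines of scipy (a sketch):

from scipy import stats

n, s, xbar = 4, 0.03, 0.02
tq = stats.t.ppf(0.975, df=n - 1)        # 3.182
se = s / n**0.5
print(xbar - tq * se, xbar + tq * se)    # ≈ -0.0277, 0.0677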
overetch yield experiment (reprised)

  lot   split            01-12   13-24   delta
  1     clearout 01-12   75      68      7
  2     clearout 01-12   45      61      –16
  3     clearout 01-12   81      79      2
  4     clearout 01-12   78      87      –9
  5     clearout 01-12   57      77      –20

mean  = –7.20                95% confidence interval
stdev = 11.52                = –7.2 ± 2.776×11.52/√5
t(df=4, 0.975) = 2.776       = –21.50 to +7.10
The ladder of uncertainty

To know the uncertainty of an average, we need to know/estimate the
underlying standard deviation.

When we do not know the population standard deviation, we need to
estimate it, so we must also cope with the uncertainty in the standard
deviation.

To estimate the uncertainty of the standard deviation, we need to
know/estimate the 3rd and 4th moments.

[Ladder diagram: averages → standard deviations → 3rd, 4th moments]

Student’s t cuts and contains the ladder of uncertainty.
One- vs Two-Sample Problems

one-sample problem
• Example: the 5 “split” lots, each paired to its personal control
  group.
• Ultimately depends on the deltas, one column of values.

two-sample problem
• One group’s values do not pair to any particular subset of the
  second group’s values.


Imagine as 10 different lots

  lot   etch++        lot   control
  1     75            6     68
  2     45            7     61
  3     81            8     79
  4     78            9     87
  5     57            10    77
  mean  = 67.2              74.4
  stdev = 15.53             10.09

What might random noise look like?
d = difference = 67.2 – 74.4 = –7.2

pooled stdev² = [4×15.53² + 4×10.09²] / [(5–1) + (5–1)]
              = 1372/8 = 171.5
pooled stdev  = 13.09 with df = 8
two-sample confidence interval

Var( x̄1 – x̄2 ) = σ²(1/n1 + 1/n2),
estimated by sp²(1/n1 + 1/n2) ≡ sd²,
where sp = 13.09, so sd = 8.2825.

0.95 = P( tν=8(0.025) ≤ t ≤ tν=8(0.975) )
     = P( tν=8(0.025) ≤ (d–δ)/sd ≤ tν=8(0.975) )
     = P( tν=8(0.025)×sd ≤ d–δ ≤ tν=8(0.975)×sd )
     = P( d–tν=8(0.975)×sd ≤ δ ≤ d–tν=8(0.025)×sd )
     = P( d–tν=8(0.975)×sd ≤ δ ≤ d+tν=8(0.975)×sd )
     = P( –7.2–2.306×8.2825 ≤ δ ≤ –7.2+2.306×8.2825 )
     = P( –26.3 ≤ δ ≤ 11.9 )
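The same computation from the raw data (a numpy/scipy sketch; scipy’s
pooled-variance two-sample t-test, stats.ttest_ind, gives the matching
test statistic):

import numpy as np
from scipy import stats

etch = np.array([75, 45, 81, 78, 57])
ctrl = np.array([68, 61, 79, 87, 77])
d = etch.mean() - ctrl.mean()                              # -7.2
sp2 = (4 * etch.var(ddof=1) + 4 * ctrl.var(ddof=1)) / 8    # 171.5
sd = np.sqrt(sp2 * (1/5 + 1/5))                            # 8.2825
tq = stats.t.ppf(0.975, df=8)                              # 2.306
print(d - tq * sd, d + tq * sd)                            # ≈ -26.3, 11.9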
paired vs two-sample

paired                           two-sample
• mean diff = –7.2               • mean diff = –7.2
• n=5, df=4, s=11.52             • n1+n2=10, df=8, s=13.09
• stderr² multiplier             • stderr² multiplier
  = (1/5) = 0.2                    = (1/5 + 1/5) = 0.4
• 95% confidence interval        • 95% confidence interval
  –7.2 ± 14.3                      –7.2 ± 19.1


paired vs two-sample

paired
– fewer degrees of freedom
+ potentially smaller s when the pairing “means something”
+ smaller stderr² multiplier, by 2×

two-sample
+ more degrees of freedom
+ less design complexity by not pairing
– stderr² multiplier 2× larger


Key tests:

              one-sample    two-sample
  means       paired t      two-sample t
  variances   chi-square    F distribution


F-distribution (Fisher, Snedecor)

Let χ²1 and χ²2 be independent chi-squares with ν1 and ν2 degrees of
freedom, i.e. two variances. Then

F = (χ²1/ν1)/(χ²2/ν2) = s1²/s2²

is distributed as F with numerator degrees of freedom ν1 and
denominator degrees of freedom ν2. Denoted F(ν1, ν2).
A small experiment

n1 = 8: s1 measures baseline variability; degrees of freedom ν1 = 7.
n2 = 5: s2 measures variability later; degrees of freedom ν2 = 4.

What is the probability of a 3× increase if the true stdevs are
(a) the same, (b) differ by 2×?

P( (s2/s1) > 3 ) = P( (s2²/s1²) > 9 )
  = P( (s2²/σ2²)/(s1²/σ1²) > 9×(σ1²/σ2²) )
  = P( F(ν2,ν1) > 9/(σ2²/σ1²) )
  = P( F(4,7) > 9   | σ2/σ1 = 1 ) = 0.0068
  = P( F(4,7) > 9/4 | σ2/σ1 = 2 ) = 0.164

[Figure: P((s2/s1) > 3) plotted as a function of the true ratio σ2/σ1.]
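Both tail probabilities via scipy’s F distribution (a sketch; the
numerator df is ν2 = 4 because s2² sits on top):

from scipy import stats

print(stats.f.sf(9, 4, 7))      # ≈ 0.0068: a 3× increase is rare if σ2 = σ1
print(stats.f.sf(9 / 4, 4, 7))  # ≈ 0.164: much less rare if σ2 = 2σ1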
Reciprocal property

F(ν1,ν2) = (χ²1/ν1)/(χ²2/ν2) = s1²/s2²

The pth quantile of F(ν1,ν2) equals the reciprocal of the (1–p)th
quantile with the degrees of freedom reversed:

F(ν1,ν2)(p) = 1/F(ν2,ν1)(1–p)

[Figure: the F(ν1,ν2) and F(ν2,ν1) densities, each with a 26.2% tail
shaded to illustrate the matching quantiles.]


perhaps more realistically…

n1 = 8, s1 = 0.03 oz estimates baseline variability.
n2 = 5, s2 = 0.05 oz estimates variability again.
If ρ = σ2/σ1, what range of ρ is consistent with the data?

0.95 = P( F7,4(0.025) ≤ (s1²/σ1²)/(s2²/σ2²) ≤ F7,4(0.975) )
     = P( F7,4(0.025) ≤ (σ2²/σ1²)/(s2²/s1²) ≤ F7,4(0.975) )
     = P( F7,4(0.025)×(s2²/s1²) ≤ σ2²/σ1² ≤ F7,4(0.975)×(s2²/s1²) )
     = P( (s2²/s1²)/F4,7(0.975) ≤ σ2²/σ1² ≤ (s2²/s1²)×F7,4(0.975) )
     = P( (5/3)²/5.52 ≤ σ2²/σ1² ≤ (5/3)²×9.07 )
     = P( 0.5 ≤ σ2²/σ1² ≤ 25.2 ) = P( 0.71 ≤ σ2/σ1 ≤ 5.02 )
     = P( 0.71 ≤ ρ ≤ 5.02 )
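The interval computed directly (a scipy sketch, using the reciprocal
property from the previous slide for the lower endpoint):

import numpy as np
from scipy import stats

s1, s2 = 0.03, 0.05
ratio2 = (s2 / s1)**2                      # (5/3)²
lo = ratio2 / stats.f.ppf(0.975, 4, 7)     # divide by F4,7(0.975) = 5.52
hi = ratio2 * stats.f.ppf(0.975, 7, 4)     # multiply by F7,4(0.975) = 9.07
print(np.sqrt([lo, hi]))                   # ≈ [0.71, 5.02]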
