2. Bootstrap quantiles: Let $\theta^*_{[q]}$ represent the $q$th ranked value of the bootstrap statistic. That is, take the vector of statistics produced by the bootstrap procedure and rank them from smallest to largest. The ranked values then serve as bootstrap estimates of the quantiles of the statistic's distribution: with $B$ bootstrap iterations, the element of rank $r$ estimates the $r/B$ quantile. For example, if the number of bootstrap iterations was $B = 1000$, then the 25th element of the ranked vector of bootstrap statistics is the bootstrap estimate of the 0.025 quantile of the distribution of the statistic.
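3. The bootstrap estimate of the standard error is
$$\widehat{\mathrm{se}}(\hat\theta) = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\theta^*_b - \bar\theta^*\right)^2} \qquad (1.2)$$
where $\theta^*_b$ is the statistic computed from the $b$th bootstrap sample and $\bar\theta^* = \frac{1}{B}\sum_{b=1}^{B}\theta^*_b$ is their mean.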
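4. A $100(1-\alpha)\%$ confidence interval is $\left(\theta^*_{[B\alpha/2]},\ \theta^*_{[B(1-\alpha/2)]}\right)$. For $B = 1000$ and $\alpha = 0.05$ these are the 25th and 975th ranked bootstrap statistics.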
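Items 2 to 4 above can be sketched in MATLAB, the language used for the code later in these notes. The data vector `x` and the choice of the median as the statistic are hypothetical, chosen only for illustration; `randi` performs the resampling with replacement, and the interval indices follow item 4.

```matlab
% Hypothetical data and statistic (not from the notes): the median of x
x = [12 15 9 22 17 30 11 14 19 25];
n = length(x);
B = 1000;                 % number of bootstrap iterations
alphaLevel = 0.05;        % gives a 95% interval

thetaStar = zeros(B,1);
for b = 1:B
    xb = x(randi(n, 1, n));      % resample n values with replacement
    thetaStar(b) = median(xb);   % bootstrap statistic theta*_b
end

% Items 2 and 4: read quantiles off the ranked (sorted) vector
sortedStar = sort(thetaStar);
ciLower = sortedStar(round(B*alphaLevel/2));       % theta*_[B alpha/2], 25th of 1000
ciUpper = sortedStar(round(B*(1 - alphaLevel/2))); % theta*_[B(1-alpha/2)], 975th of 1000
CI = [ciLower, ciUpper];

% Item 3: bootstrap standard error, equation (1.2)
thetaBar = mean(thetaStar);
seBoot = sqrt(sum((thetaStar - thetaBar).^2) / (B - 1));
```

Because (1.2) divides by $B - 1$, `seBoot` agrees with MATLAB's default `std(thetaStar)`.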
Example 1 The table below shows the results of a small experiment in which 7 mice were randomly chosen from 16 to receive a new medical treatment, while the remaining 9 were assigned to the non-treatment (control) group. Investigators wanted to test whether the treatment prolonged life after surgery. The table shows the survival times in days.

Group        Survival time (days)
Treatment    94, 197, 16, 38, 99, 141, 23
Control      52, 104, 146, 10, 51, 30, 40, 27, 46
Suppose we wish to test for a treatment difference, and we know that the median is a better measure of the center of these distributions than the mean.
[Figure: histogram titled "1000 bootstrapped Differences of Treatment Medians"; x-axis: median difference, y-axis: frequency]
The bootstrap 95% confidence interval for the difference in medians (treatment minus control) was (−29, 101). What is our conclusion?
When we know the distribution of the data (up to unknown parameters) but not the sampling distribution of the statistic, the parametric bootstrap often provides a powerful approach.
The basic algorithm for the parametric bootstrap is as follows:
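That is, $\Pr\{X_i \le x\} = F(x) = \int_{-\infty}^{x} f(t)\,dt$ for all $x$. Let $Y_{[n]} = \max\{X_1, \dots, X_n\}$; in words, $Y_{[n]}$ is the largest value in the sample, the sample maximum. Its distribution function is
$$\begin{aligned}
G_n(y) &= \Pr\{Y_{[n]} \le y\} \\
&= \Pr\{X_1 \le y, X_2 \le y, \dots, X_n \le y\} \\
&= \Pr\{X_1 \le y\}\Pr\{X_2 \le y\}\cdots\Pr\{X_n \le y\} \\
&= [F(y)]^n \qquad (2.1)
\end{aligned}$$
The second line holds because the maximum is at most $y$ exactly when every observation is; the third uses the independence of the $X_i$, and the last uses the fact that each $X_i$ has distribution function $F$.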
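Equation (2.1) is easy to check by simulation. The MATLAB sketch below assumes, purely for illustration, that the $X_i$ are Uniform(0,1), so $F(y) = y$ and (2.1) reduces to $G_n(y) = y^n$; the values of `n`, `R`, and `y` are arbitrary choices.

```matlab
% Monte Carlo check of (2.1), assuming X_i ~ Uniform(0,1) so that F(y) = y
n = 5;                  % hypothetical sample size
R = 100000;             % number of simulated samples
y = 0.8;                % point at which G_n(y) is evaluated

X = rand(R, n);         % each row is one sample X_1, ..., X_n
Ymax = max(X, [], 2);   % sample maximum Y_[n] of each row

Gsim = mean(Ymax <= y); % empirical Pr{Y_[n] <= y}
Gexact = y^n;           % [F(y)]^n from (2.1); 0.8^5 = 0.32768
```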
Why will the non-parametric bootstrap not work for the sample max?
[Figure: histogram titled "Radiocesium Tissue Concentration in Bass from PAR Pond"; x-axis: picocuries per gram, y-axis: frequency]
Figure 2: An approximately normal data set of $^{137}$Cs body burdens
A parametric bootstrap was performed using the normal distribution as the model for the underlying distribution of the data, with its mean and standard deviation estimated from the sample. A histogram of the bootstrapped maxima is shown below.
[Figure: histogram of the bootstrapped maxima; x-axis: picocuries per gram, y-axis: frequency]
What is the bootstrap estimate of the probability that the maximum body burden in a sample of size 163 will exceed 30 picocuries per gram?
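Because the fitted model is normal, equation (2.1) also gives a closed form for the quantity this parametric bootstrap estimates: $\Pr\{Y_{[n]} > 30\} = 1 - [\Phi((30-\hat\mu)/\hat\sigma)]^{n}$, where $\Phi$ is the standard normal distribution function. The MATLAB sketch below compares the two. The values of `muHat` and `sigmaHat` are hypothetical placeholders, not the PAR Pond estimates (which the notes do not report), so it illustrates the check rather than answering the question; `Phi` is built from base MATLAB's `erfc`.

```matlab
% Hypothetical fitted parameters, NOT estimates from the PAR Pond data
muHat = 12;
sigmaHat = 5;
n = 163;        % sample size in the question
c = 30;         % threshold, picocuries per gram
B = 10000;      % number of parametric bootstrap samples

Phi = @(z) 0.5*erfc(-z/sqrt(2));   % standard normal CDF

% Closed form from (2.1): Pr{Y_[n] > c} = 1 - [F(c)]^n, F the fitted normal CDF
pExact = 1 - Phi((c - muHat)/sigmaHat)^n;

% Parametric bootstrap: B samples of size n from N(muHat, sigmaHat^2)
maxStar = zeros(B,1);
for b = 1:B
    maxStar(b) = max(muHat + sigmaHat*randn(n,1));
end
pBoot = mean(maxStar > c);   % fraction of bootstrapped maxima above c
```

For the sample maximum the simulation is not strictly needed, but the same loop works for statistics with no closed-form distribution.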
2.2. Code for Non-parametric Bootstrap Two-Sample Inference
% Survival times (days) for Example 1
treatment = [94,197,16,38,99,141,23];
control = [52,104,146,10,51,30,40,27,46];
nT = length(treatment);
nC = length(control);
B = 1000;                          % number of bootstrap iterations
mediantreat = zeros(B,1);
mediancontrol = zeros(B,1);
medianDiff = zeros(B,1);
for b = 1:B
    % Resample each group separately, with replacement, keeping its size
    boottreat = treatment(randi(nT, nT, 1));
    bootcontrol = control(randi(nC, nC, 1));
    mediantreat(b) = median(boottreat);
    mediancontrol(b) = median(bootcontrol);
    medianDiff(b) = mediantreat(b) - mediancontrol(b);   % treatment minus control
end
hist(medianDiff);
title('1000 bootstrapped Differences of Treatment Medians')
xlabel('median difference')
ylabel('frequency')

% 95% percentile interval: with B = 1000 these are the 25th and 975th sorted values
sortmedian = sort(medianDiff);
BSCI = [sortmedian(round(0.025*B)), sortmedian(round(0.975*B))];
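% Assumes bass is a vector of the 163 observed 137Cs concentrations (pCi/g)
% already loaded in the workspace; the data are not reproduced in these notes.
hist(bass);
title('Radiocesium Tissue Concentrations in Bass from PAR Pond');
xlabel('picocuries per gram');
ylabel('frequency');
% Fit the normal model: estimate its mean and standard deviation from the data
mu = mean(bass);
sigma = std(bass);                 % same as sqrt(var(bass))
n = length(bass);
B = 1000;
maxbass = zeros(B,1);
for b = 1:B
    % Parametric bootstrap sample: n draws from the fitted N(mu, sigma^2)
    basspboot = mu + sigma*randn(n,1);
    maxbass(b) = max(basspboot);
end
hist(maxbass);
title('Bootstrapped Maximum Radiocesium Tissue Concentrations in Bass from PAR Pond');
xlabel('picocuries per gram');
ylabel('frequency');
% Bootstrap estimate of Pr{max > 30}: fraction of bootstrapped maxima above 30
p30 = mean(maxbass > 30);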