Statistics For Data Sciences - Mini Project 4

The University of Texas at Dallas, Mechanical Engineering Department
Statistics for Data Sciences - Mini-Project 4

Edgardo Javier García Cartagena
May 1, 2014
1 Exercise 1
The point of this exercise is to investigate which of the three nonparametric bootstrap
methods (normal approximation, basic bootstrap, or percentile bootstrap) is most
accurate for constructing a 95% confidence interval for the median of a gamma distribution
with shape and rate parameters of, say, 3 and 5, respectively. As in Mini Project 2, use
Monte Carlo simulation to find the coverage probability of each of these confidence intervals
for a variety of values of n, e.g., 5, 10, 30 and 100, and summarize your findings.
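As a rough illustration of how the three interval types named above differ, the following sketch builds each one by hand from the same set of bootstrap medians, using base R only. The formulas are the usual textbook definitions and are an assumption about what is being compared; the boot package interpolates its quantiles slightly differently, so its limits will not match these exactly. The sample size of 30, the seed, and all variable names are illustrative choices.

```r
# Hand-rolled 95% bootstrap intervals for the median (illustrative sketch only).
set.seed(1)                                # arbitrary seed, for reproducibility
x  <- rgamma(30, shape = 3, rate = 5)      # one sample of size n = 30 (assumed)
B  <- 2000                                 # number of bootstrap resamples
t0 <- median(x)                            # estimate from the original sample

# Bootstrap distribution of the median: resample x with replacement B times
tstar <- replicate(B, median(sample(x, replace = TRUE)))

alpha <- 0.05
z <- qnorm(1 - alpha / 2)
q <- unname(quantile(tstar, c(alpha / 2, 1 - alpha / 2)))

# Normal approximation: bias-corrected estimate +/- z * bootstrap standard error
ci_normal <- (2 * t0 - mean(tstar)) + c(-1, 1) * z * sd(tstar)
# Basic bootstrap: bootstrap quantiles reflected about t0
ci_basic <- c(2 * t0 - q[2], 2 * t0 - q[1])
# Percentile bootstrap: bootstrap quantiles used directly
ci_percentile <- q

print(rbind(normal = ci_normal, basic = ci_basic, percentile = ci_percentile))
```

The basic interval is the percentile interval reflected about the sample median, so when the bootstrap distribution is skewed the two can disagree noticeably.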
1.1 Overview
The R package "boot" is used extensively in this exercise, following a methodology similar
to the one discussed in class. For each sample size, a Monte Carlo simulation draws many
samples from the gamma distribution; for each sample, the bootstrap distribution of the
sample median is obtained by resampling, and the three confidence intervals are computed
from it. The bootstrap median estimates are computed with the command boot, and the
confidence intervals with the command boot.ci. The coverage probability is estimated as
the proportion of simulated intervals that contain the true median: each check is a
Bernoulli trial, so the number of intervals that cover the true median is binomial and its
sample proportion estimates the coverage probability. These steps are performed for the
sample sizes suggested in the problem statement, with 1000 Monte Carlo replications and
2000 bootstrap resamples per replication.
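A single Monte Carlo replication of the procedure described above might look like the following sketch. It uses boot and boot.ci from the boot package; the sample size of 30, the seed, and the names median_stat, limits and covered are illustrative assumptions, not part of the original script.

```r
library(boot)

set.seed(2)                                        # arbitrary seed
true_median <- qgamma(0.5, shape = 3, rate = 5)    # target of the intervals

# Statistic in the form boot() expects: the data plus a vector of resample indices
median_stat <- function(x, indices) median(x[indices])

x  <- rgamma(30, shape = 3, rate = 5)              # one simulated sample, n = 30
b  <- boot(x, median_stat, R = 2000)               # 2000 bootstrap medians
ci <- boot.ci(b, conf = 0.95, type = c("norm", "basic", "perc"))

# Lower/upper limits sit in positions 2:3 for "normal", 4:5 for "basic" and "percent"
limits <- rbind(normal     = ci$normal[2:3],
                basic      = ci$basic[4:5],
                percentile = ci$percent[4:5])

# TRUE where the interval contains the true median (one Bernoulli trial per method)
covered <- limits[, 1] < true_median & true_median < limits[, 2]
print(cbind(limits, covered))
```

Averaging the covered indicators over many independent replications gives the estimated coverage probability for each method.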
1.2 Results and Discussion
Figure 1.1 shows that the percentile bootstrap method gives the highest coverage
probability, with similar values of around 0.92 for all sample sizes simulated. The worst
method is the basic bootstrap, with a coverage probability of about 0.6 for the smallest
sample size and about 0.85 for the larger sample sizes. The normal approximation method,
like the percentile method, does not seem to depend on sample size: its coverage
probability is around 0.9 for all sample sizes, still below that of the percentile method.
All three methods fall short of the nominal 0.95 level.
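To judge how much of these differences could be simulation noise, one can attach a binomial standard error to each estimated coverage. The sketch below does this for N = 1000 replications; the coverage values are the approximate figures quoted in the text, not exact outputs, and the variable names are illustrative.

```r
N <- 1000   # Monte Carlo replications, as stated in the overview

# Approximate coverage values quoted in the discussion (rounded, not exact outputs)
p_hat <- c(percentile = 0.92, normal = 0.90, basic_small_n = 0.60)

# Binomial standard error of a proportion estimated from N independent trials
se <- sqrt(p_hat * (1 - p_hat) / N)

print(round(cbind(p_hat, se, lower = p_hat - 2 * se, upper = p_hat + 2 * se), 3))
```

Under these assumptions the standard errors are on the order of 0.01, so a gap of a couple of hundredths between two methods is close to the noise level, while the drop to about 0.6 for the basic method at the smallest sample size is far outside it.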
[Figure: coverage probability (vertical axis, 0.5 to 1.0) versus sample size n (horizontal axis, up to 100) for the Basic, Normal and Percentile methods]
Figure 1.1: Coverage probability versus sample size for the three bootstrap methods used to
compute the confidence interval.
1.3 R Code
library(boot)

N <- 1000                 # number of Monte Carlo replications
n <- c(5, 10, 30, 100)    # sample sizes

# Coverage probabilities, one entry per sample size
covprob.basic      <- rep(0, length(n))
covprob.normal     <- rep(0, length(n))
covprob.percentile <- rep(0, length(n))

truemedian <- qgamma(0.5, 3, 5)   # true median of the gamma(shape = 3, rate = 5) distribution

# Statistic passed to boot(): median of the resample selected by 'indices'
median.npar <- function(x, indices) {
  result <- median(x[indices])
  return(result)
}

for (ii in 1:length(n)) {
  suma.basic      <- 0
  suma.normal     <- 0
  suma.percentile <- 0
  for (i in 1:N) {
    rgma <- rgamma(n[ii], 3, 5)
    # Nonparametric bootstrap of the median with 2000 resamples
    median.npar.boot <- boot(rgma, median.npar, R = 2000, sim = "ordinary", stype = "i")
    # Request only the three interval types compared here; the default (all types)
    # also attempts studentized and BCa intervals, which are not needed
    median.conint <- boot.ci(median.npar.boot, type = c("norm", "basic", "perc"))
    # Count the intervals that contain the true median
    suma.basic <- suma.basic +
      (median.conint$basic[4] < truemedian & truemedian < median.conint$basic[5])
    suma.normal <- suma.normal +
      (median.conint$normal[2] < truemedian & truemedian < median.conint$normal[3])
    suma.percentile <- suma.percentile +
      (median.conint$percent[4] < truemedian & truemedian < median.conint$percent[5])
  }
  # Proportion of intervals that covered the true median
  covprob.basic[ii]      <- suma.basic / N
  covprob.normal[ii]     <- suma.normal / N
  covprob.percentile[ii] <- suma.percentile / N
}

# Plot the results: one curve per method on shared axes
pdf(file = "bootcovprob.pdf", width = 13, height = 8)
plot(n, covprob.basic, xlab = "n", ylab = "Coverage Probability", type = "o", pch = 0,
     cex.lab = 1.5, ylim = c(0.5, 1), lwd = 2, cex = 2)
lines(n, covprob.normal, type = "o", pch = 1, lwd = 2, cex = 2)
lines(n, covprob.percentile, type = "o", pch = 2, lwd = 2, cex = 2)
legend("bottomright", c("Basic", "Normal", "Percentile"), pch = c(0, 1, 2), cex = 2,
       lty = 1, lwd = 2)
dev.off()

./mp4.r