The University of Texas at Dallas, Mechanical Engineering Department
Statistics for Data Sciences - Mini-Project 4
Edgardo Javier García Cartagena
May 1, 2014

1 Exercise 1

The point of this exercise is to investigate which of three nonparametric bootstrap methods (normal approximation, basic bootstrap, or percentile bootstrap) is most accurate for constructing a 95% confidence interval for the median of a gamma distribution with shape and rate parameters 3 and 5, respectively. As in Mini-Project 2, use Monte Carlo simulation to find the coverage probability of each of these confidence intervals for a variety of values of n, e.g., 5, 10, 30 and 100, and summarize your findings.

1.1 Overview

The R package "boot" is used extensively in this exercise, following a methodology similar to the one discussed in class. For each sample size, the Monte Carlo simulation draws 1000 samples from the gamma distribution. For each sample, the command boot computes the bootstrap estimates of the median and the command boot.ci computes the three confidence intervals from them. The coverage probability of a method is then estimated as the proportion of the 1000 intervals that contain the true median. Each such check is a Bernoulli trial, so the number of intervals covering the true median is binomial and its average estimates the coverage probability. These steps are performed for the sample sizes suggested in the problem statement.

1.2 Results and Discussion

Figure 1.1 shows that the percentile bootstrap method gives the highest coverage probability, about 0.92 for all sample sizes simulated. The worst method is the basic bootstrap, with a coverage probability of about 0.6 for a small sample size and 0.85 for a larger sample size. The normal approximation method, like the percentile method, does not appear to depend on sample size: its coverage probability is around 0.9 for all sample sizes, which is still below that of the percentile method. None of the three methods reaches the nominal 0.95.
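Because each coverage estimate is the mean of N = 1000 Bernoulli indicators, its Monte Carlo standard error is sqrt(p(1 - p)/N). The following is a minimal sketch of that calculation. It assumes the approximate coverage values quoted in the discussion rather than the exact simulation output, and the names coverage, mc_se and band are illustrative only.

```r
# Monte Carlo standard error of a coverage estimate based on N indicator draws.
# The coverage values are the approximate figures quoted in the text,
# not exact simulation output.
N <- 1000
coverage <- c(basic_small_n = 0.60, basic_large_n = 0.85,
              normal = 0.90, percentile = 0.92)

mc_se <- sqrt(coverage * (1 - coverage) / N)

# Approximate 95% band around each estimate (normal approximation to the binomial).
band <- cbind(estimate = coverage,
              lower = coverage - qnorm(0.975) * mc_se,
              upper = coverage + qnorm(0.975) * mc_se)
print(round(band, 3))
```

Near a coverage of 0.9 the standard error is about 0.009, so the gap of roughly 0.02 between the normal and percentile methods is only about two standard errors of a single estimate, while the shortfall of the basic method is far larger than the simulation noise.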
[Figure: coverage probability (y-axis, 0.5 to 1.0) against n (x-axis) for the Basic, Normal and Percentile methods]
Figure 1.1: Coverage probability versus sample size for the three bootstrap methods used to compute the confidence interval.

1.3 R Code

./mp4.r

library(boot)

N <- 1000                     # Monte Carlo replications per sample size
n <- c(5, 10, 30, 100)        # sample sizes

# Coverage probabilities, one entry per sample size
covprob.basic <- rep(0, length(n))
covprob.normal <- rep(0, length(n))
covprob.percentile <- rep(0, length(n))

truemedian <- qgamma(0.5, shape = 3, rate = 5)

# Statistic passed to boot(): median of the resampled observations
median.npar <- function(x, indices) {
  median(x[indices])
}

for (ii in seq_along(n)) {
  suma.basic <- 0
  suma.normal <- 0
  suma.percentile <- 0
  for (i in 1:N) {
    rgma <- rgamma(n[ii], shape = 3, rate = 5)
    # Nonparametric bootstrap of the median with 2000 resamples
    median.npar.boot <- boot(rgma, median.npar, R = 2000, sim = "ordinary", stype = "i")
    # Request only the three intervals compared here; the default type = "all"
    # would also attempt the studentized and BCa intervals
    median.conint <- boot.ci(median.npar.boot, conf = 0.95, type = c("norm", "basic", "perc"))
    # Count the intervals that contain the true median
    suma.basic <- suma.basic + (median.conint$basic[4] < truemedian & truemedian < median.conint$basic[5])
    suma.normal <- suma.normal + (median.conint$normal[2] < truemedian & truemedian < median.conint$normal[3])
    suma.percentile <- suma.percentile + (median.conint$percent[4] < truemedian & truemedian < median.conint$percent[5])
  }
  # Proportion of intervals covering the true median
  covprob.basic[ii] <- suma.basic / N
  covprob.normal[ii] <- suma.normal / N
  covprob.percentile[ii] <- suma.percentile / N
}

# Plot the results: one curve per method on shared axes
pdf(file = "bootcovprob.pdf", width = 13, height = 8)
plot(n, covprob.basic, xlab = "n", ylab = "Coverage Probability", type = "o", pch = 0, cex.lab = 1.5, ylim = c(0.5, 1), lwd = 2, cex = 2)
lines(n, covprob.normal, type = "o", pch = 1, lwd = 2, cex = 2)
lines(n, covprob.percentile, type = "o", pch = 2, lwd = 2, cex = 2)
legend("bottomright", c("Basic", "Normal", "Percentile"), pch = c(0, 1, 2), cex = 2, lty = 1, lwd = 2)
dev.off()
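The listing obtains all three intervals from boot.ci, which hides how they differ. As a rough illustration, the sketch below computes them by hand for a single sample using base R only. The seed, the sample size of 30 and the names theta_hat, theta_star and ci are assumptions made for this example. boot.ci interpolates the bootstrap quantiles differently from quantile(), so its endpoints may differ slightly from these.

```r
# Hand-rolled versions of the three intervals for one sample (base R only).
# Illustrative sketch: boot.ci() uses its own quantile interpolation,
# so its endpoints can differ slightly from the ones computed here.
set.seed(1)                          # arbitrary seed, for reproducibility only
x <- rgamma(30, shape = 3, rate = 5) # one sample of (assumed) size 30
R <- 2000                            # number of bootstrap resamples
alpha <- 0.05

theta_hat <- median(x)
# Bootstrap replicates of the median: resample x with replacement R times.
theta_star <- replicate(R, median(sample(x, replace = TRUE)))

z <- qnorm(1 - alpha / 2)
bias <- mean(theta_star) - theta_hat
q <- unname(quantile(theta_star, c(alpha / 2, 1 - alpha / 2)))

ci <- rbind(
  # Normal approximation: bias-corrected estimate plus or minus z standard errors
  normal     = c(theta_hat - bias - z * sd(theta_star),
                 theta_hat - bias + z * sd(theta_star)),
  # Basic: bootstrap quantiles reflected about the sample median
  basic      = c(2 * theta_hat - q[2], 2 * theta_hat - q[1]),
  # Percentile: bootstrap quantiles used directly
  percentile = c(q[1], q[2])
)
colnames(ci) <- c("lower", "upper")
print(ci)
```

The basic interval is the percentile interval reflected about the sample median. For a skewed bootstrap distribution the two can therefore sit on opposite sides of the estimate, which may partly explain why their coverage differs so much at small n.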