
Statistics 512 Notes 8: The Monte Carlo Method

The t-test

Let $X_1, \ldots, X_n$ be iid with mean $\mu$ and unknown distribution. Consider the hypotheses

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu > \mu_0 .$$

If the distribution of the $X_i$ is normal (with unknown variance), then a test with exact size $\alpha$ (here $\alpha = 0.05$) is to use the test statistic

$$t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

and the rejection region $\{t : t > t_{\alpha,n-1}\}$, where $t_{\alpha,n-1}$ is the $(1-\alpha)$ quantile of the t-distribution with $n-1$ degrees of freedom, i.e., $\alpha = P(T > t_{\alpha,n-1})$. This is called the t-test. When the distribution of the $X_i$ is normal, the test has exact size $\alpha$ because, when $\mu = \mu_0$, the statistic $t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$ has a t-distribution with $n-1$ degrees of freedom.
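The test statistic and rejection rule described above can be sketched in R. This is only an illustration: the function name t_stat_test and the simulated data are assumptions made for the example, not part of the notes. The built-in t.test call is included as a cross-check.

```r
# Minimal sketch of the one-sided t-test described above.
# t_stat_test is an illustrative name; the data are simulated for the example.
t_stat_test <- function(x, mu0 = 0, alpha = 0.05) {
  n <- length(x)
  tstat <- (mean(x) - mu0) / (sd(x) / sqrt(n))   # (Xbar - mu0) / (S / sqrt(n))
  tcrit <- qt(1 - alpha, df = n - 1)             # t_{alpha, n-1}
  list(statistic = tstat, critical = tcrit, reject = tstat > tcrit)
}

set.seed(512)
x <- rnorm(20, mean = 0, sd = 1)    # normal data, so H0: mu = 0 is true here
out <- t_stat_test(x, mu0 = 0)

# Cross-check the statistic against R's built-in one-sided t-test
builtin <- t.test(x, mu = 0, alternative = "greater")
print(out)
```

With n = 20 and alpha = 0.05 the critical value returned by qt is about 1.729, the value used later in Example 5.8.5.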

When the distribution of the $X_i$ is not normal, the test does not necessarily have exact size 0.05. However, for large $n$,

$$P_{\mu_0}\left(\frac{\bar{X}-\mu_0}{S/\sqrt{n}} > t_{\alpha,n-1}\right) \approx P_{\mu_0}\left(\frac{\bar{X}-\mu_0}{S/\sqrt{n}} > z_\alpha\right) \approx P(Z > z_\alpha) = \alpha$$

because of the Central Limit Theorem, so that the t-test has approximate size $\alpha$ for large samples for any distribution of the $X_i$ with finite variance.

Note the difference between the rejection rules $\{t : t > t_{\alpha,n-1}\}$ and $\{t : t > z_\alpha\}$. The large sample rule $\{t : t > z_\alpha\}$ has approximate size $\alpha$, while $\{t : t > t_{\alpha,n-1}\}$ has exact size $\alpha$. Of course, for the exact size we have to assume that $X_i$ has a normal distribution. In practice, we may not be willing to assume that the population is normal. In general, t critical values are larger than z critical values (i.e., $t_{\alpha,n-1} > z_\alpha$), so the t-test is conservative relative to the large sample test. So in practice, many statisticians use the t-test even if they do not believe the data are normally distributed. Note that $\lim_{n \to \infty} t_{\alpha,n-1} = z_\alpha$.

How well does the t-test work in moderate sized samples when the data are not normal, i.e., what is its true size in moderate sized samples?

Example 5.8.5: Consider the following contaminated normal distribution: 75% of the time an observation is generated by a standard normal distribution, while 25% of the time it is generated by a normal distribution with mean 0 and standard deviation 25. We call this distribution contaminated normal distribution A. Suppose a random sample of size 20 is generated from contaminated normal distribution A. The mean of $X_i$ is 0, so $H_0: \mu = 0$ is true. What is the true size of the nominal size 0.05 t-test (reject the null hypothesis when $t > t_{.05,19} = 1.729$, which would have size 0.05 for a normal distribution) for random samples of size 20 from contaminated normal distribution A?

Let $f(x)$ denote the density of contaminated normal distribution A and let

$$t(X_1, \ldots, X_n) = \frac{\bar{X}}{S/\sqrt{n}} .$$

The true size of the t-test for contaminated normal distribution A is

$$\int \cdots \int I\{t(x_1, \ldots, x_{20}) > 1.729\}\, f(x_1) \cdots f(x_{20})\, dx_1 \cdots dx_{20} \qquad (1)$$

where $I\{t(x_1, \ldots, x_{20}) > 1.729\} = 1$ if $t(x_1, \ldots, x_{20}) > 1.729$ and 0 otherwise. We can write (1) as $E[I\{t(X_1, \ldots, X_{20}) > 1.729\}]$, where the expectation is with respect to random samples from contaminated normal distribution A.

The Monte Carlo method: Consider a function $g(X)$ of a random vector $X$, where $X$ has density $f(x)$. Consider the expected value of $g(X)$:

$$E[g(X)] = \int g(x) f(x)\, dx .$$

Suppose we take an iid random sample $X_1, \ldots, X_n$ from the density $f(x)$. Then by the law of large numbers,

$$\frac{\sum_{i=1}^n g(X_i)}{n} \xrightarrow{P} E[g(X)] .$$

The Monte Carlo method is to estimate $E[g(X)]$ by

$$\hat{E}[g(X)] = \frac{\sum_{i=1}^n g(X_i)}{n} .$$

The standard error of the estimate is

$$SE[g(X)] = \frac{\sqrt{\sum_{i=1}^n \left( g(X_i) - \frac{\sum_{j=1}^n g(X_j)}{n} \right)^2}}{n} .$$

By the Central Limit Theorem, an approximate 95% confidence interval for $E[g(X)]$ is

$$\hat{E}[g(X)] \pm 1.96\, SE[g(X)] .$$
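The general recipe (draw a sample, average g over it, attach a standard error and an approximate 95% interval) can be sketched in R as follows. The names mc_estimate, g and rdraw are illustrative assumptions, not from the notes, and the target $E[X^2]$ for a standard normal $X$ (true value 1) is chosen only because the answer is known. The standard error uses the divisor-n form of the sample variance.

```r
# Minimal sketch of the Monte Carlo recipe.
# mc_estimate, g and rdraw are illustrative names, not from the notes.
mc_estimate <- function(g, rdraw, n) {
  x   <- rdraw(n)                        # n iid draws from the density f
  gx  <- g(x)                            # g evaluated at each draw
  est <- mean(gx)                        # Monte Carlo estimate of E[g(X)]
  se  <- sqrt(sum((gx - est)^2)) / n     # standard error (divisor-n form)
  list(estimate = est, se = se, ci = est + c(-1.96, 1.96) * se)
}

set.seed(512)
# Assumed example: E[X^2] for X ~ N(0,1), whose true value is 1
res <- mc_estimate(function(x) x^2, rnorm, 10000)
print(res)
```

Because the standard error shrinks like $1/\sqrt{n}$, quadrupling the number of draws roughly halves the width of the interval.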

Example: Monte Carlo estimation of $\pi$

Define the unit square as the square centered at (0.5, 0.5) with sides of length 1 and the unit circle as the circle centered at the origin with radius 1. The ratio of the area of the part of the unit circle that lies in the first quadrant to the area of the unit square is $\pi/4$.

Let $U_1$ and $U_2$ be iid uniform(0,1) random variables. Let $g(U_1, U_2) = 1$ if $(U_1, U_2)$ is in the unit circle and 0 otherwise. Then $E[g(U_1, U_2)] = \pi/4$.

Monte Carlo method: Repeat the experiment of drawing iid uniform(0,1) random variables $U_1$ and $U_2$ a total of $n$ times, and estimate $\pi$ by

$$4 \cdot \frac{\sum_{i=1}^n g(U_{i1}, U_{i2})}{n} .$$

In R, the command runif(n) draws n iid uniform(0,1) random variables. Here is a function for estimating pi:

    piest=function(n){
      #
      # Obtains the estimate of pi and its standard
      # error for the simulation discussed in Example 5.8.1
      #
      # n is the number of simulations
      #
      u1=runif(n);
      u2=runif(n);
      cnt=rep(0,n);
      chk=u1^2+u2^2-1;
      cnt[chk<0]=1;            # 1 if the point (u1,u2) falls inside the unit circle
      p=mean(cnt);             # proportion of points inside; estimates pi/4
      est=4*p;
      se=4*sqrt(p*(1-p)/n);    # standard error is based on the proportion p, not on est
      list(estimate=est,standard=se);
    }
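To see how the precision of the estimate of pi depends on the number of simulations, the following sketch repeats the simulation for three values of n and reports an approximate 95% confidence interval for each. The helper name pi_mc and the chosen sample sizes are assumptions made for the illustration.

```r
# Sketch: Monte Carlo estimate of pi with an approximate 95% interval.
# pi_mc is an illustrative helper name, not from the notes.
pi_mc <- function(n) {
  u1  <- runif(n)
  u2  <- runif(n)
  p   <- mean(u1^2 + u2^2 < 1)          # proportion of points inside the quarter circle
  est <- 4 * p                          # estimate of pi
  se  <- 4 * sqrt(p * (1 - p) / n)      # standard error of the estimate
  c(n = n, estimate = est, se = se,
    lower = est - 1.96 * se, upper = est + 1.96 * se)
}

set.seed(512)
# One row per sample size
results <- t(sapply(c(100, 10000, 1000000), pi_mc))
print(results)
```

Each hundredfold increase in n should cut the standard error by roughly a factor of 10.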

Back to Example 5.8.5: What is the true size of the 0.05 nominal size t-test for random samples of size 20 from contaminated normal distribution A? We want to estimate $E[I\{t(X_1, \ldots, X_{20}) > 1.729\}]$.

Monte Carlo method:

$$\hat{E}[I\{t(X_1, \ldots, X_{20}) > 1.729\}] = \frac{\sum_{i=1}^n I\{t(x_{i,1}, \ldots, x_{i,20}) > 1.729\}}{n},$$

where $(x_{i,1}, \ldots, x_{i,20})$ is a random sample of size 20 from contaminated normal distribution A.

How to draw a random observation from contaminated normal distribution A?
(1) Draw a Bernoulli random variable B with p = 0.75.
(2) If B = 1, draw a random observation from the standard normal distribution. If B = 0, draw a random observation from the normal distribution with mean 0 and standard deviation 25.

In R, the command rnorm(n,mean=0,sd=1) draws a random sample of size n from the normal distribution with the specified mean and SD. The command rbinom(n,size=1,p) draws a random sample of size n from the Bernoulli distribution with probability of success p.

R function for obtaining the Monte Carlo estimate of $E[I\{t(X_1, \ldots, X_{20}) > 1.729\}]$. The function calls a helper rcn(n,eps,sigmac) that is not listed in these notes; the version given here is a minimal sketch that follows steps (1) and (2), taking eps as the contamination probability (so B = 1 with probability 1-eps).

    rcn=function(n,eps,sigmac){
      # Draws n observations from a contaminated normal distribution:
      # N(0,1) with probability 1-eps, N(0,sigmac^2) with probability eps
      b=rbinom(n,size=1,1-eps);
      x=rnorm(n,mean=0,sd=1);
      x[b==0]=sigmac*x[b==0];
      x;
    }

    empalphacn=function(nsims){
      #
      # Obtains the empirical level of the test discussed in
      # Example 5.8.5
      #
      # nsims is the number of simulations
      #
      sigmac=25;
      eps=.25;
      alpha=.05;
      n=20;
      tc=qt(1-alpha,n-1);
      ic=0;
      for(i in 1:nsims){
        samp=rcn(n,eps,sigmac);
        ttest=(sqrt(n)*mean(samp))/var(samp)^.5;
        if(ttest>tc){
          ic=ic+1;
        }
      }
      empalp=ic/nsims;
      err=1.96*sqrt((empalp*(1-empalp))/nsims);
      list(empiricalalpha=empalp,error=err);
    }

Generating random observations with given cdf F

Theorem 5.8.1: Suppose the random variable U has a uniform(0,1) distribution. Let F be the cdf of a random variable with a continuous distribution function. Then the random variable $X = F^{-1}(U)$ has cdf F.
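As an illustration of Theorem 5.8.1, consider the exponential distribution with rate $\lambda$, whose cdf $F(x) = 1 - e^{-\lambda x}$ has inverse $F^{-1}(u) = -\log(1-u)/\lambda$. The sketch below draws uniforms and transforms them. The choice of the exponential distribution and the name rexp_inv are assumptions made for this example, not part of the notes.

```r
# Inverse-cdf sampling sketch for the exponential distribution.
# rexp_inv is an illustrative name, not from the notes.
rexp_inv <- function(n, rate = 1) {
  u <- runif(n)             # U ~ uniform(0,1)
  -log(1 - u) / rate        # X = F^{-1}(U) for F(x) = 1 - exp(-rate * x)
}

set.seed(512)
x <- rexp_inv(100000, rate = 2)

# Compare the sample mean with the theoretical mean 1/rate
print(c(sample_mean = mean(x), theoretical_mean = 1 / 2))
```

The same pattern applies to any distribution whose cdf can be inverted in closed form.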
