
Ch 5: Monte Carlo Integration and Variance Reduction

Book: Statistical Computing with R
Maria L. Rizzo
Chapman & Hall/CRC, 2008

Integral estimation

g(x) is a function.
We want to compute ∫ g(x) dx, assuming the integral is finite.
We use facts from statistical moments to estimate integrals.
Recall that if X is a random variable with density f(x) (written X ~ f) and Y = g(X) is another random variable, then

E_Y(Y) = E_X(g(X)) = ∫ g(x) f(x) dx. This suggests the following estimator.

Let X_1, X_2, ..., X_n be an i.i.d. sample from f(x). Then an unbiased estimator of E[g(X)] is

(1/n) Σ_{i=1}^n g(X_i).

Simple Monte Carlo estimator for an integral over [0,1]

Goal is to estimate θ = ∫_0^1 g(x) dx.

Generate m i.i.d. U(0,1) random variables X_1, X_2, ..., X_m.
(U(0,1) is used because it fits the domain of integration [0,1].)

θ̂ = (1/m) Σ_{i=1}^m g(X_i) → E(g(X)) = θ

with probability 1 by the Strong Law of Large Numbers.

Exercise: Write R code to compute the Monte Carlo estimate of the integral of exp(-x) on the interval [0,1] and compare it to the exact answer. (A sketch of one solution follows below.)
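A minimal sketch of one way to do the exercise (not the book's code); the exact answer is 1 - exp(-1):

m <- 10000
x <- runif(m)               # X_1, ..., X_m ~ U(0,1)
theta.hat <- mean(exp(-x))  # Monte Carlo estimate of the integral of exp(-x) over [0,1]
exact <- 1 - exp(-1)        # exact value of the integral
print(c(estimate = theta.hat, exact = exact))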

One step harder: domain [a,b]

Estimate θ = ∫_a^b g(t) dt.

One idea is to use a change of variables so that the simple Monte Carlo estimator over [0,1] can be used.
Specifically, find a function y(t) such that y(a) = 0, y(b) = 1 and perform the integration:
∫_{y(a)}^{y(b)} g(t(y)) (dt/dy) dy = ∫_0^1 g(t(y)) (dt/dy) dy.

The function that works is y(t) = (t - a)/(b - a). Then t(y) = a + (b - a) y and dt/dy = b - a.
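As a sketch in R (the integrand and the limits a, b below are chosen only for illustration):

a <- 1; b <- 3                     # illustrative limits (assumed)
g <- function(t) exp(-t)           # illustrative integrand (assumed)
m <- 10000
y <- runif(m)                      # Y ~ U(0,1)
t <- a + (b - a) * y               # t(y) = a + (b - a) y
theta.hat <- mean(g(t)) * (b - a)  # dt/dy = b - a pulled outside the sum
print(c(estimate = theta.hat, exact = exp(-a) - exp(-b)))  # exact integral of exp(-t) on [a,b]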

One step harder: domain [a,b]

Alternatively, find a probability density with support (a,b), for example the U(a,b) density, and use that.

The U(a,b) density has form f_U(u) = (1/(b - a)) I(a ≤ u ≤ b), where I(·) is the indicator function.

Note that the integral we want is related to the expectation with respect to the U(a,b) density f_U(u) as follows:

∫_a^b g(t) dt = (b - a) ∫_a^b g(t) (1/(b - a)) dt = (b - a) ∫_a^b g(u) f_U(u) du = (b - a) E_U[g(U)].

SAMPLING ALGORITHM:
Generate X_1, X_2, ..., X_m iid ~ U(a,b)

θ̂ = ((b - a)/m) Σ_{i=1}^m g(X_i)
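A minimal sketch of this algorithm in R (same illustrative integrand and limits as above); it gives the same estimator as the change-of-variables version, just written by sampling U(a,b) directly:

a <- 1; b <- 3                     # illustrative limits (assumed)
g <- function(t) exp(-t)           # illustrative integrand (assumed)
m <- 10000
x <- runif(m, min = a, max = b)    # X_1, ..., X_m ~ U(a,b)
theta.hat <- (b - a) * mean(g(x))  # (b - a) E_U[g(U)] estimated by (b - a) times the sample mean
theta.hat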

Example 5.3 from book: Non-finite limits

Use the above approach to estimate the standard normal cdf

Φ(x) = ∫_{-∞}^x (1/√(2π)) e^{-t²/2} dt,

for an arbitrary x.

For x > 0, Φ(x) = 0.5 + ∫_0^x (1/√(2π)) e^{-t²/2} dt, back to finite limits.
For x < 0, Φ(x) = 1 - Φ(-x), so use the method above on -x > 0.

The problem reduces to estimating θ = ∫_0^x e^{-t²/2} dt for x > 0.

Example 5.3: Non-finite limits

Estimate Φ(x) = ∫_{-∞}^x (1/√(2π)) e^{-t²/2} dt, for an arbitrary x.

This could be done by generating U(0, x) random variables as just shown, but this would require a new generation for every choice of x.
Could we possibly solve the problem for every x by just generating one sample of m U(0,1) random variables?

Example 5.3: Non-finite limits

Estimate θ = ∫_0^x e^{-t²/2} dt, for an arbitrary x.

Use change of variables with y = t/x.
Then t = 0 ⇒ y = 0, t = x ⇒ y = 1, t = xy, and dt/dy = x.
The integral to be solved becomes

θ = ∫_0^1 x e^{-(xy)²/2} dy = E_Y[ x e^{-(xY)²/2} ], where Y ~ U(0,1).

SAMPLING ALGORITHM
Generate U_1, ..., U_m iid ~ U(0,1).
Set θ̂ = (1/m) Σ_{i=1}^m x e^{-(U_i x)²/2}.
If x > 0, Φ̂(x) = 0.5 + θ̂/√(2π); if x < 0, Φ̂(x) = 1 - Φ̂(-x).

R Code for Example 5.3

[Code shown as a screenshot in the slides; a sketch is given below.]
The code computes the estimate for 10 positive x's ranging from 0.1 to 2.5. Note that u, and hence g, is a vector; we are looping through the vector x. R has a function, pnorm, that calculates Φ(x) automatically, which we use as a check.
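A sketch of what the code might look like (variable names are mine and may not match the book's exactly):

x <- seq(0.1, 2.5, length = 10)   # 10 positive values of x
m <- 10000
u <- runif(m)                      # one sample of m U(0,1) variables, reused for every x
cdf <- numeric(length(x))
for (i in 1:length(x)) {
    g <- x[i] * exp(-(u * x[i])^2 / 2)        # g is a vector of length m
    cdf[i] <- 0.5 + mean(g) / sqrt(2 * pi)    # Phi-hat(x) = 0.5 + theta-hat / sqrt(2*pi)
}
Phi <- pnorm(x)                    # exact values for comparison
print(round(rbind(x, cdf, Phi), 3))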

The estimates are close to pnorm except for the very high values of x.

Example 5.4: Semi-finite limits

Calculate Φ(x) = ∫_{-∞}^x (1/√(2π)) e^{-t²/2} dt where you have a standard normal generator at your disposal.

Let Z ~ N(0,1). Then

E[I(Z ≤ x)] = ∫_{-∞}^∞ I(z ≤ x) f_Z(z) dz = ∫_{-∞}^∞ I(z ≤ x) (1/√(2π)) e^{-z²/2} dz = ∫_{-∞}^x (1/√(2π)) e^{-z²/2} dz = Φ(x).

SAMPLING ALGORITHM
Generate Z_1, ..., Z_m iid ~ N(0,1).
Set Φ̂(x) = (1/m) Σ_{i=1}^m I(Z_i ≤ x).

By the strong law of large numbers this estimate approximates the true normal probability P(Z ≤ x) with probability 1.

R Code for Example 5.4

[Code shown as a screenshot in the slides; a sketch is given below.]
MARGIN = 1 means apply the function over rows.
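A sketch of the idea using apply with MARGIN = 1 (names assumed, not copied from the book):

x <- seq(0.1, 2.5, length = 10)   # the same 10 values of x as before
m <- 10000
z <- rnorm(m)                      # Z_1, ..., Z_m ~ N(0,1)
dim(x) <- length(x)                # give x a dim attribute so apply() can run over its rows
p.hat <- apply(x, MARGIN = 1,      # MARGIN = 1: apply the function over rows (each x value)
               FUN = function(xi) mean(z <= xi))   # proportion of Z_i <= x, i.e. Phi-hat(x)
print(round(rbind(x = as.vector(x), estimate = p.hat, exact = pnorm(as.vector(x))), 3))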

General Result

f(x) is a probability density supported on a set A.
To estimate θ = ∫_A g(x) f(x) dx,

generate X_1, ..., X_m iid ~ f(x) and set θ̂ = (1/m) Σ_{i=1}^m g(X_i).

E(θ̂) = θ, and θ̂ → θ as m → ∞ with probability 1 by the Strong Law of Large Numbers (SLLN).

Standard errors

To calculate the standard error of θ̂ = (1/m) Σ_{i=1}^m g(X_i), we realize that θ̂ is a sample mean of the independent g(X_1), g(X_2), ..., g(X_m) and use basic statistical principles.

Var(θ̂) = Var( (1/m) Σ_{i=1}^m g(X_i) ) = (1/m²) Σ_{i=1}^m Var(g(X_i))
(this uses that the variance of a sum of independent things is the sum of the variances)
        = (1/m²) · m σ² = σ²/m, where σ² = Var(g(X)) is the variance of the random variable g(X).

Recall from statistics that Var(X̄) = σ²/n.

Standard errors

θ̂ is a sample mean of the independent g(X_1), g(X_2), ..., g(X_m), so

Var(θ̂) = σ²/m, where σ² = Var(g(X)).

How do we estimate σ²? ...by the sample variance of g(X_1), g(X_2), ..., g(X_m).

Recall from statistics that the unbiased estimate of the variance is
s² = (1/(m - 1)) Σ_{i=1}^m (g(X_i) - θ̂)², while the maximum likelihood estimate is
σ̂² = (1/m) Σ_{i=1}^m (g(X_i) - θ̂)².

Since m/(m - 1) approaches 1 for m large, and m can be fixed large by the user, we will follow the book and use the second estimate.


Standard errors

Var(θ̂) ≈ σ̂²/m = [ (1/m) Σ_{i=1}^m (g(X_i) - θ̂)² ] / m = Σ_{i=1}^m (g(X_i) - θ̂)² / m²
(have to be careful to have two m's in the denominator)

and

s.e.(θ̂) = √( Σ_{i=1}^m (g(X_i) - θ̂)² ) / m.

Confidence intervals (CI)

The Central Limit Theorem (CLT) implies that

(θ̂ - E(θ̂)) / √Var(θ̂) → N(0,1) in distribution as m → ∞.

Since E(θ̂) = θ, this fact is used to develop a 95% confidence interval for θ.

For Z ~ N(0,1), P(-1.96 < Z < 1.96) = 0.95, and substituting Z = (θ̂ - θ)/s.e.(θ̂):

P(-1.96 < (θ̂ - θ)/s.e.(θ̂) < 1.96) = 0.95, and
P(θ̂ - 1.96 s.e.(θ̂) < θ < θ̂ + 1.96 s.e.(θ̂)) = 0.95.

A 95% CI for θ is θ̂ ± 1.96 s.e.(θ̂).
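Putting the last few slides together, a minimal sketch in R of an MC estimate with its standard error and 95% CI (the integrand is assumed for illustration):

g <- function(x) exp(-x)                  # illustrative integrand on [0,1] (assumed)
m <- 10000
gx <- g(runif(m))
theta.hat <- mean(gx)                     # MC estimate of the integral
se <- sqrt(sum((gx - theta.hat)^2)) / m   # s.e. = sqrt(sum((g(X_i) - theta.hat)^2)) / m
ci <- theta.hat + c(-1.96, 1.96) * se     # 95% confidence interval
print(c(estimate = theta.hat, se = se, lower = ci[1], upper = ci[2]))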

Example 5.5

[Code shown as a screenshot in the slides; a sketch is given below.]
One can use <= instead of < in the indicator. Note that the mean already includes a division by m.
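A sketch of what the code might look like, estimating Φ(2) with the indicator approach of Example 5.4 and attaching the MC variance estimate (names are mine, not the book's):

x <- 2
m <- 10000
z <- rnorm(m)
g <- (z < x)                       # indicator I(Z < x); could equally use z <= x
theta.hat <- mean(g)               # estimate of Phi(2)
v <- mean((g - theta.hat)^2) / m   # MC estimate of Var(theta.hat) = sigma-hat^2 / m
print(c(estimate = theta.hat, variance = v, se = sqrt(v)))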

Example 5.5 continued

x = 2, Z ~ N(0,1).
g(Z) = I(Z < x) is a Bernoulli random variable, taking value 1 if Z < x and 0 otherwise.
E[g(Z)] = E[I(Z < x)] = 1·P(Z < x) + 0·P(Z ≥ x) = P(Z < x) = Φ(x).
Φ(x) is the success probability, P[g(Z) = 1] = Φ(x). Therefore, according to the Bernoulli distribution, Var[g(Z)] = Φ(x)(1 - Φ(x)).

θ̂ = (1/m) Σ_{i=1}^m g(Z_i) is the average of m independent Bernoulli trials, and also equals the proportion of successes out of m trials, each with success probability Φ(x); the number of successes is Bin(m, Φ(x)) distributed. The variance of this proportion is Φ(x)(1 - Φ(x))/m.

(If this does not ring a bell, recall the variance of a sample proportion: p(1 - p)/m.)

Example 5.5 from book

[Output shown as a screenshot in the slides: the MC variance estimate.]

> pnorm(2)
[1] 0.9772499

Φ(2) ≈ 0.977, which would yield theoretical variance 0.977(1 - 0.977)/10,000 = 2.223e-06.
The MC variance estimate is very close.

Remarks on Example 5.5

1.) Some prefer the second estimate, the plug-in Φ̂(x)(1 - Φ̂(x))/m of Var(θ̂), rather than the MC estimate, for the case of estimating proportions. Either can be used.
2.) The algorithm just shown for estimating general functions of the form I(Z < x) is sometimes referred to as the "hit or miss" algorithm because it generates a lot of random variables Z and records the hits (Z < x).
3.) This algorithm could require many simulations, however, if x is at the lower end of the support space of Z, for example, x = -0.06 in the example below.

Efficiency

Efficiency in general means doing things faster.
In simulation, it means getting a smaller variance of your estimate for the same number of simulations.

If θ̂_1 and θ̂_2 are two estimators for θ, then θ̂_1 is more efficient than θ̂_2 if Var(θ̂_1)/Var(θ̂_2) < 1.

Efficiency is called a second-order property. As the cartoon suggests, you want to first worry whether your estimator is correct (unbiased) before you concern yourself with efficiency.

Notes on efficiency

Variances are unknown, so their MC estimates are used for efficiency calculations.
Variances of averages are of order 1/m (they decrease as the number of simulations m increases), so one way to decrease the variance is to increase the number of simulations.
Sometimes the percent reduction from using θ̂_2 instead of θ̂_1 is reported:

100 · (Var(θ̂_1) - Var(θ̂_2)) / Var(θ̂_1).

Power calculations

[Cartoon: Jim Carrey, Bruce Almighty]

Statistical power calculations refer to determining how many samples to collect or simulations to perform to get a desired level of accuracy.

We saw earlier that Var(θ̂) = σ²/m, where σ² is the true variance of the object we are taking the average of [g(X)].

Suppose we are planning to run a simulation study that is costly, and want to determine the number of simulations m needed to achieve a standard error below ε. We have an "a priori" estimate of σ² from prior experiments.

We solve σ/√m < ε for m to obtain that we need m > σ²/ε².
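For example, a quick check in R (the values of σ and ε here are made up for illustration):

sigma <- 2     # a priori estimate of the sd of g(X) (assumed for illustration)
eps <- 0.01    # desired upper bound on the standard error (assumed for illustration)
m.min <- sigma^2 / eps^2   # need m > sigma^2 / eps^2
m.min                      # here 40000, so plan on more than 40000 simulations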

Tricks for reducing MC variance

There are some tricks for reducing the variance of


MC integration, which ultimately reduce the number
of random variable generations.
Two include the use of antithetic variables and
control variates in Sections 5.4 and 5.5.
These are beyond the scope of the course.


Importance sampling

MOTIVATION:
To calculate ∫_a^b g(t) dt using MC integration we have used f_U = U(a,b) as a generating density, noting that

∫_a^b g(t) dt = (b - a) ∫_a^b g(t) (1/(b - a)) dt = (b - a) ∫_a^b g(u) f_U(u) du = (b - a) E_U[g(U)].

The sampling algorithm was:
Generate X_1, X_2, ..., X_m iid ~ U(a,b)

θ̂ = ((b - a)/m) Σ_{i=1}^m g(X_i)

Importance sampling

Generate X_1, X_2, ..., X_m iid ~ U(a,b)

θ̂ = ((b - a)/m) Σ_{i=1}^m g(X_i)

This will not work well if g is not matched well by the U(a,b) density.

[Figure: plot of g(x).]

Importance sampling

∫_a^b g(t) dt = (b - a) ∫_a^b g(t) (1/(b - a)) dt = (b - a) ∫_a^b g(u) f_U(u) du = (b - a) E_U[g(U)]

The idea is to replace the generating density f_U here by something that is easy to sample from and more closely represents the function to be integrated.

Importance sampling

GOAL: Calculate ∫ g(x) dx.

LOGIC:
Find a density f(x) such that f(x) > 0 on the set {x : g(x) ≠ 0} that you can generate from; f(x) is called the importance function.

Let Y = g(X)/f(X) be a transformed random variable of X, where X ~ f(x).

Then E[Y] = E[ g(X)/f(X) ] = ∫ (g(x)/f(x)) f(x) dx = ∫ g(x) dx gives the required integral.

ALGORITHM:
Generate X_1, ..., X_m iid ~ f(x).
Set Ê[Y] = (1/m) Σ_{i=1}^m g(X_i)/f(X_i).
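A minimal sketch of the algorithm in R (the integrand and the Exp(1) importance function below are chosen for illustration, not taken from the book):

g <- function(x) exp(-x) / (1 + x^2) * (x > 0 & x < 1)  # illustrative integrand, zero outside (0,1)
f <- function(x) exp(-x)                                 # importance function: Exp(1) density (assumed)
m <- 10000
x <- rexp(m, rate = 1)             # X_1, ..., X_m ~ f
ratio <- g(x) / f(x)               # g(X_i)/f(X_i)
theta.hat <- mean(ratio)           # importance sampling estimate of the integral
se <- sqrt(var(ratio) / m)         # its standard error
print(c(estimate = theta.hat, se = se))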

Picking the right f

Recall from earlier that Var( (1/m) Σ_{i=1}^m Y_i ) = Var(Y)/m = Var( g(X)/f(X) ) / m.

We want to choose f(X) such that g(X)/f(X) has little variability.

The best way to do this is to choose f to mimic the shape of g as closely as possible so that

g(X)/f(X) ≈ c, a constant, since the variance of a constant is 0.

Example from book

[Figure: the candidate importance functions — Uniform, Exp(1), Cauchy (= t_1), rescaled Exp(1), and rescaled Cauchy. Note that some have a bigger support than [0,1].]

Example continued

Plot g(x) and each of the f's. See which f matches the shape of g most closely.

[Figure: g plotted together with f0, f1, f2, f3, and f4.]

Example continued

Plot g(x)/f(x) for each of the f's. See which is most constant. f3 looks the best.
Rescaling the Cauchy (f2 → f4) really helped!

[Figure: plots of g/f2, g/f3, and g/f4.]

Example continued

[Figure: the Uniform, Exp(1), and Cauchy candidates.]

Note that values generated outside [0,1] will have g(x) = 0, so it does not matter that these densities put mass there.

Example continued

[Figure: the re-scaled Exp(1) and re-scaled Cauchy candidates.]

Example continued

f3 has the smallest standard error, followed by f4.
The Cauchy (f2) is the worst. This is because its support is so much larger than [0,1] that most of the generated g/f values are 0. In fact, 75% were 0.
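A sketch of how such a comparison could be run in R, here for f0 (Uniform) versus f3 (rescaled Exp(1)); the exact forms of g and f3 are written as I read them from the example and should be treated as assumptions:

g <- function(x) exp(-x) / (1 + x^2) * (x > 0 & x < 1)   # integrand (assumed form)
m <- 10000

# f0: Uniform(0,1) importance function
x0 <- runif(m)
r0 <- g(x0)                                # g(x)/f0(x) with f0(x) = 1 on (0,1)

# f3: rescaled Exp(1) on (0,1), f3(x) = exp(-x)/(1 - exp(-1)) (assumed form)
u <- runif(m)
x3 <- -log(1 - u * (1 - exp(-1)))          # inverse-cdf sample from f3
r3 <- g(x3) / (exp(-x3) / (1 - exp(-1)))   # g(x)/f3(x)

rbind(estimate = c(f0 = mean(r0), f3 = mean(r3)),
      se       = c(sd(r0), sd(r3)) / sqrt(m))   # f3 should show the smaller s.e.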

Summary

Importance sampling to calculate expectations

GOAL: For X ~ f(x), we want to calculate E(g(X)) = ∫ g(x) f(x) dx.
Although f(x) is already a probability density, it is not easy to sample from. This regularly happens in Bayesian inference.
Can still apply importance sampling. Here one needs to find another density to sample from, φ(x), sometimes referred to as the envelope function, that now closely resembles f(x) g(x).

ALGORITHM:
Generate X_1, ..., X_m iid ~ φ(x).
Set Ê[g(X)] = (1/m) Σ_{i=1}^m g(X_i) f(X_i) / φ(X_i).

All estimates approach the true value of the integral as m approaches infinity by the SLLN.
Despite its simplicity, importance sampling is rarely ...
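A minimal sketch of this version in R (the target density f, the function g, and the envelope φ below are all invented for illustration; f is chosen so the exact answer is known):

# Target: X ~ f, here a Beta(2,5) density (assumed; it stands in for a density that is
# hard to sample from directly). We want E[g(X)] with g(x) = x^2, using a U(0,1) envelope.
f   <- function(x) dbeta(x, 2, 5)   # target density f(x) (assumed)
g   <- function(x) x^2              # function whose expectation we want (assumed)
phi <- function(x) dunif(x)         # envelope density phi(x) (assumed)

m <- 10000
x <- runif(m)                       # X_1, ..., X_m ~ phi
est <- mean(g(x) * f(x) / phi(x))   # (1/m) sum g(X_i) f(X_i) / phi(X_i)
print(c(importance = est, exact = 3/28))   # E[X^2] = 2*3/(7*8) = 3/28 for Beta(2,5)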

End of Chapter 5

Ch 6: MC Methods in Inference
Very important applications of what is
learned in Chapter 5.
Not covered in this course except as
potential homework problems.

