
Ch 5: Monte Carlo Integration and Variance Reduction

Book: Statistical Computing with R
Maria L. Rizzo
Chapman & Hall/CRC, 2008

Integral estimation

g(x) is a function.
We want to compute ∫ g(x) dx, assuming the integral is finite.
We use facts from statistical moments to estimate integrals.
Recall that if X is a random variable with density f(x) (written X ~ f) and Y = g(X) is another random variable, then

E_Y(Y) = E_X(g(X)) = ∫ g(x) f(x) dx. This suggests the following estimator.

Let X_1, X_2, ..., X_n be an i.i.d. sample from f(x). Then an unbiased estimator of E[g(X)] is

(1/n) Σ_{i=1}^n g(X_i).

Simple Monte Carlo estimator for an integral over [0,1]

Goal is to estimate θ = ∫_0^1 g(x) dx.

Generate m i.i.d. U(0,1) random variables X_1, X_2, ..., X_m.
(U(0,1) is used because it fits the domain of integration [0,1].)

θ̂ = (1/m) Σ_{i=1}^m g(X_i) → E(g(X)) = θ

with probability 1 by the Strong Law of Large Numbers.

Exercise: Write R code to compute the Monte Carlo estimate of the integral of exp(-x) on the interval [0,1] and compare it to the exact answer. (A sketch of one solution follows below.)
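A minimal sketch of one way to do the exercise (not the book's code); the exact answer is 1 - exp(-1):

m <- 10000
x <- runif(m)               # X_1, ..., X_m ~ U(0,1)
theta.hat <- mean(exp(-x))  # Monte Carlo estimate of the integral of exp(-x) over [0,1]
exact <- 1 - exp(-1)        # exact value of the integral
print(c(estimate = theta.hat, exact = exact))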

One step harder: domain [a,b]

Estimate θ = ∫_a^b g(t) dt.

One idea is to use a change of variables so that the simple Monte Carlo estimator over [0,1] can be used.
Specifically, find a function y(t) such that y(a) = 0, y(b) = 1 and perform the integration:
∫_{y(a)}^{y(b)} g(t(y)) (dt/dy) dy = ∫_0^1 g(t(y)) (dt/dy) dy.

The function that works is y(t) = (t - a)/(b - a). Then t(y) = a + (b - a) y and dt/dy = b - a.
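As a sketch in R (the integrand and the limits a, b below are chosen only for illustration):

a <- 1; b <- 3                     # illustrative limits (assumed)
g <- function(t) exp(-t)           # illustrative integrand (assumed)
m <- 10000
y <- runif(m)                      # Y ~ U(0,1)
t <- a + (b - a) * y               # t(y) = a + (b - a) y
theta.hat <- mean(g(t)) * (b - a)  # dt/dy = b - a pulled outside the sum
print(c(estimate = theta.hat, exact = exp(-a) - exp(-b)))  # exact integral of exp(-t) on [a,b]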

One step harder: domain [a,b]

Alternatively, find a probability density with support (a,b), for example the U(a,b) density, and use that.

The U(a,b) density has form f_U(u) = (1/(b - a)) I(a ≤ u ≤ b), where I(·) is the indicator function.

Note that the integral we want is related to the expectation with respect to the U(a,b) density f_U(u) as follows:

∫_a^b g(t) dt = (b - a) ∫_a^b g(t) (1/(b - a)) dt = (b - a) ∫_a^b g(u) f_U(u) du = (b - a) E_U[g(U)].

SAMPLING ALGORITHM:
Generate X_1, X_2, ..., X_m iid ~ U(a,b)

θ̂ = ((b - a)/m) Σ_{i=1}^m g(X_i)
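A minimal sketch of this algorithm in R (same illustrative integrand and limits as above); it gives the same estimator as the change-of-variables version, just written by sampling U(a,b) directly:

a <- 1; b <- 3                     # illustrative limits (assumed)
g <- function(t) exp(-t)           # illustrative integrand (assumed)
m <- 10000
x <- runif(m, min = a, max = b)    # X_1, ..., X_m ~ U(a,b)
theta.hat <- (b - a) * mean(g(x))  # (b - a) E_U[g(U)] estimated by (b - a) times the sample mean
theta.hat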

Example 5.3 from book: Non-finite limits

Use the above approach to estimate the standard normal cdf

Φ(x) = ∫_{-∞}^x (1/√(2π)) e^{-t²/2} dt,

for an arbitrary x.

For x > 0, Φ(x) = 0.5 + ∫_0^x (1/√(2π)) e^{-t²/2} dt, back to finite limits.
For x < 0, Φ(x) = 1 - Φ(-x), so use the method above on -x > 0.

The problem reduces to estimating θ = ∫_0^x e^{-t²/2} dt for x > 0.

Example 5.3: Non-finite limits

Estimate Φ(x) = ∫_{-∞}^x (1/√(2π)) e^{-t²/2} dt, for an arbitrary x.

This could be done by generating U(0, x) random variables as just shown, but this would require a new generation for every choice of x.
Could we possibly solve the problem for every x by just generating one sample of m U(0,1) random variables?

Example 5.3: Non-finite limits

Estimate θ = ∫_0^x e^{-t²/2} dt, for an arbitrary x.

Use change of variables with y = t/x.
Then t = 0 ⇒ y = 0, t = x ⇒ y = 1, t = xy, and dt/dy = x.
The integral to be solved becomes

θ = ∫_0^1 x e^{-(xy)²/2} dy = E_Y[ x e^{-(xY)²/2} ], where Y ~ U(0,1).

SAMPLING ALGORITHM
Generate U_1, ..., U_m iid ~ U(0,1).
Set θ̂ = (1/m) Σ_{i=1}^m x e^{-(U_i x)²/2}.
If x > 0, Φ̂(x) = 0.5 + θ̂/√(2π); if x < 0, Φ̂(x) = 1 - Φ̂(-x).

R Code for Example 5.3

[Code shown as a screenshot in the slides; a sketch is given below.]
The code computes the estimate for 10 positive x's ranging from 0.1 to 2.5. Note that u, and hence g, is a vector; we are looping through the vector x. R has a function, pnorm, that calculates Φ(x) automatically, which we use as a check.
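A sketch of what the code might look like (variable names are mine and may not match the book's exactly):

x <- seq(0.1, 2.5, length = 10)   # 10 positive values of x
m <- 10000
u <- runif(m)                      # one sample of m U(0,1) variables, reused for every x
cdf <- numeric(length(x))
for (i in 1:length(x)) {
    g <- x[i] * exp(-(u * x[i])^2 / 2)        # g is a vector of length m
    cdf[i] <- 0.5 + mean(g) / sqrt(2 * pi)    # Phi-hat(x) = 0.5 + theta-hat / sqrt(2*pi)
}
Phi <- pnorm(x)                    # exact values for comparison
print(round(rbind(x, cdf, Phi), 3))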

The estimates are close to pnorm except for the very high values of x.

Example 5.4: Semi-finite limits

Calculate Φ(x) = ∫_{-∞}^x (1/√(2π)) e^{-t²/2} dt where you have a standard normal generator at your disposal.

Let Z ~ N(0,1). Then

E[I(Z ≤ x)] = ∫_{-∞}^∞ I(z ≤ x) f_Z(z) dz = ∫_{-∞}^∞ I(z ≤ x) (1/√(2π)) e^{-z²/2} dz = ∫_{-∞}^x (1/√(2π)) e^{-z²/2} dz = Φ(x).

SAMPLING ALGORITHM
Generate Z_1, ..., Z_m iid ~ N(0,1).
Set Φ̂(x) = (1/m) Σ_{i=1}^m I(Z_i ≤ x).

By the strong law of large numbers this estimate approximates the true normal probability P(Z ≤ x) with probability 1.

R Code for Example 5.4

[Code shown as a screenshot in the slides; a sketch is given below.]
MARGIN = 1 means apply the function over rows.
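A sketch of the idea using apply with MARGIN = 1 (names assumed, not copied from the book):

x <- seq(0.1, 2.5, length = 10)   # the same 10 values of x as before
m <- 10000
z <- rnorm(m)                      # Z_1, ..., Z_m ~ N(0,1)
dim(x) <- length(x)                # give x a dim attribute so apply() can run over its rows
p.hat <- apply(x, MARGIN = 1,      # MARGIN = 1: apply the function over rows (each x value)
               FUN = function(xi) mean(z <= xi))   # proportion of Z_i <= x, i.e. Phi-hat(x)
print(round(rbind(x = as.vector(x), estimate = p.hat, exact = pnorm(as.vector(x))), 3))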

General Result

f(x) is a probability density supported on a set A.
To estimate θ = ∫_A g(x) f(x) dx,

generate X_1, ..., X_m iid ~ f(x) and set θ̂ = (1/m) Σ_{i=1}^m g(X_i).

E(θ̂) = θ, and θ̂ → θ as m → ∞ with probability 1 by the Strong Law of Large Numbers (SLLN).

Standard errors

To calculate the standard error of θ̂ = (1/m) Σ_{i=1}^m g(X_i), we realize that θ̂ is a sample mean of the independent g(X_1), g(X_2), ..., g(X_m) and use basic statistical principles.

Var(θ̂) = Var( (1/m) Σ_{i=1}^m g(X_i) ) = (1/m²) Σ_{i=1}^m Var(g(X_i))
(this uses that the variance of a sum of independent things is the sum of the variances)
        = (1/m²) · m σ² = σ²/m, where σ² = Var(g(X)) is the variance of the random variable g(X).

Recall from statistics that Var(X̄) = σ²/n.

Standard errors

θ̂ is a sample mean of the independent g(X_1), g(X_2), ..., g(X_m), so

Var(θ̂) = σ²/m, where σ² = Var(g(X)).

How do we estimate σ²? ...by the sample variance of g(X_1), g(X_2), ..., g(X_m).

Recall from statistics that the unbiased estimate of the variance is
s² = (1/(m - 1)) Σ_{i=1}^m (g(X_i) - θ̂)², while the maximum likelihood estimate is
σ̂² = (1/m) Σ_{i=1}^m (g(X_i) - θ̂)².

Since m/(m - 1) approaches 1 for m large, and m can be fixed large by the user, we will follow the book and use the second estimate.


Standard errors

Var(θ̂) ≈ σ̂²/m = [ (1/m) Σ_{i=1}^m (g(X_i) - θ̂)² ] / m = Σ_{i=1}^m (g(X_i) - θ̂)² / m²
(have to be careful to have two m's in the denominator)

and

s.e.(θ̂) = √( Σ_{i=1}^m (g(X_i) - θ̂)² ) / m.

Confidence intervals (CI)

The Central Limit Theorem (CLT) implies that

(θ̂ - E(θ̂)) / √Var(θ̂) → N(0,1) in distribution as m → ∞.

Since E(θ̂) = θ, this fact is used to develop a 95% confidence interval for θ.

For Z ~ N(0,1), P(-1.96 < Z < 1.96) = 0.95, and substituting Z = (θ̂ - θ)/s.e.(θ̂):

P(-1.96 < (θ̂ - θ)/s.e.(θ̂) < 1.96) = 0.95, and
P(θ̂ - 1.96 s.e.(θ̂) < θ < θ̂ + 1.96 s.e.(θ̂)) = 0.95.

A 95% CI for θ is θ̂ ± 1.96 s.e.(θ̂).
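Putting the last few slides together, a minimal sketch in R of an MC estimate with its standard error and 95% CI (the integrand is assumed for illustration):

g <- function(x) exp(-x)                  # illustrative integrand on [0,1] (assumed)
m <- 10000
gx <- g(runif(m))
theta.hat <- mean(gx)                     # MC estimate of the integral
se <- sqrt(sum((gx - theta.hat)^2)) / m   # s.e. = sqrt(sum((g(X_i) - theta.hat)^2)) / m
ci <- theta.hat + c(-1.96, 1.96) * se     # 95% confidence interval
print(c(estimate = theta.hat, se = se, lower = ci[1], upper = ci[2]))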

Example 5.5

[Code shown as a screenshot in the slides; a sketch is given below.]
One can use <= instead of < in the indicator. Note that the mean already includes a division by m.
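A sketch of what the code might look like, estimating Φ(2) with the indicator approach of Example 5.4 and attaching the MC variance estimate (names are mine, not the book's):

x <- 2
m <- 10000
z <- rnorm(m)
g <- (z < x)                       # indicator I(Z < x); could equally use z <= x
theta.hat <- mean(g)               # estimate of Phi(2)
v <- mean((g - theta.hat)^2) / m   # MC estimate of Var(theta.hat) = sigma-hat^2 / m
print(c(estimate = theta.hat, variance = v, se = sqrt(v)))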

Example 5.5 continued

x = 2, Z ~ N(0,1).
g(Z) = I(Z < x) is a Bernoulli random variable, taking value 1 if Z < x and 0 otherwise.
E[g(Z)] = E[I(Z < x)] = 1·P(Z < x) + 0·P(Z ≥ x) = P(Z < x) = Φ(x).
Φ(x) is the success probability, P[g(Z) = 1] = Φ(x). Therefore, according to the Bernoulli distribution, Var[g(Z)] = Φ(x)(1 - Φ(x)).

θ̂ = (1/m) Σ_{i=1}^m g(Z_i) is the average of m independent Bernoulli trials, and also equals the proportion of successes out of m trials, each with success probability Φ(x); the number of successes is Bin(m, Φ(x)) distributed. The variance of this proportion is Φ(x)(1 - Φ(x))/m.

(If this does not ring a bell, recall the variance of a sample proportion: p(1 - p)/m.)

Example 5.5 from book

[Output shown as a screenshot in the slides: the MC variance estimate.]

> pnorm(2)
[1] 0.9772499

Φ(2) ≈ 0.977, which would yield theoretical variance 0.977(1 - 0.977)/10,000 = 2.223e-06.
The MC variance estimate is very close.

Remarks on Example 5.5

1.) Some prefer the second estimate, the plug-in Φ̂(x)(1 - Φ̂(x))/m of Var(θ̂), rather than the MC estimate, for the case of estimating proportions. Either can be used.
2.) The algorithm just shown for estimating general functions of the form I(Z < x) is sometimes referred to as the "hit or miss" algorithm because it generates a lot of random variables Z and records the hits (Z < x).
3.) This algorithm could require many simulations, however, if x is at the lower end of the support space of Z, for example, x = -0.06 in the example below.

Efficiency

Efficiency in general means doing things faster.
In simulation, it means getting a smaller variance of your estimate for the same number of simulations.

If θ̂_1 and θ̂_2 are two estimators for θ, then θ̂_1 is more efficient than θ̂_2 if Var(θ̂_1)/Var(θ̂_2) < 1.

Efficiency is called a second-order property. As the cartoon suggests, you want to first worry whether your estimator is correct (unbiased) before you concern yourself with efficiency.

Notes on efficiency

Variances are unknown, so their MC estimates are used for efficiency calculations.
Variances of averages are of order 1/m (they decrease as the number of simulations m increases), so one way to decrease the variance is to increase the number of simulations.
Sometimes the percent reduction from using θ̂_2 instead of θ̂_1 is reported:

100 · (Var(θ̂_1) - Var(θ̂_2)) / Var(θ̂_1).

Power calculations

[Cartoon: Jim Carrey, Bruce Almighty]

Statistical power calculations refer to determining how many samples to collect or simulations to perform to get a desired level of accuracy.

We saw earlier that Var(θ̂) = σ²/m, where σ² is the true variance of the object we are taking the average of [g(X)].

Suppose we are planning to run a simulation study that is costly, and want to determine the number of simulations m needed to achieve a standard error below ε. We have an "a priori" estimate of σ² from prior experiments.

We solve σ/√m < ε for m to obtain that we need m > σ²/ε².
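For example, a quick check in R (the values of σ and ε here are made up for illustration):

sigma <- 2     # a priori estimate of the sd of g(X) (assumed for illustration)
eps <- 0.01    # desired upper bound on the standard error (assumed for illustration)
m.min <- sigma^2 / eps^2   # need m > sigma^2 / eps^2
m.min                      # here 40000, so plan on more than 40000 simulations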

Tricks for reducing MC variance

There are some tricks for reducing the variance of


MC integration, which ultimately reduce the number
of random variable generations.
Two include the use of antithetic variables and
control variates in Sections 5.4 and 5.5.
These are beyond the scope of the course.


Importance sampling

MOTIVATION:
To calculate ∫_a^b g(t) dt using MC integration we have used f_U = U(a,b) as a generating density, noting that

∫_a^b g(t) dt = (b - a) ∫_a^b g(t) (1/(b - a)) dt = (b - a) ∫_a^b g(u) f_U(u) du = (b - a) E_U[g(U)].

The sampling algorithm was:
Generate X_1, X_2, ..., X_m iid ~ U(a,b)

θ̂ = ((b - a)/m) Σ_{i=1}^m g(X_i)

Importance sampling

Generate X_1, X_2, ..., X_m iid ~ U(a,b)

θ̂ = ((b - a)/m) Σ_{i=1}^m g(X_i)

This will not work well if g is not matched well by the U(a,b) density.

[Figure: plot of g(x).]

Importance sampling

∫_a^b g(t) dt = (b - a) ∫_a^b g(t) (1/(b - a)) dt = (b - a) ∫_a^b g(u) f_U(u) du = (b - a) E_U[g(U)]

The idea is to replace the generating density f_U here by something that is easy to sample from and more closely represents the function to be integrated.

Importance sampling

GOAL: Calculate ∫ g(x) dx.

LOGIC:
Find a density f(x) such that f(x) > 0 on the set {x : g(x) ≠ 0} that you can generate from; f(x) is called the importance function.

Let Y = g(X)/f(X) be a transformed random variable of X, where X ~ f(x).

Then E[Y] = E[ g(X)/f(X) ] = ∫ (g(x)/f(x)) f(x) dx = ∫ g(x) dx gives the required integral.

ALGORITHM:
Generate X_1, ..., X_m iid ~ f(x).
Set Ê[Y] = (1/m) Σ_{i=1}^m g(X_i)/f(X_i).
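A minimal sketch of the algorithm in R (the integrand and the Exp(1) importance function below are chosen for illustration, not taken from the book):

g <- function(x) exp(-x) / (1 + x^2) * (x > 0 & x < 1)  # illustrative integrand, zero outside (0,1)
f <- function(x) exp(-x)                                 # importance function: Exp(1) density (assumed)
m <- 10000
x <- rexp(m, rate = 1)             # X_1, ..., X_m ~ f
ratio <- g(x) / f(x)               # g(X_i)/f(X_i)
theta.hat <- mean(ratio)           # importance sampling estimate of the integral
se <- sqrt(var(ratio) / m)         # its standard error
print(c(estimate = theta.hat, se = se))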

Picking the right f

Recall from earlier that Var( (1/m) Σ_{i=1}^m Y_i ) = Var(Y)/m = Var( g(X)/f(X) ) / m.

We want to choose f(X) such that g(X)/f(X) has little variability.

The best way to do this is to choose f to mimic the shape of g as closely as possible so that

g(X)/f(X) ≈ c, a constant, since the variance of a constant is 0.

Example from book

[Figure: the candidate importance functions — Uniform, Exp(1), Cauchy (= t_1), rescaled Exp(1), and rescaled Cauchy. Note that some have a bigger support than [0,1].]

Example continued

Plot g(x) and each of the f's. See which f matches the shape of g most closely.

[Figure: g plotted together with f0, f1, f2, f3, and f4.]

Example continued

Plot g(x)/f(x) for each of the f's. See which is most constant. f3 looks the best.
Rescaling the Cauchy (f2 → f4) really helped!

[Figure: plots of g/f2, g/f3, and g/f4.]

Example continued

[Figure: the Uniform, Exp(1), and Cauchy candidates.]

Note that values generated outside [0,1] will have g(x) = 0, so it does not matter that these densities put mass there.

Example continued

[Figure: the re-scaled Exp(1) and re-scaled Cauchy candidates.]

Example continued

f3 has the smallest standard error, followed by f4.
The Cauchy (f2) is the worst. This is because its support is so much larger than [0,1] that most of the generated g/f values are 0. In fact, 75% were 0.
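A sketch of how such a comparison could be run in R, here for f0 (Uniform) versus f3 (rescaled Exp(1)); the exact forms of g and f3 are written as I read them from the example and should be treated as assumptions:

g <- function(x) exp(-x) / (1 + x^2) * (x > 0 & x < 1)   # integrand (assumed form)
m <- 10000

# f0: Uniform(0,1) importance function
x0 <- runif(m)
r0 <- g(x0)                                # g(x)/f0(x) with f0(x) = 1 on (0,1)

# f3: rescaled Exp(1) on (0,1), f3(x) = exp(-x)/(1 - exp(-1)) (assumed form)
u <- runif(m)
x3 <- -log(1 - u * (1 - exp(-1)))          # inverse-cdf sample from f3
r3 <- g(x3) / (exp(-x3) / (1 - exp(-1)))   # g(x)/f3(x)

rbind(estimate = c(f0 = mean(r0), f3 = mean(r3)),
      se       = c(sd(r0), sd(r3)) / sqrt(m))   # f3 should show the smaller s.e.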

Summary

Importance sampling to calculate expectations

GOAL: For X ~ f(x), we want to calculate E(g(X)) = ∫ g(x) f(x) dx.
Although f(x) is already a probability density, it is not easy to sample from. This regularly happens in Bayesian inference.
Can still apply importance sampling. Here one needs to find another density to sample from, φ(x), sometimes referred to as the envelope function, that now closely resembles f(x) g(x).

ALGORITHM:
Generate X_1, ..., X_m iid ~ φ(x).
Set Ê[g(X)] = (1/m) Σ_{i=1}^m g(X_i) f(X_i) / φ(X_i).

All estimates approach the true value of the integral as m approaches infinity by the SLLN.
Despite its simplicity, importance sampling is rarely ...
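A minimal sketch of this version in R (the target density f, the function g, and the envelope φ below are all invented for illustration; f is chosen so the exact answer is known):

# Target: X ~ f, here a Beta(2,5) density (assumed; it stands in for a density that is
# hard to sample from directly). We want E[g(X)] with g(x) = x^2, using a U(0,1) envelope.
f   <- function(x) dbeta(x, 2, 5)   # target density f(x) (assumed)
g   <- function(x) x^2              # function whose expectation we want (assumed)
phi <- function(x) dunif(x)         # envelope density phi(x) (assumed)

m <- 10000
x <- runif(m)                       # X_1, ..., X_m ~ phi
est <- mean(g(x) * f(x) / phi(x))   # (1/m) sum g(X_i) f(X_i) / phi(X_i)
print(c(importance = est, exact = 3/28))   # E[X^2] = 2*3/(7*8) = 3/28 for Beta(2,5)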

End of Chapter 5

Ch 6: MC Methods in Inference
Very important applications of what is
learned in Chapter 5.
Not covered in this course except as
potential homework problems.

