Abstract
Monte Carlo method. New theoretical results are illustrated in numerical examples.
1. Introduction
The method is particularly advantageous for complex problems with low regularity, typically resulting in high memory and CPU time demands. The idea is to use samples computed on a hierarchy of discretizations as control variates for more accurate sample approximations and thereby reduce the variance of the estimator. The method has been introduced by M. Giles [1] for Itô stochastic differential equations arising in mathematical finance after similar ideas had been published in the earlier work by S. Heinrich [2] on numerical quadrature. Since then MLMC has been extended to elliptic PDEs [3, 4], parabolic problems [5], conservation laws [6], variational inequalities [7, 8], multiscale PDEs [9], Kalman filtering [10] and other fields. The recent work [11] contains a recipe for an efficient evaluation of central statistical moments of arbitrary order. The aim of the present article is to extend this methodology to the numerical approximation of probability density functions.
The reconstruction of an unknown PDF is a classical problem in estimation theory. One way to solve it is to recover the PDF from a truncated sequence of statistical moments (see the recent work by Giles et al. [12] for an alternative approach). This task (also known as solving the truncated moment problem) is by no means trivial and has been extensively studied in measure and probability theory [13, 14, 15, 16]. It is well known that, depending on the prescribed moments, the truncated moment problem may have no solution or multiple (infinitely many) solutions. The latter is typically the case when the truncated moment sequence is admissible, which rules out the case of negative even-order monomial moments and similar incompatibilities. Due to sampling and discretization errors, the sequence of estimated moments may become inadmissible even when the underlying exact moments are admissible. Moreover, one needs a criterion to select a PDF which is the most appropriate among infinitely many solutions to the truncated moment problem. The strategy of selecting the least biased estimate brings us to the concept of the Maximum Entropy (ME) method [17], in which the entropy is maximized subject to the constraint at the prescribed moment values. Obviously, the error of this approach is determined by the accuracy of the estimated moments. Under appropriate assumptions the original constraint can be imposed at the estimated moments instead. The references [18, 19] contain a rigorous error analysis of this class of ME methods; [18] also combines it with the Monte Carlo approach.

The purpose of this work is to combine the Maximum Entropy approach with the Multilevel Monte Carlo method and to balance the three error contributions: i) the truncation error related to the finite number of statistical moments, ii) statistical error and iii) discretization error. We derive complexity estimates for the proposed approach, test its performance on a set of synthetic problems with known PDFs, and demonstrate its applicability on a random obstacle problem. After recalling the necessary preliminaries in Section 2, we give a complete a priori error analysis for the proposed method in Section 3. Section 4 is devoted to three approaches for the estimation of the statistical moments: the Monte Carlo method based on exact sampling, the Monte Carlo method based on approximate sampling, and the Multilevel Monte Carlo approach. The error estimates naturally depend on the number of statistical moments, the sample size and the level of accuracy for approximate sampling. Section 5 reports the results of the numerical experiments.
In the following, ln(·) stands for the natural logarithm. We use the convention that for two scalar quantities f and g the notation f ≲ g means that there exists a nonnegative constant C, independent of the approximation parameters, such that f ≤ C g.
2. Preliminaries
In this section we recall some preliminary information needed for the subsequent analysis; see e.g. [20] and the references therein for the general framework of the multilevel Monte Carlo method (we utilise the notations from [8, 11]), and [18, 19, 21] for the description of the Maximum Entropy method.
Let X be a real-valued random variable which is not available for direct sampling. Instead, there exists an approximation X_ℓ which admits sampling at finite computational cost. The standard Monte Carlo estimator E_M[X_ℓ] based on M independent samples satisfies the error representation

  ‖E_M[X_ℓ] − E[X]‖²_{L²} = |E[X − X_ℓ]|² + (1/M) Var[X_ℓ],    (1)
where Var[X_ℓ] is the variance of X_ℓ. The idea of the two-level Monte Carlo method is to use samples of a coarser approximation X_{ℓ−1} as control variates and thereby reduce the variance of the estimator; for the two-level estimator an error representation analogous to (2) below holds. Typically, X is a functional of the solution of an ODE or a PDE which is not available in closed form, but can be computed approximately by a numerical method. In this setting the parameter ℓ plays the role of a discretization parameter. It is plausible that the samples of the fine approximation X_ℓ are more accurate, but also more expensive to generate, than samples of the coarse approximation X_{ℓ−1}. The Multilevel Monte Carlo estimator is defined by

  E^{ML}[X] := Σ_{ℓ=1}^{L} E_{M_ℓ}[X_ℓ − X_{ℓ−1}],  X₀ := 0.
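For illustration, the following Python sketch implements this estimator. It is our own illustration, not code from the paper: the coupled level sampler sample_level is a hypothetical stand-in for the (e.g. PDE-based) approximations X_ℓ, and the toy coupling below is purely illustrative.

```python
import numpy as np

def mlmc_mean(sample_level, M, rng):
    """Multilevel Monte Carlo estimate of E[X] as in the display above.

    sample_level(l, n, rng) returns n coupled samples (X_l, X_{l-1}),
    with the convention X_0 := 0. M = [M_1, ..., M_L] are the sample sizes.
    """
    estimate = 0.0
    for l, M_l in enumerate(M, start=1):
        fine, coarse = sample_level(l, M_l, rng)  # same underlying randomness per pair
        estimate += np.mean(fine - coarse)        # E_{M_l}[X_l - X_{l-1}]
    return estimate

# Toy coupling: X_l rounds exp(Z) to l decimal digits (illustrative only).
def sample_level(l, n, rng):
    z = rng.standard_normal(n)
    fine = np.round(np.exp(z), decimals=l)
    coarse = np.round(np.exp(z), decimals=l - 1) if l > 1 else np.zeros(n)
    return fine, coarse

rng = np.random.default_rng(0)
print(mlmc_mean(sample_level, M=[4000, 1000, 250], rng=rng))  # approximates E[exp(Z)] = e^{1/2}
```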
When E_{M_ℓ}[X_ℓ − X_{ℓ−1}] are independent for ℓ = 1, . . . , L, there holds the error representation

  ‖E^{ML}[X] − E[X]‖²_{L²} = |E[X − X_L]|² + Σ_{ℓ=1}^{L} (1/M_ℓ) Var[X_ℓ − X_{ℓ−1}].    (2)
Since the variances Var[X_ℓ − X_{ℓ−1}] typically decay for increasing ℓ, only few samples are needed at the fine grids (i.e. when 1 ≪ ℓ ≤ L) at the cost of computing more inexpensive samples at the coarse grids. Recall that the Shannon entropy of a probability density function ρ is defined as the functional

  E[− ln(ρ(X))] = − ∫_I ρ(x) ln ρ(x) dx.
The integral can be taken over the support of the density ρ. The subsequent analysis is based on the statistical moments

  µ_k = ∫_I φ_k(x) ρ(x) dx,  0 ≤ k ≤ R,    (3)

where {φ_k}_{k=0}^{R} are linearly independent functions. In this paper we concentrate on the case when {φ₀, . . . , φ_R} are algebraic polynomials forming a basis in the space P_R of polynomials of degree at most R. When the truncated sequence of moments is admissible (i.e. when it corresponds to some PDF), the system of equations (3) may have multiple solutions. The Maximum Entropy principle suggests to choose the one with the greatest Shannon entropy. Under the assumption
that there exists a strictly positive PDF fulfilling (3), the unique maximizer is given by

  ρ_R(x) := exp( Σ_{k=0}^{R} λ_k φ_k(x) ),  λ_k ∈ ℝ,
  µ_k = ∫_I φ_k(x) ρ_R(x) dx,  0 ≤ k ≤ R.    (4)
For a rigorous derivation of this result see e.g. [18, Lemma 3] and [21]. Assuming that the statistical moments (3) are not known exactly and a finite sequence of perturbed moments µ̃_k (which may contain sampling and discretization errors, etc.) is available instead, we define the perturbed Maximum Entropy solution ρ̃_R as the solution of the analogous problem

  ρ̃_R(x) := exp( Σ_{k=0}^{R} λ̃_k φ_k(x) ),  µ̃_k = ∫_I φ_k(x) ρ̃_R(x) dx,  0 ≤ k ≤ R.    (5)

If ρ is sufficiently regular (we refer to the subsequent error analysis for the precise assumptions), the truncation error ρ − ρ_R vanishes when R → ∞ and the estimation error ρ_R − ρ̃_R becomes small when µ̃_k → µ_k, implying that the total error ρ − ρ̃_R converges to zero in a suitable sense. It turns out that the Kullback-Leibler distance (or the relative entropy, KL-divergence)

  D_{KL}(ρ‖η) = ∫_I ρ(x) ln( ρ(x)/η(x) ) dx

is a natural measure of the distance between two probability densities [22, 23]. In particular, the exact splitting

  D_{KL}(ρ‖ρ̃_R) = D_{KL}(ρ‖ρ_R) + D_{KL}(ρ_R‖ρ̃_R)    (6)

into the truncation and estimation error has been shown in [18, Lemma 3], so that ρ_R is sometimes called the information projection.
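Since all errors below are measured in the KL-divergence, a small numerical routine for evaluating it is convenient. The following sketch is our own illustration (not part of the original algorithm) and uses Gauss-Legendre quadrature on I = [−1, 1].

```python
import numpy as np

def kl_divergence(rho, eta, n=200):
    """Approximate D_KL(rho||eta) = int_I rho ln(rho/eta) dx on I = [-1, 1]
    by Gauss-Legendre quadrature, with the convention 0 ln 0 = 0."""
    x, w = np.polynomial.legendre.leggauss(n)
    r, e = rho(x), eta(x)
    safe = np.where(r > 0, r, 1.0)  # avoid log(0); masked out below
    return float(np.sum(w * np.where(r > 0, r * np.log(safe / e), 0.0)))

# Sanity check: the divergence of a density from itself vanishes.
half = lambda x: np.full_like(x, 0.5)
print(kl_divergence(half, half))  # ~ 0.0
```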
It is well known that D_{KL}(ρ‖η) is non-negative and D_{KL}(ρ‖η) = 0 if and only if ρ = η. Moreover, D_{KL} is an upper bound for all L^p norms of the difference ρ − η. The following result can be found in [19, Lemma 2.2, Proposition 2.3]:

  ‖ρ − η‖^p_{L^p} ≤ p(p − 1) ( max(‖ρ‖_{L^∞}, ‖η‖_{L^∞}) )^{p−1} D_{KL}(ρ‖η).    (8)
Observe that the KL-divergence is non-symmetric and D_{KL}(ρ‖η) < ∞ implies that supp(ρ) ⊆ supp(η). Another two-sided bound in terms of the weighted L²(ρ) norms of the log-densities is required for the subsequent analysis; we refer to [19] for a proof.

Lemma 2. For two probability density functions ρ and η with ln(ρ/η) ∈ L^∞(I), where I = supp(ρ), it holds that

  D_{KL}(ρ‖η) ≥ (1/2) e^{−‖ln(ρ/η)‖_{L^∞(I)}} ∫_I ρ(x) ( ln(ρ(x)/η(x)) )² dx

and

  D_{KL}(ρ‖η) ≤ (1/2) e^{‖ln(ρ/η)−c‖_{L^∞(I)}} ∫_I ρ(x) ( ln(ρ(x)/η(x)) − c )² dx

for any c ∈ ℝ.
Thanks to the error splitting (6) it is sufficient to estimate the truncation error and the estimation error with respect to the KL-divergence. The size of the truncation error D_{KL}(ρ‖ρ_R) is determined by the smoothness of ln(ρ); in Theorem 1 we address two typical cases: when a) ln(ρ) has a finite Sobolev regularity and b) ln(ρ) is analytic on I. The estimation error D_{KL}(ρ_R‖ρ̃_R) is controlled by a number of estimation parameters. Depending on the method of estimation (we consider the Monte Carlo method with exact samples, approximate single- and Multilevel Monte Carlo methods) the generic term estimation error takes a specific form. In each case we determine the optimal relation between parameter values and obtain upper bounds for the required computational cost. The estimates for the error-versus-cost relation are collected in Theorems 3, 4 and 5 below.
Now we are ready to prove a priori convergence estimates for the truncation error.

Proof. Let ψ ∈ P_R and define η_R := e^ψ / ∫_I e^ψ dx, so that η_R is a probability density. By (6), the non-negativity of the KL-divergence and Lemma 2 with c = ln( ∫_I e^ψ dx ) we obtain

  D_{KL}(ρ‖ρ_R) ≤ D_{KL}(ρ‖η_R) ≤ (1/2) e^{‖ln(ρ)−ψ‖_{L^∞(I)}} ∫_I ( ln(ρ(x)) − ψ(x) )² ρ(x) dx.

If ln(ρ) ∈ H^s(I), the Sobolev embedding yields

  ‖ln(ρ) − ψ‖_{L^∞(I)} ≤ √2 ‖ln(ρ) − ψ‖_{H¹(I)},

and we choose ψ = ψ_R ∈ P_R as the projection determined by

  ∫_I (ln(ρ) − ψ)′ v′ dx = 0  ∀v ∈ P_R,   (ln(ρ) − ψ)|_{∂I} = 0,

see [24, Theorem 3.14]. Then ‖ln(ρ) − ψ_R‖_{H¹(I)} → 0 for R → ∞ and

  ‖ln(ρ) − ψ_R‖²_{L²(I)} ≤ C R^{−2s} ‖ln(ρ)‖²_{H^s(I)}.

If ln(ρ) is analytic, the result follows for the same projection ψ_R from the corresponding approximation results in [24].
We call the quantity D_{KL}(ρ_R‖ρ̃_R) the estimation error part in the splitting (6). The next lemma quantifies the stability of the Maximum Entropy solution under a perturbation of the first R moments. Evidently, this stability bound depends on the choice of the basis functions; from now on we work with the orthonormal basis on [−1, 1], i.e. we choose φ_k(x) := P_k(x), where P_k are the normalized Legendre polynomials.

Lemma 3. Suppose that the perturbed moments µ̃_k satisfy condition (13) with the constants C_R and A_R defined in (12). Then the Maximum Entropy solution ρ̃_R of (5) constrained at the perturbed moments exists, is unique and is close to ρ_R in the sense that

  D_{KL}(ρ_R‖ρ̃_R) ≤ C_R Σ_{k=1}^{R} (µ_k − µ̃_k)².    (14)
Proof. See [18, Lemma 5].

Notice that C_R is uniformly bounded when log(ρ) ∈ H^s(I) for any s > 1. This can be shown using the arguments from [18, Theorem 3]. The next observation is that when the sequence of perturbed moments µ̃_k is obtained by sampling, the µ̃_k's are not deterministic but are, in fact, random variables. As a consequence, it may happen that (13) is not fulfilled with a certain probability, so that the existence of ρ̃_R is not guaranteed in this case. The next theorem gives a bound on the probability that ρ̃_R may not exist. The proof follows [18] and is included for completeness.
Theorem 2. Suppose that the assumptions of Lemma 3 are satisfied and, additionally, let µ̃_k be probabilistic estimators for µ_k satisfying (13) in mean, i.e.

  E[ Σ_{k=1}^{R} (µ_k − µ̃_k)² ] ≤ Φ(R, ρ) ≤ 1/(2 A_R C_R)²

for some function Φ. Then ρ̃_R exists with probability at least 1 − p*, where

  p* = (2 A_R C_R)² E[ Σ_{k=1}^{R} (µ_k − µ̃_k)² ].    (15)
Proof. The proof is a combination of Lemma 3 and the Markov inequality. For a non-negative random variable Z and a constant c > 0 the Markov inequality reads

  P(Z > c) = ∫_{Z>c} 1 dP ≤ c^{−1} ∫_{Z>c} Z dP ≤ c^{−1} E[Z].

Applying it with Z = Σ_{k=1}^{R} (µ_k − µ̃_k)² and c = (2 A_R C_R)^{−2} yields

  P( Σ_{k=1}^{R} (µ_k − µ̃_k)² > (2 A_R C_R)^{−2} ) ≤ (2 A_R C_R)² E[ Σ_{k=1}^{R} (µ_k − µ̃_k)² ] = p*.
Thus, (13) is satisfied with probability at least 1 − p* and ρ̃_R exists with the same probability. Moreover, for any p ∈ (0, 1) the Markov inequality implies that with probability at least 1 − p

  Σ_{k=1}^{R} (µ_k − µ̃_k)² ≤ p^{−1} Φ(R, ρ),

and together with (14) this implies that D_{KL}(ρ_R‖ρ̃_R) ≤ C_R p^{−1} Φ(R, ρ) with the same probability, 1 − p.
In the remainder of this section we consider three estimators for the statistical moments:

• A Monte Carlo estimator for the case when exact samples of X can be generated (exact sampling).

• A Monte Carlo estimator for the case when only approximate samples of X_L are available (approximate single level sampling).

• A Multilevel Monte Carlo estimator for the case when approximate samples of X_ℓ, ℓ = 1, . . . , L, are available (approximate multilevel sampling).
Recall that P_k denotes the normalized Legendre polynomial of degree k. Then it is easy to see that the following bound holds.

Lemma 4. Let Z be a random variable with values in [−1, 1] admitting a probability density ρ_Z ∈ L^∞(−1, 1). Then Var[P_k(Z)] ≤ ‖ρ_Z‖_{L^∞(−1,1)}.

Proof. We have for any k ∈ ℕ

  Var[P_k(Z)] = E[P_k²(Z)] − E[P_k(Z)]² ≤ ∫_{−1}^{1} P_k²(z) ρ_Z(z) dz ≤ ‖ρ_Z‖_{L^∞(−1,1)},

where the last step uses the normalization ∫_{−1}^{1} P_k²(z) dz = 1.
In this work we focus on the case when the cost of generating all necessary samples dominates the cost of solving the system of nonlinear equations (5). This situation typically occurs when the generation of one sample requires a numerical solution of a differential equation, e.g. by the Finite Element Method or some other numerical method. Therefore the term computational cost refers in the following to the cost of generating all necessary samples, neglecting the cost of solving the system of nonlinear equations (5). The standing assumption in the remainder of this section is that log(ρ) ∈ H^s(I), s > 1. In this case the constants in (9), (14) and (15) are bounded uniformly in R.
Next, we address the accuracy and the cost of the aforementioned Maximum Entropy estimators.

4.1. Monte Carlo estimators based on exact sampling

Suppose first that exact samples of X are available and define the Monte Carlo moment estimator

  µ̃_k^{MC} := E_M[P_k(X)].    (17)
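A minimal sketch of the estimator (17), under the assumption that X takes values in [−1, 1] and that P_k denotes the L²(−1, 1)-orthonormal Legendre polynomial:

```python
import numpy as np
from numpy.polynomial import legendre

def mc_moments(samples, R):
    """Monte Carlo estimates of mu_k = E[P_k(X)] for k = 0, ..., R."""
    mu = np.empty(R + 1)
    for k in range(R + 1):
        c = np.zeros(k + 1)
        c[k] = np.sqrt((2 * k + 1) / 2.0)             # L2-normalization of the Legendre polynomial
        mu[k] = np.mean(legendre.legval(samples, c))  # E_M[P_k(X)]
    return mu

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=10_000)  # exact samples of a uniform X
print(mc_moments(x, R=4))                # mu_0 = 1/sqrt(2); higher moments vanish
```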
Here and below we assume that the computational cost of generating one sample is independent of the number of involved moments.
Lemma 5. Suppose that Assumptions 1 and 2 are satisfied and µ_k and µ̃_k^{MC} are defined by (11) and (17), respectively. Then

  E[ Σ_{k=1}^{R} (µ_k − µ̃_k^{MC})² ] ≤ (R/M) ‖ρ‖_{L^∞}.

Proof. Recall that E[E_M[P_k(X)]] = µ_k. Then the assertion follows from

  E[ Σ_{k=1}^{R} (µ_k − µ̃_k^{MC})² ] = Σ_{k=1}^{R} Var[E_M[P_k(X)]] = Σ_{k=1}^{R} (1/M) Var[P_k(X)] ≤ (R/M) ‖ρ‖_{L^∞}

by Lemma 4.
Theorem 3. Suppose that Assumptions 1 and 2 are satisfied and ln(ρ) ∈ H^s(I) for some s > 1. Let R ∈ ℕ and ρ̃_R^{MC} be the Maximum Entropy solution of problem (5) for a sequence of approximate moments {µ̃₁^{MC}, . . . , µ̃_R^{MC}} defined in (17). Then for any ε > 0 and p ∈ (0, 1) it is possible to select the number of samples M and the number of moments R so that

  D_{KL}(ρ‖ρ̃_R^{MC}) < ε    (18)

is satisfied with probability at least 1 − p and the cost of evaluation of all samples required for the recovery of ρ̃_R^{MC} satisfies the following asymptotic relations: when ln(ρ) ∈ H^s(I) with s > 1,

  C(ρ̃_R^{MC}) ∼ p^{−1} ε^{−1−1/(2s)},    (19)

and when ln(ρ) is analytic on I,

  C(ρ̃_R^{MC}) ∼ p^{−1} ε^{−1} |ln(ε)|.    (20)
Proof. When ln(ρ) ∈ H^s(I) for some s > 1 we have by Theorems 1, 2 and Lemma 5 that

  D_{KL}(ρ‖ρ̃_R^{MC}) ≲ R^{−2s} + (1/p)(R/M)

with probability at least 1 − p. The cost of evaluation of all required samples is C(ρ̃_R^{MC}) = M · C(X), where C(X) denotes the cost of one sample. The error bound satisfies (18) when

  R^{−2s} ∼ R/(pM) ∼ ε,

or analogously M ∼ p^{−1} R^{2s+1} and R ∼ ε^{−1/(2s)}. In this case the cost of evaluation satisfies

  C(ρ̃_R^{MC}) = M · C(X) ∼ p^{−1} ε^{−(2s+1)/(2s)}

and (19) follows. When ln(ρ) is analytic, Theorems 1, 2 and Lemma 5 imply that the error bound is optimal (i.e. the bound on D_{KL}(ρ‖ρ̃_R) is minimized for a fixed computational cost) when M ∼ p^{−1} R a^{2R}, and (18) holds for R = −(1/2) log_a(ε). In this case C(ρ̃_R^{MC}) = M · C(X) ∼ p^{−1} ε^{−1} |ln(ε)| and (20) follows.
Remark 1. Notice that both summands in (21) are asymptotically of the same order when M ∼ p^{−1} R² a^{2R} (rather than M ∼ p^{−1} R a^{2R}). However, this leads to the same asymptotic estimate for the computational cost. Indeed, in this case (18) holds when

  C(ρ̃_R^{MC}) = M · C(X) ∼ p^{−1} ε^{−1} R ∼ p^{−1} ε^{−1} W(ε^{−1}) ∼ p^{−1} ε^{−1} |ln(ε)|,    (22)

where W denotes the Lambert W function, see [27], and the new estimate (22) is equivalent to (20). An analogous situation appears in the proofs of Theorems 4 and 5. In order to keep the presentation simple we do not further elaborate on this.
4.2. Approximate single level Monte Carlo estimators

Now, let us assume that it is not possible to sample from X, but the approximations X_ℓ are available for sampling and satisfy Assumption 3 or the simpler relation (23). Recall the bounds on the derivatives of the normalized Legendre polynomials

  ‖P_k′‖_{L^∞(−1,1)} ≲ k^{5/2}  and  ‖P_k′‖_{L^∞(a,b)} ≲ k,  −1 < a < b < 1.

Then relation 1) follows with δ = 5 in the general case and with δ = 2 on compact subintervals of (−1, 1). In the numerical experiments we observed that this estimate for δ may be too pessimistic (see Section 5), which motivates treating δ as a generic parameter in the above assumption.
Now we are in the position to prove the counterpart of Lemma 5 for the case of approximate estimation of moments, where the exact moments from (11) are replaced by

  µ̃_k^{SL} := E_M[P_k(X_L)].    (24)

Lemma 6. Suppose Assumptions 1 and 3 are satisfied and let µ_k and µ̃_k^{SL} be defined by (11) and (24), respectively. Then it holds that

  E[ Σ_{k=1}^{R} (µ_k − µ̃_k^{SL})² ] ≲ R^{δ+1} N_L^{−β} + R/M.

Proof. The proof is analogous to that of Lemma 5, where Jensen's inequality, Assumption 3 and Lemma 4 are used in the last estimate.
Theorem 4. Suppose Assumptions 1 and 3 are satisfied and let ln(ρ) ∈ H^s(I) for some s > 1. Let {µ̃₁^{SL}, . . . , µ̃_R^{SL}} be a sequence of perturbed moments defined in (24) for some fixed values of the parameters R, M, L, and suppose ρ̃_R^{SL} is the corresponding perturbed Maximum Entropy solution. Then for any ε > 0 and p ∈ (0, 1) the parameters R, M, L can be selected such that the bound

  D_{KL}(ρ‖ρ̃_R^{SL}) < ε    (25)

is satisfied with probability at least 1 − p and the cost of evaluation of all samples required for the recovery of ρ̃_R^{SL} satisfies the following asymptotic relations: when ln(ρ) ∈ H^s(I) with s > 1,

  C(ρ̃_R^{SL}) ∼ p^{−(β+γ)/β} ε^{−(β+γ)/β − (1/(2s))(β+γ+γδ)/β},    (26)

and when ln(ρ) is analytic on I,

  C(ρ̃_R^{SL}) ∼ p^{−(β+γ)/β} ε^{−(β+γ)/β} |ln(ε)|^{(β+γ+γδ)/β}.    (27)
Proof. Recalling decomposition (6), Theorem 2 and Lemma 6 we get

  D_{KL}(ρ‖ρ̃_R^{SL}) ≲ D_{KL}(ρ‖ρ_R) + (1/p) ( R^{δ+1} N_L^{−β} + R/M )    (28)

with probability at least 1 − p, whereas the computational cost satisfies C(ρ̃_R^{SL}) = M C_L ∼ M N_L^γ. The two summands in the brackets in (28) are balanced when M ∼ R^{−δ} N_L^β, in which case

  D_{KL}(ρ‖ρ̃_R^{SL}) ≲ D_{KL}(ρ‖ρ_R) + (1/p) R^{δ+1} N_L^{−β}.    (29)

Theorem 1 allows to determine the optimal choice of the number of moments R depending on the smoothness of the log-density ln(ρ). In particular, assuming that ln(ρ) ∈ H^s(I) with s > 1 we find with (9) that (29) is minimised when

  R^{−2s} ∼ (1/p) R^{δ+1} N_L^{−β} ∼ ε,

or, equivalently, when R ∼ ε^{−1/(2s)} and N_L ∼ (p^{−1} R^{2s+δ+1})^{1/β}. In this case the computational cost satisfies

  C(ρ̃_R^{SL}) ∼ M N_L^γ ∼ R^{−δ} N_L^{β+γ} ∼ p^{−(β+γ)/β} ε^{−(β+γ)/β − (1/(2s))(β+γ+γδ)/β}.

On the other hand, when ln(ρ) is analytic, we select R = −(1/2) log_a(ε) and N_L ∼ (p^{−1} R^{δ+1} a^{2R})^{1/β}. Then (25) is satisfied and the computational cost is proportional to

  C(ρ̃_R^{SL}) ∼ p^{−(β+γ)/β} ε^{−(β+γ)/β} |ln(ε)|^{(β+γ+γδ)/β}.
4.3. Multilevel Monte Carlo estimators

Finally, we define the Multilevel Monte Carlo moment estimator

  µ̃_k^{ML} := E^{ML}[P_k(X)] = Σ_{ℓ=1}^{L} E_{M_ℓ}[P_k(X_ℓ) − P_k(X_{ℓ−1})],  X₀ := 0,    (30)

where the corrections E_{M_ℓ}[P_k(X_ℓ) − P_k(X_{ℓ−1})], ℓ = 1, . . . , L, are based on independent samples.
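The estimator (30) can be sketched as follows, reusing the hypothetical coupled sampler sample_level from the sketch in Section 2; the level-1 term is E_{M_1}[P_k(X_1)] in accordance with the convention X₀ := 0. The coupling of X_ℓ and X_{ℓ−1} through the same underlying randomness is essential for the variance reduction.

```python
import numpy as np
from numpy.polynomial import legendre

def p_orthonormal(x, k):
    """L2(-1,1)-orthonormal Legendre polynomial of degree k."""
    c = np.zeros(k + 1)
    c[k] = np.sqrt((2 * k + 1) / 2.0)
    return legendre.legval(x, c)

def mlmc_moments(sample_level, M, R, rng):
    """Multilevel estimates of mu_k = E[P_k(X)], k = 0..R, following (30)."""
    mu = np.zeros(R + 1)
    for l, M_l in enumerate(M, start=1):
        fine, coarse = sample_level(l, M_l, rng)  # coupled pair (X_l, X_{l-1})
        for k in range(R + 1):
            diff = p_orthonormal(fine, k)
            if l > 1:  # the l = 1 term reduces to E_{M_1}[P_k(X_1)]
                diff = diff - p_orthonormal(coarse, k)
            mu[k] += np.mean(diff)  # independent level corrections
    return mu
```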
Lemma 7. Suppose Assumptions 1 and 3 are satisfied and let µ_k and µ̃_k^{ML} be defined by (11) and (30), respectively. Then it holds that

  E[ Σ_{k=1}^{R} (µ_k − µ̃_k^{ML})² ] ≲ R^{δ+1} ( N_L^{−β} + Σ_{ℓ=1}^{L} N_ℓ^{−β} M_ℓ^{−1} ).

Proof. The lemma follows by the same arguments as in Lemma 6 and analogously to the standard Multilevel Monte Carlo estimate (2). Recall that E[µ̃_k^{ML}] = E[P_k(X_L)], and thus by Jensen's inequality and Assumption 3

  E[(µ_k − µ̃_k^{ML})²] = ( E[P_k(X) − P_k(X_L)] )² + Σ_{ℓ=2}^{L} (1/M_ℓ) Var[P_k(X_ℓ) − P_k(X_{ℓ−1})] + (1/M₁) Var[P_k(X₁)]
    ≲ k^δ ( N_L^{−β} + Σ_{ℓ=1}^{L} M_ℓ^{−1} N_ℓ^{−β} ) + (1/M₁) ‖ρ‖_{L^∞(−1,1)}.

The last summand can be absorbed by the sum in parentheses; summation over k = 1, . . . , R yields the assertion.
Theorem 5. Suppose Assumptions 1 and 3 are satisfied and let ρ̃_R^{ML} be the perturbed Maximum Entropy solution corresponding to the moments (30). Then for any ε > 0 and p ∈ (0, 1) the parameters R, L and M₁, . . . , M_L can be selected such that the bound

  D_{KL}(ρ‖ρ̃_R^{ML}) < ε    (31)

holds with probability at least 1 − p and the cost of evaluation of all samples required for the recovery of ρ̃_R^{ML} satisfies the following asymptotic relations: when ln(ρ) ∈ H^s(I) with s > 1,

  C(ρ̃_R^{ML}) ∼  p^{−1} ε^{−1−(δ+1)/(2s)},                          if β > γ,
                 p^{−1} ε^{−1−(δ+1)/(2s)} (|ln(p)| + |ln(ε)|)²,      if β = γ,    (32)
                 p^{−γ/β} ε^{−γ/β − (δ+1)γ/(2sβ)},                   if β < γ,

and when ln(ρ) is analytic the computational cost is proportional to

  C(ρ̃_R^{ML}) ∼  p^{−1} ε^{−1} |ln(ε)|^{δ+1},                            if β > γ,
                 p^{−1} ε^{−1} |ln(ε)|^{δ+1} (|ln(p)| + |ln(ε)|)²,        if β = γ,    (33)
                 p^{−γ/β} ε^{−γ/β} |ln(ε)|^{(δ+1)γ/β},                    if β < γ.
Proof. Recalling decomposition (6), Theorem 2 and Lemma 7 we get

  D_{KL}(ρ‖ρ̃_R^{ML}) ≲ D_{KL}(ρ‖ρ_R) + (R^{δ+1}/p) ( N_L^{−β} + Σ_{ℓ=1}^{L} M_ℓ^{−1} N_ℓ^{−β} )    (34)

with probability at least 1 − p, whereas the computational cost satisfies

  C(ρ̃_R^{ML}) = Σ_{ℓ=1}^{L} M_ℓ C_ℓ ∼ Σ_{ℓ=1}^{L} M_ℓ N_ℓ^γ.

Optimizing the sample numbers M_ℓ in the standard Multilevel Monte Carlo fashion, cf. [20], the bound (34) reduces to

  D_{KL}(ρ‖ρ̃_R^{ML}) ≲ D_{KL}(ρ‖ρ_R) + R^{δ+1} p^{−1} N_L^{−β},

which is the same estimate as (29), and thus the error is minimized for the same choice of R and N_L as in the proof of Theorem 4; the case distinction in (32) then follows from the standard Multilevel Monte Carlo cost analysis, cf. [20]. For analytic ln(ρ) we choose N_L ∼ (p^{−1} R^{δ+1} a^{2R})^{1/β} and a^{−2R} ∼ ε. Hence the assertion follows analogously.
5. Numerical Experiments

In this section we describe the algorithm for the solution of the truncated moment problem and report results of numerical experiments.

5.1. Synthetic density functions

The experiments employ four synthetic density functions ρ¹, . . . , ρ⁴ defined in Table 1. Observe that ρ¹ is analytic, but not polynomial, ρ² is piecewise constant with one jump, and ρ³ and ρ⁴ are piecewise linear and continuous with one kink. Notice that log(ρ²) and log(ρ⁴) do not belong to H^s(I) with some s > 1, so that the convergence theory from the previous sections is not directly applicable in these cases. For a hierarchy of partitions T_ℓ of I we define the approximate density ρ_ℓ as the piecewise constant function on T_ℓ determined by

  ∫_K (ρ(x) − ρ_ℓ(x)) dx = 0  ∀K ∈ T_ℓ.
Figure 1: Graphs of the synthetic probability density functions ρ¹, . . . , ρ⁴.

Figure 2: Density function ρ¹ and its approximation ρ¹_ℓ for ℓ = 1.
Observe that ρ_ℓ is the L² projection of ρ. For example, ρ¹(x) and the corresponding ρ¹_ℓ for ℓ = 1 are visualized in Figure 2. Exact samples are generated by the inverse transform method: X^i := F^{−1}(Z^i) is a sample of X, where Z is a sample from the uniform distribution Z ∼ U(0, 1) and F is the cumulative distribution function corresponding to ρ. Recall that the multilevel estimator (30) involves dependent pairs of samples X_ℓ^i and X_{ℓ−1}^i corresponding to two approximations of the sample X^i of different accuracy.
pdf             ρ¹                   ρ²            ρ³                     ρ⁴
x₀              −1                   √(1/2) − 1    √(1/2)                 0.8
ρ(x), x < x₀    –                    √(1/2)        0.5x + 0.55            0
ρ(x), x ≥ x₀    1/(log(3)(x + 2))    1/(4 − √2)    ≈ −1.8314x + 2.1985    50x − 40

Table 1: The definition of the density functions ρ¹, . . . , ρ⁴. In the first row x₀ denotes the location of a possible discontinuity of ρⁿ or of its derivative. The second and third rows contain explicit expressions for ρⁿ on both sides of x₀.
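For instance, ρ⁴ admits a closed-form inverse CDF, so exact samples are cheap to generate. The following sketch is our own illustration based on Table 1 and implements the density and its inverse transform sampler.

```python
import numpy as np

def rho4(x):
    """Density rho^4 from Table 1: 50x - 40 on [0.8, 1], zero elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0.8) & (x <= 1.0), 50.0 * x - 40.0, 0.0)

def sample_rho4(n, rng):
    """Inverse transform sampling: F(x) = 25x^2 - 40x + 16 on [0.8, 1],
    hence F^{-1}(u) = 0.8 + 0.2 sqrt(u)."""
    return 0.8 + 0.2 * np.sqrt(rng.uniform(size=n))

rng = np.random.default_rng(2)
print(sample_rho4(100_000, rng).mean())  # exact mean is 14/15 = 0.9333...
```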
5.2. Solution algorithm

In this section we describe the numerical procedure of computing the Maximum Entropy solution. In a typical setting the moments µ_k are estimated by sampling and therefore are inexact, but we skip the tilde-notation for the perturbed quantities until the end of this section to simplify the notations. For a fixed basis of global algebraic polynomials {φ_k}, the task is to find the coefficient vector λ = (λ₀, . . . , λ_R)ᵀ so that (4) is fulfilled. This leads to a system of nonlinear equations F(λ) = 0 with

  F_j(λ) = ∫_{−1}^{1} φ_j(x) ρ[λ](x) dx − µ_j,   ρ[λ] := ρ_R = exp( Σ_{k=0}^{R} λ_k φ_k ),    (35)

which can be solved by the Newton method. The Newton update step reads λ^{(m+1)} = λ^{(m)} − J[λ^{(m)}]^{−1} F(λ^{(m)}) with the Jacobi matrix

  J[λ^{(m)}]_{jk} = ∫_{−1}^{1} φ_j(x) φ_k(x) ρ[λ^{(m)}](x) dx.
The choice of the basis {φ_k} has, of course, no influence on the solution of the problem in exact arithmetic, but it has a crucial impact on stability and convergence properties of the Newton algorithm. For example, for the monomial basis φ_k(x) = x^k the Jacobi matrix is a Hankel matrix with a condition number growing exponentially with the number of moments R, cf. [28]. On the other hand, the Jacobi matrix in the basis of Legendre polynomials φ_k(x) = P_k(x) is well conditioned as long as the density ρ[λ] is bounded away from zero; the constants in this estimate blow up when the minimum of the probability density tends to zero. One may therefore expect that the Newton method will not perform well when the target probability density function ρ vanishes or is close to zero on the interval [−1, 1].
Figure 3: Convergence of the Newton method for R = 100 statistical moments.

Figure 4: The number of Newton iterations needed until convergence for different numbers R of prescribed statistical moments.
When the Newton iteration does not achieve the prescribed level of accuracy, we select as an output the vector λ^{(m)} with the smallest residual. To validate the solver we insert the exact moments (i.e. the estimation error equals zero) in the nonlinear system (35) and compute the Maximum Entropy solution using the Newton method as described above. Figure 3 shows the convergence of the Newton algorithm for R = 100 moments for the four synthetic density functions, where the deviation between ρ and ρ[λ^{(m)}] is measured by

  Σ_{k=1}^{R} (λ_k − λ_k^{(m)})².

Observe that the Newton algorithm for the first three density functions achieves machine precision in a few (4-5) iterations. The fourth example starts with a slow convergence, but then the error saturates and the method diverges at the 17th iteration.
Figure 4 shows the number of Newton iterations needed to reach the tolerance 10⁻¹⁵ dependent on the number of prescribed moments. The first three probability density functions only need a few iteration steps to converge, almost independently of the number of moments. For ρ⁴ the Newton method does not converge to the tolerance 10⁻¹⁵ when R > 5 and returns only an approximate solution.
5.3. Convergence of the Maximum Entropy method for synthetic density functions

In this section we discuss the convergence of the Maximum Entropy method and compare it with a Kernel Density Estimator from [30]. Here the target distribution ρ is one of the synthetic distributions ρ¹, . . . , ρ⁴ defined in Section 5.1.
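The comparison workflow can be sketched as follows. A Gaussian KDE from SciPy serves here merely as a simple stand-in; the experiments below use the diffusion-based estimator and bandwidth selection of [30].

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)
samples = 0.8 + 0.2 * np.sqrt(rng.uniform(size=10_000))  # exact samples of rho^4 via F^{-1}
kde = gaussian_kde(samples)                              # bandwidth: Scott's rule by default
x = np.linspace(0.8, 1.0, 5)
print(kde(x))                                            # KDE values to compare against rho^4(x)
```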
We begin with the truncation error, i.e. the convergence rate of Theorem 1, which is independent of the type of the moment estimator. Recall that log(ρ²), log(ρ⁴) ∉ H¹(I), violating the assumptions of Theorem 1. Nonetheless, we observe convergence for all four densities in Figure 5, largely as predicted in Theorem 1. The approximations to ρ⁴ converge at a reduced rate of about R⁻¹. This indicates that the minimal smoothness assumption ln(ρ) ∈ H^s(I), s > 1, may be relaxed. Notice that the convergence curve for ρ⁴ is non-monotone. This effect may be due to the inaccurate iterative approximation of the Maximum Entropy solution by the Newton method, cf. Section 5.2.
Next, we validate the stability bound (14) in Lemma 3. For this we introduce an artificial noise in the moments, µ̃_k = (1 + yξ_k)µ_k, where the ξ_k are iid normal random variables and the scaling parameter y takes different values for each data point. Figure 6 indicates that (14) is a sharp bound for ρ¹, ρ² and ρ³, whereas the graph for ρ⁴ shows a very different behaviour. This can be explained by the very large value of the constant C_R for the approximation ρ⁴_R (indeed, in this example we estimate C₂₀ > 10¹⁰⁰), and thus it is not surprising that the stability relation (14) is not realized. On the other hand, this effect (and the related convergence difficulties reported in Section 5.2) appears to be specific to densities which vanish or nearly vanish on parts of the interval.
Figure 5: Truncation error versus the number of exact moments.

Figure 6: Estimation error versus the size of the moment perturbation with R = 20 perturbed moments.
To validate the accuracy of the sample approximations we plot the values E[(X_ℓ − X)²] for varying ℓ. The slope of the convergence curves determines the rate β in Assumption 3. Similarly, we consider E[(P_k(X_ℓ) − P_k(X))²] for the fixed approximation level ℓ = 3 and varying polynomial degree k, as well as for fixed k and varying values of ℓ. To make a meaningful test for the MLMC estimator (30) with synthetic density functions we make a convention that the cost of generating one sample of X_ℓ is proportional to N_ℓ^γ. In what follows we compare the Maximum Entropy method with the Kernel Density Estimator: Figures 9 and 10 show the error of the Maximum Entropy method and the Kernel Density Estimator from [30] dependent on the sample size for ρ¹ and ρ² (the data points represent averages over independent runs).
Figure 9: The mean of the KL-divergence for ρ¹ and its approximation by the ME and KDE with M exact samples.

Figure 10: The mean of the KL-divergence for ρ² and its approximation by the ME and KDE with M exact samples.

Figure 11: The average values of h⁻¹ for the KDE and R for the ME for M exact samples of ρ¹.

Figure 12: The average values of h⁻¹ for the KDE and R for the ME for M exact samples of ρ².
Recall that the bound (18) only holds on a set of probability smaller than 1; therefore there might be runs with no ME solution. However, this problem has never occurred in the experiments for ρ¹ and ρ². This seems reasonable if we recall Figure 6. Comparing the Maximum Entropy method and the Kernel Density Estimator, Figures 11 and 12 show the average values of h⁻¹ (the reciprocal of the kernel width) and the number of moments R dependent on the sample size M. Notice the very different behaviour of R(M) and h⁻¹(M) for the analytic density function ρ¹ and the step-function ρ². The behaviour R = R(M) is in excellent agreement with the theoretical predictions of Section 4. Figures 13 and 14 compare the MLMC-ME approach with the Kernel Density Estimator for ρ¹ and ρ² (again, each data point is an average over independent runs). The fixed discretization level ℓ of the samples explains the saturation of the KDE convergence curves once the sampling error has achieved the magnitude of the fixed approximation error. The MLMC-ME approach, by contrast, is refinable in both the sample size and the level of sample approximation, and its convergence does not saturate.

5.4. Application to a random obstacle problem

We adopt the mathematical framework and notations from the recent articles [8, 11] and refer there for further details. Let D = [−1, 1]² and assume that the obstacle is parametrized by a continuous random function ψ specified below.
Figure 13: The mean KL-divergence of ρ¹ approximated by the MLMC-ME and KDE on different levels.

Figure 14: The mean KL-divergence of ρ² approximated by the MLMC-ME and KDE on different levels.
The weak formulation of (36) is a variational inequality of the first kind having a unique solution u ∈ H₀¹(D) satisfying u ≥ ψ a.e. in D and depending continuously on the data. The obstacle is given by the random field

  ψ(x) = Σ_{q₀≤|q|≤q_s} B_q(H) cos(q · x + φ_q),    (37)

  B_q(H) = (π/25) (2π max(|q|, q_l))^{−(H+1)},  q₀ ≤ |q| ≤ q_s,   q₀ = 1,  q_l = 10,  q_s = 26,
see [8, Sect. 8], [11, Sect. 6] and references therein for the details. The quantity of interest X is the size of the coincidence set of the solution u. The problem is discretized by the Finite Element Method on a hierarchy of refined meshes; the coarsest solution u₁ involves 13 degrees of freedom whereas the finest solution u₈ involves 130 561 degrees of freedom. Clearly, the sizes of the exact coincidence set X and of its Finite Element approximations X_ℓ are enclosed in the interval [0, 4]. In order to apply the Maximum Entropy method the samples are rescaled to the interval [−1, 1] using the values x_∗ − ∆ and x^∗ + ∆, where x_∗ and x^∗ are the smallest and the largest realizations of X_ℓ for all ℓ = 1, . . . , L, and ∆ := (x^∗ − x_∗)/10.
Figure 15 shows graphs of the Maximum Entropy solutions for different numbers of moments R. The graphs give a good visual impression of the target density function, which appears quite concentrated around the value x₀ = 1.13 and has a heavy tail towards smaller values of X. The same set of samples is used for all values of R, so that the computational cost for every ME solution in Figure 15 is the same. The corresponding Kernel Density Estimates with samples taken from different discretization levels are collected in Figure 16. The KDE density functions are quite rough in the left tail region and are quantitatively close to the ME density functions with 50-60 statistical moments. The computation of each KDE density requires the same computational cost, which is, however, by a factor of 9 smaller than the computational cost for the ME density functions. Figure 17 shows the zoom into the tips of the probability density functions in Figures 15 and 16.
Next, we consider convergence of the truncation error D_{KL}(ρ‖ρ_R). Since the exact target density is not available in closed form, we use the reference solution ρ ≈ ρ̃₆₀^{ML} instead, which involves L = 9 levels of approximation and M_L = 300 samples on the finest grid. The convergence history for different values of R is shown in Figure 18 and indicates a convergence rate of order R⁻¹.

The behaviour of the estimation error D_{KL}(ρ_R‖ρ̃_R) can be seen in Figure 19 (each data point is an average over 30 independent runs). The approximations ρ̃_R are computed with 10 samples on the finest level. Observe that for a rising number of moments the slopes of the convergence curves rapidly become flat. Figure 20 shows the mean square error of the moment perturbation. Recall that due to Lemma 3 this is an upper bound for D_{KL}(ρ_R‖ρ̃_R). Here the convergence curves rapidly become parallel to each other and decay for all moments up to R = 50. This indicates that the stability estimate (14) is overly pessimistic in this example.
Figure 15: Solutions of the MLMC-ME for different numbers of moments R.

Figure 16: Solutions of the KDE with samples taken from different discretization levels ℓ.

Figure 17: Zoom into the tips of the probability density functions in Figures 15 and 16.

Figure 18: Convergence of the MLMC-ME solution for an increasing number of moments.

Figure 19: Expected KL-divergence of the ME solution for approximating R moments.

Figure 20: Mean square error of R approximated moments with the MLMC.

6. Conclusions

In this work we have developed the Multilevel Monte Carlo Maximum Entropy method and presented numerical examples showing that this approach provides a good alternative for the numerical approximation of the probability density function of the system output. If the target probability density function is strictly positive and smooth, this can lead to significant savings in computation time. On the other hand, when the target probability density function has low regularity, the savings are smaller, but the method still remains applicable. An interesting question, which is beyond the scope of this paper, is how to choose the number of statistical moments R in the Maximum Entropy method dependent on the discretization and sampling parameters in the case when the smoothness of the target distribution is not
known. A possible adaptive strategy may start with low values of R and increase
them in the course of the simulation when necessary.
References
[1] M. B. Giles, Multilevel Monte Carlo path simulation, Oper. Res. 56 (3) (2008) 607–617.
method for elliptic PDEs with stochastic coefficients, Numer. Math. 119 (1) (2011) 123–161.
Carlo for a class of SPDEs in finance, SIAM J. Financial Math. 3 (1) (2012) 572–592.
[7] R. Kornhuber, C. Schwab, M.-W. Wolf, Multilevel Monte Carlo finite element methods for stochastic elliptic variational inequalities.
[9] A. Abdulle, A. Barth, C. Schwab, Multilevel Monte Carlo methods for stochastic elliptic multiscale PDEs, Multiscale Model. Simul. 11 (4) (2013) 1033–1070.
maximum entropy approach, Appl. Math. Comput. 118 (2–3) (2001) 133–149.
[17] E. T. Jaynes, Information theory and statistical mechanics, Phys. Rev. (2) 106 (1957) 620–630.
[20] M. B. Giles, Multilevel Monte Carlo methods, Acta Numer. 24 (2015) 259–328.
I. Csiszár, Information-type measures of difference of probability distributions and indirect observations, Studia Sci. Math. Hungar. 2 (1967) 299–318.
New York, 1975, republication, with minor corrections, of the 1963 original,
power function, JIPAM. J. Inequal. Pure Appl. Math. 9 (2) (2008), Article 51, 5 pp.
[28] E. E. Tyrtyshnikov, How bad are Hankel matrices?, Numer. Math. 67 (2) (1994) 261–269.
[30] Z. I. Botev, J. F. Grotowski, D. P. Kroese, Kernel density estimation via diffusion, Ann. Statist. 38 (5) (2010) 2916–2957.