, x, x9 R
d
, y, y9 R,
[ f (t, x, y) f (t9, x9, y9)[ <
0
([t t9[ [x x9[ [ y y9[), (6)
\t, t9 R
, x, x9 R
d
, [h(t, x) h(t9, x9)[ <
0
([x x9[ [t t9[): (7)
Then the RBSDE (3) has a unique solution (Y, Z, K). Furthermore, the process Y admits the
representation
1004 V. Bally and G. Page`s
Y
t
= u(t, X
t
),
where u denotes the unique solution (in the viscosity sense) of (1).
Another approach is developed in Bally et al. (2002a): the function u solves in a
variational sense the PDE with obstacle h, and u is the minimal solution for the
corresponding variational inequality. Then, using the connections between variational
inequalities and optimal stopping theory (see Bensoussan and Lions 1982) leads to the
representation of the above process Y as a Snell envelope:
Proposition 1. If (Y
t
)
t[0,T]
solves the RBSDE (3), then
Y
t
= ess sup
T
t
E
_
t
f (s, X
s
, Y
s
)ds h(, X
)[T
t
_ _
, (8)
where T
t
is the set of [t, T]valued Tstopping times.
When the function f does not depend upon Y
t
, equation (8) becomes an alternative
denition of (Y
t
)
t[0,T]
which then appears as the usual Snell envelope of a regular optimal
stopping problem associated with the Brownian diffusion (X
t
)
t[0,T]
. Then f represents the
instantaneous gain and h the nal gain. Of course optimal stopping theory for diffusions is
a classical topic in probability theory, and its numerical aspects have been investigated since
the very beginnings of numerical probability, motivated by a wide range of applications in
engineering; see, for example Kushner (1977) on elliptic diffusions, and Bensoussan and
Lions (1982). However, mathematical nance made it still more strategic in the 1980s:
pricing an American option is in some way an almost generic optimal stopping problem
(with f = 0, h > 0 and X a nonnegative martingale).
In brief, a (vanilla) American option is a contract that gives the right to receive once and
only once h(t, X
t
) currency units, at a time t chosen between time 0 and the maturity
T . 0 of the contract. The possibly multidimensional (nonnegative) process (X
t
)
t[0,T]
is
called the underlying asset or risky asset process. If one assumes for the sake of simplicity
that the interest rate is 0, classical arguments in the modelling of nancial markets show
that the price Y
t
of such a contract is given at every time t by (8) (setting f = 0), with
respect to a socalled riskneutral probability which makes the diffusion (X
t
)
t[0,T]
a
martingale. Among all possible models for the asset dynamics, the geometric Brownian
motion on R
q
,
dX
t
= X
t
dB
t
, X
0
= x
0
R
d
, /(d, q),
is widely used. When q = d, it is known as the BlackScholes model. Another question of
interest is whether, at any time t, there is some optimal stopping strategy for exercising this
right in the future. An answer is provided by the (lowest) optimal stopping time given by
t
+
:= inf s > t[Y
s
= h(s, X
s
) in the sense that
t
+
satises Y
t
+
= E(h
t
+
[T
t
).
Historically, the underlying asset of the rst massively traded American option contracts
was onedimensional (a single stock). However, many American options, mostly traded
over the counter, have a much more complex structure depending on a whole basket of d
Solving multidimensional discretetime optimal stopping problems 1005
underlying risky assets X
t
:= (X
1
t
, . . . , X
d
t
). If one thinks, for example, of indices (Dow
Jones, DAX, CAC40, etc.), d is usually greater than 2, 3 or 4.
The usual numerical approach to solving (lowdimensional) optimal stopping problems is
essentially analytic: it consists in solving the variational inequality (1) using standard
techniques of numerical analysis (nite differences, nite elements, nite volumes, etc.). In
spite of the loss of probabilistic interpretation, especially when implicit schemes are used,
these methods are unrivalled in one or two dimensions in terms of their rate of
convergence. This holds similarly for RBSDEs.
However, the autonomous development of mathematical nance, mainly inuenced by its
probabilistic background, gave rise to algorithms directly derived from discretizations of the
Snell envelope (8) (when f = 0). The most famous method is undoubtedly the binomial tree
made very popular in the nancial world by its simplicity of implementation and
interpretation. Some very accurate rates of convergence are available for such models in the
case of the vanilla American put option (h(t, x) := max(K x, 0)) (see Lamberton 1998;
2002; Lamberton and Rogers 2000; Bally and Saussereau 2002).
As the dimension increases, analytic methods become insufcient and the probabilistic
interpretation becomes the key to any numerical approach. Although this paper is concerned
with the more general case of the (h, f )Snell envelopes embodying the numerical
approximation of solutions of RBSDEs, let us persist with regular optimal stopping for a
while. All probabilistic approaches roughly follow the same three steps:
1. Time discretization. One approximates the Tadapted diffusion process (X
t
)
t[0,T]
at
times t
k
= kT=n, k = 0, . . . , n, by an (
~
TT
k
)
0<k<n
Markov chain
~
XX = (
~
XX
k
)
0<k<n
, where
~
TT
k
= T
t
k
. The chain
~
XX is assumed to be easily simulatable on a computer (index k then
stands for absolute time t
k
). Then one approximates the continuoustime TSnell envelope
Y of X by the discrete time
~
TTSnell envelope of
~
XX dened by
~
UU
k
:= ess supE(h(T=n,
~
XX
)[
~
TT
k
),
k
,
where
k
denotes the set of k, . . . , nvalued stopping times. The approximating process
~
XX
will often be chosen to be the Euler scheme (X
k
)
0<k<n
(with Gaussian increments) of the
diffusion, but other choices are possible (Milshtein scheme, etc). In onedimension a binomial
tree can be considered as a weak approximation. When samples (X
0
, X
t
1
, . . . , X
t
n
) of the
diffusion are simulatable, for instance because X
t
= j(t, B
t
) as in the BlackScholes model,
the best choice is of course to consider
~
XX
k
= X
t
k
. In that case, its (
~
TT
k
)
0<k<n
Snell envelope
will be simply denoted by (U
k
)
0<k<n
. Bally and Page`s (2003) carry out a detailed analysis of
the resulting L
p
error: if assumptions (5)(7) hold and
~
XX
k
= X
k
or X
t
k
, then
Y
kT=n
~
UU
k

p
= O(1=
n
_
); if, furthermore,
~
XX
k
= X
t
k
and h is semiconvex (see (16) below
for a denition), then Y
kT=n
U
k

p
= O(1=n).
2. Dynamic programming principle. The discretetime Snell envelope associated with the
obstacle (h(t
k
,
~
XX
k
))
0<k<n
satises (see Neveu 1971) the following backward dynamic
programming principle:
~
UU
n
:= h(t
n
,
~
XX
n
),
~
UU
k
:= max(h(t
k
,
~
XX
k
), E(
~
UU
k1
[
~
TT
k
)) = max(h(t
k
,
~
XX
k
), E(
~
UU
k1
[
~
XX
k
)):
(9)
1006 V. Bally and G. Page`s
The main feature of this formula is that it involves at each step the computation of
conditional expectations: this is the probabilistic counterpart for nonlinearity which makes
the regular Monte Carlo method fail.
3. Computation of conditional expectations. Numerical methods for the massive
computation of conditional expectations can roughly be divided into three families: spatial
discretization of
~
XX
k
; regression of (truncated) expansions on a basis of L
2
(
~
XX
k
) (see
Longstaff and Schwartz, 2001); and representation formulae based on Malliavin calculus (as
in Fournie et al. 1999). The rst two approaches are both nitedimensional. An important
drawback of the last two methods especially in higher dimensions is that they directly
depend on the obstacle process (h(t
k
, )). A spatial discretization method is obstacle free
in the sense that it produces a discrete semigroup independently of any obstacle process
and then works for any such obstacle process.
The quantization tree method that we propose and analyse in this paper belongs to the
rst family (spatial discretization). A specic characteristic of the method is that it does not
explode, even in higher dimensions. The initial idea is simple and shared by many grid
methods (see Broadie and Glasserman 1997; Chevance 1997; Longstaff and Schwartz 2001).
First, at each time step k (i.e. t
k
) one projects
~
XX
k
onto a xed grid
k
:= x
k
1
, . . . , x
k
N
k
1<i<N
k
x
k
i
1
~
XX
k
C
k
i
,
where (C
k
i
)
1<i<N
k
is a Borel partition of R
d
such that C
k
i
u[ [u x
k
i
[ = min
1<<N
k
[u x
k
[.
As a second step, one mimics the above dynamic programming principle (9) on the tree
formed by the grids
0
, . . . ,
n
. The process
~
XX being simulatable, it is possible to compute
by simulation
k
ij
:= P(
~
XX
k1
C
k1
j
[
~
XX
k
C
k
i
). Although (
^
XX
k
)
0<k<n
is not a Markov chain,
one may still dene a pseudoSnell envelope (
^
UU
k
)
0<k<n
by setting
^
UU
n
:= h(
^
XX
n
), and
^
UU
k
:= max(h(
^
XX
k
), E(
^
UU
k1
[
^
XX
k
)), 0 < k < n 1:
Since E(
^
UU
k1
[
^
XX
k
= x
k
i
) =
N
k1
j=1
k
ij
^ uu
k1
(x
k1
j
), a backward induction shows that
^
UU
k
:=
^ uu
k
(
^
XX
k
), where ^ uu
k
satises the backward dynamic programming formula
^ uu
n
(x
n
i
) = h(x
n
i
), 1 < i < N
n
,
^ uu
k
(x
k
i
) = max(h(x
k
i
),
N
k1
j=1
^ uu
k1
(x
k1
j
)
k
ij
), 1 < i < N
k
, 0 < k < n 1, (10)
which we will call the quantization tree algorithm. One may reasonably expect the error
~
UU
k
^
UU
k
to be small. Indeed, we are able to prove that, for some specic choices of grid
k
,
and under some appropriate assumptions on the diffusion coefcients, for every p > 1,

^
UU
k
~
UU
k

p
< C
p
n
k=1

^
XX
k
~
XX
k

p
= O
n
11=d
N
1=d
_ _
:
Practical processing of the quantization tree algorithm (10) raises the following questions:
Solving multidimensional discretetime optimal stopping problems 1007
A. How do we specify the grids
k
:= x
k
1
, . . . , x
k
N
k
? This means two things: rst, how
do we choose in an optimal way the sizes N
k
of the grids, given that
N
0
N
1
. . . N
n
= N; and second, how do we choose the points x
k
i
to keep
the L
p
quantization errors 
~
XX
k
^
XX
k

p
minimal?
B. How do we compute the weights
k
ij
?
C. How do we evaluate the error 
~
UU
k
Y
kT=n

p
(and U
k
Y
kT=n

p
)?
D. What is the complexity of the quantization tree algorithm?
One crucial feature of the problem must be emphasized at this stage: whatever the
methods selected to obtain optimal grids and weights, phases A, B and C are oneshot:
once the grids are settled, their weights and resulting L
p
quantization errors estimated, the
computation of the pseudoSnell envelope of (h(t
k
,
^
XX
k
))
0<k<n
using the quantization tree
algorithm (10) is almost instantaneous on any computer. The numerical experiments carried
out in Section 6.2 (pricing of American options) indicate that, in fact, the quantization tree
grid optimization phase outlined below in Section 3.3 entails a very reasonable cost (less
than 15 minutes on a on 1 GHz PC computer), given the fact that, once completed, one can
instantly price any American payoff option in that model. Furthermore, in many
applications, one may rely on the quantization of universal objects such as standard
Brownian motion. In this latter case, the optimization of the quantization amounts by
scaling to that of a normal qdimensional vector A(0; I
q
) processed once and stored on a
CDROM.
Let us turn now to the optimization phases (A, B and C). Actually this is a very old
story: since the early 1950s people working in signal processing and information theory
have been concerned with the compression of the information contained in a continuous
signal (
~
XX
k
) using a nite number of codebooks (the points x
k
i
) in an optimal way (see
Section 3.1). Several deterministic algorithms have been designed for this purpose,
essentially onedimensional signals. Among them, let us mention Lloyds Method I (see
Kieffer 1982). Meanwhile, a sound mathematical theory of quantization of probability
distributions has been developed (for a recent monograph, see Graf and Luschgy 2000). In
the 1980s, with the emergence of articial neural networks, some new algorithmic aspects
of quantization in higher dimensions were investigated, mainly the competitive learning
vector quantization algorithm (and its variants) which appeared as a degenerate setting for
the Kohonen selforganizing maps (see Fort and Page`s 1995; Bouton and Page`s 1997; and
the references therein). This stochastic algorithmic approach will be adapted below to
Markov dynamics. It is based on massive simulations of independent copies of the random
vector to be quantized. Page`s (1997) appears to have made the rst attempt to apply
optimal multidimensional quantization to numerical probability.
As mentioned above, after Kushners pioneering work, the pricing of American options
led to renewed interest in numerical aspects of optimal stopping: we might mention, among
many others, the analysis of the convergence of the Snell envelope in an abstract
approximation framework (see Lamberton and Page`s 1990) or the rate of convergence of
the premium of a regular American put priced in a binomial tree toward its BlackScholes
counterpart (see Lamberton 1998; and the references therein).
In higher dimensions, several numerical methods have been designed and analysed in the
1008 V. Bally and G. Page`s
literature of the past ten years to solve optimal stopping problems like those naturally
arising in nance or, more generally, to process massive computations of conditional
expectations. In the class of grid methods, one may cite the algorithm devised by Broadie
and Glasserman (1997) for pricing multiasset American options, and the discretization
scheme for BSDEs proposed by Chevance (1997) the latter is onedimensional but easy
to extend to higher dimensions. In both approaches the spatial discretization of discretetime
approximation (
~
XX
k
)
k
consists of N independent copies of (
~
XX
k
)
0<k<n
which form a grid of
size N at every time step k = 1, . . . , n. In Broadie and Glasserman (1997), the transition
between these grids is based on the likelihood ratios between
~
XX
k
and
~
XX
k1
. A convergence
theorem is established without rate of convergence. In Chevance (1997), (
~
XX
k
)
0<k<n
is in
fact the Euler scheme of a diffusion and at every time step the grid is uniformly weighted
by 1=N. The transition weights are based on an empirical frequency approach based on a
Monte Carlo simulation as well. Some a priori L
1
error bounds are proposed for functions h
having nite variations. Longstaff and Schwartz (2001) develop a regression method based
on truncated expansions in L
2
(
~
XX
k
). They introduce a dual dynamic programming principle
for the lowest stopping time
+
and then compute E(h(
+
, X
tr(MM
+
)
_
, where M
+
denotes the transpose of
M.
For every nite set A, we denote by [A[ its cardinality.
x, y
denotes the usual Kronecker delta.
For every x R, [x] := maxn Z[n < x and x := minn Z[n > x.
Section 2 is devoted to the computation of the Snell envelope of a discretetime R
d

valued homogeneous Markov chain using a quantization tree. We start with an example, the
discretization of an RBSDE, in Section 2.1, to introduce the (h, f )Snell envelope. In
Section 2.2 we propose the (backward) quantization tree algorithm and derive some a priori
L
p
error bounds using the L
p
quantization error. Section 3 is devoted to optimal
quantization from a theoretical point of view. Then the extension of the competitive
learning vector quantization algorithm to Markov chains is presented to process the
numerical optimization of the grids and the computation of their transition weights. Section
4 briey recalls some error bounds concerning the Monte Carlo estimation of the transition
weights (for a xed, possibly not optimal, quantization tree and in the linear case f = 0).
These are established in Bally and Page`s (2003). In Section 5 a rst comparison with the
Solving multidimensional discretetime optimal stopping problems 1009
niteelement method is carried out. In Section 6, the above results are applied to the
discretization of RBSDEs, with some a priori error bounds, when the diffusion is uniformly
elliptic. We conclude with a numerical illustration: the pricing of American style exchange
options.
2. Quantization of the Snell envelope of a Markov chain
Before dealing with the general case, let us look more precisely at the case of the RBSDE
presented in the Introduction.
2.1. Time discretization of an RBSDE by an (h, f )Snell envelope
We follow the notation introduced in the Introduction for RBSDEs. It is natural to derive a
discretization scheme for the solution of an RBSDE from the representation formula (8)
following the approach described in the Introduction. Let t
k
:= kT=n, k = 0, . . . , n, denote
the discretization epochs. One just needs to add a discretization term for the integral
_
T
t
f (s, X
s
, Y
s
) ds.
One rst considers the homogeneous Markov chain (X
t
k
)
0<k<n
. Its transition is given by
P
T=n
(x, dy), where P
t
(x, dy) denotes the transition of the diffusion X. The discretetime
(T
t
k
)
0<k<n
(h, f )Snell envelope (U
k
)
0<k<n
of (X
t
k
)
0<k<n
is dened by
U
k
:= ess sup E h
T
n
, X
T=n
_ _
T
n
i=k1
f (t
i
, X
t
i
, U
i
)[T
t
k
_ _
,
k
_ _
, (11)
where
k
denotes the set of k, . . . , nvalued (T
t
k
)
0<k<n
stopping times (the index k
stands for absolute discrete time in this formulation). When samples (X
t
1
, . . . , X
t
n
) can
easily be simulated, (U
k
)
0<k<n
becomes the quantity of interest. When one is dealing with
the Snell envelope of the Euler scheme, (U
n
) remains a tool in the error analysis.
The Gaussian Euler scheme with general step . 0 (here = T=n) is recursively
dened by X
0
:= X
0
and,
\k N, X
k1
:= X
k
b(X
k
) (X
k
)
_
k1
, (12)
where
k
:= (B
k
B
( k1)
)=
_
, k > 1, are i.i.d. A(0, I
q
)distributed. The sequence
(X
k
)
0<k<n
is a homogeneous Markov chain with transition given on bounded Borel
functions by
P
( f )(x) =
_
R
q
f (x b(x)
_
(x):u)e
[u[
2
=2
du
(2)
q=2
: (13)
The discretetime (T
t
k
)
0<k<n
(h, f )Snell envelope (U
k
)
0<k<n
of (X
k
)
0<k<n
is
U
k
:= ess sup E h(, X
)
T
n
i=k1
f (t
i
, X
i
, U
i
)=T
t
k
_ _
,
k
_ _
: (14)
1010 V. Bally and G. Page`s
One crucial fact for our purpose is that both transitions of interest P
(x, dy)
are Lipschitz in the sense of the following denition.
Denition 1. A transition (P(x, dy))
xR
d is KLipschitz if,
\g : R
d
R, Lipschitz continuous, [Pg]
Lip
< K[ g]
Lip
: (15)
Proposition 2. Assume that the drift b : R
d
R
d
and the diffusion coefcient
: R
d
/(d 3 q) of the diffusion X are Lipschitz continuous. Set := T=n.
(a) Euler scheme. Then P
1
0
(2
0
(1 ))
_
= 1
0
(1
0
=2) O(
2
):
If, furthermore, b and satisfy the socalled asymptotic atness assumption,
a . 0, \x, y R
d
,
1
2
(x) ( y)
2
(x y[b(x) b( y)) < a[x y[
2
,
then K
Euler
<
1 2a
2
2
0
_
= 1 a O(
2
).
(b) Diffusion. The transition P
= exp(
0
(1
0
=2)):
If furthermore, the asymptotic atness assumption holds, then
K
diff
= exp(a):
Proof. (a) Let g : R
d
R be a Lipschitz continuous function. Then, for every x, y R
d
,
[P
( g)(x) P
( g)( y)[
2
< E([ g(x b(x)
_
(x)
1
) g( y b( y)
_
( y)
1
)[
2
)
< [ g]
2
Lip
E([x b(x)
_
(x)
1
( y b( y)
_
( y)
1
)[
2
)
< [ g]
2
Lip
([x y[
2
(x) ( y)
2
2(x y[b(x) b( y))
2
[b(x) b( y)[
2
)
< [ g]
2
Lip
(1
2
0
2
0
2
2
0
)[x y[
2
:
The asymptotically at case is established similarly.
(b) Itos formula implies (with obvious notation) that
Solving multidimensional discretetime optimal stopping problems 1011
[X
x
t
X
y
t
[
2
= [x y[
2
2
_
t
0
((X
x
s
X
y
s
[b(X
x
s
) b(X
y
s
))
1
2
tr((X
x
s
)
(X
y
s
))((X
x
s
) (X
y
s
))
+
)ds
_
t
0
(X
x
s
X
y
s
[((X
x
s
) (X
y
s
))dB
s
) (true martingale)
E[X
x
t
X
y
t
[
2
< [x y[
2
0
(2
0
)
_
t
0
E[X
x
s
X
y
s
[
2
ds:
Gronwalls lemma nally leads to E[X
x
t
X
y
t
[
2
< [x y[
2
exp (
0
(2
0
)t).
In the asymptotically at case, one veries that t E[X
x
t
X
y
t
[
2
is differentiable and
satises
d
dt
E[X
x
t
X
y
t
[
2
< a E[X
x
t
X
y
t
[
2
,
whence the result claimed. h
Remark 1. The result of the above proposition still holds if one considers an Euler scheme
where the increments
k
are simply squareintegrable, centred and normalized.
Remark 2. The simplest asymptotically at transitions are the Euler scheme of the
OrnsteinUhlenbeck process dY
t
:=
1
2
Y
t
dt dB
t
for which the property holds with a =
1
4
.
Bally and Page`s (2003) carry out an analysis of the L
p
error induced by considering the
discretetime (h, f )Snell envelopes (U
k
)
0<k<n
and (U
k
)
0<k<n
instead of the process
(Y
t
)
t[0,T]
. The main result is summed up in the proposition below.
Proposition 3. Assume that (5)(7) hold and that X
0
= x R
d
.
(a) Lipschitz continuous setting. Let p > 1. Then
\k 0, . . . , n, Y
t
k
U
k

p
Y
t
k
U
k

p
< C
p
e
C
p
T
(1 [x[)
1
n
_ ,
where C
p
is a positive real constant depending upon p, b, , f and h (by means of
0
).
(b) Semiconvex setting. Assume, furthermore, that f is c
1,2,2
b
(the set of c
1,2,2
functions f
whose existing partial derivatives are all bounded.) and that h is semiconvex in
following sense:
\t [0, T], \x, y R
d
, h(t, y) h(t, x) > (
h
(t, x)[ y x) r[x y[
2
, (16)
where
h
is a bounded function on [0, T] 3R
d
and r > 0. Then, the (h, f )Snell
envelope of the discretized diffusion (X
t
k
)
0<k<n
satises, for every p > 1,
\k 0, . . . , n, Y
t
k
U
k

p
< C
p
e
C
p
T
(1 [x[)
1
n
:
(The real constant C
p
is a a priori different from that in the Lipschitz continuous case.)
1012 V. Bally and G. Page`s
Remark 3. The semiconvexity assumption is a generalization of convexity which embodies
smooth enough functions. This notion seems to have been introduced in Caverhill and
Webber (1990) for pricing onedimensional American options. See also Lamberton (2002) for
recent developments in nance.
Remark 4. If h(t, ) is convex for every t [0, T] with a bounded spatial derivative
h
(t, )
(in the distribution sense), then h is semiconvex with r = 0. Thus, it embodies most
American style payoff functions used in mathematical nance for options pricing like those
involving the positive part of linear combinations or extrema of the underlying traded asset).
Remark 5. If h(t, ) is c
1
for every t R
(X
i=k1
f
i
(X
i
, U
i
)[T
k
_ _
, k, . . . , nvalued T
l
stopping time
_ _
:
(17)
In fact, the (h, f )Snell envelope is simply connected with the regular Snell envelope
appearing in optimal stopping theory: one veries that V
k
:= U
k
k
i=1
f
i
(X
i
, U
i
) is the
standard Snell envelope of the T
k
adapted sequence Z
k
:= h
k
(X
k
)
k
i=1
f
i
(X
i
, U
i
).
Hence, following Neveu (1971), for example, V is the Snell envelope of Z, that is, it
satises the backward dynamic programming principle
V
n
= Z
n
and V
k
= max( Z
k
, E(V
k1
[T
k
)):
(U
k
)
0<k<n
is found to satisfy the dynamic programming principle
U
n
:= h
n
(X
n
),
U
k
:= max(h
k
(X
k
), E(U
k1
f
k1
(X
k1
, U
k1
)[T
k
)), 0 < k < n 1: (18)
It is thus the (h, f )Snell envelope of X. Henceforth E([T
k
) will be written simply as E
k
().
At this stage, a straightforward induction using the Markov property shows that, for every
k 0, . . . , n, U
k
= u
k
(X
k
), where, u
k
is recursively dened by
u
n
:= h
n
,
u
k
:= max(h
k
, P(u
k1
f
k1
(, u
k1
))), 0 < k < n 1: (19)
Solving multidimensional discretetime optimal stopping problems 1013
Example. The time discretization of an RBSDE corresponds to functions
f
k
(x, u) :=
T
n
f
kT
n
, x, u
_ _
and h
k
(x) := h
kT
n
, x
_ _
, 0 < k < n: (20)
1.2.1. The quantization tree algorithm: a pseudoSnell envelope
The starting point of the method is to discretize at every step k 0, . . . , n the random
vector X
k
using a (X
k
)measurable random vector
^
XX
k
that takes nitely many values. The
random variable
^
XX
k
is called a quantization of X
k
. One may always associate with
^
XX
k
a
Borel function q
k
: R
d
R
d
such that
^
XX
k
= q
k
(X
k
). The function q
k
is often called a
quantizer (this terminology comes from signal processing and information theory; see
Section 3.1). It will be convenient to call the nite subset
k
:= q
k
(R
d
) =
^
XX
k
() a
quantization grid, or simply grid. The size of a grid
k
will be denoted by N
k
. The
elements of a quantization grid are called elementary quantizers. We will denote by
N := N
0
N
1
. . . N
n
the total number of elementary quantizers used to quantize all
the X
k
, 0 < k < n.
We now wish to approximate the Snell envelope (U
k
)
0<k<n
by a sequence (
^
UU
k
)
0<k<n
formally dened by a dynamic programming algorithm similar to (18), except that the
random vector X
k
is replaced by its quantization
^
XX
k
, for every k 0, . . . , n, and the
conditional expectation E
k
(i.e the past of the ltration T up to time k) is replaced by
the conditional expectation given
^
XX
k
i.e. E([
^
XX
k
). Henceforth, for the sake of simplicity,
E([
^
XX
k
) will denoted
^
EE
k
().
Assume temporarily that, for every k 0, 1, . . . , n, we have access to an appropriate
quantization
^
XX
k
= q
k
(X
k
) of X
k
. The optimal choice of the grid
k
and the quantizer q
k
that yield the best possible approximation will be investigated in Section 3.1.
Thus, the pseudoSnell envelope is dened by mimicking the original one (18) as follows:
^
UU
n
:= h
n
(
^
XX
n
),
^
UU
k
:= max(h
k
(
^
XX
k
),
^
EE
k
(
^
UU
k1
f
k1
(
^
XX
k1
,
^
UU
k1
)), 0 < k < n 1: (21)
The main reason for considering conditional expectation with respect to
^
XX
k
is that the
sequence (
^
XX
k
)
kN
does not satisfy the Markov property. The quantization tree algorithm then
simply consists in rewriting the pseudoSnell envelope in distribution.
Proposition 4 (Quantization tree algorithm). For every k 0, . . . , n, let
k
:=
x
k
1
, . . . , x
k
N
k
denote a quantization grid of the distribution L(X
k
) and q
k
its quantizer.
For every k 0, . . . , n 1, i 1, . . . , N
k
, j 1, . . . , N
k1
k
ij
:= P(
^
XX
k1
= x
k1
j
[
^
XX
k
= x
k
i
): (22)
One denes the functions ^ uu
k
by the backward induction
1014 V. Bally and G. Page`s
^ uu
n
(x
n
i
) := h
n
(x
n
i
), i 0, . . . , N
n
,
^ uu
k
(x
k
i
) := max h
k
(x
k
i
),
N
k1
j=1
k
ij
^ uu
k1
(x
k1
j
) f
k1
(x
k1
j
, ^ uu
k1
(x
k1
j
))
_ _
_
_
_
_
, (23)
k 0, . . . , n 1, i 1, . . . , N
k
:
Then ^ uu
k
(
^
XX
k
) =
^
UU
k
, 0 < k < n, is the pseudoSnell envelope dened by (21).
Note that if L(X
0
) :=
x
0
, then ^ uu
0
(
^
XX
0
) = ^ uu
0
(x
0
) is deterministic, otherwise
E ^ uu
0
(
^
XX
0
) =
N
0
i=1
p
0
i
^ uu
0
(x
0
i
) with p
0
i
:= P(
^
XX
0
= x
0
i
), 1 < i < N
0
:
Implementing procedure (23) on a computer raises two questions. The rst is how to estimate
numerically the above coefcients
k
ij
. The second is whether the complexity of the
quantization tree algorithm is acceptable.
As far as practical implementation is concerned, the ability to compute the
k
ij
(and the
p
0
i
) at a reasonable cost is the key to the whole method. The most elementary solution is
simply to process a widescale Monte Carlo simulation of the Markov chain (X
k
)
0<k<n
(see
Section 3.3). The estimation of the coefcients is based on the representation formula (22)
of
k
ij
as expectations of simple functions of (X
k
, X
k1
). Furthermore, the a priori error
bounds for U
k
^
UU
k

p
that will be derived in Theorem 2 below all rely on the L
p

quantization errors X
k
^
XX
k

p
, 0 < k < n, which can be simultaneously approximated.
So, the parameters of interest can be evaluated provided that independent paths of the
Markov chain (X
k
)
0<k<n
can be simulated at a reasonable cost. This amounts to the
efcient simulation of some P(x, dy)distributed random vectors for every x R
d
.
We will see in Section 3.3.1 that this rst approach can be improved by combining this
Monte Carlo simulation with the grid optimization procedure.
Turning now to the complexity of the algorithm, a quick look at the structure of the
quantization tree algorithm (23) shows that going from layer k 1 down to layer k requires
kN
k
N
k1
elementary computations (where k . 0 denotes the average number of
computations per link i j). Hence, the cost of completing the tree descent is
Complexity = k(N
0
N
1
. . . N
k
N
k1
. . . N
n1
N
n
),
so that
k
n
(n 1)
2
N
2
< Complexity < k
N
2
4
:
The lower bound holds for N
k
= N=(n 1), 0 < k < n, the upper one for
N
0
= N
1
= N=2, N
k
= 0, 2 < k < n, which is clearly unrealistic. The optimal dispatching
(see, for example, the practical comments in Section 6.1) leads to a complexity close to the
lower bound.
However this is a very pessimistic analysis of the complexity. In fact, in most examples
Solving multidimensional discretetime optimal stopping problems 1015
such as the Euler scheme the Markov transition P(x, dy) is such that, at each step k,
most coefcients of the quantized transition matrix [
k
ij
] are so small that their estimates
produced by the Monte Carlo simulation turn out to be 0. This is taken into account to
speed up the computer procedure so that the practical complexity of the algorithm is O(N).
This can be compared to the complexity of a CoxRossRubinstein onedimensional
binomial tree with
2N
_
time steps which contains approximately N points.
2.2.2. Convergence and rate
The aim now is to provide some a priori L
p
error bounds for U
k
^
UU
k

p
, 0 < k < n,
based on the L
p
quantization errors X
k
^
XX
k

p
, 0 < k < n (keeping in mind that these
quantities can simply be estimated during the Monte Carlo simulation of the chain).
The main necessary assumption on the Markov chain here is that its transition P(x, dy) is
Lipschitz (see Denition 1). This assumption is natural, as emphasized by the above
Proposition 2: the transitions of a diffusion and of its Euler scheme are both Lipschitz if its
coefcients are Lipschitz continuous. The rst task is to evaluate the Lipschitz regularity of
the functions u
k
dened by (19) in that setting.
Proposition 5. Assume that the functions h and f are Lipschitz continuous, uniformly with
respect to k, that is, for every k 0, . . . , n,
\x, x9 R
d
, [h
k
(x) h
k
(x9)[ < [h]
Lip
[x x9[, (25)
\x, x9 R
d
, \ u, u9 R, [ f
k
(x, u) f
k
(x9, u9)[ < [ f ]
Lip
([x x9[ [u u9[) (26)
If the transition P is KLipschitz, then the functions u
k
dened by (19) are Lipschitz
continuous. Furthermore, setting L := K(1 [ f ]
Lip
), one obtains
[u
k
]
Lip
<
L
nk
[h]
Lip
K
L 1
[ f ]
Lip
_ _
, if L . 1,
[h]
Lip
(n k)[ f ]
Lip
if L = 1,
max [h]
Lip
,
K
1 L
[ f ]
Lip
_ _
, if L , 1:
_
_
Remark 6. If f = 0 (i.e. regular optimal stopping), the above inequalities read as follows
[u
k
]
Lip
< (K . 1)
nk
[h]
Lip
:
For practical applications, for example to the Euler scheme or to simulatable diffusions,
L ~ 1 c=n so the coefcient L
nk
does not explode as the number n of time steps goes to
innity.
Proof. As u
n
= h
n
, [u
n
]
Lip
< [h]
Lip
. Then, using the inequality
[max(a, b) max(a9, b9)[ < max([a a9[, [b b9[),
1016 V. Bally and G. Page`s
it easily follows from the dynamic programming equality (19) that
[u
k
]
Lip
< max([h]
Lip
, [P(u
k1
f
k1
(, u
k1
))]
Lip
)
< max([h]
Lip
, K([u
k1
]
Lip
[ f ]
Lip
(1 [u
k1
]
Lip
)))
< max([h]
Lip
, L[u
k1
]
Lip
K[ f ]
Lip
):
By induction,
[u
k
]
Lip
< L
k
max
k<i<n
(L
i
[h]
Lip
(L
k
. . . L
i1
)[ f ]
Lip
)
< max
0<j<nk
(L
j
[h]
Lip
L
j
1
L 1
K[ f ]
Lip
) (if L ,= 1),
and we are done. h
We now move on to the main result of this section: some a priori estimates for
U
k
^
UU
k

p
as a function of the quantization error X
k
^
XX
k

p
.
Theorem 2. Assume that the transition (P(x, dy))
xR
d is KLipschitz and that the functions h
and f satisfy the Lipschitz assumptions (25) and (26). Then,
\p > 1, \ k 0, . . . , n, U
k
^
UU
k

p
<
1
(1 [ f ]
Lip
)
k
n
i=k
d
i
X
i
^
XX
i

p
,
with
d
i
:= ([h]
Lip
[ f ]
Lip
(2
2, p
)K(([u
i1
]
Lip
1)([ f ]
Lip
1) 1))(1 [ f ]
Lip
)
i
,
0 < i < n 1,
d
n
:= ([h]
Lip
[ f ]
Lip
)(1 [ f ]
Lip
)
n
: (27)
Proof. Set
k
:= P(u
k1
f
k1
(:, u
k1
)), k = 0, . . . , n 1, and
n
= 0 so that
E(U
k1
f
k1
(X
k1
, U
k1
)[X
k
) =
k
(X
k
). Dene
^
k
similarly by the equality
^
k
(
^
XX
k
) :=
^
EE
k
(
^
UU
k1
f
k1
(
^
UU
k1
,
^
XX
k1
)) and
^
n
= 0. Then
[U
k
^
UU
k
[ < [h
k
(X
k
) h
k
(
^
XX
k
)[ [
k
(X
k
)
^
k
(
^
XX
k
)[
< [h]
Lip
[X
k
^
XX
k
[ [
k
(X
k
)
^
EE
k
(
k
(X
k
))[ [
^
EE
k
(
k
(X
k
))
^
k
(
^
XX
k
)[:
Now
[
k
(X
k
)
^
EE
k
k
(X
k
)[ < [
k
(X
k
)
k
(
^
XX
k
)[
^
EE
k
[
k
(X
k
)
^
EE
k
(
k
(
^
XX
k
))[
< [
k
]
Lip
([X
k
^
XX
k
[
^
EE
k
[X
k
^
XX
k
[):
(Note that
^
XX
k
is T
k
measurable.) Hence,
Solving multidimensional discretetime optimal stopping problems 1017

k
(X
k
)
^
EE
k
(X
k
)
p
< 2[
k
]
Lip
X
k
^
XX
k

p
:
When p = 2, one may drop the factor 2 since the very denition of the conditional
expectation as a projection in a Hilbert space implies that

k
(X
k
)
^
EE
k
(X
k
)
2
< 
k
(X
k
)
k
(
^
XX
k
)
2
< [
k
]
Lip
X
k
^
XX
k

2
:
On the other hand, coming back to the denition of
k
(X
k
) and
^
k
(
^
XX
k
), one obtains, using
the fact that
^
EE
k
E
k
=
^
EE
k
and that conditional expectation is an L
p
contraction,
[
^
EE
k
(
k
(X
k
))
^
k
(
^
XX
k
)[ <
^
EE
k
[U
k1
f
k1
(X
k1
, U
k1
)
^
UU
k1
f
k1
(
^
XX
k1
,
^
UU
k1
)[

^
EE
k
(
k
(X
k
))
^
k
(
^
XX
k
)
p
< U
k1
f
k1
(X
k1
, U
k1
)
^
UU
k1
f
k1
(
^
XX
k1
,
^
UU
k1
)
p
< (1 [ f ]
Lip
)U
k1
^
UU
k1

p
[ f ]
Lip
X
k1
^
XX
k1

p
:
Finally, for every k 0, . . . , n 1,
U
k
^
UU
k

p
< [h]
Lip
X
k
^
XX
k

p

k
(X
k
)
^
EE
k
(
k
(X
k
))
p

k
(X
k
)
^
k
(
^
XX
k
)
p
< (1 [ f ]
Lip
)U
k1
^
UU
k1

p
([h]
Lip
(2
p,2
)[
k
]
Lip
)X
k
^
XX
k

p
[ f ]
Lip
X
k1
^
XX
k1

p
:
Using the inequality U
n
^
UU
n

p
< [h]
Lip
X
n
^
XX
n

p
, standard computations yield
U
k
^
UU
k

p
<
n
i=k
[h]
Lip
(2
p,2
)[
i
]
Lip
[ f ]
Lip
1 [ f ]
Lip
_ _
(1 [ f ]
Lip
)
ik
X
i
^
XX
i

p
[ f ]
Lip
X
k
^
XX
k

p
<
1
(1 [ f ]
Lip
)
k
n
i=k
(1 [ f ]
Lip
)
i
[h]
Lip
[ f ]
Lip
1 [ f ]
Lip
(2
p,2
)[
i
]
Lip
_ _
3X
i
^
XX
i

p
:
Finally, the denition of
i
and the Lipschitz property of P(x, dy) imply that
[
i
]
Lip
= [P(u
i1
f
i1
(, u
i1
))]
Lip
< K(1 [ f ]
Lip
)[u
i1
]
Lip
K[ f ]
Lip
, 1 < i < n 1:
h
2.2.3. Approximation of the (lowest) optimal stopping time
The second quantity of interest in optimal stopping theory is the (set of) optimal stopping
time(s). A stopping time
opt
is optimal for the (h, f )Snell envelope if
U
0
= E
0
h
opt
(X
opt
)
opt
i=1
f
i
(X
i
, U
i
)
_ _
:
1018 V. Bally and G. Page`s
We know (see, Neveu, 1971) that the lowest optimal stopping time is given by
+
:= mink[U
k
= h
k
(X
k
):
In the case of nonuniqueness of the optimal stopping times,
+
plays a special role because
it turns out to be the easiest to approximate. Thus when dealing with quantization of Markov
chains, it is natural to introduce its counterpart for the quantized process, that is,
^
+
:= min k[^ uu
k
(
^
XX
k
) = h
k
(
^
XX
k
)
_ _
:
In fact, the estimation of the error 
+
^
+

p
seems out of reach, essentially because it
is quite difcult to bound these stopping times from below. Nevertheless, we will be able to
approximate
+
(in probability). Let . 0. Set
:= min k[u
k
(X
k
) < h
k
(X
k
) ,
^
:= min k[^ uu
k
(
^
XX
k
) < h
k
(
^
XX
k
) :
_
Proposition 6. (a) For every . 0,
>
+
, ^
> ^
+
. Furthermore

+
and ^
^
+
as 0:
(These stopping times being integervalued,
and ^
= [
3=2
,
=2
]) <
1
n
k=0
(kd
k
[h]
Lip
)
^
XX
k
X
k

1
:
Proof. Part (a) is an obvious corollary of the denitions of
and ^
.
(b) Set Z
k
:= u
k
(X
k
) ^ uu
k
(
^
XX
k
) h
k
(
^
XX
k
) h
k
(X
k
). Then, we may write
^
= min k[u
k
(X
k
) < h
k
(X
k
) Z
k
so that, on the event max
0<k<n
[ Z
k
[ < =2,
3=2
< ^
<
=2
. Subsequently,
P(^
= [
3=2
,
=2
]) < P( max
0<k<n
[ Z
k
[ . [
2
) <
2
Emax
<k<n
[ Z
k
[:
Now, using the notation of Theorem 2, one has
[ Z
k
[ < [
^
UU
k
U
k
[ [h]
Lip
[
^
XX
k
X
k
[
and thus
E max
0<k<n
[ Z
k
[ <
n
k=0

^
UU
k
U
k

1
[h]
Lip

^
XX
k
X
k

1
<
n
k=0
(kd
k
[h]
Lip
)
^
XX
k
X
k

1
:
h
Solving multidimensional discretetime optimal stopping problems 1019
3. Optimization of the quantization
After some brief background material on optimal quantization of a R
d
valued random
vector, this section is devoted to the optimal quantization method of a Markov chain. For a
modern and rigorous overview of quantization of random vectors, see Graf and Luschgy
(2000) and the references therein.
3.1. Optimal quantization of a random vector X
Let X L
p
R
d
(, /, P). Following the terminology introduced in Section 2.2.1, the L
p

quantization ( p > 1) consists in studying the best possible L
p
approximation of X by a
random vector
^
XX := q(X), where q : R
d
R
d
is a Borel function (quantizer) taking at
most N values called elementary quantizers. Set the quantization grid := q(R
d
) :=
x
1
, . . . , x
N
, x
1
, . . . , x
N
R
d
. Minimizing the L
p
quantization error X q(X)
p
consists in two phases:
1. Having set a grid (R
d
)
N
, [[ < N, nd a/the valued quantizer q
that
minimizes X p
(X)
p
among all valued quantizers q (if any).
2. Find a grid of size [[ < N which achieves the inmum of X q
(X)
p
among
all the grids having at most N points (if any).
The solution to phase 1 is provided by any Voronoi quantizer of the grid , also called a
projection following the closestneighbour rule and dened by
q
:=
x
i
x
i
1
C
i
()
,
where (C
i
())
1<i<N
is a Borel partition of R
d
called a Voronoi tessellation, satisfying
C
i
() R
d
[[x
i
[ = min
1<j<N
[ x
j
[:
A given grid of size N > 2 clearly has innitely many Voronoi tessellations, essentially due
to median hyperplanes. However, all the C
i
() have the same convex closure and boundary,
included in at most N 1 hyperplanes. If the distribution of X weights no hyperplane, that
is, P(X H) = 0 for any hyperplane H, then the Voronoi tessellation is Pessentially
unique.
The Voronoi quantization, denoted by
^
XX
:= q
(X), induces an L
p
quantization error
X
^
XX

p
(in information theory X
^
XX

p
p
is called L
p
distortion) given by
X
^
XX

p
p
=
x
i
E 1
C
i
()
[X x
i
[
p
_ _
= E min
1<i<N
[X x
i
[
p
_ _
=
_
R
d
min
1<i<N
[ x
i
[
p
P
X
(d):
(28)
Notice that the quantization error only depends on the distribution of X, whereas the
Voronoi quantizer q

p
= min X Y
p
, Y : , [Y()[ < N
_ _
: (29)
To carry out phase 2 (grid optimization), one derives from (28) that the quantization error
X
^
XX

p
behaves as symmetric Lipschitz continuous function of the components of the
grid := x
1
, . . . , x
N
(with the temporary convention that some elementary quantizers x
i
may be stuck so that [[ < N). One shows (see Abaya and Wise 1992; Page`s 1997; Graf
and Luschgy 2000) that
X
^
XX

p
, [[ < N, always reaches a minimum
at some grid
+
which takes its values in the convex hull of the support of P
X
. One proceeds
by induction. If N = 1, the existence of a minimum is obvious by convexity; then, one may
assume without loss of generality that [X()[ > N. Then [ [[X
^
XX

p
< m
N1
,
[[ < N, with m
N1
:= minX
^
XX

p
, [[ < N 1 = X
^
XX
+
, N1

p
, is a non
empty compact set for small enough . 0 since it contains
+, N1
for some
appropriate X()
+, N1
. This implies the existence of an optimal grid
+, N
. Then,
following (29),
^
XX
+
, N
is the best L
p
approximation of X over the random vectors taking at
most N values, that is,
X
^
XX
+, N

p
= min X Y
p
, Y : R
d
, [Y()[ < N
_ _
: (30)
As an example, an optimal L
2
quantization of the normal distribution is given in Figure 1.
We will need the following properties (see Page`s 1997; Graf and Luschgy 2000; and
references therein).
Property 1. If P
X
has an innite support, any Noptimal grid
+
has N pairwise distinct
components, P
X
(
N
i=1
@C
i
(
+
)) = 0 and N min
[[<N
X
^
XX

p
is decreasing.
Property 2. If the support of P
X
is everywhere dense in its convex hull H
X
, any Noptimal
grid lies in H
X
and any locally Noptimal grid lying in H
X
has exactly N distinct
components.
Property 3. The minimal L
p
quantization error goes to 0 as N :
lim
N
min
[[<N
X
^
XX

p
= 0:
As a matter of fact, set
N
:= z
1
, . . . , z
N
where (z
k
)
kN
is everywhere dense in R
d
. Then,
min
[[<N
X
^
XX

p
< X
^
XX
N

p
goes to 0 by the Lebesgue dominated convergence
theorem.
The rate of this convergence to zero turns out to be a much more challenging problem.
The solution, often referred to as Zadors theorem, was completed by several authors (Zador
1982; Bucklew and Wise 1982; Graf and Luschgy 2000).
Theorem 3 (Asymptotics). If E[X[
p
, for some . 0, then
Solving multidimensional discretetime optimal stopping problems 1021
lim
N
N
p=d
min
[[<N
X
^
XX

p
p
_ _
= J
p,d
_
[ g[
d=(dp)
(u) du
_ _
1p=d
, (31)
where P
X
(du) = g(u):
d
(du) , l
d
(
d
Lebesgue measure on R
d
). The constant J
p,d
corresponds to the case of the uniform distribution on [0, 1]
d
.
Little is known about the true value of the constant J
p,d
except in onedimension, where
J
p,1
=
1
2
p
( p1)
and in twodimensions, where, for example, J
2,2
=
5
18
3
_
(see Gersho and Gray
Figure 1. Optimal L
2
quantization of the normal distribution A(0, I
2
) with a 500tuple and its
Voronoi tessellation
1022 V. Bally and G. Page`s
1982; Graf and Luschgy 2000). Nevertheless some bounds are available, based on the
introduction of random quantization grids (see Zador 1982; Cohort 2003). Thus, as d ,
J
p,d
~ (d=2e)
p=2
(see Graf and Luschgy 2000).
Theorem 3 says that min
[[<N
X
^
XX

p
= O(N
1=d
): this means that optimal
quantization of a distribution P
X
produces (for every grid size N) some grids with the
same rate as that obtained with uniform lattice grids (when N = m
d
) for U([0, 1]
d
)
distributions (in fact, even then, uniform lattice grids are never optimal if d > 2).
Example (Numerical integration). On the one hand, for every p > 1,
X
^
XX

p
p
= max
_
R
d
[j j q
[
p
d P
X
, j Lipschitz continuous , [j]
Lip
< 1
_ _
(the equality stands for the function j : min
x
i
[ x
i
[). This induces a propagation of
the L
p
quantization error by Lipschitz functions (already used in the proof of Theorem 2).
On the other hand, from a numerical viewpoint,
[Ej(
^
XX

1
< [j]
Lip
X
^
XX

p
: (32)
with
Ej(
^
XX
) =
_
R
d
jdq
(P
X
) =
N
i=1
P(X C
i
()) j(x
i
): (33)
(The parameter of interest is mainly p = 2, for algorithmic reasons.) The numerical
computation of E j(
^
XX

p
. See Page`s (1997) and Fort and Page`s (2002) for further numerical integration
formulae (Holder, c
2
, locally Lipschitz continuous functions).
3.2. How to obtain optimal quantization
As far as numerical applications of optimal quantization of a random vector X are
concerned, it has been emphasized above that we need an algorithm which produces an
optimal (or at least a suboptimal) grid
+
:= x
+
1
, . . . , x
+
N
, the P
X
mass of its Voronoi
tessellation (P
X
(C
i
(
+
)))
1<i<N
, and the resulting L
p
quantization error X
^
XX
+

p
.
3.2.1. Onedimensional quadratic setting (d = 1, p = 2)
In onedimension, an algorithm, known as Lloyds Method I, appears as a byproduct of the
uniqueness problem for optimal grids (in fact stationary grids, see (36) below): for every
grid of size N, one sets
T() =
_
C
i
()
P
X
(d)[P
X
(C
i
())
_ _
1<i<N
:
Solving multidimensional discretetime optimal stopping problems 1023
The grid T() has size N and satises X
^
XX
T()

2
< X
^
XX

2
. If P
X
has a log
concave density function, then T is contracting (see Kieffer 1982; Lloyd 1982; Trushkin
1982). Its unique xed point
+
is clearly an optimal grid and the resulting (deterministic)
iterative algorithm,
t1
= T(
t
), converges exponentially fast towards
+
.
3.2.2. Multidimensional setting (d > 2)
Lloyds Method I formally extends to higher dimensions. Since there are always several
suboptimal quantizers, T can no longer be contracting, although it still converges to some
stationary quantizer (see (36) below), usually not optimal, even locally. Its major drawback
is that it involves at each step several integrals
_
C
i
()
jd P
X
(another is that it only works
for p = 2). This suggests randomizing Lloyds Method I by computing these integrals using
Monte Carlo simulations of P
X
distributed randomvectors.
There is another randomized procedure that can be called upon to nd the critical points
of a function when its gradient admits an integral representation with respect to a
probability distribution: stochastic gradient descent (the stochastic counterpart of
deterministic gradient descent). Let us recall briey what this procedure is.
Let V be a differentiable potential function V : R
M
R such that
lim
[ y[
V( y) = , [=V[
2
= O(V), =V is Lipschitz continuous, =V = 0 is locally
nite, and =V has an integral representation =V( y) = E=
y
v( y, X), where X is an R
L

valued random vector. Let (
t
)
t>1
be a sequence of positive gain parameters satisfying
t
= and
2
t
, . Classical stochastic approximation theory says that the
recursive algorithm
Y
t1
= Y
t
t1
=
y
v(Y
t
,
t1
),
t
i:i:d:,
1
~
L
X, (34)
converges almost surely towards some critical point y
+
=V = 0 of V (for various results
in this direction, see Duo 1997; Kusher and Yin 1997). Under some additional assumptions,
one shows that y
+
is necessary a local minimum (see Pemantle 1990; Lazarev 1992;
Brandie`re and Duo 1996).
Let us return to our optimal quantization problem. We have already noticed that equation
(28) denes a symmetric continuous function on (R
d
)
N
, namely
D
X, p
N
(x
1
, . . . , x
N
) := X
^
XX
x
1
,...,x
N

p
p
= E(d
X, p
N
(x, X)),
with
d
X, p
N
(x
1
, . . . , x
N
, ) := min
1<i<N
[ x
i
[
p
, (x
1
, . . . , x
N
) (R
d
)
N
, R
d
:
D
X, p
N
and d
X, p
N
are called distortion and local distortion functions, respectively. One can show
for p = 2, see Graf and Luschgy 2000; otherwise, see Page`s 1997 that for every p . 1, D
X, p
N
is continuously differentiable at every Ntuple x := (x
1
, . . . , x
N
) (R
d
)
N
such that x
i
,= x
j
,
i ,= j and P
X
(
N
i=1
@C
i
(x)) = 0. The gradient =D
X, p
N
(x) is given by a formal differentiation,
1024 V. Bally and G. Page`s
=D
X, p
N
(x) := E(=
x
d
X, p
N
(x, X)) =
_
R
d
@d
X, p
N
(x, )
@x
i
P
X
(d)
_ _
1<i<N
, (35)
with
@d
X, p
N
@x
i
(x, ) := p1
C
i
(x)
[x
i
[
p1
x
i
[x
i
[
, 1 < i < N
(set
~
00=
~
00 :=
~
00). Equality (35) still holds when p = 1 if P
X
is continuous. A grid
x
1
, . . . , x
N
(with N pairwise distinct components and P
X
negligible Voronoi tessellation
boundaries) is
a stationary grid if =D
X, p
N
(x
1
, . . . , x
N
) = 0: (36)
Then, following Property 1, any optimal grid is stationary.
Setting V := D
X, p
N
and plugging the above formula for =
x
d
X, p
N
into the abstract stochastic
gradient procedure (34) yields (reverting to the grid notation
t
= X
t
1
, . . . , X
t
N
)
t1
=
t
t1
p
=
x
d
X, p
N
(
t
,
t1
),
0
R
d
, [
0
[ = N,
t
i:i:d:,
1
~ P
X
, (37)
where the step sequence (
t
)
t>1
is (0, 1)valued and satises, as usual,
t
= and
2
t
, . The fact that
t
(0, 1) for every t N ensures that [
t
[ = N.
Unfortunately, the assumptions that make the stochastic gradient descent almost surely
converge are never satised by the L
p
distortion function D
X, p
N
which is not a true potential
function for at least two reasons. First, the gradient =D
X, p
N
does not exist at Ntuples of
(R
d
)
N
having stuck components (although it remains locally bounded). So, it cannot be
Lipschitz or even Holder continuous. Second, the L
p
distortion D
X, p
N
(x) does not go to
innity as [x[ := [x
1
[ . . . [x
N
[ (only if min
1<i<N
[x
i
[ ).
However, D
X, p
N
turns out to be a fairly good potential function for practical
implementation, especially in the quadratic case p = 2 (see Figures 1 and 3 and the
CLVQ algorithm below). One does not observe in simulations some components of
t
becoming asymptotically stuck in (37) when t in spite of the structure of the
potential function. Although many stationary grids exist, it does converge toward a grid
+
,
apparently close to optimality.
This quadratic case corresponds to a very commonly implemented procedure in automatic
classication called Competitive Learning Vector Quantization (CLVQ), also known as the
Kohonen algorithm with 0 neighbour. As we are motivated by numerical applications, we
need to compute the target grid
+
with great accuracy. This usually means that
signicantly more iterations of the stochastic optimization procedure (37) are required than
in other applications. When p = 2, the recursive procedure (37) can be described in a more
geometric way. Set
t
:= X
t
1
, . . . , X
t
N
.
Solving multidimensional discretetime optimal stopping problems 1025
1. Competitive phase. Select i(t 1) argmin
i
[X
t
i
t1
[ (closest neighbour).
2. Learning phase. Set
X
t1
i( t1)
:= X
t
i( t1)
t1
(X
t
i( t1)
t1
),
X
t1
i
:= X
t
i
, i ,= i(t = 1):
Concerning numerical implementations of the algorithm, notice that, at each step the grid
t1
lives in the convex hull of
t
and
t1
since the learning phase is simply a homothety
centred at
t1
with ratio 1
t1
. 0 (see Figure 2). This has a stabilizing effect on the
procedure, which explains why one veries in simulations that the CLVQ algorithm does
behave better than its nonquadratic counterparts. Finally, one often renes the CLVQ by
processing a randomized Lloyds I method (see Page`s and Printems 2003).
Figure 3 shows some planar quantizations obtained using the CLVQ algorithm for some
simulations in the quadratic case, see Page`s and Printems 2003).
The main drawback of the CLVQ algorithm is that it is slow: roughly speaking, like most
recursive stochastic algorithms, its rate of convergence is ruled by a central limit theorem
(CLT) at rate 1=
t
_
(cf. Duo 1997). Moreover, at each iteration, the computation of the
winning index i
0
in the learning phase is timeconsuming if N is large. Speeding up the
algorithm requires both these aspects to be addressed. First, one may use some deterministic
sequences with low discrepancy instead of pseudorandom numbers to implement the
algorithm (see Lapeyre et al. 1990) or call upon some averaging methods which reduce the
variance in the CLT theorem (see Pelletier 2000, and the references therein). To cut down
X
t
i
X
t
i
0
t 1 Hom(
t 1
; 1
t 1
)
X
t
j
X
t
1
Figure 2. Competitive Learning Vector Quantization
1026 V. Bally and G. Page`s
the winning index search time, one may implement some fast (approximate) search
procedures. For recent developments, see Gersho and Gray (1992, pp. 332 and 479).
The last practical question of interest is the choice of the starting grid
0
. This question
is clearly connected with the existence of several local minima for the distortion. One way
is to start from a random Ngrid obtained by simulation. An alternative is to process a
socalled splitting method: one progressively adds some (optimal) quantizers with smaller
size to the current state grid: the heuristic idea is that if an Ngrid
+
N
(almost) achieves the
the minimal distortion, then the (N )grid ( N)
+
N
+
+

p
can be obtained online for free as a byproduct of the CLVQ stochastic
gradient descent as soon as
t
converges to
+
(this holds true for any p > 1).
Proposition 7. Let p > 1. Assume that X L
p
, . 0, and P
X
weights no hyperplane. Set,
for every t > 1,
Figure 3. Planar quantizations obtained by CLVQ: (a) P
X
= U([0, 1]
2
); (b) P
X
:= S(1=2)
2
; (c)
P
X
:= A 0
1
2
_
=2
2
_
=2 1
_ _ _ _
.
Solving multidimensional discretetime optimal stopping problems 1027
p
t
i
:=
[1 < s < t[
s
C
i
(
s1
)[
t
,
the empirical frequency of the
s
falling in the ith tessellation C
i
(
s1
) of the (moving)
Voronoi tessellation of
s1
up to time t (i = 1, . . . , N). Also set
D
, t
:=
1
t
t
s=1
min
1<i<N
[
s
X
s1
i
[
, (0, p ),
the average of the lowest distance of
s
(to the power ) to the moving quantization grid
s1
= X
s1
1
, . . . , X
s1
N
, 1 < s < t.
Let
+
be a (stationary) grid (see (36)) and let A
+ :=
t
+
be the set of
convergence of
t
towards
+
. Then, on the event A
+,
\i 1, . . . , n, p
t
i
a:s:
p
i
:= P
X
(C
i
(
+
)) as t ,
\ (0, p), D
, t
a:s:
D
X,
N
(
+
) as t :
Remark 7. The above quantities p
t
and D
, t
obviously satisfy the recursive procedures
p
t
i
= p
t1
i
1
t
( p
t1
i
1
t
C
i
( X
t1
)
), D
, t
= D
, t1
1
t
D
, t1
min
1<i<N
[X
t1
i
t
[
_ _
:
(38)
In fact, the conclusion of Proposition 7 still holds if (1=t)
t>1
is replaced in (38) by any
positive sequence (~
t
)
t>1
satisfying
t
~
t
= and
t
~
( p)=
t
, . Natural choices for
~
t
are 1=t or the original step
t
used in the learning phase of the CLVQ algorithm,
depending on the range of the simulation (see Page`s and Printems 2003).
Proof. For notational convenience, a generic Ntuple x = (x
1
, . . . , x
N
) (R
d
)
N
will be
denoted by its grid notation = x
1
, . . . , x
N
. Let : (R
d
)
N
3R
d
R be a Borel function
satisfying [(, )[ < M[[
t>1
t
( p)=
, imply
that the martingale
1<s<t
((
s1
,
s
) j(
s1
))=s converges almost surely towards a
nite random variable Z
t
s=1
(
s1
,
s
) j(
s1
)
a:s:
0:
Finally, the continuity of j at
+
yields
1
t
t
s=1
(
s1
,
s
)
a:s:
j(
+
) on A
+:
1028 V. Bally and G. Page`s
One rst applies this result to the indicator functions
i
(, ) := 1
C
i
()
(), 1 < i < n; the
associated j functions are continuous at
+
because P
X
weights no hyperplane.
Then set
(, ) := r() min
1<i<N
 x
i

+, (
s
,
s1
) =
min
1<i<N

s1
X
s
i

1
:= (
1
0
, . . . ,
1
n
), . . . ,
t
:= (
t
0
, . . . ,
t
n
), . . . . Our aim is to produce for every
k 0, . . . , n some optimal grids
k
:= x
k
1
, . . . , x
k
N
k
with size N
k
, their transition
kernels [
k
ij
] and their L
p
quantization errors. Note that, if one sets
p
k
i
:= P(X
k
C
i
(
k
)), p
k
ij
:= P(X
k1
C
j
(
k1
) X
k
C
i
(
k
)),
then
k
ij
:= P(
^
XX
k1
= x
k1
j
[
^
XX
k
= x
k
i
) =
p
k
ij
p
k
i
,
i = 1, . . . , N
k
, j = 1, . . . , N
k1
, k = 0, . . . , n 1:
In the quadratic case ( p = 2), the extended CLVQ algorithm proceeds as follows:
1. Initialization phase.
Initialize the n 1 starting grids
0
k
:= x
0, k
1
, . . . , x
0, k
N
k
, k = 0, . . . , n, of the n 1
CLVQ algorithms that will quantize the distributions
k
.
Initialize the marginal counter vectors
k,0
i
:= 0, i = 1, . . . , N
k
, for every
k = 0, . . . , n.
Solving multidimensional discretetime optimal stopping problems 1029
Initialize the transition counters
k,0
ij
:= 0, i = 1, . . . , N
k
, j = 1, . . . , N
k1
,
k = 0, . . . , n 1.
2. Updating t ~ t 1. At step t, the n 1 grids
t
k
, k = 0, . . . , n, have been
obtained. We now use the sample
t1
to carry on the optimization process, building
up the grids
t1
k
as follows. For every k = 0, . . . , n:
Simulate
t1
k
(using
t1
k1
if k > 1).
Select the winner in the kth CLVQ algorithm, that is, the only index
i
t1
k
1, . . . , N
k
satisfying
t1
k
C
i
t1
k
(
t
k
):
Update the kth CLVQ algorithm:
\i 1, . . . , N
k
,
t1
k,i
=
t
k,i
t1
1
i=i
t1
k
(
t
k,i
t1
k
):
Update the kth marginal counter vector
k, t
:= (
k, t
i
)
1<i<N
k
:
\i 1, . . . , N
k
,
k, t1
i
:=
k, t
i
1
i=i
t1
k
:
Update the (quadratic) distortion estimator D
k, t
:
D
k, t1
:= D
k, t
1
t 1
([
t
k,i
t1
k
t1
[
2
D
k, t
):
Update the transition counters
k, t
:= (
k, t
ij
)
1<i<N
k1
,1<j<N
k
(k > 1):
\i 1, . . . , N
k1
, \j 1, . . . , N
k
,
k1, t1
ij
:=
k1, t
ij
1
i=i
t1
k1
, j=i
t1
k
:
Update the transition kernels (
k, t
ij
)
1<i<N
k1
,1<j<N
k
(k > 1):
k, t1
ij
:=
k, t1
ij
k, t1
i
(possibly only once at the end of the simulation process!):
Following Proposition 7 one has, on the event
t
k
+
k
, D
k, t
t
D
X
k
,2
N
k
(
+
k
) and
k, t
t
t
( p
+, k
i
)
1<i<N
k
= (P(X
k
C
i
(
+
k
)))
1<i<N
k
(thanks to (39)):
The same martingale approach shows that, on the event
t
k
+
k
t
k1
+
k1
,
k, t
t
t
p
+, k
ij
= (P(X
k
C
i
(
+
k
), X
k1
C
j
(
+
k1
)))
1<i<N
k
,1<j<N
k1
so that, on the event
t
k
t
+
k
, k = 0, . . . , n, for every k 1, . . . , n,
k, t
ij
t
+, k
ij
, 1 < i < N
k
, 1 < j < N
k1
:
The main features of this algorithm are essentially those of the original CLVQ algorithm.
Moreover, note that the forward optimization of the grids and the weight computation are not
recursive in k, so there is no deterioration of the optimization process as k increases.
1030 V. Bally and G. Page`s
When p ,= 2 we employ an L
p
optimization procedure (extended L
p
CLVQ) by using the
general L
p
formula (37) in the grid updating phase.
3.3.2. A priori optimal dispatching of optimal quantizer sizes
The above grid optimization procedures for Markov chains (extended CLVQ and its L
p

variants) produce optimal grids
k
for a given dispatching of their sizes N
k
. This raises the
optimization problem of how to dispatch a priori the sizes N
0
, . . . , N
n
of the quantization
grids, assumed to be L
p
optimal, if one wishes to use at most N > N
0
. . . N
n
elementary quantizers.
Let p > 1. Formula (27) in Theorem 2 provides some positive real coefcients
d
0
, d
1
, . . . , d
n
such that, for any sequence of quantizations (
^
XX
k
)
0<k<n
,
u
0
(X
0
) ^ uu
0
(
^
XX
0
)
p
<
n
i=0
d
i
X
i
^
XX
i

p
: (40)
Then, the best we can do is to specify the sizes N
k
that minimize the righthand side of (40)
when all the quantization vectors
^
XX
k
s are L
p
optimal, that is,
min
( N
0
...N
n
<N)
n
i=0
d
i
3X
i
^
XX
i

p
with X
i
^
XX
i

p
= min
[[<N
i
X
i
^
XX
i

p
, 0 < i < n:
This will also produce an asymptotic bound for the resulting L
p
error
^ uu
0
(
^
XX
0
) u
0
(X
0
)
p
. The key is Theorem 3, which says that N
1=d
k
min
[[<N
k
X
k
^
XX
k

p
converges to some positive constant as N
k
.
Proposition 8. Assume that all the distributions L(X
k
) have an absolutely continuous part
j
k
, 0 < k < n. Let N N
+
. Set, for every i 0, . . . , n,
N
i
:= [r
i
N] and r
i
:=
a
i
0<j<n
a
j
with a
i
:= j
i

1= p
d=(dp)
d
i
_ _
d=(d1)
, 0 < i < n:
(41)
Assume that all the quantizations
^
XX
k
of the X
k
are L
p
optimal with size N
k
. Then,
lim
N
N
1=d
max
0<i<n
u
i
(X
i
) ^ uu
( N
i
)
i
(
^
XX
i
)
p
< J
1= p
p,d
n
i=0
a
i
_ _
11=d
(42)
where J
p,d
is dened in Theorem 3.
The following simple lemma solves the continuous bit allocation problem.
Lemma 1. Let a
0
, . . . , a
n
be some positive real numbers. Then the function
r := (r
0
, . . . , r
n
)
n
i=0
a
i
r
1=d
i
dened on the set r
0
. . . r
n
= 1, r
i
> 0,
0 < i < n reaches its minimum at
Solving multidimensional discretetime optimal stopping problems 1031
r :=
a
d=(d1)
i
a
d=(d1)
0
. . . a
d=(d1)
n
_ _
0<i<n
so that
n
i=0
a
i
r
1=d
i
=
n
i=0
a
d=(d1)
i
_ _
11=d
:
Note that this minimum value is nondecreasing as a function of the a
i
and that
n
j=0
a
d=(d1)
j
_ _
11=d
< (n 1)
1=d
n
i=0
a
i
:
Proof of Proposition 8. First, rewrite (40) as follows:
N
1=d
u
0
(X
0
) ^ uu
( N
0
)
0
(
^
XX
0
)
p
<
n
i=0
d
i
N
i
N
_ _
1=d
(N
1=d
i
X
i
^
XX
i

p
),
keeping in mind that every
^
XX
k
is L
p
optimal with size N
k
. Then,
lim
N
N
1=d
max
0<i<n
u
i
(X
i
) ^ uu
( N
i
)
i
(
^
XX
i
)
p
<
n
i=0
d
i
lim
N
N
N
i
_ _
1=d
N
1=d
i
X
i
^
XX
i

p
_ _
_ _
=
n
i=0
d
i
r
1=d
i
lim
N
N
1=d
i
X
i
^
XX
i

p
_ _
:
Now, as N
i
for every i 0, . . . , n, Theorem 3 implies that
lim
N
N
1=d
i
X
i
^
XX
i

p
= (J
p,d
j
i

d=(dp)
)
1= p
. Then
lim
N
N
1=d
max
0<i<n
u
i
(X
i
) ^ uu
( N
i
)
i
(
^
XX
i
)
p
< J
1= p
p,d
n
i=0
d
i
j
i

1= p
d=dp
r
1=d
i
< J
1= p
p,d
n
i=0
j
i

d=( p(d1))
d=(dp)
d
d=(d1)
i
_ _
11=d
:
and the theorem is proved. h
Remark 8. The above asymptotic bound (42) is not fully satisfactory: if one wishes to
optimize the number n of time steps as a function of the number N of elementary quantizers,
one needs some a priori error bounds for a given couple (n, N). To this end we will need to
control the distributions of the X
k
, namely to dominate them by a xed distribution up to
some afne time scaling. This can be done, for example, with some uniformly elliptic
diffusions (X
t
k
)
0<k<n
(see Section 6.1) or with their Doleans exponentials (see Bally et al.
2002b).
1032 V. Bally and G. Page`s
4. Weight estimation in the quantization tree: statistical error
In this short section we summarize some results developed in Bally and Page`s (2003). First,
although the question of the rate of convergence of the quantization grid optimization and
the companion parameter estimation is natural, one must keep in mind that this is a one
shot phase of the process.
We will not discuss rigorously the error induced on the coefcients
k
ij
by the online
estimation procedure (38): it would lead us too far, given the partial results available on the
rate of convergence of the CLVQ algorithm. However, let us mention that, under some
appropriate assumptions (see Duo 1997, p. 52), one can show that a recursive stochastic
algorithm like (34) with step
t
satises a CLT with rate 1=
t
_
as t . Thus, if
t
~ =t (with large enough), a standard CLT holds as for the regular Monte Carlo
method (for more details concerning the rate of convergence of the CLVQ algorithm in one
dimension, see Page`s and Printems 2003).
A less ambitious but still challenging problem is the following. Consider some xed grids
0
, . . . ,
n
, their companion parameters and the related quantization tree algorithm (23).
What is the error induced by the use of Monte Carlo estimated weights ~
k
ij
instead of the
true weights
k
ij
? This third type of error the statistical error is extensively investigated
in Bally and Page`s (2003), but let us summarize the situation (a CLT is given in Bally
2002).
This error strongly depends on the structure of the nonlinearity. In the linear case (no
reection: h(t, ) := if t ,= T, and f : = 0), the composition of the empirical frequency
matrices ( ~
k
ij
)
0<k<n
yields ~
n
i
= (1=M)
M
=1
1
^
XX
n
=x
n
i
so that
~
UU
n
= (1=M)
M
=1
h(T,
^
XX
n
).
This is but a standard Monte Carlo method for computing Eh(T, X
T
), ruled by the regular
CLT: the statistical error is O(1=
M
_
). In the nonlinear case, the empirical frequencies
cannot be composed. In Bally and Page`s (2003) we focus on the regular Snell envelope of
(h(X
k
))
0<k<n
where h is a bounded Lipschitz continuous function ( f = 0) and X
k
stands
either for the diffusion X
kT=n
(with Lipschitz continuous coefcients) or for its Euler
scheme X
k
, k = 0, . . . , n. Then the statistical error depends on the regularity of the
obstacle h as follows:
E[~ uu
0
(x
0
) ^ uu
0
(x
0
)[ < C
b,, h,T
nN
_
n
k=1
X
k
^
XX
k

2
r
n, N, M
M
_ , C
b,, h,T
. 0, (43)
where r
n, N, M
=
n
_
N
2
=
M
_
if h is semiconvex and X
k
is the diffusion X
kT=n
, and
r
n, N, M
= n
3=4
N
2
=
nM
_
otherwise. Optimality of the quantizations
^
XX
k
is not required.
5. Finiteelement method versus quantization: a rst
comparison
A quick comparison of the niteelement method, on the one hand, and the quantization
method, on the other, shows that there is a strong analogy between them: in both cases one
computes the approximation of the solution at a nite number of points using a weighted
Solving multidimensional discretetime optimal stopping problems 1033
sum, starting from a nal condition also evaluated at a nite number of points. The aim of
this short section is to obtain a slightly deeper insight into this analogy. In order to compare
the two methods we will use the simplest possible example, the heat equation
(@
t
1
2
)u = 0, u
T
= f : (44)
Let j[) :=
_
R
d
j(x)(x)dx denote the inner product of L
2
(R
d
,
d
). The weak form of (44)
is given by
f [j) u
t
[j)
1
2
_
T
t
d
i=1
@u
s
@x
i
@j
@x
i
_ _
ds = 0,
for every t [0, T] and every test function c
2
c
(R
d
,
d
). The rst step of both approaches
is the time discretization. Let n N
+
and t
k
= kh, h := T=n (we temporarily abandon the
notation for the step because of the Laplacian). Then, we consider the discrete problem
f [) u
t
k
0
[)
h
2
n
k=k
0
d
i=1
@u
t
k
@x
i
@
x
i
_ _
= 0:
We use the dynamic programming principle in order to solve this problem by induction: we
put u
n
:= f and dene u
k
to be the solution of the elliptic problem
u
k1
[) u
k
, )
h
2
d
i=1
@u
k
@x
i
@
@x
i
_ _
= 0: (45)
Equation (45) can be seen either as a Dirichlet problem on the whole of R
d
with condition
zero at innity or as a problem restricted to a large enough ball. This is a technical point
which requires some special treatment (an additional error appears if we restrict to a ball),
but this is of no interest here.
At this stage, one calls upon the niteelement method (or the quantization) in order to
solve (45). In both cases, one builds up a grid := x
1
, . . . , x
N
R
d
and looks for some
weights
k
ij
in order to approximate u
k
by
^ uu
k
(x
i
) =
1<j<N
k
ij
^ uu
k1
(x
j
): (46)
5.1. The niteelement method
In the niteelement method, one tries to nd the grid which best ts in the geometry of the
problem. The same aim appears in the quantization method when we look for an optimal
quantization. The simplest grid used in the niteelement method is based on triangles.
Each x
i
is the vertex of several triangles and we may consider it as the centroid of the
polygon completed by these triangles. We denote by A
i
:= x
i
1
, . . . , x
i
k
the vertices of
this polygon, that is, the points which are vertices of a triangle having x
i
as a vertex. These
are the neighbours of x
i
.
1034 V. Bally and G. Page`s
Now, having dened a grid , we focus on the construction of the weights
k
ij
and on
their signicance. We begin with the niteelement method. One constructs the trials
T
i
, 1 < i < N, in the following way. One sets T
i
(x
i
) := 1 and T
i
(x
j
) := 0 for every
x
j
A
i
. Then one sets T
i
to be linear on each triangle so we obtain a pyramid centred at
x
i
whose basis is the polygon. The trial is null outside the polygon. The idea is to replace
(45) by a nitedimensional problem to be solved in the space o
N
spanned by T
i
,
1 < i < N. Note that the trials are not orthogonal. In any case, each function U o
N
may
be written as U(x) :=
N
i=1
U
i
T
i
(x) (we use lowercase letters for general functions in L
2
and capital letters for functions in o
N
): Note also that U
i
= U(x
i
) and that
@U
@x
i
(x) =
N
j=1
U
j
@T
j
@x
i
(x):
There are two ways of solving the nitedimensional problem, leading to the same result. The
rst one is the Galerkin method based on the weak form of the PDE and the second one is
the RieszReilich method based on the Dirichlet principle. We use the second here. The
solution of (45) is given by the function u H
1
0
(R
d
) which minimizes the energy
e(u) =:
_
R
d
h
2
d
i=1
@u
@x
i
(x)
2
u(x)u
k1
(x)
_ _
dx:
The discretization consists in solving the above minimum problem in o
N
: Since each
function in this space may be identied with the vector U := (U
1
, . . . , U
N
), one may write
the discrete problem as follows. Let
E(U) :=
h
2
N
i, j=1
U
i
K
ij
U
j
N
i=1
U
i
U
k1
i
where K
ij
:=
1<r<N
@T
i
=@x
p
[@T
j
=@x
r
) and U
k1
i
:= ^ uu
k1
(x
i
), that is, the coefcients of
^ uu
k1
. Note that u
k1
has been changed into ^ uu
k1
: the hat stresses that we are working with
the approximation computed at the step k 1 of the dynamic principle algorithm. Now we
have a nitedimensional problem
^ uu
k
(x
i
) =
1<j<N
k
ij
^ uu
k1
(x
j
), with
k
ij
:=
2
h
(K)
1
ij
(see (46)), whose solution is given by U
k
= 2h
1
K
1
U
k1
:
5.2. Quantization method
We turn now to the quantization method where
k
ij
= P(X
k1
C
j
(
k1
)[X
k
C
i
(
k
)).
Writing P
h
(x, dy) := P(X
k1
dy[X
k
= x), then
k
ij
=
_
C
i
(
k
)
P
h
(1
C
j
(
k1
)
)()dP X
1
k
(d) ~
1
h
P
h
(1
C
j
(
k1
)
)(x
i
):
Solving multidimensional discretetime optimal stopping problems 1035
So the part of the trial T
i
in the niteelement method is played here by the indicator function
of the Voronoi tessellation of x
i
: Both h
1
T
i
and h
1
1
C
i
are approximations of the Dirac
mass at x
i
. Finally, we note that
1<j<N
^ uu
k1
(x
j
)
k
ij
~ P
h
^ uu
k1
(x
i
):
The FeynmanKac formula shows that ^ uu
k
is the solution of (45) with nal condition ^ uu
k1
.
5.3. Conclusion
In both methods the rst step is a time discretization. Then, as a second step, both of them
solve the same PDE problem (45) using a space approximation procedure. The nite
element method relies on the variational principle, whereas the quantization method is based
on the probabilistic interpretation of the PDE. In the niteelement method this leads to a
linear system. Solving it amounts to inverting the sparse matrix K (only neighbours yield
nonnull entries). In the quantization method, we directly obtain the solution of the system:
the weights are computed using the Monte Carlo method. Once again, the matrix K is
sparse, numerically speaking, since the Brownian motion does not go too far in one time
step: many weights
k
ij
will be close to 0 except for the neighbours.
Finally, note that formula (46), however it arises, reads as a nitedifference scheme built
on a grid which is no longer uniform with weights which are no longer h
1
. It looks like a
nitedifference scheme adapted to the geometry of the problem.
Beyond these similarities, it appears that in the niteelement method one projects the
function to be computed as an expectation, whereas in the quantization method one projects
the underlying process X involved in the FeynmanKac formula.
From a numerical point of view, there is a connection between the conditioning of the
matrix K to be inverted in the niteelement method and the asymptotic variance term in the
CLT that heuristically gives the rate of convergence of the CLVQ algorithm: when K has
a small eigenvalue, the variance of the CLVQ is large i:e: it converges more slowly.
6. Applications to RBSDEs and American option pricing
6.1. RBSDEs and optimal stopping of Brownian diffusions
In Section 2.1 we pointed out that (h, f )Snell envelopes (U
k
)
0<k<n
and (U
k
)
0<k<n
of the
sampled diffusion (X
kT=n
)
0<k<n
or its Euler scheme (X
k
)
0<k<n
each provide natural
discretization schemes for the RBSDE (according to the ability to simulate the diffusion).
For a time discretization step T=n, these Snell envelopes are both related to the functions
f
k
(x, u) := (T=n) f (t
k
, x, u) and h
k
(x) := h(t
k
, x), k = 0, . . . , n. They satisfy
U
n
:= h(T, X
T
), U
k
:= max(h
k
(X
kT=n
), E(U
k1
f
k
(X
( k1)T=n
, U
k1
)[T
kT=n
)),
0 < k < n 1, (47)
1036 V. Bally and G. Page`s
U
n
:= h(T, X
n
), U
k
:= max(h
k
(X
k
), E(U
k1
f
k
(X
k1
, U
k1
)[T
kT=n
)),
0 < k < n 1: (48)
Throughout this section, we denote by u
( n)
k
the function satisfying U
k
:= u
( n)
k
(X
k
).
Lemma 2 below shows that the Lipschitz coefcients of functions u
( n)
k
and u
( n)
k
do not
explode as n goes to innity. Its (easy) proof is left to the reader.
Lemma 2. Assume that (5)(7) hold.
(a) Diffusion. The (h, f )Snell envelope (U
k
)
0<k<n
dened by (11) satises
U
k
= u
( n)
k
(X
t
k
) with
[u
( n)
k
]
Lip
< K
1
exp(K
0
(T t
k
))
n, k
n
,
where
K
1
:=
0
1
1
0
=2
, K
0
:=
0
(2
0
=2), lim
n
sup
0<k<n
[
n, k
[ = 0:
If, furthermore, the transition is asymptotically at with parameter a .
0
, then
[u
( n)
k
]
Lip
< K
2
n, k
n
: (49)
with K
2
:=
0
max(1, 1=(a
0
)) and lim
n
sup
0<k<n
[
n, k
[ = 0. When f does not
depend on u (regular optimal stopping problems) then K
2
:=
0
max(1, 1=a) and (49)
holds if the diffusion is asymptotically at.
(b) Euler scheme. The (h, f )Snell envelope (U
k
)
0<k<n
dened by (14) satises
U
k
= u
( n)
k
(X
k
) and the functions u
( n)
k
are Lipschitz continuous. The same bounds
as those obtained for the Lipschitz coefcients of u
( n)
k
in part (a) hold.
Remark 9. In the asymptotically at case, the minimal assumption on f is a . [ f ]
Lip
.
Remark 10. A less precise statement of Lemma 2 could be: there exist real constants
~
KK
0
,
~
KK
1
. 0 depending only on b, , h and f , such that,
\n > 1, \k 0, . . . , n 1, [u
( n)
k
]
Lip
<
~
KK
1
e
~
KK
0
(Tt
k
)
:
Furthermore, if the diffusion is asymptotically at enough, one may set
~
KK
0
:= 0.
Lemma 2 yields the following bounds for the coefcients d
( n)
i
dened by equation (27).
Proposition 9. Let p > 1. For every xed n and every k 0, . . . , n,
\ i k, . . . , n,
d
( n)
i
(1 T=n
0
)
k
= (
0
(2
2, p
)K
1
e
K
0
(Tt
i
)
)e
0
( t
i
t
k
)
i, k, n
n
where
Solving multidimensional discretetime optimal stopping problems 1037
lim
n
max
0<k<i<n
[
i, k, n
[ = 0
so that
d
:= sup
n>1
max
0<i<n
d
( n)
i
, :
We use the notation for RBSDEs introduced in Section 1. The aim here is to derive some
a priori error bounds for Y
t
k
^ uu
k
(
^
XX
k
)
p
as a function of the total number N of
elementary quantizers and of the number n of time steps, all real constants depending on b,
, h and T. To this end, we will need some precise estimates for the probability densities
of the diffusion process (X
t
)
t[0,T]
in the uniformly elliptic case. Assume that diffusion
parameters b and satisfy the assumptions
+
>
0
I
d
,
0
. 0 (uniform ellipticity),
(b, c
b
(R
d
)) or (b, Lipschitz continuous and bounded):
(50)
Then, there exist two real constants , . 0 such that the diffusion process (X
x
t
)
t[0,T]
starting at x has a probability density function p
t
(x, y) at every time t [0, T] satisfying
p
t
(x, y) <
(2t)
d=2
exp
[ y x[
2
2t
_ _
: (51)
The bounded Lipschitz setting is due to Friedman (1975, Theorems 4.5 and 5.4), the smooth
sublinear setting follows from Kusuoka and Stroock (1985). When the diffusion is only
hypoelliptic and satises a nondegeneracy Hormander type assumption, it is also established
in Kusuoka and Stroock (1985) that (51) holds with exponent d9 > d=2. This could lead to
different dispatching rules (provided that d9 is known). An alternative approach could be to
rely on similar results established by Bally and Talay (1996) for an excited version of the
Euler scheme.
Theorem 4. Assume (5)(7) hold, that the diffusion (X
t
)
t[0,T]
satises (50) and X
0
= x. For
N > n 1, assign
N
k
:=
_
t
d=(2(d1))
k
N
t
d=(2(d1))
1
. . . t
d=(2(d1))
n
_
> 1
elementary quantizers to the optimal quantization grid
k
of X
k
or X
t
k
, 1 < k < n, and set
N
0
= 1. (Note that in fact N < N
0
. . . N
n
< N n 1.)
(a) Diffusion. Let
^
XX
k
denote the optimal L
p
quantization of the diffusion X
t
k
. Then,
\ p > 1, max
0<k<n
Y
t
k
u
k
(
^
XX
k
)
p
< C
p
e
C
p
T
n
11=d
N
1=d
1 [x[
n
_ _
, (52)
where = 1 if h is semiconvex and f is c
1,2,2
b
, and =
1
2
otherwise.
(b) Euler scheme. Let
^
XX
k
denote the optimal L
p
quantization of the Euler scheme X
k
.
Then, the above error bound holds with =
1
2
.
1038 V. Bally and G. Page`s
Some practical comments about the dispatching are in order. First, one veries using
t
k
= kT=n that
N
k
~
3d 2
2(d 1)
k
n
_ _
d=(2(d1))
N
n
as n , N , with n = o(N):
Note that the optimized ratio N
k
=N of the elementary quantiers assigned to time t
k
marginally depends upon the dimension d since
N
k
N
~
3
2n
k
n
_
when d becomes large (say d > 5): (53)
Then the (theoretical) complexity of the algorithm is approximately 9N
2
k=(8n), which is
close to the lowest possible (see (24)). Note that, for example in the Lipschitz continuous
setting, if N := n
13d=2
,
max
0<k<n
Y
t
k
^ uu
k
(
^
XX
k
)
p
<
C
p
e
C
p
T
(1 [x[)
n
_ :
Proof of Theorem 4. (a) One derives from Proposition 3 and Theorem 2 applied to the
diffusion (X
t
k
)
0<k<n
that, for every p > 1,
max
0<k<n
Y
t
k
U
k

p
<
C
p
(x)
n
with C
p
(x) := C
p
e
C
p
T
(1 [x[), C
p
. 0, and
max
0<k<n
U
k
^
UU
k

p
<
n
k=0
d
( n)
k
X
t
k
^
XX
k

p
:
Proposition 9 shows that, for every n > 1 and every k 0, . . . , n,
d
( n)
k
=
0
(1 C
b,, p
e
(
0
C
b,
)(Tt
k
)
)e
0
t
k
k, n
n
< C
0
,T, p
[
k, n
[
n
with
C
0
,T, p
:=
0
e
0
T
1
(2
2, p
)(1 C
0
)
C
0
e
(
0
C
0
)T
_ _
, C
0
:=
0
(1
0
=2),
and
lim
n
max
0<k<n
[
k, n
[ = 0:
First, set
^
XX
0
:= X
0
= x. Setting
n
:= max
0<k<n
[
k, n
[, one has
Solving multidimensional discretetime optimal stopping problems 1039
N
1=d
max
0<k<n
Y
t
k
^ uu
k
(
^
XX
k
)
p
< C
0
,T, p
n
n
_ _
n
k=0
N
N
k
_ _
1=d
3 N
1=d
k
min
[[<N
k
X
t
k
^
X
t
k
X
t
k

p
C
p
(x)
n
:
Now, the Gaussian domination inequality (51) implies that, for every k 1, . . . , n,
\ x, y R
d
, p
t
k
(x, y) <
d=2
t
k
_
Z
( y),
where Z ~
L
A(0, I
d
) and
Y
(x) is for the probability density function of a random vector Y.
Hence, for every k 1, . . . , n, N
k
> 1, and every grid := v
1
, . . . , v
N
k
R
d
of size
N
k
,
X
t
k
^
XX
t
k

p
<
d=2
 min
1<<N
k
[v
t
k
_
Z[ 
p
, (54)
=
d=2
t
k
_
Z
^
ZZ
(x)=
t
k
_

p
:
Hence
min
,[[<N
k
X
t
k
^
XX
t
k

p
<
d=2
t
k
_
min
[[<N
k
Z
^
ZZ

p
: (55)
Applying Theorem 3 to Z (i.e. to the normal distribution A(0, I
d
) on R
d
) yields
min
[[<N
k
Z
^
ZZ

p
< (1
N
)
1= p
~
JJ
p,d

Z

1= p
d=(dp)
= (1
N
)
1= p
~
JJ
p,d
(1 p=d)
(dp)=2 p
2
_
:
(56)
where
~
JJ
p,d
:= J
1= p
p,d
and lim
N
N
= 0. Combining (55) and (56) yields
N
1=d
k
min
[[<N
k
X
t
k
^
XX
t
k

p
<
(d1)=2
~
JJ
p,d
(1
N
k
)
1= p
(1 p)
(dp)=2 p
2t
k
_
,
k 0, . . . , n:
N
1=d
max
0<k<n
Y
t
k
^ uu
k
(
^
XX
k
)
p
< C
0
,T, p
n
n
_ _
1 max
1<k<n
N
k
_ _
2
_
~
JJ
p,d
1
p
d
_ _
(dp)=2 p
3
(d1)=2
n
k=1
N
N
k
_ _
1=d
t
k
_
C
p
(x)N
1=d
n
< C
n,
0
,T, p,d
1<k<n
r
1=d
k
t
k
_
C
p
(x)N
1=d
n
,
where r
k
t
d=(2(d1))
k
, 1 < k < n, r
1
. . . r
n
= 1 and N
k
:= r
k
N > 1, and C
n,
0
,T, p,d
is bounded as n by a real constant C
0
,T, p,d
. Following Lemma 1,
1040 V. Bally and G. Page`s
N
1=d
max
0<k<n
Y
t
k
^ uu
k
(
^
XX
k
)
p
< C
0
,T, p,d
n
k=1
t
d=(2(d1))
k
_ _
11=d
C
p
(x)N
1=d
n
:
Now, Jensens inequality implies that
1<k<n
t
d=(2(d1))
k
< T
d=(2(d1))
n. Setting
C
p
:= T
d=(2(d1))
C
0
,T, p,d
yields the bound (52) by setting C
p
at the appropriate value.
(b) The main modication lies in the above inequality (54). With the same notation,
X
k
^
XX
XX
k

p
=  min
1<<N
k
[v
X
k
[ [[
p
<  min
1<<N
k
[v
X
t
k
[
p

p
X
t
k
X
k

p
< X
t
k
^
XX
t
k

p
C
p
e
C
p
T
(1 [x[)
1
n
_ ,
using classical L
p
error bounds for the Euler scheme. The rest of the proof is as before. h
One can easily derive an optimized choice for the number n of time steps.
Corollary 1. (a) Lipschitz setting (quantization of the Euler scheme or of the diffusion). The
optimal number n of time steps and the resulting error bound satisfy
n ~
2d
d 1
C
p
(x)
_ _
2=(3d2)
N
2=(3d2)
and [u
0
(x) ^ uu
0
(x)[ = O(N
1=(3d2)
) = O
1
n
_
_ _
,
where C
p
(x) < C
p
e
C
p
T
(1 [x[).
(b) Semiconvex setting (quantization of the diffusion). The optimal number n of time
steps and the resulting error bound satisfy
n ~
d
d 1
C
p
(x)
_ _
d=(2d1)
N
1=2(d1)
and [u
0
(x) ^ uu
0
(x)[ = O(N
1=(2d1)
) = O
1
n
_ _
:
6.2. Numerical pricing of American exchange options by quantization
6.2.1. The test model
One considers two risky assets, a stock S
1
with a geometric dividend rate and a stock S
2
without dividend. The interest rate r is deterministic and constant. Assume that (S
1
, S
2
)
follows a BlackScholes dynamics, so that, under the riskneutral probability P, one has
d
1
t
= S
1
t
((r ) dt
1
dB
1
t
), S
1
0
:= s
1
0
. 0,
d
2
t
= S
2
t
(r dt
2
dB
2
t
), S
2
0
:= s
2
0
. 0,
where (B
1
, B
2
) is a twodimensional Brownian motion with covariance B
1
, B
2
)
t
= r t,
r [1, 1]. One can verify that the discounted traded assets
Solving multidimensional discretetime optimal stopping problems 1041
~
SS
1
t
:= e
rt
(e
t
S
1
t
) and
~
SS
2
t
:= e
rt
S
2
t
, t [0, T],
make up a twodimensional Pmartingale with respect to the ltration T
B
of the two
dimensional Brownian motion B := (B
1
, B
2
). The diffusion S := (S
1
, S
2
) is obviously not
uniformly elliptic, but (ln S
1
, ln S
2
) clearly is.
An American exchange option with exchange rate is the right to exchange once and
only once, at any time t [0, T], units of asset S
2
for one unit of asset S
1
. Or, to put it
the other way round, the right to buy one unit of asset S
1
for units of asset S
2
.
The discounted premium of such an option is dened as the (h, 0)Snell envelope of
h
t
:= e
rt
(S
1
t
S
2
t
)
= max(e
t
~
SS
1
t
~
SS
2
t
, 0),
where x
[T
t
) := S(t, S
1
t
, S
2
t
, r,
1
,
2
, ):
One noticeable feature of this derivative is that the premium of its European counterpart, that
is, the right to exchange at time T, S
2
and S
1
, admits a closed form given by
E x
t
:= E(T t, S
1
t
, S
2
t
, ~ , ), ~ :=
2
1
2
2
2r
1
2
_
where
E(t, x, y, , ) := x e
t
erf (d
1
(t, x, y, , )) y erf (d
1
(t, x, y, , )
t
_
),
d
1
(t, x, y, , ) :=
ln (x=y) (
2
=2 )t
t
_ , erf (x) :=
_
x
e
u
2
=2
du
2
_ :
American exchange options have two characteristics: the (discounted) contingent claim h
t
only depends on the state variable S
t
:= (S
1
t
, S
2
t
) at time t, and the European option related
to h
T
has a closed form. For such options, one uses the premium of the European option as
a control variate to reduce the computations to the residual American part of the option.
That is to say, keeping in mind that r = 0, set
\ t [0, T], k
t
:= h
t
E x(T t, S
1
t
, S
2
t
, ~ , ):
Since (E(T t, S
1
t
, S
2
t
, ~ , ))
t[0,T]
is a Pmartingale, it follows that
Sx
t
= ess sup
t<<T
k
E x
t
:
So, pricing the American exchange option amounts to pricing the American option having k
t
as discounted contingent claim. The interesting fact for numerical purposes is that k
T
= 0
and that k
t
is always smaller than h
t
. Moreover, standard computations show that one may
write k
t
:= k(ln
~
SS
1
t
, ln
~
SS
2
t
), where k is a Lipschitz continuous function.
1042 V. Bally and G. Page`s
6.2.2. Practical implementation and results
In a BlackScholes model, exact simulation of (S
t
k
)
0<k<n
at times 0 =:
t
0
, t
1
, t
2
, . . . , t
k
, . . . , t
n
:= T is possible. In fact, for numerical purposes, it is
more convenient to consider the couple (ln
~
SS
1
t
, ln
~
SS
2
t
) as the underlying variables of the
pricing problem. One simulates this process recursively at times t
k
:= kT=n, 0 < k < n, by
setting
~
SS
1
0
:= s
1
0
. 0,
~
SS
2
0
:= s
2
0
. 0 and, for every k 0, n . . . , n 1,
ln (
~
SS
1
t
k1
) := ln (
~
SS
1
t
k
)
2
1
2
1
_
2k
_ _
,
ln (
~
SS
2
t
k1
) := ln (
~
SS
2
t
k
)
2
2
2
2
_
(r
2k
1 r
2
_
2k1
)
_ _
where
k
is an i.i.d. sequence of normal random variables and := T=n.
We carried out a simulation in which the parameters of the options were set as follows
T := 1 (1 year),
1
=
2
:= 20%, := 5%, S
1
0
= 40, S
2
0
:= 36 or 44, := 1 and
r 0:8; 0; 0:8 (the price does not depend upon the interest rate r).
The quantization was processed with N := 5722 R
2
valued elementary quantizers
dispatched on 25 layers using the optimal dispatching rule (41). The estimation of the
weights (and of the quadratic quantization error) was carried out using M := 10
6
trials in
the CLVQ algorithm. The reference solution labelled VZ in Table 1 computed by a two
dimensional nitedifference algorithm devised by Villeneuve and Zanette (2002).
The quantization premium of European style options was computed using the number
N
25
of elementary quantizers on the last (25th) layer, namely N
25
:= 299. The numerical
experiments were carried out with = 1. Of course, once the quantization is performed, the
pricing of any American style options for any (reasonable) value of simply needs to rerun
the (pseudo)dynamic programming formula whose CPU cost is negligible. One could
parametrize the starting values s
1
0
and s
2
0
similarly. Figure 4 displays the global results
obtained for 25 maturities from (approximately) 2 weeks up to 1 year.
One important and promising fact is that quite similar results have been obtained by
directly quantizing the standard Brownian motion itself instead of the geometric Brownian
motions of the above BlackScholes model. The rst noticeable fact is that the functions h
k
are no longer Lipschitz continuous: this speaks for the robustness of the method. This
robustness of the method could play a role in the future development of this approach to
Table 1. Quantization premia versus reference premia for some European and American exchange
options
S
2
0
:= 36 r 0.8 0 0.8 S
2
0
:= 44 r 0.8 0 0.8
Euro. B & S 6.6547 5.2674 3.0674 Euro. B & S 3.6390 2.2289 0.3217
Euro. Quantiz. 6.6297 5.2558 3.0639 Euro. Quantiz. 3.6133 2.2117 0.3151
Am. VZ 6.9754 5.6468 4.0000 Am. VZ 3.7692 2.3364 0.3595
Am. Quantiz. 6.9812 5.6520 4.000 Am. Quantiz. 3.7726 2.3398 0.3610
Solving multidimensional discretetime optimal stopping problems 1043
nance since the quantization of a standard Brownian motion is parameterfree (except for
the dimension!). Furthermore, the quantization of every layer can be achieved, up to an
appropriate dilatation, by using some precomputed tables of optimal quantization of the
standard ddimensional Gaussian vector. (Generating such optimal quantization tables is
done by a splitting method. At each step, 10
6
CLVQ trials are processed. The CPU time for
producing an optimal (N 1)grid from an optimal Ngrid increases from 5 s (if N = 25)
to 60 s (N = 300).) At this stage, if using these precomputed grids, all that remains is to
estimate the transition weights
k
ij
using a standard Monte Carlo simulation. Exploratory
experiments showed that 10 000 trials are enough to obtain results as accurate as above for
American exchange options (the total CPU time required for this weight estimation is then
25 s). The quantization tree descent itself is instantaneous.
These results augur well of the future comparisons with former pricing methods for
multidimensional American options (see Broadie and Glasserman 1997; Longstaff and
Schwartz 2001). The simulations were performed by J. Printems (Univ. Paris 12). Some
extensive numerical investigations have been carried out in Bally et al. (2002b).
6.2.3. Provisional remarks
There is a possible alternative to optimal quantization: one may build the grids of the
quantization tree by settling the rst N = N=n random paths of the Markov chain. The
Figure 4. American and European style option prices as a function of the maturity (S
1
0
:= 40,
S
2
0
:= 36, r := 0). Dashes depict the reference prices (V&Z for American style and B&S for European
style options); crosses depict the quantization price for American style options; and diamonds depict
the quantization price for European style options
1044 V. Bally and G. Page`s
resulting theoretical quantization errors almost surely still follow a O(N
1=d
)rate, with a
worse sharp rate (see Cohort 2003). The companion parameter estimation is carried out by
a standard Monte Carlo simulation.
Some developments (quantized hedging) are proposed in Bally et al. (2002b), and some
rstorder schemes based on correctors obtained using Malliavin calculus are proposed in
Bally et al. (2003).
Appendix. Partial almost sure convergence results for the
CLVQ algorithm
As the distortion D
X,2
N
does not enjoy the standard properties of a potential for a stochastic
gradient, we are led to make the following restrictive assumption:
supp(P
X
) is a compact set: (57)
Hence, the convex hull of supp(P
X
) is a convex compact set.
The onedimensional case
In this very special setting, standard stochastic approximation theory applies and we obtain
a satisfactory almost sure convergence result. The convex hull of supp(P
X
) is an interval
[a, b]. One rst notices that the set of Ngrids is onetoone with the simplex
a,b,
N
:= := (x
1
, . . . , x
N
) (a, b)
N
[ a , x
1
, . . . , x
N
, b,
which is invariant for the algorithm provided that the starting value
0
a,b,
N
. Then,
\ := (x
1
, . . . , x
N
)
a,b,
N
, =D
X,2
N
() := 2
_
~xx
i1
~xx
i
(x
i
) P
X
(d)
_ _
1<i<N
, (58)
where ~xx
1
:= a, ~xx
i
:= (x
i
x
i1
)=2, ~xx
N1
:= b. Consequently, the distribution P
X
being
continuous (i.e. no single is weighted), =D
X,2
N
has a continuous extension on the compact
set
a,b,
n
.
Theorem 5 (Page`s 1997). (a) If
0
a,b,
N
, then the algorithm (37) lives in
a,b,
N
.
Furthermore, if P
X
is continuous, if =D
X,2
N
= 0
a,b,
N
is nite and if the step
t
satises
the usual decreasing step assumption
t>1
t
= and
t>1
2
t
, , then
a:s
+
=D
X,2
N
= 0:
(b) Moreover, if P
X
has a logconcave density then =D
X,2
N
= 0
a,b,
N
=
argmin
a, b,
N
D
X,2
N
=
+
(see Kieffer 1982; Trushkin 1982; Lamberton and Page`s 1996;
Graf and Luschgy 2000), with
Solving multidimensional discretetime optimal stopping problems 1045
+
= a
2k 1
2N
(b a), 1 < k < N
_ _
if X ~ U([0, 1]).
The multidimensional case (d > 2)
In the multidimensional setting, only partial results are proved, even for bounded
distributions . We observe once again that the singularity of =D
X,2
N
makes the standard
theory inefcient. The result below follows from a specic proof. The assumption on the
distribution of the stimuli
t
is now the following:
P
X
has a bounded density f with a compact convex support: (59)
Roughly speaking, Theorem 6 below says that, almost surely, either the N components of
t
remain parted and converge to some stationary quantizer of D
X,2
N
, or they get stuck into M
aggregates which will converge, up to some subsequence, towards a stationary quantizer of
D
X,2
M
. This is of very little help for simulations since it does not say precisely how often the
elementary quantizers remain parted. In the worst case all the components get stuck into a
single aggregate at the average of P
X
the resulting quantization would be simply useless.
These partial theoretical results seem very pessimistic in view of the practical performance of
the CLVQ: no aggregation phenomenon is usually observed when the algorithm is
appropriately initialized (random or splitting method).
Theorem 6 (Page`s 1997). Assume that (59) holds and that the step sequence satises
t>1
t
= and
t>1
2
t
, .
(a) Palmost surely, either the elements of
t
remain asymptotically parted or at least
two elements of
t
get asymptotically stuck as t , that is,
lim
t
dist(
t
,
c
o
N
) . 0 or lim
t
dist(
t
,
c
o
N
) = 0,
where o
N
denotes the set of Ntuples with pairwise distinct components.
(b) On the event lim
t
dist(
t
,
c
o
N
) . 0 (asymptotically parted components), there
exists a level
+
. 0 and a connected component
+
of =D
X,2
N
= 0 D
X,2
N
=
+
such
that
a:s:
+
as t :
(c) On the event lim
t
dist(
t
,
c
o
N
) = 0, the components denitely get stuck, that is,
dist(
t
,
c
o
N
)
t
as t :
There is a partition I
1
. . . I
M
of 1, . . . , N along which the components
t
make M
aggregates as t . At least one of the limiting values of
t
is a zero of =D
X,2
M
.
1046 V. Bally and G. Page`s
References
Abaya, E. and Wise, G. (1992) On the existence of optimal quantizers. IEEE Trans. Inform. Theory,
38, 937946.
Bally, V., Caballero, M.E., Fernandez, B. and El Karoui, N. (2002a) Reected BSDEs, PDEs and
variational inequalities. Technical Report no. 4455, Project MATHFI, INRIA, Le Chesnay, France.
Bally, V. (2002) The central limit theorem for a nonlinear algorithm based on quantization. Technical
Report no. 4629, Project MATHFI, INRIA, Le Chesnay, France.
Bally, V. and Page`s, G. (2003) Error analysis of the quantization algorithm for obstacle problems,
Stochastic Process. Appl., 106, 140.
Bally, V., Page`s, G. and Printems, J. (2002b) A quantization tree method for pricing and hedging
multidimensional American options. Technical Report no. 753, Laboratoire de Probabilites et
mode`les Aleatoires, Universite de Paris 6, France.
Bally, V., Page`s, G. and Printems, J. (2003) First order schemes in the numerical quantization method.
Math. Finance, 13, 116.
Bally, V. and Saussereau, B. (2002) Approximation of the Snell envelope and computation of
American option prices in dimension one. ESAIM Probab. Statist., 16, 121.
Bally, V. and Talay, D. (1996) The law of the Euler scheme for stochastic differential equations (II):
Convergence rate of the density. Monte Carlo Methods Appl., 2, 93128.
Bensoussan, A. and Lions, J.L. (1982) Applications of Variational Inequalities in Stochastic Control.
Amsterdam: NorthHolland.
BouchardDenize, B. and Touzi, N. (2002) Discrete time approximation and MonteCarlo simulation
of backward stochastic differential equations. Technical Report no. 766, Laboratoire de
Probabilites et Mode`les Aleatoires, Universite de Paris 6, France.
Bouton, C. and Page`s, G. (1997) About the multidimensional Competitive Learning Vector
Quantization algorithm with constant gain. Ann. Appl. Probab., 7, 679710.
Brandie`re, O. and Duo, M. (1996) Les algorithmes stochastiques contournentils les pie`ges? Ann.
Inst. H. Poincare Probab. Statist., 32, 395427.
Briand, P., Delyon, B. and Memin, J. (2001) Donskertype theorem for BSDEs. Electron. Comm.
Probab., 6, 114.
Briand, P., Delyon, B. and Memin, J. (2002) On the robustness of backward stochastic differential
equations. Stochastic Process. Appl., 97, 229253.
Broadie, M. and Glasserman, P. (1997) Pricing Americanstyle securities using simulation. J. Econom.
Dynam. Control, 21, 13231352.
Bucklew, J. and Wise, G. (1982) Multidimensional asymptotic quantization theory with rth power
distortion measures. IEEE Trans. Inform. Theory, 28, 239247.
Caverhill, A.P. and Webber, N. (1990) American options: theory and numerical analysis. In S. Hodges
(ed.), Options: Recent Advances in Theory and Practice. Manchester: Manchester University Press.
Chevance, D. (1997) Numerical methods for backward stochastic differential equations. In L. Rogers
and D. Talay (eds), Numerical Methods in Finance. Cambridge: Cambridge University Press.
Cohort, P. (2003) Limit theorems for the random normalized distortion. Ann. Appl. Probab. To appear.
Duo, M. (1997) Random Iterative Models, Appl. Math. 34. Berlin: SpringerVerlag.
ElKaroui, N., Kapoudjan, C., Pardoux, E
., Lasry, J.M., Lebouchoux, J., Lions, P.L. and Touzi, N. (1999) Applications of Malliavin
calculus to MonteCarlo methods in nance. Finance Stochastics, 3, 391412.
Fournie, E
., Lasry, J.M., Lebouchoux, J. and Lions, P.L. (2001) Applications of Malliavin calculus to
MonteCarlo methods in nance II. Finance Stochastics, 5, 201236.
Friedman, A. (1975) Stochastic Differential Equations and Applications, Vol. 1. New York: Academic
Press.
Gersho, A. and Gray, R. (1992) Vector Quantization and Signal Compression. Boston: Kluwer.
Gersho, A. and Gray, R. (eds.) (1982) Special issue on Quantization. IEEE Trans. Inform. Theory, 28.
Graf, S. and Luschgy, H. (2000) Foundations of Quantization for Probability Distributions, Lecture
Notes in Math. 1730. Berlin: SpringerVerlag.
Kieffer, J. (1982) Exponential rate of convergence for the Lloyds Method I. IEEE Inform. Theory, 28,
205210.
KohatsuHiga, A. and Pettersson, R. (2002) Variance reduction methods for simulation of densities on
Wiener space. SIAM J. Numer. Anal., 40, 431450.
Kushner, H.J. (1977) Probability Methods for Approximation in Stochastic Control and for Elliptic
Equations. New York: Academic Press.
Kushner, H.J. and Yin, G.G. (1997) Stochastic Approximation Algorithms and Applications. New York:
SpringerVerlag.
Kusuoka, S. and Stroock, D. (1985) Applications of the Malliavin calculus, part II. J. Fac. Sci. Univ.
Tokyo, 32, 176.
Lazarev, V.A. (1992) Convergence of stochastic approximation procedures in the case of a regression
equation with several roots. Problems Inform. Transmission, 28, 6678.
Lamberton, D. (1998) Error estimates for the binomial approximation of the American put option.
Ann. Appl. Probab., 8, 206233.
Lamberton, D. (2002) Brownian optimal stopping and random walks. Appl. Math. Optim., 45, 283
324.
Lamberton, D. and Page`s, G. (1990) Sur lapproximation des reduites. Ann. Inst. H. Poincare Probab.
Statist., 26, 331355.
Lamberton, D. and Page`s, G. (1996) On the critical points of the 1dimensional Competitive Learning
Vector Quantization algorithm. Proceedings of the ESANN96, Bruges, Belgium.
Lamberton, D. and Rogers, L.C. (2000) Optimal stopping and embedding. J. Appl. Probab., 37, 11431148.
Lapeyre, B., Page`s, G. and Sab, K. (1990) Sequences with low discrepancy. Generalisation and
application to RobbinsMonro algorithm. Statistics, 21, 251272.
Lions, P.L. and Regnier, H. (2002) Calcul des prix et des sensibilites dune option americaine par une
methode de Monte Carlo. Working paper.
Longstaff, F.A. and Schwartz, E.S. (2001) Valuing American options by simulation: a simple least
squares approach. Rev. Financial Stud., 14, 113148.
Lloyd, S.P. (1982) Least squares quantization in PCM. IEEE Trans Inform. Theory, 28, 129137.
Ma, J., Protter, P., San Mart n, J. and Torres, S. (2002) Numerical method for backward stochastic
differential equations. Ann. Appl. Probab., 12, 302316.
Neveu, J. (1971) Martingales a` Temps Discret. Paris: Masson.
Page`s, G. (1997) A space vector quantization method for numerical integration. J. Computat. Appl.
Math., 89, 138.
1048 V. Bally and G. Page`s
Page`s, G. and Printems, J. (2003) Optimal quantization for numerics: the Gaussian case. Monte Carlo
Methods Appl., 9, (2).
Pardoux, E
. and Peng, S. (1992) Backward stochastic differential equations and quasilinear parabolic
partial differential equations. In B.L. Rozovskii and R.B. Sowers (eds), Stochastic Partial
Differential Equations and Their Applications, Lecture Notes in Control and Inform. Sci. 176,
pp. 200217. Berlin: SpringerVerlag.
Pelletier, M. (2000) Asymptotic almost sure efciency of averaged stochastic algorithms. SIAM J.
Control Optim., 39, 4972.
Pemantle, R. (1990) Nonconvergence to unstable points in urn models and stochastic approximations.
Ann. Probab., 18, 698712.
Trushkin, A. (1982) Sufcient conditions for uniqueness of a locally optimal quantizer for a class of
convex error weighting functions. IEEE Trans. Inform. Theory, 28, 187198.
Villeneuve, S. and Zanette, A. (2002) Parabolic A.D.I. methods for pricing American option on two
stocks. Math. Oper. Res., 27, 121149.
Zador, P. (1982) Asymptotic quantization error of continuous signals and the quantization dimension.
IEEE Trans Inform. Theory, 28, 139148.
Received May 2001 Revised January 2003
Solving multidimensional discretetime optimal stopping problems 1049