
Hamiltonian ODEs in the Wasserstein space of probability measures
L. AMBROSIO
Scuola Normale Superiore di Pisa
AND
W. GANGBO
Georgia Institute of Technology
Abstract
In this paper we consider a Hamiltonian $H$ on $\mathcal P_2(\mathbb R^{2d})$, the set of probability measures with finite quadratic moments on the phase space $\mathbb R^{2d}=\mathbb R^d\times\mathbb R^d$, which is a metric space when endowed with the Wasserstein distance $W_2$. We study the initial value problem $d\mu_t/dt+\nabla\cdot(J_d v_t\mu_t)=0$, where $J_d$ is the canonical symplectic matrix, $\mu_0$ is prescribed, $v_t$ is a tangent vector to $\mathcal P_2(\mathbb R^{2d})$ at $\mu_t$, and belongs to $\partial H(\mu_t)$, the subdifferential of $H$ at $\mu_t$. Two methods for constructing solutions of the evolutive system are provided. The first one concerns only the case where $\mu_0$ is absolutely continuous. It ensures that $\mu_t$ remains absolutely continuous and $v_t=\nabla H(\mu_t)$ is the element of minimal norm in $\partial H(\mu_t)$. The second method handles any initial measure $\mu_0$. If we furthermore assume that $H$ is $\lambda$-convex, proper and lower semicontinuous on $\mathcal P_2(\mathbb R^{2d})$, we prove that the Hamiltonian is preserved along any solution of our evolutive system: $H(\mu_t)=H(\mu_0)$. © 2000 Wiley Periodicals, Inc.
1 Introduction
In the last few years there has been considerable interest in the theory of gradient flows in the Wasserstein space $\mathcal P_2(\mathbb R^D)$ of probability measures with finite quadratic moments in $\mathbb R^D$, starting from the fundamental papers [35], [43], with several applications ranging from rates of convergence to equilibrium to the proof of functional and geometric inequalities. In particular, in [4] (see also [13]), a systematic theory of these gradient flows is built, providing existence and uniqueness results, contraction estimates and error estimates for the implicit Euler scheme.
In this paper, motivated by a work in progress by Gangbo & Pacini [31], we propose a rigorous theory concerning evolution problems in $\mathcal P_2(\mathbb R^D)$ of Hamiltonian type. Here typically $D=2d$ and the measures we are dealing with are defined in the phase space. As shown in Section 8, our study covers a large class of systems which have recently generated a lot of interest, including the Vlasov-Poisson system in one space dimension [9] [47], the Vlasov-Monge-Ampère system [12] [18] and the semigeostrophic systems [10] [16] [17] [19] [18] [23] [20] [21] [22] [40].
Communications on Pure and Applied Mathematics, Vol. 000, 00010035 (2000)
We note that a general theory of Hamiltonian ODEs for non-smooth Hamiltonians $H$, in particular when $H$ is only $\lambda$-convex, seems to be completely understood only in finite-dimensional spaces, and even in these spaces the uniqueness question has been settled only in very recent times, see Remark 6.5. In infinite-dimensional Hilbert spaces very little appears to be known at the level of existence of solutions, and nothing is known at the level of uniqueness.
Besides its comprehensive character, another nice feature of our theory is its ability to handle singular initial data and singular solutions. This class of solutions is natural, for instance, to include solutions (e.g. those generated by classical non-kinetic solutions) with one or finitely many velocities, see [47] for a first result in this direction. At the same time, there is the possibility to handle discrete and continuous models with the same formalism, and to show stability results (the first one in this direction, for two specific models, is [18]).
We recall that $\mathcal P_2(\mathbb R^D)$ is canonically endowed with the Wasserstein distance $W_2$, defined as follows:
$$W_2^2(\mu,\nu):=\min\left\{\int_{\mathbb R^D\times\mathbb R^D}|x-y|^2\,d\gamma(x,y)\;:\;\gamma\in\Gamma(\mu,\nu)\right\}.\tag{1.1}$$
Here $\Gamma(\mu,\nu)$ is the set of Borel probability measures on $\mathbb R^D\times\mathbb R^D$ which have $\mu$ and $\nu$ as their marginals. The Riemannian structure of $\mathcal P_2(\mathbb R^D)$, introduced at a formal level in [43] and later fully developed in [4], will be intensively exploited in this work. Notice that, as soon as $\mathcal P_2(\mathbb R^D)$ is endowed with a differentiable structure, the theory of ODEs in the finite-dimensional space $\mathbb R^D$ naturally extends to a theory of ODEs in the infinite-dimensional space $\mathcal P_2(\mathbb R^D)$: it suffices to consider the isometry $I:z\mapsto\delta_z$, where $\delta_z$ stands for the Dirac mass at $z$.
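As a small illustration of (1.1), the following sketch computes $W_2$ between two discrete measures with the same number of equally weighted atoms. It relies on the standard fact that, in this special case, the minimum in (1.1) is attained by a plan that pairs atoms through a permutation, so a brute-force search over permutations suffices for tiny inputs. The function name `w2_uniform` and the tuple representation of points are assumptions made only for this example.

```python
import itertools
import math

def w2_uniform(xs, ys):
    """W_2 between (1/n) sum_i delta_{xs[i]} and (1/n) sum_j delta_{ys[j]}.

    Points are tuples in R^D. Brute force over all pairings, so only
    suitable for a handful of atoms (illustrative helper, not from the paper).
    """
    n = len(xs)
    assert n == len(ys) and n > 0

    def cost(p, q):
        # squared Euclidean distance |p - q|^2
        return sum((a - b) ** 2 for a, b in zip(p, q))

    best = min(
        sum(cost(xs[i], ys[perm[i]]) for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return math.sqrt(best / n)

# The map z -> delta_z is an isometry: W_2(delta_z, delta_w) = |z - w|.
print(w2_uniform([(0.0, 0.0)], [(3.0, 4.0)]))
```

For a single atom on each side the only plan is the product of the two Dirac masses, which is the isometry property mentioned in the text.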
In particular, we consider the case when $D=2d$ and we are given a lower semicontinuous Hamiltonian $H:\mathcal P_2(\mathbb R^{2d})\to\mathbb R$. As we will be mostly considering semiconvex Hamiltonians, in the sense of displacement convexity [38], mimicking some classical concepts of convex analysis we introduce in Definition 3.2 the subdifferential $\partial H(\mu)$ and denote by $\nabla H(\mu)$ its element with minimal $L^2(\mu;\mathbb R^{2d})$ norm (well defined whenever $\partial H(\mu)\neq\emptyset$).
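The problem we study in Section 6 is: given an initial measure $\bar\mu\in\mathcal P_2(\mathbb R^{2d})$, find a path $t\mapsto\mu_t\in\mathcal P_2(\mathbb R^{2d})$ such that
$$\begin{cases}\dfrac{d}{dt}\mu_t+\nabla\cdot\big(J\nabla H(\mu_t)\mu_t\big)=0, & t\in(0,T)\\[4pt] \mu_0=\bar\mu\end{cases}\tag{1.2}$$
and $\|\nabla H(\mu_t)\|_{L^2(\mu_t)}\in L^1(0,T)$. Here, $J$ is a $(2d)\times(2d)$ symplectic matrix.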
Using a suitable chain rule in the Wasserstein space first introduced in [4], we prove in Theorem 5.2 that $H$ is constant along all solutions $\mu_t$ of (1.2), provided $H$ is $\lambda$-convex (or concave) for some real number $\lambda$. The proof of this fact requires neither regularity assumptions on the velocity field $J\nabla H(\mu_t)$ nor the absolute continuity of $\mu_t$.
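Existence of solutions to (1.2) can be established if one imposes a growth condition on the gradient, namely
(H1) there exist constants $C_o\in(0,+\infty)$, $R_o\in(0,+\infty]$ such that for all $\mu\in\mathcal P_2^a(\mathbb R^{2d})$ with $W_2(\mu,\bar\mu) < R_o$ we have $\mu\in D(H)$, $\partial H(\mu)\neq\emptyset$ and $|\nabla H(\mu)(z)|\le C_o(1+|z|)$ for $\mu$-almost every $z\in\mathbb R^{2d}$,
and a continuity property of the gradient, namely
(H2) if $\mu=\rho\mathcal L^{2d}$, $\mu_n=\rho_n\mathcal L^{2d}\in\mathcal P_2^a(\mathbb R^{2d})$, $\sup_n W_2(\mu_n,\bar\mu) < R_o$ and $\mu_n\to\mu$ narrowly, then there exist a subsequence $n(k)$ and functions $w_k,w:\mathbb R^{2d}\to\mathbb R^{2d}$ such that $w_k=\nabla H(\mu_{n(k)})$ $\mu_{n(k)}$-a.e., $w=\nabla H(\mu)$ $\mu$-a.e. and $w_k\to w$ $\mathcal L^{2d}$-a.e. in $\mathbb R^{2d}$ as $k\to+\infty$.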
Here we are denoting by $\mathcal P_2^a(\mathbb R^{2d})$ the elements of $\mathcal P_2(\mathbb R^{2d})$ that are absolutely continuous with respect to $\mathcal L^{2d}$. The requirements of bounds and continuity on the gradient naturally appear also in the finite-dimensional theory, in order to obtain bounds on the discrete solutions of the ODE and to pass to the limit.
In Theorem 6.6 we show that a minor variant of the algorithms used in [10], [12], [17] in connection with specific models establishes existence of a solution $\mu_t$ in (1.2) up to some time $T=T(C_o,R_o)$ ($T=+\infty$ whenever $R_o=+\infty$), when $\mu_0=\rho_0\mathcal L^{2d}$ is absolutely continuous with respect to $\mathcal L^{2d}$ and (H1) and (H2) hold. A good feature of this algorithm is that it preserves the absolute continuity condition, so that $\mu_t=\rho_t\mathcal L^{2d}$, and provides the entropy inequalities
$$\int_{\mathbb R^{2d}}S(\rho_t)\,dz\le\int_{\mathbb R^{2d}}S(\rho_0)\,dz\qquad\forall\,t\in[0,T],\ \text{with $S$ convex}.$$
Unlike the theory of gradient flows, where the selection of the gradient among all subdifferentials is ensured on any solution by energy reasons (see [4]), in our case it is not clear why in general this selection should be the natural one, even though it provides the tangency condition and it is more likely to provide bounds, by the minimality of the gradient. Therefore, we consider also a weaker version of (1.2), which works for arbitrary initial measures $\bar\mu$: find a path $t\mapsto\mu_t\in\mathcal P_2(\mathbb R^{2d})$ and vector fields $v_t\in L^2(\mu_t;\mathbb R^{2d})$ such that
$$\begin{cases}\dfrac{d}{dt}\mu_t+\nabla\cdot(Jv_t\mu_t)=0,\quad\mu_0=\bar\mu, & t\in(0,T)\\[4pt] v_t\in T_{\mu_t}\mathcal P_2(\mathbb R^{2d})\cap\partial H(\mu_t) & \text{for a.e. } t.\end{cases}\tag{1.3}$$
Here $T_{\mu_t}\mathcal P_2(\mathbb R^{2d})$ is the tangent space to $\mathcal P_2(\mathbb R^{2d})$ at $\mu_t$, according to Otto's calculus [4], defined as the $L^2(\mu_t;\mathbb R^{2d})$ closure of the gradients of $C_c^\infty(\mathbb R^{2d})$ maps. Even in this case we are able to show that $H$ is constant along solutions of (1.3), provided $H$ is $\lambda$-convex (or concave) for some $\lambda\in\mathbb R$.
For the system in (1.3), we weaken (H1) and (H2) and only assume that
(H1') there exist constants $C_o\in[0,+\infty)$, $R_o\in(0,+\infty]$ such that for all $\mu\in\mathcal P_2(\mathbb R^{2d})$ with $W_2(\mu,\bar\mu) < R_o$ we have $\mu\in D(H)$, $\partial H(\mu)\neq\emptyset$ and $\|\nabla H(\mu)\|_{L^2(\mu)}\le C_o$
and
(H2') if $\sup_n W_2(\mu_n,\bar\mu) < R_o$ and $\mu_n\to\mu$ narrowly, then the limit points of convex combinations of $\{\nabla H(\mu_n)\mu_n\}_{n=1}^\infty$ for the weak$^*$-topology are representable as $w\mu$ for some $w\in\partial H(\mu)\cap T_\mu\mathcal P_2(\mathbb R^{2d})$.
In Section 7 a second algorithm, based on linear interpolation of transport maps, provides existence of solutions to (1.3). We refer to Theorem 7.4 for a complete statement of the results we obtain. In particular, when $\bar\mu=\delta_{(\bar x,\bar v)}$, defining $h$ on $\mathbb R^{2d}$ by $h(x,v)=H(\delta_{(x,v)})$, the algorithm used in this section coincides with a natural finite-dimensional algorithm yielding in the limit the volume-preserving flow associated to the ODE (see Remark 6.5 for a more precise discussion):
$$\begin{cases}J_d(\dot x(t),\dot v(t))\in\partial h(x(t),v(t)), & t\in(0,T)\\ (x(0),v(0))=(\bar x,\bar v).\end{cases}\tag{1.4}$$
Note that proving existence for (1.3) is harder than proving existence for the simplified system
$$\begin{cases}\dfrac{d}{dt}\mu_t+\nabla\cdot(Jv_t\mu_t)=0,\quad\mu_0=\bar\mu, & t\in(0,T)\\[4pt] v_t\in\partial H(\mu_t) & \text{for a.e. } t,\end{cases}\tag{1.5}$$
where we drop the constraint that $v_t\in T_{\mu_t}\mathcal P_2(\mathbb R^{2d})$, and so $v_t$ may fail to be tangent to $\mathcal P_2(\mathbb R^{2d})$. The system in (1.5) does not make geometrical sense, except in special cases such as when $\mu_t$ is concentrated on finitely many points (in this case $L^2(\mu_t;\mathbb R^{2d})=T_{\mu_t}\mathcal P_2(\mathbb R^{2d})$). On the technical side, the lack of the tangency condition seems to prevent the possibility of proving constancy of the Hamiltonian along solutions of (1.5).
Finally, we add more motivations for the terminology "Hamiltonian" adopted for the systems (1.2) and (1.3) (particularly when $J$ is the canonical symplectic matrix). A first justification is given in [31], where $J_d\nabla H(\mu)$ is shown to be the symplectic gradient induced by a suitable skew-symmetric 2-form (see the more detailed discussion made right after Definition 5.1). Moreover, in the recent work [18] the authors consider Hamiltonians on $\mathbb R^{2nd}$ of the form
$$(x_1,v_1;\cdots;x_n,v_n)\mapsto H_n(x_1,v_1;\cdots;x_n,v_n)=-\frac12W_2^2\left(\frac1n\sum_{i=1}^n\delta_{(x_i,v_i)},\ \frac1n\sum_{i=1}^n\delta_{(a_i^n,b_i^n)}\right),$$
where $(a_1^n,b_1^n),\cdots,(a_n^n,b_n^n)\in\mathbb R^{2d}$ are prescribed. They study the classical finite-dimensional Hamiltonian systems
$$\begin{cases}\dot x_i^n(t)=n\nabla_{v_i}H_n\big(x_1^n(t),v_1^n(t);\cdots;x_n^n(t),v_n^n(t)\big) & t\in(0,T)\\ \dot v_i^n(t)=-n\nabla_{x_i}H_n\big(x_1^n(t),v_1^n(t);\cdots;x_n^n(t),v_n^n(t)\big) & t\in(0,T)\\ (x_i^n(0),v_i^n(0))\ \text{prescribed} & i=1,\cdots,n.\end{cases}\tag{1.6}$$
Defining
$$\mu_t^n=\frac1n\sum_{i=1}^n\delta_{(x_i^n(t),v_i^n(t))},$$
it is readily checked that the paths $t\mapsto\mu_t^n\in\mathcal P_2(\mathbb R^{2d})$ satisfy (1.3) with $H_n$ in place of $H$. In [18], it is proven that if the initial conditions $(x_i^n(0),v_i^n(0))$ are suitably chosen and $\nu^n=\frac1n\sum_{i=1}^n\delta_{(a_i^n,b_i^n)}$ tends to $\nu$ as $n$ tends to $+\infty$, then up to a subsequence which is independent of the time variable $t$, the measures $\{\mu_t^n\}_{n=1}^\infty$ narrowly converge as $n\to+\infty$ to measures $\{\mu_t\}_{t\in[0,T]}$ satisfying (1.2) for the Hamiltonian $H(\mu)=-\frac12W_2^2(\mu,\nu)$.
Acknowledgment It is a pleasure to express our gratitude to Y. Brenier for the
many interesting and instructive discussions we had. Criticisms were also provided
by T. Nguyen.
2 Basic notation and terminology
In this section we fix our basic notation and terminology on measure theory and Hamiltonian systems.
- The effective domain of a function $H:A\to(-\infty,+\infty]$ is the set $D(H)$ of all $a\in A$ such that $H(a) < +\infty$. We say that $H$ is proper if $D(H)\neq\emptyset$.
- Let $d$, $D$ be integers. We denote by $I_D$ the identity matrix on $\mathbb R^D$ and we denote by $J_d$ the symplectic $(2d)\times(2d)$ matrix
$$J_d=\begin{pmatrix}0 & I_d\\ -I_d & 0\end{pmatrix}.$$
When $d=1$, this is the clockwise rotation of angle $\pi/2$. We denote by $\mathrm{id}$ the identity map on $\mathbb R^D$ or $\mathbb R^{2d}$.
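The sign convention for $J_d$ can be checked with a short sketch. It builds $J_d$ as a list of rows and verifies, on small examples, that $J_1$ sends $(1,0)$ to $(0,-1)$ (a clockwise quarter turn), that $\langle J_dz,z\rangle=0$, and that $J_d^2=-I_{2d}$. The helper names `symplectic_matrix` and `mat_vec` are hypothetical and used only here.

```python
def symplectic_matrix(d):
    """Return J_d = [[0, I_d], [-I_d, 0]] as a list of rows (illustrative helper)."""
    n = 2 * d
    J = [[0] * n for _ in range(n)]
    for i in range(d):
        J[i][d + i] = 1      # upper-right block  I_d
        J[d + i][i] = -1     # lower-left block  -I_d
    return J

def mat_vec(J, z):
    """Plain matrix-vector product J z."""
    return [sum(row[k] * z[k] for k in range(len(z))) for row in J]

J1 = symplectic_matrix(1)
print(mat_vec(J1, [1, 0]))   # image of the first basis vector under J_1

z = [1, 2, 3, 4]
Jz = mat_vec(symplectic_matrix(2), z)
print(sum(a * b for a, b in zip(Jz, z)))  # inner product <J z, z>
```

The orthogonality $\langle Jz,z\rangle=0$ is the only property of $J$ used in the conservation argument of Section 5.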
- If $r>0$ and $z\in\mathbb R^D$, $B_r(z)$ denotes the ball in $\mathbb R^D$ of center $z$ and radius $r$. If $B\subset\mathbb R^D$ we denote by $B^c$ the complement of $B$.
- Assume that $\mu$ is a nonnegative Borel measure on a topological space $X$ and that $\nu$ is a nonnegative Borel measure on a topological space $Y$. We say that a Borel map $t:X\to Y$ transports $\mu$ onto $\nu$, and we write $t_\#\mu=\nu$, if $\nu[B]=\mu[t^{-1}(B)]$ for all Borel sets $B\subset Y$. We sometimes say that $t$ pushes $\mu$ forward to $\nu$. We denote by $\mathcal T(\mu,\nu)$ the set of all $t$ such that $t_\#\mu=\nu$.
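If $\gamma$ is a nonnegative Borel measure on $X\times Y$ then its projection $\mathrm{proj}_X\gamma$ is a nonnegative Borel measure on $X$ and its projection $\mathrm{proj}_Y\gamma$ is a nonnegative Borel measure on $Y$; they are defined by
$$\mathrm{proj}_X\gamma[A]=\gamma[A\times Y],\qquad\mathrm{proj}_Y\gamma[B]=\gamma[X\times B].$$
A measure $\gamma$ on $X\times Y$ is said to have $\mu$ and $\nu$ as its marginals if $\mu=\mathrm{proj}_X\gamma$ and $\nu=\mathrm{proj}_Y\gamma$. We write that $\gamma\in\Gamma(\mu,\nu)$ and call $\gamma$ a transport plan between $\mu$ and $\nu$.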
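For finitely supported measures the push-forward and the marginals reduce to bookkeeping on atoms, which the following sketch makes concrete. Measures are represented as dictionaries mapping an atom to its mass; this representation and the names `push_forward` and `marginals` are assumptions of the example, not notation from the paper.

```python
from collections import defaultdict

def push_forward(mu, t):
    """Return t_# mu for a finitely supported measure mu = {atom: mass}."""
    nu = defaultdict(float)
    for x, m in mu.items():
        nu[t(x)] += m        # nu[B] = mu[t^{-1}(B)], atom by atom
    return dict(nu)

def marginals(gamma):
    """Return (proj_X gamma, proj_Y gamma) for gamma = {(x, y): mass}."""
    return (push_forward(gamma, lambda p: p[0]),
            push_forward(gamma, lambda p: p[1]))

mu = {-1: 0.25, 1: 0.25, 2: 0.5}
nu = push_forward(mu, lambda x: x * x)            # t(x) = x^2
gamma = push_forward(mu, lambda x: (x, x * x))    # plan (id x t)_# mu
print(nu, marginals(gamma))
```

The plan `gamma` built from the map $t$ has $\mu$ and $t_\#\mu$ as its marginals, so it belongs to $\Gamma(\mu,t_\#\mu)$.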
- When $X=Y=M$, any minimizer $\gamma_o$ in (1.1) is called an optimal transport plan between $\mu$ and $\nu$. We write $\gamma_o\in\Gamma_o(\mu,\nu)$.
- We denote by $\mathcal P(\mathbb R^D)$ the set of Borel probability measures on $\mathbb R^D$. The $D$-dimensional Lebesgue measure on $\mathbb R^D$ is denoted by $\mathcal L^D$. The 2-moment of $\mu\in\mathcal P(\mathbb R^D)$ with respect to the origin is defined by
$$M_2(\mu)=\int_{\mathbb R^D}|x|^2\,d\mu(x).$$
Notice that $W_2^2(\mu,\delta_0)=M_2(\mu)$. We will be dealing in particular with
$$\mathcal P_2(\mathbb R^D):=\left\{\mu\in\mathcal P(\mathbb R^D)\;:\;M_2(\mu) < +\infty\right\}$$
and its subspace $\mathcal P_2^a(\mathbb R^D)$, made of absolutely continuous measures with respect to $\mathcal L^D$.
- If $\mu\in\mathcal P_2(\mathbb R^D)$ and $v_1,\dots,v_k\in L^2(\mathbb R^D,\mu)$, we write $v=(v_1,\dots,v_k)\in L^2(\mathbb R^D,\mu;\mathbb R^k)$ or simply $v\in L^2(\mu;\mathbb R^k)$.
- Assume that $\mu$, $\nu$ are Borel probability measures on $M=\mathbb R^D$ with $M_2(\mu),M_2(\nu) < +\infty$ and that $\mu$ is absolutely continuous with respect to $\mathcal L^D$. Then there exists a unique minimizer $\gamma_o$ in (1.1), characterized by the fact that $\gamma_o=(\mathrm{id}\times t_\mu^\nu)_\#\mu$ for some map $t_\mu^\nu:\mathbb R^D\to\mathbb R^D$ which coincides $\mu$-a.e. with the gradient of a convex function. Therefore, the map $t_\mu^\nu$ is the unique minimizer of
$$t\mapsto\int_{\mathbb R^D}|z-t(z)|^2\,d\mu(z)$$
over $\mathcal T(\mu,\nu)$.
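- If $h\in C^1(\mathbb R^{2d})$, the Hamiltonian vector field associated to $h$ is $X_h=J\nabla h$. When $X\in C^1(\mathbb R^{2d},\mathbb R^{2d})$, the flow of $X$ is the map $\Phi:[a,b]\times\mathbb R^{2d}\to\mathbb R^{2d}$ defined by
$$\begin{cases}\dot\Phi(t,z)=X(t,\Phi(t,z)) & t\in[a,b],\ z\in\mathbb R^{2d}\\ \Phi(0,z)=z, & z\in\mathbb R^{2d}.\end{cases}\tag{2.1}$$
The flow is unique, and the growth condition
$$|X(t,z)|\le C(t)(1+|z|)\quad\text{with } C\in L^1(a,b)$$
ensures its existence.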
- If $\mu_o=\delta_z$ and we set $\mu_t=\Phi(t,\cdot)_\#\mu_o=\delta_{\Phi(t,z)}$, then the measures $\mu_t$ satisfy the continuity equation
$$\frac{d}{dt}\mu_t+\nabla\cdot(X\mu_t)=0\tag{2.2}$$
in the sense of distributions. When $X=X_h$ for a Hamiltonian $h$, (2.1) is called a Hamiltonian system.
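A minimal numerical sketch of (2.1) in the case $d=1$ may help fix ideas. It assumes the model Hamiltonian $h(x,v)=\frac12(x^2+v^2)$ (chosen only for illustration), forms $X_h=J\nabla h$ with $J=J_1$, and integrates the flow with a classical fourth-order Runge-Kutta scheme. For this $h$ the exact flow is the clockwise rotation, so $\Phi(\pi/2,(1,0))=(0,-1)$, and $h$ is constant along trajectories. The function names are hypothetical.

```python
import math

def h(z):
    """Model Hamiltonian h(x, v) = (x^2 + v^2) / 2 (an assumption of this sketch)."""
    x, v = z
    return 0.5 * (x * x + v * v)

def grad_h(z):
    x, v = z
    return (x, v)

def hamiltonian_field(grad):
    """X_h = J grad h for d = 1, with J = [[0, 1], [-1, 0]]."""
    def X(z):
        gx, gv = grad(z)
        return (gv, -gx)
    return X

def rk4_flow(X, z, t, steps=1000):
    """Approximate Phi(t, z) for the autonomous field X by classical RK4."""
    dt = t / steps
    x, v = z
    for _ in range(steps):
        k1 = X((x, v))
        k2 = X((x + 0.5 * dt * k1[0], v + 0.5 * dt * k1[1]))
        k3 = X((x + 0.5 * dt * k2[0], v + 0.5 * dt * k2[1]))
        k4 = X((x + dt * k3[0], v + dt * k3[1]))
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        v += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return (x, v)

X = hamiltonian_field(grad_h)
end = rk4_flow(X, (1.0, 0.0), math.pi / 2)
print(end, h(end))
```

Since $\mu_t=\delta_{\Phi(t,z)}$, transporting a finite sum of Dirac masses amounts to applying `rk4_flow` to each atom separately.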
In this work, we consider the infinite-dimensional version of (2.1)-(2.2), where $\delta_z$ is replaced by a measure $\mu\in\mathcal P_2(\mathbb R^d\times\mathbb R^d)$ and $X_h$ is replaced by the Hamiltonian vector field $X_H$ of a Hamiltonian $H:\mathcal P_2(\mathbb R^d\times\mathbb R^d)\to(-\infty,+\infty]$. When $d=1$, that vector field is defined to be the clockwise rotation by the angle $\pi/2$, on the tangent space at $\mu$ of $\mathcal P_2(\mathbb R^{2d})$, of the gradient of $H$.
3 The differentiable structure of the Wasserstein space $\mathcal P_2(\mathbb R^D)$
In this section we introduce the differentiable and Riemannian structure of $\mathcal P_2(\mathbb R^D)$ following essentially the approach developed in [4] (see also [11] [43], two seminal papers on this subject).
We recall first that $\big(\mathcal P_2(\mathbb R^D),W_2\big)$ is a complete and separable space, not locally compact. We refer to Proposition 7.1.5 and Remark 7.1.9 in [4] for more comments. However, bounded sets in $\mathcal P_2(\mathbb R^D)$ are (sequentially) relatively compact with respect to the so-called narrow convergence, i.e. weak convergence in the duality with $C_b(\mathbb R^D)$, the space of continuous and bounded functions in $\mathbb R^D$. Actually a sequence $\{\mu_n\}_{n=1}^\infty$ converges to $\mu$ in $\mathcal P_2(\mathbb R^D)$ if and only if $\mu_n$ narrowly converge to $\mu$ and $M_2(\mu_n)\to M_2(\mu)$ as $n\to+\infty$. The lack of compactness in $\mathcal P_2(\mathbb R^D)$ is precisely due to the fact that narrow convergence does not always imply convergence of second moments.
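A one-dimensional example shows how second moments can fail to converge. Consider the measures $\mu_n=(1-\frac1n)\delta_0+\frac1n\delta_{\sqrt n}$, an illustrative family that is not taken from the paper. Integrals of bounded continuous functions converge to their value at $0$, so $\mu_n\to\delta_0$ narrowly, while $M_2(\mu_n)=1$ for every $n$; since $W_2^2(\mu_n,\delta_0)=M_2(\mu_n)$, the sequence stays at distance $1$ from its narrow limit. The sketch below evaluates both quantities.

```python
import math

def mu_n(n):
    """Atoms and masses of (1 - 1/n) delta_0 + (1/n) delta_{sqrt(n)} (illustrative family)."""
    return [(0.0, 1.0 - 1.0 / n), (math.sqrt(n), 1.0 / n)]

def integrate(f, mu):
    """Integral of f against a finitely supported measure given as (atom, mass) pairs."""
    return sum(m * f(x) for x, m in mu)

def second_moment(mu):
    return integrate(lambda x: x * x, mu)

for n in (10, 1000, 10 ** 6):
    mu = mu_n(n)
    # cos is bounded and continuous with cos(0) = 1
    print(n, integrate(math.cos, mu), second_moment(mu))
```

The first printed column of integrals approaches $\cos 0=1$, while the second moment does not approach $M_2(\delta_0)=0$.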
To derive the differentiable structure from the metric structure, we start from the following fact, proved in Theorem 8.3.1 of [4]: if $\mu_t\in\mathcal P_2(\mathbb R^D)$ solve the continuity equation
$$\frac{d}{dt}\mu_t+\nabla\cdot(w_t\mu_t)=0\tag{3.1}$$
in the sense of distributions in $(a,b)\times\mathbb R^D$, for some time-dependent velocity field $w_t$ with $\|w_t\|_{L^2(\mu_t)}\in L^1(a,b)$, then
$$W_2(\mu_s,\mu_t)\le\int_s^t\|w_\tau\|_{L^2(\mu_\tau;\mathbb R^D)}\,d\tau\qquad\forall\,a\le s\le t\le b.\tag{3.2}$$
As a consequence we obtain that the map $t\mapsto\mu_t$ is absolutely continuous from $[a,b]$ to $\mathcal P_2(\mathbb R^D)$. Conversely, it was proved in the same theorem in [4] that for any absolutely continuous curve $t\mapsto\mu_t$, there is always a unique, up to negligible sets in time, velocity field $v_t$ for which the continuity equation holds and, asymptotically, equality holds in (3.2):
$$\lim_{h\to0}\frac1{|h|}W_2(\mu_{t+h},\mu_t)=\|v_t\|_{L^2(\mu_t)}\qquad\text{for a.e. } t.\tag{3.3}$$
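In Proposition 8.4.5 of [4], this minimality property of $v_t$ is proved to be equivalent to the fact that $v_t$ belongs to the $L^2(\mu_t;\mathbb R^D)$ closure of $\{\nabla\varphi:\varphi\in C_c^\infty(\mathbb R^D)\}$. Hence, we may view $v_t$ as the tangent velocity field to $\mu_t$ and define the tangent space to $\mathcal P_2(\mathbb R^D)$ at $\mu$ as follows:
$$T_\mu\mathcal P_2(\mathbb R^D)=\overline{\{\nabla\varphi:\varphi\in C_c^\infty(\mathbb R^D)\}}^{\,L^2(\mu;\mathbb R^D)}.\tag{3.4}$$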
Notice also that a simple duality argument gives (see Lemma 8.4.2 of [4])
$$\big[T_\mu\mathcal P_2(\mathbb R^D)\big]^\perp=\left\{w\in L^2(\mu;\mathbb R^D)\;:\;\nabla\cdot(w\mu)=0\right\}.\tag{3.5}$$
In the following we shall denote by $\pi_\mu:L^2(\mu;\mathbb R^D)\to T_\mu\mathcal P_2(\mathbb R^D)$ the canonical orthogonal projection.
Summing up, the previous results can be rephrased as follows:
Theorem 3.1 (Due to [4]). The class of absolutely continuous curves $\mu_t:[a,b]\to\mathcal P_2(\mathbb R^D)$ coincides with the class of solutions of the continuity equation for some velocity field $w_t$ with $\|w_t\|_{L^2(\mu_t;\mathbb R^D)}\in L^1(a,b)$.
For any absolutely continuous curve $\mu_t:[a,b]\to\mathcal P_2(\mathbb R^D)$ there exist $v_t\in L^2(\mu_t;\mathbb R^D)$ for which both the continuity equation and (3.3) hold. Given a solution of the continuity equation (3.1), equality holds in (3.2) if and only if $w_t\in T_{\mu_t}\mathcal P_2(\mathbb R^D)$ for a.e. $t$.
Finally, the map $t\mapsto v_t\in L^2(\mu_t;\mathbb R^D)$ is uniquely determined up to $\mathcal L^1$-negligible sets.
It is proven in (8.4.6) in [4] that the above tangent velocity vector $v_t$ is identified for almost every $t$ by the following property:
$$\lim_{h\to0}\Big(x,\frac{y-x}{h}\Big)_\#\gamma_h=(\mathrm{id},v_t)_\#\mu_t\qquad\text{in }\mathcal P_2(\mathbb R^D\times\mathbb R^D)\tag{3.6}$$
for any choice of $\gamma_h\in\Gamma_o(\mu_t,\mu_{t+h})$. Essentially this property says that optimal plans between $\mu_{t+h}$ and $\mu_t$ asymptotically behave as the plans induced by the transport maps $(\mathrm{id}+hv_t)_\#\mu_t$. In the case when $\mu_t\in\mathcal P_2^a(\mathbb R^D)$, where optimal plans are unique and induced by maps, (3.6) reduces to
$$\frac{t_h-\mathrm{id}}{h}\to v_t\qquad\text{in } L^2(\mu_t;\mathbb R^D)\ \text{as } h\to0,\tag{3.7}$$
where $t_h$ are the optimal transport maps between $\mu_t$ and $\mu_{t+h}$.
Several notions of differential can be defined, according to this differentiable structure. We state here the one most relevant for our purposes, motivated by the fact that we will be dealing with $\lambda$-convex Hamiltonians (for $\lambda$-concave ones, one should instead use a superdifferential).
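Definition 3.2 (Fréchet subdifferential). Let $H:\mathcal P_2(\mathbb R^D)\to(-\infty,+\infty]$ be a proper, lower semicontinuous function and let $\mu\in D(H)$. We say that $w\in L^2(\mu;\mathbb R^D)$ belongs to the Fréchet subdifferential $\partial H(\mu)$ if
$$H(\nu)\ge H(\mu)+\sup_{\gamma\in\Gamma_o(\mu,\nu)}\int_{\mathbb R^D\times\mathbb R^D}\langle w(x),y-x\rangle\,d\gamma(x,y)+o(W_2(\mu,\nu))$$
as $\nu\to\mu$.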
Definition 3.2 is a particular case of Definition 10.3.1 [4] (with the replacement of a sup with an inf, see also Proposition 4.2), where the elements of the subdifferential are plans, and so are measures in the product $\mathbb R^D\times\mathbb R^D$, instead of maps on $\mathbb R^D$. If $\gamma\in\Gamma_o(\mu,\nu)$, recall that its barycentric projection $\bar\gamma$ is characterized by $\bar\gamma\mu=(\pi_1)_\#(y\gamma)$ or, equivalently, by
$$\int_{\mathbb R^D}\varphi(x)\bar\gamma(x)\,d\mu(x)=\int_{\mathbb R^D\times\mathbb R^D}\varphi(x)\,y\,d\gamma(x,y)\qquad\forall\,\varphi\in C_b(\mathbb R^D).\tag{3.8}$$
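Hence, we can rephrase the condition $w\in\partial H(\mu)$ as
$$H(\nu)\ge H(\mu)+\sup_{\gamma\in\Gamma_o(\mu,\nu)}\int_{\mathbb R^D}\langle w(x),\bar\gamma(x)-x\rangle\,d\mu(x)+o(W_2(\mu,\nu)).\tag{3.9}$$
Notice that, whenever $\mu\in\mathcal P_2^a(\mathbb R^D)$, there is only one optimal plan, induced by $t_\mu^\nu$, and $\bar\gamma=t_\mu^\nu$.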
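It has been proved in Theorem 12.4.4 of [4] that
$$\bar\gamma-\mathrm{id}\in T_\mu\mathcal P_2(\mathbb R^D)\qquad\forall\,\mu,\nu\in\mathcal P_2(\mathbb R^D),\ \gamma\in\Gamma_o(\mu,\nu).\tag{3.10}$$
By (3.9) and (3.10) we infer that $w\in\partial H(\mu)$ iff $\pi_\mu w\in\partial H(\mu)$. Notice that $\partial H(\mu)$ is a closed and convex subset of $L^2(\mu;\mathbb R^D)$. Therefore, as it is customary in subdifferential analysis, we shall denote by $\nabla H(\mu)$ the element of $\partial H(\mu)$ of minimal $L^2(\mu;\mathbb R^D)$-norm. The previous comments show in particular that, by the minimality of its norm, $\nabla H(\mu)=\pi_\mu\nabla H(\mu)$ belongs to $\partial H(\mu)\cap T_\mu\mathcal P_2(\mathbb R^D)$.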
In the following lemma we state a well-known continuity property of optimal plans or maps. Its proof, which is by now standard in the Monge-Kantorovich theory, can be found for instance in Proposition 7.1.3 of [4]. We reproduce part of it for the reader's convenience.
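Lemma 3.3 (Continuity of optimal plans and maps). Assume that $\{\mu_n\}_{n=1}^\infty$, $\{\nu_n\}_{n=1}^\infty$ are bounded sequences in $\mathcal P_2(\mathbb R^D)$ narrowly converging respectively to $\mu$ and $\nu$. Assume that $\Gamma_o(\mu,\nu)$ contains a unique plan $\gamma$. Then
(i)
$$\lim_{n\to+\infty}\int_{\mathbb R^D\times\mathbb R^D}g(x,y)\,d\gamma_n(x,y)=\int_{\mathbb R^D\times\mathbb R^D}g\,d\gamma\tag{3.11}$$
for any choice of $\gamma_n\in\Gamma_o(\mu_n,\nu_n)$ and for any continuous function $g:\mathbb R^D\times\mathbb R^D\to\mathbb R$ satisfying
$$\lim_{|(x,y)|\to+\infty}\frac{|g|(x,y)}{|x|^2+|y|^2}=0.\tag{3.12}$$
(ii) Assume furthermore that $\mu_n,\mu\in\mathcal P_2^a(\mathbb R^D)$ and that there exists a closed ball $B$, of finite radius, containing the supports of $\nu_n$ and $\nu$. Then there exist Lipschitz, convex functions $u_n,u:\mathbb R^D\to\mathbb R\cup\{+\infty\}$ such that $\nabla u_n=t_{\mu_n}^{\nu_n}$ $\mu_n$-a.e. in $\mathbb R^D$ and $\nabla u=t_\mu^\nu$ $\mu$-a.e. in $\mathbb R^D$. In addition, there exists a subsequence $\{n_k\}_{k=1}^\infty$ of integers such that
$$\nabla u_{n_k}\to\nabla u\qquad\mathcal L^D\text{-a.e. in }\mathbb R^D.\tag{3.13}$$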
Proof. An argument which is by now standard and can be found in [30] characterizes the elements $\gamma_n\in\Gamma_o(\mu_n,\nu_n)$ as the elements of $\Gamma(\mu_n,\nu_n)$ whose supports, $\operatorname{supp}\gamma_n$, are cyclically monotone. More precisely, $\gamma_n\in\Gamma_o(\mu_n,\nu_n)$ if and only if $\gamma_n\in\Gamma(\mu_n,\nu_n)$ and there exist convex, lower semicontinuous functions $u_n:\mathbb R^D\to\mathbb R\cup\{+\infty\}$ such that
$$\operatorname{supp}\gamma_n\subset\partial u_n.\tag{3.14}$$
If $v_n=u_n^*$ is the Fenchel-Moreau transform of $u_n$ and $B$ is any closed set containing the support of $\nu_n$, then
$$u_n(x)=\sup_{y\in B}\big\{\langle x;y\rangle-v_n(y)\big\}\qquad\forall\,x\in\mathbb R^D.\tag{3.15}$$
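Using the fact that $\gamma_n\in\Gamma_o(\mu_n,\nu_n)$ and $\{\mu_n\}_{n=1}^\infty$, $\{\nu_n\}_{n=1}^\infty$ are bounded in $\mathcal P_2(\mathbb R^D)$, we obtain that
$$\sup_n\int_{\mathbb R^D\times\mathbb R^D}(|x|^2+|y|^2)\,d\gamma_n(x,y)=\sup_n\,M_2(\mu_n)+M_2(\nu_n) < +\infty.\tag{3.16}$$
By (3.16), $\{\gamma_n\}_{n=1}^\infty$ is precompact for the narrow topology. Assume $\{\gamma_{n_k}\}_{k=1}^\infty$ is a narrowly convergent subsequence and denote its limit by $\tilde\gamma$. Using again (3.16), it is clear that $\tilde\gamma\in\Gamma(\mu,\nu)$ and that (3.11) holds, with $\tilde\gamma$ in place of $\gamma$, if we substitute $\{\gamma_n\}_{n=1}^\infty$ by $\{\gamma_{n_k}\}_{k=1}^\infty$. By Proposition 7.1.3 of [4], every point in $\operatorname{supp}\tilde\gamma$ is a limit of points in $\operatorname{supp}\gamma_{n_k}$ and so $\operatorname{supp}\tilde\gamma$ is cyclically monotone. This implies $\tilde\gamma\in\Gamma_o(\mu,\nu)=\{\gamma\}$. Since the limit is independent of the subsequence $\{n_k\}_{k=1}^\infty$, we have proven that $\{\gamma_n\}_{n=1}^\infty$ narrowly converges to $\gamma$ and (3.11) holds. This proves (i).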
Let $\mathrm{id}$ be the identity map on $\mathbb R^D$ and assume now that $\mu_n,\mu\in\mathcal P_2^a(\mathbb R^D)$, so that
$$\Gamma_o(\mu_n,\nu_n)=\big\{(\mathrm{id}\times t_{\mu_n}^{\nu_n})_\#\mu_n\big\}\quad\text{and}\quad\Gamma_o(\mu,\nu)=\big\{(\mathrm{id}\times t_\mu^\nu)_\#\mu\big\}.\tag{3.17}$$
Since convex functions are differentiable $\mathcal L^D$-almost everywhere, (3.14) and the first equality in (3.17) imply that $t_{\mu_n}^{\nu_n}=\nabla u_n$ $\mu_n$-a.e. in $\mathbb R^D$. Let us furthermore assume that there exists a closed ball $B$, of finite radius, containing the supports of $\nu_n$ and $\nu$. Enlarging $B$ if necessary, we may assume without loss of generality that $B$ contains the origin and so, by (3.15), $u_n$ is Lipschitz with a Lipschitz constant bounded above by the radius of $B$. We may substitute $u_n$ by $u_n-u_n(0)$ without altering the validity of the above reasonings. Therefore, in the sequel, we may assume without loss of generality that $u_n(0)=0$. The Ascoli-Arzelà lemma ensures the existence of a subsequence $\{u_{n_k}\}_{k=1}^\infty$ which is locally uniformly convergent. Its limit $u$ is necessarily convex, with a Lipschitz constant bounded above by the diameter of $B$.
Now, let us show the convergence of the transport maps. Passing to the limit as $n\to+\infty$ in the subdifferential inequality
$$u_n(x')\ge u_n(x)+\langle\nabla u_n(x);x'-x\rangle$$
we immediately obtain that, at any differentiability point $x$ of all maps $u_n$, any limit point of $\{\nabla u_n(x)\}_{n=1}^\infty$ belongs to the subdifferential $\partial u(x)$. It follows that $\nabla u_n$ converge to $\nabla u$ wherever all gradients (including $\nabla u$) are defined, hence $\mathcal L^D$-a.e. in $\mathbb R^D$. In particular, recalling (3.14) and the fact that every point in $\operatorname{supp}\gamma$ is a limit of points in $\operatorname{supp}\gamma_{n_k}$, we conclude that $\operatorname{supp}\gamma\subset\partial u$. This, together with the second equality in (3.17), implies that $t_\mu^\nu=\nabla u$ $\mu$-almost everywhere on $\mathbb R^D$. QED.
4 Convex analysis on $\mathcal P_2(\mathbb R^D)$
Let $\mu_0,\mu_1\in\mathcal P_2(\mathbb R^D)$ and let $\gamma\in\Gamma_o(\mu_0,\mu_1)$ be an optimal transport plan. Let $\pi_1:\mathbb R^D\times\mathbb R^D\to\mathbb R^D:(z,w)\mapsto z$ and $\pi_2:\mathbb R^D\times\mathbb R^D\to\mathbb R^D:(z,w)\mapsto w$ be the first and second projections of $\mathbb R^D\times\mathbb R^D$ onto $\mathbb R^D$. As suggested in [38], the interpolation $(1-t)\pi_1+t\pi_2$ between maps can be used to interpolate between the measures $\mu_0$ and $\mu_1$ as follows:
$$\mu_t=\big((1-t)\pi_1+t\pi_2\big)_\#\gamma.\tag{4.1}$$
The proof of the well-known fact that $t\mapsto\mu_t$ is a geodesic in $\mathcal P_2(\mathbb R^D)$ of constant speed, i.e. $W_2(\mu_s,\mu_t)=|t-s|\,W_2(\mu_0,\mu_1)$ for all $s,t\in[0,1]$, can be found in Theorem 7.2.2 of [4]; furthermore, any constant speed geodesic has this representation for a suitable optimal $\gamma$. As it is customary in Riemannian geometry, the identification of constant speed geodesics with segments allows the introduction of various notions of convexity for functions (see Chapter 9 of [4] and [34]).
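The constant-speed property of (4.1) can be observed numerically in the simplest setting: two measures on the real line, each made of the same number of equally weighted atoms. In this one-dimensional case it is a standard fact that an optimal plan pairs the atoms in increasing order, so both $W_2$ and the interpolation reduce to operations on sorted lists. The helper names `w2_1d` and `geodesic` are assumptions of this sketch.

```python
import math

def w2_1d(xs, ys):
    """W_2 between uniform measures on the atoms xs and ys (same length, real line)."""
    xs, ys = sorted(xs), sorted(ys)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))

def geodesic(xs, ys, t):
    """Atoms of mu_t = ((1 - t) pi_1 + t pi_2)_# gamma for the monotone pairing gamma."""
    xs, ys = sorted(xs), sorted(ys)
    return [(1 - t) * a + t * b for a, b in zip(xs, ys)]

x0 = [0.0, 1.0, 5.0]
x1 = [2.0, 2.5, 3.0]
s, t = 0.25, 0.75
lhs = w2_1d(geodesic(x0, x1, s), geodesic(x0, x1, t))
rhs = abs(t - s) * w2_1d(x0, x1)
print(lhs, rhs)  # the two values agree up to rounding
```

The agreement reflects the identity $W_2(\mu_s,\mu_t)=|t-s|\,W_2(\mu_0,\mu_1)$ along the interpolation.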
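Definition 4.1 ($\lambda$-convexity). Let $H:\mathcal P_2(\mathbb R^D)\to(-\infty,+\infty]$ be proper and let $\lambda\in\mathbb R$. We say that $H$ is $\lambda$-convex if for every $\mu_0,\mu_1\in\mathcal P_2(\mathbb R^D)$ and every optimal transport plan $\gamma\in\Gamma_o(\mu_0,\mu_1)$ we have
$$H(\mu_t)\le(1-t)H(\mu_0)+tH(\mu_1)-\frac\lambda2\,t(1-t)\,W_2^2(\mu_0,\mu_1)\qquad\forall\,t\in[0,1].\tag{4.2}$$
Here $\mu_t=\big((1-t)\pi_1+t\pi_2\big)_\#\gamma$, where $\pi_1$ and $\pi_2$ are the above projections.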
For a real-valued map of a real variable, $\lambda$-convexity means that its second distributional derivative is larger than $\lambda\mathcal L^1$. In general, the inequality above is equivalent to saying that $t\mapsto H(\mu_t)$ is $\lambda W_2^2(\mu_0,\mu_1)$-convex. In particular, $0$-convexity corresponds to the notion of displacement convexity introduced in [38]. Finally, notice that this notion of convexity is slightly stronger than the one introduced in [4], where the inequality above is imposed only on some optimal transport plan.
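Proposition 4.2 (Characterization of subdifferentials of $\lambda$-convex functions). Let $H:\mathcal P_2(\mathbb R^D)\to(-\infty,+\infty]$ be lower semicontinuous and $\lambda$-convex for some $\lambda\in\mathbb R$ and let $\mu\in D(H)$. Then, any of the following two conditions is equivalent to $w\in\partial H(\mu)$:
(i)
$$H(\nu)\ge H(\mu)+\inf_{\gamma\in\Gamma_o(\mu,\nu)}\int_{\mathbb R^D}\langle w(x);\bar\gamma(x)-x\rangle\,d\mu(x)+o(W_2(\mu,\nu));\tag{4.3}$$
(ii) for all $\nu\in\mathcal P_2(\mathbb R^{2d})$ we have
$$H(\nu)\ge H(\mu)+\sup_{\gamma\in\Gamma_o(\mu,\nu)}\int_{\mathbb R^D}\langle w(x);\bar\gamma(x)-x\rangle\,d\mu(x)+\frac\lambda2W_2^2(\mu,\nu).\tag{4.4}$$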
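Proof. It is clear that $w\in\partial H(\mu)$ implies (i), and that (ii) implies $w\in\partial H(\mu)$. So, it remains to show that (i) implies (ii). To this aim, fix $\nu\in\mathcal P_2(\mathbb R^{2d})$, $\gamma\in\Gamma_o(\mu,\nu)$ and define the constant speed geodesic $\{\mu_t\}_{t\in[0,1]}$ between $\mu$ and $\nu$ as in (4.1). Then, we know that for $t < 1$ there is a unique optimal plan between $\mu$ and $\mu_t$, induced by $\gamma_t=\big(\pi_1,(1-t)\pi_1+t\pi_2\big)_\#\gamma$ (see Lemma 7.2.1 of [4]), so that (4.3) and the identity $\bar\gamma_t-\mathrm{id}=t(\bar\gamma-\mathrm{id})$ give
$$\liminf_{t\downarrow0}\frac{H(\mu_t)-H(\mu)}{t}\ge\int_{\mathbb R^D}\langle w(x);\bar\gamma(x)-x\rangle\,d\mu(x).$$
Then, by applying (4.2) we get
$$H(\nu)-H(\mu)\ge\int_{\mathbb R^D}\langle w(x);\bar\gamma(x)-x\rangle\,d\mu(x)+\frac\lambda2W_2^2(\mu,\nu).$$
QED.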
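For readers who want the last step of the proof spelled out, the following display is a sketch of how (4.2) is combined with the liminf inequality. It uses only (4.2) with $\mu_0=\mu$, $\mu_1=\nu$ and a division by $t>0$; the intermediate rearrangement is ours and is not displayed in the source.

```latex
% (4.2) with \mu_0 = \mu, \mu_1 = \nu, rearranged and divided by t \in (0,1):
\[
  \frac{H(\mu_t)-H(\mu)}{t}
  \;\le\; H(\nu)-H(\mu)-\frac{\lambda}{2}\,(1-t)\,W_2^2(\mu,\nu).
\]
% Letting t \downarrow 0 and using the liminf inequality obtained from (4.3):
\[
  \int_{\mathbb{R}^D}\langle w(x);\bar\gamma(x)-x\rangle\,d\mu(x)
  \;\le\; \liminf_{t\downarrow 0}\frac{H(\mu_t)-H(\mu)}{t}
  \;\le\; H(\nu)-H(\mu)-\frac{\lambda}{2}\,W_2^2(\mu,\nu).
\]
% Since \gamma \in \Gamma_o(\mu,\nu) was arbitrary, taking the supremum over
% \gamma yields (4.4).
```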
It is not difficult to show that the infimum in (i) and the supremum in (ii) are achieved. As shown in Chapter 10 of [4], the "inf" definition of subdifferential in (i) ensures the weak closure properties of the graph of the subdifferential. Again, in the case when $\mu\in\mathcal P_2^a(\mathbb R^D)$, the previous formula reduces to
$$H(\nu)\ge H(\mu)+\int_{\mathbb R^D}\langle w(x);t_\mu^\nu(x)-x\rangle\,d\mu+\frac\lambda2W_2^2(\mu,\nu)\qquad\forall\,\nu\in\mathcal P_2(\mathbb R^D).$$
The typical Hamiltonian we consider in this paper is the negative squared Wasserstein distance. Some of its properties, established in Proposition 9.3.12 and Theorem 10.4.12 of [4], are summarized in the following proposition.
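Proposition 4.3 (Convexity of the negative Wasserstein distance). Let $\nu\in\mathcal P_2(\mathbb R^D)$ and define
$$H(\mu)=-\frac12W_2^2(\mu,\nu)\qquad\forall\,\mu\in\mathcal P_2(\mathbb R^D).\tag{4.5}$$
Then $H$ is $(-1)$-convex. Furthermore, if $\mu\in\mathcal P_2(\mathbb R^D)$,
$$\partial H(\mu)\cap T_\mu\mathcal P_2(\mathbb R^D)=\{\bar\gamma-\mathrm{id}\;:\;\gamma\in\Gamma_o(\mu,\nu)\}\tag{4.6}$$
and therefore $\nabla H(\mu)$ is the minimizer in
$$\min\left\{\int_{\mathbb R^D}|\bar\gamma-\mathrm{id}|^2\,d\mu\;:\;\gamma\in\Gamma_o(\mu,\nu)\right\}.\tag{4.7}$$
Here $\bar\gamma$ is the barycentric projection of $\gamma$, as defined in (3.8). In particular,
$$\partial H(\mu)\cap T_\mu\mathcal P_2(\mathbb R^D)=\{t_\mu^\nu-\mathrm{id}\}\qquad\forall\,\mu\in\mathcal P_2^a(\mathbb R^D).\tag{4.8}$$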
Notice that $W_2^2(\cdot,\nu)$ is, on the other hand, trivially convex with respect to the conventional linear structure of $\mathcal P_2(\mathbb R^D)$, as $t\gamma_1+(1-t)\gamma_2\in\Gamma(t\mu_1+(1-t)\mu_2,\nu)$ whenever $\gamma_1\in\Gamma(\mu_1,\nu)$ and $\gamma_2\in\Gamma(\mu_2,\nu)$. Also, as shown in Example 9.1.5 of [4], for each $\lambda\in\mathbb R$, $W_2^2(\cdot,\nu)$ fails to be $\lambda$-convex along geodesics.
5 Basic properties of solutions of Hamiltonian ODEs
We now have all the necessary ingredients for the definition of Hamiltonian flow in $\mathcal P_2(\mathbb R^{2d})$. In order to cover more examples (see Section 8) we consider also the case when the space is $\mathcal P_2(\mathbb R^D)$ and $J:\mathbb R^D\to\mathbb R^D$ is a linear map satisfying $Jv\perp v$ for all $v\in\mathbb R^D$ (this framework includes the canonical case $D=2d$ and $J=J_d$).
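Definition 5.1. Let $H:\mathcal P_2(\mathbb R^D)\to(-\infty,+\infty]$ be a proper, lower semicontinuous function. We say that an absolutely continuous curve $\mu_t:[0,T]\to D(H)$ is a Hamiltonian ODE relative to $H$, starting from $\bar\mu\in\mathcal P_2(\mathbb R^D)$, if there exist $v_t\in L^2(\mu_t;\mathbb R^D)$ with $\|v_t\|_{L^2(\mu_t)}\in L^1(0,T)$, such that
$$\begin{cases}\dfrac{d}{dt}\mu_t+\nabla\cdot(Jv_t\mu_t)=0,\quad\mu_0=\bar\mu, & t\in(0,T)\\[4pt] v_t\in T_{\mu_t}\mathcal P_2(\mathbb R^D)\cap\partial H(\mu_t) & \text{for a.e. } t.\end{cases}\tag{5.1}$$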
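The terminology "Hamiltonian ODE" is fully justified in the case $D=2d$, $J=J_d$ in a work in progress by Gangbo and Pacini [31]. There, they prove that $J_d$ induces a nondegenerate bilinear skew-symmetric closed 2-form as follows. Denoting by $\bar T_\mu\mathcal P_2(\mathbb R^{2d})$ the subbundle defined by
$$\bar T_\mu\mathcal P_2(\mathbb R^{2d}):=\left\{\pi_\mu(J_dv)\;:\;v\in T_\mu\mathcal P_2(\mathbb R^{2d})\right\},$$
they define $\Omega_\mu:\bar T_\mu\mathcal P_2(\mathbb R^{2d})\times\bar T_\mu\mathcal P_2(\mathbb R^{2d})\to\mathbb R$ as follows: if $\bar v_1=\pi_\mu(J_dv_1)$, $\bar v_2=\pi_\mu(J_dv_2)\in\bar T_\mu\mathcal P_2(\mathbb R^{2d})$, with $v_1,v_2\in T_\mu\mathcal P_2(\mathbb R^{2d})$, they set
$$\Omega_\mu(\bar v_1,\bar v_2)=\int_{\mathbb R^{2d}}\langle J_dv_1;v_2\rangle\,d\mu\qquad\forall\,\mu\in\mathcal P_2(\mathbb R^{2d}).$$
It is easy to check that $\Omega_\mu$ is well defined (i.e. it does not depend on the choice of the vectors $v_i$ such that $\bar v_i=\pi_\mu(Jv_i)$), skew-symmetric and nondegenerate.
For any $\mu\in\mathcal P_2(\mathbb R^{2d})$ where $\nabla H(\mu)$ exists, the Hamiltonian vector field $X_H(\mu)\in\bar T_\mu\mathcal P_2(\mathbb R^{2d})$ is classically defined by the identity
$$\Omega_\mu(X_H(\mu),\bar v)=\int_{\mathbb R^{2d}}\langle\nabla H(\mu);\bar v\rangle\,d\mu=dH(\bar v)\qquad\forall\,\bar v\in\bar T_\mu\mathcal P_2(\mathbb R^{2d}).$$
In other words, $\Omega_\mu(X_H(\mu),\cdot)=dH(\cdot)$. The system (5.1) with $v_t=\nabla H(\mu_t)$ is then easily seen to be equivalent to the condition that the tangent velocity vector $\pi_{\mu_t}(J_dv_t)$ to $\mu_t$ is $X_H(\mu_t)$ or equivalently, $\dot\mu_t=X_H(\mu_t)$. More generally, one could define a Hamiltonian subdifferential by considering the vectors $\pi_\mu(J_dv)$ with $v\in\partial H(\mu)\cap T_\mu\mathcal P_2(\mathbb R^{2d})$.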
The integrability condition $\|v_t\|_{L^2(\mu_t)}\in L^1(0,T)$ ensures that the continuity equation makes sense in the sense of distributions; furthermore (see for instance Lemma 8.1.2 in [4]), possibly redefining $\mu_t$ in a negligible set of times, we can assume that $t\mapsto\mu_t$ is narrowly continuous in $[0,T]$. We shall always tacitly make this continuity assumption in the sequel.
In the construction of solutions to Hamiltonian ODEs by approximation, one finds that the subdifferential inclusion $v_t\in\partial H(\mu_t)$ (and therefore the continuity equation with velocity field $Jv_t$) has good stability properties (see for instance Lemma 10.1.3 and Lemma 10.3.8 of [4], or Remark 6.5). The tangency condition, on the other hand, is not stable in general; however this condition is crucial to show that $t\mapsto H(\mu_t)$ is constant for Hamiltonian ODEs. In the proof of this fact we follow the Wasserstein chain rule in §10.1.2 and Proposition 10.3.18 of [4], whose proof (based on a subdifferentiability argument) we reproduce for the reader's convenience.
Theorem 5.2. Let $H$ be as in Definition 5.1, and let $\mu_t$ be a Hamiltonian ODE, with $\|v_t\|_{L^2(\mu_t)}\in L^\infty(0,T)$. If $H$ is $\lambda$-convex for some $\lambda\in\mathbb R$ then $t\mapsto H(\mu_t)$ is constant.
Proof. We first prove that $t\mapsto H(\mu_t)$ is a Lipschitz function. Let $C$ be the $L^\infty$ norm of $\|v_t\|_{L^2(\mu_t)}$ and notice that (3.2) gives that the Lipschitz constant of $t\mapsto\mu_t$ is less than $C$. We denote by $w_t$ the tangent velocity field to $\mu_t$ and notice that, as $Jv_t$ is an admissible velocity field for $\mu_t$, we have that $w_t-Jv_t$ is orthogonal to $T_{\mu_t}\mathcal P_2(\mathbb R^D)$ for a.e. $t$.
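Let now $D\subset(0,T)$ be the set of points where both $v_t\in\partial H(\mu_t)$ and $\|v_t\|_{L^2(\mu_t)}\le C$ hold. Let $t\in D$, $s\in[0,T]$ and notice that by Proposition 4.2
$$\begin{aligned}H(\mu_t)-H(\mu_s)&\le-\inf_{\gamma\in\Gamma_o(\mu_t,\mu_s)}\int_{\mathbb R^D\times\mathbb R^D}\langle v_t(x);y-x\rangle\,d\gamma-\frac\lambda2W_2^2(\mu_t,\mu_s)\\&\le C^2|t-s|+\frac{C^2\lambda^-}{2}(t-s)^2\le C^2\Big(1+\frac{T\lambda^-}{2}\Big)|t-s|.\end{aligned}$$
As $H$ is lower semicontinuous, by approximation the same inequality holds when $s,t\in[0,T]$. Reversing the roles of $s$ and $t$ we obtain that the Lipschitz constant of $t\mapsto H(\mu_t)$ is less than $C^2(1+T\lambda^-/2)$.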
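It remains to show that the derivative of $t\mapsto H(\mu_t)$ is equal to $0$. Fix $t\in(0,T)$ where this derivative exists, (3.6) holds, $v_t\in\partial H(\mu_t)\cap T_{\mu_t}\mathcal P_2(\mathbb R^D)$ and $w_t-Jv_t$ is orthogonal to $T_{\mu_t}\mathcal P_2(\mathbb R^D)$. We have then the existence of optimal plans $\gamma_h\in\Gamma_o(\mu_t,\mu_{t+h})$ satisfying
$$\frac{C^2\lambda^-}{2}h^2+H(\mu_{t+h})-H(\mu_t)\ge\int_{\mathbb R^D\times\mathbb R^D}\langle v_t(x);y-x\rangle\,d\gamma_h.$$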
HAMILTONIAN ODES IN THE SPACE OF PROBABILITY MEASURES 15
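Next, we define $\eta_h = (x, (y-x)/h)_\# \gamma_h$ to obtain
$$H(\mu_{t+h}) - H(\mu_t) \ge h \int_{\mathbb{R}^D \times \mathbb{R}^D} \langle v_t(x); y\rangle\, d\eta_h + o(h)$$
and use (3.6) to obtain (see footnote 1)
$$\begin{aligned}
H(\mu_{t+h}) - H(\mu_t) &\ge h \int_{\mathbb{R}^D \times \mathbb{R}^D} \langle v_t(x); y\rangle\, d(\mathrm{id}, w_t)_\#\mu_t + o(h) \\
&= h \int_{\mathbb{R}^{2d}} \langle v_t(x); w_t(x)\rangle\, d\mu_t + o(h) \\
&= h \int_{\mathbb{R}^{2d}} \langle v_t(x); Jv_t(x)\rangle\, d\mu_t + o(h) = o(h).
\end{aligned}$$
Since $s \mapsto H(\mu_s)$ is differentiable at $s = t$, this can happen only if the derivative is 0. QED.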
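The last step of the proof rests on the pointwise identity $\langle v; Jv\rangle = 0$, which holds because $J$ is antisymmetric. The following minimal sketch checks this numerically; it assumes the block form $J(q,p) = (p,-q)$ for the canonical symplectic matrix, and the helper names `apply_J` and `dot` are illustrative only.

```python
def apply_J(v):
    """Apply J = [[0, I], [-I, 0]] to v = (q, p) in R^{2d}, i.e. J(q, p) = (p, -q).

    The block convention is an assumption made for this illustration.
    """
    d = len(v) // 2
    q, p = v[:d], v[d:]
    return p + [-c for c in q]


def dot(a, b):
    """Euclidean inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


v = [1.5, -2.0, 0.25, 4.0]      # a vector in R^4, so d = 2
Jv = apply_J(v)
pairing = dot(v, Jv)            # <v; Jv>, expected to vanish
JJv = apply_J(Jv)               # J applied twice, expected to equal -v
print(Jv, pairing, JJv)
```

The same cancellation, integrated against $\mu_t$, is what makes the first-order term in the expansion of $H(\mu_{t+h}) - H(\mu_t)$ disappear.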
6 Existence of Hamiltonian flows: regular initial data
Before stating our main existence theorem, we state a technical lemma concerning the approximation of tangent vectors by smooth gradients.
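Lemma 6.1. Let $\mu = \rho\mathcal{L}^D \in \mathcal{P}_2(\mathbb{R}^D)$ be satisfying $\rho \ge m_r > 0$ $\mathcal{L}^D$-a.e. on $B_r$ for any $r > 0$. If $C > 0$, $v \in T_\mu\mathcal{P}_2(\mathbb{R}^D)$ and
$$|v(z)| \le C(1 + |z|) \quad \text{for } \mu\text{-almost every } z \in \mathbb{R}^D \tag{6.1}$$
then there exists a sequence $\{\varphi_n\}_{n=1}^\infty \subset C_c^\infty(\mathbb{R}^D)$ such that $|\nabla\varphi_n(z)| \le C(2 + |z|)$ for all $z \in \mathbb{R}^D$ and
$$\lim_{n \to +\infty} \|v - \nabla\varphi_n\|_{L^2(\mu;\mathbb{R}^D)} = 0.$$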
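Proof. Let $\{\psi_n\}_{n=1}^\infty \subset C_c^\infty(\mathbb{R}^D)$ be such that $\|v - \nabla\psi_n\|_{L^2(\mu)} \to 0$ as $n \to +\infty$. For all $r > 0$ we have
$$\limsup_{n \to +\infty} \|v - \nabla\psi_n\|^2_{L^2(B_r,\mathcal{L}^D;\mathbb{R}^D)} \le \frac{1}{m_r} \limsup_{n \to +\infty} \|v - \nabla\psi_n\|^2_{L^2(\mu)} = 0.$$
This proves that $v \in L^2_{loc}(\mathbb{R}^{2d},\mathcal{L}^{2d})$ and that $\operatorname{curl} v = 0$. Let $l_1 \in C_c^\infty$ be a nonnegative probability density whose support is contained in the unit ball of $\mathbb{R}^{2d}$ and set
$$v_h = l_h * v, \quad \text{with } l_h(z) = \frac{1}{h^{2d}}\, l_1\Bigl(\frac{z}{h}\Bigr).$$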
Footnote 1 (to the proof of Theorem 5.2): Even though the test function $(x,y) \mapsto \langle v_t(x); y\rangle$ is possibly discontinuous and unbounded, one can use the boundedness of 2-moments of the rescaled plans $\eta_h$ and the fact that their first marginal does not depend on $h$ to pass to the limit, see for instance §5.1.1 in [4].
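Clearly, $v_h \in C^\infty(\mathbb{R}^{2d},\mathbb{R}^{2d})$ and $\operatorname{curl} v_h = 0$. Hence, there exist $A_h \in C^\infty(\mathbb{R}^{2d})$ such that $v_h = \nabla A_h$ and $A_h(0) = 0$. Thanks to Jensen's inequality, (6.1) implies that
$$\begin{aligned}
|v_h(z)| &= \Bigl|\int_{\mathbb{R}^{2d}} l_h(w)\, v(z-w)\, dw\Bigr| \le C \int_{\mathbb{R}^{2d}} l_h(w)(1 + |z-w|)\, dw \\
&\le C(1 + |z|) + C \int_{\mathbb{R}^{2d}} l_h(w)|w|\, dw = C(1 + |z|) + hC \int_{\mathbb{R}^{2d}} l_1(w')|w'|\, dw' \\
&\le C(1 + |z|) + hC \int_{B_1(0)} l_1(w')\, dw' \le C(2 + |z|),
\end{aligned} \tag{6.2}$$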
for $h \le 1$. Since $\{v_h\}_{h>0}$ converges $\mathcal{L}^{2d}$-almost everywhere to $v$, the uniform bound in (6.2) and the fact that $\mu \in \mathcal{P}_2(\mathbb{R}^D)$ imply, by the dominated convergence theorem,
$$\lim_{h \to 0} \|v - \nabla A_h\|^2_{L^2(\mu;\mathbb{R}^D)} = 0. \tag{6.3}$$
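Define
$$B_h^r(z) = \begin{cases} A_h(z) & \text{for } |z| \le r \\ 0 & \text{for } |z| \ge 2r. \end{cases} \tag{6.4}$$
Note that $B_h^r$ is a $C(2+r)$-Lipschitz function and so it admits an extension to $\mathbb{R}^D$, that we still denote by $B_h^r$, which is $C(2+r)$-Lipschitz. We use (6.1), (6.2) and the fact that
$$|\nabla B_h^r(z)| \le C(2 + r) \le C(2 + |z|) \quad \text{on } B_r^c(0) \tag{6.5}$$
to conclude that for all $h \le 1$
$$\begin{aligned}
\int_{\mathbb{R}^{2d}} |v - \nabla B_h^r|^2\, d\mu &= \int_{B_r(0)} |v - \nabla A_h|^2\, d\mu + \int_{B_r^c(0)} |v - \nabla B_h^r|^2\, d\mu \\
&\le \int_{\mathbb{R}^{2d}} |v - \nabla A_h|^2\, d\mu + 4C^2 \int_{B_r^c(0)} (2 + |z|)^2\, d\mu.
\end{aligned} \tag{6.6}$$
We combine (6.3) and (6.6) to conclude that
$$\lim_{h,\,1/r \to 0} \|v - \nabla B_h^r\|^2_{L^2(\mu;\mathbb{R}^D)} = 0. \tag{6.7}$$
This, together with (6.2) and (6.5), yields the lemma. QED.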
The following lemma provides a discrete solution of the Hamiltonian ODE in a small time interval, whose iteration will lead to a discrete solution. To make the iteration possible, one has to show that the flow preserves in some sense the bounds on the initial datum: this is possible thanks to the fact that the flow is incompressible.
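Lemma 6.2. Let $h > 0$, let $\mu = \rho\mathcal{L}^D \in \mathcal{P}_2^a(\mathbb{R}^D)$ be satisfying
$$\rho \ge m_r > 0 \quad \mathcal{L}^D\text{-a.e. on } B_r, \text{ for any } r > 0 \tag{6.8}$$
and let $v \in T_\mu\mathcal{P}_2(\mathbb{R}^D)$ be satisfying (6.1), with $e^{Ch} \le 2$. Then there exists a family of measures $\mu_t = \rho_t\mathcal{L}^D$, $t \in [0,h]$, satisfying
(a) $\int_{\mathbb{R}^D} S(\rho_t)\, dz \le \int_{\mathbb{R}^D} S(\rho)\, dz$ for any convex function $S : [0,+\infty) \to [0,+\infty)$;
(b) $t \mapsto \mu_t \in \mathcal{P}_2(\mathbb{R}^D)$ is absolutely continuous, $\mu_0 = \mu$ and the continuity equation
$$\frac{d}{dt}\mu_t + \nabla\cdot(Jv\,\mu_t) = 0, \quad (t,z) \in (0,h) \times \mathbb{R}^D \tag{6.9}$$
holds;
(c) $\rho_t \ge m_{r'}$ $\mathcal{L}^D$-a.e. on $B_r$, with $r' = e^{Ch} r + 2(e^{Ch} - 1)$.
Finally, we have also that $t \mapsto \mu_t$ is Lipschitz continuous, with Lipschitz constant less than $L_o = C\sqrt{24(1 + M_2(\mu))}$ and, in particular,
$$W_2(\mu_t,\mu) \le hL_o \quad \forall t \in [0,h]. \tag{6.10}$$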
Remark 6.3. Assumption (6.8) is used twice. First, it is used to conclude that since $v$ is defined $\mu$-almost everywhere, then it is defined $\mathcal{L}^D$-almost everywhere, hence $\mu_t$-almost everywhere, if $\mu_t \ll \mathcal{L}^D$. More importantly, it is used to apply Lemma 6.1, to treat $v$ as a gradient and to obtain that $Jv$ is divergence free with respect to $\mathcal{L}^D$. This leads to the conclusion that the flow $\Phi(t,\cdot)$ associated to $Jv$ preserves $\mathcal{L}^D$ for each $t$ fixed.
Proof of Lemma 6.2. We assume first that $v = \nabla\varphi \in C_c^\infty(\mathbb{R}^D;\mathbb{R}^D)$ and that the weaker condition $|v(z)| \le C(2 + |z|)$ is fulfilled. Under this assumption the autonomous vector field $Jv$ is smooth and divergence-free, so the flow $\Phi : [0,h] \times \mathbb{R}^D \to \mathbb{R}^D$ associated to $Jv$ is smooth and measure-preserving. In this case we simply define $\mu_t = \Phi(t,\cdot)_\#\mu$, so that the continuity equation (6.9) is satisfied. The measure preserving property gives that $\mu_t = \rho_t\mathcal{L}^D$, with
$$\rho_t \circ \Phi(t,\cdot) = \rho. \tag{6.11}$$
Notice that (a) (with an equality, and even for nonconvex $S$) follows immediately by (6.11), and (c) as well, provided we show that $\Phi(t,\cdot)^{-1}(B_r) \subset B_{r'}$. To show the latter inclusion, notice that $\Psi(t,y) = \Phi(t,\cdot)^{-1}(y)$ is the flow associated to $-Jv$, hence
$$\frac{d}{dt}|\Psi(t,y)| \le |Jv|(\Psi(t,y)) \le C(2 + |\Psi(t,y)|).$$
By integrating this differential inequality we immediately obtain that
$$2 + |\Psi(t,y)| \le e^{Ct}(2 + |y|).$$
Hence, $|y| < r$ implies $|\Psi(t,y)| < r'$ for $t \in [0,h]$. An analogous argument gives $2 + |\Phi(t,z)| \le e^{Ct}(2 + |z|)$, hence when $e^{Ch} < 2$ we obtain
$$|\Phi(t,z)| \le 2(|z| + 1).$$
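Using this inequality we can estimate
$$\begin{aligned}
\int_{\mathbb{R}^D} |Jv|^2\, d\mu_t &\le 2C^2 \int_{\mathbb{R}^D} (4 + |y|^2)\, d\mu_t = 8C^2 + 2C^2 \int_{\mathbb{R}^D} |\Phi(t,z)|^2\, d\mu \\
&\le 8C^2 + 16C^2 \int_{\mathbb{R}^D} (1 + |z|^2)\, d\mu = 24C^2 + 16C^2 M_2(\mu) \le L_o^2.
\end{aligned}$$
Using this estimate in conjunction with (3.2) and (6.9) yields that $t \mapsto \mu_t$ is $L_o$-Lipschitz.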
In the general case we consider a sequence $v_n = \nabla\varphi_n$ with all properties stated in Lemma 6.1. As $\rho > 0$ $\mathcal{L}^D$-a.e., we can also assume with no loss of generality that $v_n \to v$ $\mathcal{L}^D$-a.e. in $\mathbb{R}^{2d}$. Let $\mu_t^n$ be the measures built according to the previous construction relative to $v_n$ and notice that $t \mapsto \mu_t^n$ are equi-bounded in $\mathcal{P}_2(\mathbb{R}^D)$, and $L_o$-Lipschitz continuous. Furthermore, $\mu_t^n = \rho_t^n\mathcal{L}^D$ with $\rho_t^n$ locally uniformly bounded from below. Hence, we may assume with no loss of generality that
$$\mu_t^n \to \mu_t \quad \text{narrowly for any } t \in [0,h].$$
By the lower semicontinuity of moments we get $\mu_t \in \mathcal{P}_2(\mathbb{R}^D)$ for any $t$, and the lower semicontinuity of the Wasserstein distance (see for instance Proposition 7.1.3 in [4]) gives that the Lipschitz bound and the distance bound (6.10) are preserved in the limit. Also the inequality $\int S(\rho_t^n)\, dz \le \int S(\rho)\, dz$ with $S$ convex and the local lower bound in (c) are easily seen to be stable under weak convergence, and imply (choosing $S = \bar S$ convex, growing faster than linearly at infinity, such that $\int \bar S(\rho)\, dz < +\infty$) that $\mu_t = \rho_t\mathcal{L}^D \in \mathcal{P}_2^a(\mathbb{R}^D)$ with $\rho_t \ge m_{r'}$ $\mathcal{L}^D$-a.e. on $B_r$ for any $r > 0$.
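It remains to show the validity of the continuity equation in (b). To this aim, it suffices to show that, for $t$ fixed, $Jv_n\mu_t^n$ converge in the sense of distributions to $Jv\mu_t$. As $\bar S$ grows faster than linearly at infinity, we obtain from the inequality $\int \bar S(\rho_t^n)\, dz \le \int \bar S(\rho)\, dz$ that $\{\rho_t^n\}$ is equi-integrable (see for instance Proposition 1.27 of [3]). Hence for any $\varepsilon > 0$ we can find $\delta > 0$ such that
$$\mathcal{L}^D(B) < \delta \implies \int_B \rho_t\, dz + \sup_n \int_B \rho_t^n\, dz < \varepsilon.$$
We fix $r > 0$ and choose as $B \subset B_r$ an open set given by Egorov theorem, so that $v_n \to v$ uniformly on $B_r \setminus B$; let also $v' : \mathbb{R}^{2d} \to \mathbb{R}^{2d}$ be a continuous function coinciding with $v$ on $B_r \setminus B$, with $|v'| \le C(2 + r)$. For any $\varphi \in C_c(B_r)$ we have then
$$\int_{\mathbb{R}^D} \varphi Jv_n \rho_t^n\, dz = \int_{\mathbb{R}^D} \varphi (Jv_n - Jv') \rho_t^n\, dz + \int_{\mathbb{R}^D} \varphi Jv \rho_t\, dz + \int_{\mathbb{R}^D} \varphi (Jv' - Jv) \rho_t\, dz + \int_{\mathbb{R}^D} \varphi Jv' (\rho_t^n - \rho_t)\, dz,$$
so that
$$\limsup_{n \to +\infty} \Bigl|\int_{\mathbb{R}^D} \varphi Jv_n \rho_t^n\, dz - \int_{\mathbb{R}^D} \varphi Jv \rho_t\, dz\Bigr| \le 2C\varepsilon \sup|\varphi|\,(2 + r).$$
As $\varepsilon$ is arbitrary, this proves the weak convergence. QED.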
Remark 6.4 (Stability of upper bounds). By the same argument one can show that if $\rho \le M_r$ $\mathcal{L}^D$-a.e. on $B_r$ for any $r > 0$, then $\rho_t \le M_{r'}$ $\mathcal{L}^D$-a.e. on $B_r$ with $r' = e^{Ch} r + 2(e^{Ch} - 1)$.
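The main result of this section is concerned with Hamiltonians $H$ satisfying the following properties:
(H1) There exist constants $C_o \in (0,+\infty)$, $R_o \in (0,+\infty]$ such that for all $\mu \in \mathcal{P}_2^a(\mathbb{R}^D)$ with $W_2(\mu,\bar\mu) < R_o$ we have $\mu \in D(H)$, $\partial H(\mu) \ne \emptyset$ and $w = \nabla H(\mu)$ satisfies $|w(z)| \le C_o(1 + |z|)$ for $\mu$-almost every $z \in \mathbb{R}^D$.
(H2) If $\mu = \rho\mathcal{L}^D$, $\mu_n = \rho_n\mathcal{L}^D \in \mathcal{P}_2^a(\mathbb{R}^D)$, $\sup_n W_2(\mu_n,\bar\mu) < R_o$ and $\mu_n \to \mu$ narrowly, then there exist a subsequence $n(k)$ and functions $w_k, w : \mathbb{R}^D \to \mathbb{R}^D$ such that $w_k = \nabla H(\mu_{n(k)})$ $\mu_{n(k)}$-a.e., $w = \nabla H(\mu)$ $\mu$-a.e. and $w_k \to w$ $\mathcal{L}^D$-a.e. in $\mathbb{R}^D$ as $k \to +\infty$.
To ensure the constancy of $H$ along the solutions of the Hamiltonian system we consider also:
(H3) $H : \mathcal{P}_2(\mathbb{R}^D) \to (-\infty,+\infty]$ is proper, lower semicontinuous and $\lambda$-convex for some $\lambda \in \mathbb{R}$.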
Recalling that $\mathcal{P}_2^a(\mathbb{R}^D)$ is dense in $\mathcal{P}_2(\mathbb{R}^D)$ it would not be difficult to show, by the same argument used at the beginning of the proof of Theorem 5.2, that (H3) and (H1) imply that $H$ is Lipschitz continuous on the ball $\{\mu \in \mathcal{P}_2(\mathbb{R}^D) : W_2(\mu,\bar\mu) \le R_o\}$. Assumption (H2), instead, is a kind of $C^1$-regularity assumption on $H$. Thinking of the finite-dimensional theory (for instance of Peano's existence theorems for ODEs with a continuous velocity field), some assumption of this type seems to be necessary in order to get existence. In the following remark we discuss, instead, existence in the flat infinite-dimensional case and uniqueness in the finite-dimensional case.
Remark 6.5. Assume that we are given a convex (or $\lambda$-convex for some $\lambda \in \mathbb{R}$) Lipschitz function $H : \mathbb{R}^{2d} \to \mathbb{R}$. Then, $\partial H(x)$ is not empty for all $x \in \mathbb{R}^{2d}$ and we may define solutions of the Hamiltonian ODE as those absolutely continuous maps $x : [0,+\infty) \to \mathbb{R}^{2d}$ satisfying $-J_d\dot x(t) \in \partial H(x(t))$ for a.e. $t \in [0,+\infty)$. The same subdifferentiability argument used in the proof of Theorem 5.2 then shows that $t \mapsto H(x(t))$ is constant along Hamiltonian flows. Existence of Hamiltonian flows can be achieved by the following discrete scheme: fix a time parameter $h > 0$ and an initial datum $x_0 \in \mathbb{R}^{2d}$. Then, choose $p_0 \in \partial H(x_0)$ and set $x_h(t) = x_0 + J_d p_0 t$ for $t \in [0,h]$, choose $p_1 \in \partial H(x_h(h))$ and set $x_h(t) = x_1 + J_d p_1 (t - h)$ for $t \in [h,2h]$ (with $x_1 = x_h(h)$), and so on. In this way $x_h(t)$ solves the "delayed" Hamiltonian equation
$$-J_d\dot x_h(t) \in \partial H\Bigl(x_h\Bigl(h\Bigl[\frac{t}{h}\Bigr]\Bigr)\Bigr) \quad \text{for a.e. } t \ge 0. \tag{6.12}$$
Using a compactness and equi-continuity argument we can find a sequence $(h_i) \downarrow 0$ and a Lipschitz map $x : [0,\infty) \to \mathbb{R}^{2d}$ such that $x_{h_i}(t)$ converge to $x(t)$ as $i \to \infty$ for any $t \ge 0$ and $\dot x_{h_i}$ weakly converge in $L^2_{loc}([0,\infty);\mathbb{R}^{2d})$ to $\dot x$.
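In order to show that $-J_d\dot x \in \partial H(x)$ a.e., we use an integral version of the discrete subdifferential inclusion, namely
$$H(y) \ge \int_0^\infty H\Bigl(x_{h_i}\Bigl(h_i\Bigl[\frac{t}{h_i}\Bigr]\Bigr)\Bigr)\varphi(t)\, dt + \int_0^\infty \Bigl\langle y - x_{h_i}\Bigl(h_i\Bigl[\frac{t}{h_i}\Bigr]\Bigr), -J_d\dot x_{h_i}(t)\Bigr\rangle \varphi(t)\, dt,$$
with $\varphi(t)$ nonnegative, with compact support and satisfying $\int\varphi\, dt = 1$, and pass to the limit as $i \to \infty$ to find
$$H(y) \ge \int_0^\infty H(x(t))\varphi(t)\, dt + \int_0^\infty \langle y - x(t), -J_d\dot x(t)\rangle \varphi(t)\, dt.$$
Choosing properly a family $\varphi_i$ of approximations of $\delta_t$, this yields
$$H(y) \ge H(x(t)) + \langle y - x(t), -J_d\dot x(t)\rangle$$
at any Lebesgue point $t$ of $\dot x$. This proves existence of Hamiltonian flows. We also refer the reader to a work in progress by Ghoussoub and Moameni [32] on related questions.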
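The discrete scheme of this remark can be sketched in a few lines. The example below is only an illustration under stated assumptions: it takes the smooth convex Hamiltonian $H(x) = |x|^2/2$ on $\mathbb{R}^2$ (so that $\partial H(x) = \{x\}$), uses the block convention $J(q,p) = (p,-q)$, and the function names are hypothetical. For this choice each step multiplies $|x|^2$ by exactly $1 + h^2$, so the relative drift of $H$ over a fixed time horizon shrinks as $h \to 0$, consistent with conservation of $H$ in the limit.

```python
def apply_J(x):
    """J(q, p) = (p, -q) on R^{2d}; the block convention is an assumption."""
    d = len(x) // 2
    return x[d:] + [-c for c in x[:d]]


def grad_H(x):
    """Unique element of the subdifferential of H(x) = |x|^2 / 2."""
    return list(x)


def H(x):
    return 0.5 * sum(c * c for c in x)


def delayed_scheme(x0, h, n_steps):
    """Piecewise-affine scheme: on [ih, (i+1)h] move along J p_i with p_i in dH(x_i).

    Only the grid values x_i = x_h(ih) are computed here.
    """
    x = list(x0)
    for _ in range(n_steps):
        velocity = apply_J(grad_H(x))          # frozen at the left endpoint
        x = [xi + h * vi for xi, vi in zip(x, velocity)]
    return x


x0 = [1.0, 0.0]
drift = {}
for n in (10, 100, 1000):                      # time horizon T = 1, h = 1/n
    xT = delayed_scheme(x0, 1.0 / n, n)
    drift[n] = abs(H(xT) / H(x0) - 1.0)
print(drift)
```

Refining the grid reduces the drift roughly linearly in $h$, which is the behaviour one expects from a first-order scheme that freezes the velocity on each interval.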
Notice that this scheme doesn't seem to work in the infinite-dimensional case, when $\mathbb{R}^{2d}$ is replaced by an infinite-dimensional phase space $X$, due to the difficulty of handling terms $\int \langle f_h(t), g_h(t)\rangle\, dt$ with $f_h$ weakly converging in $L^2_{loc}([0,+\infty);X)$ and $g_h(t)$ only pointwise weakly converging to $g(t)$. Indeed, we are not aware of any existence result in this direction.
Coming back to the finite-dimensional case $X = \mathbb{R}^{2d}$, the results in [5] (see also [6] for special classes of Hamiltonians) ensure a kind of "generic" uniqueness property, or uniqueness "in the flow sense", in the same spirit of DiPerna-Lions theory [25] (see §6 of [5] for a precise formulation). In brief, among all families of solutions $x(t,x)$ of the ODE, the condition
$$x(t,\cdot)_\#\mathcal{L}^{2d} \le C\mathcal{L}^{2d} \quad \text{with } C \text{ independent of } t \tag{6.13}$$
determines $x$ up to $\mathcal{L}^{2d}$-negligible sets (i.e. if $x$ and $\tilde x$ fulfil (6.13), then $x(\cdot,x) = \tilde x(\cdot,x)$ for $\mathcal{L}^{2d}$-a.e. $x$) and the unique $x$ satisfying (6.13) is stable within the class of approximations fulfilling (6.13) (in particular, one finds that $x(t,\cdot)$ is measure-preserving for all $t$). It turns out that the scheme described here produces a discrete flow $x_h(t,x)$ satisfying (6.13) with $C = 1$, and therefore is a good approximation of the unique Hamiltonian flow $x$. See also [45] for discrete schemes (called leap-frog schemes) that really preserve the symplectic forms and therefore the symplectic volume.
Theorem 6.6. Assume that (H1) and (H2) hold and that $T > 0$ satisfies (6.18). Then there exists a Hamiltonian flow $\mu_t = \rho_t\mathcal{L}^D : [0,T] \to D(H)$ starting from $\bar\mu = \bar\rho\mathcal{L}^D \in \mathcal{P}_2^a(\mathbb{R}^D)$, satisfying (5.1), such that the velocity field $v_t$ coincides with $\nabla H(\mu_t)$ for a.e. $t \in [0,T]$. Furthermore, $t \mapsto \mu_t$ is $L$-Lipschitz, with $L^2 = 2C_o^2(1 + M)$ and $M = e^{(25C_o^2+1)T}(1 + M_2(\bar\mu))$. Finally, there exists a function $l(r)$ depending only on $T$ and $C_o$ such that
$$\bar\rho \ge m_r \ \ \mathcal{L}^D\text{-a.e. on } B_r\ \ \forall r > 0 \implies \rho_t \ge m_{l(r)} \ \ \mathcal{L}^D\text{-a.e. on } B_r\ \ \forall r > 0 \tag{6.14}$$
and
$$\bar\rho \le M_r \ \ \mathcal{L}^D\text{-a.e. on } B_r\ \ \forall r > 0 \implies \rho_t \le M_{l(r)} \ \ \mathcal{L}^D\text{-a.e. on } B_r\ \ \forall r > 0. \tag{6.15}$$
If in addition (H3) holds, then $t \mapsto H(\mu_t)$ is constant.
Proof. In the first two steps of the proof, we shall assume existence of positive numbers $m_r$ such that the initial datum satisfies $\bar\rho \ge m_r > 0$ $\mathcal{L}^D$-a.e. on $B_r$ for any $r > 0$. That technical assumption will be removed only in the last step of the proof of the theorem.
Step 1. (a time discrete scheme). Since $\bar\rho$ is integrable, standard arguments give existence of a convex function $S : [0,+\infty) \to [0,+\infty)$, which grows faster than linearly at infinity and such that $\int S(\bar\rho)\, dz$ is finite. We fix an integer $N$ sufficiently large, so that $C_o h < 1/8$ and $1 + C_o h/2 < e^{C_o h} < 1 + 2C_o h$ with $h = T/N$, and we divide $[0,T]$ into $N$ equal intervals of length $h$. We shall next argue how, for any such $N$, Lemma 6.2 gives time discrete solutions $\mu_t^N = \rho_t^N\mathcal{L}^D$ satisfying:
(a) the Lipschitz constant of $t \mapsto \mu_t^N$ is less than $\bar L$, with $\bar L$ independent of $N$;
(b) $\sup_{N,t} W_2(\mu_t^N,\bar\mu) < R_o$, $\int S(\rho_t^N)\, dz \le \int S(\bar\rho)\, dz$ and $\rho_t^N \ge m_{l(r)}$ $\mathcal{L}^D$-a.e. on $B_r$ for any $r > 0$;
(c) the "delayed" Hamiltonian equation
$$\frac{d}{dt}\mu_t^N + \nabla\cdot(Jv_t^N\mu_t^N) = 0 \tag{6.16}$$
holds in the sense of distributions in $(0,T) \times \mathbb{R}^D$, with $v_t^N = \nabla H(\mu_{ih}^N)$ for $0 \le i \le N-1$ and $t \in [ih,(i+1)h)$.
In order to build $\mu_t^N$, we apply Lemma 6.2 $N$ times with $C = C_o$: we start with $\mu = \bar\mu$ and $v = \nabla H(\bar\rho\mathcal{L}^D)$ to obtain a solution $\mu_t^N$ of (6.16) in $[0,h]$. Then, we apply the lemma again with $\mu = \mu_h^N$ and $v = \nabla H(\rho_h^N\mathcal{L}^D)$ to extend it continuously to a solution of (6.16) in $[h,2h]$. In $N$ steps we build the solution in $[0,T]$.
However, in order to be sure that the lemma can be applied each time, we have to check that the inequality $W_2(\mu_{ih}^N,\bar\mu) < R_o$ is valid for $i = 0, \dots, N-1$, and this is where the restriction on $T$ comes from: first notice that since
$$W_2(\mu_{(i+1)h}^N,\mu_{ih}^N) \le hC_o\sqrt{24(1 + M_2(\mu_{ih}^N))},$$
by the triangle inequality we need only to prove by induction an upper bound of the form
$$M_2(\mu_{ih}^N) \le M, \tag{6.17}$$
for some $M$ such that $C_o T\sqrt{24(1 + M)} < R_o$. To estimate inductively the moments, we recall that $M_2(\mu) = W_2^2(\mu,\delta_0)$ and we use the triangle inequality to find
$$\begin{aligned}
M_2(\mu_{(i+1)h}^N) &\le \Bigl(\sqrt{M_2(\mu_{ih}^N)} + hC_o\sqrt{24(1 + M_2(\mu_{ih}^N))}\Bigr)^2 \\
&\le (1 + h)M_2(\mu_{ih}^N) + 24\Bigl(1 + \frac{1}{h}\Bigr)h^2C_o^2(1 + M_2(\mu_{ih}^N)) \\
&\le (1 + (25C_o^2 + 1)h)M_2(\mu_{ih}^N) + 25C_o^2 h
\end{aligned}$$
as soon as $24(h + 1) < 25$. Hence, setting for brevity $P = 25C_o^2 + 1$, we have the inequality
$$M_2(\mu_{(i+1)h}^N) \le (1 + Ph)M_2(\mu_{ih}^N) + Ph.$$
By induction we get
$$M_2(\mu_{ih}^N) \le (1 + Ph)^i(M_2(\bar\mu) + 1) - 1$$
and setting $i = N$ we find that $M = e^{PT}(1 + M_2(\bar\mu))$ is a good upper bound on all moments. We have proved that the lemma can be iterated $N$ times, provided
$$C_o T\sqrt{24\bigl(1 + e^{(25C_o^2+1)T}(1 + M_2(\bar\mu))\bigr)} < R_o. \tag{6.18}$$
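The induction on the moments is a discrete Gronwall argument, and it can be checked numerically. The sketch below iterates the worst case of the recursion $M_{i+1} \le (1 + Ph)M_i + Ph$ (taken with equality) and compares it with the closed form $(1 + Ph)^i(M_0 + 1) - 1$ and with the bound $e^{PT}(1 + M_0)$; the function name and the sample values of $M_0$, $C_o$, $T$, $N$ are illustrative assumptions.

```python
import math


def moment_recursion(M0, C_o, T, N):
    """Iterate M_{i+1} = (1 + P h) M_i + P h, the extremal case of the recursion.

    Returns the list of pairs (iterate, closed form) for i = 0..N and the
    bound M = exp(P T) (1 + M0) from (6.17)-(6.18).
    """
    h = T / N
    P = 25.0 * C_o ** 2 + 1.0
    pairs = []
    M_i = M0
    for i in range(N + 1):
        closed_form = (1.0 + P * h) ** i * (M0 + 1.0) - 1.0
        pairs.append((M_i, closed_form))
        M_i = (1.0 + P * h) * M_i + P * h
    return pairs, math.exp(P * T) * (1.0 + M0)


# Sample data (assumed): C_o h = 0.001 < 1/8 and 24 (h + 1) < 25 both hold.
pairs, M_bound = moment_recursion(M0=2.0, C_o=0.5, T=0.1, N=50)
largest = max(m for m, _ in pairs)
print(largest, M_bound)
```

Since $(1 + Ph)^N \le e^{PhN} = e^{PT}$, every iterate stays below the bound $M$, uniformly in $N$.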
Finally, let us find an explicit expression for the function $l(r)$ in (b) (the argument for (6.15) is similar, and based on Remark 6.4). As the constant $r'$ in Lemma 6.2 is less than $re^{C_o h} + 4C_o h$, by our choice of $h$, by induction on $i$ we get
$$\rho_t^N \ge m_{r_i} \ \ \mathcal{L}^D\text{-a.e. on } B_r \quad \text{with } r_i = re^{iC_o h} + 4C_o h\bigl(e^{(i-1)C_o h} + \cdots + 1\bigr) \quad \forall t \in [0,ih],\ 1 \le i \le N.$$
Since
$$r_N = re^{NC_o h} + 4C_o h\,\frac{e^{NC_o h} - 1}{e^{C_o h} - 1} \le (r + 8)e^{NC_o h} = (r + 8)e^{C_o T},$$
it suffices to set $l(r) = (r + 8)e^{C_o T}$.
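The growth of the radii can also be checked directly. The sketch below iterates the exact map $r \mapsto e^{C_o h} r + 2(e^{C_o h} - 1)$ from Lemma 6.2(c), rather than the slightly larger estimate $re^{C_o h} + 4C_o h$ used in the text, and compares the result with $l(r) = (r + 8)e^{C_o T}$. The function names and sample values are assumptions made for illustration.

```python
import math


def radius_after_N_steps(r, C_o, T, N):
    """Apply r -> e^{C_o h} r + 2 (e^{C_o h} - 1) exactly N times, with h = T / N."""
    h = T / N
    assert C_o * h < 1.0 / 8.0, "the proof takes N large enough that C_o h < 1/8"
    growth = math.exp(C_o * h)
    r_i = r
    for _ in range(N):
        r_i = growth * r_i + 2.0 * (growth - 1.0)
    return r_i


def l(r, C_o, T):
    """Explicit radius function l(r) = (r + 8) e^{C_o T} from the proof."""
    return (r + 8.0) * math.exp(C_o * T)


r_N = radius_after_N_steps(r=1.0, C_o=1.0, T=1.0, N=100)
print(r_N, l(1.0, 1.0, 1.0))
```

Because $r_i + 2$ is multiplied by $e^{C_o h}$ at every step, the exact iteration gives $r_N = e^{C_o T}(r + 2) - 2$, which is indeed below $l(r)$ and independent of $N$.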
Step 2. (passage to the limit). By (a), (b), $t \mapsto \mu_t^N$ are equi-bounded in $\mathcal{P}_2(\mathbb{R}^D)$, and equi-Lipschitz continuous. Hence, we may assume with no loss of generality that
$$\mu_t^N \to \mu_t \quad \text{narrowly for any } t \in [0,T].$$
By the lower semicontinuity of moments we get $\mu_t \in \mathcal{P}_2(\mathbb{R}^D)$ for any $t$, and the narrow lower semicontinuity of the Wasserstein distance (see for instance Proposition 7.1.3 of [4]) gives that the $\bar L$-Lipschitz bound in (a) and the distance bound in (b) are preserved in the limit. Also the inequality $\int S(\rho_t^N)\, dz \le \int S(\bar\rho)\, dz$ and the local lower bounds in (b) are easily seen to be stable under weak convergence, hence $\mu_t = \rho_t\mathcal{L}^D$, and the conclusion of (6.14) holds with $l(r) = (r + 8)e^{C_o T}$ (the argument for (6.15) is similar, and based on Remark 6.4).
It remains to show that $\mu_t$ is a Hamiltonian flow. To this aim, it is enough to show that, for any $t$ fixed, $Jv_t^N\mu_t^N$ converges, in the sense of distributions, to $J\nabla H(\mu_t)\mu_t$. Assume by contradiction that this does not happen, i.e. there exist a subsequence $N_i$ and a smooth test function $\varphi$ such that, with $v_t = \nabla H(\mu_t)$,
$$\inf_i \Bigl|\int_{\mathbb{R}^D} \langle Jv_t^{N_i}; \varphi\rangle\, d\mu_t^{N_i} - \int_{\mathbb{R}^D} \langle Jv_t; \varphi\rangle\, d\mu_t\Bigr| > 0. \tag{6.19}$$
Let us denote by $[\cdot]$ the greatest integer function. Notice that by assumption (H2) and the narrow convergence of $\mu^{N_i}_{[N_i t]/N_i}$ to $\mu_t$ we can assume with no loss of generality that
$$Jv_t^{N_i} = J\nabla H\bigl(\mu^{N_i}_{[N_i t]/N_i}\bigr) \to J\nabla H(\mu_t) \quad \mathcal{L}^D\text{-a.e. in } \mathbb{R}^{2d} \text{ as } i \to +\infty.$$
By the same argument used at the end of the proof of Lemma 6.2, based on Egorov theorem and the equi-integrability of $\rho_t^{N_i}$, we prove that $Jv_t^{N_i}\mu_t^{N_i}$ converge in the sense of distributions to $J\nabla H(\mu_t)\mu_t$, thus reaching a contradiction with (6.19). Therefore, it suffices to pass to the limit as $N \to \infty$ in (6.16) to obtain that $\mu_t$ is a Hamiltonian flow with velocity field $v_t = \nabla H(\mu_t)$.
Step 3. Now we consider the general case. We strongly approximate $\bar\rho$ in $L^1(\mathbb{R}^D)$ by functions $\bar\rho^k$ such that $\bar\rho^k\mathcal{L}^D \in \mathcal{P}_2(\mathbb{R}^D)$ and, for any $k$, there exist constants $m_r^k > 0$ such that $\bar\rho^k \ge m_r^k$ $\mathcal{L}^D$-a.e. on $B_r$ for any $r > 0$ (for instance, convex combinations of $\bar\rho$ with a Gaussian). We also notice that the equi-integrability of $\{\bar\rho^k\}_{k=1}^\infty$ ensures the existence of a convex function $S$ having a more than linear growth at infinity, and independent of $k$, such that $\int S(\bar\rho^k)\, dz \le 1$ for any $k$.
The construction performed in Step 1 and Step 2 can then be applied for each $k$, yielding solutions of the Hamiltonian ODE $\mu_t^k = \rho_t^k\mathcal{L}^D$, $t \in [0,T]$, satisfying $\rho_0^k = \bar\rho^k$, $\int S(\rho_t^k)\, dx \le 1$, and
$$\frac{d}{dt}\mu_t^k + \nabla\cdot\bigl(J\nabla H(\mu_t^k)\mu_t^k\bigr) = 0 \quad \text{in } (0,T) \times \mathbb{R}^{2d}. \tag{6.20}$$
As, by construction, $t \mapsto \mu_t^k$ are $L$-Lipschitz, we can also assume, possibly extracting a subsequence, that $\mu_t^k \to \mu_t$ narrowly as $k \to +\infty$ for any $t \in [0,T]$. The upper bound on $\int S(\rho_t^k)\, dx$ then ensures that $\mu_t \in \mathcal{P}_2^a(\mathbb{R}^D)$ for all $t \in [0,T]$.
The same argument used in Step 2, based on (H2) and the equi-integrability of $\rho_t^k$, shows that for any $t \in [0,T]$, $J\nabla H(\mu_t^k)\mu_t^k$ converges to $J\nabla H(\mu_t)\mu_t$ as $k \to +\infty$ in the sense of distributions. Therefore, passing to the limit as $k \to +\infty$ in (6.20) we obtain that $\mu_t$ is a solution of the Hamiltonian ODE with velocity field $J\nabla H(\mu_t)$.
Let us next give a more explicit expression for the Lipschitz constant of $t \mapsto \mu_t$. Recall that by (6.17), we have
$$M_2(\mu_{ih}^N) \le M = e^{PT}(1 + M_2(\bar\mu)) \tag{6.21}$$
and $W_2(\mu_\tau,\bar\mu) < R_o$ for $\tau \in [0,T]$. Thus, (6.21) and (H1) imply that
$$\|\nabla H(\mu_\tau)\|^2_{L^2(\mu_\tau;\mathbb{R}^D)} \le C_o^2 \int_{\mathbb{R}^D} (1 + |z|)^2\, d\mu_\tau(z) \le 2C_o^2(1 + M_2(\mu_\tau)) \le 2C_o^2(1 + M). \tag{6.22}$$
This, together with (3.2), yields
$$W_2(\mu_s,\mu_t) \le \int_s^t \|\nabla H(\mu_\tau)\|_{L^2(\mu_\tau;\mathbb{R}^D)}\, d\tau \le L(t - s). \tag{6.23}$$
Finally, the constancy of $t \mapsto H(\mu_t)$ follows by the (essential) boundedness of $\|v_t\|_{L^2(\mu_t;\mathbb{R}^D)}$ and Theorem 5.2. QED.
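We conclude this section by showing a class of Hamiltonians satisfying the assumptions of Theorem 6.6.
Lemma 6.7. Let $\nu \in \mathcal{P}_2(\mathbb{R}^D)$ with a bounded support and let $V : \mathbb{R}^D \to \mathbb{R}$ be $\lambda_V$-convex, $W : \mathbb{R}^D \to \mathbb{R}$ convex and even, both differentiable and with at most quadratic growth at infinity. Then, for $a > 0$ the function
$$H(\mu) = H_0(\mu) + \mathcal{V}(\mu) + \mathcal{W}(\mu) = -\frac{a}{2}W_2^2(\mu,\nu) + \int_{\mathbb{R}^{2d}} V\, d\mu + \frac{1}{2}\int_{\mathbb{R}^D \times \mathbb{R}^D} W(x - y)\, d\mu \times \mu(x,y) \tag{6.24}$$
is $(\lambda_V - a)$-convex, lower semicontinuous and satisfies (H1) and (H2).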
Proof. Possibly rescaling $V$ and $W$, we shall assume that $a = 1$. It is well known (see for instance [46] or Chapter 10 of [4]) that the potential energy $\mathcal{V}$ is $\lambda_V$-convex and lower semicontinuous, and that the interaction energy $\mathcal{W}$ is convex and lower semicontinuous. As a consequence, $H$ is $(\lambda_V - 1)$-convex and lower semicontinuous.
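In order to show (H1) it suffices to notice that both $\nabla V$ and $\nabla W$ have a growth at most linear at infinity, and prove that
$$\partial H(\mu) = \partial H_0(\mu) + \nabla V + (\nabla W) * \mu \quad \forall \mu \in \mathcal{P}_2(\mathbb{R}^D), \tag{6.25}$$
taking also into account that Proposition 4.3 yields, in the case when $\mu \in \mathcal{P}_2^a(\mathbb{R}^D)$, $\nabla H_0(\mu) = t_\mu^\nu - \mathrm{id}$, and that $t_\mu^\nu \in L^\infty(\mu;\mathbb{R}^D)$ (by the boundedness of the support of $\nu$).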
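The inclusion $\supset$ in (6.25) is a direct consequence of the characterization (4.4) of the subdifferential and of the inequalities
$$\mathcal{V}(\sigma) \ge \mathcal{V}(\mu) + \int_{\mathbb{R}^D} \langle \nabla V, \bar\gamma - \mathrm{id}\rangle\, d\mu + \frac{\lambda_V}{2}W_2^2(\mu,\sigma),$$
$$\mathcal{W}(\sigma) \ge \mathcal{W}(\mu) + \int_{\mathbb{R}^D} \langle (\nabla W) * \mu, \bar\gamma - \mathrm{id}\rangle\, d\mu$$
for $\gamma \in \Gamma_o(\mu,\sigma)$ (see for instance [4]); here $\bar\gamma$ denotes the barycentric projection of $\gamma$. In order to prove the inclusion $\subset$, we fix a vector $\xi \in \partial H(\mu)$ and define, for $\gamma \in \Gamma_o(\mu,\sigma)$, the measures $\mu_t = ((1-t)\pi_1 + t\pi_2)_\#\gamma$ and $\gamma_t := (\pi_1, (1-t)\pi_1 + t\pi_2)_\#\gamma \in \Gamma_o(\mu,\mu_t)$. As $(\bar\gamma_t - \mathrm{id}) = t(\bar\gamma - \mathrm{id})$, by applying the definition of subdifferential we obtain
$$\liminf_{t \downarrow 0} \frac{H(\mu_t) - H(\mu)}{t} \ge \int_{\mathbb{R}^D} \langle \xi, \bar\gamma - \mathrm{id}\rangle\, d\mu.$$
Now, the dominated convergence theorem gives
$$\lim_{t \downarrow 0} \frac{\mathcal{V}(\mu_t) - \mathcal{V}(\mu)}{t} = \int_{\mathbb{R}^{2d}} \langle \nabla V, \bar\gamma - \mathrm{id}\rangle\, d\mu, \qquad \lim_{t \downarrow 0} \frac{\mathcal{W}(\mu_t) - \mathcal{W}(\mu)}{t} = \int_{\mathbb{R}^{2d}} \langle (\nabla W) * \mu, \bar\gamma - \mathrm{id}\rangle\, d\mu,$$
so that
$$\liminf_{t \downarrow 0} \frac{H_0(\mu_t) - H_0(\mu)}{t} \ge \int_{\mathbb{R}^D} \langle \xi_0, \bar\gamma - \mathrm{id}\rangle\, d\mu$$
with $\xi_0 = \xi - \nabla V - (\nabla W) * \mu$. Then, by $(-1)$-convexity of $H_0$ we get
$$H_0(\sigma) \ge H_0(\mu) + \int_{\mathbb{R}^D} \langle \xi_0, \bar\gamma - \mathrm{id}\rangle\, d\mu - \frac{1}{2}W_2^2(\mu,\sigma).$$
The previous inequality, together with Propositions 4.2 and 4.3, gives that $\xi_0 \in \partial H_0(\mu)$.
Property (H2) follows directly from the identity
$$\nabla H(\mu) = (t_\mu^\nu - \mathrm{id}) + \nabla V + (\nabla W) * \mu$$
and from Lemma 3.3. QED.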
As shown in [38], another important class of convex functionals in $\mathcal{P}_2(\mathbb{R}^D)$ is provided by the so-called internal energy functional $\mu = \rho\mathcal{L}^D \mapsto \int S(\rho)\, dz$. However, as the subdifferential of this functional is not empty only when $L_S(\rho)$ is a $W^{1,1}$ function (here $L_S(y) = yS'(y) - S(y)$), these functionals fail to satisfy (H1).
The previous result can be extended to Hamiltonians generated from those of Lemma 6.7 through a sup-convolution. For simplicity we consider the case when neither potential nor interaction energies are present, but their inclusion does not present any substantial difficulty.
Lemma 6.8. Assume that $\Omega \subset \mathbb{R}^D$ is a bounded open set, and that
(a) $K \subset \mathcal{P}(\overline\Omega)$ is a convex set, with respect to the standard linear structure of $\mathcal{P}(\overline\Omega)$, closed with respect to the narrow convergence;
(b) $\bar J : K \to \mathbb{R} \cup \{+\infty\}$ is strictly convex with respect to the standard linear structure of $\mathcal{P}(\overline\Omega)$, bounded from below and lower semicontinuous with respect to the narrow convergence.
Define the Hamiltonian $H$ on $\mathcal{P}_2(\mathbb{R}^D)$ by
$$H(\mu) = -\inf_{\nu \in K}\Bigl\{\frac{1}{2}W_2^2(\mu,\nu) + \bar J(\nu)\Bigr\}. \tag{6.26}$$
Then $H$ is $(-1)$-convex and lower semicontinuous, and satisfies (H1) and (H2).
Proof of Lemma 6.8. Since $-W_2^2(\cdot,\nu) - \bar J(\nu)$ is $(-2)$-convex for each $\nu \in K$, we obtain that $H$ is $(-1)$-convex and so (H3) holds.
1. Notice first that $W_2^2(\cdot,\cdot)$ is lower semicontinuous with respect to the narrow convergence (see for instance Proposition 7.1.3 of [4]). Since $\bar J$ is bounded from below and lower semicontinuous, and since bounded sets in $\mathcal{P}_2(\mathbb{R}^D)$ are sequentially compact with respect to the narrow convergence, we obtain that the infimum in the definition of $H$ is attained. Strict convexity of $\bar J$ and convexity of $W_2^2(\mu,\cdot)$ give uniqueness of the minimizer, which we denote by $\nu(\mu)$. A compactness argument based on the uniqueness of $\nu(\mu)$ then shows that $\mu_n \to \mu$ in $\mathcal{P}_2(\mathbb{R}^D)$ implies $\nu(\mu_n) \to \nu(\mu)$ narrowly in $\mathcal{P}(\overline\Omega)$. As $\Omega$ is bounded the map $\mu \mapsto \nu(\mu)$ is also continuous between $\mathcal{P}_2(\mathbb{R}^D)$ and $\mathcal{P}_2(\overline\Omega)$.
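2. Let $\mu_o \in \mathcal{P}_2^a(\mathbb{R}^D)$ and $\mu \in \mathcal{P}_2(\mathbb{R}^D)$. Clearly,
$$H(\mu) - H(\mu_o) \ge -\frac{1}{2}\Bigl(W_2^2(\mu,\nu(\mu_o)) - W_2^2(\mu_o,\nu(\mu_o))\Bigr).$$
This, together with the fact that the Wasserstein gradient of $-\frac{1}{2}W_2^2(\cdot,\nu(\mu_o))$ at $\mu_o$ is $t_{\mu_o}^{\nu(\mu_o)} - \mathrm{id}$ (see (4.8)), yields that $t_{\mu_o}^{\nu(\mu_o)} - \mathrm{id} \in \partial H(\mu_o)$ and so $\partial H(\mu_o)$ is nonempty.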
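To characterize the elements of $\partial H(\mu_o)$, let $\varphi \in C_c^\infty(\mathbb{R}^D)$ and set
$$g_s = \mathrm{id} + s\nabla\varphi, \qquad \mu_s = g_{s\#}\mu_o, \qquad \nu_s = \nu(\mu_s).$$
If $\xi \in \partial H(\mu_o)$, the fact that $H$ is $(-1)$-convex implies that
$$H(\mu_s) - H(\mu_o) - \int_{\mathbb{R}^{2d}} \langle \xi; t_{\mu_o}^{\mu_s} - \mathrm{id}\rangle\, d\mu_o + \frac{1}{2}W_2^2(\mu_o,\mu_s) \ge 0.$$
For $|s| \ll 1$, $g_s$ is the gradient of a convex function and so, the previous inequality yields
$$\begin{aligned}
-s\int_{\mathbb{R}^D} \langle \xi; \nabla\varphi\rangle\, d\mu_o + \frac{s^2}{2}\int_{\mathbb{R}^{2d}} |\nabla\varphi|^2\, d\mu_o &\ge H(\mu_o) - H(\mu_s) \\
&\ge \frac{1}{2}\Bigl(W_2^2(\mu_s,\nu_s) - W_2^2(\mu_o,\nu_s)\Bigr) \\
&\ge \frac{1}{2}\int_{\mathbb{R}^D} |\mathrm{id} - t_{\mu_s}^{\nu_s}|^2\, d\mu_s - \frac{1}{2}\int_{\mathbb{R}^D} |\mathrm{id} \circ k_s - t_{\mu_s}^{\nu_s}|^2\, d\mu_s \\
&= \frac{1}{2}\int_{\mathbb{R}^D} |\mathrm{id} - t_{\mu_s}^{\nu_s}|^2\, d\mu_s - \frac{1}{2}\int_{\mathbb{R}^{2d}} |t_{\mu_s}^{\nu_s} - k_s|^2\, d\mu_s.
\end{aligned} \tag{6.27}$$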
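Here, we have set $k_s = g_s^{-1}$. One can easily check that
$$k_s(y) = y - s\nabla\varphi(y) + s^2\nabla^2\varphi(y)\nabla\varphi(y) + \varepsilon(s,y), \tag{6.28}$$
where $\varepsilon$ is a function such that $|\varepsilon(s,y)| \le |s|^3\|\varphi\|_{C^3(\mathbb{R}^{2d})}$. We combine (6.27) and (6.28) to conclude that
$$-s\int_{\mathbb{R}^D} \langle \xi; \nabla\varphi\rangle\, d\mu_o + \frac{s^2}{2}\int_{\mathbb{R}^{2d}} |\nabla\varphi|^2\, d\mu_o \ge s\int_{\mathbb{R}^D \times \mathbb{R}^D} \langle x - y; \nabla\varphi(x)\rangle\, d\gamma_s + o(s),$$
where $\gamma_s$ is the unique optimal plan between $\mu_s$ and $\nu_s$. Recall now that $\mu_s \to \mu_o$ in $\mathcal{P}_2(\mathbb{R}^D)$ and $\nu_s \to \nu_o := \nu(\mu_o)$ in $\mathcal{P}_2(\overline\Omega)$ as $s \to 0$, hence Lemma 3.3 gives
$$-s\int_{\mathbb{R}^D} \langle \xi; \nabla\varphi\rangle\, d\mu_o + \frac{s^2}{2}\int_{\mathbb{R}^D} |\nabla\varphi|^2\, d\mu_o \ge s\int_{\mathbb{R}^D} \langle \mathrm{id} - t_{\mu_o}^{\nu_o}; \nabla\varphi\rangle\, d\mu_o + o(s). \tag{6.29}$$
We divide both sides of (6.29) first by $s > 0$, then by $s < 0$; letting $|s| \to 0$ we find
$$-\int_{\mathbb{R}^D} \langle \xi; \nabla\varphi\rangle\, d\mu_o = \int_{\mathbb{R}^D} \langle \mathrm{id} - t_{\mu_o}^{\nu_o}; \nabla\varphi\rangle\, d\mu_o.$$
This proves that the projection of $\xi$ onto $T_{\mu_o}\mathcal{P}_2(\mathbb{R}^D)$ is $t_{\mu_o}^{\nu_o} - \mathrm{id}$. The minimality of the norm of the gradient then gives
$$\nabla H(\mu_o) = t_{\mu_o}^{\nu_o} - \mathrm{id}. \tag{6.30}$$
From this representation of $\nabla H(\mu_o)$ and from (3.13) we obtain both (H1) and (H2). QED.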
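The Taylor expansion of the inverse map $k_s = g_s^{-1}$ used in (6.28) can be sanity-checked numerically in one dimension. The sketch below is an illustration under assumptions: it takes $\varphi'(x) = \sin x$ (not compactly supported, but only local behaviour matters here), inverts $g_s(x) = x + s\varphi'(x)$ by fixed-point iteration, and compares the result with the truncations $y - s\varphi'(y)$ and $y - s\varphi'(y) + s^2\varphi''(y)\varphi'(y)$. All function names are hypothetical.

```python
import math


def g(x, s):
    """g_s = id + s * phi', with the illustrative choice phi'(x) = sin(x)."""
    return x + s * math.sin(x)


def k_numeric(y, s, iters=200):
    """Invert g_s by the fixed-point iteration x <- y - s sin(x) (a contraction for |s| < 1)."""
    x = y
    for _ in range(iters):
        x = y - s * math.sin(x)
    return x


def k_first_order(y, s):
    return y - s * math.sin(y)


def k_second_order(y, s):
    # phi''(y) phi'(y) = cos(y) sin(y) for the choice above
    return y - s * math.sin(y) + s * s * math.cos(y) * math.sin(y)


y, s = 1.0, 0.01
exact = k_numeric(y, s)
err1 = abs(k_first_order(y, s) - exact)     # expected to be of order s^2
err2 = abs(k_second_order(y, s) - exact)    # expected to be of order s^3
print(err1, err2)
```

For the argument in the proof only the first-order term matters: the remainder after $y - s\nabla\varphi(y)$ is $O(s^2)$ and is absorbed into $o(s)$.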
7 An alternative algorithm yielding existence of Hamiltonian flows for general initial data
In this section we provide a new discrete scheme providing existence of solutions to Hamiltonian flows for general initial data, i.e. not necessarily absolutely continuous with respect to Lebesgue measure. Being based on a linear interpolation at the level of transports, when particularized to Dirac masses this algorithm coincides with the one considered in Remark 6.5.
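Lemma 7.1. Let $f : X \to Y$ be a Borel map, $\mu \in \mathcal{P}(X)$, and let $v \in L^2(\mu;\mathbb{R}^D)$. Then, setting $\nu = f_\#\mu$, we have $f_\#(v\mu) = w\nu$ for some $w \in L^2(\nu;\mathbb{R}^D)$ with
$$\|w\|_{L^2(\nu;\mathbb{R}^D)} \le \|v\|_{L^2(\mu;\mathbb{R}^D)}. \tag{7.1}$$
Proof. Let $\sigma := f_\#(v\mu)$ and $\varphi \in L^\infty(Y;\mathbb{R}^D)$; denoting by $\varphi_i$, $i = 1, \dots, D$, the components of $\varphi$ we have
$$\Bigl|\sum_{i=1}^D \int_Y \varphi_i\, d\sigma_i\Bigr| = \Bigl|\int_X \langle \varphi \circ f; v\rangle\, d\mu\Bigr| \le \|\varphi \circ f\|_{L^2(\mu;\mathbb{R}^D)}\|v\|_{L^2(\mu;\mathbb{R}^D)} = \|\varphi\|_{L^2(\nu;\mathbb{R}^D)}\|v\|_{L^2(\mu;\mathbb{R}^D)}.$$
Since $\varphi$ is arbitrary this proves (7.1). QED.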
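For finitely supported measures the statement of Lemma 7.1 is elementary: the density $w$ of $f_\#(v\mu)$ with respect to $\nu = f_\#\mu$ is the $\mu$-weighted average of $v$ over each fibre of $f$, and (7.1) is Jensen's inequality. The sketch below illustrates this; the sample points, weights, vector values and the map `f` are invented for the example, and the function names are hypothetical.

```python
def push_forward_density(points, masses, values, f):
    """Return (nu, w): nu = f_# mu as a dict y -> mass, and w with f_#(v mu) = w nu.

    mu = sum_k masses[k] * delta_{points[k]} and v(points[k]) = values[k].
    """
    dim = len(values[0])
    nu = {}
    sigma = {}                      # vector measure f_#(v mu), stored fibre by fibre
    for x, m, v in zip(points, masses, values):
        y = f(x)
        nu[y] = nu.get(y, 0.0) + m
        acc = sigma.setdefault(y, [0.0] * dim)
        for j in range(dim):
            acc[j] += m * v[j]
    w = {y: [c / nu[y] for c in sigma[y]] for y in nu}
    return nu, w


def l2_norm_sq(masses, values):
    """Squared L^2 norm of a vector field with respect to a discrete measure."""
    return sum(m * sum(c * c for c in v) for m, v in zip(masses, values))


points = [0.1, 0.4, 1.2, 1.7, 2.5]
masses = [0.2] * 5
values = [(1.0, 0.0), (-1.0, 2.0), (0.5, 0.5), (0.5, -1.5), (3.0, 1.0)]
f = lambda x: int(x)                # glues points lying in the same unit interval

nu, w = push_forward_density(points, masses, values, f)
norm_v_sq = l2_norm_sq(masses, values)
norm_w_sq = sum(nu[y] * sum(c * c for c in w[y]) for y in nu)
print(norm_w_sq, norm_v_sq)
```

Averaging over fibres can only reduce the $L^2$ norm, which is why the velocity bound survives the push-forward in the scheme of this section.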
Lemma 7.2. Let $T > 0$, $C \ge 0$, $\mu_t^n : [0,T] \to \mathcal{P}_2(\mathbb{R}^D)$ and $v_t^n \in L^2(\mu_t^n;\mathbb{R}^k)$ be satisfying:
(a) $\mu_t^n \to \mu_t$ narrowly as $n \to +\infty$, for all $t \in [0,T]$;
(b) $\|v_t^n\|_{L^2(\mu_t^n;\mathbb{R}^k)} \le C$ for a.e. $t \in [0,T]$;
(c) the $\mathbb{R}^k$-valued space-time measures $v_t^n\mu_t^n\, dt$ are weakly$^*$ converging in $(0,T) \times \mathbb{R}^D$ to $\sigma$.
Then there exist $v_t \in L^2(\mu_t;\mathbb{R}^k)$, with $\|v_t\|_{L^2(\mu_t;\mathbb{R}^k)} \le C$ for a.e. $t$, such that $\sigma = v_t\mu_t\, dt$.
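Proof. Possibly extracting a subsequence we can also assume that the scalar space-time measures $|v_t^n|\mu_t^n\, dt$ weak$^*$-converge to $\vartheta$, and it is well-known (see for instance Proposition 1.62(b) of [3]) that $|\sigma| \le \vartheta$. Since, by Hölder inequality, the projection of $|v_t^n|\mu_t^n\, dt$ on $[0,T]$ is less than $C\, dt$, the same is true for $\vartheta$. Hence the disintegration theorem (see for instance Theorem 2.28 in [3]) provides us with the representation $\sigma = \sigma_t\, dt$ for suitable $\mathbb{R}^k$-valued measures $\sigma_t$ in $\mathbb{R}^D$ having total variation less than $C$ for a.e. $t$.
Now, for any $\psi \in C_c^\infty(0,T)$, $\varphi \in C_c^\infty(\mathbb{R}^D;\mathbb{R}^k)$ we have
$$\Bigl|\int_0^T \psi(t)\langle \varphi; \sigma_t\rangle\, dt\Bigr| = |\langle \psi\varphi; \sigma\rangle| = \lim_{n \to +\infty}\Bigl|\int_0^T \psi(t)\langle \varphi; v_t^n\mu_t^n\rangle\, dt\Bigr| \le C\int_0^T |\psi|(t)\sqrt{\langle |\varphi|^2; \mu_t\rangle}\, dt.$$
As $\psi$ is arbitrary, this means that $|\langle \varphi; \sigma_t\rangle| \le C\sqrt{\langle |\varphi|^2; \mu_t\rangle}$ for a.e. $t$. By a density argument we can find a Lebesgue negligible set $\mathcal{N} \subset (0,T)$ such that
$$|\langle \varphi; \sigma_t\rangle| \le C\sqrt{\langle |\varphi|^2; \mu_t\rangle} \quad \forall \varphi \in C_c^\infty(\mathbb{R}^D;\mathbb{R}^k),\ \forall t \in (0,T) \setminus \mathcal{N}.$$
Hence, for any $t \in (0,T) \setminus \mathcal{N}$ we have $\sigma_t = v_t\mu_t$ for some $v_t \in L^2(\mu_t;\mathbb{R}^k)$ with $L^2(\mu_t;\mathbb{R}^k)$ norm less than $C$. QED.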
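We consider now two basic assumptions on the Hamiltonian, that are variants of those considered in the previous section.
(H1′) There exist constants $C_o \in [0,+\infty)$, $R_o \in (0,+\infty]$ such that for all $\mu \in \mathcal{P}_2(\mathbb{R}^D)$ with $W_2(\mu,\bar\mu) < R_o$ we have $\mu \in D(H)$, $\partial H(\mu) \ne \emptyset$ and $\|\nabla H(\mu)\|_{L^2(\mu)} \le C_o$.
(H2′) If $\sup_n W_2(\mu_n,\bar\mu) < R_o$ and $\mu_n \to \mu$ narrowly, then
$$\bigcap_{m=1}^\infty \overline{\mathrm{co}}\bigl(\{\nabla H(\mu_n)\mu_n : n \ge m\}\bigr) \subset \bigl\{w\mu : w \in \partial H(\mu) \cap T_\mu\mathcal{P}_2(\mathbb{R}^D)\bigr\}, \tag{7.2}$$
where $\overline{\mathrm{co}}$ denotes the closed convex hull, with respect to the weak$^*$-topology.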
Remark 7.3. (a) Assumption (H1′) is weaker than (H1), with the replacement of a pointwise bound with an integral one. Also (H2′) is essentially weaker than (H2), as it does not impose any "strong" convergence property on $\nabla H(\mu_n)$; however, this forces to consider a stability with respect to closed convex hulls.
(b) A sufficient condition which ensures (H2′) is the following:
(H2″) If $\sup_n W_2(\mu_n,\bar\mu) < R_o$ and $\mu_n \to \mu$ narrowly, then
$$\nabla H(\mu_n)\mu_n \to \nabla H(\mu)\mu$$
in the sense of distributions.
(c) As in the previous section, the condition (H3) ensures constancy of the Hamiltonian along the Hamiltonian flows. We can apply the same argument used at the beginning of the proof of Theorem 5.2, to obtain that (H3) and (H1′) imply that $H$ is Lipschitz continuous on the ball $\{\mu \in \mathcal{P}_2(\mathbb{R}^D) : W_2(\mu,\bar\mu) \le R_o\}$.
Theorem 7.4. Assume that (H1′) and (H2′) hold and that $C_o T < R_o$. Then there exists a Hamiltonian flow $\mu_t : [0,T] \to D(H)$ starting from $\bar\mu \in \mathcal{P}_2(\mathbb{R}^D)$, satisfying (5.1), such that $t \mapsto \mu_t$ is $C_o$-Lipschitz. Furthermore, if (H3) holds, then $t \mapsto H(\mu_t)$ is constant.
In particular, if $\partial H(\mu) \cap T_\mu\mathcal{P}_2(\mathbb{R}^D) = \{\nabla H(\mu)\}$ for all $\mu$ such that $W_2(\mu,\bar\mu) < R_o$, then the velocity field $v_t$ in (5.1) coincides with $\nabla H(\mu_t)$ for a.e. $t \in [0,T]$.
Proof. Step 1. (construction of a discrete solution). We fix an integer $N$ sufficiently large and we divide $[0,T]$ in $N$ equal intervals of length $h = T/N$. We build discrete solutions $\mu_t^N$ satisfying:
(a) the Lipschitz constant of $t \mapsto \mu_t^N$ is less than $C_o$;
(b) $W_2(\mu_t^N,\bar\mu) \le C_o T$;
(c) the "delayed" Hamiltonian equation
$$\frac{d}{dt}\mu_t^N + \nabla\cdot(w_t^N\mu_t^N) = 0 \tag{7.3}$$
holds in the sense of distributions in $(0,T) \times \mathbb{R}^D$, with
$$w_t^N\mu_t^N = \bigl(\mathrm{id} + (t - ih)J\nabla H(\mu_{ih}^N)\bigr)_\#\bigl(J\nabla H(\mu_{ih}^N)\mu_{ih}^N\bigr) \tag{7.4}$$
for $0 \le i \le N-1$ and $t \in [ih,(i+1)h)$.
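We build first the solution in $[0,h]$, setting $w_o^N = J\nabla H(\bar\mu)$. We then set
$$\mu_t^N = \bigl(\mathrm{id} + tw_o^N\bigr)_\#\bar\mu, \qquad w_t^N = \frac{\bigl(\mathrm{id} + tw_o^N\bigr)_\#\bigl(w_o^N\bar\mu\bigr)}{\mu_t^N}, \qquad t \in [0,h],$$
where the quotient denotes the density of the vector measure with respect to $\mu_t^N$. We claim that $w_t^N$ is an admissible velocity field for $\mu_t^N$. Indeed, for any $\varphi \in C_c^\infty(\mathbb{R}^D)$ we have
$$\begin{aligned}
\frac{d}{dt}\int_{\mathbb{R}^D} \varphi\, d\mu_t^N &= \frac{d}{dt}\int_{\mathbb{R}^D} \varphi(\mathrm{id} + tw_o^N)\, d\bar\mu = \int_{\mathbb{R}^D} \langle \nabla\varphi(x + tw_o^N); w_o^N\rangle\, d\bar\mu \\
&= \sum_{i=1}^D \int_{\mathbb{R}^D} \frac{\partial\varphi}{\partial x_i}\, d\Bigl[\bigl(\mathrm{id} + tw_o^N\bigr)_\#\bigl(w_{o,i}^N\bar\mu\bigr)\Bigr] = \int_{\mathbb{R}^D} \langle \nabla\varphi; w_t^N\rangle\, d\mu_t^N.
\end{aligned}$$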
As $\varphi$ is arbitrary, this proves that (7.3) is fulfilled in $[0,h]$. Notice also that Lemma 7.1 gives
$$\int_{\mathbb{R}^D} |w_t^N|^2\, d\mu_t^N \le \int_{\mathbb{R}^D} |w_o^N|^2\, d\bar\mu \le C_o^2 \quad \forall t \in [0,h],$$
hence (3.2) gives that the Lipschitz constant of $t \mapsto \mu_t^N$ in $[0,h]$ is bounded by $C_o$. In particular $W_2(\bar\mu,\mu_t^N) \le C_o h$ for $t \in [0,h]$. We can repeat this process, setting $w_h^N = J\nabla H(\mu_h^N)$, and introduce the following extensions on $(h,2h]$:
$$\mu_t^N = \bigl(\mathrm{id} + (t - h)w_h^N\bigr)_\#\mu_h^N, \qquad w_t^N := \frac{\bigl(\mathrm{id} + (t - h)w_h^N\bigr)_\#\bigl(w_h^N\mu_h^N\bigr)}{\mu_t^N}$$
for $t \in [h,2h]$, with the Lipschitz constant of $t \mapsto \mu_t^N$ bounded by $C_o$ and the continuity equation in (c) holding. By iterating this process $N$ times we build a solution of (7.3), provided $NhC_o < R_o$. In summary, we have obtained that
$$W_2(\mu_t^N,\bar\mu) \le C_o T, \qquad \|\nabla H(\mu_t^N)\|_{L^2(\mu_t^N;\mathbb{R}^D)} \le C_o, \qquad \|w_t^N\|_{L^2(\mu_t^N;\mathbb{R}^D)} \le C_o \tag{7.5}$$
for $t \in [0,T]$. The first inequality in (7.5) is due to our choice of $T$ and to the fact that $t \mapsto \mu_t^N$ is $C_o$-Lipschitz. The second inequality is a consequence of (H1′). To obtain the last inequality in (7.5), we have used Lemma 7.1. By (7.5), we can readily conclude (a) and (b). The construction of $\mu_t^N$ and $w_t^N$ is made such that (c) holds.
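For an empirical measure $\mu = \frac{1}{n}\sum_k \delta_{x_k}$ the construction of Step 1 reduces to moving each atom along a straight segment during each time interval. The sketch below illustrates this under explicit assumptions that are not taken from the text: the Hamiltonian is $\mathcal{V} + \mathcal{W}$ with $V(z) = W(z) = |z|^2/2$ (the Wasserstein term of (6.24) is omitted), the velocity selection at each atom is $\nabla V + (\nabla W) * \mu$, the symplectic matrix acts as $J(q,p) = (p,-q)$, and all function names are hypothetical.

```python
def apply_J(z):
    """J(q, p) = (p, -q) on R^{2d}; the block convention is an assumption."""
    d = len(z) // 2
    return z[d:] + [-c for c in z[:d]]


def mean(points):
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def grad_H(points):
    """(grad V + (grad W) * mu)(x_k) for V(z) = W(z) = |z|^2 / 2: x_k + (x_k - mean)."""
    m = mean(points)
    return [[x[j] + (x[j] - m[j]) for j in range(len(x))] for x in points]


def energy(points):
    """V(mu) + W(mu) for the empirical measure on `points`."""
    n = len(points)
    m = mean(points)
    potential = sum(0.5 * sum(c * c for c in x) for x in points) / n
    # (1/(2 n^2)) sum_{i,j} |x_i - x_j|^2 / 2  equals  (1/(2 n)) sum_i |x_i - mean|^2
    interaction = sum(0.5 * sum((x[j] - m[j]) ** 2 for j in range(len(x)))
                      for x in points) / n
    return potential + interaction


def discrete_flow(points, T, N):
    """On each [ih, (i+1)h] push the measure forward by id + (t - ih) J grad H(mu_ih)."""
    h = T / N
    pts = [list(p) for p in points]
    for _ in range(N):
        velocities = [apply_J(g) for g in grad_H(pts)]   # frozen at time ih
        pts = [[x[j] + h * w[j] for j in range(len(x))] for x, w in zip(pts, velocities)]
    return pts


atoms = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
E0 = energy(atoms)
drift_coarse = abs(energy(discrete_flow(atoms, 1.0, 10)) / E0 - 1.0)
drift_fine = abs(energy(discrete_flow(atoms, 1.0, 1000)) / E0 - 1.0)
print(drift_coarse, drift_fine)
```

The energy is not exactly preserved by the discrete scheme, but the drift decreases as $N$ grows, in line with the constancy of $t \mapsto H(\mu_t)$ for the limit flow.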
Step 2. (passage to the limit). By (a), (b), $t \mapsto \mu_t^N$ are equi-bounded in $\mathcal{P}_2(\mathbb{R}^D)$, and equi-Lipschitz continuous. Hence, we may assume with no loss of generality that
$$\mu_t^N \to \mu_t \quad \text{narrowly for any } t \in [0,T].$$
By the lower semicontinuity of moments we get $\mu_t \in \mathcal{P}_2(\mathbb{R}^D)$ for any $t$; moreover, the lower semicontinuity of $W_2(\cdot,\cdot)$ under narrow convergence gives that the $C_o$-Lipschitz bound in (a) and the distance bound in (b) are preserved in the limit.
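It remains to show that $\mu_t$ solves the Hamiltonian ODE. To this aim, taking into account Lemma 7.2 and possibly extracting a subsequence (not relabelled for simplicity) we can assume that there exist $w_t \in L^2(\mu_t;\mathbb{R}^D)$, with $\|w_t\|_{L^2(\mu_t)} \le C_o$ for a.e. $t$, such that the space-time measures $w_t^N\mu_t^N\, dt$ weak$^*$-converge to $w_t\mu_t\, dt$. We have to show that $w_t = Jv_t$ for some $v_t \in T_{\mu_t}\mathcal{P}_2(\mathbb{R}^D)$. To this aim, notice that
$$\lim_{N \to +\infty}\int_0^T \psi(t)\langle \varphi; w_t^N\mu_t^N\rangle\, dt = \int_0^T \psi(t)\langle \varphi; w_t\mu_t\rangle\, dt \quad \forall \psi \in C_c^\infty(0,T),\ \varphi \in C_c^\infty(\mathbb{R}^D;\mathbb{R}^D).$$
For $\varphi$ fixed, this means that the maps $t \mapsto \langle \varphi; w_t^N\mu_t^N\rangle$ weakly converge in $L^2(0,T)$ to $\langle \varphi; w_t\mu_t\rangle$. Therefore, a sequence of convex combinations of them converges a.e. to $\langle \varphi; w_t\mu_t\rangle$ and we obtain
$$\langle \varphi; w_t\mu_t\rangle \le \limsup_{N \to +\infty}\langle \varphi; w_t^N\mu_t^N\rangle \tag{7.6}$$
for a.e. $t \in [0,T]$. By a density argument we can find a Lebesgue negligible set $\mathcal{N} \subset (0,T)$ such that, for all $t \in (0,T) \setminus \mathcal{N}$, (7.6) holds for all $\varphi \in C_o(\mathbb{R}^D;\mathbb{R}^D)$ (the closure, in the sup norm, of $C_c(\mathbb{R}^D;\mathbb{R}^D)$).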
Now, fix $t \in (0,T)\setminus\mathcal{N}$ where (7.6) holds for all $\varphi \in C_o(\mathbb{R}^D;\mathbb{R}^D)$ and apply
the Hahn-Banach theorem to obtain that
\[
w_t\mu_t \in \bigcap_{M=1}^{\infty} K_{M,t},
\]
where $K_{M,t}$ is the closed convex hull of $\{w^N_t\mu^N_t\}_{N\ge M}$ with respect to the weak$^*$
topology. Indeed, fix $M$ and assume by contradiction that $w_t\mu_t$ does not belong
to $K_{M,t}$. Then, we can strongly separate $w_t\mu_t$ and $K_{M,t}$ by a continuous linear
functional, induced by some function $\varphi \in C_c(\mathbb{R}^D;\mathbb{R}^D)$, to obtain a contradiction
with (7.6). As
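\[
\begin{aligned}
w^N_t\mu^N_t
&= \bigl(\mathrm{id}+(t-[Nt]/N)\,w^N_{[Nt]/N}\bigr)_\#\bigl(w^N_{[Nt]/N}\,\mu^N_{[Nt]/N}\bigr) \\
&= \bigl(\mathrm{id}+(t-[Nt]/N)\,J\nabla H(\mu^N_{[Nt]/N})\bigr)_\#\bigl(J\nabla H(\mu^N_{[Nt]/N})\,\mu^N_{[Nt]/N}\bigr)
\end{aligned}
\]
we obtain also that
\[
w_t\mu_t \in \bigcap_{M=1}^{\infty}\overline{\mathrm{co}}\Bigl\{J\nabla H(\mu^N_{[Nt]/N})\,\mu^N_{[Nt]/N} : N \ge M\Bigr\},
\]
hence (H2) gives that $w_t\mu_t = Jv_t\mu_t$ for some $v_t \in \partial H(\mu_t) \cap T_{\mu_t}\mathcal{P}_2(\mathbb{R}^D)$.
Finally, the constancy of $t \mapsto H(\mu_t)$ follows by the (essential) boundedness of
$\|v_t\|_{L^2(\mu_t;\mathbb{R}^D)}$ and Theorem 5.2. QED.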
Remark 7.5. One can readily check that if we assume that (H1) and (H2) hold
and that $C_oT < R_o$, then there exists a Hamiltonian flow $\mu_t : [0,T] \to D(H)$ starting
from $\bar\mu \in \mathcal{P}_2(\mathbb{R}^D)$, satisfying (1.2), such that $t \mapsto \mu_t$ is $C_o$-Lipschitz. Furthermore,
if (H3) holds, then $t \mapsto H(\mu_t)$ is constant.
We can now prove the following extension of Lemma 6.7, where we drop the
boundedness assumption on the support of $\nu$.
Lemma 7.6. Let $\nu \in \mathcal{P}_2(\mathbb{R}^D)$ and let $V$, $W$ be as in Lemma 6.7. Then the function $H$
defined in (6.24) satisfies (H1), (H2) and (H3).
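Proof. (H3) has already been proved in Lemma 6.7, while (H1) follows by the
identity (6.25), taking into account that
\[
\int_{\mathbb{R}^D}|\bar\gamma-\mathrm{id}|^2\,d\mu
\le \int_{\mathbb{R}^D\times\mathbb{R}^D}|y-x|^2\,d\gamma = W_2^2(\mu,\nu)
\qquad \forall\,\gamma \in \Gamma_o(\mu,\nu).
\]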
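Finally, let us check property (H2). Let $w\mu$ be the weak$^*$ limit of the convex
combinations
\[
\sum_{i=n}^{l(n)}\lambda^n_i\,w_i\mu_i
\qquad\text{with } 0 \le \lambda^n_i \le 1,\quad \sum_{i=n}^{l(n)}\lambda^n_i = 1,
\]
and, representing $w_n$ as $w_n = \bar\gamma_n-\mathrm{id}$ for suitable $\gamma_n \in \Gamma_o(\mu_n,\nu)$, define
\[
\mu'_n = \sum_{i=n}^{l(n)}\lambda^n_i\,\mu_i, \qquad
\gamma'_n = \sum_{i=n}^{l(n)}\lambda^n_i\,\gamma_i \in \Gamma(\mu'_n,\nu).
\]
Let $\delta$ be a distance in $\mathcal{P}(\mathbb{R}^D\times\mathbb{R}^D)$ inducing the narrow convergence (see for
instance Remark 5.1.1 of [4]). As any limit point with respect to the narrow topology
of $\{\gamma_n\}_{n=1}^{\infty}$ belongs to $\Gamma_o(\mu,\nu)$ (see for instance Proposition 7.1.3 of [4]), a
compactness argument gives an infinitesimal sequence $\{\varepsilon_n\}_{n=1}^{\infty} \subset (0,+\infty)$ and
$\hat\gamma_n \in \Gamma_o(\mu,\nu)$ such that $\delta(\gamma_n,\hat\gamma_n) < \varepsilon_n$. In particular, setting
\[
\hat\gamma'_n = \sum_{i=n}^{l(n)}\lambda^n_i\,\hat\gamma_i \in \Gamma_o(\mu,\nu)
\]
and noticing that $\delta$ is induced by a norm, we have
\[
\delta(\gamma'_n,\hat\gamma'_n) \le \sup_{i\ge n}\varepsilon_i.
\]
In particular, since $\Gamma_o(\mu,\nu)$ is narrowly closed, we infer that any limit point $\gamma$, in
the narrow topology, of $\gamma'_n$ belongs to $\Gamma_o(\mu,\nu)$. Let $\gamma$ be any of these limit points,
along a subsequence $n(k)$, and notice that for any $\varphi \in C^\infty_c(\mathbb{R}^{2d};\mathbb{R}^{2d})$ we have
\[
\begin{aligned}
\langle w\mu;\varphi\rangle
&= \lim_{k\to+\infty}\Bigl\langle\sum_{i=n(k)}^{l(n(k))}\lambda^{n(k)}_i(\bar\gamma_i-\mathrm{id})\mu_i;\varphi\Bigr\rangle
= \lim_{k\to+\infty}\int_{\mathbb{R}^D\times\mathbb{R}^D}\langle y-x;\varphi(x)\rangle\,d\gamma'_{n(k)} \\
&= \int_{\mathbb{R}^D\times\mathbb{R}^D}\langle y-x;\varphi(x)\rangle\,d\gamma
= \langle(\bar\gamma-\mathrm{id})\mu;\varphi\rangle.
\end{aligned}
\]
As $\varphi$ is arbitrary, this proves that $w = \bar\gamma-\mathrm{id}$, hence (3.10) and Proposition 4.3 yield
$w \in T_\mu\mathcal{P}_2(\mathbb{R}^D)$ and $w \in \partial H(\mu)$. QED.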
8 Examples
In this section we briefly illustrate some PDEs fitting in our framework.
Semigeostrophic equations.
(a) If we set $d = 1$ and $\nu = \chi_\Omega\mathcal{L}^2$ in Lemma 6.7, where $\Omega \subset \mathbb{R}^2$ is a bounded
Borel set with $\mathcal{L}^2(\Omega) = 1$, then
\[
\frac{d}{dt}\mu_t + D_x\cdot\bigl(J_1(T^{\nu}_{\mu_t}-\mathrm{id})\mu_t\bigr) = 0
\]
is the Hamiltonian ODE relative to $-W_2^2(\cdot,\nu)/2$, thanks to (4.6). This PDE is a
variant of the semigeostrophic equation. Notice that the $(-1)$-convexity of $H$ is
ensured by Proposition 4.3.
(b) When $d = 1$ and $\mathcal{J}(\rho) = \frac{1}{2}\int\rho^2\,dx$, then the Hamiltonian ODE relative to
\[
H(\mu) := \sup_{\rho\in K}\Bigl\{-\frac{1}{2}W_2^2(\mu,\rho\mathcal{L}^3)-\mathcal{J}(\rho)\Bigr\}
\]
corresponds to the semigeostrophic shallow water equation, studied in [17]. It
suffices to apply Lemma 6.8.
(c) Finally, if $D = 3$, $J(x,y,z) = (-y,x,0)$ and $H(\mu) = -W_2^2(\mu,\nu)/2$, with
$\nu = \chi_\Omega\mathcal{L}^3$, then the Hamiltonian ODE is the 3-d semigeostrophic equation studied in
[10] and [16].
Vlasov-Poisson and Vlasov-Monge-Ampère equations.
Suppose that $d \ge 1$, $\nu = (\chi_\Omega\mathcal{L}^d)\times\delta_0$, where $\Omega \subset \mathbb{R}^d$ is a bounded Borel set
with $\mathcal{L}^d(\Omega) = 1$, and $\delta_0$ is the Dirac mass at the origin of $\mathbb{R}^d$. Then, as shown in [18], the
Hamiltonian in Lemma 6.7 decouples into
\[
H(\mu) = -\frac{1}{2}M_2(\mu^2)-\frac{1}{2}W_2^2(\mu^1,\chi_\Omega\mathcal{L}^d),
\]
where $\mu^1$ (resp. $\mu^2$) is the first (resp. second) marginal of $\mu$. This is due to the
fact that the optimal transport map $t^{\nu}_{\mu}$ between $\mu \in \mathcal{P}^a_2(\mathbb{R}^{2d})$ and $\nu$ has necessarily the
form $(t,0)$, where $t$ is the optimal transport map between $\mu^1$ and $\chi_\Omega\mathcal{L}^d$, and an
analogous property holds at the level of optimal plans, when $\mu$ is a general measure
in $\mathcal{P}_2(\mathbb{R}^{2d})$.
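The decoupling can be checked on a discrete analogue. The sketch below assumes $d = 1$, replaces $\chi_\Omega\mathcal{L}^d$ by a hypothetical equal-weight empirical measure supported on the points `ys`, and takes $H = -\frac{1}{2}W_2^2(\cdot,\nu)$ as the sign convention; none of these choices come from the text. It compares a brute-force optimal assignment in $\mathbb{R}^2$ with the sum of the second moment of the velocities and the one-dimensional transport cost.

```python
from itertools import permutations

# ASSUMPTIONS (illustrative only): d = 1, equal-weight empirical measures,
# target nu supported on the points (y, 0), and H = -W_2^2 / 2.


def w2_squared_bruteforce(src, dst):
    """Squared W_2 between equal-weight empirical measures on R^2.

    Enumerates all assignments, so it is only usable for a handful of points.
    """
    n = len(src)
    best = float("inf")
    for perm in permutations(range(n)):
        cost = sum(
            (src[i][0] - dst[j][0]) ** 2 + (src[i][1] - dst[j][1]) ** 2
            for i, j in enumerate(perm)
        ) / n
        best = min(best, cost)
    return best


def w2_squared_1d(xs, ys):
    """Squared W_2 on the line: the monotone (sorted) coupling is optimal."""
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def second_moment(vs):
    """Second moment of the equal-weight empirical measure on `vs`."""
    return sum(v * v for v in vs) / len(vs)


def decoupled_hamiltonian(particles, ys):
    """-M_2(mu^2)/2 - W_2^2(mu^1, target)/2 for particles (x, v)."""
    xs = [x for x, _ in particles]
    vs = [v for _, v in particles]
    return -0.5 * second_moment(vs) - 0.5 * w2_squared_1d(xs, ys)


if __name__ == "__main__":
    particles = [(0.3, 1.0), (-1.2, -0.5), (2.0, 0.25), (0.9, -2.0)]
    ys = [0.0, 0.25, 0.5, 0.75]
    nu = [(y, 0.0) for y in ys]
    print(-0.5 * w2_squared_bruteforce(particles, nu))
    print(decoupled_hamiltonian(particles, ys))
```

The two printed values agree because the velocity part of the cost, $|v-0|^2$, does not depend on which target point a particle is assigned to, so only the position marginal enters the optimization.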
Setting $\mu_t = f(t,\cdot)\mathcal{L}^{2d}$ and $\rho_t(x) = \int_{\mathbb{R}^d}f(t,x,v)\,dv$ (i.e. the first marginal of
$\mu_t$), we have then obtained the Hamiltonian for the Vlasov-Monge-Ampère (VMA)
equation studied in [12] and more recently in [18], which is (up to a scaling argument)
\[
\begin{cases}
\dfrac{d}{dt}f(t,x,v) + D_x\cdot\bigl(v\,f(t,x,v)\bigr) = D_v\cdot\bigl(f(t,x,v)\,\nabla_x\Phi_t(x)\bigr) \\[6pt]
(\mathrm{id}-\nabla_x\Phi_t)_\#\rho_t = \chi_\Omega\mathcal{L}^d, \quad\text{with } |x|^2/2-\Phi_t(x) \text{ convex.}
\end{cases}
\tag{8.1}
\]
Note that when $d = 1$ the relation between $\rho_t$ and $\Phi_t$ reduces to $\rho_t = 1-\partial_{xx}\Phi_t$
and so (8.1) is nothing but the well-known Vlasov-Poisson equation. Our existence
result Theorem 6.6 covers the case of absolutely continuous solutions, while Theorem 7.4
covers, thanks to Lemma 7.6, also the case of general initial data: in this
case (VMA) has to be understood as follows:
\[
\begin{cases}
\dfrac{d}{dt}\mu_t + D_x\cdot(v\mu_t) = D_v\cdot\bigl((\mathrm{id}-\bar\gamma)\mu_t\bigr) \\[6pt]
\gamma \in \Gamma_o(\mu^1_t,\chi_\Omega\mathcal{L}^d).
\end{cases}
\tag{8.2}
\]
Indeed, any $\gamma' \in \Gamma_o(\mu_t,\chi_\Omega\mathcal{L}^d\times\delta_0)$ can be written as a product $\gamma\times(\mathrm{id}\times 0)_\#\mu^2_t$,
with $\gamma \in \Gamma_o(\mu^1_t,\chi_\Omega\mathcal{L}^d)$, so that $\bar\gamma' = (\bar\gamma,0)$. Finally, it would be interesting to
compare carefully, in one space dimension, our existence result for the Vlasov-Poisson
equation with the existence result in [47]. Here we just mention that on
the one hand our result allows more general initial data (no exponential decay of
the velocities is required), on the other hand the solution built in [47] has additional
space-time BV regularity properties related to velocity averaging, that are used to
define the product $D_v\cdot(f\,\nabla_x\Phi_t)$.
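The one-dimensional reduction mentioned above can be checked by a change of variables. The following is a sketch assuming that $\Phi_t$ is smooth and that $T_t := \mathrm{id}-\partial_x\Phi_t$ is strictly increasing (neither is claimed in the text); the names $T_t$ and $\psi$ are introduced here only for illustration.

```latex
% Push-forward condition (T_t)_# rho_t = chi_Omega L^1, tested against a
% bounded continuous function \psi:
\int_{\mathbb{R}} \psi(T_t(x))\,\rho_t(x)\,dx = \int_{\Omega} \psi(y)\,dy .
% Change of variables y = T_t(x), dy = T_t'(x)\,dx = (1-\partial_{xx}\Phi_t(x))\,dx:
\int_{\Omega} \psi(y)\,dy
  = \int_{T_t^{-1}(\Omega)} \psi(T_t(x))\,\bigl(1-\partial_{xx}\Phi_t(x)\bigr)\,dx .
% Since \psi is arbitrary and T_t is injective:
\rho_t(x) = 1-\partial_{xx}\Phi_t(x) \quad \text{for a.e. } x \in T_t^{-1}(\Omega),
\qquad \rho_t = 0 \ \text{a.e. elsewhere.}
```

The convexity of $|x|^2/2-\Phi_t(x)$ required in (8.1) is what makes $T_t$ nondecreasing, so the density obtained this way is nonnegative.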
Acknowledgment.
Luigi Ambrosio gratefully acknowledges the support provided by the MIUR
PRIN04 project Calcolo delle Variazioni. Wilfrid Gangbo gratefully acknowl-
edges the support provided by NSF grants DMS-02-00267 and DMS-03-754729.
He also acknowledges the hospitality of the Mathematical Sciences Research In-
stitute, Berkeley, CA 94720.
Bibliography
[1] Agueh, M. PhD Dissertation 2002. School of Mathematics Georgia Institute of Technology.
[2] Ambrosio, L. Lecture notes on optimal transport problems. In the proceedings of the CIME
course "Mathematical aspects of evolving interfaces", Madeira (Pt), P. Colli and J.F. Rodrigues
Eds., 1812, 1-52, Springer, 2003.
[3] Ambrosio, L; Fusco, N.; Pallara, D. Functions of bounded variation and free discontinuity prob-
lems. Oxford Mathematical Monographs, Clarendon Press, 2000.
[4] Ambrosio, L.; Gigli, N.; Savaré, G. Gradient flows in metric spaces and the Wasserstein spaces
of probability measures. Lectures in Mathematics, ETH Zurich, Birkhäuser, 2005.
[5] Ambrosio, L. Transport equation and Cauchy problem for BV vector fields. Inventiones
Mathematicae, 158:227-260, 2004.
[6] Bouchut, F. Renormalized solutions to the Vlasov equation with coefficients of bounded
variation. Arch. Rational Mech. Anal., 157:75-90, 2001.
[7] Brenier, Y. Décomposition polaire et réarrangement monotone des champs de vecteurs. C.R.
Acad. Sci. Paris Sér. I Math., 305:805-808, 1987.
[8] Brenier, Y. Derivation of the Euler equations from a caricature of Coulomb interaction. Comm.
Math. Phys. 212, no. 1, 93-104, 2000.
[9] Brenier, Y. Convergence of the Vlasov-Poisson system to the incompressible Euler equations.
To appear in Comm. PDEs.
[10] Benamou, J.D.; Brenier, Y. Weak existence for the semigeostrophic equations formulated as
a coupled Monge-Ampère equations/transport problem. SIAM J. Appl. Math. 58, no 5,
1450-1461 (1998).
[11] Benamou, J.D.; Brenier, Y. A computational fluid mechanics solution to the Monge-Kantorovich
mass transfer problem. Numer. Math. 84, no. 3, 375-393, 2000.
[12] Brenier, Y.; Loeper, G. A geometric approximation to the Euler equations: The Vlasov-Monge-Ampère
equation. Geom. Funct. Anal., 2004.
[13] Carrillo, J.A.; McCann, R.J.; Villani, C. Contraction in the 2-Wasserstein metric length space
and thermalization of granular media. Archive for Rational Mech. Anal., 2005.
[14] Cordero-Erausquin, D; Gangbo, W.; Houdre C. Inequalities for generalized en-
tropy and optimal transportation. To appear in AMS Contemp. Math. Preprint
www.math.gatech.edu/ gangbo/publications/.
[15] Cloke, P.; Cullen, M.J.P. A semi-geostrophic ocean model with outcropping. Dyn. Atmos.
Oceans, 21:23-48, 1994.
[16] Cullen, M.J.P.; Feldman, M. Lagrangian solutions of the semigeostrophic equations in physical
space. To appear in SIAM Jour. Analysis 156, 241-273, 2001.
[17] Cullen, M.J.P.; Gangbo, W. A variational approach for the 2-dimensional semi-geostrophic
shallow water equations. Arch. Ration. Mech. Anal., 156 241-273, 2001.
[18] Cullen, M.J.P.; Gangbo, W.; Pisante, G. Semigeostrophic equations discretized in reference and
dual variables. Submitted
[19] Cullen, M.J.P.; Maroofi, H. The fully compressible semigeostrophic equations from
meteorology. Archive for Rational Mech. and Analysis 167 no 4, 309-336, 2003.
[20] Cullen, M.J.P.; Purser, R.J. An extended lagrangian theory of semigeostrophic frontogenesis. J.
Atmosph. Sciences, 41 1477-1497, 1984.
[21] Cullen, M.J.P.; Purser, R.J. Properties of the lagrangian semigeostrophic equations. J. Atmosph.
Sciences vol 40, 17 2684-2697, 1989.
[22] Cullen, M.J.P.; Norbury, J.; Purser, R.J. Generalised Lagrangian solutions for atmospheric and
oceanic flow. SIAM J. Appl. Math., 51, 20-31, 1991.
[23] Cullen, M.J.P.; Roulstone, J. A geometric model of the nonlinear equilibration of two-
dimensional Eady waves. J. Atmos. Sci., 50, 328-332, 1993.
[24] Dacorogna, B. Direct Methods in the Calculus of Variations. Springer-Verlag, 1989.
[25] Di Perna, R.J.; Lions, P.L. Ordinary differential equations, transport theory and Sobolev spaces.
Invent. Math., 98:511-547, 1989.
[26] Eliassen, A. The quasi-static equations of motion. Geofys. Publ., 17, No 3, 1948.
[27] Evans, L.C. Partial differential equations and Monge-Kantorovich mass transfer. Int. Press,
Current Dev. Math, 26, 26-78, 1997.
[28] Gangbo, W. An elementary proof of the polar factorization of vector-valued functions. Arch.
Rational Mech. Anal., 128:381-399, 1994.
[29] Gangbo, W.; McCann, R.J. Optimal maps in Monge's mass transport problem. C.R. Acad. Sci.
Paris Sér. I Math. 321 1653-1658, 1995.
[30] Gangbo, W.; McCann, R.J. The geometry of optimal transportation. Acta Math. 177 113-161,
1996.
[31] Gangbo, W.; Pacini, T. Infinite dimensional Hamiltonian systems in terms of the Wasserstein
distance. In progress.
[32] Ghoussoub, N.; Moameni, A. On the existence of Hamiltonian paths connecting Lagrangian
submanifolds. In progress.
[33] Hoskins, B.J. The geostrophic momentum approximation and the semi-geostrophic equations.
J. Atmosph. Sciences 32, 233-242, 1975.
[34] Jost, J. Nonpositive curvature: geometric and analytic aspects. Lectures in Mathematics ETH
Zürich, Birkhäuser Verlag, Basel, 1997.
[35] Jordan, R.; Kinderlehrer, D.; Otto, F. The variational formulation of the Fokker-Planck equation.
SIAM J. Math. Anal. 29, 1-17, 1998.
[36] Kantorovich, L. On the translocation of masses. C.R. (Doklady) Acad. Sci. URSS (N.S.),
37:199-201, 1942.
[37] Kantorovich, L. On a problem of Monge (In Russian). Uspekhi Math. Nauk., 3:225-226, 1948.
[38] McCann, R.J. A convexity principle for interacting gases. Adv. Math. 128 153-179, 1997.
[39] McCann, R.J. Existence and uniqueness of monotone measure-preserving maps. Duke Math.
J., 80 309-323, 1995.
[40] McCann, R.J.; Oberman, A. Exact semi-geostrophic flows in an elliptical ocean basin.
Nonlinearity 17 1891-1922, 2004.
[41] Monge, G. Mémoire sur la théorie des déblais et de remblais. Histoire de l'Académie Royale
des Sciences de Paris, avec les Mémoires de Mathématique et de Physique pour la même année,
pages 666-704 (1781).
[42] Katz, B.S., editor. Nobel Laureates in Economic Sciences: a Biographical Dictionary. Garland
Publishing Inc., New York, 1989.
[43] Otto, F. The geometry of dissipative evolution equations: the porous medium equation. Comm.
P.D.E., 26 156-186, 2001.
[44] Pedlosky, J. Geophysical Fluid Dynamics. Springer-Verlag (1982).
[45] Sanz-Serna, J.M.; Vadillo, F. Studies in numerical nonlinear instability III: augmented
Hamiltonian systems. SIAM J. Appl. Math., 47 92-108, 1995.
[46] Villani, C. Topics in optimal transportation. Graduate Studies in Mathematics 58, American
Mathematical Society, 2003.
[47] Zheng, Y.; Majda, A. Existence of global weak solutions to one-component Vlasov-Poisson and
Fokker-Planck-Poisson systems in one space dimension with measures as initial data. Comm.
Pure Appl. Math., 47 1365-1401, 1994.
Received Month 200X.
