Escolar Documentos
Profissional Documentos
Cultura Documentos
Analysis
Gerald Teschl
Gerald Teschl
Fakultat f ur Mathematik
Nordbergstrae 15
Universitat Wien
1090 Wien, Austria
E-mail: Gerald.Teschl@univie.ac.at
URL: http://www.mat.univie.ac.at/
~
gerald/
2010 Mathematics subject classication. 46-01, 28-01, 46E30
Abstract. This manuscript provides a brief introduction to Real and Func-
tional Analysis. It covers basic Hilbert and Banach space theory including
Lebesgue spaces and their duals (no knowledge about Lebesgue integration
is assumed).
Keywords and phrases. Functional Analysis, Banach space, Hilbert space,
Measure Theory, Lebesgue spaces.
Typeset by /
/
o-L
A
T
E
X and Makeindex.
Version: June 1, 2011
Copyright c _ 20042011 by Gerald Teschl
Contents
Preface vii
Part 1. Functional Analysis
Chapter 0. Introduction 3
0.1. Linear partial dierential equations 3
Chapter 1. A rst look at Banach and Hilbert spaces 7
1.1. Warm up: Metric and topological spaces 7
1.2. The Banach space of continuous functions 17
1.3. The geometry of Hilbert spaces 24
1.4. Completeness 29
1.5. Bounded operators 30
1.6. Sums and quotients of Banach spaces 33
Chapter 2. Hilbert spaces 35
2.1. Orthonormal bases 35
2.2. The projection theorem and the Riesz lemma 40
2.3. Operators dened via forms 42
2.4. Orthogonal sums and tensor products 44
Chapter 3. Compact operators 47
3.1. Compact operators 47
3.2. The spectral theorem for compact symmetric operators 49
3.3. Applications to SturmLiouville operators 52
iii
iv Contents
3.4. More on compact operators 55
3.5. Fredholm theory for compact operators 57
Chapter 4. The main theorems about Banach spaces 61
4.1. The Baire theorem and its consequences 61
4.2. The HahnBanach theorem and its consequences 66
4.3. Weak convergence 72
Chapter 5. Bounded linear operators 79
5.1. Banach algebras 79
5.2. The C
t
u(t, x) =
2
x
2
u(t, x). (0.1)
Here u(t, x) is the temperature distribution at time t at the point x. It
is usually assumed, that the temperature at x = 0 and x = 1 is xed, say
u(t, 0) = a and u(t, 1) = b. By considering u(t, x) u(t, x)a(ba)x it is
clearly no restriction to assume a = b = 0. Moreover, the initial temperature
distribution u(0, x) = u
0
(x) is assumed to be known as well.
3
4 0. Introduction
Since nding the solution seems at rst sight not possible, we could try
to nd at least some solutions of (0.1) rst. We could for example make an
ansatz for u(t, x) as a product of two functions, each of which depends on
only one variable, that is,
u(t, x) = w(t)y(x). (0.2)
This ansatz is called separation of variables. Plugging everything into
the heat equation and bringing all t, x dependent terms to the left, right
side, respectively, we obtain
w(t)
w(t)
=
y
(x)
y(x)
. (0.3)
Here the dot refers to dierentiation with respect to t and the prime to
dierentiation with respect to x.
Now if this equation should hold for all t and x, the quotients must be
equal to a constant (we choose instead of for convenience later on).
That is, we are lead to the equations
w(t) = w(t) (0.4)
and
y
x) +c
3
sin(
x). (0.7)
However, y(x) must also satisfy the boundary conditions y(0) = y(1) = 0.
The rst one y(0) = 0 is satised if c
2
= 0 and the second one yields (c
3
can
be absorbed by w(t))
sin(
) = 0, (0.8)
which holds if = (n)
2
, n N. In summary, we obtain the solutions
u
n
(t, x) = c
n
e
(n)
2
t
sin(nx), n N. (0.9)
So we have found a large number of solutions, but we still have not
dealt with our initial condition u(0, x) = u
0
(x). This can be done using
the superposition principle which holds since our equation is linear. Hence
any nite linear combination of the above solutions will be again a solution.
Moreover, under suitable conditions on the coecients we can even consider
innite linear combinations. In fact, choosing
u(t, x) =
n=1
c
n
e
(n)
2
t
sin(nx), (0.10)
0.1. Linear partial dierential equations 5
where the coecients c
n
decay suciently fast, we obtain further solutions
of our equation. Moreover, these solutions satisfy
u(0, x) =
n=1
c
n
sin(nx) (0.11)
and expanding the initial conditions into a Fourier series
u
0
(x) =
n=1
u
0,n
sin(nx), (0.12)
we see that the solution of our original problem is given by (0.10) if we
choose c
n
= u
0,n
.
Of course for this last statement to hold we need to ensure that the series
in (0.10) converges and that we can interchange summation and dierenti-
ation. You are asked to do so in Problem 0.1.
In fact many equations in physics can be solved in a similar way:
Reaction-Diusion equation:
t
u(t, x)
2
x
2
u(t, x) +q(x)u(t, x) = 0,
u(0, x) = u
0
(x),
u(t, 0) = u(t, 1) = 0. (0.13)
Here u(t, x) could be the density of some gas in a pipe and q(x) > 0 describes
that a certain amount per time is removed (e.g., by a chemical reaction).
Wave equation:
2
t
2
u(t, x)
2
x
2
u(t, x) = 0,
u(0, x) = u
0
(x),
u
t
(0, x) = v
0
(x)
u(t, 0) = u(t, 1) = 0. (0.14)
Here u(t, x) is the displacement of a vibrating string which is xed at x = 0
and x = 1. Since the equation is of second order in time, both the initial
displacement u
0
(x) and the initial velocity v
0
(x) of the string need to be
known.
Schrodinger equation:
i
t
u(t, x) =
2
x
2
u(t, x) +q(x)u(t, x),
u(0, x) = u
0
(x),
u(t, 0) = u(t, 1) = 0. (0.15)
6 0. Introduction
Here [u(t, x)[
2
is the probability distribution of a particle trapped in a box
x [0, 1] and q(x) is a given external potential which describes the forces
acting on the particle.
All these problems (and many others) lead to the investigation of the
following problem
Ly(x) = y(x), L =
d
2
dx
2
+q(x), (0.16)
subject to the boundary conditions
y(a) = y(b) = 0. (0.17)
Such a problem is called a SturmLiouville boundary value problem.
Our example shows that we should prove the following facts about our
SturmLiouville problems:
(i) The SturmLiouville problem has a countable number of eigen-
values E
n
with corresponding eigenfunctions u
n
(x), that is, u
n
(x)
satises the boundary conditions and Lu
n
(x) = E
n
u
n
(x).
(ii) The eigenfunctions u
n
are complete, that is, any nice function u(x)
can be expanded into a generalized Fourier series
u(x) =
n=1
c
n
u
n
(x).
This problem is very similar to the eigenvalue problem of a matrix and
we are looking for a generalization of the well-known fact that every sym-
metric matrix has an orthonormal basis of eigenvectors. However, our linear
operator L is now acting on some space of functions which is not nite
dimensional and it is not at all clear what even orthogonal should mean
for functions. Moreover, since we need to handle innite series, we need
convergence and hence dene the distance of two functions as well.
Hence our program looks as follows:
What is the distance of two functions? This automatically leads
us to the problem of convergence and completeness.
If we additionally require the concept of orthogonality, we are lead
to Hilbert spaces which are the proper setting for our eigenvalue
problem.
Finally, the spectral theorem for compact symmetric operators will
be the solution of our above problem
Problem 0.1. Find conditions for the initial distribution u
0
(x) such that
(0.10) is indeed a solution (i.e., such that interchanging the order of sum-
mation and dierentiation is admissible). (Hint: What is the connection
between smoothness of a function and decay of its Fourier coecients?)
Chapter 1
A rst look at Banach
and Hilbert spaces
1.1. Warm up: Metric and topological spaces
Before we begin, I want to recall some basic facts from metric and topological
spaces. I presume that you are familiar with these topics from your calculus
course. As a general reference I can warmly recommend Kellys classical
book [4].
A metric space is a space X together with a distance function d :
X X R such that
(i) d(x, y) 0,
(ii) d(x, y) = 0 if and only if x = y,
(iii) d(x, y) = d(y, x),
(iv) d(x, z) d(x, y) +d(y, z) (triangle inequality).
If (ii) does not hold, d is called a pseudo-metric. Moreover, it is
straightforward to see the inverse triangle inequality (Problem 1.1)
[d(x, y) d(z, y)[ d(x, z). (1.1)
Example. Euclidean space R
n
together with d(x, y) = (
n
k=1
(x
k
y
k
)
2
)
1/2
is a metric space and so is C
n
together with d(x, y) = (
n
k=1
[x
k
y
k
[
2
)
1/2
.
The set
B
r
(x) = y X[d(x, y) < r (1.2)
is called an open ball around x with radius r > 0. A point x of some set
U is called an interior point of U if U contains some ball around x. If x
7
8 1. A rst look at Banach and Hilbert spaces
is an interior point of U, then U is also called a neighborhood of x. A
point x is called a limit point of U (also accumulation or cluster point)
if (B
r
(x)x) U ,= for every ball around x. Note that a limit point
x need not lie in U, but U must contain points arbitrarily close to x. A
point x is called an isolated point of U if there exists a neighborhood of x
not containing any other points of U. A set which consists only of isolated
points is called a discrete set.
Example. Consider R with the usual metric and let U = (1, 1). Then
every point x U is an interior point of U. The points 1 are limit points
of U.
A set consisting only of interior points is called open. The family of
open sets O satises the properties
(i) , X O,
(ii) O
1
, O
2
O implies O
1
O
2
O,
(iii) O
O implies
O.
That is, O is closed under nite intersections and arbitrary unions.
In general, a space X together with a family of sets O, the open sets,
satisfying (i)(iii) is called a topological space. The notions of interior
point, limit point, and neighborhood carry over to topological spaces if we
replace open ball by open set.
There are usually dierent choices for the topology. Two not too inter-
esting examples are the trivial topology O = , X and the discrete
topology O = P(X) (the powerset of X). Given two topologies O
1
and O
2
on X, O
1
is called weaker (or coarser) than O
2
if and only if O
1
O
2
.
Example. Note that dierent metrics can give rise to the same topology.
For example, we can equip R
n
(or C
n
) with the Euclidean distance d(x, y)
as before or we could also use
d(x, y) =
n
k=1
[x
k
y
k
[. (1.3)
Then
1
n
n
k=1
[x
k
[
_
n
k=1
[x
k
[
2
k=1
[x
k
[ (1.4)
shows B
r/
n
(x)
B
r
(x) B
r
(x), where B,
B are balls computed using d,
d, respectively.
1.1. Warm up: Metric and topological spaces 9
Example. We can always replace a metric d by the bounded metric
d(x, y) =
d(x, y)
1 +d(x, y)
(1.5)
without changing the topology (since the open balls do not change).
Every subspace Y of a topological space X becomes a topological space
of its own if we call O Y open if there is some open set
O X such that
O =
O Y (relative or induced topology).
Example. The set (0, 1] R is not open in the topology of X = R, but it is
open in the relative topology when considered as a subset of Y = [1, 1].
A family of open sets B O is called a base for the topology if for each
x and each neighborhood U(x), there is some set O B with x O U(x).
Since an open set O is a neighborhood of every one of its points, it can be
written as O =
OB
O and we have
Lemma 1.1. If B O is a base for the topology, then every open set can
be written as a union of elements from B.
If there exists a countable base, then X is called second countable.
Example. By construction the open balls B
1/n
(x) are a base for the topol-
ogy in a metric space. In the case of R
n
(or C
n
) it even suces to take balls
with rational center and hence R
n
(as well as C
n
) is second countable.
A topological space is called a Hausdor space if for two dierent
points there are always two disjoint neighborhoods.
Example. Any metric space is a Hausdor space: Given two dierent
points x and y, the balls B
d/2
(x) and B
d/2
(y), where d = d(x, y) > 0, are
disjoint neighborhoods (a pseudo-metric space will not be Hausdor).
The complement of an open set is called a closed set. It follows from
de Morgans rules that the family of closed sets ( satises
(i) , X (,
(ii) C
1
, C
2
( implies C
1
C
2
(,
(iii) C
( implies
(.
That is, closed sets are closed under nite unions and arbitrary intersections.
The smallest closed set containing a given set U is called the closure
U =
CC,UC
C, (1.6)
10 1. A rst look at Banach and Hilbert spaces
and the largest open set contained in a given set U is called the interior
U
=
_
OO,OU
O. (1.7)
It is not hard to see that the closure satises the following axioms (Kura-
towski closure axioms):
(i) = ,
(ii) U U,
(iii) U = U,
(iv) U V = U V .
In fact, one can show that they can equivalently be used to dene the topol-
ogy by observing that the closed sets are precisely those which satisfy A = A.
We can dene interior and limit points as before by replacing the word
ball by open set. Then it is straightforward to check
Lemma 1.2. Let X be a topological space. Then the interior of U is the
set of all interior points of U and the closure of U is the union of U with
all limit points of U.
A sequence (x
n
)
n=1
X is said to converge to some point x X if
d(x, x
n
) 0. We write lim
n
x
n
= x as usual in this case. Clearly the
limit is unique if it exists (this is not true for a pseudo-metric).
Every convergent sequence is a Cauchy sequence; that is, for every
> 0 there is some N N such that
d(x
n
, x
m
) , n, m N. (1.8)
If the converse is also true, that is, if every Cauchy sequence has a limit,
then X is called complete.
Example. Both R
n
and C
n
are complete metric spaces.
Note that in a metric space x is a limit point of U if and only if there
exists a sequence (x
n
)
n=1
Ux with lim
n
x
n
= x. Hence U is closed
if and only if for every convergent sequence the limit is in U. In particular,
Lemma 1.3. A closed subset of a complete metric space is again a complete
metric space.
Note that convergence can also be equivalently formulated in terms of
topological terms: A sequence x
n
converges to x if and only if for every
neighborhood U of x there is some N N such that x
n
U for n N. In
a Hausdor space the limit is unique.
A set U is called dense if its closure is all of X, that is, if U = X. A
metric space is called separable if it contains a countable dense set.
1.1. Warm up: Metric and topological spaces 11
Lemma 1.4. A metric space is separable if and only if it is second countable
as a topological space.
Proof. From every dense set we get a countable base by considering open
balls with rational radii and centers in the dense set. Conversely, from every
countable base we obtain a dense set by choosing on element from each
element of the base.
Lemma 1.5. Let X be a separable metric space. Every subset of X is again
separable.
Proof. Let A = x
n
nN
be a dense set in X. The only problem is that
AY might contain no elements at all. However, some elements of A must
be at least arbitrarily close: Let J N
2
be the set of all pairs (n, m) for
which B
1/m
(x
n
) Y ,= and choose some y
n,m
B
1/m
(x
n
) Y for all
(n, m) J. Then B = y
n,m
(n,m)J
Y is countable. To see that B is
dense, choose y Y . Then there is some sequence x
n
k
with d(x
n
k
, y) < 1/k.
Hence (n
k
, k) J and d(y
n
k
,k
, y) d(y
n
k
,k
, x
n
k
) +d(x
n
k
, y) 2/k 0.
Next we come to functions f : X Y , x f(x). We use the usual
conventions f(U) = f(x)[x U for U X and f
1
(V ) = x[f(x) V
for V Y . The set Ran(f) = f(X) is called the range of f and X is called
the domain of f. A function f is called injective if for each y Y there
is at most one x X with f(x) = y (i.e., f
1
(y) contains at most one
point) and surjective or onto if Ran(f) = Y . A function f which is both
injective and surjective is called bijective.
A function f between metric spaces X and Y is called continuous at a
point x X if for every > 0 we can nd a > 0 such that
d
Y
(f(x), f(y)) if d
X
(x, y) < . (1.9)
If f is continuous at every point, it is called continuous.
Lemma 1.6. Let X be a metric space. The following are equivalent:
(i) f is continuous at x (i.e, (1.9) holds).
(ii) f(x
n
) f(x) whenever x
n
x.
(iii) For every neighborhood V of f(x), f
1
(V ) is a neighborhood of x.
Proof. (i) (ii) is obvious. (ii) (iii): If (iii) does not hold, there is
a neighborhood V of f(x) such that B
(x) , f
1
(V ) for every . Hence
we can choose a sequence x
n
B
1/n
(x) such that f(x
n
) , f
1
(V ). Thus
x
n
x but f(x
n
) , f(x). (iii) (i): Choose V = B
(x) f
1
(V ) for some .
12 1. A rst look at Banach and Hilbert spaces
The last item implies that f is continuous if and only if the inverse
image of every open set is again open (equivalently, the inverse image of
every closed set is closed). If the image of every open set is open, then f
is called open. A bijection f is called a homeomorphism if both f and
its inverse f
1
are continuous. Note that if f is a bijection, then f
1
is
continuous if and only if f is open.
In a topological space, (iii) is used as the denition for continuity. How-
ever, in general (ii) and (iii) will no longer be equivalent unless one uses
generalized sequences, so-called nets, where the index set N is replaced by
arbitrary directed sets.
The support of a function f : X C
n
is the closure of all points x for
which f(x) does not vanish; that is,
supp(f) = x X[f(x) ,= 0. (1.10)
If X and Y are metric spaces, then X Y together with
d((x
1
, y
1
), (x
2
, y
2
)) = d
X
(x
1
, x
2
) +d
Y
(y
1
, y
2
) (1.11)
is a metric space. A sequence (x
n
, y
n
) converges to (x, y) if and only if
x
n
x and y
n
y. In particular, the projections onto the rst (x, y) x,
respectively, onto the second (x, y) y, coordinate are continuous. More-
over, if X and Y are complete, so is X Y .
In particular, by the inverse triangle inequality (1.1),
[d(x
n
, y
n
) d(x, y)[ d(x
n
, x) +d(y
n
, y), (1.12)
we see that d : X X R is continuous.
Example. If we consider R R, we do not get the Euclidean distance of
R
2
unless we modify (1.11) as follows:
d((x
1
, y
1
), (x
2
, y
2
)) =
_
d
X
(x
1
, x
2
)
2
+d
Y
(y
1
, y
2
)
2
. (1.13)
As noted in our previous example, the topology (and thus also conver-
gence/continuity) is independent of this choice.
If X and Y are just topological spaces, the product topology is dened
by calling O X Y open if for every point (x, y) O there are open
neighborhoods U of x and V of y such that U V O. In the case of
metric spaces this clearly agrees with the topology dened via the product
metric (1.11).
A cover of a set Y X is a family of sets U
such that Y
.
A cover is called open if all U
which still
covers Y is called a subcover.
Lemma 1.7 (Lindelof). If X is second countable, then every open cover
has a countable subcover.
1.1. Warm up: Metric and topological spaces 13
Proof. Let U
)
is one for Y .
(ii) Let O
with O
=
O
Y and
XY
is an open cover for X which has a nite subcover. This subcover induces a
nite subcover for Y .
(iii) Let Y X be compact. We show that XY is open. Fix x XY
(if Y = X, there is nothing to do). By the denition of Hausdor, for
every y Y there are disjoint neighborhoods V (y) of y and U
y
(x) of x. By
compactness of Y , there are y
1
, . . . , y
n
such that the V (y
j
) cover Y . But
then U(x) =
n
j=1
U
y
j
(x) is a neighborhood of x which does not intersect
Y .
(iv) Let O
k
U(x, y
k
(x)).
Since nite intersections of open sets are open, U(x)
xX
is an open cover
and there are nitely many points x
j
such that the U(x
j
) cover X. By
construction, the U(x
j
) V (x
j
, y
k
(x
j
)) O
(x
j
,y
k
(x
j
))
cover X Y .
(v) Note that a cover of the union is a cover for each individual set and
the union of the individual subcovers is the subcover we are looking for.
(vi) Follows from (iii) since an intersection of closed sets is closed.
As a consequence we obtain a simple criterion when a continuous func-
tion is a homeomorphism.
Corollary 1.10. Let X and Y be topological spaces with X compact and
Y Hausdor. Then every continuous bijection f : X Y is a homeomor-
phism.
Proof. It suces to show that f maps closed sets to closed sets. By (ii)
every closed set is compact, by (i) its image is also compact, and by (iii)
also closed.
A subset K X is called sequentially compact if every sequence
has a convergent subsequence. In a metric space compact and sequentially
compact are equivalent.
Lemma 1.11. Let X be a metric space. Then a subset is compact if and
only if it is sequentially compact.
Proof. Suppose X is compact and let x
n
be a sequence which has no conver-
gent subsequence. Then K = x
n
has no limit points and is hence compact
by Lemma 1.9 (ii). For every n there is a ball B
n
(x
n
) which contains only
nitely many elements of K. However, nitely many suce to cover K, a
contradiction.
Conversely, suppose X is sequentially compact and let O
be some
open cover which has no nite subcover. For every x X we can choose
some (x) such that if B
r
(x) is the largest ball contained in O
(x)
, then
either r 1 or there is no with B
2r
(x) O
.
Now let y be the limit of some convergent subsequence and x some r
(0, 1) such that B
r
(y) O
(y)
. Then this subsequence must eventually be in
B
r/5
(y), but this is impossible since if d = d(x
n
1
, x
n
2
) is the distance between
two consecutive elements of this subsequence, then B
2d
(x
n
1
) cannot t into
1.1. Warm up: Metric and topological spaces 15
O
(y)
by construction whereas on the other hand B
2d
(x
n
1
) B
4r/5
(a)
O
(y)
.
In a metric space, a set is called bounded if it is contained inside some
ball. Note that compact sets are always bounded (show this!). In R
n
(or
C
n
) the converse also holds.
Theorem 1.12 (HeineBorel). In R
n
(or C
n
) a set is compact if and only
if it is bounded and closed.
Proof. By Lemma 1.9 (ii) and (iii) it suces to show that a closed interval
in I R is compact. Moreover, by Lemma 1.11 it suces to show that
every sequence in I = [a, b] has a convergent subsequence. Let x
n
be our
sequence and divide I = [a,
a+b
2
] [
a+b
2
, b]. Then at least one of these two
intervals, call it I
1
, contains innitely many elements of our sequence. Let
y
1
= x
n
1
be the rst one. Subdivide I
1
and pick y
2
= x
n
2
, with n
2
> n
1
as
before. Proceeding like this, we obtain a Cauchy sequence y
n
(note that by
construction I
n+1
I
n
and hence [y
n
y
m
[
ba
n
for m n).
By Lemma 1.11 this is equivalent to
Theorem 1.13 (BolzanoWeierstra). Every bounded innite subset of R
n
(or C
n
) has at least one limit point.
Combining Theorem 1.12 with Lemma 1.9 (i) we also obtain the ex-
treme value theorem.
Theorem 1.14 (Weierstra). Let K be compact. Every continuous function
f : K R attains its maximum and minimum.
A topological space is called locally compact if every point has a com-
pact neighborhood.
Example. R
n
is locally compact.
The distance between a point x X and a subset Y X is
dist(x, Y ) = inf
yY
d(x, y). (1.14)
Note that x is a limit point of Y if and only if dist(x, Y ) = 0.
Lemma 1.15. Let X be a metric space. Then
[ dist(x, Y ) dist(z, Y )[ d(x, z). (1.15)
In particular, x dist(x, Y ) is continuous.
Proof. Taking the inmum in the triangle inequality d(x, y) d(x, z) +
d(z, y) shows dist(x, Y ) d(x, z)+dist(z, Y ). Hence dist(x, Y )dist(z, Y )
d(x, z). Interchanging x and z shows dist(z, Y ) dist(x, Y ) d(x, z).
16 1. A rst look at Banach and Hilbert spaces
Lemma 1.16 (Urysohn). Suppose C
1
and C
2
are disjoint closed subsets of
a metric space X. Then there is a continuous function f : X [0, 1] such
that f is zero on C
1
and one on C
2
.
If X is locally compact and C
1
is compact, one can choose f with compact
support.
Proof. To prove the rst claim, set f(x) =
dist(x,C
2
)
dist(x,C
1
)+dist(x,C
2
)
. For the
second claim, observe that there is an open set O such that O is compact
and C
1
O O XC
2
. In fact, for every x, there is a ball B
(x) such
that B
(x) XC
2
. Since C
1
is compact, nitely
many of them cover C
1
and we can choose the union of those balls to be O.
Now replace C
2
by XO.
Note that Urysohns lemma implies that a metric space is normal; that
is, for any two disjoint closed sets C
1
and C
2
, there are disjoint open sets
O
1
and O
2
such that C
j
O
j
, j = 1, 2. In fact, choose f as in Urysohns
lemma and set O
1
= f
1
([0, 1/2)), respectively, O
2
= f
1
((1/2, 1]).
Lemma 1.17. Let X be a locally compact metric space. Suppose K is
a compact set and O
j
n
j=1
an open cover. Then there is a partition of
unity for K subordinate to this cover; that is, there are continuous functions
h
j
: X [0, 1] such that h
j
has compact support contained in O
j
and
n
j=1
h
j
(x) 1 (1.16)
with equality for x K.
Proof. For every x K there is some and some j such that B
(x) O
j
.
By compactness of K, nitely many of these balls cover K. Let K
j
be the
union of those balls which lie inside O
j
. By Urysohns lemma there are
functions g
j
: X [0, 1] such that g
j
= 1 on K
j
and g
j
= 0 on XO
j
. Now
set
h
j
= g
j
j1
k=1
(1 g
k
).
Then h
j
: X [0, 1] has compact support contained in O
j
and
n
j=1
h
j
(x) = 1
n
j=1
(1 g
j
(x))
shows that the sum is one for x K, since x K
j
for some j implies
g
j
(x) = 1 and causes the product to vanish.
Problem 1.1. Show that [d(x, y) d(z, y)[ d(x, z).
1.2. The Banach space of continuous functions 17
Problem 1.2. Show the quadrangle inequality [d(x, y) d(x
, y
)[
d(x, x
) +d(y, y
).
Problem 1.3. Let X be some space together with a sequence of distance
functions d
n
, n N. Show that
d(x, y) =
n=1
1
2
n
d
n
(x, y)
1 +d
n
(x, y)
is again a distance function. (Hint: Note that f(x) = x/(1 + x) is mono-
tone.)
Problem 1.4. Show that the closure satises the Kuratowski closure axioms.
Problem 1.5. Show that the closure and interior operators are dual in the
sense that
XA = (XA)
and XA
= (XA).
(Hint: De Morgans laws.)
Problem 1.6. Let U V be subsets of a metric space X. Show that if U
is dense in V and V is dense in X, then U is dense in X.
Problem 1.7. Show that every open set O R can be written as a countable
union of disjoint intervals. (Hint: Let I
= max
xI
[f(x) g(x)[. (1.17)
It is not hard to see that with this denition C(I) becomes a normed linear
space:
A normed linear space X is a vector space X over C (or R) with a
nonnegative function (the norm) |.| such that
|f| > 0 for f ,= 0 (positive deniteness),
|f| = [[ |f| for all C, f X (positive homogeneity),
and
18 1. A rst look at Banach and Hilbert spaces
|f +g| |f| +|g| for all f, g X (triangle inequality).
If positive deniteness is dropped from the requirements one calles |.|
a seminorm.
From the triangle inequality we also get the inverse triangle inequal-
ity (Problem 1.8)
[|f| |g|[ |f g|. (1.18)
Once we have a norm, we have a distance d(f, g) = |fg| and hence we
know when a sequence of vectors f
n
converges to a vector f. We will write
f
n
f or lim
n
f
n
= f, as usual, in this case. Moreover, a mapping
F : X Y between two normed spaces is called continuous if f
n
f
implies F(f
n
) F(f). In fact, it is not hard to see that the norm, vector
addition, and multiplication by scalars are continuous (Problem 1.9).
In addition to the concept of convergence we have also the concept of
a Cauchy sequence and hence the concept of completeness: A normed
space is called complete if every Cauchy sequence has a limit. A complete
normed space is called a Banach space.
Example. The space
1
(N) of all complex-valued sequences x = (x
j
)
j=1
for
which the norm
|x|
1
=
j=1
[a
j
[ (1.19)
is nite is a Banach space.
To show this, we need to verify three things: (i)
1
(N) is a vector space
that is closed under addition and scalar multiplication, (ii) |.|
1
satises the
three requirements for a norm, and (iii)
1
(N) is complete.
First of all observe
k
j=1
[x
j
+y
j
[
k
j=1
[x
j
[ +
k
j=1
[y
j
[ |x|
1
+|y|
1
(1.20)
for every nite k. Letting k , we conclude that
1
(N) is closed under
addition and that the triangle inequality holds. That
1
(N) is closed under
scalar multiplication and the two other properties of a norm are straight-
forward. It remains to show that
1
(N) is complete. Let x
n
= (x
n
j
)
j=1
be
a Cauchy sequence; that is, for given > 0 we can nd an N
such that
|x
m
x
n
|
1
for m, n N
j=1
[x
m
j
x
n
j
[ (1.21)
1.2. The Banach space of continuous functions 19
and take m :
k
j=1
[x
j
x
n
j
[ . (1.22)
Since this holds for all nite k, we even have |xx
n
|
1
. Hence (xx
n
)
1
(N) and since x
n
1
(N), we also conclude x = x
n
+ (x x
n
)
1
(N).
Example. The previous example can be generalized by considering the
space
p
(N) of all complex-valued sequences x = (x
j
)
j=1
for which the norm
|x|
p
=
_
_
j=1
[x
j
[
p
_
_
1/p
, p [1, ), (1.23)
is nite. By [x
j
+y
j
[
p
2
p
max([x
j
[, [y
j
[)
p
= 2
p
max([x
j
[
p
, [y
j
[
p
) 2
p
([x
j
[
p
+
[y
j
[
p
) it is a vector space, but the triangle inequality is only easy to see in
the case p = 1.
To prove it we need the elementary inequality (Problem 1.12)
a
1/p
b
1/q
1
p
a +
1
q
b,
1
p
+
1
q
= 1, a, b 0, (1.24)
which implies Holders inequality
|xy|
1
|x|
p
|y|
q
(1.25)
for x
p
(N), y
q
(N). In fact, by homogeneity of the norm it suces to
prove the case |x|
p
= |y|
q
= 1. But this case follows by choosing a = [x
j
[
p
and b = [y
j
[
q
in (1.24) and summing over all j.
Now using [x
j
+y
j
[
p
[x
j
[ [x
j
+y
j
[
p1
+[y
j
[ [x
j
+y
j
[
p1
, we obtain from
Holders inequality (note (p 1)q = p)
|x +y|
p
p
|x|
p
|(x +y)
p1
|
q
+|y|
p
|(x +y)
p1
|
q
= (|x|
p
+|y|
p
)|(x +y)|
p1
p
.
Hence
p
is a normed space. That it is complete can be shown as in the case
p = 1 (Problem 1.13).
Example. The space
j=1
together with the norm
|x|
= sup
jN
[x
j
[ (1.26)
is a Banach space (Problem 1.14). Note that with this denition Holders
inequality (1.25) remains true for the cases p = 1, q = and p = , q = 1.
The reason for the notation is explained in Problem 1.16.
20 1. A rst look at Banach and Hilbert spaces
Now what about convergence in the space C(I)? A sequence of functions
f
n
(x) converges to f if and only if
lim
n
|f f
n
|
= lim
n
sup
xI
[f
n
(x) f(x)[ = 0. (1.27)
That is, in the language of real analysis, f
n
converges uniformly to f. Now
let us look at the case where f
n
is only a Cauchy sequence. Then f
n
(x)
is clearly a Cauchy sequence of real numbers for every xed x I. In
particular, by completeness of C, there is a limit f(x) for each x. Thus we
get a limiting function f(x). Moreover, letting m in
[f
m
(x) f
n
(x)[ m, n > N
, x I, (1.28)
we see
[f(x) f
n
(x)[ n > N
, x I; (1.29)
that is, f
n
(x) converges uniformly to f(x). However, up to this point we
do not know whether it is in our vector space C(I) or not, that is, whether
it is continuous or not. Fortunately, there is a well-known result from real
analysis which tells us that the uniform limit of continuous functions is again
continuous: Fix x I and > 0. To show that f is continuous we need
to nd a such that [x y[ < implies [f(x) f(y)[ < . Pick n so that
|f
n
f|
3
+
3
=
as required. Hence f(x) C(I) and thus every Cauchy sequence in C(I)
converges. Or, in other words
Theorem 1.18. C(I) with the maximum norm is a Banach space.
Next we want to look at countable bases. To this end we introduce a few
denitions rst.
The set of all nite linear combinations of a set of vectors u
n
X is
called the span of u
n
and denoted by spanu
n
. A set of vectors u
n
X
is called linearly independent if every nite subset is. If u
n
N
n=1
X,
N N , is countable, we can throw away all elements which can be
expressed as linear combinations of the previous ones to obtain a subset of
linearly independent vectors which have the same span.
We will call a countable set of linearly independent vectors u
n
N
n=1
X
a Schauder basis if every element f X can be uniquely written as a
countable linear combination of the basis elements:
f =
N
n=1
c
n
u
n
, c
n
= c
n
(f) C, (1.30)
1.2. The Banach space of continuous functions 21
where the sum has to be understood as a limit if N = (the sum is not
required to converge unconditionally). Since we have assumed the set to be
linearly independent, the coecients c
n
(f) are uniquely determined.
Example. The set of vectors
n
, with
n
n
= 1 and
n
m
= 0, n ,= m, is a
Schauder Basis for the Banach space
1
(N).
Let x = (x
j
)
j=1
1
(N) be given and set x
n
=
n
j=1
x
j
j
. Then
|x x
n
|
1
=
j=n+1
[x
j
[ 0
since x
n
j
= x
j
for 1 j n and x
n
j
= 0 for j > n. Hence
x =
j=1
x
j
j
and
n
n=1
is a Schauder basis (linear independence is left as an exer-
cise).
A set whose span is dense is called total and if we have a countable total
set, we also have a countable dense set (consider only linear combinations
with rational coecients show this). A normed linear space containing a
countable dense set is called separable.
Example. Every Schauder basis is total and thus every Banach space with
a Schauder basis is separable (the converse is not true). In particular, the
Banach space
1
(N) is separable.
While we will not give a Schauder basis for C(I), we will at least show
that it is separable. In order to prove this we need a lemma rst.
Lemma 1.19 (Smoothing). Let u
n
(x) be a sequence of nonnegative contin-
uous functions on [1, 1] such that
_
|x|1
u
n
(x)dx = 1 and
_
|x|1
u
n
(x)dx 0, > 0. (1.31)
(In other words, u
n
has mass one and concentrates near x = 0 as n .)
Then for every f C[
1
2
,
1
2
] which vanishes at the endpoints, f(
1
2
) =
f(
1
2
) = 0, we have that
f
n
(x) =
_
1/2
1/2
u
n
(x y)f(y)dy (1.32)
converges uniformly to f(x).
Proof. Since f is uniformly continuous, for given we can nd a <
1/2 (independent of x) such that [f(x) f(y)[ whenever [x y[ .
22 1. A rst look at Banach and Hilbert spaces
Moreover, we can choose n such that
_
|y|1
u
n
(y)dy . Now abbreviate
M = max
x[1/2,1/2]
1, [f(x)[ and note
[f(x)
_
1/2
1/2
u
n
(x y)f(x)dy[ = [f(x)[ [1
_
1/2
1/2
u
n
(x y)dy[ M.
In fact, either the distance of x to one of the boundary points
1
2
is smaller
than and hence [f(x)[ or otherwise [, ] [x1/2, x+1/2] and the
dierence between one and the integral is smaller than .
Using this, we have
[f
n
(x) f(x)[
_
1/2
1/2
u
n
(x y)[f(y) f(x)[dy +M
=
_
|y|1/2,|xy|
u
n
(x y)[f(y) f(x)[dy
+
_
|y|1/2,|xy|
u
n
(x y)[f(y) f(x)[dy +M
+ 2M +M = (1 + 3M), (1.33)
which proves the claim.
Note that f
n
will be as smooth as u
n
, hence the title smoothing lemma.
Moreover, f
n
will be a polynomial if u
n
is. The same idea is used to approx-
imate noncontinuous functions by smooth ones (of course the convergence
will no longer be uniform in this case).
Now we are ready to show:
Theorem 1.20 (Weierstra). Let I be a compact interval. Then the set of
polynomials is dense in C(I).
Proof. Let f(x) C(I) be given. By considering f(x)f(a)
f(b)f(a)
ba
(x
a) it is no loss to assume that f vanishes at the boundary points. Moreover,
without restriction we only consider I = [
1
2
,
1
2
] (why?).
Now the claim follows from Lemma 1.19 using
u
n
(x) =
1
I
n
(1 x
2
)
n
,
where
I
n
=
_
1
1
(1 x
2
)
n
dx =
n
n + 1
_
1
1
(1 x)
n1
(1 +x)
n+1
dx
= =
n!
(n + 1) (2n + 1)
2
2n+1
=
(n!)
2
2
2n+1
(2n + 1)!
=
n!
1
2
(
1
2
+ 1) (
1
2
+n)
.
1.2. The Banach space of continuous functions 23
Indeed, the rst part of (1.31) holds by construction and the second part
follows from the elementary estimate
2
2n + 1
I
n
< 2.
j=1
|f
j
| <
implies that
j=1
f
j
= lim
n
n
j=1
f
j
exists. The series is called absolutely convergent in this case.
Problem 1.11. While
1
(N) is separable, it still has room for an uncount-
able set of linearly independent vectors. Show this by considering vectors of
the form
x
a
= (1, a, a
2
, . . . ), a (0, 1).
(Hint: Take n such vectors and cut them o after n + 1 terms. If the cut
o vectors are linearly independent, so are the original ones. Recall the
Vandermonde determinant.)
Problem 1.12. Prove (1.24). (Hint: Take logarithms on both sides.)
Problem 1.13. Show that
p
(N) is a separable Banach space.
Problem 1.14. Show that
p
(N) for p p
0
and
lim
p
|x|
p
= |x|
.
24 1. A rst look at Banach and Hilbert spaces
1.3. The geometry of Hilbert spaces
So it looks like C(I) has all the properties we want. However, there is
still one thing missing: How should we dene orthogonality in C(I)? In
Euclidean space, two vectors are called orthogonal if their scalar product
vanishes, so we would need a scalar product:
Suppose H is a vector space. A map ., .. : H H C is called a
sesquilinear form if it is conjugate linear in the rst argument and linear
in the second; that is,
1
f
1
+
2
f
2
, g =
1
f
1
, g +
2
f
2
, g,
f,
1
g
1
+
2
g
2
=
1
f, g
1
+
2
f, g
2
,
1
,
2
C, (1.34)
where denotes complex conjugation. A sesquilinear form satisfying the
requirements
(i) f, f > 0 for f ,= 0 (positive denite),
(ii) f, g = g, f
(symmetry)
is called an inner product or scalar product. Associated with every
scalar product is a norm
|f| =
_
f, f. (1.35)
Only the triangle inequality is nontrivial. It will follow from the Cauchy
Schwarz inequality below. Until then, just regard (1.35) as a convenient
short hand notation.
The pair (H, ., ..) is called an inner product space. If H is complete
(with respect to the norm (1.35)), it is called a Hilbert space.
Example. Clearly C
n
with the usual scalar product
x, y =
n
j=1
x
j
y
j
(1.36)
is a (nite dimensional) Hilbert space.
Example. A somewhat more interesting example is the Hilbert space
2
(N),
that is, the set of all complex-valued sequences
_
(x
j
)
j=1
j=1
[x
j
[
2
<
_
(1.37)
with scalar product
x, y =
j=1
x
j
y
j
. (1.38)
(Show that this is in fact a separable Hilbert space Problem 1.13.)
1.3. The geometry of Hilbert spaces 25
A vector f H is called normalized or a unit vector if |f| = 1.
Two vectors f, g H are called orthogonal or perpendicular (f g) if
f, g = 0 and parallel if one is a multiple of the other.
If f and g are orthogonal, we have the Pythagorean theorem:
|f +g|
2
= |f|
2
+|g|
2
, f g, (1.39)
which is one line of computation (do it!).
Suppose u is a unit vector. Then the projection of f in the direction of
u is given by
f
= u, fu (1.40)
and f
dened via
f
= f u, fu (1.41)
is perpendicular to u since u, f
= u, f u, fu = u, f u, fu, u =
0.
f
f
I
f
f
f
f
fw
+ (f
u)|
2
= |f
|
2
+[u, f [
2
(1.42)
and hence f
|
2
.
Note that the CauchySchwarz inequality entails that the scalar product
is continuous in both variables; that is, if f
n
f and g
n
g, we have
f
n
, g
n
f, g.
26 1. A rst look at Banach and Hilbert spaces
As another consequence we infer that the map |.| is indeed a norm. In
fact,
|f +g|
2
= |f|
2
+f, g +g, f +|g|
2
(|f| +|g|)
2
. (1.44)
But let us return to C(I). Can we nd a scalar product which has the
maximum norm as associated norm? Unfortunately the answer is no! The
reason is that the maximum norm does not satisfy the parallelogram law
(Problem 1.19).
Theorem 1.23 (Jordanvon Neumann). A norm is associated with a scalar
product if and only if the parallelogram law
|f +g|
2
+|f g|
2
= 2|f|
2
+ 2|g|
2
(1.45)
holds.
In this case the scalar product can be recovered from its norm by virtue
of the polarization identity
f, g =
1
4
_
|f +g|
2
|f g|
2
+ i|f ig|
2
i|f + ig|
2
_
. (1.46)
Proof. If an inner product space is given, verication of the parallelogram
law and the polarization identity is straightforward (Problem 1.21).
To show the converse, we dene
s(f, g) =
1
4
_
|f +g|
2
|f g|
2
+ i|f ig|
2
i|f + ig|
2
_
.
Then s(f, f) = |f|
2
and s(f, g) = s(g, f)
(x)g(x)dx. (1.47)
1.3. The geometry of Hilbert spaces 27
The corresponding inner product space is denoted by /
2
cont
(I). Note that
we have
|f|
_
[b a[|f|
(1.48)
and hence the maximum norm is stronger than the /
2
cont
norm.
Suppose we have two norms |.|
1
and |.|
2
on a vector space X. Then
|.|
2
is said to be stronger than |.|
1
if there is a constant m > 0 such that
|f|
1
m|f|
2
. (1.49)
It is straightforward to check the following.
Lemma 1.24. If |.|
2
is stronger than |.|
1
, then every |.|
2
Cauchy sequence
is also a |.|
1
Cauchy sequence.
Hence if a function F : X Y is continuous in (X, |.|
1
), it is also
continuous in (X, |.|
2
) and if a set is dense in (X, |.|
2
), it is also dense in
(X, |.|
1
).
In particular, /
2
cont
is separable. But is it also complete? Unfortunately
the answer is no:
Example. Take I = [0, 2] and dene
f
n
(x) =
_
_
0, 0 x 1
1
n
,
1 +n(x 1), 1
1
n
x 1,
1, 1 x 2.
(1.50)
Then f
n
(x) is a Cauchy sequence in /
2
cont
, but there is no limit in /
2
cont
!
Clearly the limit should be the step function which is 0 for 0 x < 1 and 1
for 1 x 2, but this step function is discontinuous (Problem 1.24)!
This shows that in innite dimensional vector spaces dierent norms
will give rise to dierent convergent sequences! In fact, the key to solving
problems in innite dimensional spaces is often nding the right norm! This
is something which cannot happen in the nite dimensional case.
Theorem 1.25. If X is a nite dimensional vector space, then all norms
are equivalent. That is, for any two given norms |.|
1
and |.|
2
, there are
positive constants m
1
and m
2
such that
1
m
2
|f|
1
|f|
2
m
1
|f|
1
. (1.51)
Proof. Since equivalence of norms is an equivalence relation (check this!) we
can assume that |.|
2
is the usual Euclidean norm. Moreover, we can choose
an orthogonal basis u
j
, 1 j n, such that |
j
j
u
j
|
2
2
=
j
[
j
[
2
. Let
28 1. A rst look at Banach and Hilbert spaces
f =
j
j
u
j
. Then by the triangle and CauchySchwarz inequalities
|f|
1
j
[
j
[|u
j
|
1
j
|u
j
|
2
1
|f|
2
and we can choose m
2
=
_
j
|u
j
|
1
.
In particular, if f
n
is convergent with respect to |.|
2
, it is also convergent
with respect to |.|
1
. Thus |.|
1
is continuous with respect to |.|
2
and attains
its minimum m > 0 on the unit sphere (which is compact by the Heine-Borel
theorem, Theorem 1.12). Now choose m
1
= 1/m.
Problem 1.17. Show that the norm in a Hilbert space satises |f + g| =
|f| +|g| if and only if f = g, 0, or g = 0.
Problem 1.18 (Generalized parallelogram law). Show that in a Hilbert
space
1j<kn
|x
j
x
k
|
2
+|
1jn
x
j
|
2
= n
1jn
|x
j
|
2
.
The case n = 2 is (1.45).
Problem 1.19. Show that the maximum norm on C[0, 1] does not satisfy
the parallelogram law.
Problem 1.20. In a Banach space the unit ball is convex by the triangle
inequality. A Banach space X is called uniformly convex if for every
> 0 there is some such that |x| 1, |y| 1, and |
x+y
2
| 1 imply
|x y| .
Geometrically this implies that if the average of two vectors inside the
closed unit ball is close to the boundary, then they must be close to each
other.
Show that a Hilbert space is uniformly convex and that one can choose
() = 1
_
1
2
4
. Draw the unit ball for R
2
for the norms |x|
1
=
[x
1
[ + [x
2
[, |x|
2
=
_
[x
1
[
2
+[x
2
[
2
, and |x|
= max([x
1
[, [x
2
[). Which of
these norms makes R
2
uniformly convex?
(Hint: For the rst part use the parallelogram law.)
Problem 1.21. Suppose Q is a vector space. Let s(f, g) be a sesquilinear
form on Q and q(f) = s(f, f) the associated quadratic form. Prove the
parallelogram law
q(f +g) +q(f g) = 2q(f) + 2q(g) (1.52)
and the polarization identity
s(f, g) =
1
4
(q(f +g) q(f g) + i q(f ig) i q(f + ig)) . (1.53)
1.4. Completeness 29
Conversely, show that every quadratic form q(f) : Q R satisfying
q(f) = [[
2
q(f) and the parallelogram law gives rise to a sesquilinear form
via the polarization identity.
Show that s(f, g) is symmetric if and only if q(f) is real-valued.
Problem 1.22. A sesquilinear form is called bounded if
|s| = sup
f=g=1
[s(f, g)[
is nite. Similarly, the associated quadratic form q is bounded if
|q| = sup
f=1
[q(f)[
is nite. Show
|q| |s| 2|q|.
(Hint: Use the parallelogram law and the polarization identity from the pre-
vious problem.)
Problem 1.23. Suppose Q is a vector space. Let s(f, g) be a sesquilinear
form on Q and q(f) = s(f, f) the associated quadratic form. Show that the
CauchySchwarz inequality
[s(f, g)[ q(f)
1/2
q(g)
1/2
(1.54)
holds if q(f) 0.
(Hint: Consider 0 q(f + g) = q(f) + 2Re(s(f, g)) + [[
2
q(g) and
choose = t s(f, g)
n=1
can be dened by
|(x
n
)
n=1
| = lim
n
|x
n
| and is independent of the equivalence class (show
this!). Thus
X is a normed space (
X is not! Why?).
30 1. A rst look at Banach and Hilbert spaces
Theorem 1.27.
X is a Banach space containing X as a dense subspace if
we identify x X with the equivalence class of all sequences converging to
x.
Proof. (Outline) It remains to show that
X is complete. Let
n
= [(x
n,j
)
j=1
]
be a Cauchy sequence in
X. Then it is not hard to see that = [(x
j,j
)
j=1
]
is its limit.
Let me remark that the completion
X is unique. More precisely every
other complete space which contains X as a dense subset is isomorphic to
X. This can for example be seen by showing that the identity map on X
has a unique extension to
X (compare Theorem 1.29 below).
In particular it is no restriction to assume that a normed linear space
or an inner product space is complete. However, in the important case
of /
2
cont
it is somewhat inconvenient to work with equivalence classes of
Cauchy sequences and hence we will give a dierent characterization using
the Lebesgue integral later.
1.5. Bounded operators
A linear map A between two normed spaces X and Y will be called a (lin-
ear) operator
A : D(A) X Y. (1.55)
The linear subspace D(A) on which A is dened is called the domain of A
and is usually required to be dense. The kernel (also null space)
Ker(A) = f D(A)[Af = 0 X (1.56)
and range
Ran(A) = Af[f D(A) = AD(A) Y (1.57)
are dened as usual. The operator A is called bounded if the operator
norm
|A| = sup
fD(A),f
X
=1
|Af|
Y
(1.58)
is nite.
By construction, a bounded operator is Lipschitz continuous,
|Af|
Y
|A||f|
X
, f D(A), (1.59)
and hence continuous. The converse is also true
Theorem 1.28. An operator A is bounded if and only if it is continuous.
Proof. Suppose A is continuous but not bounded. Then there is a sequence
of unit vectors u
n
such that |Au
n
| n. Then f
n
=
1
n
u
n
converges to 0 but
|Af
n
| 1 does not converge to 0.
1.5. Bounded operators 31
In particular, if X is nite dimensional, then every operator is bounded.
Note that in general one and the same operation might be bounded (i.e.
continuous) or unbounded, depending on the norm chosen.
Example. Consider the vector space of dierentiable functions X = C
1
[0, 1]
and equip it with the norm (cf. Problem 1.27)
|f|
,1
= max
x[0,1]
[f(x)[ + max
x[0,1]
[f
(x)[
Let Y = C[0, 1] and observe that the dierential operator A =
d
dx
: X Y
is bounded since
|Af|
= max
x[0,1]
[f
(x)[ max
x[0,1]
[f(x)[ + max
x[0,1]
[f
(x)[ = |f|
,1
.
However, if we consider A =
d
dx
: D(A) Y Y dened on D(A) =
C
1
[0, 1], then we have an unbounded operator. Indeed, choose
u
n
(x) = sin(nx)
which is normalized, |u
n
|
n
(x) = n cos(nx)
is unbounded, |Au
n
|
= L(X, C) is called
the dual space of X.
Theorem 1.30. The space L(X, Y ) together with the operator norm (1.58)
is a normed space. It is a Banach space if Y is.
Proof. That (1.58) is indeed a norm is straightforward. If Y is complete and
A
n
is a Cauchy sequence of operators, then A
n
f converges to an element
g for every f. Dene a new operator A via Af = g. By continuity of
the vector operations, A is linear and by continuity of the norm |Af| =
lim
n
|A
n
f| (lim
n
|A
n
|)|f|, it is bounded. Furthermore, given
> 0 there is some N such that |A
n
A
m
| for n, m N and thus
|A
n
f A
m
f| |f|. Taking the limit m , we see |A
n
f Af| |f|;
that is, A
n
A.
The Banach space of bounded linear operators L(X) even has a multi-
plication given by composition. Clearly this multiplication satises
(A+B)C = AC +BC, A(B +C) = AB +BC, A, B, C L(X) (1.60)
and
(AB)C = A(BC), (AB) = (A)B = A(B), C. (1.61)
Moreover, it is easy to see that we have
|AB| |A||B|. (1.62)
In other words, L(X) is a so-called Banach algebra. However, note that
our multiplication is not commutative (unless X is one-dimensional). We
even have an identity, the identity operator I satisfying |I| = 1.
Problem 1.25. Consider X = C
n
and let A : X X be a matrix. Equip
X with the norm (show that this is a norm)
|x|
= max
1jn
[x
j
[
and compute the operator norm |A| with respect to this matrix in terms of
the matrix entries. Do the same with respect to the norm
|x|
1
=
1jn
[x
j
[.
Problem 1.26. Show that the integral operator
(Kf)(x) =
_
1
0
K(x, y)f(y)dy,
where K(x, y) C([0, 1] [0, 1]), dened on D(K) = C[0, 1] is a bounded
operator both in X = C[0, 1] (max norm) and X = /
2
cont
(0, 1).
1.6. Sums and quotients of Banach spaces 33
Problem 1.27. Show that the set of dierentiable functions C
1
(I) becomes
a Banach space if we set |f|
,1
= max
xI
[f(x)[ + max
xI
[f
(x)[.
Problem 1.28. Show that |AB| |A||B| for every A, B L(X). Con-
clude that the multiplication is continuous: A
n
A and B
n
B imply
A
n
B
n
AB.
Problem 1.29. Let A L(X) be a bijection. Show
|A
1
|
1
= inf
fX,f=1
|Af|.
Problem 1.30. Let
f(z) =
j=0
f
j
z
j
, [z[ < R,
be a convergent power series with convergence radius R > 0. Suppose A is
a bounded operator with |A| < R. Show that
f(A) =
j=0
f
j
A
j
exists and denes a bounded linear operator (cf. Problem 1.10).
1.6. Sums and quotients of Banach spaces
Given two Banach spaces X
1
and X
2
we can dene their (direct) sum
X = X
1
X
2
as the cartesian product X
1
X
2
together with the norm
|(x
1
, x
2
)| = |x
1
| + |x
2
|. In fact, since all norms on R
2
are equivalent
(Theorem 1.25), we could as well take |(x
1
, x
2
)| = (|x
1
|
p
+ |x
2
|
p
)
1/p
or
|(x
1
, x
2
)| = max(|x
1
|, |x
2
|). In particular, in the case of Hilbert spaces
the choice p = 2 will ensure that X is again a Hilbert space. Note that X
1
and X
2
can be regarded as subspaces of X
1
X
2
by virtue of the obvious
embeddings x
1
(x
1
, 0) and x
2
(0, x
2
). It is straightforward to show
that X is again a Banach space and to generalize this concept to nitely
many spaces (Problem 1.31).
Moreover, given a closed subspace M of a Banach space X we can dene
the quotient space X/M as the set of all equivalence classes [x] = x +
M with respect to the equivalence relation x y if x y M. It is
straightforward to see that X/M is a vector space when dening [x] +[y] =
[x + y] and [x] = [x] (show that these denitions are independent of the
representative of the equivalence class).
Lemma 1.31. Let M be a closed subspace of a Banach space X. Then
X/M together with the norm
|[x]| = inf
yM
|x +y|. (1.63)
34 1. A rst look at Banach and Hilbert spaces
is a Banach space.
Proof. First of all we need to show that (1.63) is indeed a norm. If |[x]| = 0
we must have a sequence x
j
M with x
j
x and since M is closed we
conclude x M, that is [x] = [0] as required. To see |[x]| = [[|[x]| we
use again the denition
|[x]| = |[x]| = inf
yM
|x +y| = inf
yM
|x +y|
= [[ inf
yM
|x +y| = [[|[x]|.
The triangle inequality follows with a similar argument and is left as an
exercise.
Thus (1.63) is a norm and it remains to show that X/M is complete. To
this end let [x
n
] be a Cauchy sequence. Since it suces to show that some
subsequence has a limit, we can assume |[x
n+1
][x
n
]| < 2
n
without loss of
generality. Moroever, by denition of (1.63) we can chose the representatives
x
n
such that |x
n+1
x
n
| < 2
n
(start with x
1
and then chose the remaining
ones inductively). By construction x
n
is a Cauchy sequence which has a limit
x X since X is complete. Moreover, it is straightforward to check that [x]
is the limit of [x
n
].
Problem 1.31. Let X
j
, j = 1, . . . , n be Banach spaces. Let X be the
cartesian product X
1
X
n
together with the norm
|(x
1
, . . . , x
n
)|
p
=
_
_
_
_
n
j=1
|x
j
|
p
_
1/p
, 1 p < ,
max
j=1,...,n
|x
j
|, p = .
Show that X is a Banach space. Show that all norms are equivalent.
Problem 1.32. Suppose A L(X, Y ). Show that Ker(A) is closed. Show
that A is well dened on X/ Ker(A) and that this new operator is again
bounded (with the same norm) and injective.
Chapter 2
Hilbert spaces
2.1. Orthonormal bases
In this section we will investigate orthonormal series and you will notice
hardly any dierence between the nite and innite dimensional cases. Through-
out this chapter H will be a Hilbert space.
As our rst task, let us generalize the projection into the direction of
one vector:
A set of vectors u
j
is called an orthonormal set if u
j
, u
k
= 0
for j ,= k and u
j
, u
j
= 1. Note that every orthonormal set is linearly
independent (show this).
Lemma 2.1. Suppose u
j
n
j=1
is an orthonormal set. Then every f H
can be written as
f = f
+f
, f
=
n
j=1
u
j
, fu
j
, (2.1)
where f
and f
= 0 for all 1 j n.
In particular,
|f|
2
=
n
j=1
[u
j
, f[
2
+|f
|
2
. (2.2)
Moreover, every
f in the span of u
j
n
j=1
satises
|f
f| |f
| (2.3)
with equality holding if and only if
f = f
. In other words, f
is uniquely
characterized as the vector in the span of u
j
n
j=1
closest to f.
35
36 2. Hilbert spaces
Proof. A straightforward calculation shows u
j
, f f
= 0 and hence f
and f
= f f
f =
n
j=1
j
u
j
in the span of u
j
n
j=1
. Then one computes
|f
f|
2
= |f
+f
f|
2
= |f
|
2
+|f
f|
2
= |f
|
2
+
n
j=1
[
j
u
j
, f[
2
from which the last claim follows.
From (2.2) we obtain Bessels inequality
n
j=1
[u
j
, f[
2
|f|
2
(2.4)
with equality holding if and only if f lies in the span of u
j
n
j=1
.
Of course, since we cannot assume H to be a nite dimensional vec-
tor space, we need to generalize Lemma 2.1 to arbitrary orthonormal sets
u
j
jJ
. We start by assuming that J is countable. Then Bessels inequality
(2.4) shows that
jJ
[u
j
, f[
2
(2.5)
converges absolutely. Moreover, for any nite subset K J we have
|
jK
u
j
, fu
j
|
2
=
jK
[u
j
, f[
2
(2.6)
by the Pythagorean theorem and thus
jJ
u
j
, fu
j
is a Cauchy sequence
if and only if
jJ
[u
j
, f[
2
is. Now let J be arbitrary. Again, Bessels
inequality shows that for any given > 0 there are at most nitely many
j for which [u
j
, f[ (namely at most |f|/). Hence there are at most
countably many j for which [u
j
, f[ > 0. Thus it follows that
jJ
[u
j
, f[
2
(2.7)
is well-dened and (by completeness) so is
jJ
u
j
, fu
j
. (2.8)
Furthermore, it is also independent of the order of summation.
2.1. Orthonormal bases 37
In particular, by continuity of the scalar product we see that Lemma 2.1
can be generalized to arbitrary orthonormal sets.
Theorem 2.2. Suppose u
j
jJ
is an orthonormal set in a Hilbert space
H. Then every f H can be written as
f = f
+f
, f
jJ
u
j
, fu
j
, (2.9)
where f
and f
= 0 for all j J. In
particular,
|f|
2
=
jJ
[u
j
, f[
2
+|f
|
2
. (2.10)
Furthermore, every
f spanu
j
jJ
satises
|f
f| |f
| (2.11)
with equality holding if and only if
f = f
. In other words, f
is uniquely
characterized as the vector in spanu
j
jJ
closest to f.
Proof. The rst part follows as in Lemma 2.1 using continuity of the scalar
product. The same is true for the last part except for the fact that every
f spanu
j
jJ
can be written as f =
jJ
j
u
j
(i.e., f = f
). To see this
let f
n
spanu
j
jJ
converge to f. Then |ff
n
|
2
= |f
f
n
|
2
+|f
|
2
0
implies f
n
f
and f
= 0.
Note that from Bessels inequality (which of course still holds) it follows
that the map f f
is continuous.
Of course we are particularly interested in the case where every f H
can be written as
jJ
u
j
, fu
j
. In this case we will call the orthonormal
set u
j
jJ
an orthonormal basis (ONB).
If H is separable it is easy to construct an orthonormal basis. In fact,
if H is separable, then there exists a countable total set f
j
N
j=1
. Here
N N if H is nite dimensional and N = otherwise. After throwing
away some vectors, we can assume that f
n+1
cannot be expressed as a linear
combination of the vectors f
1
, . . . , f
n
. Now we can construct an orthonormal
set as follows: We begin by normalizing f
1
:
u
1
=
f
1
|f
1
|
. (2.12)
Next we take f
2
and remove the component parallel to u
1
and normalize
again:
u
2
=
f
2
u
1
, f
2
u
1
|f
2
u
1
, f
2
u
1
|
. (2.13)
38 2. Hilbert spaces
Proceeding like this, we dene recursively
u
n
=
f
n
n1
j=1
u
j
, f
n
u
j
|f
n
n1
j=1
u
j
, f
n
u
j
|
. (2.14)
This procedure is known as GramSchmidt orthogonalization. Hence
we obtain an orthonormal set u
j
N
j=1
such that spanu
j
n
j=1
= spanf
j
n
j=1
for any nite n and thus also for n = N (if N = ). Since f
j
N
j=1
is total,
so is u
j
N
j=1
. Now suppose there is some f = f
+f
H for which f
,= 0.
Since u
j
N
j=1
is total, we can nd a
f in its span, such that |f
f| < |f
|
contradicting (2.11). Hence we infer that u
j
N
j=1
is an orthonormal basis.
Theorem 2.3. Every separable Hilbert space has a countable orthonormal
basis.
Example. In /
2
cont
(1, 1) we can orthogonalize the polynomial f
n
(x) = x
n
(which are total by the Weierstra approximation theorem Theorem 1.20)
The resulting polynomials are up to a normalization equal to the Legendre
polynomials
P
0
(x) = 1, P
1
(x) = x, P
2
(x) =
3 x
2
1
2
, . . . (2.15)
(which are normalized such that P
n
(1) = 1).
Example. The set of functions
u
n
(x) =
1
2
e
inx
, n Z, (2.16)
forms an orthonormal basis for H = /
2
cont
(0, 2). The corresponding orthog-
onal expansion is just the ordinary Fourier series. (That these functions are
total will follow from the StoneWeierstra theorem Problem 5.9)
The following equivalent properties also characterize a basis.
Theorem 2.4. For an orthonormal set u
j
jJ
in a Hilbert space H the
following conditions are equivalent:
(i) u
j
jJ
is a maximal orthogonal set.
(ii) For every vector f H we have
f =
jJ
u
j
, fu
j
. (2.17)
(iii) For every vector f H we have
|f|
2
=
jJ
[u
j
, f[
2
. (2.18)
(iv) u
j
, f = 0 for all j J implies f = 0.
2.1. Orthonormal bases 39
Proof. We will use the notation from Theorem 2.2.
(i) (ii): If f
jJ
would be a
larger orthonormal set, contradicting the maximality of u
j
jJ
.
(ii) (iii): This follows since (ii) implies f
= 0.
(iii) (iv): If f, u
j
= 0 for all j J, we conclude |f|
2
= 0 and hence
f = 0.
(iv) (i): If u
j
jJ
were not maximal, there would be a unit vector g such
that u
j
jJ
g is a larger orthonormal set. But u
j
, g = 0 for all j J
implies g = 0 by (iv), a contradiction.
By continuity of the norm it suces to check (iii), and hence also (ii),
for f in a dense set.
It is not surprising that if there is one countable basis, then it follows
that every other basis is countable as well.
Theorem 2.5. In a Hilbert space H every orthonormal basis has the same
cardinality.
Proof. Without loss of generality we assume that H is innite dimensional.
Let u
j
jJ
and v
k
kK
be two orthonormal bases. Set K
j
= k
K[v
k
, u
j
,= 0. Since these are the expansion coecients of u
j
with re-
spect to v
k
kK
, this set is countable. Hence the set
K =
jJ
K
j
has the
same cardinality as J. But k K
K implies v
k
= 0 and hence
K = K.
The cardinality of an orthonormal basis is also called the Hilbert space
dimension of H.
It even turns out that, up to unitary equivalence, there is only one
separable innite dimensional Hilbert space:
A bijective linear operator U L(H
1
, H
2
) is called unitary if U preserves
scalar products:
Ug, Uf
2
= g, f
1
, g, f H
1
. (2.19)
By the polarization identity (1.46) this is the case if and only if U preserves
norms: |Uf|
2
= |f|
1
for all f H
1
(note the a norm preserving linear
operator is automatically injective). The two Hilbert space H
1
and H
2
are
called unitarily equivalent in this case.
Let H be an innite dimensional Hilbert space and let u
j
jN
be any
orthogonal basis. Then the map U : H
2
(N), f (u
j
, f)
jN
is uni-
tary (by Theorem 2.4 (ii) it is onto and by (iii) it is norm preserving). In
particular,
Theorem 2.6. Any separable innite dimensional Hilbert space is unitarily
equivalent to
2
(N).
40 2. Hilbert spaces
Finally we briey turn to the case where H is not separable.
Theorem 2.7. Every Hilbert space has an orthonormal basis.
Proof. To prove this we need to resort to Zorns lemma (see Appendix A):
The collection of all orthonormal sets in H can be partially ordered by inclu-
sion. Moreover, every linearly ordered chain has an upper bound (the union
of all sets in the chain). Hence a fundamental result from axiomatic set
theory, Zorns lemma, implies the existence of a maximal element, that is,
an orthonormal set which is not a proper subset of every other orthonormal
set.
Hence, if u
j
jJ
is an orthogonal basis, we can show that H is unitarily
equivalent to
2
(J) and, by prescribing J, we can nd an Hilbert space of
any given dimension.
Problem 2.1. Let u
j
be some orthonormal basis. Show that a bounded
linear operator A is uniquely determined by its matrix elements A
jk
=
u
j
, Au
k
with respect to this basis.
2.2. The projection theorem and the Riesz lemma
Let M H be a subset. Then M
= f[g, f = 0, g M is called
the orthogonal complement of M. By continuity of the scalar prod-
uct it follows that M
= M
+f
with f
M and f
. One writes
M M
= H (2.20)
in this situation.
Proof. Since M is closed, it is a Hilbert space and has an orthonormal
basis u
j
jJ
. Hence the existence part follows from Theorem 2.2. To see
uniqueness suppose there is another decomposition f =
f
+
f
. Then
f
=
f
M M
= 0.
Corollary 2.9. Every orthogonal set u
j
jJ
can be extended to an orthog-
onal basis.
Proof. Just add an orthogonal basis for (u
j
jJ
)
.
2.2. The projection theorem and the Riesz lemma 41
Moreover, Theorem 2.8 implies that to every f H we can assign a
unique vector f
,
lies in M
. The operator P
M
f = f
, f
= g, P
M
f. Clearly we have P
M
f = f
P
M
f = f
= span(M). (2.22)
Note that by H
= 0 we see that M
.
Proof. If 0, we can choose g = 0. Otherwise Ker() = f[(f) = 0
is a proper subspace and we can nd a unit vector g Ker()
. For every
f H we have (f) g ( g)f Ker() and hence
0 = g, (f) g ( g)f = (f) ( g) g, f.
In other words, we can choose g = ( g)
= 0.
Problem 2.2. Suppose U : H H is unitary and M H. Show that
UM
= (UM)
.
Problem 2.3. Show that an orthogonal projection P
M
,= 0 has norm one.
Problem 2.4. Suppose P L(H) satises
P
2
= P and Pf, g = f, Pg
and set M = Ran(P). Show
42 2. Hilbert spaces
Pf = f for f M and M is closed,
g M
implies Pg M
and thus Pg = 0,
and conclude P = P
M
.
2.3. Operators dened via forms
In many situations operators are not given explicitly, but implicitly via their
associated sesquilinear forms f, Ag. As an easy consequence of the Riesz
lemma, Theorem 2.10, we obtain that there is a one to one correspondence
between bounded operators and bounded sesquilinear forms:
Lemma 2.11. Suppose s is a bounded sesquilinear form; that is,
[s(f, g)[ C|f| |g|. (2.23)
Then there is a unique bounded operator A such that
s(f, g) = f, Ag. (2.24)
Moreover, the norm of A is given by
|A| = sup
f=g=1
[s(f, g)[. (2.25)
Proof. For every g H we have an associated bounded linear functional
g
(f) = s(f, g)
dened via
f, A
g = Af, g. (2.26)
Example. If H = C
n
and A = (a
jk
)
1j,kn
, then A
= (a
kj
)
1j,kn
.
A few simple properties of taking adjoints are listed below.
Lemma 2.13. Let A, B L(H) and C. Then
(i) (A+B)
= A
+B
, (A)
,
2.3. Operators dened via forms 43
(ii) A
= A,
(iii) (AB)
= B
,
(iv) |A
A| = |AA
|.
Proof. (i) is obvious. (ii) follows from f, A
g = A
f, g = f, Ag. (iii)
follows from f, (AB)g = A
f, Bg = B
| = sup
f=g=1
[f, A
g[ = sup
f=g=1
[Af, g[
= sup
f=g=1
[g, Af[ = |A|
and
|A
A| = sup
f=g=1
[f, A
Ag[ = sup
f=g=1
[Af, Ag[
= sup
f=1
|Af|
2
= |A|
2
,
where we have used that [Af, Ag[ attains its maximum when Af and Ag
are parallell (compare Theorem 1.22).
Note that |A| = |A
) = Ran(A)
. (2.27)
A sesquilinear form is called nonnegative if s(f, f) 0 and we will call
A nonnegative, A 0, if its associated sesquilinear form is. We will write
A B if AB 0.
Lemma 2.14. Suppose A I for some > 0. Then A is a bijection with
bounded inverse, |A
1
|
1
.
Proof. By denition |f|
2
f, Af |f||Af| and thus |f| |Af|.
In particular, Af = 0 implies f = 0 and thus for every g Ran(A) there is
a unique f = A
1
g. Moreover, by |A
1
g| = |f|
1
|Af| =
1
|g| the
operator A
1
is bounded. So if g
n
Ran(A) converges to some g H, then
f
n
= A
1
g
n
converges to some f. Taking limits in g
n
= Af
n
shows that
g = Af is in the range of A, that is, the range of A is closed. To show that
Ran(A) = H we pick h Ran(A)
. Then 0 = h, Ah |h|
2
shows h = 0
and thus Ran(A)
= 0.
Combining the last two results we obtain the famous LaxMilgram the-
orem which plays an important role in theory of elliptic partial dierential
equations.
44 2. Hilbert spaces
Theorem 2.15 (LaxMilgram). Let s be a sesquilinear form which is
bounded, [s(f, g)[ C|f| |g|, and
coercive, s(f, f) |f|
2
.
Then for every g H there is a unique f H such that
s(h, f) = h, g, h H. (2.28)
Proof. Let A be the operator associated with s. Then A and f =
A
1
g.
Problem 2.5. Let H a Hilbert space and let u, v H. Show that the operator
Af = u, fv
is bounded and compute its norm. Compute the adjoint of A.
Problem 2.6. Prove (2.25). (Hint: Use |f| = sup
g=1
[g, f[ compare
Theorem 1.22.)
Problem 2.7. Suppose A has a bounded inverse A
1
. Show (A
1
)
=
(A
)
1
.
Problem 2.8. Show (2.27).
2.4. Orthogonal sums and tensor products
Given two Hilbert spaces H
1
and H
2
, we dene their orthogonal sum
H
1
H
2
to be the set of all pairs (f
1
, f
2
) H
1
H
2
together with the scalar
product
(g
1
, g
2
), (f
1
, f
2
) = g
1
, f
1
1
+g
2
, f
2
2
. (2.29)
It is left as an exercise to verify that H
1
H
2
is again a Hilbert space.
Moreover, H
1
can be identied with (f
1
, 0)[f
1
H
1
and we can regard H
1
as a subspace of H
1
H
2
, and similarly for H
2
. It is also customary to write
f
1
+f
2
instead of (f
1
, f
2
).
More generally, let H
j
, j N, be a countable collection of Hilbert spaces
and dene
j=1
H
j
=
j=1
f
j
[ f
j
H
j
,
j=1
|f
j
|
2
j
< , (2.30)
which becomes a Hilbert space with the scalar product
j=1
g
j
,
j=1
f
j
=
j=1
g
j
, f
j
j
. (2.31)
Example.
j=1
C =
2
(N).
2.4. Orthogonal sums and tensor products 45
Similarly, if H and
H are two Hilbert spaces, we dene their tensor
product as follows: The elements should be products f
f of elements f H
and
f
H. Hence we start with the set of all nite linear combinations of
elements of H
H
T(H,
H) =
n
j=1
j
(f
j
,
f
j
)[(f
j
,
f
j
) H
H,
j
C. (2.32)
Since we want (f
1
+f
2
)
f = f
1
f +f
2
f, f (
f
1
+
f
2
) = f
f
1
+f
f
2
,
and (f)
f = f (
f) we consider T(H,
H)/A(H,
H), where
A(H,
H) = span
n
j,k=1
k
(f
j
,
f
k
) (
n
j=1
j
f
j
,
n
k=1
k
f
k
) (2.33)
and write f
f for the equivalence class of (f,
f).
Next we dene
f
f, g g = f, g
f, g (2.34)
which extends to a sesquilinear form on T(H,
H)/A(H,
H). To show that we
obtain a scalar product, we need to ensure positivity. Let f =
i
f
i
f
i
,=
0 and pick orthonormal bases u
j
, u
k
for spanf
i
, span
f
i
, respectively.
Then
f =
j,k
jk
u
j
u
k
,
jk
=
i
u
j
, f
i
u
k
,
f
i
(2.35)
and we compute
f, f =
j,k
[
jk
[
2
> 0. (2.36)
The completion of T(H,
H)/A(H,
H) with respect to the induced norm is
called the tensor product H
H of H and
H.
Lemma 2.16. If u
j
, u
k
are orthonormal bases for H,
H, respectively, then
u
j
u
k
is an orthonormal basis for H
H.
Proof. That u
j
u
k
is an orthonormal set is immediate from (2.34). More-
over, since spanu
j
, span u
k
are dense in H,
H, respectively, it is easy to
see that u
j
u
k
is dense in T(H,
H)/A(H,
H). But the latter is dense in
H
H.
Example. We have H C
n
= H
n
.
It is straightforward to extend the tensor product to any nite number
of Hilbert spaces. We even note
(
j=1
H
j
) H =
j=1
(H
j
H), (2.37)
46 2. Hilbert spaces
where equality has to be understood in the sense that both spaces are uni-
tarily equivalent by virtue of the identication
(
j=1
f
j
) f =
j=1
f
j
f. (2.38)
Problem 2.9. Show that f
f = 0 if and only if f = 0 or
f = 0.
Problem 2.10. We have f
f = g g ,= 0 if and only if there is some
C0 such that f = g and
f =
1
g.
Problem 2.11. Show (2.37)
Chapter 3
Compact operators
3.1. Compact operators
A linear operator A dened on a normed space X is called compact if every
sequence Af
n
has a convergent subsequence whenever f
n
is bounded. The
set of all compact operators is denoted by C(X). It is not hard to see that
the set of compact operators is an ideal of the set of bounded operators
(Problem 3.1):
Theorem 3.1. Every compact linear operator is bounded. Linear combina-
tions of compact operators are compact and the product of a bounded and a
compact operator is again compact.
If X is a Banach space then this ideal is even closed:
Theorem 3.2. Let X be a Banach space, and let A
n
be a convergent se-
quence of compact operators. Then the limit A is again compact.
Proof. Let f
(0)
j
be a bounded sequence. Choose a subsequence f
(1)
j
such
that A
1
f
(1)
j
converges. From f
(1)
j
choose another subsequence f
(2)
j
such that
A
2
f
(2)
j
converges and so on. Since f
(n)
j
might disappear as n , we con-
sider the diagonal sequence f
j
= f
(j)
j
. By construction, f
j
is a subsequence
of f
(n)
j
for j n and hence A
n
f
j
is Cauchy for every xed n. Now
|Af
j
Af
k
| = |(AA
n
)(f
j
f
k
) +A
n
(f
j
f
k
)|
|AA
n
||f
j
f
k
| +|A
n
f
j
A
n
f
k
|
shows that Af
j
is Cauchy since the rst term can be made arbitrary small
by choosing n large and the second by the Cauchy property of A
n
f
j
.
47
48 3. Compact operators
Note that it suces to verify compactness on a dense set.
Theorem 3.3. Let X be a normed space and A C(X). Let X be its
completion, then A C(X), where A is the unique extension of A.
Proof. Let f
n
X be a given bounded sequence. We need to show that
Af
n
has a convergent subsequence. Pick f
j
n
X such that |f
j
n
f
n
|
1
j
and
by compactness of A we can assume that Af
n
n
g. But then |Af
n
g|
|A||f
n
f
n
n
| +|Af
n
n
g| shows that Af
n
g.
One of the most important examples of compact operators are integral
operators:
Lemma 3.4. The integral operator
(Kf)(x) =
_
b
a
K(x, y)f(y)dy, (3.1)
where K(x, y) C([a, b] [a, b]), dened on /
2
cont
(a, b) is compact.
Proof. First of all note that K(., ..) is continuous on [a, b] [a, b] and hence
uniformly continuous. In particular, for every > 0 we can nd a > 0
such that [K(y, t) K(x, t)[ whenever [y x[ . Let g(x) = Kf(x).
Then
[g(x) g(y)[
_
b
a
[K(y, t) K(x, t)[ [f(t)[dt
_
b
a
[f(t)[dt |1| |f|,
whenever [y x[ . Hence, if f
n
(x) is a bounded sequence in /
2
cont
(a, b),
then g
n
(x) = Kf
n
(x) is equicontinuous and has a uniformly convergent
subsequence by the Arzel` aAscoli theorem (Theorem 3.5 below). But a
uniformly convergent sequence is also convergent in the norm induced by
the scalar product. Therefore K is compact.
Note that (almost) the same proof shows that K is compact when dened
on C[a, b].
Theorem 3.5 (Arzel`aAscoli). Suppose the sequence of functions f
n
(x),
n N, in C[a, b] is (uniformly) equicontinuous, that is, for every > 0
there is a > 0 (independent of n) such that
[f
n
(x) f
n
(y)[ if [x y[ < . (3.2)
If the sequence f
n
is bounded, then there is a uniformly convergent subse-
quence.
3.2. The spectral theorem for compact symmetric operators 49
Proof. Let x
j
j=1
be a dense subset of our interval (e.g., all rational num-
bers in [a, b]). Since f
n
(x
1
) is bounded, we can choose a subsequence f
(1)
n
(x)
such that f
(1)
n
(x
1
) converges (BolzanoWeierstra). Similarly we can extract
a subsequence f
(2)
n
(x) from f
(1)
n
(x) which converges at x
2
(and hence also
at x
1
since it is a subsequence of f
(1)
n
(x)). By induction we get a sequence
f
(j)
n
(x) converging at x
1
, . . . , x
j
. The diagonal sequence
f
n
(x) = f
(n)
n
(x)
will hence converge for all x = x
j
(why?). We will show that it converges
uniformly for all x:
Fix > 0 and chose such that [f
n
(x) f
n
(y)[
3
for [x y[ < .
The balls B
(x
j
) cover [a, b] and by compactness even nitely many, say
1 j p, suce. Furthermore, choose N
such that [
f
m
(x
j
)
f
n
(x
j
)[
3
for n, m N
and 1 j p.
Now pick x and note that x B
(x
j
) for some j. Thus
[
f
m
(x)
f
n
(x)[ [
f
m
(x)
f
m
(x
j
)[ +[
f
m
(x
j
)
f
n
(x
j
)[
+[
f
n
(x
j
)
f
n
(x)[
for n, m N
.
3.2. The spectral theorem for compact symmetric operators
Let H be a Hilbert space. A linear operator A is called symmetric if its
domain is dense and if
g, Af = Ag, f f, g D(A). (3.3)
If A is bounded (with D(A) = H), then A is symmetric precisely if A = A
,
that is, if A is self-adjoint. However, for unbounded operators there is a
subtle but important dierence between symmetry and self-adjointness.
A number z C is called eigenvalue of A if there is a nonzero vector
u D(A) such that
Au = zu. (3.4)
The vector u is called a corresponding eigenvector in this case. The set of
all eigenvectors corresponding to z is called the eigenspace
Ker(Az) (3.5)
50 3. Compact operators
corresponding to z. Here we have used the shorthand notation Az for A
zI. An eigenvalue is called simple if there is only one linearly independent
eigenvector.
Theorem 3.6. Let A be symmetric. Then all eigenvalues are real and
eigenvectors corresponding to dierent eigenvalues are orthogonal.
Proof. Suppose is an eigenvalue with corresponding normalized eigen-
vector u. Then = u, Au = Au, u =
2
)u
1
, u
2
= Au
1
, u
2
u
1
, Au
2
= 0
nishing the proof.
Note that while eigenvectors corresponding to the same eigenvalue will
in general not automatically be orthogonal, we can of course replace each
set of eigenvectors corresponding to by an set of orthonormal eigenvectors
having the same linear span (e.g. using GramSchmidt orthogonalization).
Now we show that A has an eigenvalue at all (which is not clear in the
innite dimensional case)!
Theorem 3.7. A symmetric compact operator A has an eigenvalue
1
which
satises [
1
[ = |A|.
Proof. We set = |A| and assume ,= 0 (i.e, A ,= 0) without loss of
generality. Since
|A|
2
= sup
f:f=1
|Af|
2
= sup
f:f=1
Af, Af = sup
f:f=1
f, A
2
f
there exists a normalized sequence u
n
such that
lim
n
u
n
, A
2
u
n
=
2
.
Since A is compact, it is no restriction to assume that A
2
u
n
converges, say
lim
n
A
2
u
n
=
2
u. Now
|(A
2
2
)u
n
|
2
= |A
2
u
n
|
2
2
2
u
n
, A
2
u
n
+
4
2
2
(
2
u
n
, A
2
u
n
)
(where we have used |A
2
u
n
| |A||Au
n
| |A|
2
|u
n
| =
2
) implies
lim
n
(A
2
u
n
2
u
n
) = 0 and hence lim
n
u
n
= u. In addition, u is
a normalized eigenvector of A
2
since (A
2
2
)u = 0. Factorizing this last
equation according to (A )u = v and (A + )v = 0 show that either
v ,= 0 is an eigenvector corresponding to or v = 0 and hence u ,= 0 is an
eigenvector corresponding to .
3.2. The spectral theorem for compact symmetric operators 51
Note that for a bounded operator A, there cannot be an eigenvalue with
absolute value larger than |A|, that is, the set of eigenvalues is bounded by
|A| (Problem 3.3).
Now consider a symmetric compact operator A with eigenvalue
1
(as
above) and corresponding normalized eigenvector u
1
. Setting
H
1
= u
1
= f H[u
1
, f = 0 (3.6)
we can restrict A to H
1
since f H
1
implies
u
1
, Af = Au
1
, f =
1
u
1
, f = 0 (3.7)
and hence Af H
1
. Denoting this restriction by A
1
, it is not hard to see
that A
1
is again a symmetric compact operator. Hence we can apply Theo-
rem 3.7 iteratively to obtain a sequence of eigenvalues
j
with corresponding
normalized eigenvectors u
j
. Moreover, by construction, u
j
is orthogonal to
all u
k
with k < j and hence the eigenvectors u
j
form an orthonormal set.
This procedure will not stop unless H is nite dimensional. However, note
that
j
= 0 for j n might happen if A
n
= 0.
Theorem 3.8. Suppose H is an innite dimensional Hilbert space and A :
H H is a compact symmetric operator. Then there exists a sequence of real
eigenvalues
j
converging to 0. The corresponding normalized eigenvectors
u
j
form an orthonormal set and every f H can be written as
f =
j=1
u
j
, fu
j
+h, (3.8)
where h is in the kernel of A, that is, Ah = 0.
In particular, if 0 is not an eigenvalue, then the eigenvectors form an
orthonormal basis (in addition, H need not be complete in this case).
Proof. Existence of the eigenvalues
j
and the corresponding eigenvectors
u
j
has already been established. If the eigenvalues should not converge to
zero, there is a subsequence such that [
j
k
[ . Hence v
k
=
1
j
k
u
j
k
is a
bounded sequence (|v
k
|
1
) for which Av
k
has no convergent subsequence
since |Av
k
Av
l
|
2
= |u
j
k
u
j
l
|
2
= 2, a contradiction.
Next, setting
f
n
=
n
j=1
u
j
, fu
j
,
we have
|A(f f
n
)| [
n
[|f f
n
| [
n
[|f|
since f f
n
H
n
and |A
n
| = [
n
[. Letting n shows A(f
f) = 0
proving (3.8).
52 3. Compact operators
By applying A to (3.8) we obtain the following canonical form of compact
symmetric operators.
Corollary 3.9. Every compact symmetric operator A can be written as
Af =
j=1
j
u
j
, fu
j
, (3.9)
where
j
are the nonzero eigenvalues with corresponding eigenvectors u
j
from the previous theorem.
Remark: There are two cases where our procedure might fail to con-
struct an orthonormal basis of eigenvectors. One case is where there is
an innite number of nonzero eigenvalues. In this case
n
never reaches 0
and all eigenvectors corresponding to 0 are missed. In the other case, 0 is
reached, but there might not be a countable basis and hence again some of
the eigenvectors corresponding to 0 are missed. In any case by adding vec-
tors from the kernel (which are automatically eigenvectors), one can always
extend the eigenvectors u
j
to an orthonormal basis of eigenvectors.
Corollary 3.10. Every compact symmetric operator has an associated or-
thonormal basis of eigenvectors.
This is all we need and it remains to apply these results to Sturm
Liouville operators.
Problem 3.3. Show that if A is bounded, then every eigenvalue satises
[[ |A|.
Problem 3.4. Find the eigenvalues and eigenfunctions of the integral op-
erator
(Kf)(x) =
_
1
0
u(x)v(y)f(y)dy
in /
2
cont
(0, 1), where u(x) and v(x) are some given continuous functions.
Problem 3.5. Find the eigenvalues and eigenfunctions of the integral op-
erator
(Kf)(x) = 2
_
1
0
(2xy x y + 1)f(y)dy
in /
2
cont
(0, 1).
3.3. Applications to SturmLiouville operators
Now, after all this hard work, we can show that our SturmLiouville operator
L =
d
2
dx
2
+q(x), (3.10)
3.3. Applications to SturmLiouville operators 53
where q is continuous and real, dened on
D(L) = f C
2
[0, 1][f(0) = f(1) = 0 /
2
cont
(0, 1), (3.11)
has an orthonormal basis of eigenfunctions.
The corresponding eigenvalue equation Lu = zu explicitly reads
u
(0) = 1
happens to satisfy u(1) = 0 and these are precisely the eigenvalues we are
looking for.
Note that the fact that /
2
cont
(0, 1) is not complete causes no problems
since we can always replace it by its completion H = L
2
(0, 1). A thorough
investigation of this completion will be given later, at this point this is not
essential.
We rst verify that L is symmetric:
f, Lg =
_
1
0
f(x)
(g
(x) +q(x)g(x))dx
=
_
1
0
f
(x)
(x)dx +
_
1
0
f(x)
q(x)g(x)dx
=
_
1
0
f
(x)
g(x)dx +
_
1
0
f(x)
q(x)g(x)dx (3.13)
= Lf, g.
Here we have used integration by part twice (the boundary terms vanish
due to our boundary conditions f(0) = f(1) = 0 and g(0) = g(1) = 0).
Of course we want to apply Theorem 3.8 and for this we would need to
show that L is compact. But this task is bound to fail, since L is not even
bounded (see the example in Section 1.5)!
So here comes the trick: If L is unbounded its inverse L
1
might still
be bounded. Moreover, L
1
might even be compact and this is the case
here! Since L might not be injective (0 might be an eigenvalue), we consider
R
L
(z) = (L z)
1
, z C, which is also known as the resolvent of L.
In order to compute the resolvent, we need to solve the inhomogenous
equation (L z)f = g. This can be done using the variation of constants
formula from ordinary dierential equations which determines the solution
54 3. Compact operators
up to an arbitrary solution of the homogenous equation. This homogenous
equation has to be chosen such that f D(L), that is, such that f(0) =
f(1) = 0.
Dene
f(x) =
u
+
(z, x)
W(z)
_
_
x
0
u
(z, t)g(t)dt
_
+
u
(z, x)
W(z)
_
_
1
x
u
+
(z, t)g(t)dt
_
, (3.14)
where u
(z, x)+(q(x)z)u
(z, 0) =
0, u
(z, 0) = 1 respectively u
+
(z, 1) = 0, u
+
(z, 1) = 1 and
W(z) = W(u
+
(z), u
(z)) = u
(z, x)u
+
(z, x) u
(z, x)u
+
(z, x) (3.15)
is the Wronski determinant, which is independent of x (check this!).
Then clearly f(0) = 0 since u
(x) =
u
+
(z, x)
W(z)
_
_
x
0
u
(z, t)g(t)dt
_
+
u
(z, x)
W(z)
_
_
1
x
u
+
(z, t)g(t)dt
_
. (3.16)
Thus we can dierentiate once more giving
f
(x) =
u
+
(z, x)
W(z)
_
_
x
0
u
(z, t)g(t)dt
_
+
u
(z, x)
W(z)
_
_
1
x
u
+
(z, t)g(t)dt
_
g(x)
= (q(x) z)f(x) g(x). (3.17)
In summary, f is in the domain of L and satises (L z)f = g.
Note that z is an eigenvalue if and only if W(z) = 0. In fact, in this
case u
+
(z, x) and u
(z, 1) =
c u
+
(z, 1) = 0 which shows that u
(z))
_
u
+
(z, x)u
(z, t), x t
u
+
(z, t)u
(z, x), x t
(3.18)
we see that (L z)
1
is given by
(L z)
1
g(x) =
_
1
0
G(z, x, t)g(t)dt. (3.19)
3.4. More on compact operators 55
Moreover, from G(z, x, t) = G(z, t, x) it follows that (Lz)
1
is symmetric
for z R (Problem 3.6) and from Lemma 3.4 it follows that it is compact.
Hence Theorem 3.8 applies to (L z)
1
and we obtain:
Theorem 3.11. The SturmLiouville operator L has a countable number of
eigenvalues E
n
. All eigenvalues are discrete and simple. The corresponding
normalized eigenfunctions u
n
form an orthonormal basis for /
2
cont
(0, 1).
Proof. Pick a value R such that R
L
() exists. By Lemma 3.4 R
L
()
is compact and by Theorem 3.3 this remains true if we replace /
2
cont
(0, 1)
by its completion. By Theorem 3.8 there are eigenvalues
n
of R
L
() with
corresponding eigenfunctions u
n
. Moreover, R
L
()u
n
=
n
u
n
is equivalent
to Lu
n
= ( +
1
n
)u
n
, which shows that E
n
= +
1
n
are eigenvalues
of L with corresponding eigenfunctions u
n
. Now everything follows from
Theorem 3.8 except that the eigenvalues are simple. To show this, observe
that if u
n
and v
n
are two dierent eigenfunctions corresponding to E
n
, then
u
n
(0) = v
n
(0) = 0 implies W(u
n
, v
n
) = 0 and hence u
n
and v
n
are linearly
dependent.
Problem 3.6. Show that for our SturmLiouville operator u
(z, x)
=
u
(z
, x). Conclude R
L
(z)
= R
L
(z
K is compact
and symmetric and thus, by Corollary 3.9, there is a countable orthonormal
set u
j
and nonzero numbers s
j
,= 0 such that
K
Kf =
j=1
s
2
j
u
j
, fu
j
. (3.20)
Moreover, |Ku
j
|
2
= u
j
, K
Ku
j
= u
j
, s
2
j
u
j
= s
2
j
implies
s
j
= |Ku
j
| > 0. (3.21)
The numbers s
j
= s
j
(K) are called singular values of K. There are either
nitely many singular values or they converge to zero.
Theorem 3.12 (Canonical form of compact operators). Let K be compact
and let s
j
be the singular values of K and u
j
corresponding orthonormal
56 3. Compact operators
eigenvectors of K
K. Then
K =
j
s
j
u
j
, .v
j
, (3.22)
where v
j
= s
1
j
Ku
j
. The norm of K is given by the largest singular value
|K| = max
j
s
j
(K). (3.23)
Moreover, the vectors v
j
are again orthonormal and satisfy K
v
j
= s
j
u
j
. In
particular, v
j
are eigenvectors of KK
j
u
j
, fu
j
+f
with f
Ker(K
j
u
j
, fKu
j
=
j
s
j
u
j
, fv
j
as required. Furthermore,
v
j
, v
k
= (s
j
s
k
)
1
Ku
j
, Ku
k
= (s
j
s
k
)
1
K
Ku
j
, u
k
= s
j
s
1
k
u
j
, u
k
shows that v
j
are orthonormal. Finally, (3.23) follows from
|Kf|
2
= |
j
s
j
u
j
, fv
j
|
2
=
j
s
2
j
[u
j
, f[
2
_
max
j
s
j
(K)
2
_
|f|
2
,
where equality holds for f = u
j
0
if s
j
0
= max
j
s
j
(K).
If K is self-adjoint, then u
j
=
j
v
j
,
2
j
= 1, are the eigenvectors of K
and
j
s
j
are the corresponding eigenvalues.
An operator K L(H) is called a nite rank operator if its range is
nite dimensional. The dimension
rank(K) = dimRan(K)
is called the rank of K. Since for a compact operator
Ran(K) = spanv
j
(3.24)
we see that a compact operator is nite rank if and only if the sum in (3.22)
is nite. Note that the nite rank operators form an ideal in L(H) just as
the compact operators do. Moreover, every nite rank operator is compact
by the HeineBorel theorem (Theorem 1.12).
Lemma 3.13. The closure of the ideal of nite rank operators in L(H) is
the ideal of compact operators.
3.5. Fredholm theory for compact operators 57
Proof. Since the limit of compact operators is compact, it remains to show
that every compact operator K can be approximated by nite rank ones.
To this end assume that K is not nite rank and note that
K
n
=
n
j=1
s
j
u
j
, .v
j
converges to K as n since
|K K
n
| = max
j0
s
j
(K)
by (3.23).
Moreover, this also shows that the adjoint of a compact operator is again
compact.
Corollary 3.14. An operator K is compact (nite rank) if and only K
is.
In fact, s
j
(K) = s
j
(K
) and
K
j
s
j
v
j
, .u
j
. (3.25)
Proof. First of all note that (3.25) follows from (3.22) since taking adjoints
is continuous and (u
j
, .v
j
)
= v
j
, .u
j
. The rest is straightforward.
Problem 3.8. Show that Ker(A
(3.28)
there is some hope. In fact, we will show that this formula (makes sense
and) holds if K is a compact operator.
58 3. Compact operators
Lemma 3.15. Let K C(H) be compact. Then Ker(1K) is nite dimen-
sional and Ran(1 K) is closed.
Proof. We rst show dimKer(1 K) < . If not we could nd an innite
orthogonal system u
j
j=1
Ker(1 K). By Ku
j
= u
j
compactness of K
implies that there is a convergent subsequence u
j
k
. But this is impossible
by |u
j
u
k
|
2
= 2 for j ,= k.
To see that Ran(1 K) is closed we rst claim that there is a > 0
such that
|(1 K)f| |f|, f Ker(1 K)
. (3.29)
In fact, if there were no such , we could nd a normalized sequence f
j
Ker(1K)
with |f
j
Kf
j
| <
1
j
, that is, f
j
Kf
j
0. After passing to a
subsequence we can assume Kf
j
f by compactness of K. Combining this
with f
j
Kf
j
0 implies f
j
f and f Kf = 0, that is, f Ker(1K).
On the other hand, since Ker(1K)
= Ker(1 K
) (3.30)
by (2.27) we see that the left and right hand side of (3.28) are at least nite
for compact K and we can try to verify equality.
Theorem 3.16. Suppose K is compact. Then
dimKer(1 K) = dimRan(1 K)
, (3.31)
where both quantities are nite.
Proof. It suces to show
dimKer(1 K) dimRan(1 K)
, (3.32)
since replacing K by K
1
there would be an element
3.5. Fredholm theory for compact operators 59
in H
2
with the same image under 1 K contradicting our assumption that
1K is injective. Proceeding inductively we obtain a sequence of subspaces
H
j
= (1 K)
j
H with H
j+1
H
j
. Now choose a normalized sequence
f
j
H
j
H
j+1
. Then for k > j we have
|Kf
j
Kf
k
|
2
= |f
j
f
k
(1 K)(f
j
f
k
)|
2
= |f
j
|
2
+|f
k
+ (1 K)(f
j
f
k
)|
2
1
since f
j
H
j+1
and f
k
+ (1 K)(f
j
f
k
) H
j+1
. But this contradicts the
fact that Kf
j
must have a convergent subsequence.
To show (3.32) in the general case, suppose dimKer(1K) < dimRan(1
K)
. Moreover,
combining (3.31) with (3.30) also shows
dimKer(1 K) = dimKer(1 K
) (3.35)
for compact K.
This theory can be generalized to the case of operators where both
Ker(1 K) and Ran(1 K)
(3.36)
is the called the index of K. Theorem 3.16 now says that a compact oper-
ator is Fredholm of index zero.
Problem 3.10. Compute Ker(1 K) and Ran(1 K)
n=1
X
n
. We can assume that the sets X
n
are closed
and none of them contains a ball; that is, XX
n
is open and nonempty for
every n. We will construct a Cauchy sequence x
n
which stays away from all
X
n
.
Since XX
1
is open and nonempty, there is a closed ball B
r
1
(x
1
)
XX
1
. Reducing r
1
a little, we can even assume B
r
1
(x
1
) XX
1
. More-
over, since X
2
cannot contain B
r
1
(x
1
), there is some x
2
B
r
1
(x
1
) that is
not in X
2
. Since B
r
1
(x
1
) (XX
2
) is open, there is a closed ball B
r
2
(x
2
)
B
r
1
(x
1
) (XX
2
). Proceeding by induction, we obtain a sequence of balls
such that
B
rn
(x
n
) B
r
n1
(x
n1
) (XX
n
).
Now observe that in every step we can choose r
n
as small as we please; hence
without loss of generality r
n
0. Since by construction x
n
B
r
N
(x
N
) for
n N, we conclude that x
n
is Cauchy and converges to some point x X.
61
62 4. The main theorems about Banach spaces
But x B
rn
(x
n
) XX
n
for every n, contradicting our assumption that
the X
n
cover X.
Remark: The set of rational numbers Q can be written as a countable
union of its elements. This shows that completeness assumption is crucial.
(Sets which can be written as the countable union of nowhere dense sets
are said to be of rst category. All other sets are second category. Hence
we have the name category theorem.)
In other words, if X
n
X is a sequence of closed subsets which cover
X, at least one X
n
contains a ball of radius > 0.
Since a closed set is nowhere dense if and only if its complement is open
and dense (cf. Problem 1.5), there is a reformulation which is also worthwhile
noting:
Corollary 4.2. Let X be a complete metric space. Then any countable
intersection of open dense sets is again dense.
Proof. Let O
n
be open dense sets whose intersection is not dense. Then
this intersection must be missing some closed ball B
n
X
n
, where X
n
= XO
n
are closed and nowhere dense. Now note that
X
n
= X
n
B are closed nowhere dense sets in B
. But B
is a complete
metric space, a contradiction.
Now we come to the rst important consequence, the uniform bound-
edness principle.
Theorem 4.3 (BanachSteinhaus). Let X be a Banach space and Y some
normed linear space. Let A
is uniformly
bounded, |A
| C.
Proof. Let
X
n
= x[ |A
x| n for all =
x[ |A
x| n.
Then
n
X
n
= X by assumption. Moreover, by continuity of A
and the
norm, each X
n
is an intersection of closed sets and hence closed. By Baires
theorem at least one contains a ball of positive radius: B
(x
0
) X
n
. Now
observe
|A
y| |A
(y +x
0
)| +|A
x
0
| n +C(x
0
)
for |y| . Setting y =
x
x
, we obtain
|A
x|
n +C(x
0
)
|x|
for every x.
4.1. The Baire theorem and its consequences 63
The next application is
Theorem 4.4 (Open mapping). Let A L(X, Y ) be a bounded linear oper-
ator from one Banach space onto another. Then A is open (i.e., maps open
sets to open sets).
Proof. Denote by B
X
r
(x) X the open ball with radius r centered at x
and let B
X
r
= B
X
r
(0). Similarly for B
Y
r
(y). By scaling and translating balls
(using linearity of A), it suces to prove B
Y
A(B
X
1
) for some > 0.
Since A is surjective we have
Y =
_
n=1
A(B
X
n
)
and the Baire theorem implies that for some n, A(B
X
n
) contains a ball
B
Y
(y) A(B
X
1
) and by convexity of A(B
X
1
)
we also have B
Y
A(B
X
1
).
So we have B
Y
A(B
X
1
), but we would need B
Y
A(B
X
1
). To complete
the proof we will show A(B
X
1
) A(B
X
2
) which implies B
Y
/2
A(B
X
1
).
For every y A(B
X
1
) we can choose some sequence y
n
A(B
X
1
) with
y
n
y. Moreover, there even is some x
n
B
X
1
with y
n
= A(x
n
). How-
ever x
n
might not converge, so we need to argue more carefully and ensure
convergence along the way: start with x
1
B
X
1
such that y Ax
1
B
Y
/2
.
Scaling the relation B
Y
A(B
X
1
) we have B
Y
/2
A(B
X
1/2
) and hence we can
choose x
2
B
X
1/2
such that (y Ax
1
) Ax
2
B
Y
/4
A(B
X
1/4
). Proceeding
like this we obtain a sequence of points x
n
B
X
2
1n
such that
y
n
k=1
Ax
k
B
Y
2
n
.
By |x
k
| < 2
1k
the limit x =
k=1
x
k
exists and satises |x| < 2. Hence
y = Ax A(B
X
2
) as desired.
Remark: The requirement that A is onto is crucial (just look at the
one-dimensional case X = C). Moreover, the converse is also true: If A is
open, then the image of the unit ball contains again some ball B
Y
A(B
X
1
).
Hence by scaling B
Y
r
A(B
X
r
) and letting r we see that A is onto:
Y = A(X).
As an immediate consequence we get the inverse mapping theorem:
Theorem 4.5 (Inverse mapping). Let A L(X, Y ) be a bounded linear
bijection between Banach spaces. Then A
1
is continuous.
64 4. The main theorems about Banach spaces
Example. Consider the operator (Aa)
n
j=1
= (
1
j
a
j
)
n
j=1
in
2
(N). Then its
inverse (A
1
a)
n
j=1
= (j a
j
)
n
j=1
is unbounded (show this!). This is in agree-
ment with our theorem since its range is dense (why?) but not all of
2
(N):
For example (b
j
=
1
j
)
j=1
, Ran(A) since b = Aa gives the contradiction
=
j=1
1 =
j=1
[jb
j
[
2
=
j=1
[a
j
[
2
< .
In fact, for an injective operator the range is closed if and only if the inverse
is bounded (Problem 4.2).
Another important consequence is the closed graph theorem. The graph
of an operator A is just
(A) = (x, Ax)[x D(A). (4.1)
If A is linear, the graph is a subspace of the Banach space X Y (provided
X and Y are Banach spaces), which is just the cartesian product together
with the norm
|(x, y)|
XY
= |x|
X
+|y|
Y
(4.2)
(check this). Note that (x
n
, y
n
) (x, y) if and only if x
n
x and y
n
y.
We say that A has a colsed graph if (A) is a closed set in X Y .
Theorem 4.6 (Closed graph). Let A : X Y be a linear map from a
Banach space X to another Banach space Y . Then A is bounded if and only
if its graph is closed.
Proof. If (A) is closed, then it is again a Banach space. Now the projection
1
(x, Ax) = x onto the rst component is a continuous bijection onto X.
So by the inverse mapping theorem its inverse
1
1
is again continuous.
Moreover, the projection
2
(x, Ax) = Ax onto the second component is also
continuous and consequently so is A =
2
1
1
. The converse is easy.
Remark: The crucial fact here is that A is dened on all of X!
Operators whose graphs are closed are called closed operators. Being
closed is the next option you have once an operator turns out to be un-
bounded. If A is closed, then x
n
x does not guarantee you that Ax
n
converges (like continuity would), but it at least guarantees that if Ax
n
converges, it converges to the right thing, namely Ax:
A bounded: x
n
x implies Ax
n
Ax.
A closed: x
n
x and Ax
n
y implies y = Ax.
If an operator is not closed, you can try to take the closure of its graph,
to obtain a closed operator. If A is bounded this always works (which is
just the contents of Theorem 1.29). However, in general, the closure of the
4.1. The Baire theorem and its consequences 65
graph might not be the graph of an operator as we might pick up points
(x, y
1,2
) (A) with y
1
,= y
2
. Since (A) is a subspace, we also have
(x, y
2
) (x, y
1
) = (0, y
2
y
1
) (A) in this case and thus (A) is the graph
of some operator if and only if
(A) (0, y)[y Y = (0, 0). (4.3)
If this is the case, A is called closable and the operator A associated with
(A) is called the closure of A.
In particular, A is closable if and only if x
n
0 and Ax
n
y implies
y = 0. In this case
D(A) = x X[x
n
D(A), y Y : x
n
x and Ax
n
y,
Ax = y. (4.4)
For yet another way of dening the closure see Problem 4.5.
Example. Consider the operator A in
p
(N) dened by Aa
j
= ja
j
on
D(A) = a
p
(N)[a
j
,= 0 for nitely many j.
(i). A is closable. In fact, if a
n
0 and Aa
n
b then we have a
n
j
0
and thus ja
n
j
0 = b
j
for any j N.
(ii). The closure of A is given by
D(A) = a
p
(N)[(ja
j
)
j=1
p
(N)
and Aa
j
= ja
j
. In fact, if a
n
a and Aa
n
b then we have a
n
j
a
j
and ja
n
j
b
j
for any j N and thus b
j
= ja
j
for any j N. In particular
(ja
j
)
j=1
= (b
j
)
j=1
p
(N). Conversely, suppose (ja
j
)
j=1
p
(N) and
consider
a
n
j
=
_
a
j
, j n,
0, j > n.
Then a
n
a and Aa
n
(ja
j
)
j=1
.
(iii). Note that the inverse of A is the bounded operator A
1
a
j
=
j
1
a
j
dened on all of
p
(N). Thus A
1
is closed. However, since its range
Ran(A
1
) = D(A) is dense but not all of
p
(N), A
1
does not map closed
sets to closed sets in general. In particular, the concept of a closed operator
should not be confused with the concept of a closed map in topology!
The closed graph theorem tells us that closed linear operators can be
dened on all of X if and only if they are bounded. So if we have an
unbounded operator we cannot have both! That is, if we want our operator
to be at least closed, we have to live with domains. This is the reason why in
quantum mechanics most operators are dened on domains. In fact, there
is another important property which does not allow unbounded operators
to be dened on the entire space:
66 4. The main theorems about Banach spaces
Theorem 4.7 (HellingerToeplitz). Let A : H H be a linear operator on
some Hilbert space H. If A is symmetric, that is g, Af = Ag, f, f, g H,
then A is bounded.
Proof. It suces to prove that A is closed. In fact, f
n
f and Af
n
g
implies
h, g = lim
n
h, Af
n
= lim
n
Ah, f
n
= Ah, f = h, Af
for every h H. Hence Af = g.
Problem 4.1. Is the sum of two closed operators also closed? (Here A+B
is dened on D(A+B) = D(A) D(B).)
Problem 4.2. Suppose A : D(A) Ran(A) is closed and injective. Show
that A
1
dened on D(A
1
) = Ran(A) is closed as well.
Conclude that in this case Ran(A) is closed if and only if A
1
is bounded.
Problem 4.3. Show that the dierential operator A =
d
dx
dened on D(A) =
C
1
[0, 1] C[0, 1] (sup norm) is a closed operator. (Compare the example in
Section 1.5.)
Problem 4.4. Consider A =
d
dx
dened on D(A) = C
1
[0, 1] L
2
(0, 1).
Show that its closure is given by
D(A) = f L
2
(0, 1)[g L
2
(0, 1), c C : f(x) = c +
_
x
0
g(y)dy
and Af = g.
Problem 4.5. Consider a linear operator A : D(A) X Y , where X
and Y are Banach spaces. Dene the graph norm associated with A by
|x|
A
= |x|
X
+|Ax|
Y
. (4.5)
Show that A : D(A) Y is bounded if we equip D(A) with the graph norm.
Show that the completion X
A
of (D(A), |.|
A
) can be regarded as a subset of
X if and only if A is closable. Show that in this case the completion can
be identied with D(A) and that the closure of A in X coincides with the
extension from Theorem 1.29 of A in X
A
.
4.2. The HahnBanach theorem and its consequences
Let X be a Banach space. Recall that we have called the set of all bounded
linear functionals the dual space X
nN
y
n
x
n
(4.6)
whose norm is |l
y
| = |y|
q
(Problem 4.7). But can every element of
p
(N)
. Now dene
y
n
= l(
n
), (4.7)
where
n
n
= 1 and
n
m
= 0, n ,= m. Then
[y
n
[ = [l(
n
)[ |l| |
n
|
1
= |l| (4.8)
shows |y|
(N). A similar
argument shows
p
(N)
=
q
(N), 1 p < (Problem 4.8). One usually
identies
p
(N)
with
q
(N) using this canonical isomorphism and simply
writes
p
(N)
=
q
(N). In the case p = this is not true, as we will see
soon.
It turns out that many questions are easier to handle after applying a
linear functional X
. Can we conclude
x(t) = y(t)? The answer is yes and will follow from the HahnBanach
theorem.
We rst prove the real version from which the complex one then follows
easily.
Theorem 4.8 (HahnBanach, real version). Let X be a real vector space
and : X R a convex function (i.e., (x+(1)y) (x)+(1)(y)
for (0, 1)).
If is a linear functional dened on some subspace Y X which satises
(y) (y), y Y , then there is an extension to all of X satisfying
(x) (x), x X.
Proof. Let us rst try to extend in just one direction: Take x , Y and
set
Y = spanx, Y . If there is an extension
to
Y it must clearly satisfy
(y +x) = (y) +
(x).
68 4. The main theorems about Banach spaces
So all we need to do is to choose
(x) such that
(y +x) (y +x). But
this is equivalent to
sup
>0,yY
(y x) (y)
(x) inf
>0,yY
(y +x) (y)
1
x) (y
1
)
(y
2
+
2
x) (y
2
)
2
for every
1
,
2
> 0 and y
1
, y
2
Y . Rearranging this last equations we see
that we need to show
2
(y
1
) +
1
(y
2
)
2
(y
1
1
x) +
1
(y
2
+
2
x).
Starting with the left-hand side we have
2
(y
1
) +
1
(y
2
) = (
1
+
2
) (y
1
+ (1 )y
2
)
(
1
+
2
)(y
1
+ (1 )y
2
)
= (
1
+
2
)((y
1
1
x) + (1 )(y
2
+
2
x))
2
(y
1
1
x) +
1
(y
2
+
2
x),
where =
2
1
+
2
. Hence one dimension works.
To nish the proof we appeal to Zorns lemma (see Appendix A): Let E
be the collection of all extensions
satisfying
(x) (x). Then E can be
partially ordered by inclusion (with respect to the domain) and every linear
chain has an upper bound (dened on the union of all domains). Hence there
is a maximal element by Zorns lemma. This element is dened on X, since
if it were not, we could extend it as before contradicting maximality.
Theorem 4.9 (HahnBanach, complex version). Let X be a complex vector
space and : X R a convex function satisfying (x) (x) if [[ = 1.
If is a linear functional dened on some subspace Y X which satises
[(y)[ (y), y Y , then there is an extension to all of X satisfying
[(x)[ (x), x X.
Proof. Set
r
= Re() and observe
(x) =
r
(x) i
r
(ix).
By our previous theorem, there is a real linear extension
r
satisfying
r
(x)
(x). Now set (x) =
r
(x) i
r
(ix). Then (x) is real linear and by
(ix) =
r
(ix) + i
r
(x) = i(x) also complex linear. To show [(x)[ (x)
we abbreviate =
(x)
|(x)|
and use
[(x)[ = (x) = (x) =
r
(x) (x) (x),
which nishes the proof.
4.2. The HahnBanach theorem and its consequences 69
Note that (x) (x), [[ = 1 is in fact equivalent to (x) = (x),
[[ = 1.
If is a linear functional dened on some subspace, the choice (x) =
|||x| implies:
Corollary 4.10. Let X be a Banach space and let be a bounded linear
functional dened on some subspace Y X. Then there is an extension
X
.
Then x = 0.
Proof. Clearly if (x) = 0 holds for all in some total subset, this holds
for all X
(N) be the
subspace of convergent sequences. Set
l(x) = lim
n
x
n
, x c(N), (4.9)
then l is bounded since
[l(x)[ = lim
n
[x
n
[ |x|
. (4.10)
Hence we can extend it to
(N), where c
0
(N) is the subspace of sequences converging to 0.
Moreover, there is also no other way to identify
(N)
with
1
(N), since
1
(N) is separable whereas
Y = spanx
0
, Y and dene
(y +x
0
) = dist(x
0
, Y ).
70 4. The main theorems about Banach spaces
By construction is linear on
Y and satises (i) and (ii). Moreover, by
dist(x
0
, Y ) |x
0
which vanishes on Y .
If we take the bidual (or double dual) X
. In fact,
consider the linear map J : X X
is isometric
(norm preserving).
Proof. Fix x
0
X. By [J(x
0
)()[ = [(x
0
)[ ||
|x
0
| we have at least
|J(x
0
)|
|x
0
|. Next, by HahnBanach there is a linear functional
0
with
norm |
0
|
= 1 such that
0
(x
0
) = |x
0
|. Hence [J(x
0
)(
0
)[ = [
0
(x
0
)[ =
|x
0
| shows |J(x
0
)|
= |x
0
|.
Thus J : X X
p
(N)
with
q
(N) and choose z
p
(N)
jN
y
j
x
j
, y
q
(N)
=
p
(N)
.
But this implies z(y) = y(x), that is, z = J(x), and thus J is surjective.
(Warning: It does not suce to just argue
p
(N)
=
q
(N)
=
p
(N).)
However,
1
is not reexive since
1
(N)
=
(N) but
(N)
=
1
(N)
as noted earlier.
Example. By the same argument (using the Riesz lemma), every Hilbert
space is reexive.
4.2. The HahnBanach theorem and its consequences 71
Lemma 4.15. Let X be a Banach space.
(i) If X is reexive, so is every closed subspace.
(ii) X is reexive if and only if X
is.
(iii) If X
is separable, so is X.
Proof. (i) Let Y be a closed subspace. Denote by j : Y X the natural
inclusion and dene j
: Y
via (j
(y
))() = y
([
Y
) for y
and X
. Note that j
j j
Y
J
Y
Y
(J
Y
(y))() = J
Y
(y)([
Y
) = (y) = J
X
(y)().
Moreover, since J
X
is surjective, for every y
there is an x X such
that j
(y
) = J
X
(x). Since j
(y
)() = y
([
Y
) vanishes on all X
(y
(Y
) = J
X
(Y ) and J
Y
= j J
X
j
1
is
surjective.
(ii) Suppose X is reexive. Then the two maps
(J
X
)
: X
(J
X
)
: X
J
1
X
x
J
X
are inverse of each other. Moreover, x x
and let x = J
1
X
(x
).
Then J
X
(x
)(x
) = x
(x
) = J(x)(x
) = x
(x) = x
(J
1
X
(x
)), that is J
X
=
(J
X
)
respectively (J
X
)
1
= (J
X
)
, which shows X
reexive if X reexive.
To see the converse, observe that X
reexive implies X
= X is reexive by (i).
(iii) Let
n
n=1
be a dense set in X
n=1
is total in
X. If it were not, we could nd some x Xspanx
n
n=1
and hence there
is a functional X
separable implies X
1
(N)
(N) shows.
Problem 4.6. Let X be some Banach space. Show that
|x| = sup
X
, =1
[(x)[ (4.11)
72 4. The main theorems about Banach spaces
for all x X.
Problem 4.7. Show that |l
y
| = |y|
q
, where l
y
p
(N)
as dened in (4.6).
(Hint: Choose x
p
such that x
n
y
n
= [y
n
[
q
.)
Problem 4.8. Show that every l
p
(N)
nN
y
n
x
n
with some y
q
(N). (Hint: To see y
q
(N) consider x
N
dened such that
x
n
y
n
= [y
n
[
q
for n N and x
n
= 0 for n > N. Now look at [(x
N
)[
|||x
N
|
p
.)
Problem 4.9. Let c
0
(N)
can be written as
l(x) =
nN
y
n
x
n
with some y
1
(N) which satises |y|
1
= ||.
(iii) Show that every l c(N)
can be written as
l(x) =
nN
y
n
x
n
+y
0
lim
n
x
n
with some y
1
(N) which satises [y
0
[ +|y|
1
= ||.
Problem 4.10. Let x
n
X be a total set of linearly independent vectors
and suppose the complex numbers c
n
satisfy [c
n
[ c|x
n
|. Is there a bounded
linear functional X
with (x
n
) = c
n
and || c? (Hint: Consider e.g.
X =
2
(Z).)
4.3. Weak convergence
In the last section we have seen that (x) = 0 for all X
implies x = 0.
Now what about convergence? Does (x
n
) (x) for every X
imply
x
n
x? Unfortunately the answer is no:
Example. Let u
n
be an innite orthonormal set in some Hilbert space.
Then g, u
n
0 for every g since these are just the expansion coecients
of g which are in
2
by Bessels inequality. Since by the Riesz lemma (Theo-
rem 2.10), every bounded linear functional is of this form, we have (u
n
) 0
for every bounded linear functional. (Clearly u
n
does not converge to 0, since
|u
n
| = 1.)
4.3. Weak convergence 73
If (x
n
) (x) for every X
we say that x
n
converges weakly to
x and write
w-lim
n
x
n
= x or x
n
x. (4.12)
Clearly x
n
x implies x
n
x and hence this notion of convergence is
indeed weaker. Moreover, the weak limit is unique, since (x
n
) (x) and
(x
n
) ( x) imply (x x) = 0. A sequence x
n
is called a weak Cauchy
sequence if (x
n
) is Cauchy (i.e. converges) for every X
.
Lemma 4.16. Let X be a Banach space.
(i) x
n
x implies |x| liminf |x
n
|.
(ii) Every weak Cauchy sequence x
n
is bounded: |x
n
| C.
(iii) If X is reexive, then every weak Cauchy sequence converges weakly.
(iv) A sequence x
n
is Cauchy if and only if (x
n
) is Cauchy, uniformly
for X
with || = 1.
Proof. (i) Choose X
. Moreover,
by (ii) we have [j()[ sup |(x
n
)| || sup |x
n
| C|| which shows
j X
with |x
| = 1 and
inf
yY
|x
y| 1 . (4.13)
Proof. Pick x XY and abbreviate d = dist(x, Y ) > 0. Choose y
Y
such that |x y
|
d
1
. Set
x
=
x y
|x y
|
.
Then x
y| =
1
|x y
|
|x (y
+|x y
|y)|
d
|x y
|
1
as required.
Proof. (of Theorem 4.18) If X is nite dimensional, then X is isomorphic
to C
n
and the closed unit ball is compact by the Heine-Borel theorem (The-
orem 1.12).
4.3. Weak convergence 75
Conversely, suppose X is innite dimensional and abbreviate S
1
= x
X[ |x| = 1. Choose x
1
S
1
and set Y
1
= spanx
1
. Then, by the lemma
there is an x
2
S
1
such that |x
2
x
1
|
1
2
. Setting Y
2
= spanx
1
, x
2
and
invoking again our lemma, there is an x
3
S
1
such that |x
3
x
j
|
1
2
for
j = 1, 2. Proceeding by induction, we obtain a sequence x
n
S
1
such that
|x
n
x
m
|
1
2
for m ,= n. In particular, this sequence cannot have any
convergent subsequence. (Recall that in a metric space compactness and
sequential compactness are equivalent Lemma 1.11.)
If we are willing to treat convergence for weak convergence, the situation
looks much brighter!
Theorem 4.20. Let X be a reexive Banach space. Then every bounded
sequence has a weakly convergent subsequence.
Proof. Let x
n
be some bounded sequence and consider Y = spanx
n
.
Then Y is reexive by Lemma 4.15 (i). Moreover, by construction Y is
separable and so is Y
. Thus there is a
limit by Lemma 4.16 (iii).
Note that this theorem breaks down if X is not reexive.
Example. Consider the sequence of vectors
n
(with
n
n
= 1 and
n
m
= 0,
n ,= m) in
p
(N), 1 p < . Then
n
0 for 1 < p < . In fact,
since every l
p
(N)
is of the form l = l
y
for some y
q
(N) we have
l
y
(
n
) = y
n
0.
If we consider the same sequence in
1
(N) there is no weakly convergent
subsequence. In fact, since l
y
(
n
) 0 for every sequence y
(N) with
nitely many nonzero entries, the only possible weak limit is zero. On the
other hand choosing the constant sequence y = (1)
j=1
we see l
y
(
n
) = 1 ,0,
a contradiction.
Example. Let X = L
1
(R). Every bounded gives rise to a linear functional
(f) =
_
f(x)(x) dx
76 4. The main theorems about Banach spaces
in L
1
(R)
n
L(
2
(N)) which shifts a sequence n places to the right
and lls up the rst n places with zeros, that is,
S
n
(x
1
, x
2
, . . . ) = (0, . . . , 0
. .
n places
, x
1
, x
2
, . . . ). (4.17)
Then S
n
converges to zero strongly but not in norm (since |S
n
| = 1) and S
n
converges weakly to zero (since x, S
n
y = S
n
x, y) but not strongly (since
|S
n
x| = |x|) .
Lemma 4.21. Suppose A
n
L(X) is a sequence of bounded operators.
(i) s-lim
n
A
n
= A implies |A| liminf |A
n
|.
4.3. Weak convergence 77
(ii) Every strong Cauchy sequence A
n
is bounded: |A
n
| C.
(iii) If A
n
y Ay for y in a dense set and |A
n
| C, then s-lim
n
A
n
=
A.
The same result holds if strong convergence is replaced by weak convergence.
Proof. (i) and (ii) follow as in Lemma 4.16 (i).
(iii) Just use
|A
n
x Ax| |A
n
x A
n
y| +|A
n
y Ay| +|Ay Ax|
2C|x y| +|A
n
y Ay|
and choose y in the dense subspace such that |xy|
4C
and n large such
that |A
n
y Ay|
2
.
The case of weak convergence is left as an exercise.
For an application of this lemma see Problem 7.12.
Lemma 4.22. Suppose A
n
, B
n
L(X) are two sequences of bounded oper-
ators.
(i) s-lim
n
A
n
= A and s-lim
n
B
n
= B implies s-lim
n
A
n
B
n
=
AB.
(ii) w-lim
n
A
n
= A and s-lim
n
B
n
= B implies w-lim
n
A
n
B
n
=
AB.
(iii) lim
n
A
n
= A and w-lim
n
B
n
= B implies w-lim
n
A
n
B
n
=
AB.
Proof. For the rst case just observe
|(A
n
B
n
AB)x| |(A
n
A)Bx| +|A
n
||(B
n
B)x| 0.
The remaining cases are similar and again left as an exercise.
Example. Consider again the last example. Then
S
n
S
n
(x
1
, x
2
, . . . ) = (0, . . . , 0
. .
n places
, x
n+1
, x
n+2
, . . . )
converges to 0 weakly (in fact even strongly) but
S
n
S
n
(x
1
, x
2
, . . . ) = (x
1
, x
2
, . . . )
does not! Hence the order in the second claim is important.
Remark: For a sequence of linear functionals
n
, strong convergence is
also called weak- convergence. That is, the weak- limit of
n
is if
n
(x) (x) x X. (4.18)
78 4. The main theorems about Banach spaces
Note that this is not the same as weak convergence on X
, since is the
weak limit of
n
if
j(
n
) j() j X
, (4.19)
whereas for the weak- limit this is only required for j J(X) X
(recall
J(x)() = (x)). So the weak topology on X
is the
weakest topology for which all j J(X) remain continuous. In particular,
the weak- topology is weaker than the weak topology and both are equal
if X is reexive.
With this notation it is also possible to slightly generalize Theorem 4.20
(Problem 4.14):
Theorem 4.23. Suppose X is separable. Then every bounded sequence
n
X
and x
n
x in X. Then
n
(x
n
)
(x).
Similarly, suppose s-lim
n
and x
n
x. Then
n
(x
n
) (x).
Problem 4.12. Show that x
n
x implies Ax
n
Ax for A L(X).
Problem 4.13. Show that if
j
X
be dened via
S
x
n
=
_
0 n = 1
x
n1
n > 1
, S
+
x
n
= x
n+1
(5.8)
(i.e., S
shifts each sequence one place right (lling up the rst place with a
0) and S
+
shifts one place left (dropping the rst place)). Then S
+
S
= I
but S
S
+
,= I. So you really need to check both xy = e and yx = e in
general.
Lemma 5.1. Let X be a Banach algebra with identity e. Suppose |x| < 1.
Then e x is invertible and
(e x)
1
=
n=0
x
n
. (5.9)
Proof. Since |x| < 1 the series converges and
(e x)
n=0
x
n
=
n=0
x
n
n=1
x
n
= e
respectively
_
n=0
x
n
_
(e x) =
n=0
x
n
n=1
x
n
= e.
n=0
(x
1
y)
n
x
1
or (x y)
1
=
n=0
x
1
(yx
1
)
n
. (5.10)
In particular, both conditions are satised if |y| < |x
1
|
1
and the set of
invertible elements is open.
5.1. Banach algebras 81
Proof. Just observe x y = x(e x
1
y) = (e yx
1
)x.
The resolvent set is dened as
(x) = C[(x )
1
C, (5.11)
where we have used the shorthand notation x = xe. Its complement
is called the spectrum
(x) = C(x). (5.12)
It is important to observe that the fact that the inverse has to exist as an
element of X. That is, if X are bounded linear operators, it does not suce
that x is bijective, the inverse must also be bounded!
Example. If X = L(C
n
) is the space of n by n matrices, then the spectrum
is just the set of eigenvalues.
Example. If X = C(I), then the spectrum of a function x C(I) is just
its range, (x) = x(I).
The map (x )
1
is called the resolvent of x X. If
0
(x)
we can choose x x
0
and y
0
in (5.10) which implies
(x)
1
=
n=0
(
0
)
n
(x
0
)
n1
, [
0
[ < |(x
0
)
1
|
1
. (5.13)
This shows that (x)
1
has a convergent power series with coecients in
X around every point
0
(x). As in the case of coecients in C, such
functions will be called analytic. In particular, ((x )
1
) is a complex-
valued analytic function for every X
n=0
_
x
_
n
, [[ > |x|,
which implies (x) [ [[ |x| is bounded and thus compact. More-
over, taking norms shows
|(x )
1
|
1
[[
n=0
|x|
n
[[
n
=
1
[[ |x|
, [[ > |x|,
which implies (x )
1
0 as . In particular, if (x) is empty,
then ((x )
1
) is an entire analytic function which vanishes at innity.
82 5. Bounded linear operators
By Liouvilles theorem we must have ((x )
1
) = 0 in this case, and so
(x )
1
= 0, which is impossible.
As another simple consequence we obtain:
Theorem 5.4. Suppose X is a Banach algebra in which every element ex-
cept 0 is invertible. Then X is isomorphic to C.
Proof. Pick x X and (x). Then x is not invertible and hence
x = 0, that is x = . Thus every element is a multiple of the identity.
Theorem 5.5 (Spectral mapping). For every polynomial p and x X we
have
(p(x)) = p((x)). (5.15)
Proof. Fix
0
C and observe
p(x) p(
0
) = (x
0
)q
0
(x).
If p(
0
) , (p(x)) we have
(x
0
)
1
= q
0
(x)((x
0
)q
0
(x))
1
= ((x
0
)q
0
(x))
1
q
0
(x)
(check this since q
0
(x) commutes with (x
0
)q
0
(x) it also commutes
with its inverse). Hence
0
, (x).
Conversely, let
0
(p(x)). Then
p(x)
0
= a(x
1
) (x
n
)
and at least one
j
(x) since otherwise the right-hand side would be
invertible. But then p(
j
) =
0
, that is,
0
p((x)).
Next let us look at the convergence radius of the Neumann series for
the resolvent
(x )
1
=
1
n=0
_
x
_
n
(5.16)
encountered in the proof of Theorem 5.3 (which is just the Laurent expansion
around innity).
The number
r(x) = sup
(x)
[[ (5.17)
is called the spectral radius of x. Note that by (5.14) we have
r(x) |x|. (5.18)
Theorem 5.6. The spectral radius satises
r(x) = inf
nN
|x
n
|
1/n
= lim
n
|x
n
|
1/n
. (5.19)
5.1. Banach algebras 83
Proof. By spectral mapping we have r(x)
n
= r(x
n
) |x
n
| and hence
r(x) inf |x
n
|
1/n
.
Conversely, x X
, and consider
((x )
1
) =
1
n=0
1
n
(x
n
). (5.20)
Then ((x )
1
) is analytic in [[ > r(x) and hence (5.20) converges
absolutely for [[ > r(x) by a well-known result from complex analysis.
Hence for xed with [[ > r(x), (x
n
/
n
) converges to zero for every
X
t
n
n!
|x|
. (5.23)
Consequently
|K
n
x|
|k|
n
n!
|x|
,
that is |K
n
|
k
n
n!
, which shows
r(K) lim
n
|k|
(n!)
1/n
= 0.
84 5. Bounded linear operators
Hence r(K) = 0 and for every C and every y C(I) the equation
x K x = y (5.24)
has a unique solution given by
x = (I K)
1
y =
n=0
n
K
n
y. (5.25)
= x
+y
, (x)
, x
= x, (xy)
= y
, (5.26)
satisfying
|x|
2
= |x
x| (5.27)
is called a C
x| |x||x
|
and hence |x| |x
|. By x
| |x
| = |x|
and hence
|x| = |x
|, |x|
2
= |x
x| = |xx
|. (5.28)
Example. The continuous functions C(I) together with complex conjuga-
tion form a commutative C
algebra.
Example. The Banach algebra L(H) is a C
= e and (x
1
)
= (x
)
1
(show
this). We will always assume that we have an identity. In particular,
(x
) = (x)
. (5.29)
If X is a C
x = xx
, self-
adjoint if x
= x, and unitary if x
= x
1
. Moreover, x is called positive
if x = y
2
for some y = y
(x
2
)|
1/2
= |(xx
(xx
)|
1/2
= |x
x| = |x|
2
and hence r(x) = lim
k
|x
2
k
|
1/2
k
= |x|.
Lemma 5.8. If x is self-adjoint, then (x) R.
Proof. Suppose + i (x), R. Then + i( +) (x + i) and
2
+ ( +)
2
|x + i|
2
= |(x + i)(x i)| = |x
2
+
2
| |x|
2
+
2
.
Hence
2
+
2
+ 2 |x|
2
which gives a contradiction if we let [[
unless = 0.
Given x X we can consider the C
algebra C
(x) = p(x, x
)[p : C
2
C polynomial, xx
= x
x, (5.30)
and in particular C
. (5.31)
Moreover, in this case C
= (f
u|, u H. (5.33)
(Hint: Problem 1.21.)
5.3. Spectral measures
Note: This section requires familiarity with measure theory.
Using the Riesz representation theorem we get another formulation in
terms of spectral measures:
Theorem 5.10. Let H be a Hilbert space, and let A L(H) be self-adjoint.
For every u, v H there is a corresponding complex Borel measure
u,v
(the
spectral measure) such that
u, f(A)v =
_
(A)
f(t)d
u,v
(t), f C((A)). (5.34)
We have
u,v
1
+v
2
=
u,v
1
+
u,v
2
,
u,v
=
u,v
,
v,u
=
u,v
(5.35)
and [
u,v
[((A)) |u||v|. Furthermore,
u
=
u,u
is a positive Borel
measure with
u
((A)) = |u|
2
.
Proof. Consider the continuous functions on I = [|A|, |A|] and note that
every f C(I) gives rise to some f C((A)) by restricting its domain.
Clearly
u,v
(f) = u, f(A)v is a bounded linear functional and the existence
of a corresponding measure
u,v
with [
u,v
[(I) = |
u,v
| |u||v| follows
5.3. Spectral measures 87
from Theorem 9.4. Since
u,v
(f) depends only on the value of f on (A) I,
u,v
is supported on (A).
Moreover, if f 0 we have
u
(f) = u, f(A)u = f(A)
1/2
u, f(A)
1/2
u =
|f(A)
1/2
u|
2
0 and hence
u
is positive and the corresponding measure
u
is positive. The rest follows from the properties of the scalar product.
It is often convenient to regard
u,v
as a complex measure on R by using
u,v
() =
u,v
((A)). If we do this, we can also consider f as a function
on R. However, note that f(A) depends only on the values of f on (A)!
Note that the last theorem can be used to dene f(A) for every bounded
measurable function f B((A)) via Lemma 2.11 and extend the functional
calculus from continuous to measurable functions:
Theorem 5.11 (Spectral theorem). If H is a Hilbert space and A L(H)
is self-adjoint, then there is an homomorphism : B((A)) L(H) given
by
u, f(A)v =
_
(A)
f(t)d
u,v
(t), f B((A)). (5.36)
Moreover, if f
n
(t) f(t) pointwise and sup
n
|f
n
|
is bounded, then f
n
(A)u
f(A)u for every u H.
Proof. The map is well-dened linear operator by Lemma 2.11 since we
have
_
(A)
f(t)d
u,v
(t)
|f|
[
u,v
[((A)) |f|
|u||v|
and (5.35). Next, observe that (f)
= (f
h(A)u). Thus f
n
(A)u
f(A)u and for bounded g we also have that (gf
n
)(A)u (gf)(A)u and
g(A)f
n
(A)u g(A)f(A)u. This establishes the case where f is bounded
and g is continuous. Similarly, approximating g removes the continuity re-
quirement from g.
The last claim follows since f
n
f in L
2
by dominated convergence in
this case.
In particular, given a self-adjoint operator A we can dene the spectral
projections
P
A
() =
= P.
Lemma 5.12. Suppose P is an orthogonal projection. Then H decomposes
in an orthogonal sum
H = Ker(P) Ran(P) (5.38)
and Ker(P) = (I P)H, Ran(P) = PH.
Proof. Clearly, every u H can be written as u = (I P)u +Pu and
(I P)u, Pu = P(I P)u, u = (P P
2
)u, u = 0
shows H = (IP)HPH. Moreover, P(IP)u = 0 shows (IP)H Ker(P)
and if u Ker(P) then u = (IP)u (IP)H shows Ker(P) (IP)H.
In addition, the spectral projections satisfy
P
A
(R) = I, P
A
(
_
n=1
n
)u =
n=1
P
A
(
n
)u, u H. (5.39)
Such a family of projections is called a projection valued measure and
P
A
(t) = P
A
((, t]) (5.40)
is called a resolution of the identity. Note that we have
u,v
() = u, P
A
()v. (5.41)
Using them we can dene an operator-valued integral as usual such that
A =
_
t dP
A
(t). (5.42)
In particular, if P
A
() ,= 0, then is an eigenvalue and Ran(P
A
())
is the corresponding eigenspace since
AP
A
() = P
A
(). (5.43)
The fact that eigenspaces to dierent eigenvalues are orthogonal now
generalizes to
Lemma 5.13. Suppose
1
2
= . Then
Ran(P
A
(
1
)) Ran(P
A
(
2
)). (5.44)
Proof. Clearly
2
=
2
and hence
P
A
(
1
)P
A
(
2
) = P
A
(
1
2
).
Now if
1
2
= , then
P
A
(
1
)u, P
A
(
2
)v = u, P
A
(
1
)P
A
(
2
)v = u, P
A
()v = 0,
which shows that the ranges are orthogonal to each other.
5.4. The StoneWeierstra theorem 89
Example. Let A L(C
n
) be some symmetric matrix and let
1
, . . . ,
m
be its (distinct) eigenvalues. Then
A =
m
j=1
j
P
A
(
j
), (5.45)
where P
A
(
j
) is the projection onto the eigenspace corresponding to the
eigenvalue
j
.
Problem 5.8. Show the following estimates for the total variation of spec-
tral measures
[
u,v
[()
u
()
1/2
v
()
1/2
(5.46)
(Hint: Use
u,v
() = u, P
A
()v = P
A
()u, P
A
()v together with Cauchy
Schwarz to estimate (8.14).)
5.4. The StoneWeierstra theorem
In the last section we have seen that the C
= max
xK
[f(x)[.
Theorem 5.14 (StoneWeierstra, real version). Suppose K is a compact
metric space and let C(K, R) be the Banach algebra of continuous functions
(with the maximum norm).
If F C(K, R) contains the identity 1 and separates points (i.e., for
every x
1
,= x
2
there is some function f F such that f(x
1
) ,= f(x
2
)), then
the algebra generated by F is dense.
Proof. Denote by A the algebra generated by F. Note that if f A, we
have [f[ A: By the Weierstra approximation theorem (Theorem 1.20)
there is a polynomial p
n
(t) such that
[t[ p
n
(t)
<
1
n
for t f(K) and
hence p
n
(f) [f[.
In particular, if f, g are in A, we also have
maxf, g =
(f +g) +[f g[
2
, minf, g =
(f +g) [f g[
2
in A.
Now x f C(K, R). We need to nd some f
A with |f f
< .
90 5. Bounded linear operators
First of all, since A separates points, observe that for given y, z K
there is a function f
y,z
A such that f
y,z
(y) = f(y) and f
y,z
(z) = f(z)
(show this). Next, for every y K there is a neighborhood U(y) such that
f
y,z
(x) > f(x) , x U(y),
and since K is compact, nitely many, say U(y
1
), . . . , U(y
j
), cover K. Then
f
z
= maxf
y
1
,z
, . . . , f
y
j
,z
A
and satises f
z
> f by construction. Since f
z
(z) = f(z) for every z K,
there is a neighborhood V (z) such that
f
z
(x) < f(x) +, x V (z),
and a corresponding nite cover V (z
1
), . . . , V (z
k
). Now
f
= minf
z
1
, . . . , f
z
k
A
satises f
and we have
found a required function.
Theorem 5.15 (StoneWeierstra). Suppose K is a compact metric space
and let C(K) be the C
2
e
inx
, n Z, form an
orthonormal basis for H = L
2
(0, 2).
Problem 5.10. Let k N and I R. Show that the -subalgebra generated
by f
z
0
(t) =
1
(tz
0
)
k
for one z
0
C is dense in the C
algebra C
(I) of
continuous functions vanishing at innity
for I = R if z
0
CR and k = 1, 2,
for I = [a, ) if z
0
(, a) and every k,
for I = (, a] [b, ) if z
0
(a, b) and k odd.
(Hint: Add to R to make it compact.)
Problem 5.11. Let K C be a compact set. Show that the set of all
functions f(z) = p(x, y), where p : R
2
C is polynomial and z = x + iy, is
dense in C(K).
Part 2
Real Analysis
Chapter 6
Almost everything
about Lebesgue
integration
6.1. Borel measures in a nut shell
The rst step in dening the Lebesgue integral is extending the notion of
size from intervals to arbitrary sets. Unfortunately, this turns out to be too
much, since a classical paradox by Banach and Tarski shows that one can
break the unit ball in R
3
into a nite number of (wild choosing the pieces
uses the Axiom of Choice and cannot be done with a jigsaw;-) pieces, rotate
and translate them, and reassemble them to get two copies of the unit ball
(compare Problem 6.4). Hence any reasonable notion of size (i.e., one which
is translation and rotation invariant) cannot be dened for all sets!
A collection of subsets / of a given set X such that
X /,
/ is closed under nite unions,
/ is closed under complements
is called an algebra. Note that / and that, by de Morgan, / is also
closed under nite intersections. If an algebra is closed under countable
unions (and hence also countable intersections), it is called a -algebra.
Example. Let X = 1, 2, 3, then / = , 1, 2, 3, X is an algebra.
Moreover, the intersection of any family of (-)algebras /
is again
a (-)algebra and for any collection S of subsets there is a unique smallest
95
96 6. Almost everything about Lebesgue integration
(-)algebra (S) containing S (namely the intersection of all (-)algebras
containing S). It is called the (-)algebra generated by S.
Example. For a given set X the power set P(X) is clearly the largest
-algebra and , X is the smallest.
If X is a topological space, the Borel -algebra B(X) of X is dened
to be the -algebra generated by all open (respectively, all closed) sets. In
fact, if X is second countable, any countable base will suce to generate
the Borel -algebra (recall Lemma 1.1).
Sets in the Borel -algebra are called Borel sets.
Example. In the case X = R
n
the Borel -algebra will be denoted by B
n
and we will abbreviate B= B
1
. Note that in order to generate B open (or
closed) intervals with rational boundary points suce.
Example. If X is a topological space then any measurable subset Y X
is also a topological space equipped with the relative topology and its Borel
-algebra is given by B(Y ) = A[A B(X), A Y (show this).
Now let us turn to the denition of a measure: A set X together with
a -algebra is called a measurable space. A measure is a map
: [0, ] on a -algebra such that
() = 0,
(
j=1
A
j
) =
j=1
(A
j
) if A
j
A
k
= for all j ,= k (-additivity).
Here the sum is set equal to if one of the summands is or if it diverges.
The measure is called -nite if there is a countable cover X
j
j=1
of
X such that X
j
and (X
j
) < for all j. (Note that it is no restriction
to assume X
j
X
j+1
.) It is called nite if (X) < . The sets in are
called measurable sets and the triple (X, , ) is referred to as a measure
space.
Example. Take a set X and = P(X) and set (A) to be the number
of elements of A (respectively, if A is innite). This is the so-called
counting measure. It will be nite if and only if X is nite and -nite if
and only if X is countable.
Example. Take a set X and = P(X). Fix a point x X and set
(A) = 1 if x A and (A) = 0 else. This is the Dirac measure centered
at x.
6.1. Borel measures in a nut shell 97
Example. Let
1
,
2
be two measures on (X, ) and
1
,
2
0. Then
=
1
1
+
2
2
dened via
(A) =
1
1
(A) +
2
2
(A)
is again a measure. Furthermore, given a countable number of measures
n
and numbers
n
0, then =
n
is again a measure (show this).
Example. Let be a measures on (X, ) and Y X a measurable subset.
Then
(A) = (A Y )
is again a measure on (X, ) (show this).
If Y we can restrict the -algebra [
Y
= A [A Y such that
(Y, [
Y
, [
Y
) is again a measure space. It will be -nite if (X, , ) is.
If we replace the -algebra by an algebra /, then is called a premea-
sure. In this case -additivity clearly only needs to hold for disjoint sets
A
n
for which
n
A
n
/.
We will write A
n
A if A
n
A
n+1
(note A =
n
A
n
) and A
n
A if
A
n+1
A
n
(note A =
n
A
n
).
Theorem 6.1. Any measure satises the following properties:
(i) A B implies (A) (B) (monotonicity).
(ii) (A
n
) (A) if A
n
A (continuity from below).
(iii) (A
n
) (A) if A
n
A and (A
1
) < (continuity from above).
Proof. The rst claim is obvious from (B) = (A) +(BA). To see the
second dene
A
1
= A
1
,
A
n
= A
n
A
n1
and note that these sets are disjoint
and satisfy A
n
=
n
j=1
A
j
. Hence (A
n
) =
n
j=1
(
A
j
)
j=1
(
A
j
) =
(
j=1
A
j
) = (A) by -additivity. The third follows from the second using
A
n
= A
1
A
n
A
1
A implying (
A
n
) = (A
1
) (A
n
) (A
1
A) =
(A
1
) (A).
Example. Consider the counting measure on X = N and let A
n
= j
N[j n, then (A
n
) = , but (
n
A
n
) = () = 0 which shows that the
requirement (A
1
) < in the last claim of Theorem 6.1 is not superu-
ous.
A measure on the Borel -algebra is called a Borel measure if (K) <
for every compact set K. Note that some authors do not require this last
condition.
Example. Let X = R and = B. The Dirac measure is a Borel measure.
The counting measure is no Borel measure since ([a, b]) = for a < b.
98 6. Almost everything about Lebesgue integration
A measure on the Borel -algebra is called outer regular if
(A) = inf
OA,O open
(O) (6.1)
and inner regular if
(A) = sup
KA,K compact
(K). (6.2)
It is called regular if it is both outer and inner regular.
Example. Let X = R and = B. The counting measure is inner regular
but not outer regular (every nonempty open set has innite measure). The
Dirac measure is a regular Borel measure.
But how can we obtain some more interesting Borel measures? We will
restrict ourselves to the case of X = R for simplicity. Then the strategy
is as follows: Start with the algebra of nite unions of disjoint intervals
and dene for those sets (as the sum over the intervals). This yields a
premeasure. Extend this to an outer measure for all subsets of R. Show
that the restriction to the Borel sets is a measure.
Let us rst show how we should dene for intervals: To every Borel
measure on B we can assign its distribution function
(x) =
_
_
((x, 0]), x < 0,
0, x = 0,
((0, x]), x > 0,
(6.3)
which is right continuous and nondecreasing.
Conversely, to obtain a measure from a nondecreasing function : R
R we proceed as follows: Recall that an interval is a subset of the real line
of the form
I = (a, b], I = [a, b], I = (a, b), or I = [a, b), (6.4)
with a b, a, b R, . Note that (a, a), [a, a), and (a, a] denote the
empty set, whereas [a, a] denotes the singleton a. For any proper interval
with dierent endpoints we can dene its measure to be
(I) =
_
_
(b+) (a+), I = (a, b],
(b+) (a), I = [a, b],
(b) (a+), I = (a, b),
(b) (a), I = [a, b),
(6.5)
where (a) = lim
0
(a ) (which exist by monotonicity). If one of the
endpoints is innite we agree to use () = lim
x
(x). For the empty
6.1. Borel measures in a nut shell 99
set we of course set () = 0 and for the singletons we set
(a) = (a+) (a) (6.6)
(which agrees with (6.5) except for the case I = (a, a) which would give a
negative value for the empty set if jumps at a). Note that (a) = 0 if
and only if (x) is continuous at a and that there can be only countably
many points with (a) > 0 since a nondecreasing function can have at
most countably many jumps.
Now we can consider the algebra of nite unions of intervals (check
that this is indeed an algebra) and extend (6.5) to nite unions of disjoint
intervals by summing over all intervals. It is straightforward to verify that
is well-dened (one set can be represented by dierent unions of intervals)
and by construction additive. In fact, it is even a premeasure.
Lemma 6.2. The interval function dened in (6.5) gives rise to a unique
-nite premeasure on the algebra / of nite unions of disjoint intervals.
Proof. It remains to verify -additivity. We need to show that for any
disjoint union
(
_
k
I
k
) =
k
(I
k
)
whenever I
k
/ and I =
k
I
k
/. Since each I
k
is a nite union of in-
tervals, we can as well assume each I
k
is just one interval (just split I
k
into
its subintervals and note that the sum does not change by additivity). Sim-
ilarly, we can assume that I is just one interval (just treat each subinterval
separately).
By additivity is monotone and hence
n
k=1
(I
k
) = (
n
_
k=1
I
k
) (I)
which shows
k=1
(I
k
) (I).
To get the converse inequality, we need to work harder:
We can cover each I
k
by some slightly larger open interval J
k
such that
(J
k
) (I
k
) +
2
k
. First suppose I is compact. Then nitely many of the
J
k
, say the rst n, cover I and we have
(I) (
n
_
k=1
J
k
)
n
k=1
(J
k
)
k=1
(I
k
) +.
100 6. Almost everything about Lebesgue integration
Since > 0 is arbitrary, this shows -additivity for compact intervals. By
additivity we can always add/subtract the endpoints of I and hence -
additivity holds for any bounded interval. If I is unbounded we can again
assume that it is closed by adding an endpoint if necessary. Then for any
x > 0 we can nd an n such that J
k
n
k=1
cover at least I [x, x] and
hence
k=1
(I
k
)
n
k=1
(I
k
)
n
k=1
(J
k
) ([x, x] I) .
Since x > 0 and > 0 are arbitrary, we are done.
In particular, this premeasure on the algebra of nite unions of intervals
which can be extended to a measure:
Theorem 6.3. For every nondecreasing function : R R there exists
a unique Borel measure which extends (6.5). Two dierent functions
generate the same measure if and only if the dierence is a constant away
from the discontinuities.
Since the proof of this theorem is rather involved, we defer it to the next
section and look at some examples rst.
Example. Suppose (x) = 0 for x < 0 and (x) = 1 for x 0. Then we
obtain the so-called Dirac measure at 0, which is given by (A) = 1 if
0 A and (A) = 0 if 0 , A.
Example. Suppose (x) = x. Then the associated measure is the ordinary
Lebesgue measure on R. We will abbreviate the Lebesgue measure of a
Borel set A by (A) = [A[.
A set A is called a support for if (XA) = 0. Since a support
is not unique (see the examples below) one also denes the support of
via
supp() = x X[(O) > 0 for every open neighborhood of x. (6.7)
Equivalently one obtains supp() by removing all points which have an
open neighborhood of measure zero. In particular, this shows that supp()
is closed. If X is second countable, then supp() is indeed a support for :
For every point x , supp() let O
x
be an open neighborhood of measure
zero. These sets cover X supp() and by the Lindelof theorem there is a
countable subcover, which shows that X supp() has measure zero.
Example. Let X = R, = B. The support of the Lebesgue measure
is all of R. However, every single point has Lebesgue measure zero and so
has every countable union of points (by -additivity). Hence any set whose
complement is countable is a support. There are even uncountable sets of
6.1. Borel measures in a nut shell 101
Lebesgue measure zero (see the Cantor set below) and hence a support might
even lack an uncountable number of points.
The support of the Dirac measure centered at 0 is the single point 0.
Any set containing 0 is a support of the Dirac measure.
In general, the support of a Borel measure on R is given by
supp(d) = x R[(x ) < (x +), > 0.
Here we have used d to emphasize that we are interested in the support
of the measure d which is dierent from the support of its distribution
function (x).
A property is said to hold -almost everywhere (a.e.) if it holds on a
support for or, equivalently, if the set where it does not hold is contained
in a set of measure zero.
Example. The set of rational numbers is countable and hence has Lebesgue
measure zero, (Q) = 0. So, for example, the characteristic function of the
rationals Q is zero almost everywhere with respect to Lebesgue measure.
Any function which vanishes at 0 is zero almost everywhere with respect
to the Dirac measure centered at 0.
Example. The Cantor set is an example of a closed uncountable set of
Lebesgue measure zero. It is constructed as follows: Start with C
0
= [0, 1]
and remove the middle third to obtain C
1
= [0,
1
3
][
2
3
, 1]. Next, again remove
the middle thirds of the remaining sets to obtain C
2
= [0,
1
9
] [
2
9
,
1
3
] [
2
3
,
7
9
]
[
8
9
, 1]:
C
0
C
1
C
2
C
3
.
.
.
Proceeding like this, we obtain a sequence of nesting sets C
n
and the limit
C =
n
C
n
is the Cantor set. Since C
n
is compact, so is C. Moreover,
C
n
consists of 2
n
intervals of length 3
n
, and thus its Lebesgue measure
is (C
n
) = (2/3)
n
. In particular, (C) = lim
n
(C
n
) = 0. Using the
ternary expansion, it is extremely simple to describe: C is the set of all
x [0, 1] whose ternary expansion contains no ones, which shows that C is
uncountable (why?). It has some further interesting properties: it is totally
disconnected (i.e., it contains no subintervals) and perfect (it has no isolated
points).
Problem 6.1. Find all algebras over X = 1, 2, 3.
102 6. Almost everything about Lebesgue integration
Problem 6.2. Take some set X. Show that / = A X[A or xA is nite
is an algebra. Show that = A X[A or xA is countable is a -algebra.
(Hint: To verify closedness under unions consider the cases were all sets are
nite and where one set has nite complement.)
Problem 6.3. Show that if X is nite, then every algebra is a -algebra.
Show that this is not true in general if X is countable.
Problem 6.4 (Vitali set). Call two numbers x, y [0, 1) equivalent if xy
is rational. Construct the set V by choosing one representative from each
equivalence class. Show that V cannot be measurable with respect to any
nontrivial nite translation invariant measure on [0, 1). (Hint: How can
you build up [0, 1) from translations of V ?)
6.2. Extending a premeasure to a measure
The purpose of this section is to prove Theorem 6.3. It is rather technical and can
be skipped on rst reading.
In order to prove Theorem 6.3, we need to show how a premeasure can
be extended to a measure. As a prerequisite we rst establish that it suces
to check increasing (or decreasing) sequences of sets when checking whether
a given algebra is in fact a -algebra:
A collection of sets / is called a monotone class if A
n
A implies
A / whenever A
n
/ and A
n
A implies A / whenever A
n
/.
Every -algebra is a monotone class and the intersection of monotone classes
is a monotone class. Hence every collection of sets S generates a smallest
monotone class /(S).
Theorem 6.4. Let / be an algebra. Then /(/) = (/).
Proof. We rst show that /= /(/) is an algebra.
Put M(A) = B /[A B /. If B
n
is an increasing sequence
of sets in M(A), then A B
n
is an increasing sequence in / and hence
n
(A B
n
) /. Now
A
_
_
n
B
n
_
=
_
n
(A B
n
)
shows that M(A) is closed under increasing sequences. Similarly, M(A) is
closed under decreasing sequences and hence it is a monotone class. But
does it contain any elements? Well, if A /, we have / M(A) implying
M(A) = / for A /. Hence AB / if at least one of the sets is in /.
But this shows / M(A) and hence M(A) = / for every A /. So /
is closed under nite unions.
6.2. Extending a premeasure to a measure 103
To show that we are closed under complements, consider M = A
/[XA /. If A
n
is an increasing sequence, then XA
n
is a decreasing
sequence and X
n
A
n
=
n
XA
n
/ if A
n
M and similarly for
decreasing sequences. Hence M is a monotone class and must be equal to
/ since it contains /.
So we know that / is an algebra. To show that it is a -algebra, let
A
n
/ be given and put
A
n
=
kn
A
n
/. Then
A
n
is increasing and
A
n
=
n
A
n
/.
The typical use of this theorem is as follows: First verify some property
for sets in an algebra /. In order to show that it holds for every set in (/),
it suces to show that the collection of sets for which it holds is closed under
countable increasing and decreasing sequences (i.e., is a monotone class).
As an application we show that a premeasure determines the correspond-
ing measure uniquely (if there is one at all):
Theorem 6.5 (Uniqueness of measures). Let be a -nite premeasure on
an algebra /. Then there is at most one extension to (/).
Proof. We rst assume that (X) < . Suppose there is another extension
and consider the set
S = A (/)[(A) = (A).
I claim S is a monotone class and hence S = (/) since / S by assump-
tion (Theorem 6.4).
Let A
n
A. If A
n
S, we have (A
n
) = (A
n
) and taking limits
(Theorem 6.1 (ii)), we conclude (A) = (A). Next let A
n
A and take
limits again. This nishes the nite case. To extend our result to the -nite
case, let X
j
X be an increasing sequence such that (X
j
) < . By the
nite case (A X
j
) = (A X
j
) (just restrict , to X
j
). Hence
(A) = lim
j
(A X
j
) = lim
j
(A X
j
) = (A)
and we are done.
So it remains to ensure that there is an extension at all. For any pre-
measure we dene
(A) = inf
_
n=1
(A
n
)
_
n=1
A
n
, A
n
/
_
(6.8)
where the inmum extends over all countable covers from /. Then the
function
() = 0,
104 6. Almost everything about Lebesgue integration
A
1
A
2
(A
1
)
(A
2
), and
n=1
A
n
)
n=1
(A
n
) (subadditivity).
Note that
be an outer measure.
Then the set of all sets A satisfying the Caratheodory condition
(E) =
(A E) +
(A
E), E X (6.9)
(where A
re-
stricted to this -algebra is a measure.
Proof. We rst show that is an algebra. It clearly contains X and is closed
under complements. Let A, B . Applying Caratheodorys condition
twice nally shows
(E) =
(A B E) +
(A
B E) +
(A B
E)
+
(A
E)
((A B) E) +
((A B)
E),
where we have used de Morgan and
(A B E) +
(A
B E) +
(A B
E)
((A B) E)
which follows from subadditivity and (A B) E = (A B E) (A
B E) (A B
kn
A
n
, A =
n
A
n
. Then for every set
E we have
(
A
n
E) =
(A
n
A
n
E) +
(A
A
n
E)
=
(A
n
E) +
(
A
n1
E)
= . . . =
n
k=1
(A
k
E).
Using
A
n
and monotonicity of
, we infer
(E) =
(
A
n
E) +
(
A
n
E)
k=1
(A
k
E) +
(A
E).
6.2. Extending a premeasure to a measure 105
Letting n and using subadditivity nally gives
(E)
k=1
(A
k
E) +
(A
E)
(A E) +
(B
E)
(E) (6.10)
and we infer that is a -algebra.
Finally, setting E = A in (6.10), we have
(A) =
k=1
(A
k
A) +
(A
A) =
k=1
(A
k
)
and we are done.
Remark: The constructed measure is complete; that is, for every
measurable set A of measure zero, every subset of A is again measurable
(Problem 6.7).
The only remaining question is whether there are any nontrivial sets
satisfying the Caratheodory condition.
Lemma 6.7. Let be a premeasure on / and let
n=1
(A
n
) =
n=1
(A
n
A) +
n=1
(A
n
A
(E A) +
(E A
)
since A
n
A / is a cover for E A and A
n
A
/ is a cover for E A
.
Taking the inmum, we have
(E)
(EA)+
(EA
), which nishes
the proof.
Concerning regularity we note:
Lemma 6.8. Suppose outer regularity (6.1) holds for every set in the alge-
bra, then is outer regular.
Proof. By assumption we can replace each set A
n
in (6.8) by a possibly
slightly larger open set and hene the inmum in (6.8) can be realized with
open sets.
Thus, as a consequence we obtain Theorem 6.3 except for regularity.
Outer regularity is easy to see for a nite union of intervlas since we can
replace each interval by a possibly slightly larger open interval with only
slightly larger measure. Inner regularity will be postponed until Lemma 6.13.
106 6. Almost everything about Lebesgue integration
Problem 6.5. Show that
k=1
for A
n
such that
(A
n
) =
2
n
+
k=1
(B
nk
) and note that B
nk
n,k=1
is a cover for
n
A
n
.)
Problem 6.6. Show that
Y
, since the collection of sets for which it holds forms a -algebra by
f
1
(Y A) = Xf
1
(A) and f
1
(
j
A
j
) =
j
f
1
(A
j
).
We will be mainly interested in the case where (Y,
Y
) = (R
n
, B
n
).
Lemma 6.9. A function f : X R
n
is measurable if and only if
f
1
(I) I =
n
j=1
(a
j
, ). (6.12)
In particular, a function f : X R
n
is measurable if and only if every
component is measurable and a complex-valued function f : X C
n
is
measurable if and only if both its real and imaginary parts are.
6.3. Measurable functions 107
Proof. We need to show that B
n
is generated by rectangles of the above
form. The -algebra generated by these rectangles also contains all open
rectangles of the form I =
n
j=1
(a
j
, b
j
), which form a base for the topology.
n
(n, +], = R
n
(n, +], (6.13)
we see that f : X R is measurable if and only if
f
1
((a, ]) a R. (6.14)
Again the intervals (a, ] can also be replaced by [a, ], [, a), or [, a].
Hence it is not hard to check that the previous lemma still holds if one
either avoids undened expressions of the type and 0 or makes
a denite choice, e.g., = 0 and 0 = 0.
Moreover, the set of all measurable functions is closed under all impor-
tant limiting operations.
Lemma 6.12. Suppose f
n
: X R is a sequence of measurable functions.
Then
inf
nN
f
n
, sup
nN
f
n
, liminf
n
f
n
, limsup
n
f
n
(6.15)
108 6. Almost everything about Lebesgue integration
are measurable as well.
Proof. It suces to prove that sup f
n
is measurable since the rest follows
from inf f
n
= sup(f
n
), liminf f
n
= sup
k
inf
nk
f
n
, and limsup f
n
=
inf
k
sup
nk
f
n
. But (sup f
n
)
1
((a, ]) =
n
f
1
n
((a, ]) are Borel and we
are done.
A few immediate consequences are worthwhile noting: It follows that
if f and g are measurable functions, so are min(f, g), max(f, g), [f[ =
max(f, f), and f
(x) (6.16)
is again lower semicontinuous. Similarly, f is is upper semicontinuous if
the set f
1
([, a)) is open for every a R. In this case the inmum
f(x) = inf
(x) (6.17)
is again upper semicontinuous.
Problem 6.9. Show that the supremum over lower semicontinuous func-
tions is again lower semicontinuous.
6.4. How wild are measurable objects
In this section we want to investigate how far mesurable objects are away
from well understood ones. As our rst task we want to show that measur-
able sets can be well approximated by using closed sets from the inside and
open sets from the outside in nice spaces like R
n
.
Lemma 6.13. Let X be a metric space and a Borel measure which is
nite on nite balls. Then is -nite and for every A B(X) and any
given > 0 there exists an open set O and a closed set F such that
F A O and (OF) . (6.18)
Proof. That is -nite is immediate from the denitions since for any
xed x
0
X the open balls X
n
= B
n
(x
0
) have nite measure and satisfy
X
n
X.
To see that (6.18) holds we begin with the case when is nite. Denote
by / the set of all Borel sets satisfying (6.18). Then / contains every closed
6.4. How wild are measurable objects 109
set F: Given F dene O
n
= x X[d(x, F) < 1/n and note that O
n
are
open sets which satisfy O
n
F. Thus by Theorem 6.1 (iii) (O
n
F) 0
and hence F /.
Moreover, / is even a -algebra. That it is closed under complements
is easy to see (note that
O = XF and
F = XO are the required sets
for
A = XA). To see that it is closed under countable unions consider
A =
n=1
A
n
with A
n
/. Then there are F
n
, O
n
such that (O
n
F
n
)
2
n1
. Now O =
n=1
O
n
is open and F =
N
n=1
F
n
is closed for any
nite N. Since (A) is nite we can choose N suciently large such that
(
N+1
F
n
) /2. Then we have found two sets of the required type:
(OF)
n=1
(O
n
F
n
) + (
N+1
F
n
) . Thus / is a -algebra
containing the open sets, hence it is the entire Borel algebra.
Now suppose is not nite. Pick some x
0
X and set X
0
= B
2/3
(x
0
)
and X
n
= B
n+2/3
(x
0
)B
n2/3
(x
0
), n N. Let A
n
= A X
n
and note that
A =
n=0
A
n
. By the nite case we can choose F
n
A
n
O
n
X
n
such
that (O
n
F
n
) 2
n1
. Now set F =
n
F
n
and O =
n
O
n
and observe
that F is closed. Indeed, let x F and let x
j
be some sequence from F
converging to x. Since d(x
0
, x
j
) d(x
0
, x) this sequence must eventually
lie in F
n
F
n+1
for some xed n implying x F
n
F
n+1
= F
n
F
n+1
F.
Finally, (OF)
n=0
(O
n
F
n
) as required.
This result immediately gives us outer regularity and, if we strengthen
our assumption, also inner regularity.
Corollary 6.14. Under the assumptions of the previous lemma
(A) = inf
OA,O open
(O) = sup
FA,F closed
(F) (6.19)
and is outer regular. If every nite ball in X is compact, then is also
inner regular.
(A) = sup
KA,K compact
(K). (6.20)
Proof. Finally, (6.19) follows from (A) = (O)(OA) = (F)+(AF)
and if every nite ball is compact, for every sequence of closed sets F
n
with (F
n
) (A) we also have compact sets K
n
= F
n
B
n
(x
0
) with
(K
n
) (A).
By the HeineBorel theorem every bounded closed ball in R
n
(or C
n
)
is compact and thus has nite measure by the very denition of a Borel
measure. Hence every Borel measure on R
n
(or C
n
) satises the assumptions
of Lemma 6.13.
110 6. Almost everything about Lebesgue integration
An inner regular measure on a Hausdor space which is locally nite
(every point has a neighborhood of nite measure) is called a Radon mea-
sure. Accordingly every Borel measure on R
n
(or C
n
) is automatically a
Radon measure.
Example. Since Lebesgue measure on R is regular, we can cover the rational
numbers by an open set of arbitrary small measure (it is also not hard to nd
such an set directely) but we cannot cover it be an open set of measure zero
(since any open sets contains an interval and hence has positive measure).
However, if we slightly extend the family of admissible sets, this will be
possible.
Looking at the Borel -algebra the next general sets after open sets are
countable intersections of open sets, known as G
sets
(here F and and for the french words ferme and somme, respectively).
Example. The irrational numbers are a G
set.
Corollary 6.15. A Borel set in R
n
is measurable if and only if it diers
from a G
n
O
n
. Then (GA) (O
n
A) 1/n for any n and thus
(GA) = 0 The second claim is analogous.
Problem 6.10. Show directly (without using regularity) that for every > 0
there is an open set O of Lebesgue measure [O[ < which covers the rational
numbers.
6.5. Integration Sum me up, Henri
Throughout this section (X, , ) will be a measure space. A measurable
function s : X R is called simple if its range is nite; that is, if
s =
p
j=1
j
A
j
, Ran(s) =
j
p
j=1
, A
j
= s
1
(
j
) . (6.21)
Here
A
is the characteristic function of A; that is,
A
(x) = 1 if x A
and
A
(x) = 0 otherwise. Note that the set of simple functions is a vector
6.5. Integration Sum me up, Henri 111
space and while there are dierent ways of writing a simple function as a
linear combination of characteristic functions, the representation (6.21) is
unique.
For a nonnegative simple function s as in (6.21) we dene its integral
as
_
A
s d =
p
j=1
j
(A
j
A). (6.22)
Here we use the convention 0 = 0.
Lemma 6.16. The integral has the following properties:
(i)
_
A
s d =
_
X
A
s d.
(ii)
_
j=1
A
j
s d =
j=1
_
A
j
s d, A
j
A
k
= for j ,= k.
(iii)
_
A
s d =
_
A
s d, 0.
(iv)
_
A
(s +t)d =
_
A
s d +
_
A
t d.
(v) A B
_
A
s d
_
B
s d.
(vi) s t
_
A
s d
_
A
t d.
Proof. (i) is clear from the denition. (ii) follows from -additivity of .
(iii) is obvious. (iv) Let s =
j
j
A
j
, t =
j
j
B
j
and abbreviate
C
jk
= (A
j
B
k
) A. Then, by (ii),
_
A
(s +t)d =
j,k
_
C
jk
(s +t)d =
j,k
(
j
+
k
)(C
jk
)
=
j,k
_
_
C
jk
s d +
_
C
jk
t d
_
=
_
A
s d +
_
A
t d.
(v) follows from monotonicity of . (vi) follows since by (iv) we can write
s =
j
j
C
j
, t =
j
j
C
j
where, by assumption,
j
j
.
Our next task is to extend this denition to arbitrary positive functions
by
_
A
f d = sup
sf
_
A
s d, (6.23)
where the supremum is taken over all simple functions s f. Note that,
except for possibly (ii) and (iv), Lemma 6.16 still holds for arbitrary non-
negative functions s, t.
Theorem 6.17 (Monotone convergence). Let f
n
be a monotone nondecreas-
ing sequence of nonnegative measurable functions, f
n
f. Then
_
A
f
n
d
_
A
f d. (6.24)
112 6. Almost everything about Lebesgue integration
Proof. By property (vi),
_
A
f
n
d is monotone and converges to some num-
ber . By f
n
f and again (vi) we have
_
A
f d.
To show the converse, let s be simple such that s f and let (0, 1). Put
A
n
= x A[f
n
(x) s(x) and note A
n
A (show this). Then
_
A
f
n
d
_
An
f
n
d
_
An
s d.
Letting n , we see
_
A
s d.
Since this is valid for every < 1, it still holds for = 1. Finally, since
s f is arbitrary, the claim follows.
In particular
_
A
f d = lim
n
_
A
s
n
d, (6.25)
for every monotone sequence s
n
f of simple functions. Note that there is
always such a sequence, for example,
s
n
(x) =
n2
n
k=0
k
2
n
f
1
(A
k
)
(x), A
k
= [
k
2
n
,
k + 1
2
n
), A
n2
n = [n, ). (6.26)
By construction s
n
converges uniformly if f is bounded, since s
n
(x) = n if
f(x) n and f(x) s
n
(x) <
1
2
n
if f(x) n.
Now what about the missing items (ii) and (iv) from Lemma 6.16? Since
limits can be spread over sums, the extension is linear (i.e., item (iv) holds)
and (ii) also follows directly from the monotone convergence theorem. We
even have the following result:
Lemma 6.18. If f 0 is measurable, then d = f d dened via
(A) =
_
A
f d (6.27)
is a measure such that
_
g d =
_
gf d. (6.28)
Proof. As already mentioned, additivity of is equivalent to linearity of the
integral and -additivity follows from the monotone convergence theorem:
(
_
n=1
A
n
) =
_
(
n=1
An
)f d =
n=1
_
An
f d =
n=1
(A
n
).
6.5. Integration Sum me up, Henri 113
The second claim holds for simple functions and hence for all functions by
construction of the integral.
If f
n
is not necessarily monotone, we have at least
Theorem 6.19 (Fatous lemma). If f
n
is a sequence of nonnegative mea-
surable function, then
_
A
liminf
n
f
n
d liminf
n
_
A
f
n
d. (6.29)
Proof. Set g
n
= inf
kn
f
k
. Then g
n
f
n
implying
_
A
g
n
d
_
A
f
n
d.
Now take the liminf on both sides and note that by the monotone conver-
gence theorem
liminf
n
_
A
g
n
d = lim
n
_
A
g
n
d =
_
A
lim
n
g
n
d =
_
A
liminf
n
f
n
d,
proving the claim.
If the integral is nite for both the positive and negative part f
=
max(f, 0) of an arbitrary measurable function f, we call f integrable
and set
_
A
f d =
_
A
f
+
d
_
A
f
d. (6.30)
The set of all integrable functions is denoted by /
1
(X, d).
Lemma 6.20. Lemma 6.16 holds for integrable functions s, t.
Similarly, we handle the case where f is complex-valued by calling f
integrable if both the real and imaginary part are and setting
_
A
f d =
_
A
Re(f)d + i
_
A
Im(f)d. (6.31)
Clearly f is integrable if and only if [f[ is.
Lemma 6.21. For all integrable functions f, g we have
[
_
A
f d[
_
A
[f[ d (6.32)
and (triangle inequality)
_
A
[f +g[ d
_
A
[f[ d +
_
A
[g[ d. (6.33)
114 6. Almost everything about Lebesgue integration
Proof. Put =
z
|z|
, where z =
_
A
f d (without restriction z ,= 0). Then
[
_
A
f d[ =
_
A
f d =
_
A
f d =
_
A
Re(f) d
_
A
[f[ d,
proving the rst claim. The second follows from [f +g[ [f[ +[g[.
Lemma 6.22. Let f be measurable. Then
_
X
[f[ d = 0 f(x) = 0 a.e. (6.34)
Moreover, suppose f is nonnegative or integrable. Then
(A) = 0
_
A
f d = 0. (6.35)
Proof. Observe that we have A = x[f(x) ,= 0 =
n
A
n
, where A
n
=
x[ [f(x)[ >
1
n
. If
_
[f[d = 0 we must have (A
n
) = 0 for every n and
hence (A) = lim
n
(A
n
) = 0.
Conversely we have
_
X
[f[d =
_
A
[f[d = 0 where A (dened as before)
satises (A) = 0. By our convention 0 = 0 the claim holds for any
simple function and hence for any f by denition of the integral (6.23).
Since any function can be written as a linear combination of four non-
negative functions this also implies the last part.
Note that the proof also shows that if f is not 0 almost everywhere,
there is an > 0 such that (x[ [f(x)[ ) > 0.
In particular, the integral does not change if we restrict the domain of
integration to a support of or if we change f on a set of measure zero. In
particular, functions which are equal a.e. have the same integral.
Finally, our integral is well behaved with respect to limiting operations.
Theorem 6.23 (Dominated convergence). Let f
n
be a convergent sequence
of measurable functions and set f = lim
n
f
n
. Suppose there is an inte-
grable function g such that [f
n
[ g. Then f is integrable and
lim
n
_
f
n
d =
_
fd. (6.36)
Proof. The real and imaginary parts satisfy the same assumptions and so
do the positive and negative parts. Hence it suces to prove the case where
f
n
and f are nonnegative.
By Fatous lemma
liminf
n
_
A
f
n
d
_
A
f d
6.5. Integration Sum me up, Henri 115
and
liminf
n
_
A
(g f
n
)d
_
A
(g f)d.
Subtracting
_
A
g d on both sides of the last inequality nishes the proof
since liminf(f
n
) = limsup f
n
.
Remark: Since sets of measure zero do not contribute to the value of the
integral, it clearly suces if the requirements of the dominated convergence
theorem are satised almost everywhere (with respect to ).
Example. Note that the existence of g is crucial: The functions f
n
(x) =
1
2n
[n,n]
(x) on R converge uniformly to 0 but
_
R
f
n
(x)dx = 1.
Example. If (x) = (x) is the Dirac measure at 0, then
_
R
f(x)d(x) = f(0).
In fact, the integral can be restricted to any support and hence to 0.
If (x) =
n
(xx
n
) is a sum of Dirac measures, (x) centered at
x = 0, then (Problem 6.11)
_
R
f(x)d(x) =
n
f(x
n
).
Hence our integral contains sums as special cases.
Problem 6.11. Consider a countable set of measures
n
and numbers
n
0. Let =
n
and show
_
A
f d =
n
_
A
f d
n
(6.37)
for any measurable function which is either nonegative or integrable.
Problem 6.12. Show that the set B(X) of bounded measurable functions
with the sup norm is a Banach space. Show that the set S(X) of simple
functions is dense in B(X). Show that the integral is a bounded linear
functional on B(X) if (X) < . (Hence Theorem 1.29 could be used to
extend the integral from simple to bounded measurable functions.)
Problem 6.13. Show that the dominated convergence theorem implies (un-
der the same assumptions)
lim
n
_
[f
n
f[d = 0.
116 6. Almost everything about Lebesgue integration
Problem 6.14. Let X R, Y be some measure space, and f : XY R.
Suppose y f(x, y) is measurable for every x and x f(x, y) is continuous
for every y. Show that
F(x) =
_
A
f(x, y) d(y) (6.38)
is continuous if there is an integrable function g(y) such that [f(x, y)[ g(y).
Problem 6.15. Let X R, Y be some measure space, and f : XY R.
Suppose y f(x, y) is measurable for all x and x f(x, y) is dierentiable
for a.e. y. Show that
F(x) =
_
A
f(x, y) d(y) (6.39)
is dierentiable if there is an integrable function g(y) such that [
x
f(x, y)[
g(y). Moreover, y
x
f(x, y) is measurable and
F
(x) =
_
A
x
f(x, y) d(y) (6.40)
in this case.
6.6. Transformation of measures and integrals
Finally we want to transform measures. Let f : X Y be a measurable
function. Given a measure on X we can introduce a measure f
on Y
via
(f
)(A) = (f
1
(A)). (6.41)
It is straightforward to check that f
) =
_
X
g f d. (6.42)
Proof. In fact, it suces to check this formula for simple functions g, which
follows since
A
f =
f
1
(A)
.
For the rest of this section we will look at Borel measures on R. Recall
that we denote the corresponding (right continuous) distribution function,
dened via (6.3), by the same letter.
For any nondecreasing function f : R R we will dene its (right
continuous) generalized inverse as
f
1
(y) = inf f
1
((y, )). (6.43)
6.6. Transformation of measures and integrals 117
We will need the set
L(f) = y[f
1
((y, )) = (f
1
(y), ). (6.44)
Note that y , L(f) if and only if there is some x such that y [f(x), f(x)).
Lemma 6.25. Let (x) be a nondecreasing right continuous function on R
and its associated measure. Let f(x) be a nondecreasing function on R
such that ((0, )) < if f is bounded above and ((, 0)) < if f is
bounded below.
Then f
)((y
0
, y
1
]) = (f
1
((y
0
, y
1
])) = ((f
1
(y
0
), f
1
(y
1
)])
= (f
1
(y
1
)) (f
1
(y
0
)) = ( f
1
)(y
1
) ( f
1
)(y
0
)
if y
j
is either in L(f) or satises (f
1
(y
j
)) = 0. For the last claim observe
that f
1
((y, )) will jump from (f
1
(y), ) to [f
1
(y), ) at y = f(x).
Example. For example, consider f(x) =
[0,)
(x) and = , the Dirac
measure centered at 0 (note that (x) = f(x)). Then
f
1
(y) =
_
_
+, 1 y,
0, 0 y < 1,
, y < 0,
and L(f) = (, 0)[1, ). Moreover, (f
1
(y)) =
[0,)
(y) and (f
)(y) =
[1,)
(y). If we choose g(x) =
(0,)
(x), then g
1
(y) = f
1
(y) and L(g) =
R. Hence (g
1
(y)) =
[0,)
(y) = (g
)(y).
For later use it is worth while to single out the following consequence:
Corollary 6.26. Let , f be as in the previous lemma. Then, if d(f
) =
d(f
1
) if f is left continuous or is continuous. Similarly, if is assumed
to be left instead of right continuous, then the claim holds if f is right instead
of left continuous and the left continuous version of the generalized inverse
is chosen.
One might suspect that one can always get equality in the above corollary
by choosing a dierent version for the generalized inverse of f. However,
118 6. Almost everything about Lebesgue integration
since the generalized inverse of f looses the information about the value of
f at a discontinuity, this is not possible.
If we choose f to be the distribution function of we get the following
generalization of the integration by substitution rule. To formulate it
we introduce
i
(y) = (
1
(y)). (6.45)
By Ran() we will denote the range of a function and by hull(Ran()) its
convex hull.
Corollary 6.27. Suppose , are two nondecreasing functions on R with
either right continuous or continuous. Then we have
_
R
(g ) d( ) =
_
hull(Ran())
(g i
)d (6.46)
for any Borel function g which is either nonnegative or for which one of the
two integrals is nite. Similarly, if is left continuous and i
is replaced by
(
1
(y+)).
In particular, note that
_
R
(g ) d( ) =
_
Ran()
g d, (x) = 0 x. (6.47)
Proof. First of all we can assume that is supported on hull(Ran())
without loss of generality. Moreover, abbreviate f(y) =
1
(y). Then
by Corollary 6.27 and Theorem 6.24
_
(g ) d( ) =
_
(g ) df
=
_
(g f) d.
(y) y if y =
f(x) = f(x+) and i
Problem 6.16. Show that the left continuous version of the generalized
inverse is given by
f
1
(y) = inf f
1
([y, )).
Problem 6.17. Recall that a (binary) relation R on R is a subset of R
2
.
Given two relations R, S one denes R
1
= (y, x)[(x, y) R and S R =
(x, z)[(x, y) R and (y, z) S for some y.
For a nondecreasing function f introduce the relation
(f) = (x, y)[y [f(x), f(x+)]. (6.49)
Note that (f) does not depend on the values of f at a discontinuity and f
can be partially recovered from (f) using f(x) = inf (f)(x) and f(x+) =
sup (f)(x), where (f)(x) = y[(x, y) (f) = [f(x), f(x+)].
Show that (f
1
) = (f)
1
. Conclude that (f
1
)
1
(x) = f(x+). Show
that (f) (f
1
) = (y, z)[y, z [f(x), f(x+)] for some x and con-
clude that f(f
1
(y)) y if f(x) = f(x) for the largest x with y
[f(x), f(x+)] and f(f
1
(y)) y if f(x) = f(x+) for the smallest x with
y [f(x), f(x+)].
Problem 6.18. Let d() =
[0,1]
()d and f() =
(t,)
(), t R.
Compute f
.
Problem 6.19. Let be Lebesgue measure on R. Show that if f C
1
(R)
with f
> 0, then
d(f
)(x) =
1
f
(f
1
(x))
dx.
6.7. Product measures
Let
1
and
2
be two measures on
1
and
2
, respectively. Let
1
2
be
the -algebra generated by rectangles of the form A
1
A
2
.
Example. Let B be the Borel sets in R. Then B
2
= BB are the Borel
sets in R
2
(since the rectangles are a basis for the product topology).
Any set in
1
2
has the section property; that is,
Lemma 6.29. Suppose A
1
2
. Then its sections
A
1
(x
2
) = x
1
[(x
1
, x
2
) A and A
2
(x
1
) = x
2
[(x
1
, x
2
) A (6.50)
are measurable.
120 6. Almost everything about Lebesgue integration
Proof. Denote all sets A
1
2
with the property that A
1
(x
2
)
1
by
S. Clearly all rectangles are in S and it suces to show that S is a -algebra.
Now, if A S, then (A
)
1
(x
2
) = (A
1
(x
2
))
2
and thus S is closed under
complements. Similarly, if A
n
S, then (
n
A
n
)
1
(x
2
) =
n
(A
n
)
1
(x
2
) shows
that S is closed under countable unions.
This implies that if f is a measurable function on X
1
X
2
, then f(., x
2
) is
measurable on X
1
for every x
2
and f(x
1
, .) is measurable on X
2
for every x
1
(observe A
1
(x
2
) = x
1
[f(x
1
, x
2
) B, where A = (x
1
, x
2
)[f(x
1
, x
2
) B).
Given two measures
1
on
1
and
2
on
2
, we now want to construct
the product measure
1
2
on
1
2
such that
2
(A
1
A
2
) =
1
(A
1
)
2
(A
2
), A
j
j
, j = 1, 2. (6.51)
Theorem 6.30. Let
1
and
2
be two -nite measures on
1
and
2
,
respectively. Let A
1
2
. Then
2
(A
2
(x
1
)) and
1
(A
1
(x
2
)) are mea-
surable and
_
X
1
2
(A
2
(x
1
))d
1
(x
1
) =
_
X
2
1
(A
1
(x
2
))d
2
(x
2
). (6.52)
Proof. Let S be the set of all subsets for which our claim holds. Note that S
contains at least all rectangles. It even contains the algebra of nite disjoint
unions of rectangles (Problem 6.20). Thus it suces to show that S is a
monotone class by Theorem 6.4. If
1
and
2
are nite, measurability and
equality of both integrals follow from the monotone convergence theorem for
increasing sequences of sets and from the dominated convergence theorem
for decreasing sequences of sets.
If
1
and
2
are -nite, let X
i,j
X
i
with
i
(X
i,j
) < for i = 1, 2.
Now
2
((A X
1,j
X
2,j
)
2
(x
1
)) =
2
(A
2
(x
1
) X
2,j
)
X
1,j
(x
1
) and similarly
with 1 and 2 exchanged. Hence by the nite case
_
X
1
2
(A
2
X
2,j
)
X
1,j
d
1
=
_
X
2
1
(A
1
X
1,j
)
X
2,j
d
2
(6.53)
and the -nite case follows from the monotone convergence theorem.
Hence for given A
1
2
we can dene
2
(A) =
_
X
1
2
(A
2
(x
1
))d
1
(x
1
) =
_
X
2
1
(A
1
(x
2
))d
2
(x
2
) (6.54)
or equivalently, since
A
1
(x
2
)
(x
1
) =
A
2
(x
1
)
(x
2
) =
A
(x
1
, x
2
),
2
(A) =
_
X
1
__
X
2
A
(x
1
, x
2
)d
2
(x
2
)
_
d
1
(x
1
)
=
_
X
2
__
X
1
A
(x
1
, x
2
)d
1
(x
1
)
_
d
2
(x
2
). (6.55)
6.7. Product measures 121
Then
1
2
gives rise to a measure on A
1
2
since -additivity
follows from the monotone convergence theorem.
Note that (6.51) uniquely denes
1
2
as a -nite premeasure on
the algebra of nite disjoint unions of rectangles. Hence by Theorem 6.5 it
is the only measure on
1
2
satisfying (6.51).
Finally we have
Theorem 6.31 (Fubini). Let f be a measurable function on X
1
X
2
and
let
1
,
2
be -nite measures on X
1
, X
2
, respectively.
(i) If f 0, then
_
f(., x
2
)d
2
(x
2
) and
_
f(x
1
, .)d
1
(x
1
) are both
measurable and
__
f(x
1
, x
2
)d
1
2
(x
1
, x
2
) =
_ __
f(x
1
, x
2
)d
1
(x
1
)
_
d
2
(x
2
)
=
_ __
f(x
1
, x
2
)d
2
(x
2
)
_
d
1
(x
1
). (6.56)
(ii) If f is complex, then
_
[f(x
1
, x
2
)[d
1
(x
1
) /
1
(X
2
, d
2
), (6.57)
respectively,
_
[f(x
1
, x
2
)[d
2
(x
2
) /
1
(X
1
, d
1
), (6.58)
if and only if f /
1
(X
1
X
2
, d
1
d
2
). In this case (6.56)
holds.
Proof. By Theorem 6.30 and linearity the claim holds for simple functions.
To see (i), let s
n
f be a sequence of nonnegative simple functions. Then it
follows by applying the monotone convergence theorem (twice for the double
integrals).
For (ii) we can assume that f is real-valued by considering its real and
imaginary parts separately. Moreover, splitting f = f
+
f
2
.
Proof. Outer regularity holds for every rectangle and hence also for the
algebra of nite disjoint unions of rectangles. Thus the claim follows from
Lemma 6.8.
Finally, note that we can iterate this procedure.
122 6. Almost everything about Lebesgue integration
Lemma 6.33. Suppose (X
j
,
j
,
j
), j = 1, 2, 3, are -nite measure spaces.
Then (
1
2
)
3
=
1
(
2
3
) and
(
1
2
)
3
=
1
(
2
3
). (6.59)
Proof. First of all note that (
1
2
)
3
=
1
(
2
3
) is the sigma
algebra generated by the rectangles A
1
A
2
A
3
in X
1
X
2
X
3
. Moreover,
since
((
1
2
)
3
)(A
1
A
2
A
3
) =
1
(A
1
)
2
(A
2
)
3
(A
3
)
= (
1
(
2
3
))(A
1
A
2
A
3
),
the two measures coincide on the algebra of nite disjoint unions of rectan-
gles. Hence they coincide everywhere by Theorem 6.5.
Example. If is Lebesgue measure on R, then
n
= is Lebesgue
measure on R
n
. Since is outer regular, so is
n
. Of course regularity also
follows from Corollary 6.14.
Problem 6.20. Show that the set of all nite union of rectangles A
1
A
2
forms an algebra.
Problem 6.21. Let U C be a domain, Y be some measure space, and f :
U Y R. Suppose y f(z, y) is measurable for every z and z f(z, y)
is holomorphic for every y. Show that
F(z) =
_
A
f(z, y) d(y) (6.60)
is holomorphic if for every compact subset V U there is an integrable
function g(y) such that [f(z, y)[ g(y), z V . (Hint: Use Fubini and
Morera.)
6.8. Appendix: The connection with the Riemann integral
In this section we want to investigate the connection with the Riemann
integral. We restrict our attention to compact intervals [a, b] and bounded
real-valued functions f. A partition of [a, b] is a nite set P = x
0
, . . . , x
n
with
a = x
0
< x
1
< < x
n1
< x
n
= b. (6.61)
The number
|P| = min
1jn
x
j
x
j1
(6.62)
6.8. Appendix: The connection with the Riemann integral 123
is called the norm of P. Given a partition P and a bounded real-valued
function f we can dene
s
P,f,
(x) =
n
j=1
m
j
[x
j1
,x
j
)
(x), m
j
= inf
x[x
j1
,x
j
)
f(x), (6.63)
s
P,f,+
(x) =
n
j=1
M
j
[x
j1
,x
j
)
(x), M
j
= sup
x[x
j1
,x
j
)
f(x), (6.64)
Hence s
P,f,
(x) is a step function approximating f from below and s
P,f,+
(x)
is a step function approximating f from above. In particular,
m s
P,f,
(x) f(x) s
P,f,
(x) M, m = inf
x[a,b]
f(x), M = sup
x[a,b]
f(x).
(6.65)
Moreover, we can dene the upper and lower Riemann sum associated with
P as
L(P, f) =
n
j=1
m
j
(x
j
x
j1
), U(P, f) =
n
j=1
M
j
(x
j
x
j1
). (6.66)
Of course, L(f, P) is just the Lebesgue integral of s
P,f,
and U(f, P) is the
Lebesgue integral of s
P,f,+
. In particular, L(P, f) approximates the area
under the graph of f from below and U(P, f) approximates this area from
above.
By the above inequality
m(b a) L(P, f) U(P, f) M (b a). (6.67)
We say that the partition P
2
is a renement of P
1
if P
1
P
2
and it is not
hard to check, that in this case
s
P
1
,f,
(x) s
P
2
,f,
(x) f(x) s
P
2
,f,+
(x) s
P
1
,f,+
(x) (6.68)
as well as
L(P
1
, f) L(P
2
, f) U(P
2
, f) U(P
1
, f). (6.69)
Hence we dene the lower, upper Riemann integral of f as
_
f(x)dx = sup
P
L(P, f),
_
f(x)dx = inf
P
U(P, f), (6.70)
respectively. Clearly
m(b a)
_
f(x)dx
_
f(x)dx M (b a). (6.71)
We will call f Riemann integrable if both values coincide and the common
value will be called the Riemann integral of f.
124 6. Almost everything about Lebesgue integration
Example. Let [a, b] = [0, 1] and f(x) =
Q
(x). Then s
P,f,
(x) = 0 and
s
P,f,+
(x) = 1. Hence
_
f(x)dx = 0 and
_
f(x)dx = 1 and f is not Riemann
integrable.
On the other hand, every continuous function f C[a, b] is Riemann
integrable (Problem 6.22).
Lemma 6.34. A function f is Riemann integrable if and only if there exists
a sequence of partitions P
j
such that
lim
j
L(P
j
, f) = lim
j
U(P
j
, f). (6.72)
In this case the above limit equals the Riemann integral of f and P
j
can be
chosen such that P
j
P
j+1
and |P
j
| 0.
Proof. If there is such a sequence of partitions then f is integrable by
lim
j
L(P
j
, f) sup
P
L(P, f) inf
P
U(P, f) lim
j
U(P
j
, f).
Conversely, given an integrable f, there is a sequence of partitions P
L,j
such that
_
f(x)dx = lim
j
L(P
L,j
, f) and a sequence P
U,j
such that
_
f(x)dx =
lim
j
L(P
U,j
, f). By (6.69) the common renement P
j
= P
L,j
P
U,j
is the
partition we are looking for. Since, again by (6.69), any renement will also
work, the last claim follows.
With the help of this lemma we can give a characterization of Riemann
integrable functions and show that the Riemann integral coincides with the
Lebesgue integral.
Theorem 6.35 (Lebesgue). A bounded function f : [a, b] R is Riemann
integrable if and only if the set of its discontinuities is of Lebesgue measure
zero. Moreover, in this case the Riemann and the Lebesgue integral of f
coincide.
Proof. Suppose f is Riemann integrable and let P
j
be a sequence of parti-
tions as in Lemma 6.34. Then s
f,P
j
,
(x) will be monotone and hence con-
verge to some function s
f,
(x) f(x). Similarly, s
f,P
j
,+
(x) will converge to
some function s
f,+
(x) f(x). Moreover, by dominated convergence
0 = lim
j
_
_
s
f,P
j
,+
(x) s
f,P
j
,
(x)
_
dx =
_
_
s
f,+
(x) s
f,
(x)
_
dx
and thus by Lemma 6.22 s
f,+
(x) = s
f,
(x) almost everywhere. Moreover,
f is continuous at every x at which equality holds and which is not in any
of the partitions. Since the rst as well as the second set have Lebesgue
measure zero, f is continuous almost everywhere and
lim
j
L(P
j
, f) = lim
j
U(P
j
, f) =
_
s
f,
(x)dx =
_
f(x)dx.
6.8. Appendix: The connection with the Riemann integral 125
Conversely, let f be continuous almost everywhere and choose some sequence
of partitions P
j
with |P
j
| 0. Then at every x where f is continuous we
have lim
j
s
f,P
j
,
(x) = f(x) implying
lim
j
L(P
j
, f) =
_
s
f,
(x)dx =
_
f(x)dx =
_
s
f,+
(x)dx = lim
j
U(P
j
, f)
by the dominated convergence theorem.
Problem 6.22. Show that for any function f C[a, b] we have
lim
P0
L(P, f) = lim
P0
U(P, f).
In particular, f is Riemann integrable.
Chapter 7
The Lebesgue spaces
L
p
7.1. Functions almost everywhere
We x some -nite measure space (X, , ) and dene the L
p
norm by
|f|
p
=
__
X
[f[
p
d
_
1/p
, 1 p, (7.1)
and denote by /
p
(X, d) the set of all complex-valued measurable functions
for which |f|
p
is nite. First of all note that /
p
(X, d) is a linear space,
since [f + g[
p
2
p
max([f[, [g[)
p
= 2
p
max([f[
p
, [g[
p
) 2
p
([f[
p
+ [g[
p
). Of
course our hope is that /
p
(X, d) is a Banach space. However, Lemma 6.22
implies that there is a small technical problem (recall that a property is said
to hold almost everywhere if the set where it fails to hold is contained in a
set of measure zero):
Lemma 7.1. Let f be measurable. Then
_
X
[f[
p
d = 0 (7.2)
if and only if f(x) = 0 almost everywhere with respect to .
Thus |f|
p
= 0 only implies f(x) = 0 for almost every x, but not for all!
Hence |.|
p
is not a norm on /
p
(X, d). The way out of this misery is to
identify functions which are equal almost everywhere: Let
A(X, d) = f[f(x) = 0 -almost everywhere. (7.3)
127
128 7. The Lebesgue spaces L
p
Then A(X, d) is a linear subspace of /
p
(X, d) and we can consider the
quotient space
L
p
(X, d) = /
p
(X, d)/A(X, d). (7.4)
If d is the Lebesgue measure on X R
n
, we simply write L
p
(X). Observe
that |f|
p
is well-dened on L
p
(X, d).
Even though the elements of L
p
(X, d) are, strictly speaking, equiva-
lence classes of functions, we will still call them functions for notational
convenience. However, note that for f L
p
(X, d) the value f(x) is not
well dened (unless there is a continuous representative and dierent con-
tinuous functions are in dierent equivalence classes, e.g., in the case of
Lebesgue measure).
With this modication we are back in business since L
p
(X, d) turns out
to be a Banach space. We will show this in the following sections. Moreover,
note that L
2
(X, d) is a Hilbert space with scalar product given by
f, g =
_
X
f(x)
g(x)d(x). (7.5)
But before that let us also dene L
(X, d) L
p
(X, d) and
lim
p
|f|
p
= |f|
, f L
(X, d).
7.2. Jensen Holder Minkowski 129
Problem 7.3. Construct a function f L
p
(0, 1) which has a singularity at
every rational number in [0, 1] (such that the essential supremum is innite
on every open subinterval). (Hint: Start with the function f
0
(x) = [x[
(x) = lim
0
(x)(x)
(x)
+
(x). The
inequality is strict for y ,= x if is strictly convex.
Proof. Abbreviate D(x, y) = D(y, x) =
(y)(x)
yx
and observe that (7.10)
implies
D(x, z) D(y, z) for x < y.
Hence
(x)
+
(x)
(y)
+
(y) for x < y.
So (ii) follows after observing that a monotone function can have at most a
countable number of jumps. Next
+
(x) D(y, x)
(y)
shows (y) (x)+
(x)
is monotone nondecreasing (e.g.,
0 if C
2
).
With these preparations out of the way we can show
Theorem 7.3 (Jensens inequality). Let : (a, b) R be convex (a =
or b = being allowed). Suppose is a nite measure satisfying (X) = 1
and f /
1
(X, d) with a < f(x) < b. Then the negative part of f is
integrable and
(
_
X
f d)
_
X
( f) d. (7.11)
For 0 nondecreasing and f 0 the requirement that f is integrable can
be dropped if (b) is understood as lim
xb
(x).
Proof. By (iii) of the previous lemma we have
(f(x)) (I) +(f(x) I), I =
_
X
f d (a, b).
This shows that the negative part of f is integrable and integrating
over X nishes the proof in the case f /
1
. If f 0 we note that for
X
n
= x X[f(x) n the rst part implies
(
1
(X
n
)
_
Xn
f d)
1
(X
n
)
_
Xn
(f) d.
Taking n the claim follows from X
n
X and the monotone conver-
gence theorem.
Observe that if is strictly convex, then equality can only occur if f is
constant.
Now we are ready to prove
7.2. Jensen Holder Minkowski 131
Theorem 7.4 (Holders inequality). Let p and q be dual indices; that is,
1
p
+
1
q
= 1 (7.12)
with 1 p . If f L
p
(X, d) and g L
q
(X, d), then fg L
1
(X, d)
and
|f g|
1
|f|
p
|g|
q
. (7.13)
Proof. The case p = 1, q = (respectively p = , q = 1) follows directly
from the properties of the integral and hence it remains to consider 1 <
p, q < .
First of all it is no restriction to assume |g|
q
= 1. Let A = x[ [g(x)[ >
0, then (note (1 q)p = q)
|f g|
p
1
=
_
A
[f[ [g[
1q
[g[
q
d
_
A
([f[ [g[
1q
)
p
[g[
q
d =
_
A
[f[
p
d |f|
p
p
,
where we have used Jensens inequality with (x) = [x[
p
applied to the
function h = [f[ [g[
1q
and measure d = [g[
q
d (note (X) =
_
[g[
q
d =
|g|
q
q
= 1).
Note that in the special case p = 2 we have q = 2 and Holders inequality
reduces to the CauchySchwarz ineuality.
As a consequence we also get
Theorem 7.5 (Minkowskis inequality). Let f, g L
p
(X, d). Then
|f +g|
p
|f|
p
+|g|
p
. (7.14)
Proof. Since the cases p = 1, are straightforward, we only consider 1 <
p < . Using [f + g[
p
[f[ [f + g[
p1
+ [g[ [f + g[
p1
, we obtain from
Holders inequality (note (p 1)q = p)
|f +g|
p
p
|f|
p
|(f +g)
p1
|
q
+|g|
p
|(f +g)
p1
|
q
= (|f|
p
+|g|
p
)|(f +g)|
p1
p
.
k=1
x
k
k
n
k=1
k
x
k
, if
n
k=1
k
= 1, (7.15)
for
k
> 0, x
k
> 0. (Hint: Take a sum of Dirac-measures and use that the
exponential function is convex.)
132 7. The Lebesgue spaces L
p
Problem 7.5. Show the following generalization of Holders inequality:
|f g|
r
|f|
p
|g|
q
,
1
p
+
1
q
=
1
r
. (7.16)
Problem 7.6. Show the iterated Holders inequality:
|f
1
f
m
|
r
m
j=1
|f
j
|
p
j
,
1
p
1
+ +
1
p
m
=
1
r
. (7.17)
Problem 7.7. Show that
|u|
p
0
(X)
1
p
0
1
p
|u|
p
, 1 p
0
p.
(Hint: Holders inequality.)
Problem 7.8 (Lyapunov inequality). Let 0 < < 1. Show that if f
L
p
1
L
p
2
, then f L
p
and
|f|
p
|f|
p
1
|f|
1
p
2
, (7.18)
where
1
p
=
p
1
+
1
p
2
.
7.3. Nothing missing in L
p
Finally it remains to show that L
p
(X, d) is complete.
Theorem 7.6 (RieszFischer). The space L
p
(X, d), 1 p , is a
Banach space.
Proof. We begin with the case 1 p < . Suppose f
n
is a Cauchy
sequence. It suces to show that some subsequence converges (show this).
Hence we can drop some terms such that
|f
n+1
f
n
|
p
1
2
n
.
Now consider g
n
= f
n
f
n1
(set f
0
= 0). Then
G(x) =
k=1
[g
k
(x)[
is in L
p
. This follows from
_
_
_
n
k=1
[g
k
[
_
_
_
p
k=1
|g
k
|
p
|f
1
|
p
+ 1
using the monotone convergence theorem. In particular, G(x) < almost
everywhere and the sum
n=1
g
n
(x) = lim
n
f
n
(x)
7.3. Nothing missing in L
p
133
is absolutely convergent for those x. Now let f(x) be this limit. Since
[f(x) f
n
(x)[
p
converges to zero almost everywhere and [f(x) f
n
(x)[
p
(2G(x))
p
L
1
, dominated convergence shows |f f
n
|
p
0.
In the case p = note that the Cauchy sequence property [f
n
(x)
f
m
(x)[ < for n, m > N holds except for sets A
m,n
of measure zero. Since
A =
n,m
A
n,m
is again of measure zero, we see that f
n
(x) is a Cauchy
sequence for x XA. The pointwise limit f(x) = lim
n
f
n
(x), x XA,
is the required limit in L
j=1
O
j
with
O
j
B. Moreover, by considering the set of all nite
unions of elements from B, it is no restriction to assume
n
j=1
O
j
B. Hence
there is an increasing sequence
O
n
O with
O
n
B. By monotone con-
vergence, |
O
On
|
p
0 and hence the set of all characteristic functions
O
with
O B is total.
To nish this chapter, let us show that continuous functions are dense
in L
p
.
134 7. The Lebesgue spaces L
p
Theorem 7.9. Let X be a locally compact metric space and let be a regular
Borel measure. Then the set C
c
(X) of continuous functions with compact
support is dense in L
p
(X, d), 1 p < .
Proof. As in the previous proof the set of all characteristic functions
K
(x)
with K compact is total (using inner regularity). Hence it suces to show
that
K
(x) can be approximated by continuous functions. By outer regu-
larity there is an open set O K such that (OK) . By Urysohns
lemma (Lemma 1.16) there is a continuous function with compact support
f
[
p
d =
_
O\K
[f
[
p
d (OK) ,
we have |f
K
| 0 and we are done.
If X is some subset of R
n
, we can do even better. A nonnegative function
u C
c
(R
n
) is called a mollier if
_
R
n
u(x)dx = 1. (7.19)
The standard mollier is u(x) = exp(
1
|x|
2
1
) for [x[ < 1 and u(x) = 0
otherwise.
If we scale a mollier according to u
k
(x) = k
n
u(k x) such that its mass is
preserved (|u
k
|
1
= 1) and it concentrates more and more around the origin,
E
T
u
k
we have the following result (Problem 7.10):
Lemma 7.10. Let u be a mollier in R
n
and set u
k
(x) = k
n
u(k x). Then
for every (uniformly) continuous function f : R
n
C we have that
f
k
(x) =
_
R
n
u
k
(x y)f(y)dy (7.20)
is in C
(R
n
) and converges to f (uniformly).
Now we are ready to prove
7.3. Nothing missing in L
p
135
Theorem 7.11. If X R
n
is open and is a regular Borel measure, then
the set C
c
(X) of all smooth functions with compact support is dense in
L
p
(X, d), 1 p < .
Proof. By our previous result it suces to show that every continuous
function f(x) with compact support can be approximated by smooth ones.
By setting f(x) = 0 for x , X, it is no restriction to assume X = R
n
.
Now choose a mollier u and observe that f
k
has compact support (since
f has). Moreover, since f has compact support, it is uniformly continuous
and f
k
f uniformly. But this implies f
k
f in L
p
.
We say that f L
p
loc
(X) if f L
p
(K) for every compact subset K X.
Lemma 7.12. Suppose f L
1
loc
(R
n
). Then
_
R
n
(x)f(x)dx = 0, C
c
(R
n
), (7.21)
if and only if f(x) = 0 (a.e.).
Proof. Consider the Borel measure (A) =
_
A
f(x)dx. Let K be some
bounded rectangle and consider some sequence
m
C
c
(R
n
) which has
support in K with 0
m
1 and
m
(x)
K
(x) pointwise for a.e. x.
Then by dominated convergence (K) = lim
m
_
m
f dx = 0 and thus
(A) = 0 for every Borel set since is uniquely determined on the algebra
of nite unions of rectangles by Theorem 6.5.
Problem 7.9. Find a sequence f
n
which converges to 0 in L
p
([0, 1], dx),
1 p < , but for which f
n
(x) 0 for a.e. x [0, 1] does not hold.
(Hint: Every n N can be uniquely written as n = 2
m
+ k with 0 m
and 0 k < 2
m
. Now consider the characteristic functions of the intervals
I
m,k
= [k2
m
, (k + 1)2
m
].)
Problem 7.10. Prove Lemma 7.10. (Hint: To show that f
k
is smooth, use
Problems 6.14 and 6.15.)
Problem 7.11 (Convolution). Show that for f L
1
(R
n
) and g L
p
(R
n
),
the convolution
(g f)(x) =
_
R
n
g(x y)f(y)dy =
_
R
n
g(y)f(x y)dy (7.22)
is in L
p
(R
n
) and satises Youngs inequality
|f g|
p
|f|
1
|g|
p
. (7.23)
(Hint: Without restriction |f|
1
= 1. Now use Jensen and Fubini.)
Problem 7.12 (Smoothing). Suppose f L
p
(R
n
). Show that f
k
dened
as in (7.20) converges to f in L
p
. (Hint: Use Lemma 4.21 and Youngs
inequality.)
136 7. The Lebesgue spaces L
p
7.4. Integral operators
Using Holders inequality, we can also identify a class of bounded operators
in L
p
.
Lemma 7.13 (Schur criterion). Consider L
p
(X, d) and L
p
(Y, d) and let
1
p
+
1
q
= 1. Suppose that K(x, y) is measurable and there are measurable
functions K
1
(x, y), K
2
(x, y) such that [K(x, y)[ K
1
(x, y)K
2
(x, y) and
|K
1
(x, .)|
L
q
(Y,d)
C
1
, |K
2
(., y)|
L
p
(X,d)
C
2
(7.24)
for -almost every x, respectively, for -almost every y. Then the operator
K : L
p
(Y, d) L
p
(X, d), dened by
(Kf)(x) =
_
Y
K(x, y)f(y)d(y), (7.25)
for -almost every x is bounded with |K| C
1
C
2
.
Proof. We assume 1 < p < for simplicity and leave the cases p = 1, to
the reader. Choose f L
p
(Y, d). By Fubinis theorem
_
Y
[K(x, y)f(y)[d(y)
is measurable and by Holders inequality we have
_
Y
[K(x, y)f(y)[d(y)
_
Y
K
1
(x, y)K
2
(x, y)[f(y)[d(y)
__
Y
K
1
(x, y)
q
d(y)
_
1/q
__
Y
[K
2
(x, y)f(y)[
p
d(y)
_
1/p
C
1
__
Y
[K
2
(x, y)f(y)[
p
d(y)
_
1/p
(if K
2
(x, .)f(.) , L
p
(X, d), the inequality is trivially true). Now take this
inequality to the pth power and integrate with respect to x using Fubini
_
X
__
Y
[K(x, y)f(y)[d(y)
_
p
d(x) C
p
1
_
X
_
Y
[K
2
(x, y)f(y)[
p
d(y)d(x)
= C
p
1
_
Y
_
X
[K
2
(x, y)f(y)[
p
d(x)d(y) C
p
1
C
p
2
|f|
p
p
.
Hence
_
Y
[K(x, y)f(y)[d(y) L
p
(X, d) and in particular it is nite for
-almost every x. Thus K(x, .)f(.) is integrable for -almost every x and
_
Y
K(x, y)f(y)d(y) is measurable.
Note that the assumptions are for example satised if |K(x, .)|
L
1
(Y,d)
C and |K(., y)|
L
1
(X,d)
C which follows by choosing K
1
(x, y) = [K(x, y)[
1/q
and K
2
(x, y) = [K(x, y)[
1/p
.
Another case of special importance is the case of integral operators
(Kf)(x) =
_
X
K(x, y)f(y)d(y), f L
2
(X, d), (7.26)
7.4. Integral operators 137
where K(x, y) L
2
(XX, dd). Such an operator is called a Hilbert
Schmidt operator.
Lemma 7.14. Let K be a HilbertSchmidt operator in L
2
(X, d). Then
_
X
_
X
[K(x, y)[
2
d(x)d(y) =
jJ
|Ku
j
|
2
(7.27)
for every orthonormal basis u
j
jJ
in L
2
(X, d).
Proof. Since K(x, .) L
2
(X, d) for -almost every x we infer
_
X
K(x, y)u
j
(y)d(y)
2
=
_
X
[K(x, y)[
2
d(y)
for -almost every x and thus
j
|Ku
j
|
2
=
j
_
X
_
X
K(x, y)u
j
(y)d(y)
2
d(x)
=
_
X
_
X
K(x, y)u
j
(y)d(y)
2
d(x)
=
_
X
_
X
[K(x, y)[
2
d(x)d(y)
as claimed.
Hence, for an operator K L(H) we dene the HilbertSchmidt
norm by
|K|
HS
=
_
jJ
|Ku
j
|
2
_
1/2
, (7.28)
where u
j
jJ
is some orthonormal base in H (we set it equal to if
|Ku
j
| > 0 for an uncountable number of js). Our lemma for integral
operators above indicates that this denition should not depend on the par-
ticular base chosen. In fact,
Lemma 7.15. The HilbertSchmidt norm does not depend on the orthonor-
mal base. Moreover, |K|
HS
= |K
|
HS
and |K| |K|
HS
.
Proof. To see the rst claim let v
j
jJ
be another orthonormal base. Then
j
|Ku
j
|
2
=
k,j
[v
k
, Ku
j
[
2
=
k,j
[u
j
, K
v
k
[
2
=
k
|K
v
k
|
2
shows |K|
HS
= |K
|
HS
upon choosing v
j
= u
j
and hence also base inde-
pendence.
138 7. The Lebesgue spaces L
p
To see |K| |K|
HS
let f =
j
c
j
u
j
and observe
|Kf|
2
=
k
[u
k
, Kf[
2
=
j
c
j
u
k
, Ku
j
k
_
j
[u
k
, Ku
j
[
2
__
j
[c
j
[
2
_
= |K|
2
HS
|f|
2
which establishes the claim.
Generalizing our previus denition for integral operators we will call K
a HilbertSchmidt operator if |K|
HS
< .
Lemma 7.16. Every HilbertSchmidt operator is compact. The set of Hilbert
Schmidt operators forms an ideal in L(H) and
|KA|
HS
|A||K|
HS
, respectively, |AK|
HS
|A||K|
HS
. (7.29)
Proof. To see that a HilbertSchmidt operator K is compact, let P
n
be the
projection onto spanu
j
n
j=1
, where u
j
is some orthonormal base. Then
K
n
= KP
n
is nite-rank and by
|K K
n
|
2
HS
=
jn
|Ku
j
|
2
,
converges to K in HilbertSchmidt norm and thus in norm.
Let K be HilbertSchmidt and A bounded. Then AK is compact and
|AK|
2
HS
=
j
|AKu
j
|
2
|A|
2
j
|Ku
j
|
2
= |A|
2
|K|
2
HS
.
For KA just consider adjoints.
Note that this gives us an easy to check test for compactness of an
integral operator.
Example. Let [a, b] be some compact interval and suppose K(x, y) is
bounded. Then the corresponding integral operator in L
2
(a, b) is Hilbert
Schmidt and thus compact. This generalizes Lemma 3.4.
Problem 7.13. Let H =
2
(N) and let A be multiplication by a sequence
a = (a
j
)
j=1
. Show that A is HilbertSchmidt if and only if a
2
(N).
Furthermore, show that |A|
HS
= |a| in this case.
Problem 7.14. Show that K :
2
(N)
2
(N), f
n
jN
k
n+j
f
j
is
HilbertSchmidt with |K|
HS
|c|
1
if [k
j
[ c
j
, where c
j
is decreasing
and summable.
Chapter 8
More measure theory
8.1. Decomposition of measures
Let , be two measures on a measure space (X, ). They are called
mutually singular (in symbols ) if they are supported on disjoint
sets. That is, there is a measurable set N such that (N) = 0 and (XN) =
0.
Example. Let be the Lebesgue measure and the Dirac measure (cen-
tered at 0). Then : Just take N = 0; then (0) = 0 and
(R0) = 0.
On the other hand, is called absolutely continuous with respect to
(in symbols ) if (A) = 0 implies (A) = 0.
Example. The prototypical example is the measure d = f d (compare
Lemma 6.18). Indeed by Lemma 6.22 (A) = 0 implies
(A) =
_
A
f d = 0 (8.1)
and shows that is absolutely continuous with respect to . In fact, we will
show below that every absolutely continuous measure is of this form.
The two main results will follow as simple consequence of the following
result:
Theorem 8.1. Let , be -nite measures. Then there exists a unique
(-a.e.) nonnegative function f and a set N of measure zero, such that
(A) = (A N) +
_
A
f d. (8.2)
139
140 8. More measure theory
Proof. We rst assume , to be nite measures. Let = + and
consider the Hilbert space L
2
(X, d). Then
(h) =
_
X
hd
is a bounded linear functional by CauchySchwarz:
[(h)[
2
=
_
X
1 hd
__
[1[
2
d
___
[h[
2
d
_
(X)
__
[h[
2
d
_
= (X)|h|
2
.
Hence by the Riesz lemma (Theorem 2.10) there exists a g L
2
(X, d) such
that
(h) =
_
X
hg d.
By construction
(A) =
_
A
d =
_
A
g d =
_
A
g d. (8.3)
In particular, g must be positive a.e. (take A the set where g is negative).
Furthermore, let N = x[g(x) 1. Then
(N) =
_
N
g d (N) = (N) +(N),
which shows (N) = 0. Now set
f =
g
1 g
N
, N
= XN.
Then, since (8.3) implies d = g d, respectively, d = (1 g)d, we have
_
A
fd =
_
A
g
1 g
N
d =
_
AN
g d = (A N
)
as desired. Clearly f is unique, since if there is a second function
f, then
_
A
(f
f)d = 0 for every A shows f
f = 0 a.e. with respect to .
To see the -nite case, observe that X
n
X, (X
n
) < and Y
n
X,
(Y
n
) < implies X
n
Y
n
X and (X
n
Y
n
) < . Hence when
restricted to X
n
Y
n
, we have sets N
n
and functions f
n
. Now take N =
N
n
and choose f such that f[
Xn
= f
n
(this is possible since f
n+1
[
Xn
= f
n
a.e.).
Then (N) = 0 and
(A N
) = lim
n
(A (X
n
N)) = lim
n
_
AXn
f d =
_
A
f d,
which nishes the proof.
Now the anticipated results follow with no eort:
8.1. Decomposition of measures 141
Theorem 8.2 (Lebesgue decomposition). Let , be two -nite measures
on a measure space (X, ). Then can be uniquely decomposed as =
ac
+
sing
, where
ac
and
sing
are mutually singular and
ac
is absolutely
continuous with respect to .
Proof. Taking
sing
(A) = (A N) and d
ac
= f d, there is at least
one such decomposition. To show uniqueness, rst let be nite. If there
is another one, =
ac
+
sing
, then let
N be such that (
N) = 0 and
sing
(
N
) = 0. Then
sing
(A)
sing
(A) =
_
A
(
f f)d. In particular,
_
AN
(
f f)d = 0 and hence
f = f a.e. away from N
N. Since
(N
N) = 0, we have
f = f a.e. and hence
ac
=
ac
as well as
sing
=
ac
=
ac
=
sing
. The -nite case follows as usual.
Theorem 8.3 (RadonNikodym). Let , be two -nite measures on a
measure space (X, ). Then is absolutely continuous with respect to if
and only if there is a positive measurable function f such that
(A) =
_
A
f d (8.4)
for every A . The function f is determined uniquely a.e. with respect to
and is called the RadonNikodym derivative
d
d
of with respect to
.
Proof. Just observe that in this case (A N) = 0 for every A; that is,
sing
= 0.
Problem 8.1. Let be a Borel measure on B and suppose its distribution
function (x) is continuously dierentiable. Show that the RadonNikodym
derivative equals the ordinary derivative
(x).
Problem 8.2. Suppose and are inner regular measures. Show that
if and only if (C) = 0 implies (C) = 0 for every compact set.
Problem 8.3. Suppose (A) C(A) for all A . Then d = f d with
0 f C a.e.
Problem 8.4. Let d = f d. Suppose f > 0 a.e. with respect to . Then
and d = f
1
d.
Problem 8.5 (Chain rule). Show that is a transitive relation. In
particular, if , show that
d
d
=
d
d
d
d
.
Problem 8.6. Suppose . Show that for every measure we have
d
d
d =
d
d
d +d,
142 8. More measure theory
where is a positive measure (depending on ) which is singular with respect
to . Show that = 0 if and only if .
8.2. Derivatives of measures
If is a Borel measure on B and its distribution function (x) is continu-
ously dierentiable, then the RadonNikodym derivative is just the ordinary
derivative
(x))
[B
(x)[
(8.5)
the derivative of at x R
n
provided the above limit exists. (Here B
r
(x)
R
n
is a ball of radius r centered at x R
n
and [A[ denotes the Lebesgue
measure of A B
n
.)
Note that for a Borel measure on B, (D)(x) exists if (x) (as dened
in (6.3)) is dierentiable at x and (D)(x) =
(x))
[B
(x)[
and (D)(x) = liminf
0
(B
(x))
[B
(x)[
. (8.6)
Clearly is dierentiable at x if (D)(x) = (D)(x) < . First of all note
that they are measurable: In fact, this follows from
(D)(x) = lim
n
sup
0<<1/n
(B
(x))
[B
(x)[
(8.7)
since the supremum on the right-hand side is lower semicontinuous with
respect to x (cf. Problem 6.9) as x (B
_
j
B
j
3
n
k
[B
j
k
[. (8.8)
Proof. Assume that the balls B
j
are ordered by radius. Start with B
j
1
=
B
1
= B
r
1
(x
1
) and remove all balls from our list which intersect B
j
1
. Observe
8.2. Derivatives of measures 143
that the removed balls are all contained in 3B
1
= B
3r
1
(x
1
). Proceeding like
this, we obtain B
j
1
, B
j
2
, . . . , such that
_
j
B
j
_
k
3B
j
k
and the claim follows since [3B[ = 3
n
[B[.
Now we can show
Lemma 8.5. Let > 0. For every Borel set A we have
[x A[ (D)(x) > [ 3
n
(A)
(8.9)
and
[x A[ (D)(x) > 0[ = 0, whenever (A) = 0. (8.10)
Proof. Let A
[ 3
n
(O)
there is some r
x
such that B
rx
(x) O
and [B
rx
(x)[ <
1
(B
rx
(x)). By the Lindelof theorem, we can choose a
countable subcover of A
[ 3
n
k
[B
r
k
(x
k
)[ <
3
n
k
(B
r
k
(x
k
)) 3
n
(O)
.
To see the second claim, observe that
x A[ (D)(x) > 0 =
_
j=1
x A[ (D)(x) >
1
j
.
Since is arbitrary, the Lebesgue measure of this set must be zero for every
. That is, the set where the limsup is positive has Lebesgue measure
zero.
The points where (8.11) holds are called Lebesgue points of f.
Note that the balls can be replaced by more general sets: A sequence of
sets A
j
(x) is said to shrink to x nicely if there are balls B
r
j
(x) with r
j
0
and a constant > 0 such that A
j
(x) B
r
j
(x) and [A
j
[ [B
r
j
(x)[. For
example A
j
(x) could be some balls or cubes (not necessarily containing x).
However, the portion of B
r
j
(x) which they occupy must not go to zero! For
example the rectangles (0,
1
j
) (0,
2
j
) R
2
do shrink nicely to 0, but the
rectangles (0,
1
j
) (0,
2
j
2
) do not.
Lemma 8.7. Let f be (locally) integrable. Then at every Lebesgue point we
have
f(x) = lim
j
1
[A
j
(x)[
_
A
j
(x)
f(y)dy (8.12)
whenever A
j
(x) shrinks to x nicely.
Proof. Let x be a Lebesgue point and choose some nicely shrinking sets
A
j
(x) with corresponding B
r
j
(x) and . Then
1
[A
j
(x)[
_
A
j
(x)
[f(y) f(x)[dy
1
[B
r
j
(x)[
_
Br
j
(x)
[f(y) f(x)[dy
and the claim follows.
Corollary 8.8. Let be a Borel measure on R which is absolutely con-
tinuous with respect to Lebesgue measure. Then its distribution function is
dierentiable a.e. and d(x) =
(x)dx.
8.2. Derivatives of measures 145
Proof. By assumption d(x) = f(x)dx for some locally integrable function
f. In particular, the distribution function (x) =
_
x
0
f(y)dy is continuous.
Moreover, since the sets (x, x + r) shrink nicely to x as r 0, Lemma 8.7
implies
lim
r0
((x, x +r))
r
= lim
r0
(x +r) (x)
r
= f(x)
at every Lebesgue point of f. Since the same is true for the sets (x r, x),
(x) is dierentiable at every Lebesgue point and
(x) = f(x).
As another consequence we obtain
Theorem 8.9. Let be a Borel measure on R
n
. The derivative D exists
a.e. with respect to Lebesgue measure and equals the RadonNikodym deriv-
ative of the absolutely continuous part of with respect to Lebesgue measure;
that is,
ac
(A) =
_
A
(D)(x)dx. (8.13)
Proof. If d = f dx is absolutely continuous with respect to Lebesgue mea-
sure, then (D)(x) = f(x) at every Lebesgue point of f by Lemma 8.7
and the claim follows from Theorem 8.6. To see the general case, use the
Lebesgue decomposition of and let N be a support for the singular part
with [N[ = 0. Then (D
sing
)(x) = 0 for a.e. x R
n
N by the second part
of Lemma 8.5.
In particular, is singular with respect to Lebesgue measure if and only
if D = 0 a.e. with respect to Lebesgue measure.
Using the upper and lower derivatives, we can also give supports for the
absolutely and singularly continuous parts.
Theorem 8.10. The set x[(D)(x) = is a support for the singular
and x[0 < (D)(x) < is a support for the absolutely continuous part.
Proof. First suppose is purely singular. Let us show that the set A
m
=
x[ (D)(x) < m satises (A
m
) = 0 for every m N.
Let A A
m
and let O be some open set containing A. For every x A
there is some = (x) such that B
(x) O and (B
(x)) k[B
(x)[. By
the Lindelof theorem, countably many of these balls cover A and hence
(A)
j
(B
j
(x
j
)) m
j
[B
j
(x
j
)[.
Selecting disjoint balls as in Lemma 8.4 further shows
(A) m3
n
k
[B
j
k
(x
j
k
)[ m3
n
[O[.
146 8. More measure theory
Since O is arbitrary we get from outer regularity (A) m3
n
[A[ for every
A A
m
. Hence is absolutely continuous on A
m
and since we assumed
to be singular, we must have (A
m
) = 0.
Thus (D
sing
)(x) = for a.e. x with respect to
sing
and we are done.
= x M
0
[ <
(D)(x) for > 0. Then M
M
0
and
[M
[ =
_
M
dx
1
_
M
(D)(x)dx =
1
ac
(M
)
1
ac
(M
0
) = 0
shows [M
0
[ = lim
0
[M
[ = 0.
Note that the set M = x[0 < (D)(x) is a minimal support of .
Example. The Cantor function is constructed as follows: Take the sets
C
n
used in the construction of the Cantor set C: C
n
is the union of 2
n
closed intervals with 2
n
1 open gaps in between. Set f
n
equal to j/2
n
on the jth gap of C
n
and extend it to [0, 1] by linear interpolation. Note
that, since we are creating precisely one new gap between every old gap
when going from C
n
to C
n+1
, the value of f
n+1
is the same as the value of
f
n
on the gaps of C
n
. In particular, |f
n
f
m
|
2
min(n,m)
and hence
we can dene the Cantor function as f = lim
n
f
n
. By construction f
is a continuous function which is constant on every subinterval of [0, 1]C.
Since C is of Lebesgue measure zero, this set is of full Lebesgue measure
and hence f
_
A
[f(y)[dy.
8.3. Complex measures 147
Problem 8.9 (Tchebychevev inequality). For f L
1
(R
n
) show
[x A[ [f(x)[ [
1
_
A
[f(x)[dx.
8.3. Complex measures
Let (X, ) be some measure space. A map : C is called a complex
measure if
(
_
n=1
A
n
) =
n=1
(A
n
), A
n
A
m
= , n ,= m. (8.14)
Choosing A
n
= for all n in (8.14) shows () = 0.
Note that a positive measure is a complex measure only if it is nite
(the value is not allowed for complex measures). Moreover, the deni-
tion implies that the sum is independent of the order of the sets A
j
, that
is, it converges unconditionally and thus absolutely by the Riemann series
theorem.
Example. Let be a positive measure. For every f L
1
(X, d) we have
that f d is a complex measure (compare the proof of Lemma 6.18 and use
dominated in place of monotone convergence). In fact, we will show that
every complex measure is of this form.
Example. Let
1
,
2
be two complex measures and
1
,
2
two complex
numbers. Then
1
1
+
2
2
is again a complex measure. Clearly we can
extend this to any nite linear combination of complex measures.
Given a complex measure it seems natural to consider the set function
A [(A)[. However, considering the simple example d(x) = sign(x)dx
on X = [1, 1] one sees that this set function is not additive and this simple
approach does not provide a positive measure associated with . To this
end we introduce the total variation of a measure dened as
[[(A) = sup
_
n=1
[(A
n
)[
A
n
disjoint, A =
_
n=1
A
n
_
. (8.15)
Note that by construction we have
[(A)[ [[(A). (8.16)
Theorem 8.12. The total variation [[ of a complex measure is a nite
positive measure.
Proof. We begin by showing that [[ is a positive measure. Suppose A =
n=1
A
n
. We need to show [[(A) =
n=1
[[(A
n
) for disjoint sets A
n
. If
[[(A
n
) = for some n it is not hard to see that [[(A) = and hence we
can assume [[(A
n
) < for all n.
148 8. More measure theory
Let > 0 be xed and for each A
n
choose a disjoint partition B
n,k
such
that
[[(A
n
)
k=1
[(B
n,k
)[ +
2
n
.
Then
n=1
[[(A
n
)
n,k=1
[(B
n,k
)[ + [[(A) +
since
n,k=1
B
n,k
= A. Since was arbitrary this shows [[(A)
n=1
[[(A
n
).
Conversely, if A =
k=1
B
k
, then
k=1
[(B
k
)[ =
k=1
n=1
(B
k
A
n
)
k=1
n=1
[(B
k
A
n
)[
=
n=1
k=1
[(B
k
A
n
)[
n=1
[[(A
n
).
Taking the supremum over all partitions B
k
shows [[(A)
n=1
[[(A
n
).
Hence [[ is a positive measure and it remains to show that it is nite.
Splitting into its real and imaginary part, it is no restriction to assume
that is real-valued since [[(A) [Re()[(A) +[Im()[(A).
The idea is as follows: Suppose we can split any given set A with [[(A) =
into two subsets B and AB such that [(B)[ 1 and [[(AB) = .
Then we can construct a sequence B
n
of disjoint sets with [(B
n
)[ 1 for
which
n=1
(B
n
)
diverges (the terms of a convergent series must converge to zero). But -
additivity requires that the sum converges to (
n
B
n
), a contradiction.
It remains to show existence of this splitting. Let A with [[(A) =
be given. Then there are disjoint sets A
j
such that
n
j=1
[(A
j
)[ 2 +[(A)[.
Now let A
+
=
A
j
[(A
j
) 0 and A
= AA
+
=
A
j
[(A
j
) < 0.
Then for both of them we have [(A
) either A
+
or A
(A) =
[[(A) (A)
2
. (8.18)
By (8.16) both
and
+
are positive measures. This splitting is also known
as Hahn decomposition of a signed measure.
Of course such a decomposition of a signed measure is not unique (we
can always add a positive measure to both parts), however, the Hahn decom-
position is unique in the sense that it is the smallest possible decomposition.
This will be shown in Lemma 8.18 below.
Moreover, we also have:
Theorem 8.13. The set of all complex measures /(X) together with the
norm || = [[(X) is a Banach space.
Proof. Clearly /(X) is a vector space and it is straigthfoward to check
that [[(X) is a norm. Hence it remains to show that every Cauchy sequence
k
has a limit.
First of all, by [
k
(A)
j
(A)[ = [(
k
j
)(A)[ [
k
j
[(A) |
k
j
|,
we see that
k
(A) is a Cauchy sequence in C for every A and we can
dene
(A) = lim
k
k
(A).
It remains to show that satises (8.14). Let A
m
be given disjoint sets and
set
A
n
=
n
m=1
A
m
, A =
m=1
A
m
. Since we can interchange limits with
nite sums, (8.14) holds for nitely many sets. Hence it remains to show
(
A
n
) (A). This follows from
[(
A
n
) (A)[ [(
A
n
)
k
(
A
n
)[ +[
k
(
A
n
)
k
(A)[ +[
k
(A) (A)[
2|
k
| +[
k
(
A
n
)
k
(A)[.
such that d
= f
d.
By construction
_
X
f
d =
L
1
(X, d). Moreover, d = d
+
d
= f d
as required.
In this case the total variation of d = f d is just d[[ = [f[d:
Lemma 8.16. Suppose d = f d, where is a positive measure and f
L
1
(X, d). Then
[[(A) =
_
A
[f[d. (8.20)
Proof. If A
n
are disjoint sets and A =
n
A
n
we have
n
[(A
n
)[ =
_
An
f d
n
_
An
[f[d =
_
A
[f[d.
Hence [[(A)
_
A
[f[d. To show the converse dene
A
n
k
= x[
k 1
n
arg(f(x))
2
<
k
n
, 1 k n.
Then the simple functions
s
n
(x) =
n
k=1
e
2i
k1
n
A
n
k
(x)
converge to f(x)
_
A
s
n
f d
k=1
_
A
n
k
s
n
f d
=
n
k=1
[(A
n
k
)[ [[(A)
shows
_
A
[f[d [[(A).
8.3. Complex measures 151
As a consequence we obtain (Problem 8.10):
Corollary 8.17. If is a complex measure, then d = hd[[, where [h[ = 1.
Moreover, if is real-valued, then d
=
A
d[[, where A
= h
1
(1).
In particular, note that
_
A
f d
|f|
[[(A). (8.21)
As another consequence we can show that [[ is the smallest positive
measure satisfying (8.16).
Lemma 8.18. Let be a complex measure and a positive measure satis-
fying [(A)[ (A) for all measurable sets A. Then [[. (Here
has to be understood as (A) (A) for every measurable set A.)
Furthermore, let be a signed measure and =
+
a decomposition
into positive measures. Then
, where
() =
x[[h(x) e
i
[ . Then (1 )[[(A) (A) for every A A
() shows
1 f a.e. on A
.
Problem 8.12. Let be a complex and a positive measure and suppose
[(A)[ C(A) for all A . Then d = f d with |f|
C. (Hint:
First show [[(A) C(A) and then use Problem 8.3.)
Problem 8.13. Let be a signed measure and
+
(A) = max
B,BA
(B),
(A) = min
B,BA
(B)
Problem 8.14. Let be a complex measure and let
=
r,+
r,
+ i(
i,+
i,
)
be its decompostition into positive measures. Show the estimate
1
s
(A) [[(A)
s
(A),
s
=
r,+
+
r,
+
i,+
+
i,
.
8.4. Appendix: Functions of bounded variation and
absolutely continuous functions
Let [a, b] R be some compact interval and f : [a, b] C. Given a partition
P = a = x
0
, . . . , x
n
= b of [a, b] we dene the variation of f with respect
to the partiton P by
V (P, f) =
n
k=1
[f(x
k
) f(x
k1
)[. (8.23)
The supremum over all partitions
V
b
a
(f) = sup
partitions P of [a, b]
V (P, f) (8.24)
is called the total variation of f over [a, b]. If the total variation is nite,
f is called of bounded variation. Since we clearly have
V
b
a
(f) = [[V
b
a
(f), V
b
a
(f +g) V
b
a
(f) +V
b
a
(g) (8.25)
the space BV [a, b] of all functions of nite total variation is a vector space.
However, the total variation is not a norm since (consider the partition
P = a, x, b)
V
b
a
(f) = 0 f(x) c. (8.26)
8.4. Appendix: Functions of bounded variation 153
Moreover, any function of bounded variation is in particular bounded (con-
sider again the partition P = a, x, b)
sup
x[a,b]
[f(x)[ [f(a)[ +V
b
a
(f). (8.27)
Example. Any real-valued nondecreasing function f is of bounded variation
with variation given by V
b
a
(f) = f(b) f(a). Similarly, every real-valued
nonincreasing function f is of bounded variation with variation given by
V
b
a
(f) = f(a) f(b).
Furthermore, observe V
a
a
(f) = 0 as well as (Problem 8.16)
V
b
a
(f) = V
c
a
(f) +V
b
c
(f), c [a, b], (8.28)
and it will be convenient to set
V
a
b
(f) = V
b
a
(f). (8.29)
Theorem 8.20 (Jordan). Let f : [a, b] R be of bounded variation, then
f can be decomposed as
f(x) = f
+
(x) f
(x), f
(x) =
1
2
(V
x
a
(f) f(x)) , (8.30)
where f
) V
b
a
(f).
Proof. From
f(y) f(x) [f(y) f(x)[ V
y
x
(f) = V
y
a
(f) V
x
a
(f)
for x y we infere V
x
a
(f)f(x) V
y
a
(f)f(y), that is, f
+
is nondecreasing.
Moreover, replacing f by f shows that f
k=1
[((x
k1
, x
k
])[ [[((a, b])
and thus the distribution function is of bounded variation. Furthermore,
consider the measure whose distribution function is (x) = V
x
0
(). Then
we see [((a, b])[ = [(b) (a)[ V
b
a
() = ((a, b]) [[((a, b]). Hence
we obtain [(A)[ (A) [[(A) for all intervals A, thus for all open sets
(by Problem 1.7), and thus for all Borel sets by outer regularity. Hence
Lemma 8.18 implies = [[ and hence [[(x) = V
x
0
(f).
We will call a function f : [a, b] C absolutely continuous if for
every > 0 there is a corresponding > 0 such that
k
[y
k
x
k
[ <
k
[f(y
k
) f(x
k
)[ (8.32)
for every countable collection of pairwise disjoint intervals (x
k
, y
k
) [a, b].
The set of all absolutely continuous functions on [a, b] will be denoted by
AC[a, b]. The special choice of just one interval shows that every absolutely
continuous function is (uniformly) continuous, AC[a, b] C[a, b].
Theorem 8.23. A complex Borel measure on R is absolutely continu-
ous with respect to Lebesgue measure if and only if its distribution function
is locally absolutely continuous (i.e., absolutely continuous on every com-
pact sub-interval). Moreover, in this case the distribution function (x) is
8.4. Appendix: Functions of bounded variation 155
dierentiable almost everywhere and
(x) = (0) +
_
x
0
(y)dy (8.33)
with
integrable,
_
R
[
(y)[dy = [[(R).
Proof. Suppose the measure is absolutely continuous. Since we can write
as a sum of four positive measures, we can suppose is positive. Now
(8.32) follows from (8.22) in the special case where A is a union of pairwise
disjoint intervals.
Conversely, suppose (x) is absolutely continuous on [a, b]. We will verify
(8.22). To this end x and choose such that (x) satises (8.32). By
outer regularity it suces to consider the case where A is open. Moreover,
by Problem 1.7, every open set O (a, b) can be written as a countable
union of disjoint intervals I
k
= (x
k
, y
k
) and thus [O[ =
k
[y
k
x
k
[
implies
(O) =
k
_
(y
k
) (x
k
)
_
k
[(y
k
) (x
k
)[
as required.
The rest follows from Corollary 8.8.
As a simple consequence of this result we can give an equivalent denition
of absolutely continuous functions as precisely the functions for which the
fundamental theorem of calculus holds.
Theorem 8.24. A function f : [a, b] C is absolutely continuous if and
only if it is of the form
f(x) = f(a) +
_
x
a
g(y)dy (8.34)
for some integrable function g. Moreover, in this case f is dierentiable
a.e with respect to Lebesgue measure and f
{(x,y)|y<x}
(x, y)df(y)dg(x)
=
_
[a,b)
_
[a,b)
{(x,y)|y<x}
(x, y)dg(x)df(y)
=
_
[a,b)
_
(y,b)
dg(x)df(y) =
_
[a,b)
g(y+)df(y).
The second formula is shown analogously.
Problem 8.15. Compute V
b
a
(f) for f(x) = sign(x) on [a, b] = [1, 1].
Problem 8.16. Show (8.28).
Problem 8.17. Consider f
j
(x) = x
j
cos(/x) for j N. Show that f
j
C[0, 1] if we set f
j
(0) = 0. Show that f
j
is of bounded variation for j 2
but not for j = 1.
Problem 8.18. Show that if f BV [a, b] then so is f
, [f[ and
V
b
a
(f
) = V
b
a
(f), V
b
a
([f[) V
b
a
(f).
Moreover, show
V
b
a
(Re(f)) V
b
a
(f), V
b
a
(Im(f)) V
b
a
(f).
Problem 8.19. Show that if f, g BV [a, b] then so is f g and
V
b
a
(f g) V
b
a
(f) sup [g[ +V
b
a
(g) sup [f[.
Problem 8.20. Show that BV [a, b] together with the norm
[f(a)[ +V
b
a
(f)
is a Banach space.
Problem 8.21 (Product rule for absolutely continuous functions). Show
that if f, g AC[a, b] then so is f g and (f g)
= f
g+f g
. (Hint: Integration
by parts.)
Chapter 9
The dual of L
p
9.1. The dual of L
p
, p <
After these preparations we are able to compute the dual of L
p
for p < .
Theorem 9.1. Consider L
p
(X, d) for some -nite measure and let q be
the corresponding dual index,
1
p
+
1
q
= 1. Then the map g L
q
g
(L
p
)
given by
g
(f) =
_
X
gf d (9.1)
is isometric. Moreover, for 1 p < it is also surjective.
Proof. Given g L
q
it follows from Holders inequality that
g
is a bounded
linear functional with |
g
| |g|
q
. That in fact |
g
| = |g|
q
can be shown
as in the discrete case (compare Problem 4.7).
To show that this map is surjective, rst suppose (X) < and choose
some (L
p
)
. Since |
A
|
p
= (A)
1/p
, we have
A
L
p
for every A
and we can dene
(A) = (
A
).
Suppose A =
j=1
A
j
. Then, by dominated convergence, |
n
j=1
A
j
A
|
p
0 (this is false for p = !) and hence
(A) = (
j=1
A
j
) =
j=1
(
A
j
) =
j=1
(A
j
).
Thus is a complex measure. Moreover, (A) = 0 implies
A
= 0 in L
p
and hence (A) = (
A
) = 0. Thus is absolutely continuous with respect
157
158 9. The dual of L
p
to and by the complex RadonNikodym theorem d = g d for some
g L
1
(X, d). In particular, we have
(f) =
_
X
fg d
for every simple function f. Clearly, the simple functions are dense in L
p
, but
since we only know g L
1
we cannot control the integral. So suppose f is
bounded and pick a sequence of simple function f
n
converging to f. Without
restriction we can assume that f
n
converges also pointwise and |f
n
|
|f|
An
[g[ d ||(A
n
),
which shows (A
n
) = 0 and hence |g|
||, that is g L
. This nishes
the proof for nite .
If is -nite, let X
n
X with (X
n
) < . Then for every n there is
some g
n
on X
n
and by uniqueness of g
n
we must have g
n
= g
m
on X
n
X
m
.
Hence there is some g and by |g
n
| || independent of n, we have g
L
q
.
Corollary 9.2. Let be some -nite measure. Then L
p
(X, d) is reexive.
Proof. Identify L
p
(X, d)
with L
q
(X, d) and choose h L
p
(X, d)
.
Then there is some f L
p
(X, d) such that
h(g) =
_
g(x)f(x)d(x), g L
q
(X, d)
= L
p
(X, d)
.
But this implies h(g) = g(f), that is, h = J(f), and thus J is surjective.
9.2. The dual of L
(f) =
_
X
fd (9.2)
is a bounded linear functional on B(X) (the Banach space of bounded mea-
surable functions) with norm
|
| = [[(X) (9.3)
by (8.21) and Corollary 8.17. If is absolutely continuous with respect to
, then it will even be a bounded linear functional on L
k=1
(A
k
), A
j
A
k
= , j ,= k. (9.4)
Given a content we can dene the corresponding integral for simple func-
tions s(x) =
n
k=1
A
k
as usual
_
A
s d =
n
k=1
k
(A
k
A). (9.5)
As in the proof of Lemma 6.16 one shows that the integral is linear. More-
over,
[
_
A
s d[ [[(A) |s|
, (9.6)
where
[[(A) = sup
_
n
k=1
[(A
k
)[
A
k
disjoint, A =
n
_
k=1
A
k
_
. (9.7)
(Note that this denition agrees with the one for complex measures.) We will
require [[(X) < . In particular, the set of all nite contents is a Banach
space with norm || = [[(X) (show this). Moreover, for a nite content
this integral can be extended to all of B(X) by Theorem 1.29 (compare
Problem 6.12). However, note that our convergence theorems (monotone
convergence, dominated convergence) will no longer hold in this case (unless
happens to be a measure).
In particular, every complex content gives rise to a bounded linear func-
tional on B(X) and the converse also holds:
160 9. The dual of L
p
Theorem 9.3. Every bounded linear functional B(X)
is of the form
(f) =
_
X
f d (9.8)
for some unique complex content and || = [[(X).
Proof. Let B(X)
k=1
A
k
) =
n
k=1
k
(A
k
) =
n
k=1
[(A
k
)[,
k
= sign((A
k
)),
we see [[(X) ||.
Since the characteristic functions are total, (9.8) holds everywhere by
continuity.
Remark: To obtain the dual of L
_
n=1
[
1
n
,
1
n + 1
)) ,=
n=1
([
1
n
,
1
n + 1
)) = 0. (9.10)
Observe that the corresponding distribution function (dened as in (6.3))
is nondecreasing but not right continuous! If we render right continuous,
we get the the distribution function of the Dirac measure (centered at 0).
In addition, the Dirac measure has the same integral at least for continuous
functions!
Theorem 9.4 (Riesz representation). Let I R be a compact interval.
Every bounded linear functional C(I)
is of the form
(f) =
_
X
f d (9.11)
for some unique complex Borel measure and || = [[(X).
9.2. The dual of L
= ([ [ )/2 it is no restriction
to assume is positive.
Now the idea is as follows: Dene a distribution function for as in (6.3).
By nite additivity of it will be nondecreasing and we can use Theorem 6.3
to obtain an associated measure whose distribution function coincides with
except possibly at points where is discontinuous. It remains to show that
the corresponding integral coincides with for continuous functions.
Let f C(I) be given. Fix points a x
n
0
< x
n
1
< . . . x
n
n
b such that
x
n
0
a, x
n
n
b, and sup
k
[x
n
k1
x
n
k
[ 0 as n . Then the sequence
of simple functions
f
n
(x) = f(x
n
0
)
[x
n
0
,x
n
1
)
+f(x
n
1
)
[x
n
1
,x
n
2
)
+ +f(x
n
n1
)
[x
n
n1
,x
n
n
]
.
converges uniformly to f by continuity of f. Moreover,
_
I
f d = lim
n
_
I
f
n
d lim
n
n
k=1
f(x
n
k1
)((x
k
) (x
k1
))
= lim
n
n
k=1
f(x
n
k1
)( (x
k
) (x
k1
)) = lim
n
_
I
f
n
d
=
_
I
f d = (f)
provided the points x
n
k
are chosen to stay away from all discontinuities of
(x) (recall that there are at most countably many).
To see || = [[(X) recall d = hd[[ where [h[ = 1 (Corollary 8.17) and
approximate h by continuous functions as in the proof of Lemma 7.12.
Note that will be a positive measure if is a positive functional,
that is, (f) 0 whenever f 0.
Problem 9.1. Let M be a closed subspace of a Banach space X. Show that
(X/M)
= X
[M Ker().
Problem 9.2 (Vague convergence of measures). A sequence of measures
n
is said to converge vaguely to a measure if
_
I
fd
n
_
I
fd, f C(I). (9.12)
Show that every bounded sequence of measures has a vaguely convergent
subsequence. Show that the limit is a positive measure if all
n
are.
(Hint: Compare this denition to the denition of weak- convergence in
Section 4.3.)
Chapter 10
The Fourier transform
10.1. The Fourier transform
We rst review some basic facts concerning the Fourier transform which
will be needed in the following section.
Let C
(R
n
) be the set of all complex-valued functions which have partial
derivatives of arbitrary order. For f C
(R
n
) and N
n
0
we set
f =
||
f
x
1
1
x
n
n
, x
= x
1
1
x
n
n
, [[ =
1
+ +
n
. (10.1)
An element N
n
0
is called a multi-index and [[ is called its order. We
will also set (p)
=
||
p
(R
n
)[ sup
x
[x
f)(x)[ < , , N
n
0
(10.2)
which is dense in L
2
(R
n
) (since C
c
(R
n
) o(R
n
) is). Note that if f
o(R
n
), then the same is true for x
f(x) and (
f)
(p) = (ip)
f(p), (x
f(x))
(p) = i
||
f(p). (10.4)
163
164 10. The Fourier transform
Proof. First of all, by integration by parts, we see
(
x
j
f(x))
(p) =
1
(2)
n/2
_
R
n
e
ipx
x
j
f(x)d
n
x
=
1
(2)
n/2
_
R
n
_
x
j
e
ipx
_
f(x)d
n
x
=
1
(2)
n/2
_
R
n
ip
j
e
ipx
f(x)d
n
x = ip
j
f(p).
So the rst formula follows by induction.
Similarly, the second formula follows from induction using
(x
j
f(x))
(p) =
1
(2)
n/2
_
R
n
x
j
e
ipx
f(x)d
n
x
=
1
(2)
n/2
_
R
n
_
i
p
j
e
ipx
_
f(x)d
n
x = i
p
j
f(p),
where interchanging the derivative and integral is permissible by Prob-
lem 6.15. In particular,
f(p) is dierentiable.
To see that
f o(R
n
) if f o(R
n
), we begin with the observation
that
f is bounded; in fact, |
f|
(2)
n/2
|f|
1
. But then p
f)(p) =
i
||||
(
f(x))
f(x) o(R
n
) if f o(R
n
).
(p) = e
iap
f(p), a R
n
, (10.5)
(e
ixa
f(x))
(p) =
f(p a), a R
n
, (10.6)
(f(x))
(p) =
1
f(
p
), > 0. (10.7)
Next, we want to compute the inverse of the Fourier transform. For this
the following lemma will be needed.
Lemma 10.3. We have e
zx
2
/2
o(R
n
) for Re(z) > 0 and
T(e
zx
2
/2
)(p) =
1
z
n/2
e
p
2
/(2z)
. (10.8)
Here z
n/2
has to be understood as (
z)
n
, where the branch cut of the root
is chosen along the negative real axis.
10.1. The Fourier transform 165
Proof. Due to the product structure of the exponential, one can treat each
coordinate separately, reducing the problem to the case n = 1.
Let
z
(x) = exp(zx
2
/2). Then
z
(x)+zx
z
(x) = 0 and hence i(p
z
(p)+
z
z
(p)) = 0. Thus
z
(p) = c
1/z
(p) and (Problem 10.1)
c =
z
(0) =
1
2
_
R
exp(zx
2
/2)dx =
1
z
at least for z > 0. However, since the integral is holomorphic for Re(z) > 0
by Problem 6.21, this holds for all z with Re(z) > 0 if we choose the branch
cut of the root along the negative real axis.
Now we can show
Theorem 10.4. The Fourier transform T : o(R
n
) o(R
n
) is a bijection.
Its inverse is given by
T
1
(g)(x) g(x) =
1
(2)
n/2
_
R
n
e
ipx
g(p)d
n
p. (10.9)
We have T
2
(f)(x) = f(x) and thus T
4
= I.
Proof. Abbreviate
(x) = exp(x
2
/2). By dominated convergence we
have
(
f(p))
(x) =
1
(2)
n/2
_
R
n
e
ipx
f(p)d
n
p
= lim
0
1
(2)
n/2
_
R
n
(p)e
ipx
f(p)d
n
p
= lim
0
1
(2)
n
_
R
n
_
R
n
(p)e
ipx
f(y)e
ipy
d
n
yd
n
p,
and, invoking Fubini and Lemma 10.2, we further see
= lim
0
1
(2)
n/2
_
R
n
(
(p)e
ipx
)
(y)f(y)d
n
y
= lim
0
1
(2)
n/2
_
R
n
1
n/2
1/
(y x)f(y)d
n
y
= lim
0
1
(2)
n/2
_
R
n
1
(z)f(x +
z)d
n
z = f(x),
which nishes the proof, where we used the change of coordinates z =
yx
f(p)[
2
d
n
p =
1
(2)
n/2
_
R
n
_
R
n
f(x)
f(p)e
ipx
d
n
p d
n
x
=
_
R
n
[f(x)[
2
d
n
x (10.10)
for f o(R
n
). Thus, by Theorem 1.29, we can extend T to L
2
(R
n
) by
setting
f(p) = lim
R
1
(2)
n/2
_
|x|R
e
ipx
f(x)d
n
x, (10.11)
where the limit is to be understood in L
2
(R
n
) (Problem 10.5). If f
L
1
(R
n
) L
2
(R
n
), we can omit the limit (why?) and
f is still given by
(10.3).
Theorem 10.5. The Fourier transform T extends to a unitary operator
T : L
2
(R
n
) L
2
(R
n
).
Proof. As already noted, T extends uniquely to a bounded operator on
L
2
(R
n
). Moreover, the same is true for T
1
. Since Parsevals identity
remains valid by continuity of the norm, this extension is a unitary operator.
f = ((ip)
f(p))
, f H
r
(R
n
), [[ r. (10.13)
By Lemma 10.1 this denition coincides with the usual one for every f
o(R
n
) and we have
_
R
n
g(x)(
f)(x)d
n
x = g
, (
f) = g(p)
, (ip)
f(p)
= (1)
||
(ip)
g(p)
,
f(p) = (1)
||
, f
= (1)
||
_
R
n
(
g)(x)f(x)d
n
x, (10.14)
for f, g H
r
(R
n
). Furthermore, recall that a function h L
1
loc
(R
n
) satisfy-
ing
_
R
n
(x)h(x)d
n
x = (1)
||
_
R
n
(
)(x)f(x)d
n
x, C
c
(R
n
), (10.15)
10.1. The Fourier transform 167
is also called the weak derivative or the derivative in the sense of distri-
butions of f (by Lemma 7.12 such a function is unique if it exists). Hence,
choosing g = in (10.14), we see that H
r
(R
n
) is the set of all functions hav-
ing partial derivatives (in the sense of distributions) up to order r, which
are in L
2
(R
n
).
Finally, we note that on L
1
(R
n
) we have
Lemma 10.6 (Riemann-Lebesgue). Let C
(R
n
) denote the Banach space
of all continuous functions f : R
n
C which vanish at equipped with
the sup norm. Then the Fourier transform is a bounded injective map from
L
1
(R
n
) into C
(R
n
) satisfying
|
f|
(2)
n/2
|f|
1
. (10.16)
Proof. Clearly we have
f C
(R
n
) if f o(R
n
). Moreover, since o(R
n
)
is dense in L
1
(R
n
), the estimate
sup
p
[
f(p)[
1
(2)
n/2
sup
p
_
R
n
[e
ipx
f(x)[d
n
x =
1
(2)
n/2
_
R
n
[f(x)[d
n
x
shows that the Fourier transform extends to a continuous map from L
1
(R
n
)
into C
(R
n
).
To see that the Fourier transform is injective, suppose
f = 0. Then
Fubini implies
0 =
_
R
n
(x)
f(x)d
n
x =
_
R
n
(x)f(x)d
n
x
for every o(R
n
). Hence Lemma 7.12 implies f = 0.
Note that T : L
1
(R
n
) C
(R
n
) is not onto (cf. Problem 10.7).
Another useful property is the convolution formula.
Lemma 10.7. The convolution
(f g)(x) =
_
R
n
f(y)g(x y)d
n
y =
_
R
n
f(x y)g(y)d
n
y (10.17)
of two functions f, g L
1
(R
n
) is again in L
1
(R
n
) and we have Youngs
inequality
|f g|
1
|f|
1
|g|
1
. (10.18)
Moreover, its Fourier transform is given by
(f g)
(p) = (2)
n/2
f(p) g(p). (10.19)
168 10. The Fourier transform
Proof. The fact that f g is in L
1
together with Youngs inequality follows
by applying Fubinis theorem to h(x, y) = f(x y)g(y). For the last claim
we compute
(f g)
(p) =
1
(2)
n/2
_
R
n
e
ipx
_
R
n
f(y)g(x y)d
n
y d
n
x
=
_
R
n
e
ipy
f(y)
1
(2)
n/2
_
R
n
e
ip(xy)
g(x y)d
n
xd
n
y
=
_
R
n
e
ipy
f(y) g(p)d
n
y = (2)
n/2
f(p) g(p),
where we have again used Fubinis theorem.
In other words, L
1
(R
n
) together with convolution as a product is a
Banach algebra (without identity). For the case of convolution on L
2
(R
n
)
see Problem 10.9.
Problem 10.1. Show that
_
R
exp(x
2
/2)dx =
f is dierentiable and
f
= g.
Problem 10.4. A function f : R
n
C is called spherically symmetric
if it is invariant under rotations; that is, f(Ox) = f(x) for all O SO(R
n
)
(equivalently, f depends only on the distance to the origin [x[). Show that the
Fourier transform of a spherically symmetric function is again spherically
symmetric.
Problem 10.5. Show (10.11). (Hint: First suppose f has compact support.
Then there is a sequence of functions f
n
C
c
(R
n
) converging to f in L
2
.
The support of these functions can be chosen inside a xed set and hence
this sequence also converges to f in L
1
. Thus (10.11) follows for f L
2
with compact support. To remove this restriction, use that the projection
onto a ball with radius R converges strongly to the identity as R .)
Problem 10.6. Show that C
(R
n
) is indeed a Banach space. Show that
o(R
n
) is dense.
Problem 10.7. Show that T : L
1
(R
n
) C
(R
n
) is not onto as follows:
(i) The range of T is dense.
(ii) T is onto if and only if it has a bounded inverse.
10.1. The Fourier transform 169
(iii) T has no bounded inverse.
(Hint for (iii): Suppose is smooth with compact support in (0, 1) and
set f
m
(x) =
m
k=1
e
ikx
(x k). Then |f
m
|
1
= m||
1
and |
f
m
|
const
since o(R) and hence (p) const(1 +[p[)
2
).
Problem 10.8. Show that the convolution of two o(R
n
) functions is in
o(R
n
).
Problem 10.9. Show that the convolution of two L
2
(R
n
) functions is in
C
(R
n
) and we have |f g|
|f|
2
|g|
2
.
Problem 10.10 (Wiener). Suppose f L
2
(R
n
). Then the set f(x+a)[a
R
n
is total in L
2
(R
n
) if and only if
f(p) ,= 0 a.e. (Hint: Use Lemma 10.2
and the fact that a subspace is total if and only if its orthogonal complement
is zero.)
Problem 10.11. Suppose f(x)e
k|x|
L
1
(R) for some k > 0. Then
f(p)
has an analytic extension to the strip [Im(p)[ < k.
Appendix A
Zorns lemma
A partial order is a binary relation _ over a set T such that for all
A, B, C T:
A _ A (reexivity),
if A _ B and B _ A then A = B (antisymmetry),
if A _ B and B _ C then A _ C (transitivity).
Example. Let T(X) be the collections of all subsets of a set X. Then T is
partially ordered by inclusion .
It is important to emphasize that two elements of T need not be com-
parable, that is, in general neither A _ B nor B _ A might hold. However,
if any two elements are comparable, T will be called totally ordered.
Example. R with is totally ordered.
If T is partially ordered, then every totally ordered subset is also called
a chain. If Q T, then an element M T satisfying A _ M for all A Q
is called an upper bound.
Example. Let T(X) as before. Then a collection of subsets A
n
nN
T(X) satisfying A
n
A
n+1
is a chain. The set
n
A
n
is an upper bound.
An element M T for which M _ A for some A T is only possible if
M = A is called a maximal element.
Theorem A.1 (Zorns lemma). Every partially ordered set in which every
chain has an upper bound contains at least one maximal element.
171
172 A. Zorns lemma
Zorns lemma is one of the equivalent incarnations of the Axiom of
Choice and we are going to take it, as well as the rest of set theory, for
granted.
Bibliography
[1] M. Abramovitz, I. A. Stegun, Handbook of Mathematical Functions, Dover, New
York, 1972.
[2] H. W. Alt, Lineare Funktionalanalysis, 4th ed., Springer, Berlin, 2002.
[3] H. Bauer, Measure and Integration Theory, de Gruyter, Berlin, 2001.
[4] J.L. Kelly, General Topology, Springer, New York, 1955.
[5] E. Lieb and M. Loss, Analysis, Amer. Math. Soc., Providence, 1997.
[6] I. K. Rana, An Introduction to Measure and Integration, 2nd ed., Amer. Math.
Soc., Providence, 2002.
[7] M. Reed and B. Simon, Methods of Modern Mathematical Physics I. Functional
Analysis, rev. and enl. edition, Academic Press, San Diego, 1980.
[8] H. Royden, Real Analysis, Prencite Hall, New Jersey, 1988.
[9] W. Rudin, Real and Complex Analysis, 3rd edition, McGraw-Hill, New York,
1987.
[10] J. Weidmann, Lineare Operatoren in Hilbertr aumen I: Grundlagen, B.G.Teubner,
Stuttgart, 2000.
[11] D. Werner, Funktionalanalysis, 3rd edition, Springer, Berlin, 2000.
173
Glossary of notation
AC[a, b] . . . absolutely continuous functions, 154
arg(z) . . . argument of a complex number
B
r
(x) . . . open ball of radius r around x, 7
B(X) . . . Banach space of bounded measurable functions
BV [a, b] . . . functions of bounded variation, 152
B = B
1
B
n
. . . Borel -eld of R
n
, 96
C . . . the set of complex numbers
C(H) . . . set of compact operators, 47
C(U) . . . set of continuous functions from U to C
C(U, V ) . . . set of continuous functions from U to V
C
c
(U, V ) . . . set of compactly supported smooth functions
p
(N) . . . Banach space of p summable sequences, 19
2
(N) . . . Hilbert space of square summable sequences, 24
. . . complex conjugation
|.| . . . norm, 24
|.|
p
. . . norm in the Banach space
p
and L
p
, 19, 127
|.|
HS
. . . HilbertSchmidt norm, 137
., .. . . . scalar product in H, 24
. . . orthogonal sum of linear spaces or operators, 44
. . . gradient
. . . partial derivative
M
. . . orthogonal complement, 40
(
1
,
2
) = R[
1
< <
2
, open interval
[
1
,
2
] = R[
1
2
, closed interval
Index
a.e., see almost everywhere
absolute convergence, 23
absolutely continuous
function, 154
measure, 139
accumulation point, 8
algebra, 95
almost everywhere, 101
B.L.T. theorem, 31
Baire category theorem, 61
Banach algebra, 79
Banach space, 18
BanachSteinhaus theorem, 62
base, 9
basis, 20
orthonormal, 37
Bessel inequality, 36
bidual space, 70
bijective, 11
BolzanoWeierstra theorem, 15
Borel
function, 107
measure, 97
regular, 98
set, 96
-algebra, 96
boundary condition, 6
boundary value problem, 6
bounded
operator, 30
sesquilinear form, 29
set, 15
bounded variation, 152
Cantor
function, 146
measure, 146
set, 101
Cauchy sequence, 10
CauchySchwarzBunjakowski inequality,
25
characteristic function, 110
closed set, 9
closure, 9
cluster point, 8
compact, 13
locally, 15
sequentially, 14
complete, 10, 18
completion, 29
content, 159
continuous, 11
convergence, 10
convex, 129
convolution, 135, 167
cover, 12
C
algebra, 84
dense, 10
diusion equation, 3
dimension, 39
Dirac measure, 100, 115
direct sum, 33
discrete set, 8
discrete topology, 8
distance, 7, 15
distribution function, 98
domain, 30
dominated convergence theorem, 114
double dual, 70
177
178 Index
eigenspace, 49
eigenvalue, 49
simple, 50
eigenvector, 49
equivalent norms, 27
essential supremum, 128
Extreme value theorem, 15
nite intersection property, 13
form
bounded, 29
Fourier
transform, 163
Fourier series, 38
Fredholm alternative, 59
Fredholm operator, 60
Fubini theorem, 121
function
open, 12
fundamental theorem of calculus, 155
generalized inverse, 116
gradient, 164
GramSchmidt orthogonalization, 38
graph, 64
graph norm, 66
Green function, 54
Holders inequality, 19
Hahn decomposition, 149
HardyLittlewood maximal function, 146
Hausdor space, 9
heat equation, 3
HeineBorel theorem, 15
Hilbert space, 24
dimension, 39
HilbertSchmidt
norm, 137
operator, 137
Holders inequality, 131
homeomorphism, 12
identity, 32, 79
index, 60
induced topology, 9
injective, 11
inner product, 24
inner product space, 24
integrable, 113
Riemann, 123
integral, 111
integration by parts, 155
integration by substitution, 118
interior, 10
interior point, 7
involution, 84
isolated point, 8
Jensens inequality, 130
kernel, 30
Kuratowski closure axioms, 10
Lebesgue
decomposition, 141
measure, 100
point, 144
lemma
Riemann-Lebesgue, 167
limit point, 8
Lindelof theorem, 12
linear
functional, 32, 41
operator, 30
linearly independent, 20
lower semicontinuous, 108
maximum norm, 17
measurable
function, 106
set, 96
space, 96
measure, 96
absolutely continuous, 139
complete, 105
complex, 147
nite, 96
Lebesgue, 100
minimal support, 146
mutually singular, 139
product, 120
space, 96
support, 100
metric space, 7
Minkowskis inequality, 131
mollier, 134
monotone convergence theorem, 111
multi-index, 163
order, 163
neighborhood, 8
Neumann series, 82
Noether operator, 60
norm, 17
HilbertSchmidt, 137
operator, 30
normal, 16, 84
normalized, 25
normed space, 17
nowhere dense, 61
null space, 30
onto, 11
open
ball, 7
Index 179
function, 12
set, 8
operator
adjoint, 42
bounded, 30
closeable, 65
closed, 64
closure, 65
compact, 47
domain, 30
nite rank, 56
HilbertSchmidt, 137
linear, 30
nonnegative, 43
self-adjoint, 49
strong convergence, 76
symmetric, 49
unitary, 39
weak convergence, 76
order
partial, 171
total, 171
orthogonal, 25
orthogonal complement, 40
orthogonal projection, 41
orthogonal projections, 88
orthogonal sum, 44
outer measure, 103
parallel, 25
parallelogram law, 26
Parsevals identity, 166
partial order, 171
partition, 122
partition of unity, 16
perpendicular, 25
polarization identity, 26
premeasure, 97
product measure, 120
product rule, 156
product topology, 12
projection valued measure, 88
pseudo-metric, 7
Pythagorean theorem, 25
quadrangle inequality, 17
quotient space, 33
Radon measure, 110
RadonNikodym
derivative, 141
theorem, 141
range, 30
rank, 56
reexive, 70
relative topology, 9
resolution of the identity, 88
resolvent, 53, 81
resolvent set, 81
Riemann integrable, 123
Riemann integral, 123
lower, 123
upper, 123
Riesz lemma, 41
scalar product, 24
Schauder basis, 20
Schur criterion, 136
second countable, 9
self-adjoint, 84
seminorm, 18
separable, 10, 21
separation of variables, 4
series
absolutely convergent, 23
sesquilinear form, 24
bounded, 29
parallelogram law, 28
polarization identity, 28
-algebra, 95
-nite, 96
signed measure, 149
simple function, 110
singular values, 55
Sobolev space, 166
span, 20
spectral measure, 86
spectral projections, 87
spectral radius, 82
spectrum, 81
spherically symmetric, 168
-subalgebra, 84
StoneWeierstra theorem, 90
strong convergence, 76
SturmLiouville problem, 6
subcover, 12
support, 12
surjective, 11
Tchebychevev inequality, 147, 152
tensor product, 45
Theorem
RieszFischer, 132
theorem
Arzel`aAscoli, 48
B.L.T., 31
Bair, 61
BanachSteinhaus, 62
BolzanoWeierstra, 15
closed graph, 64
dominated convergence, 114
Fubini, 121
fundamental thm. of calculus, 155
HahnBanach, 68
180 Index
HeineBorel, 15
HellingerToeplitz, 66
Jordan, 153
Jordanvon Neumann, 26
LaxMilgram, 44
Lebesgue, 124
Lebesgue decomposition, 141
Lindelof, 12
monotone convergence, 111
open mapping, 63
Pythagorean, 25
RadonNikodym, 141
Riesz representation, 160
Schur, 136
StoneWeierstra, 90
Urysohn, 16
Weierstra, 15, 22
Wiener, 169
Zorn, 171
topological space, 8
topology
base, 9
product, 12
total, 21
total order, 171
total variation, 147, 152
triangel inequality, 7, 18
inverse, 7, 18
trivial topology, 8
uniform boundedness principle, 62
uniformly convex space, 28
unit vector, 25
unitary, 84
upper semicontinuous, 108
Urysohn lemma, 16
vague convergence
measures, 161
Vandermonde determinant, 23
variation, 152
Vitali set, 102
wave equation, 5
weak
derivative, 167
weak convergence, 73
weak topology, 73
weak- convergence, 77
weak- topology, 78
Weierstra approximation, 22
Weierstra theorem, 15
Wiener covering lemma, 142
Young inequality, 135
Youngs inequality, 167
Zorns lemma, 171