$$x'y = \sum_{i=1}^n x_i y_i.$$
Any two vectors $x, y \in \Re^n$ satisfying $x'y = 0$ are called orthogonal.
If $x$ is a vector in $\Re^n$, the notations $x > 0$ and $x \ge 0$ indicate that all coordinates of $x$ are positive and nonnegative, respectively. For any two vectors $x$ and $y$, the notation $x > y$ means that $x - y > 0$. The notations $x \ge y$, $x < y$, etc., are to be interpreted accordingly.
If $X$ is a set and $\lambda$ is a scalar, we denote by $\lambda X$ the set $\{\lambda x \mid x \in X\}$.
If $X_1$ and $X_2$ are two subsets of $\Re^n$, we denote by $X_1 + X_2$ the vector sum
$$\{x_1 + x_2 \mid x_1 \in X_1,\ x_2 \in X_2\}.$$
6 Convex Analysis and Optimization Chap. 1
We use a similar notation for the sum of any finite number of subsets. In the case where one of the subsets consists of a single vector $\overline{x}$, we simplify this notation as follows:
$$\overline{x} + X = \{\overline{x} + x \mid x \in X\}.$$
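The set operations above can be sketched numerically on finite subsets of $\Re^2$; the helper names below (`scale`, `vector_sum`) are illustrative, not part of the text:

```python
# Sketch of the set operations above on finite subsets of R^2,
# represented as Python sets of coordinate tuples (names are illustrative).

def scale(lam, X):
    """lam * X = {lam * x | x in X}."""
    return {tuple(lam * xi for xi in x) for x in X}

def vector_sum(X1, X2):
    """X1 + X2 = {x1 + x2 | x1 in X1, x2 in X2}."""
    return {tuple(a + b for a, b in zip(x1, x2)) for x1 in X1 for x2 in X2}

X1 = {(0, 0), (1, 0)}
X2 = {(0, 0), (0, 1)}

print(scale(2, X1))               # {(0, 0), (2, 0)}
print(vector_sum(X1, X2))         # all four corner points of the unit square
print(vector_sum({(5, 5)}, X1))   # translation: a single vector plus a set
```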
Given sets $X_i \subset \Re^{n_i}$, $i = 1, \ldots, m$, the Cartesian product of the $X_i$, denoted by $X_1 \times \cdots \times X_m$, is the subset
$$\big\{(x_1, \ldots, x_m) \mid x_i \in X_i,\ i = 1, \ldots, m\big\}$$
of $\Re^{n_1 + \cdots + n_m}$.
Subspaces and Linear Independence
A subset $S$ of $\Re^n$ is called a subspace if $ax + by \in S$ for every $x, y \in S$ and every $a, b \in \Re$. An affine set in $\Re^n$ is a translated subspace, i.e., a set of the form $\overline{x} + S = \{\overline{x} + x \mid x \in S\}$, where $\overline{x}$ is a vector in $\Re^n$ and $S$ is a subspace of $\Re^n$. The span of a finite collection $\{x_1, \ldots, x_m\}$ of elements of $\Re^n$ (also called the subspace generated by the collection) is the subspace consisting of all vectors $y$ of the form $y = \sum_{k=1}^m a_k x_k$, where each $a_k$ is a scalar.
The vectors $x_1, \ldots, x_m \in \Re^n$ are called linearly independent if there exists no set of scalars $a_1, \ldots, a_m$ such that $\sum_{k=1}^m a_k x_k = 0$, unless $a_k = 0$ for each $k$. An equivalent definition is that $x_1 \ne 0$, and for every $k > 1$, the vector $x_k$ does not belong to the span of $x_1, \ldots, x_{k-1}$.
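The definition can be checked numerically: stacking the vectors as the columns of a matrix, linear independence of $x_1, \ldots, x_m$ is equivalent to that matrix having rank $m$. A small numpy sketch (the helper name is ours):

```python
import numpy as np

# x_1, ..., x_m are linearly independent exactly when the matrix having
# them as columns has rank m.
def linearly_independent(vectors):
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

x1 = np.array([1.0, 0.0, 0.0])
x2 = np.array([1.0, 1.0, 0.0])
print(linearly_independent([x1, x2]))           # True
print(linearly_independent([x1, x2, x1 + x2]))  # False: third vector is in the span
```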
If $S$ is a subspace of $\Re^n$ containing at least one nonzero vector, a basis for $S$ is a collection of vectors that are linearly independent and whose span is equal to $S$. Every basis of a given subspace has the same number of vectors. This number is called the dimension of $S$. By convention, the subspace $\{0\}$ is said to have dimension zero. The dimension of an affine set $\overline{x} + S$ is the dimension of the corresponding subspace $S$. Every subspace of nonzero dimension has an orthogonal basis, i.e., a basis consisting of mutually orthogonal vectors.
Given any set $X$, the set of vectors that are orthogonal to all elements of $X$ is a subspace denoted by $X^\perp$:
$$X^\perp = \{y \mid y'x = 0,\ \forall\, x \in X\}.$$
If $S$ is a subspace, $S^\perp$ is called the orthogonal complement of $S$, and it can be shown that $(S^\perp)^\perp = S$.
If $X$ is a subset of $\Re^n$ and $A$ is an $m \times n$ matrix, then the image of $X$ under $A$ is denoted by $AX$ (or $A \cdot X$ if this enhances notational clarity):
$$AX = \{Ax \mid x \in X\}.$$
If $X$ is a subspace, then $AX$ is also a subspace.

Let $A$ be a square matrix. We say that $A$ is symmetric if $A' = A$. We say that $A$ is diagonal if $[A]_{ij} = 0$ whenever $i \ne j$. We use $I$ to denote the identity matrix. The determinant of $A$ is denoted by $\det(A)$.
Let $A$ be an $m \times n$ matrix. The range space of $A$, denoted by $R(A)$, is the set of all vectors $y \in \Re^m$ such that $y = Ax$ for some $x \in \Re^n$. The null space of $A$, denoted by $N(A)$, is the set of all vectors $x \in \Re^n$ such that $Ax = 0$. It is seen that the range space and the null space of $A$ are subspaces. The rank of $A$ is the dimension of the range space of $A$. The rank of $A$ is equal to the maximal number of linearly independent columns of $A$, and is also equal to the maximal number of linearly independent rows of $A$. The matrix $A$ and its transpose $A'$ have the same rank. Furthermore, the null space of $A$ is the orthogonal complement of the range space of $A'$: $N(A) = R(A')^\perp$.

Another way to state this result is that given vectors $a_1, \ldots, a_n \in \Re^m$ (the columns of $A$) and a vector $x \in \Re^m$, we have $x'y = 0$ for every $y$ satisfying $a_i'y = 0$ for all $i$, if and only if $x = \xi_1 a_1 + \cdots + \xi_n a_n$ for some scalars $\xi_1, \ldots, \xi_n$. This is a special case of Farkas' Lemma, an important result for constrained optimization, which will be discussed later in Section 1.6.
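The orthogonality relation between $N(A)$ and $R(A')$ described above can be observed numerically; the following numpy sketch (our own illustration, not part of the text) draws a random vector from each subspace and checks that they are orthogonal:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))

# A and A' have the same rank.
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T)

# An orthonormal basis of N(A) via the SVD: the rows of Vt beyond rank(A).
U, s, Vt = np.linalg.svd(A)
r = np.linalg.matrix_rank(A)
null_basis = Vt[r:].T                  # columns span N(A)

# A random vector of N(A) is orthogonal to a random vector of R(A').
x = null_basis @ rng.standard_normal(null_basis.shape[1])   # x in N(A)
y = A.T @ rng.standard_normal(3)                            # y in R(A')
print(np.allclose(A @ x, 0))           # True: x is in the null space
print(abs(x @ y) < 1e-10)              # True: N(A) is orthogonal to R(A')
```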
A useful application of this result is that if $S_1$ and $S_2$ are two subspaces of $\Re^n$, then
$$S_1^\perp + S_2^\perp = (S_1 \cap S_2)^\perp.$$
This follows by introducing matrices $B_1$ and $B_2$ such that $S_1 = \{x \mid B_1 x = 0\} = N(B_1)$ and $S_2 = \{x \mid B_2 x = 0\} = N(B_2)$, and writing
$$S_1^\perp + S_2^\perp = R\big([\,B_1' \ \ B_2'\,]\big) = N\left(\begin{bmatrix} B_1 \\ B_2 \end{bmatrix}\right)^{\perp} = \big(N(B_1) \cap N(B_2)\big)^\perp = (S_1 \cap S_2)^\perp.$$
The Euclidean norm of a vector $x \in \Re^n$ is defined by
$$\|x\| = (x'x)^{1/2} = \left(\sum_{i=1}^n |x_i|^2\right)^{1/2}.$$
The space $\Re^n$, equipped with this norm, is called a Euclidean space. We will use the Euclidean norm almost exclusively in this book. In particular, in the absence of a clear indication to the contrary, $\|\cdot\|$ will denote the Euclidean norm. Two important results for the Euclidean norm are:
Proposition 1.1.1: (Pythagorean Theorem) For any two vectors $x$ and $y$ that are orthogonal, we have
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2.$$
Proposition 1.1.2: (Schwartz inequality) For any two vectors $x$ and $y$, we have
$$|x'y| \le \|x\| \cdot \|y\|,$$
with equality holding if and only if $x = \alpha y$ for some scalar $\alpha$.
Sec. 1.1 Linear Algebra and Analysis 9
Two other important norms are the maximum norm $\|\cdot\|_\infty$ (also called sup-norm or $\infty$-norm), defined by
$$\|x\|_\infty = \max_i |x_i|,$$
and the 1-norm $\|\cdot\|_1$, defined by
$$\|x\|_1 = \sum_{i=1}^n |x_i|.$$
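As a quick numerical illustration (ours, not the text's), the three norms agree with numpy's built-in `np.linalg.norm` for the corresponding `ord` values:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

# Euclidean norm, maximum (sup) norm, and 1-norm, as defined above.
euclid = np.sqrt(np.sum(np.abs(x) ** 2))   # (sum_i |x_i|^2)^{1/2}
maxn   = np.max(np.abs(x))                 # max_i |x_i|
one    = np.sum(np.abs(x))                 # sum_i |x_i|

print(euclid, maxn, one)                   # 5.0 4.0 7.0
assert euclid == np.linalg.norm(x)             # default ord: Euclidean
assert maxn == np.linalg.norm(x, ord=np.inf)   # sup-norm
assert one == np.linalg.norm(x, ord=1)         # 1-norm
```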
Sequences
We use both subscripts and superscripts in sequence notation. Generally,
we prefer subscripts, but we use superscripts whenever we need to reserve
the subscript notation for indexing coordinates or components of vectors
and functions. The meaning of the subscripts and superscripts should be
clear from the context in which they are used.
A sequence $\{x_k \mid k = 1, 2, \ldots\}$ (or $\{x_k\}$ for short) of scalars is said to converge if there exists a scalar $x$ such that for every $\epsilon > 0$ we have $|x_k - x| < \epsilon$ for every $k$ greater than some integer $K$ (depending on $\epsilon$). We call the scalar $x$ the limit of $\{x_k\}$, and we also say that $\{x_k\}$ converges to $x$; symbolically, $x_k \to x$ or $\lim_{k\to\infty} x_k = x$. If for every scalar $b$ there exists some $K$ (depending on $b$) such that $x_k \ge b$ for all $k \ge K$, we write $x_k \to \infty$ and $\lim_{k\to\infty} x_k = \infty$. Similarly, if for every scalar $b$ there exists some $K$ such that $x_k \le b$ for all $k \ge K$, we write $x_k \to -\infty$ and $\lim_{k\to\infty} x_k = -\infty$.
A sequence $\{x_k\}$ is called a Cauchy sequence if for every $\epsilon > 0$, there exists some $K$ (depending on $\epsilon$) such that $|x_k - x_m| < \epsilon$ for all $k \ge K$ and $m \ge K$.
A sequence $\{x_k\}$ is said to be bounded above (respectively, below) if there exists some scalar $b$ such that $x_k \le b$ (respectively, $x_k \ge b$) for all $k$. It is said to be bounded if it is bounded above and bounded below. The sequence $\{x_k\}$ is said to be monotonically nonincreasing (respectively, nondecreasing) if $x_{k+1} \le x_k$ (respectively, $x_{k+1} \ge x_k$) for all $k$. If $x_k$ converges to $x$ and is nonincreasing (nondecreasing), we also use the notation $x_k \downarrow x$ ($x_k \uparrow x$, respectively).
Proposition 1.1.3: Every bounded and monotonically nonincreasing
or nondecreasing scalar sequence converges.
Note that a monotonically nondecreasing sequence $\{x_k\}$ is either bounded, in which case it converges to some scalar $x$ by the above proposition, or else it is unbounded, in which case $x_k \to \infty$. Similarly, a monotonically nonincreasing sequence $\{x_k\}$ is either bounded and converges, or it is unbounded, in which case $x_k \to -\infty$.
The supremum of a nonempty set $X$ of scalars, denoted by $\sup X$, is defined as the smallest scalar $x$ such that $x \ge y$ for all $y \in X$. If no such scalar exists, we say that the supremum of $X$ is $\infty$. Similarly, the infimum of $X$, denoted by $\inf X$, is defined as the largest scalar $x$ such that $x \le y$ for all $y \in X$, and is equal to $-\infty$ if no such scalar exists. For the empty set, we use the convention
$$\sup \emptyset = -\infty, \qquad \inf \emptyset = \infty.$$
(This is somewhat paradoxical, since we have that the sup of a set is less than its inf, but it works well for our analysis.) If $\sup X$ is equal to a scalar $x$ that belongs to the set $X$, we say that $x$ is the maximum point of $X$ and we often write
$$x = \sup X = \max X.$$
Similarly, if $\inf X$ is equal to a scalar $x$ that belongs to the set $X$, we often write
$$x = \inf X = \min X.$$
Thus, when we write $\max X$ (or $\min X$) in place of $\sup X$ (or $\inf X$, respectively) we do so just for emphasis: we indicate that it is either evident, or it is known through earlier analysis, or it is about to be shown that the maximum (or minimum, respectively) of the set $X$ is attained at one of its points.
Given a scalar sequence $\{x_k\}$, the supremum of the sequence, denoted by $\sup_k x_k$, is defined as $\sup\{x_k \mid k = 1, 2, \ldots\}$. The infimum of a sequence is similarly defined. Given a sequence $\{x_k\}$, let $y_m = \sup\{x_k \mid k \ge m\}$ and $z_m = \inf\{x_k \mid k \ge m\}$. The sequences $\{y_m\}$ and $\{z_m\}$ are nonincreasing and nondecreasing, respectively, and therefore have a limit whenever $\{x_k\}$ is bounded above or is bounded below, respectively.
Given a vector $\overline{x} \in \Re^n$ and a scalar $\epsilon > 0$, consider the sets
$$\big\{x \mid \|x - \overline{x}\| < \epsilon\big\}, \qquad \big\{x \mid \|x - \overline{x}\| \le \epsilon\big\}.$$
The first set is open and is called an open sphere centered at $\overline{x}$, while the second set is closed and is called a closed sphere centered at $\overline{x}$. Sometimes the terms open ball and closed ball are used, respectively.
Proposition 1.1.6:
(a) The union of finitely many closed sets is closed.
(b) The intersection of closed sets is closed.
(c) The union of open sets is open.
(d) The intersection of finitely many open sets is open.
(e) A set is open if and only if all of its elements are interior points.
(f) Every subspace of $\Re^n$ is closed.
(g) A set $X$ is compact if and only if every sequence of elements of $X$ has a subsequence that converges to an element of $X$.
(h) If $\{X_k\}$ is a sequence of nonempty and compact sets such that $X_k \supset X_{k+1}$ for all $k$, then the intersection $\cap_{k=0}^\infty X_k$ is nonempty and compact.
The topological properties of subsets of $\Re^n$, such as being open, closed, or compact, do not depend on the norm being used. This is a consequence of the following proposition, referred to as the norm equivalence property in $\Re^n$, which shows that if a sequence converges with respect to one norm, it converges with respect to all other norms.
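The norm equivalence property can be observed concretely for the Euclidean and maximum norms, for which explicit constants are known: $\|x\|_\infty \le \|x\| \le \sqrt{n}\,\|x\|_\infty$ for all $x \in \Re^n$. A small numpy check (our illustration):

```python
import numpy as np

# Norm equivalence on R^n, illustrated for the Euclidean and max norms:
# ||x||_inf <= ||x||_2 <= sqrt(n) * ||x||_inf  for every x in R^n.
rng = np.random.default_rng(1)
n = 10
for _ in range(1000):
    x = rng.standard_normal(n)
    two = np.linalg.norm(x)
    inf = np.linalg.norm(x, ord=np.inf)
    assert inf <= two <= np.sqrt(n) * inf + 1e-12
print("norm equivalence verified on 1000 random vectors")
```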
Proposition 1.1.7: For any two norms $\|\cdot\|$ and $\|\cdot\|'$ on $\Re^n$, there exists a scalar $c$ such that
$$\|x\| \le c\,\|x\|', \qquad \forall\, x \in \Re^n.$$
Given a sequence $\{X_k\}$ of subsets of $\Re^n$, the outer limit of $\{X_k\}$, denoted $\limsup_{k\to\infty} X_k$, is the set of all $x \in \Re^n$ that are limits of subsequences $\{x_k\}_{k \in \mathcal{K}}$, where $\mathcal{K}$ is an infinite subset of positive integers, such that $x_k \in X_k$ for all $k \in \mathcal{K}$.
The inner limit of $\{X_k\}$, denoted $\liminf_{k\to\infty} X_k$, is the set of all $x \in \Re^n$ such that every neighborhood of $x$ has a nonempty intersection with all except finitely many of the sets $X_k$, $k = 1, 2, \ldots$. Equivalently, $\liminf_{k\to\infty} X_k$ is the set of all limits of sequences $\{x_k\}$ such that $x_k \in X_k$ for all $k = 1, 2, \ldots$.
The sequence $\{X_k\}$ is said to converge to a set $X$ if
$$X = \liminf_{k\to\infty} X_k = \limsup_{k\to\infty} X_k,$$
in which case $X$ is said to be the limit of $\{X_k\}$. The inner and outer limits are closed (possibly empty) sets. If each set $X_k$ consists of a single point $x_k$, $\limsup_{k\to\infty} X_k$ is the set of limit points of $\{x_k\}$, while $\liminf_{k\to\infty} X_k$ is just the limit of $\{x_k\}$ if $\{x_k\}$ converges, and otherwise it is empty.
Continuity
Let $f : X \to \Re^m$ be a function, where $X$ is a subset of $\Re^n$, and let $x$ be a point in $X$. If there exists a vector $y \in \Re^m$ such that the sequence $\{f(x_k)\}$ converges to $y$ for every sequence $\{x_k\} \subset X$ such that $\lim_{k\to\infty} x_k = x$, we write $\lim_{z\to x} f(z) = y$. If there exists a vector $y \in \Re^m$ such that the sequence $\{f(x_k)\}$ converges to $y$ for every sequence $\{x_k\} \subset X$ such that $\lim_{k\to\infty} x_k = x$ and $x_k \le x$ (respectively, $x_k \ge x$) for all $k$, we write $\lim_{z\uparrow x} f(z) = y$ [respectively, $\lim_{z\downarrow x} f(z) = y$].
Definition 1.1.5: Let $X$ be a subset of $\Re^n$.
(a) A function $f : X \to \Re^m$ is called continuous at a point $x \in X$ if $\lim_{z\to x} f(z) = f(x)$.
(b) A function $f : X \to \Re^m$ is called right-continuous (respectively, left-continuous) at a point $x \in X$ if $\lim_{z\downarrow x} f(z) = f(x)$ [respectively, $\lim_{z\uparrow x} f(z) = f(x)$].
(c) A real-valued function $f : X \to \Re$ is called upper semicontinuous (respectively, lower semicontinuous) at a point $x \in X$ if $f(x) \ge \limsup_{k\to\infty} f(x_k)$ [respectively, $f(x) \le \liminf_{k\to\infty} f(x_k)$] for every sequence $\{x_k\}$ of elements of $X$ converging to $x$.
If $f : X \to \Re^m$ is continuous at every point of a subset of its domain $X$, we say that $f$ is continuous over that subset. If $f : X \to \Re^m$ is continuous at every point of its domain $X$, we say that $f$ is continuous. We use similar terminology for right-continuous, left-continuous, upper semicontinuous, and lower semicontinuous functions.
Proposition 1.1.9:
(a) The composition of two continuous functions is continuous.
(b) Any vector norm on $\Re^n$ is a continuous function.
(c) Let $f : \Re^n \to \Re^m$ be continuous, and let $Y \subset \Re^m$ be open (respectively, closed). Then the inverse image of $Y$, $\big\{x \in \Re^n \mid f(x) \in Y\big\}$, is open (respectively, closed).
(d) Let $f : \Re^n \to \Re^m$ be continuous, and let $X \subset \Re^n$ be compact. Then the forward image of $X$, $\big\{f(x) \mid x \in X\big\}$, is compact.
Matrix Norms
A norm $\|\cdot\|$ on the set of $n \times n$ matrices is a real-valued function that has the same properties as vector norms do when the matrix is viewed as an element of $\Re^{n^2}$. The norm of an $n \times n$ matrix $A$ is denoted by $\|A\|$.

We are mainly interested in induced norms, which are constructed as follows. Given any vector norm $\|\cdot\|$, the corresponding induced matrix norm, also denoted by $\|\cdot\|$, is defined by
$$\|A\| = \sup_{x \in \Re^n,\ \|x\| = 1} \|Ax\|.$$
It is easily verified that for any vector norm, the above equation defines a bona fide matrix norm having all the required properties.

Note that by the Schwartz inequality (Prop. 1.1.2), we have
$$\|A\| = \sup_{\|x\|=1} \|Ax\| = \sup_{\|y\|=\|x\|=1} |y'Ax|.$$
By reversing the roles of $x$ and $y$ in the above relation and by using the equality $y'Ax = x'A'y$, it follows that $\|A\| = \|A'\|$.
1.1.3 Square Matrices
Definition 1.1.6: A square matrix $A$ is called singular if its determinant is zero. Otherwise it is called nonsingular or invertible.
Proposition 1.1.10:
(a) Let $A$ be an $n \times n$ matrix. The following are equivalent:
(i) The matrix $A$ is nonsingular.
(ii) The matrix $A'$ is nonsingular.
(iii) For every nonzero $x \in \Re^n$, we have $Ax \ne 0$.
(iv) For every $y \in \Re^n$, there is a unique $x \in \Re^n$ such that $Ax = y$.
(v) There is an $n \times n$ matrix $B$ such that $AB = I = BA$.
(vi) The columns of $A$ are linearly independent.
(vii) The rows of $A$ are linearly independent.
(b) Assuming that $A$ is nonsingular, the matrix $B$ of statement (v) (called the inverse of $A$ and denoted by $A^{-1}$) is unique.
(c) For any two square invertible matrices $A$ and $B$ of the same dimensions, we have $(AB)^{-1} = B^{-1}A^{-1}$.
Definition 1.1.7: The characteristic polynomial $\phi$ of an $n \times n$ matrix $A$ is defined by $\phi(\lambda) = \det(\lambda I - A)$, where $I$ is the identity matrix of the same size as $A$. The $n$ (possibly repeated or complex) roots of $\phi$ are called the eigenvalues of $A$. A nonzero vector $x$ (with possibly complex coordinates) such that $Ax = \lambda x$, where $\lambda$ is an eigenvalue of $A$, is called an eigenvector of $A$ associated with $\lambda$.
Proposition 1.1.11: Let $A$ be a square matrix.
(a) A complex number $\lambda$ is an eigenvalue of $A$ if and only if there exists a nonzero eigenvector associated with $\lambda$.
(b) $A$ is singular if and only if it has an eigenvalue that is equal to zero.
Note that the only use of complex numbers in this book is in relation
to eigenvalues and eigenvectors. All other matrices or vectors are implicitly
assumed to have real components.
Proposition 1.1.12: Let $A$ be an $n \times n$ matrix.
(a) If $T$ is a nonsingular matrix and $B = TAT^{-1}$, then the eigenvalues of $A$ and $B$ coincide.
(b) For any scalar $c$, the eigenvalues of $cI + A$ are equal to $c + \lambda_1, \ldots, c + \lambda_n$, where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$.
(c) The eigenvalues of $A^k$ are equal to $\lambda_1^k, \ldots, \lambda_n^k$, where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$.
(d) If $A$ is nonsingular, then the eigenvalues of $A^{-1}$ are the reciprocals of the eigenvalues of $A$.
(e) The eigenvalues of $A$ and $A'$ coincide.
Symmetric and Positive Definite Matrices
Symmetric matrices have several special properties, particularly regarding their eigenvalues and eigenvectors. In what follows in this section, $\|\cdot\|$ denotes the Euclidean norm.
Proposition 1.1.13: Let $A$ be a symmetric $n \times n$ matrix. Then:
(a) The eigenvalues of $A$ are real.
(b) The matrix $A$ has a set of $n$ mutually orthogonal, real, and nonzero eigenvectors $x_1, \ldots, x_n$.
(c) Suppose that the eigenvectors in part (b) have been normalized so that $\|x_i\| = 1$ for each $i$. Then
$$A = \sum_{i=1}^n \lambda_i x_i x_i',$$
where $\lambda_i$ is the eigenvalue corresponding to $x_i$.
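Parts (b) and (c) of the proposition can be verified numerically with numpy's `eigh`, which returns real eigenvalues and orthonormal eigenvectors for a symmetric input (the check below is our illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                       # a symmetric matrix

# eigh: real eigenvalues, orthonormal eigenvectors (as columns of X).
lam, X = np.linalg.eigh(A)
assert np.allclose(X.T @ X, np.eye(4))  # mutually orthogonal, unit norm

# A = sum_i lam_i x_i x_i'  (spectral decomposition of Prop. 1.1.13(c))
A_rebuilt = sum(lam[i] * np.outer(X[:, i], X[:, i]) for i in range(4))
print(np.allclose(A, A_rebuilt))        # True
```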
Proposition 1.1.14: Let $A$ be a symmetric $n \times n$ matrix, and let $\lambda_1 \le \cdots \le \lambda_n$ be its (real) eigenvalues. Then:
(a) $\|A\| = \max\big\{|\lambda_1|, |\lambda_n|\big\}$, where $\|\cdot\|$ is the matrix norm induced by the Euclidean norm.
(b) $\lambda_1 \|y\|^2 \le y'Ay \le \lambda_n \|y\|^2$ for all $y \in \Re^n$.
(c) If $A$ is nonsingular, then
$$\|A^{-1}\| = \frac{1}{\min\big\{|\lambda_1|, |\lambda_n|\big\}}.$$
Proposition 1.1.15: Let $A$ be a square matrix, and let $\|\cdot\|$ be the matrix norm induced by the Euclidean norm. Then:
(a) If $A$ is symmetric, then $\|A^k\| = \|A\|^k$ for any positive integer $k$.
(b) $\|A\|^2 = \|A'A\| = \|AA'\|$.
Definition 1.1.8: A symmetric $n \times n$ matrix $A$ is called positive definite if $x'Ax > 0$ for all $x \ne 0$. It is called positive semidefinite if $x'Ax \ge 0$ for all $x \in \Re^n$.

Proposition 1.1.16: If $A$ is a symmetric positive semidefinite $n \times n$ matrix and $T$ is an $m \times n$ matrix, then $TAT'$ is positive semidefinite. If $A$ is positive definite and $T$ is invertible, then $TAT'$ is positive definite.
Proposition 1.1.17: For any $m \times n$ matrix $A$, the matrix $A'A$ is symmetric and positive semidefinite.

Proposition 1.1.18: Let $A$ be a symmetric positive semidefinite $n \times n$ matrix, and let $C$ be a symmetric matrix such that $C^2 = A$. Then:
(a) $A$ and $C$ have the same null space: $N(A) = N(C)$.
(b) $A$ and $C$ have the same range space: $R(A) = R(C)$.
Proposition 1.1.19: Let $A$ be a square symmetric positive semidefinite matrix.
(a) There exists a symmetric matrix $Q$ with the property $Q^2 = A$. Such a matrix is called a symmetric square root of $A$ and is denoted by $A^{1/2}$.
(b) There is a unique symmetric square root if and only if $A$ is positive definite.
(c) A symmetric square root $A^{1/2}$ is invertible if and only if $A$ is invertible. Its inverse is denoted by $A^{-1/2}$.
(d) There holds $A^{-1/2}A^{-1/2} = A^{-1}$.
(e) There holds $AA^{1/2} = A^{1/2}A$.
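A symmetric square root can be computed from the spectral decomposition of Prop. 1.1.13, $A^{1/2} = \sum_i \sqrt{\lambda_i}\, x_i x_i'$, and the identities of the proposition checked numerically (our illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4))
A = B @ B.T + np.eye(4)                # symmetric positive definite

# Symmetric square root via the spectral decomposition:
# A^{1/2} = sum_i sqrt(lam_i) x_i x_i'.
lam, X = np.linalg.eigh(A)
sqrtA = X @ np.diag(np.sqrt(lam)) @ X.T

assert np.allclose(sqrtA @ sqrtA, A)       # (A^{1/2})^2 = A
assert np.allclose(sqrtA, sqrtA.T)         # symmetric
inv_sqrtA = np.linalg.inv(sqrtA)
assert np.allclose(inv_sqrtA @ inv_sqrtA, np.linalg.inv(A))  # A^{-1/2} A^{-1/2} = A^{-1}
assert np.allclose(A @ sqrtA, sqrtA @ A)   # A A^{1/2} = A^{1/2} A
print("square-root identities verified")
```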
1.1.4 Derivatives
Let $f : \Re^n \to \Re$ be some function, fix some $x \in \Re^n$, and consider the expression
$$\lim_{\alpha \to 0} \frac{f(x + \alpha e_i) - f(x)}{\alpha},$$
where $e_i$ is the $i$th unit vector (all components are 0 except for the $i$th component, which is 1). If the above limit exists, it is called the $i$th partial derivative of $f$ at the point $x$ and it is denoted by $(\partial f / \partial x_i)(x)$ or $\partial f(x) / \partial x_i$ ($x_i$ in this section will denote the $i$th coordinate of the vector $x$). Assuming all of these partial derivatives exist, the gradient of $f$ at $x$ is defined as the column vector
$$\nabla f(x) = \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_1} \\ \vdots \\ \dfrac{\partial f(x)}{\partial x_n} \end{bmatrix}.$$
For any $y \in \Re^n$, we define the one-sided directional derivative of $f$ in the direction $y$ to be
$$f'(x; y) = \lim_{\alpha \downarrow 0} \frac{f(x + \alpha y) - f(x)}{\alpha},$$
provided that the limit exists. We note from the definitions that the partial derivative $(\partial f / \partial x_i)(x)$ exists if and only if the one-sided derivatives $f'(x; e_i)$ and $f'(x; -e_i)$ exist and satisfy $f'(x; e_i) = -f'(x; -e_i)$. The function $f$ is called differentiable over an open set $U$ if the gradient $\nabla f(x)$ exists for every $x \in U$ and satisfies
$$\lim_{y \to 0} \frac{f(x + y) - f(x) - y'\nabla f(x)}{\|y\|} = 0, \qquad \forall\, x \in U, \qquad (1.1)$$
where $\|\cdot\|$ is an arbitrary vector norm. If $f$ is continuously differentiable over $\Re^n$, then $f$ is also called a smooth function.
The preceding equation can also be used as an alternative definition of differentiability. In particular, $f$ is called Frechet differentiable at $x$
if there exists a vector $g$ satisfying Eq. (1.1) with $\nabla f(x)$ replaced by $g$. If such a vector $g$ exists, it can be seen that all the partial derivatives $(\partial f / \partial x_i)(x)$ exist and that $g = \nabla f(x)$. Frechet differentiability implies (Gateaux) differentiability but not conversely (see for example [OrR70] for a detailed discussion). In this book, when dealing with a differentiable function $f$, we will always assume that $f$ is continuously differentiable (smooth) over a given open set [$\nabla f(x)$ is a continuous function of $x$ over that set], in which case $f$ is both Gateaux and Frechet differentiable, and the distinctions made above are of no consequence.
The definitions of differentiability of $f$ at a point $x$ only involve the values of $f$ in a neighborhood of $x$. Thus, these definitions can be used for functions $f$ that are not defined on all of $\Re^n$, but are defined instead in a neighborhood of the point at which the derivative is computed. In particular, for functions $f : X \to \Re$ that are defined over a strict subset $X$ of $\Re^n$, we use the above definition of differentiability of $f$ at a vector $x$ provided $x$ is an interior point of the domain $X$. Similarly, we use the above definition of differentiability or continuous differentiability of $f$ over a subset $U$, provided $U$ is an open subset of the domain $X$. Thus any mention of differentiability of a function $f$ over a subset implicitly assumes that this subset is open.
If $f : \Re^n \to \Re^m$ is a vector-valued function, it is called differentiable (or smooth) if each component $f_i$ of $f$ is differentiable (or smooth, respectively). The gradient matrix of $f$, denoted $\nabla f(x)$, is the $n \times m$ matrix whose $i$th column is the gradient $\nabla f_i(x)$ of $f_i$. Thus,
$$\nabla f(x) = \big[\, \nabla f_1(x) \ \cdots \ \nabla f_m(x) \,\big].$$
The transpose of $\nabla f$ is called the Jacobian of $f$ and is a matrix whose $ij$th entry is equal to the partial derivative $\partial f_i / \partial x_j$.
Now suppose that each one of the partial derivatives of a function $f : \Re^n \to \Re$ is a smooth function of $x$. We use the notation $(\partial^2 f / \partial x_i \partial x_j)(x)$ to indicate the $i$th partial derivative of $\partial f / \partial x_j$ at a point $x \in \Re^n$. The Hessian of $f$ is the matrix whose $ij$th entry is equal to $(\partial^2 f / \partial x_i \partial x_j)(x)$, and is denoted by $\nabla^2 f(x)$. We have $(\partial^2 f / \partial x_i \partial x_j)(x) = (\partial^2 f / \partial x_j \partial x_i)(x)$ for every $x$, which implies that $\nabla^2 f(x)$ is symmetric.
If $f : \Re^{m+n} \to \Re$ is a function of $(x, y)$, where $x = (x_1, \ldots, x_m) \in \Re^m$ and $y = (y_1, \ldots, y_n) \in \Re^n$, we write
$$\nabla_x f(x, y) = \begin{bmatrix} \dfrac{\partial f(x,y)}{\partial x_1} \\ \vdots \\ \dfrac{\partial f(x,y)}{\partial x_m} \end{bmatrix}, \qquad \nabla_y f(x, y) = \begin{bmatrix} \dfrac{\partial f(x,y)}{\partial y_1} \\ \vdots \\ \dfrac{\partial f(x,y)}{\partial y_n} \end{bmatrix}.$$
We denote by $\nabla^2_{xx} f(x, y)$, $\nabla^2_{xy} f(x, y)$, and $\nabla^2_{yy} f(x, y)$ the matrices with components
$$\big[\nabla^2_{xx} f(x, y)\big]_{ij} = \frac{\partial^2 f(x, y)}{\partial x_i \partial x_j}, \qquad \big[\nabla^2_{xy} f(x, y)\big]_{ij} = \frac{\partial^2 f(x, y)}{\partial x_i \partial y_j},$$
$$\big[\nabla^2_{yy} f(x, y)\big]_{ij} = \frac{\partial^2 f(x, y)}{\partial y_i \partial y_j}.$$
If $f : \Re^{m+n} \to \Re^r$, $f = (f_1, f_2, \ldots, f_r)$, we write
$$\nabla_x f(x, y) = \big[\, \nabla_x f_1(x, y) \ \cdots \ \nabla_x f_r(x, y) \,\big], \qquad \nabla_y f(x, y) = \big[\, \nabla_y f_1(x, y) \ \cdots \ \nabla_y f_r(x, y) \,\big].$$
Let $f : \Re^k \to \Re^m$ and $g : \Re^m \to \Re^n$ be smooth functions, and let $h$ be their composition, i.e.,
$$h(x) = g\big(f(x)\big).$$
Then, the chain rule for differentiation states that
$$\nabla h(x) = \nabla f(x)\, \nabla g\big(f(x)\big), \qquad \forall\, x \in \Re^k.$$
Some examples of useful relations that follow from the chain rule are:
$$\nabla\big(f(Ax)\big) = A'\nabla f(Ax), \qquad \nabla^2\big(f(Ax)\big) = A'\nabla^2 f(Ax)A,$$
where $A$ is a matrix, and
$$\nabla_x \Big( f\big(h(x), y\big) \Big) = \nabla h(x)\, \nabla_h f\big(h(x), y\big),$$
$$\nabla_x \Big( f\big(h(x), g(x)\big) \Big) = \nabla h(x)\, \nabla_h f\big(h(x), g(x)\big) + \nabla g(x)\, \nabla_g f\big(h(x), g(x)\big).$$
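The first of these relations, $\nabla\big(f(Ax)\big) = A'\nabla f(Ax)$, can be checked by finite differences; the function $f$ below is an arbitrary illustrative choice, not taken from the text:

```python
import numpy as np

# Finite-difference check of grad_x f(Ax) = A' (grad f)(Ax),
# for the illustrative choice f(y) = sum(y^3), so grad f(y) = 3 y^2.
A = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])   # a 3x2 matrix
f = lambda y: np.sum(y ** 3)
grad_f = lambda y: 3.0 * y ** 2

def num_grad(g, x, h=1e-6):
    """Central-difference gradient of a scalar function g at x."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (g(x + e) - g(x - e)) / (2 * h)
    return grad

x = np.array([0.5, -1.5])
g = lambda x: f(A @ x)                                # composition f(Ax)
print(np.allclose(num_grad(g, x), A.T @ grad_f(A @ x), atol=1e-5))  # True
```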
We now state some theorems relating to differentiable functions that will be useful for our purposes.
Proposition 1.1.20: (Mean Value Theorem) Let $f : \Re^n \to \Re$ be continuously differentiable over an open sphere $S$, and let $x$ be a vector in $S$. Then for all $y$ such that $x + y \in S$, there exists an $\alpha \in [0, 1]$ such that
$$f(x + y) = f(x) + \nabla f(x + \alpha y)' y.$$
Proposition 1.1.21: (Second Order Expansions) Let $f : \Re^n \to \Re$ be twice continuously differentiable over an open sphere $S$, and let $x$ be a vector in $S$. Then for all $y$ such that $x + y \in S$:
(a) There holds
$$f(x + y) = f(x) + y'\nabla f(x) + y'\left( \int_0^1 \left( \int_0^t \nabla^2 f(x + \tau y)\, d\tau \right) dt \right) y.$$
(b) There exists an $\alpha \in [0, 1]$ such that
$$f(x + y) = f(x) + y'\nabla f(x) + \tfrac{1}{2}\, y'\nabla^2 f(x + \alpha y)\, y.$$
(c) There holds
$$f(x + y) = f(x) + y'\nabla f(x) + \tfrac{1}{2}\, y'\nabla^2 f(x)\, y + o\big(\|y\|^2\big).$$
Proposition 1.1.22: (Implicit Function Theorem) Consider a function $f : \Re^{n+m} \to \Re^m$ of $x \in \Re^n$ and $y \in \Re^m$, and a pair $(\overline{x}, \overline{y})$ such that:
(1) $f(\overline{x}, \overline{y}) = 0$.
(2) $f$ is continuous, and has a continuous and nonsingular gradient matrix $\nabla_y f(x, y)$ in an open set containing $(\overline{x}, \overline{y})$.
Then there exist open sets $S_{\overline{x}} \subset \Re^n$ and $S_{\overline{y}} \subset \Re^m$ containing $\overline{x}$ and $\overline{y}$, respectively, and a continuous function $\phi : S_{\overline{x}} \to S_{\overline{y}}$ such that $\overline{y} = \phi(\overline{x})$ and $f\big(x, \phi(x)\big) = 0$ for all $x \in S_{\overline{x}}$. The function $\phi$ is unique in the sense that if $x \in S_{\overline{x}}$, $y \in S_{\overline{y}}$, and $f(x, y) = 0$, then $y = \phi(x)$. Furthermore, if for some integer $p > 0$, $f$ is $p$ times continuously differentiable, the same is true for $\phi$, and we have
$$\nabla \phi(x) = -\nabla_x f\big(x, \phi(x)\big) \Big( \nabla_y f\big(x, \phi(x)\big) \Big)^{-1}, \qquad \forall\, x \in S_{\overline{x}}.$$
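The theorem can be illustrated numerically on the scalar equation $f(x, y) = x^2 + y^2 - 1 = 0$ near the point $(0.3, \sqrt{0.91})$, where the solution function is $\phi(x) = \sqrt{1 - x^2}$; the example and tolerances are ours:

```python
import numpy as np

# Numeric illustration of the Implicit Function Theorem for the scalar
# equation f(x, y) = x^2 + y^2 - 1 = 0, solved for y as phi(x).
f   = lambda x, y: x**2 + y**2 - 1.0
phi = lambda x: np.sqrt(1.0 - x**2)

x = 0.3
y = phi(x)
assert abs(f(x, y)) < 1e-12                    # f(x, phi(x)) = 0

# grad phi(x) = -grad_x f * (grad_y f)^{-1} = -(2x) / (2y) = -x / y
dfdx, dfdy = 2 * x, 2 * y
formula = -dfdx / dfdy

h = 1e-6
numeric = (phi(x + h) - phi(x - h)) / (2 * h)  # central difference
print(abs(formula - numeric) < 1e-8)           # True
```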
As a final word of caution to the reader, let us mention that one can easily get confused with gradient notation and its use in various formulas, such as for example the order of multiplication of various gradients in the chain rule and the Implicit Function Theorem. Perhaps the safest guideline to minimize errors is to remember our conventions:
(a) A vector is viewed as a column vector (an $n \times 1$ matrix).
(b) The gradient $\nabla f$ of a scalar function $f : \Re^n \to \Re$ is also viewed as a column vector.
(c) The gradient matrix $\nabla f$ of a vector function $f : \Re^n \to \Re^m$ with components $f_1, \ldots, f_m$ is the $n \times m$ matrix whose columns are the (column) vectors $\nabla f_1, \ldots, \nabla f_m$.
With these rules in mind one can use dimension matching as an effective guide to writing correct formulas quickly.
1.2 CONVEX SETS AND FUNCTIONS
We now introduce some of the basic notions relating to convex sets and functions. The material of this section permeates all subsequent developments in this book, and will be used in the next section for the discussion of important issues in optimization, such as the existence of optimal solutions.
1.2.1 Basic Properties
The notion of a convex set is defined below and is illustrated in Fig. 1.2.1.

Definition 1.2.1: Let $C$ be a subset of $\Re^n$. We say that $C$ is convex if
$$\alpha x + (1 - \alpha) y \in C, \qquad \forall\, x, y \in C,\ \forall\, \alpha \in [0, 1]. \qquad (1.2)$$
Note that the empty set is by convention considered to be convex. Generally, when referring to a convex set, it will usually be apparent from the context whether this set can be empty, but we will often be specific in order to minimize ambiguities.
The following proposition lists some operations that preserve convexity of a set.
Sec. 1.2 Convex Sets and Functions 25
Figure 1.2.1. Illustration of the definition of a convex set. For convexity, linear interpolation $\alpha x + (1 - \alpha) y$, $0 < \alpha < 1$, between any two points of the set must yield points that lie within the set.
Proposition 1.2.1:
(a) The intersection $\cap_{i \in I} C_i$ of any collection $\{C_i \mid i \in I\}$ of convex sets is convex.
(b) The vector sum $C_1 + C_2$ of two convex sets $C_1$ and $C_2$ is convex.
(c) The set $\lambda C$ is convex for any convex set $C$ and scalar $\lambda$. Furthermore, if $C$ is a convex set and $\lambda_1, \lambda_2$ are positive scalars, we have
$$(\lambda_1 + \lambda_2) C = \lambda_1 C + \lambda_2 C.$$
(d) The closure and the interior of a convex set are convex.
(e) The image and the inverse image of a convex set under an affine function are convex.
Proof: The proof is straightforward using the definition of convexity, cf. Eq. (1.2). For example, to prove part (a), we take two points $x$ and $y$ from $\cap_{i \in I} C_i$, and we use the convexity of $C_i$ to argue that the line segment connecting $x$ and $y$ belongs to all the sets $C_i$, and hence, to their intersection. The proofs of parts (b)-(e) are similar and are left as exercises for the reader. Q.E.D.
A set $C$ is said to be a cone if for all $x \in C$ and $\lambda > 0$, we have $\lambda x \in C$. A cone need not be convex and need not contain the origin (although the origin always lies in the closure of a nonempty cone). Several of the results of the above proposition have analogs for cones (see the exercises).
Convex Functions
The notion of a convex function is defined below and is illustrated in Fig. 1.2.2.

Definition 1.2.2: Let $C$ be a convex subset of $\Re^n$. A function $f : C \to \Re$ is called convex if
$$f\big(\alpha x + (1 - \alpha) y\big) \le \alpha f(x) + (1 - \alpha) f(y), \qquad \forall\, x, y \in C,\ \forall\, \alpha \in [0, 1]. \qquad (1.3)$$
The function $f$ is called concave if $-f$ is convex. The function $f$ is called strictly convex if the above inequality is strict for all $x, y \in C$ with $x \ne y$, and all $\alpha \in (0, 1)$. For a function $f : X \to \Re$, we also say that $f$ is convex over the convex set $C$ if the domain $X$ of $f$ contains $C$ and Eq. (1.3) holds, i.e., when the domain of $f$ is restricted to $C$, $f$ becomes convex.
Figure 1.2.2. Illustration of the definition of a function that is convex over a convex set $C$. The linear interpolation $\alpha f(x) + (1 - \alpha) f(y)$ overestimates the function value $f\big(\alpha x + (1 - \alpha) y\big)$ for all $\alpha \in [0, 1]$.
If $C$ is a convex set and $f : C \to \Re$ is a convex function, the level sets $\{x \in C \mid f(x) \le \gamma\}$ and $\{x \in C \mid f(x) < \gamma\}$ are convex for all scalars
$\gamma$. To see this, note that if $x, y \in C$ are such that $f(x) \le \gamma$ and $f(y) \le \gamma$, then for any $\alpha \in [0, 1]$, we have $\alpha x + (1 - \alpha) y \in C$, by the convexity of $C$, and $f\big(\alpha x + (1 - \alpha) y\big) \le \alpha f(x) + (1 - \alpha) f(y) \le \gamma$, by the convexity of $f$. However, the converse is not true; for example, the function $f(x) = \sqrt{|x|}$ has convex level sets but is not convex.
We generally prefer to deal with convex functions that are real-valued and are defined over the entire space $\Re^n$ (rather than over just a convex subset). However, in some situations, prominently arising in the context of duality theory, we will encounter functions $f$ that are convex over a convex subset $C$ and cannot be extended to functions that are convex over $\Re^n$. In such situations, it may be convenient, instead of restricting the domain of $f$ to the subset $C$ where $f$ takes real values, to extend the domain to the entire space $\Re^n$, but allow $f$ to take the value $\infty$.
We are thus motivated to introduce extended real-valued convex functions that can take the value of $\infty$ at some points. In particular, if $C \subset \Re^n$ is a convex set, a function $f : C \to (-\infty, \infty]$ is called convex over $C$ (or simply convex when $C = \Re^n$) if the condition
$$f\big(\alpha x + (1 - \alpha) y\big) \le \alpha f(x) + (1 - \alpha) f(y), \qquad \forall\, x, y \in C,\ \forall\, \alpha \in [0, 1]$$
holds. The function $f$ is called strictly convex if the above inequality is strict for all $x$ and $y$ in $C$ such that $f(x) < \infty$ and $f(y) < \infty$. It can be seen that if $f$ is convex, the level sets $\{x \in C \mid f(x) \le \gamma\}$ and $\{x \in C \mid f(x) < \gamma\}$ are convex for all scalars $\gamma$.
One complication when dealing with extended real-valued functions is that we sometimes need to be careful to exclude the unusual case where $f(x) = \infty$ for all $x \in C$, in which case $f$ is still convex, since it trivially satisfies the above inequality (such functions are sometimes called improper in the literature of convex functions, but we will not use this terminology here). Another area of concern is working with functions that can take both values $-\infty$ and $\infty$, because of the possibility of the forbidden expression $\infty - \infty$ arising when sums of function values are considered. For this reason, we will try to minimize the occurrences of such functions. On occasion we will deal with extended real-valued functions that can take the value $-\infty$ but not the value $\infty$. In particular, a function $f : C \to [-\infty, \infty)$, where $C$ is a convex set, is called concave if the function $-f : C \to (-\infty, \infty]$ is convex as per the preceding definition.
We define the effective domain of an extended real-valued function $f : X \to (-\infty, \infty]$ to be the set
$$\mathrm{dom}(f) = \big\{ x \in X \mid f(x) < \infty \big\}.$$
Note that if $X$ is convex and $f$ is convex over $X$, then $\mathrm{dom}(f)$ is a convex set. Similarly, we define the effective domain of an extended real-valued function $f : X \to [-\infty, \infty)$ to be the set $\mathrm{dom}(f) = \big\{ x \in X \mid f(x) > -\infty \big\}$.
Note that by replacing the domain of an extended real-valued convex function with its effective domain, we can convert it to a real-valued function. In this way, we can use results stated in terms of real-valued functions, and we can also avoid calculations with $\infty$. Thus, the entire subject of convex functions can be developed without resorting to extended real-valued functions. The reverse is also true, namely that extended real-valued functions can be adopted as the norm; for example, the classical treatment of Rockafellar [Roc70] uses this approach.

Generally, functions that are real-valued over the entire space $\Re^n$ are more convenient (and even essential) in numerical algorithms and also in optimization analyses where a calculus-oriented approach based on differentiability is adopted. This is typically the case in nonconvex optimization, where nonlinear equality and nonconvex inequality constraints are involved (see Chapter 2). On the other hand, extended real-valued functions offer notational advantages in a convex programming setting (see Chapters 3 and 4), and in fact may be more natural because some basic constructions around duality involve extended real-valued functions. Since we plan to deal with nonconvex as well as convex problems, and with analysis as well as numerical methods, we will adopt a flexible approach and use both real-valued and extended real-valued functions. However, consistently with prevailing optimization practice, we will generally prefer to avoid using extended real-valued functions, unless there are strong notational or other advantages for doing so.
Epigraphs and Semicontinuity
An extended real-valued function $f : X \to (-\infty, \infty]$ is called lower semicontinuous at a vector $x \in X$ if $f(x) \le \liminf_{k\to\infty} f(x_k)$ for every sequence $\{x_k\}$ converging to $x$. This definition is consistent with the corresponding definition for real-valued functions [cf. Def. 1.1.5(c)]. If $f$ is lower semicontinuous at every $x$ in a subset $U \subset X$, we say that $f$ is lower semicontinuous over $U$.
The epigraph of a function $f : X \to (-\infty, \infty]$, where $X \subset \Re^n$, is the subset of $\Re^{n+1}$ given by
$$\mathrm{epi}(f) = \big\{ (x, w) \mid x \in X,\ w \in \Re,\ f(x) \le w \big\};$$
(see Fig. 1.2.3). Note that if we restrict $f$ to its effective domain $\big\{ x \in X \mid f(x) < \infty \big\}$, so that it becomes real-valued, the epigraph remains unaffected. Epigraphs are very useful for our purposes because questions about convexity and lower semicontinuity of functions can be reduced to corresponding questions about convexity and closure of their epigraphs, as shown in the following proposition.
Sec. 1.2 Convex Sets and Functions 29
Figure 1.2.3. Illustration of the epigraphs of extended real-valued convex and nonconvex functions.
Proposition 1.2.2: Let $f : X \mapsto (-\infty, \infty]$ be a function. Then:

(a) $\mathrm{epi}(f)$ is convex if and only if $\mathrm{dom}(f)$ is convex and $f$ is convex over $\mathrm{dom}(f)$.

(b) Assuming $X = \Re^n$, the following are equivalent:

(i) The level set $\{x \mid f(x) \le \gamma\}$ is closed for all scalars $\gamma$.

(ii) $f$ is lower semicontinuous over $\Re^n$.

(iii) $\mathrm{epi}(f)$ is closed.
Proof: (a) Assume that $\mathrm{dom}(f)$ is convex and $f$ is convex over $\mathrm{dom}(f)$. If $(x_1, w_1)$ and $(x_2, w_2)$ belong to $\mathrm{epi}(f)$ and $\lambda \in [0, 1]$, we have $x_1, x_2 \in \mathrm{dom}(f)$ and
$$f(x_1) \le w_1, \qquad f(x_2) \le w_2,$$
and by multiplying these inequalities with $\lambda$ and $(1-\lambda)$, respectively, by adding, and by using the convexity of $f$, we obtain
$$f\big(\lambda x_1 + (1-\lambda)x_2\big) \le \lambda f(x_1) + (1-\lambda)f(x_2) \le \lambda w_1 + (1-\lambda)w_2.$$
Hence the vector $\big(\lambda x_1 + (1-\lambda)x_2,\ \lambda w_1 + (1-\lambda)w_2\big)$, which is equal to $\lambda(x_1, w_1) + (1-\lambda)(x_2, w_2)$, belongs to $\mathrm{epi}(f)$, showing the convexity of $\mathrm{epi}(f)$.

Conversely, assume that $\mathrm{epi}(f)$ is convex, and let $x_1, x_2 \in \mathrm{dom}(f)$ and $\lambda \in [0, 1]$. The pairs $\big(x_1, f(x_1)\big)$ and $\big(x_2, f(x_2)\big)$ belong to $\mathrm{epi}(f)$, so by convexity, we have
$$\big(\lambda x_1 + (1-\lambda)x_2,\ \lambda f(x_1) + (1-\lambda)f(x_2)\big) \in \mathrm{epi}(f).$$
Therefore, by the definition of $\mathrm{epi}(f)$, it follows that $\lambda x_1 + (1-\lambda)x_2 \in \mathrm{dom}(f)$, so $\mathrm{dom}(f)$ is convex, while
$$f\big(\lambda x_1 + (1-\lambda)x_2\big) \le \lambda f(x_1) + (1-\lambda)f(x_2),$$
so $f$ is convex over $\mathrm{dom}(f)$.
(b) We first show that (i) implies (ii). Assume that the level set $\big\{x \mid f(x) \le \gamma\big\}$ is closed for all scalars $\gamma$, fix a vector $x \in \Re^n$, let $\{x_k\}$ be a sequence converging to $x$, and let $\{x_k\}_{k \in \mathcal{K}}$ be a subsequence along which the limit inferior of $f(x_k)$ is attained. Then for every $\gamma$ such that $\liminf_{k\to\infty} f(x_k) < \gamma$ and all sufficiently large $k$ with $k \in \mathcal{K}$, we have $f(x_k) \le \gamma$. Since each level set $\big\{x \mid f(x) \le \gamma\big\}$ is closed, it follows that $x$ belongs to all the level sets with $\liminf_{k\to\infty} f(x_k) < \gamma$, implying that $f(x) \le \liminf_{k\to\infty} f(x_k)$, and that $f$ is lower semicontinuous at $x$. Therefore, (i) implies (ii).
We next show that (ii) implies (iii). Assume that $f$ is lower semicontinuous over $\Re^n$, and let $(x, w)$ be the limit of a sequence $\{(x_k, w_k)\} \subset \mathrm{epi}(f)$. Then we have $f(x_k) \le w_k$, and by taking the limit as $k \to \infty$ and by using the lower semicontinuity of $f$ at $x$, we obtain
$$f(x) \le \liminf_{k\to\infty} f(x_k) \le w.$$
Hence, $(x, w) \in \mathrm{epi}(f)$ and $\mathrm{epi}(f)$ is closed. Thus, (ii) implies (iii).
We finally show that (iii) implies (i). Assume that $\mathrm{epi}(f)$ is closed, and let $\{x_k\}$ be a sequence that converges to some $x$ and belongs to the level set $\big\{x \mid f(x) \le \gamma\big\}$ for some scalar $\gamma$. Then $(x_k, \gamma) \in \mathrm{epi}(f)$ for all $k$ and $(x_k, \gamma) \to (x, \gamma)$, so since $\mathrm{epi}(f)$ is closed, we have $(x, \gamma) \in \mathrm{epi}(f)$. Hence, $x$ belongs to the level set $\big\{x \mid f(x) \le \gamma\big\}$, implying that this set is closed. Therefore, (iii) implies (i). Q.E.D.
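The lower semicontinuity definition in part (b) can be probed numerically. The sketch below (the step functions and the tail-minimum stand-in for the limit inferior are illustrative choices of ours, not from the text) contrasts a function that satisfies $f(x) \le \liminf_k f(x_k)$ at $x = 0$ with one that does not:

```python
# Illustrative check of lower semicontinuity along a sequence x_k -> 0 from the left.
def f_lsc(x):
    return 0.0 if x <= 0 else 1.0      # jumps *up* as x crosses 0: lsc at 0

def f_not_lsc(x):
    return 1.0 if x >= 0 else 0.0      # jumps *down* from the left: not lsc at 0

def tail_min(f, seq, tail=50):
    # crude stand-in for liminf_k f(x_k): minimum of f over the tail of the sequence
    return min(f(x) for x in seq[-tail:])

seq = [-1.0 / k for k in range(1, 1001)]                  # x_k -> 0 from the left
assert f_lsc(0.0) <= tail_min(f_lsc, seq)                 # holds: 0.0 <= 0.0
assert not (f_not_lsc(0.0) <= tail_min(f_not_lsc, seq))   # fails: 1.0 > 0.0
```

A full verification would of course quantify over all sequences; the sample above only witnesses the failure along one sequence, which is already enough to refute lower semicontinuity.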
If the epigraph of a function $f : X \mapsto (-\infty, \infty]$ is a closed set, we say that $f$ is a closed function. Thus, if we extend the domain of $f$ to $\Re^n$ and consider the function $\tilde{f}$ given by
$$\tilde{f}(x) = \begin{cases} f(x) & \text{if } x \in X, \\ \infty & \text{if } x \notin X, \end{cases}$$
we see that, according to the preceding proposition, $f$ is closed if and only if $\tilde{f}$ is lower semicontinuous over $\Re^n$.
Proposition 1.2.3: (a) Let $f_1, \ldots, f_m : \Re^n \mapsto (-\infty, \infty]$ be convex (respectively, closed) functions and let $\gamma_1, \ldots, \gamma_m$ be positive scalars. Then the function $g = \gamma_1 f_1 + \cdots + \gamma_m f_m$ is convex (respectively, closed).

(b) Let $f : \Re^m \mapsto (-\infty, \infty]$ be a convex (respectively, closed) function and let $A$ be an $m \times n$ matrix. Then the function $g(x) = f(Ax)$ is convex (respectively, closed).

(c) Let $f_i : \Re^n \mapsto (-\infty, \infty]$ be convex (respectively, closed) functions for each $i$ in an index set $I$. Then the function $g(x) = \sup_{i \in I} f_i(x)$ is convex (respectively, closed).

Proof: (a) Let $f_1, \ldots, f_m$ be convex. Then for all $x, y \in \Re^n$ and $\lambda \in [0, 1]$,
$$g\big(\lambda x + (1-\lambda)y\big) = \sum_{i=1}^m \gamma_i f_i\big(\lambda x + (1-\lambda)y\big) \le \sum_{i=1}^m \gamma_i \big(\lambda f_i(x) + (1-\lambda)f_i(y)\big)$$
$$= \lambda \sum_{i=1}^m \gamma_i f_i(x) + (1-\lambda) \sum_{i=1}^m \gamma_i f_i(y) = \lambda g(x) + (1-\lambda)g(y).$$
Hence $g$ is convex.
Let the functions $f_1, \ldots, f_m$ be closed. Then they are lower semicontinuous at every $x \in \Re^n$ [cf. Prop. 1.2.2(b)], so for every sequence $\{x_k\}$ converging to $x$, we have $f_i(x) \le \liminf_{k\to\infty} f_i(x_k)$ for all $i$. Hence
$$g(x) \le \sum_{i=1}^m \gamma_i \liminf_{k\to\infty} f_i(x_k) \le \liminf_{k\to\infty} \sum_{i=1}^m \gamma_i f_i(x_k) = \liminf_{k\to\infty} g(x_k),$$
where we have used Prop. 1.1.4(d) (the sum of the limit inferiors of sequences is less than or equal to the limit inferior of the sum sequence). Therefore, $g$ is lower semicontinuous at all $x \in \Re^n$, so by Prop. 1.2.2(b), it is closed.
(b) This is straightforward, along the lines of the proof of part (a).
(c) A pair $(x, w)$ belongs to the epigraph
$$\mathrm{epi}(g) = \big\{(x, w) \mid g(x) \le w\big\}$$
if and only if $f_i(x) \le w$ for all $i \in I$, or equivalently $(x, w) \in \cap_{i \in I}\, \mathrm{epi}(f_i)$. Therefore,
$$\mathrm{epi}(g) = \cap_{i \in I}\, \mathrm{epi}(f_i).$$
If the $f_i$ are convex, the epigraphs $\mathrm{epi}(f_i)$ are convex, so $\mathrm{epi}(g)$ is convex, and $g$ is convex by Prop. 1.2.2(a). If the $f_i$ are closed, then the epigraphs $\mathrm{epi}(f_i)$ are closed, so $\mathrm{epi}(g)$ is closed, and $g$ is closed. Q.E.D.
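As a quick numerical companion to the convexity claim in part (c), the sketch below (the three functions are arbitrary convex examples of ours) spot-checks the convexity inequality for a pointwise maximum on a one-dimensional grid:

```python
# The pointwise maximum g = max(f1, f2, f3) of convex functions is convex;
# we spot-check g(lam*x + (1-lam)*y) <= lam*g(x) + (1-lam)*g(y) on a grid.
fs = [lambda x: (x - 1.0) ** 2, lambda x: abs(x), lambda x: 2.0 * x]

def g(x):
    return max(f(x) for f in fs)

grid = [i / 10.0 - 2.0 for i in range(41)]          # points in [-2, 2]
for x in grid:
    for y in grid:
        for lam in (0.25, 0.5, 0.75):
            z = lam * x + (1 - lam) * y
            assert g(z) <= lam * g(x) + (1 - lam) * g(y) + 1e-12
```

The grid check is of course no substitute for the epigraph argument above, but it makes the intersection-of-epigraphs picture concrete: the region above the graph of `g` is exactly the common region above the graphs of the `fs`.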
Characterizations of Differentiable Convex Functions

For differentiable functions, there is an alternative characterization of convexity, given in the following proposition and illustrated in Fig. 1.2.4.
Figure 1.2.4. Characterization of convexity in terms of first derivatives: the linear approximation $f(x) + (z - x)'\nabla f(x)$, based at $x$, underestimates a convex function $f$.

Proposition 1.2.4: Let $C$ be a nonempty convex subset of $\Re^n$ and let $f : \Re^n \mapsto \Re$ be differentiable over $\Re^n$.

(a) $f$ is convex over $C$ if and only if
$$f(z) \ge f(x) + (z - x)'\nabla f(x), \qquad \forall\ x, z \in C. \tag{1.4}$$

(b) $f$ is strictly convex over $C$ if and only if the above inequality is strict whenever $x \ne z$.
Proof: We prove (a) and (b) simultaneously. Assume that the inequality (1.4) holds. Choose any $x, y \in C$ and $\lambda \in [0, 1]$, and let $z = \lambda x + (1-\lambda)y$. Using the inequality (1.4) twice, we obtain
$$f(x) \ge f(z) + (x - z)'\nabla f(z),$$
$$f(y) \ge f(z) + (y - z)'\nabla f(z).$$
We multiply the first inequality by $\lambda$, the second by $(1-\lambda)$, and add them to obtain
$$\lambda f(x) + (1-\lambda)f(y) \ge f(z) + \big(\lambda x + (1-\lambda)y - z\big)'\nabla f(z) = f(z),$$
which proves that $f$ is convex. If the inequality (1.4) is strict as stated in part (b), then if we take $x \ne y$ and $\lambda \in (0, 1)$ above, the three preceding inequalities become strict, thus showing the strict convexity of $f$.

Conversely, assume that $f$ is convex, let $x$ and $z$ be any vectors in $C$ with $x \ne z$, and consider the function
$$g(\alpha) = \frac{f\big(x + \alpha(z - x)\big) - f(x)}{\alpha}, \qquad \alpha \in (0, 1].$$
We will show that $g(\alpha)$ is monotonically increasing with $\alpha$, and is strictly monotonically increasing if $f$ is strictly convex. This will imply that
$$(z - x)'\nabla f(x) = \lim_{\alpha \downarrow 0} g(\alpha) \le g(1) = f(z) - f(x),$$
with strict inequality if $g$ is strictly monotonically increasing, thereby showing that the desired inequality (1.4) holds (and holds strictly if $f$ is strictly convex). Indeed, consider any $\alpha_1, \alpha_2$, with $0 < \alpha_1 < \alpha_2 < 1$, and let
$$\alpha = \frac{\alpha_1}{\alpha_2}, \qquad \overline{z} = x + \alpha_2(z - x). \tag{1.5}$$
We have
$$f\big(x + \alpha(\overline{z} - x)\big) \le \alpha f(\overline{z}) + (1 - \alpha)f(x),$$
or
$$\frac{f\big(x + \alpha(\overline{z} - x)\big) - f(x)}{\alpha} \le f(\overline{z}) - f(x),$$
which, using the definitions of $\alpha$ and $\overline{z}$ [cf. Eq. (1.5)], can be written as
$$\frac{f\big(x + \alpha_1(z - x)\big) - f(x)}{\alpha_1} \le \frac{f\big(x + \alpha_2(z - x)\big) - f(x)}{\alpha_2},$$
or
$$g(\alpha_1) \le g(\alpha_2),$$
with strict inequality if $f$ is strictly convex. Hence $g$ is monotonically increasing with $\alpha$, and strictly so if $f$ is strictly convex. Q.E.D.
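Inequality (1.4) can be spot-checked for a concrete convex function. The sketch below uses $f(x) = e^x$ on the real line (a choice of ours, not the text's), for which the gradient is again $e^x$:

```python
import math

# For the convex function f(x) = exp(x), the gradient inequality (1.4),
# f(z) >= f(x) + (z - x) * f'(x), holds for all x, z; here f' = exp as well.
f = math.exp
df = math.exp

pts = [i / 4.0 - 2.0 for i in range(17)]    # grid on [-2, 2]
for x in pts:
    for z in pts:
        assert f(z) >= f(x) + (z - x) * df(x) - 1e-12
```

Geometrically, each assertion says that the tangent line at $x$ lies below the graph at $z$, which is exactly the picture in Fig. 1.2.4.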
Note a simple consequence of Prop. 1.2.4(a): if $f : \Re^n \mapsto \Re$ is a convex function and $\nabla f(x^*) = 0$, then $x^*$ minimizes $f$ over $\Re^n$.

For twice differentiable functions, there is a characterization of convexity in terms of second derivatives.

Proposition 1.2.5: Let $C$ be a nonempty convex subset of $\Re^n$ and let $f : \Re^n \mapsto \Re$ be twice continuously differentiable over $\Re^n$.

(a) If $\nabla^2 f(x)$ is positive semidefinite for all $x \in C$, then $f$ is convex over $C$.

(b) If $\nabla^2 f(x)$ is positive definite for all $x \in C$, then $f$ is strictly convex over $C$.

(c) If $C = \Re^n$ and $f$ is convex, then $\nabla^2 f(x)$ is positive semidefinite for all $x \in \Re^n$.

Proof: (a) By Prop. 1.1.21(b), for all $x, y \in C$ we have
$$f(y) = f(x) + (y - x)'\nabla f(x) + \tfrac{1}{2}(y - x)'\nabla^2 f\big(x + \lambda(y - x)\big)(y - x)$$
for some $\lambda \in [0, 1]$. Therefore, using the positive semidefiniteness of $\nabla^2 f$, we obtain
$$f(y) \ge f(x) + (y - x)'\nabla f(x), \qquad \forall\ x, y \in C.$$
From Prop. 1.2.4(a), we conclude that $f$ is convex.
(b) Similar to the proof of part (a), we have
$$f(y) > f(x) + (y - x)'\nabla f(x)$$
for all $x, y \in C$ with $x \ne y$, and the result follows from Prop. 1.2.4(b).

(c) Suppose that $f : \Re^n \mapsto \Re$ is convex and suppose, to obtain a contradiction, that there exist some $x \in \Re^n$ and some $z \in \Re^n$ such that $z'\nabla^2 f(x)z < 0$. Using the continuity of $\nabla^2 f$, we see that we can choose $z$ with small enough norm so that $z'\nabla^2 f(x + \lambda z)z < 0$ for every $\lambda \in [0, 1]$. Then, using again Prop. 1.1.21(b), we obtain $f(x + z) < f(x) + z'\nabla f(x)$, which, in view of Prop. 1.2.4(a), contradicts the convexity of $f$. Q.E.D.
If $f$ is convex over a strict subset $C \subset \Re^n$, it is not necessarily true that $\nabla^2 f(x)$ is positive semidefinite at any point of $C$ [take for example $n = 2$, $C = \big\{(x_1, 0) \mid x_1 \in \Re\big\}$, and $f(x) = x_1^2 - x_2^2$]. A strengthened version of Prop. 1.2.5 is given in the exercises. It can be shown that the conclusion of Prop. 1.2.5(c) also holds if $C$ is assumed to have nonempty interior instead of being equal to $\Re^n$.

The following proposition considers a strengthened form of strict convexity characterized by the following condition:
$$\big(\nabla f(x) - \nabla f(y)\big)'(x - y) \ge \alpha \|x - y\|^2, \qquad \forall\ x, y \in \Re^n, \tag{1.7}$$
where $\alpha$ is some positive number. Convex functions with this property are called strongly convex with coefficient $\alpha$.
Proposition 1.2.6: (Strong Convexity) Let $f : \Re^n \mapsto \Re$ be smooth. If $f$ is strongly convex with coefficient $\alpha$, then $f$ is strictly convex. Furthermore, if $f$ is twice continuously differentiable, then strong convexity of $f$ with coefficient $\alpha$ is equivalent to the positive semidefiniteness of $\nabla^2 f(x) - \alpha I$ for every $x \in \Re^n$, where $I$ is the identity matrix.
Proof: Fix some $x, y \in \Re^n$ such that $x \ne y$, and define the function $h : \Re \mapsto \Re$ by $h(t) = f\big(x + t(y - x)\big)$. Consider scalars $t$ and $t'$ with $t < t'$. Using the chain rule and Eq. (1.7), we have
$$\Big(\frac{dh}{dt}(t') - \frac{dh}{dt}(t)\Big)(t' - t) = \Big(\nabla f\big(x + t'(y - x)\big) - \nabla f\big(x + t(y - x)\big)\Big)'(y - x)(t' - t) \ge \alpha (t' - t)^2 \|x - y\|^2 > 0.$$
Thus, $dh/dt$ is strictly increasing and for any $t \in (0, 1)$, we have
$$\frac{h(t) - h(0)}{t} = \frac{1}{t}\int_0^t \frac{dh}{d\tau}(\tau)\, d\tau < \frac{1}{1 - t}\int_t^1 \frac{dh}{d\tau}(\tau)\, d\tau = \frac{h(1) - h(t)}{1 - t}.$$
Equivalently, $th(1) + (1 - t)h(0) > h(t)$. The definition of $h$ yields $tf(y) + (1 - t)f(x) > f\big(ty + (1 - t)x\big)$. Since this inequality has been proved for arbitrary $t \in (0, 1)$ and $x \ne y$, we conclude that $f$ is strictly convex.
Suppose now that $f$ is twice continuously differentiable and Eq. (1.7) holds. Let $c$ be a scalar. We use Prop. 1.1.21(b) twice to obtain
$$f(x + cy) = f(x) + cy'\nabla f(x) + \frac{c^2}{2}\, y'\nabla^2 f(x + tcy)y,$$
and
$$f(x) = f(x + cy) - cy'\nabla f(x + cy) + \frac{c^2}{2}\, y'\nabla^2 f(x + scy)y,$$
for some $t$ and $s$ belonging to $[0, 1]$. Adding these two equations and using Eq. (1.7), we obtain
$$\frac{c^2}{2}\, y'\big(\nabla^2 f(x + scy) + \nabla^2 f(x + tcy)\big)y = \big(\nabla f(x + cy) - \nabla f(x)\big)'(cy) \ge c^2 \alpha \|y\|^2.$$
We divide both sides by $c^2$ and then take the limit as $c \to 0$ to conclude that $y'\nabla^2 f(x)y \ge \alpha \|y\|^2$. Since this inequality is valid for every $y \in \Re^n$, it follows that $\nabla^2 f(x) - \alpha I$ is positive semidefinite.
For the converse, assume that $\nabla^2 f(x) - \alpha I$ is positive semidefinite for all $x \in \Re^n$. Consider the function $g : \Re \mapsto \Re$ defined by
$$g(t) = \nabla f\big(tx + (1 - t)y\big)'(x - y).$$
Using the Mean Value Theorem (Prop. 1.1.20), we have
$$\big(\nabla f(x) - \nabla f(y)\big)'(x - y) = g(1) - g(0) = \frac{dg}{dt}(t)$$
for some $t \in [0, 1]$. The result follows because
$$\frac{dg}{dt}(t) = (x - y)'\nabla^2 f\big(tx + (1 - t)y\big)(x - y) \ge \alpha \|x - y\|^2,$$
where the last inequality is a consequence of the positive semidefiniteness of $\nabla^2 f\big(tx + (1 - t)y\big) - \alpha I$. Q.E.D.
As an example, consider the quadratic function
$$f(x) = x'Qx,$$
where $Q$ is a symmetric matrix. By Prop. 1.2.5, the function $f$ is convex if and only if $Q$ is positive semidefinite. Furthermore, by Prop. 1.2.6, $f$ is strongly convex with coefficient $\alpha$ if and only if $\nabla^2 f(x) - \alpha I = 2Q - \alpha I$ is positive semidefinite. Thus $f$ is strongly convex with some positive coefficient (as well as strictly convex) if and only if $Q$ is positive definite.
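The quadratic case is easy to verify numerically. In the sketch below, the matrix $Q$ is sample data of ours; we check positive definiteness via eigenvalues and then test inequality (1.7) with coefficient $\alpha = 2\lambda_{\min}(Q)$ at random points:

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])        # symmetric matrix; hypothetical example data
eigs = np.linalg.eigvalsh(Q)
assert np.all(eigs > 0)           # Q positive definite => f(x) = x'Qx strongly convex

# grad f(x) = 2Qx, so (grad f(x) - grad f(y))'(x - y) = 2(x-y)'Q(x-y)
# >= 2*lambda_min(Q)*||x - y||^2, i.e., (1.7) holds with alpha = 2*lambda_min(Q).
alpha = 2.0 * eigs.min()
rng = np.random.default_rng(0)
for _ in range(100):
    x, y = rng.normal(size=2), rng.normal(size=2)
    gx, gy = 2.0 * Q @ x, 2.0 * Q @ y
    assert (gx - gy) @ (x - y) >= alpha * (x - y) @ (x - y) - 1e-9
```

The inequality used in the comment is the Rayleigh quotient bound $z'Qz \ge \lambda_{\min}(Q)\|z\|^2$, applied with $z = x - y$.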
1.2.2 Convex and Affine Hulls

Let $X$ be a subset of $\Re^n$. A convex combination of elements of $X$ is a vector of the form $\sum_{i=1}^m \alpha_i x_i$, where $m$ is a positive integer, $x_1, \ldots, x_m$ belong to $X$, and $\alpha_1, \ldots, \alpha_m$ are scalars such that
$$\alpha_i \ge 0, \quad i = 1, \ldots, m, \qquad \sum_{i=1}^m \alpha_i = 1.$$
Note that if $X$ is convex, then the convex combination $\sum_{i=1}^m \alpha_i x_i$ belongs to $X$ (this is easily shown by induction; see the exercises), and for any function $f : \Re^n \mapsto \Re$ that is convex over $X$, we have
$$f\Big(\sum_{i=1}^m \alpha_i x_i\Big) \le \sum_{i=1}^m \alpha_i f(x_i). \tag{1.8}$$
This follows by using repeatedly the definition of convexity. The preceding relation is a special case of Jensen's inequality and can be used to prove a number of interesting inequalities in applied mathematics and probability theory.
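Inequality (1.8) is easy to test numerically. The following sketch uses $f(x) = x^2$ and one arbitrary choice of weights and points (our example, not the text's):

```python
# Spot-check of Jensen's inequality (1.8): f(sum a_i x_i) <= sum a_i f(x_i)
# for the convex function f(x) = x^2.
def f(x):
    return x * x

xs = [0.0, 1.0, 3.0, -2.0]
alphas = [0.1, 0.2, 0.3, 0.4]              # nonnegative, summing to 1
assert abs(sum(alphas) - 1.0) < 1e-12

lhs = f(sum(a * x for a, x in zip(alphas, xs)))   # f(0.3) = 0.09
rhs = sum(a * f(x) for a, x in zip(alphas, xs))   # 0.2 + 2.7 + 1.6 = 4.5
assert lhs <= rhs
```

In probabilistic language, `lhs` is $f(E[X])$ and `rhs` is $E[f(X)]$ for a random variable $X$ taking the values `xs` with probabilities `alphas`.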
The convex hull of a set $X$, denoted $\mathrm{conv}(X)$, is the intersection of all convex sets containing $X$, and is a convex set by Prop. 1.2.1(a). It is straightforward to verify that the set of all convex combinations of elements of $X$ is convex, and is equal to the convex hull $\mathrm{conv}(X)$ (see the exercises). In particular, if $X$ consists of a finite number of vectors $x_1, \ldots, x_m$, its convex hull is
$$\mathrm{conv}\big(\{x_1, \ldots, x_m\}\big) = \Big\{\sum_{i=1}^m \alpha_i x_i \ \Big|\ \alpha_i \ge 0,\ i = 1, \ldots, m,\ \sum_{i=1}^m \alpha_i = 1\Big\}.$$
We recall that an affine set $M$ is a set of the form $x + S$, where $S$ is a subspace, called the subspace parallel to $M$. If $X$ is a subset of $\Re^n$, the affine hull of $X$, denoted $\mathrm{aff}(X)$, is the intersection of all affine sets containing $X$. Note that $\mathrm{aff}(X)$ is itself an affine set and that it contains $\mathrm{conv}(X)$. It can be seen that the affine hull of $X$, the affine hull of the convex hull $\mathrm{conv}(X)$, and the affine hull of the closure $\mathrm{cl}(X)$ coincide (see the exercises). For a convex set $C$, the dimension of $C$ is defined to be the dimension of $\mathrm{aff}(C)$.
Given a subset $X \subset \Re^n$, a nonnegative combination of elements of $X$ is a vector of the form $\sum_{i=1}^m \alpha_i x_i$, where $m$ is a positive integer, $x_1, \ldots, x_m$ belong to $X$, and $\alpha_1, \ldots, \alpha_m$ are nonnegative scalars. If the scalars $\alpha_i$ are all positive, the combination $\sum_{i=1}^m \alpha_i x_i$ is said to be positive. The cone generated by $X$, denoted by $\mathrm{cone}(X)$, is the set of all nonnegative combinations of elements of $X$. It is easily seen that $\mathrm{cone}(X)$ is a convex cone, although it need not be closed [$\mathrm{cone}(X)$ can be shown to be closed in special cases, such as when $X$ is a finite set; this is one of the central results of polyhedral convexity and will be shown in Section 1.6].
The following is a fundamental characterization of convex hulls.

Proposition 1.2.7: (Caratheodory's Theorem) Let $X$ be a subset of $\Re^n$.

(a) Every $x$ in $\mathrm{cone}(X)$ can be represented as a positive combination of vectors $x_1, \ldots, x_m \in X$ that are linearly independent, where $m$ is a positive integer with $m \le n$.

(b) Every $x$ in $\mathrm{conv}(X)$ can be represented as a convex combination of vectors $x_1, \ldots, x_m \in X$ such that $x_2 - x_1, \ldots, x_m - x_1$ are linearly independent, where $m$ is a positive integer with $m \le n + 1$.
Proof: (a) Let $x$ be a nonzero vector in $\mathrm{cone}(X)$, and let $m$ be the smallest integer such that $x$ has the form $\sum_{i=1}^m \alpha_i x_i$, where $\alpha_i > 0$ and $x_i \in X$ for all $i = 1, \ldots, m$. If the vectors $x_i$ were linearly dependent, there would exist scalars $\lambda_1, \ldots, \lambda_m$, with $\sum_{i=1}^m \lambda_i x_i = 0$ and at least one of the $\lambda_i$ positive. Consider the linear combination $\sum_{i=1}^m (\alpha_i - \overline{\gamma}\lambda_i)x_i$, where $\overline{\gamma}$ is the largest $\gamma$ such that $\alpha_i - \gamma\lambda_i \ge 0$ for all $i$. This combination provides a representation of $x$ as a positive combination of fewer than $m$ vectors of $X$, a contradiction. Therefore, $x_1, \ldots, x_m$ are linearly independent, and since any linearly independent set of vectors contains at most $n$ elements, we must have $m \le n$.
(b) The proof will be obtained by applying part (a) to the subset of $\Re^{n+1}$ given by
$$Y = \big\{(x, 1) \mid x \in X\big\}.$$
If $x \in \mathrm{conv}(X)$, then $x = \sum_{i=1}^m \alpha_i x_i$ for some positive $\alpha_i$ with $1 = \sum_{i=1}^m \alpha_i$, i.e., $(x, 1) \in \mathrm{cone}(Y)$. By part (a), we have
$$(x, 1) = \sum_{i=1}^m \alpha_i (x_i, 1)$$
for some positive $\alpha_1, \ldots, \alpha_m$ and vectors $(x_1, 1), \ldots, (x_m, 1)$ that are linearly independent (implying that $m \le n + 1$), or equivalently
$$x = \sum_{i=1}^m \alpha_i x_i, \qquad 1 = \sum_{i=1}^m \alpha_i.$$
Finally, to show that $x_2 - x_1, \ldots, x_m - x_1$ are linearly independent, assume, to arrive at a contradiction, that there exist $\lambda_2, \ldots, \lambda_m$, not all 0, such that
$$\sum_{i=2}^m \lambda_i (x_i - x_1) = 0.$$
Equivalently, defining $\lambda_1 = -(\lambda_2 + \cdots + \lambda_m)$, we have
$$\sum_{i=1}^m \lambda_i (x_i, 1) = 0,$$
which contradicts the linear independence of $(x_1, 1), \ldots, (x_m, 1)$. Q.E.D.
It is not generally true that the convex hull of a closed set is closed [take for instance the convex hull of the set consisting of the origin and the subset $\big\{(x_1, x_2) \mid x_1 x_2 = 1,\ x_1 \ge 0,\ x_2 \ge 0\big\}$ of $\Re^2$]. We have, however, the following.
Proposition 1.2.8: The convex hull of a compact set is compact.

Proof: Let $X$ be a compact subset of $\Re^n$. By Caratheodory's Theorem, a sequence in $\mathrm{conv}(X)$ can be expressed as $\big\{\sum_{i=1}^{n+1} \alpha_i^k x_i^k\big\}$, where for all $k$ and $i$, $\alpha_i^k \ge 0$, $x_i^k \in X$, and $\sum_{i=1}^{n+1} \alpha_i^k = 1$. Since the sequence
$$\big\{(\alpha_1^k, \ldots, \alpha_{n+1}^k, x_1^k, \ldots, x_{n+1}^k)\big\}$$
belongs to a compact set, it has a limit point $\big(\alpha_1, \ldots, \alpha_{n+1}, x_1, \ldots, x_{n+1}\big)$ such that $\sum_{i=1}^{n+1} \alpha_i = 1$, and for all $i$, $\alpha_i \ge 0$ and $x_i \in X$. Thus, the vector $\sum_{i=1}^{n+1} \alpha_i x_i$, which belongs to $\mathrm{conv}(X)$, is a limit point of the sequence $\big\{\sum_{i=1}^{n+1} \alpha_i^k x_i^k\big\}$, showing that $\mathrm{conv}(X)$ is compact. Q.E.D.
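Caratheodory's Theorem is constructive enough to demonstrate in code. The sketch below (a brute-force illustration of ours, not an algorithm from the text) expresses a point of the convex hull of a finite set in $\Re^2$ as a convex combination of at most $n + 1 = 3$ of the points:

```python
import numpy as np
from itertools import combinations

def caratheodory_2d(x, pts, tol=1e-9):
    """Brute force: find <= 3 of the given points whose convex combination is x."""
    x = np.asarray(x, dtype=float)
    for idx in combinations(range(len(pts)), 3):
        P = np.array([pts[i] for i in idx], dtype=float)   # 3 candidate vertices
        A = np.vstack([P.T, np.ones(3)])                   # rows: coordinates + weight sum
        b = np.append(x, 1.0)                              # solve sum a_i p_i = x, sum a_i = 1
        try:
            a = np.linalg.solve(A, b)
        except np.linalg.LinAlgError:
            continue                                       # degenerate (affinely dependent) triple
        if np.all(a >= -tol):                              # weights must be nonnegative
            return list(idx), a
    return None

pts = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 1)]
idx, a = caratheodory_2d((1.0, 1.0), pts)
P = np.array([pts[i] for i in idx], dtype=float)
assert abs(a.sum() - 1.0) < 1e-9 and np.all(a >= -1e-9)
assert np.allclose(a @ P, (1.0, 1.0))                      # a convex combination of 3 points
```

The exhaustive search over triples is exponential in general and is meant only to make the $m \le n + 1$ bound tangible; it is not how one would compute such representations in practice.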
1.2.3 Closure, Relative Interior, and Continuity

We now consider some generic topological properties of convex sets and functions. Let $C$ be a nonempty convex subset of $\Re^n$. The closure $\mathrm{cl}(C)$ of $C$ is also a nonempty convex set (Prop. 1.2.1). While the interior of $C$ may be empty, it turns out that convexity implies the existence of interior points relative to the affine hull of $C$. This is an important property, which we now formalize.

We say that $x$ is a relative interior point of $C$ if $x \in C$ and there exists a neighborhood $N$ of $x$ such that $N \cap \mathrm{aff}(C) \subset C$, i.e., $x$ is an interior point of $C$ relative to $\mathrm{aff}(C)$. The relative interior of $C$, denoted $\mathrm{ri}(C)$, is the set of all relative interior points of $C$. For example, if $C$ is a line segment connecting two distinct points in the plane, then $\mathrm{ri}(C)$ consists of all points of $C$ except for the end points. The set $C$ is said to be relatively open if $\mathrm{ri}(C) = C$. A point $x \in \mathrm{cl}(C)$ such that $x \notin \mathrm{ri}(C)$ is said to be a relative boundary point of $C$. The set of all relative boundary points of $C$ is called the relative boundary of $C$.

The following proposition gives some basic facts about relative interior points.
Proposition 1.2.9: Let $C$ be a nonempty convex set.

(a) (Line Segment Principle) If $x \in \mathrm{ri}(C)$ and $\overline{x} \in \mathrm{cl}(C)$, then all points on the line segment connecting $x$ and $\overline{x}$, except possibly $\overline{x}$, belong to $\mathrm{ri}(C)$.

(b) (Nonemptiness of Relative Interior) $\mathrm{ri}(C)$ is a nonempty and convex set, and has the same affine hull as $C$. In fact, if $m$ is the dimension of $\mathrm{aff}(C)$ and $m > 0$, there exist vectors $x_0, x_1, \ldots, x_m \in \mathrm{ri}(C)$ such that $x_1 - x_0, \ldots, x_m - x_0$ span the subspace parallel to $\mathrm{aff}(C)$.

(c) $x \in \mathrm{ri}(C)$ if and only if every line segment in $C$ having $x$ as one endpoint can be prolonged beyond $x$ without leaving $C$ [i.e., for every $\overline{x} \in C$, there exists a $\gamma > 1$ such that $x + (\gamma - 1)(x - \overline{x}) \in C$].
Proof: (a) In the case where $\overline{x} \in C$, see Fig. 1.2.5. In the case where $\overline{x} \notin C$, to show that for any $\alpha \in (0, 1]$ we have $x_\alpha = \alpha x + (1 - \alpha)\overline{x} \in \mathrm{ri}(C)$, consider a sequence $\{x_k\} \subset C$ that converges to $\overline{x}$, and let $x_{k,\alpha} = \alpha x + (1 - \alpha)x_k$. Then as in Fig. 1.2.5, we see that $\{z \mid \|z - x_{k,\alpha}\| < \alpha\epsilon\} \cap \mathrm{aff}(C) \subset C$ for all $k$. Since for large enough $k$, we have
$$\{z \mid \|z - x_\alpha\| < \alpha\epsilon/2\} \subset \{z \mid \|z - x_{k,\alpha}\| < \alpha\epsilon\},$$
it follows that $\{z \mid \|z - x_\alpha\| < \alpha\epsilon/2\} \cap \mathrm{aff}(C) \subset C$, so that $x_\alpha \in \mathrm{ri}(C)$.

Figure 1.2.5. Proof of the Line Segment Principle for the case where $\overline{x} \in C$. Since $x \in \mathrm{ri}(C)$, there exists a sphere $S = \{z \mid \|z - x\| < \epsilon\}$ such that $S \cap \mathrm{aff}(C) \subset C$. For all $\alpha \in (0, 1]$, let $x_\alpha = \alpha x + (1 - \alpha)\overline{x}$ and let $S_\alpha = \{z \mid \|z - x_\alpha\| < \alpha\epsilon\}$. It can be seen that each point of $S_\alpha \cap \mathrm{aff}(C)$ is a convex combination of $\overline{x}$ and some point of $S \cap \mathrm{aff}(C)$. Therefore, $S_\alpha \cap \mathrm{aff}(C) \subset C$, implying that $x_\alpha \in \mathrm{ri}(C)$.
(b) By using a translation argument if necessary, we assume without loss of generality that $C$ contains the origin, so that $\mathrm{aff}(C)$ is a subspace of dimension $m$. If $m = 0$, then $C$ and $\mathrm{aff}(C)$ consist of a single point, which is a relative interior point. If $m > 0$, we can find $m$ linearly independent vectors $z_1, \ldots, z_m$ in $C$ that span $\mathrm{aff}(C)$; otherwise there would exist $r < m$ linearly independent vectors in $C$ whose span contains $C$, contradicting the fact that the dimension of $\mathrm{aff}(C)$ is $m$. Thus $z_1, \ldots, z_m$ form a basis for $\mathrm{aff}(C)$.

Consider the set
$$X = \Big\{x \ \Big|\ x = \sum_{i=1}^m \alpha_i z_i,\ \sum_{i=1}^m \alpha_i < 1,\ \alpha_i > 0,\ i = 1, \ldots, m\Big\}$$
(see Fig. 1.2.6). This set is open relative to $\mathrm{aff}(C)$, i.e., for every $x \in X$, there exists an open set $N$ such that $x \in N$ and $N \cap \mathrm{aff}(C) \subset X$. [To see this, note that $X$ is the inverse image of the open set in $\Re^m$
$$\Big\{(\alpha_1, \ldots, \alpha_m) \ \Big|\ \sum_{i=1}^m \alpha_i < 1,\ \alpha_i > 0,\ i = 1, \ldots, m\Big\}$$
under the linear transformation from $\mathrm{aff}(C)$ to $\Re^m$ that maps $\sum_{i=1}^m \alpha_i z_i$ into $(\alpha_1, \ldots, \alpha_m)$; openness of the above set follows by continuity of this linear transformation.] Therefore all points of $X$ are relative interior points of $C$, and $\mathrm{ri}(C)$ is nonempty. Since by construction, $\mathrm{aff}(X) = \mathrm{aff}(C)$ and $X \subset \mathrm{ri}(C)$, it follows that $\mathrm{ri}(C)$ and $C$ have the same affine hull.
To show the last assertion of part (b), consider vectors
$$x_0 = \alpha\sum_{i=1}^m z_i, \qquad x_i = x_0 + \alpha z_i, \quad i = 1, \ldots, m,$$
where $\alpha$ is a positive scalar such that $\alpha(m + 1) < 1$. The vectors $x_0, \ldots, x_m$ are in the set $X$ and in the relative interior of $C$, since $X \subset \mathrm{ri}(C)$. Furthermore, because $x_i - x_0 = \alpha z_i$ for all $i$ and the vectors $z_1, \ldots, z_m$ span $\mathrm{aff}(C)$, the vectors $x_1 - x_0, \ldots, x_m - x_0$ also span $\mathrm{aff}(C)$.

(c) If $x \in \mathrm{ri}(C)$, the given condition holds by the Line Segment Principle. Conversely, let $x$ satisfy the given condition. We will show that $x \in \mathrm{ri}(C)$. By part (b), there exists a vector $\overline{x} \in \mathrm{ri}(C)$. We may assume that $\overline{x} \ne x$, since otherwise we are done. By the given condition, since $\overline{x}$ is in $C$, there is a $\gamma > 1$ such that $y = x + (\gamma - 1)(x - \overline{x}) \in C$. Then we have $x = (1 - \alpha)\overline{x} + \alpha y$, where $\alpha = 1/\gamma \in (0, 1)$, so by part (a), we obtain $x \in \mathrm{ri}(C)$. Q.E.D.
Figure 1.2.6. Construction of the relatively open set $X$ in the proof of nonemptiness of the relative interior of a convex set $C$ that contains the origin. We choose $m$ linearly independent vectors $z_1, \ldots, z_m \in C$, where $m$ is the dimension of $\mathrm{aff}(C)$, and let
$$X = \Big\{\sum_{i=1}^m \alpha_i z_i \ \Big|\ \sum_{i=1}^m \alpha_i < 1,\ \alpha_i > 0,\ i = 1, \ldots, m\Big\}.$$
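The prolongation criterion of Prop. 1.2.9(c) can be illustrated numerically for a line segment in the plane, where the relative interior excludes exactly the endpoints. The set and test points below are our own example:

```python
# C = segment {(t, t) | 0 <= t <= 1} in the plane; aff(C) is the line x1 = x2.
def in_C(p, tol=1e-12):
    return abs(p[0] - p[1]) <= tol and -tol <= p[0] <= 1.0 + tol

def beyond(x, xbar, g):
    # the point x + (g - 1)(x - xbar): the segment [xbar, x] prolonged past x
    return (x[0] + (g - 1) * (x[0] - xbar[0]),
            x[1] + (g - 1) * (x[1] - xbar[1]))

gammas = (1.01, 1.1, 2.0)
ri_point, endpoint = (0.5, 0.5), (1.0, 1.0)
others = [(0.0, 0.0), (1.0, 1.0), (0.25, 0.25)]

# (0.5, 0.5) is in ri(C): every sampled segment into it can be prolonged a bit.
assert all(any(in_C(beyond(ri_point, xb, g)) for g in gammas) for xb in others)
# (1, 1) is a relative boundary point: the segment from (0, 0) cannot be prolonged.
assert not any(in_C(beyond(endpoint, (0.0, 0.0), g)) for g in gammas)
```

Note that $(0.5, 0.5)$ has empty interior in $\Re^2$ terms; it is the affine hull (the line $x_1 = x_2$) that makes it a *relative* interior point.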
In view of Prop. 1.2.9(b), $C$ and $\mathrm{ri}(C)$ have the same dimension. It can also be shown that $C$ and $\mathrm{cl}(C)$ have the same dimension (see the exercises). The next proposition gives several properties of closures and relative interiors of convex sets.
Proposition 1.2.10: Let $C$ be a nonempty convex set.

(a) $\mathrm{cl}(C) = \mathrm{cl}\big(\mathrm{ri}(C)\big)$.

(b) $\mathrm{ri}(C) = \mathrm{ri}\big(\mathrm{cl}(C)\big)$.

(c) Let $\overline{C}$ be another nonempty convex set. Then the following three conditions are equivalent:

(i) $C$ and $\overline{C}$ have the same relative interior.

(ii) $C$ and $\overline{C}$ have the same closure.

(iii) $\mathrm{ri}(C) \subset \overline{C} \subset \mathrm{cl}(C)$.

(d) $A \cdot \mathrm{ri}(C) = \mathrm{ri}(A \cdot C)$ for all $m \times n$ matrices $A$.

(e) If $C$ is bounded, then $A \cdot \mathrm{cl}(C) = \mathrm{cl}(A \cdot C)$ for all $m \times n$ matrices $A$.
Proof: (a) Since $\mathrm{ri}(C) \subset C$, we have $\mathrm{cl}\big(\mathrm{ri}(C)\big) \subset \mathrm{cl}(C)$. Conversely, let $\overline{x} \in \mathrm{cl}(C)$. We will show that $\overline{x} \in \mathrm{cl}\big(\mathrm{ri}(C)\big)$. Let $x$ be any point in $\mathrm{ri}(C)$ [there exists such a point by Prop. 1.2.9(b)], and assume that $x \ne \overline{x}$ (otherwise we are done). By the Line Segment Principle [Prop. 1.2.9(a)], we have $\alpha x + (1 - \alpha)\overline{x} \in \mathrm{ri}(C)$ for all $\alpha \in (0, 1]$. Thus, $\overline{x}$ is the limit of the sequence $\big\{(1/k)x + (1 - 1/k)\overline{x} \mid k \ge 1\big\}$ that lies in $\mathrm{ri}(C)$, so $\overline{x} \in \mathrm{cl}\big(\mathrm{ri}(C)\big)$.
(b) The inclusion $\mathrm{ri}(C) \subset \mathrm{ri}\big(\mathrm{cl}(C)\big)$ follows from the definition of a relative interior point and the fact $\mathrm{aff}(C) = \mathrm{aff}\big(\mathrm{cl}(C)\big)$ (see the exercises). To prove the reverse inclusion, let $z \in \mathrm{ri}\big(\mathrm{cl}(C)\big)$. We will show that $z \in \mathrm{ri}(C)$. By Prop. 1.2.9(b), there exists an $x \in \mathrm{ri}(C)$. We may assume that $x \ne z$ (otherwise we are done). We choose $\gamma > 1$, with $\gamma$ sufficiently close to 1 so that the vector $y = z + (\gamma - 1)(z - x)$ belongs to $\mathrm{ri}\big(\mathrm{cl}(C)\big)$ [cf. Prop. 1.2.9(c)], and hence also to $\mathrm{cl}(C)$. Then we have $z = (1 - \alpha)x + \alpha y$ where $\alpha = 1/\gamma \in (0, 1)$, so by the Line Segment Principle [Prop. 1.2.9(a)], we obtain $z \in \mathrm{ri}(C)$.
(c) If $\mathrm{ri}(C) = \mathrm{ri}(\overline{C})$, part (a) implies that $\mathrm{cl}(C) = \mathrm{cl}(\overline{C})$. Similarly, if $\mathrm{cl}(C) = \mathrm{cl}(\overline{C})$, part (b) implies that $\mathrm{ri}(C) = \mathrm{ri}(\overline{C})$. Furthermore, if either of these conditions holds, the relation $\mathrm{ri}(\overline{C}) \subset \overline{C} \subset \mathrm{cl}(\overline{C})$ implies condition (iii). Finally, assume that condition (iii) holds. Then by taking closures, we have $\mathrm{cl}\big(\mathrm{ri}(C)\big) \subset \mathrm{cl}(\overline{C}) \subset \mathrm{cl}(C)$, and by using part (a), we obtain $\mathrm{cl}(C) \subset \mathrm{cl}(\overline{C}) \subset \mathrm{cl}(C)$. Hence $C$ and $\overline{C}$ have the same closure.
(d) For any set $X$, we have $A \cdot \mathrm{cl}(X) \subset \mathrm{cl}(A \cdot X)$, since if a sequence $\{x_k\} \subset X$ converges to some $x \in \mathrm{cl}(X)$, then the sequence $\{Ax_k\} \subset A \cdot X$ converges to $Ax$, implying that $Ax \in \mathrm{cl}(A \cdot X)$. We use this fact and part (a) to write
$$A \cdot \mathrm{ri}(C) \subset A \cdot C \subset A \cdot \mathrm{cl}(C) = A \cdot \mathrm{cl}\big(\mathrm{ri}(C)\big) \subset \mathrm{cl}\big(A \cdot \mathrm{ri}(C)\big).$$
Thus $A \cdot C$ lies between the set $A \cdot \mathrm{ri}(C)$ and the closure of that set, implying that the relative interiors of the sets $A \cdot C$ and $A \cdot \mathrm{ri}(C)$ are equal [part (c)]. Hence $\mathrm{ri}(A \cdot C) \subset A \cdot \mathrm{ri}(C)$. We will show the reverse inclusion by taking any $z \in A \cdot \mathrm{ri}(C)$ and showing that $z \in \mathrm{ri}(A \cdot C)$. Let $\overline{x}$ be any vector in $A \cdot C$, and let $\overline{z} \in \mathrm{ri}(C)$ and $x \in C$ be such that $A\overline{z} = z$ and $Ax = \overline{x}$. By Prop. 1.2.9(c), there exists $\gamma > 1$ such that the vector $y = \overline{z} + (\gamma - 1)(\overline{z} - x)$ belongs to $C$. Thus we have $Ay \in A \cdot C$ and $Ay = z + (\gamma - 1)(z - \overline{x})$, so by Prop. 1.2.9(c) it follows that $z \in \mathrm{ri}(A \cdot C)$.
(e) By the argument given in part (d), we have $A \cdot \mathrm{cl}(C) \subset \mathrm{cl}(A \cdot C)$. To show the converse, choose any $\overline{x} \in \mathrm{cl}(A \cdot C)$. Then, there exists a sequence $\{x_k\} \subset C$ such that $Ax_k \to \overline{x}$. Since $C$ is bounded, $\{x_k\}$ has a subsequence that converges to some $x \in \mathrm{cl}(C)$, and we must have $Ax = \overline{x}$. It follows that $\overline{x} \in A \cdot \mathrm{cl}(C)$. Q.E.D.
Note that if $C$ is closed but unbounded, the set $A \cdot C$ need not be closed [cf. part (e) of the above proposition]. For example, take the closed set $C = \big\{(x_1, x_2) \mid x_1 x_2 \ge 1,\ x_1 \ge 0,\ x_2 \ge 0\big\}$, and let $A$ have the effect of projecting the typical vector $x$ on the horizontal axis, i.e., $A \cdot (x_1, x_2) = (x_1, 0)$. Then $A \cdot C$ is the (nonclosed) halfline $\big\{(x_1, x_2) \mid x_1 > 0,\ x_2 = 0\big\}$.
Proposition 1.2.11: Let $C_1$ and $C_2$ be nonempty convex sets:

(a) Assume that the sets $\mathrm{ri}(C_1)$ and $\mathrm{ri}(C_2)$ have a nonempty intersection. Then
$$\mathrm{cl}(C_1 \cap C_2) = \mathrm{cl}(C_1) \cap \mathrm{cl}(C_2), \qquad \mathrm{ri}(C_1 \cap C_2) = \mathrm{ri}(C_1) \cap \mathrm{ri}(C_2).$$

(b) $\mathrm{ri}(C_1 + C_2) = \mathrm{ri}(C_1) + \mathrm{ri}(C_2)$.
Proof: (a) Let $y \in \mathrm{cl}(C_1) \cap \mathrm{cl}(C_2)$. Let $x$ be a vector in the intersection $\mathrm{ri}(C_1) \cap \mathrm{ri}(C_2)$ (which is nonempty by assumption). By the Line Segment Principle [Prop. 1.2.9(a)], the vector $\alpha x + (1 - \alpha)y$ belongs to $\mathrm{ri}(C_1) \cap \mathrm{ri}(C_2)$ for all $\alpha \in (0, 1]$. Hence $y$ is the limit of a sequence $\big\{\alpha_k x + (1 - \alpha_k)y\big\} \subset \mathrm{ri}(C_1) \cap \mathrm{ri}(C_2)$ with $\alpha_k \to 0$, implying that $y \in \mathrm{cl}\big(\mathrm{ri}(C_1) \cap \mathrm{ri}(C_2)\big)$. Hence, we have
$$\mathrm{cl}(C_1) \cap \mathrm{cl}(C_2) \subset \mathrm{cl}\big(\mathrm{ri}(C_1) \cap \mathrm{ri}(C_2)\big) \subset \mathrm{cl}(C_1 \cap C_2).$$
Also $C_1 \cap C_2$ is contained in $\mathrm{cl}(C_1) \cap \mathrm{cl}(C_2)$, which is a closed set, so we have
$$\mathrm{cl}(C_1 \cap C_2) \subset \mathrm{cl}(C_1) \cap \mathrm{cl}(C_2).$$
Thus, equality holds throughout in the preceding two relations, so that $\mathrm{cl}(C_1 \cap C_2) = \mathrm{cl}(C_1) \cap \mathrm{cl}(C_2)$. Furthermore, the sets $\mathrm{ri}(C_1) \cap \mathrm{ri}(C_2)$ and $C_1 \cap C_2$ have the same closure. Therefore, by Prop. 1.2.10(c), they have the same relative interior, implying that
$$\mathrm{ri}(C_1 \cap C_2) \subset \mathrm{ri}(C_1) \cap \mathrm{ri}(C_2).$$
To show the converse, take any $x \in \mathrm{ri}(C_1) \cap \mathrm{ri}(C_2)$ and any $y \in C_1 \cap C_2$. By Prop. 1.2.9(c), the line segment connecting $x$ and $y$ can be prolonged beyond $x$ by a small amount without leaving $C_1 \cap C_2$. By the same proposition, it follows that $x \in \mathrm{ri}(C_1 \cap C_2)$.
(b) Consider the linear transformation $A : \Re^{2n} \mapsto \Re^n$ given by $A(x_1, x_2) = x_1 + x_2$ for all $x_1, x_2 \in \Re^n$. The relative interior of the Cartesian product $C_1 \times C_2$ (viewed as a subset of $\Re^{2n}$) is easily seen to be $\mathrm{ri}(C_1) \times \mathrm{ri}(C_2)$ (see the exercises). Since $A(C_1 \times C_2) = C_1 + C_2$, the result follows from Prop. 1.2.10(d). Q.E.D.
The requirement that $\mathrm{ri}(C_1) \cap \mathrm{ri}(C_2) \ne \emptyset$ is essential in part (a) of the above proposition. As an example, consider the subsets of the real line $C_1 = \{x \mid x > 0\}$ and $C_2 = \{x \mid x < 0\}$. Then we have $\mathrm{cl}(C_1 \cap C_2) = \emptyset \ne \{0\} = \mathrm{cl}(C_1) \cap \mathrm{cl}(C_2)$. Also, consider $C_1 = \{x \mid x \ge 0\}$ and $C_2 = \{x \mid x \le 0\}$. Then we have $\mathrm{ri}(C_1 \cap C_2) = \{0\} \ne \emptyset = \mathrm{ri}(C_1) \cap \mathrm{ri}(C_2)$.
Continuity of Convex Functions

We close this section with a basic result on the continuity properties of convex functions.

Proposition 1.2.12: If $f : \Re^n \mapsto \Re$ is convex, then it is continuous. More generally, if $f : \Re^n \mapsto (-\infty, \infty]$ is convex, then $f$, when restricted to $\mathrm{dom}(f)$, is continuous over the relative interior of $\mathrm{dom}(f)$.
Proof: Restricting attention to the affine hull of $\mathrm{dom}(f)$ and using a transformation argument if necessary, we assume without loss of generality that the origin is an interior point of $\mathrm{dom}(f)$ and that the unit cube $X = \{x \mid \|x\|_\infty \le 1\}$ is contained in $\mathrm{dom}(f)$. It will suffice to show that $f$ is continuous at zero. Let $e_1, \ldots, e_{2^n}$ be the corners of $X$. Every $x \in X$ can be expressed in the form $x = \sum_{i=1}^{2^n} \alpha_i e_i$, where each $\alpha_i$ is a nonnegative scalar and $\sum_{i=1}^{2^n} \alpha_i = 1$. Let $A = \max_i f(e_i)$. From Jensen's inequality [Eq. (1.8)], it follows that $f(x) \le A$ for every $x \in X$.
For the purpose of proving continuity at zero, we can assume that $x_k \in X$ and $x_k \ne 0$ for all $k$. Consider the sequences $\{y_k\}$ and $\{z_k\}$ given by
$$y_k = \frac{x_k}{\|x_k\|_\infty}, \qquad z_k = -\frac{x_k}{\|x_k\|_\infty}$$
(cf. Fig. 1.2.7). Using the definition of a convex function for the line segment that connects $y_k$, $x_k$, and $0$, we have
$$f(x_k) \le \big(1 - \|x_k\|_\infty\big)f(0) + \|x_k\|_\infty f(y_k).$$
Since $\|x_k\|_\infty \to 0$ and $f(y_k) \le A$ for all $k$, by taking the limit as $k \to \infty$, we obtain
$$\limsup_{k\to\infty} f(x_k) \le f(0).$$
Using the definition of a convex function for the line segment that connects $x_k$, $0$, and $z_k$, we have
$$f(0) \le \frac{\|x_k\|_\infty}{\|x_k\|_\infty + 1}\, f(z_k) + \frac{1}{\|x_k\|_\infty + 1}\, f(x_k),$$
and letting $k \to \infty$, we obtain
$$f(0) \le \liminf_{k\to\infty} f(x_k).$$
Thus, $\lim_{k\to\infty} f(x_k) = f(0)$ and $f$ is continuous at zero. Q.E.D.
Figure 1.2.7. Construction for proving continuity of a convex function (cf. Prop. 1.2.12).
A straightforward consequence of the continuity of a real-valued function $f$ that is convex over $\Re^n$ is that its epigraph as well as the level sets $\big\{x \mid f(x) \le \gamma\big\}$ are closed and convex (cf. Prop. 1.2.2).
1.2.4 Recession Cones

Some of the preceding results [Props. 1.2.8, 1.2.10(e)] have illustrated how boundedness affects the topological properties of sets obtained through various operations on convex sets. In this section we take a closer look at this issue.

Given a convex set $C$, we say that a vector $y$ is a direction of recession of $C$ if $x + \alpha y \in C$ for all $x \in C$ and $\alpha \ge 0$. In words, $y$ is a direction of recession of $C$ if, starting at any $x$ in $C$ and going indefinitely along $y$, we never cross the relative boundary of $C$ to points outside $C$. The set of all directions of recession is a cone containing the origin. It is called the recession cone of $C$ and it is denoted by $R_C$ (see Fig. 1.2.8). The following proposition gives some properties of recession cones.

Figure 1.2.8. Illustration of the recession cone $R_C$ of a convex set $C$. A direction of recession $y$ has the property that $x + \alpha y \in C$ for all $x \in C$ and $\alpha \ge 0$.
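The defining property can be probed numerically for a concrete closed convex set. In the sketch below we use $C = \{(x_1, x_2) \mid x_2 \ge x_1^2\}$ (our example, not the text's), whose recession cone is the nonnegative vertical axis:

```python
# For C = {(x1, x2) | x2 >= x1^2}, y = (0, 1) is a direction of recession,
# while y = (1, 0) is not: moving far enough horizontally exits the set.
def in_C(p):
    return p[1] >= p[0] ** 2

def stays_in_C(x, y, alphas):
    # does x + alpha*y remain in C for each sampled alpha >= 0?
    return all(in_C((x[0] + a * y[0], x[1] + a * y[1])) for a in alphas)

alphas = [0.0, 1.0, 10.0, 1000.0]
x = (1.0, 2.0)                                   # a point of C
assert in_C(x)
assert stays_in_C(x, (0.0, 1.0), alphas)         # vertical direction recedes
assert not stays_in_C(x, (1.0, 0.0), alphas)     # horizontal direction does not
```

Sampling a few step sizes can only refute membership in $R_C$, not certify it; the positive claim for $(0, 1)$ is immediate analytically, since $x_2 + \alpha \ge x_1^2$ for all $\alpha \ge 0$.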
Proposition 1.2.13: (Recession Cone Theorem) Let $C$ be a nonempty closed convex set.

(a) The recession cone $R_C$ is a closed convex cone.

(b) A vector $y$ belongs to $R_C$ if and only if there exists a vector $x \in C$ such that $x + \alpha y \in C$ for all $\alpha \ge 0$.

(c) $R_C$ contains a nonzero direction if and only if $C$ is unbounded.

(d) The recession cones of $C$ and $\mathrm{ri}(C)$ are equal.

(e) If $D$ is another closed convex set such that $C \cap D \ne \emptyset$, we have
$$R_{C \cap D} = R_C \cap R_D.$$
Proof: (a) If $y_1, y_2$ belong to $R_C$ and $\lambda_1, \lambda_2$ are positive scalars such that $\lambda_1 + \lambda_2 = 1$, we have for any $x \in C$ and $\alpha \ge 0$
$$x + \alpha(\lambda_1 y_1 + \lambda_2 y_2) = \lambda_1(x + \alpha y_1) + \lambda_2(x + \alpha y_2) \in C,$$
where the last inclusion holds because $x + \alpha y_1$ and $x + \alpha y_2$ belong to $C$ by the definition of $R_C$. Hence $\lambda_1 y_1 + \lambda_2 y_2 \in R_C$, implying that $R_C$ is convex.

Let $y$ be in the closure of $R_C$, and let $\{y_k\} \subset R_C$ be a sequence converging to $y$. For any $x \in C$ and $\alpha \ge 0$ we have $x + \alpha y_k \in C$ for all $k$, and because $C$ is closed, we have $x + \alpha y \in C$. This implies that $y \in R_C$ and that $R_C$ is closed.
(b) If $y \in R_C$, every vector $x \in C$ has the required property by the definition of $R_C$. Conversely, let $y$ be such that there exists a vector $\overline{x} \in C$ with $\overline{x} + \alpha y \in C$ for all $\alpha \ge 0$. We fix $x \in C$ and $\overline{\alpha} > 0$, and we show that $x + \overline{\alpha}y \in C$. We may assume that $y \ne 0$ (otherwise we are done) and, without loss of generality, we may assume that $\|y\| = 1$. Let
$$z_k = \overline{x} + ky, \qquad k = 1, 2, \ldots.$$
If $x = z_k$ for some $k$, then $x + \overline{\alpha}y = \overline{x} + (k + \overline{\alpha})y$, which belongs to $C$ and we are done. We thus assume that $x \ne z_k$ for all $k$, and we define
$$y_k = \frac{z_k - x}{\|z_k - x\|}, \qquad k = 1, 2, \ldots$$
(see the construction of Fig. 1.2.9).
Figure 1.2.9. Construction for the proof of Prop. 1.2.13(b).
We have
$$y_k = \frac{\|z_k - \overline{x}\|}{\|z_k - x\|} \cdot \frac{z_k - \overline{x}}{\|z_k - \overline{x}\|} + \frac{\overline{x} - x}{\|z_k - x\|} = \frac{\|z_k - \overline{x}\|}{\|z_k - x\|}\, y + \frac{\overline{x} - x}{\|z_k - x\|}.$$
Because $\{z_k\}$ is an unbounded sequence,
$$\frac{\|z_k - \overline{x}\|}{\|z_k - x\|} \to 1, \qquad \frac{\overline{x} - x}{\|z_k - x\|} \to 0,$$
so by combining the preceding relations, we have $y_k \to y$. Thus $x + \overline{\alpha}y$ is the limit of $\{x + \overline{\alpha}y_k\}$. The vector $x + \overline{\alpha}y_k$ lies between $x$ and $z_k$ in the line segment connecting $x$ and $z_k$ for all $k$ such that $\|z_k - x\| \ge \overline{\alpha}$, so by convexity of $C$, we have $x + \overline{\alpha}y_k \in C$ for all sufficiently large $k$. Since $x + \overline{\alpha}y_k \to x + \overline{\alpha}y$ and $C$ is closed, it follows that $x + \overline{\alpha}y$ must belong to $C$.
(c) Assuming that $C$ is unbounded, we will show that $R_C$ contains a nonzero direction (the reverse is clear). Choose any $x \in C$ and any unbounded sequence $\{z_k\} \subset C$. Consider the sequence $\{y_k\}$, where
$$y_k = \frac{z_k - x}{\|z_k - x\|},$$
and let $y$ be a limit point of $\{y_k\}$ (compare with the construction of Fig. 1.2.9). For any fixed $\alpha \ge 0$, the vector $x + \alpha y_k$ lies between $x$ and $z_k$ in the line segment connecting $x$ and $z_k$ for all $k$ such that $\|z_k - x\| \ge \alpha$. Hence by convexity of $C$, we have $x + \alpha y_k \in C$ for all sufficiently large $k$. Since $x + \alpha y$ is a limit point of $\{x + \alpha y_k\}$, and $C$ is closed, we have $x + \alpha y \in C$. Hence the nonzero vector $y$ is a direction of recession.
(d) If $y \in R_{\mathrm{ri}(C)}$, then for a fixed $x \in \mathrm{ri}(C)$ and all $\alpha \ge 0$, we have $x + \alpha y \in \mathrm{ri}(C) \subset C$. Hence, by part (b), we have $y \in R_C$. Conversely, if $y \in R_C$, for any $x \in \mathrm{ri}(C)$, we have $x + \alpha y \in C$ for all $\alpha \ge 0$. It follows from the Line Segment Principle [cf. Prop. 1.2.9(a)] that $x + \alpha y \in \mathrm{ri}(C)$ for all $\alpha \ge 0$, so that $y$ belongs to $R_{\mathrm{ri}(C)}$.
(e) By the definition of direction of recession, $y \in R_{C \cap D}$ implies that $x + \alpha y \in C \cap D$ for all $x \in C \cap D$ and all $\alpha \ge 0$. By part (b), this in turn implies that $y \in R_C$ and $y \in R_D$, so that $R_{C \cap D} \subset R_C \cap R_D$. Conversely, by the definition of direction of recession, if $y \in R_C \cap R_D$ and $x \in C \cap D$, we have $x + \alpha y \in C \cap D$ for all $\alpha \ge 0$, so $y \in R_{C \cap D}$. Thus, $R_C \cap R_D \subset R_{C \cap D}$. Q.E.D.
Note that part (c) of the above proposition yields a characterization of compact and convex sets, namely that a closed convex set C is bounded if and only if R_C = {0}. A useful generalization is that for a compact set W ⊂ ℜ^m and an m × n matrix A, the set

V = {x ∈ C | Ax ∈ W}

is compact if and only if R_C ∩ N(A) = {0}. To see this, note that the recession cone of the set

V̄ = {x ∈ ℜ^n | Ax ∈ W}

is N(A) [clearly N(A) ⊂ R_V̄; if x ∉ N(A) but x ∈ R_V̄, we must have αAx ∈ W for all α > 0, which contradicts the boundedness of W]. Hence, since V = C ∩ V̄, the recession cone of V is R_C ∩ N(A), so by Prop. 1.2.13(c), V is compact if and only if R_C ∩ N(A) = {0}.
One possible use of recession cones is to obtain conditions guaranteeing the closedness of linear transformations and vector sums of convex sets in the absence of boundedness, as in the following two propositions (some refinements are given in the exercises).
Proposition 1.2.14: Let C be a nonempty closed convex subset of ℜ^n and let A be an m × n matrix with nullspace N(A) such that R_C ∩ N(A) = {0}. Then A·C is closed.

Proof: For any y ∈ cl(A·C), the set

C_ε = {x ∈ C | ‖y − Ax‖ ≤ ε}

is nonempty for all ε > 0. Furthermore, by the discussion following the proof of Prop. 1.2.13, the assumption R_C ∩ N(A) = {0} implies that C_ε is compact. It follows that the set ∩_{ε>0} C_ε is nonempty, and any x ∈ ∩_{ε>0} C_ε satisfies Ax = y, so y ∈ A·C. Q.E.D.
Proposition 1.2.15: Let C1, . . . , Cm be nonempty closed convex subsets of ℜ^n such that the equality y1 + · · · + ym = 0 for some vectors yi ∈ R_{Ci} implies that yi = 0 for all i = 1, . . . , m. Then the vector sum C1 + · · · + Cm is a closed set.

Proof: Let C be the Cartesian product C1 × · · · × Cm viewed as a subset of ℜ^{mn}, and let A be the linear transformation that maps a vector (x1, . . . , xm) ∈ ℜ^{mn} into x1 + · · · + xm. We have

R_C = R_{C1} × · · · × R_{Cm}

(see the exercises) and

N(A) = {(y1, . . . , ym) | y1 + · · · + ym = 0, yi ∈ ℜ^n},

so under the given condition, we obtain R_C ∩ N(A) = {0}. Since A·C = C1 + · · · + Cm, the result follows by applying Prop. 1.2.14. Q.E.D.
When specialized to just two sets C1 and C2, the above proposition says that if there is no nonzero direction of recession of C1 that is the opposite of a direction of recession of C2, then C1 + C2 is closed. This is true in particular if R_{C1} = {0}, which is equivalent to C1 being compact [cf. Prop. 1.2.13(c)]. We thus obtain the following proposition.
Proposition 1.2.16: Let C1 and C2 be closed, convex sets. If C1 is bounded, then C1 + C2 is a closed and convex set. If both C1 and C2 are bounded, then C1 + C2 is a compact and convex set.

Proof: Closedness of C1 + C2 follows from the preceding discussion. If both C1 and C2 are bounded, then C1 + C2 is also bounded and hence also compact. Q.E.D.
Note that if C1 and C2 are both closed and unbounded, the vector sum C1 + C2 need not be closed. For example, consider the closed sets of ℜ^2 given by

C1 = {(x1, x2) | x1x2 ≥ 1, x1 ≥ 0, x2 ≥ 0},  C2 = {(x1, x2) | x1 = 0}.

Then C1 + C2 is the open halfspace {(x1, x2) | x1 > 0}.
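This counterexample can be sketched numerically (the decomposition below is our own illustration, not part of the text): points with arbitrarily small positive first coordinate lie in C1 + C2, while every point on the boundary x1 = 0 is excluded, so the sum is not closed.

```python
# C1 = {(x1, x2) : x1*x2 >= 1, x1 >= 0, x2 >= 0} (hyperbola region),
# C2 = {(0, t) : t real} (the vertical axis).
def in_C1(p):
    x1, x2 = p
    return x1 >= 0 and x2 >= 0 and x1 * x2 >= 1

def in_C2(p):
    return p[0] == 0

# Each (eps, 0) with eps > 0 is in C1 + C2, via the decomposition
# (eps, 2/eps) in C1 plus (0, -2/eps) in C2; so sums approach (0, 0).
for k in [1, 10, 100, 1000]:
    eps = 1.0 / k
    a = (eps, 2.0 / eps)       # on/above the hyperbola, hence in C1
    b = (0.0, -2.0 / eps)      # on the vertical axis, hence in C2
    s = (a[0] + b[0], a[1] + b[1])
    assert in_C1(a) and in_C2(b)
    assert s == (eps, 0.0)
# The limit (0, 0) is NOT in C1 + C2: every element of C1 has first
# coordinate > 0, and adding C2 leaves the first coordinate unchanged.
```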
E X E R C I S E S

1.2.1

(a) Show that a set is convex if and only if it contains all the convex combinations of its elements.

(b) Show that the convex hull of a set coincides with the set of all the convex combinations of its elements.

1.2.2

Let C be a nonempty set in ℜ^n, and let λ1 and λ2 be positive scalars. Show by example that the sets (λ1 + λ2)C and λ1C + λ2C may differ when C is not convex [cf. Prop. 1.2.1].

1.2.3 (Properties of Cones)

(a) For any collection {Ci | i ∈ I} of cones, the intersection ∩_{i∈I} Ci is a cone.

(b) The Cartesian product C1 × C2 of two cones C1 and C2 is a cone.
(c) The vector sum C1 + C2 of two cones C1 and C2 is a cone.

(d) The closure of a cone is a cone.

(e) The image and the inverse image of a cone under a linear transformation are cones.

1.2.4 (Convex Cones)

(a) For any collection of vectors {ai | i ∈ I}, the set C = {x | ai′x ≤ 0, i ∈ I} is a closed convex cone.

(b) Show that a cone C is convex if and only if C + C ⊂ C.

(c) Let C1 and C2 be convex cones containing the origin. Show that

C1 + C2 = conv(C1 ∪ C2),  C1 ∩ C2 = ∪_{α∈[0,1]} (αC1 ∩ (1 − α)C2).
1.2.5 (Properties of Cartesian Product)

Given sets Xi ⊂ ℜ^{n_i}, i = 1, . . . , m, let X = X1 × · · · × Xm be their Cartesian product.

(a) Show that the convex hull (closure, affine hull) of X is equal to the Cartesian product of the convex hulls (closures, affine hulls, respectively) of the Xi's.

(b) Show that cone(X) is equal to the Cartesian product of cone(X1), . . . , cone(Xm).

(c) Assuming X1, . . . , Xm are convex, show that the relative interior (recession cone) of X is equal to the Cartesian product of the relative interiors (recession cones) of the Xi's.

1.2.6

Let {Ci | i ∈ I} be an arbitrary collection of convex sets in ℜ^n, and let C be the convex hull of the union of the collection. Show that

C = ∪ { ∑_{i∈I} αi Ci },

where the union is taken over all convex combinations {αi} such that only finitely many coefficients αi are nonzero.
1.2.7

Let X be a nonempty set.

(a) Show that X, conv(X), and cl(X) have the same dimension.

(b) Show that cone(X) = cone(conv(X)).

(c) Show that the dimension of conv(X) is at most as large as the dimension of cone(X). Give an example where the dimension of conv(X) is smaller than the dimension of cone(X).

(d) Assuming that the origin belongs to conv(X), show that conv(X) and cone(X) have the same dimension.
1.2.8 (Lower Semicontinuity under Composition)

(a) Let f : ℜ^n → ℜ^m be a continuous function and g : ℜ^m → ℜ be a lower semicontinuous function. Show that the function h defined by h(x) = g(f(x)) is lower semicontinuous.

(b) Let f : ℜ^n → ℜ be a lower semicontinuous function, and g : ℜ → ℜ be a lower semicontinuous and monotonically nondecreasing function. Show that the function h defined by h(x) = g(f(x)) is lower semicontinuous.
1.2.9 (Convexity under Composition)

(a) Let f be a convex function defined on a convex set C ⊂ ℜ^n, and let g be a convex monotonically nondecreasing function of a single variable [i.e., g(y) ≤ g(ȳ) for y < ȳ]. Show that the function h defined by h(x) = g(f(x)) is convex over C. In addition, if g is monotonically increasing and f is strictly convex, then h is strictly convex.

(b) Let f = (f1, . . . , fm), where each fi : ℜ^n → ℜ is a convex function, and let g : ℜ^m → ℜ be a function such that g(u1, . . . , um) is convex and nondecreasing in ui for each i. Show that the function h defined by h(x) = g(f(x)) is convex.
1.2.10 (Convex Functions)

Show that the following functions are convex:

(a) f1 : X → ℜ is given by

f1(x1, . . . , xn) = −(x1x2 · · · xn)^{1/n},

where X = {(x1, . . . , xn) | x1 ≥ 0, . . . , xn ≥ 0}.

(b) f2(x) = ln(e^{x1} + · · · + e^{xn}) with x = (x1, . . . , xn) ∈ ℜ^n.

(c) f3(x) = ‖x‖^p with p ≥ 1 and x ∈ ℜ^n.
(d) f4(x) = 1/f(x) with f a concave function over ℜ^n and f(x) > 0 for all x.

(e) f5(x) = αf(x) + β with f a convex function over ℜ^n, and α and β scalars such that α ≥ 0.

(f) f6(x) = max{0, f(x)} with f a convex function over ℜ^n.

(g) f7(x) = ‖Ax − b‖ with A an m × n matrix and b a vector in ℜ^m.

(h) f8(x) = x′Ax + b′x with A an n × n positive semidefinite symmetric matrix and b a vector in ℜ^n.

(i) f9(x) = e^{βx′Ax} with A an n × n positive semidefinite symmetric matrix and β a positive scalar.

(j) f10(x) = f(Ax + b) with f a convex function over ℜ^m, A an m × n matrix, and b a vector in ℜ^m.
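A quick numerical spot check of part (b) (our own illustration, not part of the exercise): the midpoint convexity inequality f((x + y)/2) ≤ (f(x) + f(y))/2, which together with continuity implies convexity, holds on random samples for the log-sum-exp function f2.

```python
import math
import random

def logsumexp(x):
    # f2(x) = ln(e^{x1} + ... + e^{xn}), computed in a numerically stable way
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x))

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(4)]
    y = [random.uniform(-5, 5) for _ in range(4)]
    mid = [(a + b) / 2 for a, b in zip(x, y)]
    # Midpoint convexity: f(mid) <= (f(x) + f(y)) / 2, up to float noise.
    assert logsumexp(mid) <= (logsumexp(x) + logsumexp(y)) / 2 + 1e-12
```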
1.2.11

Use the Line Segment Principle and the method of proof of Prop. 1.2.5(c) to show that if C is a convex set with nonempty interior, and f : ℜ^n → ℜ is convex and twice continuously differentiable over C, then ∇²f(x) is positive semidefinite for all x ∈ C.
1.2.12

Let C ⊂ ℜ^n be a convex set and let f : ℜ^n → ℜ be twice continuously differentiable over C. Let S be the subspace that is parallel to the affine hull of C. Show that f is convex over C if and only if y′∇²f(x)y ≥ 0 for all x ∈ C and y ∈ S.
1.2.13

Let f : ℜ^n → ℜ be a differentiable function. Show that f is convex over a convex set C if and only if

(∇f(x) − ∇f(y))′(x − y) ≥ 0,  ∀ x, y ∈ C.

Hint: The condition above says that the function f, restricted to the line segment connecting x and y, has monotonically nondecreasing gradient; see also the proof of Prop. 1.2.6.
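The gradient-monotonicity condition can be spot-checked numerically (a sketch with a convex function of our own choosing, f(x) = eˣ in one dimension):

```python
import math
import random

# For the convex function f(x) = exp(x), the derivative is f'(x) = exp(x),
# and the monotonicity condition (f'(x) - f'(y)) * (x - y) >= 0 holds for
# every pair, since exp is increasing.
fprime = math.exp

random.seed(1)
for _ in range(1000):
    x, y = random.uniform(-4, 4), random.uniform(-4, 4)
    assert (fprime(x) - fprime(y)) * (x - y) >= 0.0
```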
1.2.14 (Ascent/Descent Behavior of a Convex Function)

Let f : ℜ → ℜ be a convex function of a single variable.

(a) (Monotropic Property) Use the definition of convexity to show that f is "turning upwards" in the sense that if x1, x2, x3 are three scalars such that x1 < x2 < x3, then

(f(x2) − f(x1))/(x2 − x1) ≤ (f(x3) − f(x2))/(x3 − x2).
(b) Use part (a) to show that there are four possibilities as x increases to ∞: (1) f(x) decreases monotonically to −∞, (2) f(x) decreases monotonically to a finite value, (3) f(x) reaches some value and stays at that value, (4) f(x) increases monotonically to ∞ when x ≥ x̄ for some x̄ ∈ ℜ.
1.2.15 (Posynomials)

A posynomial of scalar variables y1, . . . , yn is a function of the form

g(y1, . . . , yn) = ∑_{i=1}^m βi y1^{a_{i1}} · · · yn^{a_{in}},

where the scalars yi and βi are positive, and the exponents a_{ij} are real numbers. Show the following:

(a) A posynomial need not be convex.

(b) By a logarithmic change of variables, where we set

f(x) = ln(g(y1, . . . , yn)),  bi = ln βi, i = 1, . . . , m,  xj = ln yj, j = 1, . . . , n,

we obtain a convex function

f(x) = ln exp(Ax + b)

defined for all x ∈ ℜ^n, where ln exp(z) = ln(e^{z1} + · · · + e^{zm}) for z ∈ ℜ^m, A is an m × n matrix with entries a_{ij}, and b ∈ ℜ^m is a vector with components bi.

(c) In general, every function g : ℜ^n → ℜ of the form

g(y) = g1(y)^{γ1} · · · gr(y)^{γr},

where gk is a posynomial and γk > 0 for all k, can be transformed by the logarithmic change of variables into a convex function f given by

f(x) = ∑_{k=1}^r γk ln exp(Ak x + bk),

with the matrix Ak and the vector bk being associated with the posynomial gk for each k.
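The change of variables in part (b) can be sketched numerically. The particular posynomial below is an arbitrary choice of ours for illustration; the check confirms that f(x) = ln exp(Ax + b) agrees with ln g(y) at y = e^x.

```python
import math

# Illustrative posynomial: g(y1, y2) = 2 * y1^0.5 * y2^-1 + 3 * y1^2
betas = [2.0, 3.0]
A = [[0.5, -1.0],
     [2.0,  0.0]]                      # exponent matrix a_ij
b = [math.log(beta) for beta in betas] # b_i = ln(beta_i)

def g(y):
    return sum(beta * y[0] ** row[0] * y[1] ** row[1]
               for beta, row in zip(betas, A))

def f(x):
    # f(x) = ln exp(Ax + b) = ln sum_i e^{(Ax)_i + b_i}
    z = [sum(aij * xj for aij, xj in zip(row, x)) + bi
         for row, bi in zip(A, b)]
    return math.log(sum(math.exp(zi) for zi in z))

x = [0.3, -0.7]
y = [math.exp(v) for v in x]   # y_j = e^{x_j}
assert abs(f(x) - math.log(g(y))) < 1e-12
```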
1.2.16 (Arithmetic-Geometric Mean Inequality)

Show that if α1, . . . , αn are positive scalars with ∑_{i=1}^n αi = 1, then for every set of positive scalars x1, . . . , xn, we have

x1^{α1} x2^{α2} · · · xn^{αn} ≤ α1x1 + α2x2 + · · · + αnxn,

with equality if and only if x1 = x2 = · · · = xn. Hint: Show that −ln x is a strictly convex function on (0, ∞).
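A numerical spot check of the inequality on random data (an illustration of ours, not a proof):

```python
import math
import random

random.seed(2)
for _ in range(500):
    n = random.randint(2, 6)
    raw = [random.uniform(0.1, 1.0) for _ in range(n)]
    total = sum(raw)
    alphas = [r / total for r in raw]              # positive, sum to 1
    xs = [random.uniform(0.1, 10.0) for _ in range(n)]
    geo = math.prod(x ** a for x, a in zip(xs, alphas))   # weighted geometric mean
    ari = sum(a * x for a, x in zip(alphas, xs))          # weighted arithmetic mean
    assert geo <= ari + 1e-9

# Equality case: with all x_i equal, both sides coincide.
assert abs(math.prod(3.0 ** a for a in [0.2, 0.3, 0.5]) - 3.0) < 1e-9
```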
1.2.17

Use the result of Exercise 1.2.16 to verify Young's inequality

xy ≤ x^p/p + y^q/q,

where p > 0, q > 0, 1/p + 1/q = 1, x ≥ 0, and y ≥ 0. Then, use Young's inequality to verify Hölder's inequality

∑_{i=1}^n |xiyi| ≤ (∑_{i=1}^n |xi|^p)^{1/p} (∑_{i=1}^n |yi|^q)^{1/q}.
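Both inequalities can be spot-checked numerically (illustrative values of our own: p = 3 and the conjugate q = 3/2):

```python
import random

p = 3.0
q = p / (p - 1.0)          # conjugate exponent: 1/p + 1/q = 1
random.seed(3)
for _ in range(500):
    # Young's inequality for nonnegative scalars
    x, y = random.uniform(0, 5), random.uniform(0, 5)
    assert x * y <= x ** p / p + y ** q / q + 1e-9
    # Holder's inequality for random vectors
    xs = [random.uniform(-2, 2) for _ in range(5)]
    ys = [random.uniform(-2, 2) for _ in range(5)]
    lhs = sum(abs(a * b) for a, b in zip(xs, ys))
    rhs = (sum(abs(a) ** p for a in xs) ** (1 / p)
           * sum(abs(b) ** q for b in ys) ** (1 / q))
    assert lhs <= rhs + 1e-9
```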
1.2.18 (Convexity under Minimization I)

Let h : ℜ^{n+m} → ℜ be a convex function. Consider the function f : ℜ^n → ℜ given by

f(x) = inf_{u∈U} h(x, u),

where we assume that U is a nonempty and convex subset of ℜ^m, and that f(x) > −∞ for all x. Show that f is convex, and for each x, the set {u ∈ U | h(x, u) = f(x)} is convex. Hint: There cannot exist α ∈ [0, 1], x1, x2, u1 ∈ U, u2 ∈ U such that f(αx1 + (1 − α)x2) > αh(x1, u1) + (1 − α)h(x2, u2).
1.2.19 (Convexity under Minimization II)

(a) Let C be a convex set in ℜ^{n+1} and let

f(x) = inf{w | (x, w) ∈ C}.

Show that f is convex over ℜ^n.

(b) Let f1, . . . , fm be convex functions over ℜ^n and let

f(x) = inf{ ∑_{i=1}^m fi(xi) | ∑_{i=1}^m xi = x }.

Assuming that f(x) > −∞ for all x, show that f is convex over ℜ^n.

(c) Let h : ℜ^m → ℜ be a convex function and let

f(x) = inf_{By=x} h(y),

where B is an n × m matrix. Assuming that f(x) > −∞ for all x, show that f is convex over the range space of B.

(d) In parts (b) and (c), show by example that if the assumption f(x) > −∞ for all x is violated, then the set

{x | f(x) > −∞}

need not be convex.
1.2.20 (Convexity under Minimization III)

Let {fi | i ∈ I} be an arbitrary collection of convex functions on ℜ^n. Define the convex hull of these functions f : ℜ^n → ℜ as the pointwise infimum of the collection, i.e.,

f(x) = inf{ w | (x, w) ∈ conv(∪_{i∈I} epi(fi)) }.

Show that f(x) is given by

f(x) = inf{ ∑_{i∈I} αi fi(xi) | ∑_{i∈I} αi xi = x },

where the infimum is taken over all representations of x as a convex combination of elements xi, such that only finitely many coefficients αi are nonzero.
1.2.21 (Convexification of Nonconvex Functions)

Let X be a nonempty subset of ℜ^n, and let f : X → ℜ be a function that is bounded below over X. Define the function F : conv(X) → ℜ by

F(x) = inf{ w | (x, w) ∈ conv(epi(f)) }.

(a) Show that F is convex over conv(X) and it is given by

F(x) = inf{ ∑_{i=1}^m αi f(xi) | ∑_{i=1}^m αi xi = x, xi ∈ X, ∑_{i=1}^m αi = 1, αi ≥ 0, m ≥ 1 }.

(b) Show that

inf_{x∈conv(X)} F(x) = inf_{x∈X} f(x).

(c) Show that the set of global minima of F over conv(X) includes all global minima of f over X.
1.2.22 (Minimization of Linear Functions)

Show that minimization of a linear function over a set is equivalent to minimization over its convex hull. In particular, if X ⊂ ℜ^n and c ∈ ℜ^n, then

inf_{x∈X} c′x = inf_{x∈conv(X)} c′x.

Furthermore, the infimum in the left-hand side above is attained if and only if the infimum in the right-hand side is attained.
1.2.23 (Extension of Caratheodory's Theorem)

Let X1 and X2 be subsets of ℜ^n, and let X = conv(X1) + cone(X2). Show that every vector x in X can be represented in the form

x = ∑_{i=1}^k αi xi + ∑_{i=k+1}^m αi xi,

where m is a positive integer with m ≤ n + 1, the vectors x1, . . . , xk belong to X1, the vectors x_{k+1}, . . . , xm belong to X2, and the scalars α1, . . . , αm are nonnegative with α1 + · · · + αk = 1. Furthermore, the vectors x2 − x1, . . . , xk − x1, x_{k+1}, . . . , xm are linearly independent.
1.2.24

Let x0, . . . , xm be vectors in ℜ^n such that x1 − x0, . . . , xm − x0 are linearly independent. The convex hull of x0, . . . , xm is called an m-dimensional simplex, and x0, . . . , xm are called the vertices of the simplex.

(a) Show that the dimension of a convex set C is the maximum of the dimensions of the simplices included in C.

(b) Use part (a) to show that a nonempty convex set has a nonempty relative interior.

1.2.25

Let X be a bounded subset of ℜ^n. Show that

cl(conv(X)) = conv(cl(X)).

In particular, if X is closed and bounded, then conv(X) is closed and bounded (cf. Prop. 1.2.8).
1.2.26

Let C1 and C2 be two nonempty convex sets such that C1 ⊂ C2.

(a) Give an example showing that ri(C1) need not be a subset of ri(C2).

(b) Assuming that the sets ri(C1) and ri(C2) have nonempty intersection, show that ri(C1) ⊂ ri(C2).

(c) Assuming that the sets C1 and ri(C2) have nonempty intersection, show that the set ri(C1) ∩ ri(C2) is nonempty.
1.2.27

Let C be a nonempty convex set.

(a) Show the following refinement of the Line Segment Principle [Prop. 1.2.9(c)]: x ∈ ri(C) if and only if for every x̄ ∈ aff(C), there exists γ > 1 such that x + (γ − 1)(x − x̄) ∈ C.

(b) Assuming that the origin lies in ri(C), show that cone(C) coincides with aff(C).

(c) Show the following extension of part (b) to a nonconvex set: If X is a nonempty set such that the origin lies in the relative interior of conv(X), then cone(X) coincides with aff(X).

1.2.28

Let C be a compact set.

(a) Assuming that C is a convex set not containing the origin in its relative boundary, show that cone(C) is closed.

(b) Give examples showing that the assertion of part (a) fails if C is unbounded or C contains the origin in its relative boundary.

(c) The convexity assumption in part (a) can be relaxed as follows: assuming that conv(C) does not contain the origin in its relative boundary, show that cone(C) is closed. Hint: Use part (a) and Exercise 1.2.7(b).
1.2.29

(a) Let C be a convex cone. Show that ri(C) is also a convex cone.

(b) Let C = cone({x1, . . . , xm}). Show that

ri(C) = { ∑_{i=1}^m αi xi | αi > 0, i = 1, . . . , m }.
1.2.30

Let A be an m × n matrix and let C be a nonempty convex set in ℜ^m. Assuming that the inverse image A^{−1}·C is nonempty, show that

ri(A^{−1}·C) = A^{−1}·ri(C),  cl(A^{−1}·C) = A^{−1}·cl(C).

[Compare these relations with those of Prop. 1.2.10(d) and (e), respectively.]
1.2.31 (Lipschitz Property of Convex Functions)

Let f : ℜ^n → ℜ be a convex function and X be a bounded set in ℜ^n. Then f has the Lipschitz property over X, i.e., there exists a positive scalar c such that

|f(x) − f(y)| ≤ c‖x − y‖,  ∀ x, y ∈ X.
1.2.32

Let C be a closed convex set and let M be an affine set such that the intersection C ∩ M is nonempty and bounded. Show that for every affine set M̄ that is parallel to M, the intersection C ∩ M̄ is bounded when nonempty.
1.2.33 (Recession Cones of Nonclosed Sets)

Let C be a nonempty convex set.

(a) Show by counterexample that part (b) of the Recession Cone Theorem need not hold when C is not closed.

(b) Show that

R_C ⊂ R_{cl(C)},  cl(R_C) ⊂ R_{cl(C)}.

Give an example where the inclusion cl(R_C) ⊂ R_{cl(C)} is strict, and another example where cl(R_C) = R_{cl(C)}. Also, give an example showing that R_{ri(C)} need not be a subset of R_C.

(c) Let C̄ be a closed convex set such that C ⊂ C̄. Show that R_C ⊂ R_{C̄}. Give an example showing that the inclusion can fail if C̄ is not closed.
1.2.34 (Recession Cones of Relative Interiors)

Let C be a nonempty convex set.

(a) Show that a vector y belongs to R_{ri(C)} if and only if there exists a vector x ∈ ri(C) such that x + αy ∈ C for every α ≥ 0.

(b) Show that R_{ri(C)} = R_{cl(C)}.

(c) Let C̄ be a relatively open convex set such that C ⊂ C̄. Show that R_C ⊂ R_{C̄}. Give an example showing that the inclusion can fail if C̄ is not relatively open. [Compare with Exercise 1.2.33(b).]

Hint: In part (a), follow the proof of part (b) of the Recession Cone Theorem. In parts (b) and (c), use the result of part (a).
1.2.35

Let C be a nonempty convex set in ℜ^n and let A be an m × n matrix.

(a) Show the following refinement of Prop. 1.2.14: if R_{cl(C)} ∩ N(A) = {0}, then

cl(A·C) = A·cl(C),  A·R_{cl(C)} = R_{A·cl(C)}.

(b) Give an example showing that A·R_{cl(C)} and R_{A·cl(C)} can differ when R_{cl(C)} ∩ N(A) ≠ {0}.

1.2.36 (Lineality Space and Recession Cone)

Let C be a nonempty convex set in ℜ^n. Define the lineality space of C, denoted by L, to be the subspace of vectors y such that simultaneously y ∈ R_C and −y ∈ R_C.

(a) Show that for every subspace S ⊂ L,

C = (C ∩ S⊥) + S.

(b) Show the following refinement of Prop. 1.2.14 and Exercise 1.2.35: if A is an m × n matrix and R_{cl(C)} ∩ N(A) is a subspace of L, then

cl(A·C) = A·cl(C),  R_{A·cl(C)} = A·R_{cl(C)}.
1.2.37

This exercise is a refinement of Prop. 1.2.15.

(a) Let C1, . . . , Cm be nonempty closed convex sets in ℜ^n such that the equality y1 + · · · + ym = 0 with yi ∈ R_{Ci} implies that each yi belongs to the lineality space of Ci. Then the vector sum C1 + · · · + Cm is a closed set and

R_{C1+···+Cm} = R_{C1} + · · · + R_{Cm}.

(b) Show the following extension of part (a) to nonclosed sets: Let C1, . . . , Cm be nonempty convex sets in ℜ^n such that the equality y1 + · · · + ym = 0 with yi ∈ R_{cl(Ci)} implies that each yi belongs to the lineality space of cl(Ci). Then we have

cl(C1 + · · · + Cm) = cl(C1) + · · · + cl(Cm),

R_{cl(C1+···+Cm)} = R_{cl(C1)} + · · · + R_{cl(Cm)}.
1.3 CONVEXITY AND OPTIMIZATION
In this section we discuss applications of convexity to some basic optimization issues, such as the existence and uniqueness of global minima. Several other applications, relating to optimality conditions and polyhedral convexity, will be discussed in subsequent sections.
1.3.1 Global and Local Minima
Let X be a nonempty subset of ℜ^n and let f : ℜ^n → (−∞, ∞] be a function. We say that a vector x* ∈ X is a minimum of f over X if f(x*) = inf_{x∈X} f(x), and we write

x* ∈ arg min_{x∈X} f(x).

We use similar terminology for maxima, i.e., a vector x* ∈ X such that f(x*) = sup_{x∈X} f(x) is said to be a maximum of f over X, and we indicate this by writing

x* ∈ arg max_{x∈X} f(x).

If the domain of f is the set X (instead of ℜ^n), we also call x* a (global) minimum or (global) maximum of f (without the qualifier "over X").
A basic question in minimization problems is whether an optimal solution exists. This question can often be resolved with the aid of the classical theorem of Weierstrass, which states that a continuous function attains a minimum over a compact set. We will provide a more general version of this theorem, and to this end, we introduce some terminology. We say that a function f : ℜ^n → (−∞, ∞] is coercive if

lim_{k→∞} f(x_k) = ∞

for every sequence {x_k} such that ‖x_k‖ → ∞ for some norm ‖ · ‖. Note that as a consequence of the definition, the level sets {x | f(x) ≤ γ} of a coercive function f are bounded whenever they are nonempty.
Proposition 1.3.1: (Weierstrass' Theorem) Let f : ℜ^n → (−∞, ∞] be a closed function. Assume that one of the following three conditions holds:

(1) dom(f) is compact.

(2) dom(f) is closed and f is coercive.

(3) There exists a scalar γ such that the set

{x | f(x) ≤ γ}

is nonempty and compact.

Then f attains a minimum over ℜ^n.
Proof: If f(x) = ∞ for all x ∈ ℜ^n, then every x ∈ ℜ^n attains the minimum of f over ℜ^n. Thus, with no loss of generality, we assume that inf_{x∈ℜ^n} f(x) < ∞. Assume condition (1). Let {x_k} ⊂ dom(f) be a sequence such that

lim_{k→∞} f(x_k) = inf_{x∈ℜ^n} f(x).

Since dom(f) is bounded, this sequence has at least one limit point x* [cf. Prop. 1.2.2(b)], so that f(x*) ≤ lim_{k→∞} f(x_k) = inf_{x∈ℜ^n} f(x), and x* is a minimum of f.

Assume condition (2). Consider a sequence {x_k} as in the proof under condition (1). Since f is coercive, {x_k} must be bounded, and the proof proceeds similar to the proof under condition (1).

Assume condition (3). If the given γ is equal to inf_{x∈ℜ^n} f(x), the set of minima of f over ℜ^n is {x | f(x) ≤ γ}, and since by assumption this set is nonempty, we are done. If inf_{x∈ℜ^n} f(x) < γ, consider a sequence {x_k} as in the proof under condition (1). Then, for all k sufficiently large, x_k must belong to the set {x | f(x) ≤ γ}. Since this set is compact, {x_k} must be bounded, and the proof proceeds similar to the proof under condition (1). Q.E.D.
The most common application of the above proposition is when we want to minimize a real-valued function f : ℜ^n → ℜ over a nonempty set X. Then, by applying Prop. 1.3.1 to the extended real-valued function

f̄(x) = { f(x) if x ∈ X,  ∞ otherwise, }   (1.9)

we see that f attains a minimum over X if X is closed and f is lower semicontinuous over X (implying that f̄ is closed), and either (1) X is bounded, or (2) f is coercive, or (3) some level set {x ∈ X | f(x) ≤ γ} is nonempty and compact.

For another example, suppose that we want to minimize a closed convex function f : ℜ^n → (−∞, ∞] over a closed convex set X. Then, by using the function f̄ of Eq. (1.9), we see that f attains a minimum over X if the set X ∩ dom(f) is compact, or if f is coercive, or if some level set {x ∈ X | f(x) ≤ γ} is nonempty and compact.

Note that with appropriate adjustments, the preceding analysis applies to the existence of maxima of f over X. For example, if a real-valued function f is upper semicontinuous at all points of a compact set X, then f attains a maximum over X.
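Condition (2) can be illustrated numerically (the function below is our own example): a coercive function traps any minimizing sequence in a bounded level set, which a simple grid scan can then cover.

```python
# f(x) = x^4 - 3x^3 is coercive: f(x) -> infinity as |x| -> infinity,
# so the level set {x : f(x) <= 0} is bounded and a minimizer exists.
def f(x):
    return x ** 4 - 3 * x ** 3

# f(x) <= 0 exactly on [0, 3], since x^4 - 3x^3 = x^3 (x - 3);
# scan a slightly larger bounded interval containing this level set.
xs = [i / 1000.0 for i in range(-1000, 4001)]   # grid on [-1, 4]
x_best = min(xs, key=f)
# The minimizer solves f'(x) = 4x^3 - 9x^2 = x^2 (4x - 9) = 0, i.e., x = 9/4.
assert abs(x_best - 2.25) < 1e-3
```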
We say that a vector x* ∈ X is a local minimum of f over X if there exists some ε > 0 such that f(x*) ≤ f(x) for every x ∈ X with ‖x − x*‖ ≤ ε.

Figure 1.3.1. Proof that a local minimum of a convex function f is also global. Assume, to arrive at a contradiction, that x* is a local minimum that is not global. Then there must exist an x ∈ X such that f(x) < f(x*). By convexity, for all α ∈ (0, 1),

f(αx + (1 − α)x*) ≤ αf(x) + (1 − α)f(x*) < f(x*).

Thus, f has strictly lower value than f(x*) at every point on the line segment connecting x* with x, except at x*. This contradicts the local minimality of x* over X.
Proposition 1.3.2: If X ⊂ ℜ^n is a convex set and f : X → (−∞, ∞] is a convex function, then a local minimum of f is also a global minimum. If in addition f is strictly convex, then there exists at most one global minimum of f.
Proof: See Fig. 1.3.1 for a proof that a local minimum of f is also global.
Let f be strictly convex, and to obtain a contradiction, assume that two
distinct global minima x and y exist. Then the midpoint (x + y)/2 must
belong to X, since X is convex. Furthermore, the value of f must be
smaller at the midpoint than at x and y by the strict convexity of f. Since
x and y are global minima, we obtain a contradiction. Q.E.D.
1.3.2 The Projection Theorem
In this section we develop a basic result of analysis and optimization.
Proposition 1.3.3: (Projection Theorem) Let C be a nonempty closed convex set and let ‖ · ‖ be the Euclidean norm.

(a) For every x ∈ ℜ^n, there exists a unique vector z ∈ C that minimizes ‖z − x‖ over all z ∈ C. This vector is called the projection of x on C, and is denoted by P_C(x), i.e.,

P_C(x) = arg min_{z∈C} ‖z − x‖.

(b) For every x ∈ ℜ^n, a vector z ∈ C is equal to P_C(x) if and only if

(y − z)′(x − z) ≤ 0,  ∀ y ∈ C.

(c) The function f : ℜ^n → C defined by f(x) = P_C(x) is continuous and nonexpansive, i.e.,

‖P_C(x) − P_C(y)‖ ≤ ‖x − y‖,  ∀ x, y ∈ ℜ^n.

(d) The distance function d : ℜ^n → ℜ, defined by

d(x, C) = min_{z∈C} ‖z − x‖,

is convex.
Proof: (a) Fix x and let w be some element of C. Minimizing ‖x − z‖ over all z ∈ C is equivalent to minimizing the same function over all z ∈ C such that ‖x − z‖ ≤ ‖x − w‖, which is a compact set. Furthermore, the function g defined by g(z) = ‖z − x‖² is continuous. Existence of a minimizing vector follows by Weierstrass' Theorem (Prop. 1.3.1).

To prove uniqueness, notice that the square of the Euclidean norm is a strictly convex function of its argument because its Hessian matrix is the identity matrix, which is positive definite [Prop. 1.2.5(b)]. Therefore, g is strictly convex and it follows that its minimum is attained at a unique point (Prop. 1.3.2).
(b) For all y and z in C we have

‖y − x‖² = ‖y − z‖² + ‖z − x‖² − 2(y − z)′(x − z) ≥ ‖z − x‖² − 2(y − z)′(x − z).

Therefore, if z is such that (y − z)′(x − z) ≤ 0 for all y ∈ C, then ‖y − x‖² ≥ ‖z − x‖² for all y ∈ C, so that z = P_C(x).

Conversely, let z = P_C(x), consider any y ∈ C, and for α ∈ [0, 1], define y_α = αy + (1 − α)z. We have

‖x − y_α‖² = ‖(1 − α)(x − z) + α(x − y)‖² = (1 − α)²‖x − z‖² + α²‖x − y‖² + 2(1 − α)α(x − z)′(x − y).

Viewing ‖x − y_α‖² as a function of α, we have

∂/∂α {‖x − y_α‖²}|_{α=0} = −2‖x − z‖² + 2(x − z)′(x − y) = −2(y − z)′(x − z).

Therefore, if (y − z)′(x − z) > 0 for some y ∈ C, then

∂/∂α {‖x − y_α‖²}|_{α=0} < 0,

and for positive but small enough α, we obtain ‖x − y_α‖ < ‖x − z‖. This contradicts the definition of z = P_C(x), so we must have (y − z)′(x − z) ≤ 0 for all y ∈ C.
(c) Let x and y be elements of ℜ^n. From part (b), we have (w − P_C(x))′(x − P_C(x)) ≤ 0 for all w ∈ C. Since P_C(y) ∈ C, we obtain

(P_C(y) − P_C(x))′(x − P_C(x)) ≤ 0.

Similarly,

(P_C(x) − P_C(y))′(y − P_C(y)) ≤ 0.

Adding these two inequalities, we obtain

(P_C(y) − P_C(x))′(x − P_C(x) − y + P_C(y)) ≤ 0.
By rearranging and by using the Schwarz inequality, we have

‖P_C(y) − P_C(x)‖² ≤ (P_C(y) − P_C(x))′(y − x) ≤ ‖P_C(y) − P_C(x)‖ · ‖y − x‖,

showing that P_C(·) is nonexpansive and a fortiori continuous.
(d) Assume, to arrive at a contradiction, that there exist x1, x2 ∈ ℜ^n and an α ∈ (0, 1) such that

d(αx1 + (1 − α)x2, C) > αd(x1, C) + (1 − α)d(x2, C).

Then there must exist z1, z2 ∈ C such that

d(αx1 + (1 − α)x2, C) > α‖z1 − x1‖ + (1 − α)‖z2 − x2‖,

which implies that

‖αz1 + (1 − α)z2 − αx1 − (1 − α)x2‖ > α‖x1 − z1‖ + (1 − α)‖x2 − z2‖.

This contradicts the triangle inequality in the definition of norm. Q.E.D.
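For a set with a closed-form projection, such as the Euclidean unit ball (a concrete choice of ours, not the text's), parts (b) and (c) of the Projection Theorem can be checked numerically:

```python
import math
import random

def project_ball(x, radius=1.0):
    # Projection on C = {z : ||z|| <= radius}: scale x back to the sphere
    # if it lies outside C, otherwise leave it unchanged.
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [radius * v / norm for v in x]

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

random.seed(4)
for _ in range(200):
    x = [random.uniform(-3, 3) for _ in range(3)]
    y = [random.uniform(-3, 3) for _ in range(3)]
    px, py = project_ball(x), project_ball(y)
    # Part (c): the projection is nonexpansive.
    assert dist(px, py) <= dist(x, y) + 1e-12
    # Part (b): (w - P_C(x))'(x - P_C(x)) <= 0 for sample points w in C.
    w = project_ball([random.uniform(-1, 1) for _ in range(3)])
    inner = sum((wv - pv) * (xv - pv) for wv, pv, xv in zip(w, px, x))
    assert inner <= 1e-9
```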
Figure 1.3.2 illustrates the necessary and sufficient condition of part (b) of the Projection Theorem.

Figure 1.3.2. Illustration of the condition satisfied by the projection P_C(x). For each vector y ∈ C, the vectors x − P_C(x) and y − P_C(x) form an angle greater than or equal to π/2 or, equivalently,

(y − P_C(x))′(x − P_C(x)) ≤ 0.
1.3.3 Directions of Recession and Existence of Optimal Solutions
The recession cone, discussed in Section 1.2.4, is also useful for analyzing
the existence of optimal solutions of convex optimization problems. A key
idea here is that a convex function can be described in terms of its epigraph,
which is a convex set. The recession cone of the epigraph can be used to
obtain the directions along which the function decreases monotonically.
This is the idea underlying the following proposition.
Proposition 1.3.4: Let f : ℜ^n → (−∞, ∞] be a closed convex function and consider the level sets L_a = {x | f(x) ≤ a}.

(a) All the level sets L_a that are nonempty have the same recession cone, given by

R_{L_a} = {y | (y, 0) ∈ R_{epi(f)}},

where R_{epi(f)} is the recession cone of the epigraph of f.

(b) If one nonempty level set L_a is compact, then all nonempty level sets are compact.
Proof: From the formula for the epigraph

epi(f) = {(x, w) | f(x) ≤ w},

it can be seen that for all a for which L_a is nonempty, we have

{(x, a) | x ∈ L_a} = epi(f) ∩ {(x, a) | x ∈ ℜ^n}.

The recession cone of the set in the left-hand side above is {(y, 0) | y ∈ R_{L_a}}. By Prop. 1.2.13(e), the recession cone of the set in the right-hand side is equal to the intersection of the recession cone of epi(f) and the recession cone of {(x, a) | x ∈ ℜ^n}, which is equal to {(y, 0) | y ∈ ℜ^n}, the horizontal subspace that passes through the origin. Thus we have

{(y, 0) | y ∈ R_{L_a}} = {(y, 0) | (y, 0) ∈ R_{epi(f)}},

from which it follows that

R_{L_a} = {y | (y, 0) ∈ R_{epi(f)}}.

This proves part (a). Part (b) follows by applying Prop. 1.2.13(c) to the recession cone of epi(f). Q.E.D.
For a closed convex function f : ℜ^n → (−∞, ∞], the (common) recession cone of the nonempty level sets of f is referred to as the recession cone of f, and is denoted by R_f (see Fig. 1.3.3). Thus

R_f = {y | f(x + αy) ≤ f(x), ∀ x ∈ ℜ^n, ∀ α ≥ 0}.

Each y ∈ R_f is called a direction of recession of f. Since f is closed, its level sets are closed, so by the Recession Cone Theorem (cf. Prop. 1.2.13), R_f is also closed.
Figure 1.3.3. Recession cone R_f of a closed convex function f. It is the (common) recession cone of the nonempty level sets of f.
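A small numerical sketch (the function below is our own example): for f(x1, x2) = x1², the level sets are vertical slabs, so R_f is the x2-axis; f never increases along the recession direction (0, 1), while along (1, 0) it eventually ascends.

```python
# f(x1, x2) = x1^2 is closed and convex; R_f = {(0, t) : t real}.
def f(x):
    return x[0] ** 2

start_points = [(-2.0, 5.0), (0.5, -1.0), (3.0, 0.0)]
alphas = [0.0, 0.5, 1.0, 10.0, 100.0]
for x in start_points:
    # Along the recession direction (0, 1), f is nonincreasing.
    vals = [f((x[0], x[1] + a)) for a in alphas]
    assert all(v <= f(x) for v in vals)
    # Along (1, 0), which is not a recession direction, f eventually ascends.
    assert f((x[0] + 100.0, x[1])) > f(x)
```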
The most intuitive way to look at directions of recession is from a descent viewpoint: if we start at any x ∈ ℜ^n and move indefinitely along a direction of recession y, we must stay within each level set that contains x, or equivalently we must encounter exclusively points z with f(z) ≤ f(x). In words, a direction of recession of f is a direction of uninterrupted nonascent for f. Conversely, if we start at some x ∈ ℜ^n and while moving along a direction y, we encounter a point z with f(z) > f(x), then y cannot be a direction of recession. It is easily seen via a convexity argument that once we cross the relative boundary of a level set of f we never cross it back again, and with a little thought (see Fig. 1.3.4 and the exercises), it follows that a direction that is not a direction of recession of f is a direction of eventual uninterrupted ascent of f. In view of these observations, it is not surprising that directions of recession play a prominent role in characterizing the existence of solutions of convex optimization problems, as shown in the following proposition.
Proposition 1.3.5: (Existence of Solutions of Convex Programs) Let X be a closed convex subset of ℜ^n, and let f : ℜ^n → (−∞, ∞] be a closed convex function such that X ∩ dom(f) ≠ ∅. The set X* of minima of f over X is nonempty and compact if and only if X and f have no common nonzero direction of recession.

Proof: Suppose that X* is nonempty and compact. Let f* = inf_{x∈X} f(x), and note that

X* = X ∩ {x | f(x) ≤ f*}.

Since by assumption X* is nonempty, f* is finite and the level set {x | f(x) ≤ f*} is nonempty and closed. By Prop. 1.2.13(e), the recession cone of X* is the intersection of the recession cones of X and {x | f(x) ≤ f*}, and since X* is compact, this intersection consists of just the zero vector [cf. Prop. 1.2.13(c)]. Therefore, there is no nonzero vector in the intersection of the recession cones of X and {x | f(x) ≤ f*}. This is equivalent to X and f having no common nonzero direction of recession.

Figure 1.3.4. Ascent/descent behavior of a closed convex function starting at some x ∈ ℜ^n and moving along a direction y. If y is a direction of recession of f, there are two possibilities: either f decreases monotonically to a finite value or to −∞ [figures (a) and (b), respectively], or f reaches a value that is less than or equal to f(x) and stays at that value [figures (c) and (d)]. If y is not a direction of recession of f, then eventually f increases monotonically to ∞ [figures (e) and (f)]; i.e., for some ᾱ ≥ 0 and all α1, α2 ≥ ᾱ with α1 < α2, we have f(x + α1y) < f(x + α2y).
Conversely, let a be a scalar such that the set

X_a = X ∩ {x | f(x) ≤ a}

is nonempty and has no nonzero direction of recession. Then X_a is closed [since X is closed and {x | f(x) ≤ a} is closed by the closedness of f], and by Prop. 1.2.13(c), X_a is compact. Minimization of f over X and minimization over X_a yield the same set of minima X*; since f attains a minimum over the compact set X_a [cf. Prop. 1.3.1], X* is nonempty, and since X* ⊂ X_a, X* is bounded. Since X* is also closed [it is the intersection of X with the closed level set {x | f(x) ≤ f*}], X* is compact. Q.E.D.

If X and f have a common nonzero direction of recession, the set of minima X* may be empty, or nonempty and unbounded [take for example, X = (−∞, 0] and f(x) = max{0, x}, in which case X* = (−∞, 0]].
Another interesting question is what happens when X and f have a
common direction of recession, call it y, but f is bounded below over X:
    f* = inf_{x∈X} f(x) > −∞.

Then for any x ∈ X, we have x + αy ∈ X for all α ≥ 0 (since y is a direction of recession of X), and f(x + αy) is monotonically nonincreasing to a finite value as α → ∞ (since y is a direction of recession of f and f* > −∞).
Generally, the minimum of f over X need not be attained. However, it
turns out that the minimum is attained in an important special case: when
f is quadratic and X is polyhedral (i.e., is defined by linear equality and inequality constraints).
To understand the main idea, consider the problem

    minimize f(x) = c′x + (1/2) x′Qx
    subject to Ax = 0,                                          (1.10)

where Q is a positive semidefinite symmetric n × n matrix, c ∈ ℜⁿ is a given vector, and A is an m × n matrix. Let N(Q) and N(A) denote the nullspaces of Q and A, respectively. There are two possibilities:
(a) For some x ∈ N(A) ∩ N(Q), we have c′x ≠ 0. Then, since Qx = 0, f(αx) = αc′x for all α, so f is unbounded below along the feasible direction x or −x.

(b) For every x ∈ N(A) ∩ N(Q), we have c′x = 0. Then for any feasible x with x ∉ N(Q), write x = x_N + x_R, where x_N ∈ N(Q) and x_R is the (nonzero) component of x along the range of Q. Hence f(αx) = αc′x + (1/2)α² x_R′Q x_R for all α > 0, with x_R′Q x_R > 0. It follows that f is bounded below along all feasible directions x ∈ N(A).
We thus conclude that for f to be bounded from below along all directions in N(A), it is necessary and sufficient that c′x = 0 for every x ∈ N(A) ∩ N(Q).

Proposition 1.3.6: (Existence of Solutions of Quadratic Programs) Consider the problem of minimizing the function

    f(x) = c′x + (1/2) x′Qx,

where Q is a positive semidefinite symmetric n × n matrix and c ∈ ℜⁿ is a given vector. Let also A be an m × n matrix and b ∈ ℜᵐ be a vector. Denote by N(A) and N(Q) the nullspaces of A and Q, respectively.
(a) Let X = {x | Ax = b} and assume that X is nonempty. The following are equivalent:

(i) f attains a minimum over X.

(ii) f* = inf_{x∈X} f(x) > −∞.

(iii) c′y = 0 for every y ∈ N(A) ∩ N(Q).

Proof: (a) Clearly (i) implies (ii). To show that (ii) implies (iii), note that for every x ∈ X and y ∈ N(A) ∩ N(Q), we have x + αy ∈ X for all α and

    c′(x + αy) + (1/2)(x + αy)′Q(x + αy) = f(x) + αc′y.

If c′y ≠ 0, then either lim_{α→∞} f(x + αy) = −∞ or lim_{α→−∞} f(x + αy) = −∞, and we must have c′y = 0 for all y ∈ N(A) ∩ N(Q).
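The equivalence above can be illustrated numerically on a small instance. In the following sketch the data Q, c, A, b are hypothetical, condition (iii) is checked by computing a basis of N(A) ∩ N(Q) via the SVD, and a minimizer is then produced from the standard KKT system of the equality-constrained quadratic program (a device used here for illustration, not part of the text):

```python
import numpy as np

# Hypothetical data: Q positive semidefinite with a nontrivial nullspace.
Q = np.diag([2.0, 0.0])          # N(Q) = span{(0, 1)}
c = np.array([1.0, 0.0])
A = np.array([[1.0, 0.0]])       # N(A) = span{(0, 1)}
b = np.array([3.0])

def nullspace(M, tol=1e-10):
    """Orthonormal basis of the nullspace of M, via SVD."""
    _, s, vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return vt[rank:].T

# Condition (iii): c'y = 0 for every y in N(A) ∩ N(Q).
# N(A) ∩ N(Q) is the nullspace of the stacked matrix [A; Q].
common = nullspace(np.vstack([A, Q]))
assert np.allclose(c @ common, 0.0)          # (iii) holds here

# Since (iii) holds, a minimum exists; one minimizer solves the KKT system
#   [Q  A'] [x]   [-c]
#   [A  0 ] [y] = [ b]
n, m = Q.shape[0], A.shape[0]
K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
rhs = np.concatenate([-c, b])
sol, *_ = np.linalg.lstsq(K, rhs, rcond=None)  # min-norm solution (Q singular)
x = sol[:n]
print(np.allclose(A @ x, b))                 # the minimizer is feasible
```

If c is changed so that c′y ≠ 0 for some y ∈ N(A) ∩ N(Q) [e.g., c = (0, 1) above], then f is unbounded below over X, in agreement with (ii) ⇔ (iii).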
We finally show that (iii) implies (i) by first using a translation argument and then using a penalty function argument. Choose any x̄ ∈ X, so that X = x̄ + N(A). Then minimizing f over X is equivalent to minimizing f(x̄ + y) over y ∈ N(A), or

    minimize h(y)
    subject to Ay = 0,

where

    h(y) = f(x̄ + y) = f(x̄) + ∇f(x̄)′y + (1/2) y′Qy.
For any integer k > 0, let

    h_k(y) = h(y) + (k/2)‖Ay‖² = f(x̄) + ∇f(x̄)′y + (1/2) y′Qy + (k/2)‖Ay‖².    (1.11)
Note that for all k

    h_k(y) ≤ h_{k+1}(y),   ∀ y ∈ ℜⁿ,

and

    inf_{y∈ℜⁿ} h_k(y) ≤ inf_{y∈ℜⁿ} h_{k+1}(y) ≤ inf_{Ay=0} h(y) ≤ h(0) = f(x̄).    (1.12)
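The monotone behavior in Eq. (1.12) is easy to observe numerically. The following sketch uses hypothetical data (g playing the role of ∇f(x̄), with the constant f(x̄) dropped) and computes the unconstrained infimum of the convex quadratic h_k in closed form from its gradient equation:

```python
import numpy as np

# Hypothetical instance of the penalized function (1.11), constant term omitted:
# h_k(y) = g'y + (1/2) y'Q y + (k/2) ||Ay||^2.
Q = np.array([[2.0, 0.0], [0.0, 1.0]])
g = np.array([1.0, -1.0])              # stands in for the gradient at x̄
A = np.array([[1.0, 1.0]])

def inf_hk(k):
    # Unconstrained minimum of the convex quadratic h_k: solve
    # (Q + k A'A) y = -g, then evaluate h_k at the minimizer.
    H = Q + k * A.T @ A
    y = np.linalg.solve(H, -g)
    return g @ y + 0.5 * y @ Q @ y + 0.5 * k * np.linalg.norm(A @ y) ** 2

vals = [inf_hk(k) for k in (1, 10, 100, 1000)]
# Nondecreasing in k, and bounded above by h(0) = 0 [cf. Eq. (1.12)].
assert all(vals[i] <= vals[i + 1] + 1e-12 for i in range(len(vals) - 1))
assert vals[-1] <= 0.0
```

For this instance the infima increase toward inf_{Ay=0} h(y) = −2/3 as k grows, consistent with the penalty argument that follows.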
Denote

    S = (N(A) ∩ N(Q))^⊥ = N(A)^⊥ + N(Q)^⊥.
Then, by using the assumption c′y = 0 for all y ∈ N(A) ∩ N(Q), it can be seen that h_k is constant along directions in N(A) ∩ N(Q), so we may select, for each k, a vector y_k ∈ S with h_k(y_k) ≤ inf_{y∈ℜⁿ} h_k(y) + 1/k. We claim that the sequence {y_k} is bounded. Indeed, if it were not, then dividing the relation h_k(y_k) ≤ f(x̄) + 1/k [cf. Eq. (1.12)] by ‖y_k‖ and passing to the limit along a subsequence with ‖y_k‖ → ∞, we would obtain

    limsup_{k→∞} [ (c + Qx̄)′ȳ_k + ‖y_k‖ ( (1/2) ȳ_k′Q ȳ_k + (k/2) ‖A ȳ_k‖² ) ] ≤ 0,

where ȳ_k = y_k/‖y_k‖. For this to be true, all limit points ȳ of the bounded sequence {ȳ_k} must be such that ȳ′Qȳ = 0 and Aȳ = 0, i.e., ȳ ∈ N(A) ∩ N(Q). On the other hand, ȳ ∈ S and ‖ȳ‖ = 1, contradicting S ∩ N(A) ∩ N(Q) = {0}. Hence {y_k} is bounded, and if ỹ is any of its limit points, we obtain from Eqs. (1.11) and (1.12)

    f(x̄) + ∇f(x̄)′ỹ + (1/2) ỹ′Qỹ + limsup_{k→∞} (k/2)‖Ay_k‖² ≤ inf_{Ay=0} h(y).
It follows that the limit point ỹ satisfies Aỹ = 0 and minimizes h(y) subject to Ay = 0. This implies that the vector x̄ + ỹ minimizes f(x) subject to Ax = b.
(b) Clearly (i) implies (ii), and similar to the proof of part (a), (ii) implies (iii). There remains to show that (iii) implies (i). For any x ∈ X, denote by J(x) the index set of active constraints at x, J(x) = {j | a_j′x = b_j}, where the a_j are the rows of A.

For any sequence {x_k} ⊂ X with f(x_k) → f*, we can extract a subsequence such that J(x_k) is constant and equal to some J. Accordingly, we select a sequence {x_k} ⊂ X such that f(x_k) → f*, J(x_k) = J for all k, and J is maximal over all such sequences. Consider the problem

    minimize f(x)
    subject to a_j′x = b_j,  j ∈ J.                              (1.15)
Assume, to come to a contradiction, that this problem does not have a solution. Then, by part (a), there exists a vector y ∈ N(Q) with a_j′y = 0 for all j ∈ J and c′y < 0. Consider the line {x_k + γy | γ > 0}. Since y ∈ N(Q), we have

    f(x_k + γy) = f(x_k) + γ c′y,

so that

    f(x_k + γy) < f(x_k),   ∀ γ > 0.

Furthermore, since a_j′y = 0 for all j ∈ J, we have

    a_j′(x_k + γy) = b_j,   ∀ j ∈ J, γ > 0.

We must also have a_j′y > 0 for at least one j ∉ J [otherwise (iii) would be violated], so the line {x_k + γy | γ > 0} crosses the relative boundary of X for some γ_k > 0. The sequence {x̄_k}, where x̄_k = x_k + γ_k y, satisfies x̄_k ∈ X, f(x̄_k) → f* [since f(x̄_k) ≤ f(x_k)], and the active index set J(x̄_k) strictly contains J for all k. This contradicts the maximality of J, and shows that problem (1.15) has an optimal solution, call it x̄.
Since x_k is a feasible solution of problem (1.15), we have

    f(x̄) ≤ f(x_k),   ∀ k,

so that

    f(x̄) ≤ f*.
We will now show that x̄ minimizes f over X, by showing that x̄ ∈ X, thereby completing the proof. Assume, to arrive at a contradiction, that x̄ ∉ X. Let x̂_k be the point in the interval connecting x_k and x̄ that belongs to X and is closest to x̄. We have that J(x̂_k) strictly contains J for all k. Since f(x̄) ≤ f(x_k) and f is convex over the interval [x_k, x̄], it follows that

    f(x̂_k) ≤ max{ f(x_k), f(x̄) } = f(x_k).

Thus f(x̂_k) → f*, which contradicts the maximality of J, since J(x̂_k) strictly contains J for all k. Hence x̄ ∈ X, and x̄ minimizes f over X. Q.E.D.
Consider the problem

    minimize f(x)
    subject to x ∈ X,  g_j(x) ≤ 0,  j = 1, . . . , r,            (1.16)

its Lagrangian function

    L(x, μ) = f(x) + Σ_{j=1}^{r} μ_j g_j(x),

involving the vector μ = (μ₁, . . . , μ_r) ∈ ℜ^r, and the dual problem

    maximize inf_{x∈X} L(x, μ)
    subject to μ ≥ 0.                                            (1.17)
The original (primal) problem (1.16) can also be written as

    minimize sup_{μ≥0} L(x, μ)
    subject to x ∈ X

[if x violates any of the constraints g_j(x) ≤ 0, we have sup_{μ≥0} L(x, μ) = ∞, and if it does not, we have sup_{μ≥0} L(x, μ) = f(x)]. Thus the primal and the dual problems (1.16) and (1.17) can be viewed in terms of a minimax problem.
We will now derive conditions guaranteeing that

    sup_{z∈Z} inf_{x∈X} φ(x, z) = inf_{x∈X} sup_{z∈Z} φ(x, z),   (1.18)

and that the inf and the sup above are attained. This is a major issue in duality theory because it connects the primal and the dual problems [cf. Eqs. (1.16) and (1.17)] through their optimal values and optimal solutions. In particular, when we discuss duality in Chapter 3, we will see that a major question is whether there is no duality gap, i.e., whether the optimal primal and dual values are equal. This is so if and only if

    sup_{μ≥0} inf_{x∈X} L(x, μ) = inf_{x∈X} sup_{μ≥0} L(x, μ).   (1.19)
We will prove in this section one major result, the Saddle Point Theorem, which guarantees the equality (1.18), as well as the attainment of the inf and sup, assuming convexity/concavity assumptions on φ and (essentially) compactness assumptions on X and Z. Unfortunately, the Saddle Point Theorem is only partially adequate for the development of duality theory, because compactness of Z and, to some extent, compactness of X turn out to be restrictive assumptions [for example, Z corresponds to the set {μ | μ ≥ 0} in Eq. (1.19), which is not compact]. We will state another major result, the Minimax Theorem, which is more relevant to duality theory and gives conditions guaranteeing the minimax equality (1.18), although it does not guarantee the attainment of the inf and sup. We will also give additional theorems of the minimax type in Chapter 3, when we discuss duality and make a closer connection with the theory of Lagrange multipliers.
A first observation regarding the potential validity of the minimax equality (1.18) is that we always have the inequality

    sup_{z∈Z} inf_{x∈X} φ(x, z) ≤ inf_{x∈X} sup_{z∈Z} φ(x, z)    (1.20)

[for every z̄ ∈ Z, write inf_{x∈X} φ(x, z̄) ≤ inf_{x∈X} sup_{z∈Z} φ(x, z) and take the supremum over z̄ ∈ Z of the left-hand side]. However, special conditions are required to guarantee equality.
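The minimax inequality (1.20) can be verified mechanically for any finite approximation of X and Z. The sketch below samples a hypothetical bilinear function φ on finite grids, so that "inf" and "sup" become min and max over the rows and columns of a matrix:

```python
import numpy as np

# Weak duality (1.20) on finite grids: phi[i, j] plays the role of
# φ(x_i, z_j) for a hypothetical random payoff matrix.
rng = np.random.default_rng(0)
phi = rng.standard_normal((5, 4))

maxmin = phi.min(axis=0).max()   # sup_z inf_x φ(x, z)
minmax = phi.max(axis=1).min()   # inf_x sup_z φ(x, z)
assert maxmin <= minmax          # Eq. (1.20), which holds for ANY matrix
```

No convexity is needed for the inequality itself; convexity/concavity conditions matter only for turning it into an equality.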
For example, in a two-player zero-sum game where the players choose mixed strategies x and z, the expected payoff is Σ_{i,j} a_ij x_i z_j, or x′Az. The main result, a special case of the existence result we will prove shortly, is that these two optimal values are equal, implying that there is an amount that can be meaningfully viewed as the value of the game for its participants.
Suppose that x* is an optimal solution of the problem

    minimize sup_{z∈Z} φ(x, z)
    subject to x ∈ X,                                            (1.21)

and z* is an optimal solution of the problem

    maximize inf_{x∈X} φ(x, z)
    subject to z ∈ Z.                                            (1.22)

Then we have

    sup_{z∈Z} inf_{x∈X} φ(x, z) = inf_{x∈X} φ(x, z*) ≤ φ(x*, z*) ≤ sup_{z∈Z} φ(x*, z) = inf_{x∈X} sup_{z∈Z} φ(x, z).    (1.23)

If the minimax equality [cf. Eq. (1.18)] holds, then equality holds throughout above, so that

    sup_{z∈Z} φ(x*, z) = φ(x*, z*) = inf_{x∈X} φ(x, z*),         (1.24)

or equivalently

    φ(x*, z) ≤ φ(x*, z*) ≤ φ(x, z*),   ∀ x ∈ X, z ∈ Z.           (1.25)
A pair of vectors x* ∈ X and z* ∈ Z satisfying Eq. (1.25) is called a saddle point of φ. Conversely, if (x*, z*) is a saddle point, then the definition [cf. Eq. (1.24)] implies that

    inf_{x∈X} sup_{z∈Z} φ(x, z) ≤ sup_{z∈Z} inf_{x∈X} φ(x, z).

This, together with the minimax inequality (1.20), guarantees that the minimax equality (1.18) holds and, from Eq. (1.23), that x* and z* are optimal solutions of problems (1.21) and (1.22), respectively.
We summarize the above discussion in the following proposition.

Proposition 1.3.7: A pair (x*, z*) is a saddle point of φ if and only if the minimax equality (1.18) holds, and x* and z* are optimal solutions of problems (1.21) and (1.22), respectively. In other words, the set of saddle points of φ is the Cartesian product X* × Z*, where X* and Z* are the sets of optimal solutions of problems (1.21) and (1.22), respectively, provided the minimax equality (1.18) holds and X* and Z* are nonempty.
One can visualize saddle points in terms of the sets of minimizing points over X for fixed z ∈ Z and maximizing points over Z for fixed x ∈ X:

    X̂(z) = {x̂ | x̂ minimizes φ(x, z) over X},
Figure 1.3.5. Illustration of a saddle point of a function φ(x, z) over x ∈ X and z ∈ Z; the function plotted here is φ(x, z) = (1/2)(x² + 2xz − z²). Let

    x̂(z) = arg min_{x∈X} φ(x, z),   ẑ(x) = arg max_{z∈Z} φ(x, z).

In the case illustrated, x̂(z) and ẑ(x) consist of unique minimizing and maximizing points, respectively, so we view x̂(z) and ẑ(x) as (single-valued) functions; otherwise x̂(z) and ẑ(x) should be viewed as set-valued mappings. We consider the corresponding curves φ(x̂(z), z) and φ(x, ẑ(x)). By definition, a pair (x*, z*) is a saddle point if and only if

    max_{z∈Z} φ(x*, z) = φ(x*, z*) = min_{x∈X} φ(x, z*),

or equivalently, if (x*, z*) lies on both curves [x* = x̂(z*) and z* = ẑ(x*)]. At such a pair, we also have

    max_{z∈Z} φ(x̂(z), z) = max_{z∈Z} min_{x∈X} φ(x, z) = φ(x*, z*) = min_{x∈X} max_{z∈Z} φ(x, z) = min_{x∈X} φ(x, ẑ(x)),

so that

    φ(x̂(z), z) ≤ φ(x*, z*) ≤ φ(x, ẑ(x)),   ∀ x ∈ X, z ∈ Z

(see Prop. 1.3.7). Visually, the curve of maxima φ(x, ẑ(x)) must lie above the curve of minima φ(x̂(z), z) (completely, i.e., for all x ∈ X and z ∈ Z), but the two curves should meet at (x*, z*).
    Ẑ(x) = {ẑ | ẑ maximizes φ(x, z) over Z}.

The definition implies that the pair (x*, z*) is a saddle point if and only if x* ∈ X̂(z*) and z* ∈ Ẑ(x*); see Fig. 1.3.5.
We now consider conditions that guarantee the existence of a saddle
point. We have the following classical result.
Proposition 1.3.8: (Saddle Point Theorem) Let X and Z be closed, convex subsets of ℜⁿ and ℜᵐ, respectively, and let φ : X × Z → ℜ be a function. Assume that for each z ∈ Z, the function φ(·, z) : X → ℜ is convex and lower semicontinuous, and for each x ∈ X, the function φ(x, ·) : Z → ℜ is concave and upper semicontinuous. Then there exists a saddle point of φ under any one of the following four conditions:

(1) X and Z are compact.

(2) Z is compact and there exists a vector z̄ ∈ Z such that φ(·, z̄) is coercive.

(3) X is compact and there exists a vector x̄ ∈ X such that −φ(x̄, ·) is coercive.

(4) There exist vectors x̄ ∈ X and z̄ ∈ Z such that φ(·, z̄) and −φ(x̄, ·) are coercive.
Proof: We first prove the result under the assumption that X and Z are compact [condition (1)], and the additional assumption that φ(x, ·) is strictly concave for each x ∈ X.

By Weierstrass' Theorem (Prop. 1.3.1), the function

    f(x) = max_{z∈Z} φ(x, z),   x ∈ X,

is real-valued and the maximum above is attained for each x ∈ X at a point denoted ẑ(x), which is unique by the strict concavity assumption. Furthermore, by Prop. 1.2.3(c), f is lower semicontinuous, so again by Weierstrass' Theorem, f attains a minimum over x ∈ X at some point x*. Let z* = ẑ(x*), so that

    φ(x*, z) ≤ φ(x*, z*) = f(x*),   ∀ z ∈ Z.                     (1.26)

We will show that (x*, z*) is a saddle point of φ, i.e., in view of Eq. (1.26), that φ(x*, z*) ≤ φ(x, z*) for all x ∈ X.
Choose any x ∈ X and let

    x_k = (1/k) x + (1 − 1/k) x*,   z_k = ẑ(x_k),   k = 1, 2, . . . .    (1.27)
Let z̄ be any limit point of {z_k} corresponding to a subsequence K of positive integers. Using the convexity of φ(·, z_k), we have

    f(x*) ≤ f(x_k) = φ(x_k, z_k) ≤ (1/k) φ(x, z_k) + (1 − 1/k) φ(x*, z_k).    (1.28)
Taking the limit as k → ∞, k ∈ K, and using the upper semicontinuity of φ(x*, ·), we obtain

    f(x*) ≤ limsup_{k→∞, k∈K} φ(x*, z_k) ≤ φ(x*, z̄) ≤ max_{z∈Z} φ(x*, z) = f(x*).

Hence equality holds throughout above, so that z̄ maximizes φ(x*, ·) over Z. Since ẑ(x*) is the unique maximizer of φ(x*, ·) over Z, we see that z̄ = ẑ(x*) = z*, so that z_k → z*. Since z* = ẑ(x*) maximizes φ(x*, ·) over Z, we have

    φ(x*, z_k) ≤ φ(x*, z*) = f(x*),   k = 1, 2, . . . .
Combining this last relation with Eq. (1.28), we have

    φ(x*, z*) ≤ (1/k) φ(x, z_k) + (1 − 1/k) φ(x*, z*),   ∀ x ∈ X,

or

    φ(x*, z*) ≤ φ(x, z_k),   ∀ x ∈ X.

Taking the limit as z_k → z*, and using the upper semicontinuity of φ(x, ·), we obtain

    φ(x*, z*) ≤ φ(x, z*),   ∀ x ∈ X.
Combining this relation with Eq. (1.26), we see that (x*, z*) is a saddle point of φ.
Next, we remove the assumption of strict concavity of φ(x, ·). We introduce the functions φ_k : X × Z → ℜ given by

    φ_k(x, z) = φ(x, z) − (1/k)‖z‖²,   k = 1, 2, . . . .
Since φ_k(x, ·) is strictly concave for each x ∈ X, by what has already been proved, there exists for each k a saddle point (x_k, z_k) of φ_k, satisfying

    φ(x_k, z) − (1/k)‖z‖² ≤ φ(x_k, z_k) − (1/k)‖z_k‖² ≤ φ(x, z_k) − (1/k)‖z_k‖²,   ∀ x ∈ X, z ∈ Z.
Let (x*, z*) be a limit point of {(x_k, z_k)}. By taking the limit as k → ∞ and by using the semicontinuity assumptions on φ, it follows that

    φ(x*, z) ≤ liminf_{k→∞} φ(x_k, z) ≤ limsup_{k→∞} φ(x, z_k) ≤ φ(x, z*),   ∀ x ∈ X, z ∈ Z.    (1.29)
By setting alternately x = x* and z = z* in Eq. (1.29), we obtain

    φ(x*, z) ≤ φ(x*, z*) ≤ φ(x, z*),   ∀ x ∈ X, z ∈ Z,

so (x*, z*) is a saddle point of φ.
We now prove the existence of a saddle point under conditions (2), (3), or (4). For each integer k, we introduce the convex and compact sets

    X_k = {x ∈ X | ‖x‖ ≤ k},   Z_k = {z ∈ Z | ‖z‖ ≤ k}.

Choose k̄ large enough so that

    k̄ ≥ max_{z∈Z} ‖z‖   if condition (2) holds,
    k̄ ≥ max_{x∈X} ‖x‖   if condition (3) holds,
    k̄ ≥ max{‖x̄‖, ‖z̄‖}   if condition (4) holds,

and so that X_k̄ is nonempty under condition (2), while Z_k̄ is nonempty under condition (3). Note that for k ≥ k̄, the sets X_k and Z_k are nonempty. Furthermore, Z = Z_k under condition (2) and X = X_k under condition (3).
Using the result already shown under condition (1), for each k ≥ k̄ there exists a saddle point of φ over X_k × Z_k, i.e., a pair (x_k, z_k) such that

    φ(x_k, z) ≤ φ(x_k, z_k) ≤ φ(x, z_k),   ∀ x ∈ X_k, z ∈ Z_k.    (1.30)
Assume that condition (2) holds. Then, since Z is compact, {z_k} is bounded. If {x_k} were unbounded, the coercivity of φ(·, z̄) would imply that φ(x_k, z̄) → ∞, and from Eq. (1.30) it would follow that φ(x, z_k) → ∞ for all x ∈ X. By the upper semicontinuity of φ(x, ·), this contradicts the boundedness of {z_k}. Hence {x_k} must be bounded, and {(x_k, z_k)} must have a limit point (x*, z*). Taking the limit in Eq. (1.30) along the corresponding subsequence, and using the semicontinuity assumptions on φ, we obtain

    φ(x*, z) ≤ liminf_{k→∞} φ(x_k, z) ≤ limsup_{k→∞} φ(x, z_k) ≤ φ(x, z*),   ∀ x ∈ X, z ∈ Z.    (1.31)
By alternately setting x = x* and z = z* in Eq. (1.31), we see that (x*, z*) is a saddle point of φ.
A symmetric argument, with the obvious modifications, shows the result under condition (3). Finally, under condition (4), note that Eq. (1.30) yields for all k ≥ k̄,

    φ(x_k, z̄) ≤ φ(x_k, z_k) ≤ φ(x̄, z_k).

If {x_k} were unbounded, the coercivity of φ(·, z̄) would imply that φ(x_k, z̄) → ∞ and hence φ(x̄, z_k) → ∞, which together with the upper semicontinuity of φ(x̄, ·) violates the coercivity of −φ(x̄, ·). Hence {x_k} must be bounded, and a symmetric argument shows that {z_k} must be bounded. Thus {(x_k, z_k)} must have a limit point (x*, z*), and the proof is completed as under condition (2). Q.E.D.

The following example shows that the minimax equality may fail when the compactness/coercivity conditions of the theorem are violated.

Example 1.3.1

Let X = {x ∈ ℜ² | x ≥ 0}, Z = {z ∈ ℜ | z ≥ 0}, and

    φ(x, z) = e^{−√(x₁x₂)} + z x₁,

which satisfies the convexity/concavity and lower/upper semicontinuity assumptions of Prop. 1.3.8. For all z ≥ 0, we have
    inf_{x≥0} { e^{−√(x₁x₂)} + z x₁ } = 0,

since the expression in braces is nonnegative for x ≥ 0 and can approach zero by taking x₁ → 0 and x₁x₂ → ∞. Hence

    sup_{z≥0} inf_{x≥0} φ(x, z) = 0.

We also have for all x ≥ 0

    sup_{z≥0} { e^{−√(x₁x₂)} + z x₁ } = 1 if x₁ = 0,  ∞ if x₁ > 0.
Hence

    inf_{x≥0} sup_{z≥0} φ(x, z) = 1,

so inf_{x≥0} sup_{z≥0} φ(x, z) > sup_{z≥0} inf_{x≥0} φ(x, z). The difficulty here is that the compactness/coercivity assumptions of Prop. 1.3.8 are violated.
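The gap in this example is easy to exhibit numerically. The sketch below evaluates φ along the sequence x = (1/t, t³), which drives x₁ → 0 and x₁x₂ → ∞, confirming that inf_x φ approaches 0 for fixed z, while sup_z φ(x, ·) = 1 when x₁ = 0:

```python
import numpy as np

# φ(x, z) = exp(-sqrt(x1*x2)) + z*x1 over x >= 0, z >= 0 (Example 1.3.1).
def phi(x1, x2, z):
    return np.exp(-np.sqrt(x1 * x2)) + z * x1

# For fixed z, take x1 = 1/t, x2 = t^3, so x1*x2 = t^2 -> infinity:
z = 5.0
vals = [phi(1.0 / t, t**3, z) for t in (10.0, 100.0, 1000.0)]
assert vals[2] < vals[1] < vals[0]     # decreasing toward the inf, which is 0
assert vals[2] < 1e-2

# For x1 = 0 the z-term vanishes and φ = 1, independently of z:
assert phi(0.0, 1.0, 123.0) == 1.0
# So inf_x sup_z φ = 1 > 0 = sup_z inf_x φ: a duality gap.
```

The infimum over x is approached but never attained, which is precisely the behavior ruled out by the compactness/coercivity conditions.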
On the other hand, with convexity/concavity and lower/upper semicontinuity of φ, together with one additional assumption, given in the following proposition, it is possible to guarantee the minimax equality (1.18) without guaranteeing the existence of a saddle point. The proof of the proposition requires the machinery of hyperplanes, and will be given at the end of the next section.
Proposition 1.3.9: (Minimax Theorem) Let X and Z be nonempty convex subsets of ℜⁿ and ℜᵐ, respectively, and let φ : X × Z → ℜ be a function. Assume that for each z ∈ Z, the function φ(·, z) : X → ℜ is convex and lower semicontinuous, and for each x ∈ X, the function φ(x, ·) : Z → ℜ is concave and upper semicontinuous. Let the function p : ℜᵐ → [−∞, ∞] be defined by

    p(u) = inf_{x∈X} sup_{z∈Z} { φ(x, z) − u′z }.                (1.32)

Assume that

    −∞ < p(0) < ∞,

and that p(0) ≤ liminf_{k→∞} p(u_k) for every sequence {u_k} with u_k → 0. Then the minimax equality holds, i.e.,

    sup_{z∈Z} inf_{x∈X} φ(x, z) = inf_{x∈X} sup_{z∈Z} φ(x, z).
Within the duality context discussed in the beginning of this subsection, the minimax equality implies that there is no duality gap. The above theorem indicates that, aside from convexity and semicontinuity assumptions, the properties of the function p around u = 0 are critical. As an illustration, note that in Example 1.3.1, p is given by

    p(u) = inf_{x≥0} sup_{z≥0} { e^{−√(x₁x₂)} + z(x₁ − u) }
         = ∞ if u < 0,  1 if u = 0,  0 if u > 0.

Thus it can be seen that even though p(0) is finite, p is not lower semicontinuous at 0. As a result the assumptions of Prop. 1.3.9 are violated and the minimax equality does not hold.

The significance of the function p within the context of convex programming and duality will be elaborated on and clarified further in Chapter 4. Furthermore, the proof of the above theorem, given in Section 1.4.2, will indicate some common circumstances where p has the desired properties around u = 0.
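The three-branch formula for p above can be traced case by case. The helper below is a hypothetical evaluator that mirrors that case analysis (it is not a general-purpose computation of (1.32)); the large parameter t_max stands in for letting x₂ → ∞:

```python
import numpy as np

# Case analysis for p(u) = inf_{x>=0} sup_{z>=0} {exp(-sqrt(x1 x2)) + z(x1 - u)}
# from Example 1.3.1.  The sup over z is finite only when x1 <= u.
def p(u, t_max=1e6):
    if u < 0:
        # Every x1 >= 0 has x1 - u > 0, so the sup over z is +infinity.
        return np.inf
    if u == 0:
        # Finiteness forces x1 = 0, and then phi = exp(0) = 1.
        return 1.0
    # u > 0: take x1 = u and x2 large, so exp(-sqrt(u * x2)) -> 0.
    return np.exp(-np.sqrt(u * t_max))

assert p(-1.0) == np.inf
assert p(0.0) == 1.0
assert p(0.5) < 1e-6
# Not lower semicontinuous at 0: along u_k -> 0 with u_k > 0,
# liminf p(u_k) = 0 < 1 = p(0), violating the assumption of Prop. 1.3.9.
```

This is exactly the failure of lower semicontinuity of p at 0 noted in the text.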
E X E R C I S E S
1.3.1

Let f : ℜⁿ → ℜ be a convex function, let X be a closed convex set, and assume that f and X have no common direction of recession. Let X* be the optimal solution set (nonempty and compact by Prop. 1.3.5) and let f* = inf_{x∈X} f(x). Show that:

(a) For every ε > 0 there exists a δ > 0 such that every vector x ∈ X with f(x) ≤ f* + δ satisfies min_{x*∈X*} ‖x − x*‖ ≤ ε.

(b) Every sequence {x_k} ⊂ X satisfying f(x_k) → f* is bounded and all its limit points belong to X*.
1.3.2
Let C be a convex set and S be a subspace. Show that projection on S is a
linear transformation and use this to show that the projection of the set C on S
is a convex set, which is compact if C is compact. Is the projection of C always
closed if C is closed?
1.3.3 (Existence of Solution of Quadratic Programs)
This exercise deals with an extension of Prop. 1.3.6 to the case where the quadratic
cost may not be convex. Consider a problem of the form
    minimize c′x + (1/2) x′Qx
    subject to Ax ≤ b,

where Q is a symmetric (not necessarily positive semidefinite) matrix, c and b are given vectors, and A is a matrix. Show that the following are equivalent:
(a) There exists at least one optimal solution.
(b) The cost function is bounded below over the constraint set.
(c) The problem has at least one feasible solution, and for any feasible x, there is no y ∈ ℜⁿ such that Ay ≤ 0 and either y′Qy < 0, or y′Qy = 0 and (c + Qx)′y < 0.
1.3.4

Let C ⊂ ℜⁿ be a nonempty convex set, and let f : ℜⁿ → ℜ be an affine function such that inf_{x∈C} f(x) is attained at some x* ∈ ri(C). Show that f(x) = f(x*) for all x ∈ C.
1.3.5 (Existence of Optimal Solutions)

Let f : ℜⁿ → ℜ be a convex function, and consider the problem of minimizing f over a closed and convex set X. Suppose that f attains a minimum along all half lines of the form {x + αy | α ≥ 0}, where x ∈ X and y is in the recession cone of X. Show that we may still have inf_{x∈X} f(x) = −∞. Hint: Use the case n = 2, X = ℜ², f(x) = min_{z∈C} ‖z − x‖² − x₁, where C = {(x₁, x₂) | x₁² ≤ x₂}.
1.3.6 (Saddle Points in Two Dimensions)

Consider a function φ of two real variables x and z taking values in compact intervals X and Z, respectively. Assume that for each z ∈ Z, the function φ(·, z) is minimized over X at a unique point denoted x̂(z). Similarly, assume that for each x ∈ X, the function φ(x, ·) is maximized over Z at a unique point denoted ẑ(x). Assume further that the functions x̂(z) and ẑ(x) are continuous over Z and X, respectively. Show that φ has a unique saddle point (x*, z*). Use this to investigate the existence of saddle points of φ(x, z) = x² + z² over X = [0, 1] and Z = [0, 1].
1.4 HYPERPLANES

Some of the most important principles in convexity and optimization, including duality, revolve around the use of hyperplanes, i.e., (n − 1)-dimensional affine sets. A hyperplane has the property that it divides the space into two halfspaces. We will see shortly that a distinguishing feature of a closed convex set is that it is the intersection of all the halfspaces that contain it. Thus any closed convex set can be described in dual fashion: (a) as the union of all points contained in the set, and (b) as the intersection of all halfspaces containing the set. This fundamental principle carries over to a closed convex function, once the function is specified in terms of its (closed and convex) epigraph. In this section, we develop the principle just outlined, and we apply it to a construction that is central in duality theory. We then use this construction to prove the minimax theorem given at the end of the preceding section.
1.4.1 Support and Separation Theorems

A hyperplane is a set of the form {x | a′x = b}, where a ∈ ℜⁿ, a ≠ 0, and b ∈ ℜ, as illustrated in Fig. 1.4.1. An equivalent definition is that a hyperplane in ℜⁿ is an affine set of dimension n − 1. The vector a is called the normal vector of the hyperplane (it is orthogonal to the difference x − y of any two vectors x and y that belong to the hyperplane). The two sets

    {x | a′x ≥ b},   {x | a′x ≤ b},

are called the halfspaces associated with the hyperplane (also referred to as the positive and negative halfspaces, respectively). We have the following result, which is also illustrated in Fig. 1.4.1. The proof is based on the Projection Theorem and is illustrated in Fig. 1.4.2.
Figure 1.4.1. (a) A hyperplane {x | a′x = b} divides the space into the positive halfspace {x | a′x ≥ b} and the negative halfspace {x | a′x ≤ b}. (b) A hyperplane supporting a convex set C at a point x̄. (c) A hyperplane separating two sets C₁ and C₂.

Proposition 1.4.1: (Supporting Hyperplane Theorem) If C is a nonempty convex subset of ℜⁿ and x̄ does not belong to the interior of C, there exists a hyperplane that passes through x̄ and contains C in one of its halfspaces, i.e., a vector a ≠ 0 such that

    a′x̄ ≤ a′x,   ∀ x ∈ C.                                       (1.33)
Proof: Consider the closure cl(C) of C, which is a convex set by Prop. 1.2.1(d). Let {x_k} be a sequence of vectors not belonging to cl(C), which converges to x̄; such a sequence exists because x̄ does not belong to the interior of C. If x̂_k is the projection of x_k on cl(C), we have by part (b) of the Projection Theorem (Prop. 1.3.3)

    (x̂_k − x_k)′(x − x̂_k) ≥ 0,   ∀ x ∈ cl(C).

Hence we obtain for all x ∈ cl(C) and k,

    (x̂_k − x_k)′x ≥ (x̂_k − x_k)′x̂_k = (x̂_k − x_k)′(x̂_k − x_k) + (x̂_k − x_k)′x_k ≥ (x̂_k − x_k)′x_k.

We can write this inequality as

    a_k′x ≥ a_k′x_k,   ∀ x ∈ cl(C), k = 0, 1, . . . ,            (1.34)

where

    a_k = (x̂_k − x_k)/‖x̂_k − x_k‖.

We have ‖a_k‖ = 1 for all k, and hence the sequence {a_k} has a subsequence that converges to a nonzero limit a. By considering Eq. (1.34) for all a_k belonging to this subsequence and by taking the limit as k → ∞, we obtain Eq. (1.33). Q.E.D.
Proposition 1.4.2: (Separating Hyperplane Theorem) If C₁ and C₂ are two nonempty, disjoint, and convex subsets of ℜⁿ, there exists a hyperplane that separates them, i.e., a vector a ≠ 0 such that

    a′x₁ ≤ a′x₂,   ∀ x₁ ∈ C₁, x₂ ∈ C₂.                           (1.35)

Proof: Consider the convex set

    C = {x | x = x₂ − x₁, x₁ ∈ C₁, x₂ ∈ C₂}.

Since C₁ and C₂ are disjoint, the origin does not belong to C, so by the Supporting Hyperplane Theorem there exists a vector a ≠ 0 such that

    0 ≤ a′x,   ∀ x ∈ C,

which is equivalent to Eq. (1.35). Q.E.D.
Proposition 1.4.3: (Strict Separation Theorem) If C₁ and C₂ are two nonempty, disjoint, and convex sets such that C₁ is closed and C₂ is compact, there exists a hyperplane that strictly separates them, i.e., a vector a ≠ 0 and a scalar b such that

    a′x₁ < b < a′x₂,   ∀ x₁ ∈ C₁, x₂ ∈ C₂.                       (1.36)

Proof: Consider the problem of minimizing ‖x₂ − x₁‖ over all x₁ ∈ C₁ and x₂ ∈ C₂. Since C₁ is closed and C₂ is compact, by Weierstrass' Theorem the minimum is attained at some x̄₁ ∈ C₁ and x̄₂ ∈ C₂. Let

    a = (x̄₂ − x̄₁)/2,   x̄ = (x̄₁ + x̄₂)/2,   b = a′x̄.
Then a ≠ 0, since x̄₁ ∈ C₁, x̄₂ ∈ C₂, and C₁ and C₂ are disjoint. The hyperplane

    {x | a′x = b}

contains x̄, and it can be seen from the preceding discussion that x̄₁ is the projection of x̄ on C₁, and x̄₂ is the projection of x̄ on C₂ (see Fig. 1.4.3). By Prop. 1.3.3(b), we have

    (x̄ − x̄₁)′(x₁ − x̄₁) ≤ 0,   ∀ x₁ ∈ C₁,

or equivalently, since x̄ − x̄₁ = a,

    a′x₁ ≤ a′x̄₁ = a′x̄ + a′(x̄₁ − x̄) = b − ‖a‖² < b,   ∀ x₁ ∈ C₁.

Thus, the left-hand side of Eq. (1.36) is proved. The right-hand side is proved similarly. Q.E.D.
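The construction in the proof can be carried out concretely. The sketch below uses two hypothetical disjoint sets (unit balls centered at (−2, 0) and (2, 0)), for which the minimum-distance pair is known in closed form, and then checks the strict separation a′x₁ < b < a′x₂ on random points of each set:

```python
import numpy as np

# Two hypothetical disjoint convex sets: unit balls centered at c1 and c2.
c1, c2 = np.array([-2.0, 0.0]), np.array([2.0, 0.0])
d = (c2 - c1) / np.linalg.norm(c2 - c1)
x1_bar, x2_bar = c1 + d, c2 - d          # the minimum-distance pair

# The proof's construction: normal a, midpoint x̄, level b.
a = (x2_bar - x1_bar) / 2.0
x_bar = (x1_bar + x2_bar) / 2.0
b = a @ x_bar

# Strict separation a'x1 < b < a'x2 on random points of each ball.
rng = np.random.default_rng(1)
for _ in range(100):
    u = rng.standard_normal(2)
    u *= rng.random() / np.linalg.norm(u)    # random point of the unit ball
    assert a @ (c1 + u) < b < a @ (c2 + u)
```

Here a = (1, 0) and b = 0, so the separating hyperplane is the vertical axis, midway between the two closest points.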
Figure 1.4.3. (a) Illustration of the construction of a strictly separating hyperplane of two disjoint closed convex sets C₁ and C₂, one of which is also bounded (cf. Prop. 1.4.3). (b) An example showing that if neither of the two sets is compact, there may not exist a strictly separating hyperplane; here

    C₁ = {(ξ₁, ξ₂) | ξ₁ ≤ 0},   C₂ = {(ξ₁, ξ₂) | ξ₁ > 0, ξ₂ > 0, ξ₁ξ₂ ≥ 1},

and C = {x₁ − x₂ | x₁ ∈ C₁, x₂ ∈ C₂} = {(ξ₁, ξ₂) | ξ₁ < 0}. This is due to the fact that the distance d(x₂, C₁) can approach 0 arbitrarily closely as x₂ ranges over C₂, but can never reach 0.
The preceding proposition may be used to provide a fundamental
characterization of closed convex sets.
Proposition 1.4.4: The closure of the convex hull of a set C '
n
is the intersection of the halfspaces that contain C. In particular, a
closed and convex set is the intersection of the halfspaces that contain
it.
Proof: Assume first that C is closed and convex. Then C is contained in the intersection of the halfspaces that contain C, so we focus on proving the reverse inclusion. Let x ∉ C. Applying the Strict Separation Theorem (Prop. 1.4.3) to the sets C and {x}, we see that there exists a halfspace containing C but not containing x. Hence, if x ∉ C, then x cannot belong to the intersection of the halfspaces containing C, proving that C contains that intersection. Thus the result is proved for the case where C is closed and convex.

Consider the case of a general set C. Each halfspace H that contains C must also contain conv(C) (since H is convex), and also cl(conv(C)) (since H is closed). Hence the intersection of all halfspaces containing C and the intersection of all halfspaces containing cl(conv(C)) coincide. From what has been proved for the case of a closed convex set, the latter intersection is equal to cl(conv(C)). Q.E.D.
1.4.2 Nonvertical Hyperplanes and the Minimax Theorem

In the context of optimization theory, supporting hyperplanes are often used in conjunction with epigraphs of functions f on ℜⁿ. Since epi(f) = {(x, w) | w ≥ f(x)} is a subset of ℜⁿ⁺¹, the normal vector of a hyperplane is a nonzero vector of the form (μ, β), where μ ∈ ℜⁿ and β ∈ ℜ. We say that the hyperplane is horizontal if μ = 0 and we say that it is vertical if β = 0.

Note that f is bounded below if and only if epi(f) is contained in a halfspace corresponding to a horizontal hyperplane. Note also that if a hyperplane with normal (μ, β) is nonvertical (i.e., β ≠ 0), then it crosses the (n+1)st axis (the axis associated with w) at a unique point. If (x̄, w̄) is any vector on the hyperplane, the crossing point has the form (0, ξ), where

    ξ = (μ/β)′x̄ + w̄,                                            (1.37)

since from the hyperplane equation, we have (0, ξ)′(μ, β) = (x̄, w̄)′(μ, β). On the other hand, it can be seen that if the hyperplane is vertical, it either contains the entire (n+1)st axis, or else it does not cross it at all; see Fig. 1.4.4.

Vertical lines in ℜⁿ⁺¹ are sets of the form {(x̄, w) | w ∈ ℜ}, where x̄ is a fixed vector in ℜⁿ. It can be seen that vertical hyperplanes, as well as the corresponding halfspaces, consist of the union of the vertical lines that pass through their points. If f(x) > −∞ for all x, then epi(f) cannot contain a vertical line, and it appears plausible that epi(f) is contained in some halfspace corresponding to a nonvertical hyperplane. We prove this fact in greater generality in the following proposition, which will also be useful as a first step in the subsequent development.
Figure 1.4.4. Illustration of vertical and nonvertical hyperplanes in ℜⁿ⁺¹. A nonvertical hyperplane, with normal (μ, β), β ≠ 0, crosses the (n+1)st axis at the level (μ/β)′x̄ + w̄, where (x̄, w̄) is any vector on the hyperplane; a vertical hyperplane has normal of the form (μ, 0).
Proposition 1.4.5: Let C be a nonempty convex subset of ℜⁿ⁺¹, and let the points of ℜⁿ⁺¹ be denoted by (u, w), where u ∈ ℜⁿ and w ∈ ℜ.

(a) If C contains no vertical lines, then C is contained in a halfspace corresponding to a nonvertical hyperplane; that is, there exist μ ∈ ℜⁿ, β ∈ ℜ with β ≠ 0, and γ ∈ ℜ such that

    μ′u + βw ≥ γ,   ∀ (u, w) ∈ C.

(b) If C contains no vertical lines and (ū, w̄) does not belong to the closure of C, there exists a nonvertical hyperplane strictly separating (ū, w̄) from C.
Proof: We first note that if C contains no vertical lines, then ri(C) contains no vertical lines, which implies that cl(C) contains no vertical lines, since the recession cones of cl(C) and ri(C) coincide, as shown in the exercises to Section 1.2. Thus if we prove the result assuming that C is closed, the proof for the case where C is not closed will readily follow by replacing C with cl(C). Hence we assume without loss of generality that C is closed.

(a) By Prop. 1.4.4, C is the intersection of all halfspaces that contain it. If every hyperplane containing C in one of its halfspaces were vertical, we would have

    C = ∩_{i∈I} {(u, w) | μ_i′u ≥ γ_i}

for a collection of nonzero vectors μ_i, i ∈ I, and scalars γ_i, i ∈ I. Then for every (ū, w̄) ∈ C, the vertical line {(ū, w) | w ∈ ℜ} would also belong to C. It follows that if no vertical line belongs to C, there exists a nonvertical hyperplane containing C in one of its halfspaces.
(b) Since (ū, w̄) does not belong to C, there exists a hyperplane strictly separating (ū, w̄) from C (cf. Prop. 1.4.3). If this hyperplane is nonvertical, we are done, so assume otherwise. Then we have a nonzero vector μ̄ and a scalar γ̄ such that

    μ̄′u > γ̄ > μ̄′ū,   ∀ (u, w) ∈ C.

Consider a nonvertical hyperplane containing C in one of its halfspaces, so that for some (μ, β) and γ, with β ≠ 0, we have

    μ′u + βw ≥ γ,   ∀ (u, w) ∈ C.

By multiplying this relation with any ε > 0 and adding it to the preceding relation, we obtain

    (μ̄ + εμ)′u + εβw > γ̄ + εγ,   ∀ (u, w) ∈ C.

Since γ̄ > μ̄′ū, for all sufficiently small ε > 0 we also have

    γ̄ + εγ > (μ̄ + εμ)′ū + εβw̄.

From the above two relations, we obtain

    (μ̄ + εμ)′u + εβw > (μ̄ + εμ)′ū + εβw̄,   ∀ (u, w) ∈ C,

implying that there is a nonvertical hyperplane with normal (μ̄ + εμ, εβ) that strictly separates (ū, w̄) from C. Q.E.D.
Min Common/Max Crossing Duality

Hyperplanes allow insightful visualization of duality concepts. This is particularly so in a construction involving two simple optimization problems, which we now discuss. These problems will prove particularly relevant in the context of duality, and will also form the basis for the proof of the Minimax Theorem, stated at the end of the preceding section.

Let S be a subset of ℜⁿ⁺¹ and consider the following two problems.

(a) Min Common Point Problem: Among all points that are common to both S and the (n+1)st axis, we want to find one whose (n+1)st component is minimum.

(b) Max Crossing Point Problem: Consider nonvertical hyperplanes that contain S in their corresponding upper halfspace, i.e., the halfspace that contains the vertical halfline {(0, w) | w ≥ 0} in its recession cone
Figure 1.4.5. Illustration of the optimal values w* and q* of the min common and max crossing problems. In (a), the two optimal values are not equal. In (b), when S is extended upwards along the (n+1)st axis it yields the set

    S̄ = {(u, w) | there exists w̄ with w̄ ≤ w and (u, w̄) ∈ S},

which is closed and convex. As a result, the two optimal values are equal. In (c), the set S is convex but not closed, and there are points (0, w) on the vertical axis with q* < w < w*.
(see Fig. 1.4.5). We want to find the maximum crossing point of the (n+1)st axis with such a hyperplane.

Figure 1.4.5 suggests that the optimal value of the max crossing point problem is no larger than the optimal value of the min common point problem, and that under favorable circumstances the two optimal values are equal.

We now formalize the analysis of the two above problems and provide conditions that guarantee equality of their optimal values. The min common point problem is
    minimize w
    subject to (0, w) ∈ S,                                       (1.38)

and its optimal value is denoted by w*, i.e.,

    w* = inf_{(0,w)∈S} w.
Given a nonvertical hyperplane in ℜⁿ⁺¹, multiplication of its normal vector (μ, β), β ≠ 0, by a nonzero scalar maintains the normality property. Hence, the set of nonvertical hyperplanes can be equivalently described as the set of all hyperplanes with normals of the form (μ, 1). A hyperplane of this type crosses the (n+1)st axis at

    ξ = w̄ + μ′ū,

where (ū, w̄) is any vector that lies on the hyperplane [cf. Eq. (1.37)]. In order for the upper halfspace corresponding to the hyperplane to contain S, we must have

    ξ = w̄ + μ′ū ≤ w + μ′u,   ∀ (u, w) ∈ S.

Combining the two above relations, we see that the max crossing point problem can be expressed as

    maximize ξ
    subject to ξ ≤ w + μ′u,  ∀ (u, w) ∈ S,  μ ∈ ℜⁿ,

or equivalently,

    maximize inf_{(u,w)∈S} {w + μ′u}
    subject to μ ∈ ℜⁿ.

Thus, the max crossing point problem is given by

    maximize q(μ)
    subject to μ ∈ ℜⁿ,                                           (1.39)

where

    q(μ) = inf_{(u,w)∈S} {w + μ′u}.                              (1.40)
We denote by q* the optimal value of the max crossing point problem, i.e.,

   q* = sup_{μ∈ℝⁿ} q(μ).

Note that for every (u, w) ∈ S and every μ ∈ ℝⁿ, we have

   q(μ) = inf_{(u,w)∈S} {w + μ'u} ≤ inf_{(0,w)∈S} w = w*,

so by taking the supremum of the left-hand side over μ ∈ ℝⁿ, we obtain

   q* ≤ w*,        (1.41)
i.e., the max crossing point is no higher than the min common point, as suggested by Fig. 1.4.5. Note also that if −∞ < w*, the vector (0, w*) is seen to be a closure point of the set S, so if in addition S is closed and convex, and admits a nonvertical supporting hyperplane at (0, w*), then we have q* = w*. The following proposition gives conditions under which this equality holds.

Proposition 1.4.6: (Min Common/Max Crossing Theorem) Consider the min common and max crossing point problems, and assume the following:

(1) −∞ < w* < ∞.

(2) The set

   S̄ = {(u, w) | there exists w̄ with w̄ ≤ w and (u, w̄) ∈ S}

is convex.

(3) For every sequence {(u_k, w_k)} ⊂ S with u_k → 0, we have

   w* ≤ liminf_{k→∞} w_k.

Then we have q* = w*.

Proof: We first note that (0, w*) is a closure point of S̄, since by the definition of w*, there is a sequence {(0, w_k)} ⊂ S̄ with w_k → w*.
We next show by contradiction that S̄ does not contain any vertical lines. If this were not so, since S̄ is convex, the direction (0, −1) would be a direction of recession of cl(S̄) and hence also a direction of recession of ri(S̄) (see the exercises in Section 1.2). Since (0, w*) is a closure point of S̄, there exists a sequence {(u_k, w_k)} ⊂ ri(S̄) converging to (0, w*), and the sequence {(u_k, w_k − γ)} then belongs to ri(S̄) for each fixed γ > 0. In view of the definition of S̄, for any γ > 0, there is a sequence {(u_k, w̄_k)} ⊂ S with w̄_k ≤ w_k − γ for all k, so that liminf_{k→∞} w̄_k ≤ w* − γ < w*. Since u_k → 0, this contradicts assumption (3).
As shown above, S̄ does not contain any vertical lines. Furthermore, using condition (3) and the fact −∞ < w*, it can be seen that for any ε > 0, the vector (0, w* − ε) does not belong to the closure of S̄. It follows, by using Prop. 1.4.5(b), that there exists a nonvertical hyperplane separating (0, w* − ε) and S̄. The crossing point of this hyperplane lies between w* − ε and w*, so w* − ε ≤ q* for the optimal value q* of the max crossing point problem. Since ε can be arbitrarily small, it follows that q* ≥ w*, which combined with Eq. (1.41) yields q* = w*. Q.E.D.

We now return to the minimax framework of Prop. 1.3.9. Assuming that the conditions of that proposition hold, we will show that

   sup_{z∈Z} inf_{x∈X} φ(x, z) = inf_{x∈X} sup_{z∈Z} φ(x, z).

We first prove the following lemma.
Lemma 1.4.1: Let X and Z be nonempty convex subsets of ℝⁿ and ℝᵐ, respectively, and let φ : X × Z → ℝ be a function. Assume that for each z ∈ Z, the function φ(·, z) : X → ℝ is convex, and consider the function p defined by

   p(u) = inf_{x∈X} sup_{z∈Z} {φ(x, z) − u'z},   u ∈ ℝᵐ.

Then the set

   P = {u | p(u) < ∞}

is convex and p satisfies

   p(αu_1 + (1−α)u_2) ≤ α p(u_1) + (1−α) p(u_2)        (1.42)

for all α ∈ [0, 1] and all u_1, u_2 ∈ P.
Proof: Let u_1 ∈ P and u_2 ∈ P. Rewriting p(u) as

   p(u) = inf_{x∈X} l(x, u),

where l(x, u) = sup_{z∈Z} {φ(x, z) − u'z}, we have that there exist sequences {x_k^1} ⊂ X and {x_k^2} ⊂ X such that

   l(x_k^1, u_1) → p(u_1),   l(x_k^2, u_2) → p(u_2).

By convexity of X, we have αx_k^1 + (1−α)x_k^2 ∈ X for all α ∈ [0, 1]. Using also the convexity of φ(·, z) for each z ∈ Z, we have
   p(αu_1 + (1−α)u_2)
      ≤ l(αx_k^1 + (1−α)x_k^2, αu_1 + (1−α)u_2)
      = sup_{z∈Z} {φ(αx_k^1 + (1−α)x_k^2, z) − (αu_1 + (1−α)u_2)'z}
      ≤ sup_{z∈Z} {αφ(x_k^1, z) + (1−α)φ(x_k^2, z) − (αu_1 + (1−α)u_2)'z}
      ≤ α sup_{z∈Z} {φ(x_k^1, z) − u_1'z} + (1−α) sup_{z∈Z} {φ(x_k^2, z) − u_2'z}
      = α l(x_k^1, u_1) + (1−α) l(x_k^2, u_2).
Since

   α l(x_k^1, u_1) + (1−α) l(x_k^2, u_2) → α p(u_1) + (1−α) p(u_2),

it follows that

   p(αu_1 + (1−α)u_2) ≤ α p(u_1) + (1−α) p(u_2).
Furthermore, since u_1 and u_2 belong to P, the left-hand side of the above relation is less than ∞, so αu_1 + (1−α)u_2 ∈ P for all α ∈ [0, 1], implying that P is convex. Q.E.D.
Note that Lemma 1.4.1 allows the possibility that the function p not only takes finite values and the value ∞, but also the value −∞. However, under the additional assumption of Prop. 1.3.9 that p(0) ≤ liminf_{k→∞} p(u_k) for all sequences {u_k} with u_k → 0, we must have p(u) > −∞ for all u ∈ ℝᵐ. The reason is that if p(u) = −∞ for some u ∈ ℝᵐ, then from Eq. (1.42), we must have p(αu) = −∞ for all α ∈ (0, 1], contradicting the assumption that p(0) ≤ liminf_{k→∞} p(u_k) for all sequences {u_k} with u_k → 0. Thus, under the assumptions of Prop. 1.3.9, p maps ℝᵐ into (−∞, ∞], and in view of Lemma 1.4.1, p is an extended-real-valued convex function. Furthermore, p is lower semicontinuous at u = 0, in view again of the assumption that p(0) ≤ liminf_{k→∞} p(u_k) for all sequences {u_k} with u_k → 0.
We now prove Prop. 1.3.9 by reducing it to an application of the Min Common/Max Crossing Theorem (cf. Prop. 1.4.6) of the preceding section. In particular, we show that with an appropriate selection of the set S, the assumptions of Prop. 1.3.9 are essentially equivalent to the corresponding assumptions of the Min Common/Max Crossing Theorem. Furthermore, the optimal values of the corresponding min common and max crossing point problems are

   q* = sup_{z∈Z} inf_{x∈X} φ(x, z),   w* = inf_{x∈X} sup_{z∈Z} φ(x, z).
We choose the set S (as well as the set S̄) in that theorem to be the epigraph of p, i.e.,

   S = S̄ = {(u, w) | u ∈ P, p(u) ≤ w},

which is convex in view of the convexity of the function p shown above. Thus assumption (2) of the Min Common/Max Crossing Theorem is satisfied. We clearly have

   w* = p(0),

so the assumption −∞ < p(0) < ∞ of Prop. 1.3.9 is equivalent to assumption (1) of the Min Common/Max Crossing Theorem. Furthermore, the assumption that p(0) ≤ liminf_{k→∞} p(u_k) for all sequences {u_k} with u_k → 0 is equivalent to the last remaining assumption (3) of the Min Common/Max Crossing Theorem. Thus the conclusion q* = w* of the theorem holds. From the definition of p, it follows that

   w* = p(0) = inf_{x∈X} sup_{z∈Z} φ(x, z).
Thus, to complete the proof of Prop. 1.3.9, all that remains is to show that for the max crossing point problem corresponding to the epigraph S of p, we have

   q* = sup_{μ∈ℝᵐ} inf_{(u,w)∈S} {w + μ'u} = sup_{z∈Z} inf_{x∈X} φ(x, z).
To prove this, let us write for every μ ∈ ℝᵐ,

   q(μ) = inf_{(u,w)∈S} {w + μ'u}
        = inf_{(u,w) | u∈P, p(u)≤w} {w + μ'u}
        = inf_{u∈P} {p(u) + μ'u}.

From the definition of p, we have for every μ ∈ ℝᵐ,

   q(μ) = inf_{u∈P} {p(u) + μ'u}
        = inf_{u∈P} inf_{x∈X} sup_{z∈Z} {φ(x, z) + u'(μ − z)}
        = inf_{x∈X} inf_{u∈P} sup_{z∈Z} {φ(x, z) + u'(μ − z)}.
We now show that q(μ) is given by

   q(μ) = { inf_{x∈X} φ(x, μ)   if μ ∈ Z,
          { −∞                  if μ ∉ Z.        (1.43)
Indeed, if μ ∈ Z, then

   sup_{z∈Z} {φ(x, z) + u'(μ − z)} ≥ φ(x, μ),   ∀ x ∈ X, u ∈ P,

so

   inf_{u∈P} sup_{z∈Z} {φ(x, z) + u'(μ − z)} ≥ φ(x, μ),   ∀ x ∈ X.        (1.44)
To show that we have equality in Eq. (1.44), we define the function r_x : ℝᵐ → (−∞, ∞] by

   r_x(z) = { −φ(x, z)   if z ∈ Z,
            { ∞          otherwise,

which is closed and convex for all x ∈ X by the given concavity/upper semicontinuity assumptions, so the epigraph of r_x, epi(r_x), is a closed and convex set. Since μ ∈ Z, the point (μ, r_x(μ)) belongs to epi(r_x). For some ε > 0, we consider the point (μ, r_x(μ) − ε), which does not belong to epi(r_x). Since r_x(z) > −∞ for all z, the set epi(r_x) does not contain any vertical lines. Therefore, by Prop. 1.4.5(b), for all x ∈ X, there exists a
nonvertical hyperplane with normal (u, β), β ≠ 0, and a scalar c such that

   u'μ + β(r_x(μ) − ε) < c < u'z + βw,   ∀ (z, w) ∈ epi(r_x).

Since w can be made arbitrarily large, we have β > 0, and we can take β = 1. In particular, for w = r_x(z) with z ∈ Z, we have

   u'μ + r_x(μ) − ε < c < u'z + r_x(z),   ∀ z ∈ Z,

or equivalently, since r_x(z) = −φ(x, z),

   φ(x, z) + u'(μ − z) < φ(x, μ) + ε,   ∀ z ∈ Z.
Letting ε → 0, we obtain

   inf_{u∈ℝᵐ} sup_{z∈Z} {φ(x, z) + u'(μ − z)} ≤ sup_{z∈Z} {φ(x, z) + u'(μ − z)} ≤ φ(x, μ),   ∀ x ∈ X,

which together with Eq. (1.44) implies that

   inf_{u∈ℝᵐ} sup_{z∈Z} {φ(x, z) + u'(μ − z)} = φ(x, μ),   ∀ x ∈ X.
Therefore, if μ ∈ Z, q(μ) has the form of Eq. (1.43).
If μ ∉ Z, we consider a sequence {w_k} that goes to −∞. Since μ ∉ Z, the sequence of points (μ, w_k) does not belong to epi(r_x). Therefore, similar to the argument above, there exists a sequence of nonvertical hyperplanes with normals (u_k, 1) such that

   φ(x, z) + u_k'(μ − z) < w_k,   ∀ z ∈ Z, ∀ k,

so that

   inf_{u∈ℝᵐ} sup_{z∈Z} {φ(x, z) + u'(μ − z)} ≤ sup_{z∈Z} {φ(x, z) + u_k'(μ − z)} ≤ w_k,   ∀ k.
Taking the limit as k → ∞ in the above relation, we obtain

   inf_{u∈ℝᵐ} sup_{z∈Z} {φ(x, z) + u'(μ − z)} = −∞,   ∀ x ∈ X,

which proves that q(μ) has the form given in Eq. (1.43), and it follows that

   q* = sup_{z∈Z} inf_{x∈X} φ(x, z).  Q.E.D.
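As an illustration of the minimax equality just established, the following sketch (a hypothetical convex-concave example of my own, not from the text) evaluates both sides on a grid for φ(x, z) = x² − z² + xz with X = Z = [−1, 1]; the common value is 0, attained at the saddle point x = z = 0:

```python
import numpy as np

# phi(x, z) = x^2 - z^2 + x*z: convex in x for each z, concave in z for
# each x, so the assumptions of Prop. 1.3.9 hold on the compact sets below.
xs = np.linspace(-1.0, 1.0, 401)
zs = np.linspace(-1.0, 1.0, 401)
X, Z = np.meshgrid(xs, zs, indexing="ij")
phi = X**2 - Z**2 + X * Z

minimax = phi.max(axis=1).min()   # inf_x sup_z phi  (the value w*)
maximin = phi.min(axis=0).max()   # sup_z inf_x phi  (the value q*)

assert maximin <= minimax + 1e-12   # weak duality always holds
assert abs(minimax - maximin) < 1e-6  # equality holds for this phi
```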
E X E R C I S E S
1.4.1 (Strong Separation)

Let C_1 and C_2 be two nonempty, convex subsets of ℝⁿ, and let B denote the unit ball in ℝⁿ. A hyperplane H is said to separate strongly C_1 and C_2 if there exists an ε > 0 such that C_1 + εB is contained in one of the open halfspaces associated with H and C_2 + εB is contained in the other. Show that:

(a) The following three conditions are equivalent:

   (i) There exists a hyperplane strongly separating C_1 and C_2.

   (ii) There exists a vector b ∈ ℝⁿ such that inf_{x∈C_1} b'x > sup_{x∈C_2} b'x.

   (iii) inf_{x_1∈C_1, x_2∈C_2} ‖x_1 − x_2‖ > 0.

(b) If C_1 and C_2 are closed and have no common directions of recession, there exists a hyperplane strongly separating C_1 and C_2.

(c) If the two sets C_1 and C_2 have disjoint closures, and at least one of the two is bounded, there exists a hyperplane strongly separating them.
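Condition (ii) of part (a) is easy to verify numerically. The sketch below (a hypothetical instance of my own: two disjoint Euclidean balls, so condition (iii) holds) uses b = c_1 − c_2 together with the closed-form support values of a ball, then confirms the strict gap on sampled points:

```python
import numpy as np

# Two disjoint balls: C1 = B(c1, r1), C2 = B(c2, r2), ||c1-c2|| > r1 + r2.
c1, r1 = np.array([3.0, 0.0]), 1.0
c2, r2 = np.array([-2.0, 0.0]), 1.5

# For a ball B(c, r): inf_{x} b'x = b'c - r||b||, sup_{x} b'x = b'c + r||b||.
b = c1 - c2
inf_c1 = b @ c1 - r1 * np.linalg.norm(b)
sup_c2 = b @ c2 + r2 * np.linalg.norm(b)
assert inf_c1 > sup_c2          # condition (ii): strict gap

# Empirical check on random points of each ball.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(500, 2))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
radii = rng.uniform(0.0, 1.0, size=(500, 1))
p1 = c1 + r1 * radii * dirs
p2 = c2 + r2 * radii * dirs
assert (p1 @ b).min() >= inf_c1 - 1e-9
assert (p2 @ b).max() <= sup_c2 + 1e-9
```

Any hyperplane with normal b crossing strictly between the two support values then strongly separates the sets.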
1.4.2 (Proper Separation)

Let C_1 and C_2 be two nonempty, convex subsets of ℝⁿ. A hyperplane H is said to separate properly C_1 and C_2 if it separates them and C_1 and C_2 are not both contained in H. Show that the following three conditions are equivalent:

(i) There exists a hyperplane properly separating C_1 and C_2.

(ii) There exists a vector b ∈ ℝⁿ such that

   inf_{x∈C_1} b'x ≥ sup_{x∈C_2} b'x,   sup_{x∈C_1} b'x > inf_{x∈C_2} b'x.

(iii) The relative interiors ri(C_1) and ri(C_2) have no point in common.
1.5 CONICAL APPROXIMATIONS AND CONSTRAINED OPTIMIZATION

Optimality conditions for constrained optimization problems revolve around the behavior of the cost function at points of the constraint set near a candidate for optimality x*. Cones of directions play a central role in this analysis. Given a nonempty set C ⊂ ℝⁿ, the cone

   C* = {y | y'x ≤ 0, ∀ x ∈ C}

is called the polar cone of C (see Fig. 1.5.1). Clearly, the polar cone C*, being the intersection of a collection of closed halfspaces, is closed and convex (regardless of whether C is closed and/or convex). If C is a subspace, it can be seen that the polar cone C* is equal to the orthogonal complement C⊥. The following Polar Cone Theorem generalizes the equality C = (C⊥)⊥, which holds in the case where C is a subspace.
Figure 1.5.1. Illustration of the polar cone C* of a set C ⊂ ℝ². In (a), C consists of just two points, a_1 and a_2, while in (b) C is the convex cone {x | x = λ_1 a_1 + λ_2 a_2, λ_1 ≥ 0, λ_2 ≥ 0}. The polar cone C* is the same in both cases.

Proposition 1.5.1: (Polar Cone Theorem)

(a) For any nonempty set C, we have

   C* = (cl(C))* = (conv(C))* = (cone(C))*.

(b) For any cone C, we have

   (C*)* = cl(conv(C)).

In particular, if C is closed and convex, then (C*)* = C.
Proof: (a) Clearly, we have C* ⊇ (cl(C))*. Conversely, if y ∈ C*, then y'x_k ≤ 0 for all k and all sequences {x_k} ⊂ C, so that y'x ≤ 0 for all limits x of such sequences. Hence y ∈ (cl(C))* and C* ⊆ (cl(C))*.

Similarly, we have C* ⊇ (conv(C))*. Conversely, if y ∈ C*, then y'x ≤ 0 for all x ∈ C, so that y'x ≤ 0 for all x that are convex combinations of vectors in C. Hence y ∈ (conv(C))* and C* ⊆ (conv(C))*. A nearly identical argument also shows that C* = (cone(C))*.
(b) Figure 1.5.2 shows that if C is closed and convex, then (C*)* = C. From this it follows that

   ((cl(conv(C)))*)* = cl(conv(C)),

and by using part (a) in the left-hand side above, we obtain (C*)* = cl(conv(C)). Q.E.D.
Figure 1.5.2. Proof of the Polar Cone Theorem for the case where C is a closed and convex cone. If x ∈ C, then for all y ∈ C*, we have x'y ≤ 0, which implies that x ∈ (C*)*. Hence, C ⊆ (C*)*. To prove the reverse inclusion, take z ∈ (C*)*, and let ẑ be the unique projection of z on C, as shown in the figure. Since C is closed, the projection exists by the Projection Theorem (Prop. 1.3.3), which also implies that

   (z − ẑ)'(x − ẑ) ≤ 0,   ∀ x ∈ C.

By taking in the preceding relation x = 0 and x = 2ẑ (which belong to C since C is a closed cone), it is seen that

   (z − ẑ)'ẑ = 0.

Combining the last two relations, we obtain (z − ẑ)'x ≤ 0 for all x ∈ C, so that (z − ẑ) ∈ C*. Since z ∈ (C*)*, it follows that (z − ẑ)'z ≤ 0, which combined with (z − ẑ)'ẑ = 0 yields ‖z − ẑ‖² ≤ 0. Therefore, z = ẑ and z ∈ C, implying that (C*)* ⊆ C.
The analysis of a constrained optimization problem often centers on how the cost function behaves along directions leading away from a local minimum to some neighboring feasible points. The sets of the relevant directions constitute cones that can be viewed as approximations to the constraint set, locally near a point of interest. Let us introduce two such cones, which are important in connection with optimality conditions.
Definition 1.5.1: Given a subset X of ℝⁿ and a vector x ∈ X, a feasible direction of X at x is a vector y ∈ ℝⁿ such that there exists an ᾱ > 0 with x + αy ∈ X for all α ∈ [0, ᾱ]. The set of all feasible directions of X at x is a cone denoted by F_X(x).
It can be seen that if X is convex, the feasible directions at x are the vectors of the form α(x̄ − x) with α > 0 and x̄ ∈ X. However, when X is nonconvex, the cone of feasible directions need not provide interesting information about the local structure of the set X near the point x. For example, often there is no nonzero feasible direction at x when X is nonconvex [think of the set X = {x | h(x) = 0}, where h : ℝⁿ → ℝ is a nonlinear function]. The next definition introduces a cone that provides information on the local structure of X even when there is no nonzero feasible direction.
Definition 1.5.2: Given a subset X of ℝⁿ and a vector x ∈ X, a vector y is said to be a tangent of X at x if either y = 0 or there exists a sequence {x_k} ⊂ X such that x_k ≠ x for all k and

   x_k → x,   (x_k − x)/‖x_k − x‖ → y/‖y‖.

The set of all tangents of X at x is a cone called the tangent cone of X at x, and is denoted by T_X(x).
Figure 1.5.3. Illustration of a tangent y at a vector x ∈ X. There is a sequence {x_k} ⊂ X that converges to x and is such that the normalized direction sequence (x_k − x)/‖x_k − x‖ converges to y/‖y‖, the normalized direction of y, or equivalently, the sequence

   y_k = ‖y‖(x_k − x)/‖x_k − x‖

illustrated in the figure converges to y.
Thus a nonzero vector y is a tangent at x if it is possible to approach x with a feasible sequence {x_k} such that the normalized direction sequence (x_k − x)/‖x_k − x‖ converges to y/‖y‖, the normalized direction of y (see Fig. 1.5.3). The following proposition provides an equivalent definition of a tangent, which is occasionally more convenient.
Figure 1.5.4. Examples of the cones F_X(x) and T_X(x) of a set X at the vector x = (0, 1). In (a), we have

   X = {(x_1, x_2) | (x_1 + 1)² − x_2 ≤ 0, (x_1 − 1)² − x_2 ≤ 0}.

Here X is convex and the tangent cone T_X(x) is equal to the closure of the cone of feasible directions F_X(x) (which is an open set in this example). Note, however, that the vectors (1, 2) and (−1, 2) (as well as the origin) belong to T_X(x) and also to the closure of F_X(x), but are not feasible directions. In (b), we have

   X = {(x_1, x_2) | ((x_1 + 1)² − x_2)((x_1 − 1)² − x_2) = 0}.

Here the set X is nonconvex, and T_X(x) is closed but not convex. Furthermore, F_X(x) consists of just the zero vector.
Proposition 1.5.2: A vector y is a tangent of a set X at a vector x ∈ X if and only if there exists a sequence {x_k} ⊂ X with x_k → x, and a positive sequence {α_k} such that α_k → 0 and (x_k − x)/α_k → y.
Proof: If {x_k} is the sequence in the definition of a tangent, take α_k = ‖x_k − x‖/‖y‖. Conversely, if α_k → 0, (x_k − x)/α_k → y, and y ≠ 0, clearly x_k → x and

   (x_k − x)/‖x_k − x‖ = ((x_k − x)/α_k) / ‖(x_k − x)/α_k‖ → y/‖y‖,

so {x_k} satisfies the definition of a tangent. Q.E.D.
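Proposition 1.5.2 is easy to illustrate numerically. In the sketch below (a hypothetical example of my own), X is the unit circle, a nonconvex set for which F_X(x) = {0} at x = (1, 0), yet y = (0, 1) is a tangent: with x_k = (cos(1/k), sin(1/k)) and α_k = 1/k, the ratio (x_k − x)/α_k approaches y:

```python
import math

x = (1.0, 0.0)   # base point on the unit circle X = {x | ||x|| = 1}
y = (0.0, 1.0)   # candidate tangent at x

def ratio(k):
    # x_k = (cos t, sin t) with t = 1/k, alpha_k = t; return (x_k - x)/alpha_k
    t = 1.0 / k
    xk = (math.cos(t), math.sin(t))
    return ((xk[0] - x[0]) / t, (xk[1] - x[1]) / t)

r = ratio(10**6)
err = math.hypot(r[0] - y[0], r[1] - y[1])
assert err < 1e-5   # the difference quotients converge to y
```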
Figure 1.5.4 illustrates the cones T_X(x) and F_X(x) with examples. The following proposition gives some of the properties of the cones F_X(x) and T_X(x).
Proposition 1.5.3: Let X be a nonempty subset of ℝⁿ and let x be a vector in X. The following hold regarding the cone of feasible directions F_X(x) and the tangent cone T_X(x).

(a) T_X(x) is a closed cone.

(b) cl(F_X(x)) ⊆ T_X(x).

(c) If X is convex, then F_X(x) and T_X(x) are convex, and we have cl(F_X(x)) = T_X(x).
Proof: (a) Let {y_k} be a sequence in T_X(x) that converges to y. We will show that y ∈ T_X(x). If y = 0, then y ∈ T_X(x), so assume that y ≠ 0. By the definition of a tangent, for every y_k there is a sequence {x_k^i} ⊂ X with x_k^i ≠ x such that

   lim_{i→∞} x_k^i = x,   lim_{i→∞} (x_k^i − x)/‖x_k^i − x‖ = y_k/‖y_k‖.

For k = 1, 2, ..., choose an index i_k such that i_1 < i_2 < ⋯ < i_k and

   lim_{k→∞} ‖x_k^{i_k} − x‖ = 0,   lim_{k→∞} ‖(x_k^{i_k} − x)/‖x_k^{i_k} − x‖ − y_k/‖y_k‖‖ = 0.
We also have for all k

   ‖(x_k^{i_k} − x)/‖x_k^{i_k} − x‖ − y/‖y‖‖
      ≤ ‖(x_k^{i_k} − x)/‖x_k^{i_k} − x‖ − y_k/‖y_k‖‖ + ‖y_k/‖y_k‖ − y/‖y‖‖,

so the fact y_k → y and the preceding two relations imply that

   lim_{k→∞} ‖x_k^{i_k} − x‖ = 0,   lim_{k→∞} ‖(x_k^{i_k} − x)/‖x_k^{i_k} − x‖ − y/‖y‖‖ = 0.

Hence y ∈ T_X(x), and T_X(x) is closed.
(b) Clearly every feasible direction is also a tangent, so F_X(x) ⊆ T_X(x). Since by part (a) T_X(x) is closed, the result follows.
(c) If X is convex, the feasible directions at x are the vectors of the form α(x̄ − x) with α > 0 and x̄ ∈ X. From this it can be seen that F_X(x) is convex. Convexity of T_X(x) will follow from the convexity of F_X(x) once we show that cl(F_X(x)) = T_X(x).

In view of part (b), it will suffice to show that T_X(x) ⊆ cl(F_X(x)). Let y ∈ T_X(x) and, using Prop. 1.5.2, let {x_k} be a sequence in X and {α_k} be a positive sequence such that α_k → 0 and (x_k − x)/α_k → y. Since X is a convex set, the direction (x_k − x)/α_k is feasible at x for all k. Hence y ∈ cl(F_X(x)), and it follows that T_X(x) ⊆ cl(F_X(x)). Q.E.D.
The tangent cone finds an important application in the following basic necessary condition for local optimality:

Proposition 1.5.4: Let f : ℝⁿ → ℝ be a smooth function, and let x* be a local minimum of f over a subset X of ℝⁿ. Then

   ∇f(x*)'y ≥ 0,   ∀ y ∈ T_X(x*).

If X is convex, this condition can be equivalently written as

   ∇f(x*)'(x − x*) ≥ 0,   ∀ x ∈ X,

and in the case where X = ℝⁿ, reduces to ∇f(x*) = 0.
Proof: Let y be a nonzero tangent of X at x*. Then there exist a sequence {ξ_k} with ξ_k → 0 and a sequence {x_k} ⊂ X such that x_k ≠ x* for all k, x_k → x*, and

   (x_k − x*)/‖x_k − x*‖ = y/‖y‖ + ξ_k.
By the Mean Value Theorem, we have for all k

   f(x_k) = f(x*) + ∇f(x̃_k)'(x_k − x*),

where x̃_k is a vector that lies on the line segment joining x_k and x*. Combining the last two equations, we obtain

   f(x_k) = f(x*) + (‖x_k − x*‖/‖y‖) ∇f(x̃_k)'y_k,        (1.45)

where

   y_k = y + ‖y‖ξ_k.
If ∇f(x*)'y < 0, then since x̃_k → x* and y_k → y, it follows that for all sufficiently large k, ∇f(x̃_k)'y_k < 0 and [from Eq. (1.45)] f(x_k) < f(x*). This contradicts the local optimality of x*.
When X is convex, we have cl(F_X(x)) = T_X(x) (cf. Prop. 1.5.3). Thus the condition shown can be written as

   ∇f(x*)'y ≥ 0,   ∀ y ∈ cl(F_X(x*)),

which in turn is equivalent to

   ∇f(x*)'(x − x*) ≥ 0,   ∀ x ∈ X.

If X = ℝⁿ, by setting x = x* + e_i and x = x* − e_i, where e_i is the ith unit vector (all components are 0 except for the ith, which is 1), we obtain ∂f(x*)/∂x_i = 0 for all i = 1, ..., n, so ∇f(x*) = 0. Q.E.D.
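For a convex instance, the condition of Prop. 1.5.4 can be verified directly. The sketch below (my own hypothetical example) minimizes f(x) = ‖x − c‖² over the box X = [0, 1]², where the minimizer x* is the projection of c onto the box, and checks ∇f(x*)'(x − x*) ≥ 0 on a grid of X:

```python
import numpy as np

# f(x) = ||x - c||^2 over X = [0,1]^2; the minimum is the projection of c.
c = np.array([2.0, 0.5])
x_star = np.clip(c, 0.0, 1.0)     # projection of c onto the box: (1, 0.5)
grad = 2.0 * (x_star - c)         # gradient of f at x*

# Verify grad f(x*)'(x - x*) >= 0 for grid points x of the box X
xs = np.linspace(0.0, 1.0, 21)
pts = np.array([[u, v] for u in xs for v in xs])
vals = (pts - x_star) @ grad
assert np.all(vals >= -1e-12)     # the necessary condition holds
```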
A direction y for which ∇f(x*)'y < 0 is called a descent direction of f at x*; for such a direction we have

   f(x* + αy) = f(x*) + α∇f(x*)'y + o(α) < f(x*)

for sufficiently small but positive α. Thus Prop. 1.5.4 says that if x* is a local minimum, there is no descent direction within the tangent cone T_X(x*).
Note that the necessary condition of Prop. 1.5.4 can equivalently be written as

   −∇f(x*) ∈ T_X(x*)*

(see Fig. 1.5.5). There is an interesting converse of this result, namely that given any vector z ∈ T_X(x*)*, there exists a smooth function f such that −∇f(x*) = z and x* is a local minimum of f over X.

The polar cone T_X(x)* is closely related to the normal cone of X at a vector x ∈ X, denoted N_X(x): we have z ∈ N_X(x) if there exist sequences {x_k} ⊂ X and {z_k} with x_k → x, z_k → z, and z_k ∈ T_X(x_k)* for all k, so the graph of N_X(·) is equal to the closure of the graph of T_X(·)*:

   {(x, z) | x ∈ X, z ∈ N_X(x)} = cl({(x, z) | x ∈ X, z ∈ T_X(x)*})

if X is closed. It can be seen that N_X(x) is a closed cone containing T_X(x)*, but it need not be convex like T_X(x)*. In the case where T_X(x)* = N_X(x), we say that X is regular at x. An important consequence of convexity of X is that it implies regularity, as shown in the following proposition.
Proposition 1.5.5: Let X be a convex set. Then for all x ∈ X, we have

   z ∈ T_X(x)*   if and only if   z'(x̄ − x) ≤ 0, ∀ x̄ ∈ X.        (1.46)

Furthermore, X is regular at all x ∈ X.
Figure 1.5.6. Examples of normal cones. In the case of figure (a), X is the union of two lines passing through the origin:

   X = {x | (a_1'x)(a_2'x) = 0}.

For x = 0 we have T_X(x) = X, T_X(x)* = {0}, while N_X(x) is the nonconvex set consisting of the two lines of vectors that are collinear to either a_1 or a_2. Thus X is not regular at x = 0. At all other vectors x ∈ X, we have regularity, with T_X(x)* and N_X(x) equal to either the line of vectors that are collinear to a_1 or the line of vectors that are collinear to a_2.

In the case of figure (b), X is regular at all points except at x = 0, where we have T_X(x) = ℝⁿ, T_X(x)* = {0}, while N_X(x) is equal to the horizontal axis.
Proof: Since (x̄ − x) ∈ F_X(x) ⊆ T_X(x) for all x̄ ∈ X, it follows that if z ∈ T_X(x)*, then z'(x̄ − x) ≤ 0 for all x̄ ∈ X. Conversely, let z be such that z'(x̄ − x) ≤ 0 for all x̄ ∈ X, and to arrive at a contradiction, assume that there is a y ∈ T_X(x) with z'y > 0. Since cl(F_X(x)) = T_X(x) [cf. Prop. 1.5.3(c)], there exists a sequence {y_k} ⊂ F_X(x) such that y_k → y, so that y_k = α_k(x_k − x) for some α_k > 0 and some x_k ∈ X. Since z'y > 0, we have α_k z'(x_k − x) > 0 for large enough k, which is a contradiction. This proves Eq. (1.46).

To show regularity, let x ∈ X and z ∈ N_X(x), so that there exist sequences {x_k} ⊂ X and {z_k} such that x_k → x, z_k → z, and z_k ∈ T_X(x_k)*. By Eq. (1.46), we have z_k'(x̄ − x_k) ≤ 0 for all x̄ ∈ X. Taking the limit as k → ∞, we obtain z'(x̄ − x) ≤ 0 for all x̄ ∈ X, which by Eq. (1.46), implies that z ∈ T_X(x)*. Thus, we have N_X(x) ⊆ T_X(x)*. Since the reverse inclusion always holds, it follows that X is regular at x. Q.E.D.
Show that if (x_1*, x_2*, x_3*) defines an optimal triangle, there exists a vector z* such that

   (z* − x_i*) ∈ T_{C_i}(x_i*)*,   i = 1, 2, 3.
1.5.3 (Cone Decomposition Theorem)

Let C ⊂ ℝⁿ be a closed convex cone and let x be a given vector in ℝⁿ. Show that:

(a) x̂ is the projection of x on C if and only if

   x̂ ∈ C,   x − x̂ ∈ C*,   (x − x̂)'x̂ = 0.

(b) The following two properties are equivalent:

   (i) x_1 and x_2 are the projections of x on C and C*, respectively.

   (ii) x = x_1 + x_2 with x_1 ∈ C, x_2 ∈ C*, and x_1'x_2 = 0.
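For the nonnegative orthant C (where projections have closed form), the decomposition of part (b) can be checked directly; the sketch below (my own illustration, not from the text) uses the componentwise positive and negative parts of x as the two projections:

```python
import numpy as np

# C = nonnegative orthant, whose polar is C* = nonpositive orthant.
# Projections of x on C and C* are the positive and negative parts of x.
x = np.array([3.0, -1.5, 0.0, 2.0, -4.0])
x1 = np.maximum(x, 0.0)   # projection on C
x2 = np.minimum(x, 0.0)   # projection on C*

assert np.allclose(x, x1 + x2)          # x = x1 + x2
assert np.all(x1 >= 0) and np.all(x2 <= 0)
assert abs(x1 @ x2) < 1e-12             # x1'x2 = 0 (orthogonality)
# Part (a) also checks out: x - x1 = x2 lies in C*, and (x - x1)'x1 = 0.
```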
1.5.4

Let C ⊂ ℝⁿ be a closed convex cone and let a be a given vector in ℝⁿ. Show that for any positive scalars β and γ, we have

   max_{‖x‖≤β, x∈C} a'x ≤ γ   if and only if   a ∈ C* + {x | ‖x‖ ≤ γ/β}.
1.5.5 (Quasiregularity)

Consider the set

   X = {x | h_i(x) = 0, i = 1, ..., m, g_j(x) ≤ 0, j = 1, ..., r},

where the functions h_i : ℝⁿ → ℝ and g_j : ℝⁿ → ℝ are smooth. For any x ∈ X let A(x) = {j | g_j(x) = 0} and consider the cone

   V(x) = {y | ∇h_i(x)'y = 0, i = 1, ..., m, ∇g_j(x)'y ≤ 0, j ∈ A(x)}.

Show that:

(a) T_X(x) ⊆ V(x).

(b) T_X(x) = V(x) if either the gradients ∇h_i(x), i = 1, ..., m, ∇g_j(x), j ∈ A(x), are linearly independent, or all the functions h_i and g_j are linear.

Note: The property T_X(x) = V(x) is called quasiregularity, and will be significant in the Lagrange multiplier theory of Chapter 2.
Sec. 1.6 Polyhedral Convexity 115
1.5.6 (Regularity of Constraint Systems)

Consider the set X of Exercise 1.5.5. Show that if at a given x ∈ X the quasiregularity condition T_X(x) = V(x) of that exercise holds, then X is regular at x.
1.5.7 (Necessary and Sufficient Optimality Condition)

Let f : ℝⁿ → ℝ be a smooth convex function and let X be a nonempty convex subset of ℝⁿ. Show that the condition

   ∇f(x*)'(x − x*) ≥ 0,   ∀ x ∈ X,

is necessary and sufficient for a vector x* ∈ X to minimize f over X.
1.6 POLYHEDRAL CONVEXITY

A cone C ⊂ ℝⁿ is said to be polyhedral, if it has the form

   C = {x | a_j'x ≤ 0, j = 1, ..., r},

where a_1, ..., a_r are some vectors in ℝⁿ. We say that a cone C ⊂ ℝⁿ is finitely generated, if it is generated by a finite set of vectors, i.e., if it has the form

   C = cone{a_1, ..., a_r} = {x | x = Σ_{j=1}^r μ_j a_j, μ_j ≥ 0, j = 1, ..., r},

where a_1, ..., a_r are some vectors in ℝⁿ. These definitions are illustrated in Fig. 1.6.1.

Figure 1.6.1. (a) Polyhedral cone defined by the inequality constraints a_j'x ≤ 0, j = 1, 2, 3. (b) Cone generated by the vectors a_1, a_2, a_3.
Note that sets defined by linear equality constraints (as well as linear inequality constraints) are also polyhedral cones, since a linear equality constraint may be converted into two inequality constraints. In particular, if e_1, ..., e_m and a_1, ..., a_r are given vectors, the set

   {x | e_i'x = 0, i = 1, ..., m, a_j'x ≤ 0, j = 1, ..., r}

is a polyhedral cone, since it can be written as

   {x | e_i'x ≤ 0, −e_i'x ≤ 0, i = 1, ..., m, a_j'x ≤ 0, j = 1, ..., r}.
Using a related conversion, it can be seen that the cone

   {x | x = Σ_{i=1}^m λ_i e_i + Σ_{j=1}^r μ_j a_j, λ_i ∈ ℝ, i = 1, ..., m, μ_j ≥ 0, j = 1, ..., r}

is finitely generated, since it can be written as

   {x | x = Σ_{i=1}^m λ_i⁺ e_i + Σ_{i=1}^m λ_i⁻(−e_i) + Σ_{j=1}^r μ_j a_j,
        λ_i⁺ ≥ 0, λ_i⁻ ≥ 0, i = 1, ..., m, μ_j ≥ 0, j = 1, ..., r}.
A polyhedral cone is closed, since it is the intersection of closed halfspaces. A finitely generated cone is also closed and is in fact polyhedral, but this is a fairly deep fact, which is shown in the following proposition.
Proposition 1.6.1:

(a) Let a_1, ..., a_r be vectors of ℝⁿ. Then, the finitely generated cone

   C = {x | x = Σ_{j=1}^r μ_j a_j, μ_j ≥ 0, j = 1, ..., r}        (1.47)

is closed and its polar cone is the polyhedral cone given by

   C* = {x | a_j'x ≤ 0, j = 1, ..., r}.        (1.48)

(b) (Farkas' Lemma) Let x, e_1, ..., e_m, and a_1, ..., a_r be vectors of ℝⁿ. We have x'y ≤ 0 for all vectors y ∈ ℝⁿ such that

   y'e_i = 0, i = 1, ..., m,   y'a_j ≤ 0, j = 1, ..., r,

if and only if x can be expressed as

   x = Σ_{i=1}^m λ_i e_i + Σ_{j=1}^r μ_j a_j,

where λ_i and μ_j are some scalars with μ_j ≥ 0 for all j.

(c) (Minkowski-Weyl Theorem) A cone is polyhedral if and only if it is finitely generated.
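Before turning to the proof, here is a small numeric illustration of the "if" direction of Farkas' Lemma (a hypothetical instance of my own, with e_1, a_1, a_2 taken to be coordinate vectors in ℝ³): any x of the stated form satisfies x'y ≤ 0 for every y meeting the constraints:

```python
import numpy as np

# e1 = (1,0,0), a1 = (0,1,0), a2 = (0,0,1); admissible y are exactly
# the vectors (0, y2, y3) with y2 <= 0 and y3 <= 0.
e1 = np.array([1.0, 0.0, 0.0])
a1 = np.array([0.0, 1.0, 0.0])
a2 = np.array([0.0, 0.0, 1.0])

# x = lambda*e1 + mu1*a1 + mu2*a2, with mu1, mu2 >= 0 (lambda unrestricted)
x = -2.0 * e1 + 3.0 * a1 + 1.0 * a2

# Sample many admissible y and check x'y <= 0
rng = np.random.default_rng(1)
ys = np.column_stack([np.zeros(1000),
                      -rng.uniform(0, 5, 1000),
                      -rng.uniform(0, 5, 1000)])
assert np.all(ys @ x <= 1e-12)
```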
Proof: (a) We first show that the polar cone of C has the desired form (1.48). If y satisfies y'a_j ≤ 0 for all j, then y'x ≤ 0 for all x ∈ C, so y ∈ C*. Conversely, if y ∈ C*, then y'a_j ≤ 0 for all j, since each a_j belongs to C. Thus, C* is given by Eq. (1.48).

We now show by induction on the number of vectors that C is closed. Assume that all cones of the form

   C_r = {x | x = Σ_{j=1}^r μ_j a_j, μ_j ≥ 0, j = 1, ..., r}

are closed. Then, we will show that a cone of the form

   C_{r+1} = {x | x = Σ_{j=1}^{r+1} μ_j a_j, μ_j ≥ 0, j = 1, ..., r + 1}
is also closed. Without loss of generality, assume that ‖a_j‖ = 1 for all j. If the negatives of all the vectors a_1, ..., a_{r+1} belong to C_{r+1}, then C_{r+1} is the subspace spanned by a_1, ..., a_{r+1} and is therefore closed. If the negative of one of the vectors, say −a_{r+1}, does not belong to C_{r+1}, consider the cone

   C_r = {x | x = Σ_{j=1}^r μ_j a_j, μ_j ≥ 0, j = 1, ..., r},
which is closed by the induction hypothesis. Let
= min
xCr, x=1
a
r+1
x.
Since the set x C
r
[ |x| = 1 is nonempty and compact, the minimum
above is attained at some x
C
r
with |x
| = 1 by Weierstrass Theorem.
Using the Schwartz Inequality, we have
= a
r+1
x
|ar+1| |x
| = 1,
with equality if and only if x
= a
r+1
. Because x
C
r
and a
r+1
/ C
r
,
equality cannot hold above, so that γ > −1. Let {x_k} ⊂ C_{r+1} be a convergent sequence. We will prove that its limit belongs to C_{r+1}, thereby showing that C_{r+1} is closed. Indeed, for all k, we have x_k = ξ_k a_{r+1} + y_k, where ξ_k ≥ 0 and y_k ∈ C_r. Using the fact ‖a_{r+1}‖ = 1 and the definition of γ, we obtain

   ‖x_k‖² = ξ_k² + ‖y_k‖² + 2ξ_k a_{r+1}'y_k
          ≥ ξ_k² + ‖y_k‖² + 2γ ξ_k ‖y_k‖
          = (ξ_k − ‖y_k‖)² + 2(1 + γ) ξ_k ‖y_k‖.

Since {x_k} converges, ξ_k ≥ 0, and 1 + γ > 0, it follows that the sequences {ξ_k} and {y_k} are bounded and hence they have limit points, denoted by ξ̄ and ȳ, respectively. Along the corresponding subsequence, the limit of {x_k} is

   lim_{k→∞} (ξ_k a_{r+1} + y_k) = ξ̄ a_{r+1} + ȳ,

which belongs to C_{r+1}, since ξ̄ ≥ 0 and ȳ ∈ C_r (by the closedness hypothesis on C_r). We conclude that C_{r+1} is closed, completing the proof.
We now give an alternative proof that C is closed, by showing that it is polyhedral. The proof is constructive and uses induction on the number of vectors r. When r = 1, there are two possibilities: (a) a_1 = 0, in which case C = {0}, and

   C = {x | u_i'x ≤ 0, −u_i'x ≤ 0, i = 1, ..., n},

where u_i is the ith unit coordinate vector; (b) a_1 ≠ 0, in which case C is a closed halfline, which, using the Polar Cone Theorem, is characterized as the set of vectors that are orthogonal to the subspace orthogonal to a_1 and also make a nonnegative inner product with a_1, i.e.,

   C = {x | b_i'x ≤ 0, −b_i'x ≤ 0, i = 1, ..., n − 1, −a_1'x ≤ 0},

where b_1, ..., b_{n−1} are basis vectors for the (n−1)-dimensional subspace that is orthogonal to the vector a_1. In both cases (a) and (b), C is polyhedral.
Assume that for some r ≥ 1, a set of the form

   C_r = {x | x = Σ_{j=1}^r μ_j a_j, μ_j ≥ 0, j = 1, ..., r}

has a polyhedral representation

   P_r = {x | b_j'x ≤ 0, j = 1, ..., m}.
Let a_{r+1} be a given vector in ℝⁿ, and consider the set

   C_{r+1} = {x | x = Σ_{j=1}^{r+1} μ_j a_j, μ_j ≥ 0, j = 1, ..., r + 1}.

We will show that C_{r+1} has a polyhedral representation. Let

   β_j = a_{r+1}'b_j,   j = 1, ..., m,

and define the index sets

   J⁻ = {j | β_j < 0},   J⁰ = {j | β_j = 0},   J⁺ = {j | β_j > 0}.
If J⁻ ∪ J⁰ = ∅, or equivalently a_{r+1}'b_j > 0 for all j, we see that −a_{r+1} is an interior point of P_r. Hence there is an ε > 0 such that for each μ_{r+1} > 0, the open sphere centered at −μ_{r+1}a_{r+1} of radius εμ_{r+1} is contained in P_r. It follows that for any x ∈ ℝⁿ we have −μ_{r+1}a_{r+1} + x ∈ P_r for all sufficiently large μ_{r+1}, implying that x belongs to C_{r+1}. Therefore C_{r+1} = ℝⁿ, so that C_{r+1} is a polyhedral set.
We may thus assume that J⁻ ∪ J⁰ ≠ ∅. We will show that if J⁺ = ∅ or J⁻ = ∅, the set C_{r+1} has the polyhedral representation

   P_{r+1} = {x | b_j'x ≤ 0, j ∈ J⁻ ∪ J⁰},

and if J⁺ ≠ ∅ and J⁻ ≠ ∅, it has the polyhedral representation

   P_{r+1} = {x | b_j'x ≤ 0, j ∈ J⁻ ∪ J⁰, b_{l,k}'x ≤ 0, l ∈ J⁺, k ∈ J⁻},

where

   b_{l,k} = b_l − (β_l/β_k) b_k,   l ∈ J⁺, k ∈ J⁻.

This will complete the induction.
Indeed, we have C_{r+1} ⊆ P_{r+1}, since by construction the vectors a_1, ..., a_{r+1} satisfy the inequalities defining P_{r+1}. To show the reverse inclusion, we fix a vector x ∈ P_{r+1} and we verify that there exist μ_1, ..., μ_{r+1} ≥ 0 such that x = Σ_{j=1}^{r+1} μ_j a_j, or equivalently that there exists μ_{r+1} ≥ 0 such that

   x − μ_{r+1}a_{r+1} ∈ P_r.

We consider three cases.
(1) J⁺ = ∅. Then, since x ∈ P_{r+1}, we have b_j'x ≤ 0 for all j ∈ J⁻ ∪ J⁰, implying x ∈ P_r, so that x − μ_{r+1}a_{r+1} ∈ P_r for μ_{r+1} = 0.

(2) J⁺ ≠ ∅, J⁰ ≠ ∅, and J⁻ = ∅. Then, since x ∈ P_{r+1}, we have b_j'x ≤ 0 for all j ∈ J⁰, so that

   b_j'(x − μ_{r+1}a_{r+1}) ≤ 0,   ∀ j ∈ J⁰, ∀ μ_{r+1} ≥ 0,
   b_j'(x − μ_{r+1}a_{r+1}) ≤ 0,   ∀ j ∈ J⁺, ∀ μ_{r+1} ≥ max_{j∈J⁺} b_j'x/β_j.

Hence, for μ_{r+1} sufficiently large, we have x − μ_{r+1}a_{r+1} ∈ P_r.

(3) J⁺ ≠ ∅ and J⁻ ≠ ∅. Then, since x ∈ P_{r+1}, we have b_{l,k}'x ≤ 0 for all l ∈ J⁺, k ∈ J⁻, implying that

   b_l'x/β_l ≤ b_k'x/β_k,   ∀ l ∈ J⁺, k ∈ J⁻.        (1.49)
Furthermore, for x ∈ P_{r+1}, there holds b_k'x/β_k ≥ 0 for all k ∈ J⁻, so that for μ_{r+1} satisfying

   max{0, max_{l∈J⁺} b_l'x/β_l} ≤ μ_{r+1} ≤ min_{k∈J⁻} b_k'x/β_k,

we have

   b_j'x − μ_{r+1} b_j'a_{r+1} ≤ 0,   ∀ j ∈ J⁰,
   b_k'x − μ_{r+1} b_k'a_{r+1} = β_k (b_k'x/β_k − μ_{r+1}) ≤ 0,   ∀ k ∈ J⁻,
   b_l'x − μ_{r+1} b_l'a_{r+1} = β_l (b_l'x/β_l − μ_{r+1}) ≤ 0,   ∀ l ∈ J⁺.

Hence b_j'(x − μ_{r+1}a_{r+1}) ≤ 0 for all j, implying that x − μ_{r+1}a_{r+1} ∈ P_r, and completing the proof that C is polyhedral.
(b) Define a_{r+i} = e_i and a_{r+m+i} = −e_i, i = 1, ..., m. The result to be shown translates to

   x ∈ P*   ⟺   x ∈ C,

where

   P = {y | y'a_j ≤ 0, j = 1, ..., r + 2m},   C = {x | x = Σ_{j=1}^{r+2m} μ_j a_j, μ_j ≥ 0}.

Since by part (a), P = C*, while C is closed and convex, the Polar Cone Theorem yields P* = (C*)* = C, which proves the result.

(c) A finitely generated cone is polyhedral, by the alternative proof of part (a). Conversely, let P be a polyhedral cone,

   P = {x | a_j'x ≤ 0, j = 1, ..., r}.

By part (a),

   P = C*,   where C = {x | x = Σ_{j=1}^r μ_j a_j, μ_j ≥ 0, j = 1, ..., r},

and by using also the closedness of finitely generated cones proved in part (a) and the Polar Cone Theorem, we have

   P* = (C*)* = C = {x | x = Σ_{j=1}^r μ_j a_j, μ_j ≥ 0, j = 1, ..., r}.

Thus, P* is finitely generated and hence, by the first part of the proof, polyhedral, say P* = {x | b_j'x ≤ 0, j = 1, ..., m}. Applying part (a) and the Polar Cone Theorem once more, P = (P*)* is the cone generated by b_1, ..., b_m, so P is finitely generated. Q.E.D.

A polyhedral set in ℝⁿ is a nonempty set of the form

   P = {x | a_j'x ≤ b_j, j = 1, ..., r},

where a_j are some vectors in ℝⁿ and b_j are some scalars.
are some scalars. A polyhedron
may also involve linear equalities, which may be converted into two linear
inequalities. In particular, if e
1
, . . . , e
m
and a
1
, . . . , a
r
are given vectors,
and d
1
, . . . , d
m
and b
1
, . . . , b
r
are given scalars, the set
x [ e
i
x = d
i
, i = 1, . . . , m, a
j
x b
j
, j = 1, . . . , r
is polyhedral, since it can be written as
x [ e
i
x d
i
, e
i
x d
i
, i = 1, . . . , m, a
j
x b
j
, j = 1, . . . , r.
The following is a fundamental result, showing that a polyhedral set can be represented as the sum of the convex hull of a finite set of points and a finitely generated cone.

Proposition 1.6.2: A set P is polyhedral if and only if there exist a nonempty finite set of vectors v_1, ..., v_m and a finitely generated cone C such that P = conv{v_1, ..., v_m} + C, i.e.,

   P = {x | x = Σ_{j=1}^m μ_j v_j + y, Σ_{j=1}^m μ_j = 1, μ_j ≥ 0, j = 1, ..., m, y ∈ C}.
Proof: Assume that P is polyhedral. Then, it has the form

   P = {x | a_j'x ≤ b_j, j = 1, ..., r},

for some vectors a_j and scalars b_j. Consider the polyhedral cone in ℝ^{n+1} given by

   P̂ = {(x, w) | 0 ≤ w, a_j'x ≤ b_j w, j = 1, ..., r},

and note that

   P = {x | (x, 1) ∈ P̂}.

By the Minkowski-Weyl Theorem [Prop. 1.6.1(c)], P̂ is finitely generated, so it has the form

   P̂ = {(x, w) | x = Σ_{j=1}^m μ_j v_j, w = Σ_{j=1}^m μ_j d_j, μ_j ≥ 0, j = 1, ..., m},
for some vectors v_j and scalars d_j. Since w ≥ 0 for all vectors (x, w) ∈ P̂, we see that d_j ≥ 0 for all j. Let

   J⁺ = {j | d_j > 0},   J⁰ = {j | d_j = 0}.

By replacing μ_j with μ_j/d_j and v_j with d_j v_j for all j ∈ J⁺, we obtain the equivalent description

   P̂ = {(x, w) | x = Σ_{j=1}^m μ_j v_j, w = Σ_{j∈J⁺} μ_j, μ_j ≥ 0, j = 1, ..., m}.
Since P = {x | (x, 1) ∈ P̂}, we obtain

   P = {x | x = Σ_{j∈J⁺} μ_j v_j + Σ_{j∈J⁰} μ_j v_j, Σ_{j∈J⁺} μ_j = 1, μ_j ≥ 0, j = 1, ..., m}.
Thus, P is the vector sum of the convex hull of the vectors v
j
, j J
+
, and
the nitely generated cone
_
_
_
jJ
0
j
v
j
j
0, j J
0
_
_
_
.
To prove that the vector sum of conv{v_1,...,v_m} and a finitely
generated cone is a polyhedral set, we use a reverse argument: we pass to
a finitely generated cone description, we use the Minkowski-Weyl Theorem
to assert that this cone is polyhedral, and we finally construct a polyhedral
set description. The details are left as an exercise for the reader. Q.E.D.
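As a small numerical illustration of the homogenization step used in the proof above, the sketch below (a made-up example, not from the text) checks for the unit square in the plane that P = { x | a_j'x ≤ b_j } coincides with the slice w = 1 of the cone P̂ = { (x, w) | 0 ≤ w, a_j'x ≤ b_j w }:

```python
# Homogenization trick from the proof of Prop. 1.6.2: the polyhedron
# P = {x | a_j' x <= b_j} is the slice w = 1 of the polyhedral cone
# P_hat = {(x, w) | 0 <= w, a_j' x <= b_j w}.  The unit square below
# is a hypothetical example.

def in_P(x, A, b):
    return all(sum(aj[i] * x[i] for i in range(len(x))) <= bj
               for aj, bj in zip(A, b))

def in_P_hat(x, w, A, b):
    if w < 0:
        return False
    return all(sum(aj[i] * x[i] for i in range(len(x))) <= bj * w
               for aj, bj in zip(A, b))

# Unit square: 0 <= x1 <= 1, 0 <= x2 <= 1, written as a_j' x <= b_j.
A = [(1, 0), (-1, 0), (0, 1), (0, -1)]
b = [1, 0, 1, 0]

samples = [(i / 4.0, j / 4.0) for i in range(-2, 7) for j in range(-2, 7)]
assert all(in_P(x, A, b) == in_P_hat(x, 1.0, A, b) for x in samples)
assert in_P_hat((2.0, 2.0), 2.0, A, b)   # the cone contains scaled copies
```

Scaling (x, w) shows why membership in P̂ at w = 1 is exactly membership in P, while P̂ itself is a cone in one dimension higher.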
1.6.3 Extreme Points
Given a convex set C, a vector x ∈ C is said to be an extreme point of C
if there do not exist vectors y ∈ C and z ∈ C, with y ≠ x and z ≠ x, and
a scalar λ ∈ (0, 1) such that x = λy + (1 − λ)z. An equivalent definition is
that x cannot be expressed as a convex combination of some vectors of C,
all of which are different from x.
Thus an extreme point of a set cannot lie strictly between the endpoints
of any line segment contained in the set. It follows that an extreme
point of a set must lie on the relative boundary of the set. As a result, an
open set has no extreme points; more generally, a convex set that coincides
with its relative interior has no extreme points, except in the special
case where the set consists of a single point. As another consequence of the
definition, a convex cone may have at most one extreme point, the origin.
124 Convex Analysis and Optimization Chap. 1
Figure 1.6.2. Illustration of the extreme points of various convex sets (a)-(c).
In the set of (c), the extreme points are the ones that lie on the circular arc.
We also show in what follows that a bounded polyhedral set has a finite
number of extreme points and in fact is equal to the convex hull of its
extreme points. Figure 1.6.2 illustrates the extreme points of various types
of sets.
The following proposition provides some intuition into the nature of
extreme points.
Proposition 1.6.3: Let C be a nonempty closed convex set in ℜ^n.
(a) If H is a hyperplane that passes through a relative boundary
point of C and contains C in one of its halfspaces, then every
extreme point of C ∩ H is also an extreme point of C.
(b) C has at least one extreme point if and only if it does not contain
a line, i.e., a set L of the form L = { x + λd | λ ∈ ℜ }, where d is
a nonzero vector in ℜ^n and x is some vector in C.
(c) (Krein - Milman Theorem) If C is bounded (in addition to being
convex and closed), then C is equal to the convex hull of its
extreme points.
Proof: (a) Let x be an element of C ∩ H which is not an extreme point of
C. Then we have x = λy + (1 − λ)z for some λ ∈ (0, 1), and some y ∈ C
and z ∈ C, with y ≠ x and z ≠ x. Since x ∈ H, the halfspace containing C
is of the form { z | a'z ≥ a'x }, where a ≠ 0. Then a'y ≥ a'x and a'z ≥ a'x,
which, in view of x = λy + (1 − λ)z, implies that a'y = a'x and a'z = a'x.
Therefore, y ∈ C ∩ H and z ∈ C ∩ H, showing that x is not an extreme
point of C ∩ H.
(b) To arrive at a contradiction, assume that C has an extreme point x
and contains a line { x̄ + λd | λ ∈ ℜ }, where x̄ ∈ C and d ≠ 0. Then by the
Recession Cone Theorem [Prop. 1.2.13(b)] and the closedness of C, both
d and −d are directions of recession of C, so the line { x + λd | λ ∈ ℜ }
belongs to C. This contradicts the fact that x is an extreme point.
Conversely, we use induction on the dimension of the space to show
that if C does not contain a line, it must have an extreme point. This
is true in the real line ℜ, so assume it is true in ℜ^{n−1}. Since C contains
no line, we must have C ≠ ℜ^n, so that C has a relative boundary point.
Take any x̄ on the relative boundary of C, and any hyperplane H passing
through x̄ and containing C in one of its halfspaces. By using a translation
argument if necessary, we may assume without loss of generality that x̄ = 0.
Then H is an (n − 1)-dimensional subspace, so the set C ∩ H lies in an
(n − 1)-dimensional space and contains no line. Hence, by the induction
hypothesis, it must have an extreme point. By part (a), this extreme point
must also be an extreme point of C.
(c) The proof is based on induction on the dimension of the space. On the
real line, every compact convex set C is a line segment whose endpoints are
the extreme points of C, so that C is the convex hull of its extreme points.
Suppose now that every compact convex set in ℜ^{n−1} is the convex hull of its
extreme points, and let C be a compact convex set in ℜ^n. Since C is bounded,
it contains no line, and by part (b), it has at least one extreme point. By
convexity of C, the convex hull of all extreme points of C is contained
in C. To show the reverse inclusion, choose any x ∈ C, and consider a
line that lies in aff(C) and passes through x. Since C is compact, the
intersection of this line and C is a line segment whose endpoints, say x_1
and x_2, belong to the relative boundary of C. Let H_1 be a hyperplane that
passes through x_1 and contains C in one of its halfspaces. Similarly, let
H_2 be a hyperplane that passes through x_2 and contains C in one of its
halfspaces. The intersections C ∩ H_1 and C ∩ H_2 are compact convex sets
that lie in the hyperplanes H_1 and H_2, respectively. By viewing H_1 and H_2
as (n − 1)-dimensional spaces, and by using the induction hypothesis, we
see that each of the sets C ∩ H_1 and C ∩ H_2 is the convex hull of its extreme
points. Hence x_1 is a convex combination of some extreme points of C ∩ H_1,
and x_2 is a convex combination of some extreme points of C ∩ H_2. By part
(a), every extreme point of C ∩ H_1 and every extreme point of C ∩ H_2 is also
an extreme point of C, so each of x_1 and x_2 is a convex combination
of some extreme points of C. Since x lies in the line segment connecting x_1
and x_2, it follows that x is a convex combination of some extreme points
of C, showing that C is contained in the convex hull of all extreme points
of C. Q.E.D.
The boundedness assumption is essential in the Krein - Milman Theorem.
For example, the only extreme point of the ray C = { λy | λ ≥ 0 },
where y is a given nonzero vector, is the origin, but none of its other
points can be generated as a convex combination of extreme points of C.
As an example of application of the preceding proposition, consider
a nonempty polyhedral set of the form

    { x | Ax = b, x ≥ 0 },

or of the form

    { x | Ax ≤ b, x ≥ 0 },

where A is an m × n matrix and b ∈ ℜ^m. Such polyhedra arise commonly in
linear programming formulations, and clearly do not contain a line. Hence
by Prop. 1.6.3(b) they always possess at least one extreme point.
One of the fundamental linear programming results is that if a linear
function f attains a minimum over a polyhedral set C having at least one
extreme point, then f attains a minimum at some extreme point of C (as
well as possibly at some other nonextreme points). We will come to this
fact after considering the more general case where f is concave, and C is
closed and convex.
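The linear-programming fact just stated can be checked by brute force in a tiny, entirely hypothetical case: for the unit square, the minimum of a linear cost over the best vertex is never beaten anywhere on a dense grid of feasible points.

```python
# A linear function minimized over a polyhedron with extreme points
# attains its minimum at one of them.  Brute-force sketch on the unit
# square (a made-up example) with an arbitrary cost vector c.

vertices = [(0, 0), (1, 0), (0, 1), (1, 1)]
c = (2.0, -3.0)

def cost(x):
    return c[0] * x[0] + c[1] * x[1]

best_vertex = min(cost(v) for v in vertices)
grid = [(i / 50.0, j / 50.0) for i in range(51) for j in range(51)]
assert all(cost(x) >= best_vertex - 1e-12 for x in grid)
```

The grid check is of course no proof, only a numerical sanity test of the statement for this instance.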
Proposition 1.6.4: Let C ⊂ ℜ^n be a closed convex set that does not
contain a line, and let f : C → ℜ be a concave function attaining a
minimum over C. Then f attains a minimum at some extreme point
of C.
Proof: If C consists of just a single point, the result holds trivially. We
may thus assume that C contains at least two distinct points. We first
show that f attains a minimum at some vector that is not in ri(C). Let x*
be a vector where f attains its minimum over C, and suppose that
x* ∈ ri(C). Choose any x̄ ∈ C with x̄ ≠ x*. Since C is closed and contains
no line, there is a largest γ ≥ 1 such that

    x̂ = x* + (γ − 1)(x* − x̄) ∈ C.

Therefore

    x* = (1/γ) x̂ + ((γ − 1)/γ) x̄,

and by the concavity of f and the optimality of x*, we have

    f(x*) ≥ (1/γ) f(x̂) + ((γ − 1)/γ) f(x̄) ≥ f(x*),

implying that f(x̂) = f(x*).
Consider a polyhedral set P, represented as conv{v_1,...,v_m} + C for a
finitely generated cone C, as in Prop. 1.6.2, and denote by P̂ the convex
hull of v_1,...,v_m:

    P̂ = { x | x = Σ_{j=1}^m μ_j v_j,  Σ_{j=1}^m μ_j = 1,  μ_j ≥ 0, j = 1,...,m }.
We note that an extreme point x of P cannot be of the form x = x̂ + y,
where x̂ ∈ P̂ and y ≠ 0, y ∈ C, since in this case x would be the midpoint
of the line segment connecting the distinct vectors x̂ and x̂ + 2y. Therefore,
an extreme point of P must belong to P̂, and since P̂ ⊂ P, it must also be
an extreme point of P̂. An extreme point of P̂ must be one of the vectors
v_1,...,v_m, since otherwise this point would be expressible as a convex
combination of vectors of P̂ distinct from it. Thus the set of extreme
points of P is either empty, or nonempty and finite. Using Prop. 1.6.3(b),
it follows that the set of extreme points of P is nonempty and finite if and
only if P contains no line.
If P is bounded, then we must have P = P̂, and it follows from
the Krein - Milman Theorem that P is equal to the convex hull of its
extreme points (not just the convex hull of the vectors v_1,...,v_m). The
following proposition gives another and more specific characterization of
extreme points of polyhedral sets, which is central in the theory of linear
programming.
Proposition 1.6.6: Let P be a polyhedral set in ℜ^n.
(a) If P has the form

    P = { x | a_j'x ≤ b_j, j = 1,...,r },

where a_j and b_j are given vectors and scalars, respectively, then
a vector v ∈ P is an extreme point of P if and only if the set

    A_v = { a_j | a_j'v = b_j, j ∈ {1,...,r} }

contains n linearly independent vectors.
(b) If P has the form

    P = { x | Ax = b, x ≥ 0 },

where A is a given m × n matrix and b is a given vector in ℜ^m,
then a vector v ∈ P is an extreme point of P if and only if the
columns of A corresponding to the nonzero coordinates of v are
linearly independent.
(c) If P has the form

    P = { x | Ax = b, c ≤ x ≤ d },

where A is a given m × n matrix, b is a given vector in ℜ^m,
and c, d are given vectors in ℜ^n, then a vector v ∈ P is an
extreme point of P if and only if the columns of A corresponding
to the coordinates of v that lie strictly between the corresponding
coordinates of c and d are linearly independent.
Proof: (a) If the set A_v contains fewer than n linearly independent vectors,
then the system of equations

    a_j'w = 0,   a_j ∈ A_v,

has a nonzero solution w. For sufficiently small γ > 0, we have v + γw ∈ P
and v − γw ∈ P, thus showing that v is not an extreme point. Thus, if v
is an extreme point, A_v must contain n linearly independent vectors.
Conversely, suppose that A_v contains a subset Ā_v consisting of n
linearly independent vectors. Suppose that for some y ∈ P, z ∈ P, and
λ ∈ (0, 1), we have v = λy + (1 − λ)z. Then for all a_j ∈ Ā_v, we have

    b_j = a_j'v = λ a_j'y + (1 − λ) a_j'z ≤ λ b_j + (1 − λ) b_j = b_j,

so that equality holds throughout, and since λ ∈ (0, 1), it follows that
a_j'y = a_j'z = b_j. Thus v, y, and z are all solutions of the system of n
linearly independent equations

    a_j'w = b_j,   a_j ∈ Ā_v.

Hence v = y = z, implying that v is an extreme point of P.
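Part (a) suggests a simple vertex-enumeration procedure for n = 2: solve each pair of constraints as equalities, discard dependent pairs, and keep the feasible solutions. The sketch below runs this on a hypothetical triangle (the example and tolerances are my own, not from the text):

```python
# Prop. 1.6.6(a) in action for n = 2: v is an extreme point of
# P = {x | a_j' x <= b_j} iff two linearly independent constraints
# are active at v.  Candidate vertices come from 2x2 equality systems.

def solve2(a1, b1, a2, b2):
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if abs(det) < 1e-12:           # linearly dependent pair: no vertex
        return None
    x = (b1 * a2[1] - b2 * a1[1]) / det
    y = (a1[0] * b2 - a2[0] * b1) / det
    return (x, y)

def vertices(A, b, tol=1e-9):
    V = set()
    for i in range(len(A)):
        for j in range(i + 1, len(A)):
            v = solve2(A[i], b[i], A[j], b[j])
            if v is None:
                continue
            if all(A[k][0] * v[0] + A[k][1] * v[1] <= b[k] + tol
                   for k in range(len(A))):
                V.add((round(v[0], 6), round(v[1], 6)))
    return V

# Triangle: x1 >= 0, x2 >= 0, x1 + x2 <= 1.
A = [(-1, 0), (0, -1), (1, 1)]
b = [0, 0, 1]
assert vertices(A, b) == {(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)}
```

For general n the same idea enumerates n-element subsets of constraints, which is exponential; practical LP codes locate extreme points via the simplex method instead.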
(b) Let k be the number of zero coordinates of v, and consider the matrix
obtained by appending to A the k rows of the n × n identity matrix that
correspond to the zero coordinates of v.

1.6.2

Let P be a polyhedral set represented as

    P = { x | x = Σ_{j=1}^m μ_j v_j + y,  Σ_{j=1}^m μ_j = 1,  μ_j ≥ 0, j = 1,...,m,  y ∈ C },

where v_1,...,v_m are some vectors and C is a finitely generated cone (cf.
Prop. 1.6.2). Show that the recession cone of P is equal to C.
1.6.3

Show that the image and the inverse image of a polyhedral set under a linear
transformation are polyhedral sets.
1.6.4

Let C_1 ⊂ ℜ^n and C_2 ⊂ ℜ^m be polyhedral sets. Show that the Cartesian product
C_1 × C_2 is a polyhedral set. Show also that if C_1 and C_2 are polyhedral cones,
then C_1 × C_2 is a polyhedral cone.
1.6.5

Let C_1 and C_2 be two polyhedral sets in ℜ^n. Show that C_1 + C_2 is a polyhedral
set. Show also that if C_1 and C_2 are polyhedral cones, then C_1 + C_2 is a polyhedral
cone.
1.6.6 (Polyhedral Strong Separation)

Let C_1 and C_2 be two nonempty disjoint polyhedral sets in ℜ^n. Show that C_1
and C_2 can be strongly separated, i.e., there is a nonzero vector a ∈ ℜ^n such that

    sup_{x∈C_1} a'x < inf_{z∈C_2} a'z.

(Compare this with Exercise 1.4.1.)
1.6.7
Let C be a polyhedral set containing the origin.
(a) Show that cone(C) is a polyhedral cone.
(b) Show by counterexample that if C does not contain the origin, cone(C)
may not be a polyhedral cone.
1.6.8 (Polyhedral Functions)

A function f : ℜ^n → (−∞, ∞] is polyhedral if its epigraph is a polyhedral set
in ℜ^{n+1} not containing vertical lines [i.e., lines of the form { (x, w) | w ∈ ℜ } for
some x ∈ ℜ^n]. Show the following:
(a) A polyhedral function is convex over ℜ^n.
(b) f is polyhedral if and only if dom(f) is a polyhedral set and

    f(x) = max{ a_1'x + b_1, ..., a_m'x + b_m },   x ∈ dom(f),

for some vectors a_i ∈ ℜ^n and scalars b_i.
(c) The function F given by

    F(y) = sup_{Σ_{i=1}^m μ_i = 1, μ_i ≥ 0} { (Σ_{i=1}^m μ_i a_i)'y + Σ_{i=1}^m μ_i b_i }

is polyhedral.
1.6.9

Let A be an m × n matrix and let g : ℜ^m → (−∞, ∞] be a polyhedral function.
Show that the function f given by f(x) = g(Ax) is polyhedral on ℜ^n.
1.6.10

Let P be a polyhedral set represented as

    P = { x | x = Σ_{j=1}^m μ_j v_j + y,  Σ_{j=1}^m μ_j = 1,  μ_j ≥ 0, j = 1,...,m,  y ∈ C },

where v_1,...,v_m are some vectors and C is a finitely generated cone (cf. Prop.
1.6.2). Show that each extreme point of P is equal to some vector v_i that cannot
be represented as a convex combination of the remaining vectors v_j, j ≠ i.
1.6.11

Consider a nonempty polyhedral set of the form

    P = { x | a_j'x ≤ b_j, j = 1,...,r },

where a_j are some vectors and b_j are some scalars. Show that P has an extreme
point if and only if the set of vectors { a_j | j = 1,...,r } contains a subset of n
linearly independent vectors.
1.6.12 (Isomorphic Polyhedral Sets)

It is easily seen that if a linear transformation is applied to a polyhedron, the
result is a polyhedron whose extreme points are the images of some (but not
necessarily all) of the extreme points of the original polyhedron. This exercise
clarifies the circumstances where the extreme points of the original and the
transformed polyhedra are in one-to-one correspondence. Let P and Q be two
polyhedra of ℜ^n and ℜ^m, respectively.
(a) Show that there is an affine function f : ℜ^n → ℜ^m that maps each extreme
point of P into a distinct extreme point of Q if and only if there exists an
affine function g : ℜ^m → ℜ^n that maps each extreme point of Q into a
distinct extreme point of P. Note: If there exist such affine mappings f
and g, we say that P and Q are isomorphic.
(b) Show that P and Q are isomorphic if and only if there exist affine functions
f : ℜ^n → ℜ^m and g : ℜ^m → ℜ^n such that x = g(f(x)) for all x ∈ P and
y = f(g(y)) for all y ∈ Q.
(c) Show that if P and Q are isomorphic, then P and Q have the same dimension.
(d) Let A be a k × n matrix, let b be a vector in ℜ^k, and let

    P = { x ∈ ℜ^n | Ax ≤ b, x ≥ 0 },
    Q = { (x, z) ∈ ℜ^{n+k} | Ax + z = b, x ≥ 0, z ≥ 0 }.

Show that P and Q are isomorphic.
1.6.13

Show by an example that the set of extreme points of a compact set need not be
closed. Hint: Consider the line segment C_1 = { (x_1, x_2, x_3) | x_1 = 0, x_2 = 0,
−1 ≤ x_3 ≤ 1 } and the circular disk C_2 = { (x_1, x_2, x_3) | (x_1 − 1)^2 + x_2^2 ≤ 1,
x_3 = 0 }, and verify that conv(C_1 ∪ C_2) is compact, while its set of extreme
points is not closed.
1.6.14

Show that a compact convex set is polyhedral if and only if it has a finite number
of extreme points. Give an example showing that the assertion fails if boundedness
of the set is replaced by the weaker assumption that the set contains no
lines.
1.6.15 (Gordan's Theorem of the Alternative)

Let a_1,...,a_r be vectors in ℜ^n.
(a) Show that exactly one of the following two conditions holds:
(i) There exists a vector x ∈ ℜ^n such that

    a_1'x < 0, ..., a_r'x < 0.

(ii) There exists a vector μ ∈ ℜ^r such that μ ≠ 0, μ ≥ 0, and

    μ_1 a_1 + ··· + μ_r a_r = 0.

(b) Show that an alternative and equivalent statement of part (a) is the following:
a polyhedral cone has nonempty interior if and only if its polar cone
does not contain a line, i.e., a set of the form { λz | λ ∈ ℜ }, where z is a
nonzero vector.
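A quick numerical sanity check of Gordan's alternative on two hypothetical instances (the brute-force checks below illustrate, but of course do not prove, that the other alternative fails in each case):

```python
# Gordan's theorem (Exercise 1.6.15) on two tiny made-up instances.
# Instance 1: a1 = (1,0), a2 = (0,1): alternative (i) holds, e.g.
# x = (-1,-1).  Instance 2: a1 = (1,0), a2 = (-1,0): alternative (ii)
# holds with mu = (1,1); (i) cannot hold, since a1'x < 0 and a2'x < 0
# would force x[0] < 0 < x[0].

def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

# Instance 1: condition (i).
a1, a2 = (1, 0), (0, 1)
x = (-1, -1)
assert dot(a1, x) < 0 and dot(a2, x) < 0

# Instance 2: condition (ii) with mu = (1, 1), mu >= 0, mu != 0.
a1, a2 = (1, 0), (-1, 0)
mu = (1, 1)
combo = tuple(mu[0] * a1[i] + mu[1] * a2[i] for i in range(2))
assert combo == (0, 0)
```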
1.6.16

Let a_1,...,a_r be vectors in ℜ^n and let b_1,...,b_r be scalars. Show that exactly
one of the following two conditions holds:
(i) There exists a vector x ∈ ℜ^n such that

    a_1'x ≤ b_1, ..., a_r'x ≤ b_r.

(ii) There exists a vector μ ∈ ℜ^r such that μ ≥ 0 and

    μ_1 a_1 + ··· + μ_r a_r = 0,   μ_1 b_1 + ··· + μ_r b_r < 0.
1.6.17 (Convex System Alternatives)

Let C be a nonempty convex set in ℜ^n and let f_i : C → ℜ, i = 1,...,r, be convex
functions. Show that exactly one of the following two conditions holds:
(i) There exists a vector x ∈ C such that

    f_1(x) < 0, ..., f_r(x) < 0.

(ii) There exists a vector μ ∈ ℜ^r such that μ ≠ 0, μ ≥ 0, and

    μ_1 f_1(x) + ··· + μ_r f_r(x) ≥ 0,   ∀ x ∈ C.
1.6.18 (Convex-Affine System Alternatives)

Let C be a nonempty convex set in ℜ^n. Let f_i : C → ℜ, i = 1,...,r̄, be convex
functions, and let f_i : C → ℜ, i = r̄+1,...,r, be affine functions such that the
system

    f_{r̄+1}(x) ≤ 0, ..., f_r(x) ≤ 0

has a solution x̄ ∈ ri(C). Show that exactly one of the following two conditions holds:
(i) There exists a vector x ∈ C such that

    f_1(x) < 0, ..., f_{r̄}(x) < 0,   f_{r̄+1}(x) ≤ 0, ..., f_r(x) ≤ 0.

(ii) There exists a vector μ ∈ ℜ^r such that μ ≠ 0, μ ≥ 0, and

    μ_1 f_1(x) + ··· + μ_r f_r(x) ≥ 0,   ∀ x ∈ C.
1.6.19 (Facets)

Let P be a polyhedral set and let H be a hyperplane that passes through a
relative boundary point of P and contains P in one of its halfspaces (cf. Prop.
1.6.3). The set F = P ∩ H is called a facet of P.
(a) Show that each facet is a polyhedral set.
(b) Show that the number of distinct facets of P is finite.
(c) Show that each extreme point of P, viewed as a singleton set, is a facet.
(d) Show that if dim(P) > 0, there exists a facet of P whose dimension is
dim(P) − 1.
1.6.20

Let f : ℜ^n → (−∞, ∞] be a polyhedral function and let P ⊂ ℜ^n be a polyhedral
set. Show that if inf_{x∈P} f(x) is finite, then the set of minimizers of f over P is
nonempty. Hint: Use Exercise 1.6.8(b), and replace the problem of minimizing
f over P by an equivalent linear program.
1.6.21

Show that an m × n matrix A is totally unimodular if and only if every subset J
of {1,...,n} can be partitioned into two subsets J_1 and J_2 such that

    | Σ_{j∈J_1} a_ij − Σ_{j∈J_2} a_ij | ≤ 1,   i = 1,...,m.
1.6.22

Let A be a matrix with components that are either 1, −1, or 0. Suppose that
A has at most two nonzero components in each column. Show that A is totally
unimodular.
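Total unimodularity can be tested by brute force on small matrices, using the definition directly: every square submatrix must have determinant in {−1, 0, 1}. The sketch below (the incidence-like test matrix is my own made-up example) applies this to a matrix of the kind discussed around these exercises:

```python
# Brute-force total-unimodularity check: every square submatrix must
# have determinant -1, 0, or 1.  Exponential in the matrix size, so
# only suitable for tiny examples like the one below.

from itertools import combinations

def det(M):                  # cofactor expansion; fine for small matrices
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def totally_unimodular(A):
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                sub = [[A[i][j] for j in cols] for i in rows]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True

# Incidence-like matrix: each column has at most one +1 and one -1.
A = [[ 1,  1,  0,  0],
     [-1,  0,  1,  0],
     [ 0, -1, -1,  1]]
assert totally_unimodular(A)
assert not totally_unimodular([[1, 1], [-1, 1]])   # has a 2x2 det of 2
```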
1.6.23
Let A be a matrix with components that are either 0 or 1. Suppose that in each
column, all the components that are equal to 1 appear consecutively. Show that
A is totally unimodular.
Sec. 1.7 Subgradients 139
1.6.24

Let A be a square invertible matrix such that all the components of A are integer.
Show that A is unimodular if and only if the solution of the system Ax = b
has integer components for every vector b that has integer components. Hint:
To prove that A is unimodular, use the systems Ax = u_i, where u_i is the i-th
unit vector, to show that A^{−1} has integer components. Then use the equality
det(A) det(A^{−1}) = 1.
1.7 SUBGRADIENTS
Much of optimization theory revolves around comparing the value of the
cost function at a given point with its values at neighboring points. This
calls for an analysis approach that uses derivatives, as we have seen in
Section 1.5. When the cost function is nondifferentiable, this approach breaks
down, but fortunately, it turns out that in the case of a convex cost function
there is a convenient substitute: the notion of directional differentiability and
the related notion of a subgradient, which are the subject of this section.
1.7.1 Directional Derivatives
Convex sets and functions can be characterized in many ways by their
behavior along lines. For example, a set is convex if and only if its intersection
with any line is convex, a convex set is bounded if and only if its intersection
with every line is bounded, a function is convex if and only if it is
convex along any line, and a convex function is coercive if and only if it
is coercive along any line. Similarly, it turns out that the differentiability
properties of a convex function are determined by the corresponding properties
along lines. With this in mind, we first consider convex functions of
a single variable.
Let I be an interval of real numbers, and let f : I → ℜ be convex. If
x, y, z ∈ I and x < y < z, then we can show the relation

    (f(y) − f(x))/(y − x) ≤ (f(z) − f(x))/(z − x) ≤ (f(z) − f(y))/(z − y),   (1.51)

which is illustrated in Fig. 1.7.1. For a formal proof, note that, using the
definition of a convex function [cf. Eq. (1.3)], we obtain

    f(y) ≤ ((y − x)/(z − x)) f(z) + ((z − y)/(z − x)) f(x),
and either of the desired inequalities follows by appropriately rearranging
terms.

Figure 1.7.1. Illustration of the inequalities (1.51): the three slopes
(f(y) − f(x))/(y − x), (f(z) − f(x))/(z − x), and (f(z) − f(y))/(z − y).
The rate of change of the function f is nondecreasing with its argument.
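As a quick numerical sanity check (not part of the text), the following sketch verifies the three-slope inequality (1.51) for the arbitrary convex choice f(t) = t² over many triples x < y < z:

```python
# Numeric check of the three-slope inequality (1.51) for the convex
# function f(t) = t**2 (an arbitrary example) on a grid of triples
# x < y < z.

def f(t):
    return t * t

def slope(p, q):
    return (f(q) - f(p)) / (q - p)

for xi in range(-10, 10):
    for yi in range(xi + 1, 11):
        for zi in range(yi + 1, 12):
            x, y, z = xi / 7.0, yi / 7.0, zi / 7.0
            assert slope(x, y) <= slope(x, z) + 1e-12
            assert slope(x, z) <= slope(y, z) + 1e-12
```

For f(t) = t² the slope between p and q is exactly p + q, so the inequality chain reduces to x + y ≤ x + z ≤ y + z, which makes the check an easy hand-verification as well.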
Let a and b be the infimum and the supremum, respectively, of I,
also referred to as the left and right end points of I, respectively. For any
x ∈ I that is not equal to the right end point, and for any α > 0 such that
x + α ∈ I, we define

    s+(x, α) = (f(x + α) − f(x))/α.

Let 0 < α ≤ α'. We use Eq. (1.51), with y = x + α and z = x + α', to
obtain s+(x, α) ≤ s+(x, α'). Therefore, s+(x, α) is a nondecreasing function
of α and, as α decreases to zero, it converges either to a finite number or
to −∞. Let f+(x) be the value of the limit, which we call the right derivative
of f at the point x. Similarly, for any x ∈ I that is not equal to the left
end point, and for any α > 0 such that x − α ∈ I, we define

    s−(x, α) = (f(x) − f(x − α))/α.

By a symmetrical argument, s−(x, α) is a nonincreasing function of α, and
its limit as α decreases to zero, denoted f−(x), is called the left derivative
of f at the point x. If a ∈ I or b ∈ I, we use the convention f−(a) = −∞
and f+(b) = ∞. The basic facts about the differentiability properties of
one-dimensional convex functions can be easily visualized, and are given
in the following proposition.
Proposition 1.7.1: Let I ⊂ ℜ be an interval with end points a and
b, and let f : I → ℜ be a convex function.
(a) We have f−(x) ≤ f+(x) for every x ∈ I.
(b) If x belongs to the interior of I, then f+(x) and f−(x) are finite.
(c) If x, z ∈ I and x < z, then f+(x) ≤ f−(z).
(d) The functions f−, f+ : I → [−∞, +∞] are nondecreasing.
(e) The function f+ (respectively, f−) is right- (respectively, left-)
continuous at every x ∈ I that is not the right (respectively,
left) end point of I.
(f) f is differentiable at an interior point x of I if and only if
f+(x) = f−(x), in which case both are equal to the derivative
f'(x) = (df/dx)(x).
(g) For any x, z ∈ I and any d satisfying f−(x) ≤ d ≤ f+(x), we
have

    f(z) ≥ f(x) + d(z − x).

(h) The function f+ : I → (−∞, ∞] [respectively, f− : I → [−∞, ∞)]
is upper (respectively, lower) semicontinuous at every x ∈ I.
Proof: (a) If x is an end point of I, the result is trivial because f−(a) = −∞
and f+(b) = ∞. We assume that x is an interior point, we let α > 0,
and use Eq. (1.51), with x − α, x, and x + α in place of x, y, and z, to
obtain s−(x, α) ≤ s+(x, α). Taking the limit as α decreases to zero, we
obtain f−(x) ≤ f+(x).
(b) Let x belong to the interior of I and let α > 0 be such that x − α ∈ I
and x + α ∈ I. Then f−(x) ≥ s−(x, α) > −∞ and f+(x) ≤ s+(x, α) < ∞.
Since f−(x) ≤ f+(x) by part (a), both f−(x) and f+(x) are finite.
(c) Let x, z ∈ I with x < z, and apply Eq. (1.51) with y = (x + z)/2 to
obtain s+(x, (z − x)/2) ≤ s−(z, (z − x)/2). The result then follows because
f+(x) ≤ s+(x, (z − x)/2) and s−(z, (z − x)/2) ≤ f−(z).
(d) This follows by combining parts (a) and (c).
(e) Fix some x ∈ I, x ≠ b, and some positive δ and α such that x + δ + α < b.
We allow x to be equal to a, in which case f is assumed to be continuous
at a. We have f+(x + δ) ≤ s+(x + δ, α). We take the limit, as δ decreases
to zero, to obtain lim_{δ↓0} f+(x + δ) ≤ s+(x, α). We have used here the fact
that s+(x, α) is a continuous function of x, which is a consequence of the
continuity of f (Prop. 1.2.12). We now let α decrease to zero to obtain
lim_{δ↓0} f+(x + δ) ≤ f+(x). The reverse inequality is also true because f+
is nondecreasing, and this proves the right-continuity of f+. The proof for
f− is similar.
(f) This is immediate from the definition of f+ and f−.
(g) Fix some x, z ∈ I. The result is trivially true for x = z. We only
consider the case x < z; the proof for the case x > z is similar. Since
s+(x, α) is nondecreasing in α, we have

    (f(z) − f(x))/(z − x) ≥ s+(x, α)

for α belonging to (0, z − x). Letting α decrease to zero, we obtain
(f(z) − f(x))/(z − x) ≥ f+(x) ≥ d, and the result follows.
(h) This follows from parts (a), (d), (e), and the defining property of
semicontinuity (Definition 1.1.5). Q.E.D.
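The one-sided derivatives above are easy to compute numerically. The sketch below (my own worked example, f(x) = |x|) checks the monotonicity of the difference quotient s+(x, α) in α and the values f+(0) = 1 and f−(0) = −1, consistent with part (a):

```python
# Difference quotients of the convex function f(x) = |x| at x = 0:
# s_plus(0, a) is nondecreasing in a, its limit f_plus(0) is +1, and
# f_minus(0) = -1 <= f_plus(0), as Prop. 1.7.1(a) requires.

def f(x):
    return abs(x)

def s_plus(x, a):
    return (f(x + a) - f(x)) / a

def s_minus(x, a):
    return (f(x) - f(x - a)) / a

alphas = [2.0 ** -k for k in range(10)]
for big, small in zip(alphas, alphas[1:]):
    assert s_plus(0.0, small) <= s_plus(0.0, big)   # nondecreasing in alpha

assert s_plus(0.0, 1e-6) == 1.0     # f_plus(0) = 1
assert s_minus(0.0, 1e-6) == -1.0   # f_minus(0) = -1
assert s_minus(0.0, 1e-6) <= s_plus(0.0, 1e-6)
```

For |x| the quotients are exactly ±1 for every α > 0, so the limits are attained immediately; for a strictly convex function the quotients would decrease strictly as α shrinks.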
We will now discuss notions of directional differentiability of multidimensional
real-valued functions (the extended real-valued case will be
discussed later). Given a function f : ℜ^n → ℜ, a point x ∈ ℜ^n, and a vector
y ∈ ℜ^n, we say that f is directionally differentiable at x in the direction
y if there is a scalar f'(x; y) such that

    f'(x; y) = lim_{α↓0} (f(x + αy) − f(x))/α.

We call f'(x; y) the directional derivative of f at x in the direction y. We
say that f is differentiable at x if it is directionally differentiable at x and
f'(x; y) is a linear function of y denoted by

    f'(x; y) = ∇f(x)'y,

where ∇f(x) is the gradient of f at x.
An interesting property of real-valued convex functions is that they
are directionally differentiable at all points x. This is a consequence of the
directional differentiability of scalar convex functions, as can be seen from
the relation

    f'(x; y) = lim_{α↓0} (f(x + αy) − f(x))/α = lim_{α↓0} (F_y(α) − F_y(0))/α = F_y+(0),   (1.52)

where F_y+(0) is the right derivative of the convex scalar function

    F_y(α) = f(x + αy)

at α = 0. Note that the above calculation also shows that the left derivative
F_y−(0) of F_y is equal to −f'(x; −y), and by Prop. 1.7.1(a), we have
F_y−(0) ≤ F_y+(0), or equivalently,

    −f'(x; −y) ≤ f'(x; y) = inf_{α>0} (f(x + αy) − f(x))/α.
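Relation (1.52) also gives a practical recipe: reduce to one dimension along the ray x + αy and take a small-α difference quotient. The sketch below (the example function is my own choice) applies it to the convex function f(x) = max(x₁, x₂):

```python
# Directional derivative via the one-dimensional restriction of
# Eq. (1.52): f'(x; y) is the right derivative at 0 of
# F_y(a) = f(x + a*y).  Example: f(x) = max(x1, x2), a convex function;
# at x = (0, 0) both pieces are active, so f'(x; y) = max(y1, y2).

def f(x):
    return max(x[0], x[1])

def dir_deriv(f, x, y, a=1e-8):
    xa = tuple(xi + a * yi for xi, yi in zip(x, y))
    return (f(xa) - f(x)) / a

d = dir_deriv(f, (0.0, 0.0), (1.0, 2.0))
assert abs(d - 2.0) < 1e-6
```

Note that at (0, 0) the function is not differentiable: f'(x; y) = max(y₁, y₂) is not linear in y, which is exactly the situation subgradients are designed to handle.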
The following proposition generalizes the upper semicontinuity property
of right derivatives of scalar convex functions [Prop. 1.7.1(h)], and
shows that if f is differentiable, then its gradient is continuous.

Proposition 1.7.2: Let f : ℜ^n → ℜ be convex, and let {f_k} be a
sequence of convex functions f_k : ℜ^n → ℜ with the property that
lim_{k→∞} f_k(x_k) = f(x) for every x ∈ ℜ^n and every sequence {x_k} that
converges to x. Then for any x ∈ ℜ^n and y ∈ ℜ^n, and any sequences
{x_k} and {y_k} converging to x and y, respectively, we have

    limsup_{k→∞} f_k'(x_k; y_k) ≤ f'(x; y).   (1.54)

Furthermore, if f is differentiable over ℜ^n, then its gradient ∇f is
continuous over ℜ^n.

Proof: From Eq. (1.52), we have

    f'(x; y) = inf_{α>0} (f(x + αy) − f(x))/α < ∞.

Hence, for any γ > f'(x; y), there is an α > 0 such that
(f(x + αy) − f(x))/α < γ, and, since f_k(x_k + αy_k) → f(x + αy) and
f_k(x_k) → f(x), we have

    (f_k(x_k + αy_k) − f_k(x_k))/α < γ

for all sufficiently large k, and using Eq. (1.52), we obtain

    limsup_{k→∞} f_k'(x_k; y_k) < γ.

Since this is true for all γ > f'(x; y), Eq. (1.54) follows.
If f is differentiable over ℜ^n, then by taking f_k = f for all k in Eq.
(1.54), we obtain, for every y ∈ ℜ^n and every sequence {x_k} converging
to x,

    limsup_{k→∞} ∇f(x_k)'y = limsup_{k→∞} f'(x_k; y) ≤ f'(x; y) = ∇f(x)'y.

By replacing y by −y in the preceding argument, we obtain

    liminf_{k→∞} ∇f(x_k)'y = −limsup_{k→∞} (−∇f(x_k)'y) ≥ ∇f(x)'y.

Therefore, ∇f(x_k)'y → ∇f(x)'y for every y ∈ ℜ^n, which implies that
∇f(x_k) → ∇f(x). Hence, ∇f is continuous. Q.E.D.

Given a convex function f : ℜ^n → ℜ, we say that a vector d ∈ ℜ^n
is a subgradient of f at a point x ∈ ℜ^n if

    f(z) ≥ f(x) + (z − x)'d,   ∀ z ∈ ℜ^n.   (1.55)
If instead f is a concave function, we say that d is a subgradient of f at
x if −d is a subgradient of the convex function −f at x. The set of all
subgradients of a convex (or concave) function f at x ∈ ℜ^n is called the
subdifferential of f at x, and is denoted by ∂f(x).
A subgradient admits an intuitive geometrical interpretation: it can
be identified with a nonvertical supporting hyperplane to the graph of the
function, as illustrated in Fig. 1.7.2. Such a hyperplane provides a linear
approximation to the function, which is an underestimate in the case of
a convex function and an overestimate in the case of a concave function.
Figure 1.7.3 provides some examples of subdifferentials.
Figure 1.7.2. Illustration of a subgradient d of a convex function f. The defining
relation (1.55) can be written as

    f(z) − z'd ≥ f(x) − x'd,   ∀ z ∈ ℜ^n.

Equivalently, the hyperplane in ℜ^{n+1} that has normal (−d, 1) and passes through
(x, f(x)) supports the epigraph of f, as shown in the figure.
The directional derivative and the subdierential of a convex function
are closely linked, as shown in the following proposition.
Figure 1.7.3. The subdifferential ∂f(x) of two scalar convex functions,
f(x) = |x| and f(x) = max{0, (1/2)(x^2 − 1)}, as a function of the argument x.
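The first example of the figure can be checked directly against the defining inequality (1.55). The sketch below tests candidate slopes d for f(x) = |x| on a grid of points z (grid and tolerances are my own choices):

```python
# The subdifferential of f(x) = |x| is [-1, 1] at x = 0 and {sign(x)}
# for x != 0.  We test candidate slopes d against the subgradient
# inequality f(z) >= f(x) + (z - x) * d on a grid of z values.

def f(x):
    return abs(x)

def is_subgradient(d, x, zs):
    return all(f(z) >= f(x) + (z - x) * d - 1e-12 for z in zs)

zs = [k / 10.0 for k in range(-30, 31)]

# At x = 0, every d in [-1, 1] works and nothing outside does.
assert is_subgradient(-1.0, 0.0, zs)
assert is_subgradient(0.3, 0.0, zs)
assert is_subgradient(1.0, 0.0, zs)
assert not is_subgradient(1.1, 0.0, zs)

# At x = 2, the only subgradient is the derivative, +1.
assert is_subgradient(1.0, 2.0, zs)
assert not is_subgradient(0.9, 2.0, zs)
```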
Proposition 1.7.3: Let f : ℜ^n → ℜ be convex. For every x ∈ ℜ^n,
the following hold:
(a) A vector d is a subgradient of f at x if and only if

    f'(x; y) ≥ y'd,   ∀ y ∈ ℜ^n.

(b) The subdifferential ∂f(x) is a nonempty, convex, and compact
set, and there holds

    f'(x; y) = max_{d∈∂f(x)} y'd,   ∀ y ∈ ℜ^n.   (1.56)

In particular, f is differentiable at x with gradient ∇f(x), if and
only if it has ∇f(x) as its unique subgradient at x.
(c) If X is a bounded set, the set ∪_{x∈X} ∂f(x) is bounded.
Proof: (a) The subgradient inequality (1.55) is equivalent to

    (f(x + αy) − f(x))/α ≥ y'd,   ∀ y ∈ ℜ^n, ∀ α > 0.

Since the quotient on the left above decreases monotonically to f'(x; y) as
α ↓ 0 [Eq. (1.51)], we conclude that the subgradient inequality (1.55) is
equivalent to

    f'(x; y) ≥ y'd,   ∀ y ∈ ℜ^n.   (1.57)
(b) From Eq. (1.57), we see that ∂f(x) is the intersection of the closed
halfspaces { d | y'd ≤ f'(x; y) }, where y ranges over the nonzero vectors of
ℜ^n. It follows that ∂f(x) is closed and convex.
To show that ∂f(x) is also bounded, suppose to arrive at a contradiction
that there is a sequence {d_k} ⊂ ∂f(x) with ||d_k|| → ∞. Let y_k = d_k/||d_k||.
Then, from the subgradient inequality (1.55), we have

    f(x + y_k) ≥ f(x) + y_k'd_k = f(x) + ||d_k||,   ∀ k,

so it follows that f(x + y_k) → ∞. This is a contradiction since f is convex
and hence continuous, so it is bounded on any bounded set. Thus ∂f(x) is
bounded.
To show that ∂f(x) is nonempty and that Eq. (1.56) holds, we first
observe that Eq. (1.57) implies that f'(x; y) ≥ max_{d∈∂f(x)} y'd [where the
maximum is −∞ if ∂f(x) is empty]. To show the reverse inequality, take
any x and y in ℜ^n, and consider the subset of ℜ^{n+1}

    C_1 = { (μ, z) | μ > f(z) },

and the half-line

    C_2 = { (μ, z) | μ = f(x) + αf'(x; y),  z = x + αy,  α ≥ 0 };

see Fig. 1.7.4. Using the definition of directional derivative and the convexity
of f, it follows that these two sets are nonempty, convex, and disjoint.
By applying the Separating Hyperplane Theorem (Prop. 1.4.2), we see that
there exists a nonzero vector (γ, w) ∈ ℜ^{n+1} such that

    γμ + w'z ≥ γ( f(x) + αf'(x; y) ) + w'(x + αy),   ∀ α ≥ 0, z ∈ ℜ^n, μ > f(z).   (1.58)

We cannot have γ < 0, since then the left-hand side above could be made
arbitrarily small by choosing μ sufficiently large. Also, if γ = 0, then Eq.
(1.58) implies that w = 0, which is a contradiction. Therefore, γ > 0 and,
by dividing with γ in Eq. (1.58), we obtain

    μ + (z − x)'(w/γ) ≥ f(x) + αf'(x; y) + αy'(w/γ),   ∀ α ≥ 0, z ∈ ℜ^n, μ > f(z).   (1.59)

By setting α = 0 in the above relation and by taking the limit as μ ↓ f(z),
we obtain

    f(z) ≥ f(x) + (z − x)'(−w/γ),   ∀ z ∈ ℜ^n,
Figure 1.7.4. Illustration of the sets C_1 and C_2 used in the hyperplane
separation argument of the proof of Prop. 1.7.3(b).
so that the vector d = −w/γ is a subgradient of f at x; in particular, ∂f(x)
is nonempty. Furthermore, by setting z = x in Eq. (1.59) and letting
μ ↓ f(x), we obtain αf'(x; y) + αy'(w/γ) ≤ 0 for all α ≥ 0, so that

    f'(x; y) ≤ y'd ≤ max_{d∈∂f(x)} y'd,

and Eq. (1.56) follows. If f is differentiable at x, then f'(x; y) = ∇f(x)'y for
all y, and by Eq. (1.57), d ∈ ∂f(x) if and only if ∇f(x)'y ≥ y'd for all y,
which holds if and only if d = ∇f(x); thus ∇f(x) is the unique subgradient
at x. Conversely, if d is the unique subgradient at x, then Eq. (1.56) yields
f'(x; y) = y'd for all y, so that f is differentiable at x with ∇f(x) = d.
(c) Suppose to arrive at a contradiction that there exist sequences {x_k} ⊂ X
and {d_k} with d_k ∈ ∂f(x_k) and ||d_k|| → ∞. Let y_k = d_k/||d_k||. By part
(a), we have

    f'(x_k; y_k) ≥ d_k'y_k = ||d_k||,

so it follows that f'(x_k; y_k) → ∞. This contradicts, however, Eq. (1.54)
(with f_k = f), which, along a subsequence over which {x_k} and {y_k}
converge to some x and y, requires that limsup_{k→∞} f'(x_k; y_k) ≤ f'(x; y) < ∞.
Finally, note that if {x_k} converges to x and d_k ∈ ∂f(x_k) for all k,
then by part (a),

    y'd_k ≤ f'(x_k; y),   ∀ y ∈ ℜ^n.

If d is a limit point of {d_k}, we have, by taking the limit in the above relation
and by using Prop. 1.7.2,

    y'd ≤ limsup_{k→∞} f'(x_k; y) ≤ f'(x; y),   ∀ y ∈ ℜ^n,

so that d ∈ ∂f(x). Q.E.D.
1
, z '
n
,
f
2
(z) f
2
(x) + (z x)
d
2
, z '
n
,
so by adding, we obtain
f(z) f(x) + (z x)
(d
1
+ d
2
), z '
n
.
Hence d
1
+ d
2
f(x), implying that f
1
(x) + f
2
(x) f(x).
To prove the reverse inclusion, suppose to arrive at a contradiction
that there exists a d ∈ ∂f(x) such that d ∉ ∂f_1(x) + ∂f_2(x). Since by Prop.
1.7.3(b), the sets ∂f_1(x) and ∂f_2(x) are compact, the set ∂f_1(x) + ∂f_2(x)
is compact (cf. Prop. 1.2.16), and by Prop. 1.4.3, there exists a hyperplane
strictly separating d from ∂f_1(x) + ∂f_2(x), i.e., a vector y and a scalar b
such that

    y'(d_1 + d_2) < b < y'd,   ∀ d_1 ∈ ∂f_1(x), d_2 ∈ ∂f_2(x).

Therefore,

    max_{d_1∈∂f_1(x)} y'd_1 + max_{d_2∈∂f_2(x)} y'd_2 < y'd,

and by Prop. 1.7.3(b),

    f_1'(x; y) + f_2'(x; y) < y'd.
By using the definition of directional derivative, we have f_1'(x; y) + f_2'(x; y) =
f'(x; y), so that

    f'(x; y) < y'd,

which is a contradiction in view of Prop. 1.7.3(a). Q.E.D.
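The sum rule can be tested in one dimension by checking candidate slopes against the subgradient inequality. In the hypothetical example below, f₁(x) = |x| and f₂(x) = |x − 1|, so at x = 0 the sum rule predicts ∂f(0) = [−1, 1] + {−1} = [−2, 0]:

```python
# Sum rule of Prop. 1.7.4 for f = f1 + f2 with f1(x) = |x| and
# f2(x) = |x - 1|.  At x = 0: sub f1(0) = [-1, 1], sub f2(0) = {-1},
# so sub f(0) should be [-2, 0].  We test slopes d directly against
# the subgradient inequality f(z) >= f(0) + z * d.

def f(x):
    return abs(x) + abs(x - 1)

zs = [k / 10.0 for k in range(-40, 41)]

def is_subgradient(d):
    return all(f(z) >= f(0.0) + z * d - 1e-12 for z in zs)

assert is_subgradient(-2.0) and is_subgradient(-1.0) and is_subgradient(0.0)
assert not is_subgradient(-2.1) and not is_subgradient(0.1)
```

The grid check only certifies the inequality on sampled points, but for this piecewise-linear f the interval endpoints −2 and 0 can also be verified by hand.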
We close this section with some versions of the chain rule for directional
derivatives and subgradients.
Proposition 1.7.5: (Chain Rule)
(a) Let f : ℜ^m → ℜ be a convex function and let A be an m × n
matrix. Then the subdifferential of the function F defined by

    F(x) = f(Ax)

is given by

    ∂F(x) = A'∂f(Ax) = { A'g | g ∈ ∂f(Ax) }.

(b) Let f : ℜ^n → ℜ be a convex function and let g : ℜ → ℜ be a
smooth scalar function. Then the function F defined by

    F(x) = g(f(x))

is directionally differentiable at all x, and its directional derivative
is given by

    F'(x; y) = ∇g(f(x)) f'(x; y),   ∀ x ∈ ℜ^n, y ∈ ℜ^n.

Proof: (a) From the definition of directional derivative, we have
F'(x; y) = f'(Ax; Ay) for all x and y. By Prop. 1.7.3(a), a vector g belongs
to ∂f(Ax) if and only if

    z'g ≤ f'(Ax; z),   ∀ z ∈ ℜ^m,
and in particular,

    (Ay)'g ≤ f'(Ax; Ay) = F'(x; y),   ∀ y ∈ ℜ^n.

Hence, for every g ∈ ∂f(Ax), we have (A'g)'y ≤ F'(x; y) for all y, which,
by Prop. 1.7.3(a), implies that A'g ∈ ∂F(x), so that A'∂f(Ax) ⊂ ∂F(x).
To prove the reverse inclusion, suppose to arrive at a contradiction
that there exists a d ∈ ∂F(x) such that d ∉ A'∂f(Ax). The set A'∂f(Ax)
is compact [cf. Prop. 1.7.3(b)], so by Prop. 1.4.3, there exist a vector y and
a scalar b such that

    y'(A'g) < b < y'd,   ∀ g ∈ ∂f(Ax).

From this we obtain

    max_{g∈∂f(Ax)} (Ay)'g < y'd,

or, by using Prop. 1.7.3(b),

    f'(Ax; Ay) < y'd.

Since f'(Ax; Ay) = F'(x; y), it follows that F'(x; y) < y'd,
which is a contradiction in view of Prop. 1.7.3(a).
(b) We have

    F'(x; y) = lim_{α↓0} (F(x + αy) − F(x))/α = lim_{α↓0} (g(f(x + αy)) − g(f(x)))/α.   (1.62)

From the convexity of f it follows that there are three possibilities: (1)
for some ᾱ > 0, f(x + αy) = f(x) for all α ∈ (0, ᾱ]; (2) for some ᾱ > 0,
f(x + αy) > f(x) for all α ∈ (0, ᾱ]; (3) for some ᾱ > 0, f(x + αy) < f(x)
for all α ∈ (0, ᾱ].
In case (1), from Eq. (1.62), we have F'(x; y) = 0, and also

    f'(x; y) = lim_{α↓0} (f(x + αy) − f(x))/α = 0,

so the desired formula holds. In cases (2) and (3), for all sufficiently small
α we can write

    (g(f(x + αy)) − g(f(x)))/α = [(g(f(x + αy)) − g(f(x)))/(f(x + αy) − f(x))] · [(f(x + αy) − f(x))/α].

As α ↓ 0, we have f(x + αy) → f(x), so the preceding equation yields

    F'(x; y) = ∇g(f(x)) f'(x; y).

Q.E.D.
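Part (a) of the chain rule can be checked numerically in a simple case. In the sketch below (the matrix A and the test point are my own hypothetical choices), f is the ℓ₁ norm; at a point where both coordinates of Ax are nonzero, f is differentiable at Ax with gradient g = sign(Ax), and the chain rule predicts ∇F(x) = A'g:

```python
# Chain rule of Prop. 1.7.5(a): F(x) = f(Ax) with f(u) = |u1| + |u2|.
# Where both coordinates of Ax are nonzero, grad F(x) = A' sign(Ax),
# which we compare against finite differences.

def sign(t):
    return (t > 0) - (t < 0)

A = [[2.0, 1.0],
     [0.0, 3.0]]

def Ax(x):
    return [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

def F(x):
    u = Ax(x)
    return abs(u[0]) + abs(u[1])

x = [1.0, 1.0]                       # Ax = (3, 3): both coordinates > 0
g = [sign(u) for u in Ax(x)]         # gradient of f at Ax
grad_F = [sum(A[i][j] * g[i] for i in range(2)) for j in range(2)]  # A'g

h = 1e-6
fd = [(F([x[0] + h, x[1]]) - F(x)) / h,
      (F([x[0], x[1] + h]) - F(x)) / h]
assert all(abs(fd[j] - grad_F[j]) < 1e-4 for j in range(2))
```

At points where some coordinate of Ax vanishes, ∂f(Ax) is a set rather than a single gradient, and ∂F(x) = A'∂f(Ax) is then the image of that set under A'.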
Given a convex function f : ℜ^n → ℜ and a scalar ε > 0, we say that
a vector d ∈ ℜ^n is an ε-subgradient of f at a point x ∈ ℜ^n if

    f(z) ≥ f(x) + (z − x)'d − ε,   ∀ z ∈ ℜ^n.   (1.63)
If instead f is a concave function, we say that d is an ε-subgradient of f
at x if −d is an ε-subgradient of the convex function −f at x. The set of all
ε-subgradients of a convex (or concave) function f at x ∈ ℜ^n is called the
ε-subdifferential of f at x, and is denoted by ∂_ε f(x). The ε-subdifferential
is illustrated geometrically in Fig. 1.7.5.
ε-subgradients find several applications in nondifferentiable optimization,
particularly in the context of computational methods. Some of their
important properties are given in the following proposition.
152 Convex Analysis and Optimization Chap. 1
Figure 1.7.5. Illustration of the ε-subdifferential ∂_ε f(x) of a convex one-dimensional function f : ℜ → ℜ. The ε-subdifferential is a bounded interval, and corresponds to the set of slopes indicated in the figure. Its left endpoint is

    f⁻_ε(x) = sup_{β<0} [f(x + β) − f(x) + ε]/β,

and its right endpoint is

    f⁺_ε(x) = inf_{β>0} [f(x + β) − f(x) + ε]/β.
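The endpoint formulas in the caption can be evaluated numerically. As a sketch (an assumed example, not from the text), for f(x) = x² the ε-subdifferential works out in closed form to the interval [2x − 2√ε, 2x + 2√ε], and a grid approximation of the sup and inf reproduces it:

```python
import numpy as np

# Grid approximation of the endpoint formulas of Figure 1.7.5 for the
# assumed example f(x) = x^2; in closed form the eps-subdifferential at x
# is the interval [2x - 2*sqrt(eps), 2x + 2*sqrt(eps)].

def f(u):
    return u ** 2

def eps_subdiff_endpoints(x, eps):
    beta = np.linspace(1e-4, 10.0, 200000)                 # grid for |beta|
    left = np.max((f(x - beta) - f(x) + eps) / (-beta))    # sup over beta < 0
    right = np.min((f(x + beta) - f(x) + eps) / beta)      # inf over beta > 0
    return left, right

x, eps = 1.0, 0.25
left, right = eps_subdiff_endpoints(x, eps)
print(left, right)   # approximately 1.0 and 3.0
```

The sup and inf are attained at β = ∓√ε, well inside the grid, so the approximation is accurate to the grid resolution.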
Proposition 1.7.6: Let f : ℜⁿ → ℜ be convex and let ε be a positive scalar. For every x ∈ ℜⁿ, the following hold:

(a) The ε-subdifferential ∂_ε f(x) is a nonempty, convex, and compact set, and for all y ∈ ℜⁿ we have

    inf_{α>0} [f(x + αy) − f(x) + ε]/α = max_{d∈∂_ε f(x)} y'd.

(b) We have 0 ∈ ∂_ε f(x) if and only if

    f(x) ≤ inf_{z∈ℜⁿ} f(z) + ε.

(c) If a direction y is such that y'd < 0 for all d ∈ ∂_ε f(x), then

    inf_{α>0} f(x + αy) < f(x) − ε.

(d) If 0 ∉ ∂_ε f(x), then the direction y = −d̄, where d̄ is the vector of minimum norm in ∂_ε f(x), satisfies y'd < 0 for all d ∈ ∂_ε f(x).

(e) If f is equal to the sum f₁ + ⋯ + f_m of convex functions f_j : ℜⁿ → ℜ, j = 1, …, m, then

    ∂_ε f(x) ⊆ ∂_ε f₁(x) + ⋯ + ∂_ε f_m(x) ⊆ ∂_{mε} f(x).
Proof: (a) We have d ∈ ∂_ε f(x) if and only if

    f(x + αy) ≥ f(x) + αy'd − ε,  ∀ α > 0, ∀ y ∈ ℜⁿ.  (1.64)

Hence d ∈ ∂_ε f(x) if and only if

    inf_{α>0} [f(x + αy) − f(x) + ε]/α ≥ y'd,  ∀ y ∈ ℜⁿ.  (1.65)

It follows that ∂_ε f(x) is the intersection of the closed halfspaces

    { d | inf_{α>0} [f(x + αy) − f(x) + ε]/α ≥ y'd }

as y ranges over ℜⁿ, so ∂_ε f(x) is closed and convex. Moreover, taking α = 1 in Eq. (1.64) and letting y range over the unit ball, we see that y'd ≤ max_{‖u‖≤1} f(x + u) − f(x) + ε for all y with ‖y‖ ≤ 1. Hence ∂_ε f(x) is bounded.

To show that ∂_ε f(x) is nonempty and satisfies

    inf_{α>0} [f(x + αy) − f(x) + ε]/α = max_{d∈∂_ε f(x)} y'd,  ∀ y ∈ ℜⁿ,

we argue similar to the proof of Prop. 1.7.3(b). Consider the subset of ℜⁿ⁺¹

    C₁ = { (μ, z) | μ > f(z) },

and the half-line

    C₂ = { (μ, z) | μ = f(x) − ε + β inf_{α>0} [f(x + αy) − f(x) + ε]/α, z = x + βy, β ≥ 0 }.

These sets are nonempty and convex. They are also disjoint, since we have for all (μ, z) ∈ C₂

    μ = f(x) − ε + β inf_{α>0} [f(x + αy) − f(x) + ε]/α
      ≤ f(x) − ε + β · [f(x + βy) − f(x) + ε]/β
      = f(x + βy)
      = f(z).

Hence there exists a hyperplane separating them, i.e., a vector (γ, w) ≠ (0, 0) such that for all β ≥ 0, z ∈ ℜⁿ, and μ > f(z),

    γμ + w'z ≤ γ( f(x) − ε + β inf_{α>0} [f(x + αy) − f(x) + ε]/α ) + w'(x + βy).  (1.66)

We cannot have γ > 0, since then the left-hand side above could be made arbitrarily large by choosing μ to be sufficiently large. Also, if γ = 0, Eq. (1.66) implies that w = 0, contradicting the fact that (γ, w) ≠ (0, 0). Therefore, γ < 0 and after dividing Eq. (1.66) by γ, we obtain for all β ≥ 0, z ∈ ℜⁿ, and μ > f(z),

    μ + (w/γ)'(z − x) ≥ f(x) − ε + β( inf_{α>0} [f(x + αy) − f(x) + ε]/α + (w/γ)'y ).  (1.67)

Taking the limit above as μ ↓ f(z) and setting β = 0, we obtain

    f(z) ≥ f(x) + (−w/γ)'(z − x) − ε,  ∀ z ∈ ℜⁿ.

Hence −w/γ belongs to ∂_ε f(x), which is therefore nonempty. Also, by taking z = x and letting μ ↓ f(x) in Eq. (1.67), we obtain

    β( inf_{α>0} [f(x + αy) − f(x) + ε]/α + (w/γ)'y ) ≤ ε,  ∀ β ≥ 0.

Since β can be chosen as large as desired, we see that

    (−w/γ)'y ≥ inf_{α>0} [f(x + αy) − f(x) + ε]/α.

Combining this relation with Eq. (1.65), we obtain

    max_{d∈∂_ε f(x)} d'y = inf_{α>0} [f(x + αy) − f(x) + ε]/α.
(b) By definition, 0 ∈ ∂_ε f(x) if and only if f(z) ≥ f(x) − ε for all z ∈ ℜⁿ, which is equivalent to inf_{z∈ℜⁿ} f(z) ≥ f(x) − ε.

(c) Assume that a direction y is such that

    max_{d∈∂_ε f(x)} y'd < 0,  (1.68)

while inf_{α>0} f(x + αy) ≥ f(x) − ε. Then f(x + αy) ≥ f(x) − ε for all α > 0, or equivalently

    [f(x + αy) − f(x) + ε]/α ≥ 0,  ∀ α > 0.

Consequently, using part (a), we have

    max_{d∈∂_ε f(x)} y'd = inf_{α>0} [f(x + αy) − f(x) + ε]/α ≥ 0,

which contradicts Eq. (1.68).

(d) The vector d̄ of minimum norm in ∂_ε f(x) is the projection of the zero vector on the convex and compact set ∂_ε f(x). If 0 ∉ ∂_ε f(x), we have d̄ ≠ 0, while by the Projection Theorem,

    (d − d̄)'d̄ ≥ 0,  ∀ d ∈ ∂_ε f(x).

Hence

    d'd̄ ≥ ‖d̄‖² > 0,  ∀ d ∈ ∂_ε f(x),

so that the direction y = −d̄ satisfies y'd < 0 for all d ∈ ∂_ε f(x).
(e) It will suffice to prove the result for the case where f = f₁ + f₂. If d₁ ∈ ∂_ε f₁(x) and d₂ ∈ ∂_ε f₂(x), then from Eq. (1.64), we have

    f₁(x + αy) ≥ f₁(x) + αy'd₁ − ε,  ∀ α > 0, ∀ y ∈ ℜⁿ,

    f₂(x + αy) ≥ f₂(x) + αy'd₂ − ε,  ∀ α > 0, ∀ y ∈ ℜⁿ,

so by adding, we obtain

    f(x + αy) ≥ f(x) + αy'(d₁ + d₂) − 2ε,  ∀ α > 0, ∀ y ∈ ℜⁿ.

Hence, by Eq. (1.64), d₁ + d₂ ∈ ∂_{2ε} f(x), so that

    ∂_ε f₁(x) + ∂_ε f₂(x) ⊆ ∂_{2ε} f(x).

To prove that ∂_ε f(x) ⊆ ∂_ε f₁(x) + ∂_ε f₂(x), we use an argument similar to the one of the proof of Prop. 1.7.4(b). Suppose, to arrive at a contradiction, that there exists a d ∈ ∂_ε f(x) such that d ∉ ∂_ε f₁(x) + ∂_ε f₂(x). Since by part (a), the sets ∂_ε f₁(x) and ∂_ε f₂(x) are compact, the set ∂_ε f₁(x) + ∂_ε f₂(x) is compact (cf. Prop. 1.2.16), and by Prop. 1.4.3, there exists a hyperplane strictly separating d from ∂_ε f₁(x) + ∂_ε f₂(x), i.e., a vector y and a scalar b such that

    y'(d₁ + d₂) < b < y'd,  ∀ d₁ ∈ ∂_ε f₁(x), ∀ d₂ ∈ ∂_ε f₂(x).

From this we obtain

    max_{d₁∈∂_ε f₁(x)} y'd₁ + max_{d₂∈∂_ε f₂(x)} y'd₂ < y'd,

and by part (a),

    inf_{α>0} [f₁(x + αy) − f₁(x) + ε]/α + inf_{α>0} [f₂(x + αy) − f₂(x) + ε]/α < y'd.

Let α₁ and α₂ be positive scalars such that

    [f₁(x + α₁y) − f₁(x) + ε]/α₁ + [f₂(x + α₂y) − f₂(x) + ε]/α₂ < y'd,  (1.69)

and define

    ᾱ = 1/(1/α₁ + 1/α₂).

As a consequence of the convexity of f_j, j = 1, 2, the ratio [f_j(x + αy) − f_j(x)]/α is monotonically nondecreasing in α. Thus, since ᾱ ≤ α_j, we have

    [f_j(x + ᾱy) − f_j(x)]/ᾱ ≤ [f_j(x + α_j y) − f_j(x)]/α_j,  j = 1, 2,

and Eq. (1.69) together with the definition of ᾱ (which implies ε/α₁ + ε/α₂ = ε/ᾱ) yields

    y'd > [f₁(x + α₁y) − f₁(x)]/α₁ + [f₂(x + α₂y) − f₂(x)]/α₂ + ε/ᾱ
        ≥ [f₁(x + ᾱy) − f₁(x)]/ᾱ + [f₂(x + ᾱy) − f₂(x)]/ᾱ + ε/ᾱ
        = [f(x + ᾱy) − f(x) + ε]/ᾱ
        ≥ inf_{α>0} [f(x + αy) − f(x) + ε]/α.

This contradicts Eq. (1.65), thus implying that ∂_ε f(x) ⊆ ∂_ε f₁(x) + ∂_ε f₂(x). Q.E.D.
Parts (b)-(d) of Prop. 1.7.6 contain the elements of an important algorithm (called the ε-descent algorithm, and introduced by Bertsekas and Mitter [BeM71], [BeM73]) for minimizing convex functions to within a tolerance of ε. At a given point x, we check whether 0 ∈ ∂_ε f(x). If this is so, then by Prop. 1.7.6(b), x is an ε-optimal solution. If not, then by going along the direction opposite to the vector of minimum norm in ∂_ε f(x), we are guaranteed a cost improvement of at least ε. An implementation of this algorithm due to Lemarechal [Lem74], [Lem75] is given in the exercises.
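As a sketch of the idea (a hypothetical one-dimensional example, not from the text), take f(x) = |x|; a direct computation with the endpoint formulas of Fig. 1.7.5 gives ∂_ε f(x) = [1 − ε/x, 1] for x > 0, so 0 ∉ ∂_ε f(x) exactly when x > ε, and a step opposite to the minimum-norm element improves the cost by more than ε:

```python
import numpy as np

# One step of the eps-descent idea on the assumed example f(x) = |x|.
# At x > eps the eps-subdifferential is [1 - eps/x, 1]; its minimum-norm
# element is d = 1 - eps/x > 0, and a line search along -d improves the
# cost by more than eps, as Prop. 1.7.6(c) guarantees.

f = abs
x, eps = 2.0, 0.5
d = 1.0 - eps / x                 # minimum-norm element of [1 - eps/x, 1]
assert d > 0                      # hence 0 is not in the eps-subdifferential

alpha = np.linspace(0.0, 10.0, 100001)
best = float(np.min(f(x - alpha * d)))   # approximates inf over alpha > 0
print(best, f(x) - eps)                  # best lies below f(x) - eps
```

Here the exact line minimum is 0, well below f(x) − ε = 1.5, so the guaranteed improvement of at least ε is comfortably achieved.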
1.7.4 Subgradients of Extended Real-Valued Functions
We have focused so far on real-valued convex functions f : ℜⁿ → ℜ, which are defined over the entire space ℜⁿ and are convex over ℜⁿ. The notion of a subdifferential and a subgradient of an extended real-valued convex function f : ℜⁿ → (−∞, ∞] can be developed along similar lines. In particular, a vector d is a subgradient of f at a vector x ∈ dom(f) [i.e., with f(x) < ∞] if the subgradient inequality holds, i.e.,

    f(z) ≥ f(x) + (z − x)'d,  ∀ z ∈ ℜⁿ.
If the extended real-valued function f : ℜⁿ → [−∞, ∞) is concave, d is said to be a subgradient of f at a vector x with f(x) > −∞ if −d is a subgradient of the convex function −f at x.

The subdifferential ∂f(x) is the set of all subgradients of the convex (or concave) function f. By convention, ∂f(x) is considered empty for all x outside the effective domain of f. Note that contrary to the case of real-valued functions, ∂f(x) may be empty, or closed but unbounded. For example, the extended real-valued convex function given by

    f(x) = −√x if 0 ≤ x ≤ 1,  f(x) = ∞ otherwise,

has the subdifferential

    ∂f(x) = { −1/(2√x) } if 0 < x < 1,  ∂f(x) = [−1/2, ∞) if x = 1,  ∂f(x) = ∅ if x ≤ 0 or 1 < x.
Thus, ∂f(x) can be empty and can be unbounded at points x that belong to the effective domain of f (as in the cases x = 0 and x = 1, respectively, of the above example). However, it can be shown that ∂f(x) is nonempty and compact at points x that are interior points of the effective domain of f, as also illustrated by the above example.
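The subdifferential formulas of the example can be verified directly from the subgradient inequality. The following grid check (a sketch) confirms two of them, at the interior point x = 0.25 and at the boundary point x = 1:

```python
import numpy as np

# Grid check of the subgradient inequality f(z) >= f(x) + d*(z - x) on the
# effective domain [0, 1] of the example f(x) = -sqrt(x), for
# d = -1/(2*sqrt(x)) at x = 0.25 and d = -1/2 at x = 1.

def f(z):
    return -np.sqrt(z)

z = np.linspace(0.0, 1.0, 10001)

for x, d in [(0.25, -1.0 / (2.0 * np.sqrt(0.25))), (1.0, -0.5)]:
    # small slack guards against floating-point rounding at equality points
    assert np.all(f(z) >= f(x) + d * (z - x) - 1e-12)
print("subgradient inequalities hold on [0, 1]")
```

Outside [0, 1] the inequality is trivial since f(z) = ∞ there, so checking on the effective domain suffices.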
Similarly, given ε > 0, a vector d is an ε-subgradient of f at a vector x such that f(x) < ∞ if

    f(z) ≥ f(x) + (z − x)'d − ε,  ∀ z ∈ ℜⁿ.

The ε-subdifferential ∂_ε f(x) is the set of all ε-subgradients of f. Figure 1.7.6 illustrates the definition of ∂_ε f(x) for the case of a one-dimensional function f.
Figure 1.7.6. Illustration of the ε-subdifferential ∂_ε f(x) of a one-dimensional function f : ℜ → (−∞, ∞] with effective domain D; the slopes indicated in the figure are the endpoints of the ε-subdifferential at x. Its left endpoint is

    f⁻_ε(x) = sup_{β<0, x+β∈D} [f(x + β) − f(x) + ε]/β  if inf D < x,  and −∞ if inf D = x,

and its right endpoint is

    f⁺_ε(x) = inf_{β>0, x+β∈D} [f(x + β) − f(x) + ε]/β  if x < sup D,  and ∞ if x = sup D.

Note that these endpoints can be −∞ (as in the figure on the right) or ∞. For ε = 0, the above formulas also give the endpoints of the subdifferential ∂f(x). Note that while ∂_ε f(x) is nonempty for all x in the interior of D, it may be empty for x at the relative boundary of D (as in the figure on the right).
One can provide generalized versions of the results of Props. 1.7.3-1.7.6 within the context of extended real-valued convex functions, but with appropriate adjustments and additional assumptions to deal with cases where ∂f(x) may be empty or noncompact. Some of these generalizations are discussed in the exercises.

1.7.5 Directional Derivative of the Max Function

As mentioned in Section 1.3.4, the max function f(x) = max_{z∈Z} φ(x, z) arises in a variety of interesting contexts, including duality. It is therefore important to characterize this function, and the following proposition gives the directional derivative and the subdifferential of f for the case where φ(·, z) is convex for all z ∈ Z.
Proposition 1.7.7: (Danskin's Theorem) Let Z ⊆ ℜᵐ be a compact set, and let φ : ℜⁿ × Z → ℜ be continuous and such that φ(·, z) : ℜⁿ → ℜ is convex for each z ∈ Z.

(a) The function f : ℜⁿ → ℜ given by

    f(x) = max_{z∈Z} φ(x, z)  (1.70)

is convex and has directional derivative given by

    f'(x; y) = max_{z∈Z(x)} φ'(x, z; y),  (1.71)

where φ'(x, z; y) is the directional derivative of the function φ(·, z) at x in the direction y, and Z(x) is the set of maximizing points in Eq. (1.70),

    Z(x) = { z̄ | φ(x, z̄) = max_{z∈Z} φ(x, z) }.

In particular, if Z(x) consists of a unique point z̄ and φ(·, z̄) is differentiable at x, then f is differentiable at x, and ∇f(x) = ∇_x φ(x, z̄), where ∇_x φ(x, z̄) is the vector with coordinates

    ∂φ(x, z̄)/∂x_i,  i = 1, …, n.

(b) If φ(·, z) is differentiable for all z ∈ Z and ∇_x φ(x, ·) is continuous on Z for each x, then

    ∂f(x) = conv{ ∇_x φ(x, z) | z ∈ Z(x) },  ∀ x ∈ ℜⁿ.

(c) The conclusion of part (a) also holds if, instead of assuming that Z is compact, we assume that Z(x) is nonempty for all x ∈ ℜⁿ, and that φ and Z are such that for every convergent sequence {x_k}, there exists a bounded sequence {z_k} with z_k ∈ Z(x_k) for all k.
Proof: (a) The convexity of f has been established in Prop. 1.2.3(c). We note that since φ is continuous and Z is compact, the set Z(x) is nonempty by Weierstrass' Theorem (Prop. 1.3.1) and f takes real values. For any z ∈ Z(x), y ∈ ℜⁿ, and α > 0, we use the definition of f to obtain

    [f(x + αy) − f(x)]/α ≥ [φ(x + αy, z) − φ(x, z)]/α.

Taking the limit as α decreases to zero, we obtain f'(x; y) ≥ φ'(x, z; y). Since this is true for every z ∈ Z(x), we conclude that

    f'(x; y) ≥ sup_{z∈Z(x)} φ'(x, z; y).  (1.72)

To prove the reverse inequality, let {α_k} be a sequence with α_k ↓ 0 and let x_k = x + α_k y. For each k, let z_k be a vector in Z(x_k). Since {z_k} belongs to the compact set Z, it has a subsequence converging to some z̄ ∈ Z. Without loss of generality, we assume that the entire sequence {z_k} converges to z̄. We have

    φ(x_k, z_k) ≥ φ(x_k, z),  ∀ z ∈ Z,

so by taking the limit as k → ∞ and by using the continuity of φ, we obtain

    φ(x, z̄) ≥ φ(x, z),  ∀ z ∈ Z.

Therefore, z̄ ∈ Z(x). We now have

    f'(x; y) ≤ [f(x + α_k y) − f(x)]/α_k
             = [φ(x + α_k y, z_k) − φ(x, z̄)]/α_k
             ≤ [φ(x + α_k y, z_k) − φ(x, z_k)]/α_k
             = [φ(x + α_k y, z_k) − φ(x + α_k y − α_k y, z_k)]/α_k
             ≤ −φ'(x + α_k y, z_k; −y)
             ≤ φ'(x + α_k y, z_k; y),  (1.73)

where the last inequality follows from inequality (1.53). By letting k → ∞, we obtain

    f'(x; y) ≤ limsup_{k→∞} φ'(x + α_k y, z_k; y) ≤ φ'(x, z̄; y),

where the right inequality in the preceding relation follows from Prop. 1.4.2 with f_k(·) = φ(·, z_k), x_k = x + α_k y, and y_k = y. This relation together with inequality (1.72) proves Eq. (1.71).
For the last statement of part (a), if Z(x) consists of the unique point z̄, Eq. (1.71) and the differentiability assumption on φ yield

    f'(x; y) = φ'(x, z̄; y) = y'∇_x φ(x, z̄),  ∀ y ∈ ℜⁿ,

which implies that f is differentiable at x with ∇f(x) = ∇_x φ(x, z̄).

(b) By Eq. (1.71) and the differentiability assumption on φ, we have

    f'(x; y) = max_{z∈Z(x)} ∇_x φ(x, z)'y,  ∀ y ∈ ℜⁿ,

while by Prop. 1.7.3, we have

    f'(x; y) = max_{d∈∂f(x)} d'y,  ∀ y ∈ ℜⁿ.

For all z ∈ Z(x) and y ∈ ℜⁿ, we have

    f(y) = max_{z̃∈Z} φ(y, z̃)
         ≥ φ(y, z)
         ≥ φ(x, z) + ∇_x φ(x, z)'(y − x)
         = f(x) + ∇_x φ(x, z)'(y − x).

Therefore, ∇_x φ(x, z) is a subgradient of f at x, implying that

    conv{ ∇_x φ(x, z) | z ∈ Z(x) } ⊆ ∂f(x).
To prove the reverse inclusion, we use a hyperplane separation argument. By the continuity of φ(x, ·) and the compactness of Z, we see that Z(x) is compact, so by the continuity of ∇_x φ(x, ·), the set { ∇_x φ(x, z) | z ∈ Z(x) } is compact. By Prop. 1.2.8, it follows that conv{ ∇_x φ(x, z) | z ∈ Z(x) } is compact. If d ∈ ∂f(x) while d ∉ conv{ ∇_x φ(x, z) | z ∈ Z(x) }, by the Strict Separation Theorem (Prop. 1.4.3), there exist y ≠ 0 and γ ∈ ℜ, such that

    d'y > γ > ∇_x φ(x, z)'y,  ∀ z ∈ Z(x).

Therefore, we have

    d'y > max_{z∈Z(x)} ∇_x φ(x, z)'y = f'(x; y),

contradicting Prop. 1.7.3. Thus ∂f(x) ⊆ conv{ ∇_x φ(x, z) | z ∈ Z(x) } and the proof is complete.

(c) The proof of this part is nearly identical to the proof of part (a). Q.E.D.
A simple but important application of the above proposition is when Z is a finite set and φ is linear in x for all z. In this case, f(x) can be represented as

    f(x) = max{ a₁'x + b₁, …, a_m'x + b_m },

where a₁, …, a_m are given vectors in ℜⁿ and b₁, …, b_m are given scalars. Then ∂f(x) is the convex hull of the set of all vectors a_i such that

    a_i'x + b_i = max{ a₁'x + b₁, …, a_m'x + b_m }.
E X E R C I S E S
1.7.1 (Properties of Directional Derivative)

Let f : ℜⁿ → ℜ be a convex function and let x ∈ ℜⁿ be a fixed vector. Then for the directional derivative function f'(x; ·) : ℜⁿ → ℜ the following hold.

(a) g'(y; w) = lim_{α↓0, z→w} [g(y + αz) − g(y)]/α.

Then the composite function F(x) = g(f(x)) is directionally differentiable at x and the following chain rule applies:

    F'(x; d) = g'(f(x); f'(x; d)),  ∀ d ∈ ℜⁿ.
1.7.3

Let f : ℜⁿ → ℜ be a convex function. Show that a vector d ∈ ℜⁿ is a subgradient of f at x if and only if the function d'y − f(y) attains its maximum at y = x.

    |f(x) − f(y)| ≤ L‖x − y‖,  ∀ x, y ∈ X.

Show also that

    f'(x; y) ≤ L‖y‖,  ∀ x ∈ X, ∀ y ∈ ℜⁿ.

Hint: Use the boundedness property of the subdifferential (Prop. 1.7.3).
1.7.6 (Polar Cones of Level Sets)

Let f : ℜⁿ → ℜ be a convex function and let x ∈ ℜⁿ be a fixed vector.

(a) Show that

    { y | f'(x; y) ≤ 0 } = ( cone(∂f(x)) )*.

(b) Assuming that the level set { z | f(z) ≤ f(x) } is nonempty, show that the normal cone of { z | f(z) ≤ f(x) } at the point x coincides with cone(∂f(x)).

1.7.7

Let f : ℜⁿ → (−∞, ∞] be a convex function. Show that ∂f(x) is nonempty if x belongs to the relative interior of dom(f). Show also that ∂f(x) is nonempty and bounded if and only if x belongs to the interior of dom(f).

1.7.8

Let f_i : ℜⁿ → (−∞, ∞], i = 1, …, m, be convex functions, and let f = f₁ + ⋯ + f_m. Show that

    ∂f₁(x) + ⋯ + ∂f_m(x) ⊆ ∂f(x),  ∀ x ∈ ℜⁿ.

Furthermore, assuming that the relative interiors of dom(f_i) have a nonempty intersection, show that

    ∂f₁(x) + ⋯ + ∂f_m(x) = ∂f(x).

Hint: Follow the line of proof of Prop. 1.7.8.
1.7.9

Let C₁, …, C_m be convex sets in ℜⁿ such that ri(C₁) ∩ ⋯ ∩ ri(C_m) is nonempty. Show that

    N_{C₁∩⋯∩C_m}(x) = N_{C₁}(x) + ⋯ + N_{C_m}(x).

Hint: For each i, let f_i(x) = 0 when x ∈ C_i and f_i(x) = ∞ otherwise. Then use Exercises 1.7.4(c) and 1.7.8.

1.7.10

Let f : ℜⁿ → ℜ be a convex function. Show that

    ∩_{ε>0} ∂_ε f(x) = ∂f(x),  ∀ x ∈ ℜⁿ.

1.7.11

Let f : ℜⁿ → ℜ be a convex function and X ⊆ ℜⁿ a bounded set. Show that for every ε > 0, the set ∪_{x∈X} ∂_ε f(x) is bounded.
1.7.12 (Continuity of ε-Subdifferential [Nur77])

Let f : ℜⁿ → ℜ be a convex function and ε > 0 be a scalar. For every x ∈ ℜⁿ, the following hold:

(a) If a sequence {x_k} converges to x and d_k ∈ ∂_ε f(x_k) for all k, then the sequence {d_k} is bounded and each of its limit points is an ε-subgradient of f at x.

(b) If d ∈ ∂_ε f(x), then for every sequence {x_k} converging to x, there is a sequence {d_k} such that d_k ∈ ∂_ε f(x_k) and d_k → d.

Note: This exercise shows that the point-to-set mapping x ↦ ∂_ε f(x) is continuous. In particular, part (a) shows that the mapping x ↦ ∂_ε f(x) is upper semicontinuous, while part (b) shows that this mapping is lower semicontinuous.
1.7.13 (Steepest Descent Directions of Convex Functions)

Let f : ℜⁿ → ℜ be a convex function and fix a vector x ∈ ℜⁿ. Show that a vector d̄ is the vector of minimum norm in ∂f(x) if and only if either d̄ = 0, or else −d̄/‖d̄‖ minimizes f'(x; d) over all d with ‖d‖ ≤ 1 and f'(x; −d̄/‖d̄‖) < 0.
This exercise provides a method for generating a descent direction in circumstances where obtaining a single subgradient at x is relatively easy.

Assume that x does not minimize f and let g₁ be some subgradient of f at x. For k = 2, 3, …, let w_k be the vector of minimum norm in the convex hull of g₁, …, g_{k−1},

    w_k = arg min_{g ∈ conv{g₁,…,g_{k−1}}} ‖g‖.

If −w_k is a descent direction, stop; else let g_k be an element of ∂f(x) such that

    g_k'w_k = min_{g∈∂f(x)} g'w_k = −f'(x; −w_k).

Show that this process terminates in a finite number of steps with a descent direction. Hint: If −w_k is not a descent direction, then g_i'w_k ≥ ‖w_k‖² > 0 for all i = 1, …, k − 1, while g_k'w_k ≤ 0.

Show that this process will terminate in a finite number of steps with either an improvement of the value of f by at least ε, or by confirmation that x is an ε-optimal solution.
1.8 OPTIMALITY CONDITIONS
In this section we generalize the constrained optimality conditions of Section 1.5 to problems involving a nondifferentiable but convex cost function. We will first use directional derivatives to provide a necessary and sufficient condition for optimality in the problem of minimizing a convex function f : ℜⁿ → ℜ over a convex set X ⊆ ℜⁿ. It can be seen that x* is a global minimum of f over X if and only if

    f'(x*; x − x*) ≥ 0,  ∀ x ∈ X.

This follows using the definition (1.52) of directional derivative and the fact that the difference quotient

    [f(x* + α(x − x*)) − f(x*)]/α

is a monotonically nondecreasing function of α. The following proposition expresses this condition in terms of subgradients.

Proposition 1.8.1: Let f : ℜⁿ → ℜ be a convex function. Then x* minimizes f over a convex set X ⊆ ℜⁿ if and only if there exists a subgradient d ∈ ∂f(x*) such that

    d'(x − x*) ≥ 0,  ∀ x ∈ X.

Equivalently, x* minimizes f over X if and only if

    0 ∈ ∂f(x*) + T_X(x*)*,

where T_X(x*) is the tangent cone of X at x*.
Proof: Suppose that for some d ∈ ∂f(x*) we have d'(x − x*) ≥ 0 for all x ∈ X. Then, by the subgradient inequality, f(x) − f(x*) ≥ d'(x − x*) ≥ 0 for all x ∈ X, so x* minimizes f over X.

Conversely, suppose that x* minimizes f over X. Consider the set of feasible directions of X at x*

    F_X(x*) = { w ≠ 0 | x* + αw ∈ X for some α > 0 },

and the cone

    W = { d | d'w ≥ 0, ∀ w ∈ F_X(x*) }.

If ∂f(x*) ∩ W = ∅, then, since ∂f(x*) is compact and W is closed, by Prop. 1.4.3 there exists a hyperplane strictly separating ∂f(x*) from W, i.e., a vector y and a scalar c such that

    g'y < c < d'y,  ∀ g ∈ ∂f(x*), ∀ d ∈ W.

Using the fact that W is a closed cone, it follows that

    c < 0 ≤ d'y,  ∀ d ∈ W,  (1.74)

which when combined with the preceding inequality, also yields

    max_{g∈∂f(x*)} g'y < c < 0.

Thus, using Prop. 1.7.3(b), we have f'(x*; y) < 0. On the other hand, Eq. (1.74) shows that y belongs to the polar cone of W, which (using the convexity of X) is the closure of cone(F_X(x*)). Hence there is a sequence {y_k} of positive multiples of feasible directions of X at x* with y_k → y, and by the continuity of f'(x*; ·) we have f'(x*; y_k) < 0 for all sufficiently large k, which contradicts the optimality of x*, since each y_k points from x* into X. Hence ∂f(x*) ∩ W ≠ ∅, i.e., there exists d ∈ ∂f(x*) with d'(x − x*) ≥ 0 for all x ∈ X.
Sec. 1.8 Optimality Conditions 169
The last statement follows from the convexity of X, which implies that

    T_X(x*)* = { z | z'(x − x*) ≤ 0, ∀ x ∈ X },

so that 0 ∈ ∂f(x*) + T_X(x*)* if and only if there exists a d ∈ ∂f(x*) with d'(x − x*) ≥ 0 for all x ∈ X. Q.E.D.

In the special case where X = ℜⁿ, we obtain a basic necessary and sufficient condition for unconstrained optimality of x*:

    0 ∈ ∂f(x*).

This optimality condition is also evident from the subgradient inequality (1.55).
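Both conditions are easy to check numerically on small instances. In the following sketch (an assumed example, not from the text), f(u) = |u − 1| is minimized over X = [0, 0.5]; the minimum is attained at x* = 0.5, and the condition f'(x*; x − x*) ≥ 0 is verified over a sample of X by one-sided difference quotients:

```python
import numpy as np

# Check of the optimality condition f'(x*; x - x*) >= 0 for all x in X,
# for the assumed problem: minimize f(u) = |u - 1| over X = [0, 0.5].
# The minimum over X is at x* = 0.5, where f is locally affine with
# slope -1, so f'(x*; y) = -y.

f = lambda u: abs(u - 1.0)
xstar = 0.5

def dir_deriv(y, t=1e-8):
    # one-sided difference quotient approximating f'(x*; y)
    return (f(xstar + t * y) - f(xstar)) / t

for x in np.linspace(0.0, 0.5, 11):
    assert dir_deriv(x - xstar) >= -1e-9
print("f'(x*; x - x*) >= 0 holds for all sampled x in X")
```

Every x ∈ X satisfies x − x* ≤ 0, and f'(x*; y) = −y there, so the directional derivative is nonnegative over the feasible set, as Prop. 1.8.1 requires.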
We finally extend the optimality conditions of Props. 1.6.2 and 1.8.1 to the case where the cost function involves a (possibly nonconvex) smooth component and a convex (possibly nondifferentiable) component.
Proposition 1.8.2: Let x* be a local minimum of a function f : ℜⁿ → ℜ over a set X ⊆ ℜⁿ. Assume that the tangent cone T_X(x*) is convex, and that f has the form

    f(x) = f₁(x) + f₂(x),

where f₁ is convex and f₂ is smooth. Then

    −∇f₂(x*) ∈ ∂f₁(x*) + T_X(x*)*.
Proof: The proof is analogous to the one of Prop. 1.6.2. Let y be a nonzero tangent of X at x*. Then there exist a sequence {ξ_k} and a sequence {x_k} ⊆ X such that x_k ≠ x* for all k,

    ξ_k → 0,  x_k → x*,

and

    (x_k − x*)/‖x_k − x*‖ = y/‖y‖ + ξ_k.

We write this equation as

    x_k − x* = (‖x_k − x*‖/‖y‖) y_k,  (1.75)

where

    y_k = y + ‖y‖ ξ_k.

By the convexity of f₁, we have

    −f₁'(x_k; x* − x_k) ≤ f₁'(x_k; x_k − x*)

and

    f₁(x_k) + f₁'(x_k; x* − x_k) ≤ f₁(x*),

and by adding these inequalities, we obtain

    f₁(x_k) ≤ f₁(x*) + f₁'(x_k; x_k − x*),  ∀ k.

Also, by the Mean Value Theorem, we have

    f₂(x_k) = f₂(x*) + ∇f₂(x̃_k)'(x_k − x*),  ∀ k,

where x̃_k is a vector on the line segment joining x_k and x*. By adding the last two relations,

    f(x_k) ≤ f(x*) + f₁'(x_k; x_k − x*) + ∇f₂(x̃_k)'(x_k − x*),  ∀ k,

so using Eq. (1.75), we obtain

    f(x_k) ≤ f(x*) + (‖x_k − x*‖/‖y‖)( f₁'(x_k; y_k) + ∇f₂(x̃_k)'y_k ),  ∀ k.  (1.76)
We now show by contradiction that f₁'(x*; y) + ∇f₂(x*)'y ≥ 0. Indeed, if f₁'(x*; y) + ∇f₂(x*)'y < 0, then since x̃_k → x* and y_k → y, it follows, using also Prop. 1.7.2, that for all sufficiently large k,

    f₁'(x_k; y_k) + ∇f₂(x̃_k)'y_k < 0,

and [from Eq. (1.76)] f(x_k) < f(x*), which contradicts the local optimality of x*.

We have thus shown that f₁'(x*; y) + ∇f₂(x*)'y ≥ 0 for all y ∈ T_X(x*), or equivalently, by Prop. 1.7.3(b),

    max_{d∈∂f₁(x*)} d'y + ∇f₂(x*)'y ≥ 0,  ∀ y ∈ T_X(x*),

or equivalently [since T_X(x*) is a cone]

    min_{‖y‖≤1, y∈T_X(x*)} max_{d∈∂f₁(x*)} ( d + ∇f₂(x*) )'y = 0.
Sec. 1.9 Notes and Sources 171
We can now apply the Saddle Point Theorem (Prop. 1.3.8; the convexity/concavity and compactness assumptions of that proposition are satisfied) to assert that there exists a d̄ ∈ ∂f₁(x*) such that

    min_{‖y‖≤1, y∈T_X(x*)} ( d̄ + ∇f₂(x*) )'y = 0.

This implies that

    −( d̄ + ∇f₂(x*) ) ∈ T_X(x*)*,

which in turn is equivalent to −∇f₂(x*) ∈ ∂f₁(x*) + T_X(x*)*. Q.E.D.
Note that in the special case where f₁(x) ≡ 0, we obtain Prop. 1.6.2. The convexity assumption on T_X(x*) is satisfied in particular when X is convex. In this case, if x* is a local minimum of f over X, there exists a subgradient d ∈ ∂f₁(x*) such that

    ( d + ∇f₂(x*) )'(x − x*) ≥ 0,  ∀ x ∈ X.

This is because for a convex set X, we have z ∈ T_X(x*)* if and only if z'(x − x*) ≤ 0 for all x ∈ X.
E X E R C I S E S
1.8.1

Consider the problem of minimizing a convex function f : ℜⁿ → ℜ over the polyhedral set

    X = { x | a_j'x ≤ b_j, j = 1, …, r }.

Show that x* is an optimal solution if and only if there exist scalars μ₁, …, μ_r such that

(i) μ_j ≥ 0 for all j, and μ_j = 0 for all j such that a_j'x* < b_j;

(ii) 0 ∈ ∂f(x*) + Σ_{j=1}^r μ_j a_j.

Hint: Characterize the cone T_X(x*)*, and use Prop. 1.8.1.