Outline:
• Approximate TV de-noising

    minimize ‖x − x^cor‖₂² + µ ∑_{i=1}^{n−1} |x_{i+1} − x_i|
• Approximate TV denoising: minimize ‖x − x^cor‖₂² + µ φatv(x), where

    φatv(x) = ∑_{i=1}^{n−1} ( √(ε² + (x_{i+1} − x_i)²) − ε )

• φatv is separable in the differences: with u_i = x_{i+1} − x_i and
  f(u) = √(ε² + u²) − ε, it has the form

    F(u_1, . . . , u_{n−1}) = ∑_{i=1}^{n−1} f(u_i)
• Therefore the Hessian of the smoothed objective is tridiagonal
% Newton's method
ALPHA = 0.01;
BETA = 0.5;
MAXITERS = 100;
NTTOL = 1e-10;
D = diff(speye(n));            % (n-1) x n forward difference matrix
x = zeros(n,1);
newt_dec = [];
for iter = 1:MAXITERS
  d = D*x;
  val = (x-xcor)'*(x-xcor) + MU*sum(sqrt(EPSILON^2+d.^2)-EPSILON*ones(n-1,1));
  grad = 2*(x-xcor) + MU*D'*(d./sqrt(EPSILON^2+d.^2));
  hess = 2*speye(n) + MU*D'*spdiags(EPSILON^2./(EPSILON^2+d.^2).^(3/2),0,n-1,n-1)*D;
  v = -hess\grad;
  lambdasqr = -grad'*v;
  newt_dec = [newt_dec sqrt(lambdasqr)];
  if (lambdasqr/2 < NTTOL), break; end;
  % backtracking line search
  t = 1;
  while ((x+t*v-xcor)'*(x+t*v-xcor) + ...
         MU*sum(sqrt(EPSILON^2+(D*(x+t*v)).^2)-...
         EPSILON*ones(n-1,1)) > val - ALPHA*t*lambdasqr)
    t = BETA*t;
  end;
  x = x+t*v;
end;
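For reference, the same iteration can be sketched in Python/NumPy; this is a translation of the Matlab above, not the original code, and the function name, test signal, and parameter values are invented for illustration (mu and eps play the roles of µ and ε):

```python
import numpy as np

def tv_denoise_newton(xcor, mu=50.0, eps=0.001,
                      alpha=0.01, beta=0.5, maxiters=100, nttol=1e-10):
    """Newton's method for: minimize ||x - xcor||^2 + mu * phi_atv(x)."""
    n = len(xcor)
    # forward-difference matrix: (D x)_i = x_{i+1} - x_i
    D = np.diff(np.eye(n), axis=0)
    f = lambda z: (z - xcor) @ (z - xcor) + mu * np.sum(
        np.sqrt(eps**2 + (D @ z)**2) - eps)
    x = np.zeros(n)
    for _ in range(maxiters):
        d = D @ x
        s = np.sqrt(eps**2 + d**2)
        grad = 2 * (x - xcor) + mu * (D.T @ (d / s))
        hess = 2 * np.eye(n) + mu * (D.T @ ((eps**2 / s**3)[:, None] * D))
        v = np.linalg.solve(hess, -grad)
        lambdasqr = -grad @ v                  # Newton decrement squared
        if lambdasqr / 2 < nttol:
            break
        t = 1.0                                # backtracking line search
        while f(x + t * v) > f(x) - alpha * t * lambdasqr:
            t *= beta
        x = x + t * v
    return x
```

The objective is strongly convex (the 2I term in the Hessian), so the damped Newton iteration converges from any starting point.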
[Figure: Newton decrement λ(x) versus iteration number (semilog scale), dropping from about 10 to below 10⁻⁵ in roughly 12 iterations.]
[Figures: the corrupted signal x^cor and the reconstruction x^tikh, plotted against t for 0 ≤ t ≤ 5000, with signal values between −3 and 3.]
• Since AR = U SV T ,
V T RT (P + AT A)RV = V T RT P RV + S T S = I
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
Ax = b
• Approximation improves as t → ∞
• Logarithmic barrier function: φ(x) = − ∑_{i=1}^m log(−fi(x)), with
  dom φ = {x | f1(x) < 0, . . . , fm(x) < 0}
• Log-barrier function: I(x) = − log(x − 2) − log(4 − x)

[Figure 1: f0 + (1/t)I for t = 10⁻¹, 10⁻⁰·⁸, 10⁻⁰·⁶, . . . , 10.]
    ∇φ(x) = ∑_{i=1}^m (1/(−fi(x))) ∇fi(x),

    ∇²φ(x) = ∑_{i=1}^m (1/fi(x)²) ∇fi(x)∇fi(x)ᵀ + ∑_{i=1}^m (1/(−fi(x))) ∇²fi(x)
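These formulas can be sanity-checked against finite differences. A small sketch with affine constraints fi(x) = aiᵀx − bi, so that ∇fi = ai and the ∇²fi term vanishes; the data here are made up:

```python
import numpy as np

def barrier(A, b, x):
    """phi(x) = -sum_i log(b_i - a_i^T x) for constraints a_i^T x - b_i <= 0."""
    return -np.sum(np.log(b - A @ x))

def barrier_grad(A, b, x):
    # grad phi = sum_i (1/(-f_i(x))) a_i = A^T (1 / (b - A x))
    return A.T @ (1.0 / (b - A @ x))

def barrier_hess(A, b, x):
    # hess phi = sum_i (1/f_i(x)^2) a_i a_i^T  (the grad^2 f_i term is zero)
    d = 1.0 / (b - A @ x)
    return A.T @ (d[:, None]**2 * A)
```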
Administrative info:
• office hours: Tue 5:30 - 7:30pm, Wed 4:00 - 8:00pm, Packard 106
• newsgroup: su.class.ee364
Important sets
• subspace
• affine set
• convex set
• cones
y = θ1x1 + · · · + θk xk is a linear combination of x1, . . . , xk; with 1ᵀθ = 1 it is an affine combination, with 1ᵀθ = 1 and θ ⪰ 0 a convex combination, and with θ ⪰ 0 a conic combination
• intersection: the intersection of subspaces is a subspace, the
  intersection of affine sets is affine, and the intersection of convex
  cones is a convex cone;

    Sα convex for α ∈ A  =⇒  ∩_{α∈A} Sα is convex

• dual cone: K* = { y | xᵀy ≥ 0 for all x ∈ K }

[Figure: a cone K and its dual cone K*; the boundary rays of K* make 90° angles with the boundary rays of K.]
C = {x ∈ Rn | xT Ax + bT x + c ≤ 0},
Operations that preserve convexity
• Convex sets and convex functions are related via the epigraph.
1. Show that f(x) = ∑_{i=1}^r αi x[i] is a convex function of x, where
   α1 ≥ α2 ≥ · · · ≥ αr ≥ 0, and x[i] denotes the ith largest component of
   x. (You can use the fact that f(x) = ∑_{i=1}^k x[i] is convex on Rⁿ.)

Solution:
1. We can express f as

    f(x) = ∑_{k=1}^r (αk − α_{k+1})(x[1] + · · · + x[k]),   with α_{r+1} = 0,

   a nonnegative weighted sum of the convex functions ∑_{i=1}^k x[i], which
   is convex in x.
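A quick numeric check of this decomposition (the function names below are invented for illustration):

```python
import numpy as np

def f_direct(alpha, x):
    # sum_i alpha_i * (i-th largest component of x), alpha nonincreasing >= 0
    xs = np.sort(x)[::-1]
    return alpha @ xs[:len(alpha)]

def f_as_sum(alpha, x):
    # the same value, written as a nonnegative combination of the convex
    # functions S_k(x) = sum of the k largest components of x
    xs = np.sort(x)[::-1]
    S = np.cumsum(xs)                       # S[k-1] = x[1] + ... + x[k]
    w = alpha - np.append(alpha[1:], 0.0)   # alpha_k - alpha_{k+1} >= 0
    return w @ S[:len(alpha)]
```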
1. f(x) = − log(− log ∑_{i=1}^m e^{aiᵀx+bi}) on
   dom f = {x | ∑_{i=1}^m e^{aiᵀx+bi} < 1}. You can use the fact that
   log(∑_{i=1}^n e^{yi}) is convex.

2. f(x, u, v) = −√(uv − xᵀx) on
   dom f = {(x, u, v) | uv > xᵀx, u, v > 0}. Use the fact that xᵀx/u is
   convex in (x, u) for u > 0, and that −√(x1x2) is convex on R²₊₊.

3. f(x, u, v) = − log(uv − xᵀx) on
   dom f = {(x, u, v) | uv > xᵀx, u, v > 0}.
1. g(x) = log(∑_{i=1}^m e^{aiᵀx+bi}) is convex (composition of the
   log-sum-exp function and an affine mapping), so −g is concave. The
   function h(y) = − log y is convex and decreasing. Therefore
   f(x) = h(−g(x)) is convex.

2. We can express f as f(x, u, v) = −√(u(v − xᵀx/u)). The function
   h(x1, x2) = −√(x1x2) is convex on R²₊₊, and decreasing in each
   argument. The functions g1(u, v, x) = u and g2(u, v, x) = v − xᵀx/u
   are concave. Therefore f(u, v, x) = h(g(u, v, x)) is convex.
3. We can express f as f(x, u, v) = − log u − log(v − xᵀx/u). The first
   term is convex; the second is the composition of the convex decreasing
   function − log with the concave function v − xᵀx/u, hence convex.

Solution:

    = tr( Z⁻¹ Q (I + tΛ)⁻¹ Qᵀ )
    = tr( Qᵀ Z⁻¹ Q (I + tΛ)⁻¹ )
    = ∑_{i=1}^n (Qᵀ Z⁻¹ Q)ii (1 + tλi)⁻¹,
i=1
• Can solve any problem that obeys the disciplined convex programming
ruleset
http://www.stanford.edu/~boyd/cvx/
cvx_begin
constraints
cvx_end
cvx_begin
variable x(n)
minimize(norm(A*x-b))
cvx_end
cvx_begin
variable x(n)
minimize(ones(1,n)*x)
subject to
A*x == b
x >= 0
cvx_end
cvx_begin
variable x(n)
minimize(max(max([A*x inv_pos(A*x)]')))
subject to
x >= 0
x <= 1
cvx_end
EE364 §3
Outline
• Generalized eigenvalues
• Hyperbolic constraints
• Homework hints
Generalized eigenvalues

    λmax(X, Y) = sup_{u≠0} (uᵀXu)/(uᵀY u),   dom f = Sⁿ × Sⁿ₊₊
Hyperbolic constraints as SOC constraints.
Problem 4.26

    xᵀx ≤ yz,  y ≥ 0,  z ≥ 0

if and only if

    ‖(2x, y − z)‖₂ ≤ y + z,  y ≥ 0,  z ≥ 0

with x ∈ Rⁿ, y, z ∈ R
Proof

    ‖(2x, y − z)‖₂ ≤ y + z, y ≥ 0, z ≥ 0
    ⇐⇒  4xᵀx + (y − z)² ≤ (y + z)², y ≥ 0, z ≥ 0   (square both sides; y + z ≥ 0)
    ⇐⇒  xᵀx ≤ yz, y ≥ 0, z ≥ 0
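A numeric spot-check of the Problem 4.26 equivalence on random sampled points (the function names are invented for illustration):

```python
import numpy as np

def hyperbolic(x, y, z):
    # x^T x <= y z, y >= 0, z >= 0
    return bool(x @ x <= y * z and y >= 0 and z >= 0)

def soc(x, y, z):
    # || (2x, y - z) ||_2 <= y + z, y >= 0, z >= 0
    lhs = np.hypot(2 * np.linalg.norm(x), y - z)
    return bool(lhs <= y + z and y >= 0 and z >= 0)
```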
Maximizing harmonic mean

    maximize ( ∑_{i=1}^m 1/(aiᵀx + bi) )⁻¹,

the problem is equivalent to

    minimize 1ᵀt
    subject to ti(aiᵀx + bi) ≥ 1, i = 1, . . . , m
               t ⪰ 0

and, writing the hyperbolic constraints as SOC constraints,

    minimize 1ᵀt
    subject to ‖(2, aiᵀx + bi − ti)‖₂ ≤ aiᵀx + bi + ti, i = 1, . . . , m
               ti ≥ 0, aiᵀx + bi ≥ 0, i = 1, . . . , m
Maximizing geometric mean

    maximize ( ∏_{i=1}^m (aiᵀx − bi) )^{1/m},

consider m = 4 as an example
the problem is equivalent to

    maximize y1y2y3y4
    subject to y = Ax − b
               y ⪰ 0,

and

    maximize t1t2
    subject to y = Ax − b
               y1y2 ≥ t1²
               y3y4 ≥ t2²
               y ⪰ 0, t1 ≥ 0, t2 ≥ 0,

and also

    maximize t
    subject to y = Ax − b
               y1y2 ≥ t1²
               y3y4 ≥ t2²
               t1t2 ≥ t²
               y ⪰ 0, t1, t2, t ≥ 0
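For fixed y ⪰ 0, tightening each hyperbolic constraint in the last formulation gives the largest feasible t; a quick check that this cascade recovers the geometric mean (helper name invented):

```python
import numpy as np

def geomean_via_tower(y):
    # tighten the hyperbolic constraints: t1 = sqrt(y1 y2), t2 = sqrt(y3 y4),
    # t = sqrt(t1 t2) is the largest t the cascade allows
    t1 = np.sqrt(y[0] * y[1])
    t2 = np.sqrt(y[2] * y[3])
    return np.sqrt(t1 * t2)
```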
expressing the three hyperbolic constraints as SOC constraints:

    minimize −t
    subject to ‖(2t1, y1 − y2)‖₂ ≤ y1 + y2, y1 ≥ 0, y2 ≥ 0
               ‖(2t2, y3 − y4)‖₂ ≤ y3 + y4, y3 ≥ 0, y4 ≥ 0
               ‖(2t, t1 − t2)‖₂ ≤ t1 + t2, t1 ≥ 0, t2 ≥ 0
               y = Ax − b
General case
Problem 3.49 log-concavity
(c) to show

    f(x) = (∏_{i=1}^n xi) / (∑_{i=1}^n xi),   dom f = Rⁿ₊₊

is log-concave, we need to show

    g(x) = log f(x) = ∑_{i=1}^n log xi − log(∑_{i=1}^n xi)

is concave on Rⁿ₊₊
Problem 3.49 (c) continued...
partial derivatives:

    ∂g(x)/∂xi = 1/xi − 1/(∑_{i=1}^n xi),

and

    ∂²g(x)/∂xi² = −1/xi² + 1/(∑_{i=1}^n xi)²,
    ∂²g(x)/∂xi∂xj = 1/(∑_{i=1}^n xi)²,   i ≠ j

therefore

    ∇²g(x) = (1/(∑_{i=1}^n xi)²) 11ᵀ − diag(1/x1², . . . , 1/xn²)
Problem 3.49 (c) continued...
to show uᵀ∇²g(x)u ≤ 0 for all u ∈ Rⁿ, i.e.,

    uᵀ( diag(1/x1², . . . , 1/xn²) − (1/(∑_{i=1}^n xi)²) 11ᵀ ) u ≥ 0,

same as

    ∑_{i=1}^n ui²/xi² ≥ (∑_{i=1}^n ui)² / (∑_{i=1}^n xi)²

which follows from the Cauchy–Schwarz inequality:

    (∑_{i=1}^n ui)² ≤ (∑_{i=1}^n ui²/xi²)(∑_{i=1}^n xi²) ≤ (∑_{i=1}^n ui²/xi²)(∑_{i=1}^n xi)²
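A numeric check of the Hessian formula and its negative semidefiniteness at a made-up test point:

```python
import numpy as np

def hess_g(x):
    # Hessian of g(x) = sum_i log x_i - log(sum_i x_i) on R^n_++:
    # (1/(sum x_i)^2) 11^T - diag(1/x_1^2, ..., 1/x_n^2)
    s = np.sum(x)
    n = len(x)
    return np.ones((n, n)) / s**2 - np.diag(1.0 / x**2)
```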
Problem 3.49 (d)
Problem 4.8
minimize cT x
subject to Ax = b
Problem 4.8 continued...

    minimize cᵀx
    subject to l ⪯ x ⪯ u,

the optimal xi* minimizes cixi subject to li ≤ xi ≤ ui
• if ci > 0, then xi* = li
• if ci < 0, then xi* = ui
• if ci = 0, then any xi in [li, ui] is optimal
the optimal value is

    p* = lᵀc⁺ − uᵀc⁻,

where ci⁺ = max{ci, 0} and ci⁻ = max{−ci, 0}
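A quick check of the closed-form optimal value (note the minus sign in front of uᵀc⁻, consistent with ci⁻ = max{−ci, 0}; the data below are made up):

```python
import numpy as np

def box_lp_opt(c, l, u):
    # closed-form optimal value of: minimize c^T x  s.t.  l <= x <= u
    cp = np.maximum(c, 0)    # c+
    cm = np.maximum(-c, 0)   # c-
    return l @ cp - u @ cm
```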
Conjugate function
for f(x) = |x|^p/p with p > 1 and y ≥ 0, the supremum in
f*(y) = sup_x (xy − f(x)) is attained where

    y − x^{p−1} = 0

and

    f*(y) = y^{p/(p−1)} − y^{p/(p−1)}/p

define q such that 1/p + 1/q = 1, then (with similar analysis for y ≤ 0),

    f*(y) = |y|^q/q
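A grid-based check of this conjugate (the grid bounds and sample values of y are arbitrary):

```python
import numpy as np

def conj_numeric(y, p, xs):
    # numerically evaluate f*(y) = sup_x (x*y - |x|^p / p) over a grid xs
    return np.max(xs * y - np.abs(xs)**p / p)
```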
Proof of Hölder's inequality

    xᵀy ≤ ‖x‖p ‖y‖q

using xᵀy ≤ f(x) + f*(y): the conjugate of

    f(x) = (1/p)‖x‖p^p

is

    f*(y) = (1/q)‖y‖q^q,

where 1/p + 1/q = 1
Proof continued ...
thus

    xᵀy ≤ (1/p)‖x‖p^p + (1/q)‖y‖q^q

apply this to x/‖x‖p, y/‖y‖q:

    xᵀy/(‖x‖p‖y‖q) ≤ (1/p)‖x/‖x‖p‖p^p + (1/q)‖y/‖y‖q‖q^q = 1/p + 1/q = 1

therefore

    xᵀy ≤ ‖x‖p ‖y‖q
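A numeric check of Hölder's inequality, including the equality case yi = |xi|^{p−1} sign(xi) (random made-up data; the function name is invented):

```python
import numpy as np

def holder_gap(x, y, p):
    # returns ||x||_p * ||y||_q - x^T y, nonnegative by Holder's inequality
    q = p / (p - 1)
    lhs = x @ y
    rhs = np.sum(np.abs(x)**p)**(1/p) * np.sum(np.abs(y)**q)**(1/q)
    return rhs - lhs
```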
EE364 Review
Outline:
• exercise 4.47
Convex optimization problems
we have seen
[Figure: thruster i, with force ui, angle θi, and position (pix, piy).]
• resulting vertical force: Fy = ∑_{i=1}^n ui sin θi
• resulting torque: T = ∑_{i=1}^n (piy ui cos θi − pix ui sin θi)
• fuel usage: u1 + · · · + un
problem: find thruster forces ui that yield given desired forces and torques
Fx^des, Fy^des, T^des, and minimize fuel usage (if feasible)

    minimize 1ᵀu
    subject to F u = f^des
               0 ≤ ui ≤ 1, i = 1, . . . , n

where

    F = [         cos θ1           · · ·          cos θn
                  sin θ1           · · ·          sin θn
          p1y cos θ1 − p1x sin θ1  · · ·  pny cos θn − pnx sin θn ]
% input data
% ----------
% thrusters x-coordinates
px = [-3 -2 -1 1.5 2 ];
% thrusters y-coordinates
py = [ 0 1 -2 1 -2.5];
% angles
thetas = [-85 30 -150 0 85]*pi/180;
F = [ cos(thetas);
sin(thetas);
py.*cos(thetas) - px.*sin(thetas)];
% different problem specified by each column of f_des
f_des = [ 0 0 1 -.5 0 0; ...
.5 -1 0 0 0 0; ...
0 0 0 0 2 -2];
thrus = [];
for i=1:6
cvx_begin
variable u(5)
minimize ( sum ( u ) )
subject to
F*u == f_des(:,i)
u >= 0
u <= 1
cvx_end
thrus = [thrus u];   % store the optimal forces for each f_des column
end
[Figures: six panels, one per column of f_des, showing the optimal thruster forces; both axes run from −4 to 4.]
can express as LP:

    minimize ‖F u − f^des‖∞
    subject to 0 ≤ ui ≤ 1, i = 1, . . . , n

can't express as LP:

    minimize (number of thrusters on)
    subject to F u = f^des
               0 ≤ ui ≤ 1, i = 1, . . . , n

(but we could check feasibility of each of the 2ⁿ subsets of thrusters)
[Figure: structure diagram with labels f1, f2, f3, f4.]
• fundamental frequency:

    ω1 = λmin(K, M)^{1/2} = λmin(M^{−1/2} K M^{−1/2})^{1/2}

• structure weight w = w0 + ∑_i xi wi
as SDP:

    minimize w0 + ∑_i xi wi
    subject to Ω² M(x) − K(x) ⪯ 0
               li ≤ xi ≤ ui
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
Ax = b
fi convex, f0 quasiconvex
f0(x) ≤ t ⇔ φt(x) ≤ 0
We consider a matrix A ∈ Sⁿ, with some entries specified, and the others
not specified. Say

    A = [ 3.0   0.5    ?    0.25
          0.5   2.0   0.75   ?
           ?    0.75  1.0    ?
          0.25   ?     ?    5.0 ]

(the entries marked ? are free).
n = 4;
cvx_begin sdp
variable A(n,n) symmetric
maximize( det_rootn( A ) )
A >= 0;                 % in sdp mode: A positive semidefinite
% constrained matrix entries.
A(1,1) == 3;
A(2,2) == 2;
A(3,3) == 1;
A(4,4) == 5;
A(1,2) == .5;
A(1,4) == .25;
A(2,3) == .75;
cvx_end
A =
3.0000 0.5000 0.1874 0.2500
0.5000 2.0000 0.7500 0.0417
0.1874 0.7500 1.0000 0.0156
0.2500 0.0417 0.0156 5.0000
eigs =
0.5964
2.0908
3.2773
5.0355
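A quick check, in Python/NumPy, that the reported completion is symmetric and positive definite with the listed eigenvalues:

```python
import numpy as np

# the completed matrix reported by CVX above
A = np.array([[3.0,    0.5,    0.1874, 0.2500],
              [0.5,    2.0,    0.75,   0.0417],
              [0.1874, 0.75,   1.0,    0.0156],
              [0.2500, 0.0417, 0.0156, 5.0]])

eigs = np.linalg.eigvalsh(A)   # ascending eigenvalues of a symmetric matrix
```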
Outline:
• Duality examples
• Strong duality
• Farkas' lemma
• Homework hints
Duality
• Primal problem

    minimize f0(x)
    subject to fi(x) ≤ 0, i = 1, . . . , m
               hi(x) = 0, i = 1, . . . , p

• Lagrangian: L(x, λ, ν) = f0(x) + ∑_{i=1}^m λi fi(x) + ∑_{i=1}^p νi hi(x)
• For λ ⪰ 0, g(λ, ν) ≤ p*
• Dual problem

    maximize g(λ, ν)
    subject to λ ⪰ 0
can be expressed as
maximize 4y1 + 2y2
subject to y1 ≤ 3
y1 + y2 ≤ 2
2y2 ≤ 1
y1, y2 ≥ 0.
minimize f T x
subject to kAx + bk2 ≤ cT x + d,
minimize f T x
subject to kyk2 ≤ t
Ax + b = y
cT x + d = t.
maximize −uT b − νd
subject to kuk2 ≤ ν
AT u + νc = f.
    minimize e^{−x}
    subject to x²/y ≤ 0,

• Optimal value p* = 1
• Dual problem:
maximize 0
subject to λ ≥ 0,
• No strong duality only if both primal and dual are infeasible: p* = +∞,
  d* = −∞
• Example

    minimize x
    subject to [0; 1] x ⪯ [−1; 1]

• Dual LP

    maximize z1 − z2
    subject to z2 + 1 = 0
               z1, z2 ≥ 0
    Ax ⪯ 0, cᵀx < 0,    (1)
    Aᵀy + c = 0, y ⪰ 0,    (2)

• Consider the LP

    minimize cᵀx
    subject to Ax ⪯ 0.

• Dual of this LP

    maximize 0
    subject to Aᵀy + c = 0
               y ⪰ 0.
    minimize t
    subject to u ⪰ 0, 1ᵀu = 1
               Pᵀu ⪯ t1,

the dual function is

    g(λ, µ, ν) = { ν    if 1ᵀλ = 1, Pλ − ν1 = µ
                 { −∞   otherwise

so the dual is

    maximize ν
    subject to λ ⪰ 0, 1ᵀλ = 1, µ ⪰ 0
               Pλ − ν1 = µ

or, eliminating µ,

    maximize ν
    subject to λ ⪰ 0, 1ᵀλ = 1
               Pλ ⪰ ν1
Outline:
• SDP relaxation
• Homework hints
Variable bounds and dual feasibility
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
li ≤ xi ≤ ui, i = 1, . . . , n
the Lagrangian is

    L(x, λ, µ, ν) = f0(x) + ∑_{i=1}^m λi fi(x) + µᵀ(x − u) + νᵀ(l − x)

we have

    ∇x L(x, λ, µ, ν) = ∇f0(x) + ∑_{i=1}^m λi ∇fi(x) + (µ − ν)

setting this to zero gives

    ν − µ = ∇f0(x) + ∑_{i=1}^m λi ∇fi(x)

and, taking the smallest nonnegative µ,

    µ = [ ∇f0(x) + ∑_{i=1}^m λi ∇fi(x) ]⁻
      = (1/2)( |∇f0(x) + ∑_{i=1}^m λi ∇fi(x)| − ∇f0(x) − ∑_{i=1}^m λi ∇fi(x) )

where | · | is componentwise

with x = (l + u)/2 and λ = 0 we can find a dual feasible point and a lower
bound on f*; we have

    ν = (1/2)( ∇f0((l + u)/2) + |∇f0((l + u)/2)| )
    µ = (1/2)( −∇f0((l + u)/2) + |∇f0((l + u)/2)| )

therefore, using

    inf_{l⪯x⪯u} ∇f0((u + l)/2)ᵀ(x − (u + l)/2) = −|∇f0((u + l)/2)|ᵀ(u − l)/2

we get

    f* ≥ f0((u + l)/2) − |∇f0((u + l)/2)|ᵀ(u − l)/2
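A sketch of this bound for a convex quadratic over a box, checked against a grid search (the quadratic data below are made up):

```python
import numpy as np

def box_lower_bound(grad_f0, f0, l, u):
    # f* >= f0(xmid) - |grad f0(xmid)|^T (u - l)/2, with xmid = (l + u)/2;
    # valid for convex f0 (first-order lower bound minimized over the box)
    xmid = (l + u) / 2
    return f0(xmid) - np.abs(grad_f0(xmid)) @ (u - l) / 2
```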
    minimize xᵀW x
    subject to xi² = 1, i = 1, . . . , n,

    L(x, ν) = xᵀW x + ∑_{i=1}^n νi(xi² − 1)
            = xᵀ(W + diag(ν))x − 1ᵀν

    maximize −1ᵀν
    subject to W + diag(ν) ⪰ 0

the optimal value of the dual is a lower bound on the optimal value of the
partitioning problem
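Any dual feasible ν gives such a lower bound; for example ν = −λmin(W)1 makes W + diag(ν) ⪰ 0, so −1ᵀν = n·λmin(W) lower-bounds the partitioning optimum. A brute-force check on a small made-up W:

```python
import itertools
import numpy as np

rng = np.random.default_rng(8)
n = 6
W = rng.standard_normal((n, n))
W = (W + W.T) / 2                       # symmetric

# dual feasible point: nu = -lambda_min(W) * 1  =>  W + diag(nu) >= 0
lam_min = np.linalg.eigvalsh(W)[0]
bound = n * lam_min                     # -1^T nu

# brute force over all x in {-1, 1}^n
best = min(np.array(x) @ W @ np.array(x)
           for x in itertools.product([-1, 1], repeat=n))
```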
since

    xᵀW x = tr(xᵀW x) = tr(W xxᵀ)

and

    (xxᵀ)ii = xi²

we can write the original problem as

    minimize tr(W X)
    subject to X ⪰ 0, rank X = 1
               Xii = 1, i = 1, . . . , n,

and, dropping the rank constraint,

    minimize tr(W X)
    subject to X ⪰ 0,
               Xii = 1, i = 1, . . . , n,

• this problem is convex (SDP) and gives a lower bound on the original
  problem

    minimize 1ᵀν
    subject to W + diag(ν) ⪰ 0

    maximize −tr(W X)
    subject to X ⪰ 0
               Xii = 1, i = 1, . . . , n
    minimize f0(x)
    subject to fi(x) ≤ 0, i = 1, . . . , m

has Lagrangian

    f0(x) + ∑_{i=1}^m λi fi(x)

but

    (∂/∂x)[ exp f0(x) + ∑_{i=1}^m λ̃i fi(x) ] |_{x=x̄}
        = exp f0(x̄) ∇f0(x̄) + ∑_{i=1}^m λ̃i ∇fi(x̄)
        = exp f0(x̄) [ ∇f0(x̄) + ∑_{i=1}^m λ̃i e^{−f0(x̄)} ∇fi(x̄) ]

so x̄ minimizes both Lagrangians when λi = λ̃i e^{−f0(x̄)}

we have the bound

    p* ≥ g(λ),

where g is the dual function of the first problem, and

    exp p* ≥ g̃(λ̃),   i.e.,   p* ≥ log g̃(λ̃)

with λi = λ̃i e^{−f0(x̄)} as above,

    log g̃(λ̃) = log( e^{f0(x̄)} + ∑_{i=1}^m e^{f0(x̄)} λi fi(x̄) )
              = f0(x̄) + log( 1 + ∑_{i=1}^m λi fi(x̄) )

    log g̃(λ̃) − g(λ) = log( 1 + ∑_{i=1}^m λi fi(x̄) ) − ∑_{i=1}^m λi fi(x̄)
variable t
square( x+y ) <= t;
square( t ) <= x - y
Outline:
• homework hints
Numerical linear algebra: factor and solve
• computation cost: f + s (factorization flops plus solve flops)
• use LU factorization, A = LU
examples setup:
we give naive but correct algorithms (in Matlab notation)
flop count:
ans: (8/3)n³ + 2n² + 2n ≈ (8/3)n³
more efficient method:
ans: val = c'*(A\b), with flop count about (2/3)n³
flop count:
ans: (8/3)n³ + 2n²m + 2nm ≈ (8/3)n³ + 2n²m
naive algorithm:
x = [A, zeros(n,n); zeros(n,n), B] \ [b; c]
flop count:
ans: (2/3)(2n)³ = (16/3)n³
flop count:
ans: (2/3)(11n)³ = (2662/3)n³
more efficient method:
ans: we use elimination of variables to get equations
where B ∈ R^{m×n}, C ∈ R^{n×m}, and n > m; also assume that the whole
matrix is nonsingular
flop count:
ans: (2/3)(n + m)³
more efficient method:
ans: we use elimination of variables to get equations

    Ax + By = c
    Dx + Ey + Fz = g
    Hy + Jz = k

The system now looks like an "arrow" system, which we can efficiently
solve by block elimination.
We know that

    [D F] [x; z] + Ey = g

then, using the expressions derived before (x = A⁻¹c − A⁻¹B y and
z = J⁻¹k − J⁻¹H y),

    [D F] ( [A⁻¹c; J⁻¹k] − [A⁻¹B; J⁻¹H] y ) + Ey = g

and therefore
• Form

    M = A⁻¹B,  n = A⁻¹c,  P = J⁻¹H,  q = J⁻¹k.

• Compute r = g − Dn − Fq.
• Compute S = E − DM − FP.
• Find

    y = S⁻¹r,  x = n − My,  z = q − Py.
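The steps above can be sketched in Python/NumPy and checked against the original three equations (block sizes and data are made up; the vector called n on the slide is named n_ in the code to avoid clashing with the dimension):

```python
import numpy as np

def solve_arrow(A, B, D, E, F, H, J, c, g, k):
    """Solve Ax + By = c, Dx + Ey + Fz = g, Hy + Jz = k by block elimination."""
    M = np.linalg.solve(A, B)
    n_ = np.linalg.solve(A, c)
    P = np.linalg.solve(J, H)
    q = np.linalg.solve(J, k)
    r = g - D @ n_ - F @ q
    S = E - D @ M - F @ P          # Schur complement in y
    y = np.linalg.solve(S, r)
    x = n_ - M @ y
    z = q - P @ y
    return x, y, z
```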
    ∇f(x) = Df(x)ᵀ,   ∇f(x)i = ∂f(x)/∂xi,  i = 1, . . . , n.

• For f(x) = (1/2)xᵀPx + qᵀx + r with P ∈ Sⁿ, the gradient is ∇f(x) = Px + q.

• Consider g : Rᵐ → R, g(x) = log ∑_{i=1}^m exp xi. Its gradient is

    ∇g(x) = (1/∑_{i=1}^m exp xi) (exp x1, . . . , exp xm)    (1)
Let f : Sn → R.
One (tedious) way to find the gradient of f is to introduce a basis for Sn,
find the gradient of the associated function, and finally translate the result
back to Sn.
Now we use the fact that ∆X is small, which implies λi are small, so to
first order we have log(1 + λi) ≈ λi.
∇f (X) = X −1.
This result should not be surprising, since the derivative of log x, on R++,
is 1/x.
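A finite-difference check of this gradient, i.e. that the directional derivative of log det at a positive definite X in a symmetric direction V equals tr(X⁻¹V) (the test data are made up):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 4
Q = rng.standard_normal((n, n))
X = Q @ Q.T + n * np.eye(n)               # a positive definite point
V = rng.standard_normal((n, n))
V = (V + V.T) / 2                         # symmetric direction

f = lambda X: np.log(np.linalg.det(X))
t = 1e-6
# central-difference directional derivative of log det at X along V
num = (f(X + t * V) - f(X - t * V)) / (2 * t)
# analytic value implied by grad f(X) = X^{-1}: <X^{-1}, V> = tr(X^{-1} V)
analytic = np.trace(np.linalg.solve(X, V))
```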
    f(x) = log ∑_{i=1}^m exp(aiᵀx + bi),

    ∇f(x) = (1/(1ᵀz)) Aᵀz,   where zi = exp(aiᵀx + bi)

    ∂h(x)/∂xi = tr( Fi ∇ log det(F) ) = tr( F⁻¹ Fi ),
D = diag(-ones(N,1)) + diag(ones(N-1,1),-1);
or