
Core Matrix Analysis

Shivkumar Chandrasekaran
October 4, 2010
Contents
1 A Note to the Student 3
1.1 Acknowledgements 5
2 Matrix Arithmetic 6
2.1 Notation 6
2.2 Addition & Subtraction 7
2.3 Multiplication 8
2.4 Inverses 10
2.5 Transpose 12
2.6 Gaussian Elimination 13
2.7 Solving Ax = b 17
2.8 Problems 19
3 Geometry 20
3.1 Vector Spaces 20
3.2 Hyper-planes 22
3.3 Lengths 25
3.4 Angles 28
3.5 Matrix Norms 30
3.6 Riesz-Thorin 33
3.7 Perturbed inverses 37
4 Orthogonality 39
4.1 Unitary Matrices 39
4.2 The Singular Value Decomposition 40
4.3 Orthogonal Subspaces 43
4.4 Minimum norm least-squares solution 45
4.5 Problems 48
5 Spectral Theory 49
5.1 Spectral Decompositions 49
5.2 Invariant subspaces 57
5.3 Difference Equations 60
5.4 Matrix-valued functions 62
5.5 Functions of matrices 64
5.6 Differential equations 67
5.7 Localization of eigenvalues 69
5.8 Real symmetric matrices 71
5.9 Cholesky factorization 76
5.10 Problems 77
6 Tensor Algebra 78
6.1 Kronecker product 78
6.2 Tensor Product Spaces 80
6.3 Symmetric tensors 83
6.4 Symmetric tensor powers 94
6.5 Signs of permutations 96
6.6 Anti-symmetric tensors 98
6.7 Anti-symmetric tensor powers 103
1 A Note to the Student
These notes are very much a work in progress. Please check the web-site
frequently for updates.
These notes do not attempt to explain matrix analysis or even linear algebra. For that I recommend other texts. For example, the chapter on Matrix Arithmetic is more of an extended exercise than an explanation.
If you need explanations, then G. Strang's Linear Algebra and its Applications is a very good introduction for the neophyte.
On the other hand, if you have had a prior introduction to linear algebra, then C. Meyer's Matrix Analysis and Applied Linear Algebra is an excellent choice.
For students interested in systems theory, control theory or operator theory I recommend H. Dym's Linear Algebra in Action.
Finally, for students of mathematics, I suggest A (Terse) Introduction to Linear Algebra by Y. Katznelson and Y. R. Katznelson.
After this class, to see how the ideas presented here can be generalized to the infinite-dimensional setting, I recommend I. Gohberg, M. Kaashoek and S. Goldberg's Basic Classes of Linear Operators. Another excellent book is P. Lax's Functional Analysis.
For more results in matrix analysis with good explanations nothing can beat R. Horn and C. Johnson's classic Matrix Analysis.
The serious student of mathematics will also want to look at R. Bhatia's Matrix Analysis.
For all algorithmic issues Matrix Computations by G. H. Golub and C. van Loan is a classic source.
I hope these notes relieve the student of the burden of taking handwritten notes in
my lectures. Anyway, a good way to learn the subject is to go through these notes
working out all the exercises.
These notes are still a work in progress; typos abound. Please email them to me as you find them (email: shiv@ece.ucsb.edu).
Ideally there should be no errors in the proofs. If there are, I would appreciate hearing about them.
There are many ways to present matrix analysis. My desire has been to find short, constructive approaches to all proofs. If you have a shorter and more constructive proof for any of the material please let me know.
Almost all proofs presented here are well-known. If at all there is a claim to innovation it might be in the proof of the Jordan decomposition theorem. What is uncommon is a presentation of a version of the Riesz-Thorin interpolation theorem, and a related result of Holmgren. The latter especially is a very useful result that is not as well-known as it should be. Both of these are based on the more general presentation in Lax's Functional Analysis book.
The last (incomplete) chapter on tensor algebra is very much a work in progress and could easily stand a couple of re-writes. Use with a great deal of caution.
1.1 Acknowledgements
Karthik Raghuram Jayaraman, Mike Lawson, Lee Nguyen, Naveen Somasunderam.
If I have inadvertently left somebody out please let me know.
2 Matrix Arithmetic
2.1 Notation
Definition 1 (g ∘ f). g ∘ f denotes the composition of the function g with the function f; that is, (g ∘ f)(x) = g(f(x)).

Definition 2 (N). The set of all positive integers.

Definition 3 (Z). The set of all integers.

Definition 4 (R). The set of all real numbers.

Definition 5 (C). The set of all complex numbers.

Definition 6 (Scalar). For us scalars will denote either real numbers or complex numbers. The context will make it clear which one we are talking about. Small Greek letters α, β, γ, ... will usually denote scalars.

Definition 7 (Matrix). A matrix is a rectangular array of scalars. If A is a matrix then the scalar at the intersection of row i and column j is denoted by A_{i,j}.

Definition 8 (m × n). An m × n matrix has m rows and n columns. One, or both, of m and n can be zero.

Definition 9 (R^{m×n}). The set of all real m × n matrices.

Definition 10 (C^{m×n}). The set of all complex m × n matrices.

Definition 11 (R^n). R^{n×1}, also called the set of column vectors with n real components.

Definition 12 (C^n). C^{n×1}, also called the set of column vectors with n complex components.

Definition 13 (Block Matrix). A block matrix is a rectangular array of matrices. If A is a block matrix then the matrix at the intersection of block row i and block column j is denoted by A_{i,j}. We will assume that all matrices in block column j have n_j columns, and all matrices in block row i have m_i rows. That is, we will assume that A_{i,j} is an m_i × n_j matrix. We will denote the block matrix A pictorially as follows

    A = [ A_{1,1} ... A_{1,l} ; ... ; A_{k,1} ... A_{k,l} ],

where block row i has m_i rows and block column j has n_j columns. This is also called a k × l block partitioning of the matrix A.
2.2 Addition & Subtraction
Definition 14 (Scalar multiplication). For any scalar α,

    α A_{m×n} = B_{m×n}

    α ( a )_{1×1} = ( αa )_{1×1}

    α [ A_{1,1}  A_{1,2} ; A_{2,1}  A_{2,2} ] = [ αA_{1,1}  αA_{1,2} ; αA_{2,1}  αA_{2,2} ]

The above definition of scalar multiplication must be interpreted as follows. The first equation implies that the argument A and the result B must have an identical number of rows m, and columns n. Therefore if either m or n is zero there are no entries in B and nothing to compute. If the argument is a 1 × 1 matrix the second equation states how the result must be computed. If the argument is larger than that, the third equation states how the scalar multiplication can be reduced into at most four smaller scalar multiplications.

Exercise 1. Prove that if αA = B then αA_{i,j} = B_{i,j}.
Definition 15 (Addition).

    A_{m×n} + B_{m×n} = C_{m×n}

    ( a )_{1×1} + ( b )_{1×1} = ( a + b )_{1×1}

    [ A_{1,1}  A_{1,2} ; A_{2,1}  A_{2,2} ] + [ B_{1,1}  B_{1,2} ; B_{2,1}  B_{2,2} ]
        = [ A_{1,1}+B_{1,1}  A_{1,2}+B_{1,2} ; A_{2,1}+B_{2,1}  A_{2,2}+B_{2,2} ]

Exercise 2. Prove that if A + B = C, then A_{i,j} + B_{i,j} = C_{i,j}.

Definition 16 (Subtraction). A − B = A + (−1)B.

Exercise 3. Prove that if A − B = C, then A_{i,j} − B_{i,j} = C_{i,j}.

Definition 17 (0). We denote the m × n matrix of zeros by 0_{m×n}. We will drop the subscripts if the size is obvious from the context.

Exercise 4. Show that A + 0 = A and 0A = 0.

Exercise 5. Show that matrix addition is commutative: A + B = B + A.

Exercise 6. Show that scalar multiplication is distributive over matrix addition: α(A + B) = αA + αB.
2.3 Multiplication
Definition 18 (Multiplication).

    A_{m×k} B_{k×n} = C_{m×n}

    ( )_{1×0} ( )_{0×1} = ( 0 )_{1×1}

    ( a )_{1×1} ( b )_{1×1} = ( ab )_{1×1}

    [ A_{1,1}  A_{1,2} ; A_{2,1}  A_{2,2} ] [ B_{1,1}  B_{1,2} ; B_{2,1}  B_{2,2} ]
        = [ A_{1,1}B_{1,1}+A_{1,2}B_{2,1}   A_{1,1}B_{1,2}+A_{1,2}B_{2,2} ;
            A_{2,1}B_{1,1}+A_{2,2}B_{2,1}   A_{2,1}B_{1,2}+A_{2,2}B_{2,2} ]

Exercise 7. Show that if AB = C then Σ_k A_{i,k} B_{k,j} = C_{i,j}.
Exercise 8. Show that A ( B_{1,1}  B_{1,2}  ...  B_{1,n} ) = ( AB_{1,1}  AB_{1,2}  ...  AB_{1,n} ). This shows that matrix multiplication from the left acts on each (block) column of the right matrix independently.
Exercise 9. Show that

    [ A_{1,1} ; A_{2,1} ; ... ; A_{m,1} ] B = [ A_{1,1}B ; A_{2,1}B ; ... ; A_{m,1}B ].

This shows that matrix multiplication from the right acts on each (block) row of the left matrix independently.
Exercise 10. Show that

    ( A_{1,1}  A_{1,2}  ...  A_{1,k} ) [ B_{1,1} ; B_{2,1} ; ... ; B_{k,1} ] = Σ_{l=1}^{k} A_{1,l} B_{l,1}.

This is called a (block) inner product. Quite confusingly, when all the partitions have only one row or column, each term on the right in the sum is an outer product. In that case this formula is called the outer product form of matrix multiplication. Usually the term inner product is reserved for the case when A has one row and B has one column.
Exercise 11. Show that

    [ A_{1,1} ; A_{2,1} ; ... ; A_{m,1} ] ( B_{1,1}  B_{1,2}  ...  B_{1,n} )
        = [ A_{1,1}B_{1,1}  A_{1,1}B_{1,2}  ...  A_{1,1}B_{1,n} ;
            A_{2,1}B_{1,1}  A_{2,1}B_{1,2}  ...  A_{2,1}B_{1,n} ;
            ... ;
            A_{m,1}B_{1,1}  A_{m,1}B_{1,2}  ...  A_{m,1}B_{1,n} ]

This is called a (block) outer product. Usually the term outer product is reserved for the case when A has one column and B has one row.
Definition 19 (Lower triangular matrix). A square matrix L is said to be lower triangular if all its entries above the diagonal are zero; that is, L_{i,j} = 0 for i < j.

Exercise 12. Show that the product of lower triangular matrices is lower triangular.

Matrix multiplication behaves a great deal like regular multiplication, except for one crucial fact: in general it is not commutative (non-commutative).

Exercise 13. Find 2 × 2 matrices A and B such that AB ≠ BA.

Exercise 14. Show that matrix multiplication is associative: (AB)C = A(BC).

Exercise 15. Show that matrix multiplication is left and right distributive over matrix addition: A(B + C) = AB + AC, and (B + C)A = BA + CA.
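The block-multiplication rule of Exercise 7 can be checked numerically. Below is a minimal sketch (assuming NumPy is available; the helper name blocks is mine, not from the notes) that partitions two conforming matrices, multiplies block-wise, and compares against the ordinary product.

    import numpy as np

    # Check C_{i,j} = sum_k A_{i,k} B_{k,j} (Exercise 7) on a 2 x 2 blocking.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 7))
    B = rng.standard_normal((7, 4))
    mi, kj, nj = [2, 3], [3, 4], [1, 3]          # block row / inner / column sizes

    def blocks(M, rows, cols):
        r = np.cumsum([0] + rows)
        c = np.cumsum([0] + cols)
        return [[M[r[i]:r[i+1], c[j]:c[j+1]] for j in range(len(cols))]
                for i in range(len(rows))]

    Ab, Bb = blocks(A, mi, kj), blocks(B, kj, nj)
    Cb = [[sum(Ab[i][k] @ Bb[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    print(np.allclose(np.block(Cb), A @ B))       # True: block rule agrees with A @ B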
2.4 Inverses
Definition 20 (Left inverse). Let A and B be two sets. A function f : A → B is said to have a left inverse g : B → A if g ∘ f is the identity map on A.

Exercise 16. Show that a function has a left inverse iff it is one-to-one.

Exercise 17. When does a one-to-one function have more than one left inverse?

Definition 21 (Right inverse). Let A and B be two sets. A function f : A → B is said to have a right inverse g : B → A if f ∘ g is the identity map on B.

Exercise 18. Show that a function has a right inverse iff it is onto.

Exercise 19. When does an onto function have more than one right inverse?

Definition 22 (Identity). The n × n identity matrix is denoted by I_n and is defined to have ones on the diagonal and zeros everywhere else. That is, I_{i,i} = 1 and I_{i,j} = 0 if i ≠ j.

Exercise 20. Show that I_m A_{m×n} = A_{m×n} I_n = A_{m×n}.
We will restrict our attention to linear left and right inverses of matrices. So we re-define these notions to suit our usage.

Definition 23 (Left inverse). A^L is said to be a left inverse of A if A^L A = I.

From now on the subscript on the identity matrix that denotes its size will be dropped if it can be inferred from the context. So, in the above definition, it is clear that the size of the identity matrix is determined by the number of columns of the matrix A.

Exercise 21. How many rows and columns must A^L have?

Definition 24 (Right inverse). A^R is said to be a right inverse of A if A A^R = I.

Exercise 22. How many rows and columns must A^R have?

To unify our definition of matrix inverses with function inverses we can think of a matrix A_{m×n} as a function that maps vectors in C^n to vectors in C^m by the rule y = Ax for all x ∈ C^n.

Exercise 23. Verify that the above statement makes sense; that is, if A^L is a matrix left inverse for A_{m×n}, then it is also a left inverse for A viewed as a function from C^n to C^m.

Definition 25 (Inverse). A^{-1} is said to be an inverse of A if it is both a left and right inverse of A.
Exercise 24. Show that if A^{-1} exists then it must be unique. Hint: Use Exercise 16, Exercise 17, Exercise 18, Exercise 19 and Exercise 23.

Example 1.

    [ a  b ; c  d ]^{-1} = (1 / (ad − bc)) [ d  −b ; −c  a ]    when ad − bc ≠ 0.

Example 2.

    [ A  0 ; B  C ]^{-1} = [ A^{-1}  0 ; −C^{-1} B A^{-1}  C^{-1} ]    when A^{-1} and C^{-1} exist.

Exercise 25. Find

    [ A  B ; 0  C ]^{-1}

when A^{-1} and C^{-1} exist.
Example 3
( I 0 )
_
I
X
_
= I
but
_
I
X
_
( I 0 ) =
_
I 0
0 0
_
.
This shows that a left inverse need not be a right inverse and vice versa.
Show that the matrix Exercise 26
_
I 0
0 0
_
has no left or right inverses. Later we will dene the pseudo-inverse of a matrix,
which will always exist.
Definition 26 (Upper triangular matrix). A square matrix U is said to be upper triangular if all its entries below the diagonal are zero; that is, U_{i,j} = 0 for i > j.

Exercise 27. Show that the inverse of an upper triangular matrix exists if all the diagonal entries are non-zero, and that the inverse is also upper triangular. Hint: Use Exercise 25.

Exercise 28. Show that (AB)^{-1} = B^{-1} A^{-1} when A^{-1} and B^{-1} exist.
2.5 Transpose
Definition 27 (Transpose). Transpose is denoted by a raised superscript T and is defined by

    ( a )^T_{1×1} = ( a )_{1×1}

    [ A_{1,1}  A_{1,2} ; A_{2,1}  A_{2,2} ]^T = [ A^T_{1,1}  A^T_{2,1} ; A^T_{1,2}  A^T_{2,2} ]

Exercise 29. Show that if B_{n×m} = A^T then A is an m × n matrix and B_{i,j} = A_{j,i}.

Exercise 30. Show that (A + B)^T = A^T + B^T.

Exercise 31. Show that (AB)^T = B^T A^T provided the product AB is well-defined.
Definition 28 (Hermitian transpose). Hermitian transpose is denoted by a raised superscript H and is defined by

    ( a )^H_{1×1} = ( ā )_{1×1}

    [ A_{1,1}  A_{1,2} ; A_{2,1}  A_{2,2} ]^H = [ A^H_{1,1}  A^H_{2,1} ; A^H_{1,2}  A^H_{2,2} ]

where ā denotes the complex conjugate of a.

Exercise 32. Show that if B_{n×m} = A^H then A is an m × n matrix and B_{i,j} is the complex conjugate of A_{j,i}.

Exercise 33. Show that (A + B)^H = A^H + B^H.

Exercise 34. Show that (AB)^H = B^H A^H provided the product AB is well-defined.

The (Hermitian) transpose is a crucial operator as it lets m × n matrices act by matrix multiplication on other m × n matrices.

Exercise 35. Show that A^H A and A A^H are well-defined matrix products. Note that in general A^2 is not a well-defined matrix product.
2.6 Gaussian Elimination
How do we compute a left, right or just plain old inverse of a given matrix A?
Answer: by Gaussian elimination. We will present Gaussian elimination as a matrix
factorization.
Definition 29 (Permutation). Given a permutation σ_1, σ_2, ..., σ_n of the integers 1, ..., n we can define a permutation matrix P by the equation

    P [ x_1 ; x_2 ; ... ; x_n ] = [ x_{σ_1} ; x_{σ_2} ; ... ; x_{σ_n} ],    x_i ∈ C.

Exercise 36. Write P down explicitly when σ_1 = 4, σ_2 = 1, σ_3 = 2, σ_4 = 3.

Exercise 37. Write P down explicitly in the general case.

Exercise 38. Show that P^T = P^{-1}.
Exercise 39. Show that a product of permutation matrices is another permutation matrix.

Exercise 40. If P is a permutation matrix such that

    P [ x_1 ; x_2 ; ... ; x_n ] = [ x_{σ_1} ; x_{σ_2} ; ... ; x_{σ_n} ],    x_i ∈ C,

for some permutation σ_i of the integers 1, ..., n, find

    ( x_1  x_2  ...  x_n ) P.

Hint: Transpose.
Definition 30 (Unit lower triangular matrix). A lower triangular matrix with ones on the main diagonal is called a unit lower triangular matrix.

Exercise 41. Show that the product of unit lower triangular matrices is unit lower triangular.

Exercise 42. Show that a unit lower triangular matrix always has an inverse, which is also unit lower triangular. Hint: Use Example 2 and Exercise 27.
Definition 31 (LU). For every m × n matrix A there exist two permutations P_1 and P_2 such that P_1 A P_2 = LU, where L is a unit lower triangular matrix and U is of the form

    U = [ U_{1,1}  U_{1,2} ; 0  0 ],

where U_{1,1} is an r × r upper triangular matrix with non-zero diagonal entries, U_{1,2} has n − r columns, and the zero blocks have m − r rows.

Definition 32 (Rank). The integer r in the LU factorization of A is called the rank of the matrix A.

Exercise 43. Give examples of m × n matrices for which the ranks are 0, 1, m and n.
Proof of LU decomposition. The proof is by induction on the matrix size.

Case 1 (A = 0):

    P_1 A P_2 = L U    with    P_1 = I,  P_2 = I,  L = I,  U = 0.

In this case U_{1,1} is empty and the rank r = 0.

Case 2 (A = ( a )_{1×1}):

    P_1 ( a )_{1×1} P_2 = L U    with    P_1 = I,  P_2 = I,  L = I,  U = ( a )_{1×1}.

If a ≠ 0 then r = 1, otherwise r = 0 and U_{1,1} is empty.

Case 3. Pick two intermediate permutations Q_1 and Q_2 such that the (1,1) entry of Q_1 A Q_2 is non-zero.

Exercise 44. Prove that this step is possible if A ≠ 0. Otherwise we are done by Case 1.
Let

    Q_1 A Q_2 = [ A_{1,1}  A_{1,2} ; A_{2,1}  A_{2,2} ],

where A_{1,1} is the 1 × 1 block, A_{1,2} has n − 1 columns, A_{2,1} has m − 1 rows, and A_{1,1} ≠ 0. Let

    L_1 = [ 1  0 ; A_{2,1} A_{1,1}^{-1}  I ]    and    U_1 = [ A_{1,1}  A_{1,2} ; 0  A_{2,2} − A_{2,1} A_{1,1}^{-1} A_{1,2} ],

where L_1 is a unit lower triangular matrix. L_1 is called an elementary Gauss transform.

Exercise 45. Show that Q_1 A Q_2 = L_1 U_1.
Let S_1 = A_{2,2} − A_{2,1} A_{1,1}^{-1} A_{1,2}, which is called a Schur complement. Note that S_1 is smaller than A. If S_1 is empty then we are done. Otherwise, by the induction hypothesis S_1 has an LU decomposition

    Q_3 S_1 Q_4 = L_2 U_2                                                    (2.1)

where Q_3 and Q_4 are the associated permutation matrices. Substituting this in the expression for U_1 we obtain

    Q_1 A Q_2 = [ 1  0 ; A_{2,1} A_{1,1}^{-1}  I ] [ A_{1,1}  A_{1,2} ; 0  Q_3^T L_2 U_2 Q_4^T ],

where the first factor is L_1 and the second is U_1.

Exercise 46. Verify this. Hint: Multiply equation (2.1) from the left by Q_3^T.
We can now expand and factor the right hand side of the above expression to obtain

    Q_1 A Q_2 = [ 1  0 ; 0  Q_3^T ] [ 1  0 ; Q_3 A_{2,1} A_{1,1}^{-1}  L_2 ] [ A_{1,1}  A_{1,2} Q_4 ; 0  U_2 ] [ 1  0 ; 0  Q_4^T ].

Exercise 47. Verify this.

We observe that

    [ 1  0 ; 0  Q_3^T ]    and    [ 1  0 ; 0  Q_4^T ]

are permutation matrices.

Exercise 48. Prove it.
Therefore their inverses are just their transposes. We can multiply by their transposes on the left and right respectively of the above equation and obtain the desired LU decomposition of A:

    [ 1  0 ; 0  Q_3 ] Q_1 A Q_2 [ 1  0 ; 0  Q_4 ] = [ 1  0 ; Q_3 A_{2,1} A_{1,1}^{-1}  L_2 ] [ A_{1,1}  A_{1,2} Q_4 ; 0  U_2 ],

where the two factors on the left, together with Q_1 and Q_2, give the permutations P_1 and P_2, and the two factors on the right are L and U.

Exercise 49. Verify that L in the above equation is unit lower triangular and that U has the form promised in the LU decomposition definition 31.
Exercise 50. Write a software program in your favorite programming language to compute the LU decomposition of a matrix.
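One possible answer to Exercise 50 is sketched below, assuming NumPy is available. It follows the proof: at each step a full (complete) pivoting choice brings the largest remaining entry to the pivot position, an elementary Gauss transform eliminates below it, and the permutations are accumulated. The function name lu_full_pivot is my own; this is a numerical sketch with a rank tolerance, not a definitive implementation.

    import numpy as np

    def lu_full_pivot(A, tol=1e-12):
        """Return P1, P2, L, U, r with P1 @ A @ P2 = L @ U (complete pivoting).

        L is unit lower triangular; U has an r x r invertible upper-triangular
        block in its top-left corner, where r is the (numerical) rank.
        """
        A = np.array(A, dtype=float)
        m, n = A.shape
        U = A.copy()
        L = np.eye(m)
        p1, p2 = np.arange(m), np.arange(n)      # row / column index permutations
        r = 0
        for k in range(min(m, n)):
            sub = np.abs(U[k:, k:])
            i, j = np.unravel_index(np.argmax(sub), sub.shape)
            if sub[i, j] <= tol:
                break                            # remaining block is numerically zero
            i += k; j += k
            U[[k, i], :] = U[[i, k], :];  p1[[k, i]] = p1[[i, k]]
            L[[k, i], :k] = L[[i, k], :k]        # keep earlier multipliers aligned
            U[:, [k, j]] = U[:, [j, k]];  p2[[k, j]] = p2[[j, k]]
            mult = U[k+1:, k] / U[k, k]          # elementary Gauss transform
            L[k+1:, k] = mult
            U[k+1:, k:] -= np.outer(mult, U[k, k:])
            U[k+1:, k] = 0.0
            r += 1
        U[r:, r:] = 0.0                          # declare the negligible block zero
        P1 = np.eye(m)[p1]                       # row permutation: (P1 A)[k,:] = A[p1[k],:]
        P2 = np.eye(n)[:, p2]                    # column permutation: (A P2)[:,k] = A[:,p2[k]]
        return P1, P2, L, U, r

    A = np.array([[2., 4., 1.], [4., 8., 3.], [1., 3., 2.]])
    P1, P2, L, U, r = lu_full_pivot(A)
    print(np.allclose(P1 @ A @ P2, L @ U), r)    # True 3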
Gaussian elimination, and hence the LU decomposition, is the heart of matrix alge-
bra. Schur complements are one common manifestation which often goes completely
unnoticed in practice.
2.7 Solving Ax = b
Given an m × n matrix A and an m × k matrix b, how do we find all n × k matrices x which satisfy the equation Ax = b? Answer: LU decomposition.

Let P_1 A P_2 = LU. Substituting this in the equation for x we obtain the following set of equivalent equations for x:

    Ax = b
    P_1^T L U P_2^T x = b
    U P_2^T x = L^{-1} P_1 b.

Exercise 51. Why do each of the above equations determine exactly the same set of solutions x?
Let

    U = [ U_{1,1}  U_{1,2} ; 0  0 ],

where U_{1,1} is r × r and r is the rank of A, and let

    P_2^T x = y = [ y_{1,1} ; y_{2,1} ]    and    L^{-1} P_1 b = [ b_{1,1} ; b_{2,1} ],

where y_{1,1} and b_{1,1} have r rows, with some abuse of notation. Substituting back into the equation for x we obtain

    [ U_{1,1}  U_{1,2} ; 0  0 ] [ y_{1,1} ; y_{2,1} ] = [ b_{1,1} ; b_{2,1} ].

We see that the last block equation requires that b_{2,1} = 0. Either this matrix has zero rows and the condition is trivially satisfied, or it does not, and then the validity of this equation depends entirely on the given b and L and P_1. If b_{2,1} ≠ 0 then there are no matrices x which satisfy the equation Ax = b. If b_{2,1} = 0 then we must look at the remaining first block equation U_{1,1} y_{1,1} + U_{1,2} y_{2,1} = b_{1,1}. Since we are guaranteed that U_{1,1} is invertible we see that the general solution is y_{1,1} = U_{1,1}^{-1} (b_{1,1} − U_{1,2} y_{2,1}), where y_{2,1} can be picked freely.

Exercise 52. Verify this last statement thoroughly; that is, show that any solution y can be written in this form.

We can state this result more succinctly as
    y = [ U_{1,1}^{-1} b_{1,1} ; 0 ] + [ −U_{1,1}^{-1} U_{1,2} ; I ] z,

where z can be chosen freely.

Of course we really want all the solutions x, which we now obtain as

    x = P_2 [ U_{1,1}^{-1} b_{1,1} ; 0 ] + P_2 [ −U_{1,1}^{-1} U_{1,2} ; I ] z,

whenever b_{2,1} = 0; otherwise there are no solutions.
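The recipe above translates directly into a short numerical sketch. The code below builds on the lu_full_pivot sketch from the previous section (that helper name is mine, not from the text); it returns a particular solution and a nullspace basis, or reports inconsistency when b_{2,1} ≠ 0, assuming 0 < r.

    import numpy as np

    def solve_all(A, b, tol=1e-10):
        """General solution of A x = b: x = x_part + N @ z, or None if inconsistent."""
        P1, P2, L, U, r = lu_full_pivot(A)               # from the earlier sketch
        m, n = A.shape
        c = np.linalg.solve(L, P1 @ b)                   # c = L^{-1} P1 b
        if r < m and np.linalg.norm(c[r:]) > tol:
            return None                                  # b_{2,1} != 0: no solutions
        U11, U12 = U[:r, :r], U[:r, r:]
        y1 = np.linalg.solve(U11, c[:r])
        x_part = P2 @ np.concatenate([y1, np.zeros(n - r)])
        N = P2 @ np.vstack([-np.linalg.solve(U11, U12), np.eye(n - r)])
        return x_part, N

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 5))                      # fat system, rank 3
    b = rng.standard_normal(3)
    x_part, N = solve_all(A, b)
    z = rng.standard_normal(2)
    print(np.allclose(A @ x_part, b), np.allclose(A @ (x_part + N @ z), b))   # True True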
Exercise 53. Verify that every solution is of this form.

Exercise 54. Show that an m × n matrix A has a right inverse iff rank(A) = m. Such a matrix is called a full row-rank matrix. Write down explicitly all right inverses of A. Hint: I just did it.

Exercise 55. Find all x that satisfy the equation x^H A = b^H explicitly in terms of the LU factorization of A (not A^H).

Exercise 56. Show that an m × n matrix A has a left inverse iff rank(A) = n. Such a matrix is called a full column-rank matrix. Write down explicitly all left inverses of A in terms of the LU decomposition of A (not A^H).

Exercise 57. Show that if a matrix has both a left and right inverse then it is square.

Exercise 58. Show that A has a left inverse iff Ax = 0 implies x = 0.

Exercise 59. Show that A has a right inverse iff x^H A = 0 implies x = 0.
2.8 Problems
Problem 1. Find all non-zero solutions x of A_{m×n} x = 0. Show that there are non-trivial solutions x ≠ 0 if m < n.

Problem 2. Find all matrices b such that A_{m×n} x = b has no solution x. Show that such matrices b always exist if m > n.

Usually in practice linear algebra is needed to analyze linear equations where the coefficient matrix has some special structure. Here are some simple cases.

Problem 3. Find all matrices X that satisfy the equation A X B^T = C, in terms of the LU factorizations of A and B. State the precise conditions under which there are no solutions.

Problem 4. Let U_1 and U_2 be two upper-triangular matrices. Let Z be an m × n matrix. Let X be an unknown matrix that satisfies the equation

    U_1 X + X U_2 = Z.

A. Give an algorithm to find X in O(mn(m+n)) flops (floating-point operations). (A sketch of one such algorithm appears after this list.)
B. Find conditions on U_1 and U_2 which guarantee the existence of a unique solution X.
C. Give a non-trivial example (U_1 ≠ 0, U_2 ≠ 0, X ≠ 0) where those conditions are not satisfied and U_1 X + X U_2 = 0.
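For part A of Problem 4, one standard approach is a column-by-column substitution: column j of the equation reads (U_1 + U_2[j,j] I) x_j = z_j − X[:, :j] U_2[:j, j], an upper-triangular solve. Below is a minimal sketch of this idea, assuming NumPy and SciPy are available; it is one possible answer, not the notes' own.

    import numpy as np
    from scipy.linalg import solve_triangular

    def solve_triangular_sylvester(U1, U2, Z):
        """Solve U1 @ X + X @ U2 = Z for X (U1 m x m, U2 n x n, both upper triangular).

        Each column costs one m x m triangular solve plus a small product,
        so the total work is O(m n (m + n)) flops. A unique solution exists
        when no diagonal entry of U1 is the negative of one of U2.
        """
        m, n = Z.shape
        X = np.zeros((m, n))
        for j in range(n):
            rhs = Z[:, j] - X[:, :j] @ U2[:j, j]
            X[:, j] = solve_triangular(U1 + U2[j, j] * np.eye(m), rhs, lower=False)
        return X

    rng = np.random.default_rng(2)
    U1 = np.triu(rng.standard_normal((4, 4))) + 3 * np.eye(4)
    U2 = np.triu(rng.standard_normal((3, 3))) + 2 * np.eye(3)
    Z = rng.standard_normal((4, 3))
    X = solve_triangular_sylvester(U1, U2, Z)
    print(np.allclose(U1 @ X + X @ U2, Z))     # True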
3 Geometry
We will now develop the basic notions of Euclidean geometry in higher-dimensional spaces.
3.1 Vector Spaces
Definition 33 (F). We will use F to denote either R or C, and we will call its elements scalars.

Definition 34 (Vector space). A vector space consists of a set V of vectors and a set F of scalars, an operation + : V × V → V, called vector addition, and an operation called scalar multiplication from V × F to V, that satisfy the following properties for all u, v, w ∈ V and all α, β ∈ F:

1. u + v = v + u ∈ V (closed and commutative);
2. (u + v) + w = u + (v + w) (associative);
3. There exists a 0 vector in V such that u + 0 = u (existence of identity);
4. For each u ∈ V there exists an element −u ∈ V such that u + (−u) = 0 (existence of inverse);
5. uα ∈ V (scalar multiplication is closed);
6. (u + v)α = uα + vα (distributive);
7. u(α + β) = uα + uβ (distributive);
8. u(αβ) = (uα)β (associative);
9. u1 = u (unit scaling).

Note: We will allow the scalar in scalar multiplication to be written on either side of the vector it is multiplying. This is possible because both vector addition and scalar multiplication are commutative, associative and distribute over each other.

Exercise 60. Show that the 0 vector in V is unique.

Exercise 61. Show that for each v ∈ V there is exactly one vector w such that v + w = 0.

Exercise 62. Show that 0v = 0 for all v ∈ V.

Exercise 63. Show that (−1)v = −v for all v ∈ V.
Definition 35 (F^n). The set of column vectors with n elements drawn from F.

Exercise 64. Show that F^n is a vector space over the scalars F with the obvious definition of vector addition and scalar multiplication.

Definition 36 (F^{m×n}). The set of m × n matrices with elements drawn from F.

Exercise 65. Show that F^{m×n} is a vector space over the scalars F with matrix addition as vector addition and the usual scalar multiplication.

Note: When the scalar field F is obvious, we will abuse notation and call V the vector space. There is usually no confusion as to the implied vector addition and scalar multiplication operations either.
3.2 Hyper-planes
Definition 37 (Subspace). A subset W of a vector space V is a subspace of V if W is a vector space in its own right.

Fortunately, it turns out that W is a subspace of V iff it is closed under vector addition and scalar multiplication.

Exercise 66. Prove it.

Definition 38 (Nullspace). The nullspace of a matrix A ∈ F^{m×n}, denoted by N(A), is the set of all column vectors x ∈ F^n such that Ax = 0.

Exercise 67. Show that N(A) is a subspace.

Definition 39 (Range space). The range space of a matrix A ∈ F^{m×n}, denoted by R(A), is the set of all vectors y ∈ F^m such that Ax = y for some vector x. This is also called the column space of A.

Exercise 68. Show that R(A) is a subspace.

Definition 40 (Left nullspace). N(A^H) is called the left nullspace of A.

Definition 41 (Row space). R(A^H) is called the row space of A.

Exercise 69. Show that the intersection of two subspaces is a subspace.

Exercise 70. Show that the union of two subspaces need not be a subspace.
Definition 42 (Sums of sets). Let W_1 and W_2 be two subsets of the vector space V. W_1 + W_2 is defined to be the set of all vectors of the form w_1 + w_2, where w_1 ∈ W_1 and w_2 ∈ W_2.

Exercise 71. Show that W_1 + W_2 is a subspace if W_1 and W_2 are subspaces.

Exercise 72. Let W_1 and W_2 be subspaces. Show that W_1 + W_2 is the smallest subspace that contains W_1 ∪ W_2.

Exercise 73. Show that R(( A  B )) = R(A) + R(B).

Exercise 74. Show that

    N([ A ; B ]) = N(A) ∩ N(B).

Definition 43 (Direct sum). If W_1 and W_2 are subspaces with W_1 ∩ W_2 = {0}, then W_1 + W_2 is written as W_1 ⊕ W_2, and it is called the direct sum of W_1 and W_2.
Definition 44 (Linear combination). If v_1, v_2, ..., v_k (0 < k < ∞) are vectors and α_1, α_2, ..., α_k are scalars, then the vector Σ_{i=1}^{k} α_i v_i is called a linear combination of the vectors v_1, v_2, ..., v_k.

Note that we can write this as

    Σ_{i=1}^{k} α_i v_i = ( v_1  v_2  ...  v_k ) [ α_1 ; α_2 ; ... ; α_k ].

So matrix-vector multiplication results in a linear combination of the columns of the matrix. Note that the matrix containing the vectors v_i must be viewed only as a block matrix, since the vectors v_i are abstract at this point. However, from now on we will allow such abstract block matrix notation where convenient.

Definition 45 (Span). The span of a set of vectors v_1, v_2, ..., v_k is defined to be the set of all possible linear combinations of v_1, v_2, ..., v_k.

Exercise 75. Show that span{v_1, v_2, ..., v_k} is a subspace.

Exercise 76. Show that span{v_1, v_2, ..., v_k} is the smallest subspace that contains v_1, v_2, ..., v_k.

Exercise 77. Show that span{v_1, v_2, ..., v_k} = R(( v_1  v_2  ...  v_k )).

Spans are a compact means of specifying a subspace. However, they are not necessarily the most compact.
Definition 46 (Linear Independence). A set of vectors v_1, v_2, ..., v_k is said to be linearly independent if the equation

    α_1 v_1 + α_2 v_2 + ... + α_k v_k = 0

has only the zero solution α_1 = α_2 = ... = α_k = 0.

Definition 47 (Linear Dependence). A set of vectors v_1, v_2, ..., v_k is said to be linearly dependent if they are not linearly independent.

Exercise 78. Show that v_1, v_2, ..., v_k are linearly independent iff N(( v_1  v_2  ...  v_k )) = {0}.

Exercise 79. Let

    A = [ L ; X ],

where L is a lower-triangular matrix. Show that the columns of A are linearly independent if the diagonal entries of L are non-zero.
Definition 48 (Basis). A set of vectors v_1, v_2, ..., v_k is a basis for a subspace W if span{v_1, v_2, ..., v_k} = W and the vectors v_1, v_2, ..., v_k are linearly independent.

Definition 49 (Dimension). Suppose a subspace W has a basis with k vectors. Then k is called the dimension of W and denoted by dim(W) = k.

Implicit in the above definition is that the dimension of a subspace does not depend on the choice of basis. We prove this now. Assume to the contrary that the subspace W has v_1, v_2, ..., v_k as one basis, and w_1, w_2, ..., w_r as a second basis with r < k < ∞. It follows from the properties of a basis that there is an r × k matrix X such that

    ( v_1  v_2  ...  v_k ) = ( w_1  w_2  ...  w_r ) X.

Since X is fat, N(X) ≠ {0}.

Exercise 80. Why?

Let 0 ≠ z ∈ N(X). Then it follows that

    ( v_1  v_2  ...  v_k ) z = ( w_1  w_2  ...  w_r ) X z = 0.

Hence v_1, v_2, ..., v_k are not linearly independent, giving a contradiction.

Exercise 81. Let A be an m × n matrix. Find bases for

    R(A),  N(A),  R(A^H),  N(A^H)

explicitly using the LU factorization of A (only). From this establish that

    dim(R(A)) = dim(R(A^H)) = rank(A)
    dim(N(A)) + rank(A) = n.

The last formula is called the rank-nullity theorem.

Exercise 82. Show that dim(F^n) = n.

Exercise 83. Show that dim(F^{m×n}) = mn.

Exercise 84. Let F^∞ denote the set of column vectors with elements drawn from F and indexed by 1, 2, .... Show that dim(F^∞) is not finite.

Exercise 85. Show that for every matrix A there are two full column-rank matrices X and Y with the same rank as A, such that A = X Y^H.
3.3 Lengths
Definition 50 (Norm). A norm, denoted by ||·||, is a function from a vector space V over F to R that satisfies the following properties:

    ||v|| ≥ 0 for all v ∈ V (positive semi-definiteness)
    ||v|| = 0 iff v = 0 (positive definiteness)
    ||αv|| = |α| ||v|| for all α ∈ F and all v ∈ V (homogeneity)
    ||v + w|| ≤ ||v|| + ||w|| for all v, w ∈ V (triangle inequality)

Exercise 86. Show that | ||v|| − ||w|| | ≤ ||v − w||.

Exercise 87. Show that norms are continuous functions on F^n. Hint: Let e_i denote a basis for F^n. Then

    ||v − w|| ≤ Σ_{i=1}^{n} |v_i − w_i| ||e_i|| ≤ constant · max_{1≤i≤n} |v_i − w_i|.
Definition 51 (Unit Ball). The set of vectors with norm at most 1 is called the unit ball of that norm.

Definition 52 (Unit Sphere). The set of vectors with norm exactly 1 is called the unit sphere of that norm.
Definition 53 (Convex sets). A set of vectors in a vector space V is said to be convex if for every pair of vectors v and w in the set, and every 0 ≤ λ ≤ 1, the vector λv + (1 − λ)w is also in the set.

Exercise 88. Show that the intersection of two convex sets is convex.

Exercise 89. Show that the sum of two convex sets is convex.

Exercise 90. Show that the unit ball of a norm is a convex set.

Definition 54 (Convex function). A function f from a vector space to R is said to be convex if

    f(λv + (1 − λ)w) ≤ λ f(v) + (1 − λ) f(w)

for all vectors v and w and 0 ≤ λ ≤ 1.

Exercise 91. Show that if f is a convex function then {v : f(v) ≤ α} is a convex set for all α.

Exercise 92. By considering the function e^x show that the converse is not true.

Exercise 93. Show that ||·|| is a convex function.
We claim that if f : V → R is a function that satisfies the following conditions

    f(v) ≥ 0 for all v ∈ V
    f(v) = 0 iff v = 0
    f(αv) = |α| f(v) for all α ∈ F and all v ∈ V
    the set {v : f(v) ≤ 1} is convex

then f defines a norm on V.

Exercise 94. Show that the ball of radius r, {v : f(v) ≤ r}, is convex.

Exercise 95. Show that f(λ f(x)y + (1 − λ) f(y)x) ≤ f(x) f(y) for all 0 ≤ λ ≤ 1. Hint: f(x)y lies in the ball of radius f(x)f(y).

Exercise 96. Finish the proof by picking λ = f(y)/(f(x) + f(y)) in the above inequality.

This shows that the triangle inequality requirement is equivalent to the convexity of the unit ball.
Definition 55 (p-norm). For x ∈ F^n the p-norm of x, for 1 ≤ p < ∞, is defined to be

    ||x||_p = ( Σ_{i=1}^{n} |x_i|^p )^{1/p}.

For p = ∞ we define the ∞-norm of x to be

    ||x||_∞ = max_{1≤i≤n} |x_i|.

Exercise 97. Show that

    lim_{p→∞} ||x||_p = ||x||_∞.

Exercise 98. Show that the function ||·||_p for 1 ≤ p ≤ ∞ satisfies the first three conditions for being a norm.
Exercise 99. Show that the sum of two convex functions is convex.

Assume that the function |x|^p is convex when 1 ≤ p < ∞. Or, better yet, prove it.

Exercise 100. Show that the function f_1(x) = |x_1|^p is convex if 1 ≤ p < ∞.

Exercise 101. Show that the function ||x||_p^p is convex if 1 ≤ p < ∞.

Exercise 102. Show that the maximum of two convex functions is convex.

Exercise 103. Show that ||x||_∞ is convex.

Now observe that the unit ball {x : ||x||_p ≤ 1} = {x : ||x||_p^p ≤ 1}. It follows that the unit balls for p-norms are convex. Hence, by exercise ??, we have established the triangle inequality for p-norms.
Definition 56 (Minkowski's inequality).

    ||x + y||_p ≤ ||x||_p + ||y||_p,    1 ≤ p ≤ ∞.

The case p = 2 is called the Euclidean norm. Observe that

    ||x||_2 = sqrt(x^H x).

Definition 57 (Equivalence of norms). Let ||·||_α and ||·||_β be two norms on a vector space V. The two norms are said to be equivalent if there exist two positive finite constants c_1 and c_2 such that

    c_1 ||v||_α ≤ ||v||_β ≤ c_2 ||v||_α,    for all v ∈ V.
Theorem 1. All norms on finite dimensional vector spaces are equivalent.

Proof. Since norms are continuous functions it follows that the unit sphere is closed.

Exercise 104. Show that the unit sphere is closed.

Since V is assumed to be finite dimensional the unit sphere is compact.

Exercise 105. Why?

Therefore the continuous functions ||·||_α and ||·||_β must both achieve their minimum and maximum on the unit sphere. From this the existence of the positive finite constants c_1 and c_2 follows. (Why?)

Exercise 106. Show that for x ∈ F^n, ||x||_p ≤ ||x||_q for 1 ≤ q ≤ p ≤ ∞.

Exercise 107. Show that for x ∈ F^n, ||x||_2 ≤ sqrt(||x||_1 ||x||_∞).

Exercise 108. Establish the following inequalities for x ∈ F^n:

    ||x||_1 ≤ sqrt(n) ||x||_2
    ||x||_1 ≤ n ||x||_∞
    ||x||_2 ≤ sqrt(n) ||x||_∞

Hint: For the first inequality use the fact that 2xy ≤ |x|^2 + |y|^2.
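The inequalities of Exercises 106-108, and the limit of Exercise 97, are easy to check numerically for a particular vector. A small sanity-check sketch (assuming NumPy is available; a check, not a proof):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.standard_normal(10)
    n = x.size
    norm = lambda v, p: np.linalg.norm(v, p)
    assert norm(x, 1) <= np.sqrt(n) * norm(x, 2) + 1e-12
    assert norm(x, 1) <= n * norm(x, np.inf) + 1e-12
    assert norm(x, 2) <= np.sqrt(n) * norm(x, np.inf) + 1e-12
    # ||x||_p is non-increasing in p and approaches ||x||_inf (Exercises 97, 106):
    print([round(norm(x, p), 4) for p in (1, 2, 4, 16, 64)], round(norm(x, np.inf), 4))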
3.4 Angles
Pythagorean Theorem: If x and y are two perpendicular vectors (whatever that means), they should form a right-angle triangle with x + y as the hypotenuse. Then the Pythagorean Theorem would imply that

    ||x + y||_2^2 = ||x||_2^2 + ||y||_2^2.

Simplifying this using the fact that ||x||_2^2 = x^H x, we obtain x^H y = 0.

Definition 58 (Orthogonal). Two vectors x and y in F^n are said to be (mutually) orthogonal if x^H y = 0. This is denoted by x ⊥ y.

More generally, for vectors in R^n, we define the angle θ between two vectors x and y via the formula

    cos θ = x^T y / (||x||_2 ||y||_2).

There are many ways to justify this choice. One supporting fact is the Cauchy-Buniakowsky-Schwartz (CBS) inequality.

Definition 59 (CBS inequality).

    |x^H y| ≤ ||x||_2 ||y||_2.

Exercise 109. Given x and y from F^n, find α* such that

    ||x + α* y||_2 ≤ ||x + α y||_2    for all α ∈ F.

Exercise 110. Starting from

    ||x + α* y||_2 ≥ 0,

derive the CBS inequality.

The CBS inequality is a special case of the Hölder inequality.

Definition 60 (Hölder inequality).

    |x^H y| ≤ ||x||_p ||y||_q,    1/p + 1/q = 1.

Exercise 111. Prove the Hölder inequality when p = 1 and q = ∞.
Proof of Hölder inequality. Note that −ln x is convex on (0, ∞). Hence, for x > 0 and y > 0,

    −ln(λx + (1 − λ)y) ≤ −λ ln x − (1 − λ) ln y.

Or, equivalently,

    λ ln x + (1 − λ) ln y ≤ ln(λx + (1 − λ)y).

Exponentiating both sides we obtain

    x^λ y^{1−λ} ≤ λx + (1 − λ)y.                                             (3.1)

Therefore it follows that, with λ = 1/p and 1 − λ = 1/q,

    ( |x_i|^p / ||x||_p^p )^{1/p} ( |y_i|^q / ||y||_q^q )^{1/q} ≤ (1/p) |x_i|^p / ||x||_p^p + (1/q) |y_i|^q / ||y||_q^q.

Summing both sides from 1 to n the Hölder inequality is derived.

Exercise 112. Show that for x ∈ F^n

    ||x||_2 ≤ sqrt(||x||_p ||x||_q),    1/p + 1/q = 1.
Exercise 113. Show that

    ||x||_p = sup_{0 ≠ y ∈ F^n} |x^H y| / ||y||_q,    1/p + 1/q = 1.

For this reason ||·||_p and ||·||_q are called dual norms whenever p + q = pq. ||·||_2 is the only self-dual norm among the lot and plays a prominent role.
3.5 Matrix Norms
Definition 61 (Trace). The trace of a square matrix is defined to be the sum of its diagonal elements.

Exercise 114. Show that trace(A + B) = trace(A) + trace(B).

Exercise 115. Show that trace(AB) = trace(BA).

Definition 62 (Frobenius norm). The Frobenius norm of a matrix A, denoted by ||A||_F, is defined to be sqrt(trace(A^H A)).

Exercise 116. Show that

    ||A||_F^2 = Σ_{i=1}^{m} Σ_{j=1}^{n} |A_{i,j}|^2.

Exercise 117. Show that the Frobenius norm satisfies all the properties of a norm.

Definition 63 (Induced matrix norm). Let ||·||_α be a norm on F^n and let ||·||_β be a norm on F^m. On F^{m×n} define the norm

    ||A||_{α,β} = sup_{0 ≠ x ∈ F^n} ||Ax||_β / ||x||_α.

Exercise 118. Show that ||·||_{α,β} satisfies all the properties of a norm.

Exercise 119. Show that

    ||Ax||_β ≤ ||A||_{α,β} ||x||_α.

Definition 64 (Induced matrix p-norms). For A ∈ F^{m×n} we define the p-norm of A to be

    ||A||_p = sup_{0 ≠ x ∈ F^n} ||Ax||_p / ||x||_p,    1 ≤ p ≤ ∞.

Exercise 120. For x ∈ F^{m×1} show that the vector p-norm and matrix p-norm give identical values.

Exercise 121. Show that for A ∈ F^{m×n}

    ||A||_1 = max_{1≤j≤n} Σ_{i=1}^{m} |A_{i,j}|.

Exercise 122. Show that for A ∈ F^{m×n}

    ||A||_∞ = max_{1≤i≤m} Σ_{j=1}^{n} |A_{i,j}|.
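The closed forms of Exercises 121-122 (maximum absolute column sum and maximum absolute row sum) can be compared against a crude sampled estimate of the supremum. A minimal sketch, assuming NumPy is available:

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 6))
    col_sum = np.abs(A).sum(axis=0).max()          # claimed value of ||A||_1
    row_sum = np.abs(A).sum(axis=1).max()          # claimed value of ||A||_inf
    print(np.isclose(col_sum, np.linalg.norm(A, 1)),
          np.isclose(row_sum, np.linalg.norm(A, np.inf)))
    # Sampled ratios ||Ax||_1 / ||x||_1 never exceed the column-sum formula:
    xs = rng.standard_normal((6, 2000))
    est = np.max(np.abs(A @ (xs / np.abs(xs).sum(axis=0))).sum(axis=0))
    print(est <= col_sum + 1e-12)                  # True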
Exercise 123 (Sub-multiplicative property). Show that

    ||AB||_p ≤ ||A||_p ||B||_p.

Exercise 124. Establish the following inequalities for A ∈ F^{m×n}:

    ||A||_1 ≤ m ||A||_∞            ||A||_∞ ≤ n ||A||_1
    ||A||_1 ≤ sqrt(m) ||A||_2      ||A||_2 ≤ sqrt(n) ||A||_1.

Hint: The corresponding inequalities for vector norms might prove useful.

Exercise 125. Show that for A ∈ F^{m×n}

    ||A||_2 = sup_{0 ≠ y ∈ F^m, 0 ≠ x ∈ F^n} |y^H A x| / (||y||_2 ||x||_2).

Exercise 126. Show that ||A||_2 = ||A^H||_2.

Exercise 127. Show that ||AB||_F ≤ min{||A||_2 ||B||_F, ||A||_F ||B||_2}.

Exercise 128. Show that ||A||_2 ≤ ||A||_F.

Exercise 129. Show that the Frobenius norm is sub-multiplicative.

Exercise 130. Show that for A ∈ F^{m×n}

    ||A||_p = sup_{0 ≠ y ∈ F^m, 0 ≠ x ∈ F^n} |y^H A x| / (||y||_q ||x||_p),    1/p + 1/q = 1.

Exercise 131. Show that ||A||_p = ||A^H||_q when pq = p + q.

An important, but little known result, is one of Holmgren's:

    ||A||_2^2 ≤ ||A||_1 ||A||_∞.

Exercise 132. Show that for c > 0,

    xy ≤ (c/2) x^2 + (1/(2c)) y^2,

and that the bound is attained for some c > 0 when x, y > 0.
Since, for x ∈ F^n and y ∈ F^m,

    |y^H A x| ≤ Σ_{i=1}^{m} Σ_{j=1}^{n} |A_{i,j}| |y_i| |x_j| ≤ Σ_{i=1}^{m} Σ_{j=1}^{n} |A_{i,j}| ( (c/2) |y_i|^2 + (1/(2c)) |x_j|^2 ),

whence

    |y^H A x| ≤ (c/2) ||A||_∞ ||y||_2^2 + (1/(2c)) ||A||_1 ||x||_2^2.

Therefore, using the achievability of the bound of Exercise 132, we can conclude that

    |y^H A x| ≤ ||x||_2 ||y||_2 sqrt(||A||_1 ||A||_∞),

from which Holmgren's result follows.

Exercise 133. Why?
3.6 Riesz-Thorin
Holmgren's result is a special case of a result of M. Riesz. Due to an elegant proof of Thorin it is called the Riesz-Thorin interpolation theorem. We present a specialized version of the result.

Definition 65 (Riesz-Thorin interpolation theorem).

    ||A||_{p(a)} ≤ ||A||_{p_0}^{1−a} ||A||_{p_1}^{a},    where    1/p(a) = (1 − a)/p_0 + a/p_1,    0 ≤ a ≤ 1.

We give a brief and dirty review of the needed complex analysis. For the next few exercises engineering proofs are good enough, as a lot more work is needed to enable rigorous proofs.
Definition 66 (Taylor series). A formal series of the form

    Σ_{n=0}^{∞} a_n (z − a)^n

is called a Taylor series about the point a ∈ C.

Definition 67 (Radius of Convergence). The radius of convergence of a Taylor series Σ_{n=0}^{∞} a_n (z − a)^n is a number R, possibly infinite, such that

    Σ_{n=0}^{∞} |a_n| |z − a|^n < ∞

whenever |z − a| < R.

Let Ω denote an open set in C. We assume that the boundary of Ω is a piece-wise smooth curve that is simply connected.

Definition 68 (Analytic). A function f is said to be analytic in Ω if at every point a ∈ Ω it has a Taylor series representation, f(z) = Σ_{n=0}^{∞} a_n (z − a)^n, with a non-zero radius of convergence.

Definition 69 (e^z). Let

    e^z = Σ_{n=0}^{∞} z^n / n!.

Exercise 134. Show that e^z is analytic in C.

Exercise 135. Let Γ denote the circle |z − a| = R, such that Γ ⊂ Ω. Let f be analytic in Ω. Show that

    ∮_Γ f(z) dz = 0.

Hint: Take z − a = R e^{iθ} and dz = R i e^{iθ} dθ and write it as an ordinary integral over 0 ≤ θ ≤ 2π.
Exercise 136. Show that

    f(a) = (1 / (2πi)) ∮_Γ f(z) / (z − a) dz.

This is called Cauchy's integral formula. Hint: Use a Taylor series expansion for f and integrate term-by-term.

Exercise 137. Show that

    |f(a)| ≤ max_{|z−a|=R} |f(z)|.

Exercise 138. Show that |f(z)| must attain its maximum (and minimum) on the boundary of Ω. This is called the maximum principle.

This is the end of the review, as all we needed was the maximum principle. You should be able to give complete proofs from now on.
For the rest of this section let Ω be the strip 0 ≤ Re z ≤ 1.

Exercise 139. Show that |e^{αz}|, with real α, must achieve its maximum and minimum in Ω (independently) on one of the lines Re(z) = 0 or Re(z) = 1. This does not require the maximum principle.

Exercise 140. Show that |Σ_{k=1}^{N} z_k e^{α_k z}|, with real α_k, achieves its maximum on one of the lines Re(z) = 0 or Re(z) = 1.

Definition 70 (Hadamard's three lines lemma). Let f(z) be analytic in an open set containing Ω. Let

    F(a) = sup_y |f(a + iy)|,    0 ≤ a ≤ 1.

Then

    F(a) ≤ F(0)^{1−a} F(1)^{a}.
Proof of three lines lemma. Let

    φ(z) = f(z) e^{z log(F(0)/F(1))}.

Clearly φ is analytic in an open set containing Ω. By the maximum principle |φ(z)| ≤ F(0) on Ω. Therefore

    |f(a + iy)| ≤ e^{−a log(F(0)/F(1))} F(0),

and from this the three lines lemma follows.

Exercise 141. Why?
We note that

    ||A||_p = sup_{x,y ≠ 0} |y^H A x| / (||y||_q ||x||_p),    1/p + 1/q = 1.

Let

    1/p(z) = (1 − z)/p_0 + z/p_1,

and

    1/p(z) + 1/q(z) = 1.

Observe that

    1/q(z) = (1 − z)/q(0) + z/q(1).

Exercise 142. Prove it.

Let ||x||_{p(a)} = ||y||_{q(a)} = 1. Let x_k = |x_k| e^{iθ_k} and y_k = |y_k| e^{iφ_k}. Define

    x_k(z) = |x_k|^{p(a)/p(z)} e^{iθ_k}    and    y_k(z) = |y_k|^{q(a)/q(z)} e^{iφ_k}.

Define

    f(z) = y^H(z) A x(z).

Note that 1/p(z) and 1/q(z) are linear functions of z, and hence analytic in z. Therefore x(z) and y(z), and hence f(z), are also analytic functions of z.

Exercise 143. Prove it.

As before let F(a) = sup_y |f(a + iy)|. Then it is true that

    F(0) ≤ ||A||_{p_0}    and    F(1) ≤ ||A||_{p_1}.                           (3.2)
To prove these we first observe that

    Re( 1/p(x + iy) ) = (1 − x)/p_0 + x/p_1 = 1/p(x).

Exercise 144. Prove it.

Hence it also follows that

    Re( 1/q(x + iy) ) = 1/q(x).

Therefore we can conclude that ||x(α + iβ)||_{p(α)} = ||x(α)||_{p(α)}. Similarly ||y(α + iβ)||_{q(α)} = ||y(α)||_{q(α)}.

Exercise 145. Prove it.

Next we note that ||x(0)||_{p_0}^{p_0} = ||x(a)||_{p(a)}^{p(a)} = 1 = ||x(1)||_{p_1}^{p_1}. Similarly ||y(0)||_{q(0)}^{q(0)} = ||y(a)||_{q(a)}^{q(a)} = 1 = ||y(1)||_{q(1)}^{q(1)}.

Exercise 146. Prove it.

From this it follows, using Hölder's inequality, that

    F(0) = sup_β |f(iβ)| ≤ sup_β ||y(iβ)||_{q(0)} ||A||_{p_0} ||x(iβ)||_{p_0} = ||A||_{p_0}.

Similarly we can establish that

    F(1) ≤ ||A||_{p_1}.

Now choose x and y such that f(a) = ||A||_{p(a)}, in addition to the fact that ||x||_{p(a)} = ||y||_{q(a)} = 1. Then it follows that

    F(a) = sup_b |f(a + ib)| ≥ |f(a)| = ||A||_{p(a)}.

Now apply the three lines lemma to obtain the Riesz-Thorin theorem.

Exercise 147. Do so.

For finite-dimensional matrices Holmgren's result is more than sufficient in practice. The Riesz-Thorin result exhibits its power in the infinite-dimensional case, where one or both of the 1-norm and the ∞-norm may be infinite.
3.7 Perturbed inverses
We will now show that A^{-1} is a continuous function of its entries. There are several ways to establish this fact. We will take a route via Neumann's theorem that is useful in its own right.
Definition 71 (Convergence of matrix sequences). Let A_n, for n = 1, 2, ..., denote a sequence of m × n matrices. We say that lim_{n→∞} A_n = A if every component of A_n converges to the corresponding component of A. In other words, convergence of a matrix sequence is defined component-wise.

Exercise 148. Show that lim_{n→∞} A_n = A iff lim_{n→∞} ||A_n − A|| = 0, for any valid matrix norm. Note that this is not true for matrices of infinite size.

Definition 72 (Convergence of matrix sums). We say that Σ_{n=1}^{∞} A_n = A if lim_{N→∞} S_N = A, with S_N = Σ_{n=1}^{N} A_n.

Just like infinite sums of numbers, convergence of infinite matrix sums can be delicate.

Exercise 149 (Riemann's theorem). Show that by re-ordering the sum Σ_{n=1}^{∞} (−1)^n / n you can make it converge to any real number.

This cannot happen if the series converges absolutely. Geometrically, if you think of the series as a string with marks on it corresponding to the individual terms, bad things can happen only if the string has infinite length.
Definition 73 (Absolute convergence). We say that Σ_{n=1}^{∞} A_n converges absolutely if Σ_{n=1}^{∞} ||A_n|| < ∞, for some matrix norm.

Exercise 150. Show that if Σ_{n=1}^{∞} ||A_n|| < ∞ then there exists a finite matrix A such that Σ_{n=1}^{∞} A_n = A.

Definition 74 (Neumann's Theorem). Let A be a square matrix such that ||A|| < 1 for some induced matrix norm. It then follows that

    (I − A)^{-1} = Σ_{n=0}^{∞} A^n,

with absolute convergence of the series on the right.

Proof. This is just the matrix version of the geometric series.

Exercise 151. Show that for |z| < 1, (1 − z)^{-1} = Σ_{n=0}^{∞} z^n, with the series converging absolutely.

Exercise 152. Show that Σ_{n=1}^{∞} A^n converges absolutely since ||A|| < 1.
The only question is whether it converges to (I − A)^{-1}. First we prove the required inverse exists. Suppose it does not. Then there exists a vector x with ||x|| = 1 such that Ax = x. (Why?)

Exercise 153. Show that this implies that ||A|| ≥ 1, which is a contradiction.

It follows that I − A is invertible.

Exercise 154. Suppose Σ_{n=1}^{∞} A_n and Σ_{n=1}^{∞} B_n are two absolutely converging matrix series. Show that

    Σ_{n=1}^{∞} A_n + Σ_{n=1}^{∞} B_n = Σ_{n=1}^{∞} (A_n + B_n)
    C Σ_{n=1}^{∞} A_n = Σ_{n=1}^{∞} C A_n

Exercise 155. Show that (I − A) Σ_{n=0}^{∞} A^n = I.

Exercise 156. Show that if A = Σ_{n=1}^{∞} A_n then ||A|| ≤ Σ_{n=1}^{∞} ||A_n||.

Exercise 157. Show that if ||A|| < 1 for some induced matrix norm then ||(I − A)^{-1}|| ≤ (1 − ||A||)^{-1}.
Exercise 158. Let ||A^{-1}|| ||E|| < 1 for some induced matrix norm. Show that A + E is non-singular and that

    ||(A + E)^{-1} − A^{-1}|| / ||A^{-1}|| ≤ ( ||A|| ||A^{-1}|| ) ( ||E|| / ||A|| ) ( 1 / (1 − ||A^{-1}|| ||E||) ).

The factor κ(A) = ||A|| ||A^{-1}|| is called the condition number of the matrix A and it is the amplification factor for the norm-wise relative error in A^{-1} due to relative norm-wise perturbations in A. In general, linear systems with large condition numbers are difficult to solve accurately on floating-point machines. It is something that one should always be aware of.
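A small numerical sketch of the two ideas in this section, assuming NumPy is available: partial sums of the Neumann series converge to (I − A)^{-1} when ||A|| < 1, and the condition number is simply the product ||A|| ||A^{-1}||.

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((5, 5))
    A *= 0.5 / np.linalg.norm(A, 2)                    # scale so that ||A||_2 = 0.5 < 1
    S, term = np.eye(5), np.eye(5)
    for _ in range(60):                                # partial sums of sum A^n
        term = term @ A
        S += term
    print(np.allclose(S, np.linalg.inv(np.eye(5) - A)))        # True up to rounding

    B = np.eye(5) - A
    kappa = np.linalg.norm(B, 2) * np.linalg.norm(np.linalg.inv(B), 2)
    print(np.isclose(kappa, np.linalg.cond(B, 2)))             # same condition number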
4 Orthogonality
The fact that the vector 2-norms are related to matrix multiplication leads to a
powerful algebraic technique.
4.1 Unitary Matrices
Definition 75 (Orthonormal). A set of column vectors v_i is said to be orthonormal if ||v_i||_2 = 1 and v_i^H v_j = 0 for i ≠ j.

Definition 76 (Unitary Matrix). A square matrix U is said to be unitary if U^H U = I.

Definition 77 (Orthogonal Matrix). A real unitary matrix is called an orthogonal matrix.

Exercise 159. Show that if the matrix U is unitary then U U^H = I.

Exercise 160. Show that the rows of a unitary matrix form an orthonormal set.

Exercise 161. Show that the columns of a unitary matrix form an orthonormal set.

Exercise 162. Show that the product of two unitary matrices is unitary.

Exercise 163. Let U be an n × n unitary matrix. Show that for x, y ∈ R^n, y^H x = (Uy)^H (Ux). Therefore unitary transforms preserve inner products. Conclude that unitary transforms preserve 2-norms and angles of column vectors.

Exercise 164. Show that ||UAV||_F = ||A||_F, if U and V are unitary transforms.

Exercise 165. Show that ||UAV||_2 = ||A||_2, if U and V are unitary transforms.

Exercise 166. Show that permutation matrices are orthogonal matrices.

Definition 78 (Householder Transform). A matrix of the form I − 2 v v^H / (v^H v) is called a Householder transform, where v is a non-zero column vector.

Exercise 167. Show that a Householder transform is a Hermitian unitary matrix.

Exercise 168. Consider the Householder transform H = I − 2 v v^H / (v^H v). Show that Hv = −v. Show that if x^H v = 0, then Hx = x.

Exercise 169. Explain why the Householder transform is called an elementary reflector.

Exercise 170. Let x, y ∈ R^n. Show, by construction, that there is a Householder transform H such that Hx = y, if ||x||_2 = ||y||_2.
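One possible construction for Exercise 170 (real case) is to reflect across the bisector of x and y by taking v = x − y. A minimal sketch, assuming NumPy is available; the function name householder_mapping is mine:

    import numpy as np

    def householder_mapping(x, y):
        """Householder transform H with H x = y, assuming ||x||_2 = ||y||_2 (real vectors)."""
        v = x - y
        if np.allclose(v, 0):
            return np.eye(x.size)                     # x == y: the identity works
        return np.eye(x.size) - 2.0 * np.outer(v, v) / (v @ v)

    x = np.array([3.0, 4.0, 0.0])
    y = np.array([0.0, 0.0, 5.0])                     # same 2-norm as x
    H = householder_mapping(x, y)
    print(np.allclose(H @ x, y), np.allclose(H @ H.T, np.eye(3)))   # True True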
Elementary Gauss and Householder transforms are the main ingredients for the
algorithmic construction of matrix decompositions.
4.2 The Singular Value Decomposition
Or, the SVD, is the sledge-hammer that solves all problems in matrix analysis (or something like that).

Exercise 171. Show that for A ∈ C^{m×n}

    ||A||_2 = sup_{||x||_2 = ||y||_2 = 1} |y^H A x|.

Exercise 172. Since the unit spheres for the 2-norm in C^n and C^m are compact, and matrix products are continuous functions, show that there exist x ∈ C^n and y ∈ C^m such that ||x||_2 = ||y||_2 = 1 and Ax = ||A||_2 y.
Definition 79 (SVD). For every m × n matrix A there exist unitary matrices U and V and a matrix Σ ∈ R^{m×n} of the form

    Σ = [ σ_1  0  ... ; 0  σ_2  ... ; ...  ...  ... ]    (diagonal entries σ_i, zeros elsewhere),

with σ_1 ≥ σ_2 ≥ ... ≥ σ_{min(m,n)} ≥ 0, such that A = U Σ V^H.
Proof. Let ||x||_2 = 1 = ||y||_2 be such that Ax = ||A||_2 y. Let H_1 and H_2 be two Householder transforms such that H_1 x = e_1 and H_2 y = e_1, where e_i denotes column i of the appropriate identity matrix. Now we claim that

    H_2 A H_1^H = [ ||A||_2  b^H ; 0  C ].

Exercise 173. Prove it.

Next we note that b = 0. To prove this first note that ||H_2 A H_1^H||_2 = ||A||_2 since H_1 and H_2 are unitary.

Exercise 174. Show that

    || [ ||A||_2  b^H ; 0  C ] [ ||A||_2 ; b ] ||_2 ≥ || [ ||A||_2 ; b ] ||_2 · sqrt(||A||_2^2 + ||b||_2^2).

But this would imply that ||H_2 A H_1^H||_2 > ||A||_2 unless b = 0. Hence we have that
    H_2 A H_1^H = [ ||A||_2  0 ; 0  C ].

Clearly we can take ||A||_2 = σ_1 in the proof. To finish we can proceed by induction. Assuming that we have SVDs for all matrices of size (m−1) × (n−1) and smaller, let C = U_1 Σ_1 V_1^H be the SVD of C. Then it is clear that

    A = H_2^H [ 1  0 ; 0  U_1 ] [ ||A||_2  0 ; 0  Σ_1 ] [ 1  0 ; 0  V_1 ]^H H_1,

where U = H_2^H [ 1 0 ; 0 U_1 ], Σ is the middle factor, and V^H = [ 1 0 ; 0 V_1 ]^H H_1.

Exercise 175. Check that U and V in the above formula are unitary and that Σ has the desired diagonal structure with real non-negative entries on the main diagonal.

For the base case of the induction it is sufficient to write down the SVD of an empty (either rows or columns) matrix: A = I 0 I^H.

Exercise 176. Check that this base case is sufficient.
The only thing left to check is that the diagonal entries in Σ are in decreasing order. The easy way out is to say that if they are not in decreasing order then we can apply two permutation matrices from the left and right to correct the order and note that permutations are unitary. But it is more informative to note instead that ||C||_2 ≤ ||A||_2. This follows from the following more general fact.

Exercise 177. Show that

    || [ A  0 ; 0  B ] ||_p = max( ||A||_p, ||B||_p ),

for 1 ≤ p ≤ ∞.

The columns of U are called the left singular vectors of A, while the columns of V are called the right singular vectors. The σ_i are called the singular values of A.
Exercise 178. Let

    A = [ a_{11}  0  ... ; 0  a_{22}  ... ; ...  ...  ... ] ∈ F^{m×n}    (a diagonal, possibly rectangular, matrix).

Show that ||A||_p = max_{1≤i≤min(m,n)} |a_{ii}| for 1 ≤ p ≤ ∞.

Exercise 179. Show that ||A||_2 = σ_1 and ||A||_F^2 = σ_1^2 + ... + σ_{min(m,n)}^2, where σ_i are the singular values of A.
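The two identities of Exercise 179 can be checked numerically in a couple of lines, assuming NumPy is available:

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((6, 4))
    s = np.linalg.svd(A, compute_uv=False)                   # singular values, descending
    print(np.isclose(np.linalg.norm(A, 2), s[0]))            # ||A||_2 = sigma_1
    print(np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(s ** 2)))   # ||A||_F^2 = sum sigma_i^2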
4.3 Orthogonal Subspaces
Definition 80 (Orthogonal subspaces). Two subspaces U and W of F^n are said to be orthogonal to each other if every vector in U is orthogonal to every vector in W. This is denoted by U ⊥ W.

Exercise 180. Show that U ∩ W = {0} if U ⊥ W.

Definition 81 (Orthogonal Complement). The orthogonal complement of the set U is the set of all vectors that are orthogonal to all vectors in U. It is denoted by U^⊥.

Exercise 181. Show that U ⊥ U^⊥.

Exercise 182. Let U = ( U_1  U_2 ) be an n × n unitary matrix. Show that

    the columns of U_1 form an orthonormal basis for R(U_1);
    R(U_1) = R(U_2)^⊥;
    (R(U_1)^⊥)^⊥ = R(U_1);
    R(U_1) ⊕ R(U_1)^⊥ = C^n.
Let the SVD of A be partitioned as follows:

    A = U Σ V^H = ( U_1  U_2 ) [ Σ_1  0 ; 0  0 ] ( V_1  V_2 )^H,

where Σ_1 ∈ R^{r×r} is a non-singular diagonal matrix. That is, σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0.

Exercise 183. Show that A = U_1 Σ_1 V_1^H.

This is sometimes called the economy SVD of A.

Exercise 184. Show that U_1 and V_1 are full column-rank matrices (rank r).

The SVD gives a full description of the geometry of the four fundamental subspaces associated with the matrix A.

Exercise 185. Show that

    A V_1 = U_1 Σ_1
    A V_2 = 0
    U_1^H A = Σ_1 V_1^H
    U_2^H A = 0
    R(A) = R(U_1)
    R(A^H) = R(V_1)
    R(V_2) = N(A)
    R(U_2) = N(A^H)
    R(A^H) = N(A)^⊥
    R(A)^⊥ = N(A^H)
    rank(A) = r, the number of non-zero singular values of A.
Exercise 186. Let U denote a subspace of C^n. Construct an orthonormal basis for U from one of its bases using the SVD.

Exercise 187. Let U be a subspace of C^n. Show that

    (U^⊥)^⊥ = U
    U ⊕ U^⊥ = C^n

Definition 82 (Orthogonal Projector). The orthogonal projector onto the subspace U is defined to be a linear operator P_U with the following properties:

    N(P_U) = U^⊥
    P_U u = u for all u ∈ U

Exercise 188. Show that orthogonal projectors are idempotent: P_U^2 = P_U.

Exercise 189. Show that P_U is unique for a given U.

Exercise 190. Let U = ( U_1  U_2 ) be a unitary matrix. Show that U_1 U_1^H = P_{R(U_1)}.

Exercise 191. Show that orthogonal projectors are Hermitian.

Exercise 192. Construct an idempotent matrix that is not an orthogonal projector. These are called oblique projectors.

Exercise 193. Let P be a Hermitian idempotent matrix. Show that P = P_{R(P)}.

Exercise 194. Let U be a subspace of C^n. Show that every x ∈ C^n has a unique decomposition of the form x = u + w where u ∈ U and w ∈ U^⊥. Hint: u = P_U x.

Exercise 195. Show that

    min_{u ∈ U} ||x − u||_2 = ||x − P_U x||_2.
4.4 Minimum norm least-squares solution
The LU factorization solved completely the question of finding all solutions of the system of equations Ax = b, where x is unknown. However there is something unsatisfactory in that solution. Generically, skinny systems will almost surely have no solutions, while fat systems will almost surely have infinitely many solutions. Since both these cases are frequent in engineering a more informative approach is necessary.

Exercise 196. Let x_LS be such that

    min_y ||Ay − b||_2 = ||A x_LS − b||_2.

Show that A x_LS = P_{R(A)} b, and hence is unique. Give an example where x_LS is not unique.

Let

    X_LS = { x : min_{y ∈ R^n} ||Ay − b||_2 = ||Ax − b||_2 }.

Definition 83 (Affine Linear). A subset X of a vector space V is said to be affine linear if there exists a vector v ∈ V such that the set {x − v : x ∈ X} is a subspace.

Exercise 197. Show that X_LS is an affine linear set.

Exercise 198. Show that there is a unique solution to

    min_{u ∈ X} ||x − u||_2,

where X is an affine linear set. Hint: Exercise 195.

Definition 84 (Minimum Norm Least Squares solution). Let

    x_MNLS = argmin_{x ∈ X_LS} ||x||_2.

Then x_MNLS is called the minimum norm least squares solution of the system of equations Ax = b.
Let A = U_r Σ_r V_r^H denote the economy SVD of A. Then

    x_MNLS = V_r Σ_r^{-1} U_r^H b.

Exercise 199. Prove it.

Definition 85 (Pseudo-inverse). Let

    Σ = [ Σ_r  0 ; 0  0 ]

with Σ_r a non-singular diagonal matrix. Then we define the pseudo-inverse of Σ (denoted by a superscript †) as

    Σ^† = [ Σ_r^{-1}  0 ; 0  0 ].

More generally, if A = U Σ V^H is the SVD of A we then define A^† = V Σ^† U^H. The above definition may be ambiguous since the SVD of A is not unique.

Exercise 200. Show that A^† = V_r Σ_r^{-1} U_r^H, using the economy SVD of A.

Therefore x_MNLS = A^† b. This can be used to define the pseudo-inverse uniquely.
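The formula x_MNLS = V_r Σ_r^{-1} U_r^H b is short enough to implement directly. A minimal sketch, assuming NumPy is available (the function name mnls_solve and the rank tolerance are my own choices); it agrees with NumPy's pinv-based solution:

    import numpy as np

    def mnls_solve(A, b, tol=1e-12):
        """Minimum norm least-squares solution via the economy SVD, x = V_r Sigma_r^{-1} U_r^H b."""
        U, s, Vh = np.linalg.svd(A, full_matrices=False)
        r = int(np.sum(s > tol * s[0])) if s.size else 0       # numerical rank
        return Vh[:r].conj().T @ ((U[:, :r].conj().T @ b) / s[:r])

    rng = np.random.default_rng(7)
    A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))   # 6 x 5, rank at most 4
    b = rng.standard_normal(6)
    print(np.allclose(mnls_solve(A, b), np.linalg.pinv(A) @ b))     # True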
Exercise 201. Show that

    A A^† = P_{R(A)}
    A^† A = P_{R(A^H)}
    A A^† A = A
    A^† A A^† = A^†

Roger Penrose showed that the pseudo-inverse is the unique solution to these four equations. However, we will take a different path.

Definition 86 (Linear Map). A map A : V → W between two vector spaces V and W over the field F is said to be linear if A(αx + βy) = αA(x) + βA(y) for all α, β ∈ F and all x, y ∈ V.

Definition 87 (Matrix Representation). Let A : V → W be a linear map between two vector spaces. Let v_1, ..., v_n be a basis for V. Let w_1, ..., w_m be a basis for W. Define the mn unique numbers A_{ij} by the equation A(v_j) = Σ_{i=1}^{m} w_i A_{ij}. Then we call A the matrix representation of A for the given bases.

Exercise 202. Why is A unique?

Exercise 203. Suppose x ∈ V and b ∈ W have the representations x = Σ_{j=1}^{n} x_j v_j and b = Σ_{i=1}^{m} b_i w_i, and A(x) = b. Then show that Ax = b.

Exercise 204. Let U, V and W be vector spaces over the field F. Let A : U → V and B : V → W be two linear maps. Show that B ∘ A : U → W is a linear map.
Exercise 205. If fixed bases are used for U, V and W, then show that BA is a matrix representation for B ∘ A.

Exercise 206. Show that A ∈ C^{m×n} is a one-to-one onto linear map from R(A^H) to R(A). Call this map B : R(A^H) → R(A).

Exercise 207. Show that in the appropriate bases for R(A^H) and R(A), Σ_r is a matrix representation of B.

Exercise 208. Define the map C : C^m → C^n as follows: C(b) = B^{-1}(P_{R(A)} b). Show that A^† is a matrix representation of C.

This shows that the pseudo-inverse is uniquely defined.

Exercise 209. Why?
4.5 Problems
The SVD usually costs about 10 times as much as an LU factorization. A good substitute is the QR factorization.

Problem 5. Let A ∈ C^{m×n} with m ≥ n. Show that there exists a unitary matrix Q such that

    A = Q [ R ; 0 ],

where R is upper triangular with non-negative diagonal entries. Hint: This is similar to the construction of the SVD, but simpler.

Problem 6. Let A be a full column-rank matrix. Show that

    A^† = (A^H A)^{-1} A^H = ( R^{-1}  0 ) Q^H.
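For a full column-rank A the formula of Problem 6 reduces, with the economy QR factorization A = QR (Q with n orthonormal columns), to A^† = R^{-1} Q^H. A quick numerical check, assuming NumPy is available:

    import numpy as np

    rng = np.random.default_rng(8)
    A = rng.standard_normal((7, 3))                 # almost surely full column rank
    Q, R = np.linalg.qr(A)                          # economy QR: Q is 7 x 3, R is 3 x 3
    pinv_qr = np.linalg.solve(R, Q.T)               # R^{-1} Q^H without forming R^{-1}
    print(np.allclose(pinv_qr, np.linalg.pinv(A)))  # matches the SVD-based pseudo-inverse
    print(np.allclose(pinv_qr @ A, np.eye(3)))      # and is a left inverse of A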
Problem 7. Let A ∈ C^{m×n} with n ≥ m. Show that there exists a unitary matrix Q such that

    A = ( L  0 ) Q,

where L is lower triangular with non-negative diagonal entries.

Problem 8. Let A be a full row-rank matrix. Show that

    A^† = A^H (A A^H)^{-1} = Q^H [ L^{-1} ; 0 ].

Problem 9. Find the shortest distance between two infinite straight lines in R^n.

Problem 10. Show that ||A||_F ≤ sqrt(rank(A)) ||A||_2.
5 Spectral Theory
In principle we have covered everything for solving systems of linear equations. However, our techniques (meaning LU factorization) do not generalize (yet?) to an infinite number of equations. A host of different techniques have been developed for handling this case. Spectral methods are among the most powerful of these.

Examples of infinite numbers of equations include differential and difference equations, and it was in their analysis that spectral theory was first born.

5.1 Spectral Decompositions

In this section, unless mentioned otherwise, all matrices will be assumed to be square.

Exercise 210. Show that dim(C^{n×n}) = n^2.

We will assume that A^0 = I and that A^{k+1} = A A^k for k ≥ 0. If A is invertible we will define A^{-k} = (A^{-1})^k for k ≥ 0.
Definition 88 (Polynomial of a matrix). Let p(x) = Σ_{n=0}^{N} a_n x^n. Define p(A) = Σ_{n=0}^{N} a_n A^n.

For this definition to be useful, we need to ensure that different ways of defining the same polynomial yield the same value when evaluated at a matrix. For example, if q and r are polynomials, we would like that q(A) r(A) = r(A) q(A) = (qr)(A) for all square matrices A.

Exercise 211. Prove that it is so.
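Evaluating p(A) is itself a small exercise in matrix arithmetic; Horner's rule does it with N matrix multiplications. A minimal sketch, assuming NumPy is available (the function name poly_of_matrix is mine), together with a numerical instance of the identity q(A) r(A) = r(A) q(A) = (qr)(A):

    import numpy as np

    def poly_of_matrix(coeffs, A):
        """Evaluate p(A) = sum_n a_n A^n by Horner's rule; coeffs lists a_0, a_1, ..., a_N."""
        P = np.zeros_like(A, dtype=float)
        for a in reversed(coeffs):
            P = P @ A + a * np.eye(A.shape[0])
        return P

    A = np.array([[1.0, 2.0], [0.0, 3.0]])
    # q(x) = x - 1, r(x) = x - 3, (qr)(x) = x^2 - 4x + 3:
    q = poly_of_matrix([-1, 1], A)
    r = poly_of_matrix([-3, 1], A)
    qr = poly_of_matrix([3, -4, 1], A)
    print(np.allclose(q @ r, qr), np.allclose(q @ r, r @ q))   # True True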
Lemma 1. For every square matrix A there is a complex number λ such that A − λI is singular.

Proof.

Exercise 212. For a given A ∈ C^{n×n}, show that there exist n^2 + 1 complex numbers α_i, for 0 ≤ i ≤ n^2, not all zero, such that

    Σ_{i=0}^{n^2} α_i A^i = 0.

Let p(x) = Σ_{i=0}^{n^2} α_i x^i be the corresponding polynomial. Let M ≥ 1 be its degree. (Why not 0?) It is well known that p can be factored as

    p(x) = c Π_{i=1}^{M} (x − λ_i),

for M complex numbers λ_i (possibly indistinct) and a non-zero scalar c. It follows that

    p(A) = c Π_{i=1}^{M} (A − λ_i I) = 0.

Exercise 213. Make sure you understand why exactly this is true.

Since the product of two square non-singular matrices is non-singular (why?), it follows that there exists some i for which A − λ_i I is singular.
Definition 89 (Schur decomposition). For every square matrix A there exists a unitary matrix Q and an upper-triangular matrix R such that A = Q R Q^H.

This is the computationally stable factorization in spectral theory, and hence of great practical significance.
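Because of that stability the Schur form is what numerical libraries actually compute. A minimal illustration, assuming SciPy is available (scipy.linalg.schur with output='complex' returns the complex Schur form):

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(9)
    A = rng.standard_normal((4, 4))
    R, Q = schur(A, output='complex')                        # A = Q R Q^H
    print(np.allclose(Q @ R @ Q.conj().T, A))                # reconstruction
    print(np.allclose(np.tril(R, -1), 0))                    # R is upper triangular
    print(np.allclose(Q.conj().T @ Q, np.eye(4)))            # Q is unitary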
Proof. The proof is by induction. For 1 1 matrices the theorem is obviously
true: A = IAI
H
. Assume it is true for all matrices of size (n 1) (n 1) or
smaller. Let A C
nn
. Let be a complex number such that A I is singular.
Let v N(A I) be of unit length: ||v||
2
= 1. Choose a Householder transform
H such that Hv = e
1
(where e
i
denotes column i of the identity matrix). Then it
is easy to see that
HAH
H
=
_
b
H
0 C
_
.
Prove it. Exercise 214
By the inductive assumption C = Q
1
R
1
Q
H
1
, where Q
1
is unitary and R
1
is upper
triangular. It follows that
A = H
H
_
1 0
0 Q
1
_
. .
Q
_
b
H
Q
1
0 R
1
_
. .
R
_
1 0
0 Q
1
_
H
H
H
. .
Q
H
,
Prove it. Exercise 215
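A hedged numerical illustration of the statement (not of the inductive proof): scipy's schur() computes $A = Q R Q^H$ with $Q$ unitary and $R$ upper triangular; output='complex' allows a complex $R$ even for real input.

```python
import numpy as np
from scipy.linalg import schur

A = np.random.randn(5, 5)
R, Q = schur(A, output='complex')                       # A = Q R Q^H
print(np.allclose(Q @ R @ Q.conj().T, A))               # reconstruction
print(np.allclose(Q.conj().T @ Q, np.eye(5)))           # Q is unitary
print(np.allclose(np.tril(R, -1), 0))                   # R is upper triangular
```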
In general the diagonal entries of $R$ are arbitrary complex numbers. However, we can impose some order on them that is of significance.

Exercise 216. Suppose $A = V B V^{-1}$. Show that $\mathrm{trace}(A) = \mathrm{trace}(B)$.
Lemma 2. Let
$$R = \begin{pmatrix} \lambda_1 & \star \\ 0 & \lambda_2 \end{pmatrix}.$$
There exists a unitary matrix $Q$ such that
$$Q R Q^H = \begin{pmatrix} \lambda_2 & \star \\ 0 & \lambda_1 \end{pmatrix}.$$

Proof. There is nothing to prove if $\lambda_1 = \lambda_2$. So we consider the case $\lambda_1 \neq \lambda_2$. Choose $v$ such that $Rv = \lambda_2 v$ and $\|v\|_2 = 1$.

Exercise 217. Find $v$ explicitly.

Choose a Householder transform $H$ such that $Hv = e_1$.

Exercise 218. Find $H$ explicitly.

Then we can choose $Q = H$.

Exercise 219. Prove it.
Strictly upper triangular (90). A strictly upper triangular matrix is an upper triangular matrix with zero entries on the diagonal.

Lemma 3. For every square matrix $A$ there is a unitary matrix $Q$ such that $A = Q R Q^H$ with
$$R = \begin{pmatrix} R_{11} & R_{12} & \cdots & R_{1M} \\ 0 & R_{22} & \ddots & \vdots \\ \vdots & \ddots & \ddots & R_{M-1,M} \\ 0 & \cdots & 0 & R_{MM} \end{pmatrix},$$
where $R_{ii} = \lambda_i I + \tilde{R}_{ii}$ with $\tilde{R}_{ii}$ a strictly upper triangular matrix, and $\lambda_i \neq \lambda_j$ for $i \neq j$.
Proof. The proof follows from a simple observation. Suppose two adjacent diagonal entries in the matrix $R$ from the Schur decomposition are distinct:
$$R = \begin{pmatrix}
\ddots & \star & & & \star \\
 & \lambda_1 & \star & & \\
 & & \lambda_2 & \star & \\
 & & & \ddots & \star \\
0 & & & & \ddots
\end{pmatrix}.$$
Then we can find a unitary transform $H$ such that
$$H^H R H = \begin{pmatrix}
\ddots & \star & & & \star \\
 & \lambda_2 & \star & & \\
 & & \lambda_1 & \star & \\
 & & & \ddots & \star \\
0 & & & & \ddots
\end{pmatrix},$$
where $\star$ denotes elements that are not pertinent to the argument.

Exercise 220. Prove this using Lemma 2.

The rest of the proof follows now by using this observation repeatedly in a bubble-sort like operation to move the diagonal entries of $R$ into the right positions.

Exercise 221. Provide the details.

This extended version of the Schur decomposition is usually refined even further to facilitate theoretical arguments. In particular we would like to make $R$ as diagonal as possible. Unfortunately, using just a single unitary transformation, the Schur decomposition is the best we can do.
Lemma 4. Let
$$R = \begin{pmatrix} R_1 & B \\ 0 & R_2 \end{pmatrix},$$
where $R_i = \lambda_i I + {}$ (a strictly upper triangular matrix), and $\lambda_1 \neq \lambda_2$. Then there exists a non-singular matrix $V$ such that
$$R = V \begin{pmatrix} R_1 & 0 \\ 0 & R_2 \end{pmatrix} V^{-1}.$$

Proof.

Exercise 222. Show that there exists a unique solution $X$ to the system of equations
$$R_1 X - X R_2 + B = 0.$$

Exercise 223. Show that there exists a unique solution $X$ to the equation
$$\begin{pmatrix} I & -X \\ 0 & I \end{pmatrix} \begin{pmatrix} R_1 & B \\ 0 & R_2 \end{pmatrix} \begin{pmatrix} I & X \\ 0 & I \end{pmatrix} = \begin{pmatrix} R_1 & 0 \\ 0 & R_2 \end{pmatrix}.$$

Exercise 224. Finish the proof of the lemma.
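A small numerical sketch of Lemma 4, assuming scipy is available: solve_sylvester(a, b, q) solves $aX + Xb = q$, so $R_1 X - X R_2 + B = 0$ is solved as solve_sylvester(R1, -R2, -B). The solution exists and is unique here because $R_1$ and $R_2$ share no eigenvalue ($\lambda_1 \neq \lambda_2$).

```python
import numpy as np
from scipy.linalg import solve_sylvester

n1, n2 = 3, 2
R1 = 1.0 * np.eye(n1) + np.triu(np.random.randn(n1, n1), 1)   # lambda1 = 1
R2 = 5.0 * np.eye(n2) + np.triu(np.random.randn(n2, n2), 1)   # lambda2 = 5
B = np.random.randn(n1, n2)
X = solve_sylvester(R1, -R2, -B)          # R1 X - X R2 + B = 0

R = np.block([[R1, B], [np.zeros((n2, n1)), R2]])
V = np.block([[np.eye(n1), X], [np.zeros((n2, n1)), np.eye(n2)]])
D = np.block([[R1, np.zeros((n1, n2))], [np.zeros((n2, n1)), R2]])
print(np.allclose(V @ D @ np.linalg.inv(V), R))   # R = V diag(R1, R2) V^{-1}
```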
Block diagonal (91). We will use the following notation for block diagonal matrices:
$$\mathrm{diag}\{R_i\}_{i=1}^{n} = \begin{pmatrix} R_1 & 0 & \cdots & 0 \\ 0 & R_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & R_n \end{pmatrix}.$$

Lemma 5. For every square matrix $A$ there exists a non-singular matrix $V$ such that
$$V^{-1} A V = \mathrm{diag}\{R_i\}_{i=1}^{M},$$
where $R_i = \lambda_i I + \tilde{R}_i$, the $\tilde{R}_i$ are strictly upper triangular matrices, and $\lambda_i \neq \lambda_j$ for $i \neq j$.

Proof. Use Lemma 4 repeatedly.

Exercise 225. Fill in the details of the proof.
The question is: can we pick the non-singular matrix $V$ in the above lemma so as to make each $R_i$ a true diagonal matrix? The answer, unfortunately, is no. However, we can come pretty close: we can make it a bi-diagonal matrix with only zeros and ones on the super-diagonal.

Jordan block (92). A Jordan block is a matrix of the form $\lambda I_n + Z_n$, where $Z_n$ is the $n \times n$ shift-up matrix
$$Z_n = \begin{pmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{pmatrix}_{n \times n}.$$

Nilpotent (93). A matrix $A$ is said to be nilpotent if there is a finite integer $k$ for which $A^k = 0$.

Exercise 226. Show that $Z_n^n = 0$, and hence that $Z_n$ is nilpotent.
Jordan decomposition (94). Let $R$ be a nilpotent matrix. Then there exists a non-singular matrix $V$ such that
$$R = V \left( \mathrm{diag}\{Z_{n_i}\}_{i=1}^{M} \right) V^{-1}.$$

Proof. Let $p$ be the smallest integer such that $R^p = 0$. If $p = 1$ we are done. (Why?) So assume $p > 1$. Clearly there exists a $w$ such that $R^{p-1} w \neq 0$. Form the right Jordan chain
$$w,\; Rw,\; R^2 w,\; \ldots,\; R^{p-1} w,$$
and stick them into the matrix
$$W = \begin{pmatrix} R^{p-1} w & R^{p-2} w & \cdots & Rw & w \end{pmatrix}.$$

Exercise 227. Show that
$$R W = W Z_p. \tag{5.1}$$

We claim that $W$ has full column-rank. To see this consider $Wx = 0$.

Exercise 228. Multiplying this equation by $R^{p-1}$ we get $R^{p-1} W x = 0$. From this equation infer that $x_p = 0$.

Exercise 229. Next multiply by $R^{p-2}$ to obtain $R^{p-2} W x = 0$ and infer that $x_{p-1} = 0$.

Exercise 230. Proceed to establish that $x = 0$ and hence that $W$ has full column-rank.

Next we construct the matching left Jordan chain. To do so we first find a vector $y$ such that
$$y^H W = e_1^H,$$
where $e_i$ is column $i$ of the identity matrix.

Exercise 231. Why is this possible?

Now form the left Jordan chain
$$y^H,\; y^H R,\; \ldots,\; y^H R^{p-1},$$
and stick them into the matrix
$$Y^H = \begin{pmatrix} y^H \\ y^H R \\ \vdots \\ y^H R^{p-1} \end{pmatrix}.$$

Exercise 232. Show that
$$Y^H W = I.$$
This also establishes that $Y$ has full column-rank.

Exercise 233. Why? Another way is to imitate the corresponding proof for $W$.

Exercise 234. Show that
$$Y^H R = Z_p Y^H. \tag{5.2}$$

Next we find a non-singular matrix $G$ such that
$$G^{-1} W = \begin{pmatrix} I \\ 0 \end{pmatrix} \quad \text{and} \quad G^H Y = \begin{pmatrix} I \\ 0 \end{pmatrix}.$$
There are many ways to construct $G$. We do it in two stages.

Exercise 235. Use the SVD of $W$ to find a non-singular matrix $F$ such that
$$F^{-1} W = \begin{pmatrix} I \\ 0 \end{pmatrix}.$$
Hint: Make a small modification to the construction of $W^{\dagger}$ (which is not invertible).

Since $Y^H F F^{-1} W = I$, it follows that
$$Y^H F = \begin{pmatrix} I & Y_2^H \end{pmatrix}.$$

Exercise 236. Prove it.

Now observe that the block Gaussian elimination
$$\begin{pmatrix} I & Y_2^H \end{pmatrix} \begin{pmatrix} I & -Y_2^H \\ 0 & I \end{pmatrix} \begin{pmatrix} I \\ 0 \end{pmatrix} = I$$
provides the necessary correction and we obtain
$$G = F \begin{pmatrix} I & -Y_2^H \\ 0 & I \end{pmatrix}.$$

Exercise 237. Verify that this $G$ does indeed satisfy all the desired properties.

Using this $G$ we convert Equations 5.1 and 5.2 into
$$\left( G^{-1} R G \right) \begin{pmatrix} I \\ 0 \end{pmatrix} = \begin{pmatrix} Z_p \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} I & 0 \end{pmatrix} \left( G^{-1} R G \right) = \begin{pmatrix} Z_p & 0 \end{pmatrix}.$$

Exercise 238. Verify these formulas.

From this we can verify that $G^{-1} R G$ is a $2 \times 2$ block diagonal matrix, with the $(1,1)$-block being $Z_p$. Now we can proceed by induction to handle the $(2,2)$-block.

Exercise 239. Complete the proof.

To summarize, the final Jordan decomposition theorem says that for every square matrix $A$ there exists a non-singular matrix $V$ such that $V^{-1} A V$ is a block diagonal matrix where each block is of the form
$$\lambda I + \mathrm{diag}\{Z_{n_i}\}_{i=1}^{M},$$
where $\lambda$, $n_i$ and $M$ can vary from block to block.
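A hedged symbolic check of the Jordan decomposition using sympy (numerical Jordan forms are unstable, so exact arithmetic is used here). jordan_form() returns $V$ and $J$ with $A = V J V^{-1}$; $J$ is block diagonal with blocks $\lambda I + Z_n$ as in the summary above.

```python
import sympy as sp

A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 3]])
V, J = A.jordan_form()                     # A = V * J * V**-1
print(J)                                   # a 2x2 block for eigenvalue 2, a 1x1 block for 3
print(sp.simplify(V * J * V.inv() - A))    # zero matrix
```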
5.2 Invariant subspaces

Jordan chains made a magical appearance in the proof. A good way to see how they arise is to consider the uniqueness of the decomposition.

Eigenvalue (95). A complex number $\lambda$ such that $A - \lambda I$ is singular is called an eigenvalue of $A$.

Eigenvector (96). A non-zero column vector $v$ is said to be an eigenvector associated with the eigenvalue $\lambda$ of the matrix $A$ if $Av = \lambda v$.

Invariant subspace (97). A subspace $\mathcal{V}$ is said to be an invariant subspace of the matrix $A$ if for every $v \in \mathcal{V}$ we have $Av \in \mathcal{V}$.

Similarity transformation (98). A matrix $A$ is said to be similar to a matrix $B$ if there exists a non-singular matrix $V$ such that $A = V B V^{-1}$. We also say that $A$ and $B$ are related by a similarity transformation.

Exercise 240. Show that if $\lambda$ is an eigenvalue of $A$ then it is also an eigenvalue of $V A V^{-1}$.

Exercise 241. Show that $\lambda$ is an eigenvalue of the upper triangular matrix $R$ iff $\lambda$ is one of the diagonal entries of $R$.

Lemma 6. The eigenvalues of a matrix $A$ are exactly the numbers that arise on the diagonal of the upper-triangular matrix $R$ in the Schur decomposition of $A$.

Exercise 242. Show that the traces of two similar matrices are equal.

Example 4. Consider the matrix
$$R = \begin{pmatrix} 1 & 3 & 4 \\ 0 & 1 & 5 \\ 0 & 0 & 2 \end{pmatrix}.$$
It is clear that the eigenvalues can only be the numbers 1 and 2. But is the above matrix similar to
$$S = \begin{pmatrix} 1 & 3 & 4 \\ 0 & 2 & 5 \\ 0 & 0 & 2 \end{pmatrix}?$$

Exercise 243. Show that the two matrices defined above, $R$ and $S$, are not similar to each other.

This raises the question of uniqueness of the eigenvalues. It is clear that the distinct numbers that comprise the eigenvalues of a matrix are unique. (Why?) But what is not clear is whether their multiplicities, as they occur on the diagonal of the upper-triangular matrix in the Schur decomposition, are unique. The above example seems to suggest that they must be unique, and we will proceed to establish it. The idea is to show that the multiplicity of an eigenvalue has a unique geometrical interpretation. We will actually show much more: we will show that the number and size of the Jordan blocks associated with each eigenvalue are also unique.

For the rest of this section let $A = V J V^{-1}$ denote a Jordan decomposition of the matrix $A$. Furthermore let $\lambda_i$, for $i = 1, \ldots, N$, denote the distinct eigenvalues of $A$. Note that the $\lambda_i$'s are unique by our previous arguments. It is clear that $\dim(N(A - \lambda_i I)) = M_{i;1}$ is a well-defined positive number.

Exercise 244. Show that $M_{i;1}$ is the number of Jordan blocks of size greater than or equal to one with eigenvalue $\lambda_i$. Hint: $J$ is upper triangular and $J - \lambda_i I$ has some nilpotent diagonal blocks, which are the only ones that matter in this calculation.

It follows that the number of Jordan blocks associated with the eigenvalue $\lambda_i$ is a unique fixed number. Note, this does not imply (right now) that the multiplicity of $\lambda_i$ is unique.

Now define $M_{i;2} = \dim(N((A - \lambda_i I)^2))$. Again, $M_{i;2}$ is a well-defined unique positive number.

Exercise 245. Show that $N(A - \lambda_i I) \subseteq N((A - \lambda_i I)^2)$ and hence $M_{i;2} \geq M_{i;1}$.

Exercise 246. Show that $M_{i;2} - M_{i;1}$ is the number of Jordan blocks associated with the eigenvalue $\lambda_i$ that are of size greater than or equal to two. To do this compute a basis for $N(J - \lambda_i I)$ and a basis for $N((J - \lambda_i I)^2)$. Note that a basis for the latter subspace can be obtained by extending the basis for the former subspace with a few well-chosen vectors that are associated with the null-vectors of Jordan blocks of size greater than 1.

Exercise 247. Conclude that the number of Jordan blocks of size 1 associated with the eigenvalue $\lambda_i$ is exactly $2 M_{i;1} - M_{i;2}$, which is a unique well-defined non-negative number.

We now rinse and repeat to show that the blocks of bigger sizes must also be unique. Let $M_{i;3} = \dim(N((A - \lambda_i I)^3))$.

Exercise 248. Show that $M_{i;3} - M_{i;2}$ is the number of Jordan blocks of size greater than or equal to 3 that are associated with the eigenvalue $\lambda_i$.

Clearly we can keep this up and prove that the number and size of each Jordan block is unique and well-defined for a given matrix.

Exercise 249. Make sure that you understand clearly what is going on.
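A numerical sketch of the counting argument, under the assumption that the eigenvalue $\lambda_i$ is known exactly: $M_{i;k} = \dim N((A - \lambda_i I)^k)$ is computed as $n - \mathrm{rank}((A - \lambda_i I)^k)$, and successive differences count Jordan blocks of size at least $k$.

```python
import numpy as np

def nullity_sequence(A, lam, kmax):
    n = A.shape[0]
    B = A - lam * np.eye(n)
    return [n - np.linalg.matrix_rank(np.linalg.matrix_power(B, k))
            for k in range(1, kmax + 1)]

# One 2x2 and one 1x1 Jordan block for the eigenvalue 2:
A = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 2.]])
M = nullity_sequence(A, 2.0, 3)
print(M)                 # [2, 3, 3]: two blocks in all, one of size >= 2
print(2*M[0] - M[1])     # number of blocks of size exactly 1
```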
This only leaves the question of the uniqueness of the matrix $V$ in the Jordan decomposition. Unfortunately the matrix is not fully unique. For example, the position of the Jordan blocks inside $J$ is not unique, thereby implying that the matrix $V$ itself is not unique. However, the columns of $V$ and the rows of $V^{-1}$ describe (are bases for) certain invariant subspaces of $A$, and these invariant subspaces are unique. The previous proof illustrates this point and we say no more about it.
5.3 Difference Equations

So what can we do with spectral decompositions that we could not do with the SVD? We have already seen examples, like the Stein equation, which can be more efficiently solved via spectral decompositions. However the classical examples are infinite sets of equations where spectral decompositions (for now at least) are the only way.

Let $u[n] \in \mathbb{C}^N$ for $n = 0, 1, 2, \ldots$ be a sequence of unknown column vectors that satisfy the constraints
$$u[n+1] = A u[n] + f[n], \tag{5.3}$$
where $A \in \mathbb{C}^{N \times N}$ and $f[n] \in \mathbb{C}^N$ are both known quantities. The question is to find all sequences $u[n]$ that satisfy the above constraints.

Exercise 250. Write the above set of equations in the form $Fx = b$.

Note that there are an infinite number of unknowns and equations. So, even though the constraints are linear equations, it is not easy to develop a procedure like Gaussian elimination to find the solutions. Fortunately it turns out that a spectral decomposition of $A$ is sufficient.

The idea is to first figure out the nullspace of the associated matrix. Consider the so-called homogeneous equations
$$u_h[n+1] = A u_h[n], \quad n \geq 0.$$
It is clear that the only solutions are of the form
$$u_h[n] = A^n u_h[0].$$
From this we can guess that a particular solution of the equations is
$$u_p[n+1] = \sum_{k=0}^{n} A^{n-k} f[k],$$
assuming $u_p[0] = 0$.

Exercise 251. Verify that $u_p$ does indeed satisfy the difference equation 5.3.

Therefore the general solution is
$$u[n] = A^n u[0] + A^{n-1} f[0] + A^{n-2} f[1] + \cdots + A^0 f[n-1].$$

Exercise 252. Verify this.

This formula is a bit cumbersome to use. A simplification is available via the Jordan decomposition $A = V J V^{-1}$.

Exercise 253. Show that $A^n = V J^n V^{-1}$.

Remember that $J$ is block diagonal with each diagonal block of the form $\lambda I + Z_p$. Therefore we only need to figure out a formula for $(\lambda I + Z_p)^n$. (Why?)

Exercise 254. Prove the binomial theorem
$$(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} \quad \text{for } a, b \in \mathbb{C}.$$

Exercise 255. Show that if $AB = BA$ then
$$(A+B)^n = \sum_{k=0}^{n} \binom{n}{k} A^k B^{n-k}.$$

Exercise 256. Show that $(\lambda I + Z_p)^n$ is an upper triangular matrix with
$$\frac{n!}{(n-k)!\,k!}\,\lambda^{n-k}$$
as the entry on the $k$-th super-diagonal. So $\lambda^n$ is the entry on the main diagonal, for example.

Exercise 257. Using the Jordan decomposition develop a simple formula for $V^{-1} u[n]$, the solution of the difference equation, in terms of $V^{-1} f$.
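A small sketch checking the general solution formula for $u[n+1] = A u[n] + f[n]$: the closed form $u[n] = A^n u[0] + \sum_{k=0}^{n-1} A^{n-1-k} f[k]$ is compared against the direct recursion, with a randomly chosen forcing.

```python
import numpy as np

N, nsteps = 3, 6
A = np.random.randn(N, N)
f = [np.random.randn(N) for _ in range(nsteps)]
u0 = np.random.randn(N)

u = u0.copy()                         # direct recursion
for n in range(nsteps):
    u = A @ u + f[n]

Apow = np.linalg.matrix_power         # closed form
u_closed = Apow(A, nsteps) @ u0 + sum(Apow(A, nsteps - 1 - k) @ f[k]
                                      for k in range(nsteps))
print(np.allclose(u, u_closed))
```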
5.4 Matrix-valued functions

We now define differentiation and integration of matrix-valued functions. Let $A : \mathbb{C} \to \mathbb{C}^{m \times n}$ denote a matrix-valued function of a single complex variable. This is usually denoted as $A(z)$. We define $\frac{d}{dz} A(z)$ to be an $m \times n$ matrix whose $(i,j)$ entry is the derivative of the $(i,j)$ entry of $A(z)$. In other words we define differentiation component-wise. Sometimes we will use a super-script prime to denote differentiation: $A'(z)$.

In a similar manner we define $\int_{\Gamma} A(z)\, dz$ to be an $m \times n$ matrix with the $(i,j)$ component being the corresponding integral of the $(i,j)$ component of $A(z)$. Note that both differentiation and integration are defined here for matrices of arbitrary size of a single (potentially complex) variable.

Exercise 258. Show that
$$\frac{d}{dt}(A(t) + B(t)) = \frac{d}{dt}A(t) + \frac{d}{dt}B(t), \qquad \frac{d}{dt}(A(t)B(t)) = \left(\frac{d}{dt}A(t)\right)B(t) + A(t)\,\frac{d}{dt}B(t).$$

Exercise 259. Show that
$$\frac{d}{dt} A^{-1}(t) = -A^{-1}(t)\left(\frac{d}{dt}A(t)\right)A^{-1}(t).$$
Hint: $A A^{-1} = I$.

Exercise 260. Show that
$$\int A\, B(t)\, C\; dt = A \left(\int B(t)\, dt\right) C,$$
when $A$ and $C$ are constant matrices.

A matrix-valued function $A(t)$ is said to be a continuous function of $t$ if each component $A_{ij}(t)$ is a continuous function of $t$. Suitable changes should be made for continuity at a point and continuity on a set.

Exercise 261. Let $A(t)$ be a continuously differentiable matrix-valued function on $[0,1]$. Show that
$$\int_0^1 \frac{d}{dt}A(t)\, dt = A(1) - A(0).$$

Exercise 262. Let $A(t)$ be a continuous matrix-valued function on the interval $[0,1]$. Show that
$$\left\| \int_0^1 A(t)\, dt \right\| \leq \int_0^1 \|A(t)\|\, dt.$$
Hint: Use Riemann sums to approximate both sides and use the triangle inequality satisfied by norms.
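A finite-difference sanity check (not a proof) of the formula in Exercise 259, for the hypothetical family $A(t) = A_0 + t A_1$ with $A_0$ invertible.

```python
import numpy as np

np.random.seed(0)
A0 = np.eye(3) + 0.1 * np.random.randn(3, 3)
A1 = np.random.randn(3, 3)
A = lambda t: A0 + t * A1            # so dA/dt = A1
inv = np.linalg.inv

t, h = 0.3, 1e-6
fd = (inv(A(t + h)) - inv(A(t - h))) / (2 * h)       # central difference
exact = -inv(A(t)) @ A1 @ inv(A(t))
print(np.allclose(fd, exact, atol=1e-5))
```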
5.5 Functions of matrices

While it is possible to give more examples of infinite sets of equations whose solution is made accessible via spectral decompositions, we will take a more general point of view in this section.

In Section 5.3 we saw the need to understand the internal structure of sums of powers of matrices. In this section we place that in a larger context. Given an analytic function (like $z^n$), how do we evaluate that function at a given matrix $A$?

First we need some additional facts from complex analysis. See Section 3.6 for some preliminary facts. Once more, for the next three exercises, engineering proofs are good enough. Anything better requires substantially more machinery.

Exercise 263. Extend Exercise 135 to show that if $\Gamma$ is some simple (not self-intersecting) smooth closed curve in the open set $\Omega$ in the complex plane, and $f$ is analytic in $\Omega$, then $\int_{\Gamma} f(z)\, dz = 0$. Hint: Use the fact that $f(z) = F'(z)$ for some suitable analytic function $F$. Can you suggest a candidate for $F$?

Exercise 264. Extend Cauchy's formula (Exercise 136) to the case where the contour of integration is not necessarily a circle, but just a simple smooth closed curve:
$$f(a) = \frac{1}{2\pi i} \int_{\Gamma} \frac{f(z)}{z - a}\, dz.$$
Hint: Starting with the circle, deform it to the desired curve in pieces using the previous exercise.

Exercise 265. Show that
$$\frac{d^n}{da^n} f(a) = \frac{n!}{2\pi i} \int_{\Gamma} \frac{f(z)}{(z-a)^{n+1}}\, dz.$$

f(A) (99). Let $A$ be a square matrix. Let $f$ be an analytic function in the open set $\Omega$. Let $\Gamma$ be a simple smooth closed curve in $\Omega$. Suppose all the eigenvalues of $A$ lie inside the open set bounded by $\Gamma$. Then we define
$$f(A) = \frac{1}{2\pi i} \int_{\Gamma} f(z)\, (zI - A)^{-1}\, dz.$$
Implicit in this definition is that the integral is well-defined and that the choice of the curve $\Gamma$ is immaterial as long as it is simple, lies inside $\Omega$ and encloses in its strict interior all the eigenvalues of $A$.

Let $A = V J V^{-1}$ denote the Jordan decomposition of $A$. Show that
$$V^{-1} f(A) V = \frac{1}{2\pi i} \int_{\Gamma} f(z)\, (zI - J)^{-1}\, dz.$$
Therefore it is enough to verify these assertions when $A$ is a simple Jordan block. (Why?)

Let $J_p(\lambda) = \lambda I + Z_p$.

Exercise 266. Show that
$$(zI - J_p(\lambda))^{-1} = \begin{pmatrix} \frac{1}{z-\lambda} & \frac{1}{(z-\lambda)^2} & \frac{1}{(z-\lambda)^3} & \cdots \\ 0 & \ddots & \ddots & \ddots \\ \vdots & \ddots & \ddots & \ddots \\ 0 & \cdots & 0 & \frac{1}{z-\lambda} \end{pmatrix},$$
which is an upper-triangular Toeplitz matrix.

Exercise 267. Show that
$$\frac{1}{2\pi i} \int_{\Gamma} f(z)\, (zI - J_p(\lambda))^{-1}\, dz = \begin{pmatrix} f(\lambda) & \frac{f'(\lambda)}{1!} & \frac{f''(\lambda)}{2!} & \cdots \\ 0 & \ddots & \ddots & \ddots \\ \vdots & \ddots & \ddots & \ddots \\ 0 & \cdots & 0 & f(\lambda) \end{pmatrix}.$$
This clearly shows the independence of the definition of $f(A)$ from the curve $\Gamma$.

The Cauchy integral formula has a certain advantage for defining functions of matrices: it is global. However, Taylor series work better sometimes.

Let $f(z) = \sum_{n=0}^{\infty} c_n (z-c)^n$ for $|z - c| < R$. Let all the eigenvalues of $A$ lie inside the circle $\Omega = \{z : |z - c| < R\}$. Let $\Gamma$ denote a simple closed curve inside $\Omega$. Then for any $a$ inside the interior of $\Gamma$ it is clear that
$$f(a) = \sum_{n=0}^{\infty} c_n (a - c)^n = \frac{1}{2\pi i} \int_{\Gamma} f(z)\, (z-a)^{-1}\, dz.$$
This suggests that $f(A) = \sum_{n=0}^{\infty} c_n (A - cI)^n$ should be true.

Exercise 268. Show that $f(z) = \sum_{n=0}^{\infty} \frac{f^{(n)}(c)}{n!} (z - c)^n$, where $f^{(n)}$ denotes the $n$-th order derivative of $f$.

Exercise 269. Let $A = V J V^{-1}$ denote the Jordan decomposition of $A$. Show that
$$\sum_{n=0}^{\infty} c_n (A - cI)^n = V \left( \sum_{n=0}^{\infty} c_n (J - cI)^n \right) V^{-1}.$$

Therefore it is sufficient to check whether the Taylor series can be used to evaluate $f(A)$ when all the eigenvalues of $A$ lie inside the circle of convergence.

Exercise 270. Show that
$$\sum_{n=0}^{\infty} \frac{f^{(n)}(c)}{n!} (J_p(\lambda) - cI)^n = \begin{pmatrix} f(\lambda) & \frac{f'(\lambda)}{1!} & \frac{f''(\lambda)}{2!} & \cdots \\ 0 & \ddots & \ddots & \ddots \\ \vdots & \ddots & \ddots & \ddots \\ 0 & \cdots & 0 & f(\lambda) \end{pmatrix}.$$
This shows that the Taylor series expansion can be used to evaluate $f(A)$, but only when the eigenvalues lie inside the circle of convergence.
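A numerical sketch: the exponential of a Jordan block evaluated with a truncated Taylor series agrees with scipy's expm, and its entries match the $f(\lambda), f'(\lambda)/1!, f''(\lambda)/2!, \ldots$ pattern derived above.

```python
import numpy as np
from scipy.linalg import expm
from math import factorial, exp

lam, p = 0.7, 4
Jp = lam * np.eye(p) + np.diag(np.ones(p - 1), 1)        # J_p(lambda)

taylor = sum(np.linalg.matrix_power(Jp, n) / factorial(n) for n in range(30))
toeplitz = np.zeros((p, p))
for k in range(p):                                       # k-th super-diagonal
    toeplitz += np.diag(np.full(p - k, exp(lam) / factorial(k)), k)

print(np.allclose(taylor, expm(Jp)))
print(np.allclose(taylor, toeplitz))
```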
Example 5. Let $f(z) = \sqrt{z}$. Unfortunately $\sqrt{z}$ is multi-valued and we must specify a branch to use. Let $z = r e^{i\theta}$ denote the polar decomposition of the complex number $z$ with $-\pi < \theta \leq \pi$. Pick the branch for the square-root such that $f(r e^{i\theta}) = \sqrt{r}\, e^{i\theta/2}$; that is, $f(z)$ lies in the right-half plane. Note that $f(z)$ is discontinuous across the negative real line. Therefore the negative real line is called the branch cut for $f(z)$.

Let
$$A = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
Note that the eigenvalues of $A$ are $1$ and $-1$. Clearly the eigenvalues of $A$ do not lie in an open set in which $f(z)$ is analytic. Therefore neither Cauchy's formula nor Taylor series expansions can be used to evaluate $f(A)$ in this case. However, if we just want to solve the equation $B^2 = A$, then it is easy to write down several solutions
$$B = \begin{pmatrix} f_1(1) & 0 \\ 0 & f_2(-1) \end{pmatrix},$$
where $f_1$ and $f_2$ can be two different branches of the square root function. This corresponds to picking the branch cut in such a way as to avoid all the eigenvalues of $A$ and allowing them to lie in a single connected open region.

Entire functions, functions that are analytic in the entire complex plane, do not suffer from this problem. Both Cauchy's formula and Taylor series expansions will always work. The most common examples of entire functions are the exponential, sine and cosine.

Another example of a multi-valued function is the logarithm. Again, depending on the location of the eigenvalues, either Taylor series (less often) or Cauchy's integral formula (more often) can be used. If both fail to be applicable then the branch cut must be adjusted suitably.
5.6 Differential equations

Let $u(t)$ be a vector-valued function of the real variable $t$. Our objective is to find $u(t)$ that satisfies the differential equation
$$\frac{d}{dt} u(t) = A u(t) + b(t), \quad t > 0, \tag{5.4}$$
where $A$ is a constant matrix and $b(t)$ is a known vector-valued function.

First some auxiliary facts.

Exercise 271. Suppose $tA$ has all its eigenvalues inside $\Omega$ where $f$ is analytic. Show that
$$\frac{d}{dt} f(tA) = f'(tA)\, A = A\, f'(tA).$$
The proof is quite easy if you use a Taylor series expansion, but not general enough. In general you have to use Cauchy's formula and the fact that since the integral is absolutely convergent you can differentiate inside the integral.

We first look at the homogeneous equation
$$\frac{d}{dt} u_h(t) = A u_h(t), \quad t > 0.$$

Exercise 272. Verify that a solution is $u_h(t) = e^{tA} u_h(0)$.

With a little effort one can establish that this is the only solution. One approach is to use the Jordan decomposition to reduce the problem to a set of single variable ODEs and appeal to the scalar theory. Here we take an approach via Picard iteration that also generalizes to non-constant coefficient ODEs.

Let $[0, T]$ be the interval over which a solution to the ODE
$$\frac{d}{dt} u_u(t) = A u_u(t), \quad u_u(0) = 0,$$
exists. If we can show that $u_u(t) = 0$ then we would have established uniqueness. (Why?) Since the derivative of $u_u$ exists, it must be continuous. Let $\|u_u(t)\| \leq L < \infty$ for $t \in [0, T]$.

Exercise 273. Show that
$$u_u(t) = \int_0^t A u_u(s)\, ds.$$

Exercise 274. Show that
$$\|u_u(t)\| \leq t\, \|A\|\, L.$$
Hint: See Exercise 262.

Exercise 275. Repeat the above argument and show that
$$\|u_u(t)\| \leq \frac{t^n \|A\|^n}{n!}\, L, \quad n \geq 1.$$

Exercise 276. Conclude that $u_u(t) = 0$ for $t \in [0, T]$.

Now that we have uniqueness, we can look at the form of the homogeneous solution and guess that a particular solution of the differential equation is
$$u_p(t) = \int_0^t e^{(t-s)A}\, b(s)\, ds,$$
assuming that $u_p(0) = 0$.

Exercise 277. Show that $e^{(t+s)A} = e^{tA} e^{sA} = e^{sA} e^{tA}$. Since the exponential is an entire function, an easy proof is via a Taylor series expansion for the exponential function.

Exercise 278. Show that $e^{0} = I$.

Exercise 279. Show that $e^{-A} = (e^{A})^{-1}$.

Exercise 280. Verify that $u_p(t)$ is indeed a solution of equation 5.4.

Therefore the general solution to equation 5.4 is
$$u(t) = \int_0^t e^{(t-s)A}\, b(s)\, ds + e^{tA} u(0).$$
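A hedged numerical check of the variation-of-constants formula above, with a randomly chosen $A$, a simple forcing $b(s)$, the integral done by scipy quadrature, and the result compared against a black-box ODE solver.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp, quad_vec

np.random.seed(1)
A = np.random.randn(3, 3)
u0 = np.random.randn(3)
b = lambda s: np.array([np.sin(s), np.cos(2*s), 1.0])
t = 1.5

integral, _ = quad_vec(lambda s: expm((t - s) * A) @ b(s), 0.0, t)
u_formula = expm(t * A) @ u0 + integral

sol = solve_ivp(lambda s, u: A @ u + b(s), (0.0, t), u0, rtol=1e-10, atol=1e-12)
print(np.allclose(u_formula, sol.y[:, -1], atol=1e-6))
```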
5.7 Localization of eigenvalues

One of the most important questions is how does $f(A)$ change when we perturb $A$. We already considered this question when $f(x) = x^{-1}$ in Section 3.7. An obvious idea is to use the Jordan decomposition to help make this estimate. For example
$$\|f(A)\| = \|f(V J V^{-1})\| \leq \|V\|\, \|V^{-1}\|\, \|f(J)\|.$$
But this upper bound can be wildly inaccurate if $\kappa(V) = \|V\|\,\|V^{-1}\|$ is very large. However, better general-purpose estimates are hard to come by. So one approach is to look for special classes of matrices for which $\kappa(V)$ is small in a suitable norm.

Let $\kappa_2(V) = \|V\|_2\, \|V^{-1}\|_2$ denote the 2-norm condition number of the matrix $V$.

Exercise 281. Show that $\kappa_2(V) \geq 1$. Hint: Use the SVD.

Exercise 282. Show that if $V$ is a unitary matrix then $\kappa_2(V) = 1$.

Normal matrix (100). A matrix $A$ is said to be normal if $A A^H = A^H A$.

Exercise 283. Show that unitary and orthogonal matrices are normal.

Symmetry (101). A matrix $A$ is said to be
symmetric if $A^T = A$,
skew-symmetric if $A^T = -A$,
Hermitian if $A^H = A$,
skew-Hermitian if $A^H = -A$.

Exercise 284. Show that Hermitian and skew-Hermitian matrices are normal.

Exercise 285. Show that every matrix can be written uniquely as the sum of a Hermitian and a skew-Hermitian matrix. Hint:
$$A = \frac{A + A^H}{2} + \frac{A - A^H}{2}.$$

Theorem 2. Let $A$ be normal. Then there exists a unitary matrix $Q$ and a diagonal matrix $\Lambda$ such that $A = Q \Lambda Q^H$.

In other words, for normal matrices the Schur decomposition is also the Jordan decomposition, with each Jordan block being of size one. Furthermore there is a full set of orthonormal eigenvectors.

Proof. The proof follows from the following fact.

Lemma 7. If $R$ is an upper triangular normal matrix then it is diagonal.

Exercise 286. Prove the lemma. Hint: Write $R$ as a $2 \times 2$ block matrix and solve the four resulting equations.

Exercise 287. Prove the theorem.

It follows that for normal matrices $\|f(A)\|_2 = \|f(\Lambda)\|_2$, where the diagonal entries of $\Lambda$ are the eigenvalues of $A$. Therefore it becomes essential to locate, at least approximately, the eigenvalues of $A$ in the complex plane.

Let $A$ be normal with Schur decomposition $A = Q \Lambda Q^H$. Consider the expression $(Q^H A Q)^H$ and use it to prove the next three exercises.

Exercise 288. Show that the eigenvalues of a unitary matrix must lie on the unit circle.

Exercise 289. Show that the eigenvalues of a Hermitian matrix must be real.

Exercise 290. Show that the eigenvalues of a skew-Hermitian matrix must be purely imaginary.

Exercise 291. Show that the eigenvectors of a normal matrix corresponding to distinct eigenvalues must be mutually orthogonal. Hint: Use the Schur decomposition.

Exercise 292. Write down a family of normal matrices that is neither unitary nor Hermitian nor skew-Hermitian. Hint: Use the Schur decomposition.

Exercise 293. Show that $e^{\text{skew-Hermitian}} = $ unitary.

Exercise 294. Show that $e^{i\,\text{Hermitian}} = $ unitary.
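A quick numerical illustration of Exercises 293 and 294: the exponential of a skew-Hermitian matrix (equivalently, $i$ times a Hermitian matrix) is unitary.

```python
import numpy as np
from scipy.linalg import expm

np.random.seed(2)
B = np.random.randn(4, 4) + 1j * np.random.randn(4, 4)
H = (B + B.conj().T) / 2            # Hermitian part
S = (B - B.conj().T) / 2            # skew-Hermitian part

for U in (expm(S), expm(1j * H)):
    print(np.allclose(U.conj().T @ U, np.eye(4)))
```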
Exercise 295. Let $A$ be a square real matrix. Suppose $\lambda$ is an eigenvalue of $A$ with a non-zero imaginary part.
Show that the corresponding eigenvector $v$ must have real and imaginary parts that are linearly independent when considered as real vectors.
Show that $\bar{\lambda}$ must also be an eigenvalue of $A$.
Show that an eigenvector for $\bar{\lambda}$ can be constructed from $v$.

Exercise 296. Show that a real orthogonal matrix with an odd number of rows and columns must have either $1$ or $-1$ as one of its eigenvalues.
5.8 Real symmetric matrices

Real symmetric matrices play the role of real numbers in matrix analysis.

Exercise 297. Let $A = A_R + i A_I$ denote the real and imaginary parts of the $m \times n$ matrix $A$. Show that
$$T(A) = \begin{pmatrix} A_R & -A_I \\ A_I & A_R \end{pmatrix}$$
is a faithful representation of the complex matrix $A$ as a real matrix of twice the size, in the sense that for all complex matrices $A$ and $B$
$$T(\alpha A) = \alpha\, T(A) \;(\text{real } \alpha), \qquad T(A^H) = T(A)^T, \qquad T(A+B) = T(A) + T(B), \qquad T(AB) = T(A)\, T(B),$$
whenever the operations are well-defined.

Exercise 298. Show that
$$T(\text{unitary}) = \text{orthogonal}, \qquad T(\text{Hermitian}) = \text{symmetric}, \qquad T(\text{skew-Hermitian}) = \text{skew-symmetric}.$$

Theorem 3. Let $A$ be a real symmetric matrix. Then there exists a real orthogonal matrix $Q$ and a real diagonal matrix $\Lambda$ such that $A = Q \Lambda Q^T$ and $\Lambda_{i,i} \geq \Lambda_{i+1,i+1}$.

Proof. Just repeat the proof of the Schur decomposition and observe that you can use orthogonal transforms instead of unitary transforms since the eigenvalues are known to be real. Also, symmetry will help to directly produce a diagonal rather than upper-triangular matrix.

Exercise 299. Work out a detailed proof.

From now on we will use the notation $\Lambda_{ii} = \lambda_i$ for convenience.

Exercise 300. Let $A$ be a real $m \times n$ matrix. Show that
$$\|A\|_2 = \max_{0 \neq z \in \mathbb{C}^n} \frac{\|Az\|_2}{\|z\|_2} = \max_{0 \neq x \in \mathbb{R}^n} \frac{\|Ax\|_2}{\|x\|_2}.$$
Hint: Exercise 177 might be useful.

Exercise 301. Redo the proof of the SVD and show that if $A$ is a real (possibly non-square) matrix, then there exist real orthogonal matrices $U$ and $V$ such that $A = U \Sigma V^T$, with $\Sigma$ having non-zero entries only on its principal diagonal, and $\Sigma_{i,i} \geq \Sigma_{i+1,i+1} \geq 0$.

Exercise 302. Let $A$ be a real symmetric matrix.
Let $A = Q \Lambda Q^T$ be its Schur decomposition. Show how to use it to write down the SVD of $A$.
Let $A = U \Sigma V^T$ be its SVD. Is it always possible to infer the Schur decomposition directly from the SVD? Hint:
$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Exercise 303. Let $A$ be an $m \times n$ matrix. Use the SVD of $A$ to write down the Schur decompositions of $A^H A$ and $A A^H$. You cannot use these formulas to directly infer the SVD of $A$ from the Schur decompositions of $A^H A$ and $A A^H$. Why?

Exercise 304. Let $A$ be an $n \times n$ real symmetric matrix with eigenvalues $\lambda_i$ in decreasing order. Show that for real $x \neq 0$
$$\lambda_n \leq \frac{x^T A x}{x^T x} \leq \lambda_1.$$
Hint: Use the Schur decomposition to convert the Rayleigh quotient (the fractional middle term above) into the form
$$\frac{y^T \Lambda y}{y^T y}.$$

Courant-Fischer (102). Let $A$ be a real $n \times n$ symmetric matrix with eigenvalues $\lambda_i$ in decreasing order, $\lambda_i \geq \lambda_{i+1}$. Then
$$\lambda_k = \max_{\dim(\mathcal{U}) = k}\; \min_{0 \neq x \in \mathcal{U}} \frac{x^T A x}{x^T x}.$$

Proof.

Exercise 305. Use Exercise 304 to prove the theorem for $k = 1$ and $k = n$.

Now fix $k$ to be a number between 1 and $n$. Let $q_i$ denote column $i$ of the matrix $Q$ from the Schur decomposition $A = Q \Lambda Q^T$. First pick $\mathcal{U} = \mathrm{span}\{q_1, \ldots, q_k\}$.

Exercise 306. Show that for this choice of $\mathcal{U}$
$$\min_{0 \neq x \in \mathcal{U}} \frac{x^T A x}{x^T x} = \lambda_k.$$
Hint: Note that $A q_i = \lambda_i q_i$. Then look at Exercise 304.

It follows that
$$\max_{\dim(\mathcal{U}) = k}\; \min_{0 \neq x \in \mathcal{U}} \frac{x^T A x}{x^T x} \geq \lambda_k.$$

Next let $\mathcal{U}$ be any subspace of $\mathbb{R}^n$ of dimension $k$. Consider the subspace $\mathcal{V} = \mathrm{span}\{q_k, \ldots, q_n\}$. Since $\dim(\mathcal{U}) = k$ and $\dim(\mathcal{V}) = n - k + 1$, it follows that $\mathcal{U} \cap \mathcal{V} \neq \{0\}$.

Exercise 307. Show that $\dim(\mathcal{U} \cap \mathcal{V}) \geq 1$.

Pick a non-zero $z \in \mathcal{U} \cap \mathcal{V}$. It can be represented as $z = \sum_{i=k}^{n} \gamma_i q_i$.

Exercise 308. Show that
$$\frac{z^T A z}{z^T z} \leq \lambda_k.$$
Hint: Use Exercise 304.

From this it follows that for any $k$-dimensional subspace $\mathcal{U}$ of $\mathbb{R}^n$
$$\min_{0 \neq x \in \mathcal{U}} \frac{x^T A x}{x^T x} \leq \lambda_k.$$
Therefore it follows that
$$\max_{\dim(\mathcal{U}) = k}\; \min_{0 \neq x \in \mathcal{U}} \frac{x^T A x}{x^T x} \leq \lambda_k.$$
Therefore the theorem is true.

Exercise 309. Show that
$$\lambda_k = \min_{\dim(\mathcal{U}) = n-k+1}\; \max_{0 \neq x \in \mathcal{U}} \frac{x^T A x}{x^T x}.$$
Hint: Consider $-A$.

We can now derive a perturbation result for the eigenvalues of real symmetric matrices.

Theorem 4. Let $A$ and $E$ be real symmetric $n \times n$ matrices. Let $\lambda_i(A)$ denote the eigenvalues of $A$ in decreasing order. Then
$$\lambda_i(A) + \lambda_n(E) \leq \lambda_i(A+E) \leq \lambda_i(A) + \lambda_1(E).$$
This shows that the eigenvalues of real symmetric matrices depend continuously on the matrix entries, as long as the change leaves the matrix real and symmetric. Furthermore it shows that the eigenvalues of a real symmetric matrix are well-conditioned with respect to absolute perturbations.

Proof. Let $A = Q \Lambda Q^T$ denote the Schur decomposition of $A$ with eigenvalues in decreasing order. Let $q_i$ denote column $i$ of $Q$ and let $\mathcal{U}_k = \mathrm{span}\{q_k, \ldots, q_n\}$.

Exercise 310. Use the min-max version of the Courant-Fischer theorem in Exercise 309 to establish that
$$\lambda_k(A+E) \leq \max_{0 \neq x \in \mathcal{U}_k} \frac{x^T A x}{x^T x} + \max_{0 \neq x \in \mathcal{U}_k} \frac{x^T E x}{x^T x}.$$

Exercise 311. From this infer that
$$\lambda_k(A+E) \leq \lambda_k(A) + \lambda_1(E).$$

Exercise 312. From this infer that
$$\lambda_k(A+E) \geq \lambda_k(A) + \lambda_n(E).$$
Hint: You can use the previous inequality with $A \to A + E$ and $E \to -E$, or you can repeat the earlier argument with the max-min version of the Courant-Fischer theorem.

Exercise 313. Show that $\|A\|_2 = \max\{|\lambda_1(A)|, |\lambda_n(A)|\}$, when $A$ is a real $n \times n$ symmetric matrix with eigenvalues in decreasing order.

Exercise 314. Show that $|\lambda_i(A+E) - \lambda_i(A)| \leq \|E\|_2$, when $A$ and $E$ are real symmetric matrices with eigenvalues in decreasing order.
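A numerical sanity check of Exercise 314: for random real symmetric $A$ and $E$, the sorted eigenvalues move by at most $\|E\|_2$.

```python
import numpy as np

np.random.seed(3)
n = 6
A = np.random.randn(n, n); A = (A + A.T) / 2
E = 0.1 * np.random.randn(n, n); E = (E + E.T) / 2

lam_A = np.sort(np.linalg.eigvalsh(A))[::-1]        # decreasing order
lam_AE = np.sort(np.linalg.eigvalsh(A + E))[::-1]
print(np.max(np.abs(lam_AE - lam_A)) <= np.linalg.norm(E, 2) + 1e-12)
```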
Next we consider perturbations that can change the size of the matrix.

Cauchy Interlacing Theorem (103). Let $A$ be a real $n \times n$ symmetric matrix partitioned as follows
$$A = \begin{pmatrix} B & c \\ c^T & \mu \end{pmatrix},$$
where $\mu$ is a real number. Then
$$\lambda_n(A) \leq \lambda_{n-1}(B) \leq \cdots \leq \lambda_k(B) \leq \lambda_k(A) \leq \lambda_{k-1}(B) \leq \cdots \leq \lambda_1(B) \leq \lambda_1(A).$$

Proof. Let $B = Q \Lambda Q^T$ denote the Schur decomposition of $B$ with eigenvalues in decreasing order. Let $q_i$ denote column $i$ of $Q$. Define the range space
$$\mathcal{U}_k = R \begin{pmatrix} q_{k-1} & \cdots & q_{n-1} \\ 0 & \cdots & 0 \end{pmatrix}.$$
Note that there are only $n-1$ columns in $Q$.

Exercise 315. Using the min-max version of the Courant-Fischer theorem show that
$$\lambda_k(A) \leq \max_{0 \neq x \in \mathcal{U}_k} \frac{x^T A x}{x^T x} = \lambda_{k-1}(B).$$

Exercise 316. Either apply the previous inequality to $-A$ and establish that
$$\lambda_k(B) \leq \lambda_k(A),$$
or repeat the argument with the max-min version of the Courant-Fischer theorem.
5.9 Cholesky factorization

While the Schur decomposition reveals a lot about symmetric matrices, it is hard to compute since in general there are no closed-form formulas.

Positive semi-definite (104). A matrix $A$ is said to be positive semi-definite if $x^H A x \geq 0$ for all $x$.

Exercise 317. Show that if a matrix is Hermitian positive semi-definite then the diagonal entries are non-negative.

Principal sub-matrix (105). A matrix $B$ is said to be a principal sub-matrix of the matrix $A$ if there exists a permutation $P$ such that
$$A = P \begin{pmatrix} B & \star \\ \star & \star \end{pmatrix} P^T.$$

Exercise 318. Show that every principal sub-matrix of a positive semi-definite matrix is positive semi-definite.

Exercise 319. Show that the eigenvalues of a Hermitian positive semi-definite matrix are non-negative.

Exercise 320. Show that $A A^H$ is a Hermitian positive semi-definite matrix.

Exercise 321. Show that every Hermitian positive semi-definite matrix can be written in the form $A A^H$ for some suitable $A$. Hint: Use the Schur decomposition.

Positive definite (106). A matrix $A$ is said to be positive definite if $x^H A x > 0$ for all $x \neq 0$.

Exercise 322. Repeat the previous exercises with suitable modifications for Hermitian positive definite matrices.

Cholesky factorization (107). Let $A$ be a Hermitian positive definite matrix. Then there exists a non-singular lower-triangular matrix $G$ with positive diagonal entries such that $A = G G^H$.

Proof. The proof is a repetition of the LU factorization proof, except that it does not require the use of permutations.

Exercise 323. Furnish the proof.
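A numerical illustration of the Cholesky factorization: numpy returns the lower-triangular factor $G$ with positive diagonal such that $A = G G^H$.

```python
import numpy as np

np.random.seed(4)
B = np.random.randn(4, 4) + 1j * np.random.randn(4, 4)
A = B @ B.conj().T + 4 * np.eye(4)       # Hermitian positive definite

G = np.linalg.cholesky(A)                # lower triangular
print(np.allclose(G @ G.conj().T, A))
print(np.all(np.diag(G).real > 0), np.allclose(np.triu(G, 1), 0))
```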
5.10 Problems

Problem 11. Let $A$ be a real (possibly non-square) matrix. Let
$$B = \begin{pmatrix} 0 & A^T \\ A & 0 \end{pmatrix}.$$
Show that $B$ is a real symmetric matrix. Show that the Schur decomposition of $B$ can be written in terms of the SVD of $A$. Hint: You can find a permutation $\Pi$ such that
$$\Pi \begin{pmatrix} 0 & \Sigma^T \\ \Sigma & 0 \end{pmatrix} \Pi^T$$
is a block diagonal matrix with each block of size $2 \times 2$ at most.

Problem 12. Let $A$ and $E$ be real (possibly non-square) matrices. Let $\sigma_i(A)$ denote the singular values of $A$ in decreasing order. Show that
$$|\sigma_i(A+E) - \sigma_i(A)| \leq \|E\|_2.$$

Problem 13. Let $\sigma_i$ denote the singular values of $A$. Show that $\sigma_{k+1}$ is the 2-norm distance of $A$ to the nearest rank-$k$ matrix.

Problem 14. Let $A$ be an $m \times n$ real matrix partitioned as follows
$$A = \begin{pmatrix} B \\ c^T \end{pmatrix},$$
where $c$ is a real column vector. Show that
$$\sigma_k(B) \leq \sigma_k(A) \leq \sigma_{k-1}(B) \leq \cdots \leq \sigma_1(B) \leq \sigma_1(A),$$
where $\sigma_i(A)$ denotes the singular values of $A$ in decreasing order.

Problem 15. Use the real and imaginary parts of the SVD of $A$ to write down the real SVD of the real matrix $T(A)$, where $T$ is defined in Exercise 297.

Problem 16 (Wielandt-Hoffman). This problem is quite challenging. Let $A$ and $B$ be $n \times n$ normal matrices. Let $\lambda_i(A)$ denote the eigenvalues of $A$. Show that
$$\min_{\sigma \in \text{Permutations}} \sum_{i=1}^{n} |\lambda_i(A) - \lambda_{\sigma(i)}(B)|^2 \leq \|A - B\|_F^2.$$

Problem 17. Show that
$$\min_{X \in \mathbb{C}^{n \times m}} \|AX - I\|_F = \|A A^{\dagger} - I\|_F.$$
6 Tensor Algebra

In this chapter we consider the case when both the entries in $A$ and $x$ must be considered as variables in the expression $Ax$. In general more terms could be involved in the product; so we are concerned with multi-linear analysis.

6.1 Kronecker product

Again we prefer to introduce Kronecker products of matrices as a direct concrete realization of tensor products.

Kronecker product (108). Let $A$ and $B$ be two matrices. We define the tensor or Kronecker product as follows:
$$A \otimes B = \begin{pmatrix} A_{11} B & A_{12} B & \cdots \\ A_{21} B & A_{22} B & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix}.$$

Exercise 324. Show that if $x$ and $y$ are column vectors then
$$x y^H = x \otimes y^H = y^H \otimes x.$$

Exercise 325. Give an example where $A \otimes B \neq B \otimes A$.

Exercise 326. Show that there are permutations $P_1$ and $P_2$ such that $A \otimes B = P_1 (B \otimes A) P_2$.

Exercise 327. Show that
$(\alpha A) \otimes B = \alpha (A \otimes B) = A \otimes (\alpha B)$,
$(A + B) \otimes C = A \otimes C + B \otimes C$,
$A \otimes (B + C) = A \otimes B + A \otimes C$,
$(A \otimes B) \otimes C = A \otimes (B \otimes C)$,
$(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$,
$(A \otimes B)^H = A^H \otimes B^H$,
$I \otimes I = I$,
$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$,
Hermitian $\otimes$ Hermitian = Hermitian,
Unitary $\otimes$ Unitary = Unitary,
Hermitian $\otimes$ Skew-Hermitian = Skew-Hermitian,
Skew-Hermitian $\otimes$ Skew-Hermitian = Hermitian,
Upper-triangular $\otimes$ Upper-triangular = Upper-triangular,
$\frac{d}{dt}(A(t) \otimes B(t)) = \frac{d}{dt}A(t) \otimes B(t) + A(t) \otimes \frac{d}{dt}B(t)$.

Exercise 328. Let $A = U \Sigma V^H$ and $B = X \Lambda Y^H$ be SVDs. Show that the SVD of $A \otimes B$ is given by
$$(U \otimes X)(\Sigma \otimes \Lambda)(V \otimes Y)^H.$$

Exercise 329. Show that $\mathrm{rank}(A \otimes B) = \mathrm{rank}(A)\,\mathrm{rank}(B)$.

Exercise 330. Let $A = V J V^{-1}$ and $B = W G W^{-1}$ denote Jordan decompositions. Show that
$$A \otimes B = (V \otimes W)(J \otimes G)(V \otimes W)^{-1}.$$
Conclude that $\lambda_i(A \otimes B) = \lambda_r(A)\,\lambda_s(B)$. Note that this is not a Jordan decomposition.

Exercise 331. Let $A$ be an $m \times m$ matrix and $B$ be an $n \times n$ matrix. Show that
$$\mathrm{trace}(A \otimes B) = \mathrm{trace}(A)\,\mathrm{trace}(B),$$
$$(A \otimes I_n)(I_m \otimes B) = A \otimes B = (I_m \otimes B)(A \otimes I_n).$$

Exercise 332. Show that
$$\mathrm{diag}\{A_i\}_{i=1}^{n} \otimes \mathrm{diag}\{B_j\}_{j=1}^{m} = \mathrm{diag}\{\mathrm{diag}\{A_i \otimes B_j\}_{j=1}^{m}\}_{i=1}^{n}.$$
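A few of the identities from Exercises 327 and 331 checked numerically with numpy's kron (a sketch, not a proof).

```python
import numpy as np

np.random.seed(5)
A, B = np.random.randn(3, 3), np.random.randn(2, 2)
C, D = np.random.randn(3, 3), np.random.randn(2, 2)
kron = np.kron

print(np.allclose(kron(A, B) @ kron(C, D), kron(A @ C, B @ D)))   # mixed product rule
print(np.allclose(np.linalg.inv(kron(A, B)),
                  kron(np.linalg.inv(A), np.linalg.inv(B))))
print(np.allclose(kron(A, B).T, kron(A.T, B.T)))
print(np.isclose(np.trace(kron(A, B)), np.trace(A) * np.trace(B)))
```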
6.2 Tensor Product Spaces

At this point it is a good idea to look at the vector space structure of tensor products. We will avoid an abstract approach (since I don't want to define dual spaces).

Let $\mathbb{F}^{i_j}$ denote vector spaces for positive integers $i_1, i_2, \ldots, i_n$. We define the tensor product of these vector spaces via the formula
$$\bigotimes_{j=1}^{n} \mathbb{F}^{i_j} = \mathbb{F}^{i_1} \otimes \mathbb{F}^{i_2} \otimes \cdots \otimes \mathbb{F}^{i_n} = \mathrm{span}\{\otimes_{j=1}^{n} x_j \;|\; x_j \in \mathbb{F}^{i_j},\; j = 1, \ldots, n\}.$$
Remember that span only allows finite linear combinations of its elements. Therefore an arbitrary element of $\bigotimes_j \mathbb{F}^{i_j}$ can be written in the form
$$\sum_{k=1}^{l} \alpha_k \bigotimes_{j=1}^{n} x_{kj},$$
where $x_{kj} \in \mathbb{F}^{i_j}$.

Exercise 333. Show that $\bigotimes_{j=1}^{n} \mathbb{F}^{i_j}$ is a sub-space of $\mathbb{F}^{\prod_{j=1}^{n} i_j}$, where $\prod_{j=1}^{n} i_j = i_1 i_2 \cdots i_n$.

Actually $\bigotimes_{j=1}^{n} \mathbb{F}^{i_j} = \mathbb{F}^{\prod_{j=1}^{n} i_j}$. We will prove this by constructing a suitable basis. However, to keep the notation simple we will concentrate on the important case when $i_j = m$ for all $j$. In this case we will use the notation $\otimes^n \mathbb{F}^m$.

Exercise 334. Show that if $\otimes_i x_i = 0$ then at least one of the $x_i = 0$.

Exercise 335. Show that there is a vector in $\otimes^2 \mathbb{R}^2$ that is not of the form $x \otimes y$. Hint: $(1\;\; 1\;\; 1\;\; 0)^T$.

At this point it is useful to introduce some notation about multi-indices. Let $I$ denote the $n$-tuple $(i_1, i_2, \ldots, i_n)$ where $1 \leq i_j \leq m$. We will then use the notation
$$\otimes_{i \in I}\, x_i = \otimes_{j=1}^{n} x_{i_j}.$$
We will assume that $n$-tuples $I$ are ordered lexicographically; that is,
$$(i_1, i_2, \ldots, i_n) < (j_1, j_2, \ldots, j_n)$$
iff $i_k = j_k$ for $k = 1, \ldots, l$, and $i_{l+1} < j_{l+1}$.

Let $e_i$ denote column $i$ of the identity matrix. The length of $e_i$ will be apparent from the context. Note that multiple occurrences of $e_i$ in the same formula can denote column vectors of different lengths.

It is easy to check that the $m^n$ vectors
$$\otimes_{i \in I}\, e_i = e_I$$
form an orthonormal basis for $\otimes^n \mathbb{F}^m$.

Exercise 336. Check this claim.

Exercise 337. Write down a basis for $\bigotimes_{j=1}^{n} \mathbb{F}^{i_j}$ from bases for the $\mathbb{F}^{i_j}$.

We are now ready to compute the Jordan decomposition of the tensor product of two nilpotent matrices.

Exercise 338. Show that the smallest integer $k$ for which $(Z_p \otimes Z_q)^k = 0$ is $k = \min(p, q)$. Hint: $(Z_p \otimes Z_q)^r = Z_p^r \otimes Z_q^r$.

From now on, without loss of generality, we will assume $p \leq q$.

Exercise 339. Show that if $v \in N(A)$ then $v \otimes w \in N(A \otimes B)$.

Exercise 340. Show that $Z_p^{r-1} e_r \neq 0$, while $Z_p^r e_r = 0$. Hint: $Z_p e_i = e_{i-1}$.

Therefore $\{Z_p^k e_p\}_{k=0}^{p-1}$ forms a right Jordan chain of length $p$ for $Z_p$.

Exercise 341. Show that $\{(Z_p \otimes Z_q)^k (e_p \otimes e_r)\}_{k=0}^{p-1}$ forms a right Jordan chain of length $p$ for $p \leq r \leq q$.

This gives us $q - p + 1$ linearly independent right Jordan chains. So there are at least $q - p + 1$ Jordan blocks of size $p$ in the Jordan decomposition of $Z_p \otimes Z_q$ when $p \leq q$. In fact there are exactly $q - p + 1$ Jordan blocks of size $p$. This will become apparent soon. Define the following subspace
$$\mathcal{U}_p = \mathrm{span}\{e_{p-i} \otimes e_{r-i} \;|\; i = 0, \ldots, p-1,\; r = p, \ldots, q\}.$$
Note that $\dim(\mathcal{U}_p) = p(q - p + 1)$ and $\dim(\mathcal{U}_p^{\perp}) = p(p-1)$.

Now consider the two chains $\{(Z_p \otimes Z_q)^k (e_{p-1} \otimes e_q)\}_{k=0}^{p-2}$ and $\{(Z_p \otimes Z_q)^k (e_p \otimes e_{p-1})\}_{k=0}^{p-2}$, of length $p-1$. Observe that the starting points of the chains, $e_{p-1} \otimes e_q$ and $e_p \otimes e_{p-1}$, are not in the subspace $\mathcal{U}_p$, nor are any subsequent members of the chains in $\mathcal{U}_p$. Therefore these are two new chains of length $p-1$, which establishes that there are at least two Jordan blocks of size $p-1$. In fact there are exactly two Jordan blocks of size $p-1$, as will be apparent soon. Define the subspace
$$\mathcal{U}_{p-1} = \mathrm{span}\{e_{p-1-i} \otimes e_{q-i} \;|\; i = 0, \ldots, p-2\} + \mathrm{span}\{e_{p-i} \otimes e_{p-1-i} \;|\; i = 0, \ldots, p-2\}.$$
Observe that $\dim(\mathcal{U}_{p-1}) = 2(p-1)$ and that $\mathcal{U}_p \perp \mathcal{U}_{p-1}$.

We can continue in this way to define new linearly independent right Jordan chains. In general for any integer $1 \leq r < p$ we define two right Jordan chains, $\{(Z_p \otimes Z_q)^k (e_{p-r} \otimes e_q)\}_{k=0}^{p-r-1}$ and $\{(Z_p \otimes Z_q)^k (e_p \otimes e_{p-r})\}_{k=0}^{p-r-1}$, of length $p - r$. Define the subspace
$$\mathcal{U}_{p-r} = \mathrm{span}\{e_{p-r-i} \otimes e_{q-i} \;|\; i = 0, \ldots, p-r-1\} + \mathrm{span}\{e_{p-i} \otimes e_{p-r-i} \;|\; i = 0, \ldots, p-r-1\}.$$
Observe that $\dim(\mathcal{U}_{p-r}) = 2(p-r)$ and that $\mathcal{U}_s \perp \mathcal{U}_{p-r}$ for $s > p - r$. Therefore there are at least two Jordan blocks of size $p - r$. In fact there are exactly two Jordan blocks of size $p - r$, as will be apparent soon.

Finally observe that
$$\dim(\mathcal{U}_p) + \sum_{r=1}^{p-1} \dim(\mathcal{U}_{p-r}) = p(q - p + 1) + \sum_{r=1}^{p-1} 2(p - r) = pq = \dim(\mathbb{C}^p \otimes \mathbb{C}^q).$$
Therefore it follows that we have found a complete set of Jordan chains and all our claims are proved: there are $q - p + 1$ Jordan blocks of size $p$ and two Jordan blocks each of size 1 through $p-1$.

TBD. Jordan decompositions of $Z_{p_1} \otimes \cdots \otimes Z_{p_n}$, and $(I + Z_p) \otimes (I + Z_q)$.
6.3 Symmetric tensors

The full tensor product spaces are not very interesting since they are the same as (isomorphic to) $\mathbb{C}^n$. However, they contain interesting subspaces that occur frequently. We have met some of them already; namely, the class of Hermitian and skew-Hermitian matrices.

Let $\mathcal{P}_n$ denote the set of all permutations of the integers $1, \ldots, n$. Let $x_i \in \mathbb{R}^m$ for $i = 1, \ldots, n$. We define the symmetric tensor product of the $x_i$ to be
$$x_1 \odot x_2 \odot \cdots \odot x_n = \frac{1}{n!} \sum_{\sigma \in \mathcal{P}_n} \otimes_{i=1}^{n} x_{\sigma(i)}.$$
We denote the sub-space of $\otimes^n \mathbb{R}^m$ spanned by all symmetric tensor products of $n$ vectors from $\mathbb{R}^m$ as $\odot^n \mathbb{R}^m$. We will use the convenient notation $\odot_{i=1}^{n} x_i$ for the symmetric tensor product of the $x_i$.

Exercise 342. Show that
$$x_1 \odot \cdots \odot x_i \odot \cdots \odot x_j \odot \cdots \odot x_n = x_1 \odot \cdots \odot x_j \odot \cdots \odot x_i \odot \cdots \odot x_n.$$
We will write this fact succinctly as $\odot_{i=1}^{n} x_i = \odot_{i=1}^{n} x_{\sigma(i)}$ for any permutation $\sigma \in \mathcal{P}_n$. (Prove it.)

Exercise 343. Give an example of $x, y, z \in \mathbb{R}^m$ where
$$(x \odot y) \otimes z + z \otimes (x \odot y) \neq c\,(x \odot y \odot z)$$
for any choice of the constant $c$. This exercise shows that a naive definition of the symmetric tensor product is not associative.

Let $G_{m,n} = \{(i_1, i_2, \ldots, i_n) : 1 \leq i_k \leq i_{k+1} \leq m\}$. That is, $G_{m,n}$ is the set of $n$-tuples with components from the set $\{1, \ldots, m\}$ in non-decreasing order. Remember that we use the notation $I = (i_1, \ldots, i_n)$ to denote $n$-tuples. Suppose that there are $n_i$ occurrences of the number $i$ in the tuple $I$. Then we will use the notation
$$I! = n_1!\, n_2! \cdots n_m!.$$
We claim that the set of symmetric tensors
$$g_I = \sqrt{\frac{n!}{I!}}\; \odot_{i=1}^{n} e_{I_i}, \quad I \in G_{m,n},$$
forms an orthonormal basis for $\odot^n \mathbb{R}^m$.

Exercise 344. Show that if $I, J \in G_{m,n}$ and $I \neq J$ then $g_I^T g_J = 0$. Hint: Do a small example first.

Next we check that they have unit length. Let $I = (i_1, \ldots, i_n) \in G_{m,n}$. Then
$$g_I = \frac{1}{\sqrt{I!\, n!}} \sum_{\sigma \in \mathcal{P}_n} \otimes_{k=1}^{n} e_{I_{\sigma(k)}}.$$
Therefore
$$g_I^T g_I = \frac{1}{I!\, n!} \sum_{\sigma, \tau \in \mathcal{P}_n} \prod_{k=1}^{n} e_{I_{\sigma(k)}}^T e_{I_{\tau(k)}} = 1.$$
To see this consider a term in the sum for a fixed $\sigma$. Clearly the term evaluates to 1 if $\tau = \sigma$. But any $\tau$ which only permutes components in $I$ that are identical among themselves will still yield a term that evaluates to 1. For each $\sigma$ there are $I!$ such terms. Therefore the right-hand side adds up to 1. This establishes that the $g_I$ for $I \in G_{m,n}$ form an orthonormal set.

To finish establishing that it is a basis we must show that they span $\odot^n \mathbb{R}^m$.

Exercise 345. Establish that it is sufficient to show that an elementary symmetric tensor, $\odot_{i=1}^{n} x_i$, can be written as a linear combination of the $g_I$'s.

Let $F_{m,n}$ denote the set of all $n$-tuples formed from the integers between 1 and $m$ (inclusive). Then observe that
$$\odot_{i=1}^{n} x_i = \frac{1}{n!} \sum_{\sigma \in \mathcal{P}_n} \otimes_{l=1}^{n} x_{\sigma(l)}
= \frac{1}{n!} \sum_{\sigma \in \mathcal{P}_n} \otimes_{l=1}^{n} \sum_{j=1}^{m} e_j\, x_{j, \sigma(l)}
= \frac{1}{n!} \sum_{\sigma \in \mathcal{P}_n} \sum_{I \in F_{m,n}} \otimes_{l=1}^{n} e_{I_l}\, x_{I_l, \sigma(l)} \quad \text{(why?)}$$
$$= \sum_{I \in F_{m,n}} \frac{1}{n!} \sum_{\sigma \in \mathcal{P}_n} \otimes_{l=1}^{n} e_{I_l}\, x_{I_l, \sigma(l)}
= \sum_{I \in F_{m,n}} \frac{1}{n!} \left( \sum_{\sigma \in \mathcal{P}_n} \prod_{l=1}^{n} x_{I_l, \sigma(l)} \right) \otimes_{l=1}^{n} e_{I_l}.$$
Now observe that for a fixed $I \in G_{m,n}$ and any $\tau \in \mathcal{P}_n$
$$\sum_{\sigma \in \mathcal{P}_n} \prod_{l=1}^{n} x_{I_l, \sigma(l)} = \sum_{\sigma \in \mathcal{P}_n} \prod_{l=1}^{n} x_{I_{\tau(l)}, \sigma(l)}.$$
However, for each $I \in G_{m,n}$ there are only $n!/I!$ occurrences of $\tau(I)$ in the actual sum. Therefore we can group the terms further together and obtain
$$\odot_{i=1}^{n} x_i = \sum_{I \in F_{m,n}} \frac{1}{n!} \left( \sum_{\sigma \in \mathcal{P}_n} \prod_{l=1}^{n} x_{I_l, \sigma(l)} \right) \otimes_{l=1}^{n} e_{I_l}
= \sum_{I \in G_{m,n}} \frac{1}{n!} \left( \sum_{\sigma \in \mathcal{P}_n} \prod_{l=1}^{n} x_{I_l, \sigma(l)} \right) \frac{1}{I!} \sum_{\tau \in \mathcal{P}_n} \otimes_{l=1}^{n} e_{I_{\tau(l)}}$$
$$= \sum_{I \in G_{m,n}} \left( \frac{1}{I!} \sum_{\sigma \in \mathcal{P}_n} \prod_{l=1}^{n} x_{I_l, \sigma(l)} \right) \odot_{l=1}^{n} e_{I_l}. \tag{6.1}$$
Hence we have shown that the $g_I$ for $I \in G_{m,n}$ form an orthonormal basis for $\odot^n \mathbb{R}^m$.

Therefore $\dim(\odot^n \mathbb{R}^m)$ is the cardinality of the set $G_{m,n}$. Let $s(m, n)$ denote the latter number. Observe that $s(1, n) = 1$ and $s(m, 1) = m$. Now let us see how we can generate the tuples in $G_{m,n}$ using tuples in $G_{m-1,n}$ and $G_{m,n-1}$. Partition the tuples in $G_{m,n}$ into two sets; let the first set of tuples start with the number 1, and the second set be everything else. Clearly by prepending a 1 to every tuple in $G_{m,n-1}$ we can obtain exactly the first set. Similarly we can obtain the second set by taking every tuple in $G_{m-1,n}$ and adding 1 to every component. Therefore it follows that $s(m, n) = s(m, n-1) + s(m-1, n)$. With the initial conditions $s(1, n) = 1$ and $s(m, 1) = m$, this recursion uniquely specifies $s(m, n)$ for all positive integers.

Exercise 346. Verify that
$$\dim(\odot^n \mathbb{R}^m) = s(m, n) = \binom{m + n - 1}{n}.$$
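A quick check of Exercise 346: the recursion $s(m, n) = s(m, n-1) + s(m-1, n)$ with $s(1, n) = 1$ and $s(m, 1) = m$ reproduces $\binom{m+n-1}{n}$.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def s(m, n):
    if m == 1:
        return 1
    if n == 1:
        return m
    return s(m, n - 1) + s(m - 1, n)

print(all(s(m, n) == comb(m + n - 1, n)
          for m in range(1, 8) for n in range(1, 8)))
```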
Next we compute the orthogonal projector $P_{\odot}$ from $\otimes^n \mathbb{R}^m$ onto $\odot^n \mathbb{R}^m$, via its action on the orthogonal basis $e_I$ for $I \in F_{m,n}$:
$$P_{\odot}(\otimes_{i \in I}\, e_i) = \odot_{i \in I}\, e_i.$$
We begin by checking if $P_{\odot}$ is idempotent. Clearly it is sufficient to check if $P_{\odot}\, g_I = g_I$ for $I \in G_{m,n}$. Observe that
$$P_{\odot}\left( \frac{1}{n!} \sum_{\sigma \in \mathcal{P}_n} \otimes_{i=1}^{n} e_{I_{\sigma(i)}} \right)
= \frac{1}{n!} \sum_{\sigma \in \mathcal{P}_n} P_{\odot}\left( \otimes_{i=1}^{n} e_{I_{\sigma(i)}} \right)
= \frac{1}{n!} \sum_{\sigma \in \mathcal{P}_n} \odot_{i=1}^{n} e_{I_{\sigma(i)}}
= \frac{1}{n!} \sum_{\sigma \in \mathcal{P}_n} \odot_{i=1}^{n} e_{I_i}
= \odot_{i=1}^{n} e_{I_i},$$
which proves that $P_{\odot}$ is idempotent. This also explains the presence of the factor $1/n!$ in the definition of the symmetric tensor product $\odot$.

Finally we check that $x - P_{\odot}\, x$ is perpendicular to $P_{\odot}\, x$ for all $x \in \otimes^n \mathbb{R}^m$. It is sufficient to check that $\otimes_i e_{I_i} - P_{\odot}(\otimes_i e_{I_i})$ is perpendicular to $g_J$ for $I \in F_{m,n}$ and $J \in G_{m,n}$. We break the calculation up into 2 cases. First we assume that there is no permutation $\sigma$ such that $\sigma(I) = J$. Then clearly
$$\left( \frac{1}{n!} \sum_{\sigma \in \mathcal{P}_n} \otimes_{i=1}^{n} e_{J_{\sigma(i)}} \right)^T
\left( \otimes_{i=1}^{n} e_{I_i} - \frac{1}{n!} \sum_{\sigma \in \mathcal{P}_n} \otimes_{i=1}^{n} e_{I_{\sigma(i)}} \right) = 0.$$
Next we consider the case when $\sigma(I) = J$ for some $\sigma \in \mathcal{P}_n$. Then we have that
$$\left( \frac{1}{n!} \sum_{\sigma \in \mathcal{P}_n} \otimes_{i=1}^{n} e_{J_{\sigma(i)}} \right)^T
\left( \otimes_{i=1}^{n} e_{I_i} - \frac{1}{n!} \sum_{\sigma \in \mathcal{P}_n} \otimes_{i=1}^{n} e_{I_{\sigma(i)}} \right)
= \frac{J!}{n!} - \frac{1}{(n!)^2}\, J!\, n! = 0.$$
Therefore we have shown that $P_{\odot}$ is an orthogonal projector onto $\odot^n \mathbb{R}^m$.

For $I \in G_{m,n_1}$ and $J \in G_{m,n_2}$ we have by an easy calculation that
$$P_{\odot}\left( \left( \frac{1}{n_1!} \sum_{\sigma \in \mathcal{P}_{n_1}} \otimes_{i=1}^{n_1} e_{I_{\sigma(i)}} \right) \otimes \left( \frac{1}{n_2!} \sum_{\sigma \in \mathcal{P}_{n_2}} \otimes_{i=1}^{n_2} e_{J_{\sigma(i)}} \right) \right) = \odot_{i=1}^{n_1+n_2} e_{(I,J)_i}.$$
Hence we can extend the definition of $\odot$, the symmetric tensor product, to a binary operator between two symmetric tensors by first defining it on bases for $\odot^n \mathbb{R}^m$:
$$\left( \odot_{i=1}^{n_1} e_{I_i} \right) \odot \left( \odot_{i=1}^{n_2} e_{J_i} \right)
= P_{\odot}\left( \left( \odot_{i=1}^{n_1} e_{I_i} \right) \otimes \left( \odot_{i=1}^{n_2} e_{J_i} \right) \right)
= \odot_{i=1}^{n_1+n_2} e_{(I,J)_i}.$$
More generally for $x \in \odot^{n_1} \mathbb{R}^m$ and $y \in \odot^{n_2} \mathbb{R}^m$, we have
$$x = \sum_{I \in G_{m,n_1}} x_I\; \odot_{i=1}^{n_1} e_{I_i}, \quad \text{and} \quad y = \sum_{I \in G_{m,n_2}} y_I\; \odot_{i=1}^{n_2} e_{I_i}.$$
Hence
$$x \odot y = P_{\odot}(x \otimes y) = \sum_{I \in G_{m,n_1}} \sum_{J \in G_{m,n_2}} x_I\, y_J\; \odot_{i=1}^{n_1+n_2} e_{(I,J)_i}.$$

Exercise 347. Show that for symmetric tensors $x$, $y$ and $z$, and scalar $\alpha$
$$(x + \alpha z) \odot y = x \odot y + \alpha (z \odot y), \qquad x \odot y = y \odot x, \qquad (x \odot y) \odot z = x \odot (y \odot z).$$

Exercise 348. Let $x_i = x$ for $i = 1, \ldots, n$. Show that $\otimes_{i=1}^{n} x_i = \otimes^n x = \odot_{i=1}^{n} x_i = \odot^n x$.

An instant question is whether $\mathrm{span}\{\odot^n x : x \in \mathbb{R}^m\} = \odot^n \mathbb{R}^m$. The answer is yes. To see this, note that it is sufficient to show that an arbitrary basis element $\odot_i e_{I_i}$ for some $I \in G_{m,n}$ is in the span. Without loss of generality assume that $I$ only contains the first $k$ integers from 1 to $k$. In particular let us assume that the number $i$ occurs exactly $j_i$ times in $I$. We will show that this basis vector can be written as a linear combination of the symmetric tensors $\odot^n (\sum_{i=1}^{k} \alpha_i e_i)$ for suitable choices of $\alpha_i$.

To make this calculation easier we will exploit the fact that the symmetric tensor product between symmetric tensors is commutative, associative and distributive, and write $x \odot y$ as $xy$ whenever $x$ and $y$ are symmetric tensors. Therefore we have that $\odot^n x = x^n$, for example. Observe that
$$\left( \sum_{i=1}^{k} \alpha_i e_i \right)^n = \sum_{i_1 + i_2 + \cdots + i_k = n} \frac{n!}{i_1!\, i_2! \cdots i_k!}\, \alpha_1^{i_1} \alpha_2^{i_2} \cdots \alpha_k^{i_k}\; e_1^{i_1} e_2^{i_2} \cdots e_k^{i_k}.$$
Now we take a linear combination of $N = (n+1)^k$ of these terms and obtain
$$\sum_{p=1}^{N} \beta_p \left( \sum_{i=1}^{k} \alpha_{p,i} e_i \right)^n = \sum_{i_1 + i_2 + \cdots + i_k = n} \frac{n!}{i_1!\, i_2! \cdots i_k!}\; e_1^{i_1} e_2^{i_2} \cdots e_k^{i_k} \sum_{p=1}^{N} \beta_p\, \alpha_{p,1}^{i_1} \alpha_{p,2}^{i_2} \cdots \alpha_{p,k}^{i_k}.$$
Therefore to recover just the term with $i_l = j_l$ we must pick $\beta_p$ and $\alpha_{p,i}$ such that
$$\sum_{p=1}^{N} \beta_p\, \alpha_{p,1}^{i_1} \alpha_{p,2}^{i_2} \cdots \alpha_{p,k}^{i_k} =
\begin{cases} 0, & \text{if } (i_1, \ldots, i_k) \neq (j_1, \ldots, j_k), \\ 1, & \text{if } (i_1, \ldots, i_k) = (j_1, \ldots, j_k). \end{cases}$$
We pick $\alpha_{p,1} = 1$ and $\alpha_{p,i} = x_p$, where
$$x_0 < x_1 < \cdots < x_N.$$
We then observe that $\beta_p$ is obtained by solving an adjoint multi-dimensional Vandermonde system, which, with our choice of $\alpha_{p,i}$, is known to be invertible. In particular the coefficient matrix can be written as a $k$-th tensor power of an $(n+1) \times (n+1)$ Vandermonde matrix. This establishes our claim.

Inner products of elementary symmetric tensors are given by the permanents of certain matrices.

Permanent (109). The permanent of an $n \times n$ matrix is defined to be
$$\mathrm{per}(A) = \sum_{\sigma \in \mathcal{P}_n} \prod_{i=1}^{n} A_{i, \sigma(i)}.$$
Let $X$ and $Y$ be $m \times n$ matrices. We will use the notation $X_i$ to denote column $i$ of $X$. We now show that
$$(\odot_{i=1}^{n} X_i)^T (\odot_{i=1}^{n} Y_i) = \frac{1}{n!}\, \mathrm{per}(X^T Y).$$
We calculate as follows:
$$(\odot_{i=1}^{n} X_i)^T (\odot_{i=1}^{n} Y_i)
= \frac{1}{(n!)^2} \left( \sum_{\sigma \in \mathcal{P}_n} \otimes_i X_{\sigma(i)}^T \right) \left( \sum_{\tau \in \mathcal{P}_n} \otimes_i Y_{\tau(i)} \right)
= \frac{1}{n!} \left( \sum_{\sigma \in \mathcal{P}_n} \prod_{i=1}^{n} X_i^T Y_{\sigma(i)} \right),$$
which proves the claim.
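A small sketch checking the identity $(\odot_i X_i)^T(\odot_i Y_i) = \mathrm{per}(X^T Y)/n!$ for random $X, Y$, with the symmetric tensor product built explicitly from its definition (average of Kronecker products over all permutations). The cost is exponential, so only tiny sizes are used.

```python
import numpy as np
from itertools import permutations
from math import factorial

def sym_power(X):
    """Symmetric tensor product of the columns of X (an m x n matrix)."""
    m, n = X.shape
    out = np.zeros(m ** n)
    for sigma in permutations(range(n)):
        t = np.array([1.0])
        for i in sigma:
            t = np.kron(t, X[:, i])
        out += t
    return out / factorial(n)

def per(A):
    n = A.shape[0]
    return sum(np.prod([A[i, sigma[i]] for i in range(n)])
               for sigma in permutations(range(n)))

m, n = 3, 3
X, Y = np.random.randn(m, n), np.random.randn(m, n)
lhs = sym_power(X) @ sym_power(Y)
rhs = per(X.T @ Y) / factorial(n)
print(np.isclose(lhs, rhs))
```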
Exercise 349. Observe that in equation 6.1 we give an explicit formula to expand a symmetric tensor in terms of $\odot_{i \in I}\, e_i$ for $I \in G_{m,n}$. The above formula can also be used for this purpose by choosing, for example, $Y_i = e_{I_i}$. However there seems to be an extra $I!$ in one of the formulas. Can you reconcile them?

Exercise 350. Show that $\mathrm{per}(X^T X) \geq 0$.

Exercise 351. Show that
$$|\mathrm{per}(X^T Y)| \leq \sqrt{\mathrm{per}(X^T X)\, \mathrm{per}(Y^T Y)}.$$

By placing restrictions on the basis set we can get lower dimensional symmetric subspaces. Let $U = (U_1\;\; U_2)$ be an orthogonal $m \times m$ matrix with $U_1$ containing $m_1$ columns. Let $u_i$ denote the columns of $U$. Denote
$$\mathrm{span}\{\odot_{i=1}^{n} u_{I_i} \;|\; I \in G_{m_1,n}\} = \odot^n R(U_1).$$
Note that $\odot^n R(U_1)$ is a subspace of $\odot^n \mathbb{R}^m$.

Exercise 352. Show that $\dim(\odot^n \mathbb{R}^{m_1}) = \dim(\odot^n R(U_1))$.

Denote
$$\mathrm{span}\{x \odot y \;|\; x \in \odot^{n_1} R(U_1),\; y \in \odot^{n_2} R(U_2)\} = (\odot^{n_1} R(U_1)) \odot (\odot^{n_2} R(U_2)).$$

Exercise 353. Show that $\dim((\odot^{n_1} R(U_1)) \odot (\odot^{n_2} R(U_2))) = \dim(\odot^{n_1} R(U_1))\, \dim(\odot^{n_2} R(U_2))$.

Exercise 354. Show that
$$\odot^n \mathbb{R}^m = \bigoplus_{j=0}^{n} (\odot^{j} R(U_1)) \odot (\odot^{n-j} R(U_2)).$$
Cross check by verifying independently that
$$\binom{m_1 + m_2 + n - 1}{n} = \sum_{j=0}^{n} \binom{m_1 + j - 1}{j} \binom{m_2 + n - j - 1}{n - j}.$$
Hint: To proceed, first extend the sum to
$$\binom{m_1 + m_2 + n - 1}{n} = \sum_{j=0}^{m_2 + n - 1} \binom{m_1 + j - 1}{j} \binom{m_2 + n - j - 1}{n - j},$$
and then convert it to
$$\binom{m_1 + m_2 + n - 1}{n} = \sum_{j=0}^{m_2 + n - 1} \binom{m_1 + j - 1}{m_1 - 1} \binom{m_2 + n - j - 1}{m_2 - 1}.$$
Now use identity (5.26) from Concrete Mathematics by Graham, Knuth and Patashnik.

Finally, all of these formulas remain true if we merely require that $U$ is non-singular. Verify.

It is also convenient to be able to detect a symmetric tensor from its coefficients in the canonical basis $e_I$ for $I \in F_{m,n}$. Let $x = \sum_{I \in F_{m,n}} x_I\, e_I = \sum_{J \in G_{m,n}} \hat{x}_J\, g_J$.

Exercise 355. Show that $x_{\sigma(I)} = x_I$ for all exchange permutations $\sigma$.

Exercise 356. Conclude that $x_{\sigma(I)} = x_I$ for all permutations $\sigma$.

This explains why symmetric tensors form such a small subspace of $\otimes^n \mathbb{R}^m$. This is also an exact characterization of symmetric tensors.

Exercise 357. Show that $x = \sum_{I \in F_{m,n}} x_I\, e_I \in \odot^n \mathbb{R}^m$ iff $x_{\sigma(I)} = x_I$ for all permutations $\sigma$.

Therefore we can characterize the symmetric tensors as those $x = \sum_{I \in F_{m,n}} x_I\, e_I$ that are in the nullspace of the equations
$$x_I - x_{\sigma(I)} = 0, \quad \text{for all exchanges } \sigma \text{ and all } I \in G_{m,n}.$$
One is then led to consider other symmetry conditions on the tensor. Here is a problem from Bishop and Goldberg.

Example 6. Find all $x = \sum_{i,j,k=1}^{3} x_{i,j,k}\, e_{(i,j,k)} \in \otimes^3 \mathbb{R}^m$ that satisfy the symmetry equations
$$x_{i,j,k} + x_{i,k,j} = 0,$$
$$x_{i,j,k} + x_{j,k,i} + x_{k,i,j} = 0,$$
for $i, j, k = 1$ to $m$. The first set of equations implies that the free variables can be chosen from the set $x_{i,j,k}$ with $1 \leq j < k \leq m$. Of course $x_{i,j,j} = 0$. This only leaves the second set of equations. We now claim that we can pick only the variables $x_{i,j,k}$ with $1 \leq j < k \leq m$ and $1 \leq i \leq k \leq m$ as free. First let us check if a variable $x_{p,q,r}$ which does not satisfy the conditions, that is $q < r < p$, can be determined from the putative free variables. Observe that
$$x_{p,q,r} = -x_{q,r,p} + x_{r,q,p},$$
and all the variables on the right are free, since $q, r < p$. Obviously a variable $x_{p,q,r}$ with $r < q$ is determined by $x_{p,r,q}$. Further those with $r = q$ are zero. Hence we see that all variables are determined by the free variables. The question is whether all equations are simultaneously satisfied; that is, did we pick too many free variables? We see that the first set of equations is consistent with our choice as they each determine exactly one basic variable. For the second set, for each choice of triplet $(p, q, r)$ there is an equation
$$x_{p,q,r} + x_{q,r,p} + x_{r,p,q} = 0.$$
If all 3 integers are distinct then there is exactly one basic variable which does not appear in any other such equation. If two of the integers are the same then we repeat a previous anti-symmetry equation. If all three integers are the same then that variable is 0. So we see the free variables leave all the equations consistently true.

Now we look at a more complicated problem from Bishop and Goldberg. This concerns the symmetry conditions satisfied by the Riemannian curvature tensor.

Example 7. Consider all $x = \sum_{i,j,k,l=1}^{m} x_{i,j,k,l}\, e_{(i,j,k,l)} \in \otimes^4 \mathbb{R}^m$ that satisfy the symmetry conditions
1. $x_{i,j,k,l} = -x_{j,i,k,l}$
2. $x_{i,j,k,l} = -x_{i,j,l,k}$
3. $x_{i,j,k,l} + x_{i,k,l,j} + x_{i,l,j,k} = 0$

We first show that any such tensor must automatically satisfy an extra symmetry condition: $x_{i,j,k,l} = x_{k,l,i,j}$. To see this first observe that
$$x_{i,j,k,l} = -x_{i,k,l,j} - x_{i,l,j,k} = x_{k,i,l,j} + x_{l,i,j,k}
= -x_{k,l,j,i} - x_{k,j,i,l} - x_{l,j,k,i} - x_{l,k,i,j}
= 2 x_{k,l,i,j} + x_{k,j,l,i} + x_{l,j,i,k}.$$
Next we do a similar derivation with a slight modification:
$$x_{i,j,k,l} = -x_{j,i,k,l} = 2 x_{k,l,i,j} + x_{k,i,j,l} + x_{l,i,k,j}.$$
Adding up these two formulae we get
$$2 x_{i,j,k,l} = 4 x_{k,l,i,j} + x_{k,j,l,i} + x_{k,i,j,l} + x_{l,j,i,k} + x_{l,i,k,j} = 4 x_{k,l,i,j} - x_{k,l,i,j} - x_{l,k,j,i},$$
which proves our claim. Next we establish that if $x^T (v \otimes w \otimes v \otimes w) = 0$ for all choices of $v$ and $w$ then $x = 0$. First observe that if $v = \sum_{i=1}^{m} v_i e_i$ and $w = \sum_{i=1}^{m} w_i e_i$ then
$$x^T (v \otimes w \otimes v \otimes w) = \sum_{i,j,k,l=1}^{m} x_{i,j,k,l}\, v_i w_j v_k w_l = 0.$$
We already know from the skew-symmetry conditions on the first two and last two indices that $x_{i,i,k,l} = x_{i,j,k,k} = 0$. Now fix $(i, j, k, l)$ and choose $v = e_i$ and $w = e_k$. Then the above equation becomes
$$x_{i,k,i,k} = 0.$$
Next choose $v = e_i$ and $w = e_k + e_l$. Then using the above symmetry conditions we have that
$$x_{i,k,i,k} + x_{i,k,i,l} + x_{i,l,i,l} + x_{i,l,i,k} = 0,$$
$$x_{i,k,i,l} + x_{i,l,i,k} = 0.$$
But we have also established that $x_{i,k,i,l} - x_{i,l,i,k} = 0$. Therefore we can conclude that $x_{i,k,i,l} = 0$. By a similar reasoning we can also establish that $x_{i,k,j,k} = 0$. Therefore we have now shown that variables with two or more identical indices in any position will be 0. So the only non-zero variables are those that have four distinct integers for their indices. Therefore consider $v = e_i + e_j$ and $w = e_k + e_l$. Then we have that
$$x_{i,k,j,l} + x_{j,k,i,l} + x_{i,l,j,k} + x_{j,l,i,k} = 0,$$
$$-x_{i,j,l,k} - x_{i,l,k,j} - x_{j,i,l,k} - x_{j,l,k,i} + x_{i,l,j,k} + x_{j,l,i,k} = 0,$$
$$x_{i,l,j,k} + x_{j,l,i,k} = 0.$$
This shows that we also have skew-symmetry between the first and third indices, and an application of the skew-symmetry for the first two and last two indices then shows skew-symmetry between the second and fourth indices also. In summary we have shown skew-symmetry between any two indices. Now we go back to the original symmetry condition and exploit this additional skew-symmetry:
$$x_{i,j,k,l} + x_{i,k,l,j} + x_{i,l,j,k} = 0,$$
$$x_{i,j,k,l} + x_{i,j,k,l} + x_{i,j,k,l} = 0,$$
which proves our claim. This shows that the subspace of tensors satisfying such symmetry conditions is contained in the subspace spanned by all tensors of the form $v \otimes w \otimes v \otimes w$. The containment is strict since such tensors do not have skew-symmetry between the first two and last two indices. Finally we show that such tensors can be constructed out of symmetric matrices. Let $b_{ij} = b_{ji}$. We claim that
$$x_{i,j,k,l} = b_{ik} b_{jl} - b_{il} b_{jk}$$
is a tensor with the symmetries of a Riemann curvature tensor. The requisite symmetry conditions are easily verified to be true.

A good example of the use of symmetric tensors is a Taylor series expansion of a function of several variables. Let $f : \mathbb{R}^m \to \mathbb{R}$ be an analytic real-valued function of $m$ real variables. Define the $n$-th derivative of $f$ to be a symmetric tensor of order $n$ via
$$\nabla^n f(x_1, \ldots, x_m) = \sum_{I \in G_{m,n}} \frac{\partial^n f}{\partial x_{I_1} \partial x_{I_2} \cdots \partial x_{I_n}}\; \odot_{i \in I}\, e_i.$$

Exercise 358. Write out $\nabla^2 f$ explicitly. Note that it differs from the Hessian of $f$ by a factor of 2.

The reason for representing the partial derivatives as a symmetric tensor should be obvious now. For example, if $f$ is sufficiently nice then
$$\frac{\partial^2 f}{\partial x_1 \partial x_2} = \frac{\partial^2 f}{\partial x_2 \partial x_1},$$
and this is the reason why $\nabla^2 f$ is represented as a symmetric tensor.

By considering the Taylor series expansion in $t$ of $f(a + tx)$ it can be shown that
$$f(a + x) = f(a) + \sum_{n=1}^{\infty} \frac{(\nabla^n f(a))^T}{n!}\, \otimes^n x.$$

Exercise 359. Show it, assuming that $f$ is sufficiently nice.

An interesting exercise is to compute the Taylor series expansion under an affine linear change of variables. Let $\phi(b + y) = a + Ay$. Let $g = f \circ \phi$. Clearly
$$g(b + y) = g(b) + \sum_{n=1}^{\infty} \frac{(\nabla^n g(b))^T}{n!}\, \otimes^n y.$$
But we would like to express this in terms of $f$. Observe that
$$g(b + y) = f(a + Ay) = f(a) + \sum_{n=1}^{\infty} \frac{(\nabla^n f(a))^T}{n!}\, \otimes^n A\, \otimes^n y,$$
which shows immediately that
$$\nabla^n g(b) = \left( \otimes^n A^T \right) \nabla^n f(a),$$
whenever $g(b+y) = f(a+Ay)$. A more detailed view of this operation is presented in the next section.
94
6.4 Symmetric tensor powers
In the last section we saw how tensor powers arose naturally. In this section we look
at them more carefully. Let A denote a l m matrix. It is clear that
n
A can act
on
n
R
m
to yield a tensor in
n
R
l
via the usual matrix multiplication
(
n
A)(
n
i=1
x
i
) =
n
i=1
Ax
i
.
A simple calculation shows that $\vee^n \mathbb{R}^m$ is an invariant subspace of $\otimes^n A$ for any $m \times m$ matrix $A$. It is therefore natural to study the restriction of $\otimes^n A$ to this subspace. This restricted operator is denoted by $\vee^n A$ and called the symmetric tensor power of $A$. More prosaically, let $G_{m,n}$ denote the matrix whose columns are formed from the orthonormal symmetric tensor basis $g_I$ for $I \in G_{m,n}$. Then the invariance of $\vee^n \mathbb{R}^m$ under $\otimes^n A$ can be written as the equation
$$(\otimes^n A)\, G_{m,n} = G_{m,n}\, (\vee^n A).$$
Using the orthonormality of the columns of $G_{m,n}$ we can infer from this an explicit expression for $\vee^n A$:
$$\vee^n A = G_{m,n}^T (\otimes^n A) G_{m,n}.$$
We will also use the notation
$$G_{m,n}\, x^{\vee} = x, \qquad \text{for } x \in \vee^n \mathbb{R}^m,$$
so that $x^{\vee}$ denotes the coordinate vector of $x$ in the basis $g_I$. Clearly
$$(\vee^n A)\left(\vee_{i=1}^n x_i\right)^{\vee} = \left(\vee_{i=1}^n A x_i\right)^{\vee}.$$
We start with a simple sequence of calculations:
$$(\otimes^n A)(\otimes^n B) = \otimes^n (AB)$$
$$(\otimes^n A)(\otimes^n B)\, G_{m,n} = (\otimes^n (AB))\, G_{m,n}$$
$$(\otimes^n A)\, G_{m,n}\, (\vee^n B) = G_{m,n}\, (\vee^n (AB))$$
$$G_{m,n}\, (\vee^n A)(\vee^n B) = G_{m,n}\, (\vee^n (AB)).$$
From which, using the full column-rank of $G_{m,n}$, we can infer that
$$(\vee^n A)(\vee^n B) = \vee^n (AB).$$
It is also possible to show that
$$(\vee^n A)^T = \vee^n A^T, \qquad (\vee^n A)^{-1} = \vee^n A^{-1}.$$
If $A$ is either Hermitian, unitary or normal, then so is $\vee^n A$.
If $A v_i = \lambda_i v_i$, for $i = 1, \ldots, n$, with repetitions allowed, then
$$(\vee^n A)\left(\vee_{i=1}^n v_i\right)^{\vee} = \left(\prod_{i=1}^n \lambda_i\right)\left(\vee_{i=1}^n v_i\right)^{\vee}.$$
Let $A = U \Sigma V^T$ be the SVD of $A$. Then
$$\vee^n A = (\vee^n U)(\vee^n \Sigma)(\vee^n V)^T$$
is the SVD of $\vee^n A$.
At this stage it is not clear that $\vee^n \Sigma$ is a diagonal matrix. So we compute an explicit formula for the entries of $\vee^n A$. Observe that for $I, J \in G_{m,n}$
$$(\vee^n A)_{I,J} = g_I^T (\otimes^n A)\, g_J
= \frac{n!}{\sqrt{I!\,J!}} \left(\vee_{i=1}^n e_{I_i}\right)^T (\otimes^n A) \left(\vee_{i=1}^n e_{J_i}\right)
= \frac{n!}{\sqrt{I!\,J!}} \left(\vee_{i=1}^n e_{I_i}\right)^T \left(\vee_{i=1}^n A e_{J_i}\right)$$
$$= \frac{1}{n!\,\sqrt{I!\,J!}} \left(\sum_{\sigma \in P_n} \bigotimes_{i=1}^n e_{I_{\sigma(i)}}\right)^T \left(\sum_{\tau \in P_n} \bigotimes_{i=1}^n A e_{J_{\tau(i)}}\right)
= \frac{1}{\sqrt{I!\,J!}} \sum_{\sigma \in P_n} \prod_{i=1}^n A_{I_i,\, J_{\sigma(i)}}.$$
Let us define $A[I|J]$ to be the $n \times n$ matrix whose $(i, j)$ element is $A_{I_i, J_j}$. Then we can summarise our formula for $\vee^n A$ as
$$(\vee^n A)_{I,J} = \frac{1}{\sqrt{I!\,J!}}\, \operatorname{per}(A[I|J]).$$
From this formula it is easy to see that the symmetric tensor power of a diagonal matrix is again a diagonal matrix, and that indeed $\vee^n \Sigma$ contains the singular values of $\vee^n A$.
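The following sketch (Python with NumPy; the helper names and the choice of $m$, $n$ are arbitrary, and here I take $I!$ to be the product of the factorials of the multiplicities of the entries of $I$) builds the basis matrix $G_{m,n}$ explicitly, forms $\vee^n A = G_{m,n}^T(\otimes^n A)G_{m,n}$, and checks both the product rule and the permanent formula above.

    import itertools
    from math import factorial, prod
    import numpy as np

    def multi_factorial(I, m):
        """I! = product of the factorials of the multiplicities of the entries of I."""
        return prod(factorial(c) for c in np.bincount(I, minlength=m))

    def sym_basis(m, n):
        """Matrix G_{m,n} whose columns are the orthonormal symmetric basis vectors g_I, I in G_{m,n}."""
        e = np.eye(m)
        idx = list(itertools.combinations_with_replacement(range(m), n))   # G_{m,n}
        cols = []
        for I in idx:
            v = np.zeros(m ** n)
            for s in itertools.permutations(range(n)):
                t = e[I[s[0]]]
                for i in s[1:]:
                    t = np.kron(t, e[I[i]])
                v += t
            cols.append(v / np.sqrt(factorial(n) * multi_factorial(I, m)))
        return np.column_stack(cols), idx

    def kron_power(A, n):
        T = A
        for _ in range(n - 1):
            T = np.kron(T, A)
        return T

    def per(M):
        """Permanent by brute force over all permutations."""
        n = M.shape[0]
        return sum(prod(M[i, s[i]] for i in range(n)) for s in itertools.permutations(range(n)))

    m, n = 3, 2
    rng = np.random.default_rng(0)
    A, B = rng.standard_normal((m, m)), rng.standard_normal((m, m))
    G, idx = sym_basis(m, n)
    symA = G.T @ kron_power(A, n) @ G
    symB = G.T @ kron_power(B, n) @ G
    print(np.allclose(symA @ symB, G.T @ kron_power(A @ B, n) @ G))    # product rule
    ok = all(np.isclose(symA[p, q],
                        per(A[np.ix_(I, J)]) / np.sqrt(multi_factorial(I, m) * multi_factorial(J, m)))
             for p, I in enumerate(idx) for q, J in enumerate(idx))
    print(ok)                                                          # permanent formula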
6.5 Signs of permutations
Before we proceed we need to discuss the sign of a permutation. Let $\sigma$ denote a permutation of the integers $1, \ldots, n$. The sign of $\sigma$, denoted $\operatorname{sgn}(\sigma)$, is defined to be either $+1$ or $-1$: it is $+1$ if $\sigma$ can be represented as the composition of an even number of exchanges; otherwise it is defined to be $-1$.
Let $\tau_{i,j}$ denote the exchange which switches the position of the $i$-th and $j$-th integers. Suppose
$$\sigma(1) = 2, \quad \sigma(2) = 3, \quad \sigma(3) = 1,$$
is a permutation of $\{1, 2, 3\}$; then we can decompose $\sigma$ as
$$\sigma = \tau_{1,2} \circ \tau_{1,3},$$
and hence $\operatorname{sgn}(\sigma) = +1$ in this case. The natural question is whether sgn is well-defined: can a permutation be written as both an odd number of exchanges and an even number of exchanges? No.
A nice proof of this is given in Herstein's Topics in Algebra. Let $x_i$, for $i = 1, \ldots, n$, denote $n$ distinct numbers in increasing order, $x_i < x_{i+1}$. For a permutation $\sigma$ of $\{1, \ldots, n\}$ consider the number
$$\Delta(\sigma) = \operatorname{sgn}\left(\prod_{i<j} \left(x_{\sigma(j)} - x_{\sigma(i)}\right)\right).$$
It is easy to see that $\Delta$ of the identity permutation is 1. Let $\tau_{i,j}$ denote a permutation that exchanges the number $i$ with the number $j$. We claim that $\Delta(\sigma \circ \tau_{i,j}) = \Delta(\tau_{i,j} \circ \sigma) = -\Delta(\sigma)$. We compare the terms in the two formulas
$$\Delta(\sigma) = \operatorname{sgn}\left(\prod_{r=2}^{n} \prod_{s=1}^{r-1} \left(x_{\sigma(r)} - x_{\sigma(s)}\right)\right), \qquad
\Delta(\sigma \circ \tau_{i,j}) = \operatorname{sgn}\left(\prod_{r=2}^{n} \prod_{s=1}^{r-1} \left(x_{\sigma(\tau_{i,j}(r))} - x_{\sigma(\tau_{i,j}(s))}\right)\right).$$
Without loss of generality let $i < j$ and $s < r$. We observe that if neither $r$ nor $s$ is equal to $i$ or $j$, then
$$x_{\sigma(\tau_{i,j}(r))} - x_{\sigma(\tau_{i,j}(s))} = x_{\sigma(r)} - x_{\sigma(s)}.$$
So any change in sign must be induced by the other terms. First consider the terms where $s_1 < i = r_1$ and $s_2 < i < j = r_2$. We note that these terms can be paired up as follows
$$x_{\sigma(\tau_{i,j}(i))} - x_{\sigma(\tau_{i,j}(s_1))} = x_{\sigma(j)} - x_{\sigma(s_2)}, \qquad s_1 = s_2.$$
Hence they do not induce a net sign change either. Next consider the terms of the form $i = s_1 < r_1 < j$ and $i < s_2 < j = r_2$. These terms can be paired up as follows
$$x_{\sigma(\tau_{i,j}(r_1))} - x_{\sigma(\tau_{i,j}(i))} = x_{\sigma(r_1)} - x_{\sigma(j)} = (-1)\left(x_{\sigma(j)} - x_{\sigma(s_2)}\right), \qquad s_2 = r_1.$$
Therefore each of these terms causes a sign change. The total sign change is given by $(-1)^{j-i-1}$.
Next we consider the terms of the form $i < s_1 < j = r_1$ and $i = s_2 < r_2 < j$. These terms can be paired up as
$$x_{\sigma(\tau_{i,j}(j))} - x_{\sigma(\tau_{i,j}(s_1))} = x_{\sigma(i)} - x_{\sigma(s_1)} = (-1)\left(x_{\sigma(r_2)} - x_{\sigma(i)}\right), \qquad s_1 = r_2.$$
Therefore these terms cause a total sign change of $(-1)^{j-i-1}$
too. Next we consider the terms of the form $i = s_1 < j < r_1$ and $j = s_2 < r_2$. These can be paired up as
$$x_{\sigma(\tau_{i,j}(r_1))} - x_{\sigma(\tau_{i,j}(i))} = x_{\sigma(r_1)} - x_{\sigma(j)} = x_{\sigma(r_2)} - x_{\sigma(j)}, \qquad r_1 = r_2.$$
So these cause no sign change. Next we consider terms of the form $j = s_1 < r_1$ and $i = s_2 < j < r_2$. As in the previous argument there is no sign change for these forms. All of the forms we have considered so far together give no net sign change. This
leaves us only with the following two terms to compare:
$$x_{\sigma(\tau_{i,j}(i))} - x_{\sigma(\tau_{i,j}(j))} = x_{\sigma(j)} - x_{\sigma(i)} = (-1)\left(x_{\sigma(i)} - x_{\sigma(j)}\right).$$
Therefore we have exactly one sign change and we have shown that $\Delta(\sigma \circ \tau_{i,j}) = -\Delta(\sigma)$. The other version, $\Delta(\tau_{i,j} \circ \sigma) = -\Delta(\sigma)$, is proved similarly.
Exercise 360. Do it.
Exercise 361. Show that $\operatorname{sgn}(\sigma)$ is well-defined for permutations $\sigma$.
Exercise 362. Show that $\operatorname{sgn}(\sigma) = \operatorname{sgn}(\sigma^{-1})$ for permutations $\sigma$.
Exercise 363. Let $I$ denote an $r$-tuple and $J$ an $s$-tuple, and $(I, J)$ the $(r + s)$-tuple obtained by concatenating $I$ and $J$. Let $\sigma \in P_r$ and $\tau \in P_s$. Let $\pi \in P_{r+s}$ be the permutation defined by $\pi(I, J) = (\sigma(I), \tau(J))$. Show that $\operatorname{sgn}(\pi) = \operatorname{sgn}(\sigma)\, \operatorname{sgn}(\tau)$.
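A quick numerical illustration of the argument above (a sketch in Python; the particular numbers $x_i$ are arbitrary): $\Delta(\sigma)$ agrees with the sign computed by counting inversions, and composing with an exchange flips it.

    import itertools

    def delta(sigma, x):
        """Sign of prod_{s<r} (x[sigma(r)] - x[sigma(s)]) for a 0-indexed permutation sigma."""
        p = 1.0
        for r in range(len(sigma)):
            for s in range(r):
                p *= x[sigma[r]] - x[sigma[s]]
        return 1 if p > 0 else -1

    def sgn(sigma):
        """Sign via the parity of the number of inversions."""
        inv = sum(1 for s in range(len(sigma)) for r in range(s + 1, len(sigma)) if sigma[s] > sigma[r])
        return -1 if inv % 2 else 1

    x = [0.5, 1.3, 2.0, 3.7]                                 # any strictly increasing numbers
    for sigma in itertools.permutations(range(4)):
        assert delta(sigma, x) == sgn(sigma)
        tau = list(sigma); tau[0], tau[1] = tau[1], tau[0]   # sigma composed with the exchange of 1 and 2
        assert delta(tau, x) == -delta(sigma, x)
    print("Delta agrees with sgn and flips under an exchange")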
6.6 Anti-symmetric tensors
In this section we consider probably the most important subspace of $\otimes^n \mathbb{R}^m$. We define the anti-symmetric tensor product (sometimes called the wedge product) of the $x_i$ to be
$$x_1 \wedge x_2 \wedge \cdots \wedge x_n = \frac{1}{n!} \sum_{\sigma \in P_n} \operatorname{sgn}(\sigma) \bigotimes_{i=1}^n x_{\sigma(i)}.$$
We will use the convenient notation $\wedge_{i=1}^n x_i$ for the left hand side of the above equation. We will denote the span of all wedge products of $n$ vectors from $\mathbb{R}^m$ as $\wedge^n \mathbb{R}^m$.
Exercise 364. Show that
$$x_1 \wedge \cdots \wedge x_i \wedge \cdots \wedge x_j \wedge \cdots \wedge x_n = (-1)\; x_1 \wedge \cdots \wedge x_j \wedge \cdots \wedge x_i \wedge \cdots \wedge x_n.$$
We will write this fact succinctly as $\wedge_{i=1}^n x_i = \operatorname{sgn}(\sigma) \wedge_{i=1}^n x_{\sigma(i)}$ for any permutation $\sigma \in P_n$. (Prove it.)
Exercise 365. Give an example of $x, y, z \in \mathbb{R}^m$ such that
$$(x \wedge y) \otimes z - z \otimes (x \wedge y) \neq c\,(x \wedge y \wedge z)$$
for any scalar $c$. This shows that a naive definition of the anti-symmetric tensor product is not associative.
Let $H_{m,n} = \{(i_1, i_2, \ldots, i_n) \mid 1 \le i_k < i_{k+1} \le m\}$. That is, $H_{m,n}$ is the set of $n$-tuples with strictly increasing components with values restricted to the integers $1, \ldots, m$. We claim that the set of anti-symmetric tensors
$$h_I = \sqrt{n!}\; \wedge_{i=1}^n e_{I_i}, \qquad I \in H_{m,n},$$
is an orthonormal basis for $\wedge^n \mathbb{R}^m$.
Exercise 366. Show that $h_I^T h_J = 0$ for $I, J \in H_{m,n}$ and $I \neq J$.
Exercise 367. Show that $h_I^T h_I = 1$ for $I \in H_{m,n}$.
Exercise 368. Show that if $I \in G_{m,n}$ and $I \notin H_{m,n}$, then $\wedge_i e_{I_i} = 0$.
So we just need to show that the $h_I$ span $\wedge^n \mathbb{R}^m$. To do that it is sufficient to check that all elementary anti-symmetric tensors $\wedge_i x_i$ are in the span. We calculate the linear combination as follows:
$$\wedge_{i=1}^n x_i = \frac{1}{n!} \sum_{\sigma \in P_n} \operatorname{sgn}(\sigma) \bigotimes_{i=1}^n x_{\sigma(i)}
= \frac{1}{n!} \sum_{\sigma \in P_n} \operatorname{sgn}(\sigma) \bigotimes_{i=1}^n \sum_{k=1}^m e_k\, x_{k,\sigma(i)}$$
$$= \frac{1}{n!} \sum_{\sigma \in P_n} \operatorname{sgn}(\sigma) \sum_{I \in F_{m,n}} \bigotimes_{i=1}^n e_{I_i}\, x_{I_i,\sigma(i)}
= \frac{1}{n!} \sum_{I \in F_{m,n}} \sum_{\sigma \in P_n} \operatorname{sgn}(\sigma) \bigotimes_{i=1}^n e_{I_i}\, x_{I_i,\sigma(i)}$$
$$= \frac{1}{n!} \sum_{I \in F_{m,n}} \left(\sum_{\sigma \in P_n} \operatorname{sgn}(\sigma) \prod_{i=1}^n x_{I_i,\sigma(i)}\right) \bigotimes_{i=1}^n e_{I_i},$$
where $x_{k,i}$ denotes the $k$-th entry of the vector $x_i$.
Now we observe that for each $J \in F_{m,n}$ there is an $I \in G_{m,n}$ and a $\sigma \in P_n$ (though the $\sigma$ may not be unique) such that $J = \sigma(I)$.
Exercise 369. Show that for such a pair
$$\sum_{\pi \in P_n} \operatorname{sgn}(\pi) \prod_{i=1}^n x_{J_i,\pi(i)} = \operatorname{sgn}(\sigma) \sum_{\pi \in P_n} \operatorname{sgn}(\pi) \prod_{i=1}^n x_{I_i,\pi(i)}.$$
Therefore we can further group the terms together and obtain
$$\wedge_{i=1}^n x_i = \frac{1}{n!} \sum_{I \in F_{m,n}} \left(\sum_{\sigma \in P_n} \operatorname{sgn}(\sigma) \prod_{i=1}^n x_{I_i,\sigma(i)}\right) \bigotimes_{i=1}^n e_{I_i}$$
$$= \sum_{I \in G_{m,n}} \frac{1}{n!} \left(\sum_{\sigma \in P_n} \operatorname{sgn}(\sigma) \prod_{i=1}^n x_{I_i,\sigma(i)}\right) \frac{1}{I!} \sum_{\sigma \in P_n} \operatorname{sgn}(\sigma) \bigotimes_{i=1}^n e_{I_{\sigma(i)}}$$
$$= \sum_{I \in G_{m,n}} \left(\frac{1}{I!} \sum_{\sigma \in P_n} \operatorname{sgn}(\sigma) \prod_{i=1}^n x_{I_i,\sigma(i)}\right) \wedge_{i=1}^n e_{I_i}$$
$$= \sum_{I \in H_{m,n}} \left(\sum_{\sigma \in P_n} \operatorname{sgn}(\sigma) \prod_{i=1}^n x_{I_i,\sigma(i)}\right) \wedge_{i=1}^n e_{I_i},$$
which proves our claim.
Therefore $\dim(\wedge^n \mathbb{R}^m)$ is the cardinality of the set $H_{m,n}$, which gives easily
$$\dim(\wedge^n \mathbb{R}^m) = \binom{m}{n}.$$
In particular $\wedge^n \mathbb{R}^m = \{0\}$ if $n > m$, and $\dim(\wedge^m \mathbb{R}^m) = 1$. Also note that $\dim(\wedge^n \mathbb{R}^m) = \dim(\wedge^{m-n} \mathbb{R}^m)$.
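Here is a minimal sketch (Python with NumPy; the sizes are arbitrary) that constructs the vectors $h_I$ for $I \in H_{m,n}$ inside $\mathbb{R}^{m^n}$ and checks that they are orthonormal and that there are exactly $\binom{m}{n}$ of them.

    import itertools
    from math import comb, factorial
    import numpy as np

    def sgn(p):
        inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
        return -1 if inv % 2 else 1

    def wedge_basis(m, n):
        """Columns h_I = sqrt(n!) * (e_{I_1} ^ ... ^ e_{I_n}), I in H_{m,n}, as vectors in R^(m^n)."""
        e = np.eye(m)
        cols = []
        for I in itertools.combinations(range(m), n):        # strictly increasing tuples: H_{m,n}
            v = np.zeros(m ** n)
            for s in itertools.permutations(range(n)):
                t = e[I[s[0]]]
                for i in s[1:]:
                    t = np.kron(t, e[I[i]])
                v += sgn(s) * t
            cols.append(v / np.sqrt(factorial(n)))           # sqrt(n!) * (1/n!) * sum = sum / sqrt(n!)
        return np.column_stack(cols)

    m, n = 4, 2
    H = wedge_basis(m, n)
    print(H.shape[1] == comb(m, n))                          # dimension is m choose n
    print(np.allclose(H.T @ H, np.eye(comb(m, n))))          # the h_I are orthonormal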
Definition 110 (Determinant). Let $A$ be an $n \times n$ matrix. Its determinant is defined to be
$$\det(A) = \sum_{\sigma \in P_n} \operatorname{sgn}(\sigma) \prod_{i=1}^n A_{i,\sigma(i)}.$$
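This definition is easy to test against a library routine for small matrices (a sketch in Python with NumPy; it brute-forces all $n!$ permutations, so it is only meant for tiny $n$):

    import itertools
    from math import prod
    import numpy as np

    def sgn(p):
        inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
        return -1 if inv % 2 else 1

    def det_by_formula(A):
        n = A.shape[0]
        return sum(sgn(s) * prod(A[i, s[i]] for i in range(n))
                   for s in itertools.permutations(range(n)))

    A = np.random.default_rng(4).standard_normal((4, 4))
    print(np.isclose(det_by_formula(A), np.linalg.det(A)))   # True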
Let $X$ and $Y$ be two $m \times n$ matrices. We will use the notation $X_i$ to denote column $i$ of $X$.
Exercise 370. Show that
$$\left(\wedge_{i=1}^n X_i\right)^T \left(\wedge_{i=1}^n Y_i\right) = \frac{1}{n!} \det(X^T Y).$$
Exercise 371. Show that
$$\left|\det(X^T Y)\right| \le \sqrt{\det(X^T X)\, \det(Y^T Y)}.$$
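The identity in Exercise 370 can be tested numerically by forming the wedge products as explicit vectors in $\mathbb{R}^{m^n}$ (a sketch in Python with NumPy; the sizes are arbitrary):

    import itertools
    from math import factorial
    import numpy as np

    def sgn(p):
        inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
        return -1 if inv % 2 else 1

    def wedge_cols(X):
        """x_1 ^ ... ^ x_n for the columns of the m x n matrix X, as a vector in R^(m^n)."""
        m, n = X.shape
        v = np.zeros(m ** n)
        for s in itertools.permutations(range(n)):
            t = X[:, s[0]]
            for i in s[1:]:
                t = np.kron(t, X[:, i])
            v += sgn(s) * t
        return v / factorial(n)

    rng = np.random.default_rng(2)
    m, n = 5, 3
    X, Y = rng.standard_normal((m, n)), rng.standard_normal((m, n))
    lhs = wedge_cols(X) @ wedge_cols(Y)
    rhs = np.linalg.det(X.T @ Y) / factorial(n)
    print(np.allclose(lhs, rhs))    # True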
Example 8. At this stage it is good to do the following exercise from Bhatia. Note that $\dim(\otimes^3 \mathbb{R}^3) = 27$, $\dim(\vee^3 \mathbb{R}^3) = 10$ and $\dim(\wedge^3 \mathbb{R}^3) = 1$. Find an element of $(\vee^3 \mathbb{R}^3 \oplus \wedge^3 \mathbb{R}^3)^{\perp}$. A brute force approach that will work is to pick a random vector in $\otimes^3 \mathbb{R}^3$ and orthogonalize it against all suitable $g_I$ and $h_I$. A simpler way is to proceed as follows. Observe that every vector in $\wedge^3 \mathbb{R}^3$ is a linear multiple of $\wedge_{i=1}^3 e_i$. Motivated by this consider the vector $e_1 \otimes e_1 \otimes e_2 - e_1 \otimes e_2 \otimes e_1$. Clearly it is orthogonal to $\wedge^3 \mathbb{R}^3$. In $\vee^3 \mathbb{R}^3$ it is clearly orthogonal to all $g_I$ except possibly for $g_{(1,1,2)}$. A quick check shows that it is orthogonal to this one too.
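A direct way to confirm this (a sketch in Python with NumPy): the orthogonal projectors onto the symmetric and anti-symmetric subspaces of $\otimes^3 \mathbb{R}^3$ are the symmetrizer and anti-symmetrizer, so the vector is orthogonal to both subspaces exactly when both projections vanish.

    import itertools
    import numpy as np

    def sgn(p):
        inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
        return -1 if inv % 2 else 1

    e = np.eye(3)
    # v = e1 (x) e1 (x) e2  -  e1 (x) e2 (x) e1, stored as a 3 x 3 x 3 array
    v = np.einsum('i,j,k->ijk', e[0], e[0], e[1]) - np.einsum('i,j,k->ijk', e[0], e[1], e[0])

    perms = list(itertools.permutations(range(3)))
    sym = sum(np.transpose(v, p) for p in perms) / 6.0             # projection onto the symmetric part
    alt = sum(sgn(p) * np.transpose(v, p) for p in perms) / 6.0    # projection onto the anti-symmetric part
    print(np.allclose(sym, 0), np.allclose(alt, 0))                # True True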
As in the symmetric case, calculations become easier to do if we can define a fully associative wedge product (also called the Grassmann product). Like before we need to find the orthogonal projector $P_{\wedge}$ onto $\wedge^n \mathbb{R}^m$. We define it on the canonical basis vectors as follows:
$$P_{\wedge}\left(\bigotimes_{i=1}^n e_{I_i}\right) = \wedge_{i=1}^n e_{I_i}, \qquad \text{for } I \in F_{m,n}.$$
We need to check that this is indeed an orthogonal projector. We begin by checking that it is idempotent. It is sufficient to check this on $h_I$ for $I \in H_{m,n}$:
$$P_{\wedge} h_I = \frac{1}{\sqrt{n!}} \sum_{\sigma \in P_n} \operatorname{sgn}(\sigma)\, P_{\wedge}\left(\bigotimes_{i=1}^n e_{I_{\sigma(i)}}\right)
= \frac{1}{\sqrt{n!}} \sum_{\sigma \in P_n} \operatorname{sgn}(\sigma)\, \wedge_{i=1}^n e_{I_{\sigma(i)}} = h_I.$$
Finally we check that $x - P_{\wedge} x$ is perpendicular to $P_{\wedge} x$ for all vectors $x$. It is sufficient to check that $\bigotimes_i e_{I_i} - P_{\wedge}(\bigotimes_i e_{I_i})$ is perpendicular to all $h_J$. It is clear that if $\sigma(I) \in G_{m,n}$, but $\sigma(I) \notin H_{m,n}$, for some permutation $\sigma$, then the orthogonality condition holds. So we only need to check when $I \in H_{m,n}$. Thus for $I, J \in H_{m,n}$ we must compute
$$\left(\sum_{\sigma \in P_n} \operatorname{sgn}(\sigma) \bigotimes_{i=1}^n e_{J_{\sigma(i)}}\right)^T \left(\bigotimes_{i=1}^n e_{I_i} - P_{\wedge}\left(\bigotimes_{i=1}^n e_{I_i}\right)\right).$$
Clearly if $I \neq J$ the above inner product is zero. Thus we only need to check when $I = J \in H_{m,n}$:
$$\left(\sum_{\sigma \in P_n} \operatorname{sgn}(\sigma) \bigotimes_{i=1}^n e_{I_{\sigma(i)}}\right)^T \left(\bigotimes_{i=1}^n e_{I_i} - P_{\wedge}\left(\bigotimes_{i=1}^n e_{I_i}\right)\right) = 1 - \frac{n!}{n!} = 0,$$
which confirms that $P_{\wedge}$ is the orthogonal projector onto $\wedge^n \mathbb{R}^m$.
Next, for $I \in H_{m,n_1}$ and $J \in H_{m,n_2}$ we compute the anti-symmetric tensor
$$P_{\wedge}\left(\left(\frac{1}{n_1!} \sum_{\sigma \in P_{n_1}} \operatorname{sgn}(\sigma) \bigotimes_{i=1}^{n_1} e_{I_{\sigma(i)}}\right) \otimes \left(\frac{1}{n_2!} \sum_{\tau \in P_{n_2}} \operatorname{sgn}(\tau) \bigotimes_{i=1}^{n_2} e_{J_{\tau(i)}}\right)\right)$$
$$= \frac{1}{n_1!\, n_2!} \sum_{\sigma \in P_{n_1}} \sum_{\tau \in P_{n_2}} \operatorname{sgn}(\sigma) \operatorname{sgn}(\tau)\, P_{\wedge}\left(\left(\bigotimes_{i=1}^{n_1} e_{I_{\sigma(i)}}\right) \otimes \left(\bigotimes_{i=1}^{n_2} e_{J_{\tau(i)}}\right)\right)$$
$$= \frac{1}{n_1!\, n_2!} \sum_{\sigma \in P_{n_1}} \sum_{\tau \in P_{n_2}} \operatorname{sgn}(\sigma) \operatorname{sgn}(\tau)\, \left(\wedge_{i \in (\sigma(I), \tau(J))} e_i\right)$$
$$= \frac{n_1!\, n_2!}{n_1!\, n_2!}\, \wedge_{i \in (I,J)} e_i = \wedge_{i \in (I,J)} e_i.$$
Therefore we can extend the definition of the wedge product to anti-symmetric tensors by first defining it on the canonical basis for $\wedge^n \mathbb{R}^m$:
$$\left(\wedge_{i=1}^{n_1} e_{I_i}\right) \wedge \left(\wedge_{i=1}^{n_2} e_{J_i}\right) = \wedge_{i=1}^{n_1+n_2} e_{(I,J)_i}.$$
We then extend it by linearity in each argument. Therefore for $x \in \wedge^{n_1} \mathbb{R}^m$ and $y \in \wedge^{n_2} \mathbb{R}^m$, since
$$x = \sum_{I \in H_{m,n_1}} x_I\, \wedge_{i=1}^{n_1} e_{I_i}, \qquad \text{and} \qquad y = \sum_{I \in H_{m,n_2}} y_I\, \wedge_{i=1}^{n_2} e_{I_i},$$
we have
$$x \wedge y = P_{\wedge}(x \otimes y) = \sum_{I \in H_{m,n_1}} \sum_{J \in H_{m,n_2}} x_I\, y_J\; \wedge_{i=1}^{n_1+n_2} e_{(I,J)_i}.$$
Note that many terms on the right-hand side can be zero. Furthermore observe that for $I \in H_{m,n_1}$ and $J \in H_{m,n_2}$
$$\left(\wedge_{i=1}^{n_1} e_{I_i}\right) \wedge \left(\wedge_{i=1}^{n_2} e_{J_i}\right) = (-1)^{n_1 n_2} \left(\wedge_{i=1}^{n_2} e_{J_i}\right) \wedge \left(\wedge_{i=1}^{n_1} e_{I_i}\right).$$
Exercise 372. Show that for anti-symmetric tensors $x$, $y$ and $z$ and scalar $\alpha$
$$(\alpha x + y) \wedge z = \alpha\, x \wedge z + y \wedge z,$$
$$(x \wedge y) \wedge z = x \wedge (y \wedge z),$$
$$x \wedge y = (-1)^{n_1 n_2}\, y \wedge x, \qquad \text{if } x \in \wedge^{n_1} \mathbb{R}^m \text{ and } y \in \wedge^{n_2} \mathbb{R}^m.$$
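These properties are easy to test numerically once $P_{\wedge}$ is implemented as full anti-symmetrization (a sketch in Python with NumPy; the vectors are arbitrary and anti-symmetric tensors are stored as multi-dimensional arrays):

    import itertools
    import numpy as np

    def sgn(p):
        inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
        return -1 if inv % 2 else 1

    def alt(t):
        """P_wedge: full anti-symmetrization of an order-n tensor stored as a numpy array."""
        perms = list(itertools.permutations(range(t.ndim)))
        return sum(sgn(p) * np.transpose(t, p) for p in perms) / len(perms)

    def wedge(x, y):
        """x ^ y = P_wedge(x (x) y) for anti-symmetric tensors x and y."""
        return alt(np.tensordot(x, y, axes=0))

    rng = np.random.default_rng(3)
    u, v, w, z = (rng.standard_normal(4) for _ in range(4))

    x = wedge(u, v)                                                    # an element of wedge^2 R^4
    print(np.allclose(wedge(u, v), -wedge(v, u)))                      # sign rule with n1 = n2 = 1
    print(np.allclose(wedge(wedge(x, w), z), wedge(x, wedge(w, z))))   # associativity
    print(np.allclose(wedge(x, wedge(w, z)), wedge(wedge(w, z), x)))   # sign rule with n1 = n2 = 2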
Exercise 373. Show that if $v_i \in \mathbb{R}^m$ then $\wedge_{i=1}^n v_i = 0$ iff the $v_i$ are linearly dependent.
6.7 Anti-symmetric tensor powers