
Linear Algebra Review

Tyrone L. Vincent
Engineering Division, Colorado School of Mines, Golden, CO
E-mail address: tvincent@mines.edu
URL: http://egweb.mines.edu/faculty/tvincent
Copyright © 2006-2011
Contents

Chapter 1. Review of Vectors and Matrices
1. Notation
2. Proofs
3. Vectors and Matrices
4. Solving Linear Equations
5. Basic operations on matrices and vectors
6. Exercises
Chapter 2. Vector Spaces
1. Vector Space Definition
2. Dimension and Basis
3. Norms, Dot Products, Orthonormal Basis
4. Projection Theorem
5. Orthogonal Subspaces
6. Exercises
Chapter 3. Matrices as Mappings
1. Range and Null Space
2. Eigenvalues and Eigenvectors
3. Symmetric Matrices
4. Matrix Norms
5. Singular Value Decomposition
6. SVD Application: Solutions to Systems of Linear Equations
7. SVD Application: Best rank k approximation
8. Exercises
Index
Bibliography

CHAPTER 1

Review of Vectors and Matrices

1. Notation
In these notes, as in most branches of mathematics, we will often utilize sets
of mathematical objects. For example, there is the set of natural numbers, which
begins 1, 2, 3, · · · . This set is often denoted N, so that 2 is a member of N but π is
not. To specify that an object is a member of a set, we use the notation ∈ for "is
a member of". For example, 2 ∈ N. Some of the sets we will use are

R real numbers
C complex numbers
Rn n dimensional vectors of real numbers
Rm×n m × n dimensional real matrices

For these common sets, particular notation will be used to identify members,
namely lower case, such as x, for a scalar or vector, and upper case, such as A, for a
matrix. Bold face will not be used to distinguish scalars from vectors and matrices.
To specify a set, we can also use a bracket notation. For example, to specify E
as the set of all positive even numbers, we can say either
E = {2, 4, 6, 8, · · · }
when the pattern is clear, or use a : symbol, which means “such that”:
E = {x ∈ N : mod(x, 2) = 0} .
This can be read “The set of natural numbers x, such that x is divisible evenly by
2”.
When talking about sets, we will often want to say when a property holds for
every member of the set, or for at least one. In this case, the symbol ∀, meaning
“for all”, and ∃, meaning “there exists”, are useful. For example, suppose I is the
set of numbers consisting of the IQs of the people in this class. Then
∀x ∈ I x > 110
means that all students in this class have IQ greater than 110 while
∃x ∈ I : x > 110
means that at least one student in the class has IQ greater than 110.
We will also be concerned with functions. You are familiar with the notation
f (x) to denote a function of the variable x. We will often be more specific about what
is being mapped to what. In particular, given a set X and a set Y, a function f from
X to Y maps an element of X to an element of Y and is denoted f : X → Y.

The set X is called the domain, and f (x) is assumed to be defined for every x ∈ X.
The range, or image of f is the set of y for which f (x) = y for some x :

Range(f ) = {y ∈ Y : ∃x ∈ X such that y = f (x)} .

If Range(f ) = Y, then f is called onto. If for each y ∈ Range(f ) there is only one
x ∈ X such that y = f (x), then f is called one to one.

2. Proofs
This section is adapted from [1].

2.1. The need for proofs. In the next sections, and indeed in the rest of
the class, we will encounter proofs of the properties of the various mathematical
entities that are useful for linear systems theory. If your background is engineering,
you may be a little out of practice at theorem proving, and indeed, you may feel
that this level of detail is excessive. However, the literature for the systems sciences
such as control systems, signal processing, communications, computer vision, and
robotics uses the language of theorem and proof to communicate new ideas, and as
a graduate student working in this area, being proficient with proofs makes reading
and understanding the literature much easier.

2.2. What makes up a theorem. There are two parts to a theorem - the
hypothesis A and the conclusion B. The basic form of a theorem is nothing more
than the statement “If A then B”, and a proof is the demonstration through logical
steps that this statement is true. As an example outside of mathematics, let A
be the condition “It is raining” and B be the condition “It is cloudy”. A theorem
statement could be “If it is raining, then it is cloudy”. A proof of this statement
would rely on the fact that rain only comes from clouds, thus clouds must be present
for rain to occur. There are many different ways in which this same statement may
be presented. With this example in mind, consider the following equivalents:
• A implies B
• A⇒B
• A is a sufficient condition for B
• B if A
• A only if B
• B is a necessary condition for A
In the above statements, we know that B occurs whenever A does, but what
about the reverse? Does A occur whenever B does? From our example, we see that
this does not necessarily have to be the case, as “If it is cloudy then it is raining” is
not a true statement. However, there will be some statements that are equivalent,
in that both “If A then B” and “If B then A” are true. Usually, by changing the
first statement to “B if A” and the second statement to “B only if A”, we can
combine the implications as B if and only if A. Other ways that this is stated are
• B⇔A
• A is a necessary and sufficient condition for B
In addition, “if and only if” is often abbreviated as “iff”, as in B iff A.

2.3. Some methods of proof. Unfortunately, there does not exist a recipe
that one can follow to prove any given true statement. Usually, finding a proof
requires a lot of trial and error, and perhaps a little inspiration. However, there are
some common avenues of attack that occur fairly often.
2.3.1. Direct Computation. There are really two ways to show that A implies B
via direct computation. One is to start with A, (along with all axioms and properties
that have already been proven from those axioms) and using these, arrive at B. For
example
Theorem 1. If n is an even number, then n² is an even number.
Proof. (A implies B) The antecedent, or what we are given, is that n is an even
number. What properties does an even number satisfy? The most fundamental is
that n/2 is an integer. The desired conclusion is that n² also satisfies this property,
namely that n²/2 is also an integer. To show this, we can use the property that the
product of two integers is still an integer.
(1) Given: n, an integer, is even
(2) Thus n/2 is an integer
(3) Because the product of integers is also an integer, (n/2)(n) = n²/2 is also
an integer
(4) Thus n² is an even number □

2.3.2. Proof by Contradiction. Although “A implies B” is not equivalent to “B
implies A”, it is equivalent to “not B implies not A”. Using our example, notice that
the statements “If it is raining then it is cloudy” and “If it is not cloudy then it is
not raining” are both true, and mean the same thing. Thus, we can also start with
not B and try to arrive at not A. We will give two examples. Remember that the
hypothesis A includes not only what is stated in the theorem, but the consistency
of the mathematical systems that we are using. Thus, if assuming both A and not
B leads to an inconsistency (like 0 = 1) then A implies B.
Theorem 2. If n is a positive integer greater than one, then it can be expressed
as a product of primes.
Proof. (by contradiction). Let us assume the opposite: there exist positive
integers greater than one that cannot be expressed as a product of primes. There
must be a smallest such integer, call it n. Since n is not a product of primes, it
cannot be a prime itself. Thus n = ab where a and b are integers smaller than n
(and greater than one). Since they are smaller, they can be expressed as products
of primes, implying that n itself is a product of primes, which is a contradiction. □
As an aside, the fundamental theorem of arithmetic states that each integer
can be decomposed into exactly one product of primes - that is, the decomposition
is unique modulo re-arrangement of the elements.
A second example assumes not B (and our algebraic rules) to arrive at not A
Theorem 3. If a matrix A ∈ Rm×n is full column rank, then there exists no
vector x ∈ Rn with x ≠ 0 such that Ax = 0.
Proof. (not B ⇒ not A) Suppose there exists a vector x not equal to zero such
that Ax = 0. Then using the elements of x, we can find coefficients such that the
columns of A sum to zero, and thus the columns of A are not linearly independent,
the maximal number of linearly independent columns is less than n, and A is not
full column rank. □

2.3.3. Proof by induction. Induction proofs are used when there is an implica-
tion that depends on a positive integer. For example, vectors and matrices have
different sizes, but we would like our results to not depend on the exact size of the
vector or matrix. Rather than verifying the result for every index separately (which
would obviously take an infinite amount of time for every possible index) we can
verify two things:
• The result is true for index equal to 1
• If the result is true for index equal to n then it is true for index equal to
n+1
For example:
Theorem 4. If k = 7^n − 2^n, where n is a positive integer, then k is divisible
by 5.
Proof. If n = 1 then k = 5, which is clearly divisible by 5.
Now, suppose the result is true for n. Then
7^{n+1} − 2^{n+1} = 7(7^n) − 7(2^n) + 7(2^n) − 2(2^n)
where we have added and subtracted 7(2^n). Collecting terms,
7^{n+1} − 2^{n+1} = 7(7^n − 2^n) + 5(2^n).
Since the result is true for n, the first term on the right is divisible by 5. Clearly,
the second term on the right is also divisible by 5. Thus, the result is true for n + 1
and the theorem is proved. □

3. Vectors and Matrices


You are probably already familiar with vectors and matrices from previous
courses in mathematics or physics. We will find matrices and vectors very useful
when representing dynamic systems mathematically, however, we will need to be
able to manipulate and understand these objects at a fairly deep level. For com-
pleteness, and to establish notation, let's begin at the very beginning, and discuss
finite dimensional vectors and matrices.
A vector, an n-tuple of real (or sometimes complex) numbers, is represented as
x = [x1 x2 · · · xn]^T
so that x is a vector and the xi are scalars. We will use the notation x ∈ Rn to
show that x is a length n vector of real numbers (or x ∈ Cn if the elements of x are
complex). Sometimes we will want to index vectors as well, which can sometimes
be confusing: is xi the vector xi or the ith element of the vector x? To make the
difference clear, we will reserve the notation [x]i to indicate the ith element of x. As

an example, consider the following illustration of addition and scalar multiplication
for vectors:
x1 + x2 = [[x1]1 + [x2]1   [x1]2 + [x2]2   · · ·   [x1]n + [x2]n]^T
αx1 = [α[x1]1   α[x1]2   · · ·   α[x1]n]^T
A matrix is an m × n array of scalars:
      [ a11  a12  · · ·  a1n ]
  A = [ a21  a22  · · ·  a2n ]
      [  ⋮    ⋮           ⋮  ]
      [ am1  am2  · · ·  amn ]
We use the notation A ∈ Rm×n to indicate that A is an m × n matrix. Addition and
scalar multiplication are defined the same way as for vectors.
3.1. Matrix-vector multiplication. Suppose we have an m × n matrix A
      [ a11  a12  · · ·  a1n ]
  A = [ a21  a22  · · ·  a2n ]
      [  ⋮    ⋮           ⋮  ]
      [ am1  am2  · · ·  amn ]
and a length n vector x1. Note that the number of columns of A is the same as
the length of x1. Multiplication of A and x1 is defined as follows:
x2 = Ax1   (3.1)
where
[x2]1 = a11[x1]1 + a12[x1]2 + · · · + a1n[x1]n   (3.2a)
[x2]2 = a21[x1]1 + a22[x1]2 + · · · + a2n[x1]n   (3.2b)
⋮   (3.2c)
[x2]m = am1[x1]1 + am2[x1]2 + · · · + amn[x1]n   (3.2d)
Note that the result x2 is a length m vector (the number of rows of A). The notation
(3.1) is a compact representation of the system of linear algebraic equations (3.2).
Note that A defines a mapping (that is, a function) from Rn to Rm. Thus, we can
write A : Rn → Rm. This mapping is linear.
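As a quick numerical check (not part of the original notes), the sketch below compares the componentwise formula (3.2) with the built-in matrix-vector product. It assumes Python with numpy; the particular matrix and vector are arbitrary.

import numpy as np

# A maps R^3 to R^2 (m = 2 rows, n = 3 columns)
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x1 = np.array([1.0, -1.0, 2.0])

# Equation (3.1): built-in matrix-vector product
x2 = A @ x1

# Equations (3.2a)-(3.2d): componentwise computation
x2_manual = np.array([sum(A[i, j] * x1[j] for j in range(A.shape[1]))
                      for i in range(A.shape[0])])

print(x2)                          # [ 5. 11.]
print(np.allclose(x2, x2_manual))  # True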
We can also consider a matrix to be a group of vectors. For example, if we
group the vectors x1, x2, · · · , xp into a matrix
M = [x1 x2 · · · xp]
and define the vector
a = [α1 α2 · · · αp]^T
Then all linear combinations of x1, x2, · · · , xp are given by
y = M a = α1 x1 + α2 x2 + · · · + αp xp

3.2. Matrix-Matrix multiplication. If matrix A : Rn → Rm and matrix
B : Rm → Rp, we can find a mapping C : Rn → Rp which is the composition of B
and A (first apply A, then B):
C = BA
[c]ij = Σk [b]ik [a]kj
That is, the i, j element of C is the dot product of the ith row of B with the jth
column of A. The dimension of C is p × n. This can also be thought of as B mapping
a column of A at a time: that is, the first column of C, [c]∗1, is B[a]∗1, B times
the first column of A. Clearly, two matrices can be multiplied only if they have
compatible dimensions.
Unlike scalars, the order of multiplication is important. If A and B are square
matrices, AB ≠ BA in general.
The identity matrix
      [ 1  0  · · ·  0 ]
  I = [ 0  1  · · ·  0 ]
      [ ⋮   ⋮   ⋱    ⋮ ]
      [ 0  0  · · ·  1 ]
is a square matrix with ones along the diagonal and zeros elsewhere. If size is
important, we will denote it via a subscript, so that Im is the m × m identity matrix.
The identity matrix is the multiplicative identity for matrix multiplication, in that
AI = A and IA = A (where I has compatible dimensions with A).

3.3. Block Matrices. Matrices can also be defined in blocks, using other
matrices. For example, suppose A ∈ Rm×n, B ∈ Rm×p, C ∈ Rq×n and D ∈ Rq×p.
Then we can "block fill" an (m + q) by (n + p) matrix X as
X = [ A  B ]
    [ C  D ]
Often we will want to specify some blocks as zero. We will denote a block of zeros as
simply 0. The dimension can be worked out from the other matrices. For example,
if
X = [ A  0 ]
    [ C  D ]
the zero block must have the same number of rows as A and the same number of
columns as D.
Matrix multiplication of block matrices uses the same rules as regular matrices,
except as applied to the blocks. Thus
    
[ A1  B1 ] [ A2  B2 ]   [ A1A2 + B1C2   A1B2 + B1D2 ]
[ C1  D1 ] [ C2  D2 ] = [ C1A2 + D1C2   C1B2 + D1D2 ]
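The following sketch (not from the original notes, assuming Python with numpy) builds two block matrices from randomly chosen blocks and checks that multiplying them blockwise, as in the formula above, agrees with ordinary matrix multiplication.

import numpy as np

rng = np.random.default_rng(0)
m, q = 2, 3   # block sizes, chosen arbitrarily
A1, B1 = rng.standard_normal((m, m)), rng.standard_normal((m, q))
C1, D1 = rng.standard_normal((q, m)), rng.standard_normal((q, q))
A2, B2 = rng.standard_normal((m, m)), rng.standard_normal((m, q))
C2, D2 = rng.standard_normal((q, m)), rng.standard_normal((q, q))

X1 = np.block([[A1, B1], [C1, D1]])
X2 = np.block([[A2, B2], [C2, D2]])

# Blockwise product, following the rule for block matrices
X1X2_blocks = np.block([[A1 @ A2 + B1 @ C2, A1 @ B2 + B1 @ D2],
                        [C1 @ A2 + D1 @ C2, C1 @ B2 + D1 @ D2]])

print(np.allclose(X1 @ X2, X1X2_blocks))   # True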

3.4. Linear Combinations, Independence, Rank.


Definition 5. A linear combination of the vectors x1 , x2 , · · · , xp is a sum of
the form α1 x1 + α2 x2 + · · · + αp xp .
3. VECTORS AND MATRICES 7

A linear combination can also be written in matrix-vector form,
α1 x1 + α2 x2 + · · · + αp xp = [x1 x2 · · · xp][α1 α2 · · · αp]^T = Xα,
where X is an n × p matrix with columns made up of the vectors xi and α is a
length p vector.
A vector x is said to be linearly dependent upon a set S of vectors if x can be
expressed as a linear combination of vectors from S. A vector x is said to be linearly
independent of S if it is not linearly dependent on S. A set of vectors is said to be
a linearly independent set if each vector in the set is linearly independent of the
remainder of the set.
This definition immediately leads to the following tests:
Criterion 1. A set of vectors S = {x1, x2, . . . , xp} is linearly dependent if
there exist αi, i = 1, · · · , p with at least one αi ≠ 0 such that
α1 x1 + α2 x2 + · · · + αp xp = 0.
Criterion 2. A set of vectors S = {x1 , x2 , . . . , xp } is linearly independent if
and only if
α1 x1 + α2 x2 + · · · + αp xp = 0
implies αi = 0 i = 1, 2, · · · , p.
Example 1. Consider the set of vectors
x1 = [2 4 6]^T,  x2 = [1 1 2]^T,  x3 = [0 1 1]^T.
This set is linearly dependent, for if we select α1 = −1, α2 = 2 and α3 = 2, we
have
α1 x1 + α2 x2 + α3 x3 = 0. □
Example 2. Consider the set of vectors
x1 = [2 4]^T,  x2 = [0 0]^T.
This set is linearly dependent, for if we select α1 = 0 and α2 = 1, then
α1 x1 + α2 x2 = 0.
Note that the zero vector is linearly dependent on all other vectors. □
If we view a matrix as a collection of vectors, the number of linearly independent
vectors is an important characteristic.
Definition 6. The rank of a matrix is the largest number of linearly indepen-
dent columns. A matrix is full column rank if its rank is equal to the number of
columns, and full row rank if its rank is equal to the number of rows.

Although the definition is in terms of the columns of the matrix, in fact


rank(A) = rank(AT ). That is, the number of linearly independent columns is equal
to the number of linearly independent rows.
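As a small numerical illustration (not part of the original notes, assuming Python with numpy), the matrix whose columns are the vectors of Example 1 has rank 2, since its columns are linearly dependent, and its transpose has the same rank.

import numpy as np

# Columns are x1, x2, x3 from Example 1; we showed -x1 + 2 x2 + 2 x3 = 0,
# so the matrix is not full column rank.
A = np.array([[2.0, 1.0, 0.0],
              [4.0, 1.0, 1.0],
              [6.0, 2.0, 1.0]])

print(np.linalg.matrix_rank(A))     # 2
print(np.linalg.matrix_rank(A.T))   # 2: rank(A) = rank(A^T)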

4. Solving Linear Equations


4.1. Gaussian Elimination. You are probably already familiar with meth-
ods to transform a system of equations into a form that is easy to solve. For
example, suppose we had the following two equations in the three unknowns α, β, γ:
1α + 2β + 3γ = 14
2α + 2β + 2γ = 10
We wish to find all possible solutions to this system of equations. One way to do
this is to find a simpler system of equations that has the same solutions.
Definition 7. Systems of equations are equivalent if they have the same so-
lution set.
To do the problem by hand, we would try to eliminate variables by manipula-
tions that do not change the solutions, such as multiplying both sides of an equation
by a non-zero constant or adding equations together. For example, if we multiply the
first equation by −2 and add it to the second equation, we get
−2α − 4β − 6γ = −28
+ 2α + 2β + 2γ = 10
--------------------
     −2β − 4γ = −18
We then know that the following system of equations will have the same solutions:
1α + 2β + 3γ = 14   (4.1a)
     −2β − 4γ = −18   (4.1b)
Further simplification is possible by adding these two equations together:
1α + 2β + 3γ = 14
+    −2β − 4γ = −18
--------------------
1α + 0 − 1γ = −4
and dividing the equation −2β − 4γ = −18 by −2, resulting in
1α + 0 − 1γ = −4   (4.2a)
     1β + 2γ = 9   (4.2b)
All solutions are now clear: we can choose γ to be any number, and
α = −4 + γ
β = 9 − 2γ
Now, the original system of equations in vector-matrix form is
[ 1  2  3 ]       [ 14 ]
[ 2  2  2 ]  x =  [ 10 ].
Each of the operations that we used above can be described as multiplying both
sides of the equation, on the left, by a particular matrix. For example, to multiply
the first equation by −2 and add it to the second equation (and replace that equation
with the new one) we can multiply both sides by
[ 1  0 ]
[ −2 1 ].
Since
[ 1  0 ] [ 1  2  3 ]       [ 1  0 ] [ 14 ]
[ −2 1 ] [ 2  2  2 ]  x =  [ −2 1 ] [ 10 ]
simplifies to
[ 1   2   3  ]       [  14 ]
[ 0  −2  −4  ]  x =  [ −18 ].   (4.3)
Compare (4.3) to (4.1)! Similarly, we can add the second equation to the first using
the matrix
[ 1  1 ]
[ 0  1 ]   (4.4)
and divide the second equation by −2 using the matrix
[ 1    0   ]
[ 0  −1/2  ]   (4.5)
By multiplying both sides of (4.3) on the left by (4.4) and then (4.5) we get the
matrix-vector equivalent of (4.2). (Try it!).
The following matrices are called elementary matrices, and perform the oper-
ations we need in order to solve linear equations:
• Xij is the identity matrix with the ith and jth rows eXchanged.
• Mi (c) is the identity matrix with the ith row Multiplied by the scalar c
• Aij (c) is the identity matrix with c times the jth row added to the ith row
As some examples, if we are working with 4 × 4 matrices,
        [ 0 0 1 0 ]          [ 1 0 0 0 ]
X1,3 =  [ 0 1 0 0 ]  X2,3 =  [ 0 0 1 0 ]
        [ 1 0 0 0 ]          [ 0 1 0 0 ]
        [ 0 0 0 1 ]          [ 0 0 0 1 ]

          [ 3 0 0 0 ]           [ 1 0 0 0 ]
M1 (3) =  [ 0 1 0 0 ]  M3 (2) = [ 0 1 0 0 ]
          [ 0 0 1 0 ]           [ 0 0 2 0 ]
          [ 0 0 0 1 ]           [ 0 0 0 1 ]

            [ 1 0 3 0 ]             [ 1 0 0 0 ]
A1,3 (3) =  [ 0 1 0 0 ]  A2,3 (2) = [ 0 1 2 0 ]
            [ 0 0 1 0 ]             [ 0 0 1 0 ]
            [ 0 0 0 1 ]             [ 0 0 0 1 ]
A system of equations becomes easy to solve when the defining matrix is in the
following form
Definition 8. A matrix is in row reduced echelon form when the following
conditions are satisfied
(1) Any row containing a nonzero entry precedes any row in which all the
entries are zero
(2) The first nonzero entry in each row is the only nonzero entry in its column
(3) The first nonzero entry in each row is 1 and it occurs in a column to the
right of the leading 1 in any preceding row
Example 3. The following are examples of matrices in row reduced echelon
form:
[ 1 0 3 0 ]    [ 1 0 1 ]
[ 0 1 2 0 ]    [ 0 1 0 ]
[ 0 0 0 1 ]    [ 0 0 0 ]
The following are not in row reduced echelon form:
[ 1 3 0 0 ]    [ 0 1 0 ]
[ 0 1 2 0 ]    [ 0 0 0 ]
[ 0 0 0 1 ]    [ 1 0 1 ]
Elementary matrices allow us to find solutions because of the following fact:
Lemma 9. Multiplication of both sides of a system of equations by an elemen-
tary matrix results in an equivalent system of equations.
Proof. First, we recall that a square matrix C is invertible if there exists
another matrix C⁻¹ such that CC⁻¹ = I (see also the next section). Note that
each elementary matrix is invertible. For example, Xi,j⁻¹ = Xj,i, Mi(c)⁻¹ = Mi(1/c),
and Ai,j(c)⁻¹ = Ai,j(−c). Now, let Ax = b be the original system of equations, and
let S be the solution set, i.e.
S = {x | Ax = b}.
Let S′ be the solution set of CAx = Cb, where C is invertible.
If x ∈ S, then Ax = b and clearly we also have CAx = Cb, implying x ∈ S′.
Thus S ⊂ S′.
Conversely, if x ∈ S′, then CAx = Cb. But since C is invertible, there exists
C⁻¹, and C⁻¹CAx = C⁻¹Cb, implying Ax = b, and x ∈ S. Thus S′ ⊂ S. Since
S ⊂ S′ and S′ ⊂ S we must have S = S′. □
Lemma 10. The following are true
(1) Any matrix can be transformed to row reduced echelon form by a finite
number of elementary matrices
(2) Each matrix has a unique row reduced echelon form
Example 4. By transforming to row reduced echelon form, find all solutions
to
[ 4 3 1 1 ]       [ 1 ]
[ 2 3 2 0 ]  x =  [ 1 ].
[ 0 1 3 1 ]       [ 0 ]
Solution: Multiply the first row by −1/2 and add to the second row:
[ 4  3    1    1    ]       [  1  ]
[ 0  3/2  3/2  −1/2 ]  x =  [ 1/2 ].
[ 0  1    3    1    ]       [  0  ]
Scale the first row:
[ 1  3/4  1/4  1/4  ]       [ 1/4 ]
[ 0  3/2  3/2  −1/2 ]  x =  [ 1/2 ].
[ 0  1    3    1    ]       [  0  ]
Multiply the second row by −1/2 and add to the first row, then multiply the
second row by −2/3 and add to the third row:
[ 1  0    −1/2  1/2  ]       [   0  ]
[ 0  3/2  3/2   −1/2 ]  x =  [  1/2 ].
[ 0  0    2     4/3  ]       [ −1/3 ]
Scale the second row:
[ 1  0  −1/2  1/2  ]       [   0  ]
[ 0  1  1     −1/3 ]  x =  [  1/3 ].
[ 0  0  2     4/3  ]       [ −1/3 ]
Add −0.5 times the third row to the second, 0.25 times the third row to the first,
and scale the third row:
[ 1  0  0  5/6 ]       [ −1/12 ]
[ 0  1  0  −1  ]  x =  [  1/2  ].
[ 0  0  1  2/3 ]       [ −1/6  ]
Solutions:
x = [ −1/12  1/2  −1/6  0 ]^T + [ −5/6  1  −2/3  1 ]^T a
where a is arbitrary.
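The row reduction above can be checked symbolically. The sketch below (not from the original notes) assumes Python with sympy; calling rref() on the augmented matrix [A b] returns exactly the final reduced system of Example 4.

from sympy import Matrix

# Augmented matrix [A | b] from Example 4
Ab = Matrix([[4, 3, 1, 1, 1],
             [2, 3, 2, 0, 1],
             [0, 1, 3, 1, 0]])

R, pivots = Ab.rref()
print(R)        # rows [1, 0, 0, 5/6, -1/12], [0, 1, 0, -1, 1/2], [0, 0, 1, 2/3, -1/6]
print(pivots)   # (0, 1, 2): the fourth unknown is free, matching the arbitrary a above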

5. Basic operations on matrices and vectors


You should already be familiar with most of the basic operations on vectors
and matrices. The following table includes the common operations.

x            vector or scalar
A            matrix
A^T          transpose of A
A*           conjugate transpose of A
tr(A)        trace of A
⟨x, y⟩       inner product between vectors x and y
det(A), |A|  determinant of A
A⁻¹          inverse of A

5.1. Transpose. Given a matrix A ∈ Rm×n, the transpose A^T ∈ Rn×m is
found by flipping all terms across the diagonal. That is, if
      [ a11  a12  · · ·  a1n ]
  A = [ a21  a22  · · ·  a2n ]
      [  ⋮    ⋮           ⋮  ]
      [ am1  am2  · · ·  amn ]
then
        [ a11  a21  · · ·  am1 ]
  A^T = [ a12  a22  · · ·  am2 ]
        [  ⋮    ⋮           ⋮  ]
        [ a1n  a2n  · · ·  amn ]
12 1. REVIEW OF VECTORS AND MATRICES

Note that if the matrix is not square (m ≠ n), then the "shape" of the matrix
changes. We can also transpose a vector x ∈ Rn by considering it to be an
n by 1 matrix. In this case, x^T is the 1 by n matrix:
x^T = [ [x]1  [x]2  · · ·  [x]n ].

5.1.1. Useful Properties of Transpose.
T1  (A^T)^T = A
T2  (A + B)^T = A^T + B^T
T3  (AB)^T = B^T A^T
These properties can be directly verified by expanding each side.

5.2. Conjugate transpose. If A is a matrix made up of complex numbers,
A ∈ Cn×m, then A* applies both the transpose and the complex conjugate. In
particular,
        [ ā11  ā21  · · ·  ān1 ]
  A* =  [ ā12  ā22  · · ·  ān2 ]
        [  ⋮    ⋮           ⋮  ]
        [ ā1m  ā2m  · · ·  ānm ]
where x̄ is the complex conjugate of x. If A is real, then A* = A^T. Note that
for a complex column vector x ∈ Cn×1, x* produces a row vector of the complex
conjugates, and x*x is always a real number.
5.2.1. Useful Properties of Conjugate Transpose.
T1*  (A*)* = A
T2*  (A + B)* = A* + B*
T3*  (AB)* = B* A*
Note the similarity to the transpose properties.

5.3. Trace. The trace of a square matrix is the sum of the diagonal elements.
That is, if A ∈ Rn×n, then
tr(A) = Σ_{i=1}^{n} [A]ii.
5.3.1. Useful Properties of Trace.
Tr1 tr(A) = tr(AT ).
Tr2 For matrices of compatible dimension, tr(AB) = tr(BA)

5.4. Inner (dot) product. In three dimensional space, we are familiar with
vectors as indicating direction. The inner product is an operation that allows us
to tell if two vectors are pointing in a similar direction. We will use the notation
⟨x, y⟩ for the inner product between x and y. In other courses, you may have seen this
called the dot product with notation x · y. The notation used here is more common
in signal processing and control systems. The inner product of x, y ∈ Rn is defined
to be the sum of the products of the elements,
⟨x, y⟩ = Σ_{i=1}^{n} [x]i [y]i = x^T y.
Recall that if x and y are vectors, the angle between them can be found using the
formula
cos θ = ⟨x, y⟩ / √(⟨x, x⟩ ⟨y, y⟩).
Note that the inner product satisfies the following rules (inherited from the transpose):
⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩
⟨αy, z⟩ = α ⟨y, z⟩

5.5. Determinants. If A is a 1 × 1 matrix, that is, A ∈ R1×1, then the
determinant of A is just equal to its single entry.
For higher dimensions, the determinant is defined recursively. If A ∈ Rn×n,
then for any 1 ≤ j ≤ n
det(A) = Σ_{i=1}^{n} [A]ij cij   (5.1)
where cij is the ijth cofactor, and is the determinant of an (n−1) × (n−1) matrix
(this is what makes the definition recursive), possibly times −1. In particular,
cij = (−1)^{i+j} det(Mij)   (5.2)
where Mij is the (n−1) × (n−1) submatrix created by deleting the ith row and
jth column from A. Note that in the definition, j is arbitrary; however, the value
obtained will be the same for every j. In addition, we will also get the same result
with fixed i:
det(A) = Σ_{j=1}^{n} [A]ij cij   (5.3)

Example 5. Find the determinant of
A = [ a11  a12 ]
    [ a21  a22 ].
First, let's take j = 1 in the definition. Then
[A]11 = a11,  c11 = (−1)^{1+1} det(M11),  M11 = a22
[A]21 = a21,  c21 = (−1)^{2+1} det(M21),  M21 = a12
thus
det(A) = [A]11 c11 + [A]21 c21 = a11 a22 − a21 a12.
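The recursive definition (5.1)-(5.3) translates directly into code. The sketch below (not from the original notes, assuming Python with numpy) expands along the first row and compares against the library determinant.

import numpy as np

def det_cofactor(A):
    # Determinant by cofactor expansion along the first row, as in (5.3)
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # M_1j: delete the first row and the jth column
        M = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(M)   # (-1)**(1+j) in 1-based indexing
    return total

A = [[1.0, 4.0], [1.0, 1.0]]
print(det_cofactor(A))     # -3.0
print(np.linalg.det(A))    # -3.0 (up to floating point roundoff)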
5.5.1. Useful Properties of Determinant.
D1 det(I) = 1
D2 det(AT ) = det(A)
D3 if A is not full rank, det(A) = 0
D4 det(AB) = det(BA) = det(A) det(B)
Property D1 is easy to verify directly. Proofs of properties D2-D4 are covered
in the exercises

5.5.2. Determinants for Block Matrices.
BD1  Very useful: the determinant of block triangular matrices is the product
     of the determinants of the diagonal blocks. In particular, if A and D are
     square,
     det [ A  B ] = det [ A  0 ] = det(A) det(D)
         [ 0  D ]       [ C  D ]
BD2  If A and D are square and D⁻¹ exists, then
     det [ A  B ] = det(A − BD⁻¹C) det(D)
         [ C  D ]
5.6. Inverse. Given a square matrix A ∈ Rn×n, the inverse of A is the unique
matrix (when it exists) such that AA⁻¹ = A⁻¹A = I.
The inverse can be calculated as
A⁻¹ = (1/det(A)) C^T
where [C]ij = cij, the ijth cofactor of A, given in (5.2). C^T is also called the
adjugate of A.
5.6.1. Useful properties of Inverse.
I1  The inverse exists whenever det(A) ≠ 0.
I2  (A^T)⁻¹ = (A⁻¹)^T
I3  (AB)⁻¹ = B⁻¹A⁻¹
5.6.2. Inverses for Block Matrices.
BI1  If A and D are square and invertible,
     [ A  B ]⁻¹   [ A⁻¹  −A⁻¹BD⁻¹ ]
     [ 0  D ]   = [ 0       D⁻¹    ]
BI2  If A and D are square and D and ∆ = A − BD⁻¹C are invertible,
     [ A  B ]⁻¹   [ ∆⁻¹          −∆⁻¹BD⁻¹           ]
     [ C  D ]   = [ −D⁻¹C∆⁻¹     D⁻¹C∆⁻¹BD⁻¹ + D⁻¹  ]
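Identities like BD2 and BI2 are easy to sanity-check numerically. The following sketch (not from the original notes, assuming Python with numpy) verifies both on randomly generated blocks; the shift added to D simply keeps the required inverses well conditioned.

import numpy as np

rng = np.random.default_rng(1)
n, p = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, p))
C = rng.standard_normal((p, n))
D = rng.standard_normal((p, p)) + 5 * np.eye(p)

X = np.block([[A, B], [C, D]])
Dinv = np.linalg.inv(D)
Delta = A - B @ Dinv @ C                 # the Schur complement appearing in BD2/BI2

# BD2: det X = det(A - B D^{-1} C) det(D)
print(np.allclose(np.linalg.det(X), np.linalg.det(Delta) * np.linalg.det(D)))

# BI2: block formula for the inverse
Di = np.linalg.inv(Delta)
Xinv_blocks = np.block([[Di, -Di @ B @ Dinv],
                        [-Dinv @ C @ Di, Dinv @ C @ Di @ B @ Dinv + Dinv]])
print(np.allclose(np.linalg.inv(X), Xinv_blocks))   # True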

6. Exercises
Section 3: Vectors and Matrices
(1) Show that for any vectors x, y ∈ Rn , xT y = y T x.
(2) Using formula (5.1), find the determinant of the following matrices
     [ 1  5  6 ]
A1 = [ 0  2  4 ]
     [ 0  0  3 ]
     [ 1  3  0  0 ]
A2 = [ 3  1  0  0 ]
     [ 0  0  4  5 ]
     [ 0  0  6  7 ]
Section 2: Proofs (taken from [2])
(1) Prove that 1 + 2 + · · · + n = (1/2)n(n + 1) for all positive integers n
(2) Let Pn denote the assertion "n² + 5n + 100 is an even integer"
(a) Prove that Pn+1 is true whenever Pn is true

(b) For which n is Pn actually true? What is the moral of this exercise?
Section 4: Solving Linear Equations
(1) By transforming to row reduced echelon form, find all solutions to
   
[ 1  0   1 ]       [ 1 ]
[ 1  1   0 ]  x =  [ 1 ].
[ 0  1  −1 ]       [ 0 ]
Section 5: Basic Operations
(1) Show that det(A⁻¹) = 1/det(A)

(2) Prove property Tr2. What are the requirements for compatible dimension?
(Hint: square matrices is too restrictive)
(3) Show that for vectors and matrices of compatible dimension, xT BB T x =
tr(B T xxT B). (Hint, use the results from problem 2)
(4) Let
N = [ Im  A  ]      Q = [ Im  0  ]      P = [ Im  −A ]
    [ 0   In ]          [ −B  In ]          [ B   In ]
(a) Explain why det(N) = det(Q) = 1
(b) Compute N P and QP
(c) Show that det(N P) = det(Im + AB) and det(QP) = det(In + BA)
(d) Show that det(Im + AB) = det(In + BA)
(5) Using the results of problem 4, explain why (Im + AB)⁻¹ exists if and only
if (In + BA)⁻¹ exists. Show by verifying the properties of the inverse that
(Im + AB)⁻¹ = Im − A(In + BA)⁻¹B. (That is, multiply the right hand
side by Im + AB and show that you get the identity.)
(6) Show that properties D2 through D4 hold
Hints:
D2: Use the equivalence of (5.1) and (5.3)
D3: First, show that det(EA) = det(AT E T ) = det(E) det(A) whenever
E is an elementary matrix. Then show that any matrix with a row
of zeros has determinant zero. Finally, show that if A is a matrix
with columns that are not linearly independent, then we can find an
elementary matrix E such that AT E T has a row of zeros.
D4: First, if you haven’t already, show that det(EA) = det(E) det(A)
when E is an elementary matrix. Then, show that if A is full rank and
square, its row reduced echelon form is the identity matrix. Finally,
show the result by expanding A and B using elementary matrices.
(7) Verify the block inversion equations BI1 and BI2. (Recall that the inverse
is the unique matrix such that AA−1 = A−1 A = I)
CHAPTER 2

Vector Spaces

Vector spaces play a fundamental role in science and engineering, especially


when dealing with signals and systems. Some additional good references are
[3], [4], [5], [6].

1. Vector Space Definition


A vector space (X , F) consists of a set of elements X , called vectors, and a field
F (such as the real numbers) which satisfy the following conditions:
(1) To every pair of vectors x1 and x2 in X , there corresponds a vector x3 =
x1 + x2 in X .
(2) Addition is commutative: x1 + x2 = x2 + x1 .
(3) Addition is associative: (x1 + x2 ) + x3 = x1 + (x2 + x3 ).
(4) X contains a vector, denoted 0, such that 0 + x = x for every x in X .
(5) To every x in X there is a vector x̄ in X such that x + x̄ = 0.
(6) To every α in F, and every x in X , there corresponds a vector αx in X .
(7) Scalar multiplication is associative: For any α, β in F and any x in X ,
α(βx) = (αβ)x.
(8) Scalar multiplication is distributive with respect to vector addition: α(x1 +
x2 ) = αx1 + αx2 .
(9) Scalar multiplication is distributive with respect to scalar addition: (α +
β)x = αx + βx.
(10) For any x in X , 1x = x.
You can verify that Rn (or Cn ) is a vector space. It is interesting to see that
other mathematical objects also qualify to be vector spaces. For example:
Example 6. X = Pn[s], the set of all polynomials with real coefficients with
degree less than n, F = R, with addition and multiplication defined in the usual
way:
if x1 = a1 s^{n−1} + a2 s^{n−2} + · · · + an
   x2 = b1 s^{n−1} + b2 s^{n−2} + · · · + bn,
then x1 + x2 = (a1 + b1)s^{n−1} + (a2 + b2)s^{n−2} + · · · + (an + bn)
     kx1 = ka1 s^{n−1} + ka2 s^{n−2} + · · · + kan
We can show that this is a vector space by verifying that it satisfies the 10 conditions:
(1) Given any x1 = a1 s^{n−1} + a2 s^{n−2} + · · · + an and x2 = b1 s^{n−1} + b2 s^{n−2} +
· · · + bn, we see that x1 + x2 = (a1 + b1)s^{n−1} + (a2 + b2)s^{n−2} + · · · + (an + bn)
is indeed a polynomial of degree less than n, so x1 + x2 is in X
(2) obvious from definition of addition
(3) obvious from definition of addition
(4) Select x = 0 as the zero vector
(5) Given x = a1 s^{n−1} + a2 s^{n−2} + · · · + an, select x̄ = −a1 s^{n−1} − a2 s^{n−2} − · · · − an
(6) Given x = a1 s^{n−1} + a2 s^{n−2} + · · · + an and a scalar a, we see that ax = aa1 s^{n−1} +
aa2 s^{n−2} + · · · + aan is a polynomial of degree less than n, so that ax is in X
(7) obvious from definition of scalar multiplication
(8) obvious from definition of addition and scalar multiplication
(9) obvious from definition of addition and scalar multiplication
(10) obvious from the definition of scalar multiplication with the scalar 1 □
Example 7. The space of continuous functions f (s) defined on [0, 1] is a vector
space with F = R. This space is denoted C[0, 1]. It is a vector space primarily
because the sum of two continuous functions is continuous.
Definition 11. A nonempty subset V of a vector space X is a subspace of X
if whenever x, y ∈ V, then αx + βy ∈ V for all scalars α, β.
Note that a subspace will itself satisfy the properties of a vector space, so it is
a vector space by itself, although one contained inside a larger vector space.

2. Dimension and Basis


The maximal number of linearly independent vectors in a vector space is an
important characteristic of that vector space.
Definition 12. The maximal number of linearly independent vectors in a vec-
tor space is the dimension of the vector space.
Example 8. Show that the dimension of the vector space (R², R) is 2.
Note that the vectors [1 0]^T and [0 1]^T are linearly independent. Thus
the dimension of (R², R) is greater than or equal to 2. Given three vectors x1 =
[a b]^T, x2 = [c d]^T, x3 = [e f]^T, we have
α1 x1 + α2 x2 + α3 x3 = 0
if α3 = 1 and α1 and α2 are solutions to the system of equations
α1 a + α2 c = −e
α1 b + α2 d = −f
which has at least one solution whenever x1 and x2 are linearly independent (and if
they are not, the set is already linearly dependent). Thus no set of three vectors is
linearly independent, and the dimension of (R², R) is less than 3, implying that the
dimension of (R², R) is 2. □
Example 9. The vector space C[0, 1] is infinite dimensional. The Fourier
series functions cos(2πns), sin(2πns), n = 1, · · · , ∞, are an example of an infinite
set of independent vectors.
Definition 13. A set of vectors S = {x1 , x2 , · · · , xp } span the vector space X ,
if every x ∈ X can be written as a linear combination of the elements of S.
Likewise, all linear combinations of a finite set S define a vector space, called
the vector space spanned by S.
Definition 14. A set of linearly independent vectors from a vector space X is
a basis for X if every vector in X can be expressed as a unique linear combination
of these vectors.

Note the difference from the definition of span: the set of vectors is assumed
to be linearly independent, which implies the linear combination will be unique.
In an n-dimensional vector space, any set of n linearly independent vectors
qualifies as a basis. We have seen that there are many different mathematical
objects which qualify as vector spaces. However, all n-dimensional vector spaces
(X, R) have a one to one correspondence with the vector space (Rn, R) once a basis
has been chosen. Suppose e1, e2, · · · , en is a basis for X. Then for all x in X
x = [e1 e2 · · · en] β
where β = [β1 β2 · · · βn]^T and the βi are scalars. Thus the vector x can be
identified with the unique vector β in Rn.
Example 10. Consider the vector space (P3[s], R) where P3[s] is the set of all
real polynomials of degree less than 3. This vector space has dimension 3, with one
basis being e1 = 1, e2 = s, e3 = s². The vector x = 2s² + 3s − 1 can be written as
x = [e1 e2 e3] [−1 3 2]^T
so that the representation with respect to this basis is [−1 3 2]^T. However, if
we choose the basis e1′ = 1, e2′ = s − 1, e3′ = s² − s (verify that this set of vectors
is independent),
x = 2s² + 3s − 1 = 4 + 5(s − 1) + 2(s² − s) = [e1′ e2′ e3′] [4 5 2]^T
so that the representation of x with respect to this basis is [4 5 2]^T. □
2.1. Standard basis. For Rn, the standard basis is the set of unit vectors that
point in the direction of each axis:
i1 = [1 0 · · · 0]^T,  i2 = [0 1 · · · 0]^T,  · · · ,  in = [0 · · · 0 1]^T.
2.2. Change of basis. Since the vectors in the polynomial example are
mathematical objects quite different from n-tuples of numbers, the idea of a
separation between vectors and their representations with respect to a basis is
fairly clear there. This becomes more complicated when considering the "native"
vector space (Rn, R). When n = 2, it is natural to visualize these vectors in the
plane, as shown in Figure 1. In order to represent the vector x, we need to choose
a basis. The most natural basis for (R², R) is
i1 = [1 0]^T,  i2 = [0 1]^T.
In this basis, we have the following representation for x:
x = [2 3]^T = [i1 i2] [2 3]^T.

Figure 1. Two dimensional vector space. The vector x can be
represented with either basis e1, e2 or e1′, e2′.

Note that the vector and its representation look identical. However, if we choose a
different basis, say
e1 = [2 1]^T,  e2 = [1 2]^T,
then
x = [2 3]^T = [e1 e2] [1/3 4/3]^T
so the representation of x in this basis is [1/3 4/3]^T.
We have seen that a vector x can have different representations for different
bases. A natural desired operation would be to transform between one basis and an-
other. Suppose a vector x has representation β with respect to [e1 e2 · · · en]
and representation β′ with respect to [e1′ e2′ · · · en′], so that
x = [e1 e2 · · · en] β = [e1′ e2′ · · · en′] β′.   (2.1)
What is the relationship between β and β′? The answer is most easily found by
finding the relationship between the bases themselves. Each basis vector has a
representation in the other basis. That is, there exist pi ∈ Rn such that
ei = [e1′ e2′ · · · en′] pi.
If we group the vectors ei and pi into matrices, we can write
[e1 e2 · · · en] = [e1′ e2′ · · · en′][p1 p2 · · · pn]
                 = [e1′ e2′ · · · en′] P   (2.2)

where the matrix P takes the vectors pi as its columns. Substituting (2.2) into
(2.1), we get
[e1′ e2′ · · · en′] P β = [e1′ e2′ · · · en′] β′.
Since the representation of a vector with respect to its basis is unique, we must
have β′ = P β. Thus, in order to transform from basis 1 ([e1 e2 · · · en]) to
basis 2 ([e1′ e2′ · · · en′]), we must form the matrix P whose ith column is the
representation of basis 1 vector i (ei) with respect to basis 2 ([e1′ e2′ · · · en′]).
It turns out that P will always be an invertible matrix, so that β = P⁻¹β′, and we
must have P⁻¹ = Q, where the ith column of Q is the representation of basis 2
vector i (ei′) with respect to basis 1 ([e1 e2 · · · en]).
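As a quick numerical illustration (not part of the original notes, assuming Python with numpy), the change of basis below uses the two bases for R² from the discussion above: basis 1 is the standard basis and basis 2 is e1 = [2 1]^T, e2 = [1 2]^T.

import numpy as np

E1 = np.eye(2)                       # basis 1: the standard basis of R^2
E2 = np.array([[2.0, 1.0],
               [1.0, 2.0]])          # basis 2: columns are e1 = [2,1], e2 = [1,2]

# Column i of P is the representation of basis-1 vector i in basis 2, so P = E2^{-1} E1
P = np.linalg.solve(E2, E1)

beta = np.array([2.0, 3.0])          # representation of x in basis 1
beta_prime = P @ beta                # representation of x in basis 2
print(beta_prime)                    # [0.3333... 1.3333...], i.e. [1/3, 4/3]

# Both representations reconstruct the same underlying vector x
print(np.allclose(E1 @ beta, E2 @ beta_prime))   # True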

2.3. Final Thoughts. So far, we have been very general in our notion of a
vector space, allowing infinite dimensional vector spaces such as C[0, 1]. Although
many of the results and concepts that follow have extensions to the infinite dimen-
sional case, we will not consider them at this time. In what follows, we will restrict
ourselves to finite dimensional vector spaces. In addition, unless specified, the field
will be the reals, R.

3. Norms, Dot Products, Orthonormal Basis


3.1. Vector norms on Rn. A vector norm, denoted ‖x‖, is a real valued
function of x which is a measure of its length. You are probably already familiar
with a common norm defined by the Euclidean length of a vector, but in fact, there
are many possibilities. A valid norm satisfies the following properties:
(1) (Always positive unless x = 0) ‖x‖ ≥ 0 for every x, and ‖x‖ = 0 implies
x = 0
(2) (Homogeneity) ‖αx‖ = |α| ‖x‖ for scalar α.
(3) (Triangle inequality) ‖x1 + x2‖ ≤ ‖x1‖ + ‖x2‖
The most common vector norms are the following:
3.1.1. 1-norm. The 1-norm is the sum of the absolute values of the elements of
x,
‖x‖1 := Σ_{i=1}^{n} |[x]i|.

3.1.2. 2-norm. The 2-norm corresponds to Euclidean distance, and is the square
root of the sum of squares of the elements of x,
‖x‖2 := ( Σ_{i=1}^{n} ([x]i)² )^{1/2}.
Note that the sum of squares of the elements can also be written as x^T x. Thus
‖x‖2 = √(x^T x).
3.1.3. ∞-norm. The ∞-norm is simply the largest component of x in absolute
value,
‖x‖∞ = max_i |[x]i|.
In what follows, if a norm is given without a subscript, i.e. ‖x‖, we will take
the "default" norm as the 2-norm.
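The three norms are easy to compute with library calls. A small sketch (not from the original notes, assuming Python with numpy):

import numpy as np

x = np.array([3.0, -4.0, 1.0])

print(np.linalg.norm(x, 1))        # 8.0, the sum of absolute values
print(np.linalg.norm(x, 2))        # 5.099..., sqrt(9 + 16 + 1)
print(np.linalg.norm(x, np.inf))   # 4.0, the largest component in absolute value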

3.2. Unit ball. Each norm induces a "unit ball" in Rn given by all vectors
with norm less than or equal to one: B1 = {x : ‖x‖ ≤ 1}. Only in the case of the
2-norm does the resulting set resemble a ball, and even then strictly speaking this
only occurs in R³. For example:
In R¹, the unit ball is the line segment from −1 to 1 for the 1, 2, and ∞-norms.
In R², the unit ball is a (filled) rotated square, circle, and square for the 1, 2,
and ∞-norms respectively.
In R³, the unit ball is a (filled) octahedron, sphere, and cube for the 1, 2, and
∞-norms respectively.

3.3. Dot products, orthogonality and projection. As discussed earlier,
the dot product between two vectors is given by
⟨x, y⟩ = Σ_{i=1}^{n} [x]i [y]i = x^T y.
Note that
‖x‖2 = √⟨x, x⟩.
If two vectors have a dot product of zero, then they are said to be orthogonal. A
set of vectors {xi} which are pair-wise orthogonal and of unit 2-norm are said to be
orthonormal and will satisfy
⟨xi, xj⟩ = xi^T xj = 1 if i = j, and 0 if i ≠ j.
The projection of one vector (say x) on another (say y) is given by
z = (⟨x, y⟩ / ‖y‖²) y.
The vector z points in the same direction as y, but the length is chosen so that the
difference between z and x is orthogonal to y:
⟨z − x, y⟩ = ⟨(⟨x, y⟩/‖y‖²) y − x, y⟩
           = (⟨x, y⟩/‖y‖²) ⟨y, y⟩ − ⟨x, y⟩
           = 0,
since ⟨y, y⟩ = ‖y‖².

3.4. Orthonormal Basis - Gram-Schmidt Procedure. An orthonormal
basis is a vector space basis which is also orthonormal. Operations are often much
easier when vectors are defined using an orthonormal basis. The Gram-Schmidt
procedure can be used to transform a general basis into an orthonormal basis. It
does so by building up the orthonormal basis one vector at a time.
Suppose we had a basis of two vectors {e1, e2}. We can make an orthonormal
basis as follows: Set the first basis vector to point in the same direction as e1, but
with unit length:
q1 = e1 / ‖e1‖.
We need to pick a second vector which is orthogonal to q1, but spans the same
space as {e1, e2}. This can be done by subtracting the part of e2 which points in
the same direction as q1. Let
u2 = e2 − ⟨q1, e2⟩ q1.
Then
⟨q1, u2⟩ = ⟨q1, e2 − ⟨q1, e2⟩ q1⟩
         = ⟨q1, e2⟩ − ⟨q1, ⟨q1, e2⟩ q1⟩
         = ⟨q1, e2⟩ − ⟨q1, e2⟩ ⟨q1, q1⟩
         = ⟨q1, e2⟩ − ⟨q1, e2⟩
         = 0
since ⟨q1, q1⟩ = 1. Thus u2 is orthogonal to q1. We can get an orthonormal set by
letting q2 = u2 / ‖u2‖. Note that
[q1 q2] = [e1 e2] [ 1/‖e1‖   −⟨q1, e2⟩/(‖e1‖ ‖u2‖) ]
                  [   0              1/‖u2‖        ]
which is clearly an invertible change of basis.
The general procedure is as follows. Let {e1, · · · , en} be a basis. Let
u1 = e1,                                q1 = u1/‖u1‖
u2 = e2 − ⟨q1, e2⟩ q1,                  q2 = u2/‖u2‖
⋮
un = en − Σ_{k=1}^{n−1} ⟨qk, en⟩ qk,    qn = un/‖un‖
The orthonormal basis given by {q1, · · · , qn} spans the same space as {e1, · · · , en}.
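The general procedure maps directly to code. The sketch below (not from the original notes, assuming Python with numpy) is a classical Gram-Schmidt applied to the columns of a matrix whose columns form a basis; the example vectors are arbitrary.

import numpy as np

def gram_schmidt(E):
    # Columns of E are assumed linearly independent; returns Q whose
    # orthonormal columns span the same space, built one vector at a time.
    E = np.asarray(E, dtype=float)
    Q = np.zeros_like(E)
    for k in range(E.shape[1]):
        u = E[:, k].copy()
        for j in range(k):
            u -= np.dot(Q[:, j], E[:, k]) * Q[:, j]   # subtract <q_j, e_k> q_j
        Q[:, k] = u / np.linalg.norm(u)
    return Q

E = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q = gram_schmidt(E)
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: the columns of Q are orthonormal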

Figure 2. Illustration of the projection theorem (showing y, Ax, the error
y − Ax, and the columns a1, a2 of A).

4. Projection Theorem
The close connection between inner product and the 2-norm comes into play in
vector minimization problems that involve the 2-norm. Suppose we have a collection
of n vectors in Rm as columns of a matrix A ∈ Rm×n , and a vector y ∈ Rm . We
would like to find the weights x ∈ Rn so that Ax is a vector that is as close as
possible to y. That is, we have the following problem:
min_{x∈Rn} ‖Ax − y‖   (4.1)
Useful facts:
(1) There is always a solution to this minimization problem (i.e a minimizer
x exists).
(2) A solution can be found using inner products.
We will take the first fact as given. However, let’s try to establish the second.
As an example, suppose A has two columns, a1 and a2 . These vectors span a two
dimensional space, indicated by the plane in Figure 2. By selecting different values
for x, one can choose different vectors Ax that will all lie in that plane. If the
vector y does not lie in the plane, then an exact match cannot occur. However,
we can find the vector which minimizes the error y − Ax. The figure suggests that
the minimum error occurs when y − Ax is orthogonal to the plane spanned by the
vectors in A. This intuition is formalized in the projection theorem.
Theorem 15. (Projection Theorem) x̄ is a minimizer of (4.1) if and only if
⟨y − Ax̄, Ax⟩ = 0 for all x ∈ Rn.
Proof. (if) Suppose x̄ satisfies ⟨y − Ax̄, Ax⟩ = 0. Let x◦ be another vector in
Rn.
‖Ax◦ − y‖² = ‖A(x̄ + x◦ − x̄) − y‖²
           = ‖Ax̄ − y + A(x◦ − x̄)‖²
           = ‖Ax̄ − y‖² + 2⟨Ax̄ − y, A(x◦ − x̄)⟩ + ‖A(x◦ − x̄)‖².
Now, since x◦ − x̄ is a vector in Rn, we have that ⟨Ax̄ − y, A(x◦ − x̄)⟩ = 0, and
‖Ax◦ − y‖² = ‖Ax̄ − y‖² + ‖A(x◦ − x̄)‖².
Since ‖A(x◦ − x̄)‖ ≥ 0, ‖Ax◦ − y‖ ≥ ‖Ax̄ − y‖. Thus x̄ is a minimizer.
(only if) Now, suppose x̂ does not satisfy ⟨y − Ax̂, Ax⟩ = 0 for some xd ∈ Rn.
Without loss of generality, we may take ‖Axd‖ = 1 and ⟨y − Ax̂, Axd⟩ = δ. Then
‖A(x̂ + δxd) − y‖² = ‖Ax̂ − y‖² + 2δ⟨Ax̂ − y, Axd⟩ + δ²‖Axd‖²
                  = ‖Ax̂ − y‖² − 2δ² + δ²
                  = ‖Ax̂ − y‖² − δ²,
so that x̂ + δxd has a smaller error than x̂, and x̂ is not a minimizer. □
The projection theorem gives us a method for characterizing solutions using
linear equations.
Corollary 16. Minimizers of ‖Ax − y‖ satisfy
A^T y − A^T Ax = 0.
Proof. Let x̂ satisfy A^T y = A^T Ax̂. Then ⟨y − Ax̂, Ax⟩ = (y^T A − x̂^T A^T A) x =
0 for every x, which satisfies the condition of the projection theorem. □
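In practice the normal equations of Corollary 16 are solved numerically. The sketch below (not from the original notes, assuming Python with numpy) solves A^T A x = A^T y for a random overdetermined problem, compares with the library least-squares routine, and checks the orthogonality condition of the projection theorem.

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))      # columns span a 3-dimensional subspace of R^6
y = rng.standard_normal(6)

# Normal equations: A^T A x = A^T y
x_ne = np.linalg.solve(A.T @ A, A.T @ y)

# Library least-squares solver for comparison
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(x_ne, x_ls))       # True

# The residual y - A x is orthogonal to every column of A, i.e. A^T (y - A x_ne) = 0
print(np.allclose(A.T @ (y - A @ x_ne), np.zeros(3)))   # True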

5. Orthogonal Subspaces
Suppose we have finite dimensional vector space on which an inner product has
been defined, and a set of linearly independent vectors from that vector space. The
linearly independent vectors may not be a basis for the entire vector space. In that
case, they are a basis for a subspace, and we can decompose the vector space into
this subspace, and the set of vectors that are orthogonal to all the vectors in this
subspace.
Let {x1, · · · , xp} be a set of linearly independent vectors in Rn, p < n. Let
X = [x1 · · · xp] be a matrix whose columns are the set of vectors, and let X ⊂
Rn be the subspace for which these vectors are a basis, that is
X = {x : ∃α ∈ Rp such that x = Xα} .
For every vector v in Rn , by the projection theorem, there exists an α, and thus a
vector x ∈ X such that the distance from v to X is minimized. The α is given by
solution to
X T v = X T Xα.
Since we assumed that the xi are linearly independent, X T X is invertible, thus
α = (X T X)−1 X T v
and the corresponding vector x ∈ X is
x = Xα
= X(X T X)−1 X T v.
We define
ΠX := X(X T X)−1 X T . (5.1)
Definition 17. The operation ΠX v is the projection of v on X , (or equiva-
lently, the columns of the matrix X.)

Let X⊥ be the subspace of vectors orthogonal to X, that is
X⊥ = {x : ⟨x, Xα⟩ = 0 ∀α ∈ Rp}.
We define
Π̄X = I − X(X^T X)⁻¹X^T.   (5.2)
Note that Π̄X ΠX = 0.
Theorem 18. The operation Π̄X v is the projection of v on X⊥, that is, it
achieves the minimum of
min_{x∈X⊥} ‖x − v‖
Proof. First, we verify that Π̄X v ∈ X⊥. Note that ⟨Π̄X v, Xα⟩ = v^T Π̄X^T Xα =
v^T (Xα − X(X^T X)⁻¹X^T Xα) = v^T (Xα − Xα) = 0.
Next, we show that ⟨v − Π̄X v, Π̄X v̄⟩ = 0 for all v, v̄ ∈ Rn; since every element
of X⊥ has the form Π̄X v̄, this is the orthogonality condition of the projection
theorem. Note that v − Π̄X v = (I − Π̄X)v = ΠX v. But since ΠX Π̄X = 0,
⟨ΠX v, Π̄X v̄⟩ = v^T ΠX^T Π̄X v̄ = v^T ΠX Π̄X v̄ = 0. □
Every vector can be decomposed into the sum of a unique vector in X and a
unique vector in X ⊥ . This follows from Π̄X + ΠX = I.
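The two projection matrices are straightforward to form and check numerically. A small sketch (not from the original notes, assuming Python with numpy):

import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 2))          # two (almost surely) independent columns in R^5

Pi = X @ np.linalg.inv(X.T @ X) @ X.T    # projection onto the column space of X, (5.1)
Pi_bar = np.eye(5) - Pi                  # projection onto the orthogonal complement, (5.2)

v = rng.standard_normal(5)
print(np.allclose(Pi @ v + Pi_bar @ v, v))            # v decomposes as Pi v + Pi_bar v
print(np.allclose(Pi_bar @ Pi, np.zeros((5, 5))))     # Pi_bar Pi = 0
print(np.allclose(X.T @ (Pi_bar @ v), np.zeros(2)))   # Pi_bar v is orthogonal to the columns of X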

6. Exercises
Section 1: Vector Space
(1) Show that X =C[0, 1], the set of all continuous functions defined on the
real line between 0 and 1, is a vector space with F = R, if addition and
multiplication are defined as x1 = f (t), x2 = g(t), x1 + x2 = f (t) + g(t),
ax1 = af (t).
(2) Show that X =Cn , the set of all n−tuples of complex numbers is a vector
space with F = C, the field of complex numbers
(3) Show that X = Rn×m, the set of n × m matrices, is a vector space with
F = R.
(4) Show that X =Rn , F = C is not a vector space.
(5) Show that X ={x(t) : ẍ + ẋ + x = 0} is a vector space. with F = R.
Section 2: Linear Independence and Basis
(1) Find the dimension of the vector space given by all (real) linear combina-
tions of
x1 = [1 2 3]^T,  x2 = [3 2 2]^T,  x3 = [4 4 5]^T.
That is,
X = {x : x = α1 x1 + α2 x2 + α3 x3 , αi ∈ R}
This is called the vector space spanned by {x1 , x2 , x3 } .
(2) Show that the space of all solutions to the differential equation
ẍ + 3ẋ + x = 0 t ≥ 0
is a 2 dimensional vector space.
(3) Given
    [ 2 1 0 0 ]        [ 0 ]         [ 1 ]
A = [ 0 2 1 0 ]    b = [ 0 ]    b̄ =  [ 1 ]
    [ 0 0 2 0 ]        [ 1 ]         [ 1 ]
    [ 0 0 0 1 ]        [ 1 ]         [ 1 ]
find the representations for the columns of A with respect to the basis
{b, Ab, A²b, A³b} and the basis {b̄, Ab̄, A²b̄, A³b̄}.
Section 3: Norms, Dot Products and Orthonormal Basis
(1) Verify that the triangle inequality holds for the 1, 2 and ∞ vector norms.
(2) Find an orthogonal basis for the following vectors:
x1 = [1 2 1 0]^T,  x2 = [1 1 1 0]^T,  x3 = [1 1 1 1]^T
(3) Show that
√(tr(A^T A))
is a norm on the vector space of real matrices, Rm×n.


Section 4: Projection Theorem
(1) Find all solutions to
min_{x∈Rn} ‖Ax − y‖
when
    [ 1 2 3 ]        [ 1 ]
A = [ 4 5 6 ]    y = [ 2 ]
    [ 7 8 9 ]        [ 3 ]
CHAPTER 3

Matrices as Mappings

So far, we have emphasized the view of matrices as collections of vectors. We


can also view them as representing linear mappings between vector spaces.
Consider a linear map LA : X → Y where X and Y are finite dimensional vector
spaces with field R. Suppose the vectors e1, e2, · · · , em form a basis for X, and
e1′, e2′, · · · , en′ form a basis for Y. For i = 1, · · · , m let ai ∈ Rn be the representation
of LA(ei) in the coordinates ei′. Then the mapping LA can be represented by the
matrix A ∈ Rn×m,
A = [a1 a2 · · · am]
in the sense that if x ∈ Rm is the representation of x in the basis ei, then y = Ax
is the representation of y in the basis ei′. Since, once a basis is chosen, we can
always talk about elements of a finite dimensional vector space (with field R) in
terms of the representation as a vector in Rm, all further statements will be in
terms of the representation.

1. Range and Null Space


Some aspects of the mapping can be characterized in terms of range space and
null space.
Definition 19. The range space (or just range) of A ∈ Rm×n is the subset of
Rm to which vectors are mapped by A, that is
R(A) = {y : y = Ax, x ∈ Rn}.
The dimension of the range space is called the rank of a matrix. Clearly, we
always have for A ∈ Rm×n rank(A) ≤ min(m, n). If we have two matrices A ∈ Rm×n
and B ∈ Rn×p , a composite mapping AB ∈ Rm×p can be formed. Note that this
requires that B maps vectors to a space with dimension equal to the dimension of
the domain of A. In other words, the number of columns of A must be the same as
the number of rows of B (in this case n). The rank of the composite mapping satisfies the
following properties.
Theorem 20. [4] (rank(A)+rank(B))−n ≤ rank(AB) ≤ min(rank(A), rank(B))
In addition, we have the following results concerning rank:
Theorem 21. [4] The following hold for A ∈ Rm×n , B ∈ Rm×m , C ∈ Rn×n
(1) rank(A) = rank(AT )
(2) The rank of A is given by the maximal number of linearly independent
columns (or rows)
(3) rank(AT A) = rank(A)
(4) If B and C are invertible, then rank(A) = rank(BA) = rank(AC) =
rank(BAC)

In matrix notation, we can say that the column vectors of A are linearly inde-
pendent if and only if Ax = 0 implies x = 0. If a matrix does not consist of linearly
independent column vectors, then there exist nonzero vectors x such that Ax = 0. The
set of all x with Ax = 0 is a subspace of Rn, and is called the null space.
Definition 22. The null space of A is the subset of Rn which is mapped to
the zero vector, that is
N (A) = {x : 0 = Ax, x ∈ Rn } .

2. Eigenvalues and Eigenvectors


If A is a square matrix, i.e. A ∈ Rn×n , then A maps vectors back to the same
space. These matrices can be characterized by their eigenvalues and eigenvectors
Definition 23. Given matrix A ∈ Rn×n , (or Cn×n ) if there exists scalar λ ∈ C
and vector x ∈ Cn not equal to zero such that
Ax = λx
then λ is an eigenvalue of A, and x is an eigenvector of A.
To find eigenvalues and eigenvectors, we can use the concept of rank. If λ is an
eigenvalue of A, then there exists x such that
Ax − λx = 0
or
(A − λI)x = 0.
From above, an x ≠ 0 only exists if the rank of (A − λI) is less than n. Recall
that the rank of a square matrix is less than n if and only if the determinant of the
matrix is zero. This implies that the eigenvalues of A satisfy
det(A − λI) = 0.
The function det(A − λI) is always a polynomial of degree n, and is called the
characteristic polynomial of A.
Note that an eigenvector does not have a unique length, for if x satisfies (A −
λI)x = 0, so does αx for any α. When we speak of “the” eigenvector, we are actually
talking about a unique direction, which can be made into a unique vector through
normalization, for example, by normalizing all eigenvectors to have unit length.
Example 11. Let
A = [ 1  4 ]
    [ 1  1 ].
Then
A − λI = [ 1−λ   4  ]
         [  1   1−λ ]
and
det(A − λI) = (1 − λ)(1 − λ) − 4
            = λ² − 2λ − 3
            = (λ − 3)(λ + 1).
Thus the eigenvalues are 3 and −1.
For λ = −1, we find the corresponding eigenvector via
(A + I)x = 0
[ 2  4 ] [ x1 ]
[ 1  2 ] [ x2 ]  =  0
which has solution x = [−2 1]^T. The unique eigenvector with unit length is
x = [−2/√5  1/√5]^T.
For λ = 3, we find the corresponding eigenvector via
(A − 3I)x = 0
[ −2  4 ] [ x1 ]
[  1 −2 ] [ x2 ]  =  0
which has solution x = [2 1]^T. The unique eigenvector with unit length is
x = [2/√5  1/√5]^T. □
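The same eigenvalue/eigenvector computation can be done with a library call. A small sketch (not from the original notes, assuming Python with numpy); note that numpy may list the eigenvalues in a different order and may flip the sign of an eigenvector, since eigenvectors are only determined up to scaling.

import numpy as np

A = np.array([[1.0, 4.0],
              [1.0, 1.0]])

lam, V = np.linalg.eig(A)
print(lam)     # eigenvalues 3 and -1 (in some order)
print(V)       # columns are unit-length eigenvectors, proportional to [2, 1] and [-2, 1]

# Check the defining relation A v = lambda v for each pair
for k in range(2):
    print(np.allclose(A @ V[:, k], lam[k] * V[:, k]))   # True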

Often, an n × n matrix will have n unique eigenvectors, however, if eigenvalues


are repeated, this might not be the case.
Example 12. Consider the matrix
A = [ 1  1 ]
    [ 0  1 ]
The characteristic polynomial is (λ − 1)², thus this matrix has a repeated eigenvalue
at 1. However,
(A − I) = [ 0  1 ]
          [ 0  0 ]
has a null space of dimension 1, thus only one eigenvector exists, given by x =
[1 0]^T. □
Example 13. Consider the matrix
A = [ 1  0 ]
    [ 0  1 ]
The characteristic polynomial is also (λ − 1)², but in this case, any vector x is an
eigenvector, and we can select two that are linearly independent. □
In fact, since the set of vectors associated with a particular eigenvalue is always
a subspace (defined by the null space of (A − λI)), if multiple eigenvectors exist for
a single eigenvalue, it is always possible to choose them so that they are orthogonal.
Definition 24. A vector v is called a generalized eigenvector of grade n if
(A − λI)^n v = 0
and
(A − λI)^{n−1} v ≠ 0
Example 14. We saw that the matrix
A = [ 1  1 ]
    [ 0  1 ]
had only one eigenvector. However,
(A − I)² = [ 0  0 ]
           [ 0  0 ]
and note that x = [0 1]^T satisfies (A − I)²x = [0 0]^T and (A − I)x = [1 0]^T.
Thus x is a generalized eigenvector of grade 2. □
Theorem 25. For any n × n matrix, it is possible to select a linearly indepen-
dent set of size n consisting of eigenvectors and generalized eigenvectors.

3. Symmetric Matrices
If A ∈ Rn×n is symmetric, i.e. A = A^T, then the eigenvalues and eigenvectors
satisfy some special properties. (Similar results hold for complex matrices which
are Hermitian, or conjugate symmetric, but we will only consider the case of real
matrices).
3.1. Eigenvalues and Eigenvectors. Recall that even if x is a complex
vector, x∗ x is a real scalar. It is also easy to verify that if A is symmetric, x∗ Ax is
also a real scalar.
Theorem 26. The eigenvalues of a symmetric matrix are real
Proof. Let A be a symmetric matrix and let λ, x be an eigenvalue-eigenvector
pair for A (possibly complex). Then
Ax = λx
Multiply both sides of the equation on the left by x*:
x*Ax = λ x*x.
Since both x*x and x*Ax are real, and x*x ≠ 0 because x ≠ 0, λ = (x*Ax)/(x*x) is real. □
Theorem 27. A symmetric matrix A ∈ Rn×n has no generalized eigenvectors.
Proof. If there exists x, a generalized eigenvector of grade 2, then
(A − λI)²x = 0
(A − λI)x ≠ 0
must be satisfied. Let y = (A − λI)x. Since y ≠ 0, y^T y ≠ 0. Also
y^T y = x^T (A − λI)^T (A − λI)x
      = x^T (A^T − λI^T)(A − λI)x
      = x^T (A − λI)(A − λI)x
      = x^T (A − λI)²x,
but we have (A − λI)²x = 0, implying that y^T y = 0, a contradiction. Since there are
no generalized eigenvectors of grade 2, there cannot be any generalized eigenvectors
of higher grade. □
Corollary 28. For every symmetric matrix, there exists a set of n linearly
independent eigenvectors
Theorem 29. For a symmetric matrix, there exists a set of n orthogonal eigen-
vectors.

Proof. We have already shown that eigenvectors associated with the same
eigenvalue can be chosen to be orthogonal. It remains to show that any two eigen-
vectors associated with different eigenvalues are orthogonal. Let A be a symmetric
matrix with eigenvalues λ1 ≠ λ2 and eigenvectors v1, v2. Then
Av1 = λ1 v1 ,
Av2 = λ2 v2 .
By multiplying the first equation by v2T and the second by v1T ,
v2T Av1 = λ1 v2T v1 ,
v1T Av2 = λ2 v1T v2 .
Since A is symmetric, v2^T Av1 = v1^T Av2, thus λ1 v2^T v1 = λ2 v1^T v2. But since v2^T v1 is a
scalar, v2^T v1 = v1^T v2. Since λ1 ≠ λ2, we must have v2^T v1 = 0. □

Corollary 30. A symmetric matrix A ∈ R^{n×n} has an eigenvector/eigenvalue decomposition
A = U Λ U^T,
where Λ is a diagonal matrix with the eigenvalues of A along the diagonal, and U is an orthonormal matrix containing the eigenvectors of A normalized to unit length.

In what follows, when considering an eigenvector/eigenvalue decomposition, we will assume that the eigenvalues are arranged along Λ in descending order.
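As a short illustration of Corollary 30 (a sketch only; the symmetric matrix below is an arbitrary example, and MATLAB's eig happens to return the eigenvalues in ascending rather than descending order):

% Eigenvector/eigenvalue decomposition of a symmetric matrix (Corollary 30)
A = [4 1 0; 1 3 1; 0 1 2];
[U, Lambda] = eig(A);         % for symmetric A, U is orthonormal and Lambda is real
disp(norm(U'*U - eye(3)))     % ~0: the eigenvectors are orthonormal
disp(norm(A - U*Lambda*U'))   % ~0: A is recovered as U*Lambda*U'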

3.2. Quadratic Forms. Given a symmetric matrix A ∈ R^{n×n} and a vector x ∈ R^n, the scalar function V(x) = x^T A x is called a quadratic form. If x^T A x > 0 for all x ≠ 0, this quadratic form (and the matrix A) is called positive definite. From the eigenvector/eigenvalue decomposition of A, we see that x^T A x = x^T U Λ U^T x. If we let y = U^T x, then x^T A x = y^T Λ y = ∑_{i=1}^n λi ([y]i)^2, and this will be positive if y ≠ 0 and λi > 0 for all i. Since U^T is invertible (so y ≠ 0 whenever x ≠ 0), we have the result that a quadratic form is positive definite if and only if its associated matrix has all positive eigenvalues. When a quadratic form is positive definite, the level sets x^T A x = c define (hyper-)ellipses.
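The eigenvalue test for positive definiteness is easy to apply numerically; the following sketch (with an arbitrary example matrix) checks it two ways:

% Positive definiteness via eigenvalues (Section 3.2)
A = [2 -1; -1 2];
disp(all(eig(A) > 0))   % true: all eigenvalues are positive, so x'*A*x > 0 for x ~= 0
[~, p] = chol(A);       % a Cholesky factorization exists only for positive definite A
disp(p == 0)            % true: p = 0 signals success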

4. Matrix Norms

4.1. Matrix as a Vector. A matrix A can be viewed as a member of the vector space R^{m×n}. In particular, if we stack the columns of A into a single column, we obtain a vector of size mn. The 2-norm applied to this stacked vector, which is the square root of the sum of the squares of the elements, is given the special name of Frobenius norm and is notated as
‖A‖_F := √( ∑_{i,j} [a]ij^2 ).

The following is a useful property of the Frobenius norm.
Lemma 31. If U, V are orthonormal matrices of compatible size (satisfying
U T U = I and V V T = I), then kU AkF = kAV kF = kU AV kF = kAkF . That is,
the Frobenius norm is invariant over multiplication by orthonormal matrices.
Proof. Note that
‖A‖_F = √(tr A A^T) = √(tr A^T A).
Then
‖U A‖_F = √(tr A^T U^T U A) = √(tr A^T A) = ‖A‖_F
and
‖A V‖_F = √(tr A V V^T A^T) = √(tr A A^T) = ‖A‖_F,
while ‖U A V‖_F = ‖A‖_F follows from application of both results. 
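A quick numerical spot check of Lemma 31 (a sketch; the sizes are arbitrary, and orth is used only to generate orthonormal factors):

% Invariance of the Frobenius norm under orthonormal multiplication (Lemma 31)
A = randn(4, 3);
U = orth(randn(4, 4));   % a 4x4 orthonormal matrix
V = orth(randn(3, 3));   % a 3x3 orthonormal matrix
disp([norm(A, 'fro'), norm(U*A, 'fro'), norm(A*V, 'fro'), norm(U*A*V, 'fro')])
% all four values agree up to roundoff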

4.2. Matrix as a Mapping. If we view A as a linear mapping between finite dimensional vector spaces, another type of norm can be defined by measuring the maximum amplification possible as vectors are mapped through A. If y = Ax, and we assign vector norms ‖·‖_x and ‖·‖_y, the induced norm of A is defined to be
‖A‖_{ixy} = max_{x ≠ 0} ‖Ax‖_y / ‖x‖_x = max_{‖x‖_x = 1} ‖Ax‖_y.

Example 15. Let A ∈ R^{n×m}, and take ‖·‖_x = ‖·‖_1 and ‖·‖_y = ‖·‖_1. We will show that
‖A‖_{i1} = max_j ∑_{i=1}^n |[a]ij|   (the maximum column sum).
If y = Ax, then
‖y‖_1 = ∑_{i=1}^n |[y]i| = ∑_{i=1}^n | ∑_{j=1}^m [A]ij [x]j |.
If x = ej (the unit vector with a one in the jth element and zeros elsewhere), then ‖x‖_1 = 1 and ‖y‖_1 = ∑_{i=1}^n |[A]ij|. Thus
‖A‖_{i1} ≥ ∑_{i=1}^n |[A]ij|,   j = 1, · · · , m,
and therefore
‖A‖_{i1} ≥ max_j ∑_{i=1}^n |[A]ij|.
On the other hand,
‖y‖_1 ≤ ∑_{i=1}^n ∑_{j=1}^m |[A]ij| |[x]j| = ∑_{j=1}^m |[x]j| ∑_{i=1}^n |[A]ij| ≤ ‖x‖_1 ( max_j ∑_{i=1}^n |[A]ij| ).
Thus ‖A‖_{i1} ≤ max_j ∑_{i=1}^n |[A]ij|. Taken with the previous inequality, this implies ‖A‖_{i1} = max_j ∑_{i=1}^n |[A]ij|. 
Example 16. Let A ∈ R^{n×m}, and take ‖·‖_x = ‖·‖_∞ and ‖·‖_y = ‖·‖_∞. Then
‖A‖_{i∞} = max_i ∑_{j=1}^m |[a]ij|   (the maximum row sum).
This is shown in the same way as above. In this case, the maximum gain occurs for x = [±1  ±1  · · ·  ±1]^T, with the signs chosen to match the signs of the entries in the row with the largest absolute sum. 
Example 17. If A is orthonormal, show that ‖A‖_{i2} = 1.
If y = Ax, then ‖y‖_2 = √(x^T A^T A x) = √(x^T x) = ‖x‖_2, so every x is amplified by exactly 1. 
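The induced norm formulas in Examples 15–17 can be spot checked with MATLAB's built-in norm command (an illustrative sketch; the matrix below is arbitrary):

% Induced 1-, infinity-, and 2-norms (Examples 15-17)
A = [1 -2 3; 4 0 -1];
disp(norm(A, 1)   - max(sum(abs(A), 1)))   % ~0: induced 1-norm is the max column sum
disp(norm(A, inf) - max(sum(abs(A), 2)))   % ~0: induced inf-norm is the max row sum
Q = orth(randn(3, 3));                     % an orthonormal matrix
disp(norm(Q, 2))                           % = 1, as in Example 17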
4.3. Matrix norm properties. You can verify that the induced matrix norm
satisfies all of the properties of a norm, including the triangle inequality kA + Bki ≤
kAki + kBki . In addition, from the definition, it is clear that kAxkv ≤ kAki kxkv
and kABki ≤ kAki kBki where “v” indicates a vector norm.

5. Singular Value Decomposition

When the 2-norm is used for the vector norm, the induced norm can be calculated using the singular value decomposition. In addition, this decomposition gives a basis for both the range and null spaces.
Theorem 32. Singular Value Decomposition (SVD) Given real matrix A ∈
Rm×n , there exist orthonormal matrices U ∈ Rm×m and V ∈ Rn×n such that
U T AV = Σ
where Σ ∈ Rm×n is a diagonal matrix with non-negative elements σi along the
diagonal and kAki2 = maxi σi
Proof. Let σ1 = ‖A‖_{i2}. Since R^n is finite dimensional, there exists x1 ∈ R^n with ‖x1‖_2 = 1 such that ‖Ax1‖_2 = σ1. Let y1 = Ax1/σ1, and note that ‖y1‖_2 = 1. Select Ṽ1 and Ũ1 such that
V1 = [ x1  Ṽ1 ]
U1 = [ y1  Ũ1 ]
are orthonormal. Then
U1^T A V1 = [ y1^T ; Ũ1^T ] A [ x1  Ṽ1 ]
          = [ y1^T ; Ũ1^T ] [ σ1 y1   A Ṽ1 ]
          = [ σ1 y1^T y1    y1^T A Ṽ1 ]
            [ σ1 Ũ1^T y1    Ũ1^T A Ṽ1 ]
          = [ σ1   w^T ]
            [  0     B ]  ≡  A1,
since y1^T y1 = 1 and Ũ1^T y1 = 0, where w^T = y1^T A Ṽ1 and B = Ũ1^T A Ṽ1. Now,
A1 [ σ1 ; w ] = [ σ1^2 + w^T w ; B w ],
so that
‖ A1 [ σ1 ; w ] ‖_2 ≥ σ1^2 + w^T w.
Thus
‖A1‖_{i2} ≥ ‖ A1 [ σ1 ; w ] ‖_2 / ‖ [ σ1 ; w ] ‖_2 ≥ (σ1^2 + w^T w) / √(σ1^2 + w^T w) = √(σ1^2 + w^T w).
However, ‖A1‖_{i2} = ‖U1^T A V1‖_{i2} ≤ ‖U1^T‖_{i2} ‖A‖_{i2} ‖V1‖_{i2} = σ1, since U1 and V1 are orthonormal. Hence
√(σ1^2 + w^T w) ≤ ‖A1‖_{i2} ≤ σ1,
and we must have w^T w = 0, ‖A1‖_{i2} = σ1, and
U1^T A V1 = [ σ1  0 ]
            [  0  B ].
Now, the same reasoning can be applied to B: orthonormal U2, V2 exist so that
U2^T B V2 = [ σ2  0 ]
            [  0  C ],
and thus
( [1 0; 0 U2^T] U1^T ) A ( V1 [1 0; 0 V2] ) = [ σ1   0   0 ]
                                              [  0  σ2   0 ]
                                              [  0   0   C ],
where both matrices in parentheses are again orthonormal. Continuing in this way, the result follows. 
The terms σ1, σ2, · · · , σ_{min(m,n)} are the singular values of A. In the above constructive proof, these terms form a non-increasing sequence of non-negative real numbers. However, the orthonormal matrices U and V are in general not unique (for example, corresponding columns of U and V may change sign together, and the choice is even freer when a singular value is repeated), thus it is not possible, strictly speaking, to speak of “the” singular value decomposition of a matrix. In most applications, however, this non-uniqueness will not be an issue, and we will consider “the singular value decomposition of A” to be a singular value decomposition with non-increasing singular values along the diagonal of Σ, for any of the possible choices of U and V.
The singular values can be used to find the Frobenius norm of a matrix.

Theorem 33. ‖A‖_F = √( ∑_{i=1}^{min(m,n)} σi^2 ), where σi are the singular values of A.
Proof. The Frobenius norm is invariant to multiplication by orthonormal matrices. Thus
‖A‖_F = ‖U Σ V^T‖_F = ‖Σ‖_F = √( ∑_{i=1}^{min(m,n)} σi^2 ),
since the singular values are the only non-zero elements of Σ. 
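Numerically, the svd command ties Theorems 32 and 33 together (a sketch; the example matrix is arbitrary):

% Singular values, induced 2-norm, and Frobenius norm (Theorems 32 and 33)
A = [3 1 0; 1 2 4];
s = svd(A);                                % singular values in non-increasing order
disp([norm(A, 2),     max(s)])             % induced 2-norm = largest singular value
disp([norm(A, 'fro'), sqrt(sum(s.^2))])    % Frobenius norm = sqrt of sum of squares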


By examining the singular value decomposition, the range space, null space,
and rank of a matrix are easy to find. Since the inverse of orthonormal matrices is
the same as the transpose, we can decompose any matrix A as
A = U ΣV T .
Let Σ̄ ∈ Rp×p denote a square diagonal matrix that contains the singular values
that are non-zero. Clearly p ≤ min(m, n). Then by decomposing U and V in a
compatible manner, we obtain either
A = [ Ū  Ũ ] [ Σ̄  0 ] [ V̄^T ]
             [ 0   0 ] [ Ṽ^T ]
if p < min(m, n), or
A = Ū [ Σ̄  0 ] [ V̄^T ]
               [ Ṽ^T ]
(with Ū = U) if p = n, or
A = [ Ū  Ũ ] [ Σ̄ ] V̄^T
             [ 0 ]
(with V̄ = V) if p = m.

In all cases, we can write
A = Ū Σ̄ V̄^T                (5.1)
where Ū ∈ R^{n×p} and V̄ ∈ R^{m×p}.
Theorem 34. The following is true for (5.1).
(1) The rank of A is p.
(2) The range space of A is spanned by Ū , (and orthogonal to Ũ .)
(3) The null space of A is orthogonal to V̄ , (and spanned by Ṽ .)
(4) The projection of a vector in Rn onto the range space of A is ΠA := Ū Ū T .
(5) The projection of a vector in Rm onto the null space of A is ΠN (A) = Ṽ Ṽ T .
Proof. Clearly the range space of A is contained in the span of the columns
of Ū . By choosing x as the ith column of V̄ , since V̄ contains orthonormal columns,
we obtain Ax = ui σi thus the range space of A contains the columns of Ū . Ū is
orthogonal to Ũ as U is orthonormal. Since Ū contains p columns, the dimension
of the range space (and rank of A) is p.
Now, x is in the null space of A if 0 = Ax. We have Ax = Ū Σ̄V̄ T x, and Ax = 0
if and only if V̄ T x = 0. Since V is orthonormal, this is equivalent to x = Ṽ α for
some α. Thus Ṽ spans the null space of A.
The projection formulas follow from (2.5.1), by noting Ū^T Ū = I and Ṽ^T Ṽ = I. 
The singular values can also be viewed as eigenvalues of AAT or AT A.
Theorem 35. The squares of the singular values of A are equal (possibly modulo some extra eigenvalues of zero) to
(1) The eigenvalues of AAT .
(2) The eigenvalues of AT A.
In addition
(1) The columns of U are the eigenvectors of AAT .
(2) The columns of V are the eigenvectors of AT A.
Proof. Using the singular value decomposition,
A A^T = (U Σ V^T)(U Σ V^T)^T = U Σ V^T V Σ^T U^T = U Σ^2 U^T,
where Σ^2 = Σ Σ^T is an n × n diagonal matrix with the squared singular values along the diagonal, plus extra zeros if n > m. Thus
A A^T U = U Σ^2,
so that column i of U is an eigenvector of A A^T with eigenvalue σi^2 for i ≤ min(m, n), and eigenvalue 0 for i > min(m, n). The proof for A^T A is similar. 
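This relationship is easy to check numerically (a sketch; the 2 × 3 matrix is an arbitrary example):

% Squared singular values as eigenvalues of A'*A (Theorem 35)
A = [3 1 0; 1 2 4];
disp(sort(svd(A).^2, 'descend'))   % the squared singular values
disp(sort(eig(A'*A), 'descend'))   % the same nonzero values, plus a (numerically) zero eigenvalue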
5.0.1. MATLAB. MATLAB has commands to find the range space, null space
and rank of a matrix that use the SVD as their basis. Consider the matrix
A = [ 3   4  3 ]
    [ 6   8  6 ]
    [ 9  12  9 ].
The command orth finds a basis for the range space of A


>> orth(A)

ans =

-0.2673
-0.5345
-0.8018

The command null finds a basis for the null space of A


>> null(A)

ans =
0.7107 0.4798
-0.0054 -0.7276
-0.7035 0.4903

The command rank finds the rank of A


>> rank(A)

ans =
1

6. SVD Application: Solutions to Systems of Linear Equations


Given A ∈ Rn×m , y ∈ Rn , consider the linear equation
Ax = y. (6.1)
In this section, we will
(1) Determine if there is a solution for a particular y.
(2) Determine if there is a solution for every y.
(3) Determine if the solution is unique.
(4) If the solution exists, find it.

6.1. Existence and uniqueness of a solution. The first three questions are answered in the following theorem, stated without proof.

Theorem 36. Consider equation (6.1) with A given.
(1) For a fixed y, a solution exists if and only if y lies in the range space of A, or equivalently rank A = rank [A  y].
(2) A solution exists for every y if and only if rank A = n.
(3) The solution is unique if and only if rank A = m.

It remains to find solutions, when they exist.


6.1.1. Finding the solution, case 1: Unique solution exists. When a unique solution exists, A has full column rank. However, A may not be square, and thus not invertible. If we multiply both sides of (6.1) on the left by A^T, we obtain
A^T y = A^T A x.
Note that A^T A is an m × m matrix. Does it have rank m, and thus an inverse? If A has full column rank, its null space is trivial, so that Ax = 0 only if x = 0. This means that ‖Ax‖^2 = x^T A^T A x = 0 only if x = 0, and thus A^T A x = 0 only if x = 0, so A^T A does indeed have full rank m and is invertible. Thus the solution is given by
x = (A^T A)^{-1} A^T y.
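As an illustration (an arbitrary full-column-rank example; in practice MATLAB's backslash operator is preferred over forming A^T A explicitly, for numerical reasons):

% Solving Ax = y when A has full column rank (case 1)
A = [1 0; 1 1; 1 2];          % 3x2 matrix with full column rank
x_true = [2; -1];
y = A*x_true;                 % y lies in the range space of A
x1 = (A'*A)\(A'*y);           % the formula above
x2 = A\y;                     % MATLAB's least-squares solve gives the same answer
disp([x1, x2])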
6.1.2. Finding the solution, case 2: Unique solution does not exist. When a unique solution does not exist, A no longer has full column rank, and thus has a nontrivial null space. First, we note that all solutions are given by
x = x0 + N α,
where x0 is one solution that satisfies y = A x0, N is a matrix whose columns span the null space of A, and α is arbitrary. The complete set of solutions is easily found using the singular value decomposition. Suppose
A = [ Ū  Ũ ] [ Σ̄  0 ] [ V̄^T ]
             [ 0   0 ] [ Ṽ^T ].
Theorem 37. If at least one solution to (6.1) exists, they are all given by
V̄ Σ̄−1 Ū T y + Ṽ α
where α is an arbitrary vector of compatible dimension.
Proof. We have already seen that N = Ṽ. Taking x0 = V̄ Σ̄^{-1} Ū^T y and using A = Ū Σ̄ V̄^T,
A x0 = Ū Σ̄ V̄^T V̄ Σ̄^{-1} Ū^T y.
Note that V̄ T V̄ is a collection of inner products of orthonormal vectors, and thus
V̄ T V̄ = I, giving
Ax0 = Ū Ū T y.
Since Ū spans the range space of A, and y must be in the range space for a solution
to exist, we must have Ū Ū T y = y, and the result follows. 

The matrix A^+ = V̄ Σ̄^{-1} Ū^T is called the pseudo-inverse of A. When an exact solution does not exist, x = A^+ y minimizes ‖Ax − y‖_2 and, among all such minimizers, has the smallest 2-norm.
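A short sketch of both roles of the pseudo-inverse (the rank-deficient matrix and right-hand side are arbitrary examples; MATLAB's pinv computes A^+ from the SVD):

% The pseudo-inverse and the full solution set (Theorem 37)
A = [1 2; 2 4; 3 6];             % rank 1, so the null space is nontrivial
y = [1; 2; 3];                   % lies in the range space of A
x0 = pinv(A)*y;                  % one particular solution, A^+ y
N  = null(A);                    % columns span the null space of A
disp(norm(A*(x0 + 0.7*N) - y))   % ~0: x0 + N*alpha solves the equation for any alpha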

7. SVD Application: Best rank k approximation


Suppose we wish to find a matrix M ∈ R^{n×m} which is known to have rank k < min(n, m), but we only have available a noisy measurement A = M + E, where E is an unknown perturbation. In general, A will have rank min(n, m). We can use our knowledge of the true rank in order to recover a better estimate. In particular, we can try to find
min_{rank(Â) = k} ‖A − Â‖_F^2                (7.1)
so that Â is the closest rank k matrix to A as measured by the Frobenius norm. The solution can be found using the singular value decomposition. First, a preliminary result.
Lemma 38. Given A, B ∈ R^{m×n},
‖A − B‖_F^2 ≥ ∑_{i=1}^{min(m,n)} (σi^A − σi^B)^2.

Proof. Let UA ΣA VA^T be the singular value decomposition of A and UB ΣB VB^T the singular value decomposition of B. Then, since the Frobenius norm is invariant to multiplication by orthonormal matrices,
‖A − B‖_F^2 = ‖ΣA − UA^T B VA‖_F^2
            = ∑_{i=1}^{min(m,n)} (σi^A)^2 − 2 tr( ΣA UA^T UB ΣB VB^T VA ) + ∑_{i=1}^{min(m,n)} (σi^B)^2.
Now, tr( ΣA UA^T UB ΣB VB^T VA ) = ∑_{i=1}^{min(m,n)} σi^A αii, where αii = ∑_j uij vij σj^B and uij, vij are the dot products of the i, j columns of UA, UB and of VA, VB respectively. Since these are orthonormal matrices, we must have ∑_j uij^2 = 1 and ∑_j vij^2 = 1. With these constraints, it can be shown that the trace term is maximized over all possible choices of uij, vij when
uij = vij = 1 if i = j, and 0 otherwise,
so that αii = σi^B. Thus tr( ΣA UA^T UB ΣB VB^T VA ) ≤ ∑_{i=1}^{min(m,n)} σi^A σi^B, and the result follows. 

Let the SVD of A be U Σ V^T, where it is assumed that the singular values form a non-increasing sequence along the diagonal of Σ. Now, write this SVD as
A = [ Ū  Ũ ] [ Σ̄  0 ] [ V̄^T ]
             [ 0   Σ̃ ] [ Ṽ^T ]
where Σ̄ is chosen to be k × k, and the rest of the SVD is decomposed compatibly.

Theorem 39. The solution to (7.1) is Â0 = Ū Σ̄ V̄^T, and
min_{rank(Â) = k} ‖A − Â‖_F^2 = ∑_{i=k+1}^{min(n,m)} σi^2,
the sum of the squares of the min(n, m) − k smallest singular values of A.


Proof. Since a rank k matrix has only k non-zero singular values, from Lemma 38 we must have
min_{rank(B) = k} ‖A − B‖_F^2 ≥ ∑_{i=k+1}^{min(m,n)} (σi^A)^2.
It is easy to see that Â0 = Ū Σ̄ V̄^T achieves this minimum. 
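A sketch of the construction in Theorem 39 (the rank-1 matrix, noise level, and k = 1 below are arbitrary choices for illustration):

% Best rank-k approximation by truncating the SVD (Theorem 39)
M = [1; 2; 3]*[1 2 1];                     % a rank-1 matrix
A = M + 0.05*randn(3, 3);                  % noisy measurement, generically rank 3
k = 1;
[U, S, V] = svd(A);
Ahat = U(:, 1:k)*S(1:k, 1:k)*V(:, 1:k)';   % keep only the k largest singular values
disp([norm(A - Ahat, 'fro'), norm(diag(S(k+1:end, k+1:end)))])   % equal, by Theorem 39
disp([norm(M - Ahat, 'fro'), norm(M - A, 'fro')])   % compare both estimates to the true M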


8. Exercises
Section 1: Rank
(1) Show that a rank 1 matrix A ∈ Rm×n can be written as

A = xy T
where x and y are vectors of appropriate dimension.
(2) Find the dimension of the range space and null space for the following matrices:
A1 = [ 0 1 0 ]     A2 = [ 4 1 −1 ]     A4 = [ 1  2  3  4 ]
     [ 0 0 0 ]          [ 3 2  0 ]          [ 0 −1 −2  2 ]
     [ 0 0 1 ]          [ 1 1  0 ]          [ 0  0  0  1 ]
Section 2: Eigenvalues and Eigenvectors
(1) Let λi for i = 1, · · · , n be the eigenvalues for an n × n matrix A. Show
that
det A = ∏_{i=1}^n λi.

Hint: Consider the constant term of the polynomial det(λI − A).


(2) Prove that a square matrix is full rank if and only if there is no zero
eigenvalue.
(3) Consider the matrix
A = [   0      1      0    ···    0  ]
    [   0      0      1    ···    0  ]
    [   ...    ...    ...         ...]
    [   0      0      0    ···    1  ]
    [ −αn   −αn−1   −αn−2  ···  −α1  ]
(a) Show that the characteristic polynomial of A, det(λI − A), is
det(λI − A) = λ^n + α1 λ^{n−1} + α2 λ^{n−2} + · · · + αn−1 λ + αn.
(b) If λi is an eigenvalue of A (i.e., satisfies det(λi I − A) = 0), show that
x = [ 1  λi  λi^2  · · ·  λi^{n−1} ]^T
is the corresponding eigenvector.


Section 3: Symmetric Matrices
(1) Show that if A ∈ Rn×n is symmetric, and x ∈ Cn , then x∗ Ax is a real
scalar.
(2) Show that for A ∈ R^{n×n} symmetric,
tr A = ∑_{i=1}^n λi.

(In fact, this is true for all square matrices)


Section 4: Matrix Norms
(1) Prove that
‖A‖_{i∞} = max_i ∑_{j=1}^m |[a]ij|   (the maximum row sum).
Section 5: Singular Value Decomposition
(1) Compute the singular values for the following matrices
A1 = [ −1  0  1 ]      A2 = [ −1  2 ]
     [  2 −1  0 ]           [  2  4 ]
(2) If A is symmetric, what is the relationship between the eigenvalues and
singular values of A?
Index

Eigenvalue, 30
  of a symmetric matrix, 32
Eigenvector, 30
  generalized, 31
  of a symmetric matrix, 32

Frobenius norm, 33

Gram-Schmidt, 23

Induced norm, 34
Inner Product, 12

Matrix, 2
  block matrices, 4, 13
  conjugate transpose, 11
  determinant, 12
  identity, 3
  inverse, 13
  multiplication, 2, 3
  norms, 33
  null space, 30, 38
  pseudo-inverse, 40
  range space, 29, 38
  rank, 29, 30, 38
  trace, 11
  transpose, 11

Orthonormal
  basis, 23

Projection, 25

Singular Value Decomposition, 35
  and matrix null space, 38
  and matrix range space, 38
  and matrix rank, 38
  singular values, 37

Vector, 2
Vector Space, 17
  basis, 18
  change of basis, 19
  dimension, 18
  norm, 21
  orthonormal basis, 23
Vectors
  linear combination, 4
  linear independence, 4