Tyrone L. Vincent
Engineering Division, Colorado School of Mines, Golden, CO
E-mail address: tvincent@mines.edu
URL: http://egweb.mines.edu/faculty/tvincent
Copyright © 2006-2011
CHAPTER 1
Review of Vectors and Matrices
1. Notation
In these notes, as in most branches of mathematics, we will often utilize sets
of mathematical objects. For example, there is the set of natural numbers, which
begins 1, 2, 3, · · · . This set is often denoted N, so that 2 is a member of N but π is
not. To specify that an object is a member of a set, we use the notation ∈ for "is
a member of". For example, 2 ∈ N. Some of the sets we will use are
R          the real numbers
C          the complex numbers
R^n        n-dimensional vectors of real numbers
R^{m×n}    m × n real matrices
For these common sets, particular notation will be used to identify members,
namely lower case, such as x, for a scalar or vector, and upper case, such as A, for a
matrix. Bold face will not be used to distinguish scalars from vectors and matrices.
To specify a set, we can also use a bracket notation. For example, to specify E
as the set of all positive even numbers, we can say either
E = {2, 4, 6, 8, · · · }
when the pattern is clear, or use a : symbol, which means “such that”:
E = {x ∈ N : mod(x, 2) = 0} .
This can be read "the set of natural numbers x, such that x is divisible evenly by
2".
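The set-builder notation has a direct computational analogue; a small Python sketch (the bound 20 is an arbitrary cutoff, since the true set E is infinite):

# The first few elements of E = {x in N : mod(x, 2) = 0}
E = {x for x in range(1, 21) if x % 2 == 0}
print(sorted(E))  # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]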
When talking about sets, we will often want to say when a property holds for
every member of the set, or for at least one. In this case, the symbol ∀, meaning
“for all” and ∃, meaning “there exists” are useful. For example, suppose I is the
set of numbers consisting of the IQs of the people in this class. Then
∀x ∈ I x > 110
means that all students in this class have IQ greater than 110 while
∃x ∈ I : x > 110
means that at least one student in the class has IQ greater than 110.
We will also be concerned with functions. You are familiar with the notation
f(x) to denote a function of the variable x. We will often be more specific about what
is being mapped to what. In particular, given a set X and a set Y, a function f from
X to Y maps an element of X to an element of Y, and is denoted f : X → Y.
The set X is called the domain, and f(x) is assumed to be defined for every x ∈ X.
The range, or image, of f is the set of y for which f(x) = y for some x:

range(f) = {y ∈ Y : y = f(x) for some x ∈ X}.
2. Proofs
This section is adapted from [1].
2.1. The need for proofs. In the next sections, and indeed in the rest of
the class, we will encounter proofs of the properties of the various mathematical
entities that are useful for linear systems theory. If your background is engineering,
you may be a little out of practice at theorem proving, and indeed, you may feel
that this level of detail is excessive. However, the literature for the systems sciences
such as control systems, signal processing, communications, computer vision, and
robotics uses the language of theorem and proof to communicate new ideas, and as
a graduate student working in this area, being proficient with proofs makes reading
and understanding the literature much easier.
2.2. What makes up a theorem. There are two parts to a theorem: the
hypothesis A and the conclusion B. The basic form of a theorem is nothing more
than the statement "If A then B", and a proof is the demonstration through logical
steps that this statement is true. As an example outside of mathematics, let A
be the condition “It is raining” and B be the condition “It is cloudy”. A theorem
statement could be “If it is raining, then it is cloudy”. A proof of this statement
would rely on the fact that rain only comes from clouds, thus clouds must be present
for rain to occur. There are many different ways in which this same statement may
be presented. With this example in mind, consider the following equivalents:
• A implies B
• A⇒B
• A is a sufficient condition for B
• B if A
• A only if B
• B is a necessary condition for A
In the above statements, we know that B occurs whenever A does, but what
about the reverse? Does A occur whenever B does? From our example, we see that
this does not necessarily have to be the case, as “If it is cloudy then it is raining” is
not a true statement. However, there will be some statements that are equivalent,
in that both "If A then B" and "If B then A" are true. Usually, by changing the
first statement to "B if A" and the second statement to "B only if A", we can
combine the implications as "B if and only if A". Other ways that this is stated are
• B⇔A
• A is a necessary and sufficient condition for B
In addition, “if and only if” is often abbreviated as “iff”, as in B iff A.
2.3. Some methods of proof. Unfortunately, there does not exist a recipe
that one can follow to prove any given true statement. Usually, finding a proof
requires a lot of trial and error, and perhaps a little inspiration. However, there are
some common avenues of attack that occur fairly often.
2.3.1. Direct Computation. There are really two ways to show that A implies B
via direct computation. One is to start with A (along with all axioms and properties
that have already been proven from those axioms) and, using these, arrive at B. For
example:

Theorem 1. If n is an even number, then n² is an even number.

Proof. (A implies B) The antecedent, or what we are given, is that n is an even
number. What properties does an even number satisfy? The most fundamental is
that n/2 is an integer. The desired conclusion is that n² also satisfies this property,
namely that n²/2 is also an integer. To show this, we can use the property that the
product of two integers is still an integer.
(1) Given: n, an integer, is even
(2) Thus n/2 is an integer
(3) Because the product of integers is also an integer, (n/2)(n) = n²/2 is also
an integer
(4) Thus n² is an even number
2.3.2. Proof by Contradiction. Although "A implies B" is not equivalent to "B
implies A", it is equivalent to "not B implies not A". Using our example, notice that
the statements "If it is raining then it is cloudy" and "If it is not cloudy then it is
not raining" are both true, and mean the same thing. Thus, we can also start with
not B and try to arrive at not A. We will give two examples. Remember that the
hypothesis A includes not only what is stated in the theorem, but the consistency
of the mathematical system that we are using. Thus, if assuming both A and not
B leads to an inconsistency (like 0 = 1) then A implies B.
Theorem 2. If n is a positive integer greater than one, then it can be expressed
as the product of primes.

Proof. (by contradiction) Let us assume the opposite: there exist positive
integers greater than one that cannot be expressed as the product of primes. There
must be a smallest such integer; call it n. Since n is not the product of primes, it
cannot be a prime itself. Thus n = ab where a and b are integers smaller than n
(and greater than one). Since they are smaller, they can be expressed as products
of primes, implying that n itself is the product of primes, which is a contradiction.
As an aside, the fundamental theorem of arithmetic states that each integer
greater than one can be decomposed into exactly one product of primes; that is,
the decomposition is unique up to re-arrangement of the factors.
A second example assumes not B (and our algebraic rules) to arrive at not A.

Theorem 3. If the matrix A ∈ R^{m×n} is full column rank, then there exists no
vector x ∈ R^n with x ≠ 0 such that Ax = 0.
Proof. (not B ⇒ not A) Suppose there exists a vector x not equal to zero such
that Ax = 0. Then using the elements of x, we can find coefficients such that the
columns of A sum to zero, and thus the columns of A are not linearly independent,
the maximal number of linearly independent columns is less than n, and A is not
full column rank.
2.3.3. Proof by induction. Induction proofs are used when there is an implica-
tion that depends on a positive integer. For example, vectors and matrices have
different sizes, but we would like our results to not depend on the exact size of the
vector or matrix. Rather than verifying the result for every index separately (which
would obviously take an infinite amount of time for every possible index), we can
verify two things:
• The result is true for index equal to 1
• If the result is true for index equal to n, then it is true for index equal to
n + 1
For example:

Theorem 4. If k = 7^n − 2^n, where n is a positive integer, then k is divisible
by 5.

Proof. If n = 1 then k = 5, which is clearly divisible by 5.
Now, suppose the result is true for n. Then

7^{n+1} − 2^{n+1} = 7(7^n) − 7(2^n) + 7(2^n) − 2(2^n)

where we have added and subtracted 7(2^n). Collecting terms,

7^{n+1} − 2^{n+1} = 7(7^n − 2^n) + 5(2^n).

Since the result is true for n, the first term on the right is divisible by 5. Clearly,
the second term on the right is also divisible by 5. Thus, the result is true for n + 1
and the theorem is proved.
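Though no substitute for the proof, the claim is easy to spot-check numerically; a short Python sketch (standard library only, the bound 20 is arbitrary):

# Check that 7^n - 2^n is divisible by 5 for the first several n.
for n in range(1, 21):
    k = 7**n - 2**n
    assert k % 5 == 0, (n, k)
print("verified for n = 1, ..., 20")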
Given B ∈ R^{p×m} and A ∈ R^{m×n}, the product C = BA is defined by

[C]_{ij} = \sum_{k=1}^{m} [B]_{ik} [A]_{kj}.

That is, the i, j element of C is the dot product of the ith row of B with the jth
column of A. The dimension of C is p × n. This can also be thought of as B mapping
a column of A at a time: that is, the first column of C, [c]_{*1}, is B[a]_{*1}, B times
the first column of A. Clearly, two matrices can be multiplied only if they have
compatible dimensions.
Unlike scalars, the order of multiplication is important. If A and B are square
matrices, AB ≠ BA in general.
The identity matrix

I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & 1 \end{bmatrix}
is a square matrix with ones along the diagonal. If size is important, we will denote
it via a subscript, so that Im is the m × m identity matrix. The identity matrix is
the multiplicative identity for matrix multiplication, in that AI = A and IA = A
(where I has compatible dimensions with A).
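Both facts are easy to see numerically; a NumPy sketch (the particular matrices are arbitrary examples):

import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [1., 0.]])
print(A @ B)   # [[2. 1.], [4. 3.]]
print(B @ A)   # [[3. 4.], [1. 2.]]  -- so AB != BA in general
I = np.eye(2)
print(np.allclose(I @ A, A), np.allclose(A @ I, A))   # True True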
3.3. Block Matrices. Matrices can also be defined in blocks, using other
matrices. For example, suppose A ∈ R^{m×n}, B ∈ R^{m×p}, C ∈ R^{q×n} and D ∈ R^{q×p}.
Then we can "block fill" an (m + q) by (n + p) matrix X as

X = \begin{bmatrix} A & B \\ C & D \end{bmatrix}
Often we will want to specify some blocks as zero. We will denote a block of zeros as
simply 0. The dimension can be worked out from the other matrices. For example,
if

X = \begin{bmatrix} A & 0 \\ C & D \end{bmatrix}

the zero block must have the same number of rows as A and the same number of
columns as D.
Matrix multiplication of block matrices uses the same rules as regular matrices,
except as applied to the blocks. Thus

\begin{bmatrix} A_1 & B_1 \\ C_1 & D_1 \end{bmatrix} \begin{bmatrix} A_2 & B_2 \\ C_2 & D_2 \end{bmatrix} = \begin{bmatrix} A_1 A_2 + B_1 C_2 & A_1 B_2 + B_1 D_2 \\ C_1 A_2 + D_1 C_2 & C_1 B_2 + D_1 D_2 \end{bmatrix}
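This rule can be checked numerically; a NumPy sketch using np.block (the block sizes are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
A1, B1, C1, D1 = (rng.standard_normal((2, 2)) for _ in range(4))
A2, B2, C2, D2 = (rng.standard_normal((2, 2)) for _ in range(4))

M1 = np.block([[A1, B1], [C1, D1]])
M2 = np.block([[A2, B2], [C2, D2]])
blockwise = np.block([[A1 @ A2 + B1 @ C2, A1 @ B2 + B1 @ D2],
                      [C1 @ A2 + D1 @ C2, C1 @ B2 + D1 @ D2]])
print(np.allclose(M1 @ M2, blockwise))   # True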
4. Solving Linear Equations

Consider the system of equations

1α + 2β + 3γ = 14
2α + 2β + 2γ = 10.

Multiplying the first equation by −2 and adding the result to the second gives the
equivalent pair

1α + 2β + 3γ = 14    (4.1a)
−2β − 4γ = −18    (4.1b)
Further simplification is possible by adding these two equations together:

1α + 2β + 3γ = 14
+ (−2β − 4γ = −18)
1α + 0β − 1γ = −4
and dividing the equation −2β − 4γ = −18 by −2, resulting in

1α + 0β − 1γ = −4    (4.2a)
1β + 2γ = 9    (4.2b)

All solutions are now clear: we can choose γ to be any number, and

α = −4 + γ
β = 9 − 2γ
Now, the original system of equations in vector-matrix form is

\begin{bmatrix} 1 & 2 & 3 \\ 2 & 2 & 2 \end{bmatrix} x = \begin{bmatrix} 14 \\ 10 \end{bmatrix}.
Each of the operations that we used above can be described as multiplying, by a
particular matrix on the left, both sides of the equation. For example, to multiply
the first equation by −2 and add it to the second equation (and replace that equation
with the new one) we can multiply both sides by

\begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}.
Since

\begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 2 & 2 & 2 \end{bmatrix} x = \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 14 \\ 10 \end{bmatrix}

simplifies to

\begin{bmatrix} 1 & 2 & 3 \\ 0 & -2 & -4 \end{bmatrix} x = \begin{bmatrix} 14 \\ -18 \end{bmatrix}.    (4.3)
Compare (4.3) to (4.1)! Similarly, we can add the second equation to the first using
the matrix

\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}    (4.4)

and divide the second equation by −2 using the matrix

\begin{bmatrix} 1 & 0 \\ 0 & -\frac{1}{2} \end{bmatrix}.    (4.5)
By multiplying both sides of (4.3) on the left by (4.4) and then (4.5) we get the
matrix-vector equivalent of (4.2). (Try it!).
The following matrices are called elementary matrices, and perform the oper-
ations we need in order to solve linear equations:
• Xij is the identity matrix with the ith and jth rows eXchanged.
• Mi (c) is the identity matrix with the ith row Multiplied by the scalar c
• Aij (c) is the identity matrix with c times the jth row added to the ith row
As some examples, if we are working with 4 × 4 matrices,

X_{1,3} = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad X_{2,3} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

M_1(3) = \begin{bmatrix} 3 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad M_3(2) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

A_{1,3}(3) = \begin{bmatrix} 1 & 0 & 3 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad A_{2,3}(2) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
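In code, each elementary matrix is just an edited copy of the identity; a NumPy sketch (the helper names X, M, and Add are ours, purely illustrative, with 0-indexed rows):

import numpy as np

def X(n, i, j):
    # Identity with rows i and j exchanged.
    E = np.eye(n)
    E[[i, j]] = E[[j, i]]
    return E

def M(n, i, c):
    # Identity with row i multiplied by c.
    E = np.eye(n)
    E[i] *= c
    return E

def Add(n, i, j, c):
    # Identity with c times row j added to row i.
    E = np.eye(n)
    E[i, j] += c
    return E

print(X(4, 0, 2))          # X_{1,3} above
print(Add(4, 0, 2, 3.0))   # A_{1,3}(3) above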
A system of equations becomes easy to solve when the defining matrix is in the
following form
Definition 8. A matrix is in row reduced echelon form when the following
conditions are satisfied
(1) Any row containing a nonzero entry precedes any row in which all the
entries are zero
(2) The first nonzero entry in each row is the only nonzero entry in its column
(3) The first nonzero entry in each row is 1 and it occurs in a column to the
right of the leading 1 in any preceding row
Example 3. The following are examples of matrices in row reduced echelon
form:

\begin{bmatrix} 1 & 0 & 3 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}

The following are not in row reduced echelon form:

\begin{bmatrix} 1 & 3 & 0 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{bmatrix}
Elementary matrices allow us to find solutions because of the following fact:

Lemma 9. Multiplication of both sides of a system of equations by an elemen-
tary matrix results in an equivalent system of equations.

Proof. First, we recall that a square matrix C is invertible if there exists
another matrix C^{-1} such that CC^{-1} = I (see also the next section). Note that
each elementary matrix is invertible. For example, X_{i,j}^{-1} = X_{j,i}, M_i(c)^{-1} = M_i(1/c),
and A_{i,j}(c)^{-1} = A_{i,j}(−c). Now, let Ax = b be the original system of equations, and
let S be the solution set, i.e.

S = {x : Ax = b}.

Let S′ be the solution set to CAx = Cb, where C is invertible.
If x ∈ S, then Ax = b and clearly we also have CAx = Cb, implying x ∈ S′.
Thus S ⊂ S′.
Conversely, if x ∈ S′, then CAx = Cb. But since C is invertible, there exists
C^{-1}, and C^{-1}CAx = C^{-1}Cb, implying Ax = b, and x ∈ S. Thus S′ ⊂ S. Since
S ⊂ S′ and S′ ⊂ S, we must have S = S′.
Lemma 10. The following are true
(1) Any matrix can be transformed to row reduced echelon form by a finite
number of elementary matrices
(2) Each matrix has a unique row reduced echelon form
Example 4. By transforming to row reduced echelon form, find all solutions
to

\begin{bmatrix} 4 & 3 & 1 & 1 \\ 2 & 3 & 2 & 0 \\ 0 & 1 & 3 & 1 \end{bmatrix} x = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}.

Solution: Multiply the first row by −1/2 and add to the second row:

\begin{bmatrix} 4 & 3 & 1 & 1 \\ 0 & 3/2 & 3/2 & -1/2 \\ 0 & 1 & 3 & 1 \end{bmatrix} x = \begin{bmatrix} 1 \\ 1/2 \\ 0 \end{bmatrix}.

Scale the first row:

\begin{bmatrix} 1 & 3/4 & 1/4 & 1/4 \\ 0 & 3/2 & 3/2 & -1/2 \\ 0 & 1 & 3 & 1 \end{bmatrix} x = \begin{bmatrix} 1/4 \\ 1/2 \\ 0 \end{bmatrix}.

Multiply the second row by −1/2 and add to the first row, then multiply the
second row by −2/3 and add to the third row:

\begin{bmatrix} 1 & 0 & -1/2 & 1/2 \\ 0 & 3/2 & 3/2 & -1/2 \\ 0 & 0 & 2 & 4/3 \end{bmatrix} x = \begin{bmatrix} 0 \\ 1/2 \\ -1/3 \end{bmatrix}.

Scale the second row:

\begin{bmatrix} 1 & 0 & -1/2 & 1/2 \\ 0 & 1 & 1 & -1/3 \\ 0 & 0 & 2 & 4/3 \end{bmatrix} x = \begin{bmatrix} 0 \\ 1/3 \\ -1/3 \end{bmatrix}.

Add −0.5 times the third row to the second, and 0.25 times the third row to the
first, and scale the third row:

\begin{bmatrix} 1 & 0 & 0 & 5/6 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 2/3 \end{bmatrix} x = \begin{bmatrix} -1/12 \\ 1/2 \\ -1/6 \end{bmatrix}.

Solutions:

x = \begin{bmatrix} -1/12 \\ 1/2 \\ -1/6 \\ 0 \end{bmatrix} + \begin{bmatrix} -5/6 \\ 1 \\ -2/3 \\ 1 \end{bmatrix} a

where a is arbitrary.
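The arithmetic can be verified with SymPy, whose Matrix.rref method returns the reduced matrix and the pivot columns (a sketch assuming SymPy is installed; we row-reduce the augmented matrix [A | b]):

from sympy import Matrix

Ab = Matrix([[4, 3, 1, 1, 1],
             [2, 3, 2, 0, 1],
             [0, 1, 3, 1, 0]])
R, pivots = Ab.rref()
print(R)        # rows: [1, 0, 0, 5/6, -1/12], [0, 1, 0, -1, 1/2], [0, 0, 1, 2/3, -1/6]
print(pivots)   # (0, 1, 2)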
5. Basic Operations on Matrices and Vectors

Some notation for the operations defined in this section:

x            vector or scalar
A            matrix
A^T          transpose of A
A^*          conjugate transpose of A
tr(A)        trace of A
⟨x, y⟩       inner product between vectors x and y
det(A), |A|  determinant of A
A^{-1}       inverse of A
Note that if the matrix is not square (m ≠ n), then the "shape" of the matrix
changes. We can also apply the transpose to a vector x ∈ R^n by considering it to
be an n by 1 matrix. In this case, x^T is the 1 by n matrix

x^T = \begin{bmatrix} [x]_1 & [x]_2 & \cdots & [x]_n \end{bmatrix}.
5.3. Trace. The trace of a square matrix is the sum of the diagonal elements.
That is, if A ∈ R^{n×n}, then

tr(A) = \sum_{i=1}^{n} [A]_{ii}.
5.3.1. Useful Properties of Trace.
Tr1 tr(A) = tr(AT ).
Tr2 For matrices of compatible dimension, tr(AB) = tr(BA)
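A quick numerical check of Tr2 (shapes chosen so that both AB and BA are square, though of different sizes):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 3))
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # True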
5.4. Inner (dot) product. In three dimensional space, we are familiar with
vectors as indicating direction. The inner product is an operation that allows us
to tell if two vectors are pointing in a similar direction. We will use the notation
⟨x, y⟩ for the inner product between x and y. In other courses, you may have seen
this called the dot product, with notation x · y. The notation used here is more
common in signal processing and control systems. The inner product of x, y ∈ R^n
is defined to be the sum of the products of the elements:

⟨x, y⟩ = \sum_{i=1}^{n} [x]_i [y]_i = x^T y.
Recall that if x and y are vectors, the angle between them can be found using the
formula

cos θ = \frac{⟨x, y⟩}{\sqrt{⟨x, x⟩ ⟨y, y⟩}}.

Note that the inner product satisfies the following rules (inherited from the transpose):

⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩
⟨αy, z⟩ = α ⟨y, z⟩
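A NumPy sketch of the angle formula (the vectors are arbitrary examples):

import numpy as np

x = np.array([1.0, 0.0, 1.0])
y = np.array([1.0, 1.0, 0.0])
cos_theta = (x @ y) / np.sqrt((x @ x) * (y @ y))
print(np.degrees(np.arccos(cos_theta)))   # approximately 60.0 degrees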
5.5. Determinant. The determinant of a square matrix A ∈ R^{n×n} can be
defined recursively, by expansion along any column j:

det(A) = \sum_{i=1}^{n} [A]_{ij} c_{ij}    (5.1)

where c_{ij} is the ijth cofactor, and is the determinant of an (n−1) × (n−1) matrix
(this is what makes the definition recursive), possibly times −1. In particular,

c_{ij} = (−1)^{i+j} det(M_{ij})    (5.2)

where M_{ij} is the (n−1) × (n−1) submatrix created by deleting the ith row and
jth column from A. Note that in the definition, j is arbitrary; however, the value
obtained will be the same for every j. In addition, we will also get the same result
with fixed i:

det(A) = \sum_{j=1}^{n} [A]_{ij} c_{ij}    (5.3)
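The recursive definition translates directly into code; a Python sketch (illustration only: cofactor expansion is exponentially slow, and practical computation uses elimination, e.g. np.linalg.det):

import numpy as np

def det_cofactor(A):
    # Cofactor expansion along the first row: formula (5.3) with i = 1.
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        M = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # delete row 1 and column j+1
        total += (-1) ** j * A[0, j] * det_cofactor(M)     # (-1)**j since i, j are 0-indexed
    return total

A = np.array([[1, 5, 6], [0, 2, 4], [0, 0, 3]])
print(det_cofactor(A), np.linalg.det(A))   # both 6.0 (up to rounding)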
6. Exercises

Section 3: Vectors and Matrices
(1) Show that for any vectors x, y ∈ R^n, x^T y = y^T x.
(2) Using formula (5.1), find the determinant of the following matrices

A_1 = \begin{bmatrix} 1 & 5 & 6 \\ 0 & 2 & 4 \\ 0 & 0 & 3 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 1 & 3 & 0 & 0 \\ 3 & 1 & 0 & 0 \\ 0 & 0 & 4 & 5 \\ 0 & 0 & 6 & 7 \end{bmatrix}
Section 2: Proofs (taken from [2])
(1) Prove that 1 + 2 + · · · + n = n(n + 1)/2 for all positive integers n.
(2) Let P_n denote the assertion "n² + 5n + 100 is an even integer".
(a) Prove that P_{n+1} is true whenever P_n is true
(b) For which n is P_n actually true? What is the moral of this exercise?
Section 4: Solving Linear Equations
(1) By transforming to row reduced echelon form, find all solutions to

\begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & -1 \end{bmatrix} x = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}.
Section 5: Basic Operations
(1) Show that det(A^{-1}) = 1/det(A)
(2) Prove property Tr2. What are the requirements for compatible dimension?
(Hint: square matrices is too restrictive)
(3) Show that for vectors and matrices of compatible dimension, x^T B B^T x =
tr(B^T x x^T B). (Hint: use the results from problem 2)
(4) Let

N = \begin{bmatrix} I_m & A \\ 0 & I_n \end{bmatrix}, \quad Q = \begin{bmatrix} I_m & 0 \\ -B & I_n \end{bmatrix}, \quad P = \begin{bmatrix} I_m & -A \\ B & I_n \end{bmatrix}
(a) Explain why det(N ) = det(Q) = 1
(b) Compute N P and QP
(c) Show that det(N P ) = det(Im + AB) and det(QP ) = det(In + BA)
(d) Show that det(Im + AB) = det(In + BA)
(5) Using the results of problem 4, explain why (I_m + AB)^{-1} exists if and only
if (I_n + BA)^{-1} exists. Show by verifying the properties of the inverse that
(I_m + AB)^{-1} = I_m − A(I_n + BA)^{-1}B. (That is, multiply the right hand
side by I_m + AB and show that you get the identity)
(6) Show that properties D2 through D4 hold
Hints:
D2: Use the equivalence of (5.1) and (5.3)
D3: First, show that det(EA) = det(A^T E^T) = det(E) det(A) whenever
E is an elementary matrix. Then show that any matrix with a row
of zeros has determinant zero. Finally, show that if A is a matrix
with columns that are not linearly independent, then we can find an
elementary matrix E such that AT E T has a row of zeros.
D4: First, if you haven’t already, show that det(EA) = det(E) det(A)
when E is an elementary matrix. Then, show that if A is full rank and
square, its row reduced echelon form is the identity matrix. Finally,
show the result by expanding A and B using elementary matrices.
(7) Verify the block inversion equations BI1 and BI2. (Recall that the inverse
is the unique matrix such that AA^{-1} = A^{-1}A = I)
CHAPTER 2
Vector Spaces
Note the difference from the definition for span: the set of vectors is assumed
to be linearly independent, which implies the linear combination will be unique.
In an n-dimensional vector space, any set of n linearly independent vectors
qualifies as a basis. We have seen that there are many different mathematical
objects which qualify as vector spaces. However, all n-dimensional vector spaces
(X, R) have a one to one correspondence with the vector space (R^n, R) once a basis
has been chosen. Suppose e1 , e2 , · · · , en is a basis for X . Then for all x in X
x = \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix} β

where β = \begin{bmatrix} β_1 & β_2 & \cdots & β_n \end{bmatrix}^T and the β_i are scalars. Thus the vector x can be
identified with the unique vector β in R^n.
Example 10. Consider the vector space (P_3[s], R) where P_3[s] is the set of all
real polynomials of degree less than 3. This vector space has dimension 3, with one
basis being e_1 = 1, e_2 = s, e_3 = s². The vector x = 2s² + 3s − 1 can be written as

x = \begin{bmatrix} e_1 & e_2 & e_3 \end{bmatrix} \begin{bmatrix} -1 \\ 3 \\ 2 \end{bmatrix}

so that the representation with respect to this basis is \begin{bmatrix} -1 & 3 & 2 \end{bmatrix}^T. However, if
we choose the basis e′_1 = 1, e′_2 = s − 1, e′_3 = s² − s (verify that this set of vectors
is independent), then

x = 2s² + 3s − 1 = 4 + 5(s − 1) + 2(s² − s) = \begin{bmatrix} e′_1 & e′_2 & e′_3 \end{bmatrix} \begin{bmatrix} 4 \\ 5 \\ 2 \end{bmatrix}

so that the representation of x with respect to this basis is \begin{bmatrix} 4 & 5 & 2 \end{bmatrix}^T.
2.1. Standard basis. For R^n, the standard basis is the set of unit vectors
that point in the direction of each axis:

i_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad i_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \cdots, \quad i_n = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}.
2.2. Change of basis. Since the vectors are made up of polynomials, which
are mathematical objects quite different from n-tuples of numbers, the idea of a
separation between vectors and their representations with respect to a basis is fairly
clear. This becomes more complicated when we consider the "native" vector space
(R^n, R). When n = 2, it is natural to visualize these vectors in the plane, as shown
in Figure 1. In order to represent the vector x, we need to choose a basis. The
most natural basis for (R², R) is

i_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad i_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.

In this basis, we have the following representation for x:

x = \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} i_1 & i_2 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix}.
Note that the vector and its representation look identical. However, if we choose a
different basis, say

e_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix},

then

x = \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} e_1 & e_2 \end{bmatrix} \begin{bmatrix} 1/3 \\ 4/3 \end{bmatrix}

so the representation of x in this basis is \begin{bmatrix} 1/3 & 4/3 \end{bmatrix}^T.
We have seen that a vector x can have different representations in different
bases. A natural desired operation would be to transform between one basis and
another. Suppose a vector x has representation β with respect to \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix}
and β′ with respect to \begin{bmatrix} e′_1 & e′_2 & \cdots & e′_n \end{bmatrix}, so that

x = \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix} β = \begin{bmatrix} e′_1 & e′_2 & \cdots & e′_n \end{bmatrix} β′.    (2.1)

What is the relationship between β and β′? The answer is most easily found by
finding the relationship between the bases themselves. Each basis vector has a
representation in the other basis. That is, there exist p_i ∈ R^n such that

e_i = \begin{bmatrix} e′_1 & e′_2 & \cdots & e′_n \end{bmatrix} p_i.

If we group the vectors e_i and p_i into matrices, we can write

\begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix} = \begin{bmatrix} e′_1 & e′_2 & \cdots & e′_n \end{bmatrix} \begin{bmatrix} p_1 & p_2 & \cdots & p_n \end{bmatrix} = \begin{bmatrix} e′_1 & e′_2 & \cdots & e′_n \end{bmatrix} P    (2.2)

where the matrix P takes the vectors p_i as its columns. Substituting (2.2) into
(2.1), we get

\begin{bmatrix} e′_1 & e′_2 & \cdots & e′_n \end{bmatrix} P β = \begin{bmatrix} e′_1 & e′_2 & \cdots & e′_n \end{bmatrix} β′.

Since the e′_i are linearly independent, the representations must agree, and thus β′ = Pβ.
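A numerical sketch of the change of basis, reusing the R² example above (solving Eβ′ = x plays the role of applying P, assuming NumPy):

import numpy as np

E = np.array([[2.0, 1.0],
              [1.0, 2.0]])    # columns are the basis vectors e1, e2
x = np.array([2.0, 3.0])      # representation in the standard basis
beta = np.linalg.solve(E, x)  # representation in the {e1, e2} basis
print(beta)                   # [0.333... 1.333...], i.e. [1/3, 4/3]
print(E @ beta)               # [2. 3.], reconstructing x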
2.3. Final Thoughts. So far, we have been very general in our notion of a
vector space, allowing infinite dimensional vector spaces such as C[0, 1]. Although
many of the results and concepts that follow have extensions to the infinite dimen-
sional case, we will not consider them at this time. In what follows, we will restrict
ourselves to finite dimensional vector spaces. In addition, unless specified, the field
will be the reals, R.
3. Norms, Dot Products, Orthonormal Basis

3.1. Norms.
3.1.1. 1-norm. The 1-norm of x ∈ R^n is the sum of the absolute values of its
elements:

‖x‖_1 := \sum_{i=1}^{n} |[x]_i|.
3.1.2. 2-norm. The 2-norm corresponds to Euclidean distance, and is the square
root of the sum of squares of the elements of x:

‖x‖_2 := \sqrt{\sum_{i=1}^{n} ([x]_i)^2}.

Note that the sum of squares of the elements can also be written as x^T x. Thus

‖x‖_2 = \sqrt{x^T x}.
In what follows, if a norm is given without a subscript, i.e. ‖x‖, we will take
the "default" norm to be the 2-norm.
3.2. Unit ball. Each norm induces a "unit ball" in R^n given by all vectors
with norm less than or equal to one: B_1 = {x : ‖x‖ ≤ 1}. Only in the case of the
2-norm does the resulting set resemble a ball, and even then, strictly speaking, this
only occurs in R³. For example:
In R¹, the unit ball is the line segment from −1 to 1 for the 1, 2, and ∞-norms.
In R², the unit ball is a (filled) rotated square, circle, and square for the 1, 2,
and ∞-norms respectively.
In R³, the unit ball is a (filled) octahedron, sphere, and cube for the 1, 2, and
∞-norms respectively.
Note that

‖x‖_2 = \sqrt{⟨x, x⟩}.

If two vectors have a dot product of zero, then they are said to be orthogonal. A
set of vectors {x_i} which are pair-wise orthogonal and of unit 2-norm are said to
be orthonormal, and will satisfy

⟨x_i, x_j⟩ = x_i^T x_j = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}.
The projection of x onto y is the vector

z = \frac{⟨x, y⟩}{‖y‖^2} y.

The vector z points in the same direction as y, but the length is chosen so that the
difference between z and x is orthogonal to y:

⟨z − x, y⟩ = \left\langle \frac{⟨x, y⟩}{‖y‖^2} y − x, \; y \right\rangle
= \frac{⟨x, y⟩}{‖y‖^2} ⟨y, y⟩ − ⟨x, y⟩
= 0,

since ⟨y, y⟩ = ‖y‖².
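A quick NumPy sketch of this projection (the vectors are arbitrary examples):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 1.0, 0.0])
z = (x @ y) / (y @ y) * y            # projection of x onto y
print(z)                             # [1.5 1.5 0. ]
print(np.isclose((z - x) @ y, 0.0))  # True: z - x is orthogonal to y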
[Figure 2: the plane spanned by the columns a_1 and a_2 of A, a vector y outside
the plane, and the closest vector Ax, with the error y − Ax orthogonal to the plane.]
4. Projection Theorem
The close connection between inner product and the 2-norm comes into play in
vector minimization problems that involve the 2-norm. Suppose we have a collection
of n vectors in Rm as columns of a matrix A ∈ Rm×n , and a vector y ∈ Rm . We
would like to find the weights x ∈ Rn so that Ax is a vector that is as close as
possible to y. That is, we have the following problem:
\min_{x ∈ R^n} ‖Ax − y‖    (4.1)
Useful facts:
(1) There is always a solution to this minimization problem (i.e. a minimizer
x exists).
(2) A solution can be found using inner products.
We will take the first fact as given. However, let’s try to establish the second.
As an example, suppose A has two columns, a1 and a2 . These vectors span a two
dimensional space, indicated by the plane in Figure 2. By selecting different values
for x, one can choose different vectors Ax that will all lie in that plane. If the
vector y does not lie in the plane, then an exact match cannot occur. However,
we can find the vector which minimizes the error y − Ax. The figure suggests that
the minimum error occurs when y − Ax is orthogonal to the plane spanned by the
vectors in A. This intuition is formalized in the projection theorem.
Theorem 15. (Projection Theorem) x̄ is a minimizer of (4.1) if and only if
⟨y − Ax̄, Ax⟩ = 0 for all x ∈ R^n.

Proof. (if) Suppose x̄ satisfies ⟨y − Ax̄, Ax⟩ = 0 for all x. Let x° be another
vector in R^n. Then

‖Ax° − y‖² = ‖A(x̄ + x° − x̄) − y‖²
= ‖Ax̄ − y + A(x° − x̄)‖²
= ‖Ax̄ − y‖² + 2⟨Ax̄ − y, A(x° − x̄)⟩ + ‖A(x° − x̄)‖².

The middle term is zero by assumption (take x = x° − x̄), so ‖Ax° − y‖² ≥ ‖Ax̄ − y‖²,
and x̄ is a minimizer.
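Numerically, the optimality condition is equivalent to A^T(y − Ax̄) = 0, which we can check against a least-squares solver; a NumPy sketch:

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2))
y = rng.standard_normal(5)
xbar, *_ = np.linalg.lstsq(A, y, rcond=None)   # a minimizer of ||Ax - y||
# <y - A xbar, Ax> = 0 for all x  iff  A^T (y - A xbar) = 0:
print(np.allclose(A.T @ (y - A @ xbar), 0.0))  # True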
5. Orthogonal Subspaces
Suppose we have a finite dimensional vector space on which an inner product has
been defined, and a set of linearly independent vectors from that vector space. The
linearly independent vectors may not be a basis for the entire vector space. In that
case, they are a basis for a subspace, and we can decompose the vector space into
this subspace and the set of vectors that are orthogonal to all the vectors in this
subspace.
Let {x_1, · · ·, x_p} be a set of linearly independent vectors in R^n, p < n. Let
X = \begin{bmatrix} x_1 & \cdots & x_p \end{bmatrix} be a matrix whose columns are the set of vectors, and let 𝒳 ⊂
R^n be the subspace for which these vectors are a basis, that is

𝒳 = {x : ∃α ∈ R^p such that x = Xα}.
For every vector v in R^n, by the projection theorem, there exists an α, and thus a
vector x ∈ 𝒳, such that the distance from v to 𝒳 is minimized. The α is given by
the solution to

X^T v = X^T X α.

Since we assumed that the x_i are linearly independent, X^T X is invertible, thus

α = (X^T X)^{-1} X^T v

and the corresponding vector x ∈ 𝒳 is

x = Xα = X(X^T X)^{-1} X^T v.
We define

Π_𝒳 := X(X^T X)^{-1} X^T.    (5.1)

Definition 17. The operation Π_𝒳 v is the projection of v on 𝒳 (or, equiva-
lently, on the columns of the matrix X).
Note that Π_𝒳 maps any vector Xα ∈ 𝒳 to itself:

Π_𝒳 Xα = X(X^T X)^{-1} X^T X α = Xα,

so that, writing Π̄_𝒳 := I − Π_𝒳 for the complementary projection, Π̄_𝒳 Xα = Xα − Xα = 0.
Next, we show that ⟨v − Π̄_𝒳 v, Π̄_𝒳 v̄⟩ = 0 for all v, v̄. Note that v − Π̄_𝒳 v =
(I − Π̄_𝒳)v = Π_𝒳 v. But since Π_𝒳 Π̄_𝒳 = 0, we have v^T Π_𝒳 Π̄_𝒳 v̄ = 0.
Every vector can be decomposed into the sum of a unique vector in 𝒳 and a
unique vector in 𝒳^⊥. This follows from Π̄_𝒳 + Π_𝒳 = I.
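A NumPy sketch of the projector (5.1) and its complement (random data, for illustration):

import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 2))          # columns: a basis for the subspace
Pi = X @ np.linalg.inv(X.T @ X) @ X.T    # projection onto the column space of X
Pi_bar = np.eye(5) - Pi                  # projection onto the orthogonal complement
v = rng.standard_normal(5)
print(np.allclose(Pi @ Pi, Pi))              # True: Pi is idempotent
print(np.allclose(Pi @ Pi_bar, 0.0))         # True: the two pieces are orthogonal
print(np.allclose(Pi @ v + Pi_bar @ v, v))   # True: v decomposes into the two parts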
6. Exercises
Section 1: Vector Space
(1) Show that X = C[0, 1], the set of all continuous functions defined on the
real line between 0 and 1, is a vector space with F = R, if addition and
multiplication are defined as x_1 = f(t), x_2 = g(t), x_1 + x_2 = f(t) + g(t),
ax_1 = af(t).
(2) Show that X = C^n, the set of all n-tuples of complex numbers, is a vector
space with F = C, the field of complex numbers.
(3) Show that X = R^{n×m}, the set of n × m matrices, is a vector space with
F = R.
(4) Show that X = R^n, F = C is not a vector space.
(5) Show that X = {x(t) : ẍ + ẋ + x = 0} is a vector space with F = R.
Section 2: Linear Independence and Basis
(1) Find the dimension of the vector space given by all (real) linear combina-
tions of

x_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 3 \\ 2 \\ 2 \end{bmatrix}, \quad x_3 = \begin{bmatrix} 4 \\ 4 \\ 5 \end{bmatrix}.

That is,

X = {x : x = α_1 x_1 + α_2 x_2 + α_3 x_3, α_i ∈ R}.

This is called the vector space spanned by {x_1, x_2, x_3}.
(2) Show that the space of all solutions to the differential equation
ẍ + 3ẋ + x = 0,    t ≥ 0
is a 2 dimensional vector space.
(3) Given

A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}, \quad b̄ = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix},

find the representations for the columns of A with respect to the basis
{b, Ab, A²b, A³b} and the basis {b̄, Ab̄, A²b̄, A³b̄}.
Section 3: Norms, Dot Products and Orthonormal Basis
(1) Verify that the triangle inequality holds for the 1, 2 and ∞ vector norms.
(2) Find an orthogonal basis for the following vectors:

x_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix}, \quad x_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}
(3) Show that tr(A^T A) equals the sum of the squares of the elements of A.
CHAPTER 3
Matrices as Mappings
In matrix notation, we can say that the column vectors of A are linearly inde-
pendent if and only if Ax = 0 implies x = 0. If a matrix does not consist of linearly
independent column vectors, then there exist nonzero vectors x such that Ax = 0.
The set of all such x (together with 0) is a subspace of R^n, and is called the null
space.

Definition 22. The null space of A is the subset of R^n which is mapped to
the zero vector, that is

N(A) = {x ∈ R^n : Ax = 0}.
3. Symmetric Matrices

If A ∈ R^{n×n} is symmetric, i.e. A = A^T, then the eigenvalues and eigenvectors
satisfy some special properties. (Similar results hold for complex matrices which
are Hermitian, or conjugate symmetric, but we will only consider the case of real
matrices.)
3.1. Eigenvalues and Eigenvectors. Recall that even if x is a complex
vector, x∗ x is a real scalar. It is also easy to verify that if A is symmetric, x∗ Ax is
also a real scalar.
Theorem 26. The eigenvalues of a symmetric matrix are real.

Proof. Let A be a symmetric matrix and let λ, x be an eigenvalue-eigenvector
pair for A (possibly complex). Then

Ax = λx.

Multiply both sides of the equation on the left by x^*:

x^* A x = λ x^* x.

Since x ≠ 0, x^* x > 0, and both x^* x and x^* A x are real, so λ = (x^* A x)/(x^* x) is real.
Theorem 27. A symmetric matrix A ∈ R^{n×n} has no generalized eigenvectors.

Proof. If there exists x, a generalized eigenvector of grade 2, then

(A − λI)² x = 0
(A − λI) x ≠ 0

must be satisfied. Let y = (A − λI)x. Since y ≠ 0, y^T y ≠ 0. Also

y^T y = x^T (A − λI)^T (A − λI) x
= x^T (A^T − λI^T)(A − λI) x
= x^T (A − λI)(A − λI) x
= x^T (A − λI)² x,

but we have (A − λI)² x = 0, implying that y^T y = 0, a contradiction. Since there
are no generalized eigenvectors of grade 2, there cannot be any generalized eigen-
vectors of higher grade.
Corollary 28. For every symmetric matrix, there exists a set of n linearly
independent eigenvectors
Theorem 29. For a symmetric matrix, there exists a set of n orthogonal eigen-
vectors.
Proof. We have already shown that eigenvectors associated with the same
eigenvalue can be chosen to be orthogonal. It remains to show that any two eigen-
vectors associated with different eigenvalues are orthogonal. Let A be a symmetric
matrix with eigenvalues λ_1 ≠ λ_2 and eigenvectors v_1, v_2. Then

A v_1 = λ_1 v_1,
A v_2 = λ_2 v_2.

Multiplying the first equation by v_2^T and the second by v_1^T,

v_2^T A v_1 = λ_1 v_2^T v_1,
v_1^T A v_2 = λ_2 v_1^T v_2.

Since A is symmetric, v_2^T A v_1 = v_1^T A v_2, thus λ_1 v_2^T v_1 = λ_2 v_1^T v_2. But since v_2^T v_1 is a
scalar, v_2^T v_1 = v_1^T v_2. Since λ_1 ≠ λ_2, we must have v_2^T v_1 = 0.
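These properties are reflected in NumPy's eigensolver for symmetric matrices, np.linalg.eigh; a sketch:

import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4))
A = B + B.T                                   # a symmetric matrix
lam, V = np.linalg.eigh(A)
print(lam.dtype)                              # float64: the eigenvalues are real
print(np.allclose(V.T @ V, np.eye(4)))        # True: eigenvectors are orthonormal
print(np.allclose(A @ V, V @ np.diag(lam)))   # True: A v_i = lambda_i v_i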
4. Matrix Norms
4.1. Matrix as a Vector. A matrix A can be viewed as a member of the
vector space R^{m×n}. In particular, if we stack the columns of A into a single column,
we obtain a vector of size mn. The 2-norm applied to this stacked vector, which is
the square root of the sum of the squares of the elements, is given the special name
of Frobenius norm and is notated as

‖A‖_F := \sqrt{\sum_{i,j} [A]_{ij}^2}.

Note that ‖A‖_F² = tr(A^T A) = tr(A A^T).
If U and V are orthonormal matrices (U^T U = I, V V^T = I), then

‖UA‖_F = \sqrt{tr(A^T U^T U A)} = \sqrt{tr(A^T A)} = ‖A‖_F

and

‖AV‖_F = \sqrt{tr(A V V^T A^T)} = \sqrt{tr(A A^T)} = ‖A‖_F,

so the Frobenius norm is invariant to multiplication by orthonormal matrices.
4.2. Induced Norms. A norm for a matrix A can also be induced from vector
norms ‖·‖_x on the domain and ‖·‖_y on the range:

‖A‖_{i,xy} = \max_{‖x‖_x \neq 0} \frac{‖Ax‖_y}{‖x‖_x} = \max_{‖x‖_x = 1} ‖Ax‖_y.
Example 15. Let A ∈ R^{m×n}, and let ‖·‖_x = ‖·‖_1 and ‖·‖_y = ‖·‖_1. We will show
that

‖A‖_{i1} = \max_j \left( \sum_{i=1}^{m} |[A]_{ij}| \right) = max column sum.

If y = Ax, then

‖y‖_1 = \sum_{i=1}^{m} |[y]_i| = \sum_{i=1}^{m} \left| \sum_{j=1}^{n} [A]_{ij} [x]_j \right| \le \sum_{j=1}^{n} |[x]_j| \sum_{i=1}^{m} |[A]_{ij}| \le \left( \max_j \sum_{i=1}^{m} |[A]_{ij}| \right) ‖x‖_1,

so ‖A‖_{i1} ≤ max_j Σ_i |[A]_{ij}|. Conversely, if x = e_j (the unit vector with a one in
the jth element and zeros elsewhere) then ‖x‖_1 = 1 and ‖y‖_1 = Σ_{i=1}^{m} |[A]_{ij}|. Thus

‖A‖_{i1} \ge \sum_{i=1}^{m} |[A]_{ij}|, \quad j = 1, \cdots, n,

and hence ‖A‖_{i1} ≥ max_j Σ_{i=1}^{m} |[A]_{ij}|. Combining the two bounds gives the result.
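A numerical check (NumPy's matrix 1-norm, np.linalg.norm(A, 1), is exactly the max column sum):

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 3))
col_sums = np.abs(A).sum(axis=0)                 # sum_i |A_ij| for each column j
print(np.isclose(np.linalg.norm(A, 1), col_sums.max()))   # True
# Sampling unit-1-norm vectors never beats the max column sum:
X = rng.standard_normal((3, 100000))
X /= np.abs(X).sum(axis=0)                       # normalize columns to ||x||_1 = 1
print(np.abs(A @ X).sum(axis=0).max() <= col_sums.max() + 1e-12)   # True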
5. Singular Value Decomposition

Let σ_1 = ‖A‖_{i2}, let x_1 be a unit vector that achieves this maximum, and let
y_1 = Ax_1/σ_1. Choose Ṽ_1 and Ũ_1 to complete these vectors to orthonormal matrices,
so that

V_1 = \begin{bmatrix} x_1 & Ṽ_1 \end{bmatrix}, \quad U_1 = \begin{bmatrix} y_1 & Ũ_1 \end{bmatrix}.

Then A_1 := U_1^T A V_1 has the form

A_1 = \begin{bmatrix} σ_1 & w^T \\ 0 & B \end{bmatrix}.

The first component of A_1 \begin{bmatrix} σ_1 \\ w \end{bmatrix} is σ_1^2 + w^T w, so

‖A_1‖_{i2} \ge \frac{ \left\| A_1 \begin{bmatrix} σ_1 \\ w \end{bmatrix} \right\|_2 }{ \left\| \begin{bmatrix} σ_1 \\ w \end{bmatrix} \right\|_2 } \ge \frac{σ_1^2 + w^T w}{\sqrt{σ_1^2 + w^T w}} = \sqrt{σ_1^2 + w^T w}.

However, ‖A_1‖_{i2} = ‖U_1^T A V_1‖_{i2} \le ‖U_1^T‖_{i2} ‖A‖_{i2} ‖V_1‖_{i2} = σ_1, since U_1 and V_1 are
orthonormal. Hence

\sqrt{σ_1^2 + w^T w} \le ‖A_1‖_{i2} \le σ_1

and thus w = 0. Repeating the argument on the block B yields σ_2, orthonormal
U_2, V_2, and

\underbrace{\begin{bmatrix} 1 & 0 \\ 0 & U_2^T \end{bmatrix} U_1^T}_{orthonormal} \; A \; \underbrace{V_1 \begin{bmatrix} 1 & 0 \\ 0 & V_2 \end{bmatrix}}_{orthonormal} = \begin{bmatrix} σ_1 & 0 & 0 & \cdots & 0 \\ 0 & σ_2 & 0 & \cdots & 0 \\ 0 & 0 & & & \\ \vdots & \vdots & & C & \\ 0 & 0 & & & \end{bmatrix}.

Continuing in this way, the result follows.
The terms σ_1, σ_2, · · ·, σ_{min(m,n)} are the singular values of A. In the above con-
structive proof, these terms will be a non-increasing sequence of non-negative real
numbers. However, unless m = n and no two singular values are equal, there will
be several choices for the orthonormal matrices U and V; thus it is not possible,
strictly speaking, to speak of "the" singular value decomposition of a matrix. In
most applications, however, this non-uniqueness will not be an issue, and we will
consider "the singular value decomposition of A" to be a singular value decompo-
sition with non-increasing singular values along the diagonal of Σ, for any of the
possible choices for U and V.
The singular values can be used to find the Frobenius norm of a matrix.

Theorem 33. ‖A‖_F = \sqrt{\sum_{i=1}^{min(m,n)} σ_i^2}, where the σ_i are the singular values of A.

Proof. The Frobenius norm is invariant to multiplication by orthonormal ma-
trices. Thus

‖A‖_F = ‖U Σ V^T‖_F = ‖Σ‖_F = \sqrt{\sum_{i=1}^{min(m,n)} σ_i^2}.
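A one-line check with NumPy:

import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))
s = np.linalg.svd(A, compute_uv=False)   # singular values only
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt((s**2).sum())))   # True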
We now turn to solving the system of linear equations

y = Ax    (6.1)

for x, given A and y.
6.1.1. Finding the solution, case 1: Unique solution exists. When a unique
solution exists, A has full column rank. However, A may not be square, and thus
not invertible. But if we multiply both sides of (6.1) by A^T, we obtain

A^T y = A^T A x.

Note that A^T A is an m × m matrix. Does it have rank m, and is it thus invertible?
Note that if A has full column rank, it has only the trivial null space, so that Ax = 0
only if x = 0. This means that ‖Ax‖² = x^T A^T A x = 0 only if x = 0, and thus
A^T A x = 0 only if x = 0, so A^T A does indeed have full rank m, and is invertible.
Thus the solution is given by

x = (A^T A)^{-1} A^T y.
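A NumPy sketch of this formula (in practice one calls a least-squares solver rather than forming the inverse):

import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 3))          # tall, full column rank (generically)
y = A @ rng.standard_normal(3)           # a consistent right-hand side
x = np.linalg.solve(A.T @ A, A.T @ y)    # x = (A^T A)^{-1} A^T y
print(np.allclose(A @ x, y))             # True: x solves the system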
6.1.2. Finding the solution, case 2: Unique solution does not exist. When a
unique solution does not exist, A no longer has full column rank, and thus has a
nontrivial null space. First, we note that all solutions are given by

x = x_0 + N α

where x_0 is one solution that satisfies y = A x_0, N is a matrix whose columns span
the null space of A, and α is arbitrary. The complete set of solutions is easily found
using the singular value decomposition. Suppose

A = \begin{bmatrix} Ū & Ũ \end{bmatrix} \begin{bmatrix} Σ̄ & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V̄^T \\ Ṽ^T \end{bmatrix}.
Theorem 37. If at least one solution to (6.1) exists, the solutions are all given by

x = V̄ Σ̄^{-1} Ū^T y + Ṽ α

where α is an arbitrary vector of compatible dimension.

Proof. We have already seen that N = Ṽ. Since A = Ū Σ̄ V̄^T, letting x_0 = V̄ Σ̄^{-1} Ū^T y,

A x_0 = Ū Σ̄ V̄^T V̄ Σ̄^{-1} Ū^T y.

Note that V̄^T V̄ is a collection of inner products of orthonormal vectors, and thus
V̄^T V̄ = I, giving

A x_0 = Ū Ū^T y.

Since Ū spans the range space of A, and y must be in the range space for a solution
to exist, we must have Ū Ū^T y = y, and the result follows.
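A sketch of Theorem 37 in NumPy: build a rank-deficient A, a consistent y, and parametrize all solutions from the SVD blocks (Ū, Σ̄, V̄ belong to the nonzero singular values, Ṽ spans the null space):

import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 5))   # 4x5, rank 3
y = A @ rng.standard_normal(5)                 # guarantees a solution exists
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))                     # numerical rank
Ubar, Sbar = U[:, :r], np.diag(s[:r])
Vbar, Vtil = Vt[:r].T, Vt[r:].T                # Vtil's columns span the null space
x0 = Vbar @ np.linalg.solve(Sbar, Ubar.T @ y)  # particular solution
alpha = rng.standard_normal(Vtil.shape[1])
x = x0 + Vtil @ alpha                          # another solution, for any alpha
print(np.allclose(A @ x0, y), np.allclose(A @ x, y))   # True True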
7. Low Rank Approximation

We seek the matrix Â of rank k that is closest to A as measured by the Frobenius
norm. The solution can be found using the singular value decomposition. First, a
preliminary result.

Lemma 38. Given A, B ∈ R^{m×n}, with σ_i^A and σ_i^B denoting their respective
singular values,

‖A − B‖_F^2 \ge \sum_{i=1}^{min(m,n)} (σ_i^A − σ_i^B)^2.

Proof. Expanding ‖A − B‖_F^2 = tr((A − B)^T(A − B)) using the singular value
decompositions A = U_A Σ_A V_A^T and B = U_B Σ_B V_B^T gives

‖A − B‖_F^2 = \sum_{i=1}^{min(m,n)} (σ_i^A)^2 − 2 \, tr(Σ_A U_A^T U_B Σ_B V_B^T V_A) + \sum_{i=1}^{min(m,n)} (σ_i^B)^2.

Now, tr(Σ_A U_A^T U_B Σ_B V_B^T V_A) = \sum_{i=1}^{min(m,n)} σ_i^A α_{ii}, where α_{ii} = \sum_j u_{ij} v_{ij} σ_j^B and
u_{ij}, v_{ij} are the dot products of the i, j columns of U_A, U_B and of V_A, V_B respectively.
Since these are orthonormal matrices, we must have \sum_j u_{ij}^2 = 1 and \sum_j v_{ij}^2 = 1.
With these constraints, it can be shown that the trace term is maximized over all
possible choices for u_{ij}, v_{ij} when

u_{ij} = v_{ij} = \begin{cases} 1 & i = j \\ 0 & otherwise \end{cases}

so that α_{ii} = σ_i^B. Thus tr(Σ_A U_A^T U_B Σ_B V_B^T V_A) \le \sum_{i=1}^{min(m,n)} σ_i^A σ_i^B, and the
result follows.
Let the SVD of A be U Σ V^T, where it is assumed that the singular values form
a non-increasing sequence along the diagonal of Σ. Now, write this SVD as

A = \begin{bmatrix} Ū & Ũ \end{bmatrix} \begin{bmatrix} Σ̄ & 0 \\ 0 & Σ̃ \end{bmatrix} \begin{bmatrix} V̄^T \\ Ṽ^T \end{bmatrix}

where Σ̄ contains the k largest singular values.
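The conclusion of this construction (the Eckart-Young result) is that keeping the k largest singular values gives the closest rank-k matrix; a NumPy sketch:

import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((5, 4))
U, s, Vt = np.linalg.svd(A)
k = 2
A_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # truncated SVD: best rank-k approximation
print(np.linalg.matrix_rank(A_hat))          # 2
# The Frobenius error is the norm of the discarded singular values:
print(np.isclose(np.linalg.norm(A - A_hat, 'fro'), np.sqrt((s[k:]**2).sum())))   # True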
8. Exercises

Section 1: Rank
(1) Show that a rank 1 matrix A ∈ R^{m×n} can be written as

A = x y^T

where x and y are vectors of appropriate dimension.
(2) Find the dimension of the range space and null space for the following
matrices:

A_1 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 4 & 1 & -1 \\ 3 & 2 & 0 \\ 1 & 1 & 0 \end{bmatrix}, \quad A_4 = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & -1 & -2 & 2 \\ 0 & 0 & 0 & 1 \end{bmatrix}
Section 2: Eigenvalues and Eigenvectors
(1) Let λ_i for i = 1, · · ·, n be the eigenvalues of an n × n matrix A. Show
that

det A = \prod_{i=1}^{n} λ_i.
Bibliography
[1] T. K. Moon and W. C. Stirling, Mathematical Methods and Algorithms for Signal Processing.
Upper Saddle River, NJ: Prentice Hall, 2000.
[2] K. A. Ross, Elementary Analysis: The Theory of Calculus. New York: Springer-Verlag, 1980.
[3] D. G. Luenberger, Optimization by Vector Space Methods. New York: Wiley, 1969.
[4] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge, UK: Cambridge University
Press, 1985.
[5] S. H. Friedberg, A. J. Insel, and L. E. Spence, Linear Algebra, 2nd ed. Englewood Cliffs, NJ:
Prentice-Hall, 1989.
[6] G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd ed. Baltimore: Johns Hopkins
University Press, 1989.