
Practical Linear Algebra

Class Notes
Spring 2009
Robert van de Geijn
Maggie Myers
Department of Computer Sciences
The University of Texas at Austin
Austin, TX 78712
Draft
January 18, 2012
Copyright 2009
Robert van de Geijn and Maggie Myers
Contents
1. Introduction 1
1.1. A Motivating Example: Predicting the Weather . . . . . . . . . . . . . . . . 1
1.2. Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.1. Equality (=) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.2. Copy (copy) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.3. Scaling (scal) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.4. Vector addition (add) . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.5. Scaled vector addition (axpy) . . . . . . . . . . . . . . . . . . . . . . 8
1.2.6. Dot product (dot) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.7. Vector length (norm2) . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.8. A Simple Linear Algebra Package: SLAP . . . . . . . . . . . . . . . . 9
1.3. A Bit of History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4. Matrices and Matrix-Vector Multiplication . . . . . . . . . . . . . . . . . . . 12
1.4.1. First: Linear transformations . . . . . . . . . . . . . . . . . . . . . . 12
1.4.2. From linear transformation to matrix-vector multiplication . . . . . . 19
1.4.3. Special Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.4.4. Computing the matrix-vector multiply . . . . . . . . . . . . . . . . . 29
1.4.5. Cost of matrix-vector multiplication . . . . . . . . . . . . . . . . . . . 33
1.4.6. Scaling and adding matrices . . . . . . . . . . . . . . . . . . . . . . . 33
1.4.7. Partitioning matrices and vectors into submatrices (blocks) and sub-
vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.5. A High Level Representation of Algorithms . . . . . . . . . . . . . . . . . . . 38
1.5.1. Matrix-vector multiplication . . . . . . . . . . . . . . . . . . . . . . . 38
1.5.2. Transpose matrix-vector multiplication . . . . . . . . . . . . . . . . . 40
1.5.3. Triangular matrix-vector multiplication . . . . . . . . . . . . . . . . . 42
1.5.4. Symmetric matrix-vector multiplication . . . . . . . . . . . . . . . . . 45
1.6. Representing Algorithms in Code . . . . . . . . . . . . . . . . . . . . . . . . 46
1.7. Outer Product and Rank-1 Update . . . . . . . . . . . . . . . . . . . . . . . 48
1.8. A growing library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
1.8.1. General matrix-vector multiplication (gemv) . . . . . . . . . . . . . . 50
1.8.2. Triangular matrix-vector multiplication (trmv) . . . . . . . . . . . . 51
1.8.3. Symmetric matrix-vector multiplication (symv) . . . . . . . . . . . . 52
1.8.4. Rank-1 update (ger) . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
1.8.5. Symmetric Rank-1 update (syr) . . . . . . . . . . . . . . . . . . . . 53
1.9. Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2. Matrix-Matrix Multiplication 71
2.1. Motivating Example: Rotations . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.2. Composing Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . 72
2.3. Special Cases of Matrix-Matrix Multiplication . . . . . . . . . . . . . . . . . 76
2.4. Properties of Matrix-Matrix Multiplication . . . . . . . . . . . . . . . . . . . 79
2.5. Multiplying partitioned matrices . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.6. Summing it all up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
2.7. Additional Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3. Gaussian Elimination 93
3.1. Solving a System of linear Equations via Gaussian Elimination (GE, Take 1) 93
3.2. Matrix Notation (GE, Take 2) . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.3. Towards Gauss Transforms (GE, Take 3) . . . . . . . . . . . . . . . . . . . . 97
3.4. Gauss Transforms (GE, Take 4) . . . . . . . . . . . . . . . . . . . . . . . . . 100
3.5. Gauss Transforms Continued (GE, Take 5) . . . . . . . . . . . . . . . . . . . 104
3.6. Toward the LU Factorization (GE Take 6) . . . . . . . . . . . . . . . . . . . 108
3.7. Coding Up Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . 113
4. LU Factorization 117
4.1. Gaussian Elimination Once Again . . . . . . . . . . . . . . . . . . . . . . . . 118
4.2. LU factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.3. Forward Substitution = Solving a Unit Lower Triangular System . . . . . . . 123
4.4. Backward Substitution = Solving an Upper Triangular System . . . . . . . . 124
4.5. Solving the Linear System . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.6. When LU Factorization Breaks Down . . . . . . . . . . . . . . . . . . . . . . 128
4.7. Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
4.8. Back to When LU Factorization Breaks Down . . . . . . . . . . . . . . . . 133
4.9. The Inverse of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4.9.1. First, some properties . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4.9.2. That's about all we will say about determinants . . . . . . . . . . . . 143
4.9.3. Gauss-Jordan method . . . . . . . . . . . . . . . . . . . . . . . . . . 144
4.9.4. Inverting a matrix using the LU factorization . . . . . . . . . . . . . 147
4.9.5. Inverting the LU factorization . . . . . . . . . . . . . . . . . . . . . . 151
4.9.6. In practice, do not use inverted matrices! . . . . . . . . . . . . . . . . 151
4.9.7. More about inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
5. Vector Spaces: Theory and Practice 155
5.1. Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
5.2. Why Should We Care? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.2.1. A systematic procedure (first try) . . . . . . . . . . . . . . . . . . . . 162
5.2.2. A systematic procedure (second try) . . . . . . . . . . . . . . . . . . 164
5.3. Linear Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
5.4. Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
5.5. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
5.6. The Answer to Life, The Universe, and Everything . . . . . . . . . . . . . . 174
5.7. Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
6. Orthogonality 183
6.1. Orthogonal Vectors and Subspaces . . . . . . . . . . . . . . . . . . . . . . . 183
6.2. Motivating Example, Part I . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
6.3. Solving a Linear Least-Squares Problem . . . . . . . . . . . . . . . . . . . . 188
6.4. Motivating Example, Part II . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
6.5. Computing an Orthonormal Basis . . . . . . . . . . . . . . . . . . . . . . . . 191
6.6. Motivating Example, Part III . . . . . . . . . . . . . . . . . . . . . . . . . . 192
6.7. What does this all mean? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
6.8. Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
7. The Singular Value Decomposition 209
7.1. The Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
7.2. Consequences of the SVD Theorem . . . . . . . . . . . . . . . . . . . . . . . 211
7.3. Projection onto a column space . . . . . . . . . . . . . . . . . . . . . . . . . 214
7.4. Low-rank Approximation of a Matrix . . . . . . . . . . . . . . . . . . . . . . 216
8. QR factorization 219
8.1. Classical Gram-Schmidt process . . . . . . . . . . . . . . . . . . . . . . . . . 219
8.2. Modified Gram-Schmidt process . . . . . . . . . . . . . . . . . . . . . . . 223
8.3. Householder QR factorization . . . . . . . . . . . . . . . . . . . . . . . . . . 231
8.3.1. Householder transformations (reflectors) . . . . . . . . . . . . . . . . 231
8.3.2. Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
8.3.3. Forming Q . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
8.4. Solving Linear Least-Squares Problems . . . . . . . . . . . . . . . . . . . . . 240
9. Eigenvalues and Eigenvectors 241
9.1. Motivating Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
9.2. Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
9.2.1. Eigenvalues and eigenvectors of a 2 x 2 matrix . . . . . . . . . . . . . 244
Preface
Chapter 1
Introduction
In this chapter, we give a motivating example and use this to introduce linear algebra
notation.
1.1 A Motivating Example: Predicting the Weather
Let us assume that on any day, the following table tells us how the weather on that day
predicts the weather the next day:
                     Today
            sunny   cloudy   rainy
 Tomorrow
   sunny     0.4     0.3      0.1
   cloudy    0.4     0.3      0.6
   rainy     0.2     0.4      0.3

Table 1.1: Table that predicts the weather.
This table is interpreted as follows: If today it is rainy, then the probability that it will be
cloudy tomorrow is 0.6, etc.
Example 1.1 If today is cloudy, what is the probability that tomorrow is sunny? cloudy?
rainy?
To answer this, we simply consult Table 1.1: the probability it will be sunny, cloudy, and
rainy are given by 0.3, 0.3, and 0.4, respectively.
1
2 Chapter 1. Introduction
Example 1.2 If today is cloudy, what is the probability that the day after tomorrow is
sunny? cloudy? rainy?
Now this gets a bit more difficult. Let us focus on the question what the probability is
that it is sunny the day after tomorrow. We don't know what the weather will be tomorrow.
What we do know is:

- The probability that it will be sunny the day after tomorrow and sunny tomorrow is 0.4 x 0.3.
- The probability that it will be sunny the day after tomorrow and cloudy tomorrow is 0.3 x 0.3.
- The probability that it will be sunny the day after tomorrow and rainy tomorrow is 0.1 x 0.4.

Thus, the probability that it will be sunny the day after tomorrow if it is cloudy today is
0.4 x 0.3 + 0.3 x 0.3 + 0.1 x 0.4 = 0.25.
Exercise 1.3 Work out the probabilities that it will be cloudy/rainy the day after tomorrow.
Example 1.4 If today is cloudy, what is the probability that a week from today it is sunny?
cloudy? rainy?
We will not answer this question now. We insert it to make the point that things can get
messy.
As is usually the case when we are trying to answer quantitative questions, it helps to
introduce some notation.
Let $\chi_s^{(k)}$ denote the probability that it will be sunny $k$ days from now.

Let $\chi_c^{(k)}$ denote the probability that it will be cloudy $k$ days from now.

Let $\chi_r^{(k)}$ denote the probability that it will be rainy $k$ days from now.

Here, $\chi$ is the Greek lower case letter chi, pronounced [kai] in English. Now, Table 1.1,
Example 1.2, and Exercise 1.3 motivate the equations

$$ \chi_s^{(k+1)} = 0.4\,\chi_s^{(k)} + 0.3\,\chi_c^{(k)} + 0.1\,\chi_r^{(k)} $$
$$ \chi_c^{(k+1)} = 0.4\,\chi_s^{(k)} + 0.3\,\chi_c^{(k)} + 0.6\,\chi_r^{(k)} \qquad (1.1) $$
$$ \chi_r^{(k+1)} = 0.2\,\chi_s^{(k)} + 0.4\,\chi_c^{(k)} + 0.3\,\chi_r^{(k)} $$
The probabilities that denote what the weather may be on day $k$ and the table that
summarizes the probabilities are often represented as a (state) vector, $x$, and (transition)
matrix, $P$, respectively:

$$ x^{(k)} = \begin{pmatrix} \chi_s^{(k)} \\ \chi_c^{(k)} \\ \chi_r^{(k)} \end{pmatrix} \quad \text{and} \quad P = \begin{pmatrix} 0.4 & 0.3 & 0.1 \\ 0.4 & 0.3 & 0.6 \\ 0.2 & 0.4 & 0.3 \end{pmatrix}. $$

The transition from day $k$ to day $k+1$ is then written as the matrix-vector product (multiplication)

$$ \begin{pmatrix} \chi_s^{(k+1)} \\ \chi_c^{(k+1)} \\ \chi_r^{(k+1)} \end{pmatrix} = \begin{pmatrix} 0.4 & 0.3 & 0.1 \\ 0.4 & 0.3 & 0.6 \\ 0.2 & 0.4 & 0.3 \end{pmatrix} \begin{pmatrix} \chi_s^{(k)} \\ \chi_c^{(k)} \\ \chi_r^{(k)} \end{pmatrix}, $$

or $x^{(k+1)} = P x^{(k)}$, which is simply a more compact representation (way of writing) of the
equations in (1.1).
Assume again that today is cloudy so that the probability that it is sunny, cloudy, or
rainy today is 0, 1, and 0, respectively:

$$ x^{(0)} = \begin{pmatrix} \chi_s^{(0)} \\ \chi_c^{(0)} \\ \chi_r^{(0)} \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}. \qquad (1.2) $$

Then the vector of probabilities for tomorrow's weather, $x^{(1)}$, is given by

$$ \begin{pmatrix} \chi_s^{(1)} \\ \chi_c^{(1)} \\ \chi_r^{(1)} \end{pmatrix} = \begin{pmatrix} 0.4 & 0.3 & 0.1 \\ 0.4 & 0.3 & 0.6 \\ 0.2 & 0.4 & 0.3 \end{pmatrix} \begin{pmatrix} \chi_s^{(0)} \\ \chi_c^{(0)} \\ \chi_r^{(0)} \end{pmatrix} = \begin{pmatrix} 0.4 & 0.3 & 0.1 \\ 0.4 & 0.3 & 0.6 \\ 0.2 & 0.4 & 0.3 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \qquad (1.3) $$

$$ = \begin{pmatrix} 0.4 \times 0 + 0.3 \times 1 + 0.1 \times 0 \\ 0.4 \times 0 + 0.3 \times 1 + 0.6 \times 0 \\ 0.2 \times 0 + 0.4 \times 1 + 0.3 \times 0 \end{pmatrix} = \begin{pmatrix} 0.3 \\ 0.3 \\ 0.4 \end{pmatrix}. \qquad (1.4) $$
The vector of probabilities for the day after tomorrow, $x^{(2)}$, is given by

$$ \begin{pmatrix} \chi_s^{(2)} \\ \chi_c^{(2)} \\ \chi_r^{(2)} \end{pmatrix} = \begin{pmatrix} 0.4 & 0.3 & 0.1 \\ 0.4 & 0.3 & 0.6 \\ 0.2 & 0.4 & 0.3 \end{pmatrix} \begin{pmatrix} \chi_s^{(1)} \\ \chi_c^{(1)} \\ \chi_r^{(1)} \end{pmatrix} = \begin{pmatrix} 0.4 & 0.3 & 0.1 \\ 0.4 & 0.3 & 0.6 \\ 0.2 & 0.4 & 0.3 \end{pmatrix} \begin{pmatrix} 0.3 \\ 0.3 \\ 0.4 \end{pmatrix} \qquad (1.5) $$

$$ = \begin{pmatrix} 0.4 \times 0.3 + 0.3 \times 0.3 + 0.1 \times 0.4 \\ 0.4 \times 0.3 + 0.3 \times 0.3 + 0.6 \times 0.4 \\ 0.2 \times 0.3 + 0.4 \times 0.3 + 0.3 \times 0.4 \end{pmatrix} = \begin{pmatrix} 0.25 \\ 0.45 \\ 0.30 \end{pmatrix}. \qquad (1.6) $$
Repeating this process, we can find the probabilities for the weather for the next seven
days, under the assumption that today is cloudy:

 k          0     1      2       3        4         5         6         7
            0    0.3    0.25    0.265    0.2625    0.26325   0.26312   0.26316
 x^{(k)} =  1    0.3    0.45    0.415    0.4225    0.42075   0.42112   0.42104
            0    0.4    0.30    0.320    0.3150    0.31600   0.31575   0.31580
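The table above can be reproduced with a short Octave/M-script loop. The following is a
minimal sketch (not part of the original notes); the matrix P and the starting vector are the
ones given earlier in this section:

    P = [ 0.4 0.3 0.1
          0.4 0.3 0.6
          0.2 0.4 0.3 ];
    x = [ 0; 1; 0 ];          % today is cloudy: x^(0)
    for k = 1:7
      x = P * x;              % x^(k) = P x^(k-1)
      disp( x' );             % print x^(k) as a row
    end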
Exercise 1.5 Follow the instructions for this problem given on the class wiki.
For the example described in this section,

1. Recreate the above table by programming it up with Matlab or Octave, starting with
   the assumption that today is cloudy.
2. Create two similar tables starting with the assumption that today is sunny and rainy,
   respectively.
3. Compare how $x^{(7)}$ differs depending on today's weather.
4. What do you notice if you compute $x^{(k)}$ starting with today being sunny/cloudy/rainy
   and you let $k$ get large?
5. What does $x^{(\infty)}$ represent?
Alternatively, (1.5)-(1.6) could have been stated as

$$ \begin{pmatrix} \chi_s^{(2)} \\ \chi_c^{(2)} \\ \chi_r^{(2)} \end{pmatrix} = \begin{pmatrix} 0.4 & 0.3 & 0.1 \\ 0.4 & 0.3 & 0.6 \\ 0.2 & 0.4 & 0.3 \end{pmatrix} \begin{pmatrix} \chi_s^{(1)} \\ \chi_c^{(1)} \\ \chi_r^{(1)} \end{pmatrix} = \begin{pmatrix} 0.4 & 0.3 & 0.1 \\ 0.4 & 0.3 & 0.6 \\ 0.2 & 0.4 & 0.3 \end{pmatrix} \left( \begin{pmatrix} 0.4 & 0.3 & 0.1 \\ 0.4 & 0.3 & 0.6 \\ 0.2 & 0.4 & 0.3 \end{pmatrix} \begin{pmatrix} \chi_s^{(0)} \\ \chi_c^{(0)} \\ \chi_r^{(0)} \end{pmatrix} \right) = Q \begin{pmatrix} \chi_s^{(0)} \\ \chi_c^{(0)} \\ \chi_r^{(0)} \end{pmatrix}, \qquad (1.7) $$

where $Q$ is the transition matrix that tells us how the weather today predicts the weather
the day after tomorrow.
Exercise 1.6 Given Table 1.1, create the following table, which predicts the weather the
day after tomorrow given the weather today:
                            Today
                   sunny   cloudy   rainy
 Day after  sunny
 Tomorrow   cloudy
            rainy
This then tells us the entries in Q in (1.7).
1.2 Vectors

We will call a one-dimensional array of numbers a (real-valued) (column) vector:

$$ x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}, $$

where $\chi_i \in \mathbb{R}$ for $0 \leq i < n$. The set of all such vectors is denoted by $\mathbb{R}^n$. A vector has a
direction and a length:

- Its direction can be visualized by drawing an arrow from the origin to the point
  $(\chi_0, \chi_1, \ldots, \chi_{n-1})$.

- Its length is given by the Euclidean length of this arrow: The length of $x \in \mathbb{R}^n$ is given
  by $\|x\|_2 = \sqrt{\chi_0^2 + \chi_1^2 + \cdots + \chi_{n-1}^2}$, which is often called the two-norm.
A number of operations with vectors will be encountered in the course of this book. In the
remainder of this section, let $x, y, z \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$, with

$$ x = \begin{pmatrix} \chi_0 \\ \vdots \\ \chi_{n-1} \end{pmatrix}, \quad y = \begin{pmatrix} \psi_0 \\ \vdots \\ \psi_{n-1} \end{pmatrix}, \quad \text{and} \quad z = \begin{pmatrix} \zeta_0 \\ \vdots \\ \zeta_{n-1} \end{pmatrix}. $$

1.2.1 Equality (=)

Two vectors are equal if all their components are element-wise equal: $x = y$ if and only if,
for $0 \leq i < n$, $\chi_i = \psi_i$.
1.2.2 Copy (copy)
The copy operation assigns the content of one vector to another. In our mathematical
notation we will denote this by the symbol := (pronounce: becomes). An algorithm and
M-script for the copy operation are given in Figure 1.1.
Remark 1.7 Unfortunately, M-script starts indexing at one, which explains the difference
in indexing between the algorithm and the M-script implementation.
Cost: Copying one vector to another requires 2n memory operations (memops): The vector
x of length n must be read, requiring n memops, and the vector y must be written, which
accounts for the other n memops.
for i = 0, . . . , n-1
    ψ_i := χ_i
endfor

for i=1:n
    y(i) = x( i );
end

Figure 1.1: Algorithm and M-script for the vector copy operation y := x.

for i = 0, . . . , n-1
    ψ_i := α χ_i
endfor

for i=1:n
    y(i) = alpha * x( i );
end

Figure 1.2: Algorithm and M-script for the vector scale operation y := αx.

for i = 0, . . . , n-1
    ζ_i := χ_i + ψ_i
endfor

for i=1:n
    z(i) = x(i) + y( i );
end

Figure 1.3: Algorithm and M-script for the vector addition operation z := x + y.

for i = 0, . . . , n-1
    ζ_i := α χ_i + ψ_i
endfor

for i=1:n
    z(i) = alpha * x(i) + y(i);
end

Figure 1.4: Algorithm and M-script for scaled vector addition z := αx + y.

α := 0
for i = 0, . . . , n-1
    α := χ_i ψ_i + α
endfor

alpha = 0;
for i=1:n
    alpha = x(i) * y(i) + alpha;
end

Figure 1.5: Algorithm and M-script for the dot product α := x^T y.
1.2.3 Scaling (scal)

Multiplying vector $x$ by scalar $\alpha$ yields a new vector, $\alpha x$, in the same direction as $x$ but scaled
by a factor $\alpha$. Scaling a vector by $\alpha$ means each of its components, $\chi_i$, is scaled by $\alpha$:

$$ \alpha x = \alpha \begin{pmatrix} \chi_0 \\ \vdots \\ \chi_{n-1} \end{pmatrix} = \begin{pmatrix} \alpha\chi_0 \\ \vdots \\ \alpha\chi_{n-1} \end{pmatrix}. $$

An algorithm and M-script for the scal operation are given in Figure 1.2.
Cost: On a computer, real numbers are stored as floating point numbers and real arithmetic
is approximated with floating point arithmetic. Thus, we count floating point computations
(flops): a multiplication or addition each costs one flop. Scaling a vector requires n flops and
2n memops.
1.2.4 Vector addition (add)

The vector addition $x + y$ is defined by

$$ x + y = \begin{pmatrix} \chi_0 \\ \vdots \\ \chi_{n-1} \end{pmatrix} + \begin{pmatrix} \psi_0 \\ \vdots \\ \psi_{n-1} \end{pmatrix} = \begin{pmatrix} \chi_0 + \psi_0 \\ \chi_1 + \psi_1 \\ \vdots \\ \chi_{n-1} + \psi_{n-1} \end{pmatrix}. $$

In other words, the vectors are added element-wise, yielding a new vector of the same length.
An algorithm and M-script for the add operation are given in Figure 1.3.

Cost: Vector addition requires 3n memops ($x$ and $y$ are read and the resulting vector is
written) and n flops.
Exercise 1.8 Let $x, y \in \mathbb{R}^n$. Show that vector addition commutes: $x + y = y + x$.
1.2.5 Scaled vector addition (axpy)

One of the most commonly encountered operations when implementing more complex linear
algebra operations is the scaled vector addition $y := \alpha x + y$:

$$ \alpha x + y = \alpha \begin{pmatrix} \chi_0 \\ \vdots \\ \chi_{n-1} \end{pmatrix} + \begin{pmatrix} \psi_0 \\ \vdots \\ \psi_{n-1} \end{pmatrix} = \begin{pmatrix} \alpha\chi_0 + \psi_0 \\ \alpha\chi_1 + \psi_1 \\ \vdots \\ \alpha\chi_{n-1} + \psi_{n-1} \end{pmatrix}. $$

It is often referred to as the axpy operation, which stands for alpha times x plus y. An
algorithm and M-script for the axpy operation are given in Figure 1.4. We emphasize that
typically it is used in situations where the output vector overwrites the input vector $y$.

Cost: The axpy operation requires 3n memops and 2n flops. Notice that by combining
the scaling and vector addition into one operation, there is the opportunity to reduce the
number of memops that are incurred separately by the scal and add operations.
1.2.6 Dot product (dot)

The other commonly encountered operation is the dot (inner) product. It is defined by

$$ x^T y = \sum_{i=0}^{n-1} \chi_i \psi_i = \chi_0\psi_0 + \chi_1\psi_1 + \cdots + \chi_{n-1}\psi_{n-1}. $$

We have already encountered the dot product in, for example, (1.1) (the right-hand side of
each equation is a dot product) and (1.6). An algorithm and M-script for the dot operation
are given in Figure 1.5.

Cost: A dot product requires approximately 2n memops and 2n flops.
Exercise 1.9 Let $x, y \in \mathbb{R}^n$. Show that $x^T y = y^T x$.
1.2.7 Vector length (norm2)

The Euclidean length of a vector $x$ (the two-norm) is given by $\|x\|_2 = \sqrt{\sum_{i=0}^{n-1} \chi_i^2}$. Clearly
$\|x\|_2 = \sqrt{x^T x}$, so that the dot operation can be used to compute this length.

Cost: If computed with a dot product, it requires approximately n memops and 2n flops.

Remark 1.10 In practice the two-norm is typically not computed via the dot operation,
since it can cause overflow or underflow in floating point arithmetic. Instead, the vector is
first scaled to prevent such unpleasant side effects.
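A minimal M-script sketch of this idea follows (not part of the original notes). It assumes x
is a column vector; the name SLAP_Norm2 matches the table in Figure 1.6, but the scaling
strategy shown (dividing by the largest magnitude entry) is just one simple choice:

    function [ alpha ] = SLAP_Norm2( x )
    % Compute alpha = || x ||_2, scaling to avoid overflow/underflow.
      m = max( abs( x ) );
      if ( m == 0 )
        alpha = 0;
      else
        z = x / m;                     % all entries of z lie in [-1,1]
        alpha = m * sqrt( z' * z );    % ||x||_2 = m * ||x/m||_2
      end
    end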
1.2.8 A Simple Linear Algebra Package: SLAP

As part of this course, we will demonstrate how linear algebra libraries are naturally layered
by incrementally building a small library in M-script. For example, the axpy operation can
be implemented as a function as illustrated in Figure 1.7.

A couple of comments:

- In this text, we distinguish between a (column) vector and a row vector (a vector
  written as a row of numbers or, equivalently, a column vector that has been transposed).

- In M-script, every variable is viewed as a multi-dimensional array. For our purposes,
  it suffices to think of all arrays as being two dimensional. A scalar is the special case
  where the array has one row and one column. A vector is the case where the array
  consists of one row or one column. If variable x consists of one column, then x( i )
  equals the ith component of that column. If variable x consists of one row, then
  x( i ) equals the ith component of that row. Thus, in this sense, M-script is blind to
  whether a vector is a row or a column vector.

- It may seem a bit strange that we are creating functions for the mentioned operations.
  For example, M-script can compute the inner product of column vectors x and y via
  the expression x' * y and thus there is no need for the function. The reason we are
  writing explicit functions is to make the experience of building the library closer to
  how one would do it in practice with a language like the C programming language.

- We expect the reader to be able to catch on without excessive explanation. M-script
  is a very simple and intuitive language. When in doubt, try help <topic> or
  doc <topic> at the prompt.

A list of functions for performing common vector-vector operations is given in Figure 1.6.

Operation          Abbrev.   Definition    Function                        Approx. cost
                                                                           flops   memops
Copy               (copy)    y := x        y = SLAP_Copy( x )               0       2n
Vector scaling     (scal)    y := αx       y = SLAP_Scal( alpha, x )        n       2n
Adding             (add)     z := x + y    z = SLAP_Add( x, y )             n       3n
Scaled addition    (axpy)    z := αx + y   z = SLAP_Axpy( alpha, x, y )     2n      3n
Dot product        (dot)     α := x^T y    alpha = SLAP_Dot( x, y )         2n      2n
Length             (norm2)   α := ‖x‖_2    alpha = SLAP_Norm2( x )          2n      n

Figure 1.6: A summary of the more commonly encountered vector-vector routines in our
library.

Exercise 1.11 Start building your library by implementing the functions in Figure 1.6. (See
directions on the class wiki page.)
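As an illustration of the intended style, here is a minimal sketch (not from the original notes)
of one of these functions, SLAP_Dot. It mirrors the structure of SLAP_Axpy in Figure 1.7,
with the handling of row versus column vectors simplified:

    function [ alpha ] = SLAP_Dot( x, y )
    % Compute alpha = x^T y, where x and y are vectors of the same length.
      alpha = 0;
      for i=1:length( x )
        alpha = x( i ) * y( i ) + alpha;
      end
    end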
1.3 A Bit of History

The functions in Figure 1.6 are very similar in functionality to Fortran routines known as the
(level-1) Basic Linear Algebra Subprograms (BLAS) that are commonly used in scientific
libraries. These were first proposed in the 1970s and were used in the development of one of
the first linear algebra libraries, LINPACK. See

- C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, "Basic Linear Algebra Subprograms
  for Fortran Usage," ACM Trans. on Math. Soft., 5 (1979) 305-325.

- J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart, LINPACK Users' Guide,
  SIAM, Philadelphia, 1979.

function [ z ] = SLAP_Axpy( alpha, x, y )
%
% Compute z = alpha * x + y
%
% If y is a column/row vector, then
% z is a matching vector. This is
% because the operation is almost always
% used to overwrite y: y = axpy( alpha, x, y )
%
[ m_x, n_x ] = size( x );
[ m_y, n_y ] = size( y );
if ( n_x == 1 )   % x is a column
  n = m_x;
else
  n = n_x;        % x is a row
endif
if ( n_y == 1 & m_y ~= n )
  abort( 'axpy: incompatible lengths' );
elseif ( m_y == 1 & n_y ~= n )
  abort( 'axpy: incompatible lengths' );
endif
z = y;
for i=1:n
  z(i) = alpha * x( i ) + z(i);
end
return
end

octave:1> x = [
> 1
> -1
> 2
> ]
x =
  1
 -1
  2
octave:2> y = [
> 2
> 1
> 0
> ]
y =
  2
  1
  0
octave:3> axpy( -1, x, y )
ans =
  1
  2
 -2
octave:4> -1 * x + y
ans =
  1
  2
 -2

Figure 1.7: Left: A routine for computing axpy. Right: Testing the axpy function in
octave.
1.4 Matrices and Matrix-Vector Multiplication
To understand what a matrix is in the context of linear algebra, we have to start with linear
transformations, after which we can discuss how matrices and matrix-vector multiplication
represent such transformations.
1.4.1 First: Linear transformations
Definition 1.12 (Linear transformation) A function $L : \mathbb{R}^n \rightarrow \mathbb{R}^m$ is said to be a linear
transformation if for all $x, y \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$

- $L(\alpha x) = \alpha L(x)$ (Transforming a scaled vector is the same as scaling the transformed
  vector).
- $L(x + y) = L(x) + L(y)$ (The transformation is distributive).

Example 1.13 A rotation is a linear transformation. For example, let $R_\theta : \mathbb{R}^2 \rightarrow \mathbb{R}^2$ be
the transformation that rotates a vector through angle $\theta$. Let $x, y \in \mathbb{R}^2$ and $\alpha \in \mathbb{R}$.

- Figure 1.8 illustrates that scaling a vector first by $\alpha$ and then rotating it yields the
  same result as rotating the vector first and then scaling it.
- Figure 1.10 illustrates that rotating $x + y$ yields the same vector as adding the vectors
  after rotating them first.

Lemma 1.14 Let $x, y \in \mathbb{R}^n$, $\alpha, \beta \in \mathbb{R}$ and $L : \mathbb{R}^n \rightarrow \mathbb{R}^m$ be a linear transformation.
Then

$$ L(\alpha x + \beta y) = \alpha L(x) + \beta L(y). $$

Proof: $L(\alpha x + \beta y) = L(\alpha x) + L(\beta y) = \alpha L(x) + \beta L(y)$.

Lemma 1.15 Let $x_0, x_1, \ldots, x_{n-1} \in \mathbb{R}^n$ and let $L : \mathbb{R}^n \rightarrow \mathbb{R}^m$ be a linear transformation.
Then

$$ L(x_0 + x_1 + \cdots + x_{n-1}) = L(x_0) + L(x_1) + \cdots + L(x_{n-1}). \qquad (1.8) $$
Figure 1.8 (panels (a)-(f), arranged as "Scale then rotate" versus "Rotate then scale"): An
illustration that for a rotation, $R_\theta : \mathbb{R}^2 \rightarrow \mathbb{R}^2$, one can scale the input first and then apply
the transformation or one can transform the input first and then scale the result:
$R_\theta(2x) = 2R_\theta(x)$.
Figure 1.9: A generalization of Figure 1.8: $R_\theta(\alpha x) = \alpha R_\theta(x)$.
While it is tempting to say that this is simply obvious, we are going to prove this rigorously.
Whenever one tries to prove a result for a general n, where n is a natural number, one often
uses a proof by induction. We are going to give the proof first, and then we will try to
explain it.
Proof: Proof by induction on n.

Base case: n = 1. For this case, we must show that $L(x_0) = L(x_0)$. This is trivially true.

Inductive step: Inductive Hypothesis (IH): Assume that the result is true for n = k where
k ≥ 1:

$$ L(x_0 + x_1 + \cdots + x_{k-1}) = L(x_0) + L(x_1) + \cdots + L(x_{k-1}). $$

We will show that the result is then also true for n = k + 1:

$$ L(x_0 + x_1 + \cdots + x_k) = L(x_0) + L(x_1) + \cdots + L(x_k). $$

Assume that k ≥ 1. Then

  $L(x_0 + x_1 + \cdots + x_k)$
    = (expose extra term)
  $L(x_0 + x_1 + \cdots + x_{k-1} + x_k)$
    = (associativity of vector addition)
  $L((x_0 + x_1 + \cdots + x_{k-1}) + x_k)$
    = (Lemma 1.14)
  $L(x_0 + x_1 + \cdots + x_{k-1}) + L(x_k)$
    = (Inductive Hypothesis)
  $L(x_0) + L(x_1) + \cdots + L(x_{k-1}) + L(x_k)$.

By the Principle of Mathematical Induction the result holds for all n.
Figure 1.10 (panels (a)-(f), arranged as "Add then rotate" versus "Rotate then add"): An
illustration that for a rotation, $R_\theta : \mathbb{R}^2 \rightarrow \mathbb{R}^2$, one can add vectors first and then apply the
transformation or one can transform the vectors first and then add the results:
$R_\theta(x + y) = R_\theta(x) + R_\theta(y)$.
The idea is as follows: The base case shows that the result is true for n = 1: $L(x_0) = L(x_0)$.
The inductive step shows that if the result is true for n = 1, then the result is true for
n = 1 + 1 = 2 so that $L(x_0 + x_1) = L(x_0) + L(x_1)$. Since the result is indeed true for
n = 1 (as proven by the base case) we now know that the result is also true for n = 2. The
inductive step also implies that if the result is true for n = 2, then it is also true for n = 3.
Since we just reasoned that it is true for n = 2, we now know it is also true for n = 3:
$L(x_0 + x_1 + x_2) = L(x_0) + L(x_1) + L(x_2)$. And so forth.
Remark 1.16 The Principle of Mathematical Induction says that

- if one can show that a property holds for $n = n_b$, and
- one can show that if it holds for $n = k$, where $k \geq n_b$, then it also holds for $n = k + 1$,

then one can conclude that the property holds for all $n \geq n_b$.

In the above example (and quite often) $n_b = 1$.
Example 1.17 Show that $\sum_{i=0}^{n-1} i = n(n-1)/2$.

Proof by induction:

Base case: n = 1. For this case, we must show that $\sum_{i=0}^{1-1} i = 1(0)/2$.

  $\sum_{i=0}^{1-1} i$
    = (definition of summation)
  $0$
    = (arithmetic)
  $1(0)/2$

This proves the base case.

Inductive step: Inductive Hypothesis (IH): Assume that the result is true for n = k where
k ≥ 1:

$$ \sum_{i=0}^{k-1} i = k(k-1)/2. $$

We will show that the result is then also true for n = k + 1:

$$ \sum_{i=0}^{(k+1)-1} i = (k+1)((k+1)-1)/2. $$

Assume that k ≥ 1. Then

  $\sum_{i=0}^{(k+1)-1} i$
    = (arithmetic)
  $\sum_{i=0}^{k} i$
    = (split off last term)
  $\sum_{i=0}^{k-1} i + k$
    = (Inductive Hypothesis)
  $k(k-1)/2 + k$
    = (algebra)
  $(k^2 - k)/2 + 2k/2$
    = (algebra)
  $(k^2 + k)/2$
    = (algebra)
  $(k+1)k/2$
    = (arithmetic)
  $(k+1)((k+1)-1)/2$.

This proves the inductive step.

By the Principle of Mathematical Induction the result holds for all n.
As we become more proficient, we will start combining steps. For now, we give lots of detail
to make sure everyone stays on board.
Exercise 1.18 Use mathematical induction to prove that $\sum_{i=0}^{n-1} i^2 = (n-1)n(2n-1)/6$.
Theorem 1.19 Let $x_0, x_1, \ldots, x_{n-1} \in \mathbb{R}^n$, $\alpha_0, \alpha_1, \ldots, \alpha_{n-1} \in \mathbb{R}$, and let $L : \mathbb{R}^n \rightarrow \mathbb{R}^m$
be a linear transformation. Then

$$ L(\alpha_0 x_0 + \alpha_1 x_1 + \cdots + \alpha_{n-1} x_{n-1}) = \alpha_0 L(x_0) + \alpha_1 L(x_1) + \cdots + \alpha_{n-1} L(x_{n-1}). \qquad (1.9) $$

Proof:

  $L(\alpha_0 x_0 + \alpha_1 x_1 + \cdots + \alpha_{n-1} x_{n-1})$
    = (Lemma 1.15)
  $L(\alpha_0 x_0) + L(\alpha_1 x_1) + \cdots + L(\alpha_{n-1} x_{n-1})$
    = (Definition of linear transformation)
  $\alpha_0 L(x_0) + \alpha_1 L(x_1) + \cdots + \alpha_{n-1} L(x_{n-1})$.
Example 1.20 The transformation $F\left( \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix} \right) = \begin{pmatrix} \chi_0 + \chi_1 \\ \chi_0 \end{pmatrix}$ is a linear transformation.
The way we prove this is to pick arbitrary $\alpha \in \mathbb{R}$, $x = \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix}$, and $y = \begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix}$ for
which we then show that $F(\alpha x) = \alpha F(x)$ and $F(x + y) = F(x) + F(y)$:

Show $F(\alpha x) = \alpha F(x)$:

$$ F(\alpha x) = F\left( \alpha \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix} \right) = F\left( \begin{pmatrix} \alpha\chi_0 \\ \alpha\chi_1 \end{pmatrix} \right) = \begin{pmatrix} \alpha\chi_0 + \alpha\chi_1 \\ \alpha\chi_0 \end{pmatrix} = \begin{pmatrix} \alpha(\chi_0 + \chi_1) \\ \alpha\chi_0 \end{pmatrix} = \alpha \begin{pmatrix} \chi_0 + \chi_1 \\ \chi_0 \end{pmatrix} = \alpha F\left( \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix} \right) = \alpha F(x). $$

Show $F(x + y) = F(x) + F(y)$:

$$ F(x + y) = F\left( \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix} + \begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix} \right) = F\left( \begin{pmatrix} \chi_0 + \psi_0 \\ \chi_1 + \psi_1 \end{pmatrix} \right) = \begin{pmatrix} (\chi_0 + \psi_0) + (\chi_1 + \psi_1) \\ \chi_0 + \psi_0 \end{pmatrix} = \begin{pmatrix} (\chi_0 + \chi_1) + (\psi_0 + \psi_1) \\ \chi_0 + \psi_0 \end{pmatrix} $$

$$ = \begin{pmatrix} \chi_0 + \chi_1 \\ \chi_0 \end{pmatrix} + \begin{pmatrix} \psi_0 + \psi_1 \\ \psi_0 \end{pmatrix} = F\left( \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix} \right) + F\left( \begin{pmatrix} \psi_0 \\ \psi_1 \end{pmatrix} \right) = F(x) + F(y). $$
Example 1.21 The transformation $F\left( \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix} \right) = \begin{pmatrix} \chi_0 + \chi_1 \\ \chi_0 + 1 \end{pmatrix}$ is not a linear transformation.

Recall that in order for it to be a linear transformation, $F(\alpha x) = \alpha F(x)$. This means
that $F(0) = 0$. (Pick $\alpha = 0$. Here 0 denotes the vector of all zeroes.) Notice that for this
particular F,

$$ F\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix} \right) = \begin{pmatrix} 0 \\ 1 \end{pmatrix} $$

and thus this cannot be a linear transformation.
Remark 1.22 Always check what happens if you plug in the zero vector. However, if
F(0) = 0, this does not mean that the transformation is a linear transformation. It merely
means it might be.
Example 1.23 The transformation $F\left( \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix} \right) = \begin{pmatrix} \chi_0 \chi_1 \\ \chi_0 \end{pmatrix}$ is not a linear transformation.

The first check should be whether $F(0) = 0$. The answer in this case is yes. However,

$$ F\left( 2 \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right) = F\left( \begin{pmatrix} 2 \\ 2 \end{pmatrix} \right) = \begin{pmatrix} 2 \cdot 2 \\ 2 \end{pmatrix} = \begin{pmatrix} 4 \\ 2 \end{pmatrix} \neq 2 \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 2 F\left( \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right). $$

Hence, there is a vector $x \in \mathbb{R}^2$ and $\alpha \in \mathbb{R}$ such that $F(\alpha x) \neq \alpha F(x)$. This means that F is
not a linear transformation.
Exercise 1.24 For each of the following, determine whether it is a linear transformation or
not:

- $F\left( \begin{pmatrix} \chi_0 \\ \chi_1 \\ \chi_2 \end{pmatrix} \right) = \begin{pmatrix} \chi_0 \\ 0 \\ \chi_2 \end{pmatrix}$.

- $F\left( \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix} \right) = \begin{pmatrix} \chi_0^2 \\ 0 \end{pmatrix}$.
1.4.2 From linear transformation to matrix-vector multiplication
Now we are ready to link linear transformations to matrices and matrix-vector multiplication.
The definitions of vector scaling and addition mean that any $x \in \mathbb{R}^n$ can be written as

$$ x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix} = \chi_0 \underbrace{\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}}_{e_0} + \chi_1 \underbrace{\begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}}_{e_1} + \cdots + \chi_{n-1} \underbrace{\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}}_{e_{n-1}} = \sum_{j=0}^{n-1} \chi_j e_j. $$
Definition 1.25 The vectors $e_j \in \mathbb{R}^n$ mentioned above are called the unit basis vectors
and the notation $e_j$ is reserved for these vectors.
Exercise 1.26 Let $x, e_i \in \mathbb{R}^n$. Show that $e_i^T x = x^T e_i = \chi_i$ (the ith element of x).
Now, let $L : \mathbb{R}^n \rightarrow \mathbb{R}^m$ be a linear transformation. Then, given $x \in \mathbb{R}^n$, the result of
$y = L(x)$ is a vector in $\mathbb{R}^m$. But then

$$ y = L(x) = L\left( \sum_{j=0}^{n-1} \chi_j e_j \right) = \sum_{j=0}^{n-1} \chi_j L(e_j) = \sum_{j=0}^{n-1} \chi_j a_j, $$

where we let $a_j = L(e_j)$. The linear transformation L is completely described by
these vectors $a_0, \ldots, a_{n-1}$: Given any vector x, $L(x) = \sum_{j=0}^{n-1} \chi_j a_j$. By arranging these
vectors as the columns of a two-dimensional array, the matrix A, we arrive at the observation
that the matrix is simply a representation of the corresponding linear transformation L. If
we let

$$ A = \begin{pmatrix} \alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,n-1} \\ \alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,n-1} \\ \vdots & \vdots & & \vdots \\ \alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,n-1} \end{pmatrix} = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}, $$

so that $\alpha_{i,j}$ equals the ith component of vector $a_j$, then

$$ Ax = L(x) = \chi_0 a_0 + \chi_1 a_1 + \cdots + \chi_{n-1} a_{n-1} = \chi_0 \begin{pmatrix} \alpha_{0,0} \\ \alpha_{1,0} \\ \vdots \\ \alpha_{m-1,0} \end{pmatrix} + \chi_1 \begin{pmatrix} \alpha_{0,1} \\ \alpha_{1,1} \\ \vdots \\ \alpha_{m-1,1} \end{pmatrix} + \cdots + \chi_{n-1} \begin{pmatrix} \alpha_{0,n-1} \\ \alpha_{1,n-1} \\ \vdots \\ \alpha_{m-1,n-1} \end{pmatrix} $$

$$ = \begin{pmatrix} \alpha_{0,0}\chi_0 + \alpha_{0,1}\chi_1 + \cdots + \alpha_{0,n-1}\chi_{n-1} \\ \alpha_{1,0}\chi_0 + \alpha_{1,1}\chi_1 + \cdots + \alpha_{1,n-1}\chi_{n-1} \\ \vdots \\ \alpha_{m-1,0}\chi_0 + \alpha_{m-1,1}\chi_1 + \cdots + \alpha_{m-1,n-1}\chi_{n-1} \end{pmatrix}. $$
Definition 1.27 ($\mathbb{R}^{m \times n}$) The set of all $m \times n$ real valued matrices is denoted by $\mathbb{R}^{m \times n}$.
Thus, $A \in \mathbb{R}^{m \times n}$ means that A is an $m \times n$ real valued matrix.
Remark 1.28 We adopt the convention that Roman lowercase letters are used for vector
variables and Greek lowercase letters for scalars. Also, the elements of a vector are denoted
by the Greek letter that corresponds to the Roman letter used to denote the vector. A
table of corresponding letters is given in Figure 1.11.
Matrix   Vector   Scalar   LaTeX code / name      Note
A        a        α        \alpha    alpha
B        b        β        \beta     beta
C        c        γ        \gamma    gamma
D        d        δ        \delta    delta
E        e        ε        \epsilon  epsilon      e_j = jth unit basis vector.
F        f        φ        \phi      phi
G        g        ξ        \xi       xi
H        h        η        \eta      eta
I                                                 Used for identity matrix.
K        k        κ        \kappa    kappa
L        l        λ        \lambda   lambda
M        m        μ        \mu       mu           m(·) = row dimension.
N        n        ν        \nu       nu           ν is shared with V. n(·) = column dimension.
P        p        π        \pi       pi
Q        q        θ        \theta    theta
R        r        ρ        \rho      rho
S        s        σ        \sigma    sigma
T        t        τ        \tau      tau
U        u        υ        \upsilon  upsilon
V        v        ν        \nu       nu           shared with N.
W        w        ω        \omega    omega
X        x        χ        \chi      chi
Y        y        ψ        \psi      psi
Z        z        ζ        \zeta     zeta

Figure 1.11: Correspondence between letters used for matrices (uppercase Roman)/vectors
(lowercase Roman) and the symbols used to denote their scalar entries (lowercase Greek
letters).
Definition 1.29 (Matrix-vector multiplication)

$$ \begin{pmatrix} \alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,n-1} \\ \alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,n-1} \\ \vdots & \vdots & & \vdots \\ \alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,n-1} \end{pmatrix} \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix} = \begin{pmatrix} \alpha_{0,0}\chi_0 + \alpha_{0,1}\chi_1 + \cdots + \alpha_{0,n-1}\chi_{n-1} \\ \alpha_{1,0}\chi_0 + \alpha_{1,1}\chi_1 + \cdots + \alpha_{1,n-1}\chi_{n-1} \\ \vdots \\ \alpha_{m-1,0}\chi_0 + \alpha_{m-1,1}\chi_1 + \cdots + \alpha_{m-1,n-1}\chi_{n-1} \end{pmatrix}. \qquad (1.10) $$
Example 1.30 Compute Ax when $A = \begin{pmatrix} 1 & 0 & 2 \\ 3 & 1 & 1 \\ 2 & 1 & 2 \end{pmatrix}$ and $x = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$.

Answer: $\begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}$, the first column of the matrix!
It is not hard to see that if $e_j$ is the jth unit basis vector (as defined in Definition 1.25),
then

$$ A e_j = \begin{pmatrix} a_0 & a_1 & \cdots & a_j & \cdots & a_{n-1} \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} = 0 \cdot a_0 + 0 \cdot a_1 + \cdots + 1 \cdot a_j + \cdots + 0 \cdot a_{n-1} = a_j. $$

Also, given a vector x, the dot product $e_i^T x$ equals the ith entry in x, $\chi_i$:

$$ e_i^T x = \begin{pmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} \chi_0 \\ \vdots \\ \chi_i \\ \vdots \\ \chi_{n-1} \end{pmatrix} = 0 \cdot \chi_0 + \cdots + 1 \cdot \chi_i + \cdots + 0 \cdot \chi_{n-1} = \chi_i. $$
Figure 1.12 (panels (a), (b), (c)): Computing the matrix that represents rotation $R_\theta$.
(a)-(b) Trigonometry tells us the coefficients of $R_\theta(e_0)$ and $R_\theta(e_1)$. This then motivates
the matrix that is given in (c), so that the rotation can be written as a matrix-vector
multiplication.
Example 1.31

$$ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}^T \left( \begin{pmatrix} 1 & 0 & 2 \\ 3 & 1 & 1 \\ 2 & 1 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \right) = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}^T \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} = 2, $$

the (2, 0) element of the matrix. We notice that $\alpha_{i,j} = e_i^T (A e_j)$.

Later, we will see that $e_i^T A$ equals the ith row of matrix A and that
$\alpha_{i,j} = e_i^T (A e_j) = e_i^T A e_j = (e_i^T A) e_j$.
Example 1.32 Recall that a rotation $R_\theta : \mathbb{R}^2 \rightarrow \mathbb{R}^2$ is a linear transformation. We now
show how to compute the matrix, Q, that represents this rotation.

- Given that the transformation is from $\mathbb{R}^2$ to $\mathbb{R}^2$, we know that the matrix will be a
  2 x 2 matrix: it will take vectors of size two as input and will produce vectors of size
  two.

- We have learned that the first column of the matrix Q will equal $R_\theta(e_0)$ and the second
  column will equal $R_\theta(e_1)$.

- In Figure 1.12 we motivate that

  $$ R_\theta(e_0) = \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix} \quad \text{and} \quad R_\theta(e_1) = \begin{pmatrix} -\sin(\theta) \\ \cos(\theta) \end{pmatrix}. $$

We conclude that

$$ Q = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}. $$

This means that a vector $x = \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix}$ is transformed into

$$ R_\theta(x) = Qx = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix} = \begin{pmatrix} \cos(\theta)\chi_0 - \sin(\theta)\chi_1 \\ \sin(\theta)\chi_0 + \cos(\theta)\chi_1 \end{pmatrix}. $$

This is a formula you may have seen in a precalculus or physics course when discussing change
of coordinates, except with the coordinates $\chi_0$ and $\chi_1$ replaced by x and y, respectively.
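In Octave/M-script the rotation matrix can be checked directly. The following is a minimal
sketch (not part of the original notes); theta is an arbitrary angle:

    theta = pi/6;                          % rotate by 30 degrees
    Q = [ cos( theta ) -sin( theta )
          sin( theta )  cos( theta ) ];
    x = [ 1; 0 ];                          % e_0
    disp( Q * x );                         % equals [ cos(theta); sin(theta) ]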
Example 1.33 The transformation $F\left( \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix} \right) = \begin{pmatrix} \chi_0 + \chi_1 \\ \chi_0 \end{pmatrix}$ was shown in Example
1.20 to be a linear transformation. Let us give an alternative proof of this: we will
compute a possible matrix, A, that represents this linear transformation. We will then show
that $F(x) = Ax$, which then means that F is a linear transformation.

To compute a possible matrix consider:

$$ F\left( \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right) = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad F\left( \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. $$

Thus, if F is a linear transformation, then $F(x) = Ax$ where $A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$. Now,

$$ Ax = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix} = \begin{pmatrix} \chi_0 + \chi_1 \\ \chi_0 \end{pmatrix} = F\left( \begin{pmatrix} \chi_0 \\ \chi_1 \end{pmatrix} \right) = F(x), $$

which finishes the proof that F is a linear transformation.
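The same strategy can be tried numerically: build the candidate matrix from F(e_0) and
F(e_1) and compare F(x) with Ax for a few vectors x. A minimal Octave sketch (not part
of the original notes), using the F from Example 1.33 as an illustration:

    F = @(x) [ x(1) + x(2); x(1) ];         % the transformation from Example 1.33
    A = [ F( [1;0] ) F( [0;1] ) ];          % candidate matrix: columns F(e_0), F(e_1)
    x = [ -2; 3 ];                          % any test vector
    disp( F( x ) - A * x );                 % zero vector if A represents F at x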
Exercise 1.34 Show that the transformation in Example 1.21 is not a linear transformation
by computing a possible matrix that represents it, and then showing that it does not represent
it.
Exercise 1.35 Show that the transformation in Example 1.23 is not a linear transformation
by computing a possible matrix that represents it, and then showing that it does not represent
it.
Exercise 1.36 For each of the transformations in Exercise 1.24 compute a possible matrix
that represents it and use it to show whether the transformation is linear.
1.4.3 Special Matrices

The identity matrix. Let $L_I : \mathbb{R}^n \rightarrow \mathbb{R}^n$ be the function defined for every $x \in \mathbb{R}^n$ as
$L_I(x) = x$. Clearly $L_I(\alpha x + \beta y) = \alpha x + \beta y = \alpha L_I(x) + \beta L_I(y)$, so that we recognize it
as a linear transformation. We will denote the matrix that represents $L_I$ by the letter I
and call it the identity matrix. By the definition of a matrix, the jth column of I is given by
$L_I(e_j) = e_j$. Thus, the identity matrix is given by

$$ I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}. $$

Notice that clearly $Ix = x$ and a column partitioning of I yields $I = \begin{pmatrix} e_0 & e_1 & \cdots & e_{n-1} \end{pmatrix}$.

In M-script, an n x n identity matrix is generated by the command eye( n ).
Diagonal matrices

Definition 1.37 (Diagonal matrix) A matrix $A \in \mathbb{R}^{n \times n}$ is said to be diagonal if $\alpha_{i,j} = 0$
for all $i \neq j$.
Example 1.38 Let $A = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{pmatrix}$ and $x = \begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix}$. Then

$$ Ax = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} (3)(2) \\ (1)(1) \\ (-2)(2) \end{pmatrix} = \begin{pmatrix} 6 \\ 1 \\ -4 \end{pmatrix}. $$

We notice that a diagonal matrix scales individual components of a vector by the corresponding
diagonal element of the matrix.
Exercise 1.39 Let $D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{pmatrix}$. What linear transformation, L, does this matrix
represent? In particular, answer the following questions:

- $L : \mathbb{R}^? \rightarrow \mathbb{R}^{??}$. Give ? and ??.

- A linear transformation can be described by how it transforms the unit basis vectors:

  $L(e_0) = \qquad L(e_1) = \qquad L(e_2) = $

- $L\left( \begin{pmatrix} \chi_0 \\ \chi_1 \\ \chi_2 \end{pmatrix} \right) = $
In M-script, an n x n diagonal matrix with indicated components results from the function
call diag( x ).
Example 1.40
> x = [ 1 2 3 ]
> A = diag( x )
A =
1 0 0
0 2 0
0 0 3
Here x can be either a row or a column vector. When A is already a matrix, this same
function returns the diagonal of A as a column vector:
1.4. Matrices and Matrix-Vector Multiplication 27
> y = diag( A )
y =
1
2
3
In linear algebra an element-wise vector-vector product is not a meaningful operation:
when $x, y \in \mathbb{R}^n$ the product xy has no meaning. However, in M-script, if x and y are
both column vectors (or matrices for that matter), the operation x .* y is an element-wise
multiplication. Thus, diag( x ) * y can be alternatively written as x .* y.
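A quick check of this equivalence in Octave (a minimal sketch, not part of the original notes):

    x = [ 1; 2; 3 ];
    y = [ 4; 5; 6 ];
    disp( diag( x ) * y );    % matrix-vector product with a diagonal matrix
    disp( x .* y );           % element-wise product gives the same result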
Triangular matrices

Definition 1.41 (Triangular matrix) A matrix $A \in \mathbb{R}^{n \times n}$ is said to be

- lower triangular if $\alpha_{i,j} = 0$ for all $i < j$;
- strictly lower triangular if $\alpha_{i,j} = 0$ for all $i \leq j$;
- unit lower triangular if $\alpha_{i,j} = 0$ for all $i < j$ and $\alpha_{i,j} = 1$ if $i = j$;
- upper triangular if $\alpha_{i,j} = 0$ for all $i > j$;
- strictly upper triangular if $\alpha_{i,j} = 0$ for all $i \geq j$;
- unit upper triangular if $\alpha_{i,j} = 0$ for all $i > j$ and $\alpha_{i,j} = 1$ if $i = j$.

If a matrix is either lower or upper triangular, it is said to be triangular.
Exercise 1.42 Give examples for each of the triangular matrices in Denition 1.41.
Exercise 1.43 Show that a matrix that is both lower and upper triangular is in fact a
diagonal matrix.
In M-script, given an n x n matrix A, its lower and upper triangular parts can be extracted
by the calls tril( A ) and triu( A ), respectively. Its strictly lower and strictly upper
triangular parts can be extracted by the calls tril( A, -1 ) and triu( A, 1 ), respectively.
Exercise 1.44 Add the functions trilu( A ) and triuu( A ) to your SLAP library. These
functions return the lower and upper triangular part of A, respectively, with the diagonal set
to ones. Thus,
> A = [
1 -2 1
-1 2 0
2 3 3
];
> trilu( A )
ans =
1 0 0
-1 1 0
2 3 1
Hint: use the tril() and eye() functions. You will also want to use the size() function
to extract the dimensions of A, to pass in to eye().
Transpose matrix

Definition 1.45 (Transpose matrix) Let $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times m}$. Then B is said to
be the transpose of A if, for $0 \leq i < m$ and $0 \leq j < n$, $\beta_{j,i} = \alpha_{i,j}$. The transpose of a
matrix A is denoted by $A^T$.

We have already used $\cdot^T$ to indicate a row vector, which is consistent with the above
definition.

Rather than supplying our own function for transposing a matrix (or vector), we
will rely on the M-script operation: B = A' creates a matrix that equals the transpose of
matrix A. There is a reason for this: rarely do we explicitly transpose a matrix. Instead, we
will try to always write our routines so that the operation is computed as if the matrix is
transposed, but without performing the transpose so that we do not pay the overhead (in
time and memory) associated with the transpose.
Example 1.46 Let $A = \begin{pmatrix} 1 & 0 & 2 & 1 \\ 2 & 1 & 1 & 2 \\ 3 & 1 & 1 & 3 \end{pmatrix}$ and $x = \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix}$. Then

$$ A^T = \begin{pmatrix} 1 & 0 & 2 & 1 \\ 2 & 1 & 1 & 2 \\ 3 & 1 & 1 & 3 \end{pmatrix}^T = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 2 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix} \quad \text{and} \quad x^T = \begin{pmatrix} 1 \\ 2 \\ 4 \end{pmatrix}^T = \begin{pmatrix} 1 & 2 & 4 \end{pmatrix}. $$
Remark 1.47 Clearly, $(A^T)^T = A$.

Symmetric matrices

Definition 1.48 (Symmetric matrix) A matrix $A \in \mathbb{R}^{n \times n}$ is said to be symmetric if
$A = A^T$.

Exercise 1.49 Show that a triangular matrix that is also symmetric is in fact a diagonal
matrix.
1.4.4 Computing the matrix-vector multiply

The output of a matrix-vector multiply is itself a vector: If $A \in \mathbb{R}^{m \times n}$ and $x \in \mathbb{R}^n$, then
$y = Ax$ is a vector of length m ($y \in \mathbb{R}^m$). We now discuss different ways in which to compute
this operation.

Via dot products: If one looks at a typical row in (1.10),

$$ \alpha_{i,0}\chi_0 + \alpha_{i,1}\chi_1 + \cdots + \alpha_{i,n-1}\chi_{n-1}, $$

one notices that this is just the dot product of the vectors

$$ a_i = \begin{pmatrix} \alpha_{i,0} \\ \alpha_{i,1} \\ \vdots \\ \alpha_{i,n-1} \end{pmatrix} \quad \text{and} \quad x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}; $$

in other words, the dot product of the ith row of A, viewed as a column vector, with the
vector x, which one can visualize as

$$ \begin{pmatrix} \psi_0 \\ \vdots \\ \psi_i \\ \vdots \\ \psi_{m-1} \end{pmatrix} = \begin{pmatrix} \alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,n-1} \\ \vdots & \vdots & & \vdots \\ \alpha_{i,0} & \alpha_{i,1} & \cdots & \alpha_{i,n-1} \\ \vdots & \vdots & & \vdots \\ \alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,n-1} \end{pmatrix} \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}. $$
Example 1.50 Let $A = \begin{pmatrix} 1 & 0 & 2 \\ 2 & 1 & -1 \\ 3 & -1 & 1 \end{pmatrix}$ and $x = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}$. Then

$$ Ax = \begin{pmatrix} 1 & 0 & 2 \\ 2 & 1 & -1 \\ 3 & -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}^T \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \\ \begin{pmatrix} 2 \\ 1 \\ -1 \end{pmatrix}^T \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \\ \begin{pmatrix} 3 \\ -1 \\ 1 \end{pmatrix}^T \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \end{pmatrix} = \begin{pmatrix} (1)(1) + (0)(2) + (2)(1) \\ (2)(1) + (1)(2) + (-1)(1) \\ (3)(1) + (-1)(2) + (1)(1) \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \\ 2 \end{pmatrix}. $$
The algorithm for computing y := Ax +y is given in Figure 1.13(left). To its right is the
M-script function that implements the matrix-vector multiplication. If initially y = 0, then
these compute y := Ax.
Now, let us revisit the fact that the matrix-vector multiply can be computed as dot
products of the rows of A with the vector x. Think of the matrix A as individual rows:

$$ A = \begin{pmatrix} a_0^T \\ a_1^T \\ \vdots \\ a_{m-1}^T \end{pmatrix}, $$

where $a_i$ is the (column) vector which, when transposed, becomes the ith row of the matrix.
Then

$$ Ax = \begin{pmatrix} a_0^T \\ a_1^T \\ \vdots \\ a_{m-1}^T \end{pmatrix} x = \begin{pmatrix} a_0^T x \\ a_1^T x \\ \vdots \\ a_{m-1}^T x \end{pmatrix}, $$
for i = 0, . . . , m-1
    for j = 0, . . . , n-1
        ψ_i := ψ_i + α_{i,j} χ_j
    endfor
endfor

for i=1:m
    for j=1:n
        y( i ) = y( i ) + A( i,j ) * x( j );
    end
end

Figure 1.13: Algorithm and M-script for computing y := Ax + y.

for i = 0, . . . , m-1
    ψ_i := ψ_i + a_i^T x
endfor

for i=1:m
    y( i ) = y( i ) + FLA_Dot( A( i,: ), x );
end

Figure 1.14: Algorithm and M-script for computing y := Ax + y via dot products. A( i,: )
equals the ith row of array A.

for j = 0, . . . , n-1
    for i = 0, . . . , m-1
        ψ_i := ψ_i + α_{i,j} χ_j
    endfor
endfor

for j=1:n
    for i=1:m
        y( i ) = y( i ) + A( i,j ) * x( j );
    end
end

Figure 1.15: Algorithm and M-script for computing y := Ax + y. This is exactly the algorithm
in Figure 1.13, except with the order of the loops interchanged.

for j = 0, . . . , n-1
    y := χ_j a_j + y
endfor

for j=1:n
    y = FLA_Axpy( x( j ), A( :, j ), y );
end

Figure 1.16: Algorithm and M-script for computing y := Ax + y via axpy operations.
A( :,j ) equals the jth column of array A.
which is exactly what we reasoned before. An algorithm and corresponding M-script
implementation that exploit this insight are given in Figure 1.14.
Via axpy operations: Next, we note that, by definition,

$$ \begin{pmatrix} \alpha_{0,0}\chi_0 + \alpha_{0,1}\chi_1 + \cdots + \alpha_{0,n-1}\chi_{n-1} \\ \alpha_{1,0}\chi_0 + \alpha_{1,1}\chi_1 + \cdots + \alpha_{1,n-1}\chi_{n-1} \\ \vdots \\ \alpha_{m-1,0}\chi_0 + \alpha_{m-1,1}\chi_1 + \cdots + \alpha_{m-1,n-1}\chi_{n-1} \end{pmatrix} = \chi_0 \begin{pmatrix} \alpha_{0,0} \\ \alpha_{1,0} \\ \vdots \\ \alpha_{m-1,0} \end{pmatrix} + \chi_1 \begin{pmatrix} \alpha_{0,1} \\ \alpha_{1,1} \\ \vdots \\ \alpha_{m-1,1} \end{pmatrix} + \cdots + \chi_{n-1} \begin{pmatrix} \alpha_{0,n-1} \\ \alpha_{1,n-1} \\ \vdots \\ \alpha_{m-1,n-1} \end{pmatrix}. $$

This suggests the alternative algorithm and M-script for computing y := Ax + y given
in Figure 1.15, which are exactly the algorithm and M-script given in Figure 1.13 but
with the two loops interchanged.

If we let $a_j$ denote the vector that equals the jth column of A, then
$A = \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix}$ and

$$ Ax = \chi_0 \underbrace{\begin{pmatrix} \alpha_{0,0} \\ \alpha_{1,0} \\ \vdots \\ \alpha_{m-1,0} \end{pmatrix}}_{a_0} + \chi_1 \underbrace{\begin{pmatrix} \alpha_{0,1} \\ \alpha_{1,1} \\ \vdots \\ \alpha_{m-1,1} \end{pmatrix}}_{a_1} + \cdots + \chi_{n-1} \underbrace{\begin{pmatrix} \alpha_{0,n-1} \\ \alpha_{1,n-1} \\ \vdots \\ \alpha_{m-1,n-1} \end{pmatrix}}_{a_{n-1}} = \chi_0 a_0 + \chi_1 a_1 + \cdots + \chi_{n-1} a_{n-1}. $$

This suggests the algorithm and M-script for computing y := Ax + y given in Figure 1.16. It
illustrates how matrix-vector multiplication can be expressed in terms of axpy operations.
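A minimal M-script sketch of this column-wise (axpy-based) formulation, written with a
plain loop rather than the SLAP/FLA functions used in Figure 1.16 (this function is an
illustration, not part of the original library):

    function [ y ] = gemv_via_axpy( A, x, y )
    % Compute y := A x + y by accumulating axpy updates with the columns of A.
      [ m, n ] = size( A );
      for j=1:n
        y = x( j ) * A( :, j ) + y;   % axpy: y := chi_j * a_j + y
      end
    end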
Example 1.51 Let $A = \begin{pmatrix} 1 & 0 & 2 \\ 2 & 1 & -1 \\ 3 & -1 & 1 \end{pmatrix}$ and $x = \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}$. Then

$$ Ax = \begin{pmatrix} 1 & 0 & 2 \\ 2 & 1 & -1 \\ 3 & -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} = (1) \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + (2) \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix} + (1) \begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix} = \begin{pmatrix} (1)(1) \\ (1)(2) \\ (1)(3) \end{pmatrix} + \begin{pmatrix} (2)(0) \\ (2)(1) \\ (2)(-1) \end{pmatrix} + \begin{pmatrix} (1)(2) \\ (1)(-1) \\ (1)(1) \end{pmatrix} $$

$$ = \begin{pmatrix} (1)(1) + (0)(2) + (2)(1) \\ (2)(1) + (1)(2) + (-1)(1) \\ (3)(1) + (-1)(2) + (1)(1) \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \\ 2 \end{pmatrix}. $$
1.4.5 Cost of matrix-vector multiplication

Computing y := Ax + y, where $A \in \mathbb{R}^{m \times n}$, requires mn multiplies and mn adds, for a total
of 2mn floating point operations (flops). This count is the same regardless of the order of
the loops (i.e., regardless of whether the matrix-vector multiply is organized by computing
dot operations with the rows or axpy operations with the columns).
1.4.6 Scaling and adding matrices

Theorem 1.52 Let $L_A : \mathbb{R}^n \rightarrow \mathbb{R}^m$ be a linear transformation and, for all $x \in \mathbb{R}^n$, define
the function $L_B : \mathbb{R}^n \rightarrow \mathbb{R}^m$ by $L_B(x) = \beta L_A(x)$. Then $L_B(x)$ is a linear transformation.

Proof: Let $x, y \in \mathbb{R}^n$ and $\alpha, \gamma \in \mathbb{R}$. Then

$$ L_B(\alpha x + \gamma y) = \beta L_A(\alpha x + \gamma y) = \beta(\alpha L_A(x) + \gamma L_A(y)) = (\beta\alpha) L_A(x) + (\beta\gamma) L_A(y) $$
$$ = (\alpha\beta) L_A(x) + (\gamma\beta) L_A(y) = \alpha(\beta L_A(x)) + \gamma(\beta L_A(y)) = \alpha L_B(x) + \gamma L_B(y). $$

Hence $L_B$ is a linear transformation.
Now, let A be the matrix that represents $L_A$. Then, for all $x \in \mathbb{R}^n$, $\beta(Ax) = \beta L_A(x) = L_B(x)$.
Since $L_B$ is a linear transformation, there should be a matrix B such that, for all
$x \in \mathbb{R}^n$, $Bx = L_B(x) = \beta(Ax)$. One way to find how that matrix relates to $\beta$ and A is to
recall that $b_j = B e_j$, the jth column of B. Thus, $b_j = B e_j = \beta(A e_j) = \beta a_j$, where $a_j$ equals
the jth column of A. We conclude that B is computed from A by scaling each column by $\beta$.
But that simply means that each element of B is scaled by $\beta$. This motivates the following
definition.
Definition 1.53 (Scaling a matrix) If $A \in \mathbb{R}^{m \times n}$ and $\beta \in \mathbb{R}$, then

$$ \beta \begin{pmatrix} \alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,n-1} \\ \alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,n-1} \\ \vdots & \vdots & & \vdots \\ \alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,n-1} \end{pmatrix} = \begin{pmatrix} \beta\alpha_{0,0} & \beta\alpha_{0,1} & \cdots & \beta\alpha_{0,n-1} \\ \beta\alpha_{1,0} & \beta\alpha_{1,1} & \cdots & \beta\alpha_{1,n-1} \\ \vdots & \vdots & & \vdots \\ \beta\alpha_{m-1,0} & \beta\alpha_{m-1,1} & \cdots & \beta\alpha_{m-1,n-1} \end{pmatrix}. $$
An alternative motivation for this definition is to consider

$$ \beta(Ax) = \beta \begin{pmatrix} \alpha_{0,0}\chi_0 + \alpha_{0,1}\chi_1 + \cdots + \alpha_{0,n-1}\chi_{n-1} \\ \alpha_{1,0}\chi_0 + \alpha_{1,1}\chi_1 + \cdots + \alpha_{1,n-1}\chi_{n-1} \\ \vdots \\ \alpha_{m-1,0}\chi_0 + \alpha_{m-1,1}\chi_1 + \cdots + \alpha_{m-1,n-1}\chi_{n-1} \end{pmatrix} = \begin{pmatrix} \beta(\alpha_{0,0}\chi_0 + \alpha_{0,1}\chi_1 + \cdots + \alpha_{0,n-1}\chi_{n-1}) \\ \beta(\alpha_{1,0}\chi_0 + \alpha_{1,1}\chi_1 + \cdots + \alpha_{1,n-1}\chi_{n-1}) \\ \vdots \\ \beta(\alpha_{m-1,0}\chi_0 + \alpha_{m-1,1}\chi_1 + \cdots + \alpha_{m-1,n-1}\chi_{n-1}) \end{pmatrix} $$

$$ = \begin{pmatrix} \beta\alpha_{0,0}\chi_0 + \beta\alpha_{0,1}\chi_1 + \cdots + \beta\alpha_{0,n-1}\chi_{n-1} \\ \beta\alpha_{1,0}\chi_0 + \beta\alpha_{1,1}\chi_1 + \cdots + \beta\alpha_{1,n-1}\chi_{n-1} \\ \vdots \\ \beta\alpha_{m-1,0}\chi_0 + \beta\alpha_{m-1,1}\chi_1 + \cdots + \beta\alpha_{m-1,n-1}\chi_{n-1} \end{pmatrix} = \begin{pmatrix} \beta\alpha_{0,0} & \beta\alpha_{0,1} & \cdots & \beta\alpha_{0,n-1} \\ \beta\alpha_{1,0} & \beta\alpha_{1,1} & \cdots & \beta\alpha_{1,n-1} \\ \vdots & \vdots & & \vdots \\ \beta\alpha_{m-1,0} & \beta\alpha_{m-1,1} & \cdots & \beta\alpha_{m-1,n-1} \end{pmatrix} \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix} = (\beta A)x. $$
Remark 1.54 Since, by design, $\beta(Ax) = (\beta A)x$, we can drop the parentheses and write
$\beta Ax$ (which also equals $A(\beta x)$ since $L(x) = Ax$ is a linear transformation).
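This identity is easy to spot-check numerically. A minimal Octave sketch (not part of the
original notes) with arbitrary data:

    beta = 2.5;
    A = [ 1 0 2; 2 1 -1; 3 -1 1 ];
    x = [ 1; 2; 1 ];
    disp( beta * ( A * x ) - ( beta * A ) * x );   % zero vector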
Theorem 1.55 Let $L_A : \mathbb{R}^n \rightarrow \mathbb{R}^m$ and $L_B : \mathbb{R}^n \rightarrow \mathbb{R}^m$ both be linear transformations
and, for all $x \in \mathbb{R}^n$, define the function $L_C : \mathbb{R}^n \rightarrow \mathbb{R}^m$ by $L_C(x) = L_A(x) + L_B(x)$. Then
$L_C(x)$ is a linear transformation.
Exercise 1.56 Prove Theorem 1.55.
Now, let A, B, and C be the matrices that represent $L_A$, $L_B$, and $L_C$ from Theorem 1.55,
respectively. Then, for all $x \in \mathbb{R}^n$, $Cx = L_C(x) = L_A(x) + L_B(x)$. One way to find how
matrix C relates to matrices A and B is to exploit the fact that

$$ c_j = C e_j = L_C(e_j) = L_A(e_j) + L_B(e_j) = A e_j + B e_j = a_j + b_j, $$

where $a_j$, $b_j$, and $c_j$ equal the jth columns of A, B, and C, respectively. Thus, the jth
column of C equals the sum of the corresponding columns of A and B. But that simply
means that each element of C equals the sum of the corresponding elements of A and B:
Definition 1.57 (Matrix addition) If $A, B \in \mathbb{R}^{m \times n}$, then

$$ \begin{pmatrix} \alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,n-1} \\ \alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,n-1} \\ \vdots & \vdots & & \vdots \\ \alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,n-1} \end{pmatrix} + \begin{pmatrix} \beta_{0,0} & \beta_{0,1} & \cdots & \beta_{0,n-1} \\ \beta_{1,0} & \beta_{1,1} & \cdots & \beta_{1,n-1} \\ \vdots & \vdots & & \vdots \\ \beta_{m-1,0} & \beta_{m-1,1} & \cdots & \beta_{m-1,n-1} \end{pmatrix} $$

$$ = \begin{pmatrix} \alpha_{0,0} + \beta_{0,0} & \alpha_{0,1} + \beta_{0,1} & \cdots & \alpha_{0,n-1} + \beta_{0,n-1} \\ \alpha_{1,0} + \beta_{1,0} & \alpha_{1,1} + \beta_{1,1} & \cdots & \alpha_{1,n-1} + \beta_{1,n-1} \\ \vdots & \vdots & & \vdots \\ \alpha_{m-1,0} + \beta_{m-1,0} & \alpha_{m-1,1} + \beta_{m-1,1} & \cdots & \alpha_{m-1,n-1} + \beta_{m-1,n-1} \end{pmatrix}. $$
Exercise 1.58 Note: I have changed this question from before. Give a motivation
for matrix addition by considering the linear transformation $L_C(x) = L_A(x) + L_B(x)$.
1.4.7 Partitioning matrices and vectors into submatrices (blocks) and subvectors

Theorem 1.59 Let $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$, and $y \in \mathbb{R}^m$. Let

- $m = m_0 + m_1 + \cdots + m_{M-1}$, $m_i \geq 0$ for $i = 0, \ldots, M-1$; and
- $n = n_0 + n_1 + \cdots + n_{N-1}$, $n_j \geq 0$ for $j = 0, \ldots, N-1$; and
- partition

$$ A = \begin{pmatrix} A_{0,0} & A_{0,1} & \cdots & A_{0,N-1} \\ A_{1,0} & A_{1,1} & \cdots & A_{1,N-1} \\ \vdots & \vdots & & \vdots \\ A_{M-1,0} & A_{M-1,1} & \cdots & A_{M-1,N-1} \end{pmatrix}, \quad x = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{N-1} \end{pmatrix}, \quad \text{and} \quad y = \begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{M-1} \end{pmatrix} $$

with $A_{i,j} \in \mathbb{R}^{m_i \times n_j}$, $x_j \in \mathbb{R}^{n_j}$, and $y_i \in \mathbb{R}^{m_i}$. Then $y_i = \sum_{j=0}^{N-1} A_{i,j} x_j$.

This theorem is intuitively true, and messy to prove carefully, and therefore we will not
give its proof, relying on examples instead.
Remark 1.60 If one partitions matrix A, vector x, and vector y into blocks, and one
makes sure the dimensions match up, then blocked matrix-vector multiplication proceeds
exactly as does a regular matrix-vector multiplication, except that individual multiplications
of scalars commute while (in general) individual multiplications with matrix and
vector blocks (submatrices and subvectors) do not.
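The claim in Theorem 1.59 and Remark 1.60 can be checked numerically. A minimal Octave
sketch (not part of the original notes) with one particular 2 + 1 + 2 partitioning of the rows
and columns (the sizes are arbitrary):

    A = rand( 5, 5 );  x = rand( 5, 1 );
    % Partition rows and columns as 2 + 1 + 2.
    A00 = A(1:2,1:2);  a01 = A(1:2,3);  A02 = A(1:2,4:5);
    a10t = A(3,1:2);   alpha11 = A(3,3);  a12t = A(3,4:5);
    A20 = A(4:5,1:2);  a21 = A(4:5,3);  A22 = A(4:5,4:5);
    x0 = x(1:2);  chi1 = x(3);  x2 = x(4:5);
    y_blocked = [ A00*x0 + a01*chi1 + A02*x2
                  a10t*x0 + alpha11*chi1 + a12t*x2
                  A20*x0 + a21*chi1 + A22*x2 ];
    disp( y_blocked - A*x );   % zero vector (up to roundoff)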
Example 1.61 Consider

$$ A = \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} -1 & 2 & 4 & 1 & 0 \\ -1 & 0 & 1 & 2 & -1 \\ -2 & 1 & 3 & 1 & 2 \\ 1 & 2 & 3 & 4 & 3 \\ 1 & 2 & 0 & -1 & -2 \end{pmatrix}, \qquad (1.11) $$

$$ x = \begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \\ 5 \end{pmatrix}, \quad \text{and} \quad y = \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix}, \qquad (1.12) $$

where $y_0, y_2 \in \mathbb{R}^2$. Then

$$ y = \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix} \begin{pmatrix} x_0 \\ \chi_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} A_{00} x_0 + a_{01} \chi_1 + A_{02} x_2 \\ a_{10}^T x_0 + \alpha_{11} \chi_1 + a_{12}^T x_2 \\ A_{20} x_0 + a_{21} \chi_1 + A_{22} x_2 \end{pmatrix} $$

$$ = \begin{pmatrix} \begin{pmatrix} -1 & 2 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 4 \\ 1 \end{pmatrix} 3 + \begin{pmatrix} 1 & 0 \\ 2 & -1 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix} \\ \begin{pmatrix} -2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} + (3)3 + \begin{pmatrix} 1 & 2 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix} \\ \begin{pmatrix} 1 & 2 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 3 \\ 0 \end{pmatrix} 3 + \begin{pmatrix} 4 & 3 \\ -1 & -2 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \end{pmatrix} \end{pmatrix} = \begin{pmatrix} \begin{pmatrix} 3 \\ -1 \end{pmatrix} + \begin{pmatrix} 12 \\ 3 \end{pmatrix} + \begin{pmatrix} 4 \\ 3 \end{pmatrix} \\ 0 + 9 + 14 \\ \begin{pmatrix} 5 \\ 5 \end{pmatrix} + \begin{pmatrix} 9 \\ 0 \end{pmatrix} + \begin{pmatrix} 31 \\ -14 \end{pmatrix} \end{pmatrix} = \begin{pmatrix} 19 \\ 5 \\ 23 \\ 45 \\ -9 \end{pmatrix}. $$
Remark 1.62 The labeling of the submatrices and subvectors in (1.11) and (1.12) was
carefully chosen to convey information: The letters that are used convey information
about the shapes. For example, for $a_{01}$ and $a_{21}$ the use of a lowercase Roman letter
indicates they are column vectors, while the $^T$s in $a_{10}^T$ and $a_{12}^T$ indicate that they are row
vectors. Symbols $\alpha_{11}$ and $\chi_1$ indicate these are scalars. We will use these conventions
consistently to enhance readability.
Algorithm: y := Mvmult_unb_var1(A, x, y)
  Partition A → (A_T; A_B), y → (y_T; y_B)
    where A_T is 0 × n and y_T is 0 × 1
  while m(A_T) < m(A) do
    Repartition (A_T; A_B) → (A_0; a_1^T; A_2), (y_T; y_B) → (y_0; ψ_1; y_2)
      where a_1^T is a row
    ψ_1 := a_1^T x + ψ_1
    Continue with (A_T; A_B) ← (A_0; a_1^T; A_2), (y_T; y_B) ← (y_0; ψ_1; y_2)
  endwhile

Algorithm: y := Mvmult_unb_var2(A, x, y)
  Partition A → (A_L | A_R), x → (x_T; x_B)
    where A_L is m × 0 and x_T is 0 × 1
  while m(x_T) < m(x) do
    Repartition (A_L | A_R) → (A_0 | a_1 | A_2), (x_T; x_B) → (x_0; χ_1; x_2)
      where a_1 is a column
    y := χ_1 a_1 + y
    Continue with (A_L | A_R) ← (A_0 | a_1 | A_2), (x_T; x_B) ← (x_0; χ_1; x_2)
  endwhile

Figure 1.17: The algorithms in Figures 1.14 and 1.16, represented in FLAME notation.
1.5 A High Level Representation of Algorithms

It is our experience that many, if not most, errors in coding are related to indexing mistakes.
We now discuss how matrix algorithms can be described without excessive indexing.

1.5.1 Matrix-vector multiplication

In Figure 1.17, we present algorithms for computing y := Ax + y, already given in Figures 1.14
and 1.16, using a notation that is meant to hide indexing. Let us discuss the algorithm on
the left in detail.

We compare and contrast the partitioning of matrix A and vector y in Figure 1.14,

$$ A = \begin{pmatrix} a_0^T \\ a_1^T \\ \vdots \\ a_{m-1}^T \end{pmatrix} \quad \text{and} \quad y = \begin{pmatrix} \psi_0 \\ \psi_1 \\ \vdots \\ \psi_{m-1} \end{pmatrix}, $$

with the partitioning in Figure 1.17(left),

$$ A = \begin{pmatrix} A_0 \\ a_1^T \\ A_2 \end{pmatrix} \quad \text{and} \quad y = \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix}. $$
In a nutshell, during the ith iteration (starting with i = 0),

$$ \begin{pmatrix} A_0 \\ a_1^T \\ A_2 \end{pmatrix} = \begin{pmatrix} a_0^T \\ \vdots \\ a_{i-1}^T \\ a_i^T \\ a_{i+1}^T \\ \vdots \\ a_{m-1}^T \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} \psi_0 \\ \vdots \\ \psi_{i-1} \\ \psi_i \\ \psi_{i+1} \\ \vdots \\ \psi_{m-1} \end{pmatrix}, $$

where $A_0$ holds the first i rows, $a_1^T$ is row i, and $A_2$ holds the remaining rows (and similarly
for $y_0$, $\psi_1$, and $y_2$). Thus, $\psi_1 := a_1^T x + \psi_1$ in Figure 1.17 is the same as the computation
$\psi_i := a_i^T x + \psi_i$ in Figure 1.14.
Thus,

- $\begin{pmatrix} A_T \\ A_B \end{pmatrix}$ and $\begin{pmatrix} y_T \\ y_B \end{pmatrix}$ represent two sets of rows of A and elements of y, respectively:
  those with which the algorithm is finished ($A_T$ and $y_T$) and those with which the
  algorithm will compute in the future ($A_B$ and $y_B$). Here the subscripts T and B stand for
  Top and Bottom, respectively.

- Repartition

  $\begin{pmatrix} A_T \\ A_B \end{pmatrix} \rightarrow \begin{pmatrix} A_0 \\ a_1^T \\ A_2 \end{pmatrix}, \quad \begin{pmatrix} y_T \\ y_B \end{pmatrix} \rightarrow \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix}$

  exposes the top row of $A_B$ and the top element of $y_B$ so that the algorithm can update
  that element with $\psi_1 := a_1^T x + \psi_1$.

- Continue with

  $\begin{pmatrix} A_T \\ A_B \end{pmatrix} \leftarrow \begin{pmatrix} A_0 \\ a_1^T \\ A_2 \end{pmatrix}, \quad \begin{pmatrix} y_T \\ y_B \end{pmatrix} \leftarrow \begin{pmatrix} y_0 \\ \psi_1 \\ y_2 \end{pmatrix}$

  adds $a_1^T$ and $\psi_1$ to $A_T$ and $y_T$, respectively, since these parts of A and y are now done.

- Partition

  $A \rightarrow \begin{pmatrix} A_T \\ A_B \end{pmatrix}, \quad y \rightarrow \begin{pmatrix} y_T \\ y_B \end{pmatrix}$, where $A_T$ is $0 \times n$ and $y_T$ is $0 \times 1$,

  initializes $A_T$ and $y_T$ to be empty, since initially the algorithm is finished with no rows
  of A and no elements of y.

With this explanation, we believe that both algorithms in Figure 1.17 become intuitive. The
subscripts L and R in the algorithm on the right denote the Left and Right parts of
matrix A, respectively.
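One way to connect this notation to code is to let an index i track the boundary between
the finished part (A_T, y_T) and the remaining part (A_B, y_B). A minimal M-script sketch of
the left algorithm written this way (the function name mirrors Figure 1.17; this is an
illustration, not part of the original library; indexing starts at 1 in M-script):

    function [ y ] = Mvmult_unb_var1( A, x, y )
    % Compute y := A x + y via dot products with the rows of A.
      m = size( A, 1 );
      for i=1:m                              % rows 1..i-1 form A_T; row i is a_1^T
        y( i ) = A( i, : ) * x + y( i );     % psi_1 := a_1^T x + psi_1
      end
    end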
Algorithm: y := Mvmult_t_unb_var1(A, x, y)
  Partition A → (A_L | A_R), y → (y_T; y_B)
    where A_L is m × 0 and y_T is 0 × 1
  while m(y_T) < m(y) do
    Repartition (A_L | A_R) → (A_0 | a_1 | A_2), (y_T; y_B) → (y_0; ψ_1; y_2)
      where a_1 is a column
    ψ_1 := a_1^T x + ψ_1
    Continue with (A_L | A_R) ← (A_0 | a_1 | A_2), (y_T; y_B) ← (y_0; ψ_1; y_2)
  endwhile

Algorithm: y := Mvmult_t_unb_var2(A, x, y)
  Partition A → (A_T; A_B), x → (x_T; x_B)
    where A_T is 0 × n and x_T is 0 × 1
  while m(A_T) < m(A) do
    Repartition (A_T; A_B) → (A_0; a_1^T; A_2), (x_T; x_B) → (x_0; χ_1; x_2)
      where a_1^T is a row
    y := χ_1 a_1 + y
    Continue with (A_T; A_B) ← (A_0; a_1^T; A_2), (x_T; x_B) ← (x_0; χ_1; x_2)
  endwhile

Figure 1.18: Algorithms for computing y := A^T x + y. Notice that matrix A is not explicitly
transposed. Be sure to compare and contrast with the algorithms in Figure 1.17.
1.5.2 Transpose matrix-vector multiplication
Example 1.63 Let A = [ 1 2 0 ; 2 1 1 ; 1 2 3 ] and x = ( 1 ; -2 ; 3 ). Then

    A^T x = [ 1 2 0 ; 2 1 1 ; 1 2 3 ]^T ( 1 ; -2 ; 3 ) = [ 1 2 1 ; 2 1 2 ; 0 1 3 ] ( 1 ; -2 ; 3 ) = ( 0 ; 6 ; 7 ).

The thing to notice is that what was a column in A becomes a row in A^T.

The above example motivates the observation that if

    A = ( A_T ; A_B ) = ( A_0 ; a_1^T ; A_2 )   then   A^T = ( A_T^T | A_B^T ) = ( A_0^T | a_1 | A_2^T ).

Moreover, if

    A = ( A_L | A_R ) = ( A_0 | a_1 | A_2 )   then   A^T = ( A_L^T ; A_R^T ) = ( A_0^T ; a_1^T ; A_2^T ).

This motivates the algorithms for computing y := A^T x + y in Figure 1.18.

Algorithm: y := Mvmult_unb_var1b(A, x, y)
  Partition A -> ( A_TL  A_TR ; A_BL  A_BR ), x -> ( x_T ; x_B ), y -> ( y_T ; y_B )
    where A_TL is 0 x 0 and x_T, y_T are 0 x 1
  while m(A_TL) < m(A) do
    Repartition
      ( A_TL  A_TR ; A_BL  A_BR ) -> ( A_00  a_01  A_02 ; a_10^T  alpha_11  a_12^T ; A_20  a_21  A_22 ),
      ( x_T ; x_B ) -> ( x_0 ; chi_1 ; x_2 ),   ( y_T ; y_B ) -> ( y_0 ; psi_1 ; y_2 )
      where alpha_11, chi_1, and psi_1 are scalars
    psi_1 := a_10^T x_0 + alpha_11 chi_1 + a_12^T x_2 + psi_1
    Continue with the exposed parts absorbed back into the Top/Left parts
  endwhile

Algorithm: y := Mvmult_unb_var2b(A, x, y)
  Partition and repartition A, x, and y as in variant 1b,
    where alpha_11, chi_1, and psi_1 are scalars
    y_0 := chi_1 a_01 + y_0
    psi_1 := chi_1 alpha_11 + psi_1
    y_2 := chi_1 a_21 + y_2
  Continue with the exposed parts absorbed back into the Top/Left parts
  endwhile

Figure 1.19: The algorithms in Figure 1.17, exposing more submatrices and vectors. Works for square matrices and prepares for the discussion on multiplying with triangular and symmetric matrices.
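A minimal M-script sketch (not the FLAME@lab API) of the idea behind Figure 1.18, left variant: y := A^T x + y computed with dot products against the columns of A, so that A is never explicitly transposed. It assumes A is m x n, x is m x 1, and y is n x 1.

    function y = Mvmult_t_unb_var1_simple( A, x, y )
      for j = 1:size( A, 2 )
        y( j ) = A( :, j )' * x + y( j );   % psi_j := a_j^T x + psi_j
      end
    end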
1.5.3 Triangular matrix-vector multiplication

Theorem 1.64 Let U be an upper triangular matrix. Partition

    U -> ( U_TL  U_TR ; U_BL  U_BR ) = ( U_00  u_01  U_02 ; u_10^T  upsilon_11  u_12^T ; U_20  u_21  U_22 ),

where U_TL and U_00 are square matrices. Then

- U_BL = 0, u_10^T = 0, U_20 = 0, and u_21 = 0, where 0 indicates a matrix or vector of the appropriate dimensions.
- U_TL and U_BR are upper triangular matrices.

Rather than proving the above theorem, we look at an example, after which we will declare this intuitively obvious:

Example 1.65 Consider

    ( U_00  u_01  U_02 )       | 1  2 | 4 | 1  0 |
    ( u_10^T  upsilon_11  u_12^T )  =  | 0  0 | 1 | 2  1 |
    ( U_20  u_21  U_22 )       +------+---+------+
                               | 0  0 | 3 | 1  2 |
                               +------+---+------+
                               | 0  0 | 0 | 4  3 |
                               | 0  0 | 0 | 0  2 |

We notice that u_10^T = 0, U_20 = 0, and u_21 = 0.
A similar theorem holds for lower triangular matrices:
Theorem 1.66 Let L be a lower triangular matrix. Partition

    L -> ( L_TL  L_TR ; L_BL  L_BR ) = ( L_00  l_01  L_02 ; l_10^T  lambda_11  l_12^T ; L_20  l_21  L_22 ),

where L_TL and L_00 are square matrices. Then

- L_TR = 0, l_01 = 0, L_02 = 0, and l_12^T = 0, where 0 indicates a matrix or vector of the appropriate dimensions.
- L_TL and L_BR are lower triangular matrices.

Let us consider y := Ux, where U is an n x n upper triangular matrix. The purpose of the game is to take advantage of the zeroes below the diagonal.

What we are going to do first is to take the matrix-vector multiplication algorithms in Figure 1.17 and change them to algorithms that partition the matrix A similarly to how U will be partitioned. As a consequence, both x and y need to be partitioned to make sure that dimensions match. The result is given in Figure 1.19.

Comparing the algorithms on the left in Figures 1.17 and 1.19, we notice that the single update psi_1 := a_1^T x + psi_1 now becomes

    psi_1 := ( a_10 ; alpha_11 ; a_12 )^T ( x_0 ; chi_1 ; x_2 ) + psi_1 = a_10^T x_0 + alpha_11 chi_1 + a_12^T x_2 + psi_1.

Comparing the algorithms on the right in Figures 1.17 and 1.19, we notice that the single update y := chi_1 a_1 + y now becomes

    ( y_0 ; psi_1 ; y_2 ) := chi_1 ( a_01 ; alpha_11 ; a_21 ) + ( y_0 ; psi_1 ; y_2 ),

or, the separate computations y_0 := chi_1 a_01 + y_0, psi_1 := chi_1 alpha_11 + psi_1, and y_2 := chi_1 a_21 + y_2.
Now we are ready to change the algorithms in Figure 1.19 into algorithms that exploit
the upper triangular structure of matrix U, as indicated in Figure 1.20.
Exercise 1.67 Modify the algorithms in Figure 1.19 to compute y := Lx, where L is a
lower triangular matrix.
Exercise 1.68 Modify the algorithms in Figure 1.20 so that x is overwritten with x := Ux,
without using the vector y.
Exercise 1.69 Reason why the algorithm you developed for Exercise 1.67 cannot be trivially
changed so that x := Lx without requiring y. What is the solution?
Exercise 1.70 Develop algorithms for computing x := U^T x and x := L^T x, where U and L are respectively upper triangular and lower triangular, without explicitly transposing matrices U and L.

Algorithm: y := Trmv_un_unb_var1(U, x, y)
  Partition U -> ( U_TL  U_TR ; 0  U_BR ), x -> ( x_T ; x_B ), y -> ( y_T ; y_B )
    where U_TL is 0 x 0 and x_T, y_T are 0 x 1
  while m(U_TL) < m(U) do
    Repartition
      ( U_TL  U_TR ; 0  U_BR ) -> ( U_00  u_01  U_02 ; 0  upsilon_11  u_12^T ; 0  0  U_22 ),
      ( x_T ; x_B ) -> ( x_0 ; chi_1 ; x_2 ),   ( y_T ; y_B ) -> ( y_0 ; psi_1 ; y_2 )
      where upsilon_11, chi_1, and psi_1 are scalars
    psi_1 := upsilon_11 chi_1 + u_12^T x_2 + psi_1
    Continue with the exposed parts absorbed back into the Top/Left parts
  endwhile

Algorithm: y := Trmv_un_unb_var2(U, x, y)
  Partition and repartition U, x, and y as in variant 1,
    where upsilon_11, chi_1, and psi_1 are scalars
    y_0 := chi_1 u_01 + y_0
    psi_1 := chi_1 upsilon_11 + psi_1
  Continue with the exposed parts absorbed back into the Top/Left parts
  endwhile

Figure 1.20: The algorithms in Figure 1.19, modified to take advantage of the upper triangular structure of U. They compute y := Ux if y = 0 upon entry.
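A minimal M-script sketch (not the FLAME@lab API) of the idea in Figure 1.20, left variant: y := U x + y for an n x n upper triangular U, skipping the zeroes below the diagonal.

    function y = Trmv_un_simple( U, x, y )
      n = size( U, 1 );
      for i = 1:n
        % psi_i := upsilon_ii chi_i + u_{i,i+1:n} x_{i+1:n} + psi_i
        y( i ) = U( i, i ) * x( i ) + U( i, i+1:n ) * x( i+1:n ) + y( i );
      end
    end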
Cost  Let us analyze the algorithm in Figure 1.20 (left). The cost is in the update psi_1 := upsilon_11 chi_1 + u_12^T x_2 + psi_1, which is typically computed in two steps: psi_1 := upsilon_11 chi_1 + psi_1 followed by a dot product psi_1 := u_12^T x_2 + psi_1. Now, during the first iteration, u_12 and x_2 are of length n - 1. During the ith iteration (starting with i = 0), u_12 and x_2 are of length n - i - 1, so that the cost of that iteration is 2 flops for the first step and 2(n - i - 1) flops for the second. Thus, if U is an n x n matrix, then the total cost is given by

    sum_{i=0}^{n-1} [ 2 + 2(n - i - 1) ] = 2n + 2 sum_{i=0}^{n-1} i = 2n + n(n - 1) = n^2 + n  (approximately n^2) flops.

Exercise 1.71 Compute the cost, in flops, of the algorithm in Figure 1.20(right).
Algorithm: y := Symv_u_unb_var1(A, x, y)
  Partition A -> ( A_TL  A_TR ; A_BL  A_BR ), x -> ( x_T ; x_B ), y -> ( y_T ; y_B )
    where A_TL is 0 x 0 and x_T, y_T are 0 x 1
  while m(A_TL) < m(A) do
    Repartition as in Figure 1.19, where alpha_11, chi_1, and psi_1 are scalars
    psi_1 := a_01^T x_0 + alpha_11 chi_1 + a_12^T x_2 + psi_1     (a_01 takes the place of the unstored a_10)
    Continue with the exposed parts absorbed back into the Top/Left parts
  endwhile

Algorithm: y := Symv_u_unb_var2(A, x, y)
  Partition A, x, and y as in variant 1
  while m(A_TL) < m(A) do
    Repartition as in Figure 1.19, where alpha_11, chi_1, and psi_1 are scalars
    y_0 := chi_1 a_01 + y_0
    psi_1 := chi_1 alpha_11 + psi_1
    y_2 := chi_1 a_12 + y_2                                       (a_12 takes the place of the unstored a_21)
    Continue with the exposed parts absorbed back into the Top/Left parts
  endwhile

Figure 1.21: The algorithms in Figure 1.19, modified to compute y := Ax + y where A is symmetric and only the upper triangular part of A is stored.
1.5.4 Symmetric matrix-vector multiplication
Theorem 1.72 Let A be a symmetric matrix. Partition

    A -> ( A_TL  A_TR ; A_BL  A_BR ) = ( A_00  a_01  A_02 ; a_10^T  alpha_11  a_12^T ; A_20  a_21  A_22 ),

where A_TL and A_00 are square matrices. Then

- A_BL = A_TR^T, a_10 = a_01, A_20 = A_02^T, and a_21 = a_12.
- A_TL and A_BR are symmetric matrices.

Again, we don't prove this, giving an example instead:

Example 1.73 Consider

    ( A_00  a_01  A_02 )       | 1  2 | 4 | 1  0 |
    ( a_10^T  alpha_11  a_12^T )  =  | 2  0 | 1 | 2  1 |
    ( A_20  a_21  A_22 )       +------+---+------+
                               | 4  1 | 3 | 1  2 |
                               +------+---+------+
                               | 1  2 | 1 | 4  3 |
                               | 0  1 | 2 | 3  2 |

We notice that a_10 = a_01, A_20 = A_02^T, and a_21 = a_12.
Since the upper and lower triangular part of a symmetric matrix are simply the transpose
of each other, it is only necessary to store half the matrix: only the upper triangular part or
only the lower triangular part. We now discuss how to modify the algorithms in Figure 1.19
to compute y := Ax + y when A is symmetric and only stored in the upper triangular part
of the matrix.
The change is simple: a_10 and a_21 are not stored and thus

- For the algorithm in Figure 1.19(left), the update psi_1 := a_10^T x_0 + alpha_11 chi_1 + a_12^T x_2 + psi_1 must be changed to psi_1 := a_01^T x_0 + alpha_11 chi_1 + a_12^T x_2 + psi_1, as noted in Figure 1.21(left).
- For the algorithm in Figure 1.19(right), the update y_2 := chi_1 a_21 + y_2 must be changed to y_2 := chi_1 a_12 + y_2, as noted in Figure 1.21(right).
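A minimal M-script sketch (not the FLAME@lab API) of the idea in Figure 1.21(left): y := A x + y where A is symmetric and only the entries on and above the diagonal of the array are referenced.

    function y = Symv_u_simple( A, x, y )
      n = size( A, 1 );
      for i = 1:n
        y( i ) = A( 1:i-1, i )' * x( 1:i-1 ) ...   % a_01^T x_0 (stored column used in place of a_10^T)
               + A( i, i ) * x( i ) ...            % alpha_11 chi_1
               + A( i, i+1:n ) * x( i+1:n ) ...    % a_12^T x_2
               + y( i );
      end
    end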
Exercise 1.74 Modify the algorithms in Figure 1.19 to compute y := Ax + y, where A is symmetric and stored in the lower triangular part of the matrix.
1.6 Representing Algorithms in Code
We believe that the notation used in Figures 1.17-1.21 has a number of advantages:

- It is very visual.
- It facilitates comparing and contrasting different algorithms for different related operations.
- The opportunity for introducing indexing errors is greatly reduced.

The problem now is that errors can still be introduced as the algorithms are translated to M-script or some other language. For this, we created a few routines that provide an Application
1 function [ y_out ] = Symv_u_unb_var1( A, x, y )
2 [ ATL, ATR, ...
3 ABL, ABR ] = FLA_Part_2x2( A, ...
4 0, 0, FLA_TL );
5 [ xT, ...
6 xB ] = FLA_Part_2x1( x, ...
7 0, FLA_TOP );
8 [ yT, ...
9 yB ] = FLA_Part_2x1( y, ...
10 0, FLA_TOP );
11 while ( size( ATL, 1 ) < size( A, 1 ) )
12 [ A00, a01, A02, ...
13 a10t, alpha11, a12t, ...
14 A20, a21, A22 ] = FLA_Repart_2x2_to_3x3( ATL, ATR, ...
15 ABL, ABR, ...
16 1, 1, FLA_BR );
17 [ x0, ...
18 chi1, ...
19 x2 ] = FLA_Repart_2x1_to_3x1( xT, ...
20 xB, 1, FLA_BOTTOM );
21 [ y0, ...
22 psi1, ...
23 y2 ] = FLA_Repart_2x1_to_3x1( yT, ...
24 yB, 1, FLA_BOTTOM );
25 %------------------------------------------------------------%
26 psi1 = FLA_Dot( a01, x0 ) + psi1;
27 psi1 = alpha11 * chi1 + psi1;
28 psi1 = FLA_Dot( a12, x2 ) + psi1;
29 %------------------------------------------------------------%
30 [ ATL, ATR, ...
31 ABL, ABR ] = FLA_Cont_with_3x3_to_2x2( A00, a01, A02, ...
32 a10t, alpha11, a12t, ...
33 A20, a21, A22, FLA_TL );
34 [ xT, ...
35 xB ] = FLA_Cont_with_3x1_to_2x1( x0, ...
36 chi1, ...
37 x2, FLA_TOP );
38 [ yT, ...
39 yB ] = FLA_Cont_with_3x1_to_2x1( y0, ...
40 psi1, ...
41 y2, FLA_TOP );
42 end
43 y_out = [ yT
44 yB ];
45 return
Figure 1.22: FLAME@lab code for the algorithm in Figure 1.21(left).
Programming Interface (API) so that code can closely mirror the algorithms as presented in
FLAME notation. The API for M-script is called FLAME@lab.
Consider the code in Figure 1.22, which computes y := Ax + y, where A is symmetric
and stored only in the upper triangular part of array A. It implements the algorithm in
Figure 1.21(left). The code uses a set of routines we defined that mirror the partition-repartition-continue with constructs in the algorithm.
We now explain the code:

lines 2-4: These lines encode A -> ( A_TL  A_TR ; A_BL  A_BR ), where A_TL is 0 x 0, in code. Matrix A is partitioned into four quadrants. Here 0, 0, FLA_TL indicates that the top-left quadrant, ATL, is 0 x 0, which then also prescribes the sizes of the other quadrants.

lines 5-10: These lines encode x -> ( x_T ; x_B ) and y -> ( y_T ; y_B ), where x_T and y_T are 0 x 1, in code. Vectors x and y are partitioned into top and bottom subvectors. Here 0, FLA_TOP indicates that the top subvectors, xT and yT, are 0 x 1, which then also prescribes the sizes of xB and yB.

lines 11, 42: These lines encode "while m(A_TL) < m(A) do" and "endwhile". The M-script call size( A, 1 ) returns the row dimension of matrix A.

lines 12-16: These lines encode the repartitioning ( A_TL  A_TR ; A_BL  A_BR ) -> ( A_00  a_01  A_02 ; a_10^T  alpha_11  a_12^T ; A_20  a_21  A_22 ). Here 1, 1, FLA_BR indicates that a 1 x 1 submatrix (a scalar), alpha11, is partitioned from the bottom-right quadrant (ABR). This then prescribes the sizes of all the other submatrices.

lines 17-24: These lines similarly encode the repartitioning of x and y.

lines 26-28: These lines encode the update psi_1 := a_01^T x_0 + alpha_11 chi_1 + a_12^T x_2 + psi_1.

lines 30-41: These lines encode the "Continue with" in Figure 1.21(left). The FLA_TL in line 33 indicates that alpha11 is to be added to ATL, while FLA_TOP in lines 37 and 41 indicates that chi1 and psi1 are to be added to xT and yT, respectively.

lines 43-44: The output vector y_out is set to be the updated vector ( y_T ; y_B ).
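A small usage sketch for the routine in Figure 1.22, assuming the FLAME@lab routines (FLA_Part_2x2 and friends) are on the M-script path. Only the upper triangular part of A is referenced, so the strictly lower triangular entries may hold anything.

    A = [ 1 -2  1;
          0  2  0;
          0  0  3 ];              % upper triangular part of a symmetric matrix
    x = [ 1; 2; 3 ];
    y = [ 0; 0; 0 ];
    y = Symv_u_unb_var1( A, x, y );
    Afull = triu( A ) + triu( A, 1 )';   % rebuild the full symmetric matrix for comparison
    disp( [ y, Afull * x ] );            % the two columns should agree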
1.7 Outer Product and Rank-1 Update
In this section, we discuss the outer product and an operation that will become very important later on, but for now is just an operation: the rank-1 update.
Definition 1.75 (Outer product) Let y in R^m and x in R^n. Then the outer product of the vectors y and x is given by

    y x^T = ( psi_0 ; psi_1 ; ... ; psi_{m-1} ) ( chi_0 ; chi_1 ; ... ; chi_{n-1} )^T
          = ( psi_0 ; psi_1 ; ... ; psi_{m-1} ) ( chi_0  chi_1  ...  chi_{n-1} )
          = [ psi_0 chi_0        psi_0 chi_1        ...  psi_0 chi_{n-1}     ;
              psi_1 chi_0        psi_1 chi_1        ...  psi_1 chi_{n-1}     ;
              ...                                                            ;
              psi_{m-1} chi_0    psi_{m-1} chi_1    ...  psi_{m-1} chi_{n-1} ].

We note that y x^T can be written in a number of equivalent ways:

    y x^T = ( chi_0 y | chi_1 y | ... | chi_{n-1} y ),

that is, each column is a scaled copy of y, and

    y x^T = ( psi_0 x^T ; psi_1 x^T ; ... ; psi_{m-1} x^T ),

that is, each row is a scaled copy of x^T.
Definition 1.76 (Rank-1 update) Let A in R^{m x n}, y in R^m, x in R^n, and alpha in R. Then

    A + alpha y x^T = [ alpha_{0,0}  ...  alpha_{0,n-1} ; ... ; alpha_{m-1,0}  ...  alpha_{m-1,n-1} ]
                        + alpha ( psi_0 ; ... ; psi_{m-1} ) ( chi_0  ...  chi_{n-1} )
                    = [ alpha_{0,0} + alpha psi_0 chi_0          ...  alpha_{0,n-1} + alpha psi_0 chi_{n-1}     ;
                        alpha_{1,0} + alpha psi_1 chi_0          ...  alpha_{1,n-1} + alpha psi_1 chi_{n-1}     ;
                        ...                                                                                     ;
                        alpha_{m-1,0} + alpha psi_{m-1} chi_0    ...  alpha_{m-1,n-1} + alpha psi_{m-1} chi_{n-1} ].

This operation is called a rank-1 update.
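A minimal M-script sketch (not the SLAP routine itself) of a rank-1 update A := alpha y x^T + A, performed one column at a time so that each column update is an axpy, a_j := (alpha chi_j) y + a_j.

    function A = ger_simple( alpha, y, x, A )
      for j = 1:size( A, 2 )
        A( :, j ) = ( alpha * x( j ) ) * y + A( :, j );
      end
    end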
1.8 A growing library
We now describe some additions to the SLAP library you are developing.
1.8.1 General matrix-vector multiplication (gemv)
In our library, we add three routines that together implement commonly encountered cases
of matrix-vector multiplication.
SLAP_Gemv_n( alpha, A, x, y ) and SLAP_Gemv_t( alpha, A, x, y )

These routines respectively implement z := alpha A x + y and z := alpha A^T x + y, where A in R^{m x n}.

- Assume that matrix A is stored by columns. Then, pick the algorithm that favors access by column, as discussed in class.
- Implement the algorithm with calls to the SLAP_Dot and/or SLAP_Axpy routines you wrote.
- You will want to be careful to add the multiplication by alpha at just the right place so as not to incur unnecessary computations or copies of vectors and/or matrices. For example, creating a temporary matrix B = alpha A is not a good idea. (Why?)

SLAP_Gemv( trans, alpha, A, x, beta, y )

Depending on whether trans equals the constant SLAP_NO_TRANSPOSE or SLAP_TRANSPOSE, this routine computes z := alpha A x + beta y or z := alpha A^T x + beta y, respectively.

- Implement this routine as a wrapper routine to the two routines given above: have it check the parameter trans, perform some minimal computation, and then call one of the other two routines defined above.
- In this wrapper routine, you must accommodate the possibility that x and/or y are row vectors. Vectors y and z are either both row vectors or both column vectors.
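The following is one possible sketch of the no-transpose case (an assumed signature, not the official SLAP solution, and written with a plain axpy-style update rather than the SLAP_Axpy call the exercise asks for). It is organized by columns of A so that memory is accessed favorably when A is stored by columns.

    function z = Gemv_n_simple( alpha, A, x, y )
      z = y;
      for j = 1:size( A, 2 )
        z = ( alpha * x( j ) ) * A( :, j ) + z;   % z := (alpha chi_j) a_j + z
      end
    end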
Exercise 1.77 Follow the directions on the wiki to implement the above routines for com-
puting the general matrix-vector multiply.
1.8.2 Triangular matrix-vector multiplication (trmv)

We add five routines that together implement all commonly encountered cases of triangular matrix-vector multiplication.

SLAP_Trmv_un( diag, U, x ), SLAP_Trmv_ut( diag, U, x ), SLAP_Trmv_ln( diag, L, x ), SLAP_Trmv_lt( diag, L, x )

These routines respectively implement z := Ux, z := U^T x, z := Lx, and z := L^T x. Some notes:

- As always, assume that matrices U and L are stored by columns and pick the algorithm that favors access by column, as discussed in class.
- Implement the algorithm with calls to the SLAP_Dot and/or SLAP_Axpy routines you wrote.
- The parameter diag allows one to pass in one of three values: SLAP_NON_UNIT if the values stored on the diagonal are to be used; SLAP_UNIT if the values stored on the diagonal are to be ignored and the computation assumes they are (implicitly) all equal to one; or SLAP_ZERO if the values stored on the diagonal are to be ignored and the computation assumes they are (implicitly) all equal to zero (meaning that the matrix is strictly upper or lower triangular).
- You may assume that x and z are column vectors.

SLAP_Trmv( uplo, trans, diag, alpha, A, x )

Depending on whether trans equals the constant SLAP_NO_TRANSPOSE or SLAP_TRANSPOSE, this routine computes either z := alpha A x or z := alpha A^T x.

- If uplo equals SLAP_UPPER_TRIANGULAR then A is upper triangular (stored in the upper triangular part of array A), while if uplo equals SLAP_LOWER_TRIANGULAR it is lower triangular.
- The parameter diag is passed on to one of the above four routines.
- Vectors x and z are either both row vectors or both column vectors.
Exercise 1.78 Follow the directions on the wiki to implement the above routines for com-
puting the triangular matrix-vector multiply.
1.8.3 Symmetric matrix-vector multiplication (symv)
We add three routines that together implement all commonly encountered cases of symmetric
matrix-vector multiplication.
SLAP_Symv_u( alpha, A, x, y ), SLAP_Symv_l( alpha, A, x, y )

These routines implement z := alpha A x + y.

- Again, assume that matrix A is stored by columns and pick the algorithm that favors access by column.
- Implement the algorithm with calls to the SLAP_Dot and/or SLAP_Axpy routines you wrote.
- The u and l in the name of the routine indicate whether the matrix is stored only in the upper or lower triangular part of array A, respectively.
- You may assume that x, y, and z are all column vectors.

SLAP_Symv( uplo, alpha, A, x, beta, y )

This routine is the wrapper to the above routines. It computes z := alpha A x + beta y, where A is symmetric.

- Matrix A is stored only in the upper or the lower triangular part of array A, depending on whether uplo equals SLAP_UPPER_TRIANGULAR or SLAP_LOWER_TRIANGULAR, respectively.
- The remaining parameters are passed on to one of the above two routines.
- Vector x can be a row or a column vector. Vectors y and z are either both row vectors or both column vectors.
Exercise 1.79 Follow the directions on the wiki to implement the above routines for com-
puting the symmetric matrix-vector multiply.
Exercise 1.80 When you try to pick from the algorithms in Figure 1.21, you will notice that neither algorithm has the property that almost all computations involve data that is stored consecutively in memory. There are actually two more algorithms for computing this operation.

The observation is that y := Ax + y, where A is symmetric and stored in the upper triangular part of the array, can be computed via y := Ux + y followed by y := U~^T x + y, where U and U~ are the upper triangular and strictly upper triangular parts of A. Now, for the first there are two algorithms given in the text, and the other one is an exercise. This gives us 2 x 2 combinations. By merging the two loops that you get (one for each of the two operations) you can find all four algorithms for the symmetric matrix-vector multiplication. From these, you can pick the algorithm that strides through memory in the most favorable way.
1.8.4 Rank-1 update (ger)
The operation that adds an outer product to a matrix is known as a rank-1 update, A := alpha y x^T + A, for reasons that will become clear later. Your library will include the following routine:

SLAP_Ger( alpha, y, x, A )

This routine makes the following assumptions:

- Vectors y and x can be row or column vectors.
- The algorithm you implement should be a small modification of the algorithm in Figure 1.17(right).
- Implement the algorithm with calls to SLAP_Axpy.
- If y is a row vector, you may want to copy it into a column vector to improve how the algorithm strides through memory.
1.8.5 Symmetric Rank-1 update (syr)
We will get back to this operation, later.
1.9 Answers
Exercise 1.3 Work out the probabilities that it will be cloudy/rainy the day after tomorrow.
Exercise 1.5 Follow the instructions for this problem given on the class wiki. For the example described in this section,

1. Recreate the above table by programming it up with Matlab or Octave, starting with the assumption that today is cloudy.
2. Create two similar tables starting with the assumption that today is sunny and rainy, respectively.
3. Compare how x^(7) differs depending on today's weather.
4. What do you notice if you compute x^(k) starting with today being sunny/cloudy/rainy and you let k get large?
5. What does x^(infinity) represent?

Exercise 1.6 Given Table 1.1, create the following table, which predicts the weather the day after tomorrow given the weather today:

                              Today
                        sunny   cloudy   rainy
    Day after   sunny
    Tomorrow    cloudy
                rainy

This then tells us the entries in Q in (1.7).

Exercise 1.8 Let x, y in R^n. Show that vector addition commutes: x + y = y + x.

Exercise 1.9 Let x, y in R^n. Show that x^T y = y^T x.

Exercise 1.11 Start building your library by implementing the functions in Figure 1.6. (See directions on the class wiki page.)
Exercise 1.18 Use mathematical induction to prove that sum_{i=0}^{n-1} i^2 = (n-1)n(2n-1)/6.

Answer: Proof by induction.

Base case: n = 1. For this case, we must show that sum_{i=0}^{1-1} i^2 = (1-1)(1)(2(1)-1)/6.

    sum_{i=0}^{1-1} i^2 = 0            (definition of summation)
                        = (0)(1)(1)/6  (arithmetic).

This proves the base case.

Inductive step: Inductive Hypothesis (IH): Assume that the result is true for n = k where k >= 1:

    sum_{i=0}^{k-1} i^2 = (k-1)k(2k-1)/6.

We will show that the result is then also true for n = k + 1:

    sum_{i=0}^{(k+1)-1} i^2 = ((k+1)-1)(k+1)(2(k+1)-1)/6.

Assume that k >= 1. Then

    sum_{i=0}^{(k+1)-1} i^2
      = sum_{i=0}^{k} i^2                   (arithmetic)
      = sum_{i=0}^{k-1} i^2 + k^2           (split off the last term)
      = (k-1)k(2k-1)/6 + k^2                (Inductive Hypothesis)
      = ((2k^3 - 3k^2 + k) + 6k^2)/6        (algebra)
      = (2k^3 + 3k^2 + k)/6                 (algebra)
      = k(k+1)(2k+1)/6                      (algebra)
      = ((k+1)-1)(k+1)(2(k+1)-1)/6.         (arithmetic)

This proves the inductive step.
By the Principle of Mathematical Induction the result holds for all n.
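A quick M-script sanity check of the identity just proved, for a few values of n:

    for n = [ 1 2 5 10 ]
      lhs = sum( ( 0:n-1 ).^2 );
      rhs = ( n - 1 ) * n * ( 2*n - 1 ) / 6;
      fprintf( 'n = %2d:  lhs = %4d,  rhs = %4d\n', n, lhs, rhs );
    end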
Exercise 1.24 For each of the following, determine whether it is a linear transformation or not:

- F( ( chi_0 ; chi_1 ; chi_2 ) ) = ( chi_0 ; 0 ; chi_2 ).
- F( ( chi_0 ; chi_1 ) ) = ( chi_0^2 ; 0 ).

Answer

- F( ( chi_0 ; chi_1 ; chi_2 ) ) = ( chi_0 ; 0 ; chi_2 ).

First check if F(0) = 0:  F( ( 0 ; 0 ; 0 ) ) = ( 0 ; 0 ; 0 ). So it COULD be a linear transformation.

Next, check if F(alpha x) = alpha F(x). Let x = ( chi_0 ; chi_1 ; chi_2 ) be an arbitrary vector and alpha an arbitrary scalar. Then

    F( alpha ( chi_0 ; chi_1 ; chi_2 ) ) = F( ( alpha chi_0 ; alpha chi_1 ; alpha chi_2 ) ) = ( alpha chi_0 ; 0 ; alpha chi_2 )
                                         = alpha ( chi_0 ; 0 ; chi_2 ) = alpha F( ( chi_0 ; chi_1 ; chi_2 ) ).

Next, check if F(x + y) = F(x) + F(y). Let x = ( chi_0 ; chi_1 ; chi_2 ) and y = ( psi_0 ; psi_1 ; psi_2 ) be arbitrary vectors. Then

    F( ( chi_0 ; chi_1 ; chi_2 ) + ( psi_0 ; psi_1 ; psi_2 ) ) = F( ( chi_0 + psi_0 ; chi_1 + psi_1 ; chi_2 + psi_2 ) )
      = ( chi_0 + psi_0 ; 0 ; chi_2 + psi_2 ) = ( chi_0 ; 0 ; chi_2 ) + ( psi_0 ; 0 ; psi_2 )
      = F( ( chi_0 ; chi_1 ; chi_2 ) ) + F( ( psi_0 ; psi_1 ; psi_2 ) ).

Thus it is a linear transformation.

- F( ( chi_0 ; chi_1 ) ) = ( chi_0^2 ; 0 ).

First check if F(0) = 0:  F( ( 0 ; 0 ) ) = ( 0 ; 0 ). So it COULD be a linear transformation.

Now, looking at it, I suspect this is not a linear transformation because of the chi_0^2. So, I will try to construct an example where F(alpha x) != alpha F(x) or F(x + y) != F(x) + F(y). Let x = ( 1 ; 0 ) and alpha = 2. Then

    F( alpha x ) = F( 2 ( 1 ; 0 ) ) = F( ( 2 ; 0 ) ) = ( 4 ; 0 ).

Also

    2 F( x ) = 2 F( ( 1 ; 0 ) ) = 2 ( 1 ; 0 ) = ( 2 ; 0 ).

Thus, for this choice of alpha and x we find that F(alpha x) != alpha F(x).

Thus it is not a linear transformation.
Exercise 1.26 Let x, e_i in R^n. Show that e_i^T x = x^T e_i = chi_i (the ith element of x).

Answer
Recall that e_i = ( 0 ; ... ; 0 ; 1 ; 0 ; ... ; 0 ), where the 1 occurs in the ith entry, and x = ( chi_0 ; ... ; chi_{i-1} ; chi_i ; chi_{i+1} ; ... ; chi_{n-1} ). Then

    e_i^T x = 0 chi_0 + ... + 0 chi_{i-1} + 1 chi_i + 0 chi_{i+1} + ... + 0 chi_{n-1} = chi_i.

Exercise 1.34 Show that the transformation in Example 1.21 is not a linear transformation by computing a possible matrix that represents it, and then showing that it does not represent it.

Answer
Let F( ( chi ; psi ) ) = ( chi + psi ; chi + 1 ). If F were a linear transformation, then there would be a corresponding matrix, A = [ alpha  beta ; gamma  delta ], such that F( ( chi ; psi ) ) = [ alpha  beta ; gamma  delta ] ( chi ; psi ). This matrix would be computed by computing its columns:

    ( alpha ; gamma ) = F( ( 1 ; 0 ) ) = ( 1 + 0 ; 1 + 1 ) = ( 1 ; 2 )   and   ( beta ; delta ) = F( ( 0 ; 1 ) ) = ( 0 + 1 ; 0 + 1 ) = ( 1 ; 1 ),

so that [ alpha  beta ; gamma  delta ] = [ 1  1 ; 2  1 ]. But

    [ 1  1 ; 2  1 ] ( chi ; psi ) = ( chi + psi ; 2 chi + psi ) != ( chi + psi ; chi + 1 ).

Thus F cannot be a linear transformation. (There is no matrix that has the same action as F.)

Exercise 1.35 Show that the transformation in Example 1.23 is not a linear transformation by computing a possible matrix that represents it, and then showing that it does not represent it.

Answer
Let F denote the transformation from Example 1.23. If F were a linear transformation, then there would be a corresponding matrix, A = [ alpha  beta ; gamma  delta ], such that F( ( chi ; psi ) ) = [ alpha  beta ; gamma  delta ] ( chi ; psi ). This matrix would be computed by computing its columns:

    ( alpha ; gamma ) = F( ( 1 ; 0 ) ) = ( 0 ; 1 )   and   ( beta ; delta ) = F( ( 0 ; 1 ) ) = ( 0 ; 0 ),

so that [ alpha  beta ; gamma  delta ] = [ 0  0 ; 1  0 ]. But

    [ 0  0 ; 1  0 ] ( chi ; psi ) = ( 0 ; chi ) != F( ( chi ; psi ) ).

Thus F cannot be a linear transformation. (There is no matrix that has the same action as F.)
Exercise 1.36 For each of the transformations in Exercise 1.24 compute a possible matrix that represents it and use it to show whether the transformation is linear.

Answer

- Let F( ( chi_0 ; chi_1 ; chi_2 ) ) = ( chi_0 ; 0 ; chi_2 ). If F were a linear transformation, then there would be a corresponding 3 x 3 matrix A such that F(x) = Ax. This matrix would be computed by computing its columns:

    ( alpha_00 ; alpha_10 ; alpha_20 ) = F( ( 1 ; 0 ; 0 ) ) = ( 1 ; 0 ; 0 ),
    ( alpha_01 ; alpha_11 ; alpha_21 ) = F( ( 0 ; 1 ; 0 ) ) = ( 0 ; 0 ; 0 ),
    ( alpha_02 ; alpha_12 ; alpha_22 ) = F( ( 0 ; 0 ; 1 ) ) = ( 0 ; 0 ; 1 ),

so that A = [ 1 0 0 ; 0 0 0 ; 0 0 1 ]. Checking now,

    [ 1 0 0 ; 0 0 0 ; 0 0 1 ] ( chi_0 ; chi_1 ; chi_2 ) = ( chi_0 ; 0 ; chi_2 ) = F( ( chi_0 ; chi_1 ; chi_2 ) ).

Thus there is a matrix that corresponds to F, which is therefore a linear transformation.

- Let F( ( chi_0 ; chi_1 ) ) = ( chi_0^2 ; 0 ). If F were a linear transformation, then there would be a corresponding 2 x 2 matrix A such that F(x) = Ax. Its columns would be

    ( alpha_00 ; alpha_10 ) = F( ( 1 ; 0 ) ) = ( 1 ; 0 )   and   ( alpha_01 ; alpha_11 ) = F( ( 0 ; 1 ) ) = ( 0 ; 0 ),

so that A = [ 1 0 ; 0 0 ]. Checking now,

    [ 1 0 ; 0 0 ] ( chi_0 ; chi_1 ) = ( chi_0 ; 0 ) != F( ( chi_0 ; chi_1 ) ).

Thus it is not a linear transformation.
Exercise 1.39 Let D = [ 2 0 0 ; 0 3 0 ; 0 0 1 ]. What linear transformation, L, does this matrix represent? In particular, answer the following questions:

- L : R^? -> R^??. Give ? and ??.
- A linear transformation can be described by how it transforms the unit basis vectors:
      L(e_0) =        L(e_1) =        L(e_2) =
- L( ( chi_0 ; chi_1 ; chi_2 ) ) =

Answer
L : R^3 -> R^3.

A linear transformation can be described by how it transforms the unit basis vectors:

    L(e_0) = ( 2 ; 0 ; 0 ),   L(e_1) = ( 0 ; 3 ; 0 ),   L(e_2) = ( 0 ; 0 ; 1 ),

so that

    L( ( chi_0 ; chi_1 ; chi_2 ) ) = L( chi_0 e_0 + chi_1 e_1 + chi_2 e_2 ) = chi_0 L(e_0) + chi_1 L(e_1) + chi_2 L(e_2)
      = chi_0 ( 2 ; 0 ; 0 ) + chi_1 ( 0 ; 3 ; 0 ) + chi_2 ( 0 ; 0 ; 1 ) = ( 2 chi_0 ; 3 chi_1 ; chi_2 ).

Thus, the elements of the input vector are scaled by the corresponding elements of the diagonal of the matrix.
Exercise 1.42 Give examples for each of the triangular matrices in Definition 1.41.

Answer

    lower triangular (alpha_{i,j} = 0 for all i < j):                                   [ 1 0 ; 2 3 ]
    strictly lower triangular (alpha_{i,j} = 0 for all i <= j):                         [ 0 0 ; 2 0 ]
    unit lower triangular (alpha_{i,j} = 0 for all i < j and alpha_{i,j} = 1 if i = j): [ 1 0 ; 2 1 ]
    upper triangular (alpha_{i,j} = 0 for all i > j):                                   [ 1 2 ; 0 3 ]
    strictly upper triangular (alpha_{i,j} = 0 for all i >= j):                         [ 0 2 ; 0 0 ]
    unit upper triangular (alpha_{i,j} = 0 for all i > j and alpha_{i,j} = 1 if i = j): [ 1 2 ; 0 1 ]

Exercise 1.43 Show that a matrix that is both lower and upper triangular is in fact a diagonal matrix.

Answer
If a matrix is both upper and lower triangular then alpha_{i,j} = 0 if i != j. Thus the matrix is diagonal.

Exercise 1.44 Add the functions trilu( A ) and triuu( A ) to your SLAP library. These functions return the lower and upper triangular part of A, respectively, with the diagonal set to ones. Thus,
> A = [
1 -2 1
-1 2 0
2 3 3
];
> trilu( A )
ans =
1 0 0
-1 1 0
2 3 1
Hint: use the tril() and eye() functions. You will also want to use the size() function
to extract the dimensions of A, to pass in to eye().
Answer
See the base directory of the SLAP library.
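One possible implementation along the lines of the hint (the official solution lives in the SLAP library): keep the strictly lower triangular part and put ones on the diagonal.

    function B = trilu( A )
      [ m, n ] = size( A );
      B = tril( A, -1 ) + eye( m, n );
    end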
Exercise 1.49 Show that a triangular matrix that is also symmetric is in fact a diagonal matrix.

Answer  Let us focus on a lower triangular matrix. Then alpha_{i,j} = 0 if i < j. But if the matrix is also symmetric then alpha_{i,j} = alpha_{j,i} = 0 if i > j as well. Thus alpha_{i,j} = 0 if i != j, and the matrix is diagonal.
Exercise 1.56 Prove Theorem 2.2.

Answer
We need to show that

- L_C(alpha x) = alpha L_C(x):

    L_C(alpha x) = L_A(alpha x) + L_B(alpha x) = alpha L_A(x) + alpha L_B(x) = alpha L_C(x).

- L_C(x + y) = L_C(x) + L_C(y):

    L_C(x + y) = L_A(x + y) + L_B(x + y) = L_A(x) + L_A(y) + L_B(x) + L_B(y)
               = L_A(x) + L_B(x) + L_A(y) + L_B(y) = L_C(x) + L_C(y).

Exercise 1.58 Note: I have changed this question from before. Give a motivation for matrix addition by considering a linear transformation L_C(x) = L_A(x) + L_B(x).

Answer
Given A, B, C in R^{m x n}, let L_A, L_B, and L_C be the corresponding linear transformations. Define L_C(x) = L_A(x) + L_B(x). Let a_j, b_j, and c_j be the jth columns of A, B, and C, respectively. Then

    c_j = L_C(e_j) = L_A(e_j) + L_B(e_j) = a_j + b_j.

Thus the elements of each of the columns of C equal the addition of the corresponding elements of the columns of A and B.
Exercise 1.67 Modify the algorithms in Figure 1.19 to compute y := Lx, where L is a
lower triangular matrix.
Answer
The answer is in Figure 1.23. It computes y := Lx + y, since the given operation can
then be obtained by starting with y = 0 (the vector of zeroes).
Algorithm: y := Ltrmvmult_unb_var1b(A, x, y)
  Partition A -> ( A_TL  A_TR ; A_BL  A_BR ), x -> ( x_T ; x_B ), y -> ( y_T ; y_B )
    where A_TL is 0 x 0 and x_T, y_T are 0 x 1
  while m(A_TL) < m(A) do
    Repartition as in Figure 1.19, where alpha_11, chi_1, and psi_1 are scalars
    psi_1 := a_10^T x_0 + alpha_11 chi_1 + a_12^T x_2 + psi_1      (here a_12^T x_2 = 0)
    Continue with the exposed parts absorbed back into the Top/Left parts
  endwhile

Algorithm: y := Ltrmvmult_unb_var2b(A, x, y)
  Partition A, x, and y as in variant 1b
  while m(A_TL) < m(A) do
    Repartition as in Figure 1.19, where alpha_11, chi_1, and psi_1 are scalars
    y_0 := chi_1 a_01 + y_0                                        (here chi_1 a_01 = 0)
    psi_1 := chi_1 alpha_11 + psi_1
    y_2 := chi_1 a_21 + y_2
    Continue with the exposed parts absorbed back into the Top/Left parts
  endwhile

Figure 1.23: Answer to Exercise 1.67. Compare and contrast to Figure 1.19. Executing either of these algorithms with A = L will compute y := Lx + y where L is lower triangular. Notice that the algorithm would become even clearer if A, a, and alpha were replaced by L, l, and lambda.

Exercise 1.68 Modify the algorithms in Figure 1.20 so that x is overwritten with x := Ux, without using the vector y.

Answer
The answer is in Figure 1.24. It computes x := Ux.

Some explanation: Let us first consider y := Ux + y. If we partition the matrix and vectors we get

    ( y_0 ; psi_1 ; y_2 ) := ( U_00  u_01  U_02 ; 0  upsilon_11  u_12^T ; 0  0  U_22 ) ( x_0 ; chi_1 ; x_2 ) + ( y_0 ; psi_1 ; y_2 )
      = ( U_00 x_0 + u_01 chi_1 + U_02 x_2 + y_0 ; upsilon_11 chi_1 + u_12^T x_2 + psi_1 ; U_22 x_2 + y_2 ).

Notice that the algorithm in Figure 1.20(left) has the property that when the current iteration starts, the top part of this result already exists in y. The update psi_1 := upsilon_11 chi_1 + u_12^T x_2 + psi_1 then makes it so that one more element of y has been updated.

Now, turn to the computation x := Ux:

    ( x_0 ; chi_1 ; x_2 ) := ( U_00 x_0 + u_01 chi_1 + U_02 x_2 ; upsilon_11 chi_1 + u_12^T x_2 ; U_22 x_2 ).

Notice that to update one more element of x we must compute chi_1 := upsilon_11 chi_1 + u_12^T x_2, which justifies the algorithm in Figure 1.24(left).

Now, let's turn to the algorithm in Figure 1.20(right). That algorithm has the property that when the current iteration starts, y already contains the contributions of the columns to the left of the current one. The updates

    y_0 := chi_1 u_01 + y_0
    psi_1 := upsilon_11 chi_1 + psi_1

then make it so that y also contains the contribution of the current column. Turning again to the computation x := Ux, the computations

    x_0 := chi_1 u_01 + x_0
    chi_1 := upsilon_11 chi_1

make it so that x contains the corresponding partial result, which justifies the algorithm in Figure 1.24(right).

Algorithm: x := Trmv_un_unb_var1b(U, x)
  Partition U -> ( U_TL  U_TR ; 0  U_BR ), x -> ( x_T ; x_B )
    where U_TL is 0 x 0 and x_T is 0 x 1
  while m(U_TL) < m(U) do
    Repartition so that upsilon_11 and chi_1 are exposed as scalars
    chi_1 := upsilon_11 chi_1 + u_12^T x_2
    Continue with the exposed parts absorbed back into the Top/Left parts
  endwhile

Algorithm: x := Trmv_un_unb_var2(U, x)
  Partition U and x as in variant 1b
  while m(U_TL) < m(U) do
    Repartition so that upsilon_11 and chi_1 are exposed as scalars
    x_0 := chi_1 u_01 + x_0
    chi_1 := upsilon_11 chi_1
    Continue with the exposed parts absorbed back into the Top/Left parts
  endwhile

Figure 1.24: Answer to Exercise 1.68.
Exercise 1.69 Reason why the algorithm you developed for Exercise 1.67 cannot be trivially
changed so that x := Lx without requiring y. What is the solution?
Answer
Let us focus on the algorithm on the left in Figure 1.23. Consider the update

    psi_1 := a_10^T x_0 + alpha_11 chi_1 + a_12^T x_2 + psi_1      (where a_12^T x_2 = 0)

and reason what would happen if we blindly changed this to

    chi_1 := a_10^T x_0 + alpha_11 chi_1.

Then x_0 on the right of the := refers to the original contents of subvector x_0, but those have been overwritten by previous iterations. The solution is to make this change, but to then move through the matrix backwards.

(This is not an issue for the case where we are working with an upper triangular matrix, because there the computation accidentally moves in just the right direction.)

You may want to clarify this to yourself by working through a small (3 x 3) example.

Exercise 1.70 Develop algorithms for computing x := U^T x and x := L^T x, where U and L are respectively upper triangular and lower triangular, without explicitly transposing matrices U and L.

Answer
Let us focus on computing x := U^T x. Compare

    ( U_00  u_01  U_02 ; 0  upsilon_11  u_12^T ; 0  0  U_22 )^T = ( U_00^T  0  0 ; u_01^T  upsilon_11  0 ; U_02^T  u_12  U_22^T )

to

    ( L_00  0  0 ; l_10^T  lambda_11  0 ; L_20  l_21  L_22 ).

Notice that computing x := U^T x is the same as computing x := Lx except that you use U_00^T for L_00, u_01^T for l_10^T, etc. Thus, you need to make the obvious changes to the algorithms you developed for x := Lx.

Ditto for x := L^T x, except that you modify the algorithms you developed for x := Ux.

Exercise 1.71 Compute the cost, in flops, of the algorithm in Figure 1.20(right).

Answer
Let us analyze the algorithm in Figure 1.20(right). The cost is in the updates y_0 := chi_1 u_01 + y_0 and psi_1 := chi_1 upsilon_11 + psi_1. Now, during the first iteration, u_01 and y_0 are of length 0, so that that iteration requires 2 flops for the second step only. During the ith iteration (starting with i = 0), u_01 and y_0 are of length i, so that the cost of that iteration is 2i flops for the first step (an axpy operation) and 2 flops for the second. Thus, if U is an n x n matrix, then the total cost is given by

    sum_{i=0}^{n-1} [ 2 + 2i ] = 2n + 2 sum_{i=0}^{n-1} i = 2n + n(n - 1) = n^2 + n  (approximately n^2) flops.

Thus the cost of the two different algorithms is the same! This is not surprising: to get from one algorithm to the other, all you are doing is reordering the same operations.
Exercise 1.74 Modify the algorithms in Figure 1.19 to compute y := Ax + y, where A is symmetric and stored in the lower triangular part of the matrix.

Answer
Notice that if A is symmetric, then

    ( A_00  a_01  A_02 ; a_10^T  alpha_11  a_12^T ; A_20  a_21  A_22 )^T = ( A_00  a_01  A_02 ; a_10^T  alpha_11  a_12^T ; A_20  a_21  A_22 ),

which means that

    ( A_00^T  a_10  A_20^T ; a_01^T  alpha_11  a_21^T ; A_02^T  a_12  A_22^T ) = ( A_00  a_01  A_02 ; a_10^T  alpha_11  a_12^T ; A_20  a_21  A_22 ),

so that A_00^T = A_00, a_10 = a_01, A_20^T = A_02, a_21 = a_12, and A_22^T = A_22.

Thus, if only the lower triangular part is stored, we can take advantage of the equalities a_01 = a_10 and a_12 = a_21 in Figure 1.19, replacing in the left algorithm

    psi_1 := a_10^T x_0 + alpha_11 chi_1 + a_12^T x_2 + psi_1   by   psi_1 := a_10^T x_0 + alpha_11 chi_1 + a_21^T x_2 + psi_1,

and in the right algorithm

    y_0 := chi_1 a_01 + y_0,  psi_1 := chi_1 alpha_11 + psi_1,  y_2 := chi_1 a_21 + y_2

by

    y_0 := chi_1 a_10 + y_0,  psi_1 := chi_1 alpha_11 + psi_1,  y_2 := chi_1 a_21 + y_2.
Exercise 1.77 Follow the directions on the wiki to implement the above routines for computing the general matrix-vector multiply.

Exercise 1.78 Follow the directions on the wiki to implement the above routines for computing the triangular matrix-vector multiply.

Exercise 1.79 Follow the directions on the wiki to implement the above routines for computing the symmetric matrix-vector multiply.

Exercise 1.80 When you try to pick from the algorithms in Figure 1.21, you will notice that neither algorithm has the property that almost all computations involve data that is stored consecutively in memory. There are actually two more algorithms for computing this operation.

The observation is that y := Ax + y, where A is symmetric and stored in the upper triangular part of the array, can be computed via y := Ux + y followed by y := U~^T x + y, where U and U~ are the upper triangular and strictly upper triangular parts of A. Now, for the first there are two algorithms given in the text, and the other one is an exercise. This gives us 2 x 2 combinations. By merging the two loops that you get (one for each of the two operations) you can find all four algorithms for the symmetric matrix-vector multiplication. From these, you can pick the algorithm that strides through memory in the most favorable way.
Chapter 2
Matrix-Matrix Multiplication
In this chapter, we discuss matrix-matrix multiplication. We start by motivating its definition. Next, we discuss why its implementation inherently allows high performance.
2.1 Motivating Example: Rotations
Consider the transformation R : R^2 -> R^2 that, given a vector x in R^2, returns y = R(x), where y is the vector x but rotated through an angle theta.

Example 2.1 The rotation R so defined is a linear transformation.

- R(alpha x) = alpha R(x): One can first scale x by a factor alpha and then rotate it through an angle theta, or one can first rotate x through an angle theta and then scale the result by a factor alpha.
- R(x + y) = R(x) + R(y): One can first add the vectors and then rotate, or one can rotate each of them first, and then add the results.

We need a picture to accompany this!

Now, let's see what the matrix is that represents R. We need to determine how e_0 = ( 1 ; 0 ) and e_1 = ( 0 ; 1 ) are transformed by the transformation R. Simple trigonometry tells us that

    R( ( 1 ; 0 ) ) = ( cos theta ; sin theta )   and   R( ( 0 ; 1 ) ) = ( -sin theta ; cos theta ).

We need a picture to accompany this! Thus,

    R(x) = Ax   where   A = [ cos theta  -sin theta ; sin theta  cos theta ].

Next, let us consider the transformation S : R^2 -> R^2 that rotates vectors through an angle phi. Then

    S(x) = Bx   where   B = [ cos phi  -sin phi ; sin phi  cos phi ].

Finally, consider the transformation T that rotates vectors through an angle theta + phi so that

    T(x) = Cx   where   C = [ cos(theta+phi)  -sin(theta+phi) ; sin(theta+phi)  cos(theta+phi) ].

We notice that T(x) = R(S(x)): rotating a vector through angle theta + phi can be accomplished by first rotating that vector through angle phi and then rotating the result through angle theta. Thus, Cx = A(Bx). The question is, how are A, B, and C related when Cx = A(Bx)?

Recall that the jth column of matrix C, c_j, equals Ce_j, so that

    c_0 = Ce_0 = A(Be_0) = Ab_0 = [ cos theta  -sin theta ; sin theta  cos theta ] ( cos phi ; sin phi )
        = ( cos theta cos phi - sin theta sin phi ; sin theta cos phi + cos theta sin phi )

    c_1 = Ce_1 = A(Be_1) = Ab_1 = [ cos theta  -sin theta ; sin theta  cos theta ] ( -sin phi ; cos phi )
        = ( -(sin theta cos phi + cos theta sin phi) ; cos theta cos phi - sin theta sin phi ),

so that

    C = [ cos(theta+phi)  -sin(theta+phi) ; sin(theta+phi)  cos(theta+phi) ]
      = [ cos theta cos phi - sin theta sin phi   -(sin theta cos phi + cos theta sin phi) ;
          sin theta cos phi + cos theta sin phi    cos theta cos phi - sin theta sin phi ].        (2.1)

This is consistent with what we learned in high school: from trigonometry we remember that

    cos(theta + phi) = cos theta cos phi - sin theta sin phi
    sin(theta + phi) = sin theta cos phi + cos theta sin phi.

What we will see next, in more generality, is that C is the matrix product of A and B: Cx = A(Bx) = (AB)x. In other words, C is defined to be the matrix that represents the composition of transformations R and S, which is then written as C = AB.
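A quick M-script check of (2.1): composing the two rotations equals the rotation through the sum of the angles. The angle names theta and phi follow the discussion above.

    theta = 0.3;  phi = 0.7;
    A = [ cos(theta) -sin(theta); sin(theta) cos(theta) ];
    B = [ cos(phi)   -sin(phi);   sin(phi)   cos(phi)   ];
    C = [ cos(theta+phi) -sin(theta+phi); sin(theta+phi) cos(theta+phi) ];
    disp( norm( C - A*B ) );   % should be (close to) zero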
2.2 Composing Linear Transformations
Theorem 2.2 Let L_A : R^k -> R^m and L_B : R^n -> R^k both be linear transformations and, for all x in R^n, define the function L_C : R^n -> R^m by L_C(x) = L_A(L_B(x)). Then L_C is a linear transformation.

Proof: Let x, y in R^n and alpha, beta in R. Then

    L_C(alpha x + beta y) = L_A(L_B(alpha x + beta y)) = L_A(alpha L_B(x) + beta L_B(y))
                          = alpha L_A(L_B(x)) + beta L_A(L_B(y)) = alpha L_C(x) + beta L_C(y).

Now, let linear transformations L_A, L_B, and L_C be represented by matrices A in R^{m x k}, B in R^{k x n}, and C in R^{m x n}, respectively. Then Cx = L_C(x) = L_A(L_B(x)) = A(Bx). The matrix-matrix multiplication (product) is defined so that C = AB. It composes A and B into C just like L_C is the composition of L_A and L_B.

Remark 2.3 If A is an m_A x n_A matrix, B is an m_B x n_B matrix, and C is an m_C x n_C matrix, then for C = AB to hold it must be the case that m_C = m_A, n_C = n_B, and n_A = m_B.
The question now becomes how to compute C given matrices A and B. For this, we are going to use and abuse the unit basis vectors e_j.

Let C in R^{m x n}, A in R^{m x k}, and B in R^{k x n} have entries gamma_{i,j}, alpha_{i,p}, and beta_{p,j}, respectively.

Recall that gamma_{i,j} = e_i^T (C e_j): C e_j picks out the jth column of C and e_i^T (C e_j) then picks out the ith element of that column. Recall also that C = AB is defined to be the matrix such that Cx = A(Bx) for all x. Then gamma_{i,j} = e_i^T (C e_j) = e_i^T (A(B e_j)). The result of B e_j is the jth column of B, b_j. Thus, e_i^T (A(B e_j)) = e_i^T (A b_j), which equals the ith element of the vector A b_j. By the definition of matrix-vector multiplication, the ith element of the vector A b_j is given by the dot product of the ith row of A (viewed as a column vector) with the vector b_j. Now, the ith row of A and the jth column of B are respectively given by

    ( alpha_{i,0}  alpha_{i,1}  ...  alpha_{i,k-1} )   and   ( beta_{0,j} ; beta_{1,j} ; ... ; beta_{k-1,j} ),

so that

    gamma_{i,j} = e_i^T (A(B e_j)) = alpha_{i,0} beta_{0,j} + alpha_{i,1} beta_{1,j} + ... + alpha_{i,k-1} beta_{k-1,j}
                = sum_{p=0}^{k-1} alpha_{i,p} beta_{p,j}.

Definition 2.4 (Matrix-matrix multiplication) Let A in R^{m x k}, B in R^{k x n}, and C in R^{m x n}. Then the matrix-matrix multiplication (product) C = AB is computed by setting

    gamma_{i,j} = sum_{p=0}^{k-1} alpha_{i,p} beta_{p,j} = alpha_{i,0} beta_{0,j} + alpha_{i,1} beta_{1,j} + ... + alpha_{i,k-1} beta_{k-1,j}.

As a result of this definition, Cx = A(Bx) = (AB)x and we can drop the parentheses, unless they are useful for clarity: Cx = ABx.

Example 2.5 In (1.7) and Exercise 1.6, notice that

    Q = P P = [ 0.4 0.3 0.1 ; 0.4 0.3 0.6 ; 0.2 0.4 0.3 ] [ 0.4 0.3 0.1 ; 0.4 0.3 0.6 ; 0.2 0.4 0.3 ]
            = [ 0.30 0.25 0.25 ; 0.40 0.45 0.40 ; 0.30 0.30 0.35 ].

Example 2.6 In (2.1) we notice that

    C = [ cos(theta+phi)  -sin(theta+phi) ; sin(theta+phi)  cos(theta+phi) ]
      = [ cos theta  -sin theta ; sin theta  cos theta ] [ cos phi  -sin phi ; sin phi  cos phi ]
      = [ cos theta cos phi - sin theta sin phi   -(sin theta cos phi + cos theta sin phi) ;
          sin theta cos phi + cos theta sin phi    cos theta cos phi - sin theta sin phi ].

Remark 2.7 We emphasize that for matrix-matrix multiplication to be a legal operation, the row and column dimensions of the matrices must obey certain constraints. Whenever we talk about dimensions being conformal, we mean that the dimensions are such that the encountered matrix multiplications are valid operations.
The following triple-nested loops compute C := AB +C:
    for j = 0, ..., n-1
      for i = 0, ..., m-1
        for p = 0, ..., k-1
          gamma_{i,j} := alpha_{i,p} beta_{p,j} + gamma_{i,j}
        endfor                             }  gamma_{i,j} := sum_{p=0}^{k-1} alpha_{i,p} beta_{p,j} + gamma_{i,j}
      endfor
    endfor

    for j=1:n
      for i=1:m
        for p=1:k
          C(i,j) += A(i,p) * B(p,j);
        end
      end
    end

Figure 2.1: Simple triple-nested loop for computing C := AB + C.

    for j = 0, ..., n-1
      for i = 0, ..., m-1
        for p = 0, ..., k-1
          gamma_{i,j} := alpha_{i,p} beta_{p,j} + gamma_{i,j}
        endfor                             }  gamma_{i,j} := a_i^T b_j + gamma_{i,j}
      endfor
    endfor

    for j=1:n
      for i=1:m
        C(i,j) += SLAP_Dot( A(i,:), B(:,j) );
      end
    end

Figure 2.2: Algorithm and M-script from Figure 2.1 with the inner loop replaced by a dot product.

    for j = 0, ..., n-1
      for i = 0, ..., m-1
        for p = 0, ..., k-1
          gamma_{i,j} := alpha_{i,p} beta_{p,j} + gamma_{i,j}
        endfor
      endfor                               }  c_j := A b_j + c_j
    endfor

    for j=1:n
      C(:,j) = SLAP_Gemv( SLAP_NO_TRANSPOSE, ...
                          1, A, B(:,j), 1, C(:,j) );
    end

Figure 2.3: Algorithm and M-script from Figure 2.1 with the inner two loops replaced by a matrix-vector multiplication.

    for i = 0, ..., m-1
      for j = 0, ..., n-1
        for p = 0, ..., k-1
          gamma_{i,j} := alpha_{i,p} beta_{p,j} + gamma_{i,j}
        endfor
      endfor
    endfor

The two outer-most loops sweep over all elements in C, and the inner loop computes the inner product of the ith row of A with the jth column of B. If originally C = 0, then the above algorithm computes C := AB.

Remark 2.8 When computing C = AB + C the three loops can be nested in 3! = 6 different ways. We will examine this more, later.

In Figures 2.1-2.3 we show how, for an ordering that places the loop indexed by j first, the triple-nested loop can be viewed as a double-nested loop with dot products as its body (which then implements the inner loop), or as a single loop with matrix-vector multiplications as its body.
2.3 Special Cases of Matrix-Matrix Multiplication
We now show that if one treats scalars, column vectors, and row vectors as special cases of matrices, then many operations we encountered previously become simply special cases of matrix-matrix multiplication. In the discussion below, consider C = AB where C in R^{m x n}, A in R^{m x k}, and B in R^{k x n}.

m = n = k = 1 (scalar multiplication)  In this case, all three matrices are actually scalars:

    ( gamma_{0,0} ) = ( alpha_{0,0} ) ( beta_{0,0} ) = ( alpha_{0,0} beta_{0,0} ),

so that matrix-matrix multiplication becomes scalar multiplication.

Example 2.9 Let A = ( 4 ) and B = ( 3 ). Then AB = ( 4 * 3 ) = ( 12 ).

n = 1, k = 1 (scal)  Now the matrices look like

    ( gamma_{0,0} ; gamma_{1,0} ; ... ; gamma_{m-1,0} ) = ( alpha_{0,0} ; alpha_{1,0} ; ... ; alpha_{m-1,0} ) ( beta_{0,0} )
      = ( alpha_{0,0} beta_{0,0} ; alpha_{1,0} beta_{0,0} ; ... ; alpha_{m-1,0} beta_{0,0} )
      = beta_{0,0} ( alpha_{0,0} ; alpha_{1,0} ; ... ; alpha_{m-1,0} ).

In other words, C and A are (column) vectors, B is a scalar, and the matrix-matrix multiplication becomes the scaling of a vector.

Example 2.10 Let A = ( 1 ; -3 ; 2 ) and B = ( 4 ). Then

    AB = ( 1 ; -3 ; 2 ) ( 4 ) = 4 ( 1 ; -3 ; 2 ) = ( 4*1 ; 4*(-3) ; 4*2 ) = ( 4 ; -12 ; 8 ).

m = 1, k = 1 (scal)  Now the matrices look like

    ( gamma_{0,0}  gamma_{0,1}  ...  gamma_{0,n-1} ) = ( alpha_{0,0} ) ( beta_{0,0}  beta_{0,1}  ...  beta_{0,n-1} )
      = alpha_{0,0} ( beta_{0,0}  beta_{0,1}  ...  beta_{0,n-1} )
      = ( alpha_{0,0} beta_{0,0}  alpha_{0,0} beta_{0,1}  ...  alpha_{0,0} beta_{0,n-1} ).

In other words, C and B are just row vectors and A is a scalar. The vector C is computed by scaling the row vector B by the scalar A.

Example 2.11 Let A = ( 4 ) and B = ( 1  -3  2 ). Then

    AB = ( 4 ) ( 1  -3  2 ) = ( 4*1  4*(-3)  4*2 ) = ( 4  -12  8 ).

m = 1, n = 1 (dot)  The matrices look like

    ( gamma_{0,0} ) = ( alpha_{0,0}  alpha_{0,1}  ...  alpha_{0,k-1} ) ( beta_{0,0} ; beta_{1,0} ; ... ; beta_{k-1,0} )
                    = sum_{p=0}^{k-1} alpha_{0,p} beta_{p,0}.

In other words, C is a scalar that is computed by taking the dot product of the one row that is A and the one column that is B.

Example 2.12 Let A = ( 1  -3  2 ) and B = ( 2 ; -1 ; 0 ). Then

    AB = ( 1  -3  2 ) ( 2 ; -1 ; 0 ) = 1*2 + (-3)*(-1) + 2*0 = 2 + 3 + 0 = 5.

k = 1 (outer product)

    [ gamma_{0,0} ... gamma_{0,n-1} ; ... ; gamma_{m-1,0} ... gamma_{m-1,n-1} ]
      = ( alpha_{0,0} ; ... ; alpha_{m-1,0} ) ( beta_{0,0}  ...  beta_{0,n-1} )
      = [ alpha_{0,0} beta_{0,0}  ...  alpha_{0,0} beta_{0,n-1} ; ... ; alpha_{m-1,0} beta_{0,0}  ...  alpha_{m-1,0} beta_{0,n-1} ].

Example 2.13 Let A = ( 1 ; -3 ; 2 ) and B = ( -1  -2 ). Then

    AB = ( 1 ; -3 ; 2 ) ( -1  -2 ) = [ 1*(-1)  1*(-2) ; (-3)*(-1)  (-3)*(-2) ; 2*(-1)  2*(-2) ] = [ -1  -2 ; 3  6 ; -2  -4 ].

n = 1 (matrix-vector product)

    ( gamma_{0,0} ; gamma_{1,0} ; ... ; gamma_{m-1,0} )
      = [ alpha_{0,0} ... alpha_{0,k-1} ; ... ; alpha_{m-1,0} ... alpha_{m-1,k-1} ] ( beta_{0,0} ; beta_{1,0} ; ... ; beta_{k-1,0} ).

m = 1 (row vector-matrix product)

    ( gamma_{0,0}  gamma_{0,1}  ...  gamma_{0,n-1} )
      = ( alpha_{0,0}  alpha_{0,1}  ...  alpha_{0,k-1} ) [ beta_{0,0} ... beta_{0,n-1} ; ... ; beta_{k-1,0} ... beta_{k-1,n-1} ],

so that gamma_{0,j} = sum_{p=0}^{k-1} alpha_{0,p} beta_{p,j}.

Example 2.14 Let A = ( 0  1  0 ) and B = [ 1  2  2 ; 4  2  0 ; 1  2  3 ]. Then AB = ( 4  2  0 ).

Exercise 2.15 Let e_i in R^m equal the ith unit basis vector and A in R^{m x n}. Show that e_i^T A = a_i^T, the ith row of A.
2.4 Properties of Matrix-Matrix Multiplication

Let us examine some properties of matrix-matrix multiplication:

Theorem 2.16 Let A, B, and C be matrices of conforming dimensions. Then (AB)C = A(BC). In other words, matrix multiplication is associative.

Proof: Let e_j equal the jth unit basis vector. Then

    ((AB)C) e_j = (AB)(C e_j) = (AB) c_j = A(B c_j) = A(B C e_j) = (A(BC)) e_j.

Thus, the columns of (AB)C equal the columns of A(BC), making the two matrices equal.

Theorem 2.17 Let A, B, and C be matrices of conforming dimensions. Then (A + B)C = AC + BC and A(B + C) = AB + AC. In other words, matrix multiplication is distributive.

Remark 2.18 Matrix-matrix multiplication does not commute: only in very rare cases does AB equal BA. Indeed, the matrix dimensions may not even be conformal.

Theorem 2.19 Let A in R^{m x k} and B in R^{k x n}. Then (AB)^T = B^T A^T.

Before proving this theorem, we first note that

Lemma 2.20 Let A in R^{m x n}. Then e_i^T A^T = (A e_i)^T and A^T e_j = (e_j^T A)^T.

The proof of this lemma is pretty obvious: the ith row of A^T is clearly the ith column of A, but viewed as a row, etc.

Proof: (of Theorem 2.19) We prove that the (i, j) element of (AB)^T equals the (i, j) element of B^T A^T:

    e_i^T (AB)^T e_j
      = e_j^T (AB) e_i            < the (i, j) element of C equals the (j, i) element of C^T >
      = (e_j^T A)(B e_i)          < associativity of matrix multiplication >
      = (B e_i)^T (e_j^T A)^T     < x^T y = y^T x >
      = e_i^T (B^T A^T) e_j.      < Lemma 2.20 >
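A quick M-script check of Theorem 2.19 on small random matrices:

    A = rand( 3, 4 );
    B = rand( 4, 2 );
    disp( norm( (A*B)' - B'*A' ) );   % should be (close to) zero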
2.5 Multiplying partitioned matrices
Theorem 2.21 Let C in R^{m x n}, A in R^{m x k}, and B in R^{k x n}. Let

    m = m_0 + m_1 + ... + m_{M-1}, with m_i >= 0 for i = 0, ..., M-1;
    n = n_0 + n_1 + ... + n_{N-1}, with n_j >= 0 for j = 0, ..., N-1; and
    k = k_0 + k_1 + ... + k_{K-1}, with k_p >= 0 for p = 0, ..., K-1.

Partition C into an M x N grid of blocks C_{i,j}, A into an M x K grid of blocks A_{i,p}, and B into a K x N grid of blocks B_{p,j}, with C_{i,j} in R^{m_i x n_j}, A_{i,p} in R^{m_i x k_p}, and B_{p,j} in R^{k_p x n_j}. Then

    C_{i,j} = sum_{p=0}^{K-1} A_{i,p} B_{p,j}.

Remark 2.22 If one partitions matrices C, A, and B into blocks, and one makes sure the dimensions match up, then blocked matrix-matrix multiplication proceeds exactly as does a regular matrix-matrix multiplication, except that individual multiplications of scalars commute while (in general) individual multiplications with matrix blocks (submatrices) do not.
Example 2.23 Consider

    A = [ 1 2 4 1 ; 1 0 1 2 ; 2 1 3 1 ; 1 2 3 4 ],  B = [ 2 2 3 ; 0 1 1 ; 2 1 0 ; 4 0 1 ],  and  AB = [ 2 4 2 ; 8 3 5 ; 6 0 4 ; 8 1 1 ].

If A is split into its first two and last two columns and B into its first two and last two rows,

    A_0 = [ 1 2 ; 1 0 ; 2 1 ; 1 2 ],  A_1 = [ 4 1 ; 1 2 ; 3 1 ; 3 4 ],  B_0 = [ 2 2 3 ; 0 1 1 ],  B_1 = [ 2 1 0 ; 4 0 1 ],

then

    AB = ( A_0 | A_1 ) ( B_0 ; B_1 ) = A_0 B_0 + A_1 B_1,

with A_0 B_0 = [ 2 0 1 ; 2 2 3 ; 4 3 5 ; 2 4 5 ] and A_1 B_1 = [ 4 4 1 ; 6 1 2 ; 2 3 1 ; 10 3 4 ].
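A quick M-script check of the partitioning used in Example 2.23: splitting A by columns and B by rows gives AB = A0*B0 + A1*B1. Random matrices are used here, so the numbers differ from the example above.

    A = rand( 4, 4 );  B = rand( 4, 3 );
    A0 = A( :, 1:2 );  A1 = A( :, 3:4 );
    B0 = B( 1:2, : );  B1 = B( 3:4, : );
    disp( norm( A*B - ( A0*B0 + A1*B1 ) ) );   % should be (close to) zero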
Corollary 2.24 In Theorem 2.21 partition C and B by columns and do not partition A. In other words, let M = 1, m_0 = m; N = n, n_j = 1 for j = 0, ..., n-1; and K = 1, k_0 = k. Then

    C = ( c_0 | c_1 | ... | c_{n-1} )   and   B = ( b_0 | b_1 | ... | b_{n-1} ),

so that

    ( c_0 | c_1 | ... | c_{n-1} ) = C = AB = A ( b_0 | b_1 | ... | b_{n-1} ) = ( Ab_0 | Ab_1 | ... | Ab_{n-1} ).

Example 2.25

    [ 1 2 4 ; 1 0 1 ; 2 1 3 ] [ 2 2 ; 0 1 ; 2 1 ]
      = ( [ 1 2 4 ; 1 0 1 ; 2 1 3 ] ( 2 ; 0 ; 2 )  |  [ 1 2 4 ; 1 0 1 ; 2 1 3 ] ( 2 ; 1 ; 1 ) )
      = [ 6 4 ; 0 3 ; 10 0 ].

By moving the loop indexed by j to the outside in the algorithm for computing C = AB + C we observe that

    for j = 0, ..., n-1
      for i = 0, ..., m-1
        for p = 0, ..., k-1
          gamma_{i,j} := alpha_{i,p} beta_{p,j} + gamma_{i,j}
        endfor
      endfor                               }  c_j := A b_j + c_j
    endfor

or

    for j = 0, ..., n-1
      for p = 0, ..., k-1
        for i = 0, ..., m-1
          gamma_{i,j} := alpha_{i,p} beta_{p,j} + gamma_{i,j}
        endfor
      endfor                               }  c_j := A b_j + c_j
    endfor
2.5. Multiplying partitioned matrices 83
Corollary 2.26 In Theorem 2.21 partition C and A by rows and do not partition B. In
other words, let M = m, m
i
= i, i = 0, . . . , m 1; N = 1, n
0
= n; and K = 1, k
0
= k.
Then
C =
_
_
_
_
_
c
T
0
c
T
1
.
.
.
c
T
m1
_
_
_
_
_
and A =
_
_
_
_
_
a
T
0
a
T
1
.
.
.
a
T
m1
_
_
_
_
_
so that
_
_
_
_
_
c
T
0
c
T
1
.
.
.
c
T
m1
_
_
_
_
_
= C = AB =
_
_
_
_
_
a
T
0
a
T
1
.
.
.
a
T
m1
_
_
_
_
_
B =
_
_
_
_
_
a
T
0
B
a
T
1
B
.
.
.
a
T
m1
B
_
_
_
_
_
.
Example 2.27 Partitioning the left-hand matrix by rows, each row of the product is a row
vector times the matrix:

    [ 1 2 4   [ 2 2       [ ( 1 2 4 ) B       [ 6  4
      1 0 1     0 1   =     ( 1 0 1 ) B   =     0  3             where B = [ 2 2 ; 0 1 ; 2 1 ].
      2 1 3 ]   2 1 ]       ( 2 1 3 ) B ]       10 0 ],
In the algorithm for computing C := AB + C the loop indexed by i can be moved to the
outside so that

    for i = 0, . . . , m - 1
      for j = 0, . . . , n - 1
        for p = 0, . . . , k - 1
          γ_{i,j} := α_{i,p} β_{p,j} + γ_{i,j}
        endfor
      endfor                                }  c_i^T := a_i^T B + c_i^T
    endfor

or

    for i = 0, . . . , m - 1
      for p = 0, . . . , k - 1
        for j = 0, . . . , n - 1
          γ_{i,j} := α_{i,p} β_{p,j} + γ_{i,j}
        endfor
      endfor                                }  c_i^T := a_i^T B + c_i^T
    endfor
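The row-by-row formulation has an equally direct M-script sketch (again ours, for illustration, using the built-in * for the per-row product):

    % Sketch: C := A * B + C, one row at a time (Corollary 2.26).
    function C = Gemm_by_rows( A, B, C )
      m = size( A, 1 );
      for i = 1:m
        % c_i^T := a_i^T B + c_i^T  (a row vector times a matrix per row)
        C( i, : ) = A( i, : ) * B + C( i, : );
      end
    end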
Corollary 2.28 In Theorem 2.21 partition A and B by columns and rows, respectively,
and do not partition C. In other words, let M = 1, m_0 = m; N = 1, n_0 = n; and K = k,
k_p = 1 for p = 0, . . . , k - 1. Then

    A = [ a_0 | a_1 | ··· | a_{k-1} ]   and   B = [ b̃_0^T
                                                    b̃_1^T
                                                     ...
                                                    b̃_{k-1}^T ]

so that

    C = AB = [ a_0 | a_1 | ··· | a_{k-1} ] [ b̃_0^T
                                             b̃_1^T
                                              ...
                                             b̃_{k-1}^T ]

           = a_0 b̃_0^T + a_1 b̃_1^T + ··· + a_{k-1} b̃_{k-1}^T.
Example 2.29

    [ 1 2 4   [ 2 2       ( 1               ( 2               ( 4
      1 0 1     0 1   =     1  ( 2 2 )  +     0  ( 0 1 )  +     1  ( 2 1 )
      2 1 3 ]   2 1 ]       2 )               1 )               3 )

                      =   [ 2 2       [ 0 2       [ 8 4       [ 6  4
                            2 2   +     0 0   +     2 1   =     0  3
                            4 4 ]       0 1 ]       6 3 ]       10 0 ].
In the algorithm for computing C := AB + C the loop indexed by p can be moved to the
outside so that

    for p = 0, . . . , k - 1
      for j = 0, . . . , n - 1
        for i = 0, . . . , m - 1
          γ_{i,j} := α_{i,p} β_{p,j} + γ_{i,j}
        endfor
      endfor                                }  C := a_p b̃_p^T + C
    endfor

or

    for p = 0, . . . , k - 1
      for i = 0, . . . , m - 1
        for j = 0, . . . , n - 1
          γ_{i,j} := α_{i,p} β_{p,j} + γ_{i,j}
        endfor
      endfor                                }  C := a_p b̃_p^T + C
    endfor
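The rank-1-update formulation can be sketched in M-script as well (ours, for illustration; the outer product a_p b̃_p^T is written with the built-in * operator):

    % Sketch: C := A * B + C as a sum of rank-1 updates (Corollary 2.28).
    function C = Gemm_by_rank1( A, B, C )
      k = size( A, 2 );
      for p = 1:k
        % C := a_p * b~_p^T + C  (an outer product, i.e., rank-1 update, per step)
        C = A( :, p ) * B( p, : ) + C;
      end
    end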
Example 2.30 In Theorem 2.21 partition C into elements (scalars), A by rows, and B by
columns. In other words, let M = m, m_i = 1 for i = 0, . . . , m - 1; N = n, n_j = 1 for
j = 0, . . . , n - 1; and K = 1, k_0 = k. Then

    C = [ γ_{0,0}     γ_{0,1}     ···  γ_{0,n-1}         A = [ a_0^T          and   B = [ b_0 | b_1 | ··· | b_{n-1} ],
          γ_{1,0}     γ_{1,1}     ···  γ_{1,n-1}               a_1^T
           ...                          ...                     ...
          γ_{m-1,0}   γ_{m-1,1}   ···  γ_{m-1,n-1} ],          a_{m-1}^T ],

so that

    C = AB = [ a_0^T b_0       a_0^T b_1       ···  a_0^T b_{n-1}
               a_1^T b_0       a_1^T b_1       ···  a_1^T b_{n-1}
                ...                                  ...
               a_{m-1}^T b_0   a_{m-1}^T b_1   ···  a_{m-1}^T b_{n-1} ].

As expected, γ_{i,j} = a_i^T b_j: the dot product of the ith row of A with the jth column of B.
Example 2.31

    [ 1 2 4   [ 2 2       [ (1 2 4)·(2 0 2)^T   (1 2 4)·(2 1 1)^T       [ 6  4
      1 0 1     0 1   =     (1 0 1)·(2 0 2)^T   (1 0 1)·(2 1 1)^T   =     0  3
      2 1 3 ]   2 1 ]       (2 1 3)·(2 0 2)^T   (2 1 3)·(2 1 1)^T ]       10 0 ],

where each entry is the dot product of a row of the first matrix with a column of the second.
In the algorithm for computing C := AB + C the loop indexed by p (which computes the
dot product of the ith row of A with the jth column of B) can be moved to the inside so
that

    for j = 0, . . . , n - 1
      for i = 0, . . . , m - 1
        for p = 0, . . . , k - 1
          γ_{i,j} := α_{i,p} β_{p,j} + γ_{i,j}
        endfor                              }  γ_{i,j} := a_i^T b_j + γ_{i,j}
      endfor
    endfor

or

    for i = 0, . . . , m - 1
      for j = 0, . . . , n - 1
        for p = 0, . . . , k - 1
          γ_{i,j} := α_{i,p} β_{p,j} + γ_{i,j}
        endfor                              }  γ_{i,j} := a_i^T b_j + γ_{i,j}
      endfor
    endfor

Notice that the algorithm on the left already appeared after Example ?? while the one on
the right appeared after Example 2.27.
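For completeness, the element-by-element (dot-product) formulation can also be sketched in M-script (ours, for illustration):

    % Sketch: C := A * B + C, one element at a time via dot products (Example 2.30).
    function C = Gemm_by_dots( A, B, C )
      [ m, n ] = size( C );
      for i = 1:m
        for j = 1:n
          % gamma_{i,j} := a_i^T b_j + gamma_{i,j}  (a dot product per element)
          C( i, j ) = A( i, : ) * B( :, j ) + C( i, j );
        end
      end
    end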
2.6 Summing it all up
Figure 2.4 summarizes how matrix-matrix multiplication can be implemented in terms of
operations that we studied in the previous chapter. It explains how the six different orderings
of the loops exhibit themselves as matrix-matrix multiplication is viewed in a layered fashion.
C := AB + C

Column-, row-, and rank-1-based formulations:

    for j = 0, . . . , n - 1            for i = 0, . . . , m - 1            for p = 0, . . . , k - 1
      c_j := A b_j + c_j                  c_i^T := a_i^T B + c_i^T            C := a_p b̃_p^T + C
    endfor                              endfor                              endfor

Expressed in terms of dot products and axpy operations:

    for j = 0, . . . , n - 1            for i = 0, . . . , m - 1
      for i = 0, . . . , m - 1            for j = 0, . . . , n - 1
        γ_{i,j} := a_i^T b_j + γ_{i,j}      γ_{i,j} := a_i^T b_j + γ_{i,j}
      endfor                              endfor
    endfor                              endfor

    for j = 0, . . . , n - 1            for i = 0, . . . , m - 1
      for p = 0, . . . , k - 1            for p = 0, . . . , k - 1
        c_j := β_{p,j} a_p + c_j            c_i^T := α_{i,p} b̃_p^T + c_i^T
      endfor                              endfor
    endfor                              endfor

    for p = 0, . . . , k - 1            for p = 0, . . . , k - 1
      for i = 0, . . . , m - 1            for j = 0, . . . , n - 1
        c_i^T := α_{i,p} b̃_p^T + c_i^T      c_j := β_{p,j} a_p + c_j
      endfor                              endfor
    endfor                              endfor

Expressed as triple-nested loops over the scalar update γ_{i,j} := α_{i,p} β_{p,j} + γ_{i,j}, the six
possible orderings of the loops are (j, i, p), (j, p, i), (i, j, p), (i, p, j), (p, i, j), and (p, j, i).

Figure 2.4: Algorithms for implementing C := AB + C.
2.7 Additional Exercises

1. Consider the linear transformations L_A : R^3 → R^2 and L_B : R^3 → R^3 and let L_C be
   defined by L_C(x) = L_A(L_B(x)).

   (a) L_C : R^??? → R^??.

   (b) Assume that

         L_B( (1, 0, 0)^T ) = (2, 1, 0)^T,   L_B( (0, 1, 0)^T ) = (3, 2, 1)^T,   L_B( (0, 0, 1)^T ) = (1, 1, 2)^T,

       and

         L_A( (2, 1, 0)^T ) = (1, 2)^T,   L_A( (3, 2, 1)^T ) = (3, 4)^T,   L_A( (1, 1, 2)^T ) = (2, 1)^T.

       Let Bx = L_B(x) and Cx = L_C(x) (B is the matrix that represents L_B and C is
       the matrix that represents L_C). Then

         B =                          C =
2. Compute

   (a) [ 1 1 2 ; 3 2 1 ] (1, 2, 3)^T =
   (b) [ 1 1 2 ; 3 2 1 ] (1, 1, 0)^T =
   (c) [ 1 1 2 ; 3 2 1 ] [ 1 1 ; 2 1 ; 3 0 ] =
   (d) [ 1 1 2 ; 3 2 1 ] [ 1 1 ; 1 2 ; 0 3 ] =
   (e) You should notice a pattern in the above calculations. What is it? How could
       you reduce the number of calculations you have to do?

3. Compute

   (a) [ 3 2 1 ; 1 1 2 ] (1, 2, 3)^T =
   (b) [ 3 2 1 ; 1 1 2 ] (1, 1, 0)^T =
   (c) [ 3 2 1 ; 1 1 2 ] [ 1 1 ; 2 1 ; 3 0 ] =
   (d) [ 3 2 1 ; 1 1 2 ] [ 1 1 ; 1 2 ; 0 3 ] =
   (e) You should notice a pattern in the above calculations. What is it? How could
       you reduce the number of calculations you have to do?
4. Compute

   (a) [ 1 2 3 ; 1 1 0 ] (3, 2, 1)^T =
   (b) [ 1 2 3 ; 1 1 0 ] (1, 1, 2)^T =
   (c) [ 1 2 3 ; 1 1 0 ] [ 3 1 ; 2 1 ; 1 2 ] =
   (d) [ 1 2 3 ; 1 1 0 ] [ 1 3 ; 1 2 ; 2 1 ] =
   (e) [ 1 1 ; 2 1 ; 3 0 ]^T [ 3 1 ; 2 1 ; 1 2 ] =
   (f) [ 1 1 ; 2 1 ; 3 0 ]^T [ 3 2 1 ; 1 1 2 ]^T =
   (g) ( [ 3 2 1 ; 1 1 2 ] [ 1 1 ; 2 1 ; 3 0 ] )^T =
   (h) You should notice a pattern in the above calculations. What is it? How could
       you reduce the number of calculations you have to do?

5. Look carefully at what you did in Exercises 2-4. What could you have observed to
   save yourself a lot of work? (This is not a question I actually want you to hand in.
   Think about it...)
6. Compute

   (a) (1, 2, 0)^T ( 0 1 2 ) =
   (b) (1, 3, 1)^T ( 1 2 3 ) =
   (c) [ 1 1 ; 2 3 ; 0 1 ] [ 0 1 2 ; 1 2 3 ] =
   (d) [ 1 1 ; 3 2 ; 1 0 ] [ 1 2 3 ; 0 1 2 ] =
   (e) What do you observe?

7. Which of the following are legal multiplications:

   (a) [ 1 1 ; 3 2 ; 1 0 ] [ 1 1 ; 3 2 ; 1 0 ]
   (b) 3 [ 1 1 ; 3 2 ; 1 0 ]
   (c) [ 1 1 ; 3 2 ; 1 0 ] 3
   (d) [ 1 1 ; 3 2 ; 1 0 ] [ 1 1 ; 3 2 ; 1 0 ]^T
   (e) ( 1 1 ) [ 1 1 ; 3 2 ; 1 0 ]
   (f) ( 1 1 2 ) [ 1 1 ; 3 2 ; 1 0 ]
8. Compute the following

   (a) using Theorem 2.21, with the blocking indicated:

         [ 1 1 2      [ 1 2 3
           0 2 1        1 2 3
           2 3 1        0 1 1 ]
           2 1 3 ]

       partitioned into 2 × 2 blocks, so that, for instance, the top-left block of the
       product is

         [ 1 1 ; 0 2 ] [ 1 2 ; 1 2 ] + [ 2 ; 1 ] ( 0 1 ) =

   (b) using Corollary 2.24:   [ 2 1 ; 2 1 ] [ 0 1 2 3 ; 0 1 2 3 ]

   (c) using Corollary 2.26:   [ 2 1 ; 2 1 ] [ 0 1 2 3 ; 0 1 2 3 ]

   (d) using Corollary 2.28:   [ 2 1 ; 2 1 ] [ 0 1 2 3 ; 0 1 2 3 ]
Chapter 3
Gaussian Elimination
In this chapter, we again motivate matrix notation and the matrix-vector multiplication
operation, but this time by looking at how we solve systems of linear equations.
3.1 Solving a System of Linear Equations via Gaussian Elimination (GE, Take 1)

Consider the system of linear equations

    2x + 4y - 2z = -10
    4x - 2y + 6z =  20
    6x - 4y + 2z =  18

Notice that x, y, and z are just variables, for which we can pick any name we want. To be
consistent with the notation we introduced previously for naming components of vectors, we
use the names χ0, χ1, and χ2 instead of x, y, and z, respectively:

    2χ0 + 4χ1 - 2χ2 = -10
    4χ0 - 2χ1 + 6χ2 =  20
    6χ0 - 4χ1 + 2χ2 =  18

Solving this linear system relies on the fact that its solution does not change if

1. Equations are reordered (not actually used in this example); and/or

2. An equation in the system is modified by subtracting a multiple of another equation
   in the system from it; and/or

3. Both sides of an equation in the system are scaled by a nonzero.

The following steps are known as Gaussian elimination. They transform a system of linear
equations to an equivalent upper triangular system of linear equations:
• Subtract λ10 = (4/2) = 2 times the first equation from the second equation:

    Before                              After
    2χ0 + 4χ1 - 2χ2 = -10               2χ0 +  4χ1 -  2χ2 = -10
    4χ0 - 2χ1 + 6χ2 =  20                    - 10χ1 + 10χ2 =  40
    6χ0 - 4χ1 + 2χ2 =  18               6χ0 -  4χ1 +  2χ2 =  18

• Subtract λ20 = (6/2) = 3 times the first equation from the third equation:

    Before                              After
    2χ0 +  4χ1 -  2χ2 = -10             2χ0 +  4χ1 -  2χ2 = -10
         - 10χ1 + 10χ2 =  40                 - 10χ1 + 10χ2 =  40
    6χ0 -  4χ1 +  2χ2 =  18                  - 16χ1 +  8χ2 =  48

• Subtract λ21 = ((-16)/(-10)) = 1.6 times the second equation from the third equation:

    Before                              After
    2χ0 +  4χ1 -  2χ2 = -10             2χ0 +  4χ1 -  2χ2 = -10
         - 10χ1 + 10χ2 =  40                 - 10χ1 + 10χ2 =  40
         - 16χ1 +  8χ2 =  48                         - 8χ2 = -16

This now leaves us with an upper triangular system of linear equations.

Remark 3.1 In the above Gaussian elimination procedure, λ10, λ20, and λ21 are called
the multipliers.
The equivalent upper triangular system of equations is now solved via back substitution:

• Consider the last equation, -8χ2 = -16. Scaling both sides by 1/(-8) we find that

      χ2 = -16/(-8) = 2.

• Next, consider the second equation, -10χ1 + 10χ2 = 40. We know that χ2 = 2, which
  we plug into this equation to yield -10χ1 + 10(2) = 40. Rearranging this we find that

      χ1 = (40 - 10(2))/(-10) = -2.

• Finally, consider the first equation, 2χ0 + 4χ1 - 2χ2 = -10. We know that χ2 = 2
  and χ1 = -2, which we plug into this equation to yield 2χ0 + 4(-2) - 2(2) = -10.
  Rearranging this we find that

      χ0 = (-10 - (4(-2) + (-2)(2)))/2 = 1.
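These elimination and back substitution steps are easy to check with a few lines of M-script. The script below is a sketch of ours, hard-coded to this example; it is not the general algorithm developed later in the chapter.

    % Sketch: Gaussian elimination and back substitution for the example above.
    A = [ 2  4 -2
          4 -2  6
          6 -4  2 ];
    b = [ -10; 20; 18 ];
    % Forward elimination: subtract multiples of earlier rows from later rows.
    for j = 1:2
      for i = j+1:3
        lambda = A(i,j) / A(j,j);            % the multiplier
        A(i,:) = A(i,:) - lambda * A(j,:);   % update the row of the coefficient matrix
        b(i)   = b(i)   - lambda * b(j);     % and the right-hand side
      end
    end
    % Back substitution on the resulting upper triangular system.
    x = zeros( 3, 1 );
    for i = 3:-1:1
      x(i) = ( b(i) - A(i,i+1:3) * x(i+1:3) ) / A(i,i);
    end
    x   % should print ( 1, -2, 2 )^T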
Gaussian elimination (transform to an upper triangular system of equations)

• Initial system of equations:              • Subtract λ10 = (4/2) = 2 times the first
                                              row from the second row:
    2χ0 + 4χ1 - 2χ2 = -10                       2χ0 +  4χ1 -  2χ2 = -10
    4χ0 - 2χ1 + 6χ2 =  20                            - 10χ1 + 10χ2 =  40
    6χ0 - 4χ1 + 2χ2 =  18                       6χ0 -  4χ1 +  2χ2 =  18

• Subtract λ20 = (6/2) = 3 times the first  • Subtract λ21 = ((-16)/(-10)) = 1.6 times
  row from the third row:                     the second row from the third row:

    2χ0 +  4χ1 -  2χ2 = -10                     2χ0 +  4χ1 -  2χ2 = -10
         - 10χ1 + 10χ2 =  40                         - 10χ1 + 10χ2 =  40
         - 16χ1 +  8χ2 =  48                                 - 8χ2 = -16

Back substitution (solve the upper triangular system of equations)

    -8χ2 = -16 implies χ2 = (-16)/(-8) = 2

    -10χ1 + 10χ2 = 40 implies -10χ1 + 10(2) = 40 and hence χ1 = (40 - 10(2))/(-10) = -2

    2χ0 + 4χ1 - 2χ2 = -10 implies 2χ0 + 4(-2) - 2(2) = -10 and hence
    χ0 = (-10 - 4(-2) + 2(2))/2 = 1

Solution equals x = (χ0, χ1, χ2)^T = (1, -2, 2)^T.

Check the answer (by plugging χ0 = 1, χ1 = -2, and χ2 = 2 into the original system):

    2(1) + 4(-2) - 2(2) = -10
    4(1) - 2(-2) + 6(2) =  20
    6(1) - 4(-2) + 2(2) =  18

Figure 3.1: Summary of the steps that yield the solution via Gaussian elimination and
backward substitution for the example in Section 3.1.
3.2 Matrix Notation (GE, Take 2)

Now, in the above example, it becomes very cumbersome to always write the entire equation.
The information is encoded in the coefficients in front of the χi variables, and the values to
the right of the equal signs. Thus, we could just let

    [ 2  4 -2 | -10
      4 -2  6 |  20
      6 -4  2 |  18 ]

represent

    2χ0 + 4χ1 - 2χ2 = -10
    4χ0 - 2χ1 + 6χ2 =  20
    6χ0 - 4χ1 + 2χ2 =  18

Then Gaussian elimination can simply work with this array of numbers as illustrated in
Figure 3.2.
Gaussian elimination (transform to an upper triangular system of equations)

• Initial system of equations:              • Subtract λ10 = (4/2) = 2 times the first
                                              row from the second row:
    [ 2  4 -2 | -10                             [ 2   4  -2 | -10
      4 -2  6 |  20                               0 -10  10 |  40
      6 -4  2 |  18 ]                             6  -4   2 |  18 ]

• Subtract λ20 = (6/2) = 3 times the        • Subtract λ21 = ((-16)/(-10)) = 1.6 times
  first row from the third row:               the second row from the third row:

    [ 2   4  -2 | -10                           [ 2   4  -2 | -10
      0 -10  10 |  40                             0 -10  10 |  40
      0 -16   8 |  48 ]                           0   0  -8 | -16 ]

Back substitution (solve the upper triangular system of equations)

    The last row is shorthand for -8χ2 = -16, which implies χ2 = (-16)/(-8) = 2.

    The second row is shorthand for -10χ1 + 10χ2 = 40, which implies
    -10χ1 + 10(2) = 40 and hence χ1 = (40 - 10(2))/(-10) = -2.

    The first row is shorthand for 2χ0 + 4χ1 - 2χ2 = -10, which implies
    2χ0 + 4(-2) - 2(2) = -10 and hence χ0 = (-10 - 4(-2) + 2(2))/2 = 1.

Solution equals x = (χ0, χ1, χ2)^T = (1, -2, 2)^T.

Check the answer (by plugging χ0 = 1, χ1 = -2, and χ2 = 2 into the original system):

    2(1) + 4(-2) - 2(2) = -10
    4(1) - 2(-2) + 6(2) =  20
    6(1) - 4(-2) + 2(2) =  18

Figure 3.2: Summary of the steps that yield the solution via Gaussian elimination and
backward substitution for the example in Section 3.1. In this figure, a shorthand (matrix
notation) is used to represent the system of linear equations. Compare and contrast to
Figure 3.1.
The above discussion motivates storing only the coefficients of a linear system (the numbers
to the left of the |) as a two dimensional array and the numbers to the right as a one
dimensional array. We recognize this two dimensional array as a matrix: A ∈ R^{m×n} is the
two dimensional array of scalars

    A = [ α_{0,0}     α_{0,1}     ···  α_{0,n-1}
          α_{1,0}     α_{1,1}     ···  α_{1,n-1}
           ...          ...              ...
          α_{m-1,0}   α_{m-1,1}   ···  α_{m-1,n-1} ],

where α_{i,j} ∈ R for 0 ≤ i < m and 0 ≤ j < n. It has m rows and n columns. Note that the
parentheses are simply there to delimit the array rather than having any special meaning.

We similarly recognize that the one dimensional array is a (column) vector x ∈ R^n where

    x = ( χ_0, χ_1, . . . , χ_{n-1} )^T.

The length of the vector is n.
Now, given A ∈ R^{m×n} and vector x ∈ R^n, the notation Ax stands for

    Ax = ( α_{0,0} χ_0   + α_{0,1} χ_1   + ··· + α_{0,n-1} χ_{n-1}
           α_{1,0} χ_0   + α_{1,1} χ_1   + ··· + α_{1,n-1} χ_{n-1}
            ...
           α_{m-1,0} χ_0 + α_{m-1,1} χ_1 + ··· + α_{m-1,n-1} χ_{n-1} )                    (3.1)

which is itself a vector of length m. We already encountered this operation: it is the by now
familiar matrix-vector multiply or matrix-vector product.

Now a linear system of m equations in n unknowns can be written as Ax = y, where
A ∈ R^{m×n}, x ∈ R^n, and y ∈ R^m. For example, the problem in Section 3.1 could be written
as

    [ 2  4 -2     ( χ0         ( -10
      4 -2  6       χ1     =      20
      6 -4  2 ]     χ2 )          18 ).
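In M-script the matrix-vector product (3.1) is simply A*x, so the claim that this system is Ax = y can be checked directly. A small sketch of ours:

    % Sketch: the system from Section 3.1 in matrix-vector form, A x = y.
    A = [ 2  4 -2
          4 -2  6
          6 -4  2 ];
    x = [ 1; -2; 2 ];       % the solution found earlier
    y = A * x               % should print ( -10, 20, 18 )^T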
3.3 Towards Gauss Transforms (GE, Take 3)

We start by showing that the various steps in Gaussian elimination can instead be presented
as multiplication by a carefully chosen matrix. Compare and contrast the examples in
Example 3.2 and Exercise 3.3 with the steps of Gaussian elimination in Figure 3.2.

Example 3.2

    [ 1 0 0     [ 2  4 -2       [ 2   4 -2
     -2 1 0       4 -2  6   =     0 -10 10
      0 0 1 ]     6 -4  2 ]       6  -4  2 ].

Notice that multiplication by the given matrix is the same as performing an axpy with the
negative of the given multiplier, λ_{1,0} = 2, the first row, and the second row.
Gaussian elimination, Take 3

• Initial system of equations:              • Subtract λ10 = (4/2) = 2 times the first
                                              row from the second row:
    [ 2  4 -2 | -10                             [ 1 0 0   [ 2  4 -2 | -10     [ 2   4 -2 | -10
      4 -2  6 |  20                              -2 1 0     4 -2  6 |  20   =   0 -10 10 |  40
      6 -4  2 |  18 ]                             0 0 1 ]   6 -4  2 |  18 ]     6  -4  2 |  18 ]

• Subtract λ20 = (6/2) = 3 times the first row from the third row:

    [ 1 0 0   [ 2   4 -2 | -10     [ 2   4 -2 | -10
      0 1 0     0 -10 10 |  40   =   0 -10 10 |  40
     -3 0 1 ]   6  -4  2 |  18 ]     0 -16  8 |  48 ]

• Subtract λ21 = ((-16)/(-10)) = 1.6 times the second row from the third row:

    [ 1  0   0   [ 2   4 -2 | -10     [ 2   4 -2 | -10
      0  1   0     0 -10 10 |  40   =   0 -10 10 |  40
      0 -1.6 1 ]   0 -16  8 |  48 ]     0   0 -8 | -16 ]

Back substitution (solve the upper triangular system of equations): as in Figure 3.2,
χ2 = 2, χ1 = -2, and χ0 = 1, so that the solution equals x = (1, -2, 2)^T.

Check the answer (by plugging χ0 = 1, χ1 = -2, and χ2 = 2 into the original system): as
in Figure 3.2.

Figure 3.3: Summary of the steps that yield the solution via Gaussian elimination and
backward substitution for the example in Section 3.1. In this figure, we show how the steps
of Gaussian Elimination can be formulated as multiplications by the special matrices L̃_{i,j}.
Exercise 3.3 Compute

    [ 1 0 0   [ 2   4 -2
      0 1 0     0 -10 10
     -3 0 1 ]   6  -4  2 ].

How can this be described as an axpy operation?

Theorem 3.4 Let i > j and let L̃_{i,j} be the matrix that equals the identity, except that
the (i, j) element has been replaced with -λ_{i,j}. Then L̃_{i,j} A equals the matrix A except
that the ith row is modified by subtracting λ_{i,j} times the jth row from it.

Proof: Let

    L̃_{i,j} = [ I_j  0         0          0  0                  A = [ A_{0:j-1}
                0    1         0          0  0                        a_j^T
                0    0         I_{i-j-1}  0  0          and           A_{j+1:i-1}
                0   -λ_{i,j}   0          1  0                        a_i^T
                0    0         0          0  I_{n-i-1} ]              A_{i+1:m-1} ],

where I_k equals a k×k identity matrix, A_{s:t} equals the matrix that consists of rows s through
t from matrix A, and a_k^T equals the kth row of A. Then

    L̃_{i,j} A = [ I_j  0         0          0  0           [ A_{0:j-1}          [ A_{0:j-1}
                  0    1         0          0  0             a_j^T                a_j^T
                  0    0         I_{i-j-1}  0  0             A_{j+1:i-1}     =    A_{j+1:i-1}
                  0   -λ_{i,j}   0          1  0             a_i^T               -λ_{i,j} a_j^T + a_i^T
                  0    0         0          0  I_{n-i-1} ]   A_{i+1:m-1} ]        A_{i+1:m-1} ]

              = [ A_{0:j-1}
                  a_j^T
                  A_{j+1:i-1}
                  a_i^T - λ_{i,j} a_j^T
                  A_{i+1:m-1} ].
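Theorem 3.4 is easy to try out numerically. The helper below is a sketch of ours (it is not part of the SLAP library): it builds the matrix L̃_{i,j} of the theorem and applies it, which has the same effect as an axpy on the ith row.

    % Sketch: build Ltilde_{i,j} (identity with -lambda in position (i,j)) and apply it.
    function B = Apply_Ltilde( A, i, j, lambda )
      n = size( A, 1 );
      Ltilde = eye( n );
      Ltilde( i, j ) = -lambda;      % the only entry that differs from the identity
      B = Ltilde * A;                % same effect as  A(i,:) := A(i,:) - lambda * A(j,:)
    end

For instance, Apply_Ltilde( [ 2 4 -2; 4 -2 6; 6 -4 2 ], 2, 1, 2 ) reproduces Example 3.2.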
Example 3.5 The steps of Gaussian elimination already given in Figures 3.1 and 3.2 are
now repeated with the insights in this section, in Figure 3.3.
Exercise 3.6 Verify that in Figure 3.3

    [ 1  0   0   [ 1 0 0   [ 1 0 0   [ 2  4 -2 | -10     [ 2   4 -2 | -10
      0  1   0     0 1 0    -2 1 0     4 -2  6 |  20   =   0 -10 10 |  40
      0 -1.6 1 ]  -3 0 1 ]   0 0 1 ]   6 -4  2 |  18 ]     0   0 -8 | -16 ].
In Figure 3.4, we use the insights in this section to show that one can perform Gaussian
Elimination on just the matrix first and then on the right-hand side. This is important,
because often one has multiple linear systems of equations that differ only in the vector on
the right-hand side. The important insight is that the multipliers can be stored in the place
in which a new zero is introduced at each step.

Exercise 3.7 Verify that in Figure 3.4

    [ 1  0   0   [ 1 0 0   [ 1 0 0   [ 2  4 -2     [ 2   4 -2
      0  1   0     0 1 0    -2 1 0     4 -2  6   =   0 -10 10
      0 -1.6 1 ]  -3 0 1 ]   0 0 1 ]   6 -4  2 ]     0   0 -8 ]

and

    [ 1  0   0   [ 1 0 0   [ 1 0 0   ( -10       ( -10
      0  1   0     0 1 0    -2 1 0      20    =     40
      0 -1.6 1 ]  -3 0 1 ]   0 0 1 ]    18 )        -16 ).
3.4 Gauss Transforms (GE, Take 4)

Next, we show how eliminating (zeroing) all elements below the diagonal in a column of the
matrix can be combined into a single step. Compare and contrast the following example
with those in Example 3.2 and Exercise 3.3.

Example 3.8

    [ 1 0 0     [ 2  4 -2       [ 2   4 -2
     -2 1 0       4 -2  6   =     0 -10 10
     -3 0 1 ]     6 -4  2 ]       0 -16  8 ].
Gaussian elimination, Take 3b,
separating the factorization of the matrix from the updating of the right-hand side

• Initial matrix:                           • Subtract λ10 = (4/2) = 2 times the first
                                              row from the second row, overwriting the
                                              element that is zeroed:
    [ 2  4 -2                                   [ 1 0 0   [ 2  4 -2     [ 2   4 -2
      4 -2  6                                    -2 1 0     4 -2  6   =   2 -10 10
      6 -4  2 ]                                   0 0 1 ]   6 -4  2 ]     6  -4  2 ]

• Subtract λ20 = (6/2) = 3 times the first row from the third row, overwriting the
  element that is zeroed:

    [ 1 0 0   [ 2   4 -2     [ 2   4 -2
      0 1 0     2 -10 10   =   2 -10 10
     -3 0 1 ]   6  -4  2 ]     3 -16  8 ]

• Subtract λ21 = ((-16)/(-10)) = 1.6 times the second row from the third row, over-
  writing the element that is zeroed:

    [ 1  0   0   [ 2   4 -2     [ 2    4  -2
      0  1   0     2 -10 10   =   2  -10  10
      0 -1.6 1 ]   3 -16  8 ]     3  1.6  -8 ]

Forward substitution (apply the transformations to the right-hand side vector)

• Initial right-hand side vector:           • Subtract λ10 = 2 times the first row from
                                              the second row:
    ( -10, 20, 18 )^T                           [ 1 0 0   ( -10       ( -10
                                                 -2 1 0      20    =     40
                                                  0 0 1 ]    18 )        18 )

• Subtract λ20 = 3 times the first row      • Subtract λ21 = 1.6 times the second row
  from the third row:                         from the third row:

    [ 1 0 0   ( -10       ( -10                 [ 1  0   0   ( -10       ( -10
      0 1 0      40    =     40                   0  1   0      40    =     40
     -3 0 1 ]    18 )        48 )                 0 -1.6 1 ]    48 )        -16 )

Back substitution (solve the upper triangular system of equations): as before.

Check the answer (by plugging χ0 = 1, χ1 = -2, and χ2 = 2 into the original system): as before.

Figure 3.4: Summary of the steps that yield the solution via Gaussian elimination and
backward substitution for the example in Section 3.1. In this figure, we show how the
steps of Gaussian Elimination can be formulated as multiplications by the special matrices
L̃_{i,j}. This time, Gaussian elimination is first applied to the matrix, and then applied to the
right-hand side. Compare and contrast to Figure 3.3.
Gaussian elimination, Take 4,
using Gauss transforms

• Initial matrix:                           • Subtract λ10 = (4/2) = 2 times the first row
                                              from the second row and λ20 = (6/2) = 3
                                              times the first row from the third row, over-
                                              writing the elements that are zeroed:
    [ 2  4 -2                                   [ 1 0 0   [ 2  4 -2     [ 2   4 -2
      4 -2  6                                    -2 1 0     4 -2  6   =   2 -10 10
      6 -4  2 ]                                  -3 0 1 ]   6 -4  2 ]     3 -16  8 ]

• Subtract λ21 = ((-16)/(-10)) = 1.6 times the second row from the third row, over-
  writing the element that is zeroed:

    [ 1  0   0   [ 2   4 -2     [ 2    4  -2
      0  1   0     2 -10 10   =   2  -10  10
      0 -1.6 1 ]   3 -16  8 ]     3  1.6  -8 ]

Forward substitution (apply the transformations to the right-hand side vector)

• Initial right-hand side vector:           • Subtract λ10 = 2 times the first row from the
                                              second row and λ20 = 3 times the first row
                                              from the third row:
    ( -10, 20, 18 )^T                           [ 1 0 0   ( -10       ( -10
                                                 -2 1 0      20    =     40
                                                 -3 0 1 ]    18 )        48 )

• Subtract λ21 = 1.6 times the second row from the third row:

    [ 1  0   0   ( -10       ( -10
      0  1   0      40    =     40
      0 -1.6 1 ]    48 )        -16 )

Back substitution (solve the upper triangular system of equations): as before.

Check the answer (by plugging χ0 = 1, χ1 = -2, and χ2 = 2 into the original system): as before.

Figure 3.5: Summary of the steps that yield the solution via Gaussian elimination and
backward substitution for the example in Section 3.1. In this figure, we show how the
steps of Gaussian Elimination can be formulated as multiplications by the special matrices
L̃_{i,j}. This time, Gaussian elimination is first applied to the matrix, and then applied to the
right-hand side. Compare and contrast to Figure 3.4.
We notice that this one multiplication has the same net effect as the two multiplications in
Example 3.2 and Exercise 3.3, performed in succession.

Theorem 3.9 Let L̃_j be a matrix that equals the identity, except that for i > j the (i, j)
elements (the ones below the diagonal in the jth column) have been replaced with -λ_{i,j}:

    L̃_j = [ I_j  0             0  0  ···  0
            0    1             0  0  ···  0
            0   -λ_{j+1,j}     1  0  ···  0
            0   -λ_{j+2,j}     0  1  ···  0
            ...  ...                       ...
            0   -λ_{m-1,j}     0  0  ···  1 ].

Then L̃_j A equals the matrix A except that for i > j the ith row is modified by subtracting
λ_{i,j} times the jth row from it. Such a matrix L̃_j is called a Gauss transform.

Proof: Let L̃_j be as above and partition

    A = [ A_{0:j-1}
          a_j^T
          a_{j+1}^T
          a_{j+2}^T
           ...
          a_{m-1}^T ],

where I_k equals a k×k identity matrix, A_{s:t} equals the matrix that consists of rows s through
t from matrix A, and a_k^T equals the kth row of A. Then

    L̃_j A = [ I_j  0             0  0  ···  0      [ A_{0:j-1}           [ A_{0:j-1}
              0    1             0  0  ···  0        a_j^T                 a_j^T
              0   -λ_{j+1,j}     1  0  ···  0        a_{j+1}^T        =   -λ_{j+1,j} a_j^T + a_{j+1}^T
              0   -λ_{j+2,j}     0  1  ···  0        a_{j+2}^T            -λ_{j+2,j} a_j^T + a_{j+2}^T
              ...  ...                       ...      ...                  ...
              0   -λ_{m-1,j}     0  0  ···  1 ]      a_{m-1}^T ]          -λ_{m-1,j} a_j^T + a_{m-1}^T ]

          = [ A_{0:j-1}
              a_j^T
              a_{j+1}^T - λ_{j+1,j} a_j^T
              a_{j+2}^T - λ_{j+2,j} a_j^T
               ...
              a_{m-1}^T - λ_{m-1,j} a_j^T ].
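A Gauss transform is just as easy to construct explicitly. The sketch below is ours (not part of SLAP): it builds L̃_j from a vector of multipliers and can be used to reproduce Example 3.8.

    % Sketch: build the Gauss transform Ltilde_j (identity with the multipliers
    % l(1), ..., l(m-j) negated below the diagonal of column j), then apply it to A.
    function B = Apply_Gauss_transform( A, j, l )
      m = size( A, 1 );
      Ltilde = eye( m );
      Ltilde( j+1:m, j ) = -l;       % -lambda_{j+1,j}, ..., -lambda_{m-1,j}
      B = Ltilde * A;                % subtracts lambda_{i,j} times row j from row i, i > j
    end

For instance, Apply_Gauss_transform( [ 2 4 -2; 4 -2 6; 6 -4 2 ], 1, [ 2; 3 ] ) reproduces Example 3.8.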
Example 3.10 The steps of Gaussian elimination already given in Figures 3.1-3.4 are now
repeated with the insights in this section, in Figure 3.5.

Exercise 3.11 Verify that in Figure 3.5

    [ 1  0   0   [ 1 0 0   [ 2  4 -2     [ 2   4 -2
      0  1   0    -2 1 0     4 -2  6   =   0 -10 10
      0 -1.6 1 ]  -3 0 1 ]   6 -4  2 ]     0   0 -8 ]

and

    [ 1  0   0   [ 1 0 0   ( -10       ( -10
      0  1   0    -2 1 0      20    =     40
      0 -1.6 1 ]  -3 0 1 ]    18 )        -16 ).
3.5 Gauss Transforms Continued (GE, Take 5)

Example 3.12 Now we are going to turn things around: Consider

    [ 1    0 0     [ 2  4 -2       [ 2             4              -2
     -λ10  1 0       4 -2  6   =     4 - λ10(2)   -2 - λ10(4)      6 - λ10(-2)              (3.2)
     -λ20  0 1 ]     6 -4  2 ]       6 - λ20(2)   -4 - λ20(4)      2 - λ20(-2) ].

The question we now ask is how should λ10 and λ20 be chosen so that zeroes are introduced
below the diagonal in the first column? Examining 4 - λ10(2) and 6 - λ20(2) we conclude
that our familiar λ10 = 4/2 = 2 and λ20 = 6/2 = 3 have the desired property.

Example 3.13 Alternatively, we can write (3.2) in blocked form. With a12^T = ( 4 -2 ),
a21 = ( 4, 6 )^T, A22 = [ -2 6 ; -4 2 ], and l21 = ( λ10, λ20 )^T,

    [  1    0     [ 2    a12^T         [  2                a12^T
      -l21  I ]     a21  A22 ]    =      -l21(2) + a21     -l21 a12^T + A22 ],

and we note that to zero the elements below the diagonal in the first column we must ensure
that

    -l21(2) + a21 = 0,

which means that

    l21 = ( λ10, λ20 )^T = ( 4, 6 )^T / 2 = ( 2, 3 )^T.
Now let us try to generalize this insight: Let A^(0) ∈ R^{n×n} be a given matrix and L̃^(0) a
Gauss transform which is to be determined. Partition

    A^(0) → [ α11^(0)   a12^(0)T          and    L̃^(0) → [  1         0
              a21^(0)   A22^(0) ]                           -l21^(0)   I ].

Then

    [  1         0     [ α11^(0)   a12^(0)T       [ α11^(0)                      a12^(0)T
      -l21^(0)   I ]     a21^(0)   A22^(0) ]   =    a21^(0) - l21^(0) α11^(0)     A22^(0) - l21^(0) a12^(0)T ].

Now,

• We want to choose l21^(0) so that a21^(0) - l21^(0) α11^(0) = 0, introducing zeroes below the
  diagonal in the first column. Clearly, l21^(0) = a21^(0)/α11^(0) has this property. In other
  words, l21^(0) results from dividing a21^(0) by α11^(0).

• Moreover, we notice that the rest of the matrix, A22^(0), is transformed to A22^(0) - l21^(0) a12^(0)T.
  In other words, an outer product is subtracted from A22^(0), which we before called a rank-1
  update (ger).

After choosing l21^(0) as prescribed, let

    A^(1) := [  1         0     [ α11^(0)   a12^(0)T       [ α11^(0)                      a12^(0)T                        [ α11^(1)   a12^(1)T
               -l21^(0)   I ]     a21^(0)   A22^(0) ]   =    a21^(0) - l21^(0) α11^(0)     A22^(0) - l21^(0) a12^(0)T ]  =   0          A22^(1) ].
Exercise 3.14 Identify α11^(0), a21^(0), a12^(0)T, A22^(0), L̃^(0), l21^(0), and A22^(0) - l21^(0) a12^(0)T
in Example 3.12.
Example 3.15 Consider

    [ 1  0    0     [ 2   4 -2       [ 2    4                  -2
      0  1    0       0 -10 10   =     0  -10                  10
      0 -λ21  1 ]     0 -16  8 ]       0  -16 - λ21(-10)        8 - λ21(10) ].

The question we now ask is how should λ21 be chosen so that zeroes are introduced below
the diagonal in the second column? Examining -16 - λ21(-10) we conclude that our
familiar λ21 = -16/(-10) = 1.6 has the desired property. Alternatively, we notice that,
viewed as a vector, ( λ21 ) = ( -16 )/(-10).

Continuing our more general discussion, partition

    A^(1) → [ A00^(1)   a01^(1)    A02^(1)                and    L̃^(1) → [ I_1  0          0
              0         α11^(1)    a12^(1)T                               0    1          0
              0         a21^(1)    A22^(1) ]                               0   -l21^(1)    I ].

Then

    L̃^(1) A^(1) = [ A00^(1)   a01^(1)                      A02^(1)
                    0         α11^(1)                      a12^(1)T
                    0         a21^(1) - l21^(1) α11^(1)    A22^(1) - l21^(1) a12^(1)T ].

Now,

• We want to choose l21^(1) so that a21^(1) - l21^(1) α11^(1) = 0, introducing zeroes below the
  diagonal in the second column. Clearly, l21^(1) = a21^(1)/α11^(1) has this property. In other
  words, l21^(1) results from dividing a21^(1) by α11^(1).

• Moreover, we notice that the rest of the matrix, A22^(1), is transformed to A22^(1) - l21^(1) a12^(1)T.
  In other words, an outer product is subtracted from A22^(1), which we before called a rank-1
  update (ger).

After choosing l21^(1) as prescribed, let

    A^(2) = L̃^(1) A^(1) = [ A00^(1)   a01^(1)    A02^(1)
                            0         α11^(1)    a12^(1)T
                            0         0          A22^(2) ].

Exercise 3.16 Identify A00^(1), . . . , A22^(1), L̃^(1), l21^(1), and A22^(1) - l21^(1) a12^(1)T in Example 3.15.
More generally yet, assume that this progression of computations has proceeded to where

    A^(k) → [ A00^(k)   a01^(k)    A02^(k)                and    L̃^(k) → [ I_k  0          0
              0         α11^(k)    a12^(k)T                               0    1          0
              0         a21^(k)    A22^(k) ]                               0   -l21^(k)    I ],

where A00^(k) and I_k are k × k matrices. Then

    L̃^(k) A^(k) = [ A00^(k)   a01^(k)                      A02^(k)
                    0         α11^(k)                      a12^(k)T
                    0         a21^(k) - l21^(k) α11^(k)    A22^(k) - l21^(k) a12^(k)T ].

Now,

• We want to choose l21^(k) so that a21^(k) - l21^(k) α11^(k) = 0, introducing zeroes below the
  diagonal in the current column. Clearly, l21^(k) = a21^(k)/α11^(k) has this property. In other
  words, l21^(k) results from dividing a21^(k) by α11^(k).

• Moreover, we notice that the rest of the matrix, A22^(k), is transformed to A22^(k) - l21^(k) a12^(k)T.
  In other words, an outer product is subtracted from A22^(k), which we before called a rank-1
  update (ger).

After choosing l21^(k) as prescribed, let

    A^(k+1) = L̃^(k) A^(k) = [ A00^(k)   a01^(k)    A02^(k)
                              0         α11^(k)    a12^(k)T
                              0         0          A22^(k+1) ].
Now, if A ∈ R^{n×n}, then A^(n) = L̃^(n-1) ··· L̃^(1) L̃^(0) A = U, an upper triangular matrix. Also,
to solve Ax = b, we note that

    Ux = ( L̃^(n-1) ··· L̃^(1) L̃^(0) A ) x = L̃^(n-1) ··· L̃^(1) L̃^(0) b  =:  b̃.

The right-hand side of this we recognize as forward substitution applied to vector b. We will
later see that solving Ux = b̃ where U is upper triangular is equivalent to back substitution.

The reason why we got to this point as GE Take 5 is so that the reader, hopefully, now
recognizes this as just Gaussian elimination. The insights in this section are summarized in
the algorithm in Figure 3.6, in which the original matrix A is overwritten with the upper
triangular matrix that results from Gaussian elimination and the strictly lower triangular
elements are overwritten by the multipliers.
Algorithm: A := GE_Take5(A)

    Partition A → [ A_TL  A_TR
                    A_BL  A_BR ]      where A_TL is 0 × 0

    while m(A_TL) < m(A) do

      Repartition
          [ A_TL  A_TR        [ A00    a01   A02
            A_BL  A_BR ]  →     a10^T  α11   a12^T        where α11 is 1 × 1
                                A20    a21   A22 ]

      a21 := a21/α11                 (= l21)
      A22 := A22 - a21 a12^T         (= A22 - l21 a12^T)

      Continue with
          [ A_TL  A_TR        [ A00    a01   A02
            A_BL  A_BR ]  ←     a10^T  α11   a12^T
                                A20    a21   A22 ]

    endwhile

Figure 3.6: Algorithm for applying Gaussian elimination to matrix A, overwriting the matrix
with U on and above the diagonal, and the multipliers below the diagonal.
3.6 Toward the LU Factorization (GE Take 6)

Definition 3.17 (Inverse of a matrix) Let A ∈ R^{n×n} and B ∈ R^{n×n} have the property
that AB = BA = I. Then B is said to be the inverse of matrix A and is denoted by A^{-1}.

Later we will discuss when a matrix has an inverse and show that for square A and B it
is always the case that if AB = I then BA = I, and that the inverse of a matrix is unique.

Example 3.18 Let

    L̃ = [ 1 0 0         and    L = [ 1 0 0
         -2 1 0                      2 1 0
          0 0 1 ]                    0 0 1 ].

Then

    L L̃ = [ 1 0 0   [ 1 0 0     [ 1 0 0
            2 1 0    -2 1 0   =   0 1 0
            0 0 1 ]   0 0 1 ]     0 0 1 ].

Notice that this should be intuitively true: L̃A subtracts two times the first row of A
from the second row. LA adds two times the first row of A to the second row. Thus,
LL̃A = L(L̃A) = A: the first transformation subtracts two times the first row from the
second row, and the second transformation adds two times the first row back to the result,
undoing the first transformation. Two transformations that always undo each other are
inverses of each other.

Exercise 3.19 Compute

    [ 1 0 0   [ 1 0 0
      2 1 0    -2 1 0
      3 0 1 ]  -3 0 1 ]

and reason why this should be intuitively true.

Theorem 3.20 Let

    L̃ = [ I_k  0     0
          0    1     0
          0   -l21   I ]

be a Gauss transform, where I_k is a k × k identity matrix. Then

    L = [ I_k  0     0
          0    1     0
          0    l21   I ]

(itself a Gauss transform) is its inverse: L L̃ = L̃ L = I.

Proof:

    L̃ L = [ I_k  0     0   [ I_k  0     0     [ I_k  0              0     [ I_k  0  0
            0    1     0     0    1     0   =   0    1              0   =   0    1  0    = I.
            0   -l21   I ]   0    l21   I ]     0   -l21 + I l21    I ]     0    0  I ]

Similarly L L̃ = I. (Notice that when we use I without indicating its dimensions, it has the
dimensions that are required to fit the situation.)

Exercise 3.21 To complete the above proof, show that L L̃ = I.
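Theorem 3.20 is also easy to check numerically for a specific Gauss transform; the few lines of M-script below (a sketch of ours) verify it for 3 × 3 transforms of the kind used in this chapter.

    % Sketch: a Gauss transform and its inverse differ only in the sign of l21.
    Ltilde = [ 1  0 0
              -2  1 0
              -3  0 1 ];
    L      = [ 1  0 0
               2  1 0
               3  0 1 ];
    L * Ltilde        % should print the 3 x 3 identity
    Ltilde * L        % and so should this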
Exercise 3.22 Recall that

    [ 1  0   0   [ 1 0 0   [ 2  4 -2     [ 2   4 -2
      0  1   0    -2 1 0     4 -2  6   =   0 -10 10
      0 -1.6 1 ]  -3 0 1 ]   6 -4  2 ]     0   0 -8 ].

Show that

    [ 2  4 -2     [ 1 0 0   [ 1  0   0   [ 2   4 -2
      4 -2  6   =   2 1 0     0  1   0     0 -10 10
      6 -4  2 ]     3 0 1 ]   0  1.6 1 ]   0   0 -8 ].

Exercise 3.23 Show that

    [ 1 0 0   [ 1  0   0     [ 1  0   0
      2 1 0     0  1   0   =   2  1   0
      3 0 1 ]   0  1.6 1 ]     3  1.6 1 ]

so that

    [ 2  4 -2     [ 1  0   0   [ 2   4 -2
      4 -2  6   =   2  1   0     0 -10 10
      6 -4  2 ]     3  1.6 1 ]   0   0 -8 ].
Theorem 3.24 Let L̃^(0), . . . , L̃^(n-1) be the sequence of Gauss transforms that transform
an n × n matrix A to an upper triangular matrix:

    L̃^(n-1) ··· L̃^(0) A = U.

Then

    A = L^(0) ··· L^(n-2) L^(n-1) U,

where L^(j) = ( L̃^(j) )^{-1}, the inverse of L̃^(j).

Proof: If

    L̃^(n-1) L̃^(n-2) ··· L̃^(0) A = U,

then

    A = L^(0) ··· L^(n-2) L^(n-1) L̃^(n-1) L̃^(n-2) ··· L̃^(0) A = L^(0) ··· L^(n-2) L^(n-1) U,

since each product L^(j) L̃^(j) in the middle equals the identity.
Lemma 3.25 Let L̃^(0), . . . , L̃^(n-1) be the sequence of Gauss transforms that transforms
a matrix A into an upper triangular matrix U:

    L̃^(n-1) ··· L̃^(0) A = U

and let L^(j) = ( L̃^(j) )^{-1}. Then L̄^(k) = L^(0) ··· L^(k-1) L^(k) has the structure

    L̄^(k) = [ L̄_TL^(k)  0
              L̄_BL^(k)  I ],

where L̄_TL^(k) is a (k + 1) × (k + 1) unit lower triangular matrix.

Proof: Proof by induction on k.

Base case: k = 0. L̄^(0) = L^(0) = [ 1 0 ; l21^(0) I ] meets the desired criteria since 1 is a
trivial unit lower triangular matrix.

Inductive step: Assume L̄^(k) meets the indicated criteria. We will show that then L̄^(k+1)
does too. Let

    L̄^(k) = [ L̄_TL^(k)  0       [ L̄00^(k)    0  0
              L̄_BL^(k)  I ]  =    l̄10^(k)T   1  0
                                  L̄20^(k)    0  I ],

where L̄_TL^(k) (and hence L̄00^(k)) are unit lower triangular matrices of dimension (k + 1) ×
(k + 1). Then

    L̄^(k+1) = L̄^(k) L^(k+1)

             = [ L̄00^(k)    0  0     [ I_{k+1}  0           0       [ L̄00^(k)    0           0
                 l̄10^(k)T   1  0       0        1           0    =    l̄10^(k)T   1           0
                 L̄20^(k)    0  I ]     0        l21^(k+1)   I ]       L̄20^(k)    l21^(k+1)   I ]

             = [ L̄_TL^(k+1)  0
                 L̄_BL^(k+1)  I ],

which meets the desired criteria since L̄_TL^(k+1) is unit lower triangular.

By the Principle of Mathematical Induction the result holds for L̄^(j), 0 ≤ j < n - 1.
Corollary 3.26 Under the conditions of Lemma 3.25, L = L̄^(n-1) is a unit lower trian-
gular matrix, the strictly lower triangular part of which is the sum of all the strictly lower
triangular parts of L^(0), . . . , L^(n-1):

    L = L̄^(n-1) = [ 1
                    l21^(0)   1
                              l21^(1)   1
                                        ...         1
                                        l21^(n-2)   1 ],

where each l21^(j) fills in the part of the jth column below the diagonal and the entries above
the diagonal are zero. (Note that l21^(n-1) is a vector of length zero, so that the last step,
involving L^(n-1), is really a no-op.)
Example 3.27 A consequence of this corollary is that the fact that

    [ 1 0 0   [ 1  0   0     [ 1  0   0
      2 1 0     0  1   0   =   2  1   0
      3 0 1 ]   0  1.6 1 ]     3  1.6 1 ]

in Exercise 3.23 is not a coincidence: for these matrices, all you have to do to find the strictly
lower triangular part of the right-hand side is to move the nonzeroes below the diagonal
in the matrices on the left-hand side to the corresponding elements in the matrix on the
right-hand side of the equality sign.
Exercise 3.28 The order in which the Gauss transforms appear is important. In particular,
verify that

    [ 1  0   0   [ 1 0 0       [ 1  0   0
      0  1   0     2 1 0    ≠    2  1   0
      0  1.6 1 ]   3 0 1 ]       3  1.6 1 ].
Theorem 3.29 Let L̃^(0), . . . , L̃^(n-1) be the sequence of Gauss transforms that transforms
an n × n matrix A into an upper triangular matrix U: L̃^(n-1) ··· L̃^(0) A = U and let
L^(j) = ( L̃^(j) )^{-1}. Then A = LU, where L = L^(0) ··· L^(n-1) is a unit lower triangular
matrix and can be easily obtained from L̃^(0), . . . , L̃^(n-1) by the observation summarized in
Corollary 3.26.

Remark 3.30 Notice that Theorem 3.29 does not say that for every square matrix Gaus-
sian elimination is well-defined. It merely says that if Gaussian elimination as presented
thus far completes, then there is a unit lower triangular matrix L and upper triangular
matrix U such that A = LU.
3.7 Coding Up Gaussian Elimination

Let us look at GE Take 3b, which overwrites a matrix with the upper triangular matrix that
results from Gaussian elimination while overwriting the lower triangular part of that matrix
with the multipliers. Assuming we start with an n × n matrix A, the algorithm and M-script
code are given in Figure 3.7.

• Think of the loop indexed by j as tracking the diagonal element of the matrix: α_{j,j}
  (A(j,j)) identifies how many columns we have already processed by zeroing elements
  below the diagonal.

• The loop indexed by i computes the multiplier λ_{i,j}, multiplies the jth row by that
  multiplier, and then subtracts this from the ith row. Notice that this is only done with
  the part of the row to the right of the jth column, since to the left we already have
  introduced zeroes and have overwritten those zeroes with the multipliers.

• The loop indexed by p performs the "multiply the jth row by λ_{i,j} (which has overwritten
  α_{i,j}) and subtract" part of the update.

Now, we can recognize the operation "multiply the jth row by a constant and subtract
it from the ith row" as an axpy operation. In Figure 3.8 we illustrate how the loop indexed
by p implements that axpy operation. Next, in Figure 3.9 we rearrange the algorithm and
code in Figure 3.7 to compute the multipliers (which overwrite α_{i,j}) first, before moving on
to update rows j + 1, j + 2, . . . with the doubly nested loops indexed by i and p. This allows
us to then notice that the order of the doubly nested loops indexed by i and p can be reversed,
in Figure 3.10. Again, the inner-most loop can then be recognized as an axpy operation, in
Figure 3.11. Finally, we notice that the inner loops indexed by i and p in fact implement a
rank-1 update, in Figure 3.12.

We would like to think that it is easier to reason about the algorithm and code in Fig-
ures 3.6 and 3.13, which are an alternative representation for the algorithms in Figure 3.12.
    for j = 0, . . . , n - 1                      for j=1:n
      for i = j + 1, . . . , n - 1                  for i=j+1:n
        α_{i,j} := α_{i,j}/α_{j,j}                      A(i,j) = A(i,j)/A(j,j);
        for p = j + 1, . . . , n - 1                  for p=j+1:n
          α_{i,p} := α_{i,p} - α_{i,j} α_{j,p}              A(i,p) = A(i,p) - A(i,j) * A(j,p);
        endfor                                        end
      endfor                                        end
    endfor                                        end

Figure 3.7: Algorithm and M-script for GE Take 3b.
    for j = 0, . . . , n - 1                      for j=1:n
      for i = j + 1, . . . , n - 1                  for i=j+1:n
        α_{i,j} := α_{i,j}/α_{j,j}                      A(i,j) = A(i,j)/A(j,j);
        α_{i,j+1:n-1} :=                              A(i,j+1:n) = ...
            α_{i,j+1:n-1} - α_{i,j} α_{j,j+1:n-1}           SLAP_Axpy( - A(i,j), A(j,j+1:n), ...
      endfor                                                      A(i,j+1:n));
    endfor                                          end
                                                  end

Figure 3.8: Algorithm and M-script for GE Take 3b, with the inner loop (indexed by p)
identified as an axpy operation.
    for j = 0, . . . , n - 1                      for j=1:n
      for i = j + 1, . . . , n - 1                  for i=j+1:n
        α_{i,j} := α_{i,j}/α_{j,j}                      A(i,j) = A(i,j)/A(j,j);
      endfor                                        end
      for i = j + 1, . . . , n - 1                  for i=j+1:n
        for p = j + 1, . . . , n - 1                  for p=j+1:n
          α_{i,p} := α_{i,p} - α_{i,j} α_{j,p}              A(i,p) = A(i,p) - A(i,j) * A(j,p);
        endfor                                        end
      endfor                                        end
    endfor                                        end

Figure 3.9: The algorithm and M-script for GE Take 3b in Figure 3.7, but with the multipliers
computed before the rest of the rows are updated. Notice that now the inner two loops
(indexed by i and p) can be done in either order.
    for j = 0, . . . , n - 1                      for j=1:n
      for i = j + 1, . . . , n - 1                  for i=j+1:n
        α_{i,j} := α_{i,j}/α_{j,j}                      A(i,j) = A(i,j)/A(j,j);
      endfor                                        end
      for p = j + 1, . . . , n - 1                  for p=j+1:n
        for i = j + 1, . . . , n - 1                  for i=j+1:n
          α_{i,p} := α_{i,p} - α_{i,j} α_{j,p}              A(i,p) = A(i,p) - A(i,j) * A(j,p);
        endfor                                        end
      endfor                                        end
    endfor                                        end

Figure 3.10: The algorithm and M-script for GE Take 3b in Figure 3.9, with the order of
the inner two loops (indexed by i and p) reversed.
    for j = 0, . . . , n - 1                      for j=1:n
      α_{j+1:n-1,j} := α_{j+1:n-1,j}/α_{j,j}          A(j+1:n,j) = SLAP_Scal( 1/A(j,j), A(j+1:n,j) );
      for p = j + 1, . . . , n - 1                  for p=j+1:n
        α_{j+1:n-1,p} :=                              A(j+1:n,p) = ...
            α_{j+1:n-1,p} - α_{j,p} α_{j+1:n-1,j}           SLAP_Axpy( - A(j,p), A(j+1:n,j), ...
      endfor                                                      A(j+1:n,p) );
    endfor                                          end
                                                  end

Figure 3.11: The algorithm and M-script for GE Take 3b in Figure 3.10, recognizing that
the first loop indexed by i is a scal operation and the second loop indexed by i is an axpy
operation.
    for j = 0, . . . , n - 1                      for j=1:n
      α_{j+1:n-1,j} := α_{j+1:n-1,j}/α_{j,j}          A(j+1:n,j) = SLAP_Scal( 1/A(j,j), A(j+1:n,j) );
      α_{j+1:n-1,j+1:n-1} :=                          A(j+1:n,j+1:n) = SLAP_Ger( ...
          α_{j+1:n-1,j+1:n-1}                             -1, A(j+1:n,j), A(j,j+1:n), ...
          - α_{j+1:n-1,j} α_{j,j+1:n-1}                   A(j+1:n,j+1:n) );
    endfor                                        end

Figure 3.12: The algorithm and M-script for GE Take 3b in Figure 3.11, recognizing that the
loop indexed by p, which calls the axpy operations, implements the updating of the indi-
cated matrix by subtracting off an outer product (combined, this is called a rank-1 update).
Compare and contrast also with the algorithm in Figure 3.6.
function [ A_out ] = GE_Take5( A )
[ ATL, ATR, ...
ABL, ABR ] = FLA_Part_2x2( A,
0, 0, FLA_TL );
while ( size( ATL, 1 ) < size( A, 1 ) )
[ A00, a01, A02, ...
a10t, alpha11, a12t, ...
A20, a21, A22 ] = ...
FLA_Repart_2x2_to_3x3( ATL, ATR, ...
ABL, ABR, 1, 1, FLA_BR );
%-----------------------------------------------------%
% a21 = a21 / alpha11;
a21 = SLAP_Scal( 1 / alpha11, a21 );
% A22 = A22 - a21 * a12t;
A22 = SLAP_Ger( -1, a21, a12t, A22 );
%-----------------------------------------------------%
[ ATL, ATR, ...
ABL, ABR ] =
FLA_Cont_with_3x3_to_2x2( A00, a01, A02, ...
a10t, alpha11, a12t, ...
A20, a21, A22,
FLA_TL );
end
A_out = [ ATL, ATR
ABL, ABR ];
return
Figure 3.13: FLAME@lab code for algorithm in Figure 3.6. Compare and contrast also with
the code in Figure 3.12. Two loops are hidden inside of the call to FLA Ger.
Chapter 4

LU Factorization

In this chapter, we will use the insights into how blocked matrix-matrix and matrix-vector
multiplication works to derive and state algorithms for solving linear systems in a more
concise way that translates more directly into algorithms.

The idea is that, under circumstances to be discussed later, a matrix A ∈ R^{n×n} can be
factored into the product of two matrices L, U ∈ R^{n×n},

    A = LU,

where L is unit lower triangular (it has ones on the diagonal) and U is upper triangular. When
solving the linear system of equations (in matrix notation) Ax = b, one can substitute A =
LU to find that (LU)x = b. Now, using the associative properties of matrix multiplication,
we find that L(Ux) = b. Substituting in z = Ux, this means that Lz = b. Thus, solving
Ax = b can be accomplished via the steps

• Solve Lz = b; followed by

• Solve Ux = z.

What we will show is that this process is equivalent to Gaussian elimination with the aug-
mented system [ A | b ] followed by backward substitution.

Next, we will discuss how to overcome some of the conditions under which the procedure
breaks down, leading to LU factorization with pivoting.

The reader will notice that this chapter starts where the last chapter ended: with A = LU
and how to compute this decomposition more directly. We will show how starting from this
point leads directly to the algorithm in Figure 3.6. We then work backwards, exposing once
again how Gauss transforms fit into the picture. This allows us to then introduce swapping
of equations (pivoting) into the basic algorithm. So, be prepared for seeing some of the same
material again, under a slightly different light.
4.1 Gaussian Elimination Once Again

In Figure 4.1 we illustrate how Gaussian Elimination is used to transform a linear system
of three equations in three unknowns into an upper triangular system by considering the
problem

     2χ0 -  χ1 +  χ2 =  6
    -2χ0 - 2χ1 - 3χ2 =  3                                                   (4.1)
     4χ0 + 4χ1 + 7χ2 = -3

• Step 1: A multiple of the first row is subtracted from the second row to eliminate the
  χ0 term. This multiple is computed from the coefficient of the term to be eliminated
  and the coefficient of the same term in the first equation.

• Step 2: Similarly, a multiple of the first row is subtracted from the third row.

• Step 3: A multiple of the second row is subtracted from the third row to eliminate
  the χ1 term.

This leaves the upper triangular system

    2χ0 -  χ1 +  χ2 = 6
        - 3χ1 - 2χ2 = 9
                 χ2 = 3

which is easier to solve, via backward substitution, to be discussed later.
In Figure 4.2 we again show how it is not necessary to write down the entire linear equations:
it suffices to perform Gaussian elimination on the matrix of coefficients, augmented by the
right-hand side.
4.2 LU factorization

Next, let us consider the computation of the LU factorization of a square matrix. We will
ignore for now when this factorization can be computed, and focus on the computation itself.

Assume A ∈ R^{n×n} is given and that L and U are to be computed such that A = LU,
where L ∈ R^{n×n} is unit lower triangular and U ∈ R^{n×n} is upper triangular. We derive an
algorithm for computing this operation by partitioning

    A → [ α11  a12^T        L → [ 1    0          and    U → [ υ11  u12^T
          a21  A22 ],             l21  L22 ],                  0    U22 ],

where we use our usual notation that lower-case Greek letters denote scalars, lower-case
Roman letters vectors, and upper-case Roman letters matrices. Now, A = LU implies (using
Step  Current system                       Operation

 1     2χ0 -  χ1 +  χ2 =  6                  -2χ0 - 2χ1 - 3χ2 = 3
      -2χ0 - 2χ1 - 3χ2 =  3                - (-2/2)·( 2χ0 - χ1 + χ2 = 6 )
       4χ0 + 4χ1 + 7χ2 = -3                ----------------------------------
                                                 - 3χ1 - 2χ2 = 9

 2     2χ0 -  χ1 +  χ2 =  6                   4χ0 + 4χ1 + 7χ2 = -3
           - 3χ1 - 2χ2 =  9                - (4/2)·( 2χ0 - χ1 + χ2 = 6 )
       4χ0 + 4χ1 + 7χ2 = -3                ----------------------------------
                                                   6χ1 + 5χ2 = -15

 3     2χ0 -  χ1 +  χ2 =  6                      6χ1 + 5χ2 = -15
           - 3χ1 - 2χ2 =  9                - (6/(-3))·( -3χ1 - 2χ2 = 9 )
             6χ1 + 5χ2 = -15               ----------------------------------
                                                          χ2 = 3

 4     2χ0 -  χ1 +  χ2 =  6
           - 3χ1 - 2χ2 =  9
                    χ2 =  3

Figure 4.1: Gaussian Elimination on a linear system of three equations in three unknowns.
Step  Current system            Multiplier         Operation

 1    [ 2 -1  1 |  6            -2/2 = -1            -2 -2 -3 |  3
       -2 -2 -3 |  3                               -(-1)·( 2 -1  1 | 6 )
        4  4  7 | -3 ]                             -------------------------
                                                      0 -3 -2 |  9

 2    [ 2 -1  1 |  6             4/2 = 2              4  4  7 | -3
        0 -3 -2 |  9                               -(2)·( 2 -1  1 | 6 )
        4  4  7 | -3 ]                             -------------------------
                                                      0  6  5 | -15

 3    [ 2 -1  1 |  6             6/(-3) = -2          0  6  5 | -15
        0 -3 -2 |  9                               -(-2)·( 0 -3 -2 | 9 )
        0  6  5 | -15 ]                            -------------------------
                                                      0  0  1 |  3

 4    [ 2 -1  1 |  6
        0 -3 -2 |  9
        0  0  1 |  3 ]

Figure 4.2: Gaussian Elimination with an augmented matrix of coefficients. (Compare and
contrast with Figure 4.1.)
what we learned about multiplying matrices that have been partitioned into submatrices)

        A                    L            U                  LU
    [ α11  a12^T       [ 1    0     [ υ11  u12^T       [ υ11      u12^T
      a21  A22 ]   =     l21  L22 ]   0    U22 ]   =     l21 υ11  l21 u12^T + L22 U22 ].

For two matrices to be equal, their elements must be equal, and therefore, if they are parti-
tioned conformally, their submatrices must be equal:

    α11 = υ11          a12^T = u12^T
    a21 = l21 υ11      A22 = l21 u12^T + L22 U22

or, rearranging,

    υ11 = α11          u12^T = a12^T
    l21 = a21/υ11      L22 U22 = A22 - l21 u12^T.
Partition
A
_

11
a
T
12
a
21
A
22
_
.
Update a
21
= a
21
/
11
(= l
21
).
Update A
22
= A
22
a
21
a
T
12
(= A
22
l
21
u
T
12
).
Overwrite A
22
with L
22
and U
22
by continuing recursively with A = A
22
.
This will leave U in the upper triangular part of A and the strictly lower triangular part of
L in the strictly lower triangular part of A. (Notice that the diagonal elements of L need
not be stored, since they are known to equal one.)
This algorithm is presented in Figure 4.3 as a loop-based algorithm.
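The following is a sketch of ours, written with explicit slicing rather than the FLAME notation of Figure 4.3; it is for illustration only and assumes no diagonal element becomes zero.

    % Sketch: unblocked LU factorization, overwriting A with L (strictly below the
    % diagonal) and U (on and above the diagonal).
    function A = LU_unb( A )
      n = size( A, 1 );
      for j = 1:n-1
        A( j+1:n, j ) = A( j+1:n, j ) / A( j, j );                          % a21 := a21/alpha11
        A( j+1:n, j+1:n ) = A( j+1:n, j+1:n ) - A( j+1:n, j ) * A( j, j+1:n ); % rank-1 update
      end
    end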
Example 4.1 The LU factorization algorithm in Figure 4.3 is illustrated in Figure 4.4 for
the coefficient matrix from Figure 4.1. Examination of the computations in Figure 4.2
and Figure 4.4 highlights how Gaussian elimination and LU factorization require the same
computations on the coefficient matrix. To make it easy to do this comparison, we repeat
these figures in Figures 4.5-4.6.

Of course, we arrive at the same conclusion (that Gaussian elimination on the coefficient
matrix is the same as LU factorization on that matrix) by comparing the algorithm in
Figure 4.3 to that in Figure 3.6.

Remark 4.2 LU factorization and Gaussian elimination with the coefficient matrix are
one and the same computation.
A := LU_unb(A)

    Partition A → [ A_TL  A_TR
                    A_BL  A_BR ]      where A_TL is 0 × 0

    while m(A_TL) < m(A) do

      Repartition
          [ A_TL  A_TR        [ A00    a01   A02
            A_BL  A_BR ]  →     a10^T  α11   a12^T        where α11 is 1 × 1
                                A20    a21   A22 ]

      a21 := a21/α11                 (= l21)
      A22 := A22 - a21 a12^T         (= A22 - l21 a12^T)

      Continue with
          [ A_TL  A_TR        [ A00    a01   A02
            A_BL  A_BR ]  ←     a10^T  α11   a12^T
                                A20    a21   A22 ]

    endwhile

Figure 4.3: Algorithm for computing the LU factorization. Notice that this is exactly the
algorithm in Figure 3.6.
Step   [ A00    a01   A02                a21/α11                     A22 - a21 a12^T
         a10^T  α11   a12^T
         A20    a21   A22 ]

1-2    [  2 -1  1                        ( -2, 4 )^T/2               [ -2 -3     ( -1                 [ -3 -2
         -2 -2 -3                          = ( -1, 2 )^T               4  7 ] -    2 ) ( -1  1 )  =     6  5 ]
          4  4  7 ]

3      [  2 -1  1                        ( 6 )/(-3) = ( -2 )         ( 5 ) - ( -2 )( -2 ) = ( 1 )
         -1 -3 -2
          2  6  5 ]

       [  2 -1  1
         -1 -3 -2
          2 -2  1 ]

Figure 4.4: LU factorization of a 3 × 3 matrix. Here, "Step" refers to the corresponding step
for Gaussian Elimination in Figure 4.2.
Step  Current system            Multiplier         Operation

 1    [ 2 -1  1 |  6            -2/2 = -1            -2 -2 -3 |  3
       -2 -2 -3 |  3                               -(-1)·( 2 -1  1 | 6 )
        4  4  7 | -3 ]                             -------------------------
                                                      0 -3 -2 |  9

 2    [ 2 -1  1 |  6             4/2 = 2              4  4  7 | -3
        0 -3 -2 |  9                               -(2)·( 2 -1  1 | 6 )
        4  4  7 | -3 ]                             -------------------------
                                                      0  6  5 | -15

 3    [ 2 -1  1 |  6             6/(-3) = -2          0  6  5 | -15
        0 -3 -2 |  9                               -(-2)·( 0 -3 -2 | 9 )
        0  6  5 | -15 ]                            -------------------------
                                                      0  0  1 |  3

 4    [ 2 -1  1 |  6
        0 -3 -2 |  9
        0  0  1 |  3 ]

Figure 4.5: Figure 4.2 again: Gaussian Elimination with an augmented matrix of coefficients.
Step   [ A00    a01   A02                a21/α11                     A22 - a21 a12^T
         a10^T  α11   a12^T
         A20    a21   A22 ]

1-2    [  2 -1  1                        ( -2, 4 )^T/2               [ -2 -3     ( -1                 [ -3 -2
         -2 -2 -3                          = ( -1, 2 )^T               4  7 ] -    2 ) ( -1  1 )  =     6  5 ]
          4  4  7 ]

3      [  2 -1  1                        ( 6 )/(-3) = ( -2 )         ( 5 ) - ( -2 )( -2 ) = ( 1 )
         -1 -3 -2
          2  6  5 ]

       [  2 -1  1
         -1 -3 -2
          2 -2  1 ]

Figure 4.6: LU factorization of a 3 × 3 matrix. Compare with the above Gaussian elimination
with the coefficient matrix!
Step   Stored multipliers
       and right-hand side             Operation

 1     [        |  6                       3
         -1     |  3                    -(-1)( 6 )
          2 -2  | -3 ]                  -----------
                                           9

 2     [        |  6                      -3
         -1     |  9                    -(2)( 6 )
          2 -2  | -3 ]                  -----------
                                          -15

 3     [        |  6                     -15
         -1     |  9                    -(-2)( 9 )
          2 -2  | -15 ]                 -----------
                                           3

 4     [        |  6
         -1     |  9
          2 -2  |  3 ]

Figure 4.7: Forward substitution with the multipliers computed for a linear system in Fig-
ure 4.2. Compare to what happens to the right-hand side (the part to the right of the |) in
Figure 4.2.
4.3 Forward Substitution = Solving a Unit Lower Triangular System

It is often the case that the coefficient matrix for the linear system is available a priori and
the right-hand side becomes available later. In this case, one may want to perform Gaussian
elimination without augmenting the system with the right-hand side or, equivalently, LU
factorization on the coefficient matrix. In Figure 4.7 we illustrate that if the multipliers are
stored, typically over the elements that were zeroed when a multiplier was used, then the
computations that were performed during Gaussian Elimination can be applied a posteriori
(afterwards), once the right-hand side becomes available. This process is often referred to
as forward substitution.

Next, we show how forward substitution is the same as solving the linear system Lz = b,
where b is the right-hand side and L is the matrix that resulted from the LU factorization
(and is thus unit lower triangular, with the multipliers from Gaussian Elimination stored
below the diagonal).
Given unit lower triangular matrix L ∈ R^{n×n} and vectors z, b ∈ R^n, consider the equation
Lz = b where L and b are known and z is to be computed. Partition

    L → [ 1    0          z → ( ζ1         and    b → ( β1
          l21  L22 ],           z2 ),                   b2 ).

(Note: the horizontal line in such a partitioned vector separates the first entry from the rest.
It is not a division.) Now, Lz = b implies

        b            L             z           Lz
    ( β1        [ 1    0      ( ζ1         ( ζ1
      b2 )   =    l21  L22 ]    z2 )   =     l21 ζ1 + L22 z2 )

so that

    β1 = ζ1                         ζ1 = β1
    b2 = l21 ζ1 + L22 z2      or    z2 = b2 - l21 ζ1.

This suggests the following steps for overwriting the vector b with the solution vector z:

• Partition

      L → [ 1    0          and    b → ( β1
            l21  L22 ]                    b2 ).

• Update b2 := b2 - β1 l21 (this is an axpy operation!).

• Continue recursively with L = L22 and b = b2.
This algorithm is presented as an iteration using our notation in Figure 4.8. It is il-
lustrated for the matrix L that results from Equation (4.1) in Figure 4.9. Examination
of the computations in Figure 4.7 on the right-hand-side and 4.9 highlights how forward
substitution and the solution of Lz = b are related: they are the same!
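In plain M-script these steps become (a sketch of ours, with L unit lower triangular and b overwritten by z):

    % Sketch: solve L z = b with L unit lower triangular, overwriting b with z.
    function b = Ltrsv_unb( L, b )
      n = size( L, 1 );
      for j = 1:n
        % b2 := b2 - beta1 * l21  (an axpy with the part of column j below the diagonal)
        b( j+1:n ) = b( j+1:n ) - b( j ) * L( j+1:n, j );
      end
    end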
Exercise 4.3 Use http://www.cs.utexas.edu/users/flame/Spark/ to write a FLAME@lab
code for computing the solution of Lx = b, overwriting b with the solution and assuming
that L is unit lower triangular.
Exercise 4.4 Modify the algorithm in Figure 4.8 so that it solves Lz = b when L is lower
triangular matrix (not unit lower triangular). Next implement it using FLAME@lab.
4.4 Backward Substitution = Solving an Upper Triangular System

Next, let us consider how to solve a linear system Ux = b and how it is the same as backward
substitution.

Given upper triangular matrix U ∈ R^{n×n} and vectors x, b ∈ R^n, consider the equation
Ux = b where U and b are known and x is to be computed. Partition

    U → [ υ11  u12^T        x → ( χ1         and    b → ( β1
          0    U22 ],             x2 )                     b2 ).
Algorithm: [b] := Ltrsv_unb(L, b)

    Partition L → [ L_TL  0          b → ( b_T
                    L_BL  L_BR ],          b_B )
      where L_TL is 0 × 0 and b_T has 0 rows

    while m(L_TL) < m(L) do

      Repartition
          [ L_TL  0           [ L00    0    0            ( b_T        ( b0
            L_BL  L_BR ]  →     l10^T  λ11  0              b_B )  →     β1
                                L20    l21  L22 ],                      b2 )
        where λ11 is 1 × 1 and β1 has 1 row

      b2 := b2 - β1 l21

      Continue with
          [ L_TL  0           [ L00    0    0            ( b_T        ( b0
            L_BL  L_BR ]  ←     l10^T  λ11  0              b_B )  ←     β1
                                L20    l21  L22 ],                      b2 )

    endwhile

Figure 4.8: Algorithm for triangular solve with unit lower triangular matrix.
Figure 4.8: Algorithm for triangular solve with unit lower triangular matrix.
Step
_
_
L
00
0 0
l
T
10

11
0
L
20
l
21
L
22
_
_
_
_
b
0

1
b
2
_
_
b
2
l
21

1
1-2
_
_
1 0 0
1 1 0
2 2 1
_
_
_
_
6
3
3
_
_
_
3
3
_

_
1
2
_
(6) =
_
9
15
_
3
_
_
1 0 0
1 1 0
2 2 1
_
_
_
_
6
9
15
_
_
_
15
_

_
2
_
(9) = (3)
_
_
1 0 0
1 1 0
2 2 1
_
_
_
_
6
9
3
_
_
Figure 4.9: Triangular solve with unit lower triangular matrix computed in Figure 4.4.
Algorithm: [b] := Utrsv_unb(U, b)

    Partition U → [ U_TL  U_TR        b → ( b_T
                    0     U_BR ],           b_B )
      where U_BR is 0 × 0 and b_B has 0 rows

    while m(U_BR) < m(U) do

      Repartition
          [ U_TL  U_TR        [ U00   u01   U02          ( b_T        ( b0
            0     U_BR ]  →     0     υ11   u12^T          b_B )  →     β1
                                0     0     U22 ],                      b2 )
        where υ11 is 1 × 1 and β1 has 1 row

      β1 := ( β1 - u12^T b2 )/υ11

      Continue with
          [ U_TL  U_TR        [ U00   u01   U02          ( b_T        ( b0
            0     U_BR ]  ←     0     υ11   u12^T          b_B )  ←     β1
                                0     0     U22 ],                      b2 )

    endwhile

Figure 4.10: Algorithm for triangular solve with upper triangular matrix.
Now, Ux = b implies

        b            U               x           Ux
    ( β1        [ υ11  u12^T     ( χ1         ( υ11 χ1 + u12^T x2
      b2 )   =    0    U22 ]       x2 )   =     U22 x2 )

so that

    β1 = υ11 χ1 + u12^T x2          χ1 = ( β1 - u12^T x2 )/υ11
    b2 = U22 x2               or    U22 x2 = b2.

This suggests the following steps for overwriting the vector b with the solution vector x:

• Partition

      U → [ υ11  u12^T        and    b → ( β1
            0    U22 ]                      b2 ).

• Solve U22 x2 = b2 for x2, overwriting b2 with the result.

• Update β1 := ( β1 - u12^T b2 )/υ11 (= ( β1 - u12^T x2 )/υ11).

This suggests the algorithm in Figure 4.10.
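The same recursion, unrolled into a loop that works from the last equation up, gives the following M-script sketch of ours:

    % Sketch: solve U x = b with U upper triangular, overwriting b with x.
    function b = Utrsv_unb( U, b )
      n = size( U, 1 );
      for i = n:-1:1
        % beta1 := ( beta1 - u12^T * b2 ) / upsilon11
        b( i ) = ( b( i ) - U( i, i+1:n ) * b( i+1:n ) ) / U( i, i );
      end
    end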
4.5. Solving the Linear System 127
Exercise 4.5 Side-by-side, solve the upper triangular linear system

    2χ0 -  χ1 +  χ2 = 6
        - 3χ1 - 2χ2 = 9
                 χ2 = 3

using the usual approach and apply the algorithm in Figure 4.10 with

    U = [ 2 -1  1         and    b = ( 6
          0 -3 -2                      9
          0  0  1 ]                    3 ).

In other words, for this problem, give step-by-step details of what both methods do, much
like Figures 4.5 and 4.6.
Exercise 4.6 Use http://www.cs.utexas.edu/users/flame/Spark/ to write a FLAME@lab
code for computing the solution of Ux = b, overwriting b with the solution and assuming
that U is upper triangular.
4.5 Solving the Linear System
What we have seen is that Gaussian Elimination can be used to convert a linear system
into an upper triangular linear system, which can then be solved. We also showed that
computing the LU factorization of a matrix is the same as performing Gaussian Elimination
on the matrix of coefficients. Finally, we showed that forward substitution is equivalent
to solving Lz = b, where L is the unit lower triangular matrix that results from the LU
factorization. We can now show how the solution of the linear system can be computed using
the LU factorization.
Let A = LU and assume that Ax = b, where A and b are given. Then (LU)x = b or
L(Ux) = b. Let us introduce a dummy vector z = Ux. Then Lz = b and z can be computed
as described in the previous section. Once z has been computed, x can be computed by
solving Ux = z where now U and z are known.
128 Chapter 4. LU Factorization
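The two triangular solves can be put together into one routine. The following Octave sketch (our own naming, plain loops) solves Ax = b given the factors L (unit lower triangular) and U (upper triangular), by forward substitution followed by backward substitution.

    % Sketch: solve A x = b given A = L U, via L z = b and then U x = z.
    function x = lu_solve_sketch( L, U, b )
      n = length( b );
      z = b;
      for i = 1:n                      % forward substitution (unit diagonal of L)
        z( i ) = b( i ) - L( i, 1:i-1 ) * z( 1:i-1 );
      end
      x = z;
      for i = n:-1:1                   % backward substitution
        x( i ) = ( z( i ) - U( i, i+1:n ) * x( i+1:n ) ) / U( i, i );
      end
    end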
4.6 When LU Factorization Breaks Down

A question becomes: Does Gaussian elimination always solve a linear system? Or, equivalently, can an LU factorization always be computed?

What we do know is that if an LU factorization can be computed and the upper triangular factor U has no zeroes on the diagonal, then Ax = b can be solved for all right-hand side vectors b. The reason is that if the LU factorization can be computed, then A = LU for some unit lower triangular matrix L and upper triangular matrix U. Now, if you look at the algorithm for forward substitution (solving Lz = b), you will see that the only computations that are encountered are multiplies and adds. Thus, the algorithm will complete. Similarly, the backward substitution algorithm (for solving Ux = z) can only break down if the division causes an error. And that can only happen if U has a zero on its diagonal.

Are there examples where LU factorization (Gaussian elimination as we have presented it so far) can break down? The answer is yes. A simple example is the matrix A = [ 0 1 ; 1 0 ]. In the first step, the algorithm for LU factorization will try to compute the multiplier 1/0, which will cause an error.

Now, Ax = b is given by the set of linear equations

    [ 0 1 ; 1 0 ] [ χ0 ; χ1 ] = [ χ1 ; χ0 ],

so that Ax = b is equivalent to

    [ χ1 ; χ0 ] = [ β0 ; β1 ]

and the solution to Ax = b is given by the vector x = [ β1 ; β0 ].
To motivate the solution, consider applying Gaussian elimination to the following example:

    2χ0 +  4χ1 + (−2)χ2 = −10
    4χ0 +  8χ1 +    6χ2 =  20
    6χ0 + (−4)χ1 +  2χ2 =  18

Recall that solving this linear system via Gaussian elimination relies on the fact that its solution does not change if equations are reordered.

Now,

 • By subtracting (4/2) = 2 times the first row from the second row, we get

    2χ0 +   4χ1 + (−2)χ2 = −10
    0χ0 +   0χ1 +   10χ2 =  40
    6χ0 + (−4)χ1 +   2χ2 =  18

 • By subtracting (6/2) = 3 times the first row from the third row, we get

    2χ0 +    4χ1 + (−2)χ2 = −10
    0χ0 +    0χ1 +   10χ2 =  40
    0χ0 + (−16)χ1 +   8χ2 =  48

 • Now, we've got a problem. The algorithm we discussed so far would want to subtract ((−16)/0) times the second row from the third row, which causes a divide-by-zero error. Instead, we have to use the fact that reordering the equations does not change the answer, swapping the second row with the third:

    2χ0 +    4χ1 + (−2)χ2 = −10
    0χ0 + (−16)χ1 +   8χ2 =  48
    0χ0 +    0χ1 +  10χ2 =  40

   at which point we are done transforming our system into an upper triangular system, and the backward substitution can commence to solve the problem.
Another example:

    0χ0 +  4χ1 + (−2)χ2 = −10
    4χ0 +  8χ1 +    6χ2 =  20
    6χ0 + (−4)χ1 +  2χ2 =  18

Now,

 • We start by trying to subtract (4/0) times the first row from the second row, which leads to an error. So, instead, we swap the first row with any of the other two rows:

    4χ0 +  8χ1 +    6χ2 =  20
    0χ0 +  4χ1 + (−2)χ2 = −10
    6χ0 + (−4)χ1 +  2χ2 =  18

 • By subtracting (6/4) = 3/2 times the first row from the third row, we get

    4χ0 +    8χ1 +    6χ2 =  20
    0χ0 +    4χ1 + (−2)χ2 = −10
    0χ0 + (−16)χ1 + (−7)χ2 = −22

 • Next, we subtract (−16)/4 = −4 times the second row from the third to obtain

    4χ0 + 4χ1 · 0 … (upper triangular form:)
    4χ0 + 8χ1 +     6χ2 =  20
    0χ0 + 4χ1 +  (−2)χ2 = −10
    0χ0 + 0χ1 + (−15)χ2 = −62

   at which point we are done transforming our system into an upper triangular system, and the backward substitution can commence to solve the problem.

The above discussion suggests that the LU factorization in Figure 4.11 needs to be modified to allow for row exchanges. But to do so, we need to create some machinery.
4.7 Permutations

Example 4.7 Consider

    P = [ 0 1 0 ; 0 0 1 ; 1 0 0 ],   A = [ 2 1 2 ; 3 2 1 ; 1 0 3 ],   and   PA = [ 3 2 1 ; 1 0 3 ; 2 1 2 ].

Notice that multiplying A by P from the left permuted the order of the rows in that matrix.

Examining the matrix P in Example 4.7 we see that each row of P appears to equal a unit basis vector. This leads us to the following definitions:

Definition 4.8 A vector p = (k0, k1, . . . , k_{n−1})^T is said to be a permutation (vector) if k_j ∈ {0, . . . , n−1}, 0 ≤ j < n, and k_i = k_j implies i = j.

We will below write (k0, k1, . . . , k_{n−1})^T to indicate a column vector, for space considerations. This permutation is just a rearrangement of the vector (0, 1, . . . , n−1)^T.
Definition 4.9 Let p = (k0, . . . , k_{n−1})^T be a permutation. Then

    P = [ e_{k0}^T ; e_{k1}^T ; ... ; e_{k_{n−1}}^T ]

is said to be a permutation matrix.

In other words, P is the identity matrix with its rows rearranged as indicated by the n-tuple (k0, k1, . . . , k_{n−1}). We will frequently indicate this permutation matrix as P(p) to indicate that the permutation matrix corresponds to the permutation vector p.
Theorem 4.10 Let p = (k0, . . . , k_{n−1})^T be a permutation. Consider

    P = P(p) = [ e_{k0}^T ; e_{k1}^T ; ... ; e_{k_{n−1}}^T ],   x = [ χ0 ; χ1 ; ... ; χ_{n−1} ],   and   A = [ a0^T ; a1^T ; ... ; a_{n−1}^T ].

Then

    Px = [ χ_{k0} ; χ_{k1} ; ... ; χ_{k_{n−1}} ]   and   PA = [ a_{k0}^T ; a_{k1}^T ; ... ; a_{k_{n−1}}^T ].

In other words, Px and PA rearrange the elements of x and the rows of A in the order indicated by permutation vector p.

Proof: Recall that unit basis vectors have the property that e_j^T A = a_j^T. Hence

    PA = [ e_{k0}^T ; e_{k1}^T ; ... ; e_{k_{n−1}}^T ] A = [ e_{k0}^T A ; e_{k1}^T A ; ... ; e_{k_{n−1}}^T A ] = [ a_{k0}^T ; a_{k1}^T ; ... ; a_{k_{n−1}}^T ].

The result for Px can be proved similarly or, alternatively, by viewing x as a matrix with only one column.
Exercise 4.11 Let p = (2, 0, 1)^T. Compute

    P(p) [ 2 ; 3 ; 1 ]   and   P(p) [ 2 1 2 ; 3 2 1 ; 1 0 3 ].

Hint: it is not necessary to write out P(p): the vector p indicates the order in which the elements and rows need to appear.
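In code one rarely forms P(p) explicitly: indexing with the permutation vector has the same effect. A small Octave sketch (using the permutation from Exercise 4.11; the "+ 1" appears only because Octave indexing starts at 1 while the text counts from 0):

    % Apply the permutation encoded by p to a vector and to the rows of a matrix.
    p = [ 2 ; 0 ; 1 ];                 % the permutation vector from Exercise 4.11
    x = [ 2 ; 3 ; 1 ];
    A = [ 2 1 2 ; 3 2 1 ; 1 0 3 ];
    Px = x( p + 1 )                    % picks out chi_{k_0}, chi_{k_1}, chi_{k_2}
    PA = A( p + 1, : )                 % rows of A in the order indicated by p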
Corollary 4.12 Let p = (k0, k1, . . . , k_{n−1})^T be a permutation and P = P(p). Consider A = [ a0 a1 ··· a_{n−1} ]. Then A P^T = [ a_{k0} a_{k1} ··· a_{k_{n−1}} ].

Proof: Recall that unit basis vectors have the property that A e_k = a_k. Hence

    A P^T = A [ e_{k0}^T ; e_{k1}^T ; ... ; e_{k_{n−1}}^T ]^T = A [ e_{k0} e_{k1} ··· e_{k_{n−1}} ] = [ A e_{k0} A e_{k1} ··· A e_{k_{n−1}} ] = [ a_{k0} a_{k1} ··· a_{k_{n−1}} ].

Corollary 4.13 If P is a permutation matrix, then so is P^T.

This follows from the observation that P can be viewed either as a rearrangement of the rows of the identity or as a (usually different) rearrangement of its columns.
Corollary 4.14 Let P be a permutation matrix. Then P P^T = P^T P = I.

Proof: Let p = (k0, k1, . . . , k_{n−1})^T be the permutation that defines P. Then

    P P^T = [ e_{k0}^T ; e_{k1}^T ; ... ; e_{k_{n−1}}^T ] [ e_{k0}^T ; e_{k1}^T ; ... ; e_{k_{n−1}}^T ]^T
          = [ e_{k0}^T ; e_{k1}^T ; ... ; e_{k_{n−1}}^T ] [ e_{k0} e_{k1} ··· e_{k_{n−1}} ]
          = [ e_{k0}^T e_{k0}  e_{k0}^T e_{k1}  ···  e_{k0}^T e_{k_{n−1}} ;
              e_{k1}^T e_{k0}  e_{k1}^T e_{k1}  ···  e_{k1}^T e_{k_{n−1}} ;
              ⋮ ;
              e_{k_{n−1}}^T e_{k0}  e_{k_{n−1}}^T e_{k1}  ···  e_{k_{n−1}}^T e_{k_{n−1}} ]
          = [ 1 0 ··· 0 ; 0 1 ··· 0 ; ⋮ ; 0 0 ··· 1 ] = I.

Now, we already argued that P^T is also a permutation matrix. Thus, I = P^T (P^T)^T = P^T P, which proves the second part of the corollary.
Definition 4.15 Let us call the special permutation matrix of the form

    P̃(π) = [ e_π^T ; e_1^T ; ... ; e_{π−1}^T ; e_0^T ; e_{π+1}^T ; ... ; e_{n−1}^T ]

a pivot matrix. In other words, P̃(π) equals the identity matrix except that its first row, e_0^T, and its row π, e_π^T, have been exchanged.

Theorem 4.16 When P̃(π) multiplies a matrix from the left, it swaps rows 0 and π. When P̃(π) multiplies a matrix from the right, it swaps columns 0 and π.
4.8 Back to When LU Factorization Breaks Down

Let us reiterate the algorithmic steps that were exposed for the LU factorization in Section 4.2:

 • Partition  A → [ α11 a12^T ; a21 A22 ].

 • Update a21 := a21/α11 (= l21).

 • Update A22 := A22 − a21 a12^T (= A22 − l21 u12^T).

 • Overwrite A22 with L22 and U22 by continuing recursively with A = A22.

Instead of overwriting A with the factors L and U, we can compute L separately and overwrite A with U, letting the elements below its diagonal become zeroes. This allows us to get back to formulating the algorithm using Gauss transforms:

 • Partition  A → [ α11 a12^T ; a21 A22 ].

 • Compute l21 = a21/α11.
Algorithm: [L, A] := LU_unb_var5_alt(A)

    L := I
    Partition A → [ A_TL A_TR ; A_BL A_BR ], L → [ L_TL 0 ; L_BL L_BR ]
        where A_TL is 0 × 0
    while m(A_TL) < m(A) do
        Repartition
            [ A_TL A_TR ; A_BL A_BR ] → [ A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 ],
            [ L_TL 0 ; L_BL L_BR ] → [ L00 0 0 ; l10^T 1 0 ; L20 l21 L22 ]
            where α11 is 1 × 1

            l21 := a21/α11
            [ α11 a12^T ; a21 A22 ] := [ 1 0 ; −l21 I ] [ α11 a12^T ; a21 A22 ] = [ α11 a12^T ; 0 A22 − l21 a12^T ]

        Continue with
            [ A_TL A_TR ; A_BL A_BR ] ← [ A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 ],
            [ L_TL 0 ; L_BL L_BR ] ← [ L00 0 0 ; l10^T 1 0 ; L20 l21 L22 ]
    endwhile

Figure 4.11: Algorithm for computing the LU factorization, exposing the update of the matrix as multiplication by a Gauss transform.
 • Update

       [ α11 a12^T ; a21 A22 ] := [ 1 0 ; −l21 I ] [ α11 a12^T ; a21 A22 ] = [ α11 a12^T ; 0 A22 − l21 a12^T ].

 • Overwrite A22 with L22 and U22 by continuing recursively with A = A22.

This leads to the equivalent LU factorization algorithm in Figure 4.11. In that algorithm the elements below the diagonal of A are overwritten with zeroes, so that it eventually equals the upper triangular matrix U. The unit lower triangular matrix L is now returned as a separate matrix. The matrix [ 1 0 ; −l21 I ] is known as a Gauss transform.

Example 4.17 In Figure 4.12 we illustrate the above alternative description of LU factorization with the same matrix that we used in Figure 4.4.

Let us explain this in one more, slightly different, way:
Steps 1–2: with [ α11 a12^T ; a21 A22 ] = [ −2 −1 1 ; 2 −2 −3 ; −4 4 7 ] and L = I,

    l21 := a21/α11 = [ 2 ; −4 ]/(−2) = [ −1 ; 2 ],
    [ 1 0 ; −l21 I ] [ α11 a12^T ; a21 A22 ] = [ 1 0 0 ; 1 1 0 ; −2 0 1 ] [ −2 −1 1 ; 2 −2 −3 ; −4 4 7 ] = [ −2 −1 1 ; 0 −3 −2 ; 0 6 5 ].

Step 3: with the partitioning advanced one row and column,

    l21 := [ 6 ]/(−3) = [ −2 ],
    [ 1 0 ; 2 1 ] [ −3 −2 ; 6 5 ] = [ −3 −2 ; 0 1 ],

and the accumulated unit lower triangular factor is L = [ 1 0 0 ; −1 1 0 ; 2 −2 1 ].

Figure 4.12: LU factorization based on Gauss transforms of a 3 × 3 matrix. Here, "Step" refers to the corresponding step for Gaussian elimination in Figure 4.2.
 • Partition

       A → [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ]   and   L → [ L00 0 0 ; l10^T 1 0 ; L20 0 I ],

   where the thick line indicates, as usual, how far we have gotten into the computation. In other words, the elements below the diagonal of A in the columns that contain A00 have already been replaced by zeroes and the corresponding columns of L have already been computed.

 • Compute l21 = a21/α11.

 • Update

       [ A00 a01 A02 ; 0 α11 a12^T ; 0 0 A22 − l21 a12^T ] := [ I 0 0 ; 0 1 0 ; 0 −l21 I ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ].

 • Continue by moving the thick line forward one row and column.

This leads to yet another equivalent LU factorization algorithm, in Figure 4.13. Notice that upon completion, A is an upper triangular matrix, U. The point of this alternative explanation is to show that if L̃^(i) represents the ith Gauss transform, computed during the ith iteration of the algorithm, then the final matrix stored in A, the upper triangular matrix U, satisfies U = L̃^(n−2) L̃^(n−3) ··· L̃^(0) Â, where Â is the original matrix stored in A.
Example 4.18 Let us illustrate these last observations with the same example as in Figure 4.4:

Start with

    A = A^(0) = [ −2 −1 1 ; 2 −2 −3 ; −4 4 7 ]   and   L = [ 1 0 0 ; 0 1 0 ; 0 0 1 ].

In the first step,

 • Partition A and L to expose the first row and column.

 • Compute

       l21 = [ 2 ; −4 ]/(−2) = [ −1 ; 2 ].

 • Update A with

       A^(1) = L̃^(0) A^(0) = [ 1 0 0 ; 1 1 0 ; −2 0 1 ] [ −2 −1 1 ; 2 −2 −3 ; −4 4 7 ] = [ −2 −1 1 ; 0 −3 −2 ; 0 6 5 ].
Algorithm: [L, A] := LU_unb_var5_alt(A)

    L := I
    Partition A → [ A_TL A_TR ; A_BL A_BR ], L → [ L_TL 0 ; L_BL L_BR ]
        where A_TL is 0 × 0
    while m(A_TL) < m(A) do
        Repartition
            [ A_TL A_TR ; A_BL A_BR ] → [ A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 ],
            [ L_TL 0 ; L_BL L_BR ] → [ L00 0 0 ; l10^T 1 0 ; L20 l21 L22 ]
            where α11 is 1 × 1

            l21 := a21/α11
            [ A00 a01 A02 ; 0 α11 a12^T ; 0 0 A22 − l21 a12^T ] := [ I 0 0 ; 0 1 0 ; 0 −l21 I ] [ A00 a01 A02 ; 0 α11 a12^T ; 0 a21 A22 ]

        Continue with
            [ A_TL A_TR ; A_BL A_BR ] ← [ A00 a01 A02 ; a10^T α11 a12^T ; A20 a21 A22 ],
            [ L_TL 0 ; L_BL L_BR ] ← [ L00 0 0 ; l10^T 1 0 ; L20 l21 L22 ]
    endwhile

Figure 4.13: Algorithm for computing the LU factorization.
We emphasize that now A^(1) = L̃^(0) A^(0).

In the second step,

 • Partition A (which now contains A^(1)) and L to expose the second row and column.

 • Compute

       l21 = [ 6 ]/(−3) = [ −2 ].

 • Update A with

       A^(2) = L̃^(1) A^(1) = [ 1 0 0 ; 0 1 0 ; 0 2 1 ] [ −2 −1 1 ; 0 −3 −2 ; 0 6 5 ] = [ −2 −1 1 ; 0 −3 −2 ; 0 0 1 ].

We emphasize that now

    A = A^(2) = L̃^(1) A^(1) = L̃^(1) L̃^(0) A^(0).

The point of this last example is to show that LU factorization can be viewed as the computation of a sequence of Gauss transforms so that, upon completion, U = L̃^(n−1) L̃^(n−2) L̃^(n−3) ··· L̃^(0) A. (Actually, L̃^(n−1) is just the identity.)
Now, let us consider the following property of a typical Gauss transform L̃^(i) and the matrix L^(i) obtained from it by flipping the sign of l21:

    [ I 0 0 ; 0 1 0 ; 0 −l21 I ] [ I 0 0 ; 0 1 0 ; 0 l21 I ] = [ I 0 0 ; 0 1 0 ; 0 0 I ] = I.

The inverse of a Gauss transform can be found by changing −l21 to l21!

This means that if U = L̃^(n−2) L̃^(n−3) ··· L̃^(0) A, then, with L^(i) the inverse of L̃^(i), we have L^(0) ··· L^(n−3) L^(n−2) U = A. In the case of our example,

    L^(0) L^(1) U = [ 1 0 0 ; −1 1 0 ; 2 0 1 ] [ 1 0 0 ; 0 1 0 ; 0 −2 1 ] [ −2 −1 1 ; 0 −3 −2 ; 0 0 1 ] = [ −2 −1 1 ; 2 −2 −3 ; −4 4 7 ] = A.

Finally, note that

    L^(0) L^(1) = [ 1 0 0 ; −1 1 0 ; 2 0 1 ] [ 1 0 0 ; 0 1 0 ; 0 −2 1 ] = [ 1 0 0 ; −1 1 0 ; 2 −2 1 ] = L.

In other words, the LU factor L can be constructed from the Gauss transforms by setting the jth column of L to the jth column of the jth Gauss transform, and then flipping the sign of the elements below the diagonal. One can more formally prove this by noting that

    [ L00 0 0 ; l10^T 1 0 ; L20 0 I ] [ I 0 0 ; 0 1 0 ; 0 l21 I ] = [ L00 0 0 ; l10^T 1 0 ; L20 l21 I ]

as part of an inductive argument.
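Before pivoting enters the picture, the algorithm of Figure 4.11 can be sketched in Octave with plain loops. This is our own loop-based rendering (not the FLAME notation, and not a library routine); it returns L separately and overwrites a copy of A with U, and it breaks down exactly when a zero pivot α11 is encountered.

    % Sketch: LU factorization without pivoting, in the spirit of Figure 4.11.
    function [ L, A ] = lu_nopiv_sketch( A )
      n = size( A, 1 );
      L = eye( n );
      for k = 1:n-1
        L( k+1:n, k ) = A( k+1:n, k ) / A( k, k );     % l21 := a21 / alpha11
        A( k+1:n, k ) = 0;                             % entries below the diagonal become zeroes
        A( k+1:n, k+1:n ) = A( k+1:n, k+1:n ) - L( k+1:n, k ) * A( k, k+1:n );   % A22 := A22 - l21 * a12^T
      end
    end

Calling [ L, U ] = lu_nopiv_sketch( A ) with the matrix of Example 4.18 reproduces the factors computed there, which can be checked with L*U - A.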
Now, we are ready to add pivoting (swapping of rows, in other words: swapping of
equations) into the mix.
Example 4.19 Consider again the system of linear equations

    2χ0 +  4χ1 + (−2)χ2 = −10
    4χ0 +  8χ1 +    6χ2 =  20
    6χ0 + (−4)χ1 +  2χ2 =  18

and let us focus on the matrix of coefficients

    A = [ 2 4 −2 ; 4 8 6 ; 6 −4 2 ].

Let us start the algorithm in Figure 4.14. In the first step, we apply a pivot to ensure that the diagonal element in the first column is not zero. In this example, no pivoting is required, so the first pivot matrix, P̃^(0), equals I:

    P̃^(0) A = [ 1 0 0 ; 0 1 0 ; 0 0 1 ] [ 2 4 −2 ; 4 8 6 ; 6 −4 2 ] = [ 2 4 −2 ; 4 8 6 ; 6 −4 2 ] = Ã^(0).
Algorithm: [L, A] := LU_unb_var5_piv(A)

    L := I
    Partition A → [ A_TL A_TR ; A_BL A_BR ], L → [ L_TL 0 ; L_BL L_BR ], p → [ p_T ; p_B ]
        where A_TL is 0 × 0
    while m(A_TL) < m(A) do
        Repartition
            [ A_TL A_TR ; A_BL A_BR ] and [ L_TL 0 ; L_BL L_BR ] as in Figure 4.11, and [ p_T ; p_B ] → [ p0 ; π1 ; p2 ]
            where α11 is 1 × 1

            π1 = Pivot( [ α11 ; a21 ] )
            [ α11 a12^T ; a21 A22 ] := P(π1) [ α11 a12^T ; a21 A22 ]
            l21 := a21/α11
            [ α11 a12^T ; a21 A22 ] := [ 1 0 ; −l21 I ] [ α11 a12^T ; a21 A22 ] = [ α11 a12^T ; 0 A22 − l21 a12^T ]

        Continue with
            [ A_TL A_TR ; A_BL A_BR ], [ L_TL 0 ; L_BL L_BR ], and [ p_T ; p_B ] ← [ p0 ; π1 ; p2 ]
    endwhile

Figure 4.14: Algorithm for computing the LU factorization, exposing the update of the matrix as multiplication by a Gauss transform and adding pivoting.
Next, a Gauss transform is computed and applied:

    L̃^(0) Ã^(0) = [ 1 0 0 ; −2 1 0 ; −3 0 1 ] [ 2 4 −2 ; 4 8 6 ; 6 −4 2 ] = [ 2 4 −2 ; 0 0 10 ; 0 −16 8 ] = A^(1).

In the second step, we apply a pivot to ensure that the diagonal element in the second column is not zero. In this example, the second and third row must be swapped by pivot matrix P̃^(1):

    [ 1 0 ; 0 P̃^(1) ] A^(1) = [ 1 0 0 ; 0 0 1 ; 0 1 0 ] [ 2 4 −2 ; 0 0 10 ; 0 −16 8 ] = [ 2 4 −2 ; 0 −16 8 ; 0 0 10 ] = Ã^(1).

Next, a Gauss transform is computed and applied (here the multiplier is 0/(−16) = 0, so this Gauss transform happens to be the identity):

    L̃^(1) Ã^(1) = [ 1 0 0 ; 0 1 0 ; 0 0 1 ] [ 2 4 −2 ; 0 −16 8 ; 0 0 10 ] = [ 2 4 −2 ; 0 −16 8 ; 0 0 10 ] = A^(2).
Notice that at each step, some permutation matrix is used to swap two rows, after which a Gauss transform is computed and then applied to the resulting (permuted) matrix. One can describe this as U = L̃^(n−2) P̃^(n−2) ··· L̃^(0) P̃^(0) A, where P̃^(i) represents the permutation applied during iteration i. Now, once an LU factorization with pivoting is computed, one can solve Ax = b by noting that

    Ux = L̃^(n−2) P̃^(n−2) ··· L̃^(0) P̃^(0) Ax = L̃^(n−2) P̃^(n−2) ··· L̃^(0) P̃^(0) b.

In other words, the pivot matrices and Gauss transforms, in the proper order, must be applied to the right-hand side,

    z = L̃^(n−2) P̃^(n−2) ··· L̃^(0) P̃^(0) b,

after which x can be obtained by solving the upper triangular system Ux = z.

Remark 4.20 If the LU factorization with pivoting completes without encountering a zero pivot, then given any right-hand side b this procedure produces a unique solution x. In other words, the procedure computes the net effect of applying A^{-1} to the right-hand side vector b, and therefore A has an inverse. If a zero pivot is encountered, then there exists a vector x ≠ 0 such that Ax = 0, and hence the inverse does not exist.
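An Octave sketch of LU factorization with row pivoting follows; it is ours, not a library routine. Where the algorithm in Figure 4.14 only requires a nonzero pivot, this sketch makes the common choice of the candidate with the largest magnitude (partial pivoting). The vector p records, for each step, which row was swapped into the pivot position; the same swaps must later be applied to the right-hand side before the triangular solves.

    % Sketch: LU factorization with row pivoting, in the spirit of Figure 4.14.
    function [ L, A, p ] = lu_piv_sketch( A )
      n = size( A, 1 );
      L = eye( n );
      p = zeros( n-1, 1 );
      for k = 1:n-1
        [ mx, q ] = max( abs( A( k:n, k ) ) );          % pick the largest candidate pivot
        p( k ) = q + k - 1;
        A( [ k p(k) ], : )     = A( [ p(k) k ], : );     % swap rows of A
        L( [ k p(k) ], 1:k-1 ) = L( [ p(k) k ], 1:k-1 ); % and of the computed part of L
        L( k+1:n, k ) = A( k+1:n, k ) / A( k, k );       % l21 := a21 / alpha11
        A( k+1:n, k ) = 0;
        A( k+1:n, k+1:n ) = A( k+1:n, k+1:n ) - L( k+1:n, k ) * A( k, k+1:n );
      end
    end

Note that because this sketch always picks the largest candidate, on the matrix of Example 4.19 it would already swap rows in the first step, whereas the text pivots only when the current diagonal element is zero; both choices avoid the breakdown.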
4.9 The Inverse of a Matrix

4.9.1 First, some properties

Definition 4.21 Given A ∈ R^{n×n}, a matrix B that has the property that BA = I, the identity, is called the inverse of matrix A and is denoted by A^{-1}.

Remark 4.22 We will later see that not every square matrix has an inverse! The inverse of a nonsquare matrix is not defined. Indeed, we will periodically relate other properties of a matrix to the matrix having an inverse as these notes unfold.

Notice that A^{-1} is the matrix that undoes the transformation A: A^{-1}(Ax) = x. It acts as the inverse function of the function F(x) = Ax.

Example 4.23 Let's start by looking at some matrices for which it is easy to determine the inverse:

 • The identity matrix: I^{-1} = I, since I·I = I.

 • Diagonal matrices: if D has δ0, δ1, . . . , δ_{n−1} on its diagonal and zeroes elsewhere, then D^{-1} is the diagonal matrix with 1/δ0, 1/δ1, . . . , 1/δ_{n−1} on its diagonal. In particular, if D = αI (all elements on the diagonal equal α) then D^{-1} = (1/α)I.

 • Zero matrix: Let O denote the n × n matrix of all zeroes. This matrix does not have an inverse. Why? Pick any vector x ≠ 0 (not equal to the zero vector). Then O^{-1}(Ox) = O^{-1} 0 = 0 while, if O^{-1} existed, (O^{-1}O)x = Ix = x ≠ 0, which is a contradiction.

Theorem 4.24 Let Ax = b and assume that A has an inverse, A^{-1}. Then x = A^{-1}b.

Proof: If Ax = b then A^{-1}Ax = A^{-1}b and hence Ix = x = A^{-1}b.

Corollary 4.25 Assume that A has an inverse, A^{-1}. Then Ax = 0 implies that x = 0.

Proof: If A has an inverse and Ax = 0, then x = Ix = (A^{-1}A)x = A^{-1}(Ax) = A^{-1} 0 = 0.
Theorem 4.26 If A has an inverse A^{-1}, then AA^{-1} = I.

Theorem 4.27 (Uniqueness of the inverse) If A has an inverse, then that inverse is unique.

Proof: Assume that AB = BA = I and AC = CA = I. Then by associativity of matrix multiplication C = CI = C(AB) = (CA)B = B.

Let us assume in this section that A has an inverse and let us assume that we would like to compute C = A^{-1}. Matrix C must satisfy AC = I. Partition matrices C and I by columns:

    A [ c0 c1 ··· c_{n−1} ] = [ Ac0 Ac1 ··· Ac_{n−1} ] = [ e0 e1 ··· e_{n−1} ] = I,

where e_j equals the jth column of I. (Notice that we have encountered e_j before in Section ??.) Thus, the jth column of C, c_j, must solve Ac_j = e_j.

Now, let us recall that if Gaussian elimination works (and in the next section we will see it doesn't always!) then one can solve Ax = b by applying Gaussian elimination to the augmented system [ A | b ], leaving the result as [ U | z ] (where we later saw that z solves Lz = b), after which backward substitution can be used to solve the upper triangular system Ux = z.

So, this means that we should do this for each of the equations Ac_j = e_j: Append [ A | e_j ], leaving the result as [ U | z_j ], and then perform back substitution to solve Uc_j = z_j.
4.9.2 That's about all we will say about determinants

Example 4.28 Consider the 2 × 2 matrix [ 2 −1 ; 1 1 ]. How would we compute its inverse? One way is to start with A^{-1} = [ β00 β01 ; β10 β11 ] and note that

    [ 2 −1 ; 1 1 ] [ β00 β01 ; β10 β11 ] = [ 1 0 ; 0 1 ],

which yields two linear systems:

    [ 2 −1 ; 1 1 ] [ β00 ; β10 ] = [ 1 ; 0 ]   and   [ 2 −1 ; 1 1 ] [ β01 ; β11 ] = [ 0 ; 1 ].

Solving these yields A^{-1} = [ 1/3 1/3 ; −1/3 2/3 ].
Exercise 4.29 Check that

    [ 2 −1 ; 1 1 ] [ 1/3 1/3 ; −1/3 2/3 ] = [ 1 0 ; 0 1 ].

One can similarly compute the inverse of any 2 × 2 matrix: Consider

    [ α00 α01 ; α10 α11 ] [ β00 β01 ; β10 β11 ] = [ 1 0 ; 0 1 ],

which yields two linear systems:

    [ α00 α01 ; α10 α11 ] [ β00 ; β10 ] = [ 1 ; 0 ]   and   [ α00 α01 ; α10 α11 ] [ β01 ; β11 ] = [ 0 ; 1 ].

Solving these yields

    [ α00 α01 ; α10 α11 ]^{-1} = 1/(α00 α11 − α01 α10) [ α11 −α01 ; −α10 α00 ].

Here the expression α00 α11 − α01 α10 is known as the determinant. The inverse of the 2 × 2 matrix exists if and only if this expression is not equal to zero.

Similarly, a determinant can be defined for any n × n matrix A and there is even a method for solving linear equations, known as Cramer's rule and taught in high school algebra classes, that requires the computation of the determinants of various matrices. But this method is completely impractical and therefore does not deserve any of our time.
4.9.3 Gauss-Jordan method

There turns out to be a convenient way of computing all columns of the inverse matrix simultaneously. This method is known as the Gauss-Jordan method. We will illustrate this for a specific matrix and relate it back to the above discussion.

Consider the matrix A = [ 2 4 −2 ; 4 −2 6 ; 6 −4 2 ]. Computing the columns of the inverse matrix could start by applying Gaussian elimination to the augmented systems

    [ 2 4 −2 | 1 ; 4 −2 6 | 0 ; 6 −4 2 | 0 ],   [ 2 4 −2 | 0 ; 4 −2 6 | 1 ; 6 −4 2 | 0 ],   and   [ 2 4 −2 | 0 ; 4 −2 6 | 0 ; 6 −4 2 | 1 ].

Why not apply them all at once by creating an augmented system with all three right-hand side vectors:

    [ 2 4 −2 | 1 0 0 ; 4 −2 6 | 0 1 0 ; 6 −4 2 | 0 0 1 ].

Then, proceeding with Gaussian elimination:
 • By subtracting (4/2) = 2 times the first row from the second row, we get

       [ 2 4 −2 | 1 0 0 ; 0 −10 10 | −2 1 0 ; 6 −4 2 | 0 0 1 ].

 • By subtracting (6/2) = 3 times the first row from the third row, we get

       [ 2 4 −2 | 1 0 0 ; 0 −10 10 | −2 1 0 ; 0 −16 8 | −3 0 1 ].

 • By subtracting ((−16)/(−10)) = 1.6 times the second row from the third row, we get

       [ 2 4 −2 | 1 0 0 ; 0 −10 10 | −2 1 0 ; 0 0 −8 | 0.2 −1.6 1 ].
Exercise 4.30 Apply the LU factorization in Figure 4.3 to the matrix [ 2 4 −2 ; 4 −2 6 ; 6 −4 2 ]. Compare and contrast it to the Gauss-Jordan process that we applied to the appended system

    [ 2 4 −2 | 1 0 0 ; 4 −2 6 | 0 1 0 ; 6 −4 2 | 0 0 1 ].
Next, one needs to apply backward substitution to each of the columns. It turns out that the following procedure has the same net effect:

 • Look at the 10 and the −8 in the last column on the left of the |. Subtract (10)/(−8) times the last row from the second row, producing

       [ 2 4 −2 | 1 0 0 ; 0 −10 0 | −1.75 −1 1.25 ; 0 0 −8 | 0.2 −1.6 1 ].

 • Now take the −2 and the −8 in the last column on the left of the | and subtract (−2)/(−8) times the last row from the first row, producing

       [ 2 4 0 | 0.95 0.4 −0.25 ; 0 −10 0 | −1.75 −1 1.25 ; 0 0 −8 | 0.2 −1.6 1 ].

 • Finally take the 4 and the −10 in the second column on the left of the | and subtract (4)/(−10) times the second row from the first row, producing

       [ 2 0 0 | 0.25 0 0.25 ; 0 −10 0 | −1.75 −1 1.25 ; 0 0 −8 | 0.2 −1.6 1 ].

 • Finally, divide the first, second, and third row by the diagonal elements on the left, respectively, yielding

       [ 1 0 0 | 0.125 0 0.125 ; 0 1 0 | 0.175 0.1 −0.125 ; 0 0 1 | −0.025 0.2 −0.125 ].

Lo and behold, the matrix on the right is the inverse of the original matrix:

    [ 2 4 −2 ; 4 −2 6 ; 6 −4 2 ] [ 0.125 0 0.125 ; 0.175 0.1 −0.125 ; −0.025 0.2 −0.125 ] = [ 1 0 0 ; 0 1 0 ; 0 0 1 ].

Notice that this procedure works only if no divide by zero is encountered. (Actually, more accurately, no zero pivot (diagonal element of the upper triangular matrix, U, that results from Gaussian elimination) can be encountered. The elements by which one divides in Gauss-Jordan become the diagonal elements of U, but notice that one never divides by the last diagonal element. If that element equals zero, then the backward substitution breaks down.)
Exercise 4.31 Apply the Gauss-Jordan method to the matrix in Example 4.28 to compute its inverse.

Remark 4.32 Just like Gaussian elimination and LU factorization could be fixed if a zero pivot were encountered by swapping rows (pivoting), Gauss-Jordan can be fixed similarly. It is only if in the end swapping of rows does not yield a nonzero pivot that the process fully breaks down. More on this, later.
Although we don't state the above remark as a formal theorem, let us sketch a proof anyway:

Proof: We previously showed that LU factorization with pivoting could be viewed as computing a sequence of Gauss transforms and pivoting matrices that together transform the n × n matrix A to an upper triangular matrix:

    L̃^(n−2) P̃^(n−2) L̃^(n−3) P̃^(n−3) ··· L̃^(0) P̃^(0) A = U,

where L̃^(i) and P̃^(i) represent the Gauss transform and the permutation applied during iteration i. Now, if U has no zeroes on the diagonal (no zero pivots were encountered during LU with pivoting) then it has an inverse. So,

    U^{-1} L̃^(n−2) P̃^(n−2) L̃^(n−3) P̃^(n−3) ··· L̃^(0) P̃^(0) A = I,

which means that A has an inverse:

    A^{-1} = U^{-1} L̃^(n−2) P̃^(n−2) L̃^(n−3) P̃^(n−3) ··· L̃^(0) P̃^(0).

What the first stage of the Gauss-Jordan process does is to compute

    U = L̃^(n−2) ( P̃^(n−2) ( L̃^(n−3) ( P̃^(n−3) ··· ( L̃^(0) ( P̃^(0) A ) ) ··· ) ) ),

applying the computed transformations also to the identity matrix:

    B = L̃^(n−2) ( P̃^(n−2) ( L̃^(n−3) ( P̃^(n−3) ··· ( L̃^(0) ( P̃^(0) I ) ) ··· ) ) ).

The second stage of Gauss-Jordan (where the elements of A above the diagonal are eliminated) is equivalent to applying U^{-1} from the left to both U and B. Viewing the problem as the appended (augmented) system [ A | I ] is just a convenient way of writing all the intermediate results, applying each transformation to both A and I.
4.9.4 Inverting a matrix using the LU factorization

An alternative to the Gauss-Jordan method illustrated above is to notice that one can compute the LU factorization of matrix A, A = LU, after which each system Ab_j = e_j can be solved by instead solving Lz_j = e_j followed by the computation of the solution to Ub_j = z_j. This is an example of how an LU factorization can be used to solve multiple linear systems with different right-hand sides.

One could also solve AB = I for the matrix B by solving LZ = I for matrix Z followed by a computation of the solution to UB = Z. This utilizes what are known as triangular solves with multiple right-hand sides, which go beyond the scope of this document.

Notice that, like for the Gauss-Jordan procedure, this approach works only if no zero pivot is encountered.
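A minimal Octave sketch of this column-by-column approach follows (our own naming). Given the factors L and U it solves Lz_j = e_j and then Ub_j = z_j for each j; the backslash operator performs the two triangular solves. As Section 4.9.6 explains, this is a theoretical exercise, not how one should solve linear systems in practice.

    % Sketch: given A = L U, compute B = A^{-1} column by column.
    function B = inverse_via_lu_sketch( L, U )
      n = size( L, 1 );
      B = zeros( n );
      I = eye( n );
      for j = 1:n
        z = L \ I( :, j );        % forward substitution: L z_j = e_j
        B( :, j ) = U \ z;        % backward substitution: U b_j = z_j
      end
    end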
Exercise 4.33 The answer to Exercise 4.30 is

    [ 2 4 −2 ; 4 −2 6 ; 6 −4 2 ] = [ 1 0 0 ; 2 1 0 ; 3 1.6 1 ] [ 2 4 −2 ; 0 −10 10 ; 0 0 −8 ].

Solve the three lower triangular linear systems

    [ 1 0 0 ; 2 1 0 ; 3 1.6 1 ] [ ζ00 ; ζ10 ; ζ20 ] = [ 1 ; 0 ; 0 ],
    [ 1 0 0 ; 2 1 0 ; 3 1.6 1 ] [ ζ01 ; ζ11 ; ζ21 ] = [ 0 ; 1 ; 0 ],   and
    [ 1 0 0 ; 2 1 0 ; 3 1.6 1 ] [ ζ02 ; ζ12 ; ζ22 ] = [ 0 ; 0 ; 1 ].

Check (using octave if you are tired of doing arithmetic) that LZ = I:

    [ 1 0 0 ; 2 1 0 ; 3 1.6 1 ] [ ζ00 ζ01 ζ02 ; ζ10 ζ11 ζ12 ; ζ20 ζ21 ζ22 ] = [ 1 0 0 ; 0 1 0 ; 0 0 1 ].

Next, solve

    [ 2 4 −2 ; 0 −10 10 ; 0 0 −8 ] [ β00 ; β10 ; β20 ] = [ ζ00 ; ζ10 ; ζ20 ],
    [ 2 4 −2 ; 0 −10 10 ; 0 0 −8 ] [ β01 ; β11 ; β21 ] = [ ζ01 ; ζ11 ; ζ21 ],   and
    [ 2 4 −2 ; 0 −10 10 ; 0 0 −8 ] [ β02 ; β12 ; β22 ] = [ ζ02 ; ζ12 ; ζ22 ].

Check (using octave if you are tired of doing arithmetic) that UB = Z and AB = I.

Compare and contrast this process to the Gauss-Jordan process that we applied to the appended system

    [ 2 4 −2 | 1 0 0 ; 4 −2 6 | 0 1 0 ; 6 −4 2 | 0 0 1 ].

You probably conclude that the Gauss-Jordan process is a more convenient way of computing the inverse of a matrix by hand because it organizes the process more conveniently.
Theorem 4.34 Let L ∈ R^{n×n} be a lower triangular matrix with (all) nonzero diagonal elements. Then its inverse L^{-1} exists and is lower triangular.

Proof: Proof by induction. Let L be a lower triangular matrix with (all) nonzero diagonal elements.

 • Base case: Let L = [ λ11 ]. Then, since λ11 ≠ 0, we let L^{-1} = [ 1/λ11 ], which is lower triangular and well-defined.

 • Inductive step: Inductive hypothesis: Assume that for a given k ≥ 0 the inverses of all k × k lower triangular matrices with nonzero diagonal elements exist and are lower triangular. We will show that the inverse of a (k+1) × (k+1) lower triangular matrix with (all) nonzero diagonal elements exists and is lower triangular.

   Let the (k+1) × (k+1) matrix L be lower triangular with (all) nonzero diagonal elements. We will construct a matrix B that is its inverse and is lower triangular. Partition

       L → [ λ11 0 ; l21 L22 ]   and   B → [ β11 b12^T ; b21 B22 ],

   where L22 is lower triangular (why?) and has (all) nonzero diagonal elements (why?), and λ11 ≠ 0 (why?). We will try to construct B such that

       [ λ11 0 ; l21 L22 ] [ β11 b12^T ; b21 B22 ] = [ 1 0 ; 0 I ].

   (Note: we don't know yet that such a matrix exists.) Equivalently, employing blocked matrix-matrix multiplication, we want to find β11, b12^T, b21, and B22 such that

       [ λ11 β11   λ11 b12^T ; β11 l21 + L22 b21   l21 b12^T + L22 B22 ] = [ 1 0 ; 0 I ].

   Thus, the desired submatrices must satisfy

       λ11 β11 = 1,   λ11 b12^T = 0,   β11 l21 + L22 b21 = 0,   and   l21 b12^T + L22 B22 = I.

   Now, let us choose β11, b12^T, b21, and B22 so that

    – β11 = 1/λ11 (why?);
    – b12^T = 0 (why?);
    – L22 B22 = I. By the inductive hypothesis such a B22 exists and is lower triangular; and finally
    – l21 β11 + L22 b21 = 0, or, equivalently, b21 = −L22^{-1} l21/λ11 (which is well-defined because B22 = L22^{-1} exists).

   Indeed,

       [ λ11 0 ; l21 L22 ] [ 1/λ11 0 ; −L22^{-1} l21/λ11 L22^{-1} ] = [ 1 0 ; 0 I ].

 • By the Principle of Mathematical Induction the result holds for all lower triangular matrices with nonzero diagonal elements.
Theorem 4.35 Let L ∈ R^{n×n} be a unit lower triangular matrix. Then its inverse L^{-1} exists and is unit lower triangular.

Theorem 4.36 Let U ∈ R^{n×n} be an upper triangular matrix with (all) nonzero diagonal elements. Then its inverse U^{-1} exists and is upper triangular.

Theorem 4.37 Let U ∈ R^{n×n} be a unit upper triangular matrix. Then its inverse U^{-1} exists and is unit upper triangular.

Exercise 4.38 Prove Theorems 4.35–4.37.

Exercise 4.39 Use the insights in the proofs of Theorems 4.34–4.37 to formulate algorithms for inverting

 1. a lower triangular matrix.
 2. a unit lower triangular matrix.
 3. an upper triangular matrix.
 4. a unit upper triangular matrix.

Hints for part 1:

 • Overwrite the matrix L with its inverse.

 • In the proof of Theorem 4.34, b21 = −L22^{-1} l21/λ11. When computing this, first update b21 := l21/λ11. Next, do not invert L22. Instead, recognize that the operation l21 := −L22^{-1} l21 can instead be viewed as solving L22 x = −l21, overwriting the vector l21 with the result x. Then use the algorithm in Figure 4.8 (modified as in Exercise 4.4).
Corollary 4.40 Let L ∈ R^{n×n} be a lower triangular matrix with (all) nonzero diagonal elements. Partition

    L → [ L_TL 0 ; L_BL L_BR ],

where L_TL is k × k. Then

    L^{-1} = [ L_TL^{-1} 0 ; −L_BR^{-1} L_BL L_TL^{-1} L_BR^{-1} ].

Proof: Notice that both L_TL and L_BR are themselves lower triangular matrices with (all) nonzero diagonal elements. Hence, their inverses exist. To complete the proof, multiply out LL^{-1} for the partitioned matrices.

Exercise 4.41 Formulate and prove a similar result for the inverse of a partitioned upper triangular matrix.
4.9.5 Inverting the LU factorization

Yet another way to compute A^{-1} is to compute its LU factorization, A = LU, and to then note that A^{-1} = (LU)^{-1} = U^{-1}L^{-1}. But that requires us to discuss algorithms for inverting a triangular matrix, which is also beyond the scope of this document. This is actually (closer to) how matrices are inverted in practice.

Again, this approach works only if no zero pivot is encountered.

4.9.6 In practice, do not use inverted matrices!

Inverses of matrices are a wonderful theoretical tool. They are not a practical tool.

We noted that if one wishes to solve Ax = b, and A has an inverse, then x = A^{-1}b. Does this mean we should compute the inverse of a matrix in order to compute the solution of Ax = b? The answer is a resounding no.

Here is the reason: In the previous sections, we have noticed that as part of the computation of A^{-1}, one computes the LU factorization of matrix A. This costs 2/3 n^3 flops for an n × n matrix. There are many additional computations that must be performed. Indeed, although we have not shown this, it takes about 2n^3 flops to invert a matrix. After this, in order to compute x = A^{-1}b, one needs to perform a matrix-vector multiplication, at a cost of about 2n^2 flops. Now, solving Lz = b requires about n^2 flops, as does solving Ux = z. Thus, simply using the LU factors of matrix A to solve the linear system costs about as much as does computing x = A^{-1}b but avoids the additional 4/3 n^3 flops required to compute A^{-1} after the LU factorization has been computed.

Remark 4.42 If anyone ever indicates they invert a matrix in order to solve a linear system of equations, they either (1) are very naive and need to be corrected; or (2) really mean that they are just solving the linear system and don't really mean that they invert the matrix.
4.9.7 More about inverses

Theorem 4.43 Let A, B ∈ R^{n×n} and assume that A^{-1} and B^{-1} exist. Then (AB)^{-1} exists and equals B^{-1}A^{-1}.

Proof: Let C = AB. It suffices to find a matrix D such that CD = I, since then C^{-1} = D. Now,

    C(B^{-1}A^{-1}) = (AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AA^{-1} = I

and thus D = B^{-1}A^{-1} has the desired property.
Theorem 4.44 Let A ∈ R^{n×n} and assume that A^{-1} exists. Then (A^T)^{-1} exists and equals (A^{-1})^T.

Proof: We need to show that A^T (A^{-1})^T = I or, equivalently, that (A^T (A^{-1})^T)^T = I^T = I. But

    (A^T (A^{-1})^T)^T = ((A^{-1})^T)^T (A^T)^T = A^{-1} A = I,

which proves the desired result.

Theorem 4.45 Let A ∈ R^{n×n}, x ∈ R^n, and assume that A has an inverse. Then Ax = 0 if and only if x = 0.

Proof:

 • Assume that A^{-1} exists. If Ax = 0 then x = Ix = A^{-1}Ax = A^{-1} 0 = 0.

 • Let x = 0. Then clearly Ax = 0.

Theorem 4.46 Let A ∈ R^{n×n}. Then A has an inverse if and only if Gaussian elimination with row pivoting does not encounter a zero pivot.
Proof:

 • Assume Gaussian elimination with row pivoting does not encounter a zero pivot. We will show that A then has an inverse.

   Let L̃_{n−1} P_{n−1} ··· L̃_0 P_0 A = U, where P_k and L̃_k are the permutation matrix that swaps the rows and the Gauss transform computed and applied during the kth iteration of Gaussian elimination, and U is the resulting upper triangular matrix. The fact that Gaussian elimination does not encounter a zero pivot means that all these permutation matrices and Gauss transforms exist, and it means that U has only nonzeroes on the diagonal. We have already seen that the inverse of a permutation matrix is its transpose and that the inverse of each L̃_k exists (let us call it L_k). We also have seen that a triangular matrix with only nonzeroes on the diagonal has an inverse. Thus,

       U^{-1} L̃_{n−1} P_{n−1} ··· L̃_0 P_0 A = U^{-1} U = I,

   and hence A has an inverse.

 • Let A have an inverse, A^{-1}. We will next show that then Gaussian elimination with row pivoting will execute to completion without encountering a zero pivot. We will prove this by contradiction: Assume that Gaussian elimination with row pivoting encounters a zero pivot. Then we will construct a nonzero vector x so that Ax = 0, which contradicts the fact that A has an inverse, by Theorem 4.45.

   Let us assume that k steps of Gaussian elimination have proceeded without encountering a zero pivot, and now there is a zero pivot. Using the observation that this process can be explained with pivot matrices and Gauss transforms, this means that

       L̃_{k−1} P_{k−1} ··· L̃_0 P_0 A = U = [ U00 u01 U02 ; 0 0 a12^T ; 0 0 Ã22 ],

   where U00 is a k × k upper triangular matrix with only nonzeroes on the diagonal (meaning that U00^{-1} exists). Now, let

       x = [ −U00^{-1} u01 ; 1 ; 0 ] ≠ 0,   so that Ux = 0.

   Then

       Ax = P_0^T L_0 ··· P_{k−1}^T L_{k−1} Ux = P_0^T L_0 ··· P_{k−1}^T L_{k−1} 0 = 0.

   As a result, A^{-1} cannot exist, by Theorem 4.45.

Exercise 4.47 Show that

    [ U00 u01 U02 ; 0 0 a12^T ; 0 0 Ã22 ] [ −U00^{-1} u01 ; 1 ; 0 ] = [ 0 ; 0 ; 0 ] = 0.
Chapter 5

Vector Spaces: Theory and Practice

So far, we have worked with vectors of length n and performed basic operations on them like scaling and addition. Next, we looked at solving linear systems via Gaussian elimination and LU factorization. Already, we ran into the problem of what to do if a zero pivot is encountered. What if this cannot be fixed by swapping rows? Under what circumstances will a linear system not have a solution? When will it have more than one solution? How can we describe the set of all solutions? To answer these questions, we need to dive deeper into the theory of linear algebra.
5.1 Vector Spaces

The reader should be quite comfortable with the simplest of vector spaces: R, R^2, and R^3, which represent the points in one-dimensional, two-dimensional, and three-dimensional (real valued) space, respectively. A vector x ∈ R^n is represented by the column of n real numbers

    x = [ χ0 ; ... ; χ_{n−1} ],

which one can think of as the direction (vector) from the origin (the point 0 = [ 0 ; ... ; 0 ]) to the point x = [ χ0 ; ... ; χ_{n−1} ]. However, notice that a direction is position independent: You can think of it as a direction anchored anywhere in R^n.

What makes the set of all vectors in R^n a space is the fact that there is a scaling operation (multiplication by a scalar) defined on it, as well as the addition of two vectors: The definition of a space is a set of elements (in this case vectors in R^n) together with addition and multiplication (scaling) operations such that (1) for any two elements in the set, the element that results from adding these elements is also in the set; (2) scaling an element in the set results in an element in the set; and (3) there is an element in the set, denoted by 0, such that adding it to another element results in that element, and scaling by 0 results in the 0 element.
Example 5.1 Let x, y ∈ R^2 and α ∈ R. Then

 • z = x + y ∈ R^2;
 • αx ∈ R^2; and
 • 0 ∈ R^2 and 0 · x = [ 0 ; 0 ].

In this document we will talk about vector spaces because the spaces have vectors as their elements.

Example 5.2 Consider the set of all real valued m × n matrices, R^{m×n}. Together with matrix addition and multiplication by a scalar, this set is a vector space.

Note that an easy way to visualize this is to take the matrix and view it as a vector of length m · n.

Example 5.3 Not all spaces are vector spaces in the sense used here. For example, the space of all functions defined from R to R has addition and multiplication by a scalar defined on it, but its elements are not vectors in R^n. (It is a space of functions instead.)

Recall the concept of a subset, B, of a given set, A. All elements in B are elements in A. If A is a vector space we can ask ourselves the question of when B is also a vector space. The answer is that B is a vector space if (1) x, y ∈ B implies that x + y ∈ B; (2) x ∈ B and α ∈ R implies αx ∈ B; and (3) 0 ∈ B (the zero vector). We call a subset of a vector space that is also a vector space a subspace.

Example 5.4 Reason that one does not need to explicitly say that the zero vector is in a (sub)space.

Definition 5.5 Let A be a vector space and let B be a subset of A. Then B is a subspace of A if (1) x, y ∈ B implies that x + y ∈ B; and (2) x ∈ B and α ∈ R implies that αx ∈ B.

One way to describe a subspace is that it is a subset that is closed under addition and scalar multiplication.

Example 5.6 The empty set is a subset of R^n. Is it a subspace? Why?

Exercise 5.7 What is the smallest subspace of R^n? (Smallest in terms of the number of elements in the subspace.)
5.2 Why Should We Care?

Example 5.8 Consider

    A = [ 3 −1 2 ; 1 2 0 ; 4 1 2 ],   b0 = [ 8 ; −1 ; 7 ],   and   b1 = [ 5 ; −1 ; 7 ].

Does Ax = b0 have a solution? The answer is yes: x = (1, −1, 2)^T. Does Ax = b1 have a solution? The answer is no. Does Ax = b0 have any other solutions? The answer is yes.

The above example motivates the question of when a linear system has a solution, when it doesn't, and how many solutions it has. We will try to answer that question in this section.
Let A ∈ R^{m×n}, x ∈ R^n, b ∈ R^m, and Ax = b. Partition

    A → [ a0 a1 ··· a_{n−1} ]   and   x → [ χ0 ; χ1 ; ... ; χ_{n−1} ].

Then

    χ0 a0 + χ1 a1 + ··· + χ_{n−1} a_{n−1} = b.

Definition 5.9 Let a0, . . . , a_{n−1} ∈ R^m and χ0, . . . , χ_{n−1} ∈ R. Then χ0 a0 + χ1 a1 + ··· + χ_{n−1} a_{n−1} is said to be a linear combination of the vectors a0, . . . , a_{n−1}.

We note that Ax = b can be solved if and only if b equals a linear combination of the vectors that are the columns of A, by the definition of matrix-vector multiplication. This observation answers the question "Given a matrix A, for what right-hand side vector, b, does Ax = b have a solution?" The answer is that there is a solution if and only if b is a linear combination of the columns (column vectors) of A.
Definition 5.10 The column space of A ∈ R^{m×n} is the set of all vectors b ∈ R^m for which there exists a vector x ∈ R^n such that Ax = b.

Theorem 5.11 The column space of A ∈ R^{m×n} is a subspace (of R^m).

Proof: We need to show that the column space of A is closed under addition and scalar multiplication:

 • Let b0, b1 ∈ R^m be in the column space of A. Then there exist x0, x1 ∈ R^n such that Ax0 = b0 and Ax1 = b1. But then A(x0 + x1) = Ax0 + Ax1 = b0 + b1 and thus b0 + b1 is in the column space of A.

 • Let b be in the column space of A and α ∈ R. Then there exists a vector x such that Ax = b and hence αAx = αb. Since A(αx) = αAx = αb, we conclude that αb is in the column space of A.

Hence the column space of A is a subspace (of R^m).
Example 5.12 Consider again

    A = [ 3 −1 2 ; 1 2 0 ; 4 1 2 ]   and   b0 = [ 8 ; −1 ; 7 ].

Set this up as two appended systems, one for solving Ax = b0 and the other for solving Ax = 0 (this will allow us to compare and contrast, which will lead to an interesting observation later on):

    [ 3 −1 2 | 8 ; 1 2 0 | −1 ; 4 1 2 | 7 ]      [ 3 −1 2 | 0 ; 1 2 0 | 0 ; 4 1 2 | 0 ].      (5.1)

Now, apply Gauss-Jordan elimination.

 • It becomes convenient to swap the first and second equation:

       [ 1 2 0 | −1 ; 3 −1 2 | 8 ; 4 1 2 | 7 ]      [ 1 2 0 | 0 ; 3 −1 2 | 0 ; 4 1 2 | 0 ].

 • Use the first row to eliminate the coefficients in the first column below the diagonal:

       [ 1 2 0 | −1 ; 0 −7 2 | 11 ; 0 −7 2 | 11 ]      [ 1 2 0 | 0 ; 0 −7 2 | 0 ; 0 −7 2 | 0 ].

 • Use the second row to eliminate the coefficients in the second column below the diagonal:

       [ 1 2 0 | −1 ; 0 −7 2 | 11 ; 0 0 0 | 0 ]      [ 1 2 0 | 0 ; 0 −7 2 | 0 ; 0 0 0 | 0 ].

 • Divide the first and second row by their diagonal elements:

       [ 1 2 0 | −1 ; 0 1 −2/7 | −11/7 ; 0 0 0 | 0 ]      [ 1 2 0 | 0 ; 0 1 −2/7 | 0 ; 0 0 0 | 0 ].

 • Use the second row to eliminate the coefficients in the second column above the diagonal:

       [ 1 0 4/7 | 15/7 ; 0 1 −2/7 | −11/7 ; 0 0 0 | 0 ]      [ 1 0 4/7 | 0 ; 0 1 −2/7 | 0 ; 0 0 0 | 0 ].      (5.2)

Now, what does this mean? For now, we will focus only on the results for the appended system [ A | b0 ] on the left.

 • We notice that 0 · χ2 = 0. So, there is no constraint on variable χ2. As a result, we will call χ2 a free variable.

 • We see from the second row that χ1 − 2/7 χ2 = −11/7 or χ1 = −11/7 + 2/7 χ2. Thus, the value of χ1 is constrained by the value given to χ2.

 • Finally, χ0 + 4/7 χ2 = 15/7 or χ0 = 15/7 − 4/7 χ2. Thus, the value of χ0 is also constrained by the value given to χ2.

We conclude that any vector of the form

    [ 15/7 − 4/7 χ2 ; −11/7 + 2/7 χ2 ; χ2 ]

solves the linear system. We can rewrite this as

    [ 15/7 ; −11/7 ; 0 ] + χ2 [ −4/7 ; 2/7 ; 1 ].      (5.3)

So, for each choice of χ2, we get a solution to the linear system by plugging it into Equation (5.3).
Example 5.13 We now give a slightly slicker way to view Example 5.12. Consider again Equation (5.2):

    [ 1 0 4/7 | 15/7 ; 0 1 −2/7 | −11/7 ; 0 0 0 | 0 ].

This represents

    [ 1 0 4/7 ; 0 1 −2/7 ; 0 0 0 ] [ χ0 ; χ1 ; χ2 ] = [ 15/7 ; −11/7 ; 0 ].

Using blocked matrix-vector multiplication, we find that

    [ χ0 ; χ1 ] + χ2 [ 4/7 ; −2/7 ] = [ 15/7 ; −11/7 ]

and hence

    [ χ0 ; χ1 ] = [ 15/7 ; −11/7 ] − χ2 [ 4/7 ; −2/7 ],

which we can then turn into

    [ χ0 ; χ1 ; χ2 ] = [ 15/7 ; −11/7 ; 0 ] + χ2 [ −4/7 ; 2/7 ; 1 ].
In the above example, we notice the following:

 • Let xp = [ 15/7 ; −11/7 ; 0 ]. Then

       A xp = [ 3 −1 2 ; 1 2 0 ; 4 1 2 ] [ 15/7 ; −11/7 ; 0 ] = [ 8 ; −1 ; 7 ].

   In other words, xp is a particular solution to Ax = b0. (Hence the "p" in xp.)

 • Let xn = [ −4/7 ; 2/7 ; 1 ]. Then

       A xn = [ 3 −1 2 ; 1 2 0 ; 4 1 2 ] [ −4/7 ; 2/7 ; 1 ] = [ 0 ; 0 ; 0 ].

   We will see that xn is in the null space of A, to be defined later. (Hence the "n" in xn.)

 • Now, notice that for any α, xp + α xn is a solution to Ax = b0:

       A(xp + α xn) = A xp + A(α xn) = A xp + α A xn = b0 + 0 = b0.

So, the system Ax = b0 has many solutions (indeed, an infinite number of solutions). To characterize all solutions, it suffices to find one (nonunique) particular solution xp that satisfies Axp = b0. Then, for any vector xn that has the property that Axn = 0, we know that xp + xn is also a solution.

Definition 5.14 Let A ∈ R^{m×n}. Then the set of all vectors x ∈ R^n that have the property that Ax = 0 is called the null space of A and is denoted by N(A).

Theorem 5.15 The null space of A ∈ R^{m×n} is a subspace of R^n.

Proof: Clearly N(A) is a subset of R^n. Now, assume that x, y ∈ N(A) and α ∈ R. Then A(x + y) = Ax + Ay = 0 and therefore (x + y) ∈ N(A). Also A(αx) = αAx = α·0 = 0 and therefore αx ∈ N(A). Hence, N(A) is a subspace.

Notice that the zero vector (of appropriate length) is always in the null space of a matrix A.
Example 5.16 Let us use the last example, but with Ax = b1. Let us set this up as an appended system

    [ 3 −1 2 | 5 ; 1 2 0 | −1 ; 4 1 2 | 7 ].

Now, apply Gauss-Jordan elimination.

 • It becomes convenient to swap the first and second equation:

       [ 1 2 0 | −1 ; 3 −1 2 | 5 ; 4 1 2 | 7 ].

 • Use the first row to eliminate the coefficients in the first column below the diagonal:

       [ 1 2 0 | −1 ; 0 −7 2 | 8 ; 0 −7 2 | 11 ].

 • Use the second row to eliminate the coefficients in the second column below the diagonal:

       [ 1 2 0 | −1 ; 0 −7 2 | 8 ; 0 0 0 | 3 ].

Now, what does this mean?

 • We notice that 0 · χ2 = 3. This is a contradiction, and therefore this linear system has no solution!

Consider where we started: Ax = b1 represents

    3χ0 + (−1)χ1 + 2χ2 =  5
     χ0 +    2χ1 + 0χ2 = −1
    4χ0 +     χ1 + 2χ2 =  7.

Now, the left-hand side of the last equation is a linear combination of the first two: add the first equation to the second, and you get the coefficients of the third. Well, not quite: The last equation is actually inconsistent, because if you subtract the first two rows from the last, you don't get 0 = 0. As a result, there is no way of simultaneously satisfying these equations.
5.2.1 A systematic procedure (first try)

Let us analyze what it is that we did in Examples 5.12 and 5.13.

 • We start with A ∈ R^{m×n}, x ∈ R^n, and b ∈ R^m where Ax = b and x is to be computed.

 • Append the system: [ A | b ].

 • Use the Gauss-Jordan method to transform this appended system into the form

       [ I_{k×k}  Â_TR  |  b̂_T  ;  0_{(m−k)×k}  0_{(m−k)×(n−k)}  |  b̂_B ],      (5.4)

   where I_{k×k} is the k × k identity matrix, Â_TR ∈ R^{k×(n−k)}, b̂_T ∈ R^k, and b̂_B ∈ R^{m−k}.

 • Now, if b̂_B ≠ 0, then there is no solution to the system and we are done.

 • Notice that (5.4) means that

       [ I_{k×k}  Â_TR ; 0 0 ] [ x_T ; x_B ] = [ b̂_T ; 0 ].

   Thus, x_T + Â_TR x_B = b̂_T. This translates to x_T = b̂_T − Â_TR x_B, or

       [ x_T ; x_B ] = [ b̂_T ; 0 ] + [ −Â_TR ; I_{(n−k)×(n−k)} ] x_B.

 • By taking x_B = 0, we find a particular solution xp = [ b̂_T ; 0 ].

 • By taking, successively, x_B = e_i, i = 0, . . . , (n−k) − 1, we find vectors in the null space:

       x_{n_i} = [ −Â_TR ; I_{(n−k)×(n−k)} ] e_i.

 • The general solution is then given by

       xp + α0 x_{n_0} + ··· + α_{(n−k)−1} x_{n_{(n−k)−1}}.
Example 5.17 We now show how to use these insights to systematically solve the problem in Example 5.12. As in that example, create the appended systems for solving Ax = b0 and Ax = 0 (Equation (5.1)):

    [ 3 −1 2 | 8 ; 1 2 0 | −1 ; 4 1 2 | 7 ]      [ 3 −1 2 | 0 ; 1 2 0 | 0 ; 4 1 2 | 0 ].      (5.5)

We notice that for Ax = 0 (the appended system on the right), the right-hand side never changes. It is always equal to zero. So, we don't really need to carry out all the steps for it, because everything to the left of the | remains the same as it does for solving Ax = b. Carrying through with the Gauss-Jordan method, we end up with Equation (5.2):

    [ 1 0 4/7 | 15/7 ; 0 1 −2/7 | −11/7 ; 0 0 0 | 0 ].

Now, our procedure tells us that xp = [ 15/7 ; −11/7 ; 0 ] is a particular solution: it solves Ax = b0. Next, we notice that

    [ −Â_TR ; I_{(n−k)×(n−k)} ] = [ −4/7 ; 2/7 ; 1 ],

so that Â_TR = [ 4/7 ; −2/7 ] and I_{(n−k)×(n−k)} = 1 (since there is only one free variable). So,

    xn = [ −Â_TR ; I_{(n−k)×(n−k)} ] e0 = [ −4/7 ; 2/7 ; 1 ].

The general solution is then given by

    x = xp + β xn = [ 15/7 ; −11/7 ; 0 ] + β [ −4/7 ; 2/7 ; 1 ],

for any scalar β.
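The same bookkeeping can be done in Octave. The sketch below uses Octave's built-in rref to produce the reduced row echelon form of the appended system in place of the hand Gauss-Jordan steps; the specific matrix and right-hand side are the ones from Examples 5.12 and 5.17. The columns without pivots are the free variables.

    % Sketch: particular solution and a null space vector from rref([A b]).
    A = [ 3 -1 2 ; 1 2 0 ; 4 1 2 ];   b = [ 8 ; -1 ; 7 ];
    R = rref( [ A b ] );
    % For this example R works out to [ 1 0 4/7 15/7 ; 0 1 -2/7 -11/7 ; 0 0 0 0 ],
    % so chi2 is free; xp sets chi2 = 0, and xn sets chi2 = 1 with zero right-hand side.
    xp = [ R(1,4) ; R(2,4) ; 0 ];
    xn = [ -R(1,3) ; -R(2,3) ; 1 ];
    A * xp            % reproduces b
    A * xn            % the zero vector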
5.2.2 A systematic procedure (second try)

Example 5.18 We now give an example where the procedure breaks down. Note: this example is borrowed from the book.

Consider Ax = b where

    A = [ 1 3 3 2 ; 2 6 9 7 ; −1 −3 3 4 ]   and   b = [ 2 ; 10 ; 10 ].

Let us apply Gauss-Jordan to this:

 • Append the system:

       [ 1 3 3 2 | 2 ; 2 6 9 7 | 10 ; −1 −3 3 4 | 10 ].

 • The 1 in the upper-left corner is the pivot in the first column. Subtract 2/1 times the first row and (−1)/1 times the first row from the second and third row, respectively:

       [ 1 3 3 2 | 2 ; 0 0 3 3 | 6 ; 0 0 6 6 | 12 ].

 • The problem is that there is now a 0 in the second column where we would want a pivot. So, we skip it and move on to the next column. The 3 in the second row now becomes the pivot. Subtract 6/3 times the second row from the third row:

       [ 1 3 3 2 | 2 ; 0 0 3 3 | 6 ; 0 0 0 0 | 0 ].

 • Divide each (nonzero) row by the pivot in that row to obtain

       [ 1 3 3 2 | 2 ; 0 0 1 1 | 2 ; 0 0 0 0 | 0 ].

 • We can only eliminate elements in the matrix above pivots. So, take 3/1 times the second row and subtract it from the first row:

       [ 1 3 0 −1 | −4 ; 0 0 1 1 | 2 ; 0 0 0 0 | 0 ].      (5.6)

 • This does not have the form advocated in Equation (5.4). So, we remind ourselves of the fact that Equation (5.6) stands for

       [ 1 3 0 −1 ; 0 0 1 1 ; 0 0 0 0 ] [ χ0 ; χ1 ; χ2 ; χ3 ] = [ −4 ; 2 ; 0 ].      (5.7)

   Notice that we can swap the second and third column of the matrix as long as we also swap the second and third element of the solution vector:

       [ 1 0 3 −1 ; 0 1 0 1 ; 0 0 0 0 ] [ χ0 ; χ2 ; χ1 ; χ3 ] = [ −4 ; 2 ; 0 ].      (5.8)

 • Now, we notice that χ1 and χ3 are the free variables, and with those we can find equations for χ0 and χ2 as before.

 • Also, we can now find vectors in the null space just as before. We just have to pay attention to the order of the unknowns (the order of the elements in the vector x).

In other words, a specific solution is now given by

    xp:   [ χ0 ; χ2 ; χ1 ; χ3 ] = [ −4 ; 2 ; 0 ; 0 ]

and two linearly independent vectors in the null space are given by the columns of

    [ −3 1 ; 0 −1 ; 1 0 ; 0 1 ],

giving us a general solution of

    [ χ0 ; χ2 ; χ1 ; χ3 ] = [ −4 ; 2 ; 0 ; 0 ] + α [ −3 ; 0 ; 1 ; 0 ] + β [ 1 ; −1 ; 0 ; 1 ].

But notice that the order of the elements in the vector must be fixed (permuted back):

    [ χ0 ; χ1 ; χ2 ; χ3 ] = [ −4 ; 0 ; 2 ; 0 ] + α [ −3 ; 1 ; 0 ; 0 ] + β [ 1 ; 0 ; −1 ; 1 ].
Exercise 5.19 Let A ∈ R^{m×n}, x ∈ R^n, and b ∈ R^m with Ax = b. Let P ∈ R^{n×n} be a permutation matrix. Show that (AP^T)(Px) = b. Argue how this relates to the transition from Equation (5.7) to Equation (5.8).

Exercise 5.20 Complete Example 5.18 by computing a particular solution and two vectors in the null space (one corresponding to χ1 = 1, χ3 = 0 and the other to χ1 = 0, χ3 = 1).
5.3 Linear Independence

Definition 5.21 Let a0, . . . , a_{n−1} ∈ R^m. Then this set of vectors is said to be linearly independent if χ0 a0 + χ1 a1 + ··· + χ_{n−1} a_{n−1} = 0 implies that χ0 = ··· = χ_{n−1} = 0. A set of vectors that is not linearly independent is said to be linearly dependent.

Notice that if

    χ0 a0 + χ1 a1 + ··· + χ_{n−1} a_{n−1} = 0   and   χ_j ≠ 0,

then

    χ_j a_j = −χ0 a0 − χ1 a1 − ··· − χ_{j−1} a_{j−1} − χ_{j+1} a_{j+1} − ··· − χ_{n−1} a_{n−1}

and therefore

    a_j = −(χ0/χ_j) a0 − ··· − (χ_{j−1}/χ_j) a_{j−1} − (χ_{j+1}/χ_j) a_{j+1} − ··· − (χ_{n−1}/χ_j) a_{n−1}.

In other words, a_j can be written as a linear combination of the other n−1 vectors. This motivates the term "linearly independent" in the definition: none of the vectors can be written as a linear combination of the other vectors.

Theorem 5.22 Let a0, . . . , a_{n−1} ∈ R^m and let A = [ a0 ··· a_{n−1} ]. Then the vectors a0, . . . , a_{n−1} are linearly independent if and only if N(A) = {0}.
Proof:

(⇒) Assume a0, . . . , a_{n−1} are linearly independent. We need to show that N(A) = {0}. Assume x ∈ N(A). Then Ax = 0 implies that

    0 = Ax = [ a0 ··· a_{n−1} ] [ χ0 ; ... ; χ_{n−1} ] = χ0 a0 + χ1 a1 + ··· + χ_{n−1} a_{n−1}

and hence χ0 = ··· = χ_{n−1} = 0. Hence x = 0.

(⇐) Notice that we are trying to prove Q ⇒ P, where P represents "the vectors a0, . . . , a_{n−1} are linearly independent" and Q represents "N(A) = {0}". It suffices to prove the contrapositive: ¬P ⇒ ¬Q. Assume that a0, . . . , a_{n−1} are not linearly independent. Then there exist χ0, . . . , χ_{n−1} with at least one χ_j ≠ 0 such that χ0 a0 + χ1 a1 + ··· + χ_{n−1} a_{n−1} = 0. Let x = (χ0, . . . , χ_{n−1})^T. Then Ax = 0, which means x ∈ N(A) and hence N(A) ≠ {0}.

Example 5.23 The columns of an identity matrix I ∈ R^{n×n} form a linearly independent set of vectors.

Proof: Since I has an inverse (I itself) we know that N(I) = {0}. Thus, by Theorem 5.22, the columns of I are linearly independent.
Example 5.24 The columns of L = [ 1 0 0 ; 2 −1 0 ; 1 2 3 ] are linearly independent. If we consider

    [ 1 0 0 ; 2 −1 0 ; 1 2 3 ] [ χ0 ; χ1 ; χ2 ] = [ 0 ; 0 ; 0 ]

and simply solve this, we find that χ0 = 0/1 = 0, χ1 = (0 − 2χ0)/(−1) = 0, and χ2 = (0 − χ0 − 2χ1)/(3) = 0. Hence, N(L) = {0} (only the zero vector), and we conclude, by Theorem 5.22, that the columns of L are linearly independent.

The last example motivates the following theorem:

Theorem 5.25 Let L ∈ R^{n×n} be a lower triangular matrix with nonzeroes on its diagonal. Then its columns are linearly independent.

Proof: Let L be as indicated and consider Lx = 0. If one solves this via whatever method one pleases, the solution x = 0 will emerge as the only solution. Thus N(L) = {0} and, by Theorem 5.22, the columns of L are linearly independent.

Exercise 5.26 Let U ∈ R^{n×n} be an upper triangular matrix with nonzeroes on its diagonal. Show that its columns are linearly independent.

Exercise 5.27 Let L ∈ R^{n×n} be a lower triangular matrix with nonzeroes on its diagonal. Show that its rows are linearly independent. (Hint: How do the rows of L relate to the columns of L^T?)
Example 5.28 The columns of L = [ 1 0 0 ; 2 −1 0 ; 1 2 3 ; 1 0 2 ] are linearly independent. If we consider

    [ 1 0 0 ; 2 −1 0 ; 1 2 3 ; 1 0 2 ] [ χ0 ; χ1 ; χ2 ] = [ 0 ; 0 ; 0 ; 0 ]

and simply solve this, we find that χ0 = 0/1 = 0, χ1 = (0 − 2χ0)/(−1) = 0, and χ2 = (0 − χ0 − 2χ1)/(3) = 0. Hence, N(L) = {0} (only the zero vector), and we conclude, by Theorem 5.22, that the columns of L are linearly independent.

This example motivates the following general observation:

Theorem 5.29 Let A ∈ R^{m×n} have linearly independent columns and let B ∈ R^{k×n}. Then the (stacked) matrices

    [ A ; B ]   and   [ B ; A ]

have linearly independent columns.
5.4. Bases 169
Proof: Proof by contradiction. Assume that
_
A
B
_
is not linearly independent. Then, by
Theorem 5.22, there exists x R
n
such that x ,= 0 and
_
A
B
_
x = 0. But that means that
_
Ax
Bx
_
=
_
0
0
_
, which in turn implies that Ax = 0. This contradicts the fact that the
columns of A are linearly independent.
Corollary 5.30 Let A R
mn
. Then the matrix
_
A
I
nn
_
has linearly independent
columns.
Next, we observe that if one has a set of more than m vectors in R
m
, then they must be
linearly dependent:
Theorem 5.31 Let a
0
, a
1
,
.
.
., a
n1
R
m
and n > m. Then these vectors are linearly
dependent.
Proof: This proof is a bit more informal than I would like it to be: Consider the matrix
A =
_
a
0
a
n1
_
. If one apply the Gauss-Jordan method to this, at most m columns
with pivots will be encountered. The other n m columns correspond to free variables,
which allow us to construct nonzero vectors x so that Ax = 0.
5.4 Bases
Definition 5.32 Let $v_0, v_1, \ldots, v_{n-1} \in \mathbb{R}^m$. Then the span of these vectors, $\mathrm{Span}(v_0, v_1, \ldots, v_{n-1})$, is said to be the space of all vectors that are a linear combination of this set of vectors.
Notice that $\mathrm{Span}(v_0, v_1, \ldots, v_{n-1})$ equals the column space of the matrix $\left( v_0 \mid \cdots \mid v_{n-1} \right)$.

Definition 5.33 Let $V$ be a subspace of $\mathbb{R}^m$. Then the set $\{ v_0, v_1, \ldots, v_{n-1} \} \subset \mathbb{R}^m$ is said to be a spanning set for $V$ if $\mathrm{Span}(v_0, v_1, \ldots, v_{n-1}) = V$.
Definition 5.34 Let $V$ be a subspace of $\mathbb{R}^m$. Then the set $\{ v_0, v_1, \ldots, v_{n-1} \} \subset \mathbb{R}^m$ is said to be a basis for $V$ if (1) $v_0, v_1, \ldots, v_{n-1}$ are linearly independent and (2) $\mathrm{Span}(v_0, v_1, \ldots, v_{n-1}) = V$.
The first condition says that there aren't more vectors than necessary in the set. The second says there are enough to be able to generate $V$.

Example 5.35 The vectors $e_0, \ldots, e_{m-1} \in \mathbb{R}^m$, where $e_j$ equals the $j$th column of the identity, are a basis for $\mathbb{R}^m$.
Note: these vectors are linearly independent, and any vector $x \in \mathbb{R}^m$ with $x = \begin{pmatrix} \chi_0 \\ \vdots \\ \chi_{m-1} \end{pmatrix}$ can be written as the linear combination $\chi_0 e_0 + \cdots + \chi_{m-1} e_{m-1}$.
Example 5.36 Let $a_0, \ldots, a_{m-1} \in \mathbb{R}^m$ and let $A = \left( a_0 \mid \cdots \mid a_{m-1} \right)$ be invertible. Then $a_0, \ldots, a_{m-1} \in \mathbb{R}^m$ form a basis for $\mathbb{R}^m$.
Note: The fact that $A$ is invertible means there exists $A^{-1}$ such that $A^{-1} A = I$. Since $Ax = 0$ means $x = A^{-1} A x = A^{-1} 0 = 0$, the columns of $A$ are linearly independent. Also, given any vector $y \in \mathbb{R}^m$, there exists a vector $x \in \mathbb{R}^m$ such that $Ax = y$ (namely $x = A^{-1} y$). Letting $x = \begin{pmatrix} \chi_0 \\ \vdots \\ \chi_{m-1} \end{pmatrix}$ we find that $y = \chi_0 a_0 + \cdots + \chi_{m-1} a_{m-1}$, and hence every vector in $\mathbb{R}^m$ is a linear combination of the set $a_0, \ldots, a_{m-1} \in \mathbb{R}^m$.
Now here comes a very important insight:
Theorem 5.37 Let $V$ be a subspace of $\mathbb{R}^m$ and let $v_0, v_1, \ldots, v_{n-1} \in \mathbb{R}^m$ and $w_0, w_1, \ldots, w_{k-1} \in \mathbb{R}^m$ both be bases for $V$. Then $k = n$. In other words, the number of vectors in a basis is unique.
Proof: Proof by contradiction. Without loss of generality, let us assume that $k > n$. (Otherwise, we can switch the roles of the two sets.) Let $V = \left( v_0 \mid \cdots \mid v_{n-1} \right)$ and $W = \left( w_0 \mid \cdots \mid w_{k-1} \right)$. Let $x_j$ have the property that $w_j = V x_j$. (We know such a vector $x_j$ exists because $V$ spans $\mathsf{V}$ and $w_j \in \mathsf{V}$.) Then $W = V X$, where $X = \left( x_0 \mid \cdots \mid x_{k-1} \right)$. Now, $X \in \mathbb{R}^{n \times k}$ and recall that $k > n$. This means that $\mathcal{N}(X)$ contains nonzero vectors (why?). Let $y \in \mathcal{N}(X)$, $y \neq 0$. Then $W y = V X y = V (X y) = V(0) = 0$, which contradicts the fact that $w_0, w_1, \ldots, w_{k-1}$ are linearly independent, and hence this set cannot be a basis for $\mathsf{V}$.
Note: generally speaking, there are an infinite number of bases for a given subspace. (The exception is the subspace $\{0\}$.) However, the number of vectors in each of these bases is always the same. This allows us to make the following definition:
Definition 5.38 The dimension of a subspace $V$ equals the number of vectors in a basis for that subspace.

A basis for a subspace $V$ can be derived from a spanning set of $V$ by, one by one, removing vectors from the set that are dependent on other remaining vectors until the remaining set of vectors is linearly independent, as a consequence of the following observation:
Definition 5.39 Let $A \in \mathbb{R}^{m \times n}$. The rank of $A$ equals the number of vectors in a basis for the column space of $A$. We will let $\mathrm{rank}(A)$ denote that rank.
Theorem 5.40 Let $\{ v_0, v_1, \ldots, v_{n-1} \} \subset \mathbb{R}^m$ be a spanning set for subspace $V$ and assume that $v_i$ equals a linear combination of the other vectors. Then $\{ v_0, v_1, \ldots, v_{i-1}, v_{i+1}, \ldots, v_{n-1} \}$ is a spanning set of $V$.

Similarly, a set of linearly independent vectors that are in a subspace $V$ can be built up to be a basis by successively adding vectors that are in $V$ to the set, while maintaining that the vectors in the set remain linearly independent, until the resulting set is a basis for $V$.
Theorem 5.41 Let $\{ v_0, v_1, \ldots, v_{n-1} \} \subset \mathbb{R}^m$ be linearly independent and assume that $v_0, v_1, \ldots, v_{n-1} \in V$. Then this set of vectors is either a spanning set for $V$ or there exists $w \in V$ such that $\{ v_0, v_1, \ldots, v_{n-1}, w \}$ are linearly independent.
5.5 Exercises
(Most of these exercises are borrowed from Linear Algebra and Its Applications by Gilbert Strang.)
1. Which of the following subsets of $\mathbb{R}^3$ are actually subspaces?
(a) The plane of vectors $x = (\chi_0, \chi_1, \chi_2)^T \in \mathbb{R}^3$ such that the first component $\chi_0 = 0$. In other words, the set of all vectors $\begin{pmatrix} 0 \\ \chi_1 \\ \chi_2 \end{pmatrix}$ where $\chi_1, \chi_2 \in \mathbb{R}$.
(b) The plane of vectors $x$ with $\chi_0 = 1$.
(c) The vectors $x$ with $\chi_1 \chi_2 = 0$ (this is a union of two subspaces: those vectors with $\chi_1 = 0$ and those vectors with $\chi_2 = 0$).
(d) All combinations of two given vectors $(1, 1, 0)^T$ and $(2, 0, 1)^T$.
(e) The plane of vectors $x = (\chi_0, \chi_1, \chi_2)^T$ that satisfy $\chi_2 - \chi_1 + 3\chi_0 = 0$.
2. Describe the column space and nullspace of the matrices
$$A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 & 3 \\ 1 & 2 & 3 \end{pmatrix}, \quad \text{and} \quad C = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
3. Let $P \subset \mathbb{R}^3$ be the plane with equation $x + 2y + z = 6$. What is the equation of the plane $P_0$ through the origin parallel to $P$? Are $P$ and/or $P_0$ subspaces of $\mathbb{R}^3$?
4. Let $P \subset \mathbb{R}^3$ be the plane with equation $x + y - 2z = 4$. Why is this not a subspace? Find two vectors, $x$ and $y$, that are in $P$ but with the property that $x + y$ is not.
5. Find the echelon form $U$, the free variables, and the special (particular) solution of $Ax = b$ for
(a) $A = \begin{pmatrix} 0 & 1 & 0 & 3 \\ 0 & 2 & 0 & 6 \end{pmatrix}$, $b = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$. When does $Ax = b$ have a solution? (When $\beta_1 = ?$.) Give the complete solution.
(b) $A = \begin{pmatrix} 0 & 0 \\ 1 & 2 \\ 0 & 0 \\ 3 & 6 \end{pmatrix}$, $b = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}$. When does $Ax = b$ have a solution? (When ...) Give the complete solution.
6. Write the complete solution $x = x_p + x_n$ to these systems:
$$\begin{pmatrix} 1 & 2 & 2 \\ 2 & 4 & 5 \end{pmatrix} \begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ 4 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 2 & 2 \\ 2 & 4 & 4 \end{pmatrix} \begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ 4 \end{pmatrix}.$$
7. Which of these statements is a correct definition of the rank of a given matrix $A \in \mathbb{R}^{m \times n}$? (Indicate all correct ones.)
(a) The number of nonzero rows in the row reduced form of $A$.
(b) The number of columns minus the number of rows, $n - m$.
(c) The number of columns minus the number of free columns in the row reduced form of $A$. (Note: a free column is a column that does not contain a pivot.)
(d) The number of 1's in the row reduced form of $A$.
8. Let $A, B \in \mathbb{R}^{m \times n}$, $u \in \mathbb{R}^m$, and $v \in \mathbb{R}^n$. The operation $B := A + u v^T$ is often called a rank-1 update. Why?
9. Find the complete solution of
$$\begin{array}{rcrcrcl} x & + & 3y & + & z & = & 1 \\ 2x & + & 6y & + & 9z & = & 5 \\ -x & - & 3y & + & 3z & = & 5. \end{array}$$
10. Find the complete solution of
$$\begin{pmatrix} 1 & 3 & 1 & 2 \\ 2 & 6 & 4 & 8 \\ 0 & 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ t \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 1 \end{pmatrix}.$$
5.6 The Answer to Life, The Universe, and Everything
We complete this chapter by showing how many questions about subspaces can be answered from the upper echelon form of the linear system.
To do so, consider again
$$\begin{pmatrix} 1 & 3 & 1 & 2 \\ 2 & 6 & 4 & 8 \\ 0 & 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ t \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 1 \end{pmatrix}$$
from Question 10 in the previous section. Reducing this to upper echelon form yields
$$\left(\begin{array}{cccc|c} \boxed{1} & 3 & 1 & 2 & 1 \\ 2 & 6 & 4 & 8 & 3 \\ 0 & 0 & 2 & 4 & 1 \end{array}\right) \longrightarrow \left(\begin{array}{cccc|c} \boxed{1} & 3 & 1 & 2 & 1 \\ 0 & 0 & \boxed{2} & 4 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right).$$
Here the boxed entries are the pivots (the first nonzero entry in each row) and they identify that the corresponding variables ($x$ and $z$) are dependent variables, while the other variables ($y$ and $t$) are free variables.

Give the general solution to the problem. To find the general solution, you recognize that there are two free variables ($y$ and $t$), and that a general solution can thus be given by
$$\begin{pmatrix} \ast \\ 0 \\ \ast \\ 0 \end{pmatrix} + \beta \begin{pmatrix} \ast \\ 1 \\ \ast \\ 0 \end{pmatrix} + \gamma \begin{pmatrix} \ast \\ 0 \\ \ast \\ 1 \end{pmatrix},$$
where the entries marked $\ast$ remain to be determined. Here $x_p = \begin{pmatrix} \ast \\ 0 \\ \ast \\ 0 \end{pmatrix}$ is a particular (special) solution that solves the system. To obtain it, you set the free variables to zero and solve for the remaining values:
$$\begin{pmatrix} 1 & 3 & 1 & 2 \\ 0 & 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} x \\ 0 \\ z \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{or} \quad \begin{array}{rcl} x + z &=& 1 \\ 2z &=& 1 \end{array}$$
so that $z = 1/2$ and $x = 1/2$, yielding a particular solution $x_p = \begin{pmatrix} 1/2 \\ 0 \\ 1/2 \\ 0 \end{pmatrix}$.
Next, we have to find the two vectors in the null space of the matrix. To obtain the first, we set the first free variable to one and the other(s) to zero:
$$\begin{pmatrix} 1 & 3 & 1 & 2 \\ 0 & 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} x \\ 1 \\ z \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad \text{or} \quad \begin{array}{rcl} x + 3 \cdot 1 + z &=& 0 \\ 2z &=& 0 \end{array}$$
so that $z = 0$ and $x = -3$, yielding the first vector in the null space $x_{n_0} = \begin{pmatrix} -3 \\ 1 \\ 0 \\ 0 \end{pmatrix}$.
To obtain the second, we set the second free variable to one and the other(s) to zero:
$$\begin{pmatrix} 1 & 3 & 1 & 2 \\ 0 & 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} x \\ 0 \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad \text{or} \quad \begin{array}{rcl} x + z + 2 \cdot 1 &=& 0 \\ 2z + 4 \cdot 1 &=& 0 \end{array}$$
so that $z = -4/2 = -2$ and $x = -z - 2 = 0$, yielding the second vector in the null space $x_{n_1} = \begin{pmatrix} 0 \\ 0 \\ -2 \\ 1 \end{pmatrix}$.
And thus the general solution is given by
$$\begin{pmatrix} 1/2 \\ 0 \\ 1/2 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} -3 \\ 1 \\ 0 \\ 0 \end{pmatrix} + \gamma \begin{pmatrix} 0 \\ 0 \\ -2 \\ 1 \end{pmatrix},$$
where $\beta$ and $\gamma$ are scalars.

Find a specific (particular) solution to the problem. The above procedure yields the particular solution as part of the first step.
Find vectors in the null space. The first step is to figure out how many (linearly independent) vectors there are in the null space. This equals the number of free variables. The above procedure then gives you a step-by-step way of finding that many linearly independent vectors in the null space.
Find linearly independent columns in the original matrix. Note: this is equivalent to finding a basis for the column space of the matrix.
To find the linearly independent columns, you look at the upper echelon form of the matrix:
$$\begin{pmatrix} \boxed{1} & 3 & 1 & 2 \\ 0 & 0 & \boxed{2} & 4 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
with the pivots highlighted. The columns that have pivots in them are linearly independent, and the corresponding columns in the original matrix
$$\begin{pmatrix} 1 & 3 & 1 & 2 \\ 2 & 6 & 4 & 8 \\ 0 & 0 & 2 & 4 \end{pmatrix}$$
are linearly independent. Thus, in our example, the answer is $\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 4 \\ 2 \end{pmatrix}$ (the first and third column).
Dimension of the Column Space (Rank of the Matrix) The following are all equal:
The dimension of the column space.
The rank of the matrix.
The number of dependent variables.
The number of nonzero rows in the upper echelon form.
The number of columns in the matrix minus the number of free variables.
The number of columns in the matrix minus the dimension of the null space.
The number of linearly independent columns in the matrix.
The number of linearly independent rows in the matrix.
Find a basis for the row space of the matrix. The row space (we will see in the next chapter) is the space spanned by the rows of the matrix (viewed as column vectors). Reducing a matrix to upper echelon form merely takes linear combinations of the rows of the matrix. What this means is that the space spanned by the rows of the original matrix is the same space as is spanned by the rows of the matrix in upper echelon form. Thus, all you need to do is list the (nonzero) rows of the matrix in upper echelon form, as column vectors.
For our example this means that a basis for the row space of the matrix is given by
$$\begin{pmatrix} 1 \\ 3 \\ 1 \\ 2 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 \\ 0 \\ 2 \\ 4 \end{pmatrix}.$$
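All of the bookkeeping in this section (pivots, free variables, particular solution, null-space vectors) can be reproduced numerically. The following is a minimal Octave sketch, written here for illustration and not taken from the notes, applied to the example system:

A = [ 1 3 1 2
      2 6 4 8
      0 0 2 4 ];
b = [ 1; 3; 1 ];
R = rref( [ A, b ] )      % reduced row echelon form of the augmented system;
                          % the leading ones mark the pivot (dependent) columns
rank( A )                 % = 2: the number of pivots
Z = null( A )             % two columns: a basis for the null space
xp = pinv( A ) * b;       % one particular solution (the minimum-norm one)
norm( A * xp - b )        % ~0 confirms that xp solves the system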
5.7 Answers
1. Which of the following subsets of R
3
are actually subspaces?
(a) The plane of vectors x = (
0
,
1
,
2
)
T
R
3
such that the rst component
0
= 0.
In other words, the set of all vectors
_
_
0

2
_
_
where
1
,
2
R.
Answer
This is a subspace. To show this, we need to show that it is closed under multi-
plication and addition. Let
_
_
0

2
_
_
and
_
_
0

2
_
_
lie in the plane and let R.
Show
_
_
0

2
_
_
is also in the plane:

_
_
0

2
_
_
=
_
_
0

2
_
_
which has a zero rst component and hence is in the plane.
Show
_
_
0

2
_
_
+
_
_
0

2
_
_
is also in the plane:
_
_
0

2
_
_
+
_
_
0

2
_
_
=
_
_
0

1
+
1

2
+
2
_
_
which has a zero rst component and is therefore also in the plane.
(b) The plane of vectors x with
0
= 1.
Answer This is not a subspace. The quickest way to show that a set is not a
subspace is to show that the origin is not in the set. Since the original does not
have a rst component equal to one, it is indeed not in the set. (Note: a set can
have the origin in it and still not be a subspace. But if it doesnt have the origin
in it, it is denitely not a subspace.)
(c) The vectors x with
1

2
= 0 (this is a union of two subspaces: those vectors with

1
= 0 and those vectors with
2
= 0).
Answer This is not a subspace. This time the origin is in the set. So we
have to show it is not a subspace in some other way. The next easiest way
to do this is to show that it is either not closed under multiplication or under
addition. Now, consider the vectors (0, 1, 0)
T
and (0, 0, 1)
T
. Both are in the set,
but (0, 1, 0)
T
+(0, 0, 1)
T
= (0, 1, 1) is not. So, the set is not closed under addition
and is thus not a subspace.
(d) All (linear) combinations of two given vectors (1, 1, 0)
T
and (2, 0, 1)
T
.
Answer This is a subspace. Let x and y both be a linear combination of these two
vectors. Then there exist
x
,
x
,
y
, and
y
such that x =
x
(1, 1, 0)
T
+
x
(2, 0, 1)
T
and y =
y
(1, 1, 0)
T
+
y
(2, 0, 1)
T
.
Show that the set is closed under scalar multiplication: Let R. We need
to show that x is also in the set. But x = (
x
(1, 1, 0)
T
+
x
(2, 0, 1)
T
=
(
x
)(1, 1, 0)
T
+ (
x
)(2, 0, 1)
T
) and hence a linear combination of the given
vectors.
Show that the set is closed under addition: x+y =
x
(1, 1, 0)
T
+
x
(2, 0, 1)
T
+

y
(1, 1, 0)
T
+
y
(2, 0, 1)
T
= (
x
+
y
)(1, 1, 0)
T
+ (
x
+
y
)(2, 0, 1)
T
and hence
a linear combination of the given vectors.
(e) The plane of vectors x = (
0
,
1
,
2
)
T
that satisfy
2

1
+ 3
0
= 0.
2. Describe the column space and nullspace of the matrices
A =
_
1 1
0 0
_
, B =
_
0 0 3
1 2 3
_
, and C =
_
0 0 0
0 0 0
_
.
Answer The column space of A equals all vectors
_

0
0
_
.
The column space of B is all of R
2
.
The column space of C is the origin.
3. Let P R
3
be the plane with equation x + 2y + z = 6. What is the equation of the
plane P
0
through the origin parallel to P? Are P and/or P
0
subspaces of R
3
?
Answer The plane x+2y+z = 0 is parallel to P and goes through the origin ((0, 0, 0)
T
satises the equation). P is not a subspace since it does not contain the origin. P
0
is
a subspace.
(Hint: a plane that goes through the origin is always closed under multi-
plication and addition, and is thus a subspace.)
4. Let P R
3
be the plane with equation x + y 2z = 4. Why is this not a subspace?
Find two vectors, x and y, that are in P but with the property that x +y is not.
Answer This is not a subspace because it does not contain the origin. The vectors
(4, 0, 0)
T
and (0, 4, 0)
T
lie in the plane (they satisfy the equation) but (4, 4, 0)
T
does
not (it does not satisfy the equation).
5. Find the echolon form U, the free variables, and the special (particular) solution of
Ax = b for
(a) A =
_
0 1 0 3
0 2 0 6
_
, b =
_

0

1
_
. When does Ax = b have a solution? (When

1
= ?.) Give the complete solution.
Answer
Echelon form U:
_
0 1 0 3
0
0 0 0 0
1
2
0
_
This is consistent only if
1
2
2
= 0. In other words, when
1
= 2
2
. Complete
solution: Note that
0
,
2
, and
3
are free variables. Thus, the complete solution
has the form
_
_
_
_
0
0
0
_
_
_
_
+
_
_
_
_
1
0
0
_
_
_
_
+
_
_
_
_
0
1
0
_
_
_
_
+
_
_
_
_
0
0
1
_
_
_
_
.
Solving for the boxes yields the special solution (the rst vector) and the vectors
in the null spaces (the other three vectors):
_
_
_
_
0

0
0
0
_
_
_
_
+
_
_
_
_
1
0
0
0
_
_
_
_
+
_
_
_
_
0
0
1
0
_
_
_
_
+
_
_
_
_
0
-3
0
1
_
_
_
_
.
(b) A =
_
_
_
_
0 0
1 2
0 0
3 6
_
_
_
_
, b =
_
_
_
_

3
_
_
_
_
. When does Ax = b have a solution? (When ...)
Give the complete solution.
Answer Echelon form:
_
_
_
_
1 2
1
0 0
0
0 0
2
0 0
3
3
1
_
_
_
_
. Free variable:
1
. Solution: only
when
0
=
2
= 0 and
3
= 3
1
. Complete solution:
_

0
0
_
+
_
-2
1
_
6. Write the complete soluton x = x
p
+x
n
to these systems:
_
1 2 2
2 4 5
_
_
_
u
v
w
_
_
=
_
1
4
_
and
_
1 2 2
2 4 4
_
_
_
u
v
w
_
_
=
_
1
4
_
Answer: The echelon form for the rst system is
_
1 2 2 1
0 0 1 2
_
and hence the complete solution has the form
_
_
-3
0
2
_
_
+
_
_
-2
1
0
_
_
.
7. Which of these statements is a correct denition of the rank of a given matrix A
R
mn
? (Indicate all correct ones.)
(a) The number of nonzero rows in the row reduced form of A. True
(b) The number of columns minus the number of rows, n m. False
(c) The number of columns minus the number of free columns in the row reduced
form of A. (Note: a free column is a column that does not contain a pivot.) True
(d) The number of 1s in the row reduced form of A. False
8. Let $A, B \in \mathbb{R}^{m \times n}$, $u \in \mathbb{R}^m$, and $v \in \mathbb{R}^n$. The operation $B := A + u v^T$ is often called a rank-1 update. Why?
Answer The reason is that a matrix of the form $u v^T$ has at most rank one: Partition $v = (\nu_0, \ldots, \nu_{n-1})^T$. Then
$$u v^T = u \left( \nu_0 \; \cdots \; \nu_{n-1} \right) = \left( \nu_0 u \; \cdots \; \nu_{n-1} u \right).$$
Thus all columns are a multiple of the same vector $u$, so at most one column can be linearly independent and the rank is at most one. A rank-1 update thus stands for updating a given matrix by adding a matrix of rank (at most) one to it.
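A quick numerical confirmation (an illustrative Octave sketch, not part of the original answer; the vectors are arbitrary choices):

u = [ 1; 2; 3 ];
v = [ 4; 5; 6; 7 ];
rank( u * v' )            % returns 1: every column of u*v' is a multiple of u
A = ones( 3, 4 );
rank( A + u * v' )        % a rank-1 update changes the rank by at most one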
9. Find the complete solution of
$$\begin{array}{rcrcrcl} x & + & 3y & + & z & = & 1 \\ 2x & + & 6y & + & 9z & = & 5 \\ -x & - & 3y & + & 3z & = & 5. \end{array}$$
Answer
$$\left(\begin{array}{ccc|c} 1 & 3 & 1 & 1 \\ 2 & 6 & 9 & 5 \\ -1 & -3 & 3 & 5 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 3 & 1 & 1 \\ 0 & 0 & 7 & 3 \\ 0 & 0 & 4 & 6 \end{array}\right) \longrightarrow \left(\begin{array}{ccc|c} 1 & 3 & 1 & 1 \\ 0 & 0 & 7 & 3 \\ 0 & 0 & 0 & 6 - \frac{4}{7} \cdot 3 \;(\neq 0) \end{array}\right)$$
Thus the system is inconsistent and doesn't have a solution.
10. Find the complete solution of
$$\begin{pmatrix} 1 & 3 & 1 & 2 \\ 2 & 6 & 4 & 8 \\ 0 & 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ t \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 1 \end{pmatrix}.$$
Answer See Section 5.6.
Chapter 6
Orthogonality
6.1 Orthogonal Vectors and Subspaces
Recall that if nonzero vectors $x, y \in \mathbb{R}^n$ are linearly independent, then the subspace of all vectors $\alpha x + \beta y$, $\alpha, \beta \in \mathbb{R}$ (the space spanned by $x$ and $y$) forms a plane. All three vectors $x$, $y$, and $z = y - x$ lie in this plane and they form a triangle.
[Figure: a triangle with sides $x$, $y$, and $z = y - x$, lying in the plane spanned by $x$ and $y$.]
Vectors $x$ and $y$ are considered to be orthogonal (perpendicular) if they meet at a right angle. Using the Euclidean length
$$\|x\|_2 = \sqrt{\chi_0^2 + \cdots + \chi_{n-1}^2} = \sqrt{x^T x},$$
the Pythagorean Theorem dictates that if the angle in the triangle where $x$ and $y$ meet is a right angle, then $\|z\|_2^2 = \|x\|_2^2 + \|y\|_2^2$. In this case,
$$\|x\|_2^2 + \|y\|_2^2 = \|z\|_2^2 = \|y - x\|_2^2 = (y - x)^T (y - x) = (y^T - x^T)(y - x) = (y^T - x^T)y - (y^T - x^T)x$$
$$= \underbrace{y^T y}_{\|y\|_2^2} - \underbrace{(x^T y + y^T x)}_{2 x^T y} + \underbrace{x^T x}_{\|x\|_2^2} = \|x\|_2^2 - 2 x^T y + \|y\|_2^2,$$
in other words,
$$\|x\|_2^2 + \|y\|_2^2 = \|x\|_2^2 - 2 x^T y + \|y\|_2^2.$$
Cancelling terms on the left and right of the equality, this implies that $x^T y = 0$. This motivates the following definition:
Definition 6.1 Two vectors $x, y \in \mathbb{R}^n$ are said to be orthogonal if $x^T y = 0$.
Sometimes we will use the notation $x \perp y$ to indicate that $x$ is perpendicular to $y$.
We can extend this to define orthogonality of two subspaces:
Definition 6.2 Let $V, W \subset \mathbb{R}^n$ be subspaces. Then $V$ and $W$ are said to be orthogonal if $v \in V$ and $w \in W$ implies that $v^T w = 0$.
We will use the notation $V \perp W$ to indicate that subspace $V$ is orthogonal to subspace $W$.
Definition 6.3 Given subspace $V \subset \mathbb{R}^n$, the set of all vectors in $\mathbb{R}^n$ that are orthogonal to $V$ is denoted by $V^{\perp}$ (pronounced as "V-perp").
Exercise 6.4 Let $V, W \subset \mathbb{R}^n$ be subspaces. Show that if $V \perp W$ then $V \cap W = \{0\}$, the zero vector.
Whenever $V \cap W = \{0\}$ we will sometimes call this the trivial intersection -- trivial in the sense that it only contains the zero vector.
Exercise 6.5 Show that if $V \subset \mathbb{R}^n$ is a subspace, then $V^{\perp}$ is a subspace.
Let us recall some definitions:
Definition 6.6 The column space of a matrix $A \in \mathbb{R}^{m \times n}$, $\mathcal{C}(A)$, equals the set of all vectors in $\mathbb{R}^m$ that can be written as $Ax$: $\{ y \mid y = Ax \}$.
Definition 6.7 The null space of a matrix $A \in \mathbb{R}^{m \times n}$, $\mathcal{N}(A)$, equals the set of all vectors in $\mathbb{R}^n$ that map to the zero vector: $\{ x \mid Ax = 0 \}$.
Definition 6.8 The row space of a matrix $A \in \mathbb{R}^{m \times n}$, $\mathcal{R}(A)$, equals the set of all vectors in $\mathbb{R}^n$ that can be written as $A^T x$: $\{ y \mid y = A^T x \}$.
Definition 6.9 The left null space of a matrix $A \in \mathbb{R}^{m \times n}$ equals the set of all vectors $x$ in $\mathbb{R}^m$ such that $x^T A = 0$.
Exercise 6.10 Show that the left null space of a matrix $A \in \mathbb{R}^{m \times n}$ equals $\mathcal{N}(A^T)$.
Theorem 6.11 Let $A \in \mathbb{R}^{m \times n}$. Then the null space of $A$ is orthogonal to the row space of $A$: $\mathcal{R}(A) \perp \mathcal{N}(A)$.
Proof: Assume that $y \in \mathcal{R}(A)$ and $x \in \mathcal{N}(A)$. Then there exists a vector $z \in \mathbb{R}^m$ such that $y = A^T z$. Now, $y^T x = (A^T z)^T x = (z^T A) x = z^T (A x) = 0$.
Theorem 6.12 The dimension of $\mathcal{R}(A)$ equals the dimension of $\mathcal{C}(A)$.
Proof: We saw this in Chapter 2 of Strang's book: The dimension of the row space equals the number of linearly independent rows, which equals the number of nonzero rows that result from the Gauss-Jordan method, which equals the number of pivots that show up in that method, which equals the number of linearly independent columns.
Theorem 6.13 Let $A \in \mathbb{R}^{m \times n}$. Then every $x \in \mathbb{R}^n$ can be written as $x = x_r + x_n$ where $x_r \in \mathcal{R}(A)$ and $x_n \in \mathcal{N}(A)$.
[Figure 6.1 depicts the four fundamental subspaces: the row space of $A$ (dimension $r$) and the nullspace (dimension $n-r$) within $\mathbb{R}^n$, and the column space of $A$ (dimension $r$) and the left nullspace (dimension $m-r$) within $\mathbb{R}^m$; $x = x_r + x_n$ is mapped to $b = Ax = A x_r$.]
Figure 6.1: A pictorial description of how $x = x_r + x_n$ is transformed by $A \in \mathbb{R}^{m \times n}$ into $b = Ax = A(x_r + x_n)$. (Blatantly borrowed from Strang's book.) Important to note: (1) the row space is orthogonal to the nullspace; (2) the column space is orthogonal to the left nullspace; (3) $n = r + (n - r) = \dim(\mathcal{R}(A)) + \dim(\mathcal{N}(A))$; and (4) $m = r + (m - r) = \dim(\mathcal{C}(A)) + \dim(\mathcal{N}(A^T))$.
Proof: Recall from Chapter 2 of Strang's book that the dimension of $\mathcal{N}(A)$ and the dimension of $\mathcal{C}(A)$, $r$, add to the number of columns, $n$. Thus, the dimension of $\mathcal{R}(A)$ equals $r$ and the dimension of $\mathcal{N}(A)$ equals $n - r$. If $x \in \mathbb{R}^n$ cannot be written as $x_r + x_n$ as indicated, then consider the set of vectors that consists of the union of a basis of $\mathcal{R}(A)$ and a basis of $\mathcal{N}(A)$, plus the vector $x$. This set is linearly independent and has $n + 1$ vectors, contradicting the fact that $\mathbb{R}^n$ has dimension $n$.
Theorem 6.14 Let $A \in \mathbb{R}^{m \times n}$. Then $A$ is a one-to-one, onto mapping from $\mathcal{R}(A)$ to $\mathcal{C}(A)$.
Proof: Let $A \in \mathbb{R}^{m \times n}$. We need to show that:
- $A$ maps $\mathcal{R}(A)$ to $\mathcal{C}(A)$. This is trivial, since any vector $x \in \mathbb{R}^n$ maps to $\mathcal{C}(A)$.
- Uniqueness (one-to-one): We need to show that if $x, y \in \mathcal{R}(A)$ and $Ax = Ay$ then $x = y$. Notice that $Ax = Ay$ implies that $A(x - y) = 0$, which means that $(x - y)$ is both in $\mathcal{R}(A)$ (since it is a linear combination of $x$ and $y$, both of which are in $\mathcal{R}(A)$) and in $\mathcal{N}(A)$. Since we just showed that these two spaces are orthogonal, we conclude that $(x - y) = 0$, the zero vector. Thus $x = y$.
- Onto: We need to show that for any $b \in \mathcal{C}(A)$ there exists $x_r \in \mathcal{R}(A)$ such that $A x_r = b$. Notice that if $b \in \mathcal{C}(A)$, then there exists $x \in \mathbb{R}^n$ such that $Ax = b$. By Theorem 6.13, $x = x_r + x_n$ where $x_r \in \mathcal{R}(A)$ and $x_n \in \mathcal{N}(A)$. Then $b = Ax = A(x_r + x_n) = A x_r + A x_n = A x_r$.
Theorem 6.15 Let $A \in \mathbb{R}^{m \times n}$. Then the left null space of $A$ is orthogonal to the column space of $A$, and the dimension of the left null space of $A$ equals $m - r$, where $r$ is the dimension of the column space of $A$.
Proof: This follows trivially by applying the previous theorems to $A^T$.
The last few theorems are summarized in Figure 6.1.
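The orthogonality of the row space and the null space, and the dimension counts in Figure 6.1, are easy to spot-check numerically. The sketch below (illustrative only; the matrix is an arbitrary choice, and orth/null are standard Octave functions that return orthonormal bases) verifies both:

A = [ 1 3 1 2
      2 6 4 8
      0 0 2 4 ];
Rb = orth( A' );              % orthonormal basis for the row space R(A)
Nb = null( A );               % orthonormal basis for the null space N(A)
norm( Rb' * Nb )              % ~0: every row-space vector is orthogonal to N(A)
size( Rb, 2 ) + size( Nb, 2 ) % = 4 = n: dim(R(A)) + dim(N(A)) equals the
                              %          number of columns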
6.2 Motivating Example, Part I
Let us consider the following set of points:
(
0
,
0
) = (1, 1.97), (
1
,
1
) = (2, 6.97), (
2
,
2
) = (3, 8.89), (
3
,
3
) = (4, 10.01),
which we plot in Figure 6.2. What we would like to do is to nd a line that interpolates
these points as near as is possible, in the sense that the sum of the square of the distances
to the line are minimized. Let us express this with matrices and vectors.
Let
x =
_
_
_
_

3
_
_
_
_
=
_
_
_
_
1
2
3
4
_
_
_
_
y =
_
_
_
_

3
_
_
_
_
=
_
_
_
_
1.97
6.97
8.89
10.01
_
_
_
_
.
If we give the equation of the line as y =
0
+
1
x then, IF this line COULD go through
all these points THEN the following equations would have to be simultaneously satied:

0
=
0
+
1

1
=
0
+
1

2
=
0
+
1

3
=
0
+
1

4
or, specically,
1.97 =
0
+
1
6.97 =
0
+ 2
1
8.89 =
0
+ 3
1
10.01 =
0
+ 4
1
which can be written in matrix notation as
_
_
_
_

3
_
_
_
_
=
_
_
_
_
1
0
1
1
1
2
1
3
_
_
_
_
_

0

1
_
or, specically,
_
_
_
_
1.97
6.97
8.89
10.01
_
_
_
_
=
_
_
_
_
1 1
1 2
1 3
1 4
_
_
_
_
_

0

1
_
.
Figure 6.2: Left: A plot of the points. Right: Some rst guess of a line that interpolates,
and the distance from the points to the line.
Now, just looking at Figure 6.2, it is obvious that these point do not lie on the same line
and that therefore all these equations cannot be simultaneously satied. So, what do we
do?
How does it relate to column spaces? The rst question we ask is For what right-hand
sides could we have solved all four equations simultaneously? We would have had to choose
y so that Ac = y, where
A =
_
_
_
_
1
0
1
1
1
2
1
3
_
_
_
_
=
_
_
_
_
1 1
1 2
1 3
1 4
_
_
_
_
and c =
_

0

1
_
.
This means that y must be in the column space of A. It must be possible to express it as
y =
0
a
0
+
1
a
1
, where A =
_
a
0
a
1
_
! What does this mean if we relate this back to the
picture? Only if
0
, ,
3
have the property that (1,
0
), , (4,
3
) lie on a line can
we nd coecients
0
and
1
such that Ac = y.
How does this problem relate to orthogonality and projection? The problem is that
the given y does not lie in the column space of A. So, a question is what vector, z, that
does lie in the column space we should use to solve Ac = z instead so that we end up
with a line that best interpolates the given points. Now, if z solves Ac = z exactly, then
z =
_
a
0
a
1
_
_

0

1
_
=
0
a
0
+
1
a
1
, which is of course just a repeat of the observation that
z is in the column space of A. Thus, what we want is y = z + w, where w is as small (in
length) as possible. This happens when w is orthogonal to z! So, y =
0
a
0
+
1
a
1
+w, with
a
T
0
w = a
T
1
w = 0. The vector z in the column space of A that is closest to y is known as the
projection of y onto the column space of A. So, it would be nice to have a way of nding a
way to compute this projection.
6.3 Solving a Linear Least-Squares Problem
The last problem motivated the following general problem: Given m equations in n un-
knowns, we end up with a system Ax = b where A R
mn
, x R
n
, and b R
m
.
This system of equations may have no solutions. This happens when b is not in the
column space of A.
This system may have a unique solution. This happens only when r = m = n, where
r is the rank of the matrix (the dimension of the column space of A). Another way of
saying this is that it happens only if A is square and nonsingular (it has an inverse).
This system may have many solutions. This happens when b is in the column space of
A and r < n (the columns of A are linearly dependent, so that the null space of A is
nontrivial).
Let us focus on the first case: $b$ is not in the column space of $A$. In the last section, we argued that what we want is an approximate solution $\hat{x}$ such that $A \hat{x} = z$, where $z$ is the vector in the column space of $A$ that is closest to $b$: $b = z + w$ where $w^T v = 0$ for all $v \in \mathcal{C}(A)$. From Figure 6.1 we conclude that this means that $w$ is in the left null space of $A$. So, $A^T w = 0$. But that means that
$$0 = A^T w = A^T (b - z) = A^T (b - A \hat{x}),$$
which we can rewrite as
$$A^T A \hat{x} = A^T b. \qquad (6.1)$$
Lemma 6.16 If $A \in \mathbb{R}^{m \times n}$ has linearly independent columns, then $A^T A$ is nonsingular (equivalently, has an inverse, $A^T A \hat{x} = A^T b$ has a solution for all $b$, etc.).
Proof: Proof by contradiction. Assume that $A \in \mathbb{R}^{m \times n}$ has linearly independent columns and $A^T A$ is singular. Then there exists $x \neq 0$ such that $A^T A x = 0$. Hence, there exists $y = Ax \neq 0$ such that $A^T y = 0$ (because $A$ has linearly independent columns and $x \neq 0$). This means $y$ is in the left null space of $A$. But $y$ is also in the column space of $A$, since $Ax = y$. Thus, $y = 0$, since the intersection of the column space of $A$ and the left null space of $A$ only contains the zero vector. This contradicts the fact that $A$ has linearly independent columns.
This means that if $A$ has linearly independent columns, then the desired $\hat{x}$ in (6.1) is given by
$$\hat{x} = (A^T A)^{-1} A^T b$$
and the vector $z \in \mathcal{C}(A)$ closest to $b$ is given by
$$z = A \hat{x} = A (A^T A)^{-1} A^T b.$$
This shows that if $A$ has linearly independent columns, then $z = A (A^T A)^{-1} A^T b$ is the vector in the column space closest to $b$. Think of this as the projection of $b$ onto the column space of $A$.
Let us now formulate the above observations as a special case of a linear least-squares problem:
Theorem 6.17 Let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $x \in \mathbb{R}^n$, and assume that $A$ has linearly independent columns. Then the solution that minimizes $\min_x \|b - Ax\|_2$ is given by $\hat{x} = (A^T A)^{-1} A^T b$.
Definition 6.18 Let $A \in \mathbb{R}^{m \times n}$.
- If $A$ has linearly independent columns, then $(A^T A)^{-1} A^T$ is called the (left) pseudo inverse. Note that this means $m \geq n$ and $(A^T A)^{-1} A^T A = I$.
- If $A$ has linearly independent rows, then $A^T (A A^T)^{-1}$ is called the right pseudo inverse. Note that this means $m \leq n$ and $A A^T (A A^T)^{-1} = I$.
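The left pseudo inverse can be formed explicitly and checked to be a left inverse. A small Octave sketch (the matrix is just an illustrative full-column-rank example, not part of the original notes):

A = [ 1 1
      1 2
      1 3
      1 4 ];                       % full column rank, so A'A is invertible
Aplus = ( A' * A ) \ A';           % the (left) pseudo inverse (A'A)^(-1) A'
norm( Aplus * A - eye( 2 ) )       % ~0: it is a left inverse of A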
Now, let us discuss the least-squares in the name of the section. Notice that we are
trying to nd x that minimizes min
x
|b Ax|
2
. Now, if x minimizes min
x
|b Ax|
2
, it also
minimizes the function F(x) = |b Ax|
2
2
(since |b Ax|
2
is always positive). But
F(x) = |b Ax|
2
2
= (b Ax)
T
(b Ax) = b
T
b 2b
T
Ax x
T
A
T
Ax.
Recall how one would nd the minimum of a function f : R R, f(x) =
2
x
2
2x +
2
:
One would take the derivative f

(x) = 2
2
x 2 and set it to zero: 2
2
x 2 = 0. The
minimum is then attained by x that solves this equation: x = /. Now, here F : R
n
R.
Those who have taken multivariate calculus will know that this function is optimized when
its gradient (essentially the derivative) equals zero: 0 2A
T
b + 2A
T
Ax = 0, or, A
T
Ax =
A
T
b. And thus we arrive at the same place as in (6.17)! We are looking for x that solves
A
T
Ax = A
T
b.
Now, let z = A x. Then F(x) = |b z|
2
2
= (b z)
T
(b z) =

m1
i
(
i

i
)
2
and hence
the desired solution x minimizes the sum of the squares of the elements of b A x.
> x = [
1
2
3
4
];
> A = [
1 1
1 2
1 3
1 4
];
> y = [
1.97
6.97
8.89
10.01
];
> B = A' * A
B =
4 10
10 30
> yhat = A' * y
yhat =
27.840
82.620
> c = B \ yhat % this solves B c = yhat
c =
0.45000
2.60400
> plot( x, y, 'o' )  % plot the points ( x(1), y(1) ), ...
> axis( [ 0, 6, 0, 12 ] ) % adjust the x and y ranges
> hold on % plot the next graph over the last
> z = A * c; % z = projection of y onto the column
% space of A
> plot( x, z, '-' )  % plot the line through the points
% ( x(1), z(1) ), ( x(2), z(2) ), ...
> hold off
Figure 6.3: Solving the motivating problem in Section 6.4.
6.4 Motivating Example, Part II
Let us return to the example from Section 6.2. To nd the best solution to Ac = y:
_
_
_
_
1 1
1 2
1 3
1 4
_
_
_
_
_

0

1
_
=
_
_
_
_
1.97
6.97
8.89
10.01
_
_
_
_
we instead solve
_
_
_
_
1 1
1 2
1 3
1 4
_
_
_
_
T
_
_
_
_
1 1
1 2
1 3
1 4
_
_
_
_
_

0

1
_
=
_
_
_
_
1 1
1 2
1 3
1 4
_
_
_
_
T
_
_
_
_
1.97
6.97
8.89
10.01
_
_
_
_
.
Lets use octave to do so, in Figure 6.3. By looking at the resulting plot, you will recognize
that this is a nice t to the data.
6.5 Computing an Orthonormal Basis
Definition 6.19 Let $q_0, q_1, \ldots, q_{k-1}$ be a basis for subspace $V \subset \mathbb{R}^m$. Then these vectors form an orthonormal basis if each vector is of unit length and $q_i^T q_j = 0$ if $i \neq j$.
Exercise 6.20 Let $q_0, q_1, \ldots, q_{k-1}$ be an orthonormal basis for subspace $V \subset \mathbb{R}^m$. Show that
1. $q_i^T q_j = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}$
2. If $Q = \left( q_0 \mid \cdots \mid q_{k-1} \right)$ then $Q^T Q = I$, the identity.
Given a basis for a subspace, we show how to compute an orthonormal basis for that same space.
Let us start with two vectors $a_0, a_1 \in \mathbb{R}^m$ and matrix $A = \left( a_0 \mid a_1 \right)$, and let us assume these vectors are linearly independent. Then, by definition, they form a basis for the column space of $A$. Some observations:
- We can easily come up with a vector $q_0$ that is of length one and points in the same direction as $a_0$: $q_0 = a_0 / \rho_{00}$ where $\rho_{00} = \|a_0\|_2$.
  Check: $\|q_0\|_2 = \| a_0 / \|a_0\|_2 \|_2 = \|a_0\|_2 / \|a_0\|_2 = 1$.
- Now, we can write $a_1 = \rho_{01} q_0 + v$ where $v$ is orthogonal to $q_0$, for some scalar $\rho_{01}$. Multiplying on the left by $q_0^T$ we find that $q_0^T a_1 = q_0^T (\rho_{01} q_0 + v) = \rho_{01} q_0^T q_0 + q_0^T v = \rho_{01}$, because $q_0^T q_0 = 1$ and $q_0^T v = 0$.
- Once we have computed $\rho_{01}$, we can compute $v = a_1 - \rho_{01} q_0$.
- What we have now are two vectors, $q_0$ and $v$, the first of which has length 1 and the second of which is orthogonal to the first. If we now scale the second one to have length 1, then we have found two orthonormal vectors (vectors that are of length 1 and mutually orthogonal): compute $\rho_{11} = \|v\|_2$ and let $q_1 = v / \rho_{11}$. Notice that $a_1 = \rho_{01} q_0 + v = \rho_{01} q_0 + \rho_{11} q_1$, so that $q_1 = (a_1 - \rho_{01} q_0)/\rho_{11} = (a_1 - \frac{\rho_{01}}{\rho_{00}} a_0)/\rho_{11}$.
Now, the important observation is that the column space of $A = \left( a_0 \mid a_1 \right)$ is equal to the column space of the just computed $Q = \left( q_0 \mid q_1 \right)$.
($\subseteq$) Let $z$ be in the column space of $A$. Then there exist $\alpha_0, \alpha_1 \in \mathbb{R}$ such that
$$z = \alpha_0 a_0 + \alpha_1 a_1 = \alpha_0 (\rho_{00} q_0) + \alpha_1 (\rho_{01} q_0 + \rho_{11} q_1) = \underbrace{(\alpha_0 \rho_{00} + \alpha_1 \rho_{01})}_{\beta_0} q_0 + \underbrace{(\alpha_1 \rho_{11})}_{\beta_1} q_1 = \beta_0 q_0 + \beta_1 q_1.$$
In other words, there exist $\beta_0, \beta_1 \in \mathbb{R}$ such that $z = \beta_0 q_0 + \beta_1 q_1$. Hence $z$ is in the column space of $Q$.
($\supseteq$) Let $z$ be in the column space of $Q$. Then there exist $\beta_0, \beta_1 \in \mathbb{R}$ such that
$$z = \beta_0 q_0 + \beta_1 q_1 = \beta_0 (a_0 / \rho_{00}) + \beta_1 (a_1 - \tfrac{\rho_{01}}{\rho_{00}} a_0)/\rho_{11} = \text{etc.} = \alpha_0 a_0 + \alpha_1 a_1.$$
In other words, there exist $\alpha_0, \alpha_1 \in \mathbb{R}$ such that $z = \alpha_0 a_0 + \alpha_1 a_1$. Hence $z$ is in the column space of $A$.
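The two-column construction above translates directly into a few lines of Octave. The sketch below is illustrative (any matrix with linearly independent columns will do; here we reuse the matrix from Section 6.2) and checks that $Q^T Q = I$ and that $A = QR$:

A  = [ 1 1; 1 2; 1 3; 1 4 ];
a0 = A( :, 1 );  a1 = A( :, 2 );
rho00 = norm( a0 );   q0 = a0 / rho00;   % unit vector in the direction of a0
rho01 = q0' * a1;                        % component of a1 along q0
v     = a1 - rho01 * q0;                 % part of a1 orthogonal to q0
rho11 = norm( v );    q1 = v / rho11;
Q = [ q0 q1 ];   R = [ rho00 rho01; 0 rho11 ];
norm( Q' * Q - eye( 2 ) )                % ~0: columns of Q are orthonormal
norm( A - Q * R )                        % ~0: A = Q R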
6.6 Motivating Example, Part III
Let us again return to the example from Section 6.2. We wanted to nd z that was the
projection of b onto (the closest point in) the column space of A. What we just found is
that this is the same as nding z that is the projection of y onto the column space of Q, as
computed in Section 6.5.
So, this brings up the question: If we have a matrix Q with orthonormal columns, how
do we compute the projection onto the column space of Q? We will work with a matrix Q
with two columns for now.
We want to nd an approximate solution to Ac = y. Now, y = z + w where z is in the
column space of Q and w is orthogonal to the column space of Q. Thus
y = z +w =
0
q
0
+
1
q
1
+w
where q
T
0
w = 0 and q
T
1
w = 0. Now
q
T
0
y = q
0
(
0
q
0
+
1
q
1
+ w) =
0
q
T
0
q
0
+
1
q
T
0
q
1
+ q
T
0
w =
0
since q
T
0
q
0
= 1 and q
T
0
q
1
=
q
T
0
w = 0.
q
T
1
y = q
1
(
0
q
0
+
1
q
1
+ w) =
0
q
T
1
q
0
+
1
q
T
1
q
1
+ q
T
1
w =
1
since q
T
1
q
1
= 1 and q
T
1
q
0
=
q
T
1
w = 0.
So, we have a way of computing z:
z =
0
q
0
+
1
q
1
= q
T
0
y q
0
. .
projection
of y
onto q
0
+ q
T
1
y q
1
.
. .
projection
of y
onto q
1
Thus, we have a systematic way of solving our problem. First we must compute Q:
The matrix is given by
A =
_
_
_
_
1
0
1
1
1
2
1
3
_
_
_
_
so that a
0
=
_
_
_
_
1
1
1
1
_
_
_
_
and a
1
=
_
_
_
_

3
_
_
_
_
.

00
= |a
0
|
2
=
_
a
T
0
a
0
=
_

3
i=0
1 =

4 = 2.
q
0
= a
0
/
00
= (0.5, 0.5, 0.5, 0.5)
T
.

01
= q
T
0
a
1
= (0.5, 0.5, 0.5, 0.5)(1, 2, 3, 4)
T
= 5
v = a
1

01
q
0
= (1, 2, 3, 4)
T
5(0.5, 0.5, 0.5, 0.5, 0.5)
T
= (1.5, 0.5, 0.5, 1.5)
T
.

11
= |v|
2
=
_
(1.5)
2
+ (0.5)
2
+ (0.5)
2
+ (1.5)
2
=
_
2
_
3
2
_
2
+ 2
_
1
2
_
2
=

5.
q
1
= v/
11
=

5/5(1.5, 0.5, 0.5, 1.5)


T
.
Next, we compute z = q
T
0
y q
0
+q
T
1
y q
1
. For this, we again use octave, in Figure 6.4.
Now, it turns out there is an easier way to proceed than to compute z and then solve,
from scratch, Ac = z.
Recall that a
0
=
00
q
0
and a
1
=
01
q
0
+v =
01
q
0
+
11
q
1
. Now, we can write this as
_
a
0
a
1
_
=
_
q
0
q
1
_
_

00

01
0
11
_
=
_

00
q
0

01
q
0
+
11
q
1
_
.
Also, we know that
z = q
T
0
yq
0
+q
T
1
yq
1
=
_
q
0
q
1
_
_
q
T
0
y
q
T
1
y
_
.
> q0 = 0.5 * ones( 4, 1 )
q0 =
0.50000
0.50000
0.50000
0.50000
> q1 = sqrt(5)/5 * [ -1.5 -0.5 0.5 1.5 ]'
q1 =
-0.67082
-0.22361
0.22361
0.67082
> y = [ 1.97 6.97 8.89 10.01 ]'
y =
1.9700
6.9700
8.8900
10.0100
> z = q0' * y * q0 + q1' * y * q1
z =
3.0540
5.6580
8.2620
10.8660
> c = A \ z % Solve A c = z
Figure 6.4: Solution of using the Q computed in Section 6.6.
Thus, Ac = z can be rewritten as QRc = Q z where z =
_
q
T
0
y
q
T
1
y
_
= Q
T
z. Now, multiply
both side from the left by Q
T
: Q
T
QRc = Q
T
Q z and recognize that Q
T
Q = I, the identity,
since Q has orthonormal columns! We are left with Rc = z. In other words
_

00

01
0
11
__

0

1
_
=
_
q
T
0
y
q
T
1
y
_
.
For our specic example this means we need to solve the upper triangular system
_
2 5
0

5
__

0

1
_
=
_
13.92
5.8227
_
.
We can then solve the system with octave via the steps
> R = [
> 2 5
> 0 sqrt(5)
> ]
R =
2.00000 5.00000
0.00000 2.23607
> zhat = [
> q0' * y
> q1' * y
> ]
zhat =
13.9200
5.8227
> c = R \ zhat
c =
0.45000
2.60400
In Figure 6.5 we plot the line y =
0
+
1
x with the resulting coecients.
6.7 What does this all mean?
Figure 6.5: Least-squares fit to the data.
Let's summarize a few observations:
Component in the direction of a unit vector. Given $q, y \in \mathbb{R}^m$ where $q$ has unit length ($\|q\|_2 = 1$), the component of $y$ in the direction of $q$ is given by $q^T y\, q$. In other words, $y = q^T y\, q + z$ where $z$ is orthogonal to $q$ ($q^T z = z^T q = 0$).
Orthonormal basis. Let $q_0, \ldots, q_{k-1} \in \mathbb{R}^m$ form an orthonormal basis for $V$. Let $Q = \left( q_0 \mid \cdots \mid q_{k-1} \right)$. Then:
- $Q^T Q = I$. Why? The $(i, j)$ entry of $Q^T Q$ equals $q_i^T q_j$, which is $0$ if $i \neq j$ and $1$ otherwise.
- Given $y \in \mathbb{R}^m$, the vector $\hat{y}$ in $V$ closest to $y$ is given by $\hat{y} = q_0^T y\, q_0 + \cdots + q_{k-1}^T y\, q_{k-1} = Q Q^T y$. The matrix $Q Q^T$ is the projection onto $V$.
- Given $y \in \mathbb{R}^m$, the component of this vector that is orthogonal to $V$ is given by $z = y - \hat{y} = y - (q_0^T y\, q_0 + \cdots + q_{k-1}^T y\, q_{k-1}) = y - Q Q^T y = (I - Q Q^T) y$. The matrix $I - Q Q^T$ is the projection onto the space orthogonal to $V$, namely $V^{\perp}$.
Computing an orthonormal basis Given linearly independent vectors a
0
, . . . , a
k1
R
m
that form a basis for subspace V, the following procedure will compute an orthonormal basis
for V, q
0
, . . . , q
k1
:

00
= |a
0
| (the length of a
0
).
6.7. What does this all mean? 197
q
0
= a
0
/
00
(the vector in the direction of a
0
of unit length).

01
= q
T
0
a
1
.
v
1
= a
1

01
q
0
= a
1
q
T
0
a
1
q
0
(= the component of a
1
orthogonal to q
0
).

11
= |v
1
|
2
(= the length of the component of a
1
orthogonal to q
0
).
q
1
= v
1
/
11
(= the unit vector in the direction of the component of a
1
orthogonal to
q
0
).

02
= q
T
0
a
2
.

12
= q
T
1
a
2
.
v
2
= a
2

02
q
0

12
q
1
= a
2
(q
T
0
a
2
q
0
+q
T
1
a
2
q
1
) (= the component of a
2
orthogonal to
q
0
and q
1
).

22
= |v
2
|
2
(= the length of the component of a
2
orthogonal to q
0
and q
1
).
q
2
= v
2
/
22
(= the unit vector in the direction of the component of a
2
orthogonal to
q
0
and q
1
).
etc.
This procedure is known as Classical Gram-Schmidt. It is stated as an algorithm in Fig. 6.6.
It can be veried that A = QR, where
A =
_
a
0
a
n1
_
, Q =
_
q
0
q
n1
_
, and R =
_
_
_
_
_

00

01

0(n1)
0
11

1(n1)
.
.
.
.
.
.
.
.
.
.
.
.
0 0
(n1)(n1)
_
_
_
_
_
so that
_
a
0
a
n1
_
=
_
q
0
q
n1
_
_
_
_
_
_

00

01

0(n1)
0
11

1(n1)
.
.
.
.
.
.
.
.
.
.
.
.
0 0
(n1)(n1)
_
_
_
_
_
.
for j = 0, . . . , k - 1
    Compute the component of a_j orthogonal to q_0, . . . , q_{j-1}:
        v_j := a_j
        for i = 0, . . . , j - 1
            rho_{ij} := q_i^T a_j
            v_j := v_j - rho_{ij} q_i
        endfor
    Normalize v_j to unit length:
        rho_{jj} := || v_j ||_2
        q_j := v_j / rho_{jj}
endfor

Figure 6.6: Gram-Schmidt procedure for computing an orthonormal basis for the space spanned by linearly independent vectors $a_0, \ldots, a_{k-1}$, which are now taken to be the columns of matrix $A$.
for j = 0, . . . , k 1
Compute component of a
j
orthogonal to q
0
, . . . , q
j1
:
_
_
_

0j
.
.
.

(j1)j
_
_
_
=
_
q
0
q
j1
_
T
a
j
=
_
_
_
q
T
0
.
.
.
q
T
j1
_
_
_
a
j
=
_
_
_
q
T
0
a
j
.
.
.
q
T
j1
a
j
_
_
_
v
j
= a
j

_
q
0
q
j1
_
_
_
_

0j
.
.
.

(j1)j
_
_
_
= a
j
(
0j
q
0
+ +
(j1)j
q
j
)
Normalize v
j
to unit length:

jj
= |v
j
|
2
q
j
= v
j
/
jj
endfor
Figure 6.7: Gram-Schmidt procedure for computing an orthonormal basis for the space
spanned by linearly independent vectors a
0
, . . . , a
k1
.
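The procedures in Figures 6.6 and 6.7 can be written as a short Octave function. The following is a sketch of Classical Gram-Schmidt as stated in Figure 6.6; the function name and file placement (e.g., saving it as cgs.m) are my own choices, and only standard built-in operations are used:

function [ Q, R ] = cgs( A )
  % Classical Gram-Schmidt: A = Q*R with orthonormal columns in Q,
  % assuming the columns of A are linearly independent.
  [ m, k ] = size( A );
  Q = zeros( m, k );  R = zeros( k, k );
  for j = 1:k
    v = A( :, j );                          % start with a_j
    for i = 1:j-1
      R( i, j ) = Q( :, i )' * A( :, j );   % rho_ij = q_i^T a_j
      v = v - R( i, j ) * Q( :, i );        % subtract the component along q_i
    endfor
    R( j, j ) = norm( v );                  % rho_jj = || v_j ||_2
    Q( :, j ) = v / R( j, j );              % normalize to obtain q_j
  endfor
endfunction

Calling [ Q, R ] = cgs( A ) for the matrix of Section 6.2 reproduces (up to rounding) the Q and R used there.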
Two ways for solving linear least-squares problems. We motivated two different ways for computing the best solution, $\hat{x}$, to the linear least-squares problem $\min_x \|b - Ax\|_2$ where $A \in \mathbb{R}^{m \times n}$ has linearly independent columns:
- $\hat{x} = (A^T A)^{-1} A^T b$.
- Compute $Q$ with orthonormal columns and upper triangular $R$ such that $A = QR$. Solve $R \hat{x} = Q^T b$. In other words, $\hat{x} = R^{-1} Q^T b$.
These, fortunately, yield the same result:
$$\hat{x} = (A^T A)^{-1} A^T b = ((QR)^T (QR))^{-1} (QR)^T b = (R^T Q^T Q R)^{-1} R^T Q^T b = (R^T R)^{-1} R^T Q^T b = R^{-1} (R^T)^{-1} R^T Q^T b = R^{-1} Q^T b.$$
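The equivalence derived above is easy to verify numerically. A minimal Octave sketch (illustrative only, reusing the data points from Section 6.2; qr with a second argument of 0 requests the economy-size factorization):

A = [ 1 1; 1 2; 1 3; 1 4 ];
b = [ 1.97; 6.97; 8.89; 10.01 ];
x1 = ( A' * A ) \ ( A' * b );     % via the normal equations
[ Q, R ] = qr( A, 0 );            % economy-size QR factorization
x2 = R \ ( Q' * b );              % via A = QR
norm( x1 - x2 )                   % ~0: both give the same least-squares solution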
6.8 Exercises
(Some of these exercises have been inspired by similar exercises in Strangs book.)
1. Consider A =
_
_
1 0
0 1
1 1
_
_
and b =
_
_
1
1
0
_
_
.
(a) Determine if b is in the column space of A.
(b) Compute the approximate solution, in the least squares sense, of Ax = b. (This
means: solve A
T
Ax = A
T
b.)
(c) What is the project of b onto the column space of A?
(d) Compute the QR factorization of A and use it to solve Ax = b. (In other words,
compute QR using the Gram-Schmidt process and then solve Rx = Q
T
b.)
2. Consider A =
_
_
1 1
1 0
1 1
_
_
and b =
_
_
4
5
9
_
_
.
(a) Determine if b is in the column space of A.
(b) Compute the approximate solution, in the least squares sense, of Ax = b. (This
means: solve A
T
Ax = A
T
b.)
(c) What is the project of b onto the column space of A?
(d) Compute the QR factorization of A and use it to solve Ax = b. (In other words,
compute QR using the Gram-Schmidt process and then solve Rx = Q
T
b.)
3. Consider A =
_
_
1 1
1 1
2 4
_
_
and b =
_
_
1
2
7
_
_
.
(a) Find the projection of b onto the column space of A.
(b) Split b into u+v where u is in the column space and v is perpendicular (orthogonal)
to that space.
(c) Which of the four subspaces (C(A), R(A), ^(A), ^(A
T
)) contains q?
4. What 2 2 matrix A projects the x-y plane onto the line x +y = 0?
5. Find the best straight line t to the following data:
y = 2 at t = 1
y = 3 at t = 1
y = 0 at t = 0
y = 5 at t = 2
6.8. Exercises 201
Answers
1. Consider A =
_
_
1 0
0 1
1 1
_
_
and b =
_
_
1
1
0
_
_
.
(a) Determine if b is in the column space of A.
Answer
To answer this, you simply go through the usual steps of Gaussian Elimination
with an appended right-hand side:
_
_
1 0 1
0 1 1
1 1 0
_
_

_
_
1 0 1
0 1 1
0 1 1
_
_

_
_
1 0 1
0 1 1
0 0 2
_
_
which has a contradiction in the last row. So, b is not in the column space of A.
(b) Compute the approximate solution, in the least squares sense, of Ax = b. (This
means: solve A
T
Ax = A
T
b.)
Answer
A
T
A =
_
_
1 0
0 1
1 1
_
_
T
_
_
1 0
0 1
1 1
_
_
=
_
2 1
1 2
_
and
A
T
b =
_
_
1 0
0 1
1 1
_
_
T
_
_
1
1
0
_
_
=
_
1
1
_
Now, you can either
Solve
_
2 1
1 2
__

0

1
_
=
_
1
1
_
via Gaussian elimination:
_
2 1 1
1 2 1
_

_
1 2 1
2 1 1
_

_
1 2 1
0 3 1
_
So that
1
= 1/3 and
0
= 1 2(1/3) = 1/3.
You can remember that
_
a b
c d
_
1
=
1
ad bc
_
d b
c a
_
and compute
_

0

1
_
=
_
2 1
1 2
_
1
_
1
1
_
=
1
3
_
2 1
1 2
_
1
_
1
1
_
=
1
3
_
1
1
_
=
_
1/3
1/3
_
.
202 Chapter 6. Orthogonality
(c) What is the project of b onto the column space of A?
Answer
Ax =
_
_
1 0
0 1
1 1
_
_
_
1/3
1/3
_
=
_
_
1/3
1/3
2/3
_
_
(d) Compute the QR factorization of A and use it to solve Ax = b. (In other words,
compute QR using the Gram-Schmidt process and then solve Rx = Q
T
b.)
Answer
To do this, you execute the Gram-Schmidt process:
a
0
=
_
_
1
0
1
_
_
and a
1
=
_
_
0
1
1
_
_
.

00
=
_
a
T
0
a
0
=

2
q
0
= a
0
/
00
=
_
_
1
0
1
_
_
/

2 =
_
_

2
2
0

2
2
_
_
.

01
= q
T
0
a
1
=
_
_

2
2
0

2
2
_
_
T _
_
0
1
1
_
_
=

2
2
.
a
1
= a
1

01
q
0
=
_
_
0
1
1
_
_

2
2
_
_

2
2
0

2
2
_
_
=
_
_
1/2
1
1/2
_
_
.

11
=
_
a
T
1
a
1
=

_
_
_
1/2
1
1/2
_
_
T
_
_
1/2
1
1/2
_
_
=
_
1/4 + 1 + 1/4 =

6
2
.
q
1
= a
1
/
11
=
_
_
1/2
1
1/2
_
_
/

6
2
=
_
_

6/6

6/3

6/6
_
_
.
Thus,
A = QR =
_
_

2/2

6/6
0

6/3

2/2

6/6
_
_
_

2

2
2
0

6
2
_
.
Now, we must solve the problem with this factorization by solving
Rx = Q
T
b
6.8. Exercises 203
or
_

2

2
2
0

6
2
_
_

0

1
_
=
_
_

2/2

6/6
0

6/3

2/2

6/6
_
_
T
_
_
1
1
0
_
_
=
_
2/2

6/6
_
.
Thus,
1
= (

6/6)/(

6/2) = 1/3 and


0
= (

2/2 (

2/2)(1/3))/

2 = 1/3
2. Consider A =
_
_
1 1
1 0
1 1
_
_
and b =
_
_
4
5
9
_
_
.
(a) Determine if b is in the column space of A.
Answer
To answer this, you simply go through the usual steps of Gaussian Elimination
with an appended right-hand side:
_
_
1 1 4
1 0 5
1 1 9
_
_

_
_
1 1 4
0 1 1
0 2 5
_
_

_
_
1 1 4
0 1 1
0 0 3
_
_
which has a contradiction in the last row. So, b is not in the column space of A.
(b) Compute the approximate solution, in the least squares sense, of Ax = b. (This
means: solve A
T
Ax = A
T
b.)
Answer
A
T
A =
_
_
1 1
1 0
1 1
_
_
T
_
_
1 1
1 0
1 1
_
_
=
_
3 0
0 2
_
and
A
T
b =
_
_
1 1
1 0
1 1
_
_
T
_
_
4
5
9
_
_
=
_
18
5
_
Now, you can either
Solve
_
3 0
0 2
__

0

1
_
=
_
18
5
_
, which yields
1
= 5/2 = 2.5 and
0
=
18/3 = 6.
You can remember that
_
3 0
0 2
_
1
=
_
1/3 0
0 1/2
_
204 Chapter 6. Orthogonality
and compute
_

0

1
_
=
_
1/3 0
0 1/2
_
1
_
18
5
_
=
_
6
2.5
_
.
(c) What is the project of b onto the column space of A?
Answer
Ax =
_
_
1 1
1 0
1 1
_
_
_
6
2.5
_
=
_
_
3.5
6
8.5
_
_
(d) Compute the QR factorization of A and use it to solve Ax = b. (In other words,
compute QR using the Gram-Schmidt process and then solve Rx = Q
T
b.)
Answer
To do this, you execute the Gram-Schmidt process:
a
0
=
_
_
1
1
1
_
_
and a
1
=
_
_
1
0
1
_
_
.

00
=
_
a
T
0
a
0
=

3
q
0
= a
0
/
00
=
_
_
1
1
1
_
_
/

3 =
_
_
_

3
3
3
3
3
3
_
_
_
.

01
= q
T
0
a
1
=
_
_
_

3
3
3
3
3
3
_
_
_
T _
_
1
0
1
_
_
= 0.
a
1
= a
1

01
q
0
=
_
_
1
0
1
_
_
0
_
_
_

3
3
3
3
3
3
_
_
_
=
_
_
1
0
1
_
_
.

11
=
_
a
T
1
a
1
=

_
_
_
1
0
1
_
_
T
_
_
1
0
1
_
_
=

2.
q
1
= a
1
/
11
=
_
_
1
0
1
_
_
/

2 =
_
_

2/2
0

2/2
_
_
.
Thus,
A = QR =
_
_

3/3

2/2

3/3 0

3/3

2/2
_
_
_
3 0
0

2
_
.
6.8. Exercises 205
Now, we must solve the problem with this factorization by solving
Rx = Q
T
b
or
_
3 0
0

2
__

0

1
_
=
_
_

3/3

2/2

3/3 0

3/3

2/2
_
_
T
_
_
4
5
9
_
_
=
_
6

3
5

2/2
_
.
Thus,
0
= 6

3/

3 = 6 and
1
= (5

2/2)/

2 = 5/2 = 2.5
3. Consider A =
_
_
1 1
1 1
2 4
_
_
and b =
_
_
1
2
7
_
_
.
(a) Find the projection of b onto the column space of A.
Answer
The formular for the project, when A has linearly independent columns, is
A(A
T
A)
1
Ab.
Now
A
T
A =
_
_
1 1
1 1
2 4
_
_
T
_
_
1 1
1 1
2 4
_
_
=
_
6 8
8 18
_
(A
T
A)
1
=
1
(6)(18) (8)(8)
_
18 8
8 6
_
=
1
44
_
18 8
8 6
_
A
T
b =
_
_
1 1
1 1
2 4
_
_
T
_
_
1
2
7
_
_
=
_
11
27
_
(A
T
A)
1
A
T
b =
1
44
_
18 8
8 6
__
11
27
_
=
1
44
_
18
74
_
A(A
T
A)
1
A
T
b =
1
44
_
_
1 1
1 1
2 4
_
_
_
18
74
_
=
1
44
_
_
92
56
260
_
_
which I choose not to simplify...
(b) Split b into u+v where u is in the column space and v is perpendicular (orthogonal)
to that space.
Answer Notice that u = A(A
T
A)
1
Ab so that
v = b A(A
T
A)
1
Ab =
_
_
1
2
7
_
_

1
44
_
_
92
56
260
_
_
.
206 Chapter 6. Orthogonality
(c) Which of the four subspaces (C(A), R(A), ^(A), ^(A
T
)) contains v?
Answer
This vector is orthogonal to the column space and therefore is in the left null
space of A: v ^(A
T
).
4. What 2 2 matrix A projects the x-y plane onto the line x +y = 0?
Answer Notice that we rst need a vector that satises the equation: x = 1, y = 1
satises the equation, so all points on the line are in the column space of the matrix
A =
_
1
1
_
.
Now the matrix that projects onto the column space of a matrix A is given by
A(A
T
A)
1
A
T
=
_
1
1
_
(
_
1
1
_
T
_
1
1
_
)
_
1
1
_
T
=
_
1
1
_
(2)
_
1
1
_
= 2
_
1
1
__
1
1
_
T
= 2
_
1 1
1 1
_
=
_
2 2
2 2
_
5. Find the best straight line t to the following data:
y = 2 at t = 1
y = 3 at t = 1
y = 0 at t = 0
y = 5 at t = 2
Answer Let y =
0
+
1
t be the straight line, where
0
and
1
are to be determined.
Then

0
+
1
(1) = 2

0
+
1
( 1) =3

0
+
1
( 0) = 0

0
+
1
( 2) =5
which in matrix notation means that we wish to approximately solve Ac = b where
A =
_
_
_
_
1 1
1 1
1 0
1 2
_
_
_
_
, c =
_

0

1
_
, and b =
_
_
_
_
2
3
0
5
_
_
_
_
.
The solution to this is given by c = (A
T
A)
1
A
T
b.
A
T
A =
_
_
_
_
1 1
1 1
1 0
1 2
_
_
_
_
T
_
_
_
_
1 1
1 1
1 0
1 2
_
_
_
_
=
_
4 2
2 6
_
6.8. Exercises 207
(A
T
A)
1
=
1
(4)(6) (2)(2)
_
6 2
2 4
_
=
1
20
_
6 2
2 4
_
A
T
b =
_
_
_
_
1 1
1 1
1 0
1 2
_
_
_
_
T
_
_
_
_
2
3
0
5
_
_
_
_
=
_
6
15
_
(A
T
A)
1
A
T
b =
1
20
_
6 2
2 4
__
6
15
_
=
1
20
_
6
48
_
which I choose not to simplify.
So, the desired coecients are given by
0
= 6/20 and
1
= 48/20.
Chapter 7
The Singular Value Decomposition
NOTE: I have not proof-read these notes!!!
7.1 The Theorem
Theorem 7.1 (Singular Value Decomposition Theorem) Given $A \in \mathbb{C}^{m \times n}$ there exist unitary $U \in \mathbb{C}^{m \times m}$, unitary $V \in \mathbb{C}^{n \times n}$, and $\Sigma \in \mathbb{R}^{m \times n}$ such that $A = U \Sigma V^H$, where
$$\Sigma = \begin{pmatrix} \Sigma_{TL} & 0 \\ 0 & 0 \end{pmatrix}$$
with $\Sigma_{TL} = \mathrm{diag}(\sigma_0, \ldots, \sigma_{r-1})$ and $\sigma_0 \geq \sigma_1 \geq \cdots \geq \sigma_{r-1} > 0$. The $\sigma_0, \ldots, \sigma_{r-1}$ are known as the singular values of $A$.
Proof: First, let us observe that if A = 0 (the zero matrix) then the theorem trivially holds:
A = UV
H
where U = I
mm
, V = I
nn
, and =
_
0
_
, so that
TL
is 0 0.
We will prove this for m n, leaving the case where m n as an exercise, employing a
proof by induction on n.
Base case: n = 1. In this case A =
_
a
0
_
where a
0
is its only column. Then
A =
_
a
0
_
=
_
u
0
_
(|a
0
|
2
)
_
1
_
where u
0
= a
0
/|a
0
|
2
. Choose U
1
so that U =
_
u
0
U
1
_
is unitary. Then
A =
_
a
0
_
=
_
u
0
_
(|a
0
|
2
)
_
1
_
=
_
u
0
U
1
_
_
|a
0
|
2
0
_
_
1
_
= UV
H
where
TL
=
0
= |a
0
|
2
and V =
_
1
_
.
Inductive step: Assume the result is true for all matrices A C
mk
with 1 k < n.
Show that it is true for A C
mn
.
209
210 Chapter 7. The Singular Value Decomposition
Let A C
mn
with n 2. Without loss of generality, assume that A ,= 0 (the zero ma-
trix) so that |A|
2
,= 0. Let
0
and v
0
C
n
have the property that |v
0
|
2
= 1 and
0
=
|Av
0
|
2
= |A|
2
. (In other words, v
0
is the vector that maximizes max
x
2
=1
|Ax|
2
.)
Let u
0
= Av
0
/
0
. Choose U
1
and V
1
so that

U =
_
u
0
U
1
_
and

V =
_
v
0
V
1
_
are
unitary. Then

U
H
A

V =
_
u
0
U
1
_
H
A
_
v
0
V
1
_
=
_
u
H
0
Av
0
u
H
0
AV
1
U
H
1
Av
0
U
H
1
AV
1
_
=
_

0
u
H
0
u
0
u
H
0
AV
1
U
H
1
u
0
U
H
1
AV
1
_
=
_

0
w
H
0 B
_
,
where w = V
H
1
A
H
u
0
and B = U
H
1
AV
1
. Now, we will argue that w = 0:

2
0
= |A|
2
2
= |U
H
AV |
2
2
= max
x=0
|U
H
AV x|
2
2
|x|
2
2
= max
x=0
_
_
_
_
_

0
w
H
0 B
_
x
_
_
_
_
2
2
|x|
2
2

_
_
_
_
_

0
w
H
0 B
__

0
w
__
_
_
_
2
2
_
_
_
_
_

0
w
__
_
_
_
2
2
=
_
_
_
_
_

2
0
+w
H
w
Bw
__
_
_
_
2
2
_
_
_
_
_

0
w
__
_
_
_
2
2

(
2
0
+w
H
w)
2

2
0
+w
H
w
=
2
0
+w
H
w.
Thus w = 0 so that

U
H
A

V =
_

0
0
0 B
_
.
By the induction hypothesis, there exists unitary

U C
(m1)(m1)
, unitary

V
C
(n1)(n1)
, and

R
mn
such that B =

U

V
H
where

=
_

TL
0
0 0
_
with

TL
= diag(
1
, ,
r1
) and
1

2

r1
> 0. Also, clearly |A|
2
|B|
2
so
that
0

1
. Now, let
U =

U
_
1 0
0

U
_
, V =

V
_
1 0
0

V
_
, and =
_

0
0
0

_
.
(There are some really tough to see checks in the denition of U, V , and !!) Then
A = UV
H
where U, V , and have the desired properties.
By the Principle of Mathematical Induction the result holds for all matrices
A C
mn
with m n.
Exercise 7.2 Prove Theorem 7.1 for m n.
The theorem has profound implications which we describe briey. Let us track what the
eect of Ax = UV
H
x is on vector x. We assume that m n.
Let U =
_
u
0
u
m1
_
and V =
_
v
0
v
n1
_
.
x = V V
H
x =
_
v
0
v
n1
_ _
v
0
v
n1
_
H
x =
_
v
0
v
n1
_
_
_
_
v
H
0
x
.
.
.
v
H
n1
x
_
_
_
=
v
H
0
xv
0
+ + v
H
n1
xv
n1
. This can be interpreted as follows: vector x can be written
in terms of the usual basis of C
n
as
0
e
0
+ +
1
e
n1
or in the orthonormal basis
formed by the columns of V as v
H
0
xv
0
+ +v
H
n1
xv
n1
.
Notice that Ax = A(v
H
0
xv
0
+ + v
H
n1
xv
n1
) = v
H
0
xAv
0
+ + v
H
n1
xAv
n1
so that
we next look at how A transforms each v
i
: Av
i
= UV
H
v
i
= Ue
i
=
i
Ue
i
=
i
u
i
.
Thus, another way of looking at Ax is
Ax = v
H
0
xAv
0
+ +v
H
n1
xAv
n1
= v
H
0
x
0
u
0
+ +v
H
n1
x
n1
u
n1
=
0
u
0
v
H
0
x + +
n1
u
n1
v
H
n1
x
=
_

0
u
0
v
H
0
+ +
n1
u
n1
v
H
n1
_
x.
7.2 Consequences of the SVD Theorem
Throughout this section we will assume that:
- $A = U \Sigma V^H$ is the SVD of $A$, with $U$ and $V$ unitary and $\Sigma$ "diagonal";
- $\Sigma = \begin{pmatrix} \Sigma_{TL} & 0 \\ 0 & 0 \end{pmatrix}$ where $\Sigma_{TL} = \mathrm{diag}(\sigma_0, \ldots, \sigma_{r-1})$ with $\sigma_0 \geq \sigma_1 \geq \cdots \geq \sigma_{r-1} > 0$;
- $U = \left( U_L \mid U_R \right)$ with $U_L \in \mathbb{C}^{m \times r}$;
- $V = \left( V_L \mid V_R \right)$ with $V_L \in \mathbb{C}^{n \times r}$.

Corollary 7.3 $A = U_L \Sigma_{TL} V_L^H$. This is called the reduced SVD of $A$.
Proof:
$$A = U \Sigma V^H = \left( U_L \mid U_R \right) \begin{pmatrix} \Sigma_{TL} & 0 \\ 0 & 0 \end{pmatrix} \left( V_L \mid V_R \right)^H = U_L \Sigma_{TL} V_L^H.$$

Corollary 7.4 $\mathcal{C}(A) = \mathcal{C}(U_L)$.
Proof:
($\subseteq$) Let $y \in \mathcal{C}(A)$. Then there exists $x \in \mathbb{C}^n$ such that $y = Ax$ (by the definition of $y \in \mathcal{C}(A)$). But then
$$y = Ax = U_L \underbrace{\Sigma_{TL} V_L^H x}_{z} = U_L z,$$
so that there exists $z \in \mathbb{C}^r$ such that $y = U_L z$. But that means $y \in \mathcal{C}(U_L)$.
($\supseteq$) Let $y \in \mathcal{C}(U_L)$. Then there exists $z \in \mathbb{C}^r$ such that $y = U_L z$ (by the definition of $y \in \mathcal{C}(U_L)$). But then
$$y = U_L z = U_L \underbrace{\Sigma_{TL} \Sigma_{TL}^{-1}}_{I} z = U_L \Sigma_{TL} \underbrace{V_L^H V_L}_{I} \Sigma_{TL}^{-1} z = A \underbrace{V_L \Sigma_{TL}^{-1} z}_{x} = Ax,$$
so that there exists $x \in \mathbb{C}^n$ such that $y = Ax$. But that means $y \in \mathcal{C}(A)$.

Corollary 7.5 The rank of $A$ is $r$.
Proof: The rank of $A$ equals the dimension of $\mathcal{C}(A) = \mathcal{C}(U_L)$. But the dimension of $\mathcal{C}(U_L)$ is clearly $r$.
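Corollaries 7.3 through 7.5 can be observed numerically: Octave's svd returns $U$, $\Sigma$, and $V$, and the number of nonzero singular values is the rank. A small sketch (the matrix is an arbitrary rank-2 example, not taken from the notes):

A = [ 1 3 1 2
      2 6 4 8
      0 0 2 4 ];                       % a 3 x 4 matrix of rank 2
[ U, S, V ] = svd( A );
sigma = diag( S );
r = sum( sigma > 1e-12 )               % numerical rank: nonzero singular values
UL = U( :, 1:r );   VL = V( :, 1:r );
norm( A - UL * S( 1:r, 1:r ) * VL' )   % ~0: the reduced SVD reproduces A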
Corollary 7.6 ^(A) = ((V
R
).
Proof:
Let x ^(A). Then
x = V V
H
. .
I
x =
_
V
L
V
R
_ _
V
L
V
R
_
H
x =
_
V
L
V
R
_
_
V
H
L
V
H
R
_
x
=
_
V
L
V
R
_
_
V
H
L
x
V
H
R
x
_
= V
L
V
H
L
x +V
R
V
H
R
x.
If we can show that V
H
L
x = 0 then x = V
R
z where z = V
H
R
x and hence x
((V
R
). Assume that V
H
L
x ,= 0. Then
TL
(V
H
L
x) ,= 0 (since
TL
is nonsingular) and
U
L
(
TL
(V
H
L
x)) ,= 0 (since U
L
has linearly independent columns). But that contradicts
the fact that Ax = U
L

TL
V
H
L
x = 0.
7.2. Consequences of the SVD Theorem 213
Let x ((V
R
). Then x = V
R
z for some z C
r
and Ax = U
L

TL
V
H
L
V
R
. .
0
z = 0.
Corollary 7.7 For all x C
n
there exists z ((V
L
) such that Ax = Az.
Proof:
Ax = A V V
H
. .
I
x = A
_
V
L
V
R
_ _
V
L
V
R
_
H
x
= A
_
V
L
V
H
L
x +V
R
V
H
R
x
_
= AV
L
V
H
L
x +AV
R
V
H
R
x
= AV
L
V
H
L
x +U
L

TL
V
H
L
V
R
. .
0
V
H
R
x = A V
L
V
H
L
x
. .
z
.
Alternative proof (which uses the last corollary):
Ax = A V V
H
. .
I
x = A
_
V
L
V
R
_ _
V
L
V
R
_
H
x
= A
_
V
L
V
H
L
x +V
R
V
H
R
x
_
= AV
L
V
H
L
x +A V
R
V
H
R
x
. .
^(A)
= A V
L
V
H
L
x
. .
z
The proof of the last corollary also shows that
Corollary 7.8 Any vector x C
n
can be written as x = z + x
n
where z ((V
L
) and
x
n
^(A) = ((V
R
).
Corollary 7.9 A
H
= V
L

TL
U
H
L
so that ((A
H
) = ((V
L
) and ^(A
H
) = ((U
R
).
The above corollaries are summarized in Figure 7.1.
Corollary 7.10 If A C
mn
has linearly independent columns then A = U
L

TL
V
H
L
where
U
L
has n columns and V
L
is square (and hence unitary).
Proof: This is obvious from the fact that then r = rank(A) = n.
Corollary 7.11 If A C
mn
has linearly independent then A
H
A is invertible (nonsingular)
and (A
H
A)
1
= V
L
(
2
TL
)
1
V
H
L
.
214 Chapter 7. The Singular Value Decomposition
C
n
C
m
^(A) = ((V
R
)
dim n r
((V
L
) = ((A
H
)
dim r
((A) = ((U
L
)
dim r
((U
R
) = ^(A
H
)
dim mr
z
x
n
x = z +x
n
y = Az
y = Ax = U
L

TL
V
H
L
x
y
Figure 7.1: A pictorial description of how x = z + x
n
is transformed by A C
mn
into
y = Ax = A(z + x
n
). What the picture intends to depict is that ((V
L
) and ((V
R
) are
orthogonal complements of each other within C
n
. Similarly, ((U
L
) and ((U
R
) are orthogonal
complements of each other within C
m
. Any vector x can be written as the sum of a vector
z ((V
R
) and x
n
((V
C
) = ^(A).
Proof: Since A has linearly independent columns, A = U
L

TL
V
H
L
is the reduced SVD where
U
L
has n columns and V
L
is unitary. Hence
A
H
A = (U
L

TL
V
H
L
)
H
U
L

TL
V
H
L
= V
L

H
TL
U
H
L
U
L

TL
V
H
L
= V
L

TL

TL
V
H
L
= V
L

2
TL
V
H
L
.
Now, V
L
is unitary and hence invertible and
TL
is diagonal with nonzero diagonal entries.
Thus
_
V
L

2
TL
V
H
L
_
_
V
L
(
2
TL
)
1
V
H
L
)
_
= I, which means A
T
A is invertible and (A
T
A)
1
is as
given.
7.3 Projection onto a column space

Definition 7.12 Let $U_L \in \mathbb{C}^{m \times k}$ have orthonormal columns. The projection of a vector $y \in \mathbb{C}^m$ onto $\mathcal{C}(U_L)$ is the vector $U_L x$ that minimizes $\| y - U_L x \|_2$, where $x \in \mathbb{C}^k$.

Theorem 7.13 Let $U_L \in \mathbb{C}^{m \times k}$ have orthonormal columns. The projection of $y$ onto $\mathcal{C}(U_L)$ is given by $U_L U_L^H y$.
Proof: The vector $U_L x$ that we want must satisfy
\[ \| U_L x - y \|_2 = \min_{w \in \mathbb{C}^k} \| U_L w - y \|_2 . \]
Now, the 2-norm is invariant under multiplication by a unitary matrix (here $\begin{pmatrix} U_L & U_R \end{pmatrix}$ denotes a unitary matrix whose first $k$ columns are $U_L$), so that
\begin{align*}
\| U_L x - y \|_2^2 &= \min_{w \in \mathbb{C}^k} \| U_L w - y \|_2^2 \\
&= \min_{w \in \mathbb{C}^k} \left\| \begin{pmatrix} U_L & U_R \end{pmatrix}^H (U_L w - y) \right\|_2^2 \quad \mbox{(since the 2-norm is preserved)} \\
&= \min_{w \in \mathbb{C}^k} \left\| \begin{pmatrix} U_L^H \\ U_R^H \end{pmatrix} (U_L w - y) \right\|_2^2 
 = \min_{w \in \mathbb{C}^k} \left\| \begin{pmatrix} U_L^H U_L w \\ U_R^H U_L w \end{pmatrix} - \begin{pmatrix} U_L^H y \\ U_R^H y \end{pmatrix} \right\|_2^2 \\
&= \min_{w \in \mathbb{C}^k} \left\| \begin{pmatrix} w \\ 0 \end{pmatrix} - \begin{pmatrix} U_L^H y \\ U_R^H y \end{pmatrix} \right\|_2^2 
 = \min_{w \in \mathbb{C}^k} \left\| \begin{pmatrix} w - U_L^H y \\ - U_R^H y \end{pmatrix} \right\|_2^2 \\
&= \min_{w \in \mathbb{C}^k} \left( \| w - U_L^H y \|_2^2 + \| U_R^H y \|_2^2 \right) \quad \left( \mbox{since } \left\| \begin{pmatrix} u \\ v \end{pmatrix} \right\|_2^2 = \| u \|_2^2 + \| v \|_2^2 \right) \\
&= \left( \min_{w \in \mathbb{C}^k} \| w - U_L^H y \|_2^2 \right) + \| U_R^H y \|_2^2 .
\end{align*}
This is minimized when $w = U_L^H y$, so that the vector closest to $y$ in the space spanned by the columns of $U_L$ is given by $U_L x = U_L U_L^H y$.
Corollary 7.14 Let $A \in \mathbb{C}^{m \times n}$ and let $A = U_L \Sigma_{TL} V_L^H$ be its reduced SVD. Then the projection of $y \in \mathbb{C}^m$ onto $\mathcal{C}(A)$ is given by $U_L U_L^H y$.

Proof: This follows immediately from the fact that $\mathcal{C}(A) = \mathcal{C}(U_L)$.

Corollary 7.15 Let $A \in \mathbb{C}^{m \times n}$ have linearly independent columns. Then the projection of $y \in \mathbb{C}^m$ onto $\mathcal{C}(A)$ is given by $A (A^H A)^{-1} A^H y$.
Proof: Let $A = U_L \Sigma_{TL} V_L^H$ be the reduced SVD of $A$. Then the projection of $y \in \mathbb{C}^m$ onto $\mathcal{C}(A)$ is given by $U_L U_L^H y$ by a previous theorem. From a previous corollary we also know that $A^H A$ is nonsingular and that $(A^H A)^{-1} = V_L (\Sigma_{TL}^2)^{-1} V_L^H$. Now
\begin{align*}
A (A^H A)^{-1} A^H y &= U_L \Sigma_{TL} V_L^H \, V_L \left( \Sigma_{TL}^2 \right)^{-1} V_L^H (U_L \Sigma_{TL} V_L^H)^H y \\
&= U_L \Sigma_{TL} \underbrace{V_L^H V_L}_{I} \Sigma_{TL}^{-1} \Sigma_{TL}^{-1} \underbrace{V_L^H V_L}_{I} \Sigma_{TL} U_L^H y \\
&= U_L \Sigma_{TL} \Sigma_{TL}^{-1} \Sigma_{TL}^{-1} \Sigma_{TL} U_L^H y = U_L U_L^H y .
\end{align*}
Hence the projection of $y$ onto $\mathcal{C}(A)$ is given by $A (A^H A)^{-1} A^H y$.

Definition 7.16 Let $A$ have linearly independent columns. Then $(A^H A)^{-1} A^H$ is called the pseudo-inverse of matrix $A$.
Remark 7.17
- The matrix $P_A = U_L U_L^H$ projects a vector $y$ onto the column space of $A$: $P_A y = U_L U_L^H y$.
- If $A$ has linearly independent columns, then $P_A = A (A^H A)^{-1} A^H$ projects a vector $y$ onto the column space of $A$: $P_A y = A (A^H A)^{-1} A^H y = U_L U_L^H y$.
- The component of a vector $y$ orthogonal to the column space of $A$ is given by $y - P_A y = (I - P_A) y$. Thus, the matrix $P_A^\perp = (I - P_A)$ projects a given vector $y$ onto the space orthogonal to the column space of $A$.
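As a quick numerical illustration (mine, not part of the original notes), the following Python/NumPy sketch computes the projection $P_A y$ both as $U_L U_L^H y$ (via the reduced SVD) and as $A(A^H A)^{-1} A^H y$, and checks that the two agree. The function and variable names are my own.

```python
import numpy as np

def project_onto_column_space(A, y):
    """Project y onto C(A) using the reduced SVD A = U_L Sigma_TL V_L^H."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > 1e-12 * s[0]))        # numerical rank
    UL = U[:, :r]                            # orthonormal basis for C(A)
    return UL @ (UL.conj().T @ y)            # P_A y = U_L U_L^H y

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))              # linearly independent columns (with probability 1)
y = rng.standard_normal(6)
p_svd = project_onto_column_space(A, y)
p_normal = A @ np.linalg.solve(A.conj().T @ A, A.conj().T @ y)   # A (A^H A)^{-1} A^H y
print(np.allclose(p_svd, p_normal))          # True: the two formulas agree
```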
7.4 Low-rank Approximation of a Matrix

Theorem 7.18 Let $A \in \mathbb{C}^{m \times n}$ have SVD $A = U \Sigma V^H$ and assume $A$ has rank $r$. Partition
\[ U = \begin{pmatrix} U_L & U_R \end{pmatrix}, \quad V = \begin{pmatrix} V_L & V_R \end{pmatrix}, \quad \mbox{and} \quad \Sigma = \begin{pmatrix} \Sigma_{TL} & 0 \\ 0 & \Sigma_{BR} \end{pmatrix}, \]
where $U_L \in \mathbb{C}^{m \times s}$, $V_L \in \mathbb{C}^{n \times s}$, and $\Sigma_{TL} \in \mathbb{R}^{s \times s}$ with $s \leq r$. Then $B = U_L \Sigma_{TL} V_L^H$ is the matrix in $\mathbb{C}^{m \times n}$ closest to $A$ in the following sense:
\[ \| A - B \|_2 = \inf_{\substack{C \in \mathbb{C}^{m \times n} \\ \mathrm{rank}(C) \leq s}} \| A - C \|_2 = \sigma_s . \]
Proof: First, if $B$ is as defined, then clearly $\| A - B \|_2 = \sigma_s$:
\begin{align*}
\| A - B \|_2 &= \| U^H (A - B) V \|_2 = \| U^H A V - U^H B V \|_2 
 = \left\| \Sigma - \begin{pmatrix} U_L & U_R \end{pmatrix}^H B \begin{pmatrix} V_L & V_R \end{pmatrix} \right\|_2 \\
&= \left\| \begin{pmatrix} \Sigma_{TL} & 0 \\ 0 & \Sigma_{BR} \end{pmatrix} - \begin{pmatrix} \Sigma_{TL} & 0 \\ 0 & 0 \end{pmatrix} \right\|_2 
 = \left\| \begin{pmatrix} 0 & 0 \\ 0 & \Sigma_{BR} \end{pmatrix} \right\|_2 = \| \Sigma_{BR} \|_2 = \sigma_s .
\end{align*}
Next, assume that $C$ has rank $t \leq s$ and $\| A - C \|_2 < \| A - B \|_2$. We will show that this leads to a contradiction. The null space of $C$ has dimension at least $n - s$, since $\dim(\mathcal{N}(C)) + \mathrm{rank}(C) = n$. Now, if $x \in \mathcal{N}(C)$ with $\| x \|_2 = 1$, then $\| A x \|_2 = \| (A - C) x \|_2 \leq \| A - C \|_2 \| x \|_2 = \| A - C \|_2 < \sigma_s$. Thus, the subspace of vectors that satisfy $\| A x \|_2 < \sigma_s$ has dimension at least $n - s$. Partition $U = \begin{pmatrix} u_0 & \cdots & u_{m-1} \end{pmatrix}$ and $V = \begin{pmatrix} v_0 & \cdots & v_{n-1} \end{pmatrix}$. Then $\| A v_j \|_2 = \| \sigma_j u_j \|_2 = \sigma_j \geq \sigma_s$ for $j = 0, \ldots, s$. In other words, the subspace of vectors that satisfy $\| A x \|_2 \geq \sigma_s$ has dimension at least $s + 1$. Since both of these subspaces are subspaces of $\mathbb{C}^n$ and their dimensions add up to more than $n$, there must be at least one nonzero vector $z$ that satisfies both $\| A z \|_2 < \sigma_s$ and $\| A z \|_2 \geq \sigma_s$, which is a contradiction.

The above theorem tells us how to pick the best approximation of a given rank to a given matrix.
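As a sketch of how the theorem is used in practice (my illustration, not from the notes), the following Python/NumPy snippet forms the truncated SVD $B = U_L \Sigma_{TL} V_L^H$ and verifies that $\| A - B \|_2 = \sigma_s$. The function name and test data are my own.

```python
import numpy as np

def best_rank_s_approximation(A, s):
    """Return B = U_L Sigma_TL V_L^H, the closest matrix of rank at most s in the 2-norm."""
    U, sigma, Vh = np.linalg.svd(A, full_matrices=False)
    return U[:, :s] @ np.diag(sigma[:s]) @ Vh[:s, :]

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
s = 2
B = best_rank_s_approximation(A, s)
sigma = np.linalg.svd(A, compute_uv=False)
# By Theorem 7.18, ||A - B||_2 equals sigma_s (singular values indexed from 0).
print(np.isclose(np.linalg.norm(A - B, 2), sigma[s]))
```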
Chapter 8

QR factorization

8.1 Classical Gram-Schmidt process

A classic problem in linear algebra is the computation of an orthonormal basis for the space spanned by a given set of linearly independent vectors: Given a linearly independent set of vectors $a_0, \ldots, a_{n-1} \in \mathbb{C}^m$ we would like to find a set of mutually orthonormal vectors $q_0, \ldots, q_{n-1} \in \mathbb{C}^m$ so that
\[ \mathrm{Span}(a_0, \ldots, a_{n-1}) = \mathrm{Span}(q_0, \ldots, q_{n-1}) . \]
This problem is equivalent to the problem of, given a matrix $A = \begin{pmatrix} a_0 & \cdots & a_{n-1} \end{pmatrix}$, computing a matrix $Q = \begin{pmatrix} q_0 & \cdots & q_{n-1} \end{pmatrix}$ with $Q^H Q = I$ so that $\mathcal{C}(A) = \mathcal{C}(Q)$.
Notice that we have already shown that the SVD will give us the required $Q$. However, the SVD for now is not a practical algorithm, and later we will see that even the practical computation of the SVD is expensive.
The classic algorithm for computing the desired $q_0, \ldots, q_{n-1}$ is known as the Gram-Schmidt process and is given by Alg. 8.1.
Algorithm 8.1 (Gram-Schmidt) Given a set of linearly independent vectors $a_0, \ldots, a_{n-1} \in \mathbb{C}^m$, the following algorithm computes the set of mutually orthonormal vectors $q_0, \ldots, q_{n-1} \in \mathbb{C}^m$ so that $\mathrm{Span}(a_0, \ldots, a_{n-1}) = \mathrm{Span}(q_0, \ldots, q_{n-1})$:

for $j = 0, \ldots, n-1$
    $a_j^\perp := a_j - \sum_{i=0}^{j-1} (q_i^H a_j)\, q_i$
    $q_j := a_j^\perp / \| a_j^\perp \|_2$
endfor

Here $a_j^\perp$ denotes the component of $a_j$ that is perpendicular to $a_0, \ldots, a_{j-1}$ (and therefore perpendicular to $q_0, \ldots, q_{j-1}$).
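A direct transcription of Alg. 8.1 into Python/NumPy (my sketch, not part of the notes; the course's own coding conventions may differ) looks as follows.

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt (Alg. 8.1): return Q with orthonormal columns
    spanning the same space as the columns of A."""
    m, n = A.shape
    Q = np.zeros((m, n), dtype=A.dtype)
    for j in range(n):
        a_perp = A[:, j].copy()
        for i in range(j):
            a_perp = a_perp - (Q[:, i].conj() @ A[:, j]) * Q[:, i]   # subtract (q_i^H a_j) q_i
        Q[:, j] = a_perp / np.linalg.norm(a_perp)                    # q_j := a_j^perp / ||a_j^perp||_2
    return Q

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
Q = gram_schmidt(A)
print(np.allclose(Q.conj().T @ Q, np.eye(4)))    # columns are (numerically) orthonormal
```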
Exercise 8.2 Show that the relation between the vectors $a_j$ and $q_j$ in the above algorithm can alternatively be stated as
\[ \begin{pmatrix} a_0 & a_1 & \cdots & a_{n-1} \end{pmatrix} = \begin{pmatrix} q_0 & q_1 & \cdots & q_{n-1} \end{pmatrix} \begin{pmatrix} \rho_{00} & \rho_{01} & \cdots & \rho_{0,n-1} \\ 0 & \rho_{11} & \cdots & \rho_{1,n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \rho_{n-1,n-1} \end{pmatrix}, \]
where
\[ q_i^H q_j = \left\{ \begin{array}{ll} 1 & \mbox{if } i = j \\ 0 & \mbox{otherwise} \end{array} \right. \quad \mbox{and} \quad \rho_{ij} = \left\{ \begin{array}{ll} q_i^H a_j & \mbox{when } i < j \\ \left\| a_j - \sum_{k=0}^{j-1} \rho_{kj} q_k \right\|_2 & \mbox{when } i = j \\ 0 & \mbox{otherwise.} \end{array} \right. \]
Thus, this relationship between the linearly independent vectors $a_j$ and the orthonormal vectors $q_j$ can be concisely stated as
\[ A = QR, \]
where $A$ and $Q$ are $m \times n$ matrices ($m \geq n$), $Q^H Q = I$, and $R$ is an $n \times n$ upper triangular matrix.
Theorem 8.3 Let $A, Q \in \mathbb{C}^{m \times n}$ and $R \in \mathbb{C}^{n \times n}$ with $A = QR$, $Q^H Q = I$, and $R$ upper triangular with no zeroes on the diagonal. Then the first $k$ columns of $A$ span the same space as the first $k$ columns of $Q$.

Proof: Partition
\[ A \rightarrow \begin{pmatrix} A_L & A_R \end{pmatrix}, \quad Q \rightarrow \begin{pmatrix} Q_L & Q_R \end{pmatrix}, \quad \mbox{and} \quad R \rightarrow \begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix}, \]
where $A_L, Q_L \in \mathbb{C}^{m \times k}$ and $R_{TL} \in \mathbb{C}^{k \times k}$. Then $R_{TL}$ is nonsingular (since it is upper triangular and has no zeroes on its diagonal), $Q_L^H Q_L = I$, and $A_L = Q_L R_{TL}$.
We first show that $\mathcal{C}(A_L) \subset \mathcal{C}(Q_L)$. Let $y \in \mathcal{C}(A_L)$. Then there exists $x \in \mathbb{C}^k$ such that $y = A_L x$. But then $y = Q_L \underbrace{R_{TL} x}_{z}$, so that $y \in \mathcal{C}(Q_L)$. Hence $\mathcal{C}(A_L) \subset \mathcal{C}(Q_L)$.
We next show that $\mathcal{C}(Q_L) \subset \mathcal{C}(A_L)$. Let $y \in \mathcal{C}(Q_L)$. Then there exists $z \in \mathbb{C}^k$ such that $y = Q_L z$. But then $y = A_L \underbrace{R_{TL}^{-1} z}_{x}$, so that $y \in \mathcal{C}(A_L)$. Hence $\mathcal{C}(Q_L) \subset \mathcal{C}(A_L)$.
We conclude that $\mathcal{C}(Q_L) = \mathcal{C}(A_L)$, which is another way of saying that the first $k$ columns of $A$ span the same space as the first $k$ columns of $Q$.
Theorem 8.4 Let $A \in \mathbb{C}^{m \times n}$ have linearly independent columns. Then there exist $Q \in \mathbb{C}^{m \times n}$ with $Q^H Q = I$ and upper triangular $R$ with no zeroes on the diagonal such that $A = QR$.

Proof: (by induction)

Base case: $n = 1$. In this case $A = \begin{pmatrix} a_0 \end{pmatrix}$, where $a_0$ is its only column. Since $A$ has linearly independent columns, $a_0 \neq 0$. Then
\[ A = \begin{pmatrix} a_0 \end{pmatrix} = \underbrace{\begin{pmatrix} q_0 \end{pmatrix}}_{Q} \underbrace{\begin{pmatrix} \rho_{00} \end{pmatrix}}_{R}, \]
where $\rho_{00} = \| a_0 \|_2$ and $q_0 = a_0 / \rho_{00}$.

Inductive step: Assume that the result is true for all $A \in \mathbb{C}^{m \times (n-1)}$ with linearly independent columns. We will show it is true for $A \in \mathbb{C}^{m \times n}$ with linearly independent columns.
Let $A \in \mathbb{C}^{m \times n}$. Partition $A \rightarrow \begin{pmatrix} A_0 & a_1 \end{pmatrix}$. By the induction hypothesis, there exist $Q_0$ and $R_{00}$ such that $Q_0^H Q_0 = I$, $R_{00}$ is upper triangular with no zeroes on its diagonal, and $A_0 = Q_0 R_{00}$. Now, compute $r_{01} = Q_0^H a_1$ and $a_1^\perp = a_1 - Q_0 r_{01}$, the component of $a_1$ orthogonal to $\mathcal{C}(Q_0)$. Note that $a_1^\perp \neq 0$ by virtue of the fact that the columns of $A$ are linearly independent. Let $\rho_{11} = \| a_1^\perp \|_2$ and $q_1 = a_1^\perp / \rho_{11}$. Then
\[ \underbrace{\begin{pmatrix} Q_0 & q_1 \end{pmatrix}}_{Q} \underbrace{\begin{pmatrix} R_{00} & r_{01} \\ 0 & \rho_{11} \end{pmatrix}}_{R} = \begin{pmatrix} Q_0 R_{00} & Q_0 r_{01} + q_1 \rho_{11} \end{pmatrix} = \begin{pmatrix} A_0 & Q_0 r_{01} + a_1^\perp \end{pmatrix} = \begin{pmatrix} A_0 & a_1 \end{pmatrix} = A . \]

By the Principle of Mathematical Induction the result holds for all matrices $A \in \mathbb{C}^{m \times n}$ with $m \geq n$.
The proof motivates the algorithm in Fig. 8.1, which overwrites $A$ with $Q$.
An alternative way of motivating that algorithm is as follows. Consider $A = QR$. Partition $A$, $Q$, and $R$ to yield
\[ \begin{pmatrix} A_0 & a_1 & A_2 \end{pmatrix} = \begin{pmatrix} Q_0 & q_1 & Q_2 \end{pmatrix} \begin{pmatrix} R_{00} & r_{01} & R_{02} \\ 0 & \rho_{11} & r_{12}^T \\ 0 & 0 & R_{22} \end{pmatrix} . \]
Algorithm: $[Q, R] := \mathrm{QR}(A)$

Partition $A \rightarrow \begin{pmatrix} A_L & A_R \end{pmatrix}$, $Q \rightarrow \begin{pmatrix} Q_L & Q_R \end{pmatrix}$, $R \rightarrow \begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix}$
    where $A_L$ and $Q_L$ have 0 columns and $R_{TL}$ is $0 \times 0$
while $n(A_L) \neq n(A)$ do
    Repartition
        $\begin{pmatrix} A_L & A_R \end{pmatrix} \rightarrow \begin{pmatrix} A_0 & a_1 & A_2 \end{pmatrix}$, $\begin{pmatrix} Q_L & Q_R \end{pmatrix} \rightarrow \begin{pmatrix} Q_0 & q_1 & Q_2 \end{pmatrix}$, $\begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} R_{00} & r_{01} & R_{02} \\ 0 & \rho_{11} & r_{12}^T \\ 0 & 0 & R_{22} \end{pmatrix}$
        where $a_1$ and $q_1$ are columns and $\rho_{11}$ is a scalar

    $r_{01} := Q_0^H a_1$
    $a_1^\perp := a_1 - Q_0 r_{01}$
    $\rho_{11} := \| a_1^\perp \|_2$
    $q_1 := a_1^\perp / \rho_{11}$

    Continue with
        $\begin{pmatrix} A_L & A_R \end{pmatrix} \leftarrow \begin{pmatrix} A_0 & a_1 & A_2 \end{pmatrix}$, $\begin{pmatrix} Q_L & Q_R \end{pmatrix} \leftarrow \begin{pmatrix} Q_0 & q_1 & Q_2 \end{pmatrix}$, $\begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix} \leftarrow \begin{pmatrix} R_{00} & r_{01} & R_{02} \\ 0 & \rho_{11} & r_{12}^T \\ 0 & 0 & R_{22} \end{pmatrix}$
endwhile

Figure 8.1: Algorithm for computing the QR factorization via the Gram-Schmidt process.
Assume that $Q_0$ and $R_{00}$ have already been computed. Since corresponding columns of both sides must be equal, we find that
\[ a_1 = Q_0 r_{01} + q_1 \rho_{11} . \tag{8.1} \]
Also, $Q_0^H Q_0 = I$ and $Q_0^H q_1 = 0$, since the columns of $Q$ are mutually orthonormal. Hence $Q_0^H a_1 = Q_0^H Q_0 r_{01} + Q_0^H q_1 \rho_{11} = r_{01}$. This indicates how $r_{01}$ can be computed from $Q_0$ and $a_1$, which are already known. Next, $a_1^\perp = a_1 - Q_0 r_{01}$ is computed from (8.1). This is the component of $a_1$ that is perpendicular to the columns of $Q_0$. Since $\rho_{11} q_1 = a_1^\perp$ and we know that $q_1$ has unit length, we now compute $\rho_{11} = \| a_1^\perp \|_2$ and $q_1 = a_1^\perp / \rho_{11}$, which completes a derivation of the algorithm in Fig. 8.1. In that algorithm, $A$ is overwritten by $Q$.
As a third way to understand the algorithm, the table in Figure 8.2 links the observations in Exercise 8.2 to the algorithm in Fig. 8.1.
Exercise 8.5 Let $A$ have linearly independent columns and let $A = QR$ be a QR factorization of $A$. Partition
\[ A \rightarrow \begin{pmatrix} A_L & A_R \end{pmatrix}, \quad Q \rightarrow \begin{pmatrix} Q_L & Q_R \end{pmatrix}, \quad \mbox{and} \quad R \rightarrow \begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix}, \]
where $A_L$ and $Q_L$ have $k$ columns and $R_{TL}$ is $k \times k$. Show that
In Exercise 8.2 | In Fig. 8.1
$a_j$ | $a_1$
$q_j$ | $q_1$
$(\rho_{0j}, \ldots, \rho_{j-1,j})^T$ | $r_{01}$
$\rho_{jj}$ | $\rho_{11}$
$\rho_{ij} = q_i^H a_j$ if $i < j$ | $r_{01} := Q_0^H a_1$
$a_j - \sum_{i=0}^{j-1} \rho_{ij} q_i$ | $a_1^\perp := a_1 - Q_0 r_{01}$
$\rho_{jj} = \| a_j - \sum_{i=0}^{j-1} \rho_{ij} q_i \|_2$ | $\rho_{11} := \| a_1^\perp \|_2$

Figure 8.2: Link between Exercise 8.2 and the algorithm in Figure 8.1.
1. $A_L = Q_L R_{TL}$: $Q_L R_{TL}$ equals the QR factorization of $A_L$,
2. $\mathcal{C}(A_L) = \mathcal{C}(Q_L)$: the first $k$ columns of $Q$ form an orthonormal basis for the space spanned by the first $k$ columns of $A$,
3. $R_{TR} = Q_L^H A_R$,
4. $(A_R - Q_L R_{TR})^H Q_L = 0$,
5. $A_R - Q_L R_{TR} = Q_R R_{BR}$, and
6. $\mathcal{C}(A_R - Q_L R_{TR}) = \mathcal{C}(Q_R)$.
8.2 Modified Gram-Schmidt process

We start by considering the following problem: Given $y \in \mathbb{C}^m$ and $Q \in \mathbb{C}^{m \times n}$ with orthonormal columns, compute $y^\perp$, the component of $y$ orthogonal to the columns of $Q$. This is a key step in the Gram-Schmidt process in Figure 8.1.
Mathematically, the solution is given by
\begin{align*}
y^\perp &= (I - Q Q^H) y = y - Q Q^H y \\
&= y - \begin{pmatrix} q_0 & \cdots & q_{n-1} \end{pmatrix} \begin{pmatrix} q_0 & \cdots & q_{n-1} \end{pmatrix}^H y 
 = y - \begin{pmatrix} q_0 & \cdots & q_{n-1} \end{pmatrix} \begin{pmatrix} q_0^H y \\ \vdots \\ q_{n-1}^H y \end{pmatrix} \\
&= y - \left( (q_0^H y)\, q_0 + \cdots + (q_{n-1}^H y)\, q_{n-1} \right) = y - (q_0^H y)\, q_0 - \cdots - (q_{n-1}^H y)\, q_{n-1} .
\end{align*}
$[y^\perp, r] = \mathrm{Proj\_orthog\_to\_Q_{CGS}}(Q, y)$ (used by classical Gram-Schmidt):
    $y^\perp = y$
    for $i = 0, \ldots, n-1$
        $\rho_i := q_i^H y$
        $y^\perp := y^\perp - \rho_i q_i$
    endfor

$[y^\perp, r] = \mathrm{Proj\_orthog\_to\_Q_{MGS}}(Q, y)$ (used by modified Gram-Schmidt):
    $y^\perp = y$
    for $i = 0, \ldots, n-1$
        $\rho_i := q_i^H y^\perp$
        $y^\perp := y^\perp - \rho_i q_i$
    endfor

Figure 8.3: Two different ways of computing $y^\perp = (I - Q Q^H) y$, the component of $y$ orthogonal to $\mathcal{C}(Q)$. The CGS variant computes the coefficients $\rho_i$ against the original vector $y$; the MGS variant computes them against the partially updated vector $y^\perp$.
Algorithm: $[A, R] := \mathrm{Gram\mbox{-}Schmidt}(A)$ (overwrites $A$ with $Q$)

Partition $A \rightarrow \begin{pmatrix} A_L & A_R \end{pmatrix}$, $R \rightarrow \begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix}$
    where $A_L$ has 0 columns and $R_{TL}$ is $0 \times 0$
while $n(A_L) \neq n(A)$ do
    Repartition
        $\begin{pmatrix} A_L & A_R \end{pmatrix} \rightarrow \begin{pmatrix} A_0 & a_1 & A_2 \end{pmatrix}$, $\begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} R_{00} & r_{01} & R_{02} \\ 0 & \rho_{11} & r_{12}^T \\ 0 & 0 & R_{22} \end{pmatrix}$
        where $a_1$ is a column and $\rho_{11}$ is a scalar

    CGS:
        $r_{01} := A_0^H a_1$
        $a_1 := a_1 - A_0 r_{01}$
        $\rho_{11} := \| a_1 \|_2$
        $a_1 := a_1 / \rho_{11}$
    MGS:
        $[a_1, r_{01}] = \mathrm{Proj\_orthog\_to\_Q_{MGS}}(A_0, a_1)$
        $\rho_{11} := \| a_1 \|_2$
        $a_1 := a_1 / \rho_{11}$
    MGS (alternative):
        $\rho_{11} := \| a_1 \|_2$
        $a_1 := a_1 / \rho_{11}$
        $r_{12}^T := a_1^H A_2$
        $A_2 := A_2 - a_1 r_{12}^T$

    Continue with
        $\begin{pmatrix} A_L & A_R \end{pmatrix} \leftarrow \begin{pmatrix} A_0 & a_1 & A_2 \end{pmatrix}$, $\begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix} \leftarrow \begin{pmatrix} R_{00} & r_{01} & R_{02} \\ 0 & \rho_{11} & r_{12}^T \\ 0 & 0 & R_{22} \end{pmatrix}$
endwhile

Figure 8.4: The three variants share the partitioning and loop structure and differ only in the updates listed above. Left (CGS): Classical Gram-Schmidt algorithm. Middle (MGS): Modified Gram-Schmidt algorithm. Right (MGS, alternative): Modified Gram-Schmidt algorithm where, every time a new column of $Q$, $q_1$, is computed, the component of all future columns in the direction of this new vector is subtracted out. This last algorithm is not (yet) explained in the text.
This can be computed by the algorithm in Figure 8.3 (left) and is used by what is often called the Classical Gram-Schmidt (CGS) algorithm given in Figure 8.4.
An alternative algorithm for computing $y^\perp$ is given in Figure 8.3 (right) and is used by the Modified Gram-Schmidt (MGS) algorithm also given in Figure 8.4. This approach is mathematically equivalent to the algorithm to its left for the following reason. Letting $Q_i = \begin{pmatrix} q_0 & \cdots & q_{i-1} \end{pmatrix}$, for a given $i$ the current contents of $y^\perp$ equal $y - Q_i Q_i^H y$ (the component of $y$ perpendicular to $q_0, \ldots, q_{i-1}$). Then
\begin{align*}
y^\perp - (q_i^H y^\perp)\, q_i &= y - Q_i Q_i^H y - q_i^H ( y - Q_i Q_i^H y )\, q_i \\
&= y - Q_i Q_i^H y - ( q_i^H y - \underbrace{q_i^H Q_i}_{0} Q_i^H y )\, q_i \\
&= y - Q_i Q_i^H y - (q_i^H y)\, q_i = y - Q_{i+1} Q_{i+1}^H y .
\end{align*}
Thus, both of the algorithms in Figure 8.3 compute the same final result in exact arithmetic.
In practice, in the presence of round-off error, MGS is more accurate than CGS. We will (hopefully) get into detail about this later, but for now we will illustrate it. When storing real (or, for that matter, complex) valued numbers in a computer, only limited accuracy can be maintained, leading to round-off error when a number is stored and/or when computation with numbers is performed. The machine epsilon or unit roundoff error is defined as the largest positive number $\epsilon_{\rm mach}$ such that the stored value of $1 + \epsilon_{\rm mach}$ is rounded to 1. Now, let us consider a computer where the only error that is ever incurred is when $1 + \epsilon_{\rm mach}$ is computed and rounded to 1. Let $\epsilon = \sqrt{\epsilon_{\rm mach}}$ and consider the matrix
\[ A = \begin{pmatrix} 1 & 1 & 1 \\ \epsilon & 0 & 0 \\ 0 & \epsilon & 0 \\ 0 & 0 & \epsilon \end{pmatrix} = \begin{pmatrix} a_0 & a_1 & a_2 \end{pmatrix} . \tag{8.2} \]
In Figure 8.5 we execute the CGS algorithm. It yields the approximate matrix
\[ Q \approx \begin{pmatrix} 1 & 0 & 0 \\ \epsilon & -\frac{\sqrt{2}}{2} & -\frac{\sqrt{2}}{2} \\ 0 & \frac{\sqrt{2}}{2} & 0 \\ 0 & 0 & \frac{\sqrt{2}}{2} \end{pmatrix} . \]
If we now ask the question "Are the columns of $Q$ orthonormal?" we can check whether $Q^H Q = I$. The answer:
\[ Q^H Q = \begin{pmatrix} 1 + \epsilon_{\rm mach} & -\frac{\sqrt{2}}{2}\epsilon & -\frac{\sqrt{2}}{2}\epsilon \\ -\frac{\sqrt{2}}{2}\epsilon & 1 & \frac{1}{2} \\ -\frac{\sqrt{2}}{2}\epsilon & \frac{1}{2} & 1 \end{pmatrix} . \]
Similarly, in Figure 8.6 we execute the MGS algorithm. It yields the approximate matrix
\[ Q \approx \begin{pmatrix} 1 & 0 & 0 \\ \epsilon & -\frac{\sqrt{2}}{2} & -\frac{\sqrt{6}}{6} \\ 0 & \frac{\sqrt{2}}{2} & -\frac{\sqrt{6}}{6} \\ 0 & 0 & \frac{2\sqrt{6}}{6} \end{pmatrix} . \]
$\rho_{0,0} = \| a_0 \|_2 = \sqrt{1 + \epsilon^2} = \sqrt{1 + \epsilon_{\rm mach}}$, which is rounded to 1.
$q_0 = a_0 / \rho_{0,0} = (1, \epsilon, 0, 0)^T / 1 = (1, \epsilon, 0, 0)^T$
$\rho_{0,1} = q_0^H a_1 = 1$
$a_1^\perp = a_1 - \rho_{0,1} q_0 = (0, -\epsilon, \epsilon, 0)^T$
$\rho_{1,1} = \| a_1^\perp \|_2 = \sqrt{2 \epsilon^2} = \sqrt{2}\,\epsilon$
$q_1 = a_1^\perp / \rho_{1,1} = (0, -\epsilon, \epsilon, 0)^T / (\sqrt{2}\,\epsilon) = (0, -\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}, 0)^T$
$\rho_{0,2} = q_0^H a_2 = 1$
$\rho_{1,2} = q_1^H a_2 = 0$
$a_2^\perp = a_2 - \rho_{0,2} q_0 - \rho_{1,2} q_1 = (0, -\epsilon, 0, \epsilon)^T$
$\rho_{2,2} = \| a_2^\perp \|_2 = \sqrt{2 \epsilon^2} = \sqrt{2}\,\epsilon$
$q_2 = a_2^\perp / \rho_{2,2} = (0, -\epsilon, 0, \epsilon)^T / (\sqrt{2}\,\epsilon) = (0, -\frac{\sqrt{2}}{2}, 0, \frac{\sqrt{2}}{2})^T$

Figure 8.5: Execution of the CGS algorithm on the example in (8.2).
$\rho_{0,0} = \| a_0 \|_2 = \sqrt{1 + \epsilon^2} = \sqrt{1 + \epsilon_{\rm mach}}$, which is rounded to 1.
$q_0 = a_0 / \rho_{0,0} = (1, \epsilon, 0, 0)^T / 1 = (1, \epsilon, 0, 0)^T$
$\rho_{0,1} = q_0^H a_1 = 1$
$a_1^\perp = a_1 - \rho_{0,1} q_0 = (0, -\epsilon, \epsilon, 0)^T$
$\rho_{1,1} = \| a_1^\perp \|_2 = \sqrt{2 \epsilon^2} = \sqrt{2}\,\epsilon$
$q_1 = a_1^\perp / \rho_{1,1} = (0, -\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}, 0)^T$
$\rho_{0,2} = q_0^H a_2 = 1$
$a_2^\perp = a_2 - \rho_{0,2} q_0 = (0, -\epsilon, 0, \epsilon)^T$
$\rho_{1,2} = q_1^H a_2^\perp = \frac{\sqrt{2}}{2}\epsilon$
$a_2^\perp = a_2^\perp - \rho_{1,2} q_1 = (0, -\frac{\epsilon}{2}, -\frac{\epsilon}{2}, \epsilon)^T$
$\rho_{2,2} = \| a_2^\perp \|_2 = \sqrt{\frac{6}{4}\epsilon^2} = \frac{\sqrt{6}}{2}\epsilon$
$q_2 = a_2^\perp / \rho_{2,2} = (0, -\frac{\epsilon}{2}, -\frac{\epsilon}{2}, \epsilon)^T / (\frac{\sqrt{6}}{2}\epsilon) = (0, -\frac{\sqrt{6}}{6}, -\frac{\sqrt{6}}{6}, \frac{2\sqrt{6}}{6})^T$

Figure 8.6: Execution of the MGS algorithm on the example in (8.2).
First iteration:
$\rho_{0,0} = \| a_0 \|_2 = \sqrt{1 + \epsilon^2} = \sqrt{1 + \epsilon_{\rm mach}}$, which is rounded to 1. Computed by $\rho_{11}$ in Fig. 8.4.
$q_0 = a_0 / \rho_{0,0} = (1, \epsilon, 0, 0)^T / 1 = (1, \epsilon, 0, 0)^T$. Computed by $a_1 := a_1 / \rho_{11}$.
$\begin{pmatrix} \rho_{0,1} & \rho_{0,2} \end{pmatrix} = q_0^H \begin{pmatrix} a_1 & a_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 \end{pmatrix}$. Computed by $r_{12}^T := a_1^H A_2$.
$\begin{pmatrix} a_1 & a_2 \end{pmatrix} := \begin{pmatrix} a_1 & a_2 \end{pmatrix} - q_0 \begin{pmatrix} \rho_{0,1} & \rho_{0,2} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 0 \\ \epsilon & 0 \\ 0 & \epsilon \end{pmatrix} - \begin{pmatrix} 1 \\ \epsilon \\ 0 \\ 0 \end{pmatrix}\begin{pmatrix} 1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ -\epsilon & -\epsilon \\ \epsilon & 0 \\ 0 & \epsilon \end{pmatrix}$. Computed by $A_2 := A_2 - a_1 r_{12}^T$.

Second iteration:
$a_1^\perp$ is now already in $a_1$.
$\rho_{1,1} = \| a_1^\perp \|_2 = \sqrt{2 \epsilon^2} = \sqrt{2}\,\epsilon$. Computed by $\rho_{11} := \| a_1 \|_2$.
$q_1 = a_1^\perp / \rho_{1,1} = (0, -\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}, 0)^T$. Computed by $a_1 := a_1 / \rho_{11}$.
$\rho_{1,2} = q_1^H a_2^\perp = \frac{\sqrt{2}}{2}\epsilon$. Computed by $r_{12}^T := a_1^H A_2$.
$a_2^\perp = a_2^\perp - \rho_{1,2} q_1 = (0, -\frac{\epsilon}{2}, -\frac{\epsilon}{2}, \epsilon)^T$. Computed by $A_2 := A_2 - a_1 r_{12}^T$.

Third iteration:
$a_2^\perp$ is now already in $a_2$.
$\rho_{2,2} = \| a_2^\perp \|_2 = \sqrt{\frac{6}{4}\epsilon^2} = \frac{\sqrt{6}}{2}\epsilon$. Computed by $\rho_{11} := \| a_1 \|_2$.
$q_2 = a_2^\perp / \rho_{2,2} = (0, -\frac{\sqrt{6}}{6}, -\frac{\sqrt{6}}{6}, \frac{2\sqrt{6}}{6})^T$. Computed by $a_1 := a_1 / \rho_{11}$.

Figure 8.7: Execution of the alternative MGS algorithm on the example in (8.2).
If we now ask the question "Are the columns of $Q$ orthonormal?" we can check whether $Q^H Q = I$. The answer:
\[ Q^H Q = \begin{pmatrix} 1 + \epsilon_{\rm mach} & -\frac{\sqrt{2}}{2}\epsilon & -\frac{\sqrt{6}}{6}\epsilon \\ -\frac{\sqrt{2}}{2}\epsilon & 1 & 0 \\ -\frac{\sqrt{6}}{6}\epsilon & 0 & 1 \end{pmatrix}, \]
which shows that for this example MGS yields better orthogonality than does CGS. Of course, a thorough analysis is needed to truly explain it.
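The experiment is easy to reproduce. The following Python/NumPy sketch (mine, not from the notes) runs CGS and MGS on the matrix in (8.2) with $\epsilon = \sqrt{\epsilon_{\rm mach}}$ and reports how far $Q^H Q$ is from the identity in each case; the helper names are my own.

```python
import numpy as np

def cgs(A):
    """Classical Gram-Schmidt: coefficients computed against the original columns."""
    Q = np.array(A, dtype=float, copy=True)
    for j in range(Q.shape[1]):
        for i in range(j):
            Q[:, j] -= (Q[:, i] @ A[:, j]) * Q[:, i]
        Q[:, j] /= np.linalg.norm(Q[:, j])
    return Q

def mgs(A):
    """Modified Gram-Schmidt: coefficients computed against the updated columns."""
    Q = np.array(A, dtype=float, copy=True)
    for j in range(Q.shape[1]):
        for i in range(j):
            Q[:, j] -= (Q[:, i] @ Q[:, j]) * Q[:, i]
        Q[:, j] /= np.linalg.norm(Q[:, j])
    return Q

eps = np.sqrt(np.finfo(float).eps)            # epsilon = sqrt(eps_mach), as in (8.2)
A = np.array([[1, 1, 1],
              [eps, 0, 0],
              [0, eps, 0],
              [0, 0, eps]])
for name, Q in [("CGS", cgs(A)), ("MGS", mgs(A))]:
    print(name, np.linalg.norm(Q.T @ Q - np.eye(3)))   # loss of orthogonality
```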
An alternative way of presenting the MGS algorithm is as follows. Consider $A = QR$ and let $\hat A$ represent the original contents of matrix $A$, since we are going to overwrite $A$ with $Q$. Partition
\[ A \rightarrow \begin{pmatrix} A_L & A_R \end{pmatrix}, \quad Q \rightarrow \begin{pmatrix} Q_L & Q_R \end{pmatrix}, \quad \mbox{and} \quad R \rightarrow \begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix}, \]
and partition $\hat A$ conformally into $\begin{pmatrix} \hat A_L & \hat A_R \end{pmatrix}$. Notice that
\[ \begin{pmatrix} \hat A_L & \hat A_R \end{pmatrix} = \begin{pmatrix} Q_L & Q_R \end{pmatrix} \begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix}, \]
so that
\[ \hat A_L = Q_L R_{TL} \quad \mbox{and} \quad \hat A_R = Q_L R_{TR} + Q_R R_{BR}, \]
or, equivalently,
\[ \hat A_L = Q_L R_{TL} \quad \mbox{and} \quad \hat A_R - Q_L R_{TR} = Q_R R_{BR} . \]
Now, each column of $\hat A_R - Q_L R_{TR}$ equals the component of the corresponding column of $\hat A_R$ orthogonal to the columns of $Q_L$.
So, let us assume that matrix $A$ currently contains
\[ \begin{pmatrix} A_L & A_R \end{pmatrix} = \begin{pmatrix} Q_L & \hat A_R - Q_L R_{TR} \end{pmatrix}, \]
while $R_{TL}$ and $R_{TR}$ have also already been computed. This means that if we repartition to expose the next column,
\[ \begin{pmatrix} A_L & A_R \end{pmatrix} \rightarrow \begin{pmatrix} A_0 & a_1 & A_2 \end{pmatrix}, \quad \begin{pmatrix} Q_L & Q_R \end{pmatrix} \rightarrow \begin{pmatrix} Q_0 & q_1 & Q_2 \end{pmatrix}, \quad \begin{pmatrix} R_{TL} & R_{TR} \\ 0 & R_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} R_{00} & r_{01} & R_{02} \\ 0 & \rho_{11} & r_{12}^T \\ 0 & 0 & R_{22} \end{pmatrix}, \]
we get that
\[ \begin{pmatrix} A_0 & a_1 & A_2 \end{pmatrix} = \begin{pmatrix} Q_0 & \hat a_1 - Q_0 r_{01} & \hat A_2 - Q_0 R_{02} \end{pmatrix} . \]
In other words, $a_1$ has already been updated to hold $a_1^\perp$, so we can compute $\rho_{11} := \| a_1 \|_2$ to set $\rho_{11}$ to $\| a_1^\perp \|_2$. Also, $a_1 := a_1 / \rho_{11}$ then computes $q_1$, overwriting $a_1$ with the result. Now, $A_2$ equals the rest of the columns with the components in the directions of the columns of $Q_0$ already subtracted out. What we would like to do next is subtract out the components in the direction of $q_1$ (which has overwritten $a_1$). This is accomplished by first computing $r_{12}^T := a_1^H A_2$, after which the rest of the columns are updated by $A_2 := A_2 - a_1 r_{12}^T$. This alternative MGS algorithm, applied to the example in (8.2), is illustrated in Figure 8.7. If one compares Figures 8.6 and 8.7 one notices that only the order in which the computations occur has changed.
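A compact Python/NumPy sketch of this right-looking (alternative) MGS variant follows; it is my illustration, not code from the notes, and it also accumulates $R$ so that the factorization can be checked.

```python
import numpy as np

def mgs_qr(A):
    """Right-looking MGS: overwrite a copy of A with Q and build R, so that A = Q R."""
    Q = np.array(A, dtype=float, copy=True)
    m, n = Q.shape
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(Q[:, k])              # rho_11 := ||a_1||_2
        Q[:, k] /= R[k, k]                             # a_1 := a_1 / rho_11 (becomes q_1)
        R[k, k+1:] = Q[:, k] @ Q[:, k+1:]              # r_12^T := a_1^H A_2
        Q[:, k+1:] -= np.outer(Q[:, k], R[k, k+1:])    # A_2 := A_2 - a_1 r_12^T
    return Q, R

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
Q, R = mgs_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(4)))
```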
8.3 Householder QR factorization

A fundamental problem to avoid in numerical codes is the situation where one starts with large values and ends up with small values that have large relative errors in them. This is known as catastrophic cancellation. The Gram-Schmidt algorithms can inherently fall victim to this: column $a_j$ is successively reduced in length as components in the directions of $q_0, \ldots, q_{j-1}$ are subtracted, leaving a small vector if $a_j$ was almost in the span of the first $j$ columns of $A$. Application of a unitary transformation to a matrix or vector inherently preserves length. Thus, it would be beneficial if the QR factorization could be implemented as the successive application of unitary transformations. The Householder QR factorization accomplishes this.
The first fundamental insight is that the product of unitary matrices is itself unitary. Thus, if, given $A \in \mathbb{C}^{m \times n}$, one could find a sequence of unitary matrices $H_0, \ldots, H_{n-1}$ such that
\[ H_{n-1} \cdots H_0 A = \begin{pmatrix} R \\ 0 \end{pmatrix}, \]
where $R$ is upper triangular, then
\[ A = \underbrace{H_0 \cdots H_{n-1}}_{Q} \begin{pmatrix} R \\ 0 \end{pmatrix} = Q \begin{pmatrix} R \\ 0 \end{pmatrix} = \begin{pmatrix} Q_L & Q_R \end{pmatrix} \begin{pmatrix} R \\ 0 \end{pmatrix} = Q_L R, \]
where $Q_L$ equals the first $n$ columns of $Q$. Then $A = Q_L R$ is the QR factorization of $A$. The second fundamental insight is that the desired unitary transformations $H_0, \ldots, H_{n-1}$ can be computed and applied cheaply.
8.3.1 Householder transformations (reflectors)

In this section we discuss Householder transformations, also referred to as reflectors.

Definition 8.6 Let $u \in \mathbb{C}^n$ be a vector of unit length ($\| u \|_2 = 1$). Then $H = I - 2 u u^H$ is said to be a reflector or Householder transformation.

We observe:
- Any vector $z$ that is perpendicular to $u$ is left unchanged:
\[ (I - 2 u u^H) z = z - 2 u (u^H z) = z . \]
- Any vector $x$ can be written as $x = z + (u^H x) u$, where $z$ is perpendicular to $u$ and $(u^H x) u$ is the component of $x$ in the direction of $u$. Then
\[ (I - 2 u u^H) x = (I - 2 u u^H)(z + (u^H x) u) = z + (u^H x) u - 2 u \underbrace{u^H z}_{0} - 2 (u^H x)\, u\, \underbrace{u^H u}_{1} = z - (u^H x) u . \]

This can be interpreted as follows: the space perpendicular to $u$ acts as a "mirror": any vector in that space (along the mirror) is not reflected, while any other vector has the component that is orthogonal to that space (the component outside of, orthogonal to, the mirror) reversed in direction, as illustrated in Fig. 8.8. Notice that a reflection preserves the length of the vector.

Figure 8.8: Left: the vector $x = z + (u^H x) u$ is reflected into $z - (u^H x) u = (I - 2 u u^H) x$ by the mirror perpendicular to $u$. Right: the vector $v = x - y$, perpendicular to the mirror that reflects $x$ into $y$.
Exercise 8.7 Show that if $H$ is a reflector, then
- $H H = I$ (reflecting the reflection of a vector results in the original vector),
- $H = H^H$, and
- $H^H H = I$ (a reflector is a unitary matrix and thus preserves the norm).

Next, let us ask the question of how to reflect a given $x \in \mathbb{C}^n$ into another vector $y \in \mathbb{C}^n$ with $\| x \|_2 = \| y \|_2$. In other words, how do we compute a vector $u$ so that $(I - 2 u u^H) x = y$?
From our discussion above, we need to find a vector $u$ that is perpendicular to the space with respect to which we will reflect. From Fig. 8.8 (right) we notice that the vector from $y$ to $x$, $v = x - y$, is perpendicular to the desired space. Thus, $u$ must equal a unit vector in the direction of $v$: $u = v / \| v \|_2$.

Remark 8.8 In subsequent discussion we will prefer to give Householder transformations as $I - u u^H / \tau$, where $\tau = u^H u / 2$, so that $u$ no longer needs to be a unit vector, just a direction. The reason for this will become obvious later.
In the next subsection, we will wish to find a Householder transformation, $H$, for a vector, $x$, such that $Hx$ equals a vector with zeroes below the top element.
Let us first discuss how to find $H$ in the case where $x \in \mathbb{R}^n$. We wish to find $v$ so that $\left( I - \frac{2}{v^T v} v v^T \right) x = \| x \|_2 e_0$. Now, $y = \| x \|_2 e_0$ in our previous discussion, so that $v = x - y = x - \| x \|_2 e_0$.

Exercise 8.9 Show that if $x \in \mathbb{R}^n$, $v = x - \| x \|_2 e_0$, and $\tau = v^T v / 2$, then $\left( I - \frac{1}{\tau} v v^T \right) x = \| x \|_2 e_0$.
Continuing with the case where $x \in \mathbb{R}^n$: in practice, one chooses $v = x + \mathrm{sign}(\chi_1) \| x \|_2 e_0$, where $\chi_1$ denotes the first element of $x$. The reason is as follows. The first element of $v$, $\nu_1$, will be $\nu_1 = \chi_1 \mp \| x \|_2$. If $\chi_1$ is positive and $\| x \|_2$ is almost equal to $\chi_1$, then $\chi_1 - \| x \|_2$ is a small number, and if there is error in $\chi_1$ and/or $\| x \|_2$, this error becomes large relative to the result $\chi_1 - \| x \|_2$. Regardless of whether $\chi_1$ is positive or negative (or complex valued), this can be avoided by choosing $\nu_1 = \chi_1 + \mathrm{sign}(\chi_1) \| x \|_2$, that is, $v = x + \mathrm{sign}(\chi_1) \| x \|_2 e_0$.
Next, let us work out the complex case, dealing explicitly with $x$ as a vector that consists of its first element, $\chi_1$, and the rest of the vector, $x_2$. More precisely, partition
\[ x = \begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix}, \]
where $\chi_1$ equals the first element of $x$ and $x_2$ is the rest of $x$. Then we will wish to find a Householder vector $u = \begin{pmatrix} 1 \\ u_2 \end{pmatrix}$ so that
\[ \left( I - \frac{1}{\tau} \begin{pmatrix} 1 \\ u_2 \end{pmatrix} \begin{pmatrix} 1 \\ u_2 \end{pmatrix}^H \right) \begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \theta \| x \|_2 \\ 0 \end{pmatrix} . \]
Here $\theta$ denotes a complex scalar on the complex unit circle.¹ Notice that this means that $y$ in the previous discussion equals the vector $\begin{pmatrix} \theta \| x \|_2 \\ 0 \end{pmatrix}$, so that the direction of $u$ is given by
\[ v = \begin{pmatrix} \chi_1 - \theta \| x \|_2 \\ x_2 \end{pmatrix} . \]
We now wish to normalize this vector so that its first entry equals 1:
\[ u = \frac{v}{\nu_1} = \frac{1}{\chi_1 - \theta \| x \|_2} \begin{pmatrix} \chi_1 - \theta \| x \|_2 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 \\ x_2 / \nu_1 \end{pmatrix}, \]
where $\nu_1 = \chi_1 - \theta \| x \|_2$. (Note that if $\nu_1 = 0$ then $u_2$ can be set to 0.)
Exercise 8.10 Verify that
\[ \left( I - \frac{1}{\tau} \begin{pmatrix} 1 \\ u_2 \end{pmatrix} \begin{pmatrix} 1 \\ u_2 \end{pmatrix}^H \right) \begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \rho \\ 0 \end{pmatrix}, \]
where $\tau = u^H u / 2 = (1 + u_2^H u_2)/2$ and $\rho = \theta \| x \|_2$.
Hint: $\bar\rho \rho = | \rho |^2 = \| x \|_2^2$ since $H$ preserves the norm. Also, $\| x \|_2^2 = | \chi_1 |^2 + \| x_2 \|_2^2$, and $z / \sqrt{z \bar z} = z / | z |$.

¹ For those who have problems thinking in terms of complex numbers, pretend this entire discussion deals with real numbers and treat $\theta$ as $\pm 1$.
Algorithm: $\left[ \begin{pmatrix} \rho \\ u_2 \end{pmatrix}, \tau \right] = \mathrm{Housev}\left( \begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix} \right)$

Simple formulation:
    $\chi_2 := \| x_2 \|_2$
    $\alpha := \left\| \begin{pmatrix} \chi_1 \\ \chi_2 \end{pmatrix} \right\|_2$ $(= \| x \|_2)$
    $\rho = -\mathrm{sign}(\chi_1) \| x \|_2$
    $\nu_1 = \chi_1 + \mathrm{sign}(\chi_1) \| x \|_2$
    $u_2 = x_2 / \nu_1$
    $\tau = (1 + u_2^H u_2)/2$

Efficient computation:
    $\chi_2 := \| x_2 \|_2$
    $\alpha := \left\| \begin{pmatrix} \chi_1 \\ \chi_2 \end{pmatrix} \right\|_2$ $(= \| x \|_2)$
    $\rho := -\mathrm{sign}(\chi_1)\, \alpha$
    $\nu_1 := \chi_1 - \rho$
    $u_2 := x_2 / \nu_1$
    $\chi_2 := \chi_2 / | \nu_1 |$ $(= \| u_2 \|_2)$
    $\tau := (1 + \chi_2^2)/2$

Figure 8.9: Computing the Householder transformation. Left: simple formulation. Right: efficient computation. Note: I have not completely double-checked these formulas for the complex case. They work for the real case.
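For concreteness, here is a small Python/NumPy sketch of Housev for the real case, following the "efficient computation" column of Fig. 8.9; it is my illustration, and the function name and the demo vector are mine.

```python
import numpy as np

def housev(x):
    """Compute rho, u = (1, u_2), tau so that (I - u u^T / tau) x = (rho, 0, ..., 0)^T.
    Real case, following the 'efficient computation' column of Fig. 8.9."""
    chi1, x2 = x[0], x[1:]
    chi2 = np.linalg.norm(x2)
    alpha = np.linalg.norm([chi1, chi2])            # = ||x||_2
    sign = 1.0 if chi1 >= 0 else -1.0
    rho = -sign * alpha
    nu1 = chi1 - rho                                # = chi_1 + sign(chi_1) ||x||_2
    u2 = x2 / nu1
    chi2 = chi2 / abs(nu1)                          # = ||u_2||_2
    tau = (1 + chi2**2) / 2                         # = u^T u / 2
    return rho, np.concatenate(([1.0], u2)), tau

x = np.array([3.0, 1.0, 2.0])
rho, u, tau = housev(x)
print(rho)                                          # -||x||_2, since chi_1 > 0
print(x - (u @ x) / tau * u)                        # applying H zeroes out all but the first entry
```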
The choice of $\theta$ is important when computer arithmetic is used and roundoff error is a concern. In practice one chooses $\nu_1 = \chi_1 + \mathrm{sign}(\chi_1) \| x \|_2$ to avoid creating catastrophic cancellation, so that $\theta = -\mathrm{sign}(\chi_1)$ ($\theta$ points in the opposite direction, in the complex plane, from $\chi_1$). If $\chi_1$ is real valued, this simply means that $\mathrm{sign}(\chi_1)$ equals the sign of $\chi_1$ in the usual sense. The reason is as follows (again restricting our discussion to real numbers): if $\chi_1$ is positive and $\| x \|_2$ is almost equal to $\chi_1$, then $\chi_1 - \| x \|_2$ is a small number, and if there is error in $\chi_1$ and/or $\| x \|_2$, this error becomes large relative to the result $\chi_1 - \| x \|_2$. Regardless of whether $\chi_1$ is positive, negative, or complex valued, this can be avoided by using $\nu_1 = \chi_1 + \mathrm{sign}(\chi_1) \| x \|_2$. It is not hard to see that $\mathrm{sign}(\chi_1) = \chi_1 / | \chi_1 |$.
Let us introduce the notation
\[ \left[ \begin{pmatrix} \rho \\ u_2 \end{pmatrix}, \tau \right] := \mathrm{Housev}\left( \begin{pmatrix} \chi_1 \\ x_2 \end{pmatrix} \right) \]
for the computation of the above mentioned vector $u_2$ and scalars $\rho$ and $\tau$ from vector $x$. We will use the notation $H(x)$ for the transformation $I - \frac{1}{\tau} u u^H$, where $u$ and $\tau$ are computed by $\mathrm{Housev}(x)$.
8.3.2 Algorithms

Let $A$ be an $m \times n$ matrix with $m \geq n$. We will now show how to compute $A \rightarrow QR$, the QR factorization, as a sequence of Householder transformations applied to $A$, which eventually zeroes out all elements of that matrix below the diagonal.
In the first iteration, we partition
\[ A \rightarrow \begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix} . \]
Let
\[ \left[ \begin{pmatrix} \rho_{11} \\ u_{21} \end{pmatrix}, \tau_1 \right] = \mathrm{Housev}\left( \begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix} \right) \]
be the Householder transform computed from the first column of $A$. Then applying this Householder transform to $A$ yields
\[ \begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix} := \left( I - \frac{1}{\tau_1} \begin{pmatrix} 1 \\ u_{21} \end{pmatrix} \begin{pmatrix} 1 \\ u_{21} \end{pmatrix}^H \right) \begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} \rho_{11} & a_{12}^T - w_{12}^T \\ 0 & A_{22} - u_{21} w_{12}^T \end{pmatrix}, \]
where $w_{12}^T = (a_{12}^T + u_{21}^H A_{22}) / \tau_1$. Computation of a full QR factorization of $A$ will now proceed with the updated matrix $A_{22}$.
Now let us assume that after $k$ iterations of the algorithm matrix $A$ contains
\[ A \rightarrow \begin{pmatrix} R_{TL} & R_{TR} \\ 0 & A_{BR} \end{pmatrix} = \begin{pmatrix} R_{00} & r_{01} & R_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix}, \]
where $R_{TL}$ and $R_{00}$ are $k \times k$ upper triangular matrices. Let
\[ \left[ \begin{pmatrix} \rho_{11} \\ u_{21} \end{pmatrix}, \tau_1 \right] = \mathrm{Housev}\left( \begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix} \right) \]
and update
\begin{align*}
A &:= \begin{pmatrix} I & 0 \\ 0 & \left( I - \frac{1}{\tau_1} \begin{pmatrix} 1 \\ u_{21} \end{pmatrix} \begin{pmatrix} 1 \\ u_{21} \end{pmatrix}^H \right) \end{pmatrix} \begin{pmatrix} R_{00} & r_{01} & R_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix} \\
&= \left( I - \frac{1}{\tau_1} \begin{pmatrix} 0 \\ 1 \\ u_{21} \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ u_{21} \end{pmatrix}^H \right) \begin{pmatrix} R_{00} & r_{01} & R_{02} \\ 0 & \alpha_{11} & a_{12}^T \\ 0 & a_{21} & A_{22} \end{pmatrix} 
= \begin{pmatrix} R_{00} & r_{01} & R_{02} \\ 0 & \rho_{11} & a_{12}^T - w_{12}^T \\ 0 & 0 & A_{22} - u_{21} w_{12}^T \end{pmatrix},
\end{align*}
where again $w_{12}^T = (a_{12}^T + u_{21}^H A_{22}) / \tau_1$.
Let
\[ H_k = \left( I - \frac{1}{\tau_1} \begin{pmatrix} 0_k \\ 1 \\ u_{21} \end{pmatrix} \begin{pmatrix} 0_k \\ 1 \\ u_{21} \end{pmatrix}^H \right) \]
be the Householder transform so computed during the $(k+1)$st iteration. Then upon completion matrix $A$ contains
\[ R = \begin{pmatrix} R_{TL} \\ 0 \end{pmatrix} = H_{n-1} \cdots H_1 H_0 \hat A, \]
where $\hat A$ denotes the original contents of $A$ and $R_{TL}$ is an upper triangular matrix. Rearranging this we find that
\[ \hat A = H_0 H_1 \cdots H_{n-1} R, \]
which shows that if $Q = H_0 H_1 \cdots H_{n-1}$ then $\hat A = QR$.
Exercise 8.11 Show that
\[ \begin{pmatrix} I & 0 \\ 0 & \left( I - \frac{1}{\tau_1} \begin{pmatrix} 1 \\ u_2 \end{pmatrix} \begin{pmatrix} 1 \\ u_2 \end{pmatrix}^H \right) \end{pmatrix} = \left( I - \frac{1}{\tau_1} \begin{pmatrix} 0 \\ 1 \\ u_2 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ u_2 \end{pmatrix}^H \right) . \]

Typically, the algorithm overwrites the original matrix $A$ with the upper triangular matrix, and at each step $u_{21}$ is stored over the elements that become zero, thus overwriting $a_{21}$. (It is for this reason that the first element of $u$ was normalized to equal 1.) In this case $Q$ is usually not explicitly formed, as it can be stored as the separate Householder vectors below the diagonal of the overwritten matrix. The algorithm that overwrites $A$ in this manner is given in Fig. 8.10.
We will let
\[ [UR, t] = \mathrm{URt}(A) \]
denote the operation that computes the QR factorization of an $m \times n$ matrix $A$, with $m \geq n$, via Householder transformations. It returns the Householder vectors and matrix $R$ in the first argument and the vector of scalars $\tau_i$ that are computed as part of the Householder transformations in $t$.
Theorem 8.12 Given $A \in \mathbb{C}^{m \times n}$, the cost of the algorithm in Figure 8.10 is given by
\[ C_{\mathrm{HQR}}(m, n) \approx 2 m n^2 - \frac{2}{3} n^3 \mbox{ flops.} \]

Proof: The bulk of the computation is in $w_{12}^T = (a_{12}^T + u_{21}^H A_{22}) / \tau_1$ and $A_{22} - u_{21} w_{12}^T$. During the $k$th iteration (when $R_{TL}$ is $k \times k$), this means a matrix-vector multiplication ($u_{21}^H A_{22}$) and a rank-1 update with matrix $A_{22}$, which is of size approximately $(m-k) \times (n-k)$, for a cost of $4(m-k)(n-k)$ flops. Thus the total cost is approximately
\begin{align*}
\sum_{k=0}^{n-1} 4 (m-k)(n-k) &= 4 \sum_{j=0}^{n-1} (m-n+j)\, j = 4 (m-n) \sum_{j=0}^{n-1} j + 4 \sum_{j=0}^{n-1} j^2 \\
&= 2 (m-n) n (n-1) + 4 \sum_{j=0}^{n-1} j^2 
\approx 2 (m-n) n^2 + 4 \int_0^n x^2 \, dx = 2 m n^2 - 2 n^3 + \frac{4}{3} n^3 = 2 m n^2 - \frac{2}{3} n^3 .
\end{align*}
Algorithm: $[A, t] = \mathrm{URt}(A)$

Partition $A \rightarrow \begin{pmatrix} UR_{TL} & R_{TR} \\ U_{BL} & A_{BR} \end{pmatrix}$ and $t \rightarrow \begin{pmatrix} t_T \\ t_B \end{pmatrix}$
    where $UR_{TL}$ is $0 \times 0$ and $t_T$ has 0 elements
while $n(A_{BR}) \neq 0$ do
    Repartition
        $\begin{pmatrix} UR_{TL} & R_{TR} \\ U_{BL} & A_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} UR_{00} & r_{01} & R_{02} \\ u_{10}^T & \alpha_{11} & a_{12}^T \\ U_{20} & a_{21} & A_{22} \end{pmatrix}$ and $\begin{pmatrix} t_T \\ t_B \end{pmatrix} \rightarrow \begin{pmatrix} t_0 \\ \tau_1 \\ t_2 \end{pmatrix}$
        where $\alpha_{11}$ and $\tau_1$ are scalars

    $\left[ \begin{pmatrix} \alpha_{11} \\ u_{21} \end{pmatrix}, \tau_1 \right] := \mathrm{Housev}\left( \begin{pmatrix} \alpha_{11} \\ a_{21} \end{pmatrix} \right)$
    Update $\begin{pmatrix} a_{12}^T \\ A_{22} \end{pmatrix} := \left( I - \frac{1}{\tau_1} \begin{pmatrix} 1 \\ u_{21} \end{pmatrix} \begin{pmatrix} 1 & u_{21}^H \end{pmatrix} \right) \begin{pmatrix} a_{12}^T \\ A_{22} \end{pmatrix}$ via the steps
        $w_{12}^T := (a_{12}^T + u_{21}^H A_{22}) / \tau_1$
        $\begin{pmatrix} a_{12}^T \\ A_{22} \end{pmatrix} := \begin{pmatrix} a_{12}^T - w_{12}^T \\ A_{22} - u_{21} w_{12}^T \end{pmatrix}$

    Continue with
        $\begin{pmatrix} UR_{TL} & R_{TR} \\ U_{BL} & A_{BR} \end{pmatrix} \leftarrow \begin{pmatrix} UR_{00} & r_{01} & R_{02} \\ u_{10}^T & \rho_{11} & r_{12}^T \\ U_{20} & u_{21} & A_{22} \end{pmatrix}$ and $\begin{pmatrix} t_T \\ t_B \end{pmatrix} \leftarrow \begin{pmatrix} t_0 \\ \tau_1 \\ t_2 \end{pmatrix}$
endwhile

Figure 8.10: Unblocked Householder transformation based QR factorization.
Figure 8.10: Unblocked Householder transformation based QR factorization.
8.3.3 Forming Q
Given A C
mn
, let [A, t] = URt(A) yield the matrix A with the Householder vectors stored
below the diagona, R stored on and above the diagonal, and the
i
stored in vector t. We
now discuss how to form the rst n columns of Q = H
0
H
1
H
n1
. Notice that to pick out
the rst n columns we must form
Q
_
I
nn
0
_
= H
0
H
n1
_
I
nn
0
_
= H
0
H
k1
H
k
H
n1
_
I
nn
0
_
. .
B
k
.
where B
k
is dened as indicated.
Lemma 8.13 B
k
has the form
B
k
= H
k
H
n1
_
I
nn
0
_
=
_
I
kk
0
0

B
k
_
.
Proof: The proof of this is by induction on k:
Base case: $k = n$. Then $B_n = \begin{pmatrix} I_{n \times n} \\ 0 \end{pmatrix}$, which has the desired form.

Inductive step: Assume the result is true for $B_k$. We show it is true for $B_{k-1}$:
\begin{align*}
B_{k-1} &= H_{k-1} H_k \cdots H_{n-1} \begin{pmatrix} I_{n \times n} \\ 0 \end{pmatrix} = H_{k-1} B_k = H_{k-1} \begin{pmatrix} I_{(k-1)\times(k-1)} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \tilde B_k \end{pmatrix} \\
&= \begin{pmatrix} I_{(k-1)\times(k-1)} & 0 \\ 0 & \left( I - \frac{1}{\tau_k} \begin{pmatrix} 1 \\ u_k \end{pmatrix} \begin{pmatrix} 1 & u_k^H \end{pmatrix} \right) \end{pmatrix} \begin{pmatrix} I_{(k-1)\times(k-1)} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \tilde B_k \end{pmatrix} \\
&= \begin{pmatrix} I_{(k-1)\times(k-1)} & 0 \\ 0 & \left( I - \frac{1}{\tau_k} \begin{pmatrix} 1 \\ u_k \end{pmatrix} \begin{pmatrix} 1 & u_k^H \end{pmatrix} \right) \begin{pmatrix} 1 & 0 \\ 0 & \tilde B_k \end{pmatrix} \end{pmatrix} \\
&= \begin{pmatrix} I_{(k-1)\times(k-1)} & 0 \\ 0 & \begin{pmatrix} 1 & 0 \\ 0 & \tilde B_k \end{pmatrix} - \begin{pmatrix} 1 \\ u_k \end{pmatrix} \begin{pmatrix} 1/\tau_k & y_k^T \end{pmatrix} \end{pmatrix} \quad \mbox{where } y_k^T = u_k^H \tilde B_k / \tau_k \\
&= \begin{pmatrix} I_{(k-1)\times(k-1)} & 0 \\ 0 & \begin{pmatrix} 1 - 1/\tau_k & -y_k^T \\ -u_k/\tau_k & \tilde B_k - u_k y_k^T \end{pmatrix} \end{pmatrix} 
= \begin{pmatrix} I_{(k-1)\times(k-1)} & 0 & 0 \\ 0 & 1 - 1/\tau_k & -y_k^T \\ 0 & -u_k/\tau_k & \tilde B_k - u_k y_k^T \end{pmatrix} 
= \begin{pmatrix} I_{(k-1)\times(k-1)} & 0 \\ 0 & \tilde B_{k-1} \end{pmatrix} .
\end{align*}

By the Principle of Mathematical Induction the result holds for $B_0, \ldots, B_n$.
Theorem 8.14 Given $[A, t] = \mathrm{URt}(A)$ from Figure 8.10, the algorithm in Figure 8.11 overwrites $A$ with the first $n = n(A)$ columns of $Q$ as defined by the Householder transformations stored below the diagonal of $A$ and in the vector $t$.

Proof: The algorithm is justified by the proof of Lemma 8.13.

Theorem 8.15 Given $A \in \mathbb{C}^{m \times n}$, the cost of the algorithm in Figure 8.11 is given by
\[ C_{\mathrm{FormQ}}(m, n) \approx 2 m n^2 - \frac{2}{3} n^3 \mbox{ flops.} \]

Proof: The proof for Theorem 8.12 can be easily modified to establish this result.
Algorithm: $[A] = \mathrm{FormQ}(A, t)$

Partition $A \rightarrow \begin{pmatrix} UR_{TL} & R_{TR} \\ U_{BL} & A_{BR} \end{pmatrix}$ and $t \rightarrow \begin{pmatrix} t_T \\ t_B \end{pmatrix}$
    where $UR_{TL}$ is $n(A) \times n(A)$ and $t_T$ has $n(A)$ elements
while $n(A_{TR}) \neq 0$ do
    Repartition
        $\begin{pmatrix} UR_{TL} & R_{TR} \\ U_{BL} & A_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} UR_{00} & r_{01} & R_{02} \\ u_{10}^T & \alpha_{11} & r_{12}^T \\ U_{20} & u_{21} & A_{22} \end{pmatrix}$ and $\begin{pmatrix} t_T \\ t_B \end{pmatrix} \rightarrow \begin{pmatrix} t_0 \\ \tau_1 \\ t_2 \end{pmatrix}$
        where $\alpha_{11}$ and $\tau_1$ are scalars

    Update $\begin{pmatrix} \alpha_{11} & a_{12}^T \\ a_{21} & A_{22} \end{pmatrix} := \left( I - \frac{1}{\tau_1} \begin{pmatrix} 1 \\ u_{21} \end{pmatrix} \begin{pmatrix} 1 & u_{21}^H \end{pmatrix} \right) \begin{pmatrix} 1 & 0 \\ 0 & A_{22} \end{pmatrix}$ via the steps
        $\alpha_{11} := 1 - 1/\tau_1$
        $a_{12}^T := -(u_{21}^H A_{22}) / \tau_1$
        $A_{22} := A_{22} + u_{21} a_{12}^T$
        $a_{21} := -u_{21} / \tau_1$

    Continue with
        $\begin{pmatrix} UR_{TL} & R_{TR} \\ U_{BL} & A_{BR} \end{pmatrix} \leftarrow \begin{pmatrix} UR_{00} & r_{01} & R_{02} \\ u_{10}^T & \alpha_{11} & a_{12}^T \\ U_{20} & a_{21} & A_{22} \end{pmatrix}$ and $\begin{pmatrix} t_T \\ t_B \end{pmatrix} \leftarrow \begin{pmatrix} t_0 \\ \tau_1 \\ t_2 \end{pmatrix}$
endwhile

Figure 8.11: Algorithm for overwriting $A$ with $Q$ from the Householder transformations stored as Householder vectors below the diagonal of $A$ (as produced by $[A, t] = \mathrm{URt}(A)$).
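The backward sweep of Fig. 8.11 can be sketched in Python/NumPy as follows (my illustration, not the notes' own code). The check at the end builds $Q = H_0 \cdots H_{n-1}$ explicitly from randomly chosen Householder vectors, with each $\tau_k = u_k^H u_k / 2$; those test inputs are an assumption made only for the purpose of exercising the routine.

```python
import numpy as np

def form_q(UR, t):
    """Form the first n columns of Q = H_0 ... H_{n-1} from the factored form:
    Householder vectors (with implicit leading 1) below the diagonal of UR, taus in t.
    Follows the backward sweep of Fig. 8.11."""
    A = np.array(UR, dtype=float, copy=True)
    m, n = A.shape
    for k in range(n - 1, -1, -1):
        u2 = A[k+1:, k].copy()
        tau = t[k]
        a12 = -(u2 @ A[k+1:, k+1:]) / tau                # a_12^T := -(u_21^H A_22)/tau_1
        A[k+1:, k+1:] += np.outer(u2, a12)               # A_22 := A_22 + u_21 a_12^T
        A[k, k+1:] = a12
        A[k, k] = 1.0 - 1.0 / tau                        # alpha_11 := 1 - 1/tau_1
        A[k+1:, k] = -u2 / tau                           # a_21 := -u_21/tau_1
    return A

# check against explicitly formed H_0 H_1 ... H_{n-1} [I; 0]
rng = np.random.default_rng(5)
m, n = 6, 3
UR = rng.standard_normal((m, n))        # only the strictly lower triangular part is used
t = np.empty(n)
Q_explicit = np.eye(m)
for k in range(n):
    u = np.zeros(m)
    u[k] = 1.0
    u[k+1:] = UR[k+1:, k]
    t[k] = (u @ u) / 2                  # tau_k = u^H u / 2, so each H_k is a reflector
    Q_explicit = Q_explicit @ (np.eye(m) - np.outer(u, u) / t[k])
print(np.allclose(form_q(UR, t), Q_explicit[:, :n]))
```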
Exercise 8.16 If $m = n$, then $Q$ could be accumulated by the sequence
\[ Q = ( \cdots ((I H_0) H_1) \cdots H_{n-1} ) . \]
Give a high-level reason why this would be (much) more expensive than the algorithm in Figure 8.11.
8.4 Solving Linear Least-Squares Problems
Chapter 9

Eigenvalues and Eigenvectors

9.1 Motivating Example

9.2 Problem Statement

Definition 9.1 Let $A \in \mathbb{R}^{n \times n}$. Then $\lambda \in \mathbb{C}$ and $x \in \mathbb{C}^n$ are said to be an eigenvalue of $A$ and corresponding eigenvector of $A$ if $x \neq 0$ and $Ax = \lambda x$.

Definition 9.2 The set of all eigenvalues of $A$ is denoted by $\Lambda(A)$.

Note: $\mathbb{C}$ is the set of all complex numbers. We have not talked about complex numbers before. We will see that a real valued matrix can have complex valued eigenvalues, in which case it has complex valued eigenvectors. The results we present carry through for complex valued matrices $A$ as well.
Example 9.3 Consider $Ax = \lambda x$. Then $(A - \lambda I)x = 0$. What does this mean???

Theorem 9.4 Let $A \in \mathbb{R}^{n \times n}$. Then the following are equivalent:
- $\lambda \in \mathbb{C}$ is an eigenvalue of $A$.
- There exists $x \neq 0$ such that $Ax = \lambda x$.
- $A - \lambda I$ is singular.
- $A - \lambda I$ has a nontrivial null space.

Example 9.5 Consider $Ax = \lambda x$.
- $A(3x) = \ldots$. What does this mean???
- $A(-x) = \ldots$. What does this mean???
A =
_
3 0
0 1
_
What are the eigenvalues of A?
0 = (A I)x =
__
3 0
0 1
_

_
1 0
0 1
__
x
=
_
(3 ) 0
0 (1 )
_
x
Under what circumstances does
_
(3 ) 0
0 (1 )
_
have a nontrivial null space?
What does that mean about the eigenvalues of A?
How do you compute eigenvectors corresponding to the eigenvalues?
Theorem 9.7 Let
\[ A = \begin{pmatrix} \delta_0 & 0 & \cdots & 0 \\ 0 & \delta_1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \delta_{n-1} \end{pmatrix} \]
be a diagonal matrix. Then the eigenvalues of $A$ equal the diagonal elements of the matrix: $\Lambda(A) = \{ \delta_0, \delta_1, \ldots, \delta_{n-1} \}$.
Note that $A e_i = \delta_i e_i$.

Exercise 9.8 Let $A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}$.
- Give $\Lambda(A)$.
- For each eigenvalue, give an eigenvector.

Exercise 9.9 Let $A = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 2 & 0 \\ 2 & 0 & 0 \end{pmatrix}$.
- Give $\Lambda(A)$.
- For each eigenvalue, give an eigenvector.
Example 9.10 Let $A = \begin{pmatrix} 3 & 2 \\ 0 & -1 \end{pmatrix}$.
- What are the eigenvalues of $A$?
\[ 0 = (A - \lambda I)x = \left( \begin{pmatrix} 3 & 2 \\ 0 & -1 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right) x = \begin{pmatrix} (3 - \lambda) & 2 \\ 0 & (-1 - \lambda) \end{pmatrix} x \]
- Under what circumstances does $\begin{pmatrix} (3 - \lambda) & 2 \\ 0 & (-1 - \lambda) \end{pmatrix}$ have a nontrivial null space?
  When $\lambda = 3$ or $\lambda = -1$, since only then does the matrix fail to have linearly independent columns.
- What does that mean about the eigenvalues of $A$? $\Lambda(A) = \{ 3, -1 \}$.
- How do you compute eigenvectors corresponding to the eigenvalues?
  Let $\lambda = 3$. Then
  \[ \begin{pmatrix} (3 - \lambda) & 2 \\ 0 & (-1 - \lambda) \end{pmatrix} = \begin{pmatrix} 0 & 2 \\ 0 & -4 \end{pmatrix} . \]
  Reduce to row echelon form: $\begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix}$.
  Where are the pivots? Which variable is a free variable? How do you find a vector in the null space?
  Repeat for $\lambda = -1$.

In other words, once you know an eigenvalue, $\lambda$, you can compute an eigenvector by finding a vector in the null space of $A - \lambda I$, using the techniques you learned in an earlier chapter.
Theorem 9.11 Let
\[ A = \begin{pmatrix} \upsilon_{00} & \upsilon_{01} & \cdots & \upsilon_{0,n-1} \\ 0 & \upsilon_{11} & \cdots & \upsilon_{1,n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \upsilon_{n-1,n-1} \end{pmatrix} \]
be an upper triangular matrix. Then $\Lambda(A) = \{ \upsilon_{00}, \upsilon_{11}, \ldots, \upsilon_{n-1,n-1} \}$.

Exercise 9.12 Find the eigenvalues and eigenvectors of the matrix $A = \begin{pmatrix} 2 & 1 & 3 \\ 0 & 3 & 1 \\ 0 & 0 & 1 \end{pmatrix}$.
Example 9.13 Let $A \in \mathbb{R}^{m \times n}$ have $n$ linearly independent columns. Consider the matrix $B = A (A^T A)^{-1} A^T$, which projects a vector $x$ onto the column space of $A$.
- If $x$ is already in the column space of $A$, then $Bx = x$. (Because projecting that vector should just return that vector.) Conclude that $\lambda = 1$ is an eigenvalue of $B$ and that there are $r = \mathrm{rank}(A)$ linearly independent eigenvectors that correspond to that eigenvalue. (The columns of $A$ are such eigenvectors!)
- If $x$ is in the left null space of $A$, then $Bx = 0$. Conclude that 0 is an eigenvalue of $B$ if $m > n$. How would you characterize the eigenvalues of $B$?
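A quick check of this example in Python/NumPy (my sketch, not from the notes): the eigenvalues of the projector $B$ come out as $n$ ones and $m - n$ zeros.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))                  # m > n, linearly independent columns
B = A @ np.linalg.solve(A.T @ A, A.T)            # B = A (A^T A)^{-1} A^T
lam = np.sort(np.linalg.eigvals(B).real)
print(np.round(lam, 10))                         # expect m - n zeros and n ones
```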
9.2.1 Eigenvalues and eigenvectors of a $2 \times 2$ matrix

Let $A = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$. Then
\[ \frac{1}{\alpha\delta - \beta\gamma} \begin{pmatrix} \delta & -\beta \\ -\gamma & \alpha \end{pmatrix} \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} = \frac{1}{\alpha\delta - \beta\gamma} \begin{pmatrix} \alpha\delta - \beta\gamma & 0 \\ 0 & \alpha\delta - \beta\gamma \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \]
so that $A$ is invertible whenever $\alpha\delta - \beta\gamma \neq 0$. We conclude that $A$ has a nontrivial null space if and only if $\alpha\delta - \beta\gamma = 0$. This quantity, $\alpha\delta - \beta\gamma$, is called the determinant of the given $A$.

Definition 9.14 Let $A = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$. Then its determinant is given by $\det(A) = \alpha\delta - \beta\gamma$.
Example 9.15 Consider
\[ A = \begin{pmatrix} 4 & -1 \\ 5 & -2 \end{pmatrix} . \]
Find the eigenvalues of $A$:
\[ A - \lambda I = \begin{pmatrix} 4 - \lambda & -1 \\ 5 & -2 - \lambda \end{pmatrix} \]
Set $\det(A - \lambda I) = 0$:
\[ 0 = \det(A - \lambda I) = (4 - \lambda)(-2 - \lambda) - (-1)(5) = -8 - 2\lambda + \lambda^2 + 5 = \lambda^2 - 2\lambda - 3 = (\lambda + 1)(\lambda - 3) . \]
The eigenvalues are $-1$ and $3$.

Exercise 9.16 Find the eigenvectors for the last example.
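For a $2 \times 2$ matrix the characteristic polynomial is the quadratic $\lambda^2 - (\alpha + \delta)\lambda + (\alpha\delta - \beta\gamma)$, so the eigenvalues follow from the quadratic formula. The Python sketch below (mine, not from the notes) applies this to the matrix of Example 9.15 and compares with a library routine; the function name is my own.

```python
import numpy as np

def eig_2x2(alpha, beta, gamma, delta):
    """Eigenvalues of [[alpha, beta], [gamma, delta]] as roots of
    lambda^2 - (alpha + delta) lambda + (alpha delta - beta gamma) = 0."""
    trace = alpha + delta
    det = alpha * delta - beta * gamma
    disc = np.lib.scimath.sqrt(trace**2 - 4 * det)   # complex square root if needed
    return (trace + disc) / 2, (trace - disc) / 2

print(eig_2x2(4, -1, 5, -2))       # matrix of Example 9.15: eigenvalues 3 and -1
print(np.linalg.eigvals(np.array([[4.0, -1.0], [5.0, -2.0]])))
```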
Example 9.17 Let $A = \begin{pmatrix} 3 & 0 \\ 0 & -1 \end{pmatrix}$. Then $\det(A - \lambda I) = \det\left( \begin{pmatrix} 3 - \lambda & 0 \\ 0 & -1 - \lambda \end{pmatrix} \right) = (3 - \lambda)(-1 - \lambda) - 0 \cdot 0$. Thus the roots of $(3 - \lambda)(\lambda + 1)$ equal the eigenvalues: $\Lambda(A) = \{ 3, -1 \}$.

Example 9.18 Let $A = \begin{pmatrix} 3 & 2 \\ 0 & -1 \end{pmatrix}$. Then $\det(A - \lambda I) = \det\left( \begin{pmatrix} 3 - \lambda & 2 \\ 0 & -1 - \lambda \end{pmatrix} \right) = (3 - \lambda)(-1 - \lambda) - 0 \cdot 2$. Thus the roots of $(3 - \lambda)(\lambda + 1)$ equal the eigenvalues: $\Lambda(A) = \{ 3, -1 \}$.
Theorem 9.19 Let $A \in \mathbb{R}^{n \times n}$, $\lambda \in \mathbb{C}$, and $x \in \mathbb{C}^n$ with $x \neq 0$ and $Ax = \lambda x$. Then
\[ \lambda = \frac{x^H A x}{x^H x} . \]
Here
\[ x^H x = \bar\chi_0 \chi_0 + \bar\chi_1 \chi_1 + \cdots + \bar\chi_{n-1} \chi_{n-1} = | \chi_0 |^2 + | \chi_1 |^2 + \cdots + | \chi_{n-1} |^2 , \]
where $\bar\chi$ means the complex conjugate of $\chi$: if $\chi = \alpha + i\beta$ then $\bar\chi = \alpha - i\beta$ and $| \chi |^2 = \bar\chi \chi = (\alpha - i\beta)(\alpha + i\beta) = \alpha^2 - i^2 \beta^2 = \alpha^2 + \beta^2$.
Notice that if $x \in \mathbb{R}^n$ then $x^H x = x^T x$.

Exercise 9.20 Let $A = \begin{pmatrix} 1 & 1 \\ 2 & 4 \end{pmatrix}$. Find the eigenvalues and eigenvectors.