CHAPTER 1

Linear Equations

1.1 The Method of Elimination.


The solution of problems which arise in the real world frequently requires the calculation of
unknown quantities. In the simplest cases, the unknowns are numbers and are found by solving
equations which the unknown numbers must satisfy. We begin with an example.
A mining company owns three mines which produce silver, lead and zinc. The ores from
the three mines produce different quantities of each of the metals. The following table gives the
quantities per tonne of ore.
Mine   Silver(100gm)   Lead(100kg)   Zinc(100kg)
  1          1              2             3
  2          2              1             1
  3          1              3             1

The mining company has an order for 11,000 gm of silver, 17,000 kg of lead and 10,000 kg of zinc.
How much ore must be taken from each mine in order to fill the order?
Clearly the company could fill the order by taking say 110 tonnes of ore from the first mine.
This would produce 11,000 gm of silver, 22,000 kg of lead and 33,000 kg of zinc. This is a very
wasteful solution and the company would wish to avoid such a course of action. What is required
is an extraction regime from the three mines which fills the order exactly. It is not obvious what
the solution might be or even whether there is a solution.
The first step in dealing with a problem such as this is to formulate the problem mathematically.
This step is often referred to as formulating a mathematical model of the problem. Mathematical
calculations can then be carried out to find the solution. Such calculations may not be easy, but
they are almost always very much simpler than attempting to solve the problem in any other way.
The problem requires some unknowns to be found. These are the amounts of ore which must
be taken from each mine and so we begin by giving symbolic names to these quantities. Let the
amount of ore taken from the first mine be x tonnes, the amount from the second mine be y tonnes
and the amount from the third be z tonnes. The method used in mathematics to find unknowns
is to formulate equations which the unknowns must satisfy and then to solve these equations. The
equations constitute the mathematical model of the real world problem.
Consider first the quantity of silver which must be produced. Since x tonnes of ore are taken
from the first mine and each tonne of ore produces 100 gm of silver, so the first mine will give
1 × x × 100 gm of silver. Similarly the second mine will give 2 × y × 100 gm of silver and the third
will give 1 × z × 100 gm. The total amount of silver produced is then 100x + 200y + 100z gm and
this is required to be 11,000 gm. We thus require that the equation

100x + 200y + 100z = 11,000

or

x + 2y + z = 110

be satisfied by x, y and z. By similar considerations with the amounts of lead and zinc, we obtain
two further equations

2x + 1y + 3z = 170

and

3x + 1y + 1z = 100.

To avoid the large numbers on the right hand sides of these equations, we change the definitions
of x, y and z to be the number of tens of tonnes of ore taken from each mine. The amount of ore
taken from the first mine is now 10x tonnes and so the amount of silver obtained is 10x × 100 gm.
With similar calculations for the other mines, the equation for the production of silver becomes

1,000x + 2,000y + 1,000z = 11,000

or

x + 2y + z = 11.

The other two equations change in a similar way and so the set of equations to be solved is
x + 2y + z = 11,
2x + y + 3z = 17,
3x + y + z = 10.
These equations are called algebraic as the unknowns are numbers and the only operations
which occur are those of simple arithmetic. Further, they are linear because only first powers of
the unknowns occur and there are no products of unknowns. These definitions can be made more
precise, but as yet we have no reason to do so. For any equations, there are three questions which
can be asked.
1. Are there any solutions of the equations? In the example above it will certainly be possible
for the company to fill its order by taking out very large amounts of ore from each mine, but
the question we are asking is whether it is possible to fill the order with no waste.
2. If there are solutions, how many are there? In the example the question is whether the company
can fill the order in a variety of different ways or in just one way.
3. Given that there are one or more solutions of the equations, how do we find them?
These questions can be asked about any equations and the way in which they are answered
varies with the type and complexity of the equations in question. For a set of linear equations,
such as the one formulated above, one method of solution is the method of elimination. In this
method we first eliminate one of the variables between two of the equations and then eliminate the
same variable between another two of the equations. To describe the method, we must number the
equations.
x + 2y + z = 11,     (1)
2x + y + 3z = 17,    (2)
3x + y + z = 10.     (3)

We can eliminate z between the first two equations by multiplying the first by 3 and subtracting
the result from the second to obtain

−x − 5y = −16.       (4)


To eliminate z between the first and third equations, we simply subtract equation (1) from equation
(3) to obtain

2x − y = −1.         (5)

We now have two equations in the two unknowns x and y. To eliminate y, we multiply equation
(5) by five and subtract equation (4). We then obtain

11x = 11.            (6)

From (6), we obtain x = 1 and then from (5) we obtain y = 3. Finally from (1), we find z = 4. The
solution which the company is seeking is that it should take 10 tonnes of ore from its first mine, 30
tonnes from its second mine and 40 tonnes from the third mine.
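As a quick illustration, the computed amounts can be substituted back into the three equations. The following minimal Python check (the variable names are our own choices, not part of the text) does exactly this:

    x, y, z = 1, 3, 4             # tens of tonnes from mines 1, 2 and 3
    assert x + 2*y + z == 11      # silver
    assert 2*x + y + 3*z == 17    # lead
    assert 3*x + y + z == 10      # zinc
    print(10*x, 10*y, 10*z, "tonnes fill the order exactly")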
These calculations have answered the three questions simultaneously. There is a solution, there
is only one and we have computed it. For sets of linear equations, it may appear that there is little
more to be said, but this is far from the case. There are a variety of complications which can occur.
1. In the example above, we have only three unknowns. In more complex problems there may be
hundreds or perhaps thousands of unknowns. In order to solve such problems computers must
be used and a simple and efficient method of automating the method of elimination must be
used.
2. In some problems which arise in practice there are unequal numbers of equations and unknowns.
Our methods of solution should be able to handle these situations as well as the simple case
of equal numbers of each.
3. In the problem outlined above, the mining company will receive many orders over time and so
it will have to solve the given equations many times. The sets of equations however will always
have the same left hand sides as the left hand side of each equation is determined only by the
properties of the mine. The right hand sides are determined by the orders and so will vary. In
solving the sets of equations we should be able to avoid performing the same calculations on
the left hand sides every time. How can we achieve this?

1.2 Gaussian Reduction.


The first problem to be considered is the automation of the elimination method described in
the previous section. In the earlier calculation we eliminated z first and then y. It is irrelevant
which variable is eliminated first and for hand calculations we choose a sequence of eliminations
which produces the simplest numbers. For a computer however a fixed sequence of eliminations
must be specified. The order which is usually used is to eliminate x first and then y as in the
following calculation.
x + 2y + z = 11,     (1)
2x + y + 3z = 17,    (2)
3x + y + z = 10.     (3)

We subtract twice the first equation from the second and three times the first equation from the
third to obtain

−3y + z = −5,        (4)
−5y − 2z = −23.      (5)

To eliminate y, we multiply equation (5) by three and subtract five times equation (4). This gives

−11z = −44.          (6)

From (6), we find z = 4, then from (5), y = 3 and finally from (1) we obtain x = 1.
This calculation involves only the coefficients of x, y and z, together with the numbers on the
right hand sides of the equations. It can thus be written out as a calculation with this collection
of numbers, with the unknowns omitted. To do this we begin with the array of numbers obtained
from the equations.

1 2 1 11
2 1 3 17
3 1 1 10

It is useful to separate the right hand sides of the equations and we shall do this by inserting a
vertical line in the array, although this line is of no significance in performing the calculations.

1 2 1 | 11
2 1 3 | 17
3 1 1 | 10

Definition 1.1
The array of numbers obtained from a set of simultaneous linear equations by removing
the unknowns and separating the right hand sides by a vertical stroke is called the
augmented array of the set of equations.

The operations previously carried out on the equations can now be performed on the rows of
numbers in the array. To describe these operations, we number each row in the array. The first
step is to subtract twice row 1 from row 2. This operation does not change row 1. We shall denote
the operation by R2 → R2 − 2R1. Thus the array becomes

1  2  1 | 11
0 −3  1 | −5     (R2 → R2 − 2R1)
3  1  1 | 10

A similar operation is performed to produce a zero as the first element of the third row.

1  2  1 | 11
0 −3  1 | −5
0 −5 −2 | −23    (R3 → R3 − 3R1)

The next step is to obtain a zero as the second element of the third row. We can do this by
multiplying the third row by 3 to obtain

1   2   1 | 11
0  −3   1 | −5
0 −15  −6 | −69   (R3 → 3R3)


and then subtracting 5 times row 2 from row 3.

1   2    1 | 11
0  −3    1 | −5
0   0  −11 | −44   (R3 → R3 − 5R2)

The solution can be found from this array. From the third row we find −11z = −44 and so z = 4.
The second row gives −3y + z = −5 and so y = 3. Finally, from the first row x + 2y + z = 11 and
so x = 1. There is some terminology used to describe the procedure we have just outlined.

Definition 1.2
1. In an array, the element in the ith row and jth column is called the (i, j) element.
2. The main diagonal of an augmented array is the diagonal beginning at the top left
hand corner and proceeding downwards and to the right, but not crossing the vertical
stroke.

The following diagram shows an array with the main diagonal circled. This array is square to
the left of the vertical line.

a11 ... a1i ... a1n | b1
 .       .       .  |  .
ai1 ... aii ... ain | bi
 .       .       .  |  .
an1 ... ani ... ann | bn

To solve a set of simultaneous linear equations by elimination, we seek to transform the augmented array into an array in which all elements below the main diagonal are zero. From this new
array, the unknowns can be easily found.

Definition 1.3
1. The process of transforming an augmented array into an array in which all elements
below the main diagonal are zero is called Gaussian reduction.
2. The process of finding the unknowns successively from the reduced array is called
back substitution.
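The whole procedure is short enough to express in a few lines of code. The following Python sketch is illustrative only (the function name, the row layout and the use of exact fractions are our own choices, not taken from the text); it performs the Gaussian reduction and then back substitutes:

    from fractions import Fraction

    def gaussian_solve(rows):
        # rows[i] holds the coefficients of equation i followed by its right hand side
        rows = [[Fraction(v) for v in r] for r in rows]
        n = len(rows)
        for i in range(n):
            if rows[i][i] == 0:                  # interchange rows if a zero
                for k in range(i + 1, n):        # appears on the main diagonal
                    if rows[k][i] != 0:
                        rows[i], rows[k] = rows[k], rows[i]
                        break
            for k in range(i + 1, n):            # clear the entries below the diagonal
                m = rows[k][i] / rows[i][i]
                rows[k] = [a - m * b for a, b in zip(rows[k], rows[i])]
        x = [Fraction(0)] * n                    # back substitution, last row first
        for i in range(n - 1, -1, -1):
            s = sum(rows[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (rows[i][n] - s) / rows[i][i]
        return x

    print(gaussian_solve([[1, 2, 1, 11], [2, 1, 3, 17], [3, 1, 1, 10]]))
    # prints [Fraction(1, 1), Fraction(3, 1), Fraction(4, 1)], that is x = 1, y = 3, z = 4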

Gaussian reduction does not always work in such a straightforward way as it did in the above
example and we shall progressively refine the procedure as we consider examples in which difficulties
of various types arise.


The first difficulty that we shall consider is the appearance of a zero on the main diagonal at
some stage of the calculation. This will of course prevent the other elements in the column being
reduced to zero. This difficulty is easily dealt with. We simply interchange rows so that a non-zero
element appears on the main diagonal. This operation corresponds to writing the original equations
in a different order and this will certainly not change the solutions.
Example 1.1
We consider an example with four equations in four unknowns.

x + 2y + z + 3t = 4
2x + 4y + 3z + t = 5
−x + 2y − z + 2t = −3
2x + y + 3z − t = 6

The augmented array of the set of equations is

 1  2  1  3 |  4
 2  4  3  1 |  5
−1  2 −1  2 | −3
 2  1  3 −1 |  6

The reduction of the array is carried out column by column beginning at the left hand side of
the array. The reduction however is performed by using row operations and is often referred to as
row reduction.

1  2  1  3 |  4
0  0  1 −5 | −3     (R2 → R2 − 2R1)
0  4  0  5 |  1     (R3 → R3 + R1)
0 −3  1 −7 | −2     (R4 → R4 − 2R1)

We now encounter the problem that the second element of the second row is zero and so cannot be
used to reduce the elements below it. This can be overcome by interchanging the second and third
rows.

1  2  1  3 |  4
0  4  0  5 |  1     (R2 ↔ R3)
0  0  1 −5 | −3
0 −3  1 −7 | −2

The reduction of the second column can now be carried out.

1    2  1    3 |  4
0    4  0    5 |  1
0    0  1   −5 | −3
0  −12  4  −28 | −8     (R4 → 4R4)


1  2  1    3 |  4
0  4  0    5 |  1
0  0  1   −5 | −3
0  0  4  −13 | −5     (R4 → R4 + 3R2)

The third column is easily reduced.

1  2  1   3 |  4
0  4  0   5 |  1
0  0  1  −5 | −3
0  0  0   7 |  7     (R4 → R4 − 4R3)

From this array, the solution of the set of equations can be obtained by back substitution as t = 1,
z = 2, y = −1 and x = 1.

The method used in this example will fail if the column in which the zero appears contains
only zeros below the main diagonal. In this case there is no suitable row to be interchanged with
the one in question. Because of this possibility, the general description of Gaussian reduction is
more elaborate than we have given here. We shall not however consider sets of equations which
give rise to these more complicated situations.
The method of Gaussian reduction is simply a systematic and concise way of writing out the
method of elimination. The method can be described as the use of row operations on the augmented
array in order to transform it to the reduced array in which all elements below the main diagonal
are zero. We begin at the top left hand corner and use the element there to reduce all elements
below it to zero. We then use the second element in the second column to reduce all elements below
it to zero and so on.
In reducing an array in the manner just described, three types of row operation are permitted.
Each corresponds to an operation on the original set of equations. These operations do not change
the solutions of the equations and this is the reason why they can be used in transforming the
augmented array. The operations are as follows.
1. Multiplication or division of a row by a constant.
2. Interchange of two rows.
3. Addition or subtraction of one row multiplied by a constant to another row.
The first of these operations was used in Example 1.1, but an array can be reduced without
it. It was used above to avoid fractions, but in a calculation carried out on a computer, it is
irrelevant whether or not fractions or decimal numbers appear. In the previous example, the step
of multiplying the fourth row by 4 could be omitted. The reduction of the second column would
then be carried out by adding (3/4)R2 to R4.

1  2  1      3 |    4
0  4  0      5 |    1
0  0  1     −5 |   −3
0  0  1  −13/4 | −5/4     (R4 → R4 + (3/4)R2)

The appearance of fractions makes the hand calculation very tedious but makes no difference
to the computer. This example also illustrates the fact that there are many ways to carry out a
reduction, and many different reduced arrays, for a given set of equations. All reduced arrays will of
course be equivalent, in the sense that they give the same solutions to the original set of equations.
In some cases the reduction of the array can be carried further than is the case with Gaussian
reduction. If the array to the left of the line is square and all elements on the main diagonal in the


Gaussian reduction are non-zero, then all elements above the main diagonal can also be reduced to
zero. The solutions of the equations can then simply be read from the final array.
Example 1.2
We carry on the reduction of the last example. Again the reduction is done column by column
using row operations. We begin with the Gaussian array.

1  2  1   3 |  4
0  4  0   5 |  1
0  0  1  −5 | −3
0  0  0   7 |  7

1  2  1    3 |   4
0  1  0  5/4 | 1/4     (R2 → R2/4)
0  0  1   −5 |  −3
0  0  0    1 |   1     (R4 → R4/7)

1  0  1  1/2 | 7/2     (R1 → R1 − 2R2)
0  1  0  5/4 | 1/4
0  0  1   −5 |  −3
0  0  0    1 |   1

1  0  0  11/2 | 13/2   (R1 → R1 − R3)
0  1  0   5/4 |  1/4
0  0  1    −5 |   −3
0  0  0     1 |    1

1  0  0  0 |  1        (R1 → R1 − (11/2)R4)
0  1  0  0 | −1        (R2 → R2 − (5/4)R4)
0  0  1  0 |  2        (R3 → R3 + 5R4)
0  0  0  1 |  1

From this array, the solution can be read off immediately and back substitution is not needed.

Definition 1.4
The process of transforming an array which is square to the left of the vertical line
into one in which all elements on the main diagonal are one and all elements above and
below the main diagonal are zero is called Gauss-Jordan reduction.
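In code, the extra work over the Gaussian stage is one more loop that clears the entries above each pivot after dividing the row through by it. A minimal illustrative sketch (our own function, following the same conventions as the earlier one):

    from fractions import Fraction

    def gauss_jordan(rows):
        rows = [[Fraction(v) for v in r] for r in rows]
        n = len(rows)
        for i in range(n):
            if rows[i][i] == 0:                       # bring a non-zero pivot up
                for k in range(i + 1, n):
                    if rows[k][i] != 0:
                        rows[i], rows[k] = rows[k], rows[i]
                        break
            rows[i] = [v / rows[i][i] for v in rows[i]]   # make the pivot 1
            for k in range(n):                        # clear the rest of the column
                if k != i and rows[k][i] != 0:
                    m = rows[k][i]
                    rows[k] = [a - m * b for a, b in zip(rows[k], rows[i])]
        return rows

    for r in gauss_jordan([[1, 2, 1, 3, 4], [2, 4, 3, 1, 5],
                           [-1, 2, -1, 2, -3], [2, 1, 3, -1, 6]]):
        print(r)    # the last column reads x = 1, y = -1, z = 2, t = 1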

Gauss-Jordan reduction appears to be a more efficient method than Gaussian reduction. It is
plausible however, and can be demonstrated, that it requires more operations to reduce the array
from Gauss to Gauss-Jordan form than it does to carry out the back substitution. It is thus more
efficient to solve sets of equations by Gaussian rather than Gauss-Jordan reduction. There are
however other uses of Gauss-Jordan reduction and we shall later consider one of these.

1.3 Special Cases.


We return to the example of the mining company with three mines each producing silver, lead
and zinc. We shall consider the same problem as before, with the company having to find the
amount of ore to be extracted from each mine to fill an order without wastage. We now suppose
that the amounts of metal produced by the ore from each mine are given by the following table.
Mine   Silver(100gm)   Lead(100kg)   Zinc(100kg)
  1          2              4             3
  2          1              5             3
  3          4              6             5

The order to be filled is for 2,000 gm of silver, 3,000 kg of lead and 5,000 kg of zinc. We introduce
the same unknowns x, y and z as before and construct the equations which must be solved.

2x + y + 4z = 2,
4x + 5y + 6z = 3,
3x + 3y + 5z = 5.

The augmented array of the set of equations is easily written down.

2 1 4 | 2
4 5 6 | 3
3 3 5 | 5

At this stage there appears to be little difference between this case and the one presented
earlier. When we carry out the reduction however a significant difference becomes apparent.

1 2 1 | 3     (R1 → R3 − R1)
4 5 6 | 3
3 3 5 | 5

1  2  1 |  3
0 −3  2 | −9     (R2 → R2 − 4R1)
0 −3  2 | −4     (R3 → R3 − 3R1)

1  2  1 |  3
0 −3  2 | −9
0  0  0 |  5     (R3 → R3 − R2)

The array has been reduced to the required form, but the final array has a row of zeros to
the left of the line. This cannot be reduced further because we can only exchange a row with
a row below it in the array. If we seek to use a row above the row, then the reduction already
carried out will be destroyed. Each row in any array corresponds to an equation and the equation
corresponding to the last row of the above array is 0x + 0y + 0z = 5. For any values of x, y and
z, this gives 0 = 5, which is impossible. As a consequence, the original set of equations has no
solutions. In terms of the original problem, this means that the mining company cannot fill the
given order without wastage. It can of course fill the order by extracting a sufficiently large amount
of ore from any one of its mines, but there will always be product left over unless the equations are
satisfied.

Definition 1.5
A set of simultaneous linear equations for which the reduced array contains a row
with zeros to the left of the line and a non-zero number to the right is called inconsistent.

It should be noted that it is not possible for a set of equations to be inconsistent if every
equation has right hand side zero. A set of equations with this property is given a particular name.

Definition 1.6
1. A set of simultaneous linear equations is called homogeneous if every equation in the
set has right hand side zero.
2. If at least one right hand side is non-zero, then the set is called non-homogeneous.

Homogeneous equations do not of course occur in the example of the mining company, because
in any real order, a nonzero amount of at least one of the metals will be required. We shall however
encounter homogeneous sets of equations in another context later and the fact that such equations
cannot be inconsistent will be important.
An inconsistent set of equations has no solutions. We have now seen that a set of simultaneous
linear equations may have one solution or it may have no solutions. There is one other possibility
and again we illustrate it with an example from the mining company. Suppose that the three mines
produce ore as in the previous case but the order is now for 2,000 gm of silver, 8,000 kg of lead
and 5,000 kg of zinc. The calculations are almost the same as before, the only difference being in
the elements on the right hand side of the dividing line. The equations are

2x + y + 4z = 2,
4x + 5y + 6z = 8,
3x + 3y + 5z = 5.


and the augmented array is

2 1 4 | 2
4 5 6 | 8
3 3 5 | 5

The Gaussian reduction is almost identical to that given above and the result is

2  1   4 | 2
0  3  −2 | 4
0  0   0 | 0

Again we have a row of zeros to the left of the line, but the corresponding element to the right of
the line is also zero. The equation corresponding to this line is 0 = 0 and this is satisfied for any
values of x, y and z. There are then only two equations to be solved for the three unknowns. In
fact we can let z have any value at all and we can find the corresponding values for x and y. Let
z = a. Then from the second row of the array we find y = (4 + 2a)/3 and from the first row we
obtain x = (1 − 7a)/3. Some particular solutions are x = −2, y = 2, z = 1, given by a = 1, and
x = 0.1, y = 1.4, z = 0.1, given by a = 0.1. There are in fact an infinite number of solutions of
the equations.

Definition 1.7
A set of simultaneous linear equations for which the reduced array contains a row of
zeros to the left of the line and zero to the right is said to be redundant.

If a set of equations is redundant, then at least one of the equations can be obtained by adding
together suitable multiples of the others. The number of distinct equations is thus less than the
number of equations originally given. In the above example, if we subtract the second equation from
twice the third, then we get the first equation and so there are really only two distinct equations.
To put this in another way, once we have found a solution of two of the equations, then it will
automatically satisfy the third equation. This is not true of any of the earlier examples.
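The two special kinds of row are easy to recognise mechanically. The following small Python helper (illustrative only, with our own naming) inspects each row of a reduced augmented array, where the right hand side is stored as the last entry of each row:

    def classify_rows(reduced):
        verdicts = []
        for row in reduced:
            coefficients, rhs = row[:-1], row[-1]
            if all(c == 0 for c in coefficients):
                # a zero row: inconsistent if the right hand side is non-zero
                verdicts.append("inconsistent" if rhs != 0 else "redundant")
            else:
                verdicts.append("ordinary")
        return verdicts

    print(classify_rows([[1, 2, 1, 3], [0, -3, 2, -9], [0, 0, 0, 5]]))
    # ['ordinary', 'ordinary', 'inconsistent']  (the first mining example)
    print(classify_rows([[2, 1, 4, 2], [0, 3, -2, 4], [0, 0, 0, 0]]))
    # ['ordinary', 'ordinary', 'redundant']     (the second mining example)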
It is quite possible for the reduced array to contain several rows of zeros to the left of the line.
For some of these, the element on the right may be zero and for some it may be non-zero. Thus
it is possible for a set of equations to be both inconsistent and redundant. This occurs if some
equations can be obtained as combinations of others, but the set of equations has no solutions.
In relation to the above example, it should be observed that not every value of a gives a solution
which the mining company can use. Clearly the mining company cannot extract a negative amount
of ore from a mine and so the values of x, y and z must all be positive. In order to obtain a solution
to the original real world problem we can only allow values of a between 0 and 1/7. This restriction
illustrates the fact that when we construct a mathematical model of a real world problem, the
mathematical properties of the model may not all correspond to properties of the original problem.
In particular, solutions of equations may not always be able to be implemented in practice. In
solving problems with mathematics, we must always be aware of the interpretation of the results
and not simply accept all mathematical results as solutions.


There is a simple geometric interpretation of the solution of sets of simultaneous linear equations, at least in two and three dimensions. The general form of a linear equation in two unknowns
is
ax + by = c
and such an equation is represented graphically by a line in the plane. A set of two simultaneous
linear equations in two unknowns thus represents a pair of lines in the plane and a solution of the
set of equations represents a point where the lines intersect. Similar considerations apply to an
equation with three unknowns. The general form is
ax + by + cz = d
and this is represented graphically as a plane in three dimensional space. Each solution of a set of
equations in three unknowns represents a point where all of the planes represented by the equations
intersect.
We shall consider the two dimensional case where we have two equations in two unknowns.
Consider first the equations
x + y = 2,
2x + y = 3.
The augmented array of the equations is

1 1 | 2
2 1 | 3

The Gaussian reduction is easily obtained and the result is

1  1 |  2
0 −1 | −1

The solution of the set of equations is x = 1, y = 1. Each of the equations represents a line in the
plane and the solution of the set of equations is the point where the two lines intersect.

[Figure: the lines x + y = 2 and 2x + y = 3 intersecting at the point (1, 1).]

In two dimensions two lines will fail to intersect if they are parallel and in this case the
corresponding equations will have no solutions. An example is provided by the equations

x − 2y = 1,
−2x + 4y = 6.


The augmented array of the equations is

 1 −2 | 1
−2  4 | 6

The Gaussian reduction is easily obtained and the result is

1 −2 | 1
0  0 | 8

The reduction shows that the equations are inconsistent and this can be seen graphically by the
fact that the two lines are parallel.

[Figure: the two parallel lines x − 2y = 1 and −2x + 4y = 6.]

The final case is when the equations have an infinite number of solutions. This occurs when
the two lines are the same and so every point on the line is a solution of the set of equations.
In two dimensions the geometrical possibilities are easy to understand but trivial. In three
dimensions there are clearly more possibilities and in more than three dimensions we can no longer
draw pictures to see what is happening.

1.4 Unequal Numbers of Equations and Unknowns.


We have used the mining company example to illustrate various features of the solutions of
sets of simultaneous linear equations. At the end of Section 1.1 we listed three problems to be dealt
with. The first concerned the formulation of an efficient procedure for solving sets of equations by
elimination. We presented such a method, the method of reduction, in Sections 1.2 and 1.3. In
these Sections however, we considered only situations where the number of equations is the same
as the number of unknowns. The next task is to examine cases where there are more unknowns
than equations or more equations than unknowns.
Suppose we consider the original mining company which has three mines, with the production
of metal per tonne of ore from each mine given by the following table.
Mine   Silver(100gm)   Lead(100kg)   Zinc(100kg)
  1          1              2             3
  2          2              1             1
  3          1              3             1

We suppose the company has an order for 11,000 gm of silver and 17,000 kg of lead, but has no
order for zinc. Letting x, y and z be the amounts of ore taken from each mine in tens of tonnes,


we obtain the following equations to solve.

x + 2y + z = 11
2x + y + 3z = 17

The augmented array is

1 2 1 | 11
2 1 3 | 17

The main diagonal of this array does not reach the vertical line and to reduce the array, we again
seek to reduce all elements below the main diagonal to zero. This is easily done and requires only
one operation.

1  2  1 | 11
0 −3  1 | −5     (R2 → R2 − 2R1)
We can allow z to have any real value and so we put z = a. Then from the second row of the
array, y = (5 + a)/3. From the first row, x = (23 − 5a)/3. A mathematical solution of the
equations is obtained for any value of a, but in the original problem we must restrict a to the range
0 ≤ a ≤ 23/5. Possible solutions for the company are x = 6, y = 2, z = 1, given by a = 1, and
x = 23/3, y = 5/3, z = 0, given by a = 0, but not x = 28/3, y = 4/3, z = −1, given by a = −1.
We have an infinite number of solutions in this situation but the equations are not redundant. The
company would decide between the various solutions on some other grounds than those we have
so far considered. Such criteria may involve the minimisation of costs. The application of such
extra criteria takes us into optimisation theory, which is of great importance for rational decision
making, but which is beyond the mathematical methods we are considering here.
Example 1.3
We consider an example with three equations and four unknowns.

x + y + z + t = 2
2x − y + 3z − t = 1
x + 2y + 4z + 3t = 4

The augmented array is

1  1  1  1 | 2
2 −1  3 −1 | 1
1  2  4  3 | 4

The Gaussian reduction is as follows.

1  1  1  1 |  2
0 −3  1 −3 | −3     (R2 → R2 − 2R1)
0  1  3  2 |  2     (R3 → R3 − R1)

1  1  1  1 |  2
0  1  3  2 |  2     (R2 ↔ R3)
0 −3  1 −3 | −3

1  1   1  1 |  2
0  1   3  2 |  2
0  0  10  3 |  3     (R3 → R3 + 3R2)


The last unknown t may be assigned any value, t = a, and then

z = 3(1 − a)/10,
y = 11(1 − a)/10,
x = (3 + 2a)/5.
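Since these formulas claim to solve the system for every value of a, they are easy to verify mechanically. A minimal Python check (the function name is our own):

    from fractions import Fraction

    def solution(a):
        a = Fraction(a)
        return (3 + 2*a) / 5, 11 * (1 - a) / 10, 3 * (1 - a) / 10, a

    for a in (0, 1, Fraction(1, 2), -3):
        x, y, z, t = solution(a)
        assert x + y + z + t == 2
        assert 2*x - y + 3*z - t == 1
        assert x + 2*y + 4*z + 3*t == 4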

The opposite case is when there are more equations than unknowns. Suppose that our company
discovers that there are traces of gold in each of its mines and that the content of the ore per tonne
is given by the following table.
Mine   Silver(100gm)   Lead(100kg)   Zinc(100kg)   Gold(gm)
  1          1              2             3            1
  2          2              1             1            3
  3          1              3             1            2

Suppose that the company obtains an order for 11,000 gm of silver, 13,000 kg of lead, 10,000 kg of
zinc and 100 gm of gold. Can it fill this order without wastage from its mines? There will now be
four equations in the three unknowns x, y and z. The equations are easy to formulate

x + 2y + z = 11
2x + y + 3z = 13
3x + y + z = 10
x + 3y + 2z = 10

We begin the solution of the problem with the augmented array.

1 2 1 | 11
2 1 3 | 13
3 1 1 | 10
1 3 2 | 10

The Gaussian reduction of this array is carried out as follows.

1  2  1 |  11
0 −3  1 |  −9     (R2 → R2 − 2R1)
0 −5 −2 | −23     (R3 → R3 − 3R1)
0  1  1 |  −1     (R4 → R4 − R1)

1  2  1 |  11
0  1  1 |  −1     (R2 ↔ R4)
0 −5 −2 | −23
0 −3  1 |  −9

1  2  1 |  11
0  1  1 |  −1
0  0  3 | −28     (R3 → R3 + 5R2)
0  0  4 | −12     (R4 → R4 + 3R2)

1  2  1 |   11
0  1  1 |   −1
0  0  3 |  −28
0  0  0 | 76/3     (R4 → R4 − (4/3)R3)

The reduced form shows that the equations are inconsistent, that is they have no solutions.
The interpretation of this result is that it is impossible for the company to fill the given order from
its mines without wastage.
Example 1.4
Consider the following set of 4 equations in three unknowns.

−2x − y + 3z = 9
x + y + z = 0
−3x + 2y − z = −1
2x + 3y + 2z = −1

The augmented array is

 1  1  1 |  0
−2 −1  3 |  9
−3  2 −1 | −1
 2  3  2 | −1

where we have written the second equation as the first row of the array to simplify the calculation.
The Gaussian reduction of this array is carried out as follows.

1  1  1 |  0
0  1  5 |  9     (R2 → R2 + 2R1)
0  5  2 | −1     (R3 → R3 + 3R1)
0  1  0 | −1     (R4 → R4 − 2R1)

1  1    1 |   0
0  1    5 |   9
0  0  −23 | −46     (R3 → R3 − 5R2)
0  0   −5 | −10     (R4 → R4 − R2)

1  1    1 |   0
0  1    5 |   9
0  0  −23 | −46
0  0    0 |   0     (R4 → R4 − (5/23)R3)

The final row of the reduced array shows that the equations are redundant. The solution of the set
of equations is

z = 2, y = −1, x = −1.

In the various examples above, we have seen that a set of simultaneous linear equations may
have a unique solution, no solution or an infinite number of solutions. There are no other possibilities
and the various cases can be decided by reducing the augmented array of the set of equations to
the form in which all elements below the main diagonal are zero. All of the cases arise in practical
problems and so all are needed if we are to be able to deal with applications.


1.5 Problems.
Solve each of the following sets of equations by
1. Gaussian reduction,
2. Gauss-Jordan reduction, where possible.

(i) 5x − 6y = 4
    8x − 9y = 7

(ii) x + 3y = 2
     2x − y = 3

(iii) x + 2y − z = 5
      2x − y − z = 0
      x − y + 3z = 8

(iv) 2x − y + 2z = 12
     3x + 4y − 3z = 5
     4x − 3y + 2z = 14

(v) x − y + 4z = 1
    2x + 7y − 6z = 2
    x + 9y + 8z = 3

(vi) 3x + y − 3z = 0
     x + 4y + 2z = 0
     3x − 10y − 12z = 0

(vii) 7x − 8y + 9z = 33
      9x + 8y − z = 1
      x − 7y + 9z = 26

(viii) x − y + 2z = 4
       3x + y + 4z = 6
       x + y + z = 1

(ix) 2x + y − 5z = 0
     x − y − z = 3
     3x − 3y + 9z = 3

(x) 4x − 3y + 2z = 8
    3x + y − 4z = 2
    5x − 7y + 8z = 1

(xi) x + y − 2z − t = 1
     2x − y + 3z + t = 2
     x + 4y − 9z − 4t = 2

(xii) 2x + y + z − 3t = 1
      x + 2y − z + 3t = 2
      x − 2y + 2z − t = 3

(xiii) 2x − y = 5
       3x + 2y = 4
       7x + 8y = 6

(xiv) 7x + 3y = 2
      4x − 2y = 8
      x + y = 1

(xv) 2x − y + z + 2t = 5
     x + 2y + 2z − 3t = 3
     3x − y + z + t = 6
     2x + y − 2z − 3t = 6

(xvi) 2x + 2y − z = 2
      3x − y + 2z = 3
      2x + 3y + z = 0
      4x − y − z = 0

CHAPTER 2

Matrices

2.1 Definitions.
The methods presented in Chapter 1 are very effective for solving a single set of simultaneous
equations. If these were the only problems which were encountered then we would not need to
develop other methods. One of the situations in which other methods are needed was referred to
in Chapter 1. Consider again the mining company. Over time it will receive many orders and each
time it receives an order it will have to solve a set of simultaneous linear equations. As it is only
the orders which change and not the ores in the mines, these sets of equations all have the same left
hand sides. It is only the right hand sides which change. In reducing each array, the calculations
on the left hand side of the line are the same every time and it is wasteful to have to repeat them.
We should be able to devise methods which avoid this. There are several such methods and in this
chapter we consider one of them.
In this method we treat the set of equations as a single equation. Consider the simplest possible
case of one equation in one unknown. The solution is easy to obtain. The equation is

ax = b,   with   a ≠ 0.     (1)

The solution, x = b/a, is easily obtained by dividing both sides by a, but we shall write this out
in a way which we shall later be able to generalise to more than one equation. First multiply both
sides by a⁻¹, the inverse of a, and then we obtain

a⁻¹ax = a⁻¹b,
1x = a⁻¹b,
x = a⁻¹b.

It is this pattern of solution which we seek to imitate for a set of simultaneous equations. We note
that the left hand side of the equation determines a⁻¹. Once this has been computed, we need only
multiply it by the right hand side b to obtain the solution. If the right hand side changes this does
not affect a⁻¹.
The first problem is to write a set of equations as a single equation. We begin with two
equations in two unknowns and we shall use subscript notation.

a11 x + a12 y = b1,
a21 x + a22 y = b2.     (2)

Equation (1) has the form

constant times unknown = constant

and we seek to write the equations (2) in this form. We have two unknowns and so we group these
into an array.

( x )
( y )

Definition 2.1
A rectangular array of numbers is called a matrix. The element in the ith row and
jth column of a matrix is called the (i, j) element of the matrix.

Matrices will be fundamental to our work in this chapter. Returning to the set of simultaneous
equations (2), on the right hand sides of the equations we have two constants and we group these
into a matrix

( b1 )
( b2 )

There are four constants on the left hand side of the equations and, following the pattern of the
arrays used in the previous chapter, these are grouped into a matrix with two rows and two columns.

( a11  a12 )
( a21  a22 )

A matrix is always enclosed in parentheses to indicate that the array is being treated as a single
object.
The equations (2) can now be written in the form

constant times unknown = constant,

by using the three matrices which we have introduced.

( a11  a12 ) ( x )   ( b1 )
( a21  a22 ) ( y ) = ( b2 )     (3)

This is a single equation, but the constants and the unknown are matrices rather than numbers.
Before the equation can be solved however there are several aspects of it which require clarification.
1. What does it mean to say that two matrices are equal?
2. How are matrices multiplied together?
There are no predetermined answers to these questions. The answers are definitions, but the
definitions are adopted because of their usefulness in applications.
Equation (3) contains matrices of several different shapes and the following terminology is used
to describe the shape of a matrix.

Definition 2.2
A matrix containing m rows and n columns is said to be of size m × n. An n × n
matrix is called square. An n × 1 or 1 × n matrix is also called a vector. No distinction is
usually made between a 1 × 1 matrix and a number.

For example, the matrices

( 1 )
( 2 ),     ( 2  3  4 )
( 3 )      ( 5  6  7 )     and     ( 1  2  3  4 )

are of sizes 3 × 1, 2 × 3 and 1 × 4 respectively.


Matrices are usually denoted by capital letters such as A and B. Some care however is needed
when using computer algebra systems such as Mathematica which use capital letters to begin
internal commands. In Mathematica, problems will be encountered if we attempt to use C, D
and some other capital letters as names of matrices. These letters are reserved for internal use in
Mathematica, and we get an error message if we attempt to use them for some other purpose, such
as naming a matrix.
If A is a matrix, then the elements of A are denoted by aij or by Aij. We shall usually use
the first of these notations when the matrix is named by a single capital letter, but for matrices
with composite names, such as A + B or A·Bᵀ, we shall use the second notation, referring to the
elements of A + B for example by (A + B)ij.
The definition of equality of matrices can now be given. It is the obvious one to adopt.

Definition 2.3
Two matrices are equal if they have the same size and if the elements in corresponding
positions are equal.

Example 2.1

1. ( 1 )
   ( 2 )  ≠  ( 1  2 )        (different sizes)

2. ( 1  2 )      ( 2  1 )
   ( 1  1 )  ≠   ( 1  1 )    (same size but unequal elements)

3. ( 1  2 )  ≠  ( 1 )
                ( 2 )        (different sizes)

2.2 Multiplication of Matrices.

The second of the two questions we must deal with concerns multiplication of matrices. The
definition is complicated, but its form is dictated by its applications. In our case we must look at
equation (2) from the previous section. As an alternative to equation (3), we can write equation
(2) in matrix form as

( a11 x + a12 y )   ( b1 )
( a21 x + a22 y ) = ( b2 )


Comparing this equation with equation (3), we see that

( a11  a12 ) ( x )   ( a11 x + a12 y )
( a21  a22 ) ( y ) = ( a21 x + a22 y )

This gives the rule for multiplying a 2 × 2 matrix by a 2 × 1 matrix to produce a 2 × 1 matrix.
Applying this rule to an example gives

( 2  3 ) ( 6 )   ( 2 × 6 + 3 × 7 )   ( 33 )
( 4  5 ) ( 7 ) = ( 4 × 6 + 5 × 7 ) = ( 59 )
This rule is sufficiently unusual that it may be useful to give some further examples of its use. The
first example is one that we shall use again later.
Suppose we have a country in which an epidemic is raging. The country will contain people
who are well, people who are sick and people who are dead. Suppose that each month 40% of
the well people become sick, 40% of the sick die and 20% of the sick recover. There are of course
no changes from being dead. What is the state of the population after 12 months? What is it in
the long run? We shall later be able to answer these questions in a very efficient way, but for the
moment we shall be content with formulating the problem in matrix form.
Let the numbers of well, sick and dead people after n months be wn, sn and dn respectively.
The initial numbers in each group are w0, s0 and d0 respectively. After one month, 40% of the well
people have become sick and 20% of the sick have become well. Thus the number of well people at
the end of one month is 60% of the number originally well added to 20% of the number originally
sick. In symbols,

w1 = (3/5)w0 + (1/5)s0.

Similarly, the number of sick at the end of one month is given by

s1 = (2/5)w0 + (2/5)s0

and the number of dead is given by

d1 = (2/5)s0 + d0.
These results can be written in matrix form as follows,

( w1 )   ( 3/5  1/5  0 ) ( w0 )
( s1 ) = ( 2/5  2/5  0 ) ( s0 )
( d1 )   (  0   2/5  1 ) ( d0 )

and we see that the same rule for matrix multiplication applies as in the previous case. The result
we have obtained in this case is not a set of equations to be solved. Rather it is a rule telling us
how to compute the state of the population after one month, assuming we know the initial state.
We shall return to this problem in Chapter 4.
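Even without the methods of Chapter 4, the first question can be answered by simply applying the rule twelve times. A minimal Python sketch; the starting population of 1000 well people is an assumed figure used purely for illustration:

    T = [[3/5, 1/5, 0],      # well <- (well, sick, dead)
         [2/5, 2/5, 0],      # sick
         [0,   2/5, 1]]      # dead

    state = [1000.0, 0.0, 0.0]
    for month in range(12):
        state = [sum(T[i][k] * state[k] for k in range(3)) for i in range(3)]
    print([round(v, 1) for v in state])   # (well, sick, dead) after 12 months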
Both in the case of simultaneous equations and in the case of the epidemic problem, the rule
which is used to multiply a matrix by a vector may best be described as tipping rows of the first
matrix down the columns of the second. This same rule is used to multiply matrices of other sizes,
but it is easily seen that there must be restrictions on the sizes of the matrices being multiplied.
Suppose we multiply matrices A and B to obtain matrix C, that is

A·B = C.


In order to apply the rule of tipping rows of A down columns of B, the number of elements in a
row of A must equal the number of elements in a column of B. This can be expressed as

number of columns in A = number of rows in B.

It is only when this condition is satisfied that the matrices can be multiplied together.
In the examples considered above, a square matrix and a vector were multiplied together. We
next consider an application where a product of matrices of other sizes arises.
A manufacturing company sells three products X, Y and Z. Suppose there is a sales staff of four
and that their names are A, B, C and D. The selling prices of the products and the commissions
received by each member of the sales staff are given in thousands of dollars, ($K), in the
following table.

Product   Price($K)   Commission($K)
   X         2.50          0.25
   Y         3.80          0.35
   Z         4.05          0.38

Each month the sales manager is presented with the quantities of each product sold by each
salesperson and the manager must compute the total sales and the commission to be paid to each of
them. Suppose that in some particular month the manager is presented with the following figures.

SalesPerson   Product X   Product Y   Product Z
     A            6           4           5
     B            5           7           8
     C            8           3           6
     D            7           6           4

Each of the tables contains an array of numbers, and these arrays can be written as matrices.
We shall call these the Sales and Prices matrices, and denote them by S and P respectively.

    ( 6  4  5 )
    ( 5  7  8 )          ( 2.50  0.25 )
S = ( 8  3  6 ),     P = ( 3.80  0.35 )
    ( 7  6  4 )          ( 4.05  0.38 )

We wish to calculate the total value of the sales of each salesperson. For A we calculate the total
sales in $K as

6 × 2.50 + 4 × 3.80 + 5 × 4.05.

This is easily recognised as the (1, 1) entry in the matrix product which is obtained by tipping
the first row of S down the first column of P. Similarly, the total sales of B is given by the (2, 1)
element of the product,

5 × 2.50 + 7 × 3.80 + 8 × 4.05,

obtained by tipping the second row of S down the first column of P. The total sales of C and D
are given by the (3, 1) and (4, 1) elements, respectively. The total sales of each member of the sales


staff can thus be obtained from the first column of the matrix product S·P.

      ( 6  4  5 )
      ( 5  7  8 ) ( 2.50  0.25 )
S·P = ( 8  3  6 ) ( 3.80  0.35 )
      ( 7  6  4 ) ( 4.05  0.38 )

      ( 6 × 2.50 + 4 × 3.80 + 5 × 4.05    6 × 0.25 + 4 × 0.35 + 5 × 0.38 )
      ( 5 × 2.50 + 7 × 3.80 + 8 × 4.05    5 × 0.25 + 7 × 0.35 + 8 × 0.38 )
    = ( 8 × 2.50 + 3 × 3.80 + 6 × 4.05    8 × 0.25 + 3 × 0.35 + 6 × 0.38 )
      ( 7 × 2.50 + 6 × 3.80 + 4 × 4.05    7 × 0.25 + 6 × 0.35 + 4 × 0.38 )

      ( 50.45  4.80 )
      ( 71.50  6.74 )
    = ( 55.70  5.33 )
      ( 56.50  5.37 )
The commission in $K paid to A is given by

6 × 0.25 + 4 × 0.35 + 5 × 0.38.

This is the (1, 2) element in the matrix product, obtained by tipping the first row of S down
the second column of P. Similar calculations show that the commissions earned by the other
salespersons are given by the remaining elements of the second column of the matrix product.
We have shown that the total sales and the commissions for each member of the sales staff
can be read from the product matrix. The total sales of A for example are $50,450 and the
commission paid is $4,800. All computer algebra systems can perform matrix multiplications, and
so the identification of the solution of a problem as a matrix product will enable the problem to be
solved with such software.
The rule for matrix multiplication can be expressed in general terms in the following way. If
the first matrix has size m × n and the second has size n × p, then the product will have size m × p.
To obtain the (i, j) element of the product of two matrices, row i of the first matrix is tipped down
column j of the second, corresponding elements are multiplied together and the products are added.
The result is

ai1 b1j + ai2 b2j + ... + ain bnj.

The operation is illustrated diagrammatically in the following figure.

( a11 ... a1k ... a1n )   ( b11 ... b1j ... b1p )
(  .       .       .  )   (  .       .       .  )
( ai1 ... aik ... ain )   ( bk1 ... bkj ... bkp )
(  .       .       .  )   (  .       .       .  )
( am1 ... amk ... amn )   ( bn1 ... bnj ... bnp )

The precise definition of matrix multiplication can now be formulated.

Definition 2.4
Let A be an m × n matrix and B be an n × p matrix. The product of A and B
is the m × p matrix whose (i, j) element is obtained by tipping the ith row of A down
the jth column of B, multiplying together each pair of elements and adding the results.
The product is denoted by A·B or simply by AB. This can be expressed formally in
summation notation as

(A·B)ij = ai1 b1j + ai2 b2j + ... + ain bnj
        = Σ (k = 1 to n) aik bkj.

Note that we are using lower case letters to denote elements on the right hand side of this
definition, but not on the left hand side where the matrix has a composite name. We shall adopt
the same convention for the other operations on matrices in the next section.
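Definition 2.4 translates directly into code. A minimal Python sketch (the naming is our own), in which the inner sum runs over k exactly as in the summation formula:

    def mat_mul(A, B):
        m, n, p = len(A), len(B), len(B[0])
        # the rule only applies when columns of A = rows of B
        assert all(len(row) == n for row in A)
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
                for i in range(m)]

    print(mat_mul([[2, 3], [4, 5]], [[6], [7]]))   # [[33], [59]], as computed earlier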
We consider some examples and in these examples we shall write out the calculations in detail,
although in hand calculations much of the arithmetic would be done mentally.
Example 2.2

1. ( 3  4 ) ( 1  −2 )   ( 3 × 1 + 4 × 2    3 × (−2) + 4 × (−3) )   ( 11  −18 )
   ( 1  2 ) ( 2  −3 ) = ( 1 × 1 + 2 × 2    1 × (−2) + 2 × (−3) ) = (  5   −8 )

2.           ( 3  8 )
   ( 2  1 )  ( 7  6 ) = ( 2 × 3 + 1 × 7    2 × 8 + 1 × 6 ) = ( 13  22 )

3.               (  2 )
   ( 1  −1  2 )  ( −1 ) = 1 × 2 + (−1) × (−1) + 2 × 3 = 9
                 (  3 )

Matrix multiplication differs from ordinary multiplication of numbers in two ways. First,
because of the size restrictions it is not always possible to multiply two matrices together. Second,
and more important, is the fact that the order of the factors usually affects the result. There are
five possibilities. We shall give examples of each.
1. The multiplication is not possible in either direction. For example, let

A = ( 1  2 )
    ( 3  1 ),     B = ( 1  2  3 ).

Since A is 2 × 2, while B is 1 × 3, neither A·B nor B·A can be calculated.


2. The multiplication is possible in one direction but not the other. For example, let

A = ( 1  2 ),     B = ( 2  1 )
                      ( 3  2 ).

Then A·B = ( 8  5 ), but B·A is undefined because B is 2 × 2, while A is 1 × 2.

3. The multiplication is possible in both directions, but matrices of different sizes are produced.
For example, let

A = ( 3 )
    ( 1 ),     B = ( 2  3 ).

Then

A·B = ( 6  9 )
      ( 2  3 ),     B·A = 9.

4. The multiplication is possible in both directions and the resulting matrices have the same
dimensions, but are unequal. For example, let

A = ( 2  3 )          ( −1   2 )
    ( 1  4 ),     B = (  3  −2 ).

Then

A·B = (  7  −2 )            ( 0  5 )
      ( 11  −6 ),     B·A = ( 4  1 ).

This case can only occur when the matrices are square and of the same size. This possibility
is important in algebraic calculations with matrices and we give several examples below.
5. The multiplication is possible in both directions and the resulting matrices are equal. Again
this can only occur when the matrices are square and of the same size. For example, let

A = ( 3  −1 )          ( 5  −4 )
    ( 2   0 ),     B = ( 8  −7 ).

Then

A·B = (  7  −5 )            (  7  −5 )
      ( 10  −8 ),     B·A = ( 10  −8 ).

The following terminology is used to describe these possibilities.

Definition 2.5
1. Two matrices A and B are said to commute if A·B = B·A.
2. Because there are pairs of matrices which do not commute, matrix multiplication is
said to be non-commutative.
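The matrices of case 4 give a concrete demonstration. A short illustrative Python fragment (the product function is repeated here so the fragment stands alone):

    def mat_mul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    A = [[2, 3], [1, 4]]
    B = [[-1, 2], [3, -2]]
    print(mat_mul(A, B))   # [[7, -2], [11, -6]]
    print(mat_mul(B, A))   # [[0, 5], [4, 1]] -- A and B do not commute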


Throughout this section, we have denoted matrix multiplication by a dot, writing the product
of the matrices A and B as A·B. This is the notation used by Mathematica for matrix multiplication,
but it is not usually used in printed text. From here on we shall usually omit the dot and simply
write the product of A and B as AB.
Finally in this section, we return to the representation of a set of simultaneous linear equations
in matrix form. Having defined matrix equality and matrix multiplication, we can write the set of
equations in the form

AX = B,

where A is the matrix of coefficients from the left hand side, X is the column matrix or vector of
unknowns and B is the column matrix or vector of constants from the right hand side. In the case
of two equations in two unknowns for example we would have

A = ( a11  a12 ),     X = ( x )     and     B = ( b1 )
    ( a21  a22 )          ( y )                 ( b2 ).

2.3 Other Operations with Matrices.

There are several other operations which can be performed on matrices in addition to the
operation of multiplication. In this section we shall introduce some of these, in particular sums,
multiples and transposes. As with multiplication, the definitions of these operations are chosen
because of their usefulness in particular applications. While the definition of multiplication is
complicated and unintuitive, the definitions of the present operations are quite straightforward.
Consider the mining company which we introduced in the previous chapter. Suppose the
company receives two orders, one for 5,000 gm of silver, 7,000 kg of lead and 6,000 kg of zinc, while
the other is for 3,000 gm of silver, 5,000 kg of lead and 5,000 kg of zinc. The total order is for 8,000
gm of silver, 12,000 kg of lead and 11,000 kg of zinc. Each order is represented by a 3 × 1 matrix
and the fact that the total order is the sum of the two individual orders suggests that we should
write

( 5 )   ( 3 )   (  8 )
( 7 ) + ( 5 ) = ( 12 )
( 6 )   ( 5 )   ( 11 )

This in turn suggests that the sum of two matrices is obtained by adding the elements in
corresponding positions. In order for this to be possible, the two matrices must have the same size.

Definition 2.6
Let A and B be two matrices of the same size. The sum of A and B, denoted by
A + B, is the matrix obtained by adding the elements in corresponding positions. In
symbols,

(A + B)ij = aij + bij.

We next consider some examples. In these examples, as with our earlier examples of matrix
multiplication, we shall write out the calculations in detail, although in hand calculations much of
the arithmetic would be done mentally.


Example 2.3

1. ( 2  1 )   ( −1  2 )   ( 2 − 1    1 + 2 )   ( 1   3 )
   ( 3  6 ) + (  1  4 ) = ( 3 + 1    6 + 4 ) = ( 4  10 )

2. ( 1  2 ) + ( 4  3 ) = ( 1 + 4    2 + 3 ) = ( 5  5 )

3. ( 2  1 ) + ( 1  3 )
              ( 4  5 )     is undefined.

The second operation on matrices to be considered in this section is multiplication of a matrix
by a number. This is carried out element by element and there are no size restrictions. The
definition is suggested by considerations similar to those used above for addition of matrices.

Definition 2.7
Let A be a matrix and c be a number. The product of A by c, denoted by cA, is the
matrix obtained by multiplying each element of A by c. In symbols,

(cA)ij = c aij.

This definition is consistent with the definition of addition, as it justifies results of the form

A + A = 2A,     3A − 4A = −A.
Example 2.4

1. 3 ( 1  2 )   ( 3 × 1    3 × 2 )   (  3   6 )
     ( 4  5 ) = ( 3 × 4    3 × 5 ) = ( 12  15 )

2. (−1) (  1  −2  5 )   ( (−1) × 1     (−1) × (−2)   (−1) × 5 )   ( −1   2  −5 )
        ( −4   1  6 ) = ( (−1) × (−4)  (−1) × 1      (−1) × 6 ) = (  4  −1  −6 )

The operations we have considered so far have analogies with the operations of ordinary arithmetic with numbers. The next operation however has no analogue in ordinary arithmetic. It is an


operation which does not appear to have any obvious application, but we shall use it in the next
chapter when discussing the inverse of a matrix.

Definition 2.8
Let A be a matrix. The transpose of A, denoted by Aᵀ, is the matrix obtained by
interchanging the rows and columns of A. Thus

(Aᵀ)ij = aji.

There are no size restrictions on the operation of transposing a matrix. If A is m × n, then
Aᵀ is n × m. If A is square, then Aᵀ has the same size as A.

Example 2.5

1. ( 1  2 )ᵀ   ( 1  3 )
   ( 3  4 )  = ( 2  4 ).

2. ( 1 )ᵀ
   ( 2 )  = ( 1  2 ).

3. ( 1  4 )ᵀ
   ( 2  5 )    ( 1  2  3 )
   ( 3  6 )  = ( 4  5  6 ).



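All three operations of this section are one-liners in Python. The following illustrative definitions (our own naming) reproduce the examples above:

    def mat_add(A, B):
        return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

    def scalar_mul(c, A):
        return [[c * a for a in row] for row in A]

    def transpose(A):
        return [list(col) for col in zip(*A)]

    print(mat_add([[2, 1], [3, 6]], [[-1, 2], [1, 4]]))   # [[1, 3], [4, 10]]
    print(scalar_mul(3, [[1, 2], [4, 5]]))                # [[3, 6], [12, 15]]
    print(transpose([[1, 4], [2, 5], [3, 6]]))            # [[1, 2, 3], [4, 5, 6]]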

There are many frequently used properties of the operations used in elementary arithmetic of
real numbers. They are all obvious and are not usually explicitly mentioned, but these properties are
not obvious when we consider other types of entities besides numbers. Indeed, the properties might
not even hold. We have seen one example of this already with the fact that matrix multiplication
is not commutative.
In ordinary arithmetic with numbers we have two main operations, addition and multiplication.
There are also inverses to both of these operations. The principal properties of the two operations
are as follows.
1. Both addition and multiplication are commutative, that is

a + b = b + a,     ab = ba.

2. Both addition and multiplication are associative, that is

(a + b) + c = a + (b + c),     (ab)c = a(bc).

3. Multiplication is distributive over addition, that is

a(b + c) = ab + ac.


For matrix addition and multiplication, there are size restrictions which result in the operations
being not always defined. In many cases however, we apply these operations to square matrices of
the same size and then there are no problems with sizes. If we consider only matrices of appropriate
sizes, then the only one of the above properties which fails to carry over is the commutativity of
multiplication. Thus for matrices of appropriate sizes, we have

A + B = B + A,     (A + B) + C = A + (B + C),     (AB)C = A(BC),     A(B + C) = AB + AC,

but not AB = BA. Powers of a matrix can be defined in the obvious way as

A² = A·A,     A³ = A²·A,     and so on.

The fact that matrix multiplication is not commutative implies that care is needed in applying
familiar algebraic results to matrices. Two particular examples are that (AB)² may not equal A²B²
and (A + B)² may not be equal to A² + 2AB + B². What is true in the first case is that by the
associative rule for matrix multiplication,

(AB)² = (AB)(AB) = A(BA)B,

but it will usually not be correct to replace BA by AB in order to get (AA)(BB). In the second
case, we can use the distributive rule to obtain

(A + B)² = (A + B)(A + B) = A² + AB + BA + B²

and again we cannot usually replace BA by AB to obtain 2AB.
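The point is easily checked numerically with the non-commuting pair from case 4 of Section 2.2 (an illustrative sketch; the helper definitions are repeated so it runs on its own):

    def mat_mul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
                 for j in range(len(B[0]))] for i in range(len(A))]

    def mat_add(A, B):
        return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

    A = [[2, 3], [1, 4]]
    B = [[-1, 2], [3, -2]]
    S = mat_add(A, B)
    lhs = mat_mul(S, S)                                       # (A + B)^2
    A2, B2 = mat_mul(A, A), mat_mul(B, B)
    AB, BA = mat_mul(A, B), mat_mul(B, A)
    print(lhs == mat_add(mat_add(A2, AB), mat_add(BA, B2)))   # True
    print(lhs == mat_add(mat_add(A2, AB), mat_add(AB, B2)))   # False: 2AB is wrong here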
The inverse operation to addition is subtraction and this works in a straightforward manner
for matrices. Matrix multiplication however is a complicated operation and we have to be careful
with its inverse. This is the topic of the next section.

2.4 The Inverse of a Matrix.


In the arithmetic of real numbers, the inverse of the operation of multiplication is the operation
of division, but division is not a useful concept for matrices because matrix multiplication is not
commutative. There is however another way of thinking about division of numbers. If we take a
nonzero number b, then it has a reciprocal or inverse b⁻¹. The quotient a/b can then be regarded
as b⁻¹a. It could equally well be regarded as ab⁻¹. For real numbers, it does not matter which is
used because multiplication is commutative, but this is not the case for matrices. Thus for matrices
we cannot meaningfully define A/B and we must use the terminology of inverses.
At the beginning of this chapter we wrote out the solution of an equation in one unknown
using the terminology of inverses, rather than division. We must now consider this in more detail
and then apply it to the matrix equation

AX = B.

The inverse of a nonzero number a is the number a⁻¹ with the property that

a⁻¹a = 1.


To apply this same idea to matrices, the first thing we require is a matrix analogue of the number
1. This number has one very simple property, namely 1 × x = x, for every number x. This is another
property which is so simple that it is not usually mentioned in ordinary arithmetic. In the case
of matrices however, there cannot be a matrix, I, which has the property that IA = A, for every
matrix A. This is prevented by the size restrictions on matrix multiplication. A more modest
requirement is that IA = A, for all matrices of a given size.
There is however still a problem because of the fact that matrix multiplication is
non-commutative. Should we require IA = A or AI = A or both? The solution is to require both.

Definition 2.9
The n × n matrix In with the property that

A In = In A = A,

for every n × n matrix A, is called the n × n unit matrix.

From our discussion so far, there is no guarantee that such matrices exist. However some
experimentation quickly shows that a square matrix in which every element on the main diagonal
is 1 and all other elements are 0 has the required property. It can also be shown that these matrices
are the only ones with the property. If the size of the unit matrix is obvious from the context, then
the unit matrix is denoted simply by I.
Example 2.6
Let

A = ( 1  2 )
    ( 3  4 ).

Then a simple calculation gives

I2 A = ( 1  0 ) ( 1  2 )   ( 1  2 )
       ( 0  1 ) ( 3  4 ) = ( 3  4 ) = A,

A I2 = ( 1  2 ) ( 1  0 )   ( 1  2 )
       ( 3  4 ) ( 0  1 ) = ( 3  4 ) = A.

A non-square matrix can be multiplied by a unit matrix provided the size restrictions are
satisfied, and the matrix will remain unchanged. However the multiplication can be carried out in
only one direction.

Example 2.7
Let

A = ( 1  3  4 )
    ( 2  1  2 ).



Then

I2 A = ( 1  0 ) ( 1  3  4 )   ( 1  3  4 )
       ( 0  1 ) ( 2  1  2 ) = ( 2  1  2 ) = A,

but A I2 is undefined. However

       ( 1  3  4 ) ( 1  0  0 )   ( 1  3  4 )
A I3 = ( 2  1  2 ) ( 0  1  0 ) = ( 2  1  2 ) = A,
                   ( 0  0  1 )

but of course I3 A is undefined.

We shall be particularly concerned with the situation where the n × n unit matrix multiplies
an n × 1 column vector. The result is simply the vector. Thus if n = 2,

I2 X = ( 1  0 ) ( x )   ( x )
       ( 0  1 ) ( y ) = ( y ) = X.

Having defined unit matrices, we can consider the inverse of a square matrix. The definition
is straightforward, given our earlier discussion.

Definition 2.10
Let A be a square matrix. The matrix A⁻¹ is the inverse of A if

A⁻¹A = AA⁻¹ = I,

where I is the unit matrix of the same size as A.

We restrict the discussion here to square matrices. For non-square matrices, we would have
to define left and right inverses and these would not be equal. Indeed, they would have different
sizes. There is no assurance that any given square matrix has an inverse, but it can be shown that
a square matrix can have at most one inverse. We shall find in fact that many matrices do not
have inverses and that the possession of an inverse by a square matrix A is related to whether or
not the equations AX = B have a unique solution.
The basic problem facing us is to nd the inverse of a given square matrix. Before we tackle
this problem however, let us consider the method for solving a set of equations, assuming that the
inverse is known. The equations are written as AX = B. We multiply both sides by A 1 and then
the solution is obtained as follows.
AX = B
A1 AX = A1 B
IX = A1 B
X = A1 B
Once the inverse of the matrix is known, then the solution can be obtained by a single matrix
multiplication.

32

2 Matrices

Example 2.8
Consider the set of equations
2x y = 2,
5x + 3y = 1.
These equations can be written in matrix form as AX = B, where


 


2 1
x
2
A=
, X=
and B =
.
5
3
y
1
We shall later show that the inverse of A is
1


=

3
5

1
2


.

It is easily checked that AA1 = A1 A = I. Given this result, the solution of the set of equations
is


  
3 1
2
5
X = A1 B =
=
,
5 2
1
8


that is x = 5 and y = 8.

The only remaining problem is to nd a method for calculating the inverse matrix. There
are several methods for doing this, but the one we consider here is based on Gaussian reduction.
Consider the case of a 2 2 matrix


a11 a12
A=
.
a21 a22

z12
.
A =
z22
 
1
Suppose we solve the equation AX = B, where B =
. The result is
0
  


1
z11
z11 z12
1
X=A B=
=
.
0
z21 z22
z21

Let the inverse matrix be

z11
z21

 
0
The solution is thus the rst column of the inverse matrix. If we solve the equation with B =
,
1
then we obtain the second column of the inverse matrix. In both cases the steps in the reduction
are determined by the elements of the matrix A and so the steps are the same. Because of this we
can carry out both reductions at once by beginning with the array
a11
a21

a12
a22

1
0

0
1

z11
z21

z12
z22

and reducing it to an array of the form


1 0
0 1

2.4

The Inverse of a Matrix

33

The array to the right of the vertical line gives the elements of the matrix A 1 .
Example 2.9


Let
A=

2 1
5
3


.

The array to be reduced is obtained by writing the elements of A to the left of the vertical line and
the elements of the 2 2 unit matrix to the right of the line. We then reduce the array to obtain
the elements of the unit matrix to the left of the line. The reduction is as follows.
2 1
5
3

1
0

0
1

6 3
5
3

3
0

0
1

(R1 3R1)

1
5

0
3

3
0

1
1

(R1 R1 + R2)

1
0

0
3

3
15

1
6

(R2 R2 + 5R1)

1
0

0
1

3
5

1
2

(R3 R3/3)

From this array we see that the inverse matrix is given by




3 1
1
.
A =
5 2
This result can be easily checked. Matrix multiplication shows that
A1 A = AA1 = I.

A similar procedure is followed for a square matrix of any size. We shall consider one further
example, in this case a 3 3 matrix.
Example 2.10
Consider the 3 3 matrix A dened by

A= 2
3

1
3.
4

2
1
2

The initial array is obtained by writing the array of elements of A to the left of the line and the
3 3 unit array to the right of the line.
1
2
3

2
1
2

1
3
4

1
0
0

0
1
0

0
0
1

34

2 Matrices

The reduction can be carried out as follows. In this hand calculation, we have sought to avoid
fractions where possible.
1
0
0

2
3
8

1
1
1

1
2
3

0
1
0

0
0
1

1
0
0

2
9
8

1
3
1

1
6
3

0
3
0

0
0
1

1
0
0

2
1
8

1
2
1

1
3
3

0
3
0

0
1
1

1
0
0

0
1
0

5
2
15

5
3
21

6
3
24

2
1
9

1
0
0

0
1
0

5
2
1

5
3
7/5

6
3
8/5

2
1
3/5

1
0
0

0
1
0

0
0
1

2
1/5
7/5

2
1
1/5
1/5
8/5 3/5

(R2 R2 2R1)
(R3 R3 3R1)

(R2 3R2)

(R2 R2 R3)

(R1 R1 + 2R2)
(R3 R3 8R2)

(R3 (1/15)R3)
(R1 R1 5R3)
(R2 R2 2R3)

The inverse matrix can be read from this array as

A1

2
= 1/5
7/5

2
1/5
8/5

1
10
1/5 = 1/5 1
3/5
7

Again this result is easily checked by matrix multiplication.

10 5
1 1 .
8
3


Calculations such as this are extremely tedious to carry out by hand for matrices any larger
than that considered here, but the calculations are ideally suited to a computer. Mathematica has
a command Inverse which performs the required calculation. For the above example, Mathematica
gives the result in the rst form without taking the fraction outside the matrix.

2.5 Singular Matrices.


In the arithmetic of real numbers, every number except zero has an inverse. The number 0
has the property that a + 0 = a and it is not dicult to nd a matrix analogue of this property.

2.5

Singular Matrices

35

Denition 2.11
The m n matrix with the property that
A + = + A = A,
for every m n matrix A, is called the m n zero matrix. The n n zero matrix is
denoted by n .

A little experimentation shows that is the m n matrix with all elements zero. The number
0 has no inverse and we would expect the zero matrix n to similarly have no inverse. It is easily
shown that this is indeed the case. Since
A n = n A = n ,
for every n n matrix A, so it is not possible to have a matrix B with the property
B n = n B = In .
Thus n has no inverse. The new feature of matrix arithmetic however is that there are non-zero
n n matrices which do not have an inverse. Consider the matrix


1 2
A=
.
1
2
Suppose that A has an inverse B. Then A B = I 2 and so
 
 


 
b11 2b21
1 0
b12 2b22
1 2
b11 b12
=
=
.

0 1
1
2
b21 b22
b11 + 2b21 b12 + 2b22
Using the denition of equality of matrices this requires
b11 2b21 = 1,
b11 + 2b21 = 0.
This is impossible and so we have found a non-zero matrix which has no inverse.
The question of whether or not a square matrix has an inverse is closely related to whether or
not the set of simultaneous equations AX = B has a unique solution. If the matrix has an inverse,
then the equations have the unique solution X = A 1 B. The statement
if the matrix has an inverse then the equations have a unique solution
is the same as the statement
if the equations do not have a unique solution, then the matrix does not have an inverse.
In Chapter 1, we found that the equations do not have a unique solution if a row of zeros
appears to the left of the line in a Gaussian reduction. When this happens the equations either
have no solution because they are inconsistent or they have an innite number of solutions because
they are redundant. Thus we can determine whether a matrix has an inverse by performing a

36

2 Matrices

Gaussian reduction on it. In the previous section we used Gaussian reduction to nd the inverse
of a square matrix and we can now conclude that if a row of zeros appears to the left of the line
during the reduction, then the matrix has no inverse. In technical terms, the matrix is singular.
This gives us a method for determining whether or not a square matrix is singular. In fact we
dont need to have any equations at all. We only need to carry out a Gaussian reduction on the
array of numbers in the matrix itself. We usually carry out the reduction using matrix notation,
but it is important to remember that the reduced matrix is not equal to the original matrix. The
reduced matrix tells us a lot about the original matrix, but the two are not equal. In the situation
here, the reduced matrix tells us whether or not the original matrix is singular.
Example 2.11
Consider the matrix


A=

1 2
1
2


.

The reduction requires only one step, R2 R2 + R1, and the reduced matrix is


1 2
0
0


.


Thus A is singular.
Example 2.12
Consider the matrix

1 2
5
A = 2
3 1 .
5
4
3

The reduction of this matrix is carried out as follows.

1 2
5
0
7 11
(R2 R2 2R1)
0 14 22
(R3 R3 5R1)

1 2
5
0
7 11
0
0
0

(R3 R3 2R2)


Thus the matrix A is singular.

2.6 Problems.
1. Let



2
A=
,
1
 
2
D=
,
2
1
3

2 1
B=
1
4


1 0
I=
,
0 1


,

C = (1

=

0
0

0
0

3),


.

2.6

Problems

37

Find where possible


(i)
(iv)

AB
A+

(ii)
(v)

A+B
C

(iii) AI
(vi) BC

BT C
2A B T
B DC

(ix) BA
(xii) IC
(xv) D

(vii) 2
(x) CI
(xiii) C

(viii)
(xi)
(xiv)

(xvi)
(xix)

(xvii) (IC)
(xx) A3

A(DC)
A2

2. Let

A= 4
1

3 2
1 1 ,
3
2

1 1
D=
2 3

(xviii)

AT + CD

1
0

B = 1
3
2 1

4
,
I = I3 ,
1

2
1,
2

C= 3
4


1
2
E=
.
2 1

1
1 ,
2

Find where possible


(i)

AB

(ii)

BC

(iv) IE
(vii) A 2I
(x) CD

(v) C 2D
(viii) IB
(xi) DC

(xiii)
(xvi)

CE
ED

(xiv)
(xvii)

(xix)

(CE)D

CT B
DE

(iii)

AB

(vi)
(ix)
(xii)

AD
ID
AT B

(xv) DA
(xviii) C + D T

(xx) D T C T

1 1
2
2 2
1
1
1 1
A = 2
1
3,
B = 1
1 1 ,
C = 2 2
3.
1 2
1
3
1
2
3
1
2
Verify each of the following ten properties for these matrices. The rst six equalities can be
shown to be true for any square matrices of the same size. The nal two results are inequlities
which hold for these particular matrices. In cases where the matrices commute however, these
results become equalities.
(i) A(BC) = (AB)C

3. Let

(ii)

A + (B + C) = (A + B) + C

(iii)

(A + B)T = AT + B T

(iv)

A(B + C) = AB + AC

(v) (AB)T = B T AT
(vi)

(AT )2 = (A2 )T

(vii)

(A + B)2 = A2 + 2AB + B 2

(viii)

(AB)2 = A2 B 2

38

2 Matrices

4. Determine whether each of the following matrices is singular or nonsingular. For those which
are nonsingular, nd the inverse and check whether the inverse is correct by multitplying it by
the original matrix.

(i)

(iv)

3 1
5
4

1
2
4

1
2
4
3 2 1
1
3
5

3 1
5 3
3 2

(ii)

(iii)

3
1
7
4
1 4

1
2
1

(v)

(vi)

1 1
4
2
7 6
1
9
8

3
5 1
1
1 3 2
2

2
4
1
7
1
9
0
8

5. Let A be a nonsingular square matrix and let c be a nonzero number. Show that
(i)
(iii)

(A1 )1 = A,

(ii)

(A2 )1 = (A1 )2 ,

(cA)1 = 1c A1 ,

(iv)

(AT )1 = (A1 )T .

6. Use the method of inverse matrices, where possible, to solve each of the systems in Problem 1
of Chapter 1.
7. Find the inverse of the matrix

1 4
1 1
2 3

1
1
1

and use it to solve each of the following sets of simultaneous linear equations.
(i)

(iii)

x + 4y + z = 3
x+ y+z =6
2x + 3y + z = 6

(ii)

x + 4y + z = 7
x+ y+z =1
2x + 3y + z = 6

2x + 3y + z = 1
x + y + z = 2
x + 4y + z = 4

(iv) 4x + y + z = 12
x+y+ z = 6
3x + y + 2z = 13

(v) x + y + z = 1
x + 2y + 3z = 3
x + y + 4z = 2
8. Find the inverse of the matrix

2
4 1
1
3 2
3 2 4

and use it to solve each of the following sets of simultaneous linear equations.

2.6

Problems

39

(i)

2x + 4y z = 9
x + 3y 2z = 10
3x 2y 4z = 6

(ii)

2x + 4y z = 3
x + 3y 2z = 3
3x 2y 4z = 1

(iii)

3x 2y 4z = 16
2x + 4y z = 3
x + 3y 2z = 7

(iv)

4x 2y 3z = 8
2z + 3y + z = 13
x + 4y + 2z = 12

(v)

3x 2y + z = 15
4x y + 2z = 16
2x 4y 3z = 3

CHAPTER 3

Determinants

3.1 Denitions.
In the previous chapter, we used Gaussian reduction to determine whether or not a matrix
is singular, but there is another method which can be used for such calculations, and one which
introduces an important new property of matrices. To introduce the method, we return to the
solution of sets of simultaneous equations. In solving such sets of equations there are patterns in
the solutions which we have not yet examined. In order to see these patterns we shall use double
subscript notation for the equations. Consider rst two equations in two unknowns,
a11 x + a12 y = b1 ,
a21 x + a22 y = b2 .
To solve for x, we multiply the rst equation by a 22 , the second by a12 and subtract to obtain
(a11 a22 a21 a12 ) x = b1 a22 b2 a12 .

(1)

There is a dierent way to look at this calculation, and it is one we shall use later for three
equations in three unknowns. In this approach, we use the second equation to express y in terms
of x.
a22 y = b2 a21 x.
We multiply the rst equation by a22 and then substitute for a22 y. The result is
a11 a22 x + a12 (b2 a21 x) = b1 a22 ,
from which we obtain
(a11 a22 a21 a12 ) x = b1 a22 b2 a12 ,
as before. Similarly we can eliminate x to obtain
(a11 a22 a21 a12 ) y = a11 b2 a21 b1 .

(2)

The original equations will have a unique solution provided


a11 a22 a21 a12 = 0.
If we write the equations in the form AX = B, then the expression a 11 a22 a21 a12 is constructed
from the elements of the matrix A, and it is this expression which determines whether the equations
have a unique solution and hence, whether the matrix is singular.

3.1

Denitions

41

Denition 3.1
Let A be a 2 2 matrix. The number
a11 a22 a21 a12
is called the determinant of the matrix and is written as det A or as

a11

a21


a12
.
a22

Using this terminology, the 2 2 matrix A is singular if det A = 0. This criterion is very
easy to apply. We compute the determinant by multiplying the elements on the main diagonal and
subtracting the product of the elements on the other diagonal.
The problem now is to extend this criterion to larger matrices. It is by no means obvious
how to do this. We shall begin with the 3 3 case and shall again consider a set of simultaneous
equations written in subscript notation.
a11 x + a12 y + a13 z = b1

(3)

a21 x + a22 y + a23 z = b2


a31 x + a32 y + a33 z = b3

(4)
(5)

We could simply solve these by elimination and look for patterns in the solutions, but this generates
a very large amount of algebra and the pattern is dicult to extract. Instead, we shall use a method
similar to that used above for two equations in two unknowns. Some properties of 22 determinants
will also be required. We shall later extend these properties to determinants of any size, but they
are easy to check in the 2 2 case.
1. If each element in one column of the determinant is a sum, then the determinant is a sum of
determinants. Thus




a11 + b1 a12 a11 a12 b1 a12
=
+


a21 + b2 a22 a21 a22 b2 a22 .
2. If each element in one column of a determinant is multiplied by a constant, then the determinant is multiplied by the constant. Thus




ka11 a12



= k a11 a12 .
ka21 a22
a21 a22
3. If the columns of a determinant are interchanged, then the determinant changes sign. Thus






a12 a11
= a11 a12 .

a21 a22
a22 a21
We return to the three equations in three unknowns. If we assume x is known, we can use
equations (4) and (5) to solve for y and z in terms of x. The equations to be solved are
a22 y + a23 z = b2 a21 x,
a32 y + a33 z = b3 a31 x.

42

Determinants

These can be solved by elimination as a set of two equations in two unknowns. Using our earlier
results together with the three properties of 2 2 determinants, we obtain

a22

a32

a22

a32





b2 a21 x a23 b2 a23 a21
a23




y=
=
a33
b3 a31 x a33 b3 a33 a31




a22 b2 a21 x a22 b2 a21
a23
+
=
z =
a32 b3 a31 x a32 b3 a31
a33


a23
x,
a33

a22
x.
a32

Notice that in the last line, we have interchanged the columns in the nal determinant and so the
sign has changed. If the equations are written in matrix form as AX = B, then the elements in
each determinant multiplying x, y or z in these expressions, are in the same column order as they
are in A. To achieve this requires the interchange of columns in the nal determinant.
To use these results, we return to equation (3) and multiply it by


a22 a23


a32 a33 .
The above two results can then be substituted into the resulting equation to obtain







a22 a23
a21 a23
a21 a22
a12



a11
a31 a33 + a13 a31 a32 x
a32 a33






a22 a23
b2 a23
b2 a22





.
= b1
a12
+ a13
a32 a33
b3 a33
b3 a32
Similar equations can be obtained for y and z. These results are the three dimensional analogue of
equations (1) and (2). The results are much more complicated than before, but the interpretation
is the same. The equations will have a unique solution provided






a22 a23
a21 a23
a21 a22





= 0.
a11
a12
+ a13
a32 a33
a31 a33
a31 a32
This expression provides

a11

det A = a21
a31

us with the denition of the determinant of a 3 3 matrix, A.









a12 a13
a22 a23
a21 a23
a21 a22







a22 a23 = a11
a12 a31 a33 + a13 a31 a32 .
a
a
32
33
a32 a33

The pattern in this denition can be described as follows. To calculate the determinant, we
work our way across the top row. We multiply each element by the 2 2 determinant obtained
by crossing out the row and column containing that element. Notice that this requires that the
columns in each 2 2 determinant maintain the same order of elements as the columns of A. We
then add the results together, but with alternating signs for the terms. For example, for the element
a12 we cross out the rst row and the second column of the determinant to obtain the required 2 2
determinant. The element a12 is multiplied by the value of this determinant with an appropriate
sign attached. In this case the sign is negative.

a11

a21

a31

a12
a22
a32


a13
a23
a33

3.1

Denitions

43

Example 3.1
Consider the matrix A given by

1 2
5
A = 2
3 1 .
5
4
3
The determinant is calculated as follows.


1 2
5

det A = 2
3 1
5
4
3





3 1
2 1
2




= 1
(2)
+ 5


4
3
5
3
5


3
4

= 1 13 + 2 11 + 5 (7)
=0
In Example 2.11, we showed that the matrix A is singular. We shall later show that a matrix is
singular precisely when its determinant is zero and so the present result veries the earlier one.
The calculations in the two cases are however quite dierent in appearance.

It is important to remember the alternating signs in the expansion of the determinant. It is
also important to remember that a determinant is a number and not an array of numbers. To
describe the method for calculating a determinant, the following terminology is used.

Denition 3.2
Let A be an n n matrix and let aij be an element of A. The cofactor of aij , denoted
by Aij , is the (n 1) (n 1) determinant obtained by
1. crossing out the ith row and jth column of det A, and
2. multiplying the resulting determinant by (1) i+j .

There is some ambiguity in the notation, as there are situations where the elements of the
matrix A are denoted by Aij . The context however, always makes clear which meaning is intended.
Using this notation, the denition of the determinant of a 3 3 matrix A can be written as
det A = a11 A11 + a12 A12 + a13 A13 .
This same pattern of denition of the determinant in terms of cofactors persists for square matrices
of any size. Just as in the 3 3 case, the required denition can be found by analysing the solutions
of sets of simultaneous linear equations.

44

Determinants

Denition 3.3
Let A be an n n matrix. The number
a11 A11 + + a1n A1n ,
where Aij is the cofactor of aij , is called the determinant of A and is denoted by det A,
or by


a11 a1n
.
..
.
. .
.


an1 ann

This denition tells us how to calculate a determinant in terms of progressively smaller determinants. Thus a 4 4 determinant requires the calculation of four 3 3 determinants, each of
which requires the calculation of three 2 2 determinants.
Example 3.2

2 1
1

3
1
2

4
0
2


1 1 2



4
3
1


5
2
= 1 0
3
1 2

1



2
5
1


3 2 4
2
1 2
1


2
3


+ (1) 4
0
1 1


5
3
1




2
5
3
1

3 4 4
0
2


1
1 1 2

= 1 31 2 (35) + (1) (17) 4 30


= 2

As the matrices become larger, such calculations become very tedious to perform by hand,
but they are ideally suited to computers. Procedures for calculating determinants are contained
in all computer algebra systems. For example, the Mathematica command for the calculation of
a determinant is Det. There are however various properties of determinants which enable the
calculations to be simplied. For a large determinant such simplications are important even in
computer calculations. We shall consider these properties in the next section.

3.2 Properties of Determinants.


Determinants have a large number of properties which are used for calculating their values
and for establishing results about square matrices. In this section we shall consider some of these
properties and their application to the calculation of the value of a determinant. In the next section
we shall consider their application to another method for calculating the inverse of a matrix.

3.2

Properties of Determinants

45

We shall not seek to prove the properties but only to make them plausible with examples.
Some of the properties are trivial for 2 2 determinants, and so we shall use 3 3 determinants as
examples.
Property 1. A determinant may be calculated by using a cofactor expansion along any row or
column.
For a 3 3 determinant we can check particular cases of this result by carrying out the
expansion and checking that all the same terms appear. Thus, expanding along the rst row using
the denition,


a11 a12 a13


a21 a22 a23 = a11 A11 + a12 A12 + a13 A13


a31 a32 a33
= a11 a22 a33 a11 a32 a23 a12 a21 a33 + a12 a31 a23 + a13 a21 a32 a13 a31 a22 ,
while expanding down the third column gives


a11 a12 a13


a21 a22 a23 = a13 A13 + a23 A23 + a33 A33


a31 a32 a33
= a13 a21 a32 a13 a31 a22 a23 a11 a32 + a23 a31 a12 + a33 a11 a22 a33 a21 a12 .
It is easy to check that the same terms appear in both expansions.
One use for this result is that in calculating a determinant containing zeros, we can expand
along the row or column containing the most zeros.
Example 3.3
Consider the determinant



4
0
3
1


2
2
4
1

.
0
5
3 1


4
2
0
1
The obvious way to calculate this determinant is to expand down the third column. Since this
column contains only one nonzero element, we shall only have one nonzero term in the expansion.




4
0
3
1
1
4
3



2
2
4
1
5 + 0 + 0 = 174


= 0 + 2(1)2+3 3 1
0
5
3 1

4
2
1


4
2
0
1
Property 2. The result of a cofactor expansion using the elements of one row and the cofactors
of a dierent row is always zero. The same result also holds for columns.
As an example, suppose we expand a general 3 3 determinant using the elements of the
second row and the cofactors of the third row. The result is


a11 a12 a13


a21 a22 a23 = a21 A31 + a22 A32 + a23 A33


a31 a32 a33
= a21 a12 a23 a21 a22 a13 a22 a11 a23 + a22 a21 a13 + a23 a11 a22 a23 a21 a12
= 0.

46

Determinants

Example 3.4
Consider the determinant


1

4

3


2 1
2
1 .
5 2

The cofactor expansion using elements of the third column and cofactors of the second column is
as follows.
a13 A12 + a23 A22 + a33 A32











1
1+2 4
2+2 1 1
3+2 1 1
+
1(1)
+
(2)(1)
= (1)(1)
3 2
3 2
4
1
= 11 + 1 + 10
=0
The next three properties are the general forms of the properties of 2 2 determinants introduced in Section 1.
Property 3. If each element in one row or one column of a determinant is a sum of numbers, then
the determinant may be written as a sum of determinants.
As a particular example, suppose that each element of the third row of a 3 3 determinant is
a sum. Then using a cofactor expansion along the third row we have



a11
a
a
12
13


a21
a22
a23 = (a + b)A31 + (c + d)A32 + (e + f )A33

a + b c + d e + f
= (aA31 + cA32 + eA33 ) + (bA31 + dA32 + f A33 )



a11 a12 a13 a11 a12 a13



= a21 a22 a23 + a21 a22 a23 .
a
c
e b
d
f
Property 4. If each element of one row or one column of a determinant is multiplied by a constant
then the value of the determinant is multiplied by that constant.
As a particular example, suppose that each element of the second column of the general 3 3
determinant is multiplied by k. Then using a cofactor expansion down the second column gives

a11

a21

a31

ka12
ka22
ka32


a13
a23 = (ka12 )A12 + (ka22 )A22 + (ka32 )A32
a33
= k(a12 A12 + a22 A22 + a32 A32 )


a11 a12 a13


= k a21 a22 a23 .
a31 a32 a33

3.2

Properties of Determinants

47

Property 5. If two rows or two columns of a determinant are interchanged, then the value of the
determinant changes sign.
Suppose we interchange the rst and third rows of the general 3 3 determinant.Then using
the properties of 2 2 determinants we have

a31

a21

a11

a32
a22
a12








a33
a22 a23
a21 a23
a21 a22







a32
+ a33
a23 = a31
a12 a13
a11 a13
a11 a12

a13







a22 a23
a21 a23
a21 a22






+ a32
a33
= a31
a12 a13
a11 a13
a11 a12







a12 a13
a11 a13
a11 a12






a32
+ a33
= a31
a22 a23
a21 a23
a21 a22


a11 a12 a13


= a21 a22 a23 .
a31 a32 a33

Property 6. If two rows or two columns of a determinant are the same, then the value of the
determinant is zero.
This is an immediate consequence of the previous property. Suppose we have a determinant
whose value is V and which has two rows equal. If we interchange the two equal rows then the new
determinant has value V . But the new determinant is the same as the old and so its value is V .
Thus
V = V
and this can only be true if V = 0.
Property 7. Adding a multiple of one row to another row leaves the value of the determinant
unchanged. A similar property holds for columns.
This is a consequence of properties 3, 4 and 6. For example, suppose we add k times row 3 to
row 2 in the general 3 3 determinant.


a11

a21 + ka31


a31

a12
a22 + ka32
a32


a11
a13

a23 + ka33 = a21
a31
a33

a11

= a21
a31

a12
a22
a32
a12
a22
a32



a11
a13


a23 + k a31
a31
a33

a13
a23
a33

a12
a32
a32


a13
a33
a33

These properties provide a method for simplifying the calculation of the value of a determinant.
We rst use row and column operations to produce a row or column containing a large number

48

Determinants

of zeros. We then calculate the determinant by using a cofactor expansion along that row or
column. The procedure is similar to Gaussian reduction on an array of numbers, but we have more
exibility with the calculation of a determinant because we can use column operations as well as
row operations. As an example, consider the determinant


2 3 3


4 1 6 .


1 2 3
which we shall denote by D. The rst column contains a 1 in the (3, 1) position and this can be
used to reduce the other elements in the rst column to zero.


0 1 3


D = 0 7 6
1
2
3

(R1 R1 2R3)
(R2 R2 4R3)

We can now expand down the rst column to obtain




1 3

= 15.
D=
7 6
In hand calculations we usually select a row or column containing a 1 and then use this element
to reduce the other elements in the row or column to 0. The cofactor expansion along that row or
column is then easy to carry out. For a 3 3 determinant, it is often simplest to expand out the
determinant directly, but this is rarely the case for larger determinants.
Example 3.5

2

6

1

2

3
3
2
0

4
2
4
1


2 10

0 2
=
3 9

5
0


3
4 18

3 2
10

2
4 17

0
1
0


10
3 18

= (1)4+3 2
3
10
9
2 17


8

0
28



= 2
3
10
9
2 17


2
0
7

= 4 2
3
10
9
2 17
= 4 (2(71) 7(23))
= 76

(C1 C1 + 2C2)
(C4 C4 5C3)

(expand by R4)

(R1 R1 R2)
(factor from R1)

(expand by R1)


3.3

The inverse of a matrix

49

3.3 The inverse of a matrix.


Determinants and their properties provide another method for computing the inverse of a
matrix. This method provides useful information about the inverse of a matrix, but it is not as
numerically ecient as the method of Gaussian reduction. The key is provided by properties 1
and 2 in the previous section. To use these properties we form a matrix out of cofactors and then
multiply it by the original matrix.

Denition 3.4
Let A be a square matrix. The cofactor matrix of A, denoted by A c , is the matrix
formed by replacing each element of A by its cofactor.

Example 3.6

Let

1
A = 2
3
Then
A11
A21


1

= (1)
2


2+1 2
= (1)
2
1+1

1
3.
4

2
1
2


3
= 10,
4

1
= 10,
4

and so on. The cofactor matrix is found to be

10
1
7
Ac = 10
1 8 .
5 1
3

In order for the matrix product Ac A to contain cofactor expansions we must interchange the
rows and columns of Ac .

Denition 3.5
Let A be a square matrix. The transpose of the cofactor matrix of A is called the
adjoint of A and is denoted by Aa . Thus
Aa = (Ac )T .

50

Determinants

We next calculate the product Aa A, for the case of the general 3 3 matrix. Using properties
1 and 2 from the previous section we nd that

A11 A21 A31


a11
Aa A = A12 A22 A32 a21
A13 A23 A33
a31

det A
0
0
= 0
det A
0
0
0
det A

a12
a22
a32

a13
a23
a33

= (det A)I3 .
The det A terms on the main diagonal of the product come from cofactor expansions down the
columns of A, while the remaining terms all yield a cofactor expansion using the elements of one
column and the cofactors of another. These terms are thus all zero. We can write the result of this
calculation as
1
Aa
A1 =
det A
and so we have obtained an explicit expression for the inverse of a square matrix. This expression
can be used to establish properties of the inverse of a matrix. For example a matrix will have an
inverse if and only if its determinant is zero. As a computational technique however, this method
for calculating the inverse of a matrix is very inecient. To calculate the inverse of a 3 3 matrix
for example requires the calculation of nine 2 2 determinants. The method of Gaussian reduction
is much more ecient.
Example 3.7

Let

A= 2
3

2
1
2

1
3.
4

The cofactor matrix was calculated in the previous example. The adjoint matrix is given by

10 10 5
1 1 .
Aa = 1
7 8
3
The determinant of A must also be calculated.

1 2

det A = 2 1
3
2
The inverse is thus
A1


1
3 = 5.
4

10 10 5
1
1
Aa = 1
1 1 .
=
det A
5
7 8
3

This result was previously obtained by Gaussian reduction in Example 2.9.

3.4

Cramers Rule

51

3.4 Cramers Rule.


We introduced the denition of a determinant by considering the solutions of sets of simultaneous linear equations. In Section 3.1, we began with the set of equations
a11 x + a12 y = b1 ,
a21 x + a22 y = b2
and we obtained the result
(a11 a22 a21 a12 ) x = b1 a22 b2 a12 .

a11

a21

This can be written as



b1
a12

x
=
b2
a22


a12
a22

and so, if a11 a22 a21 a12 = 0,

A similar calculation yields


b
x = 1
b2


a12 a11
a22 a21


a12
.
a22


a
y = 11
a21


b1 a11
b2 a21


a12
.
a22

There is a denite pattern here. Each unknown is a quotient of two determinants. The
denominator in each case is det A, while the numerator is det A with one column replaced by the
column of constants from the right hand side of the equations. For the rst unknown we replace
the rst column and for the second unknown we replace the second column. This same pattern
carries over into higher dimensions. For three equations in three unknowns,
a11 x + a12 y + a13 z = b1
a21 x + a22 y + a23 z = b2
a31 x + a32 y + a33 z = b3 ,
we showed in Section 3.1 that







a22 a23
a21 a23
a21 a22
a12



a11
a31 a33 + a13 a31 a32 x
a32 a33





a22 a23
b2 a23
b




a12
+ a13 2
= b1


a32 a33
b3 a33
b3
This result can be written

which gives


a11

a21

a31

b1

x = b2
b3

a12
a22
a32
a12
a22
a32



b1
a13


a23 x = b2
b3
a33

a11
a13


a21
a23


a31
a33

a12
a22
a32
a12
a22
a32


a13
a23
a33

a13
a23 .
a33


a22
.
a32

52

Determinants

Similar results are obtained for y and z.


This form of the solution, with each unknown written as a quotient of determinants, is called
Cramers Rule. It provides a convenient formula for the solutions, and this formula is often used to
establish properties of the solutions. It is not however an ecient method for actually calculating
the solutions. For this purpose, Gaussian reduction is preferable, as fewer arithmetic operations
are required to obtain the solutions. In this sense Cramers rule is similar to the cofactor matrix
method for nding the inverse of a matrix. In each case we obtain a formula for the required
quantity, but the formula is not an ecient computational tool.
In evaluating the determinants required for the calculations using Cramers Rule, all of the
methods discussed earlier for the simplication of determinants may be used. These methods are
essentially Gaussian reductions, and, if we are to use these methods, then the equations should
have been solved by Gaussian reduction to begin with. We conclude with an example of the use of
Cramers rule.
Example 3.8
Solve the set of equations
2x y + 3z = 1,
x 2y + z = 2,
3x + 2y + 4z = 3.
If the equations are written in matrix form as AX = B, then


2 1
3

det A = 1 2
1 = 5.
3
2
4
By Cramers rule we then have


1 1
3

45
1
1 =
= 9,
x = 2 2
5
5
3
2
4


2 1
3
1
2
y = 1
2
1 = ,
5
5
3
3
4


2 1 1


31
1
2 = .
z = 1 2
5
5
3
2
3

3.5 Problems.
1. Calculate each of the following determinants using the basic denitions of 2 2 and 3 3
determinants.

(i)


3

1


2
4

(ii)



1

3


2 1

(iii)



1 5


3 4

3.5

Problems

(iv)

53


1 4

2 1

3 2


8
5
4


4 1

(v) 3
2
1 2


2
4
3

(vi)


5

7

9


2
2
3
2
6 5

2. Calculate each of the following determinants by a cofactor expansion along a row and then by
a cofactor expansion down a column.

(i)


3

2

5


1
3
5 3
4 1

(ii)


3

1

4


1 2
4 5
0
1

(iii)


1
2

1 2

0
3

2
0

0
4
1
3


3

5

4

1

3. Calculate each of the following determinants using row and column operations to rst simplify
the determinant.

(i)

(iv)

(vii)


1

2

1

1

2

3

1

1

2

1

3

2


2
5
1
3
1 2
1
2
1
2
1
8
1 17
2
4
2
4
3

(ii)

3

3

5

1


2
5

3 1

2
1


7
2
3


4
1
2

1
1 2
(v)
2 1
1

3 1 2

(iii)

2

3

2

1

(vi)


2
3

2 1

4
2


2
3
1


9

18

30

24

17
33
54
46

13
28
40
37


4

8

13

11


1
3
2

3 2
1

3
3 1

2
5
2

1 1
1

4. Determine whether each of the matrices in Problem 4 of Chapter 2 is singular or nonsingular


by calculating a suitable determinant.
5. Use Cramers rule to solve, where possible, each of the sets of equations in Problem 1 of
Chapter1.
5. Use the cofactor matrix method to check several elements in the inverse of each of the nonsingular matrices in Problem 4 of Chapter 2. Except for the rst of these matrices, the cofactor
matrix method for nding the inverse results in a very lengthy calculation.
6. Verify each of the following properties for the matrices of Problem 3 of Chapter 2. The two
equalities can be shown to be true for any square matrices of the same size. The other two
results are inequalities which hold for most matrices, although there are some special cases
where the results become equalities. In the last result, 2 may be replaced by any nonzero
constant.

54

3
(i)

det(AB) = det A det B

(ii)

det AT = det A

(iii)

det(A + B) = det A + det B

(iv)

det(2A) = 2 det A

Determinants

CHAPTER 4

Eigenvalues and Eigenvectors

4.1 Diagonalisation of Matrices.


In Section 2.2, we introduced a problem concerning epidemics. Consider a country in which
an epidemic is raging. Each month 40% of the well become sick, 40% of the sick die and 20% of
the sick recover. Letting the numbers of well, sick and dead after n months be w n , sn and dn
respectively, we showed in Section 2.2 that the numbers in each category after one month are given
by

w1
3/5 1/5 0
w0
s1 = 2/5 2/5 0 . s0 ,
d1
d0
0
2/5 1
where the subscript 0 denotes the initial number. Let the vector of population numbers after n
months be denoted by pn and the transition matrix be denoted by T . Then p 1 = T p0 , where p0 is
the initial state of the population. The same matrix gives the transition to the second month and
so
p2 = T p 1 = T T p 0 = T 2 p0 .
This argument applies to every month and so in general
pn = T n p0 ,
where powers of a matrix have been denoted by normal superscript notation. The multiplication
involved in the calculation of the powers is of course matrix multiplication as dened in Chapter 2.
Suppose our problem is to predict the state of the population after 12 months. This requires the
calculation of the twelfth power of T and this is a lengthy calculation. There are however methods
which enable powers of a matrix to be calculated much more eciently than by simply carrying
out the matrix multiplications. These methods have many other uses in science and engineering
and are an essential part of the elementary theory of matrices.
There are two keys to the method for calculating powers, and the rst is the observation that
the powers of some matrices are easy to calculate.

Denition 4.1
A square matrix A is called diagonal if a ij = 0, whenever i = j, that is, if all elements
not on the main diagonal are zero.

56

4 Eigenvalues and Eigenvectors


Powers of a diagonal matrix are easily found. Consider the matrix

1
0
0
C = 0 1
0.
0
0
2

Then

1
0
C 2 = 0 1
0
0

0
1
0
0 0 1
2
0
0

2
0
1
0 = 0
0
2

and similarly for higher powers. For example



5
1
0
0
0
1
C 5 = 0 (1)5 0 = 0 1
0
0
0
0
25

0
(1)2
0

0
0
22

0
0.
32

The second key is the observation that a matrix transformation of the form
C = B 1 AB
interacts in a simple way with powers. We have
C 2 = B 1 ABB 1 AB = B 1 A2 B,
and in general for any positive integer, n,
C n = B 1 An B.
Now suppose that we could choose B so that C is diagonal. Then powers of C would be easy to
calculate and we could nd powers of A from
BC n B 1 = BB 1 An BB 1 = IAn I = An .
Example 4.1
As an example of the above procedure, let

6
2
A= 1
1
8 2

5
1.
7

In Section 4.4 we shall show that one possible choice for B is

1 1 1
B= 1
0
1.
1
1
2
As yet of course we have no way of nding B. Indeed a large part of our work in this chapter is
concerned with methods for nding such matrices. The inverse of B is calculated in the usual way
as

1
1 1
1 2
B 1 = 3
1
0
1

4.2

Diagonalising a 22 Matrix

57

and a simple calculation then gives

1
0
1

C = B AB = 0 1
0
0

0
0.
2

In fact we shall nd later that the method used to nd B also tells us C and so this last calculation
is unnecessary, except as a check of the working. The calculation of, for example, A 5 is now easy
and requires only two matrix multiplications. The same would hold for any power of A.
A5 = BC 5 B 1

1 1 1
1
0
= 1
0
1 0 1
1
1
2
0
0

36
2 35
= 31
1
31 .
68 2
67

0
1
0 3
32
1

1 1
1 2
0
1


Denition 4.2
Let A be a square matrix, B be a nonsingular square matrix of the same size and
C = B 1 AB.
1. The transformation of A into C is called a similarity transformation. The matrices
A and C are said to be similar.
2. If C is a diagonal matrix, then the similarity transformation of A into C is called the
diagonalisation of A.

The problem which remains is that of nding the matrix B for a given matrix A. In the next
section we shall investigate this problem in the case of 2 2 matrices.

4.2 Diagonalising a 22 Matrix.


Let A be a 2 2 matrix. We shall use double subscript notation for the elements of A,


a11 a12
A=
.
a21 a22
The objective is to nd a nonsingular matrix B such that the matrix
C = B 1 AB
is diagonal. We shall write C in the form
C=

1
0

0
2


.

58

4 Eigenvalues and Eigenvectors

Since the elements of B are unknowns which we shall obtain as solutions of a set of equations, we
shall write B in the form


p q
B=
.
r s
The relation between the matrices B and C may be written as AB = BC, or


 


a11 a12
p q
p q
1 0
=
.
0 2
a21 a22
r s
r s
If we carry out the matrix multiplications and equate elements of the resulting matrices, we obtain
a11 p + a12 r = 1 p,
a21 p + a22 r = 1 r,
a11 q + a12 s = 2 q,
a21 q + a22 s = 2 s.
The unknowns in these equations are p, q, r, s, 1 and 2 . The rst two results can be written in
matrix form as

 
 
a11 a12
p
p
= 1
.
a21 a22
r
r
Similarly the second two results may be written as
 
 

q
q
a11 a12
= 2
.
a21 a22
s
s
If we consider the matrix equation


a11
a21

a12
a22


v = v,

or Av = v, where the unknowns are the vector v and the number , then the elements of the
matrices B and C can be obtained from the solutions of this equation. The values of give the
elements of C and the vectors v give the columns of B.
The equation Av = v can be written as
 
0
Av v = ,
where =
.
0
To solve the equation we try to factorise it as
(A )v = ,
but this does not make sense, as A is a matrix and is a number. To overcome this diculty, we
insert the 2 2 unit matrix, writing the matrix equation as
Av = Iv.
Then we obtain
Av Iv =

4.2

Diagonalising a 22 Matrix

59

and this can be factorised to give


(A I)v = .
Written out in full, this equation is


a11
a21

a12
a22



v1
v2

 
0
=
.
0

If the matrix A I has an inverse, then there is only one solution for v and this is easily
obtained as
v = (A I)1 = .
This zero solution however must be excluded because it would result in a column of zeros in the
matrix B. The determinant of B would then be zero and so B would be singular. We must
accordingly impose the condition that the matrix (A I) is singular, and this enables us to nd
the possible values of . If the matrix (A I) is singular, then its determinant must be zero. This
gives a quadratic equation for .
Example 4.2


Let
A=

1 5
2 4

Then
A I =


=

1 5
2 4


.


1
2

5
4

1 0
0 1


and solving the equation det(A I) = 0, gives




1
5

= 0,
2
4
(1 )(4 ) 10 = 0,
2 5 6 = 0,


= 1, 6.

The values of give the elements of the diagonal matrix C. To nd the elements of B, we
return to the equation (A I)v = , and solve for v, for each value of separately. The resulting
vectors are the columns of B. For each value of , there are an innite number of solutions for v,
because any multiple of a solution is again a solution. This is easily seen. Let v be a solution and
c be a constant. Then
A(cv) = cAv = cv = (cv)

and so

(A I)(cv) = .

For each value of however, only one solution for v needs to be found. The other solutions are
simply multiples of it. The solutions for v provide the columns of the matrix B and the fact that
there are an innite number of solutions for each value of implies that the matrix B is not unique.
There are many possible matrices which can be used to diagonalise the matrix A by a similarity

60

4 Eigenvalues and Eigenvectors

transformation. While we can use any of the vectors v for each , it is essential that the rst
column of B is a vector corresponding to 1 and that the second column is a vector corresponding
to 2 .
Example 4.3
Continuing the previous example, we rst substitute = 1 into the equation (A I)v = .
The result is

   
2 5
v1
0
=
2 5
0
v2
and this gives the single equation
2v1 + 5v2 = 0.
We can choose any value for v2 and then solve for v1 . Let v2 = a. We then obtain
v1 = 25 a


and so
v=

25 a
a

is a solution for any value of a. We only need one solution and a convenient choice is a = 2, which
gives


5
v=
.
2
The same calculation is next performed with = 6. The equation (A I)v = 0 becomes


5
5
2 2



v1
v2

 
0
=
.
0

Again this yields only one equation for v 1 and v2 ,


v1 + v2 = 0.
We can choose any value for v2 and putting v2 = a gives v1 = a. Thus
 
a
v=
a
is a solution for any value of a. Taking a = 1 gives one solution as
 
1
v=
.
1
The matrices B and C can now be constructed. The rst column of B is a solution v for
= 1 and the second column of B is a solution v for = 6. Using the results above gives




5 1
1 0
B=
,
C=
.
2 1
0 6
It is easily checked that C = B 1 AB.

4.3

Solution of Polynomial Equations

61

The fact there there are an innite number of solutions of the equation (A I)v = for v,
results from the fact that the matrix A I is singular. The corresponding set of homogeneous
linear equations must then be redundant and so must have an innite number of solutions. Another
way to see this is that if v is a vector which satises (A I)v = , then any multiple of v will
also satisfy the same equation.
One of the key steps in diagonalising a 2 2 matrix is to solve the quadratic equation
det(A I) = 0. This same equation must be solved to diagonalise a higher dimensional matrix, but in this case, the equation is a higher degree polynomial rather than a simple quadratic.
Accordingly, before extending the method to higher dimensional matrices, we must consider solutions of polynomial equations.

4.3 Solution of Polynomial Equations.


Polynomial equations of degree 2 have the form
ax2 + bx + c = 0,

a = 0,

and are easily solved using the formula


x=

b2 4ac
.
2a

Formulas exist which give the solutions of polynomial equations of degrees 3 and 4, but these
formulas are complicated and are rarely used in hand calculations. For equations of degree greater
than 4, it can be shown that there is no formula which gives the solutions. As a result, approximate
numerical methods are almost always used to solve polynomial equations, except for quadratics and
a few other special cases.
There are two general results about polynomial equations which we shall need. These results
are easy to state, but dicult to prove. We are only interested in polynomial equations with real
coecients, but we shall be interested in both real and complex solutions. The rst result is that a
polynomial equation of degree n has exactly n solutions, some of which may be complex and some
may be repeated. This is an elementary fact for quadratics. For example
x2 3x + 2 = 0

has solutions

x = 1, 2,

x 4x + 4 = 0

has solutions

x = 2, 2,

x2 2x + 5 = 0

has solutions

x = 1 2i.

The second result is that for a polynomial equation with real coecients, the complex solutions
occur in complex conjugate pairs. As a result the number of complex solutions is always even and
so any polynomial equation of odd degree must have at least one real solution. A simple example
of the occurrence of complex conjugate pairs is given by the third of the above quadratics.
The detailed calculations involved in solving a polynomial equation are often lengthy, but are
ideally suited to a computer. Computer algebra systems such as Mathematica have these methods
built in and they enable polynomial equations of quite high degree to be solved with considerable
accuracy. Mathematica uses the exact formulae to solve equations of degree less than 5, provided
the equation is specied with exact coecients. The command is Solve.
Example 4.4

62

4 Eigenvalues and Eigenvectors

1. The solution of the quadratic x2 2x + 4 = 0.


In[1]:= Solvex ^ 2  2x  4  0, x
Out[1]= x  1  I



3 , x  1  I 3 

2. The solution of the cubic x3 + x2 + 2x + 1 = 0.


In[2]:= Solvex ^ 3  x ^ 2  2x  1  0, x

1
5 
2


Out[2]= x      
 
 
3

3  11  3

1 3

69

1
1
 1 3
   11  3 69  ,
3
2


5 1  I 3 
1
1
1

 1 3
 
,
x      1  I 3   11  3 69 
 1 3
3
6
2
3 22 3 11  3 69 

5 1  I 3 
1
1
1

 1 3
x      1  I 3   11  3 69 
 

 1 3
3
6
2
3 22 3 11  3 69 

3. The solution of the quintic x5 + 3x4 + 2x + 1 = 0 cannot be obtained exactly.


In[3]:= Solvex ^ 5  3x ^ 4  2x  1  0, x
Out[3]= x  Root1  2 #1  3 #1 4  #1 5 &, 1, x  Root1  2 #1  3 #1 4  #1 5 &, 2,

x  Root1  2 #1  3 #1 4  #1 5 &, 3, x  Root1  2 #1  3 #1 4  #1 5 &, 4,


x  Root1  2 #1  3 #1 4  #1 5 &, 5


The output in the third case tells us that the equation has ve solutions, each of which is a
solution of the equation. In the output, the left hand side of the equation has been written in
the Mathematica notation for pure functions. This is a variant of the slot notation for functions,
where the variable is written as a dot, (). In Mathematica, the dot is replaced by a hash, #. The
ampersand, &, tells Mathematica that the formula for the function has ended.
Except for quadratics and a few special higher order equations, the exact solutions of polynomials are of little use. An example is provided by the output in the second of the above examples.
Notice however that the number of solutions is the same as the degree of the equation and that the
complex solutions occur in complex conjugate pairs.
To obtain approximate numerical solutions of polynomial equations, we use the command
NSolve in place of Solve. Both of these commands apply only to polynomial equations. As we are
not dealing here with numerical methods, we shall make no attempt to explain what Mathematica
is doing in the following calculations.
Example 4.5
1. The cubic equation x3 + x2 + 2x + 1 = 0.
In[4]:= NSolve x ^ 3  x ^ 2  2x  1  0, x
Out[4]= x  0.56984 , x  0.21508  1.30714 I, x  0.21508  1.30714 I

2. The quintic equation x5 + 3x4 + 2x + 1 = 0.

4.4

Eigenvalues and Eigenvectors of a Matrix

63

In[5]:= NSolve x ^ 5  3x ^ 4  2x  1  0, x


Out[5]= x  2.93433 , x  0.61651  0.15727 I, x  0.61651  0.15727 I,

x  0.583674  0.707933 I, x  0.583674  0.707933 I

The command NSolve can take a third argument specifying the number of signicant gures
to be calculated in the solutions.
Example 4.6
In[6]:= NSolve x ^ 5  3x ^ 4  2x  1  0, x, 20
Out[6]= x  2.934328932923543936 ,

x
x
x
x






0.616509536818530000
0.616509536818530000
0.5836740032803019679
0.5836740032803019679

 0.157269571435608290 I,
 0.157269571435608290 I,
 0.7079327484231990624 I,
 0.7079327484231990624 I

With these capabilities, Mathematica is able to produce the solutions of most polynomials that
arise in eigenvalue calculations. As an example, try the following Mathematica commands.
NSolvex ^ 5  3x ^ 4  x ^ 3  2x  1  0, x
NSolvex ^ 5  3x ^ 4  10x ^ 3  6x ^ 2  5x  25  0, x
NSolvex ^ 21  x  1  0, x

The rst equation has three real solutions and one complex conjugate pair of solutions. The second
equation has one real solution together with a repeated complex conjugate pair of solutions and
the third has 21 solutions, only one of which is real.

4.4 Eigenvalues and Eigenvectors of a Matrix.


In Section 4.2, a method was developed for diagonalising a 2 2 matrix. The same method can
be used for a square matrix of any size, the only dierence being that the values of are solutions
of a polynomial equation of degree higher than 2. The terminology which is used to discuss the
method is contained in the following denition.

Denition 4.3
Let A be a square matrix. A number for which there exists a nonzero vector v such
that
Av = v
is called an eigenvalue of A. The vector v is called an eigenvector of A corresponding to
the eigenvalue .

The values of are determined by the requirement that the matrix A I be singular, so that
the equation
(A I)v =

64

4 Eigenvalues and Eigenvectors

can have a nonzero solution for v. This requirement gives


det(A I) = 0,
which is a polynomial equation of degree n, if A is an n n matrix.

Denition 4.4
Let A be a square matrix. The polynomial equation
det(A I) = 0
is called the characteristic equation of A.

Just as in the two dimensional case, the eigenvalues of A are the diagonal elements of the
matrix C, while the eigenvectors of A are the columns of the matrix B. We shall then nd that
C = B 1 AB.
The eigenvalues of A are found by solving the characteristic equation of A. The eigenvectors
can then be found by substituting each eigenvalue into the equation (A I)v = , and solving
for v by Gaussian reduction. In the reduction, a row of zeros always appears because A I is
singular. This shows that the equations are redundant and so have an innite number of solutions.
In constructing the matrix B, the ith column of B must be an eigenvector corresponding to the
eigenvalue i , but it can be any one of the eigenvectors. The matrix C is unique apart from the
order of the eigenvalues down the main diagonal.
In the remainder of this section we shall work through the diagonalisation of a particular 3 3
matrix. The matrix we shall use is

6
2 5
A= 1
1
1.
8 2
7
The characteristic equation is


6

1

8

2
1
2


5
1 = 0
7

and this is expanded as








(6 ) (1 )(7 ) + 2 2 (7 ) 8 5 2 8(1 ) = 0,
which simplies to
(6 )(9 8 + 2 ) 2(1 ) 5(10 + 8) = 0,
54 + 48 62 9 + 82 3 + 2 + 2 + 50 40 = 0,
3 + 22 + 2 = 0,
3 22 + 2 = 0.

4.4

Eigenvalues and Eigenvectors of a Matrix

65

We shall use Mathematica to solve this cubic equation.


In[7]:= Solve ^ 3  2 ^ 2   2  0, 
Out[7]=   1,   1,   2

The eigenvalues of A are thus

= 1, 1, 2.

To nd the eigenvectors, we rst substitute = 1 into det(A I) = . The array to be


reduced is
7
2
1
0
8 2

5
1
6

0
0
0

1
0
0 2
0
0

1
2
0

0
0
0

and the reduced array is

Putting v3 = a, we obtain v2 = a from the second row and then v1 = a from the rst row. Thus

a
a
a
ia an eigenvector for any value of a. Letting a = 1 gives one particular eigenvector as

1
1.
1
Next, let = 1. The array to be reduced is
5
2
1
2
8 2

5
1
8

0
0
0

1
0
0

0
0
0

and the reduced array is


1
0
0

2
12
0

Letting v3 = a, the second row gives v2 = 0 and the rst row gives v1 = a. Thus

a
0
a

66

4 Eigenvalues and Eigenvectors

is an eigenvector for any value of a. Letting a = 1 gives the particular eigenvector

1
0.
1
Finally putting = 2 gives the array
8
2
1 1
8 2

5
1
5

0
0
0

1 1
0 6
0
0

1
3
0

0
0
0

which reduces to

Putting v3 = a, the second row gives v2 = 12 a and the rst row then gives v1 = 21 a. Thus for any
value of a,
1
2a
1a
2

is an eigenvector of A. Taking a = 2 gives one particular eigenvector as

1
1.
2
From these results, we obtain

1 1 1
B= 1
0
1,
1
1
2

1
0
C = 0 1
0
0

0
0.
2

By nding the inverse of B and using matrix multiplication, it is easily checked that
C = B 1 AB.
These are the matrices used in Example 4.1 to diagonalise A.
We have developed a method for diagonalising a matrix and have applied it to several examples.
Unfortunately not every matrix can be diagonalised in this way, and in the next section we shall
consider some examples of matrices which cannot be diagonalised by a similarity transformation.
Before doing so however, we return to the problem which originally led us to investigate the problem
of diagonalisation. At the beginning of this Section, we formulated a simple model for the course of
an epidemic. To predict the course of the epidemic requires the calculation of powers of a matrix.
To calculate powers, we have found that the calculations are easier if the matrix is rst diagonalised
by a similarity transformation.

4.4

Eigenvalues and Eigenvectors of a Matrix

67

In the particular problem of the epidemic, we must calculate A 12 where

3/5
A = 2/5
0

1/5
2/5
2/5

0
0.
1

To diagonalise this matrix, we rst compute the characteristic equation



3/5

2/5


0

1/5
2/5
2/5


0
0 = 0.
1

Expanding the determinant and simplifying gives


253 502 + 29 4 = 0.
Again we shall use Mathematica to nd the solutions.
In[8]:= Solve25 ^ 3  50 ^ 2  29  4  0, 

1
5

4
5

Out[8]=    ,    ,   1

The eigenvalues of A are thus


= 1, 1/5, 4/5.
The eigenvectors can be found by substituting each of the eigenvalues into the equation
(A I)v = , and solving by Gaussian reduction. For the eigenvalue = 1, the array to be
reduced is
2/5
2/5
0

1/5
3/5
2/5

0
0
0

0
0
0

2/5
0
0

1/5
2/5
0

0
0
0

0
0
0

which reduces to

This yields an eigenvector for = 1 as


0
0.
1

After similar calculations with the other two eigenvalues, we obtain the results

= 1,


0
v = 0;
1

1
v = 2 ;
1

= 1/5,

1
v = 1 .
2

= 4/5,

68

4 Eigenvalues and Eigenvectors

The matrices B and C can now be assembled,

1
0
0
C = 0 1/5
0 ,
0
0
4/5

0
1

B = 0 2
1
1

1
1 .
2

The twelfth power of the diagonal matrix C is easy to calculate and this enables the twelfth
power of A to be calculated from
A12 = BC 12 B 1 .
If we start with a population in which nobody is sick, then the state of the population after twelve
months is given by
p12 = A12 p0
= BC 12 B 1 p0

12
0
1 1
1

= 0 2 1
0
1
1
2
0

0.046
= 0.046 .
0.908

0
(1/5)12
0

3
3
0

1/3
1 1
0
2 1
(4/5)12


3
1

0
0
0
0

Thus the epidemic has had drastic eects with 90.8% of the original population dead, 4.6% sick
and only 4.6% still well.

4.5 Complex Eigenvalues.


The method introduced in the previous two sections for diagonalising a matrix requires that
the n n square matrix A have n dierent real eigenvalues. If this is not the case, then the above
methods will not produce enough eigenvectors to construct the matrix B and the method will fail.
This will often be the case because the eigenvalues are solutions of a polynomial equation, and
such equations can have both complex solutions and repeated solutions. When there are complex
eigenvalues, the matrix cannot be diagonalised and when there are repeated eigenvalues it may or
may not be able to be diagonalised. In both cases however, the matrix can be transformed into
a simple form which enables powers to be calculated more easily, and which also simplies many
other matrix calculations. In this section we shall consider the case of complex eigenvalues, only
making a few general comments about the more dicult case of repeated eigenvalues.
If we work in complex arithmetic, then we can diagonalise a matrix with distinct complex
eigenvalues in the same way as we have done with matrices with distinct real eigenvalues. The only
dierence is that the diagonal form is complex. As we are working with real matrices, the complex
eigenvalues occur in complex conjugate pairs.
Consider the matrix

4
2 4
A= 2
1
1.
8 3
7
The characteristic equation is found in the usual way to be
3 42 + 6 4 = 0.

4.5

Complex Eigenvalues

69

The solutions are given as follows by Mathematica.


In[9]:= Solve ^ 3  4 ^ 2  6  4  0, 
Out[9]=   1  I,   1  I,   2

The eigenvalues are thus


= 2, 1 i.
To nd an eigenvector for the real eigenvalue 2, we follow the usual procedure and obtain

1
v = 1 .
1
To nd an eigenvector for = 1 + i, rst substitute = 1 + i into (A I)v = . The resulting
set of equations is solved by Gaussian reduction, with the initial array being
5 i
2
8

2
i
3

4
1
6i

0
0
0

The rst step in the reduction is to realise the (1,1) element of the array.
26
2
8

10 + 2i
i
3

20 4i
1
6i

0
0
0

(R1 (5 + i)R1)

We next interchange the rst and second rows to simplify the calculations.
2
26
8

i
10 + 2i
3

1
20 4i
6i

0
0
0

1
7 4i
2i

0
0
0

(R1 R2)

The rst column can now be reduced.


2
0
0

i
10 + 15i
3 + 4i

(R2 R2 13R1)
(R3 R3 4R1)

The second and third rows are multiples of each other and this can be checked by showing that the
2 2 determinant of nonzero elements is 0.
(10 + 15i)(2 i) (3 + 4i)(7 4i) = 0.
Thus the third row can be reduced to zeros, but there is no need to carry out the calculation.
Putting v3 = a, we nd from the third row,
v2 =
=

2i
a
3 + 4i

1
(2 + i) a.
5

70

4 Eigenvalues and Eigenvectors

The rst row then gives


1
(iv2 v3 )
2


1 i
=
(2 + i) a a
2 5

v1 =

1
(3 + i) a.
5
Taking a = 5 gives an eigenvector for 1 + i as

3 + i
v = 2 + i.
5
=

This is a longer calculation than is required for a real eigenvalue, but it does not need to
be repeated for the complex conjugate eigenvalue. The complex conjugate of the eigenvector just
found is an eigenvector for the complex conjugate eigenvalue. Thus we obtain the results
\[
\lambda = 2,\ v = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}; \qquad
\lambda = 1+i,\ v = \begin{pmatrix} 3-i \\ 2+i \\ 5 \end{pmatrix}; \qquad
\lambda = 1-i,\ v = \begin{pmatrix} 3+i \\ 2-i \\ 5 \end{pmatrix}.
\]
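The complex pair can be checked in the same way; a minimal Mathematica sketch:

A = {{-4, -2, 4}, {-2, 1, 1}, {-8, -3, 7}};
A.{3 - I, 2 + I, 5} == (1 + I) {3 - I, 2 + I, 5}
(* True *)
A.{3 + I, 2 - I, 5} == (1 - I) {3 + I, 2 - I, 5}
(* True *)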
The matrices B and C can now be constructed in complex form
\[
B = \begin{pmatrix} 1 & 3-i & 3+i \\ -1 & 2+i & 2-i \\ 1 & 5 & 5 \end{pmatrix},
\qquad
C = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1+i & 0 \\ 0 & 0 & 1-i \end{pmatrix}.
\]
Since A is a real matrix, it would be desirable to transform it into a simplified real form. This
can indeed be done, but the simplified form is not diagonal. It can be shown that the required
simplified real form of the matrix is as follows. The complex block
\[ \begin{pmatrix} a+ib & 0 \\ 0 & a-ib \end{pmatrix} \]
in the diagonal form is replaced by the real block
\[ \begin{pmatrix} a & -b \\ b & a \end{pmatrix}. \]
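That the real block carries the complex conjugate pair of eigenvalues can be checked symbolically; a sketch in Mathematica, which may list the pair in either order:

Eigenvalues[{{a, -b}, {b, a}}]
(* {a - I b, a + I b}, up to ordering *)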
In the above example, the real transformed matrix is
\[ Q = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 1 & 1 \end{pmatrix}. \]
The matrix which produces this simplified form is obtained by using the imaginary part of the
eigenvector as the first of the two relevant columns and then the real part as the second of the
columns. In the above example, the matrix which produces the real simplified form of the matrix
is
\[ P = \begin{pmatrix} 1 & -1 & 3 \\ -1 & 1 & 2 \\ 1 & 0 & 5 \end{pmatrix}. \]


It is readily verified that Q = P^{-1} A P.
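A minimal Mathematica verification, with A and P as above:

A = {{-4, -2, 4}, {-2, 1, 1}, {-8, -3, 7}};
P = {{1, -1, 3}, {-1, 1, 2}, {1, 0, 5}};
Inverse[P].A.P
(* {{2, 0, 0}, {0, 1, -1}, {0, 1, 1}} *)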


We have now found a similarity transformation which yields a real simplified form for any
matrix which has no repeated eigenvalues. The form is often called block diagonal because we can
regard the main diagonal as composed of blocks which are 1 × 1 for a real eigenvalue and 2 × 2 for
a pair of complex conjugate eigenvalues. The general form is thus
\[
\begin{pmatrix}
D_1 & 0 & \cdots & 0 \\
0 & D_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & D_n
\end{pmatrix},
\]
where each D_i is a number or a 2 × 2 block and each 0 is a zero matrix of the appropriate size,
possibly 1 × 1 and so just the number 0. Powers of the block diagonal matrix can be calculated by
simply calculating the powers of each block. We need to consider how to calculate powers of the
2 × 2 matrix
\[ R = \begin{pmatrix} a & -b \\ b & a \end{pmatrix}. \]
Rather than simply computing powers of R and trying to identify a pattern, we can use an alternative approach. Let
\[
I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\qquad
J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.
\]
It is easily verified that
\[ \begin{pmatrix} a & -b \\ b & a \end{pmatrix} = aI + bJ, \qquad \text{and} \qquad J^2 = -I. \]
Thus we can calculate powers of R by calculating powers of aI + bJ. The latter can be calculated
using the binomial theorem, replacing J^2 by −I whenever it appears. These calculations are very
similar to the calculation of powers of the complex number a + ib. Note that I and J commute,
and so there are no problems with non-commuting matrices. For example
\[ I^2 J = I J I = J. \]
In the example we have been using, we can, for example, calculate the cube of A by calculating
the cube of Q. We need the result
\[
\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}^3
= (I + J)^3
= I^3 + 3I^2 J + 3I J^2 + J^3
= I + 3J - 3I - J
= -2I + 2J
= \begin{pmatrix} -2 & -2 \\ 2 & -2 \end{pmatrix}.
\]
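The same result can be obtained by direct multiplication; a one-line Mathematica check:

MatrixPower[{{1, -1}, {1, 1}}, 3]
(* {{-2, -2}, {2, -2}} *)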

We can now calculate A^3. First
\[ Q^3 = \begin{pmatrix} 8 & 0 & 0 \\ 0 & -2 & -2 \\ 0 & 2 & -2 \end{pmatrix} \]


and then
\[ A^3 = P Q^3 P^{-1} = \begin{pmatrix} -20 & -12 & 16 \\ 4 & 6 & -6 \\ -24 & -14 & 18 \end{pmatrix}. \]
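As a cross-check of both calculations, a Mathematica sketch assuming A, P and Q as constructed above:

A = {{-4, -2, 4}, {-2, 1, 1}, {-8, -3, 7}};
P = {{1, -1, 3}, {-1, 1, 2}, {1, 0, 5}};
Q = {{2, 0, 0}, {0, 1, -1}, {0, 1, 1}};
MatrixPower[A, 3] == P.MatrixPower[Q, 3].Inverse[P]
(* True *)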

We have not considered cases where the matrix has repeated eigenvalues, whether real or complex. Matrices with repeated eigenvalues raise difficult theoretical and computational problems.
Nevertheless such matrices have a simplified form which can be obtained by a similarity transformation, and which can be used for calculating powers as well as for other purposes. There are thus
various simplified forms to which a matrix may be transformed by a similarity transformation and
in a particular case the form depends on the nature of the eigenvalues. In all cases, the simplified
form of the matrix is called the Jordan form and this form has many applications in calculations
with matrices.
We conclude with some brief comments about the calculation of eigenvalues and eigenvectors
using Mathematica. The Mathematica command Eigensystem[A] produces the eigenvalues and
eigenvectors of A. The eigenvalues are given as a list, followed by a list of eigenvectors. Mathematica
can also calculate the Jordan form of a matrix. The Mathematica command JordanDecomposition
produces two matrices. The first is the matrix which carries out the similarity transformation and
the second is the Jordan form. Mathematica however produces the complex form of the Jordan
form, and for the matrix with complex eigenvalues considered above, it essentially produces the
matrices B and C, rather than the matrices P and Q. In fact, Mathematica writes the eigenvalues
in the order
\[ 1-i, \quad 1+i, \quad 2, \]
rather than
\[ 2, \quad 1+i, \quad 1-i \]
as we have done, and so its matrices differ from B and C, as given above. This difference reflects
the fact that neither the similarity transformation nor the Jordan form is unique.
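A sketch of these commands for the matrix of this section follows; the full output is not shown, since the scaling and ordering Mathematica chooses for the eigenvectors may differ from the hand calculation:

A = {{-4, -2, 4}, {-2, 1, 1}, {-8, -3, 7}};
Eigensystem[A]            (* {eigenvalues, eigenvectors}; eigenvectors up to scaling *)
{s, j} = JordanDecomposition[A];
j                         (* the complex diagonal (Jordan) form of A *)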

4.6 Problems.
1. For each of the following matrices,
(a) find the characteristic equation,
(b) use Mathematica to find the eigenvalues,
(c) for each eigenvalue, find an eigenvector,
(d) construct the matrices B and C,
(e) find B^{-1},
(f) check that C = B^{-1}AB.
(A Mathematica workflow for these steps is sketched after the list of matrices.)

(i)

(iv)

(vii)

1 3
2
4
3
2
3 4

5
6
12

2
3
5


(ii)

(v)

(viii)

1
2 1
2
1
1
1
1
0

8 2
6
2
0 2
10
2 8


(iii)

2
4
6

4 5
2 1

(vi)

(ix)

2 2
2
3

5
0
6 1
6
0

3 2
1 4
4 0

2
4
2

1
1
2


(x)

0
2
2


1
1
3

2
1
2
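For problems of this kind the following Mathematica workflow may be useful. It is a sketch only: the matrix a and the variable names here are illustrative stand-ins, not taken from the list above.

a = {{1, 2}, {2, 1}};           (* hypothetical example matrix *)
CharacteristicPolynomial[a, x]  (* (a) the characteristic polynomial in x *)
{vals, vecs} = Eigensystem[a];  (* (b), (c) eigenvalues and eigenvectors *)
b = Transpose[vecs];            (* (d) eigenvectors as the columns of b *)
c = DiagonalMatrix[vals];       (* (d) the diagonal matrix of eigenvalues *)
Inverse[b]                      (* (e) *)
Inverse[b].a.b == c             (* (f) check the similarity transformation *)
(* True *)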

2. For each of the following matrices,
(a) find the characteristic equation,
(b) use Mathematica to find the eigenvalues,
(c) for each eigenvalue, find an eigenvector,
(d) construct the real matrices P and Q,
(e) find P^{-1},
(f) check that Q = P^{-1}AP.


(i)

(iv)

1 1
1
1

3 10
2 5


(ii)

(vii)

(x)

4
5
0
5
7
1
10 15
4

3
2 4
2 1
1
2
2 3

(v)

(viii)

4 5
2 2

5
8
12

3
2
4

0 4
3
6
2
9

1
2
1 1
1
3


(iii)

(vi)

(ix)

1 2
5 1

3
1 1
0
2 1
1
1
1

0 2
0
6 3
5
4
1
3

Answers

Section 1.5

(i) x = 1, y = 1
(ii) x = 2, y = 1
(iii) x = 4, y = 3, z = 5
(iv) x = 3, y = 2, z = 4
(v) Inconsistent
(vi) x = (14/11)a, y = (9/11)a, z = a
(vii) x = 1, y = 1, z = 2
(viii) x = 5/2 − (3/2)a, y = −3/2 + (1/2)a, z = a
(ix) x = 1 + 2a, y = 2 + a, z = a
(x) Inconsistent
(xi) Inconsistent
(xii) x = 1 + 5a, y = 5a, z = 1 − 2a, t = a
(xiii) x = 2, y = 1
(xiv) Inconsistent
(xv) x = 1 + a, y = 2 + 3a, z = 1 − a, t = a
(xvi) Inconsistent

Section 2.6


1.

(i)

0
7
7 7

(v)

Undefined

(ii)

3 1
2 3

(vi)

Undefined

(vii)

(x)

(1

(xi)

(iii)



(ix)

1
5
11 6




(xiii)

Undefined

(xiv)

3)

0 7
3 2


(xv)

Undefined

(xviii)

Undefined

(xix)

0 0
0 0

0
5
7 6

7 0
0 7


(iv)

1
3

2
1

 
0
0


(xvii)

1
2
3 1

(viii)

Undefined

(xii)

Undefined



(xvi)


(xx)

6
4

18
12

7 14
21 7


2.

(i)

2
5
6

1
7
9

11
4
7

(iv) Undefined

(ii)

1
3 2
4 1 1
1
3
0

(viii)

2
3
1
(x) 5
0 11
0 10 18

(xiii)

2
1
8


(xvi)

(xi)

(xiv)

4.

6
7

(iv) Nonsingular

7.

(xvii)

(ii)

8
3

(vi)

Undefined

(ix)

Undefined


(xii)

11
5
2 5

11
5

Undefined

5 11
8
4
0 13
5 5 1


(xv)

2 5
(xx) 3 0
1 11

(xviii)

9 16 5
11
0 3

(iii)

Singular

(v) Nonsingular

(vi)

Singular

(i) x = 2, y = 1, z = 5

(i) x = 6, y = 2, z = 5
(iv) x = 4, y = 1, z = 2

(ii)

1 1
4
2
8
3

0
10
18

Nonsingular

(iv) x = 2, y = 1, z = 3

8.

19
13

0
3 4
5 2 2
3
4
0

4 1 7
13 22 11
4 26 38

(i) Nonsingular

(iii)

(xix)

2
1
2

1
0
1
3
2 1


1
7
6

3
7
4 1

5
2
3

(v) Undefined

(vii)

8
13
5

x = 1, y = 2, z = 2

(iii)

x = 3, y = 2, z = 1

(iii)

x = 2, y = 1, z = 3

(v) x = 4, y = 2, z = 1

(ii)

x = 1, y = 0, z = 1

(v) x = 4, y = 2, z = 1


Section 3.5
1.

(i)

10

2.

(i)

3.

(i)

(ii)

(iii)

19

17 (ii)

25

(iii)

110

68

(iii)

(ii)

(iv)

42

(v) 45

(vi)

(iv)

42

(v) 175

(vi)

15

Section 4.6

1.

(i)

B=

(ii)

B=

(iii)

B=


(iv)

B=

3 1
2
1
1 5
1 2

(v)


,

C=


,

2 1
1
2
2 1
1
3

1
1
B = 1 1
2
1


,

C=


,

C=

1
1,
0

(vi)

1 0 2
B = 1 1 0 ,
2 0 3

(vii)

1 0
0 6

C=

1 0 2
B = 1 1 0 ,
1 1 3

(ix)

(x)

1 0
0 2
2
0
0 3

1 1
0 5 ,
1
4

2
2
2 2,
2
2

1
0
C = 0 2
0
0

1
C = 0
0

B= 0
2

1
C = 0
0

1 1
1
B = 1
0 1 ,
1
1 2
0

B = 1
2

(viii)

1 0
0 2

0
0
3

0
1
0

0
0
2

0
1
0

0
0
2

0
0
2
0
0 2

0
0
3

0
C = 0
0

2
0

C = 0 2
0
0

C=

2
0
0

0
0
1+ 2
0
0
1 2

(vii)

220



2.

(i) P =

(ii)

P =

(iii)

P =


(iv) P =

1 0
0 1
1 3
0 2
3 1
0 5


,

Q=


,

Q=

Q=

1 2
0
1

(vii)

(viii)

1
P = 0
2

0
3

3
0

1
0,
1

0
1
0

1 3
1
2,
0
5

2
0,
2

1 0
(x) P = 1
2
0
0

1 2
2 1

1
Q= 0
0

2
Q = 0
0

(ix) P =
1
1

1
1

Q=

5
0
3 1 ,
0
5

1
P = 1
1

1
1

1
0,
1

1 0
(vi) P = 1 1
0 0

1
1

1 3
1
2,
0
5

1
(v) P = 1
1

1
1

0
2
1

0
1
2

0
2
1

0
1
2

1
0
0
Q = 0 1 1
0
1 1

1
0
0
Q = 0 1 1
0
1 1

Q=
0
0

1
Q = 0
0

0
1
1

0
1
1

0
0

1 2

2
1
