
Introduction to Numerical Analysis and Computing with Fortran

M.A. Wongsam and R.W. Chantrell, School of Electronic Engineering and Computer Systems, University of Wales, Dean Street, Bangor, Gwynedd, LL57 1UT

Contents

1 Outline of the Course

2 Lecture 1 - Computing with numbers
   2.1 The computer representation of numbers
   2.2 Errors due to the computer representation of numbers

3 Lecture 2 - Errors arising from arithmetical operations
   3.1 Summary from lecture 1
   3.2 Error accumulation
   3.3 Loss of significance

4 Lecture 3 - Solving Non-Linear Equations
   4.1 Fixed-point iteration
   4.2 Criteria for convergence
   4.3 The Newton-Raphson method

5 Lecture 6 - Systems of linear equations (1)
   5.1 Introduction
       5.1.1 Solvability of systems of linear equations
   5.2 Direct methods
       5.2.1 Definitions
       5.2.2 The backsubstitution algorithm
   5.3 Gaussian elimination
       5.3.1 The reduction algorithm
       5.3.2 Failure of the reduction method
   5.4 Iterative methods
       5.4.1 Jacobi's method
       5.4.2 The Gauss-Seidel method

6 Lecture 7 - Systems of linear equations (2)
   6.1 The algebraic eigenvalue problem
   6.2 The power method
   6.3 Similarity and the QR method
       6.3.1 Householder tridiagonalisation
       6.3.2 QR Factorisation

7 Lecture 10 - Ordinary Differential Equations, Part (1)
   7.1 Introduction
   7.2 Single-step methods
       7.2.1 The Euler method
       7.2.2 The improved Euler (predictor-corrector) method
       7.2.3 Runge-Kutta methods
       7.2.4 4-th order Runge-Kutta algorithm

8 Lecture 11 - Ordinary Differential Equations, Part (2)
   8.1 Multi-step methods
       8.1.1 Implicit formulas
       8.1.2 Explicit formulas
   8.2 The Adams-Moulton predictor-corrector method
       8.2.1 The Adams-Bashforth method
       8.2.2 The Adams-Moulton method

9 Problems

10 Answers to Problems

11 Lecture programme

1 Outline of the Course


These notes have been prepared as a companion to part of an MSc degree run at SEECS, UWB in the spring of 1998. The course is intended for graduate engineers, and assumes a graduate-level knowledge of applied mathematics including calculus, real and complex analysis and linear algebra. For instance, in section 4 reference is made to the mean value theorem and intermediate value theorem from calculus. The intention is to provide a grounding in some of the most useful areas of numerical analysis for physical scientists and engineers, together with some experience of writing implementations of the algorithms using Fortran 90/95. No previous knowledge of Fortran programming is assumed.

The course comprises 24 sessions including programming workshops, and covers the solution of systems of non-linear equations, linear algebra, interpolation, numerical quadrature, and the integration of ordinary and partial differential equations. The programme for the 1997/98 presentation is given in Table 1 in section 11. The assessment also includes two practical projects which enable students to demonstrate their grasp of, and ability to apply, certain topics within the course. In the 1997/98 presentation, the assignments are:

(1) The calculation of switching speeds in hard disc recording. This assignment involves the integration of a system of coupled ordinary differential equations (the Landau-Lifshitz equation), and will include a substantial programming element.

(2) The measurement of coercive force distributions for characterisation of recording media. This assignment involves the numerical differentiation of experimental data, which will be taken from an actual laboratory experiment.

Recommended reading for the course is: Numerical Methods for Mathematics, Science and Engineering, John H. Mathews, Prentice-Hall International Inc., ISBN 0-13-625047-5.

A PostScript copy of these notes is available for downloading from the world wide web at URL http://www.mikewong.demon.co.uk/index.html.

A note on symbols

We use the following special symbols:

   symbol      meaning
   ∃ ... :     there exists ... such that
   ∀ ... :     for all ... we have
   ∈           is a member of the set
   ⊂           is a subset of
   ⟹           implies that
   Z           the set of non-negative integers
   R           the set of all real numbers

2 Lecture 1 - Computing with numbers


In mathematical modelling, analytical solutions will be available in only a small class of problems. These include problems which are highly symmetric, where classical functions are able to describe the behaviour of the solutions, and where the number of degrees of freedom is such that the problem is tractable from the practical standpoint. Where there is a lower degree of symmetry, and many degrees of freedom, one generally has to resort to a numerical solution. The use of numerical mathematics is not limited to mathematical modelling problems. Experimental data is often analysed using sophisticated numerical techniques. We intend to illustrate the use of computing techniques both in modelling and in the treatment of experimental data in two practical assignments: (1) the calculation of switching speeds in hard disc recording; (2) the measurement of coercive force distributions for characterisation of recording media. Wherever numerical mathematics is to be used, one encounters problems of the representation of numbers, which is the subject of lecture 1.

2.1 The computer representation of numbers

Integers: Many real numbers can be represented by finite strings of digits, but most real numbers require infinite strings of digits. Since for practical computations only finite strings of digits can be used, a truncation error is entailed when real numbers are represented in calculations. For example, most computer subroutines for evaluating sin θ, cos θ, etc. first subtract or add 2π from the argument in order to obtain an argument in the range −π ≤ θ ≤ +π. But since π requires an infinite number of digits for its representation, the subtraction entails an error which changes the result.

A non-negative integer N with n digits may be represented according to the following polynomial scheme

    N = Σ_{i=0}^{n−1} a_{n−i−1} 10^{n−i−1},    0 ≤ a_k ≤ 9, a_k ∈ Z

so that the number 257 in this denary representation is

    257 = 2 × 10² + 5 × 10¹ + 7 × 10⁰

Since computers store numbers as states of electronic components, they often use a binary representation, so that a non-negative integer with n digits may be represented according to

    N = Σ_{i=0}^{n−1} a_{n−i−1} 2^{n−i−1},    0 ≤ a_k ≤ 1, a_k ∈ Z

For example, the binary representation of the denary number 13 is (1101), which has the polynomial representation

    1101 = 1 × 2³ + 1 × 2² + 0 × 2¹ + 1 × 2⁰
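The binary digits a_k of an integer can be extracted by repeated division by 2. A minimal Fortran 90 sketch (the array size and the printing are illustrative choices, not part of the notes):

    program to_binary
      implicit none
      integer :: n, k
      integer :: a(0:15)          ! binary digits a_k, so that N = sum of a_k * 2**k
      n = 13
      do k = 0, 15
         a(k) = mod(n, 2)         ! least significant remaining digit
         n = n/2
      end do
      print *, 'binary digits of 13, a_3..a_0:', a(3), a(2), a(1), a(0)
    end program to_binary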

Fractions: A non-negative real number x in denary representation has an integral part x_I, which is the largest integer less than or equal to x, while its fractional part is x_F = x − x_I. x_F has the polynomial denary representation

    x_F = Σ_{i=1}^∞ b_i 10^{−i},    0 ≤ b_k ≤ 9, b_k ∈ Z

If k, i ∈ Z, then the fraction x_F is said to terminate if and only if ∃k : ∀i, b_{k+i} = 0. If no such k exists, then x_F requires an infinite number of digits to represent it exactly. Now, if x_I = (a_n a_{n−1} ... a_0) and x_F = (b_1 b_2 ...), then we can write x = (a_n a_{n−1} ... a_0 . b_1 b_2 ...), where the decimal point separates the a_k from the b_k. In the binary polynomial representation

    x_F = Σ_{i=1}^∞ b_i 2^{−i},    0 ≤ b_k ≤ 1, b_k ∈ Z

and in a general number base β, the polynomial representation will be

    x_F = Σ_{i=1}^∞ b_i β^{−i},    0 ≤ b_k ≤ β − 1, b_k ∈ Z

When we use computers for numerical calculations we have to convert between the denary and binary representations of numbers. It so happens that not every number which is terminating in denary representation is also terminating in binary representation (for example, the denary fraction 0.1 is non-terminating in binary). Therefore, even when a fraction can be exactly represented by a finite number of digits in denary representation, it may still entail a truncation error when converted to binary.

Computers use a normalised floating point t-digit representation

    x = (.b_1 b_2 ... b_t)_β × β^e

where (.b_1 b_2 ... b_t) is called the mantissa and e is called the exponent. The precision of the mantissa is determined by the word length of the computer, and the exponent is bounded within some range m ≤ e ≤ M, where m is a negative integer and M is some positive integer. Therefore, all the numbers which can be stored in a computer and used in calculations are contained within the set F(β, t, m, M). Any number x that we use in a computation that is not in this set must be approximated by a number which is a member of the set, ie. x ≈ fl(x) ∈ F(β, t, m, M). A number x is converted to normalised floating point representation fl(x) either by rounding (choosing the nearest floating point number to x, with a rule such as rounding up in the case of a tie) or by chopping (choosing the nearest floating point number smaller in magnitude than x), as long as e lies within the range m ≤ e ≤ M [1]. The difference |x − fl(x)| is called the absolute error.

[1] If e < m one has an underflow condition, and if e > M one has an overflow condition.

2.2 Errors due to the computer representation of numbers

The distribution of floating point numbers is very uneven [2], so that the error entailed by fl(x) depends strongly on x. It is always possible to write any nonzero decimal number x in the form

    x = u × 10^e + v × 10^{e−t},    1/10 ≤ |u| < 1,  0 ≤ |v| < 1

For example, with t = 4,

    12.3456     = .1234 × 10² + .56 × 10^{−2}
    −.0123456   = −.1234 × 10^{−1} + (−.56) × 10^{−5}
    3.1415927...= .3141 × 10¹ + .5927... × 10^{−3}

In the same way, any nonzero number in base β can be written

    x = u × β^e + v × β^{e−t},    1/β ≤ |u| < 1,  0 ≤ |v| < 1

Now, the first term x_C = u × β^e is the chopped value of x, whereas the rounded value x_R is given by

    x_R = u × β^e                 if |v| < 1/2
    x_R = u × β^e + β^{e−t}       if |v| ≥ 1/2 and u > 0
    x_R = u × β^e − β^{e−t}       if |v| ≥ 1/2 and u < 0

The absolute errors E_C = x − x_C and E_R = x − x_R for chopping and rounding respectively satisfy

    |E_C| = |v| β^{e−t} < β^{e−t}    and    |E_R| ≤ β^{e−t}/2

and hence the upper bounds on the relative errors ε_C = |E_C|/|x| and ε_R = |E_R|/|x| are

    ε_C ≤ β^{1−t}        (1)
    ε_R ≤ β^{1−t}/2      (2)

[2] See problem 1.
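In Fortran 90 the parameters of the machine's floating point system F(β, t, m, M) can be queried with the numeric inquiry intrinsics. A minimal sketch for the default real kind: minexponent and maxexponent correspond, up to the details of the number model, to m and M, and epsilon returns β^{1−t}, the bound of Eq. (1):

    program machine_params
      implicit none
      real :: x
      x = 1.0
      print *, 'base     beta =', radix(x)
      print *, 'digits      t =', digits(x)
      print *, 'min exponent  =', minexponent(x)
      print *, 'max exponent  =', maxexponent(x)
      ! epsilon(x) = beta**(1-t): the relative error bound for chopping, Eq. (1);
      ! half of it is the bound for rounding, Eq. (2)
      print *, 'eps_C <=', epsilon(x)
      print *, 'eps_R <=', 0.5*epsilon(x)
    end program machine_params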

3 Lecture 2 - Errors arising from arithmetical operations


In the previous lecture we indicated some of the errors associated with using computers to represent numbers. No discussion was entered into about performing actual calculations with these numbers. In the present lecture we will be concerned with errors arising from doing numerical calculations. In particular, we are concerned with the propagation of errors associated with number representation, and certain avoidable errors that can occur due to the form of the calculation being performed. We do not cover issues such as the stability of algorithms [3]. To be useful for doing a particular calculation, an algorithm must be stable, that is to say, small changes in the initial data should produce small changes in the final results. If small changes in the initial data produce large changes in the results, the algorithm is said to be unstable [4].

[3] An algorithm is a finite sequence of rules for performing a certain calculation such that, at each step, the rules determine exactly what is to be done in the following step.
[4] This numerical instability is not to be confused with the mathematical instability of a problem, which is called ill-conditioning. If a problem is ill-conditioned, then no matter how good the algorithm is, the results will be extremely sensitive to small changes in the initial data.

3.1 Summary from lecture 1


In lecture 1 it was demonstrated that:

1. computers cannot store all real numbers x ∈ R, but only a discrete subset of the real numbers, which we have denoted by F(β, t, m, M) ⊂ R;
2. the density of y ∈ F(β, t, m, M) is highly non-uniform in R;
3. when a number x ∈ R is to be used in a computer calculation, it is first approximated by a number y = fl(x) ∈ F(β, t, m, M) and stored in normalised t-digit floating point β-representation;
4. in such representation, fl(x) has the form fl(x) = (.b_1 b_2 ... b_t)_β × β^e, where t is determined by the word length of the computer, β is the number-base system used, and m ≤ e ≤ M.

The implications of the above are that numbers stored in computers entail errors even before calculations have been performed. In particular, the implication of point 2 above is that the error entailed in storing the number x depends sensitively on x. From points 3 and 4 it is clear that the error also depends on the word length of the computer, and that the magnitude of the numbers it is possible to store is bounded above and below. Also, the bounds on the relative errors ε_C(x), ε_R(x) for computers which utilise chopping or rounding rules of approximation are respectively given by ε_C(x) ≤ β^{1−t} and ε_R(x) ≤ β^{1−t}/2, Eqs. (1) and (2) respectively. We now want to see what further errors are entailed when some arithmetic is done.

3.2 Error accumulation

Let ∘ denote any of the binary arithmetical operations (+, −, ×, ÷). Also, let x, y ∈ F(β, t, m, M) and denote by fl[x ∘ y] ∈ F(β, t, m, M) the computed floating point value of x ∘ y using the system F(β, t, m, M) [5]. Writing δ_C = (x_C − x)/x allows one to write x_C = x(1 + δ_C) as a relation between the exact and chopped values, where |δ_C| ≤ β^{1−t} from Eq. (1), and a similar relation holds for x_R in terms of δ_R. Then one has

    fl[x ∘ y] = (x ∘ y)(1 + δ)        (3)

Accumulation of errors: Consider the sum

    s = x_1 + x_2 + x_3 + x_4 + x_5        (4)

Adding from the left, the computed floating point sum is

    fl(s) = fl[fl[fl[fl[x_1 + x_2] + x_3] + x_4] + x_5]        (5)

On using Eq. (3) with + for ∘,

    fl[x_1 + x_2] = x_1(1 + δ_1) + x_2(1 + δ_2)
    fl[fl[x_1 + x_2] + x_3] = (x_1(1 + δ_1) + x_2(1 + δ_2))(1 + δ_3) + x_3(1 + δ_4)
                            = x_1(1 + δ_1)(1 + δ_3) + x_2(1 + δ_2)(1 + δ_3) + x_3(1 + δ_4)

etc., so that

    fl(s) = x_1(1 + ε_1) + x_2(1 + ε_2) + x_3(1 + ε_3) + x_4(1 + ε_4) + x_5(1 + ε_5)
          = x_1 + x_2 + x_3 + x_4 + x_5 + x_1 ε_1 + x_2 ε_2 + x_3 ε_3 + x_4 ε_4 + x_5 ε_5
          = s + x_1 ε_1 + x_2 ε_2 + x_3 ε_3 + x_4 ε_4 + x_5 ε_5        (6)

where Eq. (4) has been used in the final step, and where

    1 + ε_1 = (1 + δ_1)(1 + δ_3)(1 + δ_5)(1 + δ_7)
    1 + ε_2 = (1 + δ_2)(1 + δ_3)(1 + δ_5)(1 + δ_7)
    1 + ε_3 = (1 + δ_4)(1 + δ_5)(1 + δ_7)
    1 + ε_4 = (1 + δ_6)(1 + δ_7)
    1 + ε_5 = (1 + δ_8)

Using Eq. (6) one then arrives at an expression for the bound on the error for the computed sum

    |fl(s) − s| ≤ |x_1 ε_1| + |x_2 ε_2| + |x_3 ε_3| + |x_4 ε_4| + |x_5 ε_5|

[5] Note that y ≠ 0 if ∘ represents ÷.
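The accumulation is easy to observe in practice. A minimal sketch (assuming default single and double precision real kinds): it repeatedly adds 0.1, which is non-terminating in binary, so every addition incurs an error of the kind described by Eq. (3):

    program accumulate
      implicit none
      integer :: i
      real :: s
      double precision :: d
      s = 0.0
      d = 0.0d0
      do i = 1, 100000
         s = s + 0.1            ! each addition incurs a rounding error, Eq. (3)
         d = d + 0.1d0
      end do
      print *, 'single precision sum:', s      ! visibly different from 10000
      print *, 'double precision sum:', d
    end program accumulate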

3.3 Loss of significance

The errors discussed so far arise as a result of the machine representation of numbers, and the consequences which follow when arithmetical operations are carried out. They are present in numerical calculations because machine storage of numbers is discrete, whereas the numbers we want to use are part of a continuum. Given a particular machine environment, there is nothing that we can do to avoid such errors. However, there are errors which occur when carrying out arithmetic operations which can be avoided. Consider the floating point subtraction, in the representation F(β, t, m, M), of two numbers which are very close in numerical value. We cannot specify the numbers to more than t β-digit accuracy, so each may be in error by 0.5 β^{1−t}. If the result of the subtraction is of the same order as the t-th digits in the original numbers, the error in the result will be of the same order as the result itself. For instance, consider f(x) = 1 − cos x near to x = 0. From Eq. (2) the bound on the error in calculating cos x close to x = 0 is (1/2) β^{1−t} |cos x| ≈ (1/2) β^{1−t}, which may be as large as or larger than f(x) itself. Whenever such a condition could appear, one can circumvent the problem by a judicious choice of alternative expression [6], for example by writing 1 − cos x = 2 sin²(x/2).

[6] See problem 3.
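A minimal sketch of the effect: the identity 1 − cos x = 2 sin²(x/2) is exact, so any disagreement between the two columns below is loss of significance in the direct form (default single precision assumed):

    program loss_of_significance
      implicit none
      real :: x
      integer :: k
      do k = 1, 5
         x = 10.0**(-k)
         ! the direct form subtracts two nearly equal numbers;
         ! the rewritten form 2*sin(x/2)**2 avoids the subtraction entirely
         print *, x, 1.0 - cos(x), 2.0*sin(0.5*x)**2
      end do
    end program loss_of_significance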

4 Lecture 3 - Solving Non-Linear Equations


It often happens that we have to solve a non-linear equation for either one or more than one of its roots. For a general quadratic, cubic or quartic equation there are standard formulas. Therefore, if the problem involves an explicit equation of quadratic, cubic or quartic degree, there are no real difficulties [7]. However, one does not always know explicitly the form of the equation [8], and even when one does, problems of degree greater than 4 which are non-factorable require numerical calculation to find an approximate solution. As such, all the sources of error discussed in lectures 1 and 2 apply [9]. The problem to be studied here is to find a solution x* of the nonlinear equation

    f(x) = 0        (7)

We also introduce here the concept of iteration, which is extensively used in numerical calculations. This consists in generating a sequence x_0, x_1, ..., x_k such that

    lim_{k→∞} x_k = x*

Since in practical calculations only a finite number of iterations can be computed, we can only approximate x* to a given accuracy. One must then be concerned with imposing certain terminating conditions on the calculation such that, when these are satisfied at a certain k, the solution is deemed to have been found.

[7] One still has to take care with regard to problems such as loss of significance.
[8] The problem may present itself as the solution of a differential equation which depends upon the initial condition, so that one has a rule for evaluating f(x) for any argument (ie. solve the differential equation), whereas the explicit form of f(x) is not known.
[9] For instance, the polynomial P_6(x) = (1 − x)⁶ has the single solution x = 1. However, if one were to use the unfactored form P_6(x) = 1 − 6x + 15x² − 20x³ + 15x⁴ − 6x⁵ + x⁶, evaluate it using a high precision (long word length) computer, and plot it between 0.9 and 1.1, one would find that there are many apparent zeros ranging between about 0.994 and 1.006. Thus, use of the expanded form of P_6(x) leads to apparently acceptable estimates of the root which are correct only to 2 decimal digits. The reason for this behaviour lies solely in round-off error, loss of significance, etc.

4.1 Fixed-point iteration

Many of the commonly used iterative methods take the form of successive substitution

    x_{k+1} = g(x_k)        (8)

where g(x) is called the iteration function. If it is the case that

    lim_{k→∞} x_k = x~ = g(x~)

then x~ is called a fixed point of g. Thus, the problem of finding a zero of f can often be solved by finding a fixed point of g. One can put the problem (7) in the form (8) by writing g(x) = f(x) + x. However, there will be many ways of putting (7) in the form (8). Consider the function

    f(x) = x² − x − 2        (9)

One can use a g(x) in any of the forms

    (1) g(x) = x² − 2
    (2) g(x) = √(2 + x)
    (3) g(x) = 1 + 2/x
    (4) g(x) = x − (x² − x − 2)/m,    ∀m ∈ Z, m ≠ 0

One then starts the calculation with an initial value x_0 which is intuitively close to the desired solution, and uses Eq. (8) to generate the iteration sequence.
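A minimal Fortran sketch of fixed-point iteration for Eq. (9), using form (2), g(x) = √(2 + x), whose fixed point is the root x~ = 2 (the starting value, tolerance and iteration cap are illustrative choices):

    program fixed_point
      implicit none
      real :: x, xnew
      integer :: k
      real, parameter :: tol = 1.0e-6
      integer, parameter :: kmax = 50

      x = 1.0                          ! initial guess, intuitively close to the root
      do k = 1, kmax
         xnew = sqrt(2.0 + x)          ! x_{k+1} = g(x_k), Eq. (8)
         if (abs(xnew - x) < tol) exit ! terminating condition
         x = xnew
      end do
      print *, 'fixed point x ~', xnew
    end program fixed_point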

4.2 Criteria for convergence

It sometimes happens that not all iteration sequences converge [10]. Also, the function g has to be chosen such that it is possible to generate the iteration sequence. For example, with g(x) = −√x, x_0 necessarily has to be positive, but then that would necessarily make x_1 = g(x_0) negative. Therefore x_2 cannot be calculated. However, if it is the case that

    ∃I = [a, b] ⊂ R : ∀x ∈ I, g(x) ∈ I

and that g(x) is continuous in I, then it is guaranteed that there is a fixed point of g in I. This is so because g(a) ≥ a and g(b) ≤ b. Therefore h(x) = g(x) − x satisfies h(a) ≥ 0 and h(b) ≤ 0. If g(x) is continuous in I then so is h(x), and so h(x) vanishes somewhere in I by the intermediate value theorem. This zero of h is a fixed point of g. In general, the fixed point x~ is located at the intersection of the curves y = x and y = g(x).

Now, suppose that

    ∃K > 0 : ∀x ∈ I, |g′(x)| ≤ K        (10)

and let e_n = x~ − x_n. Then

    e_n = x~ − x_n = g(x~) − g(x_{n−1}) = g′(ξ_n) e_{n−1}

where Eq. (8) has been used in the second step and the mean value theorem in the third step. Then, on using Eq. (10), we have

    |e_n| ≤ K |e_{n−1}|        (11)

Generalising the above result,

    |e_n| ≤ K |e_{n−1}| ≤ K² |e_{n−2}| ≤ ... ≤ K^n |e_0|

and since for 0 ≤ K < 1

    lim_{n→∞} |e_n| = lim_{n→∞} K^n |e_0| = 0

the criterion for a convergent iteration sequence is that the iteration function shall have a derivative with a maximum absolute value less than unity. Therefore, in the example Eq. (9), if we choose g(x) = x² − 2 as iteration function, then g′(x) > 1 for x > 1/2; whereas if we choose g(x) = √(2 + x) as iteration function, 0 < g′(x) ≤ 1/√8 for x ≥ 0.

[10] See problem 4.

4.3 The Newton-Raphson method

Eq. (11) shows that iteration functions with derivatives of smaller absolute value converge faster than those with larger derivatives. For fast convergence, therefore, we should choose a form of the iteration function whose derivative is minimised at x~. If f is differentiable, then a fixed point form with the desirable features is derived by expanding f in a Taylor series and retaining only the linear term, that is

    f(x + h) ≈ f(x) + f′(x) h

and solving this linearised equation, that is, solving f(x + h) = 0 with f(x + h) in the above linear approximation, instead of f(x) = 0 with all terms present. This amounts to replacing f(x) by a linear function l(x) tangent to f at x_0, that is

    l(x) = f(x_0) + f′(x_0)(x − x_0)        (12)

and solving l(x) = 0, that is

    x = x_0 − f(x_0)/f′(x_0)

In other words, we have a fixed point iteration with iteration function

    g(x) = x − f(x)/f′(x)        (13)

One now finds that

    g′(x~) = 1 − [(f′(x~))² − f(x~) f″(x~)] / (f′(x~))²

which is zero since f(x~) = 0. The method which uses the iteration function according to Eq. (13) is called the Newton-Raphson method. It should be noted that a solution depends upon the condition f′(x) ≠ 0. If f′(x) is very small in the neighbourhood of a solution, then the solution determined by the Newton-Raphson method will be very sensitive to changes in the initial data. For instance, if f(x) = x⁵ + 10⁻⁴x = 0, then x~ = 0 is a solution. However, f′(0) = 10⁻⁴, which is small. At x = 0.1 we have f(0.1) = 2 × 10⁻⁵. Therefore, if the precision with which we determine that a solution has been found according to f(x) ≈ 0 is of this order, we can get a solution of x = 0.1 even though the error in the solution, |x − x~| = 0.1, is about 5000 times larger than the precision of f. Such a problem is said to be ill-conditioned.
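A minimal Newton-Raphson sketch, applied for illustration to f(x) = x² − x − 2 from Eq. (9), whose roots are 2 and −1 (the starting value, tolerance and iteration cap are illustrative):

    program newton
      implicit none
      real :: x, fx, dfx
      integer :: k
      real, parameter :: tol = 1.0e-6
      integer, parameter :: kmax = 20

      x = 3.0                          ! initial guess
      do k = 1, kmax
         fx  = x**2 - x - 2.0          ! f(x)
         dfx = 2.0*x - 1.0             ! f'(x); must be nonzero near the root
         if (abs(fx) < tol) exit
         x = x - fx/dfx                ! Eq. (13): x <- x - f(x)/f'(x)
      end do
      print *, 'root ~', x
    end program newton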

5 Lecture 6 - Systems of linear equations (1)

5.1 Introduction

A system of m linear equations in n unknowns is usually expressed in the following form

    a_11 x_1 + a_12 x_2 + ... + a_1n x_n = b_1
    a_21 x_1 + a_22 x_2 + ... + a_2n x_n = b_2
        :
    a_m1 x_1 + a_m2 x_2 + ... + a_mn x_n = b_m

where the a_ij and b_i are given floating point numbers and the x_i are to be determined so as to satisfy the system. It is convenient to express the system in matrix form

    Ax = b        (14)

where A is the m × n matrix

    A = ( a_11  a_12  ...  a_1n )
        ( a_21  a_22  ...  a_2n )
        (   :     :          :  )        (15)
        ( a_m1  a_m2  ...  a_mn )

and x and b are the column vectors

    x = (x_1, x_2, ..., x_n)ᵀ,    b = (b_1, b_2, ..., b_m)ᵀ        (16)

Matrices associated with linear systems can be classified as either dense or sparse. Dense matrices have very few zero-valued elements, and tend to be relatively small, whereas sparse matrices have very few nonzero-valued elements, and tend to be relatively large. Sparse matrices usually arise from attempts to solve differential equations by finite difference or finite element methods.

5.1.1 Solvability of systems of linear equations

Eq. (14) falls into one of three categories:

1. if the system has no solution, the equations are said to be inconsistent, as for example

       2x_1 + 3x_2 = 5
       4x_1 + 6x_2 = 1

2. if the system has many solutions, the system is said to be dependent, as for example

       2x_1 + 3x_2 = 5
       4x_1 + 6x_2 = 10

3. if the system has exactly one solution for every b, it is said to be nonsingular. In this case we usually speak of the non-singularity of the coefficient matrix A. If this is the case, then the homogeneous equation Ax = 0 has only the trivial solution [11] x = 0.

Here we will be concerned only with problems which fall into the third category, nonsingular systems. For such systems it is necessary that the number of equations equals the number of unknowns, ie. in Eq. (15) m = n, so that A is an n × n square matrix and A is invertible. A frequently quoted test for invertibility is based upon whether det(A) ≠ 0. If this is the case [12], then it is possible to express the solution in terms of determinants by using Cramer's rule [13]. However, for problems in which n is large, Cramer's rule is not of practical interest, since the calculation of determinants is in general of the same order of difficulty as solving the linear system itself.

Numerical methods for solving linear systems may be divided into two categories:

direct: these yield the exact solution in a finite number of elementary arithmetic operations, subject to round-off and other errors [14];
iterative: these start with an initial approximation, and by applying a suitable algorithm, successively better approximations are generated.

Direct methods are usually better suited to handling problems characterised by dense coefficient matrices, whereas iterative methods are more suitable for dealing with problems characterised by sparse coefficient matrices.

[11] This will be important later when the algebraic eigenvalue problem is discussed in section 6.1.
[12] If det(A) is small, ie. if the problem is nearly singular, then one has an ill-conditioned problem. For example, the equations x − y = 1 and x − 1.0001y = 0 have solution x = 10 001, y = 10 000. However, the equations x − y = 1 and x − 0.9999y = 0 have solution x = −9999, y = −10 000. Yet the coefficients in the two sets differ by at most two units in the fourth decimal place.
[13] Cramer's rule states that the solution to an n × n linear system can be written down in terms of the determinants D, D_1, D_2, etc., where D = det(A), and the D_i are defined as the determinants of the matrices obtained by replacing the i-th column of A by the column vector b; then x_i = D_i/D.
[14] In practice, because computers use finite length floating point representations, extremely poor results can be obtained.

5.2 Direct methods


5.2.1 De nitions
The matrix elements of A fall into the following categories:

1. those forming the set {a_ii, 1 ≤ i ≤ n} are called the diagonal elements of A;
2. the elements {a_ij, i ≠ j} are the off-diagonal elements.
   2a. Of these, the subset {a_ij, i < j} are called the superdiagonal elements;
   2b. the elements {a_ij, i > j} are the subdiagonal elements.

Now, if all elements in category 2, ie. all off-diagonal elements, are zero, A is said to be a diagonal matrix. If all elements in category 2b are zero while those in 2a are not all zero, then A is called an upper-triangular matrix, while if all those elements in category 2a are zero and those in category 2b are not all zero, A is called a lower-triangular matrix.

5.2.2 The backsubstitution algorithm


If the coe cient matrix A in Eq. 14 is diagonal then the solution can be simple written down as x2 = ab2 :::: xn = abn x1 = ab1
11 22

if all of the aii are nonzero. A slightly more general form consists in A being upper-triangular. Then, the system of equations are

nn

a11x1 + a12 x2 + a22 x2 +

= = : : : an;1 n;1xn;1 + an;1 nxn = annxn =

::: :::

+ +

a1nxn a2nxn

b1 b2
(17)

bn;1 bn
(18)

From the last, or n-th row,

xn = abn
The pennultimate, or (n-1)-th row is

nn

bn;1 = an;1 n;1xn;1 + an;1 nxn =) xn;1 = bn;1a; an;1 nxn n;1 n;1 where xn on the right hand side is known from Eq. 18. Now xn and xn;1 which are now known, can be used in the (n-2)-th row to determine xn;2 , etc. This procedure for solving an upper triangular system of linear equations is called the backsubstitution algorithm.
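A minimal Fortran sketch of the backsubstitution algorithm for an upper-triangular system (the interface and array names are illustrative; all a_ii are assumed nonzero):

    subroutine backsub(n, a, b, x)
      implicit none
      integer, intent(in) :: n
      real, intent(in)    :: a(n,n), b(n)   ! upper-triangular coefficients
      real, intent(out)   :: x(n)
      integer :: i, j
      real :: s

      x(n) = b(n)/a(n,n)                    ! Eq. (18)
      do i = n-1, 1, -1                     ! work upwards from row n-1
         s = b(i)
         do j = i+1, n
            s = s - a(i,j)*x(j)             ! subtract the already-known terms
         end do
         x(i) = s/a(i,i)
      end do
    end subroutine backsub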

5.3 Gaussian elimination

Most of the problems encountered in solving systems of linear equations do not present themselves in upper triangular form. If there is a way of transforming a general square matrix into upper triangular form, one is then in a position to implement the backsubstitution algorithm and solve the problem. The most frequently used method is Gaussian elimination. We first note that the solution to the system of linear equations is invariant with respect to the elementary row operations:

1. multiplication of one of the equations by a non-zero constant;
2. replacing an equation by the sum of that equation and any other equation in the system;
3. interchanging any two equations;
4. a combination of 1 and 2 above.

One can now see that repeated application of elementary row operations, under which the solution of the system is invariant, can transform the original system into upper-triangular form. This is achieved in the way outlined below.

5.3.1 The reduction algorithm

First reduction step: We first eliminate all coefficients of x_1 apart from that occurring in the first row:

1.1 move to each row in turn, starting from row 2;
1.2 at row i, add the constant m_i1 = −a_i1/a_11 multiplied by the corresponding coefficient of equation 1;
1.3 do the same to the right hand side of row i [15].

After application of the above procedure, the system of equations looks like

    a_11 x_1 + a_12 x_2 + a_13 x_3 + ... + a_1n x_n = b_1
               a_22 x_2 + a_23 x_3 + ... + a_2n x_n = b_2
               a_32 x_2 + a_33 x_3 + ... + a_3n x_n = b_3
                 :
               a_n2 x_2 + a_n3 x_3 + ... + a_nn x_n = b_n

where, of course, the coefficients in rows 2 to n are linear combinations of the original coefficients.

[15] The effect of 1.2 and 1.3 together is to multiply the whole row 1 by m_i1 and add it to equation i, ie. elementary row operation 4 above, but with the effect of eliminating a_i1.

Second reduction step: Now we eliminate all coefficients of x_2 in rows 3 to n:

2.1 move to each row in turn, starting from row 3;
2.2 at row i, add the constant m_i2 = −a_i2/a_22 multiplied by the corresponding coefficient of equation 2;
2.3 do the same to the right hand side of row i.

After application of the above procedure, the system of equations looks like

    a_11 x_1 + a_12 x_2 + a_13 x_3 + ... + a_1n x_n = b_1
               a_22 x_2 + a_23 x_3 + ... + a_2n x_n = b_2
                          a_33 x_3 + ... + a_3n x_n = b_3
                            :
                          a_n3 x_3 + ... + a_nn x_n = b_n

where again the coefficients in rows 3 to n are linear combinations of the coefficients resulting after application of the first reduction step. This procedure is now repeated for third, fourth, etc. reduction steps until, after n − 1 steps, the resulting coefficient matrix is in upper-triangular form. Throughout, only elementary row operations have been performed, under which the solution remains invariant. The rules involved in the k-th reduction step are illustrated by the flow diagram in Fig. 1. An algorithm for reduction to upper triangular form by Gaussian elimination consists in a series of such reduction steps from k = 1 to k = n − 1, after which the continually updated coefficients will be in the required form. This form can then be used as the input to the backsubstitution algorithm. A flow diagram for the Gaussian elimination procedure is outlined in Fig. 2.

5.3.2 Failure of the reduction method

Stage 2 of the algorithm for performing the k-th reduction step in Fig. 1 requires that a_kk ≠ 0. Whenever a_kk = 0 occurs, there are two possibilities:

1. ∃l > k : a_lk ≠ 0, ie. there is a row which has not yet been visited that has a non-zero coefficient for x_k;
2. ∀l > k : a_lk = 0, ie. all succeeding coefficients of x_k are zero.

If the first possibility is the case, then one can simply interchange row k and row l, since row interchange is one of the elementary row operations under which the solution is invariant. If this can be done whenever there is an a_kk = 0 during the reduction, then the system is nonsingular. If the second possibility is the case, then either there are an infinite number of possible solutions (the system is dependent, see section 5.1.1), or there is no solution (the system is inconsistent). If this is the case, the system is definitely singular.

[Figure 1: Flow diagram for the k-th reduction step to upper-triangular form. For each row i = k+1, ..., n: form the multiplier m_ik = −a_ik/a_kk; update the coefficients a_ij = a_ij + m_ik a_kj for j = k, ..., n; update the right hand side b_i = b_i + m_ik b_k. Notice that it is necessary that a_kk ≠ 0; see section 5.3.2.]

[Figure 2: Flow diagram for reduction of a square n × n matrix to upper triangular form using Gaussian elimination. Input {n, a_ij, b_i}; for k = 1, ..., n−1 perform the k-th reduction step as outlined in Fig. 1; output the transformed {a_ij, b_i}, which can become the input for the backsubstitution algorithm to obtain the final result.]
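The flow diagrams of Figs. 1 and 2 translate almost directly into Fortran. A minimal sketch without row interchanges, so a_kk ≠ 0 is assumed throughout (see section 5.3.2); the transformed arrays can be passed straight to a backsubstitution routine such as the one sketched in section 5.2.2:

    subroutine gauss_reduce(n, a, b)
      implicit none
      integer, intent(in) :: n
      real, intent(inout) :: a(n,n), b(n)
      integer :: i, j, k
      real :: m

      do k = 1, n-1                       ! the k-th reduction step (Fig. 1)
         do i = k+1, n                    ! loop over rows below row k
            m = -a(i,k)/a(k,k)            ! the multiplier m_ik
            do j = k, n
               a(i,j) = a(i,j) + m*a(k,j) ! update the coefficients
            end do
            b(i) = b(i) + m*b(k)          ! update the right hand side
         end do
      end do
    end subroutine gauss_reduce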

5.4 Iterative methods

The last section discussed the Gaussian elimination algorithm, which is useful for moderately sized systems of linear equations. This is what was called a direct method, which in principle yields an exact result, notwithstanding errors due to round-off and computer arithmetic. Certain problems, however, involve the inversion of very large matrices. Since matrix inversion involves of the order of n³ floating point operations, where n is the dimension of the matrix, the inversion of systems of dimension 10⁴ would require 10¹² floating point operations. Since such problems usually arise from the solution of differential equations, the solution usually has to be marched across some parameter subspace [16]. In such circumstances one applies approximate methods for finding the solution. These invariably involve some iterative procedure.

[16] For instance, in the solution of the Landau-Lifshitz equation (see assignment 2) for systems with many degrees of freedom, the solution has to be determined over a given range of time and external control field. If the time and control field parameters are discretised into, say, N timesteps and M field steps, then the inversion has to be done N × M times. Therefore, the entire simulation would take of the order of n³ × N × M floating point operations!

5.4.1 Jacobi's method

As with all iterative procedures, one starts with an assumed trial solution x^(0). Then the simplest algorithm proceeds by solving for x_1 in terms of b_1 and the trial solution x^(0)_{i≠1}, solving for x_2 in terms of b_2 and the trial solution x^(0)_{i≠2}, etc. The obtained solution then becomes the first approximation x^(1)_i for all i. Then, after the k-th iteration, the i-th equation takes the form [17]

    x_i^(k) = (1/a_ii) [ b_i − a_i1 x_1^(k−1) − a_i2 x_2^(k−1) − ...
              − a_{i,i−1} x_{i−1}^(k−1) − a_{i,i+1} x_{i+1}^(k−1) − ... − a_in x_n^(k−1) ]        (19)

In an application for which there are many zeros (ie. the coefficient matrix is sparse) which occur in regular patterns, this can be utilised in order to avoid making several needless multiplications a_ij x_j^(k) where a_ij = 0. Such a strategy then reduces the number of floating point operations which have to be performed, and hence also the accumulated error (recall lecture 2). The procedure outlined above is known as the Jacobi algorithm, and is represented in flow diagram form in Fig. 3.

[17] Obviously, a_ii must be non-zero. For a non-singular system, one can always achieve this by interchanging rows, which is an elementary row operation under which the solution remains invariant.

5.4.2 The Gauss-Seidel method

In evaluating x_i^(k) in Eq. (19) it can be noticed that all the x_j^(k) with j < i have previously been determined. A faster convergence can be obtained if these values are used in the right hand side of Eq. (19) instead of the old values x_j^(k−1), j < i. This is the basis of the Gauss-Seidel method, and the evaluation stage is given by the equation

    x_i^(k) = (1/a_ii) [ b_i − a_i1 x_1^(k) − a_i2 x_2^(k) − ...
              − a_{i,i−1} x_{i−1}^(k) − a_{i,i+1} x_{i+1}^(k−1) − ... − a_in x_n^(k−1) ]        (20)

[Figure 3: Flow diagram for the solution of a system of linear equations using the Jacobi algorithm. Input {Nmax, n, a_ij, b_i, x_i^(0)}; for k = 1, ..., Nmax: generate the x_i^(k) by applying Eq. (19); if converged, output the x_i^(k); otherwise the updated solution becomes the old solution, x^(k−1) ← x^(k). Nmax is the maximum number of iterations, after which it is deemed that the solution has not converged and the algorithm has failed. At the stage where the x_i^(k) are generated from the x_i^(k−1), the occurrence of zeros in definite patterns in the matrix of coefficients should be used to avoid unnecessary multiplications.]

An algorithm for the implementation of the Gauss-Seidel method will be virtually identical to that for the Jacobi method, except that Eq. (20) will be used in place of Eq. (19).
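A minimal Fortran sketch of the Jacobi iteration, Eq. (19); the interface, tolerance and convergence test are illustrative. Using the already-updated entries of x inside the i loop, instead of only the old ones, would turn this into the Gauss-Seidel method, Eq. (20):

    subroutine jacobi(n, a, b, x, nmax, tol)
      implicit none
      integer, intent(in) :: n, nmax
      real, intent(in)    :: a(n,n), b(n), tol
      real, intent(inout) :: x(n)              ! on entry: trial solution x^(0)
      real :: xnew(n), s
      integer :: i, j, k

      do k = 1, nmax
         do i = 1, n
            s = b(i)
            do j = 1, n
               if (j /= i) s = s - a(i,j)*x(j) ! Eq. (19), old values only
            end do
            xnew(i) = s/a(i,i)
         end do
         if (maxval(abs(xnew - x)) < tol) then
            x = xnew
            return                             ! converged
         end if
         x = xnew                              ! updated solution becomes old solution
      end do
      print *, 'ERROR: no convergence in', nmax, 'iterations'
    end subroutine jacobi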

6 Lecture 7 - Systems of linear equations (2)

6.1 The algebraic eigenvalue problem

Eigenvalues are of great importance in many physical problems, and so it is necessary to have at hand robust ways of systematically computing eigenvalues and their associated eigenvectors. The algebraic eigenvalue problem is a vast and important subject, and as such, many methods have been developed to obtain solutions under a variety of conditions. The selection of method depends on what type of matrix is involved, and what information is required. Here we describe probably the simplest method of locating a given eigenvalue by an iterative procedure, the so-called power method. Later, we describe an algorithm which is used to compute all eigenvalues and eigenvectors of a real symmetric matrix: the method of Householder tridiagonalisation and QR factorisation.

Mathematically, the algebraic eigenvalue problem arises whenever the right hand side of the system of linear equations, Eq. (14), is a multiple of the vector of unknowns, that is, whenever b = λx, so that

    (A − λI) x = 0        (21)

where I is the unit n × n matrix and λ is a scalar quantity. But from the discussion of non-singular matrices in section 5.1.1, if the matrix A − λI is non-singular, only the trivial solution x = 0 exists. Therefore, for non-trivial solutions to Eq. (21) to exist, the matrix A − λI must be singular, ie. non-invertible.

6.2 The power method


One starts with a trial eigenvector x_0 and computes the sequence

    x_1 = Ax_0,  x_2 = Ax_1,  ...,  x_k = Ax_{k−1}

Now define

    m_0 = x_{k−1}ᵀ x_{k−1},   m_1 = x_{k−1}ᵀ x_k,   m_2 = x_kᵀ x_k        (22)

Then the Rayleigh quotient

    q = m_1/m_0

is an approximation for an eigenvalue λ of A. Moreover, if we define ε = λ − q, so that ε is the error of q, then

    |ε| ≤ √(m_2/m_0 − q²)        (23)

To prove this, consider the inner product

    (x_k − q x_{k−1})ᵀ (x_k − q x_{k−1}) = m_2 − 2q m_1 + q² m_0
                                         = m_2 − 2q (q m_0) + q² m_0
                                         = m_2 − q² m_0        (24)

where Eq. (22) has been used in the second step. Now, the normalised eigenvectors z_i, i = 1, ..., n, of the real symmetric matrix A form a complete orthonormal set which spans the vector space V_n = {y = (y_1, y_2, ..., y_n)}, so that we can write

    x_{k−1} = Σ_{i=1}^n c_i z_i,    m_0 = Σ_{i=1}^n c_i²

    x_k = A x_{k−1} = Σ_{i=1}^n c_i A z_i = Σ_{i=1}^n c_i λ_i z_i        (25)

from which we obtain

    x_k − q x_{k−1} = Σ_{i=1}^n c_i (λ_i − q) z_i

Using this result in Eq. (24),

    m_2 − q² m_0 = Σ_{i=1}^n c_i² (λ_i − q)²

Now, if q is closest to a particular eigenvalue λ_c, then replacing each λ_i by λ_c,

    Σ_{i=1}^n c_i² (λ_i − q)² ≥ Σ_{i=1}^n c_i² (λ_c − q)² = ε² m_0

where the second of Eqs. (25) has been used. Dividing throughout by m_0, we arrive at Eq. (23).
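A minimal Fortran sketch of the power method with the Rayleigh quotient of Eq. (22); the 3 × 3 symmetric matrix, trial vector and fixed iteration count are illustrative, and the vector is renormalised each step to avoid overflow (which leaves q unchanged):

    program power_method
      implicit none
      integer, parameter :: n = 3
      real :: a(n,n), x(n), xnew(n), m0, m1, q
      integer :: k

      ! an illustrative real symmetric matrix
      a = reshape( (/ 2.0, 1.0, 0.0,  1.0, 3.0, 1.0,  0.0, 1.0, 2.0 /), (/ n, n /) )
      x = (/ 1.0, 1.0, 1.0 /)             ! trial eigenvector x_0

      do k = 1, 30
         xnew = matmul(a, x)              ! x_k = A x_{k-1}
         m0 = dot_product(x, x)           ! m_0 = x^T x
         m1 = dot_product(x, xnew)        ! m_1 = x^T (Ax)
         q  = m1/m0                       ! Rayleigh quotient, Eq. (22)
         x  = xnew/sqrt(dot_product(xnew, xnew))   ! renormalise
      end do
      print *, 'dominant eigenvalue ~', q
    end program power_method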

6.3 Similarity and the QR method

We now turn to the problem of finding all the eigenvalues and associated eigenvectors for a given real symmetric matrix, that is, a matrix which has A = Aᵀ. Two matrices Â and A are said to be similar if

    Â = T⁻¹AT        (26)

for some non-singular n × n matrix T. Eq. (26) is called a similarity transformation. These are important because they preserve eigenvalues: the eigenvalues of A and Â are the same. Moreover, if x is an eigenvector of A, then x̂ = T⁻¹x is an eigenvector of Â belonging to the same eigenvalue, since from Eq. (21) we have

    Ax = λx  ⟹  T⁻¹Ax = λT⁻¹x

and

    T⁻¹Ax = T⁻¹AIx = T⁻¹ATT⁻¹x = Â T⁻¹x

so that Â x̂ = λ x̂, where Eq. (26) has been used in the third step.

6.3.1 Householder tridiagonalisation


Since application of a similarity transformation preserves eigenvalues, and readily gives the eigenvectors, one can exploit this fact in order to reduce a given real symmetric matrix to a simpler form. A very widely used method which employs this strategy is Householder's method. This employs n − 2 successive similarity transformations in order to reduce A to tridiagonal form [18]. The matrices under which these similarity transformations are carried out, denoted by P_1, P_2, ..., P_{n−2}, are orthogonal and symmetric, and hence [19] P_i⁻¹ = P_iᵀ = P_i. These similarity transformations generate a sequence of matrices A_0 = A, A_1, A_2, ..., given by

    A_1 = P_1 A_0 P_1
    A_2 = P_2 A_1 P_2
      :
    Â = P_{n−2} A_{n−3} P_{n−2}        (27)

The idea is that these transformations create the necessary zeros in row 1, column 1 in the first step, row 2, column 2 in the second step, etc. These steps are illustrated for a 5 × 5 matrix in Fig. 4. The result is that Â is tridiagonal.

[Figure 4: Illustration of Householder's method for a 5 × 5 matrix: the first step A_1 = P_1 A_0 P_1, the second step A_2 = P_2 A_1 P_2, and the third step A_3 = P_3 A_2 P_3. The positions left blank are the zeros created by the transformations.]

[18] A tridiagonal matrix has non-zero elements in the diagonal, superdiagonal and subdiagonal only, that is to say, only the elements a_ii, a_{i,i+1} and a_{i+1,i} can be different from zero.
[19] Recall that an orthogonal matrix is a real n × n matrix whose column vectors e_i form an orthonormal system, e_iᵀ e_j = δ_ij, where δ_ij is the Kronecker delta, equal to 1 when i = j and 0 when i ≠ j. This means that, if E is an orthogonal matrix with column vectors e_i, then EᵀE, which has elements e_iᵀ e_j, is equal to the unit matrix, which implies that Eᵀ = E⁻¹.

The question now is: how to determine the P_k? First, write [20]

    P_k = I − 2 v_k v_kᵀ        (28)

where v_k is a unit vector with its first k components equal to zero [21], ie.

    v_1 = (0, *, *, ..., *)ᵀ,  v_2 = (0, 0, *, ..., *)ᵀ,  ...,  v_{n−2} = (0, 0, ..., 0, *, *)ᵀ

The sequence of calculations that give rise to the v_k is depicted in Fig. 5.

[20] P_k in this form is automatically symmetric, since P_kᵀ = (I − 2 v_k v_kᵀ)ᵀ = I − 2 v_k v_kᵀ = P_k, and orthogonal, since P_k P_kᵀ = (I − 2 v_k v_kᵀ)² = I − 4 v_k v_kᵀ + 4 v_k (v_kᵀ v_k) v_kᵀ = I, where the fact that v_kᵀ v_k = 1 has been used in the penultimate step.
[21] The asterisks denote components other than zeros.

[Figure 5: Definition of the unit vectors v_k, showing explicitly the first (v_1) and second (v_2) steps. The components v_ij refer to the i-th component of unit vector v_j; a_ij^(k) refers to elements of A_k, and a_ij are the elements of the original matrix A.

First step:
    v_11 = 0
    v_21 = √[(1/2)(1 + |a_21|/S_1)]
    v_j1 = a_j1 sgn(a_21) / (2 v_21 S_1),    3 ≤ j ≤ n
where
    S_1 = √(a_21² + a_31² + ... + a_n1²)

Second step:
    v_12 = v_22 = 0
    v_32 = √[(1/2)(1 + |a_32^(1)|/S_2)]
    v_j2 = a_j2^(1) sgn(a_32^(1)) / (2 v_32 S_2),    4 ≤ j ≤ n
where
    S_2 = √((a_32^(1))² + (a_42^(1))² + ... + (a_n2^(1))²)

In the k-th step, sgn(a_{k+1,k}^(k−1)) = +1 when a_{k+1,k}^(k−1) ≥ 0 and sgn(a_{k+1,k}^(k−1)) = −1 when a_{k+1,k}^(k−1) < 0. After v_1 is computed, P_1 is determined from Eq. (28) and then A_1 from Eq. (27). Step 2 is the same as step 1 with all subscripts increased by 1 and a_ij replaced by a_ij^(1), which has just been computed. Thus we obtain v_2 and hence P_2 according to Eq. (28). In turn, A_2 can then be computed from Eq. (27), upon which v_3 can be determined, etc.]

6.3.2 QR Factorisation

Having transformed the original matrix A into tridiagonal form using successive Householder transformations, under which the eigenvalues are invariant, we now have to solve the eigenvalue problem for the transformed matrix B_0 = A_{n−2} = Â. This is accomplished by the so-called QR method, as follows.

1. Factor B_0 = Q_0 R_0, where Q_0 is orthogonal and R_0 is an upper triangular matrix. Then compute

       B_1 = R_0 Q_0

2. Factor B_1 = Q_1 R_1, then compute

       B_2 = R_1 Q_1

3. Continue in this way with the general factorisation rule

       B_s = Q_s R_s

   followed by computation of

       B_{s+1} = R_s Q_s

If the eigenvalues [22] of B_0 are all different in absolute value, then we have that

    lim_{s→∞} B_s = D

where D is a diagonal matrix with the required eigenvalues λ_i as its diagonal elements [23].

[22] R_s = Q_s⁻¹ B_s from the factorisation in the general step s. Therefore B_{s+1} = R_s Q_s = Q_s⁻¹ B_s Q_s, so that B_{s+1} is similar to B_s and hence, by induction, to B_0 and A.
[23] A proof of this assertion can be found in Wilkinson, J.H., The Algebraic Eigenvalue Problem, Oxford: Clarendon, 1965.

The factorisation itself is obtained in the following way. The tridiagonal matrix B_0 has n − 1 generally non-zero elements b_21, b_32, ..., b_{n,n−1} below the main diagonal. Suppose that B_0 is multiplied on the left by a matrix C_2 such that the result C_2 B_0 has b_21 = 0. Then the result is multiplied on the left by a matrix C_3 such that C_3 C_2 B_0 has b_32 = 0, etc. After n − 1 such multiplications one is left with an upper triangular matrix

    R_0 = C_n C_{n−1} ... C_3 C_2 B_0

The C_j are orthogonal plane rotations, that is, they contain the 2 × 2 submatrix

    (  cos θ_j   sin θ_j )
    ( −sin θ_j   cos θ_j )

in rows j − 1, j and columns j − 1, j, with 1's on all other main diagonal elements and zeros everywhere else. Since the C_j are orthogonal, so is their product, and so is its inverse; this is Q_0 = [C_n C_{n−1} ... C_3 C_2]⁻¹. Therefore, according to the scheme [24],

    B_1 = R_0 Q_0 = R_0 C_2ᵀ C_3ᵀ ... C_{n−1}ᵀ C_nᵀ

Now, in the first operation C_2 B_0 one has

    (  cos θ_2   sin θ_2   0  ... )   ( b_11  b_12  ... )
    ( −sin θ_2   cos θ_2   0  ... )   ( b_21  b_22  ... )
    (     :         :      :  ... )   (   :     :   ... )

and θ_2 is determined by the condition

    b̃_21 = −sin θ_2 b_11 + cos θ_2 b_21 = 0  ⟹  tan θ_2 = b_21/b_11

resulting in

    cos θ_2 = 1/√(1 + (b_21/b_11)²),    sin θ_2 = (b_21/b_11)/√(1 + (b_21/b_11)²)        (29)

etc.

[24] Notice that we do not need Q_0 explicitly, since to get B_1 we compute R_0 C_2ᵀ, then R_0 C_2ᵀ C_3ᵀ, etc.
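A small Fortran sketch of one such plane rotation applied to rows j−1 and j of a stored matrix, with cos θ_j and sin θ_j computed as in Eq. (29). This is a standalone illustration of a single step of the factorisation, not a complete QR routine, and b(j−1, j−1) is assumed nonzero:

    subroutine apply_rotation(n, b, j)
      implicit none
      integer, intent(in) :: n, j        ! rotate rows j-1 and j
      real, intent(inout) :: b(n,n)
      real :: t, c, s, r1, r2
      integer :: col

      t = b(j,j-1)/b(j-1,j-1)            ! tan(theta_j), Eq. (29)
      c = 1.0/sqrt(1.0 + t*t)            ! cos(theta_j)
      s = t*c                            ! sin(theta_j)
      do col = 1, n
         r1 =  c*b(j-1,col) + s*b(j,col)
         r2 = -s*b(j-1,col) + c*b(j,col) ! this zeroes b(j,j-1) at col = j-1
         b(j-1,col) = r1
         b(j,col)   = r2
      end do
    end subroutine apply_rotation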

7 Lecture 10 - Ordinary Di erential Equations, Part (1)


7.1 Introduction

Physical problems often present themselves as relationships between physical quantities and their derivatives. Often these problems cannot be solved analytically, or occur in large systems of equations with many variables. Therefore, the numerical solution of differential equations, or of systems of differential equations, is of paramount importance in science and engineering. In the next two lectures we describe some of the basic methods used to solve ordinary differential equations numerically. In this lecture we describe the basic single-step methods, that is, those methods that use a knowledge of the dependent variable at the previous time-step in order to calculate the value at the current time-step. In the next lecture we cover some multi-step algorithms: methods that use knowledge of the history of the dependent variable over a number of previous time-steps in order to calculate the value at the current time-step.

An n-th order differential equation can be written in the form

    y^(n)(x) = f(x, y(x), y′(x), ..., y^(n−1)(x))        (30)

where

    y′(x) = dy(x)/dx,  y″(x) = d²y(x)/dx²,  ...,  y^(k)(x) = d^k y(x)/dx^k

Since the highest order derivative is y^(n)(x), the solution y(x) will involve n constants of integration, and will constitute an n-parameter family of functions. A unique solution therefore requires n auxiliary conditions to be satisfied. If the solution is required on x ∈ [a, b], and these auxiliary conditions are specified by

    y(a) = α_1,  y′(a) = α_2,  ...,  y^(n−1)(a) = α_n

then the problem is referred to as an n-th order initial value problem. When this is the case, one may introduce new variables

    z_1(x) = y(x),  z_2(x) = y′(x),  ...,  z_n(x) = y^(n−1)(x)

In terms of the z_k(x), Eq. (30) becomes

    z_n′(x) = f(x, z_1(x), z_2(x), ..., z_n(x)),    z_n(a) = α_n

which together with the derivatives of the z_k(x)

    z_1′(x) = z_2(x),        z_1(a) = α_1
    z_2′(x) = z_3(x),        z_2(a) = α_2
      :
    z_{n−1}′(x) = z_n(x),    z_{n−1}(a) = α_{n−1}

comprise a system of n coupled first order differential equations in the n unknown functions z_k(x), 1 ≤ k ≤ n. Therefore, an n-th order initial value problem in one variable x can always be treated as a system of n coupled first order initial value problems. Numerical methods for solving a first order initial value problem can easily be generalised for a system of coupled first order initial value problems, and so it is sufficient to study the first order initial value problem

    y′(x) = f(x, y(x)),    y(a) = α

7.2 Single-step methods

The most basic numerical algorithms for solving an initial value problem of the form

    y′(x) = f(x, y),    y(x_0) = y_0        (31)

for x ∈ [x_0, X] are the single-step methods. These proceed from y(x_0) = y_0 and advance across x in finite but small steps h, computing approximate values of the solution y(x_k) at the 'grid points'

    x_1 = x_0 + h,  x_2 = x_0 + 2h,  x_3 = x_0 + 3h,  etc.

The computation is achieved by expanding y(x) according to

    y(x + h) = y(x) + h y′(x) + (h²/2) y″(x) + ...
             ≈ y(x) + h y′(x) = y(x) + h f(x, y)        (32)

where the fact that h is small has been used in the second step, and where Eq. (31) has been used in the last step. The fact that only the first order term proportional to h has been retained in the expansion [25] leads to the designation of methods based on this approximation as first order methods. Neglecting the higher order terms h², etc. naturally causes a truncation error. If one chooses h to be as small as possible, the truncation error will be minimised. However, this will be at the expense of the need to make excessively many steps to cover [x_0, X], with the consequent accumulation of round-off error [26].

7.2.1 The Euler method

Now, evaluating Eq. (32) about x_0,

    y(x_1) ≈ y_0 + h f(x_0, y_0)

Writing y(x_1) ≈ y_1, this can be substituted into Eq. (32) evaluated about x_1,

    y_2 = y_1 + h f(x_1, y_1)

etc., so that the general step will be

    y_{k+1} = y_k + h f(x_k, y_k)

This process of advancing across x continues until x_k ≥ X, at which point the calculation is complete. Obviously, a truncation error is entailed at each step because only the first order term has been retained, and so after many steps one expects that the error in the solution will have accumulated considerably.

[25] Therefore, the truncation error is of the order of h².
[26] There will also be an error generated as a result of the fact that f is evaluated at (x_n, y_n) instead of (x_n, y(x_n)). If f depends very strongly upon y, ie. f varies rapidly as y varies (∂f/∂y large in comparison to y(x_n) − y_n), this error could be large. This consideration will force the use of a very small value of h.
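A minimal sketch of the Euler method for the illustrative problem y′ = −y, y(0) = 1, whose exact solution is e^(−x) (the problem, interval and step size are illustrative choices):

    program euler
      implicit none
      real :: x, y, h
      x = 0.0;  y = 1.0;  h = 0.01       ! initial condition and step size
      do while (x + h <= 1.0)
         y = y + h*(-y)                  ! y_{k+1} = y_k + h f(x_k, y_k), with f = -y
         x = x + h
      end do
      print *, 'Euler:  y(', x, ') =', y, '   exact:', exp(-x)
    end program euler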

7.2.2 The improved Euler (predictor-corrector) method


An improvement on the basic Euler method is obtained if, instead of proceeding from x_k straight to x_{k+1} by a straight line of slope f(x_k, y_k), one proceeds first to x_{k+1/2} by a straight line of slope f(x_k, y_k), and then from x_{k+1/2} to x_{k+1} by a straight line of slope f(x_{k+1}, y_{k+1}). However, one does not know y_{k+1} in advance. But we can first predict y_{k+1} via a normal Euler step,

    y*_{k+1} = y_k + h f(x_k, y_k)

One can then use this value in the improved formula [27]

    y_{k+1} = y_k + (h/2) [f(x_k, y_k) + f(x_{k+1}, y*_{k+1})]        (33)

Methods which employ the strategy of predicting the value of y at the next step and then correcting it are called predictor-corrector methods. This is an improvement on the basic Euler method because it takes into account that the slope of y(x) will in general have changed between x_k and x_{k+1}.

[27] It can be shown that the truncation error involved in the improved Euler method is of the order h³. The improved Euler method is therefore a second order method.

7.2.3 Runge-Kutta methods

Improving still on the modified (improved) Euler method is the family of Runge-Kutta methods. Particularly well used is the fourth order [28] Runge-Kutta method, which is described in sec. 7.2.4. We give here a derivation of the second order Runge-Kutta method; higher order methods are derived in the same way. Again, expanding y_{k+1} in a Taylor series and retaining terms up to the second order,

    y_{k+1} = y_k + h y_k′ + (1/2) h² y_k″
            = y_k + h f(x_k, y_k) + (1/2) h² [df(x, y(x))/dx]_{x=x_k}        (34)

Now, instead of computing the partial derivatives of f and applying the chain rule, one approximates

    df(x, y(x))/dx ≈ [f(x + h, y(x + h)) − f(x, y(x))]/h

and substituting in Eq. (34),

    y_{k+1} = y_k + h f(x_k, y_k) + (1/2) h [f(x_{k+1}, y_{k+1}) − f(x_k, y_k)]        (35)

In this equation y_{k+1} occurs on both sides. In order to obtain an explicit formula for y_{k+1}, consider

    f(x_{k+1}, y_{k+1}) ≈ f(x_{k+1}, y_k + h y_k′)

which is obtained by expanding y_{k+1} to first order. Substituting this into Eq. (35) one obtains

    y_{k+1} = y_k + (1/2) h [f(x_k, y_k) + f(x_{k+1}, y_k + h f(x_k, y_k))]

This is the recurrence formula for the second order Runge-Kutta method. Like the modified Euler method, this has truncation error proportional to h³.

[28] Fourth order because it can be shown that the truncation error is of the order h⁵. Generally, if the truncation error is of order h^{k+1}, the method is classed as a k-th order method.

7.2.4 4-th order Runge-Kutta algorithm


Higher order Runge-Kutta formulas are derived in exactly the same way as the second order formula, by considering more terms in the Taylor expansion of y_{k+1}. One of the most utilised in the family is the fourth order procedure, which uses the recurrence formula

    y_{k+1} = y_k + (1/6) [a_1 + 2a_2 + 2a_3 + a_4]

where

    a_1 = h f(x_k, y_k)
    a_2 = h f(x_k + h/2, y_k + a_1/2)
    a_3 = h f(x_k + h/2, y_k + a_2/2)
    a_4 = h f(x_k + h, y_k + a_3)

A flow diagram for the implementation [29] of this scheme is depicted in Fig. 6. The major drawback of the Runge-Kutta scheme is the number of function evaluations per time-step. In the 4-th order Runge-Kutta algorithm, f(x, y) has to be evaluated four times for every timestep. In a simulation with perhaps thousands of variables, which has to run over perhaps tens of thousands of timesteps, this can be prohibitive. Lecture 11 will introduce methods which require fewer function evaluations and exhibit better stability characteristics.

[29] In an actual implementation of this algorithm, one might want to include step-size control, that is, methods of optimising the performance by adjusting h so that accuracy is maintained, while excessive computation due to h being smaller than necessary is minimised.

[Figure 6: Algorithm for the fourth order Runge-Kutta method. Input {x_0, X, y_0, h}; set x = x_0, y = y_0; while x + h ≤ X: compute a_1 = h f(x, y), a_2 = h f(x + 0.5h, y + 0.5a_1), a_3 = h f(x + 0.5h, y + 0.5a_2), a_4 = h f(x + h, y + a_3); update y = y + (a_1 + 2a_2 + 2a_3 + a_4)/6 and x = x + h; output {x, y}.]
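The flow diagram of Fig. 6 translates almost line for line into Fortran. A minimal sketch, using the illustrative problem y′ = −y, y(0) = 1, whose exact solution is e^(−x); the right hand side f is supplied here as an internal function:

    program rk4
      implicit none
      real :: x, y, h, a1, a2, a3, a4
      x = 0.0;  y = 1.0;  h = 0.1
      do while (x + h <= 1.0)
         a1 = h*f(x, y)
         a2 = h*f(x + 0.5*h, y + 0.5*a1)
         a3 = h*f(x + 0.5*h, y + 0.5*a2)
         a4 = h*f(x + h, y + a3)
         y = y + (a1 + 2.0*a2 + 2.0*a3 + a4)/6.0
         x = x + h
      end do
      print *, 'RK4:  y(', x, ') =', y, '   exact:', exp(-x)
    contains
      real function f(x, y)              ! right hand side of y' = f(x,y)
        real, intent(in) :: x, y
        f = -y
      end function f
    end program rk4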

8 Lecture 11 - Ordinary Di erential Equations, Part (2)


In the last lecture, rst order initial value problems (or systems of coupled rst order initial value problems) were introduced and some single step methods of solving such problems were described. These included the Euler method, the improved Euler (predictor-corrector) method, and the family of Runge-Kutta methods. Algorithms which had a truncation error of order hk+1 were designated order hk methods. In the present lecture we describe some of the more common multi-step methods used for solving rst order initial value problems. The single step methods use an approximation to y(xk ) in order to calculate an approximation to y(xk+1). However, if several approximate values for y(x) have already been determined, say y(xk ), y(xk;1), y(xk;2), ..., etc. then it is reasonable to use these also in the determination of y(xk+1). A method that uses n such previous values to calculate the next approximation is called an n-step method. A single-step method uses the initial value y0 to compute y1. A two step method requires two previous values in order to compute the next value. But since only one initial value is availible for a rst order di erential equation, a two-step method cannot be used to determine y1. Generally, an n-step method must already have the rst n values before it can get started, and these must be supplied by a single-step method, or by a combination of methods that can be used with the values that are already availible. In the following discussion of n-step methods, it is assumed that lower than n, or single-step methods have been used to generate the rst n steps. The easiest way to derive multi-step methods is to write the di erential equation as an integral equation

8.1 Multi-step methods

The easiest way to derive multi-step methods is to write the differential equation as an integral equation,

y(x + t) = y(x) + ∫_x^{x+t} f(s, y(s)) ds        (36)

where a ≤ x < x + t ≤ b and the solution is required on the interval x ∈ [a, b]. One can now replace the integral by a numerical integration formula.

8.1.1 Implicit formulas

Suppose that one integrates from x to x + 2h using Simpson's rule,

∫_x^{x+2h} f(s, y(s)) ds = (h/3) [ f(x, y(x)) + 4 f(x+h, y(x+h)) + f(x+2h, y(x+2h)) ]

Then Eq. 36 becomes

y(x + 2h) = y(x) + (h/3) [ f(x, y(x)) + 4 f(x+h, y(x+h)) + f(x+2h, y(x+2h)) ]

which can be written as a difference equation,

y_{k+2} = y_k + (h/3) [ f(x_k, y_k) + 4 f(x_{k+1}, y_{k+1}) + f(x_{k+2}, y_{k+2}) ]

This formula, called Simpson's method or Milne's method, is implicit, in that y_{k+2} occurs on both sides. Similarly, any integration formula that uses the value of the integrand at the upper limit will lead to an implicit formula. If f(x, y(x)) is a nonlinear function, these implicit equations cannot in general be solved exactly. However, one can attempt to solve them by means of iteration. In the equation above, if y_{k+1} is known, one can obtain a first approximation to y_{k+2} by using the Euler formula

y_{k+2}^{(0)} = y_{k+1} + h f(x_{k+1}, y_{k+1})

One can then evaluate f(x_{k+2}, y_{k+2}^{(0)}), which is substituted into the difference equation,

y_{k+2}^{(1)} = y_k + (h/3) [ f(x_k, y_k) + 4 f(x_{k+1}, y_{k+1}) + f(x_{k+2}, y_{k+2}^{(0)}) ]

Next, f(x_{k+2}, y_{k+2}^{(1)}) is evaluated and substituted in turn,

y_{k+2}^{(2)} = y_k + (h/3) [ f(x_k, y_k) + 4 f(x_{k+1}, y_{k+1}) + f(x_{k+2}, y_{k+2}^{(1)}) ]

The general step in the iteration procedure is given by

y_{k+2}^{(m+1)} = y_k + (h/3) [ f(x_k, y_k) + 4 f(x_{k+1}, y_{k+1}) + f(x_{k+2}, y_{k+2}^{(m)}) ]

This procedure is continued until two successive iterates agree to the desired accuracy. Such a method is computationally expensive, depending upon how fast the iteration converges. However, for stiff differential equations or systems of equations^{30}, it provides the most stable option available.
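As a sketch of how this iteration might be organised in Fortran 90: the values y_k and y_{k+1} are assumed supplied by the surrounding integration loop, and the tolerance and iteration cap are illustrative.

! One implicit (Milne/Simpson) step solved by fixed-point iteration.
! yk holds y_k, yk1 holds y_{k+1}; the result approximates y_{k+2}.
function milne_step(f, xk, yk, yk1, h, tol) result(yk2)
   implicit none
   real, external   :: f                ! right-hand side f(x, y)
   real, intent(in) :: xk, yk, yk1, h, tol
   real             :: yk2, yold
   integer          :: m
   yk2 = yk1 + h*f(xk + h, yk1)         ! Euler predictor, y^(0)_{k+2}
   do m = 1, 50                         ! corrector iteration (cap is illustrative)
      yold = yk2
      yk2  = yk + (h/3.0)*( f(xk, yk) + 4.0*f(xk + h, yk1) &
                            + f(xk + 2.0*h, yold) )
      if (abs(yk2 - yold) <= tol) exit  ! successive iterates agree
   end do
end function milne_step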

8.1.2 Explicit formulas

Alternatively, one can use the so-called 'open type' Newton-Cotes integration formulas, of the form

∫_x^{x+3h} f(s, y(s)) ds = (3h/2) [ f(x+h, y(x+h)) + f(x+2h, y(x+2h)) ]

∫_x^{x+4h} f(s, y(s)) ds = (4h/3) [ 2 f(x+h, y(x+h)) - f(x+2h, y(x+2h)) + 2 f(x+3h, y(x+3h)) ]

∫_x^{x+5h} f(s, y(s)) ds = (5h/24) [ 11 f(x+h, y(x+h)) + f(x+2h, y(x+2h)) + f(x+3h, y(x+3h)) + 11 f(x+4h, y(x+4h)) ]

etc.

^{30} A stiff differential equation is especially difficult to solve because different processes in the system behave with significantly different time scales. Most of the basic methods which are described here exhibit extreme instability when applied to problems of this type. However, implicit methods such as that described above prove to be the most stable methods for this type of problem.

These lead to difference equations of the form

y_{k+3} = y_k + (3h/2) [ f(x_{k+1}, y_{k+1}) + f(x_{k+2}, y_{k+2}) ]

y_{k+4} = y_k + (4h/3) [ 2 f(x_{k+1}, y_{k+1}) - f(x_{k+2}, y_{k+2}) + 2 f(x_{k+3}, y_{k+3}) ]

y_{k+5} = y_k + (5h/24) [ 11 f(x_{k+1}, y_{k+1}) + f(x_{k+2}, y_{k+2}) + f(x_{k+3}, y_{k+3}) + 11 f(x_{k+4}, y_{k+4}) ]

etc. These are all explicit formulas: the current step appears alone on the left-hand side, expressed in terms of previous evaluations appearing on the right-hand side. For instance, the first of these is a 3-step method, which uses the three previous values at x_k, x_{k+1} and x_{k+2} in order to evaluate y_{k+3}. Similarly, the second and third are respectively four- and five-step formulas. A sketch of a driver for the 3-step formula is given below.
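Assuming the three starting values y(0), y(1), y(2) have been generated by a single-step method, the remaining values follow from a simple loop; the array names and bounds here are illustrative.

! Drive the 3-step explicit formula y_{k+3} = y_k + (3h/2)(f_{k+1} + f_{k+2}).
subroutine explicit3(f, x, y, n, h)
   implicit none
   real, external      :: f            ! right-hand side f(x, y)
   integer, intent(in) :: n
   real, intent(in)    :: h
   real, intent(in)    :: x(0:n)       ! equally spaced abscissae
   real, intent(inout) :: y(0:n)       ! y(0:2) assumed set by a single-step method
   integer             :: k
   do k = 0, n - 3
      y(k+3) = y(k) + 1.5*h*( f(x(k+1), y(k+1)) + f(x(k+2), y(k+2)) )
   end do
end subroutine explicit3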

8.2 The Adams-Moulton predictor-corrector method

8.2.1 The Adams-Bashford method

As an alternative to using the open-type Newton-Cotes integration formulas, suppose that f is replaced by an interpolating polynomial p_3(x) of third degree. For p_3(x) we take the polynomial that has, at x_k, x_{k-1}, x_{k-2} and x_{k-3}, the values

f_k = f(x_k, y_k)
f_{k-1} = f(x_{k-1}, y_{k-1})
f_{k-2} = f(x_{k-2}, y_{k-2})
f_{k-3} = f(x_{k-3}, y_{k-3})

p_3(x) is obtained from the Newton backward difference formula

p_3(x) = f_k + r ∇f_k + (1/2) r(r+1) ∇²f_k + (1/6) r(r+1)(r+2) ∇³f_k

where r = (x - x_k)/h, and

∇f_k = f_k - f_{k-1}
∇²f_k = f_k - 2 f_{k-1} + f_{k-2}
∇³f_k = f_k - 3 f_{k-1} + 3 f_{k-2} - f_{k-3}

Integrating p_3 between x_k and x_{k+1},

∫_{x_k}^{x_{k+1}} p_3 dx = h ∫_0^1 p_3 dr = h [ f_k + (1/2) ∇f_k + (5/12) ∇²f_k + (3/8) ∇³f_k ]
                         = (h/24) (55 f_k - 59 f_{k-1} + 37 f_{k-2} - 9 f_{k-3})
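For completeness, the coefficients above come from the elementary integrals

∫_0^1 dr = 1,  ∫_0^1 r dr = 1/2,  ∫_0^1 (1/2) r(r+1) dr = 5/12,  ∫_0^1 (1/6) r(r+1)(r+2) dr = 3/8

and collecting the multiples of f_k, f_{k-1}, f_{k-2} and f_{k-3} from the backward differences then gives the weights 55/24, -59/24, 37/24 and -9/24 respectively.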

Since this is an approximation to the integral of f(x, y(x)) between x = x_k and x = x_{k+1}, Eq. 36 becomes

y_{k+1} = y_k + (h/24) (55 f_k - 59 f_{k-1} + 37 f_{k-2} - 9 f_{k-3})        (37)

This is a 4-step formula that uses the values of f computed from the previous four approximations y_k, y_{k-1}, y_{k-2} and y_{k-3} in order to calculate y_{k+1}. The method is called the Adams-Bashford method.

8.2.2 The Adams-Moulton method

One can use Eq. 37 as a predictor for y_{k+1}, and obtain a corrector in the same way by integrating another interpolating polynomial p̃_3(x) which equals f_{k+1}, f_k, f_{k-1} and f_{k-2} at x_{k+1}, x_k, x_{k-1} and x_{k-2}, where

f_{k+1} = f(x_{k+1}, y*_{k+1})

and y*_{k+1} is obtained from Eq. 37; the other f's are as before. Therefore

p̃_3(x) = f_{k+1} + r ∇f_{k+1} + (1/2) r(r+1) ∇²f_{k+1} + (1/6) r(r+1)(r+2) ∇³f_{k+1}

with r = (x - x_{k+1})/h. Integrating between x = x_k and x = x_{k+1}, that is, over r = -1 to r = 0, we obtain

∫_{x_k}^{x_{k+1}} p̃_3 dx = h ∫_{-1}^0 p̃_3 dr = (h/24) (9 f_{k+1} + 19 f_k - 5 f_{k-1} + f_{k-2})

The Adams-Moulton corrector is then

y_{k+1} = y_k + (h/24) (9 f_{k+1} + 19 f_k - 5 f_{k-1} + f_{k-2})        (38)

Eqs. 37 and 38 used together form the basis of a multi-step predictor-corrector method, known as the Adams-Moulton method. Eq. 37 is used to predict y_{k+1}, and then Eq. 38 is used to correct the prediction. The method provides for repeated application of the corrector stage for a given k until the relative difference between successive values becomes less than a small preassigned number.

The Adams-Moulton method is faster than, say, the fourth-order Runge-Kutta method, since it requires only two function evaluations (f_k and f_{k+1}) per step, in contrast to the four function evaluations required by the fourth-order Runge-Kutta method. It can be shown that the Adams-Moulton method is a fourth-order method, comparable to the fourth-order Runge-Kutta method, and that it is numerically stable. Furthermore, predictor-corrector methods have the advantage that they provide an estimate of the error. Specifically, a large value of |y_k - y*_k| indicates that the error in y_k is large and calls for a reduction of the step size h. On the other hand, if |y_k - y*_k| is very small, then h may be increased.
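One complete predictor-corrector step combining Eqs. 37 and 38 might be sketched in Fortran 90 as follows. The stored history values fk, fkm1, fkm2, fkm3 stand for f_k, f_{k-1}, f_{k-2}, f_{k-3} and are assumed to be maintained by the driver; the tolerance test and iteration cap are illustrative.

! One Adams-Moulton predictor-corrector step (a sketch).
function am_step(f, xk, yk, h, fk, fkm1, fkm2, fkm3, tol) result(yk1)
   implicit none
   real, external   :: f               ! right-hand side f(x, y)
   real, intent(in) :: xk, yk, h, fk, fkm1, fkm2, fkm3, tol
   real             :: yk1, yold
   integer          :: m
   ! predictor, Eq. 37 (Adams-Bashford)
   yk1 = yk + (h/24.0)*(55.0*fk - 59.0*fkm1 + 37.0*fkm2 - 9.0*fkm3)
   ! corrector, Eq. 38, repeated until successive values agree
   do m = 1, 10
      yold = yk1
      yk1  = yk + (h/24.0)*( 9.0*f(xk + h, yold) + 19.0*fk &
                             - 5.0*fkm1 + fkm2 )
      if (abs(yk1 - yold) <= tol*max(1.0, abs(yk1))) exit
   end do
end function am_step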

9 Problems
1. Determine all the floating point numbers 0 ≤ fl(x) ≤ 3 for the system fl(x) ∈ F(3, 2, -1, 2) and plot them on a real number line.

2. Determine how many digits a computer using a binary number representation system would need to match the accuracy of an 8-digit denary floating point calculation.

3. Determine alternative expressions for
   a) f(x) = 1 - cos x near x = 0
   b) f(x) = (-b + √(b² - 4ac)) / (2a) when 4ac ≪ b²
which avoid loss of significant digits in floating point calculation.

4. Solve the equation f(x) = 0, where f(x) = x + e^x - 2, by successive iteration.

10 Answers to Problems
Problem 1

(.000)_2 × 2^0  = 0
(.100)_2 × 2^{-1} = (1/2) × 1/2 = 1/4
(.101)_2 × 2^{-1} = (1/2 + 1/8) × 1/2 = 5/16
(.110)_2 × 2^{-1} = (1/2 + 1/4) × 1/2 = 3/8
(.111)_2 × 2^{-1} = (1/2 + 1/4 + 1/8) × 1/2 = 7/16
(.100)_2 × 2^0  = 1/2
(.101)_2 × 2^0  = 1/2 + 1/8 = 5/8
(.110)_2 × 2^0  = 1/2 + 1/4 = 3/4
(.111)_2 × 2^0  = 1/2 + 1/4 + 1/8 = 7/8
(.100)_2 × 2^1  = (1/2) × 2 = 1
(.101)_2 × 2^1  = (1/2 + 1/8) × 2 = 5/4
(.110)_2 × 2^1  = (1/2 + 1/4) × 2 = 3/2
(.111)_2 × 2^1  = (1/2 + 1/4 + 1/8) × 2 = 7/4
(.100)_2 × 2^2  = (1/2) × 4 = 2
(.101)_2 × 2^2  = (1/2 + 1/8) × 4 = 5/2
(.110)_2 × 2^2  = (1/2 + 1/4) × 4 = 3
(.111)_2 × 2^2  = (1/2 + 1/4 + 1/8) × 4 = 7/2

The numbers are plotted in Fig. 7.
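The enumeration is easily checked by machine. A minimal Fortran sketch which generates the normalised members of F(3, 2, -1, 2) (3 mantissa bits, base 2, exponents from -1 to 2, plus zero):

program fset
   implicit none
   integer :: m, e
   print *, 0.0                       ! zero
   do e = -1, 2
      do m = 4, 7                     ! normalised mantissas (.100)_2 .. (.111)_2
         print *, real(m)/8.0*2.0**e
      end do
   end do
end program fset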

[Figure 7: Numbers ≤ 3 in the set F(3, 2, -1, 2), plotted on the real line from 0.00 to 3.00. The diagram shows the discrete and non-uniform nature of finite-digit floating point number representations.]

Problem 2

It is necessary to compare 10^{1-t} with 2^{1-s}, where t and s are the number of digits in the denary and binary representations respectively. The steps are

10^{1-t} = 2^{1-s}
⟹ log_10 10^{1-t} = log_10 2^{1-s}
⟹ (1 - t) log_10 10 = (1 - s) log_10 2
⟹ (1 - t) = (1 - s) log_10 2 = -(s - 1) × 0.30103

If t = 8 then s ≥ 1 + (t - 1)/0.30103 ≈ 24.2535. Therefore, 24-digit binary calculations are not as accurate as 8-digit denary calculations, while 25-digit binary calculations are slightly more accurate.

Problem 3

a)
1 - cos x = (1 - cos²x)/(1 + cos x) = sin²x/(1 + cos x)

b)
(-b + √(b² - 4ac))/(2a) = [(-b + √(b² - 4ac))/(2a)] × [(b + √(b² - 4ac))/(b + √(b² - 4ac))]
                        = -4ac / [2a (b + √(b² - 4ac))]
                        = -2c / (b + √(b² - 4ac))

Problem 4

Putting

x = 2 - e^x        (39)

and x_0 = 0, one generates

x_1 = 1
x_2 = -0.71828
x_3 = 1.51242
x_4 = -2.53761
x_5 = 1.92094
x_6 = -4.82709

Clearly, this sequence does not converge. However, by taking natural logarithms of Eq. 39 one obtains

x = log(2 - x)

Now, with x_0 = 0, one obtains

x_1 = 0.69315
x_2 = 0.26767
x_3 = 0.54948
x_4 = 0.37196
x_5 = 0.48755
...

with later iterates 0.44286, 0.44285, 0.44286, settling on x ≈ 0.44286.

It should be noticed that both sequences oscillate, but the first sequence diverges while the second converges.
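Both sequences are easily reproduced. A minimal Fortran sketch of the two iterations (the iteration counts are illustrative):

program fixed_point
   implicit none
   real    :: x
   integer :: n

   x = 0.0
   do n = 1, 6                        ! divergent form, Eq. 39: x = 2 - exp(x)
      x = 2.0 - exp(x)
      print *, 'divergent: ', n, x
   end do

   x = 0.0
   do n = 1, 25                       ! convergent form: x = log(2 - x)
      x = log(2.0 - x)
      print *, 'convergent:', n, x
   end do
end program fixed_point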

11 Lecture programme
Date      Session  Lecture  Topic
Feb.  4      1        1     Introduction, Computing with numbers
      6      2              Practical 1
     11      3        2     Errors in computer arithmetic
     13      4              Practical 2
     18      5        3     Solving algebraic equations
     20      6              Practical 3
     25      7        4     Fitting experimental data
     27      8        5     Interpolation
Mar.  4      9              Practical 4
      6     10              Assignment 1†
     11     11        6     Linear algebra 1
     13     12        7     Linear algebra 2
     18     13        8     Optimisation
     20     14        9     Numerical quadrature
     25     15       10     Ordinary differential equations 1
     27     16       11     Ordinary differential equations 2
Apr.  1     17              Assignment 2‡
      3     18              Assignment 2
     29     19       12     Ordinary differential equations 3
May   1     20       13     Partial differential equations 1
      6     21       14     Partial differential equations 2
      8     22       15     Partial differential equations 3
     13     23       16     Tutorial
     15     24       17     Tutorial

Table 1: 1997/98 lecture programme.
† Differentiating experimental data. ‡ Integrating the Landau-Lifshitz equation.
