Introduction to computational methods in science


and engineering using MATLAB
Dr. rer. nat. Hans-Georg Matuttis
University of Electro-Communications,
Department of Mechanical and Control Engineering
Chofu Chofugaoka 1-5-1
Tokyo 182-8585
Japan
http://www.matuttis.mce.uec.ac.jp/

1. When something is unclear, ask me, not your neighbor, who is busy himself. Ask
as many questions as you need.

2. The script can be downloaded from http://www.matuttis.mce.uec.ac.jp/
or from the E-Learning system. You can read it online or print it out. If you
print out more than 100 pages, you have to submit an application (signed by
me) for more printout pages. Reading the script does not replace attending the
lecture.

3. The homework is an exercise to learn programming; you cannot learn programming
by reading the script.

4. Learn to use the online-help from MATLAB

5. For the credit:
- Presence in the lecture
- Number of points in the programming homework and the E-Learning system
Getting Started

0.1 Why Computational Methods


Most problems in Science and Engineering differ from undergraduate problems in the
respect that no closed solutions exist: whereas there is a closed solution (solution
function) for the harmonic oscillator with viscous damping,

\ddot{x} + \underbrace{2\gamma\dot{x}}_{\text{viscous damping}} + \omega_0^2\,x = 0,
\qquad
x(t) = A\,\exp(-\gamma t)\,\exp\!\left(i\sqrt{\omega_0^2-\gamma^2}\;t\right),

there is no closed solution for the harmonic oscillator


with sliding friction (see graphics to the right),

m\ddot{x} + \underbrace{\mu\,\mathrm{sgn}(v)\,F_N}_{\text{sliding friction}} + kx = 0,
\qquad \mathrm{sgn}(v)=\frac{v}{|v|},

but solutions can only be given piecewise for ranges where the friction is constant:

x(t) = (\hat{x}-x_0)\,\sin\!\left(\omega t+\frac{\pi}{2}\right) + x_0
\quad\text{for}\quad 0 \le t \le \frac{T}{2},

x(t) = (\hat{x}-3x_0)\,\sin\!\left(\omega t+\frac{\pi}{2}\right) - x_0
\quad\text{for}\quad \frac{T}{2} \le t \le T,
\qquad x_0=\frac{\mu F_N}{k},

where \hat{x} is the initial elongation, \omega=\sqrt{k/m} and T=2\pi/\omega.
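As a numerical illustration, the two branches of this piecewise solution can be evaluated in MATLAB. This is only a sketch under my own choice of parameters (k, m, mu, FN and the initial elongation xhat are not from the text); it checks that the two branches join continuously at t = T/2:

```matlab
% piecewise solution of the friction oscillator over one period
% (illustrative sketch; parameter values are freely chosen)
k=1; m=1; mu=0.2; FN=1;
x0=mu*FN/k;                       % shift of the equilibrium by friction
omega=sqrt(k/m); T=2*pi/omega;
xhat=1;                           % initial elongation
t1=linspace(0,T/2,100);
x1=(xhat-x0)*cos(omega*t1)+x0;    % first half period
t2=linspace(T/2,T,100);
x2=(xhat-3*x0)*cos(omega*t2)-x0;  % second half period
% plot([t1 t2],[x1 x2])           % visualize the decaying oscillation
```

Note that cos(wt) = sin(wt + pi/2), so this is the same solution written with the cosine.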
For forces which are not linear in x and its derivatives, in general not even piecewise
solutions can be given. Other problems for which no closed solutions exist are problems
with many degrees of freedom (e.g. planetary systems), or flow problems.
For the technically important fields of structural analysis and fluid mechanics, most
results are nowadays obtained by computer simulations.

Fluid mechanics: flow around a sphere with increasing Reynolds number/flow speed:
analytical solutions exist only for the Stokes flow problem.
[Figure: flow regimes around a sphere — Stokes flow, vortices, vortex street, turbulence]

0.2 New MAC-Installation


Since 4/2006, the exercises-room is equipped with MAC-computers instead of the old
SUN-Unix-Workstation-Terminals. Software can either be started via the Window-
Icons in the Applications-Directory, or via the command-line terminal, which is
started by clicking on the X-Icon. MATLAB can be started by clicking on the MATLAB-
Icon. It is recommended for the course to use the EMACS-Editor. Because the current
MAC-OSX-Operating-System is based on the Unix-Operating-System, the following
comments on Unix are useful, last but not least because UNIX-commands (for directory
listings, previewing of graphics, removal of unneeded data etc.) can be used from the
MATLAB-prompt via ! as escape-sequence.
WARNING! The new MAC-Installation allows the teacher to view the screen and the
currently active programs in each student terminal. Applications which are unneeded
for the lesson can be terminated from the teacher-console.

0.3 UNIX-Workstations
The MAC-computers in the terminal-room cannot be used remotely. If students want
to login remotely from outside the terminal-room, access is possible to the SUN-
cluster with the name sun.edu.cc.uec.ac.jp, which consists neither of MACs
nor PCs, but UNIX-Workstations. The login (also possible from the MAC-computers)
has to be via the secure shell; login with X-forwarding enabled is possible as

ssh -X sun.edu.cc.uec.ac.jp

or

ssh -Y sun.edu.cc.uec.ac.jp

(the option -Y or -X depends on the version of the operating system one is logging
in from). UNIX was originally written to be used from a command-prompt window,
not from GUI/Window systems. It is advisable to be able to use the original UNIX-
commands. If you like UNIX and would like a UNIX-like environment on your PC,
install the free CYGWIN-package, www.cygwin.com.
Some survival-UNIX-commands:

cp source destination         copy the file source to the file destination; if
                              destination exists, overwrite it with source
cp -r source destination      like copy, -r means recursive, works also with
                              directories
pwd                           display current directory
mv source destination         rename the file source to the file destination
cd                            change directory
ps x                          display existing jobs and their job-id
kill job-id                   kill the job with the number job-id
ls                            list directory content
ls -d                         list all directories in the current directory
ls -lrt                       list all files with information about their size,
                              in the order in which they have been created, the
                              newest ones at the end
ls [a-c]*                     list all files in this directory with names
                              beginning with a, b or c
find . -name thisname -print  look for the file or directory thisname in the
                              current directory and in all subdirectories
find . -name '*arg*' -print   display all files and directories in the current
                              directory and in all subdirectories which contain
                              the string arg in their name (the quotes keep the
                              shell from expanding the pattern)
fgrep asdf *.txt              look for all lines which contain the string asdf
                              in all files which have the extension .txt
UNIX is a multi-user multi-process operating system, so several users can run commands
at the same time on a single computer. It is also possible to move jobs into the background
when starting them, so that they don't block the command prompt, by appending the
ampersand &. If a program was started in the foreground and blocks the prompt, it
can be pushed into the background via CTRL-Z, whereupon its execution is interrupted.
If bg is then typed in the same terminal window, the execution is resumed in the
background. (Of course, this does not work with programs which have an input prompt
in the foreground, like MATLAB.)
Some special directories:
./    the current directory
../   the parent of the current directory
~/    the user's login directory
Some special characters:
*     any string
?     any single character
[a-f] any of the characters a,b,c,d,e,f
The recommendation for this course is to work in a directory which is dedicated to
MATLAB alone, and which is not the top-level directory of the account. The name
ml should do just fine:

mkdir ml
cd ml

0.4 MATLAB
0.4.1 Introduction: Interpreters and Compilers
In general, for programming projects with high numerical complexity, it will be
best to develop the algorithms in MATLAB. MATLAB is, like BASIC or symbolic
computer languages like MAPLE, MATHEMATICA, MACSYMA and REDUCE, an
interpreter language, i.e. the language commands are translated into processor instruc-
tions one at a time at runtime. Nevertheless, MATLAB is not a symbolic language, but
performs all calculations numerically, i.e. with floating point numbers.1 The language
can be used either from a command prompt or as a functional (or object-oriented)
programming language. In compiler languages like FORTRAN, C or PASCAL, the
program is fully translated into processor instructions before execution. If errors occur
at runtime, the memory contents are difficult to analyze, usually only with the help of
a debugger, which may alter the program execution and memory layout up to the point
that some errors cannot be reproduced. The debuggers' properties vary much more than
the language itself. In MATLAB, after a program crash, the data are still accessible in
MATLAB's memory and can be analyzed using the commands of the MATLAB-language itself.
Interpreters allow fast program development. As a rule, their execution times are
higher than those of compiler languages, but during program development, usually the
compile time is more costly than the actual runtime. In MATLAB, when complex
builtin functions are invoked via small commands, like a matrix inversion, very often
the advantage in speed of the compiler languages is negligible.
Many programming languages have a whole zoo of data types. MATLAB's elementary
data type is the complex matrix. (Recently, MATLAB also offers more kinds of data
types, but we will not use them in this course.) Variables are promoted automatically,
up to the point where they take a complex value. Variables which are used as
indices must nevertheless have an integer value.
Because it is not possible to declare variables in MATLAB, it refuses to process vari-
ables which are not initialized. In FORTRAN77, for example, it was possible to use
variables which were neither declared nor initialized, and which assumed the value 0
at the moment they were used.

0.4.2 Getting started


The MATLAB-Interpreter is started on our installation by typing

matlab

at the command prompt, which starts the MATLAB-desktop. If you are busy and you
don't want to see the splash-screen (MATLAB-Commercial) at the program start, use

matlab -nosplash
1 The symbolic package available with MATLAB is basically MAPLE with a MATLAB-Interface.

Basic Commands:
edit      starts the MATLAB-Editor with syntax-highlighting of MATLAB-
          commands. You can use any editor you like to write MATLAB-files,
          but the line-end may vary between operating systems and may lead
          to trouble
clear     empties the memory
clear a   clears the variable a from the memory
who       displays the variables which have been assigned
help      gives help concerning a specific topic
help help tells you how to use the help function
lookfor   looks for a word in the help files, useful if you are looking for a
          command according to context, but are not sure about the command
          name
disp(a)   displays the value of the variable a
disp('a') displays just the string a
rand      random number generator, will be used a lot to initialize data
format    formats the output; format compact suppresses output of empty
          lines, format short rounds the displayed output to about five
          significant digits, but the computations are still performed
          with full precision
%         comment sign
ls        displays the content of the current working directory of MATLAB,
          i.e. the directory whose files MATLAB can access directly
cd        changes the current working directory of MATLAB
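A short interactive session with some of these commands might look like this (a minimal sketch; the variable name a is arbitrary):

```matlab
format compact   % suppress empty lines in the output
a=rand(2,2);     % 2x2 random matrix; the semicolon suppresses the output
who              % shows that a has been assigned
disp(a)          % display the value of a
clear a          % remove a from the memory again
```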

The MATLAB-desktop is written in JAVA (another interpreter-based programming
language), which still has some stability problems2, so the desktop crashes relatively
often. If you don't want to work with the desktop to avoid unnecessary crashes, but
want to write the programs in a Unix-editor you know, you can also start MATLAB
with the command-prompt only as

matlab -nosplash -nodesktop

2 To get an idea why the JAVA-Interface of MATLAB crashes so often, see the internal memo from
SUN at http://www.internalmemos.com/memos/memodetails.php?memo id=1321

Special Characters:
!     escape sequence, allows to use UNIX-commands like cd, pwd from
      the MATLAB-prompt
[...] 1. vector brackets referring to the values of the entries; [1 2 3]
         is a vector with the entries 1, 2 and 3.
      2. brackets referring to the output arguments of functions.
(...) 1. brackets referring to the indices of a vector; a(3) is the third
         element of the vector a.
      2. brackets referring to the input arguments of a function.
...   three dots mark the end of a line which is continued in the next line
;     has no syntactical function like in C, but is only used to suppress
      the output of the operation
i, j  stand for the complex unit sqrt(-1), but can also be overwritten
      for other uses
pi    is indeed 3.1415...
,     separates commands when several commands should be written in
      the same editor line
:     separates loop ranges, lower_bound:stepwidth:upper_bound.
      WARNING! lower_bound,stepwidth,upper_bound (with commas) only
      displays the variables lower_bound, stepwidth, upper_bound
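The following snippet demonstrates some of these characters in context (a sketch; the variable names are my own):

```matlab
a=[1 2 3];   % vector brackets; the semicolon suppresses the output
a(3)         % round brackets: the third element of a
b=0:0.5:2    % the colon builds the range 0, 0.5, 1, 1.5, 2
c=1+2+ ...
  3          % the three dots continue the line
x=1, y=2     % the comma separates two commands in one line
```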

As a first reference, Kermit Sigmon's MATLAB primer at
http://math.ucsd.edu/driver/21d-s99/matlab-primer.html
can be recommended. It gives a short overview of the available commands, but it is a
good idea to get used to the builtin help-function of MATLAB (just type help at the
prompt). For most purposes, the internal help is sufficient. Manuals for MATLAB
are available, but there is not much information which one needs beyond the builtin
help command on a daily basis, except the references to the algorithms used. This is a
huge difference to e.g. MATHEMATICA, where the algorithms are secret. Beware:
in contrast e.g. to FORTRAN, MATLAB is case sensitive, ABC is not the same as
abc. If you use the same variable names in lower case and upper case in the same
program, you will run into trouble anyway. Information about a public-domain clone
of MATLAB, OCTAVE, can be found at www.octave.org.

Control statements are usually terminated via the end command, no matter whether
it is an if statement or a for loop:

for i=1:10
  i
end

a=2
b=3
if (a>b)
  disp('a>b')
else
  disp('a<=b')
end

0.4.3 Matrix Processing

MATLAB was started by Cleve Moler, a famous researcher in numerical linear algebra,
as a MATRIX LABORATORY for his students, which should allow fast, safe and easy
development of algorithms for numerical matrix analysis.
MATLAB has evolved into a general purpose language with specialized applications in
many fields. Many books in the meantime use MATLAB either as a formal language or
for the programming examples; have a look at http://www.mathworks.com/support/
books/index.jsp.
Matrix Syntax:
*            multiplies two matrices according to the conventions of the
             inner/outer/matrix product
.*           multiplies two matrices elementwise
a(2:4)       elements of vector a from the second to the fourth element
end          the last element in a row/column of a vector/matrix
a(2:end)     elements of vector a from the second to the last element
b=c(2:3,2:6) assigns b the values in the matrix c from row 2 to 3 and
             column 2 to 6
With the matrix syntax and the proper use of brackets, many operations can be sim-
plified without the use of loops:

for i=1:20
  a(i)=i/2
end

or

a=[0.5:0.5:10]

or

a=linspace(0.5,10,20)
Many functions either operate on vectors and matrices elementwise, or they are matrix
functions in the sense that the operations are performed according to the rules of
matrix analysis.
Matrix/Vector Functions:
length          gives the longest dimension of a matrix, or the length of a vector
size            gives the dimensions of a matrix
linspace(a,b,m) makes a vector with m equally spaced entries between a and b
rand(n,m)       sets up a random matrix with n rows and m columns
exp             exponential function, works elementwise on a matrix
expm            matrix exponential function, works on the eigenvalues of a
                matrix and can only be used for square matrices
eig             eigenvalue decomposition
inv             matrix inversion
norm            matrix/vector norm
det             determinant
svd             singular value decomposition
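The difference between the elementwise and the matrix-sense operations can be seen in a small example (a sketch with freely chosen matrices):

```matlab
A=[1 2; 3 4];
B=[5 6; 7 8];
C1=A*B;        % matrix product
C2=A.*B;       % elementwise product, a different result
E1=exp(A);     % elementwise exponential
E2=expm(A);    % matrix exponential, computed via the eigenvalues
% for a diagonal matrix, exp and expm agree on the diagonal entries,
% but exp puts exp(0)=1 into the off-diagonal zeros while expm keeps 0:
D=diag([1 2]);
```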

0.4.4 User-defined Functions (m-files)


User-defined functions can be written as ASCII-files with the extension .m. A function
my_function would contain in the file my_function.m

function [out_arg1,out_arg2,arg3]=my_function(in_arg1,in_arg2,arg3)
% function [out_arg1,out_arg2,arg3]=my_function(in_arg1,in_arg2,arg3)
% The first comment after the function declaration is
% displayed if "help my_function" is typed to write
% self-documenting functions
........
return

It is advisable always to end a function with a return statement, and the main
program as well.
For input arguments, MATLAB-functions use call by value, which means that the
input-arguments (in round brackets) cannot be modified in the functions. Only the
output-arguments (in []-brackets) can be modified by the called function. If an argu-
ment is to be used both as an input-argument and an output-argument, it must appear
in the round brackets and in the []-brackets, like arg3 in the above example.
Global variables can be defined with the statement global in a similar way as variables
are declared in other programming languages. The same declaration must then be used
in the functions which use the variable. Global variables can also be overwritten in the
functions; they are call by reference variables.
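A minimal sketch of a global variable (the name gravity and the example function are my own choice, not from the text); the same global statement is repeated inside every function that uses the variable:

```matlab
global gravity        % declaration in the main program
gravity=9.81;
% in the file fall_velocity.m (hypothetical example function):
%   function [v]=fall_velocity(t)
%   global gravity    % the same declaration, repeated in the function
%   v=gravity*t;
%   return
```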
FORTRAN uses call by reference for all input variables of subroutines. C uses call by
value for scalars and call by reference for arrays, so that a pointer to a variable must
be used if a scalar is to be modified in the functions.
Functions can be overloaded for different numbers of input parameters and for scalar
and matrix arguments. If the operations used in the function allow an interpretation
in the matrix-sense, the function can automatically be used for matrix arguments.
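As an example of the scalar/matrix overloading (the function name is my own choice, written as an anonymous function so the sketch is self-contained):

```matlab
% a function body written with elementwise operators works for
% scalars and matrices alike; in a file this would be parabola.m
parabola=@(x) x.^2+1;   % .^ instead of ^ makes it elementwise
parabola(2)             % scalar input
parabola([1 2 3])       % vector input, evaluated entry by entry
```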

Exercises
1. Set up a vector with the entries (1, 2, 3, 4, . . . n) once using a for-loop, the second
time using an implicit loop.

2. Multiply every second element with a constant a, once using a for-loop, once
using an implicit loop.

3. Write a program which finds out which elements of a vector are even

4. See what happens if you set up ones(L), ones(L,1), ones(1,L), and what
happens when you try to multiply these objects with each other.

5. a) What do you expect the following program to do?

clear
step=2
upper_bound=10
for i=1,step,upperbound
disp(i)
end
return

b) What does the program really do? c) How do you have to rewrite the program
so that it does what you expected it to do in a)?

6. Write a program which computes the factorial n! of an integer number n:

n  n!
1  1
2  1*2
3  1*2*3
4  1*2*3*4
7. Rewrite the factorial-program as a subroutine

8. Rewrite the factorial-subroutine so that the input-arguments are checked and
only proper input arguments are accepted.

9. Use the help-function of MATLAB to find out the relation between the built-in
function gamma and the factorial.
Chapter 1

How to write better programs

In this chapter, I will discuss the basics of programming style for numerical computing.1
Everything seems to be a matter of course, and during several courses, some students
who considered themselves experienced programmers skipped these lessons. Usually,
after 2 weeks of homework, they ran into exactly those pitfalls, problems and errors that
I discuss in these pages, and usually wasted several hours which could have been spent
productively. My usual comment was: We had this two weeks ago when you didn't
attend . . .

1.1 Programming Style


1.1.1 Choosing variable names
Of course, nobody would use variable names in scientific computing which have no
scientific meaning, like linda,charly,taro, when there is no documentation of what the
variables mean. Variables which are difficult to spell, like asdtfgl or such-like, should
better be avoided, except if there is a convention for how to compose such variable names.
Some variable names in scientific programming are self-explaining, like
x,y,z,vx,vy,vz,omega
etc. It is very easy to over-do the self-explanation by choosing too long variable names,
as I once saw in the programs of a master's thesis:
this_is_the_coordinate_of_x,
this_is_the_coordinate_of_y,

1 I will use the terminology Computational Physics, Computational Engineering, Scientific
Computing, Scientific Programming pretty much as synonyms. Numerical methods, numerical
mathematics, numerical algorithms I will use when I want to emphasize mathematical techniques
to handle floating point computations, minimize roundoff-errors, control discretization errors etc.
Numerical physics I will use if I want to emphasize that the techniques for computational
physics require an understanding of the floating point computations involved.

etc. Of course, meaningless short variable names like

xx,xxx,xxxx,xxxxxy

should also be avoided. If one wants to express the order of the derivative, e.g. the time
derivative of a coordinate (the Gear predictor-corrector method uses up to the 5th time
derivative ...), it may be good practice to use
x_0, x_1, x_2, x_3 . . . . .
for the coordinate, first derivative, second and so on. A convention like
dx_f, dy_f, dx2_f, dx_dy_f .....
is also not a bad idea.
The Fortran77-standard allowed only variable (and subroutine) names of six charac-
ters, so once I spent a happy week in 1989 rewriting my longer variable names into
shorter ones. That was before the coming of the Fortran90-standard; nowadays, all Fortran-
Compilers accept longer variable names, but you may come across programs in the old
convention. I am not sure about the variable-name lengths for C++-compilers, but be aware
that internally the compiler will expand the variable name variablename in
structure structurename in the object objectname to something like

objectname_structurename_variablename

and when these names become too long, this may also cause trouble. A colleague's
program once refused to compile because the internal name representation was longer
than 256 characters, and debugging tools also have problems if e.g. subroutine names
in objects or modules become too long. As far as too long variable names are
concerned, one may run into similar problems with new C++-compilers as one did
with Fortran77-compilers decades ago.
Be aware that similar variable names can easily be confused, especially if they make
use of uppercase/lowercase letters and the underscore, like

Variablename, variablename, Variable_name

so the use of all three in the same program will certainly cause problems. It is a
good convention to use variable names which sound different. At some point in one's
programming career, one should decide whether to write composite variable names
with an underscore, Variable_name, or not, as Variablename, or as VariableName.
More considerations about the conventions and choices for variable names can be found
in Code Complete.
It is practical to reserve i,j,k for loop variables for short loops, which increases
readability, especially if the original mathematical formulae e.g. for vector operations use

i, j, k as indices. In scientific applications, n is very often reserved for particle numbers,
and l,lx,ly,lz for system sizes. Using short variable names increases the readability
when the usage of the variable is clear, e.g. one just implements a formula according
to a text book. Of course, it is not a bad idea to include the reference (title and page)
to the text book in a comment line.
Original formula:

a = v/t
x = (1/2) a t^2

Good:

% A.B. Meier, Mechanics,
% p. 15, eq. 4
a=v/t
x=.5*a*t*t

Not so good:

acceleration=velocity/time
position=.5*acceleration*time^2
For long, complex programs, the usage of e.g. n as particle number or m as number
of timesteps becomes increasingly cumbersome, especially if one tries to recycle the
code and the variable names have already been used e.g. as loop-variables. It is better
to append some information, so if one has to treat walls and particles in a program,
one should define n_wall, n_particle, and for the computation of e.g. the mass, the
program would best be written like

for i_particle=1:n_particle
  m_particle(i_particle)=r_particle^2*pi
end

and in the same way for the walls.

1.1.2 Readability of code and Computational effort


Try to write your code as readably as possible. One definition of a programming guru2
is: he still understands his programs after he has not looked at them for ten years. If
you consider yourself a guru, try to read your programs from ten years ago. There is a world
championship in writing the most unreadable C-program, the infamous International
Obfuscated C Code Contest3, and one of the winners wrote the following:
/*
* Program to compute an approximation of pi
* by Brian Westley, 1988
* (requires pcc macro concatenation; try gcc -traditional-cpp)
*/

#define _ -F<00||--F-OO--;
int F=00,OO=00;
main(){F_OO();printf("%1.3f\n",4.*-F/OO/OO);}F_OO()
{
_-_-_-_
2 Further information on how to write good programs can be found in: Code Complete, Steve
McConnell, Microsoft Press, Paperback 1993, also available in Japanese
3 Homepage at http://www.ioccc.org/

_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_
_-_-_-_
}
As the purpose of science is clarity, the purpose of writing scientific code is also clarity and
readability. Unreadable code is code which is hard to debug, and errors in scientific computing
are much more difficult to detect when merely the second digit of your result is wrong than in
the case of commercial software, where you can always tell by messages like segmentation
violation. Moreover, commercial software vendors can make money by selling software updates,
whereas in scientific computing, people who wrote buggy code will have trouble in their career.
Unreadable code is not the fault of the programming language, though some programming
languages attract chaotic programmers more than others. The advantage of restrictive
programming languages like ADA is that you cannot make certain classes of errors.

What is in a line
Be aware that identical operations in a computer are not sped up by cramming everything
into the same line;
a=2*b
a=a^2
a=a/c+d
will take the same computer time as
a=((2*b)^2)/c+d
Which is more readable depends on the implemented formulae. There are tricks called per-
formance optimization which actually allow faster program execution due to the style in
which the code is written, but this has nothing to do with cramming many commands into a
single line, and can only be discussed in a later chapter.

Coherence
If you are not sure which lines in the code should be grouped together, it is best to stick
to the concept of coherence, writing operations in consecutive lines which affect the same
variables. Instead of

a1=b1*c1
a2=b2+c2
a3=b3/c3
a1=a1-d1/e1
a2=(a1+a2)/2
a3=a3*a2

it is better to write

a1=b1*c1
a1=a1-d1/e1

a2=b2+c2
a2=(a1+a2)/2

a3=b3/c3
a3=a3*a2

Once I had to find the error in the program of a student. The result was correct, except that
it was 10 orders of magnitude wrong. He should have divided the result by a timestep dt.
The student knew that if one has to do many divisions by the same number dt, it is faster to
compute the inverse i_dt=1/dt once and multiply with i_dt. And he thought that he could save
programming time by not defining a new variable i_dt, so his program looked like

dt=10^-5
dt=1/dt

............
(one page of code)
............

result=preliminary_result/dt

A perfect interaction of a stupid choice of variable names (the name of the variable at the
end did not match its meaning), a code which was longer than one page, so one could not
read it in a single window, and an incoherent way of using the variable dt.

Line length and Subroutine length


Fortran77 and also some other programming languages limited the line length to 72 charac-
ters. Many C-programmers consider it advanced programming style to indent their code
so much that they cannot even get the full line length on a 19-inch screen, and they have to
scroll their windows to the left and right. Be aware that what you cannot see at a single
glance, but only after repeated scrolling, can easily cause errors, as you have no oversight
over your code. This applies to vertical as well as horizontal scrolling, so keep the number of
lines of a subroutine below a certain limit (two A4 pages may already be too long), and the
number of columns should not be more than maybe 80 characters; every program which needs
more should have more subroutines.

1.2 Safety first


The most important aspect of scientific programming is the safety of the programs. Never
in the history of mankind has it been possible to produce so many wrong answers so fast4 .

1.2.1 Check Input variables


Always check the input variables of your subroutines. You may know with which parameters
the subroutine must be used, but there may be somebody else who does not know it, usu-
ally the next student who uses the program after you, who will then produce a lot of numerical
garbage. So even if you have a simulation of a mechanical system which should be used with
positive timestep and positive masses, you better check whether the timestep and the masses
are larger than 0 at the beginning of the program. Moreover, errors in passing arguments to
the subroutine can be detected more easily like that. If you find a wrong input parameter,
don't replace the input with a default value, but stop the program good and hard:

mass=input('mass: ')
if (mass<=0)
  error('mass must be larger than 0')
end

For general software, it may be a good idea to define a default value. For most numerical
applications (except for accuracy thresholds), specifying a default input may be a very bad
idea.

1.2.2 Operator precedence


For analytic arithmetic expressions, the order of the arithmetic operations is usually well
defined, so that a + b*c^d is automatically evaluated as a + (b*(c^d)). Usually, the order of
the operations is equally clear with logical expressions, but with numerical code it is a priori
not clear whether, for the logical operators not, and, or written as ~,&,|,

(~a<b*c&d==0)

is evaluated as ((~(a<b*c))&(d==0)), or as (~((a<b*c)&(d==0))), or whether the logical
operations can indeed be applied bitwise to integer-values, as in ~(10101010)=(01010101),
and the result then be used as a number of the respective type. So if anything occurs which
is more ambiguous than addition and multiplication, one should use brackets.
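A small experiment shows that the two bracketed readings really differ (the values are my own example):

```matlab
a=1; b=2; c=3; d=1;
r1=(~(a<b*c))&(d==0)   % negation of the comparison first: gives 0
r2=~((a<b*c)&(d==0))   % negation of the whole expression: gives 1
% since the two readings disagree, write the brackets explicitly
```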

1.3 Program documentation


Always document your program, and the best method is to write the explanations within
the code; if they are elsewhere, they will get lost over the years. I will reject any project
which is not well documented.
4 Carl-Erik Froberg

1.3.1 Stupid comments


There are useful ways and stupid ways to write comments. When I once emphasized the
importance of comments for computer programs, in the next exercise lesson one student
wrote the following comment:

% here is a comment

When I asked why he wrote such a comment, he said: Because you said we should write
comments. But he had not written in his program what the program should do, and during
one hour of programming actually forgot what he should program... Another stupid comment
would be

% Divide by c
a=b/c

Of course, the division is self-explaining, but for the same short line a comment like

% c from function XXYYZZ, not yet checked whether c becomes 0
a=b/c

may help a lot in debugging the code. Generally, focus on what the code is doing, not how,
because how it is done can be read from the programmed lines.

1.3.2 Comments
Usually, every line which contains information which is not self-explaining, like

volume=lx*ly*lz
mass=volume*rho

should be documented. Of course, the amount of comments necessary grows with the
number of people who are supposed to use the code, with the number of functions and lines
in the code, and with the complexity. If you are not sure who will use the code, then better
write your program documentation in English. It is generally a good idea to formalize one's
documentation, especially at the beginning of functions/subroutines:

%PURPOSE: What the program is supposed to do
%USAGE: When and how the program is to be used
%AUTHOR: Who wrote the program
%DATE: Date when the program was written
%ALGORITHM: If the algorithm used is more complicated
% than what you can document in the body of the subroutine, you
% better explain the algorithm here
%LITERATURE: If you have used a complicated algorithm e.g. for
% matrix inversion etc., write from which book or article the algorithm
% comes, usually you have also used the naming conventions, and anybody
% who wants to understand the algorithm (maybe you after ten years) better
% reads the literature first.
%CAVEATS: Known pitfalls and limitations of the routine
%TODO: How to improve the algorithm the next time you have time
%REVISION HISTORY: Write the date when you modified the algorithm

The above example is easy to maintain, to modify or to add to. What is not easy to
maintain would be something like

% PURPOSE:
% +-------------+
and so on and so on. The simpler you design your comments, the more likely it is that
you really write them in the way they should be written. If any of the above points do not
apply, leave them out. If the routine is complete and runs as it should, don't write an empty
TODO point. If your routine name is my_asin (my arcsine), then you don't have to document much
in principle. But if the routine actually computes the arcsine in a non-standard way by polynomial
approximation, you had better write where you found it in the literature. If the routine is
vectorized, this should be stated in the PURPOSE. If the vectorization works only if a vectorized
division is available, this should be written in the CAVEATS. If you write a routine for the first
time, you don't have to write a REVISION HISTORY, the date is enough.
And when you change the routine, also change the comments! Nothing is more confusing
than working with a "correct" routine called my_sinus which actually calculates a cosine.
Exercises
1. Check whether the MATLAB programs you have written up to now are in accordance
with the above ideas.

2. Write a program which creates a matrix where the first column contains equally spaced
x-values between -5 and 5, and the second column contains the values of the second-
order polynomial y = ax^2 + bx + c.

3. Write a program which creates a matrix where the first column contains equally
spaced x-values between -5 and 5, and the second column contains the values of the
function y = 1/(1 + x).

4. Write a program which can detect whether the result of a mathematical computation
has complex parts.
Chapter 2

Stochastic methods I

Stochastic methods use concepts from probability theory. Knowledge about stochastic methods
is important in every field of science and engineering, because every data series contains
a certain element of chance, a certain scattering of the data.
2.1 Random Number Generators


In computer simulations, the element of chance is usually simulated by so-called random-
numbers, or pseudo-random-numbers. A random number generator is a function which should
generate a sequence of numbers which are distributed according to certain probability rules.
In case of equally-distributed random numbers, the numbers are usually between 0 and 1,
and all values can be obtained with the same probability. The random number generator in
MATLAB is called rand, and it can be called with arguments so that the result is not just a
single random number but a vector or matrix:

clear , format compact , format short


rand % output a random number
a=rand(1,4) % output a 1x4 vector of random numbers
b=rand(4) % output a 4x4 matrix of random numbers

This program using the function rand for equally distributed random numbers gives the
following output:

>> showrand
ans =
0.9501
a =
0.2311 0.6068 0.4860 0.8913
b =
0.7621 0.4447 0.7382 0.9169
0.4565 0.6154 0.1763 0.4103
0.0185 0.7919 0.4057 0.8936
0.8214 0.9218 0.9355 0.0579
2.1.1 Mean and Variance
Standard quantities which characterise the statistical properties of a set a_1, a_2, ..., a_n of n numbers
are the mean
\[ \mu = \langle a \rangle = \frac{1}{n}\sum_{i=1}^{n} a_i \]
and the variance
\[ \sigma^2 = \mathrm{Var}(a) = \frac{1}{n-1}\sum_{i=1}^{n} \left(a_i - \langle a \rangle\right)^2, \]
the mean of the squares of the differences between the respective samples and their mean.
The square root of the variance is called the standard deviation.

% PURPOSE: Calculate mean and Variance
% for the MATLAB-Random-Number Generator
clear
format compact
format short
n_rn=10000
rn_vec=rand(n_rn,1);
% rand(n_rn,1) gives a column vector of length 10000,
% rand(1,n_rn) gives a row vector of length 10000,
% rand(n_rn) gives a 10000x10000 matrix and crashes the program (out of memory)
mean_rn=mean(rn_vec);
var_rn=var(rn_vec)
return

Exercise: Calculate by hand the theoretical mean and variance for random numbers
equally distributed between 0 and 1.
Another random number generator in MATLAB is randn, which creates random numbers
according to the Gauss distribution
\[ G(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-x_m)^2}{2\sigma^2}\right), \]
and the normally distributed random numbers from randn have mean x_m = 0 and standard
deviation sigma = 1.
Exercise 2: Estimate how the statistical error of a sequence of random numbers depends
on the number of random numbers used, by comparing the theoretical variance of the
randn random number generator with the actually measured variance.
2.1.2 Distributions and tests of random numbers

A simple visualization for random numbers is to draw the histogram: how many random numbers
fall into each interval Delta x. These intervals are called the bins of the histogram, and the collection
of the data into the histogram is often called binning. For a given number of bins in the histogram,
the distribution of the random numbers can be studied. For the following program,
the output is given below and the drawn histogram is given on the right:
clear
format compact
format short
a=rand(1,4)
hist(a)

>> randhist
a =
    0.3423    0.3544    0.7965    0.5617

(Figure: histogram of the four random values.)
If many bins and few random numbers are used, the histogram is rough; if more
random numbers are used, the histogram is smooth:

clear
format compact
format short
a=rand(1,50);
subplot(3.9,2,1), hist(a)
set(gca,'Xticklabel','')
title('50 random numbers, 10 bins')
axis tight
b=rand(1,500);
subplot(3.9,2,3), hist(b)
set(gca,'Xticklabel','')
title('500 random numbers, 10 bins')
axis tight
d=rand(1,50000);
subplot(3.9,2,5), hist(d)
set(gca,'Xticklabel','')
title('50000 random numbers, 10 bins')
axis tight
subplot(3.9,2,7), hist(d,100)
title('50000 numbers, 100 bins')
axis tight

(Figure: the four histograms, from 50 up to 50000 random numbers.)
Exercise 3: Estimate the dependence of the statistical fluctuations, i.e. of the differences
in the number of entries per histogram bin, on the total number of entries.
A basic test for random numbers is whether the numbers of entries in the bins agree
within the statistical fluctuations. Much more sophisticated tests for random numbers can
be found in Knuth (Donald Knuth, The Art of Computer Programming, Addison-Wesley 1998).
Nevertheless, to evaluate the usability of a random number algorithm
for a given problem, one should not rely on the theoretically available tests alone; before
applying the algorithm to a problem with an unknown solution, one should test it on a related
problem for which one knows the solution. Another visual way of controlling random number
sequences is to plot one sequence as the x- and the other as the y-coordinate:
clear
format compact
n_rn=100;
a=rand(n_rn,1);
b=rand(n_rn,1);
plot(a,b,'.')
axis image

(Figure: 100 random points with coordinates (a,b) in the unit square.)

2.2 Usage of random numbers

2.2.1 Initializing the Seed
Random numbers are used to verify statistical hypotheses, or to initialize simulations in an
arbitrary way. To test statistical hypotheses, several independent sequences of random
numbers are needed. Nevertheless, during program development, it is advantageous to test and debug the
program always with the same random number sequence. The start value which determines
the sequence is called the seed, which is set by
rand('seed',X)

where X should be a numerical value. For general random number generators, very often
prime numbers have to be used as seed, so always read the documentation first.

2.2.2 Monte Carlo Method: Calculate by random numbers

Before random numbers could be easily and fast generated with computer algorithms, math-
ematicians used tabulated random numbers2 , similar as values for integrals are still used
today. Some of these random number tables had been compiled using Roulette Results from
the Casino of Monte Carlo in Monaco, and so Monte Carlo Methods got their name. In re-
cent years, in Computer Science it has become fashionable to name some methods Las Vegas
methods instead of Monte Carlo methods, but the difference is purely academic.

2 See e.g. Random Numbers in Uniform and Normal Distribution: with Indices for Subsets, compiled by Charles E. Clark, Chandler Pub. Co, 1966
To calculate pi with random numbers, let us consider a quarter circle of radius r = 1 and
area a = pi/4, inside a square of side length 1 and area a' = 1. If we choose a point randomly
inside the square, the probability P that it is inside the quarter circle is
\[ P = \frac{a}{a'} = \frac{\pi/4}{1} = \frac{N(\text{in circle})}{N(\text{in square})} = \frac{N(\text{in circle})}{N_{\text{total}}}, \]
where P is the relative frequency with which points are found inside the quarter circle.
Therefore, pi can be computed via the relative frequencies as
\[ \pi = 4\,\frac{N(\text{in circle})}{N_{\text{total}}}. \]
A program which does this computation:

clear
format compact, format short
mc_step=10000
n_inside=zeros(mc_step,1);
n_try=zeros(mc_step,1);
i_inside=0;
for i_mc=1:mc_step
  x=rand;
  y=rand;
  r2=x*x+y*y;
  if r2<=1
    i_inside=i_inside+1;
  end
  n_try(i_mc)=i_mc;
  n_inside(i_mc)=i_inside;
end
4*i_inside/mc_step
return

Exercise: Try to understand the time behavior by plotting the difference to the exact
value of pi in different scales (logarithmically, double-logarithmically).

2.2.3 Simulation of Stochastic processes

Random numbers allow to simulate processes, which are often considered to be deterministic,
in a stochastic way. Let us in the following consider a league of teams; which sport (baseball,
soccer, basketball ...) does not matter. Each of the six teams has a certain game strength
S_i. Let us define the probability for a team A to win against another team B as
\[ P_{AB} = \frac{S_A}{S_B}\cdot\frac{\max_i (S_i)}{S_A + S_B}. \]
In the following program, the game strength (team_quality) is the same for each team;
nevertheless you will find that usually one team wins. In the run which is depicted behind
the listing, the percentages of wins for all the teams are plotted. One can see that
team 2, leading in the beginning, finishes as the last team, whereas the also leading team 6
wins the championship. In real life, sports reporters waste a lot of time and energy
on explaining such developments, but in our simulation, we can see that such narrow
outcomes are just a result of chance. For stock exchange fluctuations, the same reasoning
applies.

clear , format compact, format short


n_team=6, n_game=100

for i=1:n_team
team_quality(i)=11
end
n_games_played(1:n_team)=0
n_games_won(1:n_team)=0
for i_game=1:n_game
for i_team=1:n_team
for j_team=i_team+1:n_team
n_games_played(i_team)=n_games_played(i_team)+1;
n_games_played(j_team)=n_games_played(j_team)+1;

win_probability=...
...% relative probability
(team_quality(i_team)/team_quality(j_team))*...
...% normalization
max(team_quality)/(team_quality(i_team)+team_quality(j_team));

% assign a winner according to the probability


if (win_probability>rand)
n_games_won(i_team)=n_games_won(i_team)+1;
else
n_games_won(j_team)=n_games_won(j_team)+1;
end
score(n_games_played(i_team),i_team)=n_games_won(i_team);
score(n_games_played(j_team),j_team)=n_games_won(j_team);
end
end
end

% normalize the number of games won to a winning probability


normalization=ones(size(score));
normalization=cumsum(normalization(:,1));

plot(normalization,score(:,1)./normalization,'--',...
     normalization,score(:,2)./normalization,'.',...
     normalization,score(:,3)./normalization,'+',...
     normalization,score(:,4)./normalization,'-.',...
     normalization,score(:,5)./normalization,':',...
     normalization,score(:,6)./normalization,'-')
legend('team 1','team 2','team 3','team 4','team 5','team 6')
(Figure: winning percentage of each of the six teams versus the number of games played, with the legend team 1 ... team 6.)

Exercise: Modify the quality and see how the winning probability changes. Find out how
strongly you have to modify the quality of one team so that it wins in all test runs.

2.2.4 Time averages and ensemble averages

In the example of calculating pi with random numbers, better statistics can be obtained by
using more Monte Carlo steps. As an alternative, one can also run the program with few
Monte Carlo steps several times with different seeds, save the results and average them.
This will reduce the noise in the data. For statistically independent data, as for our calculation
of pi, both approaches are equivalent.
The law of large numbers states that the actual probabilities are only realized after infinitely
many tries. For a finite number of realizations, the fluctuations in the system can clearly be
felt.
An approach like the one above, where successive Monte Carlo data are obtained independently
from each other, is called simple sampling. If the Monte Carlo data are chosen
depending on the previous data, the procedure is called importance sampling.
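The ensemble-average idea can be sketched as follows: run the pi-estimate several times with different seeds and average the independent results (the numbers of runs and steps are arbitrary example values):

```matlab
% Ensemble average: average several independent Monte Carlo estimates of pi
n_runs  = 20;      % number of independent runs (arbitrary example value)
mc_step = 1000;    % Monte Carlo steps per run (arbitrary example value)
estimates = zeros(n_runs,1);
for i_run = 1:n_runs
  x = rand(mc_step,1);
  y = rand(mc_step,1);
  estimates(i_run) = 4*sum(x.^2 + y.^2 <= 1)/mc_step;
end
mean(estimates)    % ensemble average, close to pi
std(estimates)     % scatter of the independent estimates
```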

Homework 1: The obsolete Why-Function


Implement the old why-function (which does not exist any more) from Matlab Version 5.
When you typed why, you got the possible answers:
why not?;
dont ask!;
its your karma.;
stupid question!
how should I know?
can you rephrase that?
it should be obvious.
the devil made me do it.
the computer did it.
the customer is always right.
in the beginning, God created the heavens and the earth...
dont you have something better to do?;
because you deserve it
or
Cleve / Jack / Bill / Joe / Pete


insisted on it
suggested it
told me to
wanted it
knew it was a good idea
wanted it that way
Write a program which randomly gives one of the above answers, with equal probability.
Homework 2: The goat quiz
At the end of a quiz show, the winner has to choose his prize from behind one of three doors.
Behind two of the doors there is a goat; if the winner chooses one of these doors, he gets nothing.
First the winner chooses one door. Then the show-master opens one of the remaining doors
which has a goat behind it.
The winner is now allowed to switch his choice to the third remaining door, or to stick to the
door he has chosen.
If his choice was successful, he gets the prize.
Write a program which allows you to find out whether it is better for the winner to switch
the door after the show-master shows the goat, or whether it is better to stick with the first
choice.
Think about what is better before you write the program, but don't manipulate the program
outcome to obtain your conjectured result.
Chapter 3

Numerical Analysis I

3.1 Data types: Integers

Integers are represented according to the number representation the computer uses internally.
For example, in the binary representation, integers are represented as combinations of
0 and 1; in the hexadecimal (Greek-Latin for 16) representation, integers are represented as
combinations of the digits 0 to F, see Tab. 3.1. If you need the conversion from decimal to binary,

decimal binary hexadecimal decimal binary hexadecimal


00 00000 00 10 01010 0A
01 00001 01 11 01011 0B
02 00010 02 12 01100 0C
03 00011 03 13 01101 0D
04 00100 04 14 01110 0E
05 00101 05 15 01111 0F
06 00110 06 16 10000 10
07 00111 07 17 10001 11
08 01000 08 18 10010 12
09 01001 09 19 10011 13
Table 3.1: Integers from 0 to 19 in decimal, binary and hexadecimal representation.

hexadecimal to decimal or whatever, you can always use the MATLAB functions dec2hex,
hex2dec, dec2bin and bin2dec. The 2 is pronounced as "to", like in "decimal to hexadecimal".
The same naming logic is applied in num2str, the conversion from numeric to string.
The difference between one integer and the next largest representable integer is always one,
and integers in different representations always represent the same mathematical integers.
Integers in FORTRAN are also sometimes declared as INTEGER*4, because 4 Byte = 4 x 8
Bit = 32 binary digits are used to represent these standard integers. As one bit is reserved for the
sign of the integer, the largest representable integer is something like 2^31 - 1, the smallest -2^31 + 1.
As extensions to standard FORTRAN, there exist in some compilers also the INTEGER*8
type (8-Byte integers, from -2^63 + 1 to 2^63 - 1) and the INTEGER*2 type (2-Byte integers).
INTEGER*8 is convenient when large integer values have to be expressed without the rounding
occurring in floating point computations, whereas INTEGER*2 is convenient if large arrays
of integers must be stored where the integers can only take very few values. The danger of
using the non-standard integer types is that if one changes the compiler (or the computer
one works on), these data types may not be available any more, and one has to rewrite the
whole program.
The C/C++ standards do not define the absolute accuracy of their data types, but provide
the types int and long int, where long int possibly has the larger number of digits (but may
have the same number as short int). Additionally, there are the unsigned data types, which
can represent a largest number twice as large as that of the corresponding signed data
type.

3.1.1 Fixed point numbers

Fixed-point numbers are created from integers by renormalizing the integer with a prefactor.
Fixed-point numbers are needed in environments where a constant absolute precision is
needed, for example in the banking sector, where the result of an operation always must
be rounded to a certain digit, e.g. 1/10000 $, and this accuracy must be maintained over the
whole data range, from the smallest transactions of a few dollars to billions of dollars.

3.2 Data types: Floating point numbers

In technical and scientific applications, the orders of magnitude used are much larger than
e.g. in banking or administration. Trillions of dollars (10^12) are a lot of money, but trillions
of molecules is something rather microscopic. Therefore, the preferred data type in scientific
computations is the floating point number, where the numbers are spaced out irregularly,
with more numbers in smaller intervals, so that the relative accuracy of operations is constant,
not the absolute accuracy as with integer numbers.
MATLAB performs all operations in floating point numbers (actually, in complex floating
point numbers). In contrast, many standard programming languages like C, C++, FOR-
TRAN, do not perform type conversion during arithmetic operation, but only at the time
of the assignment of the result. That means an integer-division of a number by a larger
number gives 0, and depending on the data-type, the results have different accuracies, as in
the following example in FORTRAN90:

program test_implicit
implicit none
write(*,*) 3/7 ! = 0 Integer-division
write(*,*) 3./7. ! = 0.428571 REAL*4-Division
write(*,*) 3.d0/7.d0 ! = 0.428571428571429 REAL*8 Division
stop
end
3.2.1 Error

For the following sections, it will be convenient to define the numerical error of an operation
as the difference between the outcome of an exact operation using real numbers and the
numerical operation using numbers as they are stored in the computer:

exact            A o B = C
numerical        A o B = C~
absolute error   eps_absolute = |C~ - C|
relative error   eps_relative = |C~ - C| / |C|

With respect to representing mathematical real numbers, e.g. multiples of 5./9., on the computer,
integers have a constant absolute error (on average the error is of the order of 1), whereas
floating point numbers have a constant relative error, as can be seen in the following table:

Floating point number                        Integer
Operation                     Error          Operation   Error
50./9.=5.555555555555556      < O(10^-14)    50/9=5      < O(1)
500./9.=55.55555555555556     < O(10^-13)    500/9=55    < O(1)
relative error constant                      absolute error constant

3.2.2 Usage
Floating point numbers are the only numbers on a computer with which fast numerical
computations are possible over a large range of values. Floating Point Operations per second,
FLOPS, are usually given as the benchmark for computers, and currently the fastest computer
in the world, the Earth Simulator near Yokohama, can do about 40 Tera-FLOPS. The precision
of the declared variables is usually expressed in the declaration statement: in the FORTRAN77
standard, REAL*4/REAL*8 (or DOUBLE PRECISION) expressed that 4/8 Byte were used
to represent the data.

3.2.3 Data-Layout
In floating point numbers, mantissa and exponent are stored in such a way that the number
is represented as a sum of powers of the base beta, with precision t and lower and upper bounds
L <= e <= U for the exponent e. A floating point number x can then be represented as
\[ x = \pm\left(\frac{d_1}{\beta} + \frac{d_2}{\beta^2} + \dots + \frac{d_t}{\beta^t}\right)\beta^e \]
with
\[ 0 \le d_i \le \beta - 1, \quad (i = 1, \dots, t). \]
The usual real numbers in a higher programming language like C or FORTRAN have the
following characteristics:

Kind     Byte/Bit   mantissa/exponent   Range                           valid digits
Real     4/32       23/8                8.43e-37 ... 3.37e+38           6-7
Double   8/64       52/11               4.19e-307 ... 1.67e+308         15-16
3.2.4 Example
The above representation does not give equidistant numbers, as can be seen if the distribution
of representable numbers is plotted for beta = 2, -1 <= e <= 2, t = 3:

(Figure: the representable numbers of this system marked on the number line from -4 to 4.)

As can be seen from the above graph, floating point numbers have as many numbers between 1
and 10 as between 10 and 100, whereas integers and fixed point numbers have as many numbers
in the interval from 0 to 1 as from 1 to 2. In other words, if numbers are rounded to fixed
point numbers, there is a constant absolute error over the whole range of numbers, whereas
for floating point numbers, there is a constant relative error over the whole range of available
numbers.
The built-in function in MATLAB to find out the largest relative spacing between
successive floating-point numbers is eps. This function depends on the implementation of
MATLAB as well as on the hardware and will give different results on different processors. If
you use a computer language other than MATLAB or FORTRAN90, where these functions
are built in, you can use the following algorithm:

% program eval_myeps
clear
format compact
% compute machine-epsilon
myeps=1.
myepsp1=myeps+1.
while (myepsp1>1)
myeps=0.5*myeps;
myepsp1=1+myeps;
end
myeps

Other builtin functions which are convenient to get ideas about the feasibility of some numer-
ical algorithm are realmax, the largest representable floating point number, and realmin,
the smallest floating point number which is larger than 0. All these functions eps, realmin,
realmax are implementation dependent, i.e. their result may be different on different com-
puter models, because the mathematical operations are wired in a different way on the
chip.
The actual numbers of valid digits of mantissa and exponent are usually not defined in
language standards, so the IEEE standard (IEEE = Institute of Electrical and Electronics
Engineers) uses for double precision

S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
0 1        11 12                                                63

with the sign bit S, the exponent bits E and the mantissa digits F, whereas CRAY used something like

S EEEEEEEEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
0 1 17 18 63

which, due to the lower accuracy and other idiosyncrasies in rounding, has now totally vanished.
For most numerical computations, double precision is sufficient, and the errors of
single precision computations will be too large. In addition to double precision, many manufacturers
offered REAL*16/quadruple precision, which usually will be considerably slower
than double precision.
Modern compilers and processors, like the Pentium4 and the G4/G5, allow faster computations
if the compiler options allow looser rounding for double precision, so that the results will be
considerably less accurate than double precision/16 digits, but still more accurate than single
precision/8 digits.
That 4/8 Byte are used does not mean that all compiler functions operating on these data
types are computed with the full accuracy, correct up to the last bit. The IEEE standard
defines that all results have to be given in such a way that only the last/least significant bit
is rounded. Because this can become quite costly, one can usually choose between compiler options
which offer higher accuracy but not so good performance, or faster but less accurate code.
Self-implemented routines may suffer from additional errors, which will be discussed
in the next sections.

3.3 Checking for Equality

Concerning what has been said in this section about accuracy, there are some things one can
do with integers which one should not do with floating point numbers. For a start, do not
check floating point numbers for equality

if (a==b)

but check for equality up to a certain error eps. Be sure whether you need the absolute error

n=10
epsilon=10^(-n)
if (abs(a-b)<epsilon)

or the relative error, which for two values a, b with a, b ~= 0 can be defined as

n=10
epsilon=10^(-n)
if (abs(a-b)<epsilon*max(abs(a),abs(b)))
The last example is much safer than writing it as

n=10
epsilon=10^(-n)
if (abs(a-b)/max(abs(a),abs(b))<epsilon) % don't do this!

which will crash the program in case a = b = 0. Moreover, a multiplication can be
executed faster than a division, so if the if-condition is inside an often-executed loop, the
division can slow down the execution of the loop considerably.

3.4 Impossible Numbers

Several mathematical operations are not well defined in mathematics, like dividing by 0, or
computing the real value of an asin of a number with an absolute value larger than 1. The
computer has to do something if the operation is mathematically undefined or meaningless.
Programs in compiler languages like C and FORTRAN usually crash, and leave it to the
programmer to find out where the error occurred.
If a variable is defined in FORTRAN as real, the result must also be real, so expressions like
sqrt(-1) or asin(1.5) crash the program. As the elementary datatype in MATLAB is the
complex array, such operations give in MATLAB the correct complex result, e.g.

> sqrt(-1)
ans = 0 + 1i
> asin(1.5)
ans = 1.57080 - 0.96242i

This may become a problem if the expected result is indeed real, but very near the undefined
value, e.g. if the result without rounding error should be 1, but due to rounding it is e.g.
1.000000000001, and the asin computed from it is

> asin(1.000000000001)
ans = 1.5708e+00 - 1.4143e-06i

so that the computation will be continued with a complex part. In such cases, the input
should always be checked with an if-statement whether it conforms to the expectations.
There is also an IEEE standard which defines such exceptions, e.g. what should be done if
a number is divided by 0. The result is stored in a bit pattern which is output as NaN,
Not a Number. MATLAB is a bit more sophisticated. For a start, it gives the correct
result for the division:

> 4/0
warning: division by zero
ans = Inf
> -2/0
warning: division by zero
ans = -Inf

for Inf, the usual rules apply, but some cases are different:
> Inf+3
ans = Inf
> Inf+Inf
ans = Inf
> Inf/Inf
ans = NaN
> Inf-Inf
ans = NaN

When tested for equality via the ==-operator, one idiosyncrasy is that Infinity is always
equal to Infinity in MATLAB, but NaN is always unequal to NaN:

> 4==4
ans = 1
> Inf==Inf
ans = 1
> NaN==NaN
ans = 0

and for tests for NaN, the isnan-Function must be used:

> isnan(4)
ans = 0
> isnan(NaN)
ans = 1

To test which numbers are the largest and smallest, the MATLAB functions realmin and
realmax can be used. Because Inf, -Inf and NaN must be represented as floating-point
bit patterns in MATLAB, there are about three to four bit patterns less available in MATLAB
than in compilers for e.g. FORTRAN or C which don't use Inf and NaN. Because the
bit patterns of the largest numbers are used, the largest representable floating-point number
is smaller than in the compilers.

3.5 Errors
As we have seen in the previous chapter, the representation of real numbers as floating point
approximation leads intrinsically to rounding errors. In the following, we will treat additional
sources of error which occur in the evaluation of algebraic equations.

3.5.1 Truncation error

Function evaluation
Many mathematical expressions are defined as an infinite process; for example the exponential
function is
\[ \exp(x) = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots \tag{3.1} \]
The error which results when e.g. the infinite series is instead computed with only a finite
number of operations, i.e. truncated after a finite step, is called the truncation error. In fact,
if in a given interval a function f(x) is given by an infinite polynomial series with series
coefficients a_1, a_2, a_3, ..., then if the series is truncated after n steps, an approximation
\[ \tilde f(x) = \sum_{i=0}^{n} \tilde a_i x^i + O(x^{n+1}) \tag{3.2} \]
can be found which has a smaller error than the truncated series using the coefficients a_i of the
infinite series. Such a series is called an n-th order approximation of f(x), which often
makes use of the expansion of the function in terms of orthogonal polynomials¹.
Whereas the exponential function exp(x) is defined by the infinite series with the coefficients
\[ \exp(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!}, \]
the best finite approximation to exp(x) in the interval 0 <= x <= ln 2 with 10 digits is
\[ \exp(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7 + \epsilon(x) \]
with |eps(x)| <= 2e-10, and the coefficients are given in Tab. 3.2. Be aware that the coefficients
for the truncated polynomial approximation depend on the interval for which the
approximation should be used, to minimize the error.

n an 1/n!
0 1.00000 00000 1.000000000000000000
1 -0.99999 99995 -1.000000000000000000
2 0.49999 99206 0.500000000000000000
3 -0.16666 53019 -0.166666666666666657
4 0.04165 73475 0.041666666666666664
5 -0.00830 13598 -0.008333333333333333
6 0.00132 98820 0.001388888888888889
7 -0.00014 131261 -0.000198412698412698

Table 3.2: Coefficients for the polynomial approximation of exp(x) in the interval
0 <= x <= ln 2 (middle column) and the corresponding coefficients of the infinite Taylor
series.

In practice, many transcendental functions f(x) which are introduced in elementary
mathematics classes are numerically better approximated by approximations other than
polynomial ones, e.g. by making explicit use of divisions, which can itself mimic
operations of infinite order in x, either via Pade approximation (quotient of two polynomial
expressions) or via continued fractions.

1 Chap. 22, Handbook of Mathematical Functions, M. Abramowitz, I. Stegun, National Bureau of Standards.

An effective strategy, especially with periodic functions, is argument reduction, so that one
does not have to compute the Taylor series for large x, but for a small x near the origin,
by either shifting the periodic functions like sin, cos into the interval [0, pi/4], or by
decomposing the function into a product of an integer-argument part and a non-integer-argument part,
like in the case of the exponential function, where one computes
\[ \exp(x) = \exp(m + f) = \exp(m)\,\exp(f), \quad m \text{ integer}, \; |f| < 1. \]
Many approximations of transcendental functions can be found in Abramowitz/Stegun².
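The argument reduction for the exponential function can be sketched as follows (a sketch of the idea, not the actual library implementation; the value x = 7.3 and the series length are arbitrary examples):

```matlab
% Argument reduction for exp(x): split x into integer part m and remainder f,
% then exp(x) = exp(m)*exp(f), where the series for exp(f) converges quickly
x = 7.3;               % arbitrary example value
m = round(x);          % integer part, here 7
f = x - m;             % remainder with |f| < 1, here 0.3
% short Taylor series for exp(f), accurate because |f| is small
myexpf = 1 + f + f^2/2 + f^3/6 + f^4/24 + f^5/120;
myexp  = exp(1)^m * myexpf;   % exp(m) = e^m
myexp - exp(x)         % small difference to the built-in result
```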
Other examples of truncation error can be found in other series expansion methods, e.g.
Fourier series truncated after a certain number of coefficients, or Pade approximations, where
an analytical function
\[ f(x) = \frac{\sum_{i=1}^{\infty} a_i x^i}{\sum_{i=1}^{\infty} b_i x^i} \]
is approximated by the truncated Pade approximation
\[ \tilde f(x) = \frac{\sum_{i=1}^{n} a_i x^i}{\sum_{i=1}^{m} b_i x^i}. \]

3.5.2 Rounding error

Because we have only a finite number of digits available, when we try e.g. in Octave to
compute 5/9, we get

> format long
> 5/9
ans = 0.555555555555556

So, first of all, it is not necessary to input 5./9. like in FORTRAN when one wants to use
floating point numbers. On the other hand, one sees that the periodic fraction which is the
result must be rounded to 16 decimal digits.
Therefore, when we compute the exponential function in the following program,
% Example for rounding error in computing transcendental functions
clear
format compact
format long
x=-20.5
n_iter=100
myexp=0.
for i=0:n_iter-1
% Compute the Taylor-series for the exp-function
% x!=gamma(x+1)
myexp=myexp+x^i/gamma(i+1)
end
exp(x)

return

2 Handbook of Mathematical Functions, M. Abramowitz, I. Stegun, National Bureau of Standards.
we obtain

myexp = -4.422614950123058e-07

as a result, instead of the correct

exp(x) = 1.250152866386743e-09.

As we see, the result is so wrong that not even its sign is correct: we get a negative value from a computation which should always give positive values. The problem is not the range of the numbers, because the smallest positive number representable in MATLAB precision is about 10⁻³⁰⁸, much smaller than the correct result of about 10⁻⁹. The problem is also not a truncation error, as we are still adding Taylor contributions, even though the result does not change any more after about the 95th iteration. The problem is that we try to add terms which are smaller than the last digit of the partial sum.
There are possibilities to circumvent such kinds of problems, which will be explained later in the lecture.
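One possible remedy (a sketch anticipating that discussion, not the script's own solution) is to sum the Taylor series for the positive argument |x|, where all terms are positive and no cancellation occurs, and then to use exp(−|x|) = 1/exp(|x|):

```matlab
% Rounding-error remedy for exp(x) with large negative x:
% sum the Taylor series for |x| (all terms positive, no cancellation)
% and take the reciprocal, since exp(-|x|) = 1/exp(|x|).
x=-20.5;
myexp=0;
for i=0:99
  % x!=gamma(x+1), as in the program above
  myexp=myexp+abs(x)^i/gamma(i+1);
end
1/myexp  % close to exp(-20.5)
exp(x)   % built-in value for comparison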

3.5.3 Catastrophic cancellation


Even for the last bit of a floating point function evaluation in double precision, which gives about 16 digits accuracy, the 17th digit is of course wrong. The subtraction of numbers of nearly equal size shifts these invalid digits to the front, so that the result consists mostly of invalid digits. For expressions like

a=cos(x)^2-sin(x)^2

this gives dubious results whenever the argument x is close to an odd multiple of π/4, with an arbitrary number of canceled digits. The problem can simply be circumvented by using the trigonometric identity cos(2x) = cos(x)² − sin(x)², so that

    a = cos(2x)                                   (3.3)

always gives the result with the accuracy of the library's evaluation of the cos function.
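The cancellation can be made visible by comparing both expressions close to π/4 (a small sketch):

```matlab
% Catastrophic cancellation near x = pi/4, where cos(x)^2 - sin(x)^2
% crosses zero: the subtraction form loses digits, cos(2x) does not.
x=pi/4+1e-9;
a1=cos(x)^2-sin(x)^2;  % subtraction of nearly equal numbers
a2=cos(2*x);           % identical value, computed without cancellation
[a1 a2 a1-a2]          % the difference exposes the canceled digits
```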

3.6 Good and bad directions in Numerics


The following integral is positive, because the integrand is positive in the whole integration interval [0,1]:

    E_n = ∫_0^1 x^n e^(x−1) dx,   n = 1, 2, ...

From partial integration we obtain a relation between E_n and E_{n−1}, which can be used to iteratively compute E_n if we have E_1 given:

    ∫_0^1 x^n e^(x−1) dx = [x^n e^(x−1)]_0^1 − n ∫_0^1 x^(n−1) e^(x−1) dx
                         = 1 − n ∫_0^1 x^(n−1) e^(x−1) dx
    E_n = 1 − n E_{n−1},   n = 2, 3, ...

In a REAL*8 implementation, we obtain with E_1 = exp(−1) the results in the following table. We can be sure that at least E_18, and therefore all the following values, are wrong, because the result should not become negative in the first place. Because the E_i should fall monotonically, in the iteration 1 − nE_{n−1} the term nE_{n−1} approaches 1, so the correct information in the iteration is quickly annihilated and only the erroneous last digits survive.

    E1    0.367879441171442
    E2    0.264241117657115
    E3    0.207276647028654
    E4    0.170893411885384
    E5    0.145532940573080
    E6    0.126802356561519
    E7    0.112383504069363
    E8    0.100931967445092
    E9    0.09161229299417073
    E10   0.08387707005829270
    E11   0.07735222935878028
    E12   0.07177324769463667
    E13   0.06694777996972334
    E14   0.06273108042387321
    E15   0.05903379364190187
    E16   0.05545930172957014
    E17   0.05719187059730757
    E18  -0.02945367075153627
    E19   1.559619744279189
    E20  -30.19239488558378

If we use instead the reordered equation

    E_n = 1 − n E_{n−1},   n = 2, 3, ...,                  (3.4)
    so that E_{n−1} = (1 − E_n)/n,   n = ..., 3, 2,        (3.5)

we can approximate the starting value:

    E_n = ∫_0^1 x^n e^(x−1) dx                             (3.6)
        ≤ ∫_0^1 x^n dx                                     (3.7)
        = [x^(n+1)/(n+1)]_0^1                              (3.8)
        = 1/(n+1),                                         (3.9)

which shows that for very large n, E_n is very small; we take E_21 ≈ 0 as an educated guess, and even the deliberately crude starting value E_20 = 0.5 will serve. The output of this downward iteration E_20 → E_1 is listed below, together with the output of the upward iteration E_1 → E_20.
    n     downward iteration (E20 = 0.5)   upward iteration (E1 = exp(-1))
    E1    0.3678794411714423               0.3678794411714423
    E2    0.2642411176571154               0.2642411176571153
    E3    0.2072766470286539               0.207276647028654
    E4    0.1708934118853843               0.170893411885384
    E5    0.1455329405730786               0.1455329405730801
    E6    0.1268023565615286               0.1268023565615195
    E7    0.1123835040692999               0.1123835040693635
    E8    0.1009319674456008               0.1009319674450921
    E9    0.09161229298959281              0.09161229299417073
    E10   0.083877070104072                0.0838770700582927
    E11   0.07735222885520793              0.07735222935878028
    E12   0.0717732537375049               0.07177324769463667
    E13   0.06694770141243632              0.06694777996972334
    E14   0.06273218022589153              0.06273108042387321
    E15   0.05901729661162711              0.05903379364190187
    E16   0.05572325421396629              0.05545930172957014
    E17   0.0527046783625731               0.05719187059730757
    E18   0.05131578947368421             -0.02945367075153627
    E19   0.025                            1.559619744279189
    E20   0.5                             -30.19239488558378

As can be seen, the second iteration, with the wrong starting value, converges to the right end value exp(−1), whereas the first iteration, with the right starting value, converges to a wrong result. This illustrates the art of numerical computing: to obtain a correct end result with a good routine despite a wrong starting value, instead of obtaining a wrong end result with a correct starting value but a bad routine.
It will become obvious later in this course that integration is usually the good direction in numerical computing, which can decrease initial errors, whereas differentiation is the bad direction, which can increase initial errors. This is in contrast to manual calculation, where differentiation is easier to treat than integration.
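The two iterations above can be reproduced with a few lines of MATLAB/Octave (a sketch; the variable names are mine, not the script's):

```matlab
% Unstable upward iteration E_n = 1 - n*E_{n-1}, started from E_1 = exp(-1)
Eup=zeros(1,20);
Eup(1)=exp(-1);
for n=2:20
  Eup(n)=1-n*Eup(n-1);
end
% Stable downward iteration E_{n-1} = (1 - E_n)/n, started from the
% deliberately crude guess E_20 = 0.5
Edown=zeros(1,20);
Edown(20)=0.5;
for n=20:-1:2
  Edown(n-1)=(1-Edown(n))/n;
end
[Eup(18) Edown(1)]  % Eup(18) is already negative, Edown(1) agrees with exp(-1)
```

Note how the error of the crude starting value is divided by 20, 19, 18, ... in the downward direction, so that it is completely damped out after a few steps.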

3.7 Calculus and order of Methods


3.7.1 Taylor Approximation revisited

Many functions can be approximated by a polynomial series

    f(x) = Σ_{i=0}^{∞} a_i x^i

in such a way that around the point x_0, with the νth derivative f^(ν)(x_0), we have

    f(x) = Σ_{ν=0}^{∞} f^(ν)(x_0)/ν! · (x − x_0)^ν.

For the functions exp(t), sin(t), cos(t), the Taylor series is given below:

    exp(t) = Σ_{n=0}^{∞} t^n/n! = 1 + t/1! + t²/2! + t³/3! + ...
    sin(t) = Σ_{n=0}^{∞} (−1)^n t^(2n+1)/(2n+1)! = t − t³/3! + t⁵/5! − ...
    cos(t) = Σ_{n=0}^{∞} (−1)^n t^(2n)/(2n)! = 1 − t²/2! + t⁴/4! − t⁶/6! + ...

[Figure: Taylor approximations of sin(x) (1st, 3rd, 5th and 7th order), cos(x) (0th, 2nd, 4th and 6th order) and exp(x) (0th to 3rd order) on the interval [−5, 5].]
If we truncate the (for transcendental functions infinite) series after a finite number of terms, we obtain the Taylor approximation. The evaluation of a Taylor approximation, e.g. of fourth order with the coefficients a, b, c, d, e,

    f(x) = a + bx + cx² + dx³ + ex⁴

can be done in an efficient and in an inefficient way. Using the above formula directly, we can write

f(x)=a+b*x+c*x*x+d*x*x*x+e*x*x*x*x

so that we need four additions and ten multiplications. If we place brackets around the expression in a skilled way (Horner's scheme), four additions and four multiplications are sufficient:
f(x)=a+(b+(c+(d+e*x)*x)*x)*x
It is easy to write down the derivative of the above polynomial as
f(x)=b+(2*c+(3*d+4*e*x)*x)*x

In MATLAB, the evaluation of polynomials is implemented with the function polyval, the derivative with polyder, but the order of the coefficients is the opposite from the above example; the graph can be seen below:

clear, format compact
P=[1 0 -1]
x=linspace(-3,3,100);
y=polyval(P,x);
P_deriv=polyder(P);
y_deriv=polyval(P_deriv,x);
plot(x,y,'-',x,y_deriv,'--')
legend('f(x)=x^2-1','d/dx f(x)=2x')
grid
axis image

[Figure: f(x) = x² − 1 and its derivative d/dx f(x) = 2x on [−3, 3].]

Typical functions which cannot be approximated by Taylor series are functions with a jump, like the sign function sign(x) = x/|x|.
Because the Taylor series is an infinite series, one needs comparatively many terms to obtain a good approximation. If convergence is sought only on a finite interval, the Chebyshev approximation, which minimizes the error over a finite interval, is usually a much better approximation for the same number of terms.

3.7.2 Integration I
In the same way that many transcendental functions can be represented by an infinite Taylor series but approximated by a finite polynomial series in x, integrals and derivatives can be approximated by replacing the infinitely small differential dx by the finite difference Δx, and the error can be expressed as a power of Δx, as in the approximation of transcendental functions by finite power series. The simplest method to numerically evaluate an integral

    I = ∫_a^b f(x) dx                                                   (3.10)

consists of the simple evaluation of the corresponding Riemann sum as

    I⁽¹⁾ = Δx Σ_i f(x_i),   x_i ∈ {a, a + Δx, ..., b − 2Δx, b − Δx},    (3.11)

where b − a is a multiple of Δx, and the integration points are spaced equidistantly.³ I⁽¹⁾ means that the method is of first order in Δx; the error is of the order of Δx².
Numerical integration is sometimes called quadrature, maybe from the time when the integral was approximated numerically by drawing squares under the graph, and this box counting was the first non-analytical quadrature. As an example, let us compute the integral

    ∫_a^b exp(−x²) dx = (√π/2) (erf(b) − erf(a)),

which is a bit unintuitive because it needs the error function erf to be represented analytically. With the integration bounds [0, 1], the integral is, with about 15 digits accuracy,

    ∫_0^1 exp(−x²) dx = 0.7468241328124270.
Now let us approximate this integral with the rectangle midpoint rule, where we replace the integral by a Riemann sum over the n − 1 intervals of equal width h defined by n corner points, with the function evaluated at the middle of each interval instead of at its left or right end:

    ∫_a^b f(x) dx ≈ h [f(x̄_1) + f(x̄_2) + ... + f(x̄_{n−1})],   x̄_i the interval midpoints.

clear
format long
n=101 % n odd !
dx=1/(n-1) % stepsize
xrect=[dx/2:dx:1-dx/2];
yrect=exp(-xrect.*xrect);
sum(yrect)*dx

[Figure: Integration with the rectangle midpoint rule.]

For 100 intervals / 101 corner points, the result 0.74682719849232 of the rectangle midpoint rule is correct up to 5 digits, which is amazing, as we only used 100 points. In our Monte Carlo evaluation of π, we needed on the order of 10000 points when we wanted an accuracy of only two digits.
Very often, numerical integration methods are introduced not by using the rectangle midpoint rule, but the trapeze rule, which is slightly more complicated than the midpoint rule, as each integration interval must be approximated as a trapeze:

    ∫_a^b f(x) dx ≈ (h/2) [f(x_0) + 2f(x_1) + 2f(x_2) + ... + f(x_n)]

³ There are methods which don't choose the points equidistantly, but optimize the choice of points so that the most accurate approximation is obtained with the minimum number of points.

Instead of evaluating the left and right bound of each interval, we count the function values between the upper and lower integration bounds once, and the function values of the upper and lower bound only half. This corresponds to the summation over the trapeze areas:

clear
format long
n=101 % n odd !
dx=1/(n-1) % stepsize
xtrap=[0:dx:1];
ytrap=exp(-xtrap.*xtrap);
(sum(ytrap)-.5*(ytrap(1)+ytrap(n)))*dx

[Figure: Integration with the trapeze rule.]

Surprisingly, the result of 0.74681800146797 is one digit less accurate than the result with the midpoint rule, though the program was more complicated, because we had to think about a proper way to implement the trapeze shape for each interval. If we think about a graph with mostly negative curvature, the trapeze rule will end up with an approximation which is constantly below the true function value. The rectangles of the midpoint rule, in contrast, lie partly above, partly below the graph, so that there is error compensation already within one interval.
In the rectangle midpoint rule, we have chosen the quadrature points in the middle of the interval. If we had chosen the values for the function evaluation at the left or right boundary of each interval, we would have obtained 0.74997860426211 or 0.74365739867383 respectively, considerably less accurate than the rectangle midpoint rule.
It can be shown⁴ that the midpoint rule has an accuracy of

    (1/24) h³ Σ_i f″(x_i),                                 (3.12)

better than the trapeze rule, which has an accuracy of

    −(1/12) h³ Σ_i f″(x_i),                                (3.13)

so it is surprising that textbooks usually introduce numerical quadrature via the trapeze rule. Because both formulae are correct up to the second power of h, and the error is of the third power, they are called formulae of second order.

[Figure: Integral over a convex curve: the trapeze rule underestimates, the rectangle midpoint rule overestimates.]
More accurate, namely of third order, is the composite Simpson rule S(f), which makes use of a combination of the rectangle midpoint rule R(f) and the trapeze rule T(f). When we compare the integral for the midpoint rule and for the trapeze rule, we see that in our integration

⁴ G.E. Forsythe, M. Malcolm, C. Moler, Computer Methods for Mathematical Computations, Prentice Hall 1977

interval with a convex function, the trapeze rule always gives a too small result, and the midpoint rule always a too large one. Therefore, if we average R(f) and T(f), we will get a better result than from R(f) or T(f) alone. Because the error of T(f) (Eqn. 3.13) is twice as large as the error of R(f) (Eqn. 3.12) and of opposite sign, we should not take the direct average (1/2)R(f) + (1/2)T(f), but the weighted average for which the errors of both rules cancel:

    S(f) = (2/3) R(f) + (1/3) T(f).

Its error can be shown⁵ to be of the order of

    (1/2880) h⁴ Σ_i f⁗(x_i).

For our example with the integral from 0 to 1 over exp(−x²), we obtain 0.746824132817537 as the result, instead of the exact 0.7468241328124270...
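The weighted average S(f) = (2/3)R(f) + (1/3)T(f) can be formed directly from the two programs above (a sketch using the same n and dx):

```matlab
% Composite Simpson rule as weighted average of the rectangle midpoint
% rule R and the trapeze rule T for the integral of exp(-x^2) over [0,1]
n=101; dx=1/(n-1);
xrect=[dx/2:dx:1-dx/2];
R=sum(exp(-xrect.*xrect))*dx;               % midpoint rule
xtrap=[0:dx:1];
ytrap=exp(-xtrap.*xtrap);
T=(sum(ytrap)-.5*(ytrap(1)+ytrap(n)))*dx;   % trapeze rule
S=2/3*R+1/3*T                               % close to 0.746824132817537
```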
In our second-order formulae, we tried to approximate the graph with straight lines and integrated the area below the curve. A parabola is determined by 3 points, and therefore one can also try to approximate the graph via a parabola instead of a straight line, to obtain a Simpson rule directly by supplying three integration points for each interval.
It is therefore necessary to have an odd number of integration points, and the direct derivation of the Simpson rule can be done for an integration interval of length 2h by inserting the Taylor expansion of the function f(x) with the νth derivatives f^(ν) around the point x_0,

    f(x) = Σ_{ν=0}^{∞} f^(ν)(x_0)/ν! · (x − x_0)^ν,

instead of the function f(x) itself, which yields (with the derivatives at h replaced by finite differences):

    ∫_0^{2h} f(x) dx
      = ∫_0^{2h} [ f(h) + f′(h)(x−h) + (1/2)f″(h)(x−h)² + ... ] dx
      ≈ ∫_0^{2h} [ f(h) + ((f(2h) − f(0))/(2h)) (x−h)
                 + ((f(0) − 2f(h) + f(2h))/(2h²)) (x−h)² ] dx
      = (h/3) (f(0) + 4f(h) + f(2h))

[Figure: Simpson rule: the function f(x) is approximated by a parabola g(x) through three points.]
Using this formula for a single double-interval of length 2h, we can compose the formula for the whole integration over [a, b] with the integration points from x_0 = a to x_{2n} = b:

    ∫_a^b f(x) dx ≈ (h/3) [f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + ... + f(x_{2n})]
The MATLAB program is

⁵ G.E. Forsythe, M. Malcolm, C. Moler, Computer Methods for Mathematical Computations, Prentice Hall 1977



clear
format long,format compact
n=101 % n odd !
dx=1/(n-1) % Stepsize
xsimp=[0:dx:1];
ysimp=exp(-xsimp.*xsimp);
(4*sum(ysimp(2:2:n))+2*sum(ysimp(3:2:n-1))+ysimp(1)+ysimp(n))*dx/3

which gives the result 0.74682413289418, slightly worse than the composite Simpson rule above.
In the following table, we compare the results of the different orders and introduce the big-O notation:

    Method                     ∫_0^1 exp(−x²) dx         Order of correctness
    Exact                      0.7468241328124270...     O(h^∞)
    Rectangle, left endpoint   0.74997860426211          O(h)
    Rectangle, right endpoint  0.74365739867383          O(h)
    Rectangle, midpoint        0.74682719849232          O(h²)
    Trapeze rule               0.74681800146797          O(h²)
    Simpson                    0.74682413289418          O(h³)
    Composite Simpson          0.746824132817537         O(h³)
Several conclusions can be drawn from viewing the above table, which hold also for other
numerical methods which have an intrinsic truncation error:

1. If a formula is of nth order and a discretization of 1/100 of the interval length is used, the error is about 1/100 = 1 % for a first order implementation, about 1/10,000 for a second order method, and about 1/1,000,000 for a third order method. (Of course, the prefactors of the error terms also have to be taken into consideration.)

2. Therefore, it is not always necessary to increase the number of discretization steps to obtain a more accurate result. The change from first order to second order in the rectangle rule resulted from just shifting the integration points by h/2.

3. If the theoretical accuracy cannot be reached, it is necessary to consider whether
   a) the function under consideration does not fulfill the necessary criteria (smoothness etc.), or
   b) there is an error in the program, resulting from incorrect prefactors, intervals with incorrect bounds etc. If a formula of second order gives results with an error proportional to 1/(number of points), then the interval bounds are usually determined wrongly.

4. Be aware that it is also not possible to integrate functions numerically if their integral has no solution due to divergence etc.

In this section, we have discussed the error resulting from the integration over a whole interval. This is also called a global error, in contrast to the local error, which occurs in the approximation of a single interval. Numerical methods suffering from truncation error differ in whether the global error is the same as the local error, whether the global error is larger than the local error (many solvers for differential equations which do not conserve energy), or whether the global error is smaller than the local error (error compensation, as in the case of the Simpson integration).
One generally should be very careful in using a method with low order accuracy and a small time step. First of all, for many problems, such a method can become quite time consuming. Furthermore, the more function evaluations occur, the more rounding errors are accumulated. The diagram below shows the cost-performance diagram, the number of integration points plotted with respect to the relative accuracy. Cost-performance diagrams vary depending on the evaluated functions. As can be seen, beyond 1000 integration points, the accuracy of the Simpson method has already reached the limit of 16 digits of the double precision accuracy, and therefore the accuracy of the integral evaluation cannot be increased further by increasing the number of integration points.

[Figure: Relative accuracy for integrating exp(−x²) between 0 and 1 as a function of the number of integration points, for the rectangle rule with left/right boundary, the trapeze rule, the rectangle midpoint rule and the Simpson rule.]
The integration formulas obtained here as numerical approximations of the Riemann sum belong to the family of Newton-Cotes formulae. The midpoint rule is called an open Newton-Cotes formula, because the endpoints of the integration interval are not evaluated; the formulae for which the endpoints must be evaluated are called closed Newton-Cotes formulae. The following table shows the quadrature formula for a single interval of length h for Newton-Cotes formulae of different order, with the corresponding error term:

    Name          Integral formula                                                     Error term
    Trapeze rule  ∫_{x_1}^{x_2} f(x)dx = h[(1/2)f_1 + (1/2)f_2]                        +O(h³ f″)
    Simpson       ∫_{x_1}^{x_3} f(x)dx = h[(1/3)f_1 + (4/3)f_2 + (1/3)f_3]             +O(h⁵ f⁽⁴⁾)
    Simpson 3/8   ∫_{x_1}^{x_4} f(x)dx = h[(3/8)f_1 + (9/8)f_2 + (9/8)f_3 + (3/8)f_4]  +O(h⁵ f⁽⁴⁾)
    Bode          ∫_{x_1}^{x_5} f(x)dx = h[(14/45)f_1 + (64/45)f_2 + (24/45)f_3
                                          + (64/45)f_4 + (14/45)f_5]                   +O(h⁷ f⁽⁶⁾)
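As a plausibility check of the table (a sketch, not part of the original script), the Simpson 3/8 weights can be applied in a single stretch to four points spanning our test interval [0, 1], i.e. with h = 1/3:

```matlab
% Simpson 3/8 rule applied once over [0,1] with four points, h = 1/3:
% integral ~ h*( 3/8*f1 + 9/8*f2 + 9/8*f3 + 3/8*f4 )
h=1/3;
x=[0 h 2*h 3*h];
f=exp(-x.^2);
I38=h*(3/8*f(1)+9/8*f(2)+9/8*f(3)+3/8*f(4))
% compare with the exact value 0.7468241328124270
```

With only four function evaluations, the result is already accurate to several digits; composite application over subintervals improves it further.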

Carrying the error compensation in formulas with truncation error further to higher orders, by combining low order methods as in the case of the composite Simpson rule so that a higher-order method results, is called Romberg integration. If the limit of infinitely high orders is taken, this is called Richardson extrapolation, and these ideas can also be applied to differentiation and to the numerical solution of differential equations.
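A minimal sketch of the idea: combining the trapeze results for step sizes h and h/2 as (4·T(h/2) − T(h))/3 cancels the leading h² error term, which is the first step of a Romberg tableau and reproduces the Simpson result:

```matlab
% First Romberg step: Richardson extrapolation of the trapeze rule
% for the integral of exp(-x^2) over [0,1]
f=@(x) exp(-x.^2);
trap=@(n) (sum(f(linspace(0,1,n)))-.5*(f(0)+f(1)))*(1/(n-1));
Th =trap(51);        % trapeze rule with step h
Th2=trap(101);       % trapeze rule with step h/2
(4*Th2-Th)/3         % higher-order result, close to 0.7468241328124270
```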

3.7.3 Differentiation I
In the same way one can derive the Newton-Cotes formulae for integrals from their Taylor expansion in the previous section, one can derive formulae for the derivatives using the Taylor expansion⁶. Such approximations are often called finite difference formulas, as they approximate the differential with a finite difference. For data which take the values f_{i−2}, f_{i−1}, f_i, f_{i+1}, f_{i+2} at equidistant points, we get the following finite difference schemes for first order derivatives:

    Name                 Finite difference scheme                           Leading error
    Forward difference   (f_{i+1} − f_i)/Δx                                 Δx f″(x)/2
    Backward difference  (f_i − f_{i−1})/Δx                                 Δx f″(x)/2
    3-point symmetric    (f_{i+1} − f_{i−1})/(2Δx)                          Δx² f‴(x)/6
    3-point asymmetric   (−1.5f_i + 2f_{i+1} − 0.5f_{i+2})/Δx               Δx² f‴(x)/3
    5-point symmetric    (f_{i−2} − 8f_{i−1} + 8f_{i+1} − f_{i+2})/(12Δx)   Δx⁴ f⁽⁵⁾(x)/30

Note that the coefficients in front of the f_{i−2}, ..., f_{i+2} have to add up to 0. For second order derivatives, similar schemes are written down in the following table, and again the coefficients add up to 0:

    Name                 Finite difference scheme                                      Leading error
    3-point symmetric    (f_{i−1} − 2f_i + f_{i+1})/Δx²                                Δx² f⁗(x)/12
    3-point asymmetric   (f_i − 2f_{i+1} + f_{i+2})/Δx²                                Δx f‴(x)
    5-point symmetric    (−f_{i−2} + 16f_{i−1} − 30f_i + 16f_{i+1} − f_{i+2})/(12Δx²)  Δx⁴ f⁽⁶⁾(x)/90
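The orders in the tables can be checked numerically by shrinking Δx and watching how fast the error decreases (a small sketch for the first derivative of sin at x = 1, where the exact value is cos(1)):

```matlab
% Error of forward difference, O(dx), vs. 3-point symmetric, O(dx^2),
% for d/dx sin(x) = cos(x) at x = 1
x=1;
for dx=[1e-1 1e-2 1e-3]
  e_fwd=(sin(x+dx)-sin(x))/dx-cos(x);        % error ~ dx*f''/2
  e_sym=(sin(x+dx)-sin(x-dx))/(2*dx)-cos(x); % error ~ dx^2*f'''/6
  [dx e_fwd e_sym]
end
```

Halving Δx should roughly halve the forward-difference error, but quarter the symmetric-difference error.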
In contrast to numerical integration, which smoothes out errors via error compensation, numerical differentiation roughens up the solution. If high accuracy is desired, there are usually better solutions than computing the derivatives directly via finite difference schemes.
The graph below shows the numerical integral

    ∫_0^x sin(y) dy = 1 − cos(x)

and the numerical differential of sin(x),

    (d/dx) sin(x) = cos(x),

with some additional noise of 1 % in sin(x). The graph was produced with the following program:

clear
format compact
nstep=200
x=linspace(0,4*pi,nstep);
dx=mean(diff(x));
idx=1/dx;
y=sin(x-dx/2)+0.01*(rand(size(x))-.5);
subplot(3,1,1)
⁶ Clive A.J. Fletcher, Computational Techniques for Fluid Dynamics, Vol. 1, 2nd ed., Springer 1990

[Figure: top: sin(x) with noise, its numerical derivative, cos(x) and 1 − ∫sin(x); middle: absolute errors; bottom: relative errors.]

plot(x,y,'-.',x(1:nstep-1),diff(y)*idx,...
     x(1:nstep-1),cos(x(1:nstep-1)),':',...
     x(1:nstep-1),1-cumsum(y(1:nstep-1))*dx,'--')
axis tight
legend('sin(x)+-0.005*rand','d/dx sin(x)','cos(x)','1-int(sin(x))')
subplot(3,1,2)
plot(x(1:nstep-1),diff(y)*idx-cos(x(1:nstep-1)),...
     x(1:nstep-1),1-cumsum(y(1:nstep-1))*dx-cos(x(1:nstep-1)),':')
legend('cos(x)-d/dx sin(x)','cos(x)-1+int(sin(x))')
title('absolute error')
axis tight

subplot(3,1,3)
plot(x(1:nstep-1),(diff(y)*idx-cos(x(1:nstep-1)))./cos(x(1:nstep-1)),...
     x(1:nstep-1),(1-cumsum(y(1:nstep-1))*dx-cos(x(1:nstep-1)))...
     ./cos(x(1:nstep-1)),':')
legend('(cos(x)-d/dx sin(x))/cos(x)','(cos(x)-1+int(sin(x)))/cos(x)')
axis tight
title('relative error')

return

Both the differential and the integral should give cos(x), but the differential is so noisy that the result deviates visibly from the exact solution. The integral over the noisy data gives nevertheless a smooth curve. This is again a case of a good and a bad direction of numerical computing, as we encountered before when rewriting the iterative computation of the equation

    E_n = 1 − n E_{n−1}   (numerically unstable)

into

    E_{n−1} = (1 − E_n)/n   (numerically stable).

As can be seen for differentiation and its inverse operation, integration: in numerical analysis, differentiation constitutes the bad direction, integration the good direction. In numerical analysis, integrals, also of higher order, can usually be computed with sufficient precision, in contrast to derivatives; in analytical calculations, on the other hand, it is usually always possible to compute differentials, but very often the computation of closed forms for integrals is problematic.
Exercises:
1. Write a program which produces the floating point numbers for base 2 with mantissa length 4, as well as for base 4 with mantissa length 2.
a) Choose the exponents so that both number systems are roughly comparable.
b) Plot the positions of the numbers.
c) Compare both number systems: which number system can be supposed to have the better roundoff properties?
2. Write a program which computes the exponential function exp(x) using the Taylor series, and one program which computes the exponential function by evaluating the integer part of x using powers of the Euler number e and the non-integer part using the Taylor series. For which size of the arguments become the
Chapter 4

Graphics

4.0.4 Initializing and manipulating vectors


Instead of using for loops for setting up vectors and matrices, it is convenient in MATLAB to
use the implicit loops provided by the colon operator : and brackets for the array constructor
[]

>> a=[3:6]
a =
3 4 5 6

For step-sizes different from one the stepsize can be specified as


[lower_bound:stepsize:upper_bound] like in

>> a=[3:.5:6]
a =
3.0000 3.5000 4.0000 4.5000 5.0000 5.5000 6.0000

This is different from loops in FORTRAN and C, where the stepsize is added as the third ar-
gument for a loop statement. Whereas the colon operator notation using : constructs a vector
with a given lower and upper bound for a given stepsize, [lower_bound:stepsize:upper_bound],
if instead of the stepsize the number of points is known, it is more convenient to use the
linspace-function

>> a=linspace(3,6,7)
a =
3.0000 3.5000 4.0000 4.5000 5.0000 5.5000 6.0000

There is also a function which gives vectors in logarithmic spacing; note that logspace takes the exponents of the lower and upper bound, not the bounds themselves:

>> b=logspace(1,1000,3)
b =
   10   Inf   Inf
>> b=logspace(1,4,4)
b =
   10   100   1000   10000

If several vectors should be concatenated, this can be done with the brackets for the array-
constructor []
>> c=[1 3]
c =
1 3
>> c=[4 c b]
c =
Columns 1 through 6
4 1 3 10 100 1000
Column 7
10000
After a lot of vector operations, one usually also needs functions which give information about the vectors used. The most elementary function, which displays information about variables, is
>> who

Your variables are:

a ans b c
The length of a vector is displayed by
>> length(a)
ans =
7
but this function makes no difference between column- and row vectors. For information on
higher dimensions, one has to use the function
>> size(a)
ans =
1 7
Vector elements can be accessed via for loops, as in other programming languages, like in
>> for i=1:length(b)
f(i)=2*b(i)
end
f =
20
f =
20 200
f =
20 200 2000
f =
20 200 2000 20000

or via the colon-notation with : and round brackets so that for a vector

>> c=.2:.2:1.2
c =
0.2000 0.4000 0.6000 0.8000 1.0000 1.2000

the assignment of the second to the fourth element to a vector g can be written as

>> g=c(2:4)
g =
0.4000 0.6000 0.8000

The whole of a vector can be assigned without specifying the bounds like in

>> h=c(:)
h =
0.2000
0.4000
0.6000
0.8000
1.0000
1.2000

If the vector from a lower bound up to the end should be assigned, this can be done via the end statement in round brackets, together with the colon operator :

>> v=c(4:end)
v =
0.8000 1.0000 1.2000

Functions which operate on vectors are usually defined in the canonical way, that means in the way in which one expects the function to work. The functions prod and sum acting on a vector behave in the way one expects, e.g. they give as a result the product and the sum of the vector elements. Whereas prod and sum act on vectors and give a scalar as a result, the functions cumsum and cumprod, which compute the cumulative sum and the cumulative product, give a vector as a result

>> cumsum(1:5)
ans =
     1     3     6    10    15

One must be careful with the use of the multiplicative operators *, / and ^, which are in MATLAB in general interpreted in the sense of numerical linear algebra, so that row and column dimensions must match. If one wants to use these operators elementwise, one should use their elementwise variants, which are preceded by a ., namely .*, ./ and .^.
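The difference between * and .* can be seen in a small example session (a sketch):

```matlab
>> a=[1 2 3]; b=[4 5 6];
>> a.*b    % elementwise product
ans =
     4    10    18
>> a*b'    % matrix product: row vector times column vector gives a scalar
ans =
    32
>> a*b     % inner dimensions do not match: MATLAB issues an error
```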

4.1 Setting up and manipulating Matrices


Matrices can be manipulated in the same way as vectors, preferably with the colon operator : and the brackets for the array-constructor []. Some elementary builtin MATLAB matrix functions will be explained here because they make matrix construction easier. The ones function sets up a matrix with ones as every element; as usual in MATLAB, a single argument sets up a square two-dimensional matrix

>> ones(2)
ans =
1 1
1 1

For non-square matrices, two arguments have to be specified, where the first is the row index and the second is the column index, for example

>> ones(3,2)
ans =
1 1
1 1
1 1

The zeros function behaves in the same way as the ones function, only that it sets up matrices with 0 as every element

>> zeros(2,3)
ans =
0 0 0
0 0 0

In linear algebra, the identity matrix is very important, and therefore the unit matrix in MATLAB is named eye (eye-dentity / identity)

>> eye(3)
ans =
1 0 0
0 1 0
0 0 1

It may be surprising, but the identity-matrix is also defined for non-square matrices, as the
following example shows

>> eye(2,5)
ans =
1 0 0 0 0
0 1 0 0 0

Another important matrix function is the constructor for the random matrix

>> rand(2,4)
ans =
0.8214 0.6154 0.9218 0.1763
0.4447 0.7919 0.7382 0.4057

Matrices can then be constructed via matrix functions alone like in

>> c=ones(2)-eye(2)
c =
0 1
1 0

or with the help of the matrix constructor brackets [] so that

>> b=zeros(2)
b =
0 0
0 0
>> d=[2 3
4 5]
d =
2 3
4 5
>> e=[c b
d c]
e =
0 1 0 0
1 0 0 0
2 3 0 1
4 5 1 0

A very convenient function, similar to linspace in one dimension, which can be used to set up arguments for functions in higher dimensions, is the meshgrid function, whose functionality is as follows:

>> [X,Y] = meshgrid(1:3,10:14)


X =
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
Y =
10 10 10
11 11 11
12 12 12
13 13 13
14 14 14

4.2 Graphs and Visualization


4.2.1 Visualizing Vectors
The elementary command in MATLAB for plotting functions etc. is the plot command.
Plots can be shown in the plotting window either alone or as one of many sub-plots, like in
the following example

>> x=[.1:.1:.5]
x =
0.1000 0.2000 0.3000 0.4000 0.5000
>> y=[20:-4:1]
y =
20 16 12 8 4

>> subplot(2,2, 1)
>> plot(y)

>> subplot(2,2,2)
>> plot(x,y)

which displays on the screen (note the different scale on the x-axis):
20 20

15 15

10 10

5 5

0 0
1 2 3 4 5 0.1 0.2 0.3 0.4 0.5
Plots of vectors can be done either by plotting the vector directly or by specifying two vectors; the first will be taken as the x-axis. If the vector lengths do not match, MATLAB issues an error message and stops the program execution. The plots are automatically done in the sub-plot which has been called last.
A subtle case is the plot of a vector of complex numbers. If you have a complex vector c, you can get the real part x and the imaginary part y via

x=real(c)
y=imag(c)

The command plot(c) has then the same effect as plot(x,y) which means that the imagi-
nary part is plotted versus the real part.
If a new plotting window should be opened, this can be done via the figure command. The first window is created by the figure(1) command, which is automatically executed if no plotting window is open; figure(2) opens a second plotting window, and so on. Plots are done in the window for which the figure command was called last.
There is a wide variety of ways to influence graph annotation in MATLAB:

% example for graph annotation
subplot(2,2,1)
x=[.1:.1:.5]
plot(x,x.*x,x,x.*x.*log(x))
xlabel('X-axis')
ylabel('Y-axis')
title('Plot annotations')
text(.14,.2,'any label here')
legend('x^2','x^2*log(x)')

[Figure: annotated plot of x² and x²·log(x).]
The legend created by the legend command can be moved with the mouse. MATLAB graphics can be saved
in various formats (Postscript, encapsulated Postscript, JPEG, .....) via the print command.
The line-style (full lines, dotted lines, symbols) can be changed via the arguments in the plot
command
plot(x,log(x),...
     x,x,':',...
     x,x.^2,'+',...
     x,x.^3,'*-')

[Figure: the four curves drawn with the different line styles.]

To look at a drawing in higher resolution, use the zoom command: aim with the mouse
pointer at the region which should be zoomed and click the left mouse button (the right
mouse button unzooms the region again).

4.3 Visualizing Arrays


As an example in this section, we will use the Rosser matrix:
>> rosser
ans =
611 196 -192 407 -8 -52 -49 29
196 899 113 -192 -71 -43 -8 -44
-192 113 899 196 61 49 8 52
407 -192 196 611 8 44 59 -23
-8 -71 61 8 411 -599 208 208
-52 -43 49 44 -599 411 208 208
-49 -8 8 59 208 208 99 -911
29 -44 52 -23 208 208 -911 99
If a matrix is displayed with the plot command, each column is plotted as a vector, as in
the example for plot(rosser) below.

Arrays can be plotted with the mesh command, which displays the data in a
wire-frame type of graph, as below.
The view command can be used to set a different viewing angle for three-dimensional plots. It
is also possible to change the viewpoint interactively via the rotate3d command, by pointing
with the mouse on the frame of the 3D graph and pulling it.
[Figure: left, plot(rosser): each column of the matrix drawn as a curve; right, mesh(rosser): wire-frame plot of the array.]

4.4 Analyzing systems via plotting


In the following, a linear function, an exponential function, a hyperbola, a logarithm and an inverse
square root are plotted via the following program, in linear, x- and y-logarithmic as well as
double logarithmic scale:

clear
format compact

x=[linspace(0.1,10,100)];

subplot(2,2,1)
plot(x,x,'-',x,exp(x),'--',x,1./x,':',x,log(x),'-.',x,1./sqrt(x),'-+')
axis([0 10 -3 20])
title('linear plot')
legend('x','exp(x)','1/x','log(x)','1/sqrt(x)')

subplot(2,2,2)
semilogx(x,x,'-',x,exp(x),'--',x,1./x,':',x,log(x),'-.',x,1./sqrt(x),'-+')
axis([0.1 10 -3 20])
title('semilogarithmic in x-direction')
legend('x','exp(x)','1/x','log(x)','1/sqrt(x)',2)

subplot(2,2,3)
semilogy(x,x,'-',x,exp(x),'--',x,1./x,':',x,log(x),'-.',x,1./sqrt(x),'-+')
axis([0.1 10 .01 20000])
title('semilogarithmic in y-direction')
legend('x','exp(x)','1/x','log(x)','1/sqrt(x)',2)

subplot(2,2,4)
loglog(x,x,'-',x,exp(x),'--',x,1./x,':',x,log(x),'-.',x,1./sqrt(x),'-+')
axis([0.1 10 .01 20000])
title('logarithmic')
legend('x','exp(x)','1/x','log(x)','1/sqrt(x)',2)

[Figure: the same five curves in the four scalings: linear plot, semilogarithmic in x-direction, semilogarithmic in y-direction, and double logarithmic.]

Many systems in science and mathematics can be better understood by just plotting typical
properties in different scales. Logarithmic, linear, exponential and power laws can be found
in nature, and are easily identifiable by plotting the data in different scales.

4.4.1 Linear Plots


Typical linear plots result from linear response functions; the simplest is probably Hooke's law,
which is sketched in the drawing to the right. It is a linear law, and in the dynamical situation,
when the spring is pulled with a force at a certain frequency, the elongation changes with the
same frequency.
Such a linear response is not a matter of course. There are nonlinear systems which respond
with e.g. frequency-doubling to an external stimulation, as in the case where a high-intensity
red laser beam going into a crystal target comes out as a blue laser beam (blue light with
twice the frequency of the red light).

[Figure: red laser beam entering a crystal target, blue beam leaving.]

4.4.2 Logarithmic Plots


If the y-axis of a plot is chosen logarithmically, exponential curves appear as straight lines,
so that logarithmic plots allow one to identify exponential behavior. Typical examples of expo-
nential behavior are time evolution plots. Radioactive decay and the increase of the GDP
in economies are examples of such a time evolution. Below, the increase of the Dow Jones
Industrial Stock Index is shown. The curves in the linear plot are bent, but more or less straight
in the logarithmic plot. It is a matter of ongoing debate whether this reflects rather the ex-
ponential increase in the strength of the US economy or just exponential inflationary
effects.
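A minimal sketch of this diagnostic (the decay constant is chosen arbitrarily for illustration): an exponential decay appears curved in a linear plot but as a straight line under semilogy:

```matlab
t = linspace(0,10,100);
N = 1000*exp(-0.5*t);                                  % exponential decay, e.g. radioactive decay
subplot(1,2,1), plot(t,N),     title('linear scale')
subplot(1,2,2), semilogy(t,N), title('logarithmic y-axis')  % appears as a straight line
```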

If instead of the y-axis the x-axis is chosen in logarithmic scale, logarithmic curves become
straight lines. Logarithmic curves grow more slowly than linear curves. Typical examples of
logarithmic behavior are animal senses. Light and sound are perceived on a logarithmic
scale, i.e. a sound is not perceived as twice as loud when the pressure of the sound wave is
twice as high; the perceived loudness grows only logarithmically with the pressure.

4.4.3 Double logarithmic Plots


Double logarithmic plots allow one to identify power laws, functions of the form $x^r$, where $r$
does not necessarily have to be an integer. Power laws are usually found in nature when systems
suffer from finite-size effects.

The Gutenberg-Richter law, which states that the probability of earthquakes of magnitude $x$ is
proportional to $1/x^r$, is an example where a system (the tectonic plates) creates
earthquakes which are up to a maximum size, the size of the plate itself.
The curves of a power law can be written as a superposition of exponential curves, like in
the following program

clear
format compact
a=linspace(.0,100,1000)

la=length(a)

x=linspace(0.1,1000,100);

y=zeros(size(x));
for i=1:la
y=y+exp(-.1*a(i).*x);
end
loglog(x,y)

This means that a power law is found in a system if there are many different scales which
contribute to an exponential phenomenon, and on each size scale (variable a in the above
example) there is a different prefactor in the exponential law.
In the same way as the curve of a power law can be written as a superposition of exponential
curves, a Lorentzian can be written as a superposition of Gaussian curves.
Another example of power laws is the 1/f-noise in many technical applications. It is very
often found in systems where a seemingly continuous process is the result of discrete processes,
and the deviation from the mean causes the noisy fluctuations.
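The exponent r of a power law can be read off as the slope of the straight line in the double logarithmic plot. A sketch (the pure power law and its prefactor are chosen arbitrarily) using polyfit on the logarithms:

```matlab
x = logspace(0,3,50);           % logarithmically spaced sample points
y = 7*x.^(-1.5);                % power law with exponent r = -1.5
p = polyfit(log(x),log(y),1);   % fit a straight line to the log-log data
p(1)                            % the slope, approximately -1.5
```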

4.5 Specialized Plots and specialized styles


4.5.1 Graph Properties
The axes of a graph can be easily modified with the axis-command.

axis image

defines the same length unit for the x- and y-axis.

axis([xmin xmax ymin ymax])

defines the minimal and maximal coordinates for the x- and y-axis respectively. Because MATLAB usually
chooses the axes so that they end at multiples of 1, 10, 100, it is sometimes
necessary to set

axis tight

so that the axes terminate at the extremal values of the plot. Apart from the axis, a grid can
be added by using grid. Sometimes it is necessary to modify picture properties like the
axis labels etc. from the default values chosen by MATLAB. For the program

clear
format compact
x=linspace(0,2*pi,10);
y=sin(x);
h=plot(x,y)
g=axes
get(h)
get(g)

the plot is defined as a variable, and these variables can be displayed with the get command.
The entries for h and g can then be directly modified using the set-command, by specifying
the object name (h, g), the property to modify (e.g. Color) and the new value, e.g.

set(g,'Color',[.5 .5 .5])

Another possible usage, if one already knows the property name, is e.g.

set(gca,'XTickLabel',{'One';'Two';'Three';'Four'})

which labels the first four tick marks on the x-axis and then reuses the labels until all ticks
are labeled. Numerical labels can be set in the same way, e.g.

set(gca,'XTickLabel',{'1';'10';'100'})

4.5.2 Including Images


The command

image

puts a default image (in GIF format) on the graphics screen. In general, graphics of nearly
any format can be read and displayed using

name=imread('name.gif')
image(name)

The data in the variable name can then be manipulated like a usual MATLAB array.

4.6 MATLAB-output into a file


Very often, one wants to save some output of MATLAB into a file to include it in other
documents. For short output, it is simplest under a window system to copy the desired lines
with the mouse into an editor. If the output becomes too long, one can use the command

diary on

and MATLAB will then write the output not only to the screen, but also into the file diary. If one
wants to redirect the output into a file with a special name, one can use the command

diary('special_filename')

To end the output in the diary, use the command

diary off

If you want to include the output in a LaTeX document and preserve the computer-output
look, you can use the
\begin{verbatim}
\end{verbatim}
environment; all the program examples in this script are produced in such a way.

4.7 Graphics to include into documents


If you want to save the graphics on the MATLAB graphics screen as a file that can be
included in a text (e.g. in LaTeX or WORD), you have to use the print command, with
the syntax

print -dFORMAT FILENAME



The following table introduces some graphics-formats


print -dps2 name.ps     Black-and-white Postscript output; graphics which can be printed
                        directly on a (Postscript) printer. All graphics are on a single
                        page in the paper format.
print -dpsc2 name.ps    Like above, but in color.
print -deps2 name.eps   Black-and-white encapsulated Postscript output; can be included
                        in word-processing programs like LaTeX.
print -depsc2 name.eps  Like the above, but in color.
print -djpeg name.jpg   Output in JPEG format, which can be included in word processors
                        like Word or in Internet pages.
If the orientation should be changed, e.g. to portrait, one can use the MATLAB command
orient portrait

4.8 Including Encapsulated Postscript in LaTeX


If you want to include JPEG graphics in GUI-based word processors, you can use the corresponding
menus and/or the mouse.
LaTeX, probably the most widely used word-processing program in the sciences, is not GUI-based.
A (not so) short introduction to the various commands, in various languages, can
be found at ftp://ctan.tug.org/tex-archive/info/lshort/. LaTeX is rather a text-
programming language, which converts a file name.tex (the program) into a file name.dvi
(the device-independent file), which can then be converted into general formats like Postscript
via

dvips -o name.ps name

which produces a Postscript file. This Postscript file can then be transformed into
PDF format (Adobe Acrobat), e.g. by the command ps2pdf. If you want to prepare
a LaTeX document with the corresponding graphics, you have to load
a package with the software for including graphics. A widely used package is the epsfig
package: whereas for a conventional LaTeX report the header looks like

\documentclass[twoside,12pt]{report}

the header must contain

\documentclass[twoside,12pt]{report}
\usepackage{epsfig}

if Postscript graphics should be included. Graphics can then be included with

\epsfig{file=filename,width=??,height=??,angle=??}

where either width or height must be given; the angle can also be left away. Here are some examples:
\epsfig{file=graphiken/circle_square.eps,width=2cm}

\epsfig{file=graphiken/circle_square.eps,height=2cm}

\epsfig{file=graphiken/circle_square.eps,height=2cm,angle=-90}

\epsfig{file=graphiken/circle_square.eps,height=2cm,width=4cm}

[Figure: the same circle-in-a-square graphic (labeled r=1), included with the different size and rotation options.]

In principle, all Postscript files should be includable in LaTeX, but some programs produce
Postscript output which is not compatible with LaTeX. Under UNIX, one can use the com-
mand

ps2epsi name.ps name.epsi

to convert a file name.ps into a file name.epsi, which corresponds to the encapsulated Postscript
interchange format.

4.9 Processing Graphics


If you have graphics in a format other than Postscript, like *.jpeg or *.gif files, which you
want to include in LaTeX, you have to convert them into Postscript with some other software.
One of the most widely used programs for this task under UNIX is xv, which allows one to load
graphics in one format and save them in another; for example

xv name.jpg

will load the picture name.jpg. Pressing the right mouse button will make a menu
appear, and the graphics can be saved as Postscript by choosing the appropriate menu entry (SAVE
FORMAT POSTSCRIPT).
Chapter 5

Linear Algebra

Usually, one learns about linear algebra in the first year of study, but often one needs it much
later, when one has forgotten most of it already. MATLAB means MATrix LABoratory, and
its first version was written by Cleve Moler so that his students could learn linear algebra
more easily.
General documentation of MATLAB can also be found at http://www.mathworks.com/
access/helpdesk/help/techdoc/matlab.shtml

5.1 Matrix Manipulation


5.1.1 Matrix commands
The diagonal of a matrix can be extracted with the diag command in the following way:

A =
0.520109 0.340012 0.470293
0.510104 0.326988 0.636776
0.010375 0.782090 0.900370

> diag(A)
ans =
0.52011
0.32699
0.90037

If the input of the diag-command is a vector, diag constructs a matrix with the vector on
the diagonal, a typical example how commands are overloaded in MATLAB:

> b=[3 5 7]
b =
3 5 7

> A=diag(b)
A =

3 0 0
0 5 0
0 0 7

The operator for the matrix transpose is the accent ':

> A=rand(2)
A =
0.66166 0.48661
0.69184 0.39113

> B=A'
B =
0.66166 0.69184
0.48661 0.39113

Because MATLAB knows the difference between column- and row-vectors, the transpose-
operator can also be used to transform column- into row-vectors and vice versa:

> v=[1 2 3 4 5]
v =
1 2 3 4 5

> u=v'
u =
1
2
3
4
5

For complex-valued matrices, the '-operator gives the Hermitian conjugate matrix:

> H=rand(3)+sqrt(-1)*rand(3)
H =
0.59574 + 0.89043i 0.91601 + 0.87663i 0.19920 + 0.74066i
0.71691 + 0.73996i 0.31324 + 0.44034i 0.19254 + 0.85119i
0.38660 + 0.13756i 0.33661 + 0.71527i 0.29184 + 0.58186i

> G=H'
G =
0.59574 - 0.89043i 0.71691 - 0.73996i 0.38660 - 0.13756i
0.91601 - 0.87663i 0.31324 - 0.44034i 0.33661 - 0.71527i
0.19920 - 0.74066i 0.19254 - 0.85119i 0.29184 - 0.58186i

The commands which extract the upper/lower triangular part of a matrix are triu/tril:

> A
A =
0.951650 0.084814 0.208357
0.109170 0.585341 0.562931
0.667123 0.528991 0.860920

> tril(A)
ans =
0.95165 0.00000 0.00000
0.10917 0.58534 0.00000
0.66712 0.52899 0.86092

> triu(A)
ans =
0.95165 0.08481 0.20836
0.00000 0.58534 0.56293
0.00000 0.00000 0.86092
If the columns or rows should be flipped, i.e. if their order should be inverted, this can be
done with the commands flipud and fliplr (flip up-down and flip left-right), here shown for a 2x2 matrix:
> fliplr(A)
ans =
0.73180 0.40541
0.55208 0.79014

> flipud(A)
ans =
0.79014 0.55208
0.40541 0.73180
These two commands can be combined to reverse the order of both the rows and the columns of
a complex matrix, an operation which is not the Hermitian conjugate:
> A=rand(2)+sqrt(-1)*rand(2)
A =
0.839504 + 0.572899i 0.466803 + 0.675260i
0.086815 + 0.252680i 0.132638 + 0.086518i
> B=fliplr(flipud(A))
B =
0.132638 + 0.086518i 0.086815 + 0.252680i
0.466803 + 0.675260i 0.839504 + 0.572899i

5.2 Matrix Products


For matrices and vectors, there are a lot of ways products can be computed. There is no
difference between vectors and matrices, a vector is just a matrix with only one row or column.
The simplest form is the elementwise product, which uses the operator .*:

> u=[1 2 3 4]
u =
1 2 3 4

> v=[5 6 7 8]
v =
5 6 7 8

> u.*v
ans =
5 12 21 32

The inner product for a row-vector u and a column-vector w is computed with the operator
*:

> u=[1 2 3 4]
u =
1 2 3 4

> w=[1 1 2 2]
w =
1
1
2
2

> u*w
ans = 17

If instead of u*w we compute w*u, the result is the outer product

> w*u
ans =
1 2 3 4
1 2 3 4
2 4 6 8
2 4 6 8

Matrices can be treated in the same way as vectors with elementwise multiplication .* or
multiplication in the sense of linear algebra:

> A=[1 2
> 3 4]
A =
1 2
3 4

> B=[1 -1

> -2 2]
B =
1 -1
-2 2

> A*B
ans =
-3 3
-5 5

> A.*B
ans =
1 -2
-6 8

A matrix-vector product is performed like this:

> A=[1 2
> 3 4]
A =
1 2
3 4

> v=[1
> 2]
v =
1
2

> A*v
ans =
5
11
MATLAB also has the Kronecker product as a built-in function:

> u=[1 2 3 4]
u =
1 2 3 4

> v=[5 6 7 8]
v =
5 6 7 8

> kron(u,v)
ans =
5 6 7 8 10 12 14 16 15 18 21 24 20 24 28 32

> kron(u,v')
ans =
5 10 15 20
6 12 18 24
7 14 21 28
8 16 24 32

Whereas the elementwise matrix product computed with .* is commutative, the
matrix product computed with * is of course not commutative.

5.3 Repetition of elementary linear Algebra


The angle $\alpha$ between two vectors $v$ and $w$ of finite length can be computed via their
inner/scalar product as
$$\cos \alpha = \frac{|v \cdot w|}{\sqrt{v \cdot v}\,\sqrt{w \cdot w}}.$$
For the inner/scalar product, we have the Cauchy-Schwarz inequality
$$|v \cdot w| \le \sqrt{v \cdot v}\,\sqrt{w \cdot w}.$$

Vectors for which the scalar product is 0 are called orthogonal. Whereas orthogonality of two
vectors $v$ and $w$ can be defined in theoretical mathematics as the property that their scalar
product is zero, $v \cdot w = 0$, in numerical mathematics it is necessary to define orthogonality
in a way that takes possible rounding errors into account, as the following example
shows:

> w=[sqrt(3) sqrt(3)]


w =
1.73205080756888 1.73205080756888

> v=[sqrt(3) -sqrt(3)]


v =
1.73205080756888 -1.73205080756888

> v*w'
ans = -9.64636952420157e-17

Obviously the last result should be exactly zero, but due to the rounding errors in the
computation, there is a finite error. How the definition of orthogonality can be applied in
such a way that rounding errors are taken into account can be seen in the next section about
the rank of matrices.
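One common way to take such rounding errors into account (a sketch, not the script's definition, which follows in the next section) is to compare the scalar product against a tolerance scaled by the vector lengths and the machine precision eps:

```matlab
w = [sqrt(3)  sqrt(3)];
v = [sqrt(3) -sqrt(3)];
tol = norm(v)*norm(w)*eps;  % tolerance scaled to the size of the problem
abs(v*w') < tol             % returns 1 (true): numerically orthogonal
```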

5.3.1 Size and Rank of Matrices


The size of a matrix A can be computed with the size-command:

> size(A)
ans =
2 2

size returns a vector with two entries; the number of rows is size(A,1), the number
of columns size(A,2). The length of a column vector v can be computed with size(v,1),
that of a row vector with size(v,2).
In theoretical linear algebra, the rank of a matrix is the number of linearly independent
rows/columns. Because the definition of linear independence is closely related to the definition
of orthogonality, we will use the rank computation as the criterion for orthogonality. The
rank of a matrix can be computed with MATLAB's rank command. How the rank command
works will be explained later in the section about the singular value decomposition, along
with how one should choose the optional threshold of MATLAB's rank command. First let us
review some theorems about the rank of matrices [1].

5.3.2 Some Theorems on the rank of matrices


The outer product of two vectors always gives a matrix of rank 1.
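A minimal numerical check of this statement:

```matlab
u = rand(3,1);   % column vector
v = rand(1,4);   % row vector
A = u*v;         % 3x4 outer product: every row is a multiple of v
rank(A)          % ans = 1
```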
Random matrices nearly always have full rank, i.e. the rank of a matrix con-
structed with rand is the same as the number of columns/rows; if the number of
rows/columns is larger than the number of columns/rows, we have rank(A)=min(size(A)):

> A=rand(3,4)
A =
0.23382 0.43570 0.42862 0.97961
0.79868 0.34546 0.69142 0.74305
0.66927 0.71192 0.15419 0.11667

> rank(A)
ans = 3

Square matrices which have a rank smaller than their number of columns/rows are
called singular. They cannot be inverted, and systems of linear equations whose
coefficient matrix is singular cannot be solved uniquely. Their determinant vanishes.

The Rank of a matrix does not change through transposition, complex or hermitian
conjugation.

The product with a non-singular matrix has the same rank as the original matrix:

[1] Roger A. Horn, Charles R. Johnson, Matrix Analysis, Cambridge University Press 1991

> A=rand(3,4)
A =
0.476924 0.068071 0.420827 0.883968
0.061165 0.885041 0.027155 0.417966
0.139677 0.708093 0.489577 0.978820

> B=rand(3)
B =
0.908288 0.703948 0.363589
0.245781 0.950685 0.097344
0.942011 0.726192 0.064962

> C=B*A
C =
0.52703 0.94231 0.57935 1.45301
0.18896 0.92705 0.17690 0.70990
0.50276 0.75283 0.44795 1.19982

> rank(A)
ans = 3
> rank(B)
ans = 3
> rank(C)
ans = 3

For rank-deficient matrices, the rank of the product matrix is the same as that of the
factor with the lowest rank:

> A=rand(2,4)
A =
0.200421 0.795092 0.896583 0.454798
0.838726 0.220597 0.018236 0.018493

> A(3,:)=A(2,:)
A =
0.200421 0.795092 0.896583 0.454798
0.838726 0.220597 0.018236 0.018493
0.838726 0.220597 0.018236 0.018493

> B=rand(3)
B =
0.94359 0.31700 0.20635
0.50896 0.36833 0.40063
0.48172 0.19705 0.42594

> C=B*A
C =
0.62806 0.86569 0.85555 0.43882
0.74695 0.57430 0.47035 0.24569
0.61907 0.52044 0.44326 0.23061

> rank(A)
ans = 2
> rank(B)
ans = 3
> rank(C)
ans = 2

5.3.3 Rank Inequalities
For $A \in M_{m,n}$ we have $\mathrm{rank}\,A \le \min(m,n)$.

When a column or a row of a matrix is deleted, the rank of the resulting matrix
cannot be larger than the rank of the original matrix.

For $A \in M_{m,k}$, $B \in M_{k,n}$ we have
$$\mathrm{rank}\,A + \mathrm{rank}\,B - k \le \mathrm{rank}\,AB \le \min(\mathrm{rank}\,A, \mathrm{rank}\,B).$$

For $A, B \in M_{m,k}$: $\mathrm{rank}(A + B) \le \mathrm{rank}\,A + \mathrm{rank}\,B$.

For $A \in M_{m,k}$, $B \in M_{k,p}$, $C \in M_{p,n}$ we have
$$\mathrm{rank}\,AB + \mathrm{rank}\,BC \le \mathrm{rank}\,B + \mathrm{rank}\,ABC.$$
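These inequalities are easy to check numerically. A sketch for the last one (the Frobenius rank inequality) with random matrices of arbitrarily chosen sizes:

```matlab
A = rand(5,3); B = rand(3,4); C = rand(4,6);
lhs = rank(A*B) + rank(B*C);
rhs = rank(B) + rank(A*B*C);
lhs <= rhs    % returns 1 (true); for full-rank random factors both sides equal 6
```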

5.3.4 Norms of a Matrix


Every matrix norm can also be used as a vector norm, but not vice versa. Therefore, we
give here only the definitions for matrix norms. Analogous to real and complex scalars, one
wants to use something like an absolute value also for matrices. Something which behaves
like an absolute value under addition is the norm of a matrix.

Properties of Matrix Norms
$\|A\| \ge 0$ (non-negativity)

$\|A\| = 0$ if and only if $A = 0$

$\|cA\| = |c| \, \|A\|$ for all real and complex $c$ (homogeneity)

$\|A + B\| \le \|A\| + \|B\|$ (triangle inequality)

$\|AB\| \le \|A\| \, \|B\|$ (sub-multiplicativity)



Definitions for Norms

In MATLAB, all of the following norms exist, and they can be computed via the function
norm(x) with, if necessary, further arguments.
Name              Definition                                                      MATLAB function
1. Spectral norm  $\|A\|_2$: square root of the maximal eigenvalue of $A^H A$     norm(A), norm(A,2)
2. 1-norm         $\|A\|_1$: maximal absolute column sum of $A$                   norm(A,1)
3. $\infty$-norm  $\|A\|_\infty$: maximal absolute row sum of $A$                 norm(A,inf)
4. Frobenius norm $\|A\|_{Fro} = \sqrt{\sum_{i,j} |a_{i,j}|^2}$                   norm(A,'fro')
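The interpretations in the table can be verified directly, e.g. for the 1-norm as the maximal absolute column sum:

```matlab
A = rand(4);
norm(A,1)            % 1-norm as computed by MATLAB
max(sum(abs(A),1))   % maximal absolute column sum: the same value
```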

5.3.5 Determinant of a Matrix


The norm only fulfills sub-multiplicativity, i.e. the norm of a matrix product is equal to or
smaller than the product of the norms of the factors. An absolute value which fulfills
multiplicativity is the determinant, which can be computed in MATLAB via det(A):
$$\det A \, \det B = \det(A B)$$
Further properties of the determinant are:
The exchange of two adjacent columns/rows inverts the sign of the determinant:

> A=rand(3)
A =
0.209224 0.413728 0.212479
0.106481 0.192283 0.074438
0.291095 0.436435 0.508115

> det(A)
ans = -0.0017939

> B=[A(:,2) A(:,1) A(:,3)]


B =
0.413728 0.209224 0.212479
0.192283 0.106481 0.074438
0.436435 0.291095 0.508115

> det(B)
ans = 0.0017939

The determinant of the identity matrix is one, independent of its dimension.


Never use Cramer's rule or the Laplace expansion for the computation of a determi-
nant; it is wasteful and numerically unstable.
The numerically most suitable computation method for determinants is the so-called
LU-decomposition, where the matrix A is decomposed into a product of a lower trian-
gular matrix L with 1s on the diagonal and an upper triangular matrix U, so that
$$L \, U = A.$$

The determinant of A is then, up to the sign introduced by pivoting, the product of the
diagonal entries of U. Row and column permutations, so-called pivoting, increase the numerical
accuracy of the decomposition; for details, see [Gol89]. The MATLAB command which computes the
matrix determinant via LU-decomposition is det.
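The connection between det and the LU-decomposition can be made explicit (a sketch; note that the permutation matrix P from pivoting contributes the sign mentioned above):

```matlab
A = rand(4);
[L,U,P] = lu(A);        % factorization with pivoting: P*A = L*U
det(P)*prod(diag(U))    % determinant from the diagonal of U, det(P) = +/-1
det(A)                  % agrees up to rounding errors
```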

5.3.6 Matrix inverses


Nonsingular square matrices are inverted by the inv command. The elementwise division of
one matrix by another in MATLAB is written as A./B, where all entries of the divisor matrix
must be $\ne 0$. This is totally different from the MATRIX division
A/B, which corresponds to the multiplication of the matrix A with the inverse of the matrix B:

> A=rand(2)
A =
0.29975 0.85007
0.88812 0.33290

> B=rand(2)
B =
0.89979 0.72370
0.53648 0.97567

> A/B
ans =
-0.33410 1.11909
1.40492 -0.70090

> C=inv(B)
C =
1.9926 -1.4780
-1.0956 1.8376

> A*C
ans =
-0.33410 1.11909
1.40492 -0.70090

Because the product C*A does not necessarily give the same result as the product A*C, there is
also the left division B\A = inv(B)*A, which with the above matrices gives

> C*A
ans =

-0.71537 1.20181
1.30362 -0.31963

> B\A

ans =

-0.71537 1.20181
1.30362 -0.31963

If one tries to invert a singular matrix, MATLAB gives a result (usually wrong) and issues
a warning:

> A=[1 1
> 1 1]
A =
1 1
1 1

> inv(A)
warning: inverse: matrix singular to machine precision, rcond = 0
ans =
1 1
1 0

> B=inv(A)
warning: inverse: matrix singular to machine precision, rcond = 0
B =
1 1
1 0

> B*A
ans =
2 2
1 1

5.4 How many matrix products are possible


A matrix product is computed using three indices $i, j, k$:
$$a_{ij} = \sum_k b_{ik} c_{kj}$$
Therefore, there are 6 possible orders to program the loops, but basically, there are only
two possibilities:

clear
format long
n=20
b=randn(n).*10.^(16*randn(n));
c=randn(n).*10.^(16*randn(n));

tic
% Version 1: Dot-Product
a1=zeros(n);
for j=1:n
for i=1:n
for k=1:n
a1(i,j)=a1(i,j)+b(i,k)*c(k,j);
end
end
end
toc

tic
% Equivalent to
a2=zeros(n);
for j=1:n
for i=1:n
a2(i,j)=b(i,:)*c(:,j);
end
end
toc

tic
% Version 2: Daxpy-Product
a3=zeros(n);
for j=1:n
for k=1:n
for i=1:n
a3(i,j)=a3(i,j)+b(i,k)*c(k,j);
end
end
end
toc

tic
% equivalent to
a4=zeros(n);
for j=1:n
for k=1:n
a4(:,j)=a4(:,j)+c(k,j)*b(:,k);
end
end
toc

return

We have also included the tic and toc commands to profile the time used for a matrix
multiplication. It can be seen that MATLAB performs much faster if the inner loop is
evaluated using the :-notation.
The first version of the matrix-matrix multiplication has an inner vector product as its kernel,
the innermost part of the routine. The second version of the matrix multiplication has a kernel
which can be written as
$$\vec{y} = a \vec{x} + \vec{y},$$
an operation whose left side in words is A times X Plus Y, for which often the acronym
SAXPY or DAXPY (S for single, D for double precision) is in use.
It turns out that both operations are numerically equivalent, and both need $2 l^3$ floating point
operations (multiplications and additions) for $l \times l$ matrices.
It is common to give the speed of computers by how many Floating Point Operations Per
Second (Flops) they can perform. Modern PCs are in the range of a few hundred MFlops,
workstations are nowadays in the GFlops range, and the Earth Simulator, a supercomputer
near Yokohama, can do about 40 TeraFLOPS.
Using programs to test the speed of computers is called benchmarking.

5.5 Matrix Inverses again


5.5.1 How to solve a linear system by hand
The standard way to solve a linear system of k equations with k unknowns,

Ax = b,

with the unknowns in the vector x and the right-hand-side b



$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,k} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{k,1} & a_{k,2} & \cdots & a_{k,k} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{pmatrix}$$
is to rewrite the system in augmented form
$$\left(\begin{array}{cccc|c} a_{1,1} & a_{1,2} & \cdots & a_{1,k} & b_1 \\ a_{2,1} & a_{2,2} & \cdots & a_{2,k} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{k,1} & a_{k,2} & \cdots & a_{k,k} & b_k \end{array}\right)$$
and transform the matrix and the right-hand-side vector b via elementary row operations
(subtracting multiples of some rows from other rows) to upper triangular form,
where all the elements below the diagonal are 0:
$$\left(\begin{array}{cccc|c} a_{1,1} & a_{1,2} & \cdots & a_{1,k} & b_1 \\ 0 & a_{2,2} & \cdots & a_{2,k} & b_2 \\ \vdots & & \ddots & \vdots & \vdots \\ 0 & \cdots & 0 & a_{k,k} & b_k \end{array}\right)$$
5.5. MATRIX INVERSES AGAIN 81

The solutions for $x_1, x_2, \ldots, x_k$ can then be computed via back-substitution as
$$x_k = b_k / a_{k,k}$$
$$x_{k-1} = \left( b_{k-1} - a_{k-1,k}\, x_k \right) / a_{k-1,k-1}$$
$$x_{k-2} = \left( b_{k-2} - a_{k-2,k-1}\, x_{k-1} - a_{k-2,k}\, x_k \right) / a_{k-2,k-2}$$
$$\vdots$$
$$x_i = \frac{1}{a_{i,i}} \left( b_i - \sum_{j=i+1}^{k} a_{i,j}\, x_j \right)$$
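The back-substitution formula can be programmed directly. A minimal sketch (assuming the k x k matrix A is already in upper triangular form and b is the transformed right-hand side):

```matlab
% backward substitution for an upper triangular system A*x = b
k = length(b);
x = zeros(k,1);
for i = k:-1:1
  % A(i,i+1:k)*x(i+1:k) is the sum over the already computed unknowns
  % (for i = k this is an empty product, which MATLAB evaluates as 0)
  x(i) = (b(i) - A(i,i+1:k)*x(i+1:k)) / A(i,i);
end
```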

This scheme of eliminating elements so that a triangular coefficient matrix survives, for which
the unknowns can be computed in a trivial way, is called Gaussian elimination. As an example,
$$\begin{pmatrix} 9 & 3 & 4 \\ 4 & 3 & 4 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 7 \\ 8 \\ 3 \end{pmatrix}$$
in augmented form
$$\left(\begin{array}{ccc|c} 9 & 3 & 4 & 7 \\ 4 & 3 & 4 & 8 \\ 1 & 1 & 1 & 3 \end{array}\right).$$
We start by interchanging the first and the last row:
$$\left(\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 4 & 3 & 4 & 8 \\ 9 & 3 & 4 & 7 \end{array}\right).$$
Next, we subtract 4 times the first row from the second row and nine times the first row from
the last row:
$$\left(\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 0 & -1 & 0 & -4 \\ 0 & -6 & -5 & -20 \end{array}\right).$$
Finally, we add -6 times the second row to the last row, and obtain the triangular system
$$\left(\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 0 & -1 & 0 & -4 \\ 0 & 0 & -5 & 4 \end{array}\right),$$
from which we can compute the unknowns successively as $x_3 = -4/5$, $x_2 = 4$ and $x_1 = -1/5$.
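The hand computation can be checked in MATLAB with the backslash operator (introduced below):

```matlab
A = [9 3 4; 4 3 4; 1 1 1];
b = [7; 8; 3];
x = A\b    % x = [-0.2; 4; -0.8], i.e. -1/5, 4 and -4/5
```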

5.5.2 The numerical variants: LU-decomposition


For numerical purposes, the two steps, reduction to a triangular system (elimination step)
and backward substitution (solution step), are often split up into two routines. A common
collection of subroutines for numerical linear algebra is the LINPACK
package, which includes matrix inversion and orthogonalization methods for real and com-
plex matrices. MATLAB's routines for linear algebra are basically routines from LINPACK,
and Cleve Moler, the inventor of MATLAB, was also a co-author of LINPACK.

Numerically, Gaussian elimination is usually implemented as an LU-decomposition, a
factorization of the matrix A of the system
$$Ax = b$$
into an upper triangular matrix U and a lower triangular matrix L with
$$LU = A,$$
so that the solution can again be computed in a trivial way. In MATLAB, the LU-factorization
can be computed via the lu command, for example as

> a
a =

-1.0688456296920776 0.5834664106369019 -0.0174380335956812


0.0473232455551624 -0.6955339908599854 -0.2883380949497223
-0.5952438712120056 -0.0617007017135620 -1.1060823202133179

> [l,u]=lu(a)
l =
1.000000000000000 0.000000000000000 0.000000000000000
-0.044275098518011 1.000000000000000 0.000000000000000
0.556903499136249 0.577325122175704 1.000000000000000

u =
-1.068845629692078 0.583466410636902 -0.017438033595681
0.000000000000000 -0.669700958047086 -0.289110165605131
0.000000000000000 0.000000000000000 -0.929460456605607

The solution of a linear system $Ax = b$ can be computed in MATLAB with the backslash
command. For scalars, slash and backslash are simply division from the right and from the left,
5/4
ans = 1.25000000000000
> 5\4
ans = 0.800000000000000
but both also work for matrices. The algebraic meaning is
$$A \backslash B = A^{-1} B, \qquad A/B = A B^{-1},$$
and for matrices (remember that MATLAB means MATrix LABoratory), these are not nec-
essarily the same. The solution of $Ax = b$ can be obtained by formally dividing through by A
from the left,
$$Ax = b \quad\Rightarrow\quad A \backslash A x = A \backslash b \quad\Rightarrow\quad x = A \backslash b.$$
The solution of the system, including a test whether $Ax$ really equals $b$,
can then be programmed in the following way:

> A=rand(3)
A =

0.63356 0.25786 0.71159


0.98480 0.13788 0.62761
0.60858 0.76457 0.90059

> b=rand(3,1)
b =

0.81931
0.61835
0.14195
> x=A\b
x =

-0.74296
-2.36996
2.67169

> A*x
ans =

0.81931
0.61835
0.14195
In LINPACK, the elimination step is called factoring (because the LU-decomposition produces
the two factors L and U), and the Double precision GEneral matrix FActoring routine is therefore called
DGEFA. The solution/substitution routine is DGESL, SL for solution.
There also exists a LINPACK benchmark, which sets up matrices in a well-defined way,
computes the matrix inverses, counts the number of floating point operations, measures
the time, and then computes the Flop rate. In this way, the speed of computers has been
evaluated for decades.[2]

5.5.3 Matrix inversion


A matrix inversion can be computed in the same way as the solution of a linear system, which
we see that if we write the problem as

1 0 0

1
0 1 0
AA = E, with the identity-matrix E =
.. .. . . ..
. . . .

0 0 1
² http://www.netlib.org/benchmark/linpackd in FORTRAN, but also available in other lan-
guages; the results are in http://www.netlib.org/benchmark/linpackd/performance.ps

where the columns of the identity matrix E play the role of the right-hand side b and the
columns of A^(-1) are the unknowns x. It is now clear why it is advantageous to use the
LU-decomposition, as it allows the simultaneous solution of systems with arbitrarily many
columns on the right-hand side. After the factoring is completed, the solution step for
computing the inverse of an l x l matrix takes l times as many steps as the solution of the
system for a single-column right-hand side. We can see this by using the flops-command,
which was available in old versions of MATLAB (before version 6) and which measured the
number of floating point operations, with the following example program:

clear
format compact
n=150
A=randn(n);
b=randn(n,1);

flops(0)
x1=A\b;
flops
flops(0)
x2=inv(A)*b;
flops
return

For 150 x 150 matrices, the output is

n =
   150
ans =
   2419042
ans =
   6907967

so the number of FLOPS necessary for the solution of the linear system is 2419042, and
for the computation of the matrix inverse it is 6907967. The number of FLOPS required for
the matrix inversion is thus about three times as large as for the solution of the linear
system. For the solution of the linear system, the highest computational cost is the
factoring, not the backward substitution; for the matrix inversion, the substitution steps
for the l right-hand sides together take about twice as many operations as the factoring
itself.

5.5.4 Accuracy of the matrix inversion


Up to now, we have not discussed the error in matrix inversions. As we have not used any
order of approximation, it is clear that there will be neither a truncation error nor a
discretization error, and only rounding errors have to be taken into consideration. As a
test case for the matrix inversion, let us consider the matrix

A = [ 1          1
      1-epsilon  1 ],

which has the inverse

A^(-1) = [  1/epsilon            -1/epsilon
           -(1-epsilon)/epsilon   1/epsilon ].
If we compute the inverse in double precision for epsilon = 10^(-8), we obtain for

A=
1.000000000000000 1.000000000000000
0.999999990000000 1.000000000000000

the inverse

  99999999.4975241  -99999999.4975241
 -99999998.4975241   99999999.4975241

instead of the expected result

A^(-1) = [ 100000000  -100000000
           -99999999   100000000 ].
What went wrong? The numerical parameter which describes how accurately a matrix inversion
can be computed, or a linear system can be solved, is the condition number kappa, which is
implemented in MATLAB's cond-function. The condition number of a matrix A is defined as
the product of the norm of the matrix and the norm of its inverse,

kappa = ||A|| ||A^(-1)||.

There is a heuristic which says that if the condition number of a matrix A is 10^k, about
k digits of accuracy will be lost in a matrix inversion. For our above matrix with
epsilon = 10^(-8), the condition number is kappa = 4*10^8. As the error is about 0.5 for a
matrix whose entries are of the order of 10^8, we see that the prediction of the heuristic
is quite accurate.
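This heuristic can be checked directly with cond; the following lines are a sketch, with
epsilon and the matrix taken from the example above:

```matlab
% Sketch: condition number of the ill-conditioned test matrix from above
epsilon = 1e-8;
A = [1 1; 1-epsilon 1];
kappa = cond(A)             % approximately 4e8 for this matrix
digits_lost = log10(kappa)  % heuristic: about 8 to 9 digits lost
```

With 16 significant digits in double precision and entries of the inverse of the order
10^8, this leaves an absolute error of the order 0.5, as observed above.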
We have discussed that there are two possible implementations of the matrix multiplication,
the DOT- and the DAXPY-product. The LU-decomposition can formally be written as an op-
erator O acting on the original matrix A, so that formally

O(A) = L U.

In other words, the LU-decomposition is a very special matrix-matrix multiplication, and
therefore there are again two variants. The conventional DAXPY-variant, which is widely
treated in textbooks on numerical analysis, is implemented in MATLAB, and in packages like
LINPACK, LAPACK, NAG and Visual Numerics. The rather rarely mentioned DOT-variant, for
which, depending on the implementation, names like Doolittle, Crout or Crout-Doolittle are
used, is basically only used in NUMERICAL RECIPES, a compendium of numerical routines
where none of the authors has a background in numerical analysis. A DDOT-routine in MATLAB
(not shown, because I don't want anybody to use it) for the above problem produced the result
1.99999999 -1.00000000
-0.99999999 1.00000000
and the error was not in the eighth digit, but already in the first digit! The scalar
product as a kernel introduces rounding errors which cannot be predicted with the
conventional formula using the condition number.

5.6 Eigenvalues
The eigenvalues can be computed in MATLAB via the eig command. For a random matrix
A, we obtain the eigenvalues as
>> A=randn(2)
A =
0.5181 -1.2274
0.8397 0.1920
>> eig(A)
ans =
0.3551 + 1.0020i
0.3551 - 1.0020i

so one can see that the eigenvalues of a real square matrix are in general not real. For a
symmetric matrix, we see that

>> A=A+A'
A =
1.0363 -0.3876
-0.3876 0.3840
>> eig(A)
ans =
1.2167
0.2035

we obtain real eigenvalues. Formally, the eigenvalues lambda_i of a matrix A are often
introduced as the roots of the characteristic polynomial of A,

det(A - lambda E) = 0,   E = [ 1 0 ... 0
                               0 1 ... 0
                               . .  .  .
                               0 0 ... 1 ].

For a diagonal 2 x 2 matrix,

det( [ a 0        [ 1 0
       0 b ] - lambda 0 1 ] ) = (a - lambda)(b - lambda) = 0,

we see that the solutions for lambda are exactly a and b. In other words, the eigenvalues
of a diagonal matrix are the diagonal entries themselves. As we have seen above, the
eigenvalues of a real symmetric matrix are real. If we look at the characteristic
polynomial, we see that for an upper triangular matrix the off-diagonal elements do not
contribute to the determinant, so also for a triangular matrix the eigenvalues are exactly
the diagonal elements:

A =
1.0363 -0.3876
0 0.3840
>> eig(A)
ans =
1.0363
0.3840

5.6.1 What to do with eigenvalues


For a non-diagonal matrix A, the action of the matrix can usually be characterized by the
action of one or more eigenvalues. One example for such an action is the multiplication of
a matrix with a vector. If we multiply a vector iteratively with a matrix A and compute the
norm of the vector, for example with the following program,

clear
format compact
format long

v=[1
   1];
v=v/norm(v);
A=[1 2
   2 3];
eig(A)

for i=1:8
 v=A*v;
 norm_of_v=norm(v)
 v=v/norm(v);
end

we find that after several iterations the length of the iterated vector becomes the
absolute value of the largest eigenvalue of the matrix:

ans =
  -0.23606797749979
   4.23606797749979
norm_of_v = 4.12310562561766
norm_of_v = 4.23570259468110
norm_of_v = 4.23606684261261
norm_of_v = 4.23606797397526
norm_of_v = 4.23606797748884
norm_of_v = 4.23606797749976
norm_of_v = 4.23606797749979
norm_of_v = 4.23606797749979

5.6.2 Diagonalization and Eigenvectors


Now, if we have the matrix A = [1 2; 2 3] from above, we can call eig not only to compute
the eigenvalues, but using two output arguments in constructor brackets [], we can also
obtain the eigenvectors as

>> [u,l]=eig(A)
u =
0.85065080835204 0.52573111211913
-0.52573111211913 0.85065080835204
l =
-0.23606797749979 0
0 4.23606797749979

(the eigenvalues l are then not output as a vector, but as a diagonal matrix). In our
above example, where we iteratively multiplied the vector v with the matrix A, the end
result for v is

>> v
v =
0.52573111213781
0.85065080834049

which is the right column of u, and therefore the eigenvector corresponding to the larger
eigenvalue lambda_2 = 4.23606797749979. In other words, our iterative multiplication of a
vector with a matrix is a way to find the largest eigenvalue and the eigenvector
corresponding to this largest eigenvalue, and in the literature this method is often
called the power method, because it corresponds to multiplying a power of A onto v:

A^n v  ->  (parallel to) u_max  for large n.

The matrix u which contains the eigenvectors is at the same time the transformation which
transforms A onto diagonal form, so that

u' A u = [ -0.23606797749979   0
            0                  4.23606797749979 ].

The transformation with the matrix u is called a unitary (here: real orthogonal)
transformation.
5.6.3 Computing the characteristic polynomial


As we have used the Newton-Raphson iteration for the computation of roots of polynomials,
one could think that this would also be a good method to compute the eigenvalues from the
characteristic polynomial

det(A - lambda E) = 0.

Actually, this is not the case. The numerical algorithm for the computation of the
eigenvalues, which will not be elaborated here, makes use of all of the l x l matrix
entries, whereas the solution of the characteristic polynomial only makes use of l
coefficients computed from the l x l matrix entries, so again we lose significant
information, as in the example of the intersection computation of ellipses via the
fourth-order polynomial.
Conversely, instead of computing eigenvalues via roots of polynomials, it is usually
feasible to compute the roots of a polynomial by rewriting it as the corresponding
eigenvalue problem. First let us divide the polynomial P(x),

P(x) = a_k x^k + a_(k-1) x^(k-1) + ... + a_0 = 0,

by the leading coefficient a_k, so that the polynomial becomes monic (we keep the names of
the coefficients for simplicity):

P(x) = x^k + a_(k-1) x^(k-1) + ... + a_0 = 0.



Then one can set up the so-called companion matrix C_P for P(x), e.g. as

C_P = [   0     1     0   ...    0
          0     0     1   ...    0
          .     .     .    .     .
          0     0     0   ...    1
        -a_0  -a_1  -a_2  ...  -a_(k-1) ],

and the eigenvalues of C_P are the roots of the polynomial P(x). For example, the
polynomial

P(x) = x^3 - 2x^2 - 5x + 6 = 0

has the roots x_1 = 1, x_2 = -2, x_3 = 3. If we set up the companion matrix as

C =

0 1 0
0 0 1
-6 5 2

we obtain as the eigenvalues of the companion matrix C

> eig(C)
ans =

3.0000
1.0000
-2.0000
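MATLAB's own roots-function works exactly this way: it sets up a companion matrix and
calls the eigenvalue solver. A sketch for the polynomial above (note that the built-in
compan uses an equivalent, but transposed convention, with the coefficients in the first
row instead of the last):

```matlab
p = [1 -2 -5 6];   % coefficients of x^3 - 2x^2 - 5x + 6
roots(p)           % the roots 1, -2, 3 (in some order)
C = compan(p)      % companion matrix in MATLAB's convention
eig(C)             % the same roots
```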

5.6.4 Stability Analysis


Eigenvalues play an important role in stability analysis, i.e. in the analysis of whether
a numerical scheme is stable or not. Instability usually results from eigenvalues whose
absolute value is larger than 1 (or, in some cases, different from 1). As an example of
how eigenvalues enter the solution of problems, let us look at the following example
problem for ordinary differential equations, the damped harmonic oscillator:

function dydt = f(t,y)

% necessary parameters as global variables
global D
global omega0
% velocity component
dydt(1)=-omega0^2 * y(2) - 2*D*y(1);
% position component
dydt(2)=y(1);
return

This can also be written in matrix notation as



function dydt = f(t,y)

% necessary parameters as global variables
global D
global omega0
% velocity and position components as a matrix-vector product
dydt=[-2*D  -omega0^2
        1    0      ]*y;
return

Obviously, the matrix A = [-2D, -omega0^2; 1, 0] has eigenvalues, and the integration step
is governed by A*dt, so in fact errors in the time integration can be analyzed by analyzing
the eigenvalues of A*dt. For this harmonic oscillator, the matrix A is constant, and so are
its eigenvalues -D ± sqrt(D^2 - omega0^2); they are therefore constant in time, so the
problem can also be analyzed purely analytically. For more complicated problems, like the
ordinary differential equations of the
Lorenz attractor,

dx/dt = sigma (y - x)
dy/dt = r x - y - x z
dz/dt = x y - b z,

with real constants sigma, r, b, the right-hand side of the ordinary differential equation
is obviously a nonlinear function, because the time evolution (dx/dt, dy/dt, dz/dt) cannot
be written as a product of a matrix A independent of x, y, z with the vector (x, y, z),
like in the case of the harmonic oscillator. The classical way to analyze the stability of
such a system is to linearize the equations, usually a risky business, because the
linearized matrix is not guaranteed to reproduce the full behavior of the nonlinear system.
The modern approach is to simply perform the time integration and output representative
values for the eigenvalues of the matrix

B = [ -sigma  sigma   0
        r      -1    -x
        0       x    -b ].

Now let us solve the Lorenz model with constant step-size using the Euler method and plot
the eigenvalues of the matrix. We know already that the Euler method is bad, so our
solution will be inaccurate, but it will be much more interesting to implement the Euler
method in two different ways and see how the two solutions diverge from each other.

% Compute the Lorenz-Model


clear,format compact
n=20;
r=60;
b=8/3;
sigma=10;

t_max=1.3
dt=0.01 % diverges with this timestep: dt=0.011;
ndt=round(t_max/dt);
x=zeros(ndt,1);
mat_eig=zeros(ndt,1);
x(1)=1;
y=x; z=x;

bild=0;
k=[1;1;1];
k(:,2:ndt)=zeros(3,ndt-1);
prop=[-sigma sigma 0;
      0 -1 0;
      0 0 -b];

% direct solution of the ODE


for i=1:ndt-1
dx=sigma*(y(i)-x(i))*dt;
dy=(x(i)*(r-z(i))-y(i))*dt;
dz=(x(i)*y(i)-b*z(i))*dt;
x(i+1)=x(i)+dx;
y(i+1)=y(i)+dy;
z(i+1)=z(i)+dz;
end
% solution of the ODE with
% matrix-vector multiplications
for i=1:ndt-1
prop(2,1)=(r-k(3,i));
prop(3,1)=(k(2,i));
k(:,i+1)=dt*(prop*k(:,i))+k(:,i);

mat_eig(i+1)=max(abs(eig(prop*dt)));
end
subplot(4,1,1)
plot3(x(1:ndt),y(1:ndt),z(1:ndt));
subplot(4,1,2)
plot3(k(1,1:ndt),k(2,1:ndt),k(3,1:ndt));
subplot(4,1,3)
plot3(k(1,1:ndt)-x(1:ndt),...
k(2,1:ndt)-y(1:ndt),...
k(3,1:ndt)-z(1:ndt));
subplot(4,1,4)
plot(mat_eig)

This is the first surprise: two implementations of the Euler method don't give numerically
identical results, and the difference increases if we increase the maximal time. The next
surprise comes when we increase the timestep from dt=0.01 to dt=0.013. We can see that
then the solution and the maximal eigenvalues start to diverge, and if the maximal time is
taken longer, the program even crashes because it reaches infinity. Here we have found a
property of the numerical solution of differential equations: the eigenvalues of the
corresponding matrix times the time step must not become larger than 1 in absolute value,
or the solution does not converge any more.
The eigenvalue spectrum obtained from the Euler method is also representative for the
eigenvalues which we would obtain from higher-order methods like Runge-Kutta, which are
themselves only a sophisticated concatenation of Euler steps with different step-sizes.

5.6.5 Eigenvalue condition number


As in the case of the matrix inversion, there is a parameter which tells how accurately
the computation of the eigenvalues can be performed. In MATLAB, the function which gives
the eigenvalue condition numbers (different from the condition number cond for matrix
inverses) is called condeig; it returns a vector with one condition number for each
eigenvalue of the matrix.
Chapter 6

Ordinary differential equations

For ordinary differential equations there is a closed theory about which solution method
should be applied in which case. In the case of ordinary differential equations, the total
differential imposes additional constraints on the solution, so that the numerical
equations can be satisfied more easily. In contrast, partial differential equations are
much more difficult to treat numerically, because the boundary conditions impose certain
constraints on the solution method, so that in the case of nonlinear equations, the
optimal choice of a solution strategy is far from obvious.

6.1 Reference Example


6.1.1 Newtons equations of motion
Ordinary differential equations play an important role in science and engineering, and
maybe the most central equation is Newton's equation of motion, which relates a time-,
velocity- and position-dependent force F(x, x', t), a mass m and an acceleration a:

F(x, x', t) = m a.

Rewriting the equation with the second derivative of the position x, we get

F(x, x', t) = m x'',

which, due to the second derivative of x, is called an ordinary differential equation of
second order. In general, it can be shown that n ordinary differential equations of order
m can be rewritten into n*m coupled differential equations of first order. For the case of
Newton's equations of motion, this can be done by introducing the velocity v as the
derivative of x, so that

F(x, v, t) = m v',
v = x'.

Because standard texts in numerical analysis prefer to deal with first-order differential
equations, it is important to understand the latter form.

6.1.2 Linear oscillator


For simplicity, we set the mass m = 1. If the force takes the form -2Dx' - omega_0^2 x,
which corresponds to a linear spring with linear damping, the equation of motion takes the
form

x'' = -2Dx' - omega_0^2 x.

The solution of this equation with the damping term 2D and the frequency omega_0 of the
undamped oscillation is

x(t) = x_0 exp(-Dt) cos(omega_D t),   omega_D = sqrt(omega_0^2 - D^2).

Though there are solution schemes to solve second-order equations directly, it is usually
simpler to solve equations of second order by reducing them to a system of coupled
first-order equations. For our problem, we introduce the velocity v and its time
derivative v' = a, so this leads to the system of first-order equations

v' = -2Dv - omega_0^2 x,
x' = v.

It is customary in the mathematical community to introduce the vector y = (v, x) (written
without vector symbol) and to rewrite the equation as

dy/dt = F(y, t),

where F(y, t) becomes a vector-valued function with the time t and the vector y as
arguments.

6.2 The Euler Method


When faced with a differential equation in a numerical implementation, intuitively one
first wants to replace the differential quotient by the finite difference

dy/dt = (y(t + dt) - y(t)) / dt  (approximately),

so that the first-order approximation for the solution of one time-step, from t_i with
value y(t_i) and function value F_i = F(y(t_i), t_i) to the next, is

y(t_i + dt) = y(t_i) + F_i dt  (approximately),

which is called the Euler method.

clear
format compact

D=.2 , omega0=1 % Damping and Force constant


x0=1 , v0=0 %Initial conditions

dt=0.1, t0=0, t_max=30 % time-step, start-time, end time

y(1,1)=v0
y(1,2)=x0
t(1)=t0
n=1
while (t(n)<t_max)
% velocity
y(n+1,1)=y(n,1)+dt*(-omega0^2 * y(n,2)- 2*D*y(n,1) );
% position
y(n+1,2)=y(n,2)+dt*y(n,1);
% current time
t(n+1)=t(n)+dt;
n=n+1;
end

% exact solution
omega_d=sqrt(omega0^2-D^2);
y_ex=(x0*exp(-D*t).*cos(omega_d*t));
subplot(2,2,1)
plot(t,y(:,2),'-',t,y_ex,':')
legend('Euler, dt=0.1','Exact')
axis tight

[Figure: Strategy of the Euler method: evaluate the value at the right side of the
interval via the starting value at the left side and the tangent at the left side of the
interval. Result of the Euler method for the damped harmonic oscillator (dt = 0.1,
compared with the exact solution): the period and the amplitude are wrong.]
By construction, the Euler method is a first-order method, because we only retained the
terms proportional to dt in the expansion. Of course, if we plot the absolute error for
our exponentially vanishing solution, the error also vanishes exponentially; therefore we
don't draw the absolute error here.

6.2.1 Discussion of the Error


Geometrically speaking, the Euler method chooses the starting value and the tangent at the
same point for the integration, which is correct only in the infinitesimal limit. It can
be seen that the results obtained from the Euler method are far from satisfying for our
timestep of dt = 0.1. If we decrease the step size, we can reduce the discretization error
for the time step, but there is a limit, which is reached for the above differential
equation at about dt = 0.001: even using 1/10 and 1/100 of that timestep does not change
the result significantly any more, at 10 times and 100 times the computational cost.
[Figure: Error of the Euler method for the damped harmonic oscillator over the interval
0 <= t <= 30, for dt = 10^-2, 10^-3, 10^-4 and 10^-5 (logarithmic scale from 10^-1 down to
10^-7); below dt = 10^-3 the error no longer decreases significantly.]

For some ordinary differential equations, decreasing the timestep also increases the
rounding error (from the addition in each new timestep) accumulated while integrating over
the same time interval, so in the limit dt -> 0, we won't obtain the correct result with
the Euler method. Therefore there is one thing about the Euler method which should be kept
in mind: NEVER USE THE EULER METHOD IN A SERIOUS APPLICATION.¹

¹ Except for stochastic differential equations, where the stochastic noise destroys the
systematic error, but even then there may be better choices ...



6.2.2 Modified Euler Method


There are several strategies which can be used to reduce the error of the Euler method.
One possibility is to use for the timestep from t_0 to t_0 + dt the value of
F(y, t_0 + dt/2) in the middle of the interval, instead of F(y, t_0) at the left of the
interval, which results in a second-order method. This is similar to the midpoint method
in numerical quadrature, which gives a second-order quadrature rule, whereas the
rectangular method with the value at one end of the interval gives only a quadrature rule
of first order.
[Figure: Strategy of the modified Euler method: evaluate the value at the right side of
the interval via the starting value at the left side of the interval and the tangent in
the middle of the interval. Result of the modified Euler method for the damped harmonic
oscillator (dt = 0.1): the period and the amplitude are computed much more accurately than
for the Euler method.]
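No program listing is given above for the modified Euler method; a sketch of how the time
loop could look (same variable names as in the Euler script, with the midpoint value
estimated by a half Euler step, one common variant; this is a reconstruction, not the
script used for the figure):

```matlab
% Sketch: modified Euler (midpoint) method for the damped oscillator
clear; format compact
D=.2; omega0=1;          % damping and force constant
x0=1; v0=0;              % initial conditions
dt=0.1; t0=0; t_max=30;  % time-step, start time, end time
y(1,1)=v0; y(1,2)=x0; t(1)=t0; n=1;
while (t(n)<t_max)
  % half Euler step to the middle of the interval
  v_half=y(n,1)+.5*dt*(-omega0^2*y(n,2)-2*D*y(n,1));
  x_half=y(n,2)+.5*dt*y(n,1);
  % full step with the tangent evaluated at the midpoint
  y(n+1,1)=y(n,1)+dt*(-omega0^2*x_half-2*D*v_half);
  y(n+1,2)=y(n,2)+dt*v_half;
  t(n+1)=t(n)+dt;
  n=n+1;
end
```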

6.2.3 Heuns method


Heun's method uses the value y_0 and the tangent F(y_0, t_0) at the left of the interval
to compute an Euler step to the right end of the interval, F(y_0 + dt F(y_0, t_0), t_0 + dt),
as an estimate/prediction of the value at the right end of the interval; then it calculates
as the corrected value for F(y, t) the average of F(y_0, t_0) at the left-hand side of the
interval and F(y_0 + dt F(y_0, t_0), t_0 + dt) at the right side.
Heun's method is a second-order method, and there is a certain structural similarity to
the trapezoidal rule in quadrature, where also the left-hand value and the right-hand
value are used.
Nevertheless, there are some new perspectives in this method which allow the development
of new classes of integration methods for ordinary differential equations which are of
higher than second order:

1. The idea of first advancing the time integration in a predictor step, then to modify
the result in a corrector step, is the basis of the so-called predictor-corrector methods.


2. The idea to use more than a single value of F(y, t) within a single time interval dt is
the basis of the Runge-Kutta-type methods.

clear ; format compact

D=.2 , omega0=1 % Damping and Force constant


x0=1 , v0=0 %Initial conditions
dt=0.1, t0=0, t_max=30 % time-step, start-time, end time

y(1,1)=v0
y(1,2)=x0
t(1)=t0
n=1
while (t(n)<t_max)
% predictor step (Euler step over the full interval)
% velocity
y_pred1=y(n,1)+dt*(-omega0^2 * y(n,2)- 2*D*y(n,1) );
% position
y_pred2=y(n,2)+dt*y(n,1);
% velocity
y(n+1,1)=y(n,1)+dt*(-omega0^2*.5*(y(n,2)+y_pred2)- 2*D*.5*(y(n,1)+y_pred1));
% position
y(n+1,2)=y(n,2)+dt*.5*(y(n,1)+y_pred1);
% current time
t(n+1)=t(n)+dt;
n=n+1;
end
% exact solution
omega_d=sqrt(omega0^2-D^2);
y_ex=(x0*exp(-D*t).*cos(omega_d*t));
subplot(2,2,1)
plot(t,y(:,2),'-',t,y_ex,':')
legend('Heun, dt=0.1','Exact')
axis tight

[Figure: Strategy of Heun's method: evaluate the values and tangents at the left and right
side of the interval as predicted values and take the average as corrected value. Result
of Heun's method for the damped harmonic oscillator (dt = 0.1): the period and the
amplitude are computed much more accurately than for the Euler method.]

6.2.4 Stability
Up to now we have judged our results purely from the point of view of accuracy, in the
sense that a numerical solution will have some finite error in comparison to the exact
solution of the problem, but will have more or less the same shape. Actually, a more
fundamental problem in numerical analysis is stability: loosely speaking, the question
whether a numerical solution has the same shape as the exact solution at all. Let us look
at the following first-order differential equation,

dy/dt = 1 - t y^(1/3),

whose solution for y(0) = 1 is strictly real in the interval [0, 5]. The numerical
solution overshoots for too large time-steps, as is shown in the following graphs for the
numerical solution with the Euler method, so that y(t) becomes negative, and MATLAB
therefore delivers the complex roots of the negative values of y(t). The result of the
numerical integration for too large time-steps has a totally different shape than the
exact solution, and is therefore called unstable. It is therefore a primary aim to choose
numerical methods and time-steps so that the solution is stable; accuracy is only a
secondary concern.
Regrettably, some methods which give very high accuracy for some problems give very poor
stability for other problems. It is often advisable to check the stability of a method by
using different time steps to see whether the numerical solution changes or not. If a
small change of the time-step leads to only a small change in the solution, the solution
is stable. The mathematical definition of stability is that a solution undergoes only a
small change for a small change of the initial conditions, and in this respect the
time-step represents something like an initial condition.
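A minimal sketch of this experiment (the right-hand side is the equation above; with the
large time-step the Euler solution overshoots below zero after a few steps, and y^(1/3)
then returns complex values):

```matlab
% Sketch: Euler integration of dy/dt = 1 - t*y^(1/3) with a too-large step
dt=0.5;                  % unstable; try dt=0.05 for a stable solution
t=0:dt:5;
y=zeros(size(t)); y(1)=1;
for n=1:length(t)-1
  y(n+1)=y(n)+dt*(1-t(n)*y(n)^(1/3));
end
y   % becomes negative around t=3, then complex, for this time-step
```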
[Figure: Real and imaginary components of the Euler solution of dy/dt = 1 - t y^(1/3) for
dt = 0.5, 0.25, 0.125 and 0.05, together with the exact solution, for 0 <= t <= 5; for the
larger time-steps the solution overshoots and becomes complex.]

6.3 Programming Ordinary differential equations


6.3.1 Readability
Before we proceed to higher order formulae, we should improve the readability of the code.
Here is a good opportunity to introduce MATLAB functions. For the Euler method, we had

% velocity
y(n+1,1)=y(n,1)+dt*(-omega0^2 * y(n,2)- 2*D*y(n,1) );
% position
y(n+1,2)=y(n,2)+dt*y(n,1);

As was emphasized in the first chapter, readability is paramount in programming, and
whereas the Euler algorithm was still quite readable, for Heun's method we had to write

% velocity
y_pred1=y(n,1)+dt*(-omega0^2 * y(n,2)- 2*D*y(n,1) );
% position
y_pred2=y(n,2)+dt*y(n,1);

% velocity
y(n+1,1)=y(n,1)+dt*(-omega0^2*.5*(y(n,2)+y_pred2)- 2*D*.5*(y(n,1)+y_pred1));
% position
y(n+1,2)=y(n,2)+dt*.5*(y(n,1)+y_pred1);

and this is certainly not readable any more. One insight is that the force law for the
time integration of the spring was typed in twice, once for the predictor step and once
for the corrector step, so it would be a good idea to implement the force-law evaluation
as a MATLAB function.

6.3.2 Global variables


Most ordinary differential equations do not only need the time as an input parameter,
which we specified in the above example; they need other input parameters as well. One of
the simplest ways to incorporate such input parameters is via MATLAB's global attribute,
which allows the declaration of global variables, and which is of course not limited in
its use to functions for ordinary differential equations.
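A minimal sketch of the mechanism (the function would live in its own file, here with the
hypothetical name harm_osc.m; note that global must be repeated inside the function):

```matlab
% in the calling script:
global D omega0
D=0.2; omega0=1;
% in the file harm_osc.m:
%   function dydt = harm_osc(t,y)
%   global D omega0      % must be declared again inside the function
%   dydt(1)=-omega0^2*y(2)-2*D*y(1);
%   dydt(2)=y(1);
%   return
```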

6.3.3 MATLAB functions


A typical MATLAB function looks like the following code, where the constructor brackets []
have to be used for the output arguments and round brackets () for the input arguments:

function [output_arg1,output_arg2]=function_name(input_arg1,input_arg2);
% Comment following the function declaration; This comment will be displayed
% when you type
% "help function_name"
% from the MATLAB prompt.
global a % a global variable , which must be declared as global
% somewhere else and initialized

output_arg1=input_arg1+input_arg2*a

output_arg2=input_arg1*input_arg2

return % end of function

The function is called for example as

[out1,out2]=function_name(25,24)

If not all input arguments are supplied in the function call, like

[out1,out2]=function_name(25)

MATLAB terminates with an error message when the missing argument is used. MATLAB
functions cannot modify their input arguments in the calling workspace; in the following
example,

function [output_arg1,output_arg2]=function_name(input_arg1,input_arg2);

output_arg1=input_arg1+input_arg2

output_arg2=input_arg1*input_arg2

input_arg1=15
return % end of function

the line

input_arg1=15

does not have any effect for the caller, because only output arguments (in constructor
brackets []) are copied back to the calling program. If not all output or input arguments
are assigned, MATLAB terminates with an error message. Overloading, i.e. the use of a
variable number of input arguments, is possible; in this case one has to query the number
of input arguments with the MATLAB function nargin and the number of output arguments
with nargout. We will not treat overloading in detail here, but it is easy to find
examples of overloaded functions by looking at the MATLAB functions which exist as MATLAB
code (most MATLAB functions are written in MATLAB) in the toolbox directory.

which hist

will display the directory in which the toolbox MATLAB function hist can be found, and it
is possible to load the function into the editor and view the usage of overloading.
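A hypothetical sketch of such an overloaded function, which supplies a default value when
the second input argument is omitted (scale_vec is an invented name, not a toolbox
function):

```matlab
function y = scale_vec(x,factor)
% SCALE_VEC  multiply x by factor; if factor is omitted, use 1
if nargin<2
  factor=1;
end
y=factor*x;
return
```

It can then be called either as scale_vec(v) or as scale_vec(v,2); nargout can be queried
in the same way for the output arguments.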

6.3.4 MATLAB-functions for ODE-Solvers


It is customary to write MATLAB functions for ODE solvers with a header like

function dydt = f(t,y)

with the time t and the generalized coordinates y as input and the first-order derivative
of the generalized coordinates dydt as output. We can rewrite Heun's method

% velocity
y_pred1=y(n,1)+dt*(-omega0^2 * y(n,2)- 2*D*y(n,1) );
% position
y_pred2=y(n,2)+dt*y(n,1);
% velocity
y(n+1,1)=y(n,1)+dt*(-omega0^2*.5*(y(n,2)+y_pred2)- 2*D*.5*(y(n,1)+y_pred1));
% position
y(n+1,2)=y(n,2)+dt*.5*(y(n,1)+y_pred1);

using the MATLAB function (we retain the time as function argument though there is no
explicit time-dependence in our force law)

function dydt = f(t,y)

% necessary parameters as global variables
global D
global omega0
% velocity component
dydt(1)=-omega0^2 * y(2) - 2*D*y(1);
% position component
dydt(2)=y(1);
return

as

clear ; format compact

global D, D=.2 , global omega0, omega0=1 % Damping and Force constant


x0=1 , v0=0 %Initial conditions
dt=0.1, t0=0, t_max=30 % time-step, start-time, end time

y(1,1)=v0
y(1,2)=x0
t(1)=t0
n=1
while (t(n)<t_max)
  % predicted value
  y_pred=y(n,:)+dt*f(t(n),y(n,:));
  % corrected value
  y(n+1,:)=y(n,:)+.5*dt*(f(t(n),y(n,:))+f(t(n)+dt,y_pred));
  t(n+1)=t(n)+dt;
  n=n+1;
end
% exact solution
omega_d=sqrt(omega0^2-D^2);
y_ex=(x0*exp(-D*t).*cos(omega_d*t));
subplot(2,2,1)
plot(t,y(:,2),'-',t,y_ex,':')
legend('Heun, dt=0.1','Exact')
axis tight

which is much more readable than the original. This also allows us to see how many
function evaluations are necessary per timestep: whereas for the original Euler method we
used one function evaluation per timestep, for Heun's method we need two function
evaluations per timestep. Therefore, Heun's method is not only more accurate, but also
more costly. Higher-order methods will need even more function evaluations; therefore we
now rewrite Heun's method as a function, along with the computation of the number of steps
and a check of the reasonableness of the input parameters. The feval command of MATLAB is
used so that the MATLAB file which contains the differential equation can be passed as
an argument. Moreover, we initialize tout and yout so that we don't lose time by
allocating new memory space when a new element is added to these vectors in each timestep:

function [tout,yout] = heun(yinit,tstart,tend,dt,f)


% Heuns Method:
% Runge-Kutta integrator (2nd order)
% Input arguments -
% y = current value of dependent variable
% t = independent variable (usually time)
% dt = step size (usually timestep)
% f = right hand side of the ODE; f is the
% name of the function which returns dy/dt
% Calling format f(t,y).
% Output arguments -
% yout = new value of y after one stepsize dt
nsteps=ceil((tend-tstart)/dt)
dt
if nsteps<0
  tstart
  tend
  dt
  error('tend-tstart is not a positive multiple of dt')
end
if (abs(nsteps*dt-(tend-tstart))>1e-6*dt)
  disp('warning: time interval not a multiple of timestep')
  disp('inputed timestep:')
  dt
  dt=(tend-tstart)/nsteps;
  disp('use instead')
  dt
end
dt

if (size(yinit,1)==1)
  yinit=yinit';   % make yinit a column vector
end

%allocate necessary memory to save time:


yout=zeros(length(yinit),nsteps+1);
tout=zeros(1,nsteps+1);

yout(:,1)=yinit;
y=yinit;
tout(1)=tstart;

n=1;

for k=1:nsteps
  F1 = feval(f,tout(n),y);
  F1 = F1(:);               % ensure column shape
  t_full = tout(n) + dt;
  ytemp = y + dt*F1;
  F2 = feval(f,t_full,ytemp);
  F2 = F2(:);
  n=n+1;
  y = y + .5*dt*(F1 + F2);
  yout(:,n)=y;
  tout(n)=t_full;
end
return

This program can then be called from a driver routine (a routine which does nothing else
than call a specific function) in the following way:

clear ;
format compact
global D, D=.2 ,
global omega0, omega0=1 % Damping and Force constant
omega_d=sqrt(omega0^2-D^2);
dt=0.1, t0=0, t_max=20 % time-step, start-time, end time
x0=1 %Initial conditions
v0=-D*exp(-D*t0)*cos(omega_d*t0)-omega_d*exp(-D*t0)*sin(omega_d*t0);

[t,y]=heun([v0;x0],t0,t_max,dt,'harm_osc');

% exact solution
y_ex=(x0*exp(-D*t).*cos(omega_d*t));
subplot(2,2,1)
plot(t,(y(2,:)-y_ex)./y_ex,':')
legend('Heun, dt=0.1','Exact')
axis tight

6.4 The classical Runge-Kutta formula


6.4.1 The Idea
The idea to evaluate not only a single integration point, but moreover to compute, within a
single timestep, further integration points from previously computed auxiliary evaluations, is
realized in the so-called Runge-Kutta algorithm. The formulae for the so-called classical
Runge-Kutta method are
y_{k+1} = y_k + dt/6 (k_1 + 2 k_2 + 2 k_3 + k_4)
k_1 = f(t_k, y_k)
k_2 = f(t_k + dt/2, y_k + k_1 dt/2)
k_3 = f(t_k + dt/2, y_k + k_2 dt/2)
k_4 = f(t_k + dt, y_k + k_3 dt)

It uses four evaluations F1, F2, F3, F4 of the right-hand side at the intermediate times
t0, t0 + dt/2, t0 + dt/2, t0 + dt.
F2 is computed using F1, F3 using F2, and F4 using F3. Afterwards, the new y is computed
as a weighted average of F1, F2, F3, F4. The function then looks like this:

function [tout,yout] = rk4_class(yinit,tstart,tend,dt,f)


% Classical Runge-Kutta integrator (4th order)
% Input arguments -
% y = current value of dependent variable
% t = independent variable (usually time)
% dt = step size (usually timestep)
% f = right hand side of the ODE; f is the
% name of the function which returns dy/dt
% Calling format f(y,t).
% Output arguments -
% yout = new value of y after one stepsize dt
nsteps=ceil((tend-tstart)/dt)
if nsteps<0
tstart
tend
dt
error('tend-tstart is not a positive multiple of dt')
end
if (abs(nsteps*dt-(tend-tstart))>1e-6*dt)
disp('warning: time interval not a multiple of timestep')
disp('inputed timestep:')
dt
dt=(tend-tstart)/nsteps
disp('use instead')
dt
end

if (size(yinit,1)==1)
yinit=yinit'; % make sure yinit is a column vector
end

yout=zeros(length(yinit),nsteps+1); % one column per step plus the initial value
tout=zeros(1,nsteps+1);

yout(:,1)=yinit;
y=yinit;
tout(1)=tstart;

n=1;

half_dt = 0.5*dt;
dt_6=dt/6;
for k=1:nsteps
% y
F1 = feval(f,y,tout(n));
t_half = tout(n) + half_dt;
ytemp = y + half_dt*F1;
F2 = feval(f,ytemp,t_half);
ytemp = y + half_dt*F2;
F3 = feval(f,ytemp,t_half);
t_full = tout(n) + dt;
ytemp = y + dt*F3;
F4 = feval(f,ytemp,t_full);
y = y + dt_6*(F1 + F4 + 2.*(F2+F3));
n=n+1;
yout(:,n) = y;
tout(n)=t_full;
end

return;

The same driver as for Heun's program above can be used, just with the line
[t,y]=heun([v0;x0],t0,t_max,dt,'harm_osc');
replaced by
[t,y]=rk4_class([v0;x0],t0,t_max,dt,'harm_osc');

6.4.2 The importance of the initial condition


In the sections on Euler's and Heun's methods, we used the initial condition x0 = 1, v0 = 0 to
compare the numerical result with the exact solution

y_ex = x0 exp(-D t) cos(omega_d t).

Actually, this solution is not the exact solution for the initial value problem with v0 = 0, but
for the initial value problem with

v0 = -D exp(-D t0) cos(omega_d t0) - omega_d exp(-D t0) sin(omega_d t0).

The improvement of the Runge-Kutta method compared to Heun's method would not have been
visible, because the initial value for the integration would have been so far off that the numerical
solution would be quite wrong. Whenever one computes numerical solutions to compare them with
exact solutions, one should make sure that they are solutions of the identical problem.
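This consistency can be checked quickly, e.g. in Python (a sketch with variable names of our own choosing, not part of the lecture's MATLAB code): differentiate the exact solution numerically and compare with the formula for v0.

```python
import numpy as np

# Hypothetical cross-check: verify numerically that
# v0 = -D*exp(-D*t0)*cos(wd*t0) - wd*exp(-D*t0)*sin(wd*t0)
# really is the time derivative of x(t) = x0*exp(-D*t)*cos(wd*t) at t = t0.
D, omega0, x0, t0 = 0.2, 1.0, 1.0, 0.0
wd = np.sqrt(omega0**2 - D**2)

def x_exact(t):
    return x0 * np.exp(-D * t) * np.cos(wd * t)

v0_formula = -D*np.exp(-D*t0)*np.cos(wd*t0) - wd*np.exp(-D*t0)*np.sin(wd*t0)
h = 1e-6
v0_numeric = (x_exact(t0 + h) - x_exact(t0 - h)) / (2 * h)  # central difference
print(v0_formula, v0_numeric)  # both approx -D = -0.2 for t0 = 0
```

For t0 = 0 the formula reduces to v0 = -D, which the central difference reproduces to high accuracy.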

1
correct initial condition
exact solution
incorrect initial condition
0.5

0.5
0 1 2 3 4 5 6 7 8 9

6.4.3 Accuracy

Now that we have outlined several algorithms of different order (different truncation error
with respect to the Taylor expansion in dt), we should compare the above methods with
respect to their cost and accuracy. As has been mentioned already above, the cost of a
Runge-Kutta step is four function evaluations per timestep, in contrast to a single function
evaluation for Euler and two function evaluations for Heun. Let us compare the accuracy
of the three methods, once for the absolute accuracy y_computed - y_exact, and once for
the relative accuracy (y_computed - y_exact)/y_exact. For the Euler method, we obtain an
exponentially decaying absolute error, due to the fact that the solution itself decays exponentially,
while the relative error increases exponentially. The absolute error starts at the order of 10^-2,
which is the square of the timestep, (dt)^2 = (0.1)^2, as was expected.

[Figure: absolute error (left) and relative error (right) of the Euler method, dt = 0.1, on logarithmic scale]

For Heun's method, the absolute error starts at the order of 10^-3, which is the third power of
the timestep, (dt)^3 = (0.1)^3, as was also expected. The relative error is constant for a certain
time, and then increases exponentially.

[Figure: absolute error (left) and relative error (right) of Heun's method, dt = 0.1, on logarithmic scale]

For the classical method by Runge and Kutta, the absolute error starts at the order of
10^-6, which is by sheer luck one order more accurate than the expected fifth power of the
timestep, (dt)^5 = (0.1)^5. Again, as for Heun's method, the
relative error is constant for a certain time, and then diverges exponentially.
[Figure: absolute error (left) and relative error (right) of the classical Runge-Kutta method, dt = 0.1, on logarithmic scale]

The last investigations have reviewed some old concepts and shown some important new
concepts for error analysis:

1. The orders of the Euler, Heun and Runge-Kutta methods are 1, 2 and 4, respectively;
therefore the absolute error at the beginning of the integration process is of order
(order + 1) in the timestep, i.e. dt^2, dt^3 and dt^5, for an initial amplitude of the order of 1.

2. The local error is the error for a single timestep, and the local absolute error at the
beginning of the integration is the same as the local relative error.

3. The behavior of the relative error is a bit more complicated: as can be seen, the relative
error increases during the integration process, but not monotonically. The error at
the end of the integration process is called the global error, and it can be seen that
the global relative error is much larger than the local absolute error. Whenever one
performs a time integration of ordinary differential equations, one should know
the actually permitted error, and this is determined by the physical problem.

6.5 Adaptive Stepsize Control


It is possible to construct Runge-Kutta schemes with redundant evaluations of F(y, t), so
that the timestep can be computed in order n and in order n + 1 simultaneously. One of the
first such methods was proposed by Fehlberg for fourth and fifth order; another, currently
very popular method of the same order is the more stable scheme by Prince and Dormand.
The knowledge of both orders allows one to estimate the error of the solution, and one can
therefore devise strategies to reduce the timestep if the error is too large, or to increase the
timestep if the solution is computed more accurately than desired (and therefore takes too much
computer time). Such Runge-Kutta methods are built into MATLAB as ode23 and ode45, and with the driver
clear ;
format compact
global D, D=.2 ,
global omega0, omega0=1 % Damping and Force constant
omega_d=sqrt(omega0^2-D^2);

dt=0.1, t0=0, t_max=50 % time-step, start-time, end time


x0=1 %Initial conditions
v0=-D*exp(-D*t0)*cos(omega_d*t0)-omega_d*exp(-D*t0)*sin(omega_d*t0);

[t,y]=ode23('harm_osc2',[t0 t_max],[v0 x0]);

y_ex=(x0*exp(-D*t).*cos(omega_d*t));
subplot(2,2,1)
semilogy(t,abs(y(:,2)-y_ex),':')
title('ode23, dt=0.1, absolute error')
axis tight
subplot(2,2,2)
semilogy(t,abs((y(:,2)-y_ex)./y_ex),':')
title('ode23, dt=0.1, relative error')
axis tight
and the file for the differential equation harm_osc2
function dydt=harm_osc2(t,y)
format compact
global D
global omega0
% d velocity/dt
dydt(1,1)=-omega0^2 * y(2)- 2*D*y(1);
% d position/dt
dydt(2,1)=y(1);
return
This file is different from our previously used harm_osc.m file, as the order of the input
parameters t and y is exchanged. The following solution for our damped harmonic oscillator
has been computed:

[Figure: computed solution of the damped harmonic oscillator (top) and the timestep chosen by ode23 (bottom)]

Above, the solution is plotted; below we see the timestep. The stepsize-adaptation algorithm
changed the timestep depending on whether the oscillation was near an extremum or in
an almost straight part of the motion. The accuracy of the time integration can be set by the input parameters
of the ode23 function, see help ode23. The accuracy diagram for the default accuracy is the
following:

[Figure: absolute error (left) and relative error (right) for ode23 with default accuracy]

The same plots can be made for the ode45 algorithm, which gives the following accuracy
diagram

[Figure: absolute error (left) and relative error (right) for ode45 with default accuracy]

The computed solution and the timestep are


[Figure: solution computed by ode45 (top) and the timestep (bottom)]

and it can be seen that MATLAB starts with a very small timestep and then increases the
timestep significantly while still reaching the default accuracy of the time integrator. The advantages of
these adaptive methods for "reasonable" ordinary differential equations are:

1. One can specify the relative and absolute errors on input, and obtain a solution which
is guaranteed to be inside the specified errors.

2. The performance is optimal, i.e. for the given method there is no choice of timesteps
which would compute the solution with fewer timesteps/less computer time.

3. Without knowing anything about the system, and about the relation between the timestep
and the error resulting from the timestep for the given set of equations, one obtains a
correct solution.

Therefore, it is always a good idea to start the investigation of a problem with the above
methods. But there are some caveats for the case of "unreasonable" differential equations,
and these are systems which are often encountered in daily life; they are treated in the next
subsection.
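The same kind of embedded, adaptive Runge-Kutta solver also exists outside MATLAB, e.g. in SciPy. As an illustrative sketch (the solver choice and tolerances here are our assumptions; RK23 is roughly comparable to ode23), the damped oscillator can be integrated like this:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch of the adaptive-stepsize experiment with SciPy instead of MATLAB.
D, omega0 = 0.2, 1.0
wd = np.sqrt(omega0**2 - D**2)

def harm_osc(t, y):          # y = [v, x], same ordering as in harm_osc2.m
    return [-omega0**2 * y[1] - 2*D*y[0], y[0]]

v0 = -D  # exact initial velocity of x(t) = exp(-D t) cos(wd t) at t0 = 0
sol = solve_ivp(harm_osc, [0, 50], [v0, 1.0], method='RK23',
                rtol=1e-6, atol=1e-8)
x_exact = np.exp(-D * sol.t) * np.cos(wd * sol.t)
print('max abs error:', np.max(np.abs(sol.y[1] - x_exact)))
print('timestep range:', np.min(np.diff(sol.t)), np.max(np.diff(sol.t)))
```

Inspecting np.diff(sol.t) shows the same behavior as the MATLAB plot: the solver chooses small steps where the solution curves strongly and large steps elsewhere, while keeping the error within the requested tolerances.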

6.5.1 Problems with Adaptive stepsize control


Adaptive stepsize control needs some assumptions about the smoothness of the treated
differential equations. There are some notorious physical situations which lead to non-smooth
problems:

Coulomb friction: If an ordinary differential equation contains terms which involve
the sign of a function, as in the case of Coulomb friction,

FCoul = -mu FN sign(v),

it may happen that the solution of the equation is not smooth enough, so that even a
reduction of the timestep does not lead to the same solution for the different orders of the
function evaluation of the ODE-solver. In that case, the solver stops, or it continues
only with very small timesteps, so that the solution is not finished within finite time.

Bouncing balls: If an object flies in free motion in a gravitational field, its trajectory
is parabolic. If it hits a target, the motion is suddenly reversed. For the numerical
time integration, the free motion allows a very large timestep, whereas at the moment
when the target is hit, the timestep has to be reduced drastically. It is possible that
numerical solvers with adaptive stepsize control are not able to reduce the stepsize
appropriately, and in the simulation the impacting particle may not be reflected, but
may fly through the target. The risk of such a mishap is higher for higher-order
solvers, e.g. of 8th order.

Because the adaptive stepsize control needs some information about how the timestep must
be reduced, MATLAB allows one to specify the way in which the timestep should be changed via
options created with the odeset command.

6.5.2 Coulomb Friction


In contrast to the friction of and in fluids, which for small velocities v is proportional to the
velocity, Coulomb friction, the friction of solid on solid surfaces, is proportional to the sign
of the velocity only. Using the Coulomb friction coefficient mu and the normal force fn, we can
write the Coulomb friction as

FCoul = -mu fn sign(v).

Obviously, this force law has a jump at v = 0, and we know from physics that the friction
FCoul can take any value from -mu fn to +mu fn for v = 0. Actually, there is a method to solve

such an undetermined problem in a numerically exact way², but we will just try to use
the adaptive stepsize control in the hope that we get a reasonable solution by decreasing
the stepsize. For the ordinary differential equation

x'' = -x - 2 D sign(x'),

let us check the output for different values of D using the program below, and let us look
at the timestep:

clear
format compact
global D

tmax=7

D=0.05
[t1,y1]=ode23('lin_coul_osc',[0 tmax],[1 0]);
t1_plot=linspace(0,max(t1),2*length(t1));
y1_plot=interp1(t1,y1,t1_plot,'spline');

D=0.1
[t2,y2]=ode23('lin_coul_osc',[0 tmax],[1 0]);
t2_plot=linspace(0,max(t2),2*length(t2));
y2_plot=interp1(t2,y2,t2_plot,'spline');
t2_plot=[t2_plot tmax];
y2_plot=[y2_plot ; [0,1]];

subplot(2,2,1)
plot(t1_plot,y1_plot(:,1),...
     t2_plot,y2_plot(:,1),':')
axis tight
legend('D=0.05','D=0.1')
title('d^2/dt^2 x + 2 D sign(d/dt x) + x = 0')
subplot(2,2,2)
semilogy(t1(2:end),diff(t1),t2(2:end),diff(t2),':')
ylabel('dt')
xlabel('timestep')
legend('D=0.05','D=0.1')
axis([0 7 1e-5 1])
return

with the file lin_coul_osc.m for the differential equation:

function dydt = f(t,y)
% lin_coul_osc.m
global D
dydt = [y(2); -y(1)-2*D*sign(y(2))];
return

[Figure: solution x(t) for D = 0.05 and D = 0.1 (top) and the timestep dt on a logarithmic scale (bottom)]

It can be seen that for D = 0.05, as long as the oscillation resembles the oscillation of the
damped harmonic oscillator, the timestep is comparable to the one expected for the harmonic
oscillator. For D = 0.1, where zero amplitude is reached, the timestep goes down by several
orders of magnitude to guarantee the vanishing of the amplitude, and the integration is
slowed down by several orders of magnitude in comparison with the damped harmonic oscillator.

² Hairer et al., Solving Ordinary Differential Equations I, Springer
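The timestep collapse near v = 0 can also be reproduced with an adaptive solver outside MATLAB; a hypothetical Python/SciPy version of the same experiment (the solver choice is ours) is:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch (Python/SciPy, not the lecture's MATLAB): the oscillator with
# Coulomb friction, x'' = -x - 2 D sign(x'), fed to an adaptive RK23 solver.
D = 0.1

def lin_coul_osc(t, y):      # y = [x, v]
    return [y[1], -y[0] - 2*D*np.sign(y[1])]

sol = solve_ivp(lin_coul_osc, [0, 7], [1.0, 0.0], method='RK23')
dts = np.diff(sol.t)
print('amplitude at t=7:', abs(sol.y[0, -1]))
print('smallest/largest accepted timestep:', dts.min(), dts.max())
```

Inspecting np.diff(sol.t) shows how the accepted steps shrink once the velocity repeatedly changes sign near zero amplitude, in line with the MATLAB run above.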

6.6 Stiff differential equations


There is a class of differential equations which are called stiff. They usually involve two
different timescales/periods for the oscillations of a physical phenomenon, as in the case of
Van der Pol's equation

y1' = y2
y2' = mu (1 - y1^2) y2 - y1,

which for values of mu of the order of 1 is an absolutely ordinary set of differential equations;
but for increasing mu, usual integrators need very small timesteps for the integration process.
For mu = 500, we have computed the solution with the program below, using the standard
ode23 integrator and the stiff variant ode23s, and it can be seen that the stiff integrator uses
much longer timesteps to obtain the same result, and is ten times as fast:

clear
format compact
global mu,mu=500

t_max=1.5
D=0.2
tic
[t1,y1]=ode23('vanderpol',[0 t_max],[2 0]);
t1_plot=linspace(0,max(t1),2*length(t1));
y1_plot=interp1(t1,y1,t1_plot,'spline');
toc

tic
[t2,y2]=ode23s('vanderpol',[0 t_max],[2 0]);
t2_plot=linspace(0,max(t2),2*length(t2));
y2_plot=interp1(t2,y2,t2_plot,'spline');
toc

subplot(2,2,1)
plot(t1,y1(:,1),'*',t1_plot,y1_plot(:,1))
title('van der Pol ODE, ode23')
subplot(2,2,2)
semilogy(diff(t1))
ylabel('timestep')

subplot(2,2,3)
plot(t2,y2(:,1),'*',t2_plot,y2_plot(:,1))
title('van der Pol ODE, ode23s')
subplot(2,2,4)
semilogy(diff(t2))
ylabel('timestep')

with the file vanderpol.m for the differential equation:

function dydt = vanderpol(t,y)
% vanderpol.m
global mu
dydt=[y(2); mu*(1-y(1)^2)*y(2)-y(1)];
return

MATLAB offers several stiff solvers, among them ode23s, which is of Runge-Kutta type,
and ode15s, which offers several options for the choice of the solution method. Currently,
there is no clear definition of when an ordinary differential equation is stiff, because it is
not always possible to identify timescales in the system. The current heuristic definition is: a
stiff differential equation is a differential equation for which a stiff solver works much better
than an ordinary solver. If very high accuracy is necessary for the solution of the system,
the timestep of the stiff solver is reduced to the timestep of the ordinary solver.
For the example of the Coulomb friction problem from the previous section, the solution using a
stiff solver is no better than for the ordinary integrator. On the contrary, for some parameters
it may happen that the solver reduces the timestep to numerically 0 and the solution process
terminates with an error message.
[Figure: solution and timestep for the van der Pol equation, computed with ode23 (top) and ode23s (bottom)]

6.7 Symplectic differential equations


The examples we have discussed up to now were implementations of Newton's equation of
motion where the system underwent continuous energy loss. Actually, there is a huge class
of systems which obey Newton's equation of motion for which no energy loss occurs, such
as systems of atoms and molecules. These systems are called symplectic, and can be written
via canonical equations (using generalized coordinates and generalized momenta).

6.7.1 Stormer-Verlet Method


In the previous examples, we always included some damping in the system and rewrote the
ordinary differential equation as a system of coupled first-order differential equations. For the
most widely used symplectic (energy-conserving) time integrator, it is not necessary to rewrite
the differential equation from second to first order; on the contrary, this method
is not able to handle velocity-dependent (first-order) terms at all. Using the acceleration a
(= force/mass), we can write the Verlet method for the coordinate x as

x_{i+1} = 2 x_i - x_{i-1} + a_i dt^2.



As one can see, this method uses not only the information of the current timestep (x_i and a_i),
but also the information from the previous timestep (x_{i-1}); it is therefore a so-called
multistep method, a method which uses information from several steps. The previously
described Runge-Kutta methods are members of the class of so-called one-step methods.
A possible implementation of the Verlet method, with computation of the number of timesteps,
is shown below. As can be seen from the formula for the Verlet algorithm, when we start with
timestep 0, we also have to compute timestep dt, which is done in the following example
program before the loop, in an approximate and unsatisfying way. Because the multistep
methods don't allow by themselves the computation of the previous timesteps, they are called
non-self-starting, in contrast to one-step methods, which are called self-starting. In
conventional implementations of multistep methods, usually a self-starting method is used at
the beginning to compute the previous timestep. Because Verlet-type algorithms are mostly
used for molecular simulations, where the details of the initial conditions don't matter, at
least not up to an error of order dt, for practical applications it is no problem that the
Verlet method is not self-starting.

function [tout,yout] = verlet(yinit,tstart,tend,dt,f)

% Stoermer-Verlet Method:
% Symplectic method (2nd order)
% Input arguments -
% y = current value of dependent variable
% t = independent variable (usually time)
% dt = step size (usually timestep)
% f = right hand side of the ODE; f is the
% name of the function which returns dy/dt
% Calling format f(y,t).
% Output arguments -
% yout = new value of y after one stepsize dt
nsteps=ceil((tend-tstart)/dt)
dt
if nsteps<0
tstart
tend
dt
error('tend-tstart not positive multiple of dt')
end
if (abs(nsteps*dt-(tend-tstart))>1e-6*dt)
disp('warning: time interval not')
disp('a multiple of timestep')
disp('inputed timestep:')
dt
dt=(tend-tstart)/nsteps
disp('use instead')
dt
end
dt

if (size(yinit,1)==1)
yinit=yinit'; % make sure yinit is a column vector
end

yout=zeros(length(yinit),nsteps);
tout=zeros(1,nsteps);

yout(:,1)=yinit;
y=yinit;
tout(1)=tstart;

n=1;
dt2=dt*dt;
% compute timestep before initial timestep,
% this implementation is BAD, as it is one order less
% accurate than verlet itself !!!!!!!!!
F1 = feval(f,yout(2,1));
y_mdt=y(2,1)-F1*dt2;
F2 = feval(f,yout(2,1));
yout(2,2)=2*y(2,1)-y_mdt-F2*dt2;
tout(2)=dt;

for k=2:nsteps-1
F1 = feval(f,yout(2,k));
t_full = tout(k) + dt;
yout(2,k+1)=2*yout(2,k)-yout(2,k-1)+F1*dt2;
tout(k+1)=t_full;
end

return;

6.7.2 Precision
We compare the numerical solution for the harmonic oscillator without damping

function out=verlet_lin_osc(in)
% verlet-lin-osc
% linear oscillator with frequency omega
% for use with verlet-type integrator
global omega0
out=-omega0^2*in;
return

using the main program

clear ;
format compact
global D, D=0 ,
global omega0, omega0=1 % Damping and Force constant
omega_d=omega0;

dt=0.01, t0=0, t_max=300 % time-step, start-time, end time


x0=1 %Initial conditions
v0=-D*exp(-D*t0)*cos(omega_d*t0)-omega_d*exp(-D*t0)*sin(omega_d*t0);

tic
[t,y]=verlet([v0 x0],t0,t_max,dt,'verlet_lin_osc');
y_ex=(x0*cos(omega_d*t*1)); % exact solution

[rkt2,rky2]=ode23('lin_osc',[t0 t_max],[v0 x0]);


y_rk2=(x0*exp(-D*rkt2).*cos(omega_d*rkt2)); % exact solution

[rkt4,rky4]=ode45('lin_osc',[t0 t_max],[v0 x0]);


y_rk4=(x0*exp(-D*rkt4).*cos(omega_d*rkt4)); % exact solution

subplot(3,1,1)
semilogy(t,abs((y(2,:)-y_ex)))
title('Error for verlet')
axis tight

subplot(3,1,2)
semilogy(rkt2,abs(rky2(:,2)-y_rk2))
title('Error for ode23')
axis tight

subplot(3,1,3)
semilogy(rkt4,abs(rky4(:,2)-y_rk4))
title('Error for ode45')
axis tight
return
[Figure: error of the Verlet algorithm (top), ode23 (middle) and ode45 (bottom) for the undamped harmonic oscillator]

It can be seen that the Verlet algorithm has a larger error for the initial timesteps, due to
our choice of computing the earliest timestep only to first order. Nevertheless, the error bound
is constant over the whole integration interval. The remarkable property of the Verlet method
is that its global error is the same as its local error.
Though ode23 and ode45 from MATLAB start with a much smaller error, their global error
is proportional to the integration time, i.e. it grows with time beyond all bounds. Physically,
this means that if non-symplectic algorithms like Runge-Kutta are used for energy-conserving
systems, the energy will drift significantly over the integration interval. Verlet-type algorithms
are stable even for millions of integration steps.
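A minimal sketch of the Verlet scheme in Python (with our own variable names, and using a second-order accurate start rather than the first-order start of the MATLAB listing above) shows the bounded amplitude directly:

```python
import numpy as np

# Minimal sketch of the Stormer-Verlet scheme for x'' = -x (omega0 = 1),
# illustrating that the amplitude stays bounded over very many steps instead
# of drifting as with non-symplectic methods.
dt, nsteps = 0.01, 100_000         # 1000 time units in 100000 steps
x = np.empty(nsteps)
x[0] = 1.0                         # x(0) = 1, v(0) = 0
x[1] = x[0] - 0.5 * dt**2 * x[0]   # second-order start: x0 + v0*dt + a*dt^2/2
for i in range(1, nsteps - 1):
    x[i + 1] = 2 * x[i] - x[i - 1] - x[i] * dt**2   # a_i = -x_i
print('max |x| over the whole run:', np.abs(x).max())  # stays close to 1
```

Even after roughly 160 oscillation periods the amplitude has not drifted; the phase accumulates a small O(dt^2) error, but the (modified) energy is conserved.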

6.7.3 Velocities
The Verlet method only makes use of the coordinates, not of the velocities. Because the
velocities don't occur in the equations, they can only be estimated using the relation

v_i = (r_{i+1} - r_{i-1}) / (2 dt),

so that the velocities of a timestep are only known after the completion of the following
timestep. Therefore, it is not possible to incorporate velocity-dependent interactions in the
Verlet scheme.
Often it is not clear how large a timestep should be chosen for a given dissipative problem.
There are some people who advocate the following procedure: run the problem without
dissipation and fix the timestep so that the change in energy during the simulation is negligible,
then use this timestep for the dissipative system. Our exploration of the symplectic integrator
shows that such a strategy is meaningless. The non-dissipative systems are a totally different
class than the dissipative systems; even the best non-symplectic integrators cannot
compete with quite mediocre symplectic integrators. Conversely, symplectic integrators
cannot be used with dissipation: in the above Stormer-Verlet integration there is no possibility
to implement velocity-dependent forces, because at the time the forces must be computed,
the velocity is not yet known. The same is true for modifications like the velocity-Verlet
scheme, where one knows the velocity half a timestep too late; using the velocity from the
previous timestep introduces errors which are of the same order as the error of the Verlet scheme itself.
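A sketch of this velocity reconstruction (again in Python, with our own second-order start-up step) for the undamped oscillator:

```python
import numpy as np

# Sketch: recover the velocities from a Verlet trajectory of x'' = -x via the
# central difference v_i = (x_{i+1} - x_{i-1}) / (2 dt) and compare with the
# exact velocity v(t) = -sin(t) for x(0) = 1, v(0) = 0.
dt, nsteps = 0.01, 1000
t = dt * np.arange(nsteps)
x = np.empty(nsteps)
x[0] = 1.0
x[1] = x[0] - 0.5 * dt**2 * x[0]
for i in range(1, nsteps - 1):
    x[i + 1] = 2 * x[i] - x[i - 1] - x[i] * dt**2
v = (x[2:] - x[:-2]) / (2 * dt)    # only available for i = 1 .. nsteps-2
err = np.abs(v + np.sin(t[1:-1])).max()
print('max velocity error:', err)  # of order dt^2
```

Note that v for the last stored position cannot be computed at all until one more position is known, which is exactly the "half a timestep too late" problem described above.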
