
R.M. Dreizler and C.S. Lüdde

Mathematical Supplement

Theoretical Physics 1

June 29, 2010

Springer
Berlin Heidelberg New York
Barcelona Hong Kong
London Milan Paris
Singapore Tokyo
Preface

The name ’Mathematical Supplement’ indicates that this is not a full-fledged


textbook on mathematics. The necessity to provide an introduction to the
mathematical tools used in physics arose from the fact that theoretical physics
courses in Frankfurt start in the first term. One of the advantages of this
arrangement is a very close connection between the two foundations of natural
science.
The resulting selection emphasises those aspects of mathematics which are
oriented towards practical applications. The hope that this supplement would
be of use for physics students was the reason for its inclusion on the CD-ROM.
The following fields are needed for an underpinning of theoretical mechan-
ics: analysis (dealing with functions of one or more real variables), linear alge-
bra (the mathematical description of three dimensional space, mathematical
operations in this space and the extension to spaces of higher dimensionality),
vector analysis (so to speak the marriage of the first two fields), ordinary dif-
ferential equations (the definition of functions with the aid of relations that
contain the derivatives of these functions, one of the cornerstones of theoret-
ical physics) and a mini excursion into the world of complex numbers and
functions of complex variables.
The treatment of the last topic may serve as a warning: The 15 pages
introducing the theory of complex functions can, quite clearly, only touch
the edge of this rich field. Even if the presentation of the other areas is much
more extensive, this does not imply that an in-depth consultation of the
mathematical literature can be avoided.
The mathematical supplements of the following volumes will offer infor-
mation on additional topics and on advanced aspects of the present ones.
Table of Contents

1 Analysis I: Functions of one real variable . . . . . . . . . . . . . . . . . 1


1.1 The concept of a function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Continuity and differentiability . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Naive considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.3 Convergence of sequences . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.4 Limiting value of a function . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.5 Continuous functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.6 Differentiable functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Series expansions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.1 Taylor series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3.2 Numerical series or number series . . . . . . . . . . . . . . . . . . 16
1.3.3 Convergence criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3.4 Fourier series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.4 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.4.1 Improper integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2 Differential Equations I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.1 Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.2 Methods of solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.2.1 Separation of variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.2.2 The linear differential equation of second order . . . . . . . 44

3 Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.1 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.1.1 Qualitative vector calculus . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.1.2 Quantitative formulation of vector calculus . . . . . . . . . . 58
3.1.3 Addendum I: n-dimensional vector spaces . . . . . . . . . . . 66
3.1.4 Addendum II: nonorthogonal coordinate systems and
extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.2 Linear coordinate transformations, matrices and determinants 72
3.2.1 Linear coordinate transformations I . . . . . . . . . . . . . . . . . 72
3.2.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.2.3 Linear coordinate transformations II . . . . . . . . . . . . . . . . 85
3.2.4 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

4 Analysis II: Functions of several variables . . . . . . . . . . . . . . . . . 107


4.1 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.1.1 Functions of two independent variables . . . . . . . . . . . . . . 107
4.1.2 Functions of three or more independent variables . . . . . 111
4.2 Limiting values and differentiation . . . . . . . . . . . . . . . . . . . . . . . . 112
4.2.1 Limiting values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.2.2 Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.2.3 Directional derivatives and gradient . . . . . . . . . . . . . . . . . 119
4.2.4 The total differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.2.5 The chain rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.3 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.3.1 Single integrals of f (x, y) . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.3.2 Double and domain integrals with f (x, y) . . . . . . . . . . . 139
4.3.3 Integrals with f (x, y, z) . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
4.3.4 Addendum: Elliptic integrals . . . . . . . . . . . . . . . . . . . . . . . 162

5 Basic concepts of vector analysis . . . . . . . . . . . . . . . . . . . . . . . . . 165


5.1 Vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
5.2 Differentiation of vector fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
5.2.1 Gradient, divergence and rotation of vector fields . . . . . . 168
5.3 Integration of vector functions . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
5.3.1 Line integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
5.3.2 Surface integrals with vector functions . . . . . . . . . . . . . . 176
5.3.3 The integral theorems of Gauss and Stokes . . . . . . . . . . 184

6 Differential equations II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207


6.1 Further orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
6.2 Differential equation of first order . . . . . . . . . . . . . . . . . . . . . . . . 210
6.2.1 Separation of variables and transformation of variables 211
6.2.2 The total differential equation . . . . . . . . . . . . . . . . . . . . . . 212
6.2.3 The integrating factor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
6.2.4 Linear differential equation . . . . . . . . . . . . . . . . . . . . . . . . 217
6.2.5 Differential equations of first order and higher degree . 218
6.3 Differential equations of second order . . . . . . . . . . . . . . . . . . . . . 219
6.3.1 Solvable implicit differential equations . . . . . . . . . . . . . . 220
6.3.2 Linear differential equation . . . . . . . . . . . . . . . . . . . . . . . . 223
6.3.3 Differential equations of the Fuchs class . . . . . . . . . . . . . 225
6.4 Addendum: Numerical methods of solution . . . . . . . . . . . . . . . . 228

7 Complex numbers and functions . . . . . . . . . . . . . . . . . . . . . . . . . . 237


7.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
7.2 Fundamental rules of complex arithmetic . . . . . . . . . . . . . . . . . . 238
7.3 Elementary functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

8 List of literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249



Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
1 Analysis I: Functions of one real variable

The first chapter provides a survey of topics from the analysis of functions
of one real variable which are of interest in theoretical mechanics. The topics
include differentiation, integration and series expansions of functions. The
chapter begins with a brief discussion of the definition of the concept of a
function.

1.1 The concept of a function

The definition of a function of one variable includes the statements:


1. A domain of definition is given (Fig. 1.1). This normally comprises an
interval of the independent variable, which will be denoted by t (the time
in theoretical mechanics). The domain of definition can also consist of a set
of isolated points.

Fig. 1.1. Domain of definition and co-domain

2. Any rule that allocates a unique real number x(t) to each point of
the domain of definition is called a function of one real variable.

function =⇒ unique specification x(t)


for each t ∈ domain of definition.
3. The set of x-values obtained with the rule constitutes the co-domain
of the function (Fig. 1.1). The nature of the rule is not strictly regulated.
Some selected examples indicate the diversity that is possible.

• The domain of definition is the interval $(-\infty, \infty)$, the specification is $x(t) = e^t$. The function is defined by an explicit formula in this example. It can be
represented by a ’smooth’ curve in a diagram. The co-domain (see Fig. 1.2a)
is $(0, \infty)$.
• Consider the function $x(t) = \sin(1/t)$ in the domain $(0, \infty)$. The specification also uses a formula. The co-domain is $[-1, 1]$. This function can,
however, not be represented by a ’smooth’ curve. The value of the function
oscillates ever more rapidly the closer one approaches the (excluded) point
t = 0 (Fig. 1.2b).

Fig. 1.2. Diagrams of the examples for x(t): (a) the function $x = e^t$, (b) the function $\sin(1/t)$

• The specification in the domain $(-\infty, \infty)$ is
$$x(t) = \begin{cases} 0 & \text{for } t < 0 \\ 1 & \text{for } t \ge 0\,. \end{cases}$$
This step function is defined by an abbreviated version of a verbal state-
ment. A graphical representation is possible. It is, however, necessary to
agree on some convention in order to represent the statements ’defined for
less than or greater than’, respectively ’equal to or less than’ or ’equal to
or greater than’. The usual convention is: a filled in ’point’ in the graph
of the function belongs to the particular domain, an open ’point’ does not
(Fig. 1.3).
• A function in the domain [0, 1] is defined by the unique, semi-verbal spec-
ification

$$x(t) = \begin{cases} 1 & \text{for rational } t \\ 0 & \text{for irrational } t\,. \end{cases}$$
A graphical representation of this ’function’ in an x − t diagram is not
possible.
Fig. 1.3. Diagram of the present step function

The concept of a function, as introduced in mathematics, covers a multitude of possibilities but is too general for the requirements of physics. A
more restricted class of functions (excepting some special cases) is of interest
in physics. This restriction is connected with the concepts of ’continuity’ and
’differentiability’.

1.2 Continuity and differentiability


It is always useful to develop concrete ideas of abstract concepts, even if it is
ultimately necessary to resort to more rigorous mathematical definitions.

1.2.1 Naive considerations


A continuous function can in this sense be characterised by the statement:
continuous functions can be drawn ’in one go’. Functions such as $x = t^2$, $x = \sin t$
etc. could be considered to be continuous in the sense of this statement, but
also functions that consist of two (or more) connected sections, as for example
(Fig. 1.4)

Fig. 1.4. A ’continuous’ function

$$x(t) = \begin{cases} \dfrac{1}{3-t} & \text{for } t < 1 \\[1ex] \dfrac{1}{1+t} & \text{for } t \ge 1\,. \end{cases}$$

On the other hand, the functions of the examples 3 and 4 quoted above
are not continuous. The value of the function 3 jumps from one point to
the next, the function in the case 4 is completely disconnected. Example 2
can not be accommodated easily in terms of the naive characterisation. The
function can be traced up to the very vicinity of the point t = 0, but the
effort increases the closer one approaches this point.
The naive characterisation can not be justified on mathematical grounds.
The question whether it is possible to draw a curve in one go or not is, at least
in part, a question of skill. It is necessary to convert the tentative, naive
characterisation into a rigorous mathematical definition. It turns out that the
implementation of this task calls for the introduction of a considerable chain
of additional concepts.

1.2.2 Sequences
The first concept, which is needed, is the concept of a numerical sequence.
In a numerical sequence a definite number a1 , a2 , a3 , . . . , an , . . . is associ-
ated with each natural number 1, 2, 3, . . . , n, . . .. Some explicit examples
are:
The sequence $1, \tfrac{1}{2}, \tfrac{1}{3}, \ldots$ with the general term $a_n = \tfrac{1}{n}$.
The sequence $1, 4, 9, \ldots$ with the formula $a_n = n^2$.
The sequence $1, 2, 1, \ldots$ with $a_{2n-1} = 1$, $a_{2n} = 2$ for $n = 1, 2, \ldots$.
These examples indicate the possible properties of sequences.
• The terms of the sequence approach a finite limiting value A (A = 0 in
the first example).
• The terms increase beyond all limits with increasing n in the
second example.
• The terms oscillate between two (or more) values in the third.
Sequences with a finite limiting value (for short limit) are called convergent,
all sequences, which are not convergent, are called divergent.
The next step is a precise definition of the concepts ’convergence’ and
’limiting value’. The formal mathematical language needed for this purpose
sounds more complicated than it actually is.

1.2.3 Convergence of sequences


The formal definition of convergence sounds as follows:

A sequence $\{a_n\}$ is convergent and has the limiting value
$$A = \lim_{n\to\infty} a_n \qquad (A\ \text{finite})\,,$$
if it is possible to find for each number $\varepsilon > 0$ a natural number
$N(\varepsilon)$, so that the inequality $|a_\nu - A| < \varepsilon$ is valid for all $\nu > N(\varepsilon)$.

This convergence criterion sounds rather formal but is nonetheless a prac-


tical rule which can be used to verify the convergence of a sequence in a log-
ically correct manner. This is indicated in the following pictorial argument.
Figure 1.5 represents the terms of a (convergent) sequence on the number ray.
The criterion demands: choose an arbitrary interval of magnitude ε about the
(suspected) limit A (the aim is an arbitrarily small interval) so that the term
aN lies in this interval. It must then be possible to prove that all the terms
of the sequence with ν > N are also found within the interval. The task is to
name the number N once ε is specified so that this is the case. This should
be possible for each value of ε > 0 that is chosen.


Fig. 1.5. Illustration of the convergence of numerical sequences

The following explicit example, a sequence with the law
$$a_n = \frac{n+1}{n}\,,$$
can be used directly to illustrate the application of the criterion. The sequence
begins with
$$a_1 = 2\,,\quad a_2 = \tfrac{3}{2}\,,\quad a_3 = \tfrac{4}{3}\,,\ \ldots\,,\quad a_{10} = \tfrac{11}{10}\,,\ \ldots\,,\quad a_{100} = \tfrac{101}{100}\,,\ \ldots\,.$$
This indicates that the limiting value of this sequence might be 1. The ap-
plication of the criterion involves the steps:
1. Insert the conjectured limiting value into the criterion
$$\left|\frac{\nu+1}{\nu} - 1\right| = \frac{1}{\nu} < \varepsilon\,.$$

2. The postulate ν > N corresponds to 1/N > 1/ν .


3. Both statements can be accommodated if
$$\varepsilon \ge \frac{1}{N} > \frac{1}{\nu}$$
is chosen. This inequality is already the practical rule that is required.
Quite precisely: choose for instance $\varepsilon = 1.01 \cdot 10^{-6}$. The number $N = 10^6$
is a possible choice in accordance with the inequality above. It is then
certain that all the terms of the sequence starting with $a_{1\,000\,000}$ are found
within the chosen interval. A corresponding statement is possible for $\varepsilon =
1.01 \cdot 10^{-8}$ or whatever value is chosen.
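A minimal numerical sketch of this rule (my own illustration, not from the text): for $a_n = (n+1)/n$ one has $|a_\nu - 1| = 1/\nu$, so $N(\varepsilon) = \lceil 1/\varepsilon \rceil$ is a valid choice.

```python
# Spot-check N(eps) = ceil(1/eps) for a_n = (n+1)/n with limit A = 1:
# |a_nu - 1| = 1/nu < eps holds for all nu > N(eps).
import math

def N_of_eps(eps):
    return math.ceil(1.0 / eps)

for eps in (1e-2, 1e-4, 1.01e-6):
    N = N_of_eps(eps)
    # verify the inequality for a sample of terms beyond N
    assert all(abs((nu + 1) / nu - 1.0) < eps for nu in range(N + 1, N + 1000))
    print(f"eps = {eps:.2e}  ->  N(eps) = {N}")
```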

A warning should be expressed at this stage: a practically inclined person


might be satisfied to calculate the terms with n = 10, 100, . . . and estimate
the limiting value. This procedure is not sufficient from a logical point of
view. It is necessary to convert the estimate into an explicit statement of the
form N = N (ε) .
The topic ’convergence’ could be pursued for some time; in particular,
there exists a number of variants of the criterion of convergence. It is, for
instance, possible to formulate a criterion which allows the verification of
convergence without the knowledge or an estimate of the limiting value. The
discussion will, however, continue with the investigation of the concept of a
limiting value of a function on the basis of the definition of the limiting value
of a sequence.

1.2.4 Limiting value of a function

The definition of the limiting value of a function x(t) is: consider a function
x(t) with a domain of definition and a point $t_a$, which is the limit of a
sequence of points which all lie within the domain of definition. The function
x(t) possesses the limiting value $x_a$ at the point $t_a$, that is
$$\lim_{t\to t_a} x(t) = x_a\,,$$
if the following condition is satisfied: for each sequence of points in the domain
$t_1, t_2, \ldots, t_n, \ldots$ with the limiting value
$$\lim_{n\to\infty} t_n = t_a \qquad \text{one finds} \qquad \lim_{n\to\infty} x(t_n) = x_a\,.$$

This definition sounds again more cumbersome than it is. The function
$$x(t) = \frac{\sin t}{t} \qquad \text{for } 0 < t \le 1$$
can be used as an explicit example. This function is well defined for $t \neq 0$.
The diagram of the function in the vicinity of the point t = 0 is shown in
Fig. 1.6.

Fig. 1.6. Limiting value of the function (sin t)/t for t → 0

The task is the determination of the limiting value of this function at


the point t = 0 , which does not belong to the domain of definition. It is
however a limiting value of suitable sequences of points which all lie within
this domain. One could then proceed as follows:
1. Consider any sequence {tn } with the limiting value ta = 0, for example
a sequence with tn = 1/n .
2. Consider the sequence {x(tn )}, for the choice x(tn ) = n sin(1/n) and use
the criterion to demonstrate that the limit is 1 .
3. The problem is however: for each ... sequence ... . As this can not be
verified easily, a different approach is called for. One possibility is the
comparison with sequences for which the convergence properties have
been established (see below Math.Chap. 1.3.3). The necessary discussion
of the limiting behaviour is simpler if the sequences are defined by a
simpler formula. Another possibility is the formulation of a definition of
the limiting value of a function in terms of the ε -formalism.
The result for the present example would be
$$\lim_{t\to 0} \frac{\sin t}{t} = 1\,,$$
independent of the method used. Note that the requirements of the definition
are satisfied. Even though the function is not defined for t = 0 , it has the
limiting value 1 at this point.
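A small numerical illustration (my own sketch, not part of the text): approach $t = 0$ along the sequence $t_n = 1/n$ and watch $x(t_n) = n\sin(1/n)$ approach the limiting value 1.

```python
# x(t_n) = sin(t_n)/t_n with t_n = 1/n approaches 1 as n grows.
import math

for n in (1, 10, 100, 1000, 10000):
    print(n, n * math.sin(1.0 / n))
```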

1.2.5 Continuous functions


The concept of continuity of a function can be defined in a precise manner
with the aid of the concept of a limiting value of a function.
A function x(t) is continuous at the point $t_a$ if the limiting value
of the function at this position agrees with the value of the function
(and both are finite).
Another simple example can be used as a comment on this definition.
Define the following function (why not!)

⎨ sin t for t < 0
x(t) = 10 for t = 0 .

sin t for t > 0
The graphical representation in the vicinity of t = 0 is given in Fig. 1.7. The
figure illustrates: the limiting value of the function at the point t = 0 is
$$\lim_{t\to 0} x(t) = 0$$

(independent of the fact whether the point is approached from the right or
the left). According to the definition the value of the function is 10 . The
value of the function and the limiting value do not agree. The function is not
continuous for t = 0 .

Fig. 1.7. An obviously discontinuous function

1.2.5.1 Points of discontinuity. It might be of use to name possible points


of discontinuity directly. The following points of discontinuity could occur at
a point t0 :
• Isolated jumps of the value of a function. The value of the function and
the limiting value are different for one (or more) point (see Fig. 1.7 and
the example in Fig. 1.8a).
• Step functions. The left-hand limiting value does not agree with the
(right-hand) value of the function (Fig. 1.8b), or vice versa.

Fig. 1.8. Functions with jumps: (a) isolated jump, (b) step


• The function is not defined for t = t0 , as shown in Fig. 1.9.
• x(t0 ) is ±∞ . A proper limiting value has to be finite (see definition). The
function in Fig. 1.10a has no proper limiting value for t0 and is divergent.
• Additional examples are functions with infinite jumps or steps (Fig. 1.10b).
Of interest is also the concept of a function which is ’continuous in an interval’.
This signifies that the function has to be continuous for each point of the
interval.
Fig. 1.9. Function with a gap in the domain

Fig. 1.10. Functions with infinities: (a) infinities, (b) infinite jumps

The last sections indicate that some steps are required in order to define
a simple concept such as a ’coherent curve’ in a logically acceptable and
mathematically rigorous manner. A more lax approach is often used in physics.


This is acceptable as long as the more careful approach is kept in mind.

1.2.6 Differentiable functions

The discussion of differentiable functions just addresses a particular limiting


value. The formal definition is
A function x(t) is differentiable at the point t if the limiting value
$$\lim_{\Delta t\to 0} \frac{x(t + \Delta t) - x(t)}{\Delta t}$$
exists and is unique.

This definition is usually expressed verbally in the following way: the
function x(t) is differentiable at the point t if a unique derivative (more
formally, a differential quotient) can be formed. A more pictorial statement
would be: if a unique tangent line to the curve corresponding to the
function x(t) exists at the point t.
A continuous function is not necessarily differentiable, as is illustrated by
the following example. The function
$$x(t) = \begin{cases} \dfrac{1}{3-t} & \text{for } t < 1 \\[1ex] \dfrac{1}{1+t} & \text{for } t \ge 1 \end{cases}$$
which has already been introduced above, is continuous at the point t = 1 .
The derivatives for the two branches of the function are
$$\left.\frac{dx}{dt}\right|_{\text{left}} = \frac{1}{(3-t)^2} \qquad\qquad \left.\frac{dx}{dt}\right|_{\text{right}} = -\frac{1}{(1+t)^2}\,.$$
The derivative at the point t = 1 is (as can be gleaned from the representation
in Fig. 1.11 without a calculation) not unique.

Fig. 1.11. A function which is continuous but not differentiable at t = 1

The reverse statement is, however: a function is continuous at a point $t_0$
if it is differentiable at this point. According to the assumption the relation
$$\lim_{\Delta t\to 0} x(t_0 + \Delta t) = x(t_0)$$
has to hold if the limit of the difference quotient can be evaluated. This
means that the value of the function and the limiting value at this point
must coincide. The differential quotient would otherwise not be defined.
A possible, though rough classification of functions is therefore: the set
of continuous functions is a subset of all possible functions. Differentiable
functions are a subset of the set of continuous functions.
The relevance of the concepts discussed for the functions used
in the discussion of kinematics in Chap. 2 is apparent. Functions which are
supposed to describe the position x(t) and the speed v(t) of a point particle,
should be differentiable. This is necessary for the existence of an acceleration.
Functions, which characterise the acceleration, have to be at least continuous.
Discontinuous functions can nonetheless be used to describe the acceleration,
but this is always an idealisation. For example, a point particle may be sub-
jected to a constant acceleration for some time which is turned off suddenly.
This process will, independent of the details, always be described by a con-
tinuous function (as indicated in Fig. 1.12a). A fast turn-off can in many
cases be approximated by a step function (Fig. 1.12b). This function can be
handled more simply and is not necessarily a bad approximation of reality.

Fig. 1.12. Turning-off processes: (a) real, (b) ideal

1.3 Series expansions

Series expansions are of considerable importance for practical applications in


mathematics and physics. Some functions are even defined mainly in terms
of their series expansions. Problems in physics are often more transparent if
the series expansions of the functions involved are used. The standard form
of a series expansion of a function of one variable is known as a Taylor or a
power series.

1.3.1 Taylor series

Taylor series can be introduced as a step by step approximation of a function


x(t) (see Fig. 1.13), e.g. in the vicinity of the point t = 0. The simplest
approximation uses a straight line. The curve x(t) is replaced by the tangent
line to the curve at the point t = 0.¹

Fig. 1.13. Simple approximation of x(t) at the point t = 0
¹ Derivatives will be denoted by $\frac{dx}{dt} = x'(t)$, respectively $\frac{d^n x}{dt^n} = x^{(n)}(t)$.

$$x(t) \approx x(0) + \left.\frac{dx(t)}{dt}\right|_{t=0} t = x(0) + x'(0)\, t\,.$$
The intercept of the straight line with the axis is x(0), and x'(0) is its gradient.
This approximation is not sufficient for points further away from the point
t = 0 . One possibility to improve matters is an approximation by curves of
higher and higher order
$$x(t) \approx \underbrace{a_0 + a_1 t}_{\text{straight line}} + \underbrace{a_2 t^2}_{\text{parabola}} + \underbrace{a_3 t^3}_{\text{cub.\ parabola}} + \ldots + a_N t^N\,.$$
The general shorthand is
$$x(t) \approx \sum_{n=0}^{N} a_n t^n\,.$$

This ansatz would represent an approximation of the function by a polyno-


mial of N -th degree. With the hope that the function is represented better
and better if more and more terms are included, one arrives in the limit
N → ∞ at the power series ansatz
$$x(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \ldots + a_N t^N + \ldots = \sum_{n=0}^{\infty} a_n t^n\,.$$

This ansatz raises a number of questions:


(i) How can the coefficients an of the power series be determined for a given
function x(t)?
Besides this more practical question, the following question of principle has
to be posed as well:
(ii) Does the limit N → ∞ present any pitfalls? Does the approximation
really improve so that the equal sign can be justified in the limit?
The practical question can be answered quite easily, assuming that the
function x(t) can be differentiated an arbitrary number of times. Set t = 0
in the ansatz and obtain $a_0 = x(0)$. Differentiate the ansatz once
$$x'(t) = a_1 + 2a_2 t + 3a_3 t^2 + \ldots + n\, a_n t^{n-1} + \ldots\,,$$
set t = 0 in this expression and find $a_1 = x'(0)$. Differentiate the ansatz twice
$$x''(t) = 2a_2 + 3\cdot 2\, a_3 t + \ldots + n(n-1)\, a_n t^{n-2} + \ldots$$
and obtain
$$2a_2 = x''(0)\,.$$
Repeat the process once more
$$x'''(t) = 3\cdot 2\, a_3 + 4\cdot 3\cdot 2\, a_4 t + \ldots + n(n-1)(n-2)\, a_n t^{n-3} + \ldots$$
with the result²
$$3!\, a_3 = x'''(0)\,.$$
At this point the general result can be guessed to be
$$a_n = \frac{1}{n!}\, x^{(n)}(0)\,.$$
If required this result could be justified by rigorous mathematical induction.
The expected form of the power series is therefore
$$x(t) = x(0) + x'(0)\, t + \frac{1}{2}\, x''(0)\, t^2 + \ldots$$
or using standard notation
$$x(t) = \sum_{n=0}^{\infty} \frac{x^{(n)}(0)}{n!}\, t^n\,.$$

A series of this form is called the Taylor expansion of the function x(t)
about the position t = 0 .
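A quick, hedged check of the coefficient formula (the use of sympy here is my assumption, not the book's):

```python
# Verify a_n = x^(n)(0)/n! for x(t) = exp(t): every coefficient is 1/n!.
import sympy as sp

t = sp.symbols('t')
x = sp.exp(t)
coeffs = [sp.diff(x, t, n).subs(t, 0) / sp.factorial(n) for n in range(6)]
print(coeffs)                 # [1, 1, 1/2, 1/6, 1/24, 1/120]
print(sp.series(x, t, 0, 6))  # 1 + t + t**2/2 + ... + O(t**6)
```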
The argument presented does, however, present a problem. The ansatz has
been differentiated term by term without further thought. This is permitted
in the case of a polynomial. Whether this is also permitted for an infinite
series is the content of question (ii): to what extent can the equal sign between
the actual function and the Taylor expansion really be justified? Alternatively
the question could be rephrased as: for which range of t-values can the equal
sign be guaranteed? The answer to these questions will be postponed for some
time. It is preferable to look first at some examples of Taylor series without
consideration of finer points.
• The derivatives of the exponential function $x(t) = e^t$,
$$x^{(n)}(t) = e^t \qquad\text{and}\qquad x^{(n)}(0) = 1\,,$$
give for the Taylor series about the origin
$$e^t = 1 + t + \frac{1}{2!}\, t^2 + \frac{1}{3!}\, t^3 + \ldots = \sum_{n=0}^{\infty} \frac{1}{n!}\, t^n\,.$$
This series has e.g. been used for the discussion of free fall with friction
(Chap. 2) in the form (replace $t \to -kt$)
$$e^{-kt} = \sum_{n=0}^{\infty} (-1)^n\, \frac{k^n}{n!}\, t^n\,.$$

• The derivatives of $x(t) = \sin t$ are
$$x(t) = \sin t \qquad x'(t) = \cos t \qquad x''(t) = -\sin t \qquad x'''(t) = -\cos t \qquad x^{(4)}(t) = \sin t \qquad \ldots$$
respectively
$$x(0) = 0 \qquad x'(0) = 1 \qquad x''(0) = 0 \qquad x'''(0) = -1 \qquad x^{(4)}(0) = 0 \qquad \ldots\,.$$

² Recall the definition of the factorial $n! = 1 \cdot 2 \cdot 3 \cdots (n-1) \cdot n$.
All even powers of the series expansion vanish. This expresses the fact that
sin t is an odd function. The series is therefore
$$\sin t = t - \frac{1}{3!}\, t^3 + \frac{1}{5!}\, t^5 - \frac{1}{7!}\, t^7 + \ldots$$
or in general
$$\sin t = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, t^{2n+1}\,.$$
• The following example shows that the evaluation of the necessary deriva-
tives can be quite wearisome. The derivatives of the function x(t) = tan t are
$$x(t) = \tan t \qquad\qquad\qquad x(0) = 0$$
$$x'(t) = \frac{1}{\cos^2 t} \qquad\qquad\qquad x'(0) = 1$$
$$x''(t) = \frac{2\sin t}{\cos^3 t} \qquad\qquad\qquad x''(0) = 0$$
$$x'''(t) = \frac{2\,(1 + 2\sin^2 t)}{\cos^4 t} \qquad\qquad x'''(0) = 2\,.$$
The calculation becomes more and more involved from this point on (try it!).
The Taylor series of the tangent function
$$\tan t = t + \frac{1}{3}\, t^3 + \frac{2}{15}\, t^5 + \frac{17}{315}\, t^7 + \frac{62}{2835}\, t^9 + \ldots$$
is, for this reason, obtained by a combination of the power series of sin t
and cos t . This is an example for the fact that many Taylor series are not
obtained by direct evaluation of higher order derivatives. It is more useful
to assemble a collection of rules which allow the construction of the series of
more complicated functions from the series of simpler functions.
• A power series that is used often is the binomial series. This series
corresponds to the Taylor series of the function
$$x(t) = (1+t)^\alpha \qquad \alpha\ \text{arbitrary, real}\,.$$
The calculation of the coefficients is elementary but involves some paperwork:
$$x(t) = (1+t)^\alpha \qquad\qquad\qquad x(0) = 1$$
$$x'(t) = \alpha\, (1+t)^{\alpha-1} \qquad\qquad\quad x'(0) = \alpha$$
$$x''(t) = \alpha(\alpha-1)(1+t)^{\alpha-2} \qquad x''(0) = \alpha(\alpha-1)$$
$$\vdots$$
The n-th derivative is
$$x^{(n)}(t) = \alpha(\alpha-1)\cdots(\alpha-n+1)\,(1+t)^{\alpha-n}\,,$$
with the value $x^{(n)}(0) = \alpha(\alpha-1)\cdots(\alpha-n+1)$ for t = 0. The resulting
series is
$$(1+t)^\alpha = 1 + \alpha\, t + \frac{\alpha(\alpha-1)}{2!}\, t^2 + \ldots = \sum_{n=0}^{\infty} \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!}\, t^n = \sum_{n=0}^{\infty} \binom{\alpha}{n} t^n\,.$$

The coefficients can be expressed in terms of the binomial symbol
$$\binom{\alpha}{n} = \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!}\,.$$
Many useful formulae of physics are obtained with this series, for example
the expansion of the typical relativistic expression
$$\left(1 - \left(\frac{v}{c}\right)^2\right)^{-1/2} = \frac{1}{\sqrt{1 - (v/c)^2}}\,.$$
Consideration of the first terms of the series is often sufficient for small values
of v/c. The replacement $\alpha \to -1/2$, $t \to -(v/c)^2$ gives
$$\left(1 - \left(\frac{v}{c}\right)^2\right)^{-1/2} \approx 1 + \frac{1}{2}\left(\frac{v}{c}\right)^2 + \frac{3}{8}\left(\frac{v}{c}\right)^4 + \ldots\,.$$
An important special case of the binomial series is the geometrical series. This is the series for $\alpha = -1$ and $t = -z$
$$\frac{1}{1-z} = 1 + z + z^2 + \ldots = \sum_{n=0}^{\infty} z^n\,.$$

The binomial series breaks off if $\alpha$ is a positive integer ($\alpha = m$). There results
the well-known binomial formula
$$(1+t)^m = \sum_{n=0}^{m} \binom{m}{n} t^n\,.$$

The binomial coefficients can be expressed fully in terms of factorials if $\alpha$ is
an integer
$$\binom{m}{n} = \frac{m!}{n!\,(m-n)!} \qquad m \ge n \quad (0! = 1)\,.$$
Approximations or Taylor series of functions in the vicinity of an arbitrary
point t0 instead of t = 0 can also be considered. The Taylor ansatz for the
expansion of a function about the position t0 is
$$x(t) = \sum_{n=0}^{\infty} b_n\, (t - t_0)^n\,.$$

Differentiation of this expression (if possible) any number of times yields
(with the same restriction as for the special case $t_0 = 0$)
$$b_n = \frac{1}{n!}\, x^{(n)}(t_0) = \frac{1}{n!} \left.\frac{d^n x(t)}{dt^n}\right|_{t=t_0}\,.$$
The question whether the power series discussed above really
represent the functions is a more protracted task. It will be answered under
the heading ’convergence criteria’ (at least in outline). The first step in this
direction is the discussion of numerical series, a form that is of interest in
mathematics and physics in its own right.

1.3.2 Numerical series or number series

Numerical series are denoted generically as
$$\sum_{n=0}^{\infty} u_n = u_0 + u_1 + u_2 + \ldots\,.$$

The individual terms represent numbers. Numerical series can be obtained by
insertion of a definite value for the variable t in a power series, as for example
$$e = \sum_{n=0}^{\infty} \frac{1}{n!} = 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \frac{1}{120} + \ldots\,.$$
They can, however, also be obtained in different ways.
Some often quoted examples of numerical series are:
• The harmonic series
$$\sum_{n=1}^{\infty} \frac{1}{n} = 1 + \frac{1}{2} + \frac{1}{3} + \ldots\,,$$
• the alternating harmonic series
$$\sum_{n=1}^{\infty} (-1)^{n+1}\, \frac{1}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \ldots$$
• and the Leibniz series
$$\sum_{n=0}^{\infty} (-1)^n\, \frac{1}{2n+1} = 1 - \frac{1}{3} + \frac{1}{5} - \ldots\,.$$

1.3.2.1 The calculation of the cumulative values. It is possible to


calculate the cumulative value (the sum) of a numerical series directly or
more skilfully. The series is called convergent if the value of the sum is unique
and finite. If this is not the case the series is called divergent. Cumulative
summation is, however, not a reliable method to assess the convergence of a
numerical series.
The discussion of the convergence of numerical series is generally based
on the following quantities
$$S_0 = u_0\,,\qquad S_1 = u_0 + u_1\,,\qquad \ldots\,,\qquad S_k = u_0 + u_1 + \ldots + u_k = \sum_{n=0}^{k} u_n\,.$$

These quantities are called subtotals or partial sums. The partial sums form
a sequence
$$S_0,\ S_1,\ S_2,\ \ldots,\ S_k,\ \ldots\,.$$
The numerical series possesses a unique and finite sum value if the sequence
of the partial sums converges towards a finite (and unique) limit
$$S = \lim_{k\to\infty} S_k\,.$$

The discussion of the convergence of numerical series is reduced in this fashion


to the discussion of the convergence of sequences (see Math.Chap. 1.2.3). The
investigation of convergence would be relatively simple if it were possible to
find a closed expression for the partial sums $S_k$. Unfortunately this is
possible only in a few cases.
Some explicit examples for the direct evaluation of partial sums of nu-
merical series show that a more rigorous investigation of the behaviour of the
partial sums is often a necessity.
• The series for the number $e = \sum_{n=0}^{\infty} \frac{1}{n!}$:

  k  :  2     4           6            8            10
  S_k:  2.5   2.708...    2.71806...   2.71828...   2.7182818...

The partial sums of this series converge very quickly. The partial sum $S_{10}$
reproduces the exact value $S = e = 2.718281828\ldots$ up to 9 digits.
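A few lines of Python (my own sketch) reproduce this table:

```python
# Partial sums S_k of the series e = sum 1/n! up to k = 10.
import math

s = 0.0
for n in range(11):
    s += 1.0 / math.factorial(n)
    if n in (2, 4, 6, 8, 10):
        print(n, s)
print("exact:", math.e)
```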
• The alternating harmonic series $\sum_{n=1}^{\infty} (-1)^{n+1}\, \frac{1}{n}$:

  k  :  2     4            6            10           20           30
  S_k:  0.5   0.58333...   0.61666...   0.64563...   0.66877...   0.67675...

This series converges very slowly. The value of the sum is known to be
S = ln 2 = 0.693147 . . . .
• The harmonic series $\sum_{n=1}^{\infty} \frac{1}{n}$:

  k  :  2     10      20      30      40
  S_k:  1.5   2.929   3.598   3.995   4.279   (rounded values)
It looks as if the sequence of partial sums converges if ever so slowly. This
conjecture is, however, wrong. The cumulative value of the harmonic series
is ∞ . The series is divergent.
In order to prove this statement the series
$$\sum_{n=1}^{\infty} v_n = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{4} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \underbrace{\frac{1}{16} + \ldots + \frac{1}{16}}_{8\ \text{terms}} + \underbrace{\frac{1}{32} + \ldots + \frac{1}{32}}_{16\ \text{terms}} + \ldots$$
can be compared with the harmonic series
$$\sum_{n=1}^{\infty} u_n = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} + \frac{1}{9} + \ldots + \frac{1}{16} + \frac{1}{17} + \ldots + \frac{1}{32} + \ldots\,.$$
Each term of the harmonic series is larger than or equal to the corresponding term of the comparative series, $v_n \le u_n$. The comparative series can
be rewritten in a different fashion
$$\sum_{n=1}^{\infty} v_n = 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \ldots \longrightarrow \infty\,.$$
The harmonic series diverges in view of the relation $\sum^{N} v_n < \sum^{N} u_n$ for
each $N \ge 3$.
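A hedged numerical aside (my own illustration): the partial sums of the harmonic series grow like $\ln k + \gamma$ (Euler's constant $\gamma \approx 0.5772$), which makes the slow divergence visible.

```python
# Harmonic partial sums next to ln(k) + gamma: they track each other
# and grow without bound.
import math

gamma = 0.5772156649
s = 0.0
for k in range(1, 10**6 + 1):
    s += 1.0 / k
    if k in (10, 100, 10**4, 10**6):
        print(k, round(s, 4), round(math.log(k) + gamma, 4))
```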
The estimate of convergence on the basis of a direct evaluation of the partial
sums has to be regarded with caution if no closed expression for the partial
sums is available. It is necessary to establish more general criteria for the
investigation of convergence.

1.3.3 Convergence criteria


1.3.3.1 Convergence criteria for numerical series. The starting point
for the discussion of convergence criteria for numerical series is the ma-
jorant criterion (or majorant test). It corresponds to the inverse of the
argumentation which has been used in the discussion of the harmonic series
above. The criterion states
The sum $\sum u_n$ is convergent if $0 \le u_n \le v_n$ for all $n > N$ and if
$\sum v_n$ converges.

The formal proof, which will not be given here, follows if the general criterion
for the convergence of sequences is transcribed to the case of a sequence of
partial sums.
The complementary statement, which has been used above, is

The sum $\sum u_n$ is divergent if $0 \le v_n \le u_n$ for all $n > N$ and if
$\sum v_n$ diverges.

It should be noted that the condition ’for all n > N’ allows the possibility that
a finite number of terms of the series $\sum u_n$ may be larger (in the first case) or
smaller (in the second case) than the corresponding terms of the comparative
series.
It is not useful to try to find a special comparative series, for which convergence or divergence has been established, for every given series $\sum u_n$. It is
more economical to use some standard series, which lead to simpler criteria,
for the comparison envisaged. A useful comparative series is the geometric
series
$$\sum_{n=0}^{\infty} t^n = 1 + t + t^2 + \ldots\,,$$
as it is possible to obtain an explicit expression for the partial sums. This
expression can be calculated with the following argument. Begin with
$$S_k = 1 + t + \ldots + t^k$$
$$t\, S_k = \phantom{1 + {}} t + \ldots + t^k + t^{k+1}$$
and subtract to find
$$S_k\, (1-t) = 1 - t^{k+1}$$
or
$$S_k = \frac{1 - t^{k+1}}{1-t} \qquad \text{if } (1-t) \neq 0\,.$$
The series is divergent for $|t| \ge 1$. On the other hand the limiting value is
$$\lim_{k\to\infty} S_k = \lim_{k\to\infty} \frac{1 - t^{k+1}}{1-t} = \frac{1}{1-t}$$
for |t| < 1 . By comparison with the geometric series it is possible to establish
the very useful root criterion (or root test), which states

The series $\sum u_n$ is convergent if there exists a number $q$ with
$0 < q < 1$, so that for all $n > N$ the relation
$$\sqrt[n]{|u_n|} \le q < 1$$
is satisfied.

A proof of the criterion can be given in the following fashion. Write the
condition stated in the criterion in the form
$$|u_n| \le q^n \qquad \text{for } n > N\,.$$
The partial sum
$$S_{N+p} = S_N + u_{N+1} + \ldots + u_{N+p}$$
can be majorised by the absolute values
$$|S_{N+p}| \le |S_N| + |u_{N+1}| + \ldots + |u_{N+p}|\,.$$
Use then the conditions of the criterion to find
$$|S_{N+p}| \le |S_N| + q^{N+1}\left(1 + \ldots + q^p\right)$$
and sum the additional terms
$$|S_{N+p}| \le |S_N| + q^{N+1}\, \frac{1 - q^{p+1}}{1-q}\,.$$
The finite result
$$|S| \le |S_N| + \frac{q^{N+1}}{1-q}$$
can be obtained in the limit $p \to \infty$ provided the condition $q < 1$ is satisfied.
Please note again that it is only necessary that all terms with n > N (N
a finite integer) satisfy the condition. Naturally, all terms with n ≤ N have
to be finite.
The related quotient criterion or quotient test can be demonstrated
in a comparable fashion. The formulation is similar, only the actual condition
is replaced by
$$\left|\frac{u_{n+1}}{u_n}\right| \le q < 1 \qquad \text{for all } n > N\,.$$

The complementary statement of the majorant criterion allows the formulation of corresponding criteria for divergence, as for example:

The series $\sum u_n$ is divergent for $\sqrt[n]{|u_n|} > 1$ or $\left|\dfrac{u_{n+1}}{u_n}\right| > 1$.

Application of the quotient test for the series defining the number e yields
$$\left|\frac{u_{n+1}}{u_n}\right| = \left|\frac{n!}{(n+1)!}\right| = \left|\frac{1}{n+1}\right| < 1 \qquad (\text{for } n > 0)\,.$$
The series converges. In particular one finds
$$\lim_{n\to\infty} \left|\frac{u_{n+1}}{u_n}\right| \longrightarrow 0 < 1\,.$$

The application of the criterion for the case of the harmonic series
(whether alternating or not) gives
$$\left|\frac{u_{n+1}}{u_n}\right| = \left|\frac{n}{n+1}\right|\,.$$
This is smaller than 1 if n is finite, but the limiting value is
$$\lim_{n\to\infty} \frac{n}{n+1} = 1\,.$$

This is exactly the case which is excluded in the criteria. The quotient is
neither smaller nor greater than 1. It is not possible to conclude whether
the series is convergent (as the alternating series) or divergent (as the direct
harmonic series) on the basis of the quotient test.
The criteria represent sufficient but not necessary conditions for convergence or divergence. Variants and refined criteria, which can be found in the
mathematical literature, might be more adequate for special cases. The cri-
teria do not help in any way with the calculation of the values of the infinite
sums. This can involve a good deal of work. The criteria will, however, tell
you whether the attempt is worth your while.

1.3.3.2 Convergence criteria for Taylor series. The question, under


which conditions a function is represented by a Taylor series, has not been
answered. The answer to this question follows from the discussion of the
convergence of power series.
A basic (though somewhat trivial) statement concerning the convergence
of power series is:

A power series $\sum_n a_n t^n$ is convergent in an interval $|t| < r$ if it
converges for each t-value of this interval.

This statement reduces the consideration of the convergence of a power series


to the consideration of the convergence of a numerical series (for each t -value
of the interval). The number R, which characterises the largest interval about
the point t = 0 for which a power series still converges, is called the radius
of convergence (see Fig. 1.14). The root or the quotient test can be used
to determine the radius of convergence.

Fig. 1.14. Illustration of the radius of convergence

The statement

$$|t| < \frac{1}{\sqrt[n]{a_n}} \qquad \text{and in particular} \qquad |t| < \lim_{n\to\infty} \frac{1}{\sqrt[n]{a_n}} = R$$

follows from the condition
$$\sqrt[n]{|a_n t^n|} < 1$$
of the root test. The quotient test yields the statement
$$|t| < \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right| = R$$
because of
$$\left|\frac{a_{n+1}\, t^{n+1}}{a_n\, t^n}\right| < 1\,.$$
If the complementary criteria for divergence are included in the discussion,
the role of the radius of convergence can be summarised in the form:

A power series converges for |t| < R . It diverges for |t| > R. No
statement can be made for |t| = R.

The radius of convergence of the power series indicated in Math.Chap. 1.3.1


can be stated readily. It is $R = \infty$ for the exponential series as
$$\sum_{n=0}^{\infty} \frac{t^n}{n!} \qquad \text{leads to} \qquad R = \lim_{n\to\infty} |n+1| = \infty\,.$$

The value of the infinite sum is finite for every (even a very large) t-value with
$t < \infty$. This might sound astonishing, for example in view of the exponential
series with t = 100. The series starts with
$$e^{100} = 1 + 10^2 + \frac{10^4}{2} + \frac{10^6}{6} + \frac{10^8}{24} + \frac{10^{10}}{120} + \ldots\,.$$
The individual terms decrease, however, after the largest term $10^{200}/100!$
with the approximate value $10^{200}/100! \approx 10^{42}$ has been reached. The value
of the infinite sum for $e^{100}$ is
$$e^{100} \approx 3 \cdot 10^{43}\,.$$
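A sketch illustrating this behaviour numerically (my own, with the term recursion $t^n/n! = (t^{n-1}/(n-1)!)\cdot t/n$ to avoid overflow):

```python
# Terms 100^n/n! of the series for e^100: they grow up to n ~ 100 (~1e42)
# and then fall off fast enough for the sum to converge (~3e43).
from math import fsum

terms = [1.0]
for n in range(1, 400):
    terms.append(terms[-1] * 100.0 / n)  # t^n/n! via the term recursion
peak = max(range(len(terms)), key=lambda n: terms[n])
print("largest term near n =", peak, "~", terms[peak])
print("partial sum ~", fsum(terms))
```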
The radius of convergence of the series for the sine-function
$$\sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\, t^{2n+1}$$
is, as a consequence of the corresponding structure of the individual terms,
also $R = \infty$. It is
$$R = \lim_{n\to\infty} \left|\frac{n+1}{\alpha - n}\right| = 1$$

for the binomial series (see the discussion of the convergence of the geometric
series).

1.3.3.3 Justification of the Taylor formula. The question whether the
equal sign in the ansatz for the Taylor series is really valid can now be answered in the following fashion. The discussion of the power series $\sum_{n=0}^{\infty} a_n t^n$
in the last section shows that a finite sum value can be expected for every
t within the interval of convergence. This statement is valid independent of
the method with which the coefficients have been obtained. The central the-
orem (slightly abbreviated), that justifies the special Taylor form quoted in
Math.Chap. 1.3.1, is

A power series, which represents a function x(t) , can be differen-


tiated term by term within the interval of convergence.

The longish proof of this theorem requires the steps (consult the mathematical literature):
(i) Show that the series $\varphi(t) = \sum_n n\, a_n t^{n-1}$ has the same radius of convergence as the series $x(t) = \sum_n a_n t^n$.
(ii) Show that the function $\varphi(t)$, defined by the power series $\sum_n n\, a_n t^{n-1}$,
is the derivative of the function $x(t)$, that is $\varphi(t) = dx(t)/dt$.
The Taylor formula
$$a_n = \frac{1}{n!}\, x^{(n)}(0)$$
has been obtained via term by term differentiation. The theorem therefore
guarantees, that the function x(t) , which has been used to generate the coef-
ficients, is really represented by the series within the interval of convergence.
Three remarks conclude this relatively condensed discussion of series ex-
pansions.
(1) The statements concerning the series expansion about the origin of a
coordinate system (t = 0) can be transferred to the case of an expansion about
a point $t_0$. For example, the radius of convergence of the series
$$x(t) = \sum_n b_n\, (t - t_0)^n = \sum_n \frac{x^{(n)}(t_0)}{n!}\, (t - t_0)^n$$
is determined by
$$|t - t_0| < R = \lim_{n\to\infty} \frac{1}{\sqrt[n]{|b_n|}}\,.$$

(2) Convergent power series can (nearly) be manipulated in the same way
as numbers. As an example, the rule for the multiplication of two power series
is: the product of two power series $x(t) = \sum_n a_n t^n$ and $y(t) = \sum_n b_n t^n$ can
be represented as a power series
$$x(t)\, y(t) = \left(\sum_n a_n t^n\right)\left(\sum_n b_n t^n\right) = \sum_m c_m t^m$$

within the common interval of convergence. The coefficients cn are obtained


by direct multiplication and comparison of the coefficients of the same powers
of t
$$c_0 = a_0 b_0$$
$$c_1 = a_0 b_1 + a_1 b_0$$
$$c_2 = a_0 b_2 + a_1 b_1 + a_2 b_0$$
$$\vdots$$
$$c_n = a_0 b_n + a_1 b_{n-1} + \ldots + a_{n-1} b_1 + a_n b_0\,.$$
This rule allows the calculation of power series of functions for which the
application of the direct Taylor formula would be rather tedious. The series
for the tangent function discussed above can be obtained with the ansatz
$$\tan t = \frac{\sin t}{\cos t} = \sum_n c_n t^n\,.$$

This is sorted in the form
$$\sin t = \left(\sum_n c_n t^n\right) \cos t\,.$$

After insertion of the series expansion for cos t the two series on the right
hand side are multiplied term by term. Comparison of the factors of tn with
the expansion for sin t yields recursion relations for the coefficients cn . The
radius of convergence of the tangent-series is R(tan) = π/2 as the function
cos t has the value zero for |t| = π/2 .
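A small sketch of this comparison-of-coefficients procedure (my own illustration, using exact fractions):

```python
# With sin t = (sum_n c_n t^n) cos t, comparing the coefficient of t^n gives
# s_n = sum_k c_k g_(n-k), where s_n, g_n are the sin/cos series coefficients;
# solve for c_n since g_0 = 1.
from fractions import Fraction
from math import factorial

N = 10
s = [Fraction((-1) ** (k // 2), factorial(k)) if k % 2 else Fraction(0) for k in range(N)]
g = [Fraction((-1) ** (k // 2), factorial(k)) if k % 2 == 0 else Fraction(0) for k in range(N)]
c = []
for n in range(N):
    c.append(s[n] - sum(c[k] * g[n - k] for k in range(n)))
print(c)  # [0, 1, 0, 1/3, 0, 2/15, 0, 17/315, 0, 62/2835]
```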
(3) Series constructed from more general functions are also of interest in
physics
$$x(t) = \sum_{n=0}^{\infty} f_n(t) = f_0(t) + f_1(t) + \ldots\,.$$

Each of the functions is associated with a corresponding natural number.


An example is the Fourier series, which will be introduced in the next
section. A possible hierarchy among series could therefore be indicated in the
following way. A power series is a special function series with $f_n(t) = a_n t^n$.
A numerical series is obtained if a special value for the variable t is inserted
into a function series or a power series.

1.3.4 Fourier series

Fourier series constitute a tool which allows the representation and the anal-
ysis of periodic processes both in space and/or in time. They are, for this
reason, of particular interest in physics for the description of oscillations or
wave propagation.

A (periodic) function with the period 2L is characterised by the functional


equation
f (x) = f (x + 2L) .
Such functions can be represented by an expansion in terms of sine and cosine
functions. The standard form of this series, a Fourier series, is
$$f(x) = \frac{1}{2}\, a_0 + \sum_{n=1}^{\infty} a_n \cos\frac{n\pi x}{L} + \sum_{n=1}^{\infty} b_n \sin\frac{n\pi x}{L}\,.$$
The periodicity is guaranteed by the relations
$$\cos\frac{n\pi(x+2L)}{L} = \cos\left(\frac{n\pi x}{L} + 2n\pi\right) = \cos\frac{n\pi x}{L}$$
$$\sin\frac{n\pi(x+2L)}{L} = \sin\left(\frac{n\pi x}{L} + 2n\pi\right) = \sin\frac{n\pi x}{L}\,.$$
An alternative representation of Fourier series, which can be handled more
easily, is based on the use of complex exponential functions. The relations
(see Math.Chap. 7.3)
$$\cos x = \frac{1}{2}\left(e^{ix} + e^{-ix}\right) \qquad\qquad \sin x = \frac{1}{2i}\left(e^{ix} - e^{-ix}\right)$$
are used for a transition between the two representations. The complex form
of the series is
$$f(x) = \sum_{n=-\infty}^{\infty} c_n\, e^{in\pi x/L}\,.$$

The coefficients of the two representations are related by
$$c_n = \frac{1}{2}\,(a_n - i b_n) \qquad \text{with } a_{-n} = a_n\,,\ b_{-n} = -b_n \ \text{and}\ b_0 = 0\,.$$
1.3.4.1 Convergence of Fourier series. The question of convergence of
these series has to be dealt with before practical aspects, as the calculation of
the expansion coefficients for a given periodic function f (x), are approached.
The first step in this direction is the investigation of partial sums $S_N(x)$ with
$$S_N(x) = \frac{1}{2}\, A_0 + \sum_{n=1}^{N} A_n \cos\frac{n\pi x}{L} + \sum_{n=1}^{N} B_n \sin\frac{n\pi x}{L}\,.$$
The following argument shows that a partial sum with the coefficients
$$A_n = \frac{1}{L}\int_{-L}^{L} \mathrm{d}x\, f(x) \cos\frac{n\pi x}{L}$$
$$B_n = \frac{1}{L}\int_{-L}^{L} \mathrm{d}x\, f(x) \sin\frac{n\pi x}{L}$$
constitutes an optimal approximation of the periodic function f(x) in the
average. Approximation in the average implies a minimisation of the mean
square deviation (MSD)
$$\mathrm{MSD} = \int_{-L}^{L} \mathrm{d}x\, \bigl(f(x) - S_N(x)\bigr)^2$$

in which the full Fourier representation of f(x) is to be inserted. The ’orthogonality relations’ of the trigonometric functions³
$$\int_{-L}^{L} \mathrm{d}x\, \cos\frac{n\pi x}{L}\cos\frac{m\pi x}{L} = \int_{-L}^{L} \mathrm{d}x\, \sin\frac{n\pi x}{L}\sin\frac{m\pi x}{L} = L\,\delta_{n,m}\,,$$
and
$$\int_{-L}^{L} \mathrm{d}x\, \cos\frac{n\pi x}{L}\sin\frac{m\pi x}{L} = 0$$
are needed for the evaluation of the mean square deviation. These relations
can be obtained by
• the substitution y = (πx)/L with dx = (L dy)/π and the limits −π and π
as well as
• a rewriting of the integrand with the addition theorem, as e.g.
$$\sin(ny)\sin(my) = \frac{1}{2}\bigl(\cos(n-m)y - \cos(n+m)y\bigr)$$
• and the basic integrals (with integer M)
$$\int_{-\pi}^{\pi} \mathrm{d}y\, \cos My = 2\pi\,\delta_{M,0} \qquad\qquad \int_{-\pi}^{\pi} \mathrm{d}y\, \sin My = 0\,.$$

Evaluation of the definition of the mean square deviation yields
$$\frac{1}{L}\,\mathrm{MSD} = \frac{1}{L}\int_{-L}^{L} \mathrm{d}x\, f(x)^2 - \sum_{n=1}^{N} a_n^2 - \sum_{n=1}^{N} b_n^2 - \frac{a_0^2}{2} + \sum_{n=1}^{N} (a_n - A_n)^2 + \sum_{n=1}^{N} (b_n - B_n)^2 + \frac{(a_0 - A_0)^2}{2}\,.$$

This expression is minimal if the last three (positive) terms vanish. This
requires
$$a_n = A_n \quad\text{and}\quad b_n = B_n \qquad \text{for } n \le N\,.$$
A partial sum $S_N$ with coefficients $A_n$ and $B_n$, which are calculated as indicated above, does provide the best mean approximation of a given periodic
function f(x) for each value of N. The mean square deviation is positive
definite. This allows one to state the inequality
³ The Kronecker symbol $\delta_{n,m}$ takes the values 1 for $n = m$ and 0 for $n \neq m$.
$$\sum_{n=1}^{N} a_n^2 + \sum_{n=1}^{N} b_n^2 + \frac{a_0^2}{2} \le \frac{1}{L}\int_{-L}^{L} \mathrm{d}x\, f(x)^2 \qquad \text{for all } N\,,$$

which is known as Bessel’s inequality. The inequality is also valid in the


limit $N \to \infty$ as long as
$$\int_{-L}^{L} \mathrm{d}x\, f(x)^2 < \infty\,.$$

The transition from the partial sums $S_N(x)$ to the Fourier series
$$f(x) = \frac{1}{2}\, a_0 + \sum_{n=1}^{\infty} a_n \cos\frac{n\pi x}{L} + \sum_{n=1}^{\infty} b_n \sin\frac{n\pi x}{L}$$
with the coefficients
$$a_n = \frac{1}{L}\int_{-L}^{L} \mathrm{d}x\, f(x) \cos\frac{n\pi x}{L} \qquad\text{and}\qquad b_n = \frac{1}{L}\int_{-L}^{L} \mathrm{d}x\, f(x) \sin\frac{n\pi x}{L}$$
in the limit $N \to \infty$ demands a more extensive discussion. It is necessary
to demonstrate that this series converges absolutely and uniformly in the
basic interval⁴. Only in this case is it possible to calculate the coefficients of
the series with a term by term integration. Uniform convergence implies in
mathematical language that there exists for each $\varepsilon > 0$ a partial sum $S_N(x)$
so that $|f(x) - S_N(x)| < \varepsilon$ for all x of the interval. The Fourier series will
represent the function in the basic interval, or a finite number of intervals in
which it is continuous, if these conditions are met. At points with a step of
the function the series yields the number
$$\frac{1}{2}\lim_{\varepsilon\to 0}\bigl(f(x+\varepsilon) + f(x-\varepsilon)\bigr)\,.$$
1.3.4.2 An explicit example. The calculation of the Fourier representa-
tion of a periodic function (if convergence is assured) implies the evaluation
of the integrals for the coefficients an and bn for all n . The coefficients bn
(or an ) will vanish for all even (or odd) functions f (x) as the sine-function is
odd and the cosine-function is even. This implies that only the coefficients
$$b_n = \frac{1}{L}\int_{-L}^{L} \mathrm{d}x\, x \sin\frac{n\pi x}{L}$$
have to be evaluated for the saw tooth function (see Chap. 4.2.4)
$$f(x) = x \qquad \text{for } -L \le x \le L\,.$$
The result for these coefficients is
$$b_n = \frac{1}{L}\left[\left(\frac{L}{n\pi}\right)^2 \sin\frac{n\pi x}{L} - \frac{Lx}{n\pi}\cos\frac{n\pi x}{L}\right]_{-L}^{L} = \frac{2L}{\pi n}\,(-1)^{n+1}\,.$$
⁴ Alternatively: in the case of singular points in the basic interval, as e.g. steps of the function, in each closed part interval.
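A hedged numerical check of the sawtooth coefficients (my own sketch): the quadrature of the defining integral against the closed form, using scipy's quad.

```python
# b_n = (1/L) * integral of x*sin(n*pi*x/L) over [-L, L]
# versus the closed form 2L*(-1)**(n+1)/(pi*n).
import numpy as np
from scipy.integrate import quad

L = 1.0
for n in range(1, 6):
    val, _ = quad(lambda x: x * np.sin(n * np.pi * x / L), -L, L)
    print(n, val / L, 2 * L * (-1) ** (n + 1) / (np.pi * n))
```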
The last topic to be addressed under the heading ’Analysis I’ is integration.

1.4 Integration

Actually only one subitem will be discussed here: improper integrals. It is


assumed that the reader is familiar with standard integration techniques.
Topics, with which he/she should be familiar, are
1. The definite (Riemann) integral.
2. The indefinite integral (the primitive).
3. Rules of integration.
4. Methods of integration, in particular
partial integration,
rules of substitution,
expansion into partial fractions.
An acceptable practical integration technique is the use of collections of definite and indefinite integrals⁵ such as
Gröbner-Hofreiter, Integraltafeln I, II
Gradshteyn-Ryzhik, Table of Integrals, Series and Products.

1.4.1 Improper integrals

A definite integral
$$I(a,b) = \int_a^b x(t)\, \mathrm{d}t$$
is normally discussed under the conditions
a) The interval of integration is finite.
b) The integrand is bounded |x(t)| < M < ∞ for a ≤ t ≤ b .
It is, however, possible in certain cases to go beyond these restrictions. These
cases will be introduced in terms of explicit examples rather than by a rig-
orous approach.

1.4.1.1 Infinite intervals of integration. The first example is the integral


$$I = \int_0^\infty e^{-\lambda t}\, \mathrm{d}t \qquad \lambda > 0\,.$$
The integrand is a decreasing exponential function, the interval of integration
is infinite. It is not easy to estimate on the basis of the graphical representa-
tion in Fig. 1.15 whether the area under the curve is finite or not. For this
⁵ See also the List of Literature.
1.4 Integration 29

Fig. 1.15. Indication of the integral $\int_0^\infty e^{-\lambda t}\, \mathrm{d}t$

reason a more careful definition of such integrals is required
$$I = \lim_{b\to\infty} \int_0^b x(t)\, \mathrm{d}t\,.$$

Calculation of this integral with $x(t) = e^{-\lambda t}$ gives
$$I = \lim_{b\to\infty} \left(-\frac{1}{\lambda}\left(e^{-\lambda b} - 1\right)\right)\,.$$
The limiting value of the exponential for $\lambda > 0$ is
$$\lim_{b\to\infty} e^{-\lambda b} = 0\,,$$
so that the integral takes the value $I = 1/\lambda$.
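A minimal numerical sketch (my own; scipy's quad accepts an infinite upper limit via np.inf):

```python
# Verify I = 1/lambda for the improper integral of exp(-lambda*t) on [0, inf).
import numpy as np
from scipy.integrate import quad

lam = 2.5
value, err = quad(lambda t: np.exp(-lam * t), 0.0, np.inf)
print(value, 1.0 / lam)  # both ~0.4
```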


The integrand of the second example
$$I = \int_0^\infty \cos t\, \mathrm{d}t$$

produces alternating positive and negative contributions (Fig. 1.16) to the


integral. With the definition in terms of a limiting process one obtains

Fig. 1.16. Improper integral with cos t

$$I = \lim_{b\to\infty} \int_0^b \cos t\, \mathrm{d}t = \lim_{b\to\infty}\, [\sin b - 0]$$

and finds that the limiting value is not defined. This improper integral does
not exist.
The arguments can be summed up in the form: improper integrals with an
infinite interval of integration can be defined rigorously in terms of a limiting
process. On the basis of this process one distinguishes the cases
$$I = \lim_{b\to\infty}\int_0^b x(t)\,\mathrm{d}t \longrightarrow \begin{cases} \text{finite} & \longrightarrow\ \text{convergence} \\ \text{does not exist} & \longrightarrow\ \text{divergence} \\ \infty & \longrightarrow\ \text{divergence}\,. \end{cases}$$

1.4.1.2 Unbounded integrands. The integrand is not bounded for a sec-


ond species of improper integrals. One example of this kind is
$$I = \int_0^1 \frac{\mathrm{d}t}{t^\alpha} \qquad \alpha > 0\,,\ \alpha \neq 1\,.$$
Fig. 1.17. The function $1/t^\alpha$

The integrand is indicated in Fig. 1.17. It is not bounded at the lower limit
in this example. Again, a more cautious approach using limiting values is
required. The following limiting value
$$I = \lim_{\varepsilon\to 0} \int_\varepsilon^1 \frac{\mathrm{d}t}{t^\alpha}$$

is used for the present example with the result
$$I = \lim_{\varepsilon\to 0}\, \frac{1}{(1-\alpha)}\, t^{1-\alpha}\Big|_{\varepsilon}^{1} = \lim_{\varepsilon\to 0}\, \frac{1}{(1-\alpha)}\left(1 - \varepsilon^{1-\alpha}\right)\,.$$

The limiting value of the term $\varepsilon^{1-\alpha}$ is
$$\lim_{\varepsilon\to 0} \varepsilon^{1-\alpha} \begin{cases} = 0 & \text{for } 1-\alpha > 0 \\ \to \infty & \text{for } 1-\alpha < 0 \end{cases}$$
and one concludes: the improper integral is convergent for $0 < \alpha < 1$ and has
the (limiting) value
$$I = \frac{1}{1-\alpha}\,.$$
The improper integral diverges for $\alpha > 1$. The result can be illustrated (see
Fig. 1.18) by a comparison of the integrals for the functions $x(t) = t^{-1/2}$ and
$x(t) = t^{-2}$.

Fig. 1.18. Comparison of the integrands $x(t) = t^{-1/2}$ (a) and $x(t) = t^{-2}$ (b)

The rise of the integrand for t −→ 0 is slow enough in the first case
(Fig. 1.18a) so that the limiting value is finite. The rise is too strong in the
second case (Fig. 1.18b).

1.4.1.3 Cauchy principal values. A similar approach is used if a singular


point is found in the interior of the interval of integration (see Fig. 1.19).
First a ’small interval’ about the singular point (t = c in the example) is cut
out. This leads to the definition of the limiting value
$$\int_a^b x(t)\,\mathrm{d}t = \lim_{\varepsilon_1\to 0}\int_a^{c-\varepsilon_1} x(t)\,\mathrm{d}t + \lim_{\varepsilon_2\to 0}\int_{c+\varepsilon_2}^{b} x(t)\,\mathrm{d}t\,.$$

Fig. 1.19. Singular points in the interval of integration
Three possible results can be found:
1. Both limits exist if ε1 and ε2 approach zero independently. The improper
integral is convergent in this case.
2. A finite limiting value exists only for ε1 = ε2 = ε , that is a uniform
approach towards the singular point from both sides
\int_a^b x(t) \, dt = \lim_{\varepsilon \to 0} \left[ \int_a^{c-\varepsilon} x(t) \, dt + \int_{c+\varepsilon}^b x(t) \, dt \right] .
This limit is called the Cauchy principal value of the improper integral
if it exists.
3. The improper integral is divergent if even the Cauchy principal value
does not exist.
An example for an integral with a Cauchy principal value is

\int_{-a}^{b} \frac{dt}{t} , \qquad a, b > 0 .

Fig. 1.20. A Cauchy principal value integral
The area to be calculated is indicated in Fig. 1.20. The general limiting process demands

I = \lim_{\varepsilon_1 \to 0} \left( \ln \varepsilon_1 - \ln a \right) + \lim_{\varepsilon_2 \to 0} \left( \ln b - \ln \varepsilon_2 \right) .

The individual limiting values do not exist,

\lim_{\varepsilon \to 0} \ln \varepsilon \to -\infty .

On the other hand one finds for \varepsilon_1 = \varepsilon_2 = \varepsilon the limiting value

I = \lim_{\varepsilon \to 0} \left\{ \ln \varepsilon - \ln a + \ln b - \ln \varepsilon \right\} = \ln b - \ln a .
The Cauchy principal value of this improper integral exists.
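The uniform approach \varepsilon_1 = \varepsilon_2 = \varepsilon can be mimicked numerically. A sketch (assuming scipy; the values a = 1, b = 2 are arbitrary):

# Cauchy principal value of int_{-a}^{b} dt/t : symmetric cut-out of the singularity at t = 0
from scipy.integrate import quad

a, b = 1.0, 2.0
for eps in (1e-2, 1e-4, 1e-6):
    left, _ = quad(lambda t: 1.0 / t, -a, -eps)
    right, _ = quad(lambda t: 1.0 / t, eps, b)
    print(eps, left + right)   # approaches ln(b/a) = ln 2 = 0.6931...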
2 Differential Equations I
Differential equations are an important tool for the formulation of physical theories. Some basic facts, that is a brief general overview and methods of solution for the simplest ordinary differential equations, are presented in the present chapter. The topic ’differential equations’ will be taken up again in Math.Chap. 6, but even there it will not be possible to cover this vast field.
2.1 Orientation
Three relatively simple differential equations have been introduced in the second chapter of the main text (derivatives up to second order will be denoted by primes in the Mathematical Supplement rather than by dots, which are used in the main text):

v'(t) = a(t) \qquad\quad or \qquad x''(t) = a(t)
v'(t) = a(x(t)) \qquad or \qquad x''(t) = a(x(t))
v'(t) = a(v(t)) \qquad or \qquad x''(t) = a(x'(t)) .
The function a(t), the acceleration, is supposed to be specified in each case: as a function of time in the first example, then as a function of the position and finally of the velocity. The task is the determination of the functions x(t) and v(t) = x'(t) . The equations are ordinary differential equations.
• Ordinary, as the function x depends on only one independent variable
(called t in conformity with the mechanics text).
• Differential equation, as the function x(t) and its derivatives appear in the
equations.
The first concept, that has to be discussed, is the order of a differential
equation:
The order of a differential equation is the order of the highest derivative that features in the equation.
The differential equations indicated are therefore of first order for the function v(t) (left column) respectively of second order for the function x(t) . It should be noted that v' = a(x) is not really a differential equation for v(t) . This expresses the point of view that only the function to be determined, its derivatives and the independent variable should occur in the equation. The three cases are special cases of a general explicit differential equation of second order for the function x(t)

x''(t) = a(t, x, x') .
In terms of the language of mechanics: the acceleration can be an arbitrary
function of the time, the position and the velocity. Explicit examples for
the solution of more general differential equations of second order will be
addressed in Math.Chap. 6.3.
The statement

The general solution of a differential equation of n-th order contains n constants of integration.
can be illustrated in a direct fashion. The term ’general solution’ implies that
no conditions of any kind are stipulated concerning the solution. The mean-
ing of the word ’constant of integration’ will be clarified immediately. The
illustration begins by considering functions which contain one, two, three,
. . . constants. Assume then that the functions are possible solutions of a
differential equation and demonstrate that they are solutions of a differen-
tial equation of first, second, third, . . . order. The assertion follows then by
inversion of this argument.
The first case deals with the specification
v = v(t, c) .
The function contains the independent variable t and the parameter c . It is
assumed that each value of c produces a unique curve in the v - t-diagram, so
that variation of the parameter c leads to a family of curves. Examples are:
• A family of cubic parabolae, which are parallel shifted, is characterised by the function

v = (t + c)^3 .
The individual parabolae intersect the t -axis in the point t = −c (see
Fig. 2.1 a).
• The first term of the function

v = t + c \, e^t

corresponds to the bisecting line in the first and third quadrant of the v - t diagram. Added to this is an exponential function. For c > 0 the function approaches the bisecting line in the limit t → −∞ from above, for c < 0 from below (see Fig. 2.1 b).
Fig. 2.1. Families of curves with one parameter: (a) the function v = (t + c)^3 , (b) the function v = t + c e^t
The parameter c can be eliminated from the equations

v = v(t, c) \qquad and \qquad v' = \frac{d \, v(t, c)}{dt} = a(t, c) .

For this purpose one of the equations is resolved with respect to c

c = c(v, t) \qquad or \qquad c = c(v', t)

and inserted into the other equation. This procedure gives

v' = a(t, c(v, t)) \qquad or \qquad v = v(t, c(v', t)) ,

that is a differential equation of first order for the function v(t) .
• For the first example the two equations are

(t + c) = v^{1/3} , \qquad v' = 3(t + c)^2 ,

from which follows v' = 3 v^{2/3} .
• For the second example the equations and the result are

v = t + c \, e^t , \qquad v' = 1 + c \, e^t \qquad \longrightarrow \qquad v' - v = 1 - t .

More generally the resulting function can be quoted in the form of an implicit differential equation of first order F(t, v, v') = 0 .
The considerations above show that a family of curves with one parameter
can be characterised by a differential equation of first order. This statement
can be inverted: the general solution of a differential equation of first order
is a family of curves with one parameter, the constant of integration. Some
additional examples for one parameter families are
family            equation                   differential equation
straight lines    x = ct                     t x' = x
parabolae         x = t^2 + c                x' = 2t
parabolae         x = c t^2                  t x' = 2x
circles           (t - c)^2 + x^2 = c^2      2 t x x' = x^2 - t^2
All straight lines of the first family pass through the origin, the parabolae of the second family are parallel shifted, the parabolae of the third family pass through the origin. They differ in the size of the opening and in the orientation (opening upwards or downwards). The centres of the circles are all found on the t -axis. All circles pass through the origin. The first and the third example show that a ’small’ change of the form of the differential equation can lead to quite different curves.
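The elimination of the parameter can be delegated to a computer algebra system. A sketch for the family of circles of the table above (assuming the Python library sympy):

# Eliminate c from (t - c)^2 + x^2 = c^2 and recover 2 t x x' = x^2 - t^2
import sympy as sp

t, c = sp.symbols('t c')
x = sp.Function('x')
family = (t - c)**2 + x(t)**2 - c**2
c_value = sp.solve(sp.diff(family, t), c)[0]   # differentiation gives c = t + x x'
ode = sp.expand(family.subs(c, c_value))
print(ode)   # x(t)**2 - t**2 - 2*t*x(t)*Derivative(x(t), t); set to zero -> table entry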
A family of curves with two parameters (c_1 and c_2) is characterised by an equation of the form

x = x(t, c_1, c_2) .

One example is the function x = c_1 (t + c_2)^3 . Variation of c_1 results in a cluster of cubic parabolae through the origin of the x - t diagram (Fig. 2.2) for c_2 = 0 . Subsequent variation of the parameter c_2 generates clusters of parabolae through each point of the t -axis.

Fig. 2.2. The two parameter family of curves x = c_1 (t + c_2)^3

For the elimination of two parameters three equations are needed. These are
x = x(t, c_1, c_2) , \qquad x' = \frac{dx}{dt} = v(t, c_1, c_2) , \qquad x'' = \frac{dv}{dt} = a(t, c_1, c_2) ,

respectively for the example

x = c_1 (t + c_2)^3 , \qquad x' = 3 c_1 (t + c_2)^2 , \qquad x'' = 6 c_1 (t + c_2) .

The ratios

\frac{x}{x'} = \frac{1}{3} (t + c_2) \qquad and \qquad \frac{x'}{x''} = \frac{1}{2} (t + c_2)

lead after simple sorting to the (not necessarily simple) differential equation of second order

x \, x'' = \frac{2}{3} (x')^2 .

A general discussion shows that an implicit differential equation of second order with the form F(t, x, x', x'') = 0 is obtained by elimination of the two parameters from the equation x = x(t, c_1, c_2) . Additional examples are
family                     equation                       differential equation
straight lines             x = c_1 t + c_2                x'' = 0
parabolae/straight lines   x = c_1 t + c_2 t^2            (t^2 x''/2) - t x' + x = 0
ellipses                   (t/c_1)^2 + (x/c_2)^2 = 1      t x x'' + t (x')^2 - x x' = 0
The argumentation can be continued. The final statements are:
A family of curves with n parameters is characterised by a differential equation of n-th order. The general solution of a differential equation of n-th order contains n integration constants.
The knowledge of the general solution of a differential equation (geo-
metrically speaking the complete family of curves) is useful but not always
required. In many cases only a particular solution (one of the curves of the
family) is of interest. Two options exist for the selection of a particular
solution of a differential equation of second order.
• The values of the function at two positions (t1 , x1 ) and (t2 , x2 ) are spec-
ified. The two parameters c1 , c2 can be determined from the equations
x1 = x(t1 , c1 , c2 ) x2 = x(t2 , c1 , c2 ) .
This method of selecting a particular solution is called a boundary value
problem (Fig. 2.3a).
The equations for the determination of the parameters for the example

x = c_1 (t + c_2)^3 \qquad with \qquad (t_1, x_1) = (0, 0) , \ (t_2, x_2) = (1, 1)

are

c_1 c_2^3 = 0 , \qquad c_1 (1 + c_2)^3 = 1 .

The solution is c_1 = 1 and c_2 = 0 so that the parabola x = t^3 is selected by the boundary values given.
• A second possibility for the selection of a definite curve requires the specification of a value of the function x(t_0) = x_0 and the first derivative v_0 = x'(t_0) at a point t_0 . The parameters are determined from the equations

x_0 = x(t_0, c_1, c_2) , \qquad v_0 = x'(t_0, c_1, c_2)

in this case. This option is referred to as an initial value problem (Fig. 2.3b).
Consider once more the example

x = c_1 (t + c_2)^3 \qquad with the specification \qquad t_0 = 0 , \ x_0 = 1 , \ v_0 = 1 .

This leads to the equations c_1 c_2^3 = 1 and 3 c_1 c_2^2 = 1 with the solutions c_1 = 1/27 and c_2 = 3 . From the family of cubic parabolae the particular curve x = (1/27)(t + 3)^3 is selected by the specification of initial values.
It should be kept in mind that not every specification leads to the selection of a particular solution. Consider, for the sake of simplicity, differential equations of first order for which the specification of the value of the solution at one point should be sufficient. A particular solution of the differential equation t x' = 2x can be selected by specifying (t_0, x_0) as x = x_0 t^2/t_0^2 . An exception is the point (0, 0) . All the solutions pass through this point. None of the circles (t - c)^2 + x^2 = c^2 , which are solutions of the differential equation 2 t x x' - x^2 + t^2 = 0 , can satisfy the condition x(0) = x_0 \neq 0 . For t_0 = 0 all the circles pass through the origin.
Fig. 2.3. Determination of particular solutions: (a) boundary values, (b) initial values
Problems of motion of theoretical mechanics are initial value problems. Position and velocity of a mass point are specified for a definite time (the
starting time). The discussion of oscillations, as for example the vibrations of
a string fixed at both ends, is a mechanical boundary value problem. However,
the displacement of the string has to be followed in space and in time (that
is with two independent variables). This means that the discussion has to be
based on a partial differential equation. This species will be introduced
(more explicitly) only in Volume 2 of this series.
In the next section methods of solution for simple differential equations will be discussed. Actually the following question ought to be addressed beforehand: what are the necessary requirements so that it can be assured that a solution of a differential equation exists and is unique? Quite directly one could ask: what are the properties required of the function a(t, x, x') in the explicit differential equation x'' = a(t, x, x') so that existence and uniqueness of the solution can be guaranteed? For an answer to these questions you should consult the mathematical literature and keep in mind that the question concerning the conditions which guarantee existence and uniqueness of a solution is not an idle one. It would be a waste of time to search for the solution of a differential equation which does not satisfy these conditions.
2.2 Methods of solution
There exists no general method by which the (analytical) solution of every
ordinary differential equation can be found. The statement is also true if
’every’ is restricted to ’every . . . of second order’. It is possible to identify
classes of differential equations, which can be solved analytically with specific
procedures. Solutions of differential equations, which do not belong to any
of these classes, are, in general, obtained numerically. Only the methods of
solution for simple differential equations will, however, be discussed in the
introduction of the theme.
2.2.1 Separation of variables
The pattern for this method of solution can be gleaned from differential equations of first order of the form

g(t) + f(x) \, x' = 0 \qquad with \qquad x(t_0) = x_0 ,

or, written more explicitly for the sake of the present argument,

g(t) + f(x) \, \frac{dx}{dt} = 0 .
A simple but incorrect path to the solution of this differential equation of
first order is the following: the ’numerator’ and the denominator of the dif-
ferential quotient are interpreted as independent, small (that is infinitesimal)
quantities so that the differential equation can be written as
g(t) d t = −f (x) d x .
The two variables are separated here. Integrate both sides of this relation in a corresponding fashion, using the proper initial conditions, and obtain (the integration variables are distinguished from the limits by a tilde)

\int_{t_0}^{t} d\tilde{t} \; g(\tilde{t}) = - \int_{x_0}^{x} d\tilde{x} \; f(\tilde{x}) .
The problem is solved if the two integrals can be worked out. The solution may be written in terms of the corresponding primitives

G(t) - G(t_0) = -(F(x) - F(x_0)) ,

where the initial condition is incorporated in an explicit fashion. The constant terms can be subsumed so that the general form of the solution

F(x) + G(t) = c ,

which may be resolved with respect to either x or t , follows.
This argument is not correct. The differential quotient represents a lim-
iting value and not a fraction. Fortunately the same result can be obtained
in a more rigorous manner. The following steps are involved:
• Begin with the differential equation g(t) + f(x) x' = 0 and consider the indefinite integral

\int^{t} d\tilde{t} \; g(\tilde{t}) + \int^{t} d\tilde{t} \; x'(\tilde{t}) \, f(x(\tilde{t})) = c

(under the assumption that a solution exists).
Substitute now

\tilde{x} = x(\tilde{t}) , \qquad d\tilde{x} = x'(\tilde{t}) \, d\tilde{t}

with the upper limit x = x(t) in the second integral and obtain the relation in question

\int^{t} dt' \; g(t') + \int^{x} dx' \; f(x') = c .
• In order to demonstrate that this relation is really equivalent to the differential equation, differentiate with respect to t

\frac{d}{dt} \left[ \int^{t} d\tilde{t} \; g(\tilde{t}) + \int^{x(t)} d\tilde{x} \; f(\tilde{x}) \right] = 0 .
The first term, involving differentiation with respect to the upper limit, yields the integrand at the position t . The second term is treated with the chain rule

\frac{d}{dt} F(x) = \frac{dF(x)}{dx} \, \frac{dx}{dt} ,

so that one finds
\frac{d}{dt} \int^{x(t)} d\tilde{x} \; f(\tilde{x}) = \left[ \frac{d}{dx} \int^{x} d\tilde{x} \; f(\tilde{x}) \right] x' = f(x) \, x' .
The sum of the two integrals with separated variables is therefore equivalent to the differential equation g(t) + f(x) x' = 0 .
The solution of the three simple differential equations

x''(t) = a(t) , \qquad x''(t) = a(x) , \qquad x''(t) = a(x')

calls for two consecutive applications of the method of separation of variables. Substitute v = x' in the differential equation x'' = a(t) and obtain v' = a(t) with the initial condition v(t_0) = v_0 . Solve via separation of variables with the result

v(t) = v_0 + \int_{t_0}^{t} d\tilde{t} \; a(\tilde{t}) .
The differential equation x' = v(t) with the initial condition x(t_0) = x_0 is treated in the same fashion, so that one finds

x(t) = x_0 + v_0 (t - t_0) + \int_{t_0}^{t} d\tilde{t} \int_{t_0}^{\tilde{t}} d\tau \; a(\tau) .
If the integrals in question can be calculated analytically the solution has been found.

An explicit example is the free fall in one space dimension with friction (Chap. 2.1). The specification of the acceleration (to suit the case under consideration) is a(t) = g \, e^{-kt} , for example with the initial conditions

t_0 = 0 , \qquad x(t_0) = x_0 = 0 , \qquad v(t_0) = v_0 = 0 .

This leads to the solution

v(t) = \frac{g}{k} \left( 1 - e^{-kt} \right) , \qquad x(t) = \frac{g}{k} t + \frac{g}{k^2} \left( e^{-kt} - 1 \right) .
A more general ansatz

a(t) = (g - k v_0) \, e^{-kt}

with

t_0 = 0 , \qquad x(t_0) = x_0 , \qquad v(t_0) = v_0

leads to

v(t) = v_0 \, e^{-kt} + \frac{g}{k} \left( 1 - e^{-kt} \right) \qquad and

x(t) = x_0 + \frac{v_0}{k} \left( 1 - e^{-kt} \right) + \frac{g}{k} t + \frac{g}{k^2} \left( e^{-kt} - 1 \right) .
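The quoted solution can be verified by differentiation, e.g. with a computer algebra system. A sketch (assuming the Python library sympy):

# Check: x'(t) = v(t) and v'(t) = a(t) = (g - k v0) e^{-kt} for the free fall with friction
import sympy as sp

t, g, k, x0, v0 = sp.symbols('t g k x0 v0')
v = v0*sp.exp(-k*t) + (g/k)*(1 - sp.exp(-k*t))
x = x0 + (v0/k)*(1 - sp.exp(-k*t)) + (g/k)*t + (g/k**2)*(sp.exp(-k*t) - 1)
print(sp.simplify(sp.diff(x, t) - v))                         # 0
print(sp.simplify(sp.diff(v, t) - (g - k*v0)*sp.exp(-k*t)))   # 0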
A substitution is required in the case of the differential equation x'' = a(x) before separation of variables can be applied. The inverse of the relation x = x(t) can be inserted into v = v(t) to give v = v(t(x)) = v(x) , or after application of the chain rule

x'' = \frac{dv}{dt} = \frac{dv}{dx} \, \frac{dx}{dt} = v \, \frac{dv}{dx} .
The differential equation, that is obtained in this fashion,

v \, \frac{dv}{dx} = a(x) ,

can be treated by separation of variables. The definition

\varphi(x) = \int^{x} d\tilde{x} \; a(\tilde{x})

and the initial conditions x(t_0) = x_0 , v(t_0) = v_0 lead to a result that represents essentially the law of energy conservation of mechanics for the motion of one mass point in one space dimension (multiply by m and sort, compare Chap. 3.2.3)

\frac{1}{2} v^2 - \frac{1}{2} v_0^2 = \varphi(x) - \varphi(x_0) .
Write this result in the form v^2 = 2\varphi(x) + c and solve the differential equation

\frac{dx}{dt} = \pm \left[ 2\varphi(x) + c \right]^{1/2}

in a second step using separation of variables

t - t_0 = \pm \int_{x_0}^{x} \frac{d\tilde{x}}{\left[ 2\varphi(\tilde{x}) + c \right]^{1/2}} .
This result can possibly be inverted in the form x = x(t) after evaluation
of the integral. The sign has to be chosen on the basis of suitable (physical)
arguments in order to have a unique solution.
An example for this case is the harmonic oscillator problem (which can also be solved more simply, see Math.Chap. 2.2.2): the differential equation

x'' = -\omega^2 x \qquad or \qquad v \, \frac{dv}{dx} = -\omega^2 x

is to be solved with e.g. the initial conditions

t_0 = 0 , \qquad x(0) = 0 , \qquad v(0) = v_0 .
The mass of the oscillator moves initially in the positive x -direction and passes through the origin. Separation of variables gives in the first step

\frac{1}{2} v^2 - \frac{1}{2} v_0^2 = -\frac{1}{2} \omega^2 x^2 .

Resolve in the form v = \pm \left[ v_0^2 - \omega^2 x^2 \right]^{1/2} , choose the sign in agreement with the initial condition v(0) = +v_0 and apply separation of variables a second time to find
\omega t = \int_0^{x} \frac{d\tilde{x}}{\left[ \left( \frac{v_0}{\omega} \right)^2 - \tilde{x}^2 \right]^{1/2}} .
The primitive of this integral is the arc sine

\int^{x} \frac{d\tilde{x}}{\left[ a^2 - \tilde{x}^2 \right]^{1/2}} = \arcsin \frac{x}{a} + c ,

so that the result is

\arcsin \frac{\omega x}{v_0} = \omega t

or by inversion

x(t) = \frac{v_0}{\omega} \sin \omega t .
The amplitude A = v_0/\omega is determined by the initial conditions; the maximal displacement is proportional to the initial velocity in this example. The velocity function v(t) = v_0 \cos \omega t can either be obtained by differentiation of x(t) or by insertion of the result for x(t) into the equation v = \pm \left[ v_0^2 - \omega^2 x^2 \right]^{1/2} .
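For comparison, the same initial value problem can be handed directly to an ordinary differential equation solver. A sketch (assuming the Python library sympy):

# x'' = -omega^2 x with x(0) = 0, x'(0) = v0 gives x(t) = (v0/omega) sin(omega t)
import sympy as sp

t = sp.symbols('t')
w, v0 = sp.symbols('omega v0', positive=True)
x = sp.Function('x')
sol = sp.dsolve(sp.Eq(x(t).diff(t, 2), -w**2 * x(t)),
                ics={x(0): 0, x(t).diff(t).subs(t, 0): v0})
print(sp.simplify(sol.rhs))   # v0*sin(omega*t)/omega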
The process of solution is slightly different for the initial conditions

t_0 = 0 , \qquad x(0) = A , \qquad v(0) = 0 .

The mass is initially displaced in the positive x -direction. As the velocity is zero at this time the sense of the oscillation is reversed. The solution with the present initial conditions is

x(t) = A \cos \omega t , \qquad v(t) = -A\omega \sin \omega t .
The sign and the integration constant can only be selected after the second
step.
The simpler way to the solution, which has been mentioned, relies on the fact that the function a(x) is linear in x . The method of separation of variables has the advantage that a solution can be obtained for a general form of the function a(x) . A prerequisite for gaining an analytical solution is the evaluation of all integrals and the inversion to x = x(t) in an analytic manner.
The method of separation of variables can be applied again directly in the case x'' = a(v) . Write the differential equation in the form v'(t) = a(v) and obtain the solution

t - t_0 = \int_{v_0}^{v} \frac{d\tilde{v}}{a(\tilde{v})} .
General initial conditions are incorporated. The relation t - t_0 = A(v) - A(v_0) can be sorted in the form v = v(t, t_0, v_0) if the integral is evaluated. The second step is then a direct integration

x - x_0 = \int_{t_0}^{t} d\tilde{t} \; v(\tilde{t}, t_0, v_0) .
The free fall problem with friction, characterised this time by a(v) = g - kv and the initial conditions

t_0 = 0 , \qquad x(0) = x_0 , \qquad v(0) = v_0 ,

can be used as an example in this case. Calculate (with the substitution s = g - kv)

t = \int_{v_0}^{v} \frac{d\tilde{v}}{g - k\tilde{v}} = -\frac{1}{k} \ln \left| \frac{g - kv}{g - kv_0} \right|
in the first step. The choice of the sign in

\left| \frac{g - kv}{g - kv_0} \right| = e^{-kt} ,

which is necessary after the inversion, can be dealt with according to the special situation. The result of the inversion

v(t) = \frac{g}{k} \left( 1 - e^{-kt} \right) + v_0 \, e^{-kt}
has to be integrated once more. The final result has been given above.
2.2.2 The linear differential equation of second order
The differential equation of the harmonic oscillator problem is a special case of a linear differential equation of second order. The general form of this class of differential equations is

a_0(t) \, x''(t) + a_1(t) \, x'(t) + a_2(t) \, x(t) = b(t) .
The coefficients a_i are (reasonable) functions of t . The following cases must be distinguished:
• b(t) = 0 → homogeneous differential equation
• b(t) ≠ 0 → inhomogeneous differential equation .
The attribute linear refers to the fact that both the function x(t) and its
derivatives occur in the first power. This property is the reason for the wide
range of applications of this type of differential equation in physics.
Concerning the solutions of linear differential equations two general state-
ments can be made. They will be quoted here for the case of differential equa-
tions of second order even though they are valid for any (finite) order. The
first statement is
The general solution of a linear inhomogeneous differential equation has the form

x_i(t, c_1, c_2) = x_h(t, c_1, c_2) + x_p(t) .

The function x_h represents the general solution of the associated homogeneous differential equation and x_p a particular solution of the inhomogeneous differential equation.
The general solution of the inhomogeneous differential equation in question can be obtained in two steps: find first the general solution of the
associated homogeneous differential equation. Add any (the simplest pos-
sible is sufficient) solution of the inhomogeneous differential equation. This
’particular integral’ of the inhomogeneous differential equation can often be
guessed, but there exist explicit methods for its orderly determination (see
Math.Chap. 6.2.4 and 6.3.2).
The proof of the statement is simple. According to the assumptions one may write

a_0(t) \, x_h''(t, c_1, c_2) + a_1(t) \, x_h'(t, c_1, c_2) + a_2(t) \, x_h(t, c_1, c_2) = 0
a_0(t) \, x_p''(t) + a_1(t) \, x_p'(t) + a_2(t) \, x_p(t) = b(t) .

Addition of these equations leads to the statement that x_h + x_p is a solution of the inhomogeneous differential equation. As this solution contains two constants of integration, it is – as claimed – a general solution.
The general solution of the homogeneous differential equation is often
determined with the aid of the principle of superposition.
The general solution of the homogeneous linear differential equation can be composed of two arbitrary, linearly independent solutions in the form

x_h(t, c_1, c_2) = c_1 x_1(t) + c_2 x_2(t) .
The term linear independence refers to a concept of Linear Algebra (see Math.Chap. 3.2.4). The formal definition (within the present context) is:

Two functions x_1(t), x_2(t) are linearly independent if the relation

\alpha \, x_1(t) + \beta \, x_2(t) = 0

can only be satisfied with \alpha = \beta = 0 for all values of the common domain of definition of the two functions.
It is not easy to apply the formal definition as it demands that the condition has to be checked for all values of the variable t in the domain of definition. This definition can be turned into a practical tool by inclusion of a second equation, which is obtained by differentiation of the first

\alpha \, x_1'(t) + \beta \, x_2'(t) = 0 .
The two equations represent a simple system of linear equations in which the quantities \alpha and \beta represent the unknowns and the functions of t the coefficients. This system possesses only the solution \alpha = \beta = 0 if and only if the coefficients satisfy the condition

x_1 x_2' - x_2 x_1' \neq 0 .

This combination of the two functions and their derivatives is known as the Wronski determinant (details concerning determinants can be found in Math.Chap. 3.2.4)

W(x_1, x_2) = x_1 x_2' - x_2 x_1' \neq 0 .
The proof of the last set of statements can be obtained readily. In order to obtain the solution of the system of equations

\alpha \, x_1(t) + \beta \, x_2(t) = 0
\alpha \, x_1'(t) + \beta \, x_2'(t) = 0

the first equation is multiplied by either x_2' or x_1' and the second by x_2 or x_1 . By subtraction one finds

(x_1 x_2' - x_2 x_1') \, \alpha = 0 \qquad and \qquad (x_1 x_2' - x_2 x_1') \, \beta = 0 .

This shows that both \alpha and \beta must equal zero if the Wronski determinant does not vanish. The two linearly independent solutions are also called fundamental solutions.
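The criterion is easily evaluated for concrete pairs of functions. A sketch (assuming the Python library sympy, and anticipating the pair sin t, cos t used further below):

# Wronski determinant W(x1, x2) = x1 x2' - x2 x1' for x1 = sin t, x2 = cos t
import sympy as sp
from sympy import wronskian

t = sp.symbols('t')
print(sp.simplify(wronskian([sp.sin(t), sp.cos(t)], t)))   # -1, hence linearly independent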
The fact that x_1(t) and x_2(t) are solutions of the homogeneous differential equation can be expressed in the form

a_0(t) \, x_1''(t) + a_1(t) \, x_1'(t) + a_2(t) \, x_1(t) = 0
a_0(t) \, x_2''(t) + a_1(t) \, x_2'(t) + a_2(t) \, x_2(t) = 0 .
Multiplication of the first equation with c1 , the second with c2 and addi-
tion demonstrates that xh (t) = c1 x1 (t) + c2 x2 (t) is a general solution of the
homogeneous linear differential equation.
Many basic differential equations of theoretical physics are linear. This remark emphasizes the importance of the principle of superposition. A relevant example is provided by wave equations (even if they are partial differential equations of second order). The superposition of solutions of these equations allows the mathematical formulation of interference phenomena.
The solution of ordinary linear differential equations of second order is
not necessarily a simple task. Even a simple functional form of the coeffi-
cients ai (t) can lead to ’higher functions of mathematical physics’ which are
introduced only in Volumes 2 and 3 of this series. The principle of superpo-
sition can possibly be used in a practical way in these cases, as there exist
methods (see e.g. Math.Chap. 6) which allow the determination of particular solutions.
The general solution of homogeneous, linear differential equations can be obtained easily if the coefficients a_i do not depend on the variable t . It is possible to determine the fundamental solutions of these linear differential equations of second (or higher) order with constant coefficients

x''(t) + a_1 x'(t) + a_2 x(t) = 0

with the ansatz

x = e^{\alpha t} .
The steps are:
• Insert the ansatz into the differential equation and obtain the relation

(\alpha^2 + a_1 \alpha + a_2) \, e^{\alpha t} = 0 .

• Calculate the roots of this quadratic (the characteristic) equation (note that e^{\alpha t} \neq 0 for |t| < \infty ; for a differential equation of order n the equivalent is a characteristic polynomial of degree n)

\alpha^2 + a_1 \alpha + a_2 = 0 \;\longrightarrow\; (\alpha_1, \alpha_2) .
• The Wronski determinant does not vanish,

W(e^{\alpha_1 t}, e^{\alpha_2 t}) = (\alpha_2 - \alpha_1) \, e^{(\alpha_1 + \alpha_2) t} \neq 0 ,

if the two roots are different, \alpha_1 \neq \alpha_2 . Therefore the general solution is

x_h(t) = c_1 e^{\alpha_1 t} + c_2 e^{\alpha_2 t}

in this case.
• The Wronski determinant vanishes, W = 0 , in the case of a double root (\alpha_1 = \alpha_2) . A second linearly independent solution has to be found. The factorisation

\alpha^2 + a_1 \alpha + a_2 = (\alpha - \alpha_1)(\alpha - \alpha_2) = \alpha^2 - (\alpha_1 + \alpha_2)\alpha + \alpha_1 \alpha_2

allows the identification a_1 = -2\alpha_1 and a_2 = \alpha_1^2 in the case of a double root. A solution of the resulting differential equation

x''(t) - 2\alpha_1 x'(t) + \alpha_1^2 x(t) = 0

can be obtained with the (very popular) ansatz x(t) = R(t) \, e^{\alpha_1 t} , containing the known solution and a remaining function. Insertion of

x' = (R' + \alpha_1 R) \, e^{\alpha_1 t} , \qquad x'' = (R'' + 2\alpha_1 R' + \alpha_1^2 R) \, e^{\alpha_1 t}

into the differential equation yields a simple differential equation for the remaining function

R'' \, e^{\alpha_1 t} = 0 \qquad or \qquad R'' = 0 ,
with the general solution R(t) = c_1 + c_2 t . The two solutions found, that is x_1 = e^{\alpha_1 t} , x_2 = t \, e^{\alpha_1 t} , are linearly independent because of

W(e^{\alpha_1 t}, t \, e^{\alpha_1 t}) = e^{2\alpha_1 t} \neq 0 .

The general solution of the differential equation in the case of a double root of the characteristic equation is therefore

x_h(t) = (c_1 + c_2 t) \, e^{\alpha_1 t} .
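For concrete numbers the recipe reduces to finding the zeros of the characteristic polynomial. A sketch (assuming the Python library numpy; the coefficients are those of the first example below):

# Roots of alpha^2 + a1*alpha + a2 = 0 for x'' + 4x' - 5x = 0
import numpy as np

a1, a2 = 4.0, -5.0
alpha = np.roots([1.0, a1, a2])
print(alpha)   # [-5.  1.]  ->  x_h(t) = c1 e^t + c2 e^{-5t}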
The different situations, which can be encountered in practical applications, are demonstrated by the following four examples.
1. The characteristic equation of the differential equation x'' + 4x' - 5x = 0 has the real roots \alpha_1 = 1 and \alpha_2 = -5 . The general solution is therefore

x_h(t) = c_1 e^{t} + c_2 e^{-5t} .
2. The characteristic equation of the differential equation x'' + 4x' + 5x = 0 has the complex roots (see Math.Chap. 7 for some details concerning complex numbers and functions)

\alpha_1 = -2 + i , \qquad \alpha_2 = -2 - i \qquad (i^2 = -1) .

The roots are complex conjugate with respect to each other, \alpha_1 = \alpha_2^* . This guarantees that the combinations \alpha_1 + \alpha_2 and \alpha_1 \cdot \alpha_2 are real. The general solution is

x_h(t) = (c_1 e^{it} + c_2 e^{-it}) \, e^{-2t} .
It might astonish that the solution of a differential equation with real coefficients should be complex. However, with the relation

e^{\pm it} = \cos t \pm i \sin t

a real form

x_h(t) = (A \cos t + B \sin t) \, e^{-2t}

may be obtained. The relation between the coefficients is

A = c_1 + c_2 , \qquad B = i(c_1 - c_2) .
The two trigonometric functions are linearly independent as one finds W(\sin t, \cos t) = -1 \neq 0 . A third form of the solution is

x_h(t) = C \sin(t + \varphi) \, e^{-2t}

with

A = C \sin \varphi , \qquad B = C \cos \varphi

and the inverse

C = \left[ A^2 + B^2 \right]^{1/2} , \qquad \tan \varphi = \frac{A}{B} .
In the end it does not matter which of the three forms is used (apart from the fact that the exponential functions are handled more easily, e.g. with respect to the addition theorems). The specification of real initial values for an initial value problem of physics will lead to a real solution. The initial conditions x(0) = 1 , x'(0) = 0 lead in all three cases to the real solution

x_h(t) = (\cos t + 2 \sin t) \, e^{-2t}

of the present differential equation.
3. The characteristic equation of the differential equation x'' - 4x' + 4x = 0 has a double root \alpha_1 = \alpha_2 = 2 . Hence the general solution is

x_h(t) = (c_1 + c_2 t) \, e^{2t} .
4. The differential equation of the harmonic oscillator x'' + \omega^2 x = 0 can be solved with the same method. The general solution can be given in three (actually not so) different forms

x_h(t) = c_1 e^{i\omega t} + c_2 e^{-i\omega t} = A \cos \omega t + B \sin \omega t = C \sin(\omega t + \varphi) .
The last example of this section is an inhomogeneous, linear differential equation with constant coefficients. The differential equation

x'' + \omega^2 x = b_0 \sin \omega_0 t

characterises a driven harmonic oscillator (see Chap. 4.2.3). Besides the general solution of the homogeneous oscillator equation a particular solution of the inhomogeneous differential equation is needed. The ansatz x_p(t) = D \sin \omega_0 t for the determination of this function is sufficient in this case as the second derivative of the sine function is again a sine function. Insertion into the differential equation allows, via

(-\omega_0^2 + \omega^2) \, D \sin \omega_0 t = b_0 \sin \omega_0 t ,

the determination of the constant D as

D = b_0/(\omega^2 - \omega_0^2)

(as long as \omega^2 \neq \omega_0^2 ). The general solution of the inhomogeneous differential equation is therefore

x_i(t) = A \cos \omega t + B \sin \omega t + \frac{b_0}{\omega^2 - \omega_0^2} \sin \omega_0 t .
The initial conditions x(0) = x_0 , x'(0) = 0 yield (by solution of a system of linear equations for the integration constants) the special solution

x(t) = x_0 \cos \omega t + \frac{b_0}{\omega^2 - \omega_0^2} \left[ \sin \omega_0 t - \frac{\omega_0}{\omega} \sin \omega t \right] .

The inhomogeneous term vanishes for b_0 = 0 and the special solution of the homogeneous problem (with corresponding initial conditions) is recovered.
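The special solution can again be checked by insertion. A sketch (assuming the Python library sympy):

# Check: x'' + omega^2 x = b0 sin(omega0 t), with x(0) = x0 and x'(0) = 0
import sympy as sp

t, x0, b0 = sp.symbols('t x0 b0')
w, w0 = sp.symbols('omega omega0', positive=True)
x = x0*sp.cos(w*t) + b0/(w**2 - w0**2)*(sp.sin(w0*t) - (w0/w)*sp.sin(w*t))
print(sp.simplify(sp.diff(x, t, 2) + w**2*x - b0*sp.sin(w0*t)))   # 0
print(x.subs(t, 0), sp.simplify(sp.diff(x, t).subs(t, 0)))        # x0  0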
3 Linear Algebra
Linear Algebra is concerned with the mathematical definition of the concept of space (of arbitrary but finite dimension), with operations in these spaces (rotations, reflections, etc.) and the introduction of mathematical tools (matrices, determinants) to describe these operations. The chapter starts with a discussion of vectors. Vectors have been introduced around 1880 by Gibbs, Grassmann and Hamilton. They are, since that time, an important aid of theoretical physics as they allow a concise formulation of problems with more than one (space) dimension. After the introduction of a more qualitative formulation of vector calculus (which is already of use for the solution of geometrical problems) follows a discussion of the quantitative version. This is the starting point for an abstraction from the familiar three-dimensional space to multidimensional spaces and to spaces with a nonorthogonal basis, which are touched upon (very briefly at this stage) at the end of the first part of this chapter. The second part of the chapter deals with transformations in two- and three-dimensional spaces and establishes the tools necessary for this and a more general discussion.
3.1 Vectors
The vector concept is introduced in a more graphic fashion in this section. Operations with vectors, as addition, subtraction and the different products with vectors, are presented. ’Graphic fashion’ means restriction to maximally three space dimensions and a more qualitative formulation.
3.1.1 Qualitative vector calculus

The first definition states

Quantities, which are characterised by one number, are called scalars.
Examples for scalar quantities in physics are mass, energy, temperature
etc. An example from geometry is a line segment.
The second definition states
Quantities, which are characterised by a number and a direction, are called vectors.
One example is the displacement vector. The statement that a person
starts at point A and moves r kilometers is not sufficient to locate the end
point. The end point can be anywhere on a circle around the starting point.
Besides the number r an additional piece of information is required to define
a unique end point, as e.g. in north-westerly direction.
Directed line segments (and more generally vectors) are represented by an arrow, which connects the starting point and the end point (see Fig. 3.1).

Fig. 3.1. Displacement vector

The standard notation for a displacement vector (or position vector) is

\boldsymbol{r} = \vec{r} = \underline{r} \quad \left( = \overrightarrow{AB} \right) ,

or more calligraphical variants (mainly the left hand form is used). The notation for the length (the magnitude) of a directed line segment (of a vector) is

|\boldsymbol{r}| = r .
Examples for vectorial quantities in physics are forces (the direction in which the push or pull is applied plays a role), velocity, angular momentum, electric or magnetic fields, etc.
Some finer points can be noted notwithstanding the simple definition. One
differentiates between
• Fixed vectors. The starting point is strictly fixed. An example are displace-
ment vectors, which start at a definite position.
• Sliding vectors. The starting point can be shifted along the straight line,
which is given by the direction of the vector. Force vectors are examples
of such vectors. The point at which the force is applied can be shifted, for
instance by a ’rope’.
• Free vectors. They can be shifted in an arbitrary way but have to retain
their direction.
Free vectors will be used in the discussion that follows, first to assemble a vector calculus (to begin with in a qualitative form which is useful for applications in physics and geometry).
The addition of vectors corresponds to consecutive displacements. An
object is first displaced by the vector r 1 , then by the vector r 2 . The vector,
which connects the starting point and the end point of this chain of vectors,
is the sum vector. It is defined by
S = r1 + r2 .
The sum vector marks the shortest connection of these points (Fig. 3.2a).
The sum vector can also be constructed by using the same starting points
for the two vectors and complementing the figure to form a parallelogram
(Fig. 3.2a). The long diagonal of the parallelogram is the sum vector. The
fact, that vector addition is commutative, is used in this construction
r1 + r2 = r2 + r1 .
Fig. 3.2. Illustration of vector addition: (a) two vectors, (b) three vectors

The addition of more than two vectors (Fig. 3.2b) is indicated by the statement
r 1 + r 2 + r 3 = (r 1 + r 2 ) + r 3 = r 1 + (r 2 + r 3 ) .
The associative law of vector addition is indicated in this equation.
The multiplication of a vector with a scalar describes the following
manipulation: do not shift the object by the vector r but by a times the
vector. The resulting vector
R = ar
has the length |ar| . It points in the same direction as the vector r if the number a is positive. The vector R is called a zero vector (null vector) if a = 0 . The question could be posed in this case whether a quantity without a length and a direction should be called a vector. The zero vector is, however, an indispensable quantity – similar to the number 0 for the operations with numbers – with the property
r+0 = r.
The vector R points in the direction opposite to r if a is negative, the length
is still given by |ar| . For multiplication with a scalar the distributive laws
(a + b) r = a r + b r , \qquad a(r_1 + r_2) = a r_1 + a r_2
and the associative law
a(br) = (ab)r
hold.
The subtraction of vectors can be represented with the aid of addition
and multiplication with (−1) . The difference of two vectors is
D = r 1 − r 2 = r 1 + (−1) r 2 .
The vector r2 is turned around and then added. The difference vector can
also be obtained as the short diagonal in the vector parallelogram. The end
point of the difference vector is identical with the end point of the vector r 1
(Fig. 3.3).
Fig. 3.3. Subtraction of vectors
There exist two different products of a vector with another. The scalar product, also called the inner product, corresponds, so to speak, to the projection of one vector onto the other. The definition (and notation) of the scalar product is

(r_1 \cdot r_2) = r_1 \cdot r_2 = r_1 r_2 \cos \varphi_{12} .
The angle ϕ12 is the angle enclosed by the two vectors. The definition includes
the following operations:
• Project the vector r 2 onto the direction of r 1 . This leads to the factor
r2 cos ϕ12 .
• Multiply with the magnitude of the vector r 1 .
• Alternatively, the vector r 1 may first be projected onto the direction of
r 2 . This is to be followed by multiplication with r2 (Fig. 3.4a).
The definition therefore associates
two vectors =⇒ one scalar (number).
Rules for handling the scalar product are
• The scalar product is commutative

(r_1 \cdot r_2) = (r_2 \cdot r_1) .

This is the formal expression of the fact that it does not matter if r_1 is first projected onto r_2 or vice versa.
• The distributive law
(r 1 · [r 2 + r 3 ]) = (r 1 · r 2 ) + (r 1 · r 3 )
is valid. This rule can be demonstrated easily in a plane. The projection of
the vector sum corresponds to the sum of the projections of the summands
(Fig. 3.4b).
Fig. 3.4. The scalar product: (a) definition, (b) distributive law
• There exists an associative law concerning the multiplication of a scalar product with a scalar

a (r_1 \cdot r_2) = ((a r_1) \cdot r_2) = (r_1 \cdot (a r_2)) .

The following explicit properties can be extracted directly from the definition
• The scalar product of a vector with itself yields the square of the magnitude

(r_1 \cdot r_1) = r_1^2 \qquad as \qquad \cos \varphi_{11} = 1 .
• The scalar product of two vectors has the value zero
(r 1 · r 2 ) = 0
if
r 1 = 0 and/or r2 = 0
or if
r1 is perpendicular to r 2 .
The two vectors are said to be orthogonal in the last case.
The scalar product is a very useful instrument. This assertion can be demonstrated with a trigonometric example, the law of cosine. The proof with elementary geometric means, that the relation a^2 - 2ab \cos \gamma + b^2 = c^2 is valid for arbitrary triangles, is rather cumbersome. The proof with the aid of the scalar product is more direct. The triangle can be described by a closed polygon of vectors (Fig. 3.5)

a + b + c = 0 .
Fig. 3.5. Illustrating the law of cosine
Resolve e.g. with respect to c and consider the scalar product

c^2 = (c \cdot c) = (a + b) \cdot (a + b) = (a + b)^2
    = a^2 + 2(a \cdot b) + b^2
    = a^2 + 2ab \cos \varphi_{ab} + b^2 = a^2 + 2ab \cos(\pi - \gamma) + b^2
    = a^2 - 2ab \cos \gamma + b^2 .
All variants of the law of cosine (and further laws of trigonometry) can be
obtained in a similar fashion.
The vector product, also called outer product, assigns a third vector
to two given vectors.
two vectors =⇒ third vector.
The following consideration is the background for the definition of the
vector product. Two vectors r 1 , r 2 in three-dimensional space span a paral-
lelogram (Fig. 3.6a). The area of the parallelogram is
F = r1 h = r1 r2 sin ϕ12 .
The definition of the vector product with the notation
(r 1 × r 2 ) or r1 × r2 or [r 1 , r 2 ]
incorporates three points (Fig. 3.6b):
• The quantity (r 1 × r 2 ) is a vector r 3 = (r 1 × r 2 ) .
Fig. 3.6. Illustrating the definition of the vector product
• The direction of the vector r_3 is defined by the statement: the vector r_3 is perpendicular to the parallelogram. The vectors r_1, r_2, r_3 (exactly in this order) constitute a right handed coordinate system. The order of the three vectors is reflected in the following rules:
Right hand rule: (initiated by Z.W. Cole in 1902) The thumb (of the right hand) points in the direction of r_1 , the index finger in the direction of r_2 and the middle finger in the direction of r_3 .
Screw rule: (initiated by J.C. Maxwell) Turn the vector r_1 along the shortest possible way in the direction of r_2 . The direction of r_3 corresponds to the course of a standard screw.
For persons without technical experience there is the additional rule: imagine that the three vectors indicate a sitting human figure. The vector r_1 corresponds to the right leg, r_2 to the left leg and r_3 to the direction towards the head.
• The vector (r 1 × r 2 ) has the length
|r 3 | = |(r 1 × r 2 )| = r1 r2 sin ϕ12 .
The magnitude of the vector product corresponds to the area of the
parallelogram which is spanned by the vectors r 1 and r 2 .
The characterisation of an area by a length may need a short comment.
The more precise expression (for the purpose of geometry) is ’a length
which constitutes a measure of an area’ (explicitly x cm is the measure
of an area with x cm2 ). No difficulties concerning units occur in physics.
For instance, the definition of angular momentum l = (r × p) leads to the
correct unit (Chap. 3.2.2)
[l] = \frac{ML^2}{T} .
It should be noted that the vector product yields the zero vector if one of
the factors is a zero vector or if the two vectors point in the same or in
opposite directions.
(r 1 × r 2 ) = 0 if r1 = 0 or/and r 2 = 0 or r 1 = cr 2 .
There exist rules for handling the vector product as well:
• The vector product is anticommuting (Fig. 3.7)

Fig. 3.7. Vector product: anticommutativity
(r 1 × r 2 ) = −(r 2 × r 1 ) .
This emphasizes the detailed specification of the direction of r 3 in the right
hand rule.
• The definition implies the associative law with respect to multiplication
with a scalar
c(r 1 × r 2 ) = ((cr 1 ) × r 2 ) = (r 1 × (cr 2 )) .
• There exists a distributive law
(r 1 × (r 2 + r 3 )) = (r 1 × r 2 ) + (r 1 × r 3 ) .
A proof on the basis of elementary geometry is not difficult but somewhat
cumbersome (try it!).
The qualitative form of the vector calculus is not suitable for the production of quantitative results. Unnecessary errors occur, for instance, if one attempts to determine the vector sum with the aid of a ruler and a goniometer (angle gauge). It is necessary to go over to a quantitative version of vector calculus.
3.1.2 Quantitative formulation of vector calculus
A quantitative version of the operations introduced in the previous section can be obtained by referring the vectors to a chosen Cartesian coordinate system (Fig. 3.8). A Cartesian coordinate system can be spanned by a trihedron of unit vectors or basis vectors. These vectors

e_1, e_2, e_3

(or alternatively e_x, e_y, e_z ) are characterised by the following relations
Fig. 3.8. Coordinate system and trihedron
(e_1 \cdot e_1) = (e_2 \cdot e_2) = (e_3 \cdot e_3) = 1
(e_1 \cdot e_2) = (e_1 \cdot e_3) = (e_2 \cdot e_3) = 0 .

In view of the commutative law there exist nine scalar products of the unit vectors which can be written in the form

(e_i \cdot e_k) = \delta_{ik} .

The Kronecker symbol

\delta_{ik} = \begin{cases} 1 & \text{for } i = k \\ 0 & \text{for } i \neq k \end{cases}
has been used here in order to abbreviate the notation.
This set of relations states that the three vectors are mutually orthogonal
(perpendicular with respect to each other) and have the length 1 (are nor-
malised to 1). The relative orientation necessary for a right handed system
is expressed by the vector products
(e_1 \times e_2) = e_3 , \qquad (e_2 \times e_3) = e_1 , \qquad (e_3 \times e_1) = e_2 .
The sequence of the indices (12/3), (23/1), (31/2) corresponds to cyclic com-
mutation (permutation) of the numbers (123) . In addition one has
(e1 × e2 ) = −(e2 × e1 ) etc. and (e1 × e1 ) = 0 etc.
The nine vector products can also be summarised in a compact form

(e_i \times e_j) = \sum_{k=1}^{3} \varepsilon_{ijk} \, e_k .
The factor which features in the vector sum is the Levi-Civita symbol with the properties

\varepsilon_{ijk} = \begin{cases} 0 & \text{if two indices are equal} \\ 1 & \text{if } (ijk) \text{ is a cyclic permutation of } (123) \\ -1 & \text{for every other permutation} . \end{cases}
There are 3^3 = 27 combinations of the indices, of which 21 correspond to 0, three to 1 and three to -1 . The application of this symbol does not seem to be worthwhile. However, it turns out to be quite useful.
The paragraphs above define a (three-dimensional) Euclidian space (in standard notation R(3) or R^3 ) which can be used for the quantitative formulation of vector calculus (an introductory discussion of additional spaces can be found in Math.Chap. 3.1.3 and 3.1.4). This is achieved by first projecting an arbitrary vector r onto the coordinate axes with the aid of the scalar product

(r \cdot e_1) = r \cos \varphi_{r1} = x_1
(r \cdot e_2) = r \cos \varphi_{r2} = x_2
(r \cdot e_3) = r \cos \varphi_{r3} = x_3 .
The notation (x_1, x_2, x_3) is appropriate if a notation in terms of sums is used. It corresponds fully to the standard form (x, y, z) . On the other hand, the vector r can be reconstructed from the projections

r = x_1 e_1 + x_2 e_2 + x_3 e_3 = \sum_{i=1}^{3} x_i e_i .
This expression is referred to as the decomposition of a vector into its Cartesian components. The components of a vector (with the starting point at the origin) are identical with the coordinates of the end point of the vector (Fig. 3.9).

Fig. 3.9. The decomposition into components

The notation
r ⇒ (x_1, x_2, x_3)

can be used for this reason. Vectors in R^3 can be characterised by a triple of numbers (the equivalence expressed by ⇒ can be read as an equal sign on the basis of matrix calculus, Math.Chap. 3.2).
The decomposition into components is the key for a quantitative formulation of vector calculus. The individual arithmetic operations can be summarised (use x instead of r_1 , etc.) with
x = x_1 e_1 + x_2 e_2 + x_3 e_3 , \qquad y = y_1 e_1 + y_2 e_2 + y_3 e_3
in the following fashion (Fig. 3.10)
• Addition. Using the rules indicated above one finds for the sum vector

S = x + y = (x_1 + y_1) e_1 + (x_2 + y_2) e_2 + (x_3 + y_3) e_3 = \sum_{i=1}^{3} (x_i + y_i) e_i \;\Rightarrow\; (x_1 + y_1, x_2 + y_2, x_3 + y_3) .

The sum vector is obtained by addition of the individual components.
• Multiplication with a scalar. The decomposition is in this case

R = a x = (a x_1) e_1 + (a x_2) e_2 + (a x_3) e_3 = \sum_{i=1}^{3} (a x_i) e_i \;\Rightarrow\; (a x_1, a x_2, a x_3) .

• Subtraction. The difference of two vectors is

D = x - y = (x_1 - y_1) e_1 + (x_2 - y_2) e_2 + (x_3 - y_3) e_3 = \sum_{i=1}^{3} (x_i - y_i) e_i \;\Rightarrow\; (x_1 - y_1, x_2 - y_2, x_3 - y_3) .
Fig. 3.10. Illustration of the quantitative vector calculus: (a) addition, (b) subtraction, (c) multiplication with a scalar
The discussion of the two products is only slightly more involved.
• With a few calculational steps one obtains for the scalar product

(x \cdot y) = (x_1 e_1 + x_2 e_2 + x_3 e_3) \cdot (y_1 e_1 + y_2 e_2 + y_3 e_3) = \sum_{i,j=1}^{3} x_i y_j (e_i \cdot e_j) = \sum_{i,j=1}^{3} x_i y_j \delta_{ij} = \sum_{i=1}^{3} x_i y_i = x_1 y_1 + x_2 y_2 + x_3 y_3 .
The scalar product can be calculated as the sum of the products of the
components if the decomposition of the two vectors is known. The result
is a number.
• The vector product can also be expressed in terms of the components of the two vectors

(x \times y) = \sum_{i,j=1}^{3} x_i y_j (e_i \times e_j) = \sum_{i,j,k} \varepsilon_{ijk} \, x_i y_j \, e_k = \sum_{k} \left( \sum_{i,j} \varepsilon_{ijk} \, x_i y_j \right) e_k .
The individual steps have used the rules given and the representation of the vector product in terms of unit vectors. The result can be summarised in the form: the k -th component of the product vector is

(x \times y)_k = \sum_{i,j} \varepsilon_{ijk} \, x_i y_j .
The formal double sum can be written more directly if the properties of the Levi-Civita symbol are used. Only two of the nine contributions are different from zero for each component. The result is (reproduce it)

(x \times y) = (x_2 y_3 - x_3 y_2) e_1 + (x_3 y_1 - x_1 y_3) e_2 + (x_1 y_2 - x_2 y_1) e_3 \;\Rightarrow\; (x_2 y_3 - x_3 y_2, \ x_3 y_1 - x_1 y_3, \ x_1 y_2 - x_2 y_1) .
The rule to recall the sequence of indices is: the first term for every compo-
nent is indexed by the cyclic complement of the index marking the compo-
nent. The second term (with a minus sign) contains the anticyclic comple-
ment. An additional rule uses the concept of determinants. This rule will
be quoted in the appropriate section (see Math.Chap. 3.2.4).
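The component formula translates directly into code. A sketch (assuming the Python library numpy; the indices run over 0, 1, 2 instead of 1, 2, 3):

# k-th component of the vector product via the Levi-Civita symbol
import numpy as np

def levi_civita(i, j, k):
    # +1 for cyclic, -1 for anticyclic permutations of (0, 1, 2), 0 if indices repeat
    return (i - j) * (j - k) * (k - i) // 2

def cross(x, y):
    return np.array([sum(levi_civita(i, j, k) * x[i] * y[j]
                         for i in range(3) for j in range(3))
                     for k in range(3)])

x, y = np.array([1, 2, 3]), np.array([4, 5, 6])
print(cross(x, y), np.cross(x, y))   # both: [-3  6 -3]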
It is opportune to consider a set of ’exercises’ in order to demonstrate
the use of the two products for the discussion of geometric and trigonometric
problems.
• Exercise 1: Calculate the distance between the end points of the vectors
a = (1, 1, 1) and b = (3, 0, 4) and determine the angle between the two
vectors (Fig. 3.11a). The units of the length may be cm, m, . . . .
Answer 1: The first question can be answered by calculation of the magnitude of the difference vector D = |a - b| = \sqrt{14} = 3.7417 \ldots (Fig. 3.11a). The answer to the second question requires the knowledge of the lengths of the two vectors (a = \sqrt{3} , \ b = 5) . The scalar product (a \cdot b = 7) and its qualitative definition yield \cos \varphi_{ab} = 7/(5\sqrt{3}) = 0.8083 \ldots and hence \varphi_{ab} = 0.6296 \ldots rad.
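The numbers of this exercise are reproduced by a few lines of code. A sketch (assuming the Python library numpy):

# Distance and angle between a = (1, 1, 1) and b = (3, 0, 4)
import numpy as np

a = np.array([1.0, 1.0, 1.0])
b = np.array([3.0, 0.0, 4.0])
print(np.linalg.norm(a - b))                              # 3.7416...  (= sqrt(14))
cos_phi = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_phi, np.arccos(cos_phi))                        # 0.8083..., 0.6296... rad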
• Exercise 2: Determine the equation of the straight line through the end points of the vectors a and b (Fig. 3.11b).
Answer 2: Each point of the straight line is described by the equation x = a + s(b - a) . The parameter s takes the values -\infty \le s \le \infty (Fig. 3.11b). The components of this vectorial form of an equation for a straight line in space correspond to

x_i = a_i + s(b_i - a_i) \qquad i = 1, 2, 3 .
Fig. 3.11. Applications of vector calculus: (a) distance, (b) straight line
• Exercise 3: Determine the equation of a plane in space which contains the


end point of the vector r 0 and is perpendicular to a given direction n
(Fig. 3.12).
Fig. 3.12. Application: plane in space
Answer 3: The difference vector r - r_0 lies in the plane if the end point of the vector r is contained in the plane. The orthogonality of the vectors r - r_0 and n is expressed by the scalar product (r - r_0) \cdot n = 0 . All points of the plane (as end points of the vector r) satisfy this equation, which is known as the Hesse canonical form or Hesse normal form. An equation characterising the plane explicitly is obtained by transition to components (use (x, y, z))

(x - x_0) n_x + (y - y_0) n_y + (z - z_0) n_z = 0 .

This corresponds to the standard version of an equation of a plane in space which is used in analytic geometry (see also Math.Chap. 4.1).
• Exercise 4: Calculate the area of a triangle in space which is spanned by
the end points of the vectors r 1 , r 2 , r 3 (Fig. 3.13a).
Answer 4: Calculate e.g. the difference vectors a = r_3 - r_1 and b = r_2 - r_1 (other combinations are possible) and evaluate the vector product of a and
b . The area of the triangle is F(\triangle) = |(a \times b)|/2 . The transition to the representation in terms of components yields the relation

F(\triangle) = \frac{1}{2} \left[ (a_2 b_3 - a_3 b_2)^2 + (a_3 b_1 - a_1 b_3)^2 + (a_1 b_2 - a_2 b_1)^2 \right]^{1/2} ,

which is not obtained that easily with elementary means.
• Exercise 5: Determine the shortest distance between the end point of the vector r_0 and the straight line g = r_1 + s r_2 with -\infty \le s \le \infty (Fig. 3.13b).
Fig. 3.13. Applications of the vector product: (a) area of a triangle, (b) determination of distances
Answer 5: Consider an arbitrary point of the straight line characterised by the parameter s_1 . Obtain the difference vector

a = r_0 - g_1 = r_0 - r_1 - s_1 r_2 ,

which starts at the point of the straight line and ends at the end point of the vector r_0 . The shortest distance is characterised by d = |a| \sin \varphi , where \varphi is the angle between the straight line and the vector a . This angle can be determined via the relation |(a \times r_2)| = a r_2 \sin \varphi . The vector product (a \times r_2) can be written as

(a \times r_2) = (r_0 - r_1) \times r_2 ,

so that the shortest distance is given by

d = \frac{|(a \times r_2)|}{r_2} = \frac{|(r_0 - r_1) \times r_2|}{r_2} .
The definition of the two basic products of vectors opens the possibility to consider more complicated products composed of three, four or more vectors. A product that is used often is a product of three vectors a, b, c , the spat product (also known as parallelepipedal product – the more official name – triple product, triple scalar product and mixed product). It is defined as

(a \, b \, c) = a \cdot (b \times c) .
This product, a scalar quantity, represents the volume of a parallelepiped which is spanned by the three vectors (Fig. 3.14).

Fig. 3.14. The parallelepipedal product
The vector (b \times c) is perpendicular to the plane spanned by the two vectors and represents a measure of the area marked by them. The projection of the vector a onto the vector (b \times c) describes the height of the parallelepiped. The volume of the parallelepiped, according to the formula (base area) times (height), is therefore (an alternative formula using 3 \times 3 determinants is given in Math.Chap. 3.2.4)

V(\text{pepi}) = (a \, b \, c) = a b c \, \cos \varphi_{a, b \times c} \, \sin \varphi_{bc} ,

or in terms of the components of the vectors involved

V(\text{pepi}) = a_1 (b_2 c_3 - b_3 c_2) + a_2 (b_3 c_1 - b_1 c_3) + a_3 (b_1 c_2 - b_2 c_1) ,

or alternatively in a compact form

V(\text{pepi}) = \sum_{i,j,k=1}^{3} \varepsilon_{ijk} \, a_i b_j c_k .
Use of the decomposition in terms of components (or a more geometrical
argumentation) shows that the value of the spat product does not change
under cyclic permutation of the order of the vectors. There is a change of
sign for anticyclical permutations
(a b c) = (b c a) = (c a b) = −(b a c) = −(c b a) = −(a c b) .
The spat product represents the ’oriented’ volume of a parallelepiped.
The triple vector product
v = (a × (b × c))
leads to a vector which is represented by a linear combination of the vectors b
and c. The coefficients correspond to the scalar product of the other vectors
involved
v = (a · c) b − (a · b) c .
This decomposition is known as Grassmann’s expansion or Grassmann’s theorem. As an illustration of the proof, consider the 1 -component of the vector v . One starts with

v_1 = a_2 (b \times c)_3 - a_3 (b \times c)_2
(evaluation of the outer vector product)
    = a_2 (b_1 c_2 - b_2 c_1) - a_3 (b_3 c_1 - b_1 c_3)
(evaluation of the inner vector products)
    = b_1 (a_1 c_1 + a_2 c_2 + a_3 c_3) - c_1 (a_1 b_1 + a_2 b_2 + a_3 b_3)
(sort, add and subtract a suitable term)
    = (a \cdot c) \, b_1 - (a \cdot b) \, c_1 .
A similar argument can be given for the other components of the vector v .
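Grassmann’s expansion (and the cyclic invariance of the spat product) can be spot-checked with random vectors. A sketch (assuming the Python library numpy):

# a x (b x c) = (a.c) b - (a.b) c  and  (a b c) = (b c a) for random vectors
import numpy as np

rng = np.random.default_rng(0)
a, b, c = rng.standard_normal((3, 3))
lhs = np.cross(a, np.cross(b, c))
rhs = np.dot(a, c)*b - np.dot(a, b)*c
print(np.allclose(lhs, rhs))                                             # True
print(np.isclose(np.dot(a, np.cross(b, c)), np.dot(b, np.cross(c, a))))  # True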
Additional products, as e.g. the products with four vectors

(a \times b) \cdot (c \times d) \;\Longrightarrow\; a scalar
(a \times b) \times (c \times d) \;\Longrightarrow\; a vector

are encountered occasionally in physics (mechanics). They will not be discussed here.
In the next section a short addendum to the discussion of vectors is offered.
Two topics, which can be included under the heading ’Linear Algebra’ are
introduced here: ’n-dimensional (Euclidian) vector spaces’ and ’nonorthog-
onal (oblique) coordinate systems’. A full discussion of these topics will be
taken up at a later stage.
3.1.3 Addendum I: n-dimensional vector spaces
The following request might tax the ability of abstraction: envisage a space which is spanned by n (with n being larger than 3) mutually perpendicular unit vectors. Such spaces can be discussed in mathematical terms without any difficulties even if there are problems with the imagination.
A set of vectors, which are supposed to span the space in question, may be denoted by

e_1, e_2, e_3, \ldots, e_n .

The expected properties of these vectors can (in analogy to the situation in the three-dimensional space) be expressed by the orthogonality relations

(e_i \cdot e_k) = \delta_{ik} \qquad i, k = 1, 2, \ldots, n .

The postulate, that n vectors have the length 1 and are mutually perpendicular, only makes sense if it is possible to define and to implement the basic concepts of geometry – lengths, distances and angles – in this space in an unambiguous fashion.
The first step towards this aim is an extension of the decomposition into
components. An arbitrary vector a in this space can be expressed in terms
of the basis envisaged as
a = a1 e1 + · · · + an en = Σ_{i=1}^{n} a_i e_i    with    a_i = (a · e_i) .
The vector can also be characterised, in a more formal notation, by the
n-tuple of numbers⁶
a = (a1 , a2 , . . . , an ) .
The basis vectors themselves can be expanded in terms of the basis vectors
e_k = Σ_{i=1}^{n} δ_ki e_i .
This shows that e.g. the basis vector ek can be represented by an n-tuple
with the number 1 at the k-th position
ek = (0, . . . , 1, . . . , 0) .
It is possible to transcribe the vector calculus of three-dimensional space,
including all calculation rules, to the n-dimensional space on the basis of
these definitions.
• Addition
S = x + y ⇒ (x1 + y1 , . . . , xn + yn ) .
• Multiplication with scalar
R = ax ⇒ (ax1 , . . . , axn ) .
• Subtraction
D = x − y ⇒ (x1 − y1 , . . . , xn − yn ) .
• Scalar product
(x · y) = (Σ_{i=1}^{n} x_i e_i) · (Σ_{k=1}^{n} y_k e_k) = Σ_{i,k=1}^{n} x_i y_k (e_i · e_k) = Σ_{i=1}^{n} x_i y_i .
The basic concepts of geometry can be formulated with the aid of these
operations. The length of a vector is given by the square root of the scalar
product
[a · a]^{1/2} = |a| = (a1² + · · · + an²)^{1/2} .
The distance between two points in n-dimensional space, the endpoints of
two vectors b and c, is determined by the magnitude of the difference vector
d = b − c, that is |d| . The scalar product is also used to define the angle
between two vectors
cos ϕ_ab = (a · b) / (|a| |b|) .

⁶ The equal sign is used here instead of the more cautious equivalence in the
sense of a definition.
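These definitions translate directly into numerical form. A minimal sketch
(assuming Python with numpy; the two vectors in R5 are arbitrary examples)
computes a scalar product, a length and an angle in a five-dimensional space:

import numpy as np

x = np.array([1.0, 0.0, 2.0, -1.0, 3.0])    # arbitrary vector in R5
y = np.array([2.0, 1.0, 0.0, 4.0, 1.0])

scalar = np.dot(x, y)                        # (x . y) = sum_i x_i y_i
length = np.sqrt(np.dot(x, x))               # |x| = (x . x)^(1/2)
cos_phi = scalar / (length * np.sqrt(np.dot(y, y)))
print(scalar, length, np.arccos(cos_phi))    # scalar product, length, angle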
• The vector product has been used in three-dimensional space for fixing
the orientation of the trihedron. A generalisation of the vector product to
multidimensional spaces is possible but rather cumbersome. It is preferable
to avoid this discussion as long as the question of orientation is not of
interest.
The multidimensional Euclidean space indicated above can be defined over
the domain of real or of complex numbers. It is denoted by R(n) or Rn for real
n-tuples, for complex n-tuples by C(n) or Cn . The mathematical foundation of
quantum mechanics calls for an additional extension which involves the limit
n −→ ∞ . The corresponding space over the domain of complex numbers is
the Hilbert space C∞ . A four-dimensional space, the Minkowski space, plays
a central role in the (special) theory of relativity. The difference between
Euclidean and Minkowski spaces is indicated in the next section.
3.1.4 Addendum II: nonorthogonal coordinate systems and extensions
A three-dimensional space can also be spanned by three arbitrary vectors as
long as they do not all lie in a plane or coincide (Fig. 3.15). The three basis
vectors do not need to be orthogonal nor do they have to have the length 1 .

Fig. 3.15. Oblique coordinate system
The characterisation of the space is, also in this case, based on the scalar
products
(ei · ek ) = |ei ||ek | cos ϕik = gik i, k = 1, 2, 3 .
The scalar product is commutative
gik = gki
so that there are 6 independent quantities. The quantities gii are associated
with the lengths of the basis vectors. The quantities gik = gki with i ≠ k
characterise the relative position described by (cos ϕik ) . The set of numbers
{gik } is called the metric tensor for this reason⁷. It has the form
g_ik = δ_ik
for the special case of a Cartesian system.

⁷ The term ’tensor’ is discussed briefly in Chap. 6.3.3. More information is
given in Vol. 2.
Two different decompositions of an arbitrary vector into components are
possible for a nonorthogonal coordinate system. The figures illustrate, for
the sake of clarity, the situation in a two-dimensional world. All equations
correspond to three space dimensions.
1. A vector a can be projected orthogonally onto the coordinate directions.
The components are in this case
ai = (a · ei ) i = 1, 2, 3 .
The vector can not be reconstructed from these components
a ≠ Σ_{i=1}^{3} a_i e_i .
2. The vector can be decomposed into vectors parallel to the coordinate axes
a = Σ_{i=1}^{3} a^i e_i .
The two sets of components
a =⇒ (a_1 , a_2 , a_3 )
a =⇒ (a^1 , a^2 , a^3 )
are called the covariant (lower indices, Fig. 3.16a) and contravariant (upper
indices, Fig. 3.16b) components of the vector a .
Fig. 3.16. Decomposition of a vector with respect to a nonorthogonal coordinate
system: (a) covariant, (b) contravariant
The two sets of components must be related. This relation is best charac-
terised by the introduction of a reciprocal coordinate system. The basis
of this system (Fig. 3.17) , which is denoted by upper indices, is defined by
e^1 = (e_2 × e_3)/(e_1 e_2 e_3) ,   e^2 = (e_3 × e_1)/(e_1 e_2 e_3) ,   e^3 = (e_1 × e_2)/(e_1 e_2 e_3) .

Fig. 3.17. Reciprocal coordinate system: e_1 × e_2 −→ e^3

The expression in the denominator is the spat product, e.g.
(e_1 e_2 e_3 ) = e_1 · (e_2 × e_3 ) .
The vectors e^i are perpendicular to the planes which are spanned by the cyclic
complements of the set {e_k} with lower indices. This implies the relation
e^i · e_k = e_k · e^i = δ_ik .
The relation follows from the definition of the spat product for i = k. The
case i ≠ k can be gleaned from one example
e_1 · e^2 = (e_1 e_3 e_1)/(e_1 e_2 e_3) = (e_3 e_1 e_1)/(e_1 e_2 e_3) = 0
as the spat product in the numerator vanishes.
It is possible to expand the basis vectors of the reciprocal system in terms
of the basis vectors of the original system
e^i = Σ_k g^{ik} e_k .
The notation is supposed to be suggestive. The set of expansion coefficients
{g^{ik}} represents the metric tensor of the reciprocal system. The scalar
product of the two sets of basis vectors
(e^i · e_m) = Σ_k g^{ik} (e_k · e_m)
gives
δ_im = Σ_k g^{ik} g_km .
The scalar product of the basis vectors of the reciprocal system themselves
yields
(e^i · e^m) = Σ_{k,k′} g^{ik} g^{mk′} (e_k · e_k′) = Σ_{k,k′} g^{ik} g^{mk′} g_{k′k}
or
(e^i · e^m) = g^{im} .
The argument also shows that the elements {g ik } are unambiguously deter-
mined by the elements {gik } . There exist 6 independent equations for the
determination of 6 quantities. The name ’reciprocal system’ implies that the
inverse relations
e_1 = (e^2 × e^3)/(e^1 e^2 e^3)   etc.
are valid (a proof will not be given).
The contravariant components are the components of a vector with re-
spect to the original basis (see above)
a = Σ_{i=1}^{3} a^i e_i .
The scalar product
(a · e_i) = Σ_k a^k (e_k · e_i)
then shows that the relations
a_i = Σ_k g_ik a^k
are satisfied. The two decompositions are related via the metric tensor. The
covariant components are the components of a vector with respect to the
reciprocal basis
a = Σ_i a_i e^i
because of
(a · e_k) = Σ_i a_i (e^i · e_k) = Σ_i a_i δ_ik = a_k .
The scalar product of two arbitrary vectors a, b can therefore be written in
three different ways
(a · b) = Σ_{i,k} a_i b_k (e^i · e^k) = Σ_{i,k} a_i b_k g^{ik}
        = Σ_{i,k} a^i b^k (e_i · e_k) = Σ_{i,k} a^i b^k g_ik
        = Σ_{i,k} a^i b_k (e_i · e^k) = Σ_i a^i b_i .
This implies (as could also be shown directly)
a^i = Σ_k g^{ik} a_k .
The decompositions can be fully converted into each other with the metric
(or the reciprocal metric) tensor.
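The statements of this section can be checked numerically. The following
sketch (assuming Python with numpy; the oblique basis is an arbitrary example)
constructs the reciprocal basis, both metric tensors and both sets of
components of a vector:

import numpy as np

e = np.array([[1.0, 0.0, 0.0],     # e_1
              [1.0, 1.0, 0.0],     # e_2 (oblique)
              [0.0, 1.0, 2.0]])    # e_3

spat = np.dot(e[0], np.cross(e[1], e[2]))    # spat product (e1 e2 e3)
e_rec = np.array([np.cross(e[1], e[2]),      # reciprocal basis e^1, e^2, e^3
                  np.cross(e[2], e[0]),
                  np.cross(e[0], e[1])]) / spat

g = e @ e.T                                  # metric tensor g_ik
g_rec = e_rec @ e_rec.T                      # reciprocal metric tensor g^ik
print(np.allclose(e_rec @ e.T, np.eye(3)))   # e^i . e_k = delta_ik
print(np.allclose(g_rec @ g, np.eye(3)))     # sum_k g^ik g_km = delta_im

a = np.array([1.0, 2.0, 3.0])                # a vector (Cartesian components)
a_co = e @ a                                 # covariant components (a . e_i)
a_contra = e_rec @ a                         # contravariant components (a . e^i)
print(np.allclose(g @ a_contra, a_co))       # a_i = sum_k g_ik a^k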
Cartesian coordinate systems are sufficient for the representation of the
content of classical mechanics. There exist, however, two areas of physics
which demand the use of oblique coordinate systems:
∗ The coordinate systems have to be adapted to the crystal structure in
crystal physics.
∗ Space and time (times the velocity of light in order to have matching units)
coordinates are combined to form the basis of a four-dimensional space in
the (special) theory of relativity. The basis vectors of this space are still
orthogonal
(e_i · e_k) = 0 for i ≠ k .
The metric is, however, not Euclidean
(e_i · e_i) ≠ 1 for all i.
A distinction between co- and contravariant components is required in both
cases.
3.2 Linear coordinate transformations, matrices and determinants
Linear Algebra is also concerned with the question how the decompositions of
a vector with respect to two different (Cartesian) coordinate systems, which
can be in relative motion, should be related. This question is of interest in
physics as different ’observers’ register the results of an experiment with re-
spect to their specific coordinate systems. The necessary transformation to
the point of view of another system is very simple for the case of uniform rel-
ative motion (at least in classical mechanics). The transformation is already
more complicated if the relative motion is a uniform rotation about an axis.
The compact formulation of linear transformations is based on matrix cal-
culus. The discussion of coordinate transformations will therefore be divided
into two parts (a brief introduction and the actual presentation of the details)
which are separated by the introduction of matrices and matrix calculus. The
section is concluded by the introduction of another useful tool, determinants.

3.2.1 Linear coordinate transformations I

The discussion of coordinate transformations in three-dimensional space can
be quite demanding. It is opportune for this reason to begin with some
examples in a two-dimensional world (Fig. 3.18):
Fig. 3.18. Rotation of the coordinate system in R2
Consider a vector r in R2
r = x1 e1 + x2 e2 ⇒ (x1 , x2 )
which is referred to a Cartesian basis. A second Cartesian coordinate system,
which is rotated by the angle α with respect to the first system, is spanned
by the (primed) unit vectors e′1 and e′2 . The decomposition of the vector r
with respect to the second system is
r = x′1 e′1 + x′2 e′2 .
The following question has to be answered: how can the components x′1 and
x′2 be calculated if the components x1 , x2 and the angle α are known? A
first step towards an answer is the representation of the basis vectors of the
primed system in terms of the basis vectors of the unprimed system. Simple
trigonometric considerations give (see Fig. 3.19a)
e′1 = + cos α e1 + sin α e2
e′2 = − sin α e1 + cos α e2 .
It would also have been possible to proceed in the reverse order and express
e1 and e2 in terms of the basis vectors of the primed system (Fig. 3.19b).
The relevant equations can be obtained from the first set of vector equations

(a) (b)
e 2

e2 e’ 2
e’1
e’2 α
e’1
e1
α
e1
Projection on Projection on
unprimed system primed system
Fig. 3.19. Relation between the coordinate systems
74 3 Linear Algebra

by simple manipulations (multiply by cos α, − sin α, add, etc.)


e1 = cos α e1 − sin α e2
e2 = sin αe1 + cos α e2 .
Comparison of the two sets of transformation equations shows, as expected,
that the unprimed system is obtained from the primed system by a rotation
with the angle (−α) .
The relation between the two sets of components of the vector r can now
be obtained in the following fashion: write
x1 e1 + x2 e2 = x′1 e′1 + x′2 e′2
for two decompositions of the same vector with respect to two systems.
Insertion of one of the transformation equations for the basis vectors, e.g.
for the primed basis, ordering of the terms and comparison of the coefficients
gives
x1 = x′1 cos α − x′2 sin α
x2 = x′1 sin α + x′2 cos α .
The inverse relation is
x′1 = x1 cos α + x2 sin α
x′2 = −x1 sin α + x2 cos α .
The components transform in the same way as the basis vectors. The equa-
tions for the basis vectors are, however, a set of vectorial equations, the
equations for the components are algebraic.
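A minimal numerical sketch (assuming Python with numpy; the angle and the
components are arbitrary) illustrates the transformation of the components
and its inverse:

import numpy as np

alpha = np.pi / 6
x = np.array([1.0, 2.0])                 # components in the unprimed system

D = np.array([[ np.cos(alpha), np.sin(alpha)],
              [-np.sin(alpha), np.cos(alpha)]])

x_primed = D @ x                         # components in the rotated system
print(np.allclose(D.T @ x_primed, x))    # the inverse relation recovers x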
The relevance of these transformations for physics can be gleaned from the
following consideration: assume that x1 (t), x2 (t) characterise the motion of a
point particle and that the primed coordinate system rotates in some fashion,
characterised by α(t) , with respect to the unprimed system. The second set of
transformation equations between the components then allows the calculation
of the time development of the coordinates of the point particle with respect
to the primed system (Fig. 3.20). An example (three-dimensional) is an object
moving on the rotating (about its north-south axis) earth viewed from a
system tied to the earth and a system fixed in space.

Fig. 3.20. Interpretation of the transformation
The situation becomes more involved if a second rotation (about the an-
gle β) of the unprimed coordinate system is considered (Fig. 3.21). For a
description of this situation three coordinate systems are needed:
1) the unprimed,
2) the once primed as before, rotated by the angle α with respect to the
unprimed,
3) a double primed. This is rotated by the angle β with respect to system 2
and by the angle (α + β) with respect to system 1.

Fig. 3.21. Three coordinate systems, two rotations
A vector r (in the plane) can be decomposed with respect to each of the three
systems and transformation equations between the three sets of components
can be given, for example
x″1 = x′1 cos β + x′2 sin β
x″2 = −x′1 sin β + x′2 cos β
(rotation β : 2 −→ 3) .
The transformation between the components of system 1 and system 3 is
obtained by insertion of the first set of equations into this set. This yields
x″1 = x1 {cos α cos β − sin α sin β} + x2 {sin α cos β + cos α sin β}
which can, as might have been expected, be written as
x″1 = x1 cos(α + β) + x2 sin(α + β) .
The corresponding equation for the 2 -coordinate is
x″2 = −x1 sin(α + β) + x2 cos(α + β) .
A sequence of transformations (here rotations in a plane) can be described
correctly by insertion of the relevant equations into each other. These explicit
insertions are not easily handled if a larger number of transformations has to
be considered or if more than two space dimensions are involved. Some effort is
already required in the case of three space dimensions. An elegant tool for the
formulation of transformations in multidimensional spaces is matrix calculus
which is introduced in the next section.
3.2.2 Matrices
The simple basic definition is⁸:
A rectangular arrangement of (real) numbers is called a matrix.
Direct examples are
⎛ 1 2 3 ⎞        ⎛ a11 a12 a13 ⎞
⎝ 3 1 1 ⎠   or   ⎝ a21 a22 a23 ⎠ .
The indices in the example on the right hand side denote the position of the
element in the rows and columns. The element aik is found in the i-th row
and the k-th column⁹. A general form of a matrix is therefore
⎛ a11  a12  . . .  a1N ⎞   ↑
⎜ a21  a22  . . .  a2N ⎟   M rows
⎜  ..   ..          .. ⎟
⎝ aM 1 aM 2 . . . aM N ⎠   ↓
  ←−−−  N columns  −−−→
The customary notation for a matrix is (additional variants are possible):
• A in order to indicate a matrix.
• (aik )M,N in order to indicate the elements and the number of rows and
columns.
• (A)M,N in order to indicate only the dimension of the matrix.
The following examples can be taken from the discussion of transformations
in two space dimensions:
(1) The matrix of the coefficients of the transformation equation for rotations
in a two dimensional world
D(α) = ⎛  cos α   sin α ⎞
       ⎝ − sin α  cos α ⎠ .
This is an example of a 2 × 2 matrix. A matrix with the same number of
rows and columns (M = N ) is called a square matrix.
(2) The representation of the components of a vector in R2 or in R3 (now
definitely with the equal sign)
r = ⎛ x1 ⎞   or   (x1 , x2 )         in R2 ,
    ⎝ x2 ⎠
r = ⎛ x1 ⎞
    ⎜ x2 ⎟   or   (x1 , x2 , x3 )    in R3
    ⎝ x3 ⎠
(with or without commas).

⁸ The restriction to real numbers is not necessary, but will be used in the
following.
⁹ Note that there is no agreement concerning this allocation. It is advisable
to check the position of row and column indices in each text.
These are examples for matrices with one row or one column. The rep-
resentation of vectors as one row or one column is used alternatively
depending on the situation and typographical convenience.
In the spirit of the last example: the set of numbers of a general M × N
matrix
(ai1 . . . aiN )
is called the i-th row vector, the set of numbers
⎛ a1k ⎞
⎜  ..  ⎟
⎝ aM k ⎠
is called the k-th column vector.
The basic statement concerning matrix calculus is: it is possible to handle
matrices (nearly!) in the same way as numbers. The discussion of the various
mathematical operations with matrices has to be preceded by a completion
of the list of terms and concepts which are used in this context.
(i) The chain of elements a11 , a22 , . . . of a matrix A is called the main or
principal diagonal
A = ⎛ a11  . . .       ⎞
    ⎜      a22         ⎟ .
    ⎜           . . .  ⎟
    ⎝            aN N  ⎠
(ii) A matrix, which is reflected with respect to the main diagonal, is the
transposed matrix AT (variants of notation are indicated)
A = ⎛ a11 a12 a13 . . . ⎞
    ⎜ a21 a22 a23 . . . ⎟          =⇒
    ⎜ a31 a32 a33 . . . ⎟
    ⎝  ..   ..   ..     ⎠

AT = {Ã = A∗ } = ⎛ a11 a21 a31 . . . ⎞
                 ⎜ a12 a22 a32 . . . ⎟
                 ⎜ a13 a23 a33 . . . ⎟ .
                 ⎝  ..   ..   ..     ⎠
An example is
⎛ 1 2 3 ⎞T   ⎛ 1 3 ⎞
⎝ 3 1 1 ⎠  = ⎜ 2 1 ⎟ .
             ⎝ 3 1 ⎠
The relation
(AT )T = A
can be obtained on the basis of the definition. The transposed of a transposed
matrix is the original matrix.
(iii) Two matrices A and B are called similar if the number of rows and the
number of columns is the same
M A = MB N A = NB .
(rows) (columns)
(iv) Two matrices A and B are called equal if they are similar and if the
elements at each position agree
M A = MB N A = NB and aik = bik .
This is expressed by A = B .
Mathematical operations with matrices are addition, multiplication with
a number, subtraction, matrix multiplication and matrix inversion.
• The definition of the addition of matrices follows the definition of the
addition of vectors which may, as indicated above, be interpreted as a
particular matrix.
The matrix C is called the sum of two similar matrices A = (aik )
and B = (bik ) if its elements are given by cik = aik + bik :
C = A + B with cik = aik + bik .
A direct example says more in this case than any further explanation
⎛ 1 2 3 ⎞   ⎛ 2 3 4 ⎞   ⎛ 3 5 7 ⎞
⎝ 3 1 1 ⎠ + ⎝ 4 2 2 ⎠ = ⎝ 7 3 3 ⎠ .
Vector addition (here with columns) is a special case
⎛ x1 ⎞   ⎛ y1 ⎞   ⎛ x1 + y1 ⎞
⎝ x2 ⎠ + ⎝ y2 ⎠ = ⎝ x2 + y2 ⎠ .
Note again: the addition of matrices is only defined for similar matrices.
• The second operation, the multiplication of a matrix with a (real) num-
ber, can also be regarded as the extension of the corresponding operation
with vectors.
The multiplication of the matrix A with the (real) number α leads
to the matrix C = αA with the elements (cik ) = (α aik ) .
This implies for example
α ⎛ a11 a12 . . . ⎞   ⎛ α a11 α a12 . . . ⎞
  ⎜ a21 a22 . . . ⎟ = ⎜ α a21 α a22 . . . ⎟ .
  ⎝  ..   ..      ⎠   ⎝   ..     ..       ⎠
A set of rules applies to these operations with matrices. They are collected
below without additional comment.
Commutative law of addition: A + B = B + A
Associative law of addition : A + (B + C) = (A + B) + C
Distributive laws : (α + β)A = αA + βA
: α(A + B) = αA + αB
Rules for transposition : (αA)T = αAT
: (A + B)T = AT + BT
Multiplication with α = 0 yields a zero matrix or null matrix
0 ⎛ a11 a12 . . . ⎞   ⎛ 0 0 . . . ⎞
  ⎜ a21 a22 . . . ⎟ = ⎜ 0 0 . . . ⎟ .
  ⎝  ..   ..      ⎠   ⎝ ..  ..    ⎠
• The difference of two similar matrices can be defined by combination of
the two operations above
D = A + (−1)B = A − B with dik = aik − bik .
• The definition of the multiplication of two matrices is fashioned, as
indicated above, after the manipulation of a sequence of transformations.
Many problems and questions of mathematics and physics can be formu-
lated in a concise manner using matrix multiplication. The definition of
this operation is more involved
The matrix C = (cik )M,R with M rows and R columns and the elements
c_ik = Σ_{j=1}^{N} a_ij b_jk
represents the product of the matrix A = (aik )M,N with M rows and N
columns with the matrix B = (bik )N,R with N rows and R columns
C = AB .
This definition calls for an explanation. Concentrate on the i-th row of the
matrix A with M rows and N columns and the k-th column of the matrix
B with N rows and R columns
⎛     ..            ⎞ ⎛ b1k ⎞   ⎛     ..          ⎞
⎜ ai1 ai2 . . . aiN ⎟ ⎜ b2k ⎟ = ⎜ . . . cik . . . ⎟
⎝     ..            ⎠ ⎜  ..  ⎟   ⎝     ..          ⎠
   ←−−  N  −−→        ⎝ bN k ⎠      ←−−  R  −−→
The element cik of the product matrix has the form
cik = ai1 b1k + ai2 b2k + . . . + aiN bN k .
This implies that the first element of the i-th row of A is multiplied by the
first element of the k-th column of B, plus the product of the corresponding
second elements, and so on. The rule to remember is: each row of the matrix
A is combined in this fashion with each column of the matrix B. As the
matrix A has M rows and the matrix B has R columns, a product matrix
with M rows and R columns is obtained. The operation is only defined if the
shapes of the two factors are matched. The number of columns of A has to
agree with the number of rows of B .
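The row-times-column rule can be summarised in a few lines of code. The
following sketch (in plain Python, as an illustration only) implements the
definition directly with explicit loops:

def matmul(A, B):
    # product C = AB of an M x N matrix A with an N x R matrix B,
    # both given as lists of rows; c_ik = sum_j a_ij * b_jk
    M, N, R = len(A), len(B), len(B[0])
    assert all(len(row) == N for row in A), "shapes must be matched"
    C = [[0.0] * R for _ in range(M)]
    for i in range(M):                   # each row of A ...
        for k in range(R):               # ... combined with each column of B
            C[i][k] = sum(A[i][j] * B[j][k] for j in range(N))
    return C

print(matmul([[1, 2, 3], [3, 1, 1]], [[1, 0], [0, 1], [1, 1]]))
# [[4, 5], [4, 2]]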
The definition is also illustrated by a number of examples. The first ex-
ample illustrates the formal execution of the matrix multiplication
⎛ a11 a12 a13 ⎞ ⎛ b11 b12 ⎞
⎝ a21 a22 a23 ⎠ ⎜ b21 b22 ⎟ =
                ⎝ b31 b32 ⎠
⎛ a11 b11 + a12 b21 + a13 b31   a11 b12 + a12 b22 + a13 b32 ⎞
⎝ a21 b11 + a22 b21 + a23 b31   a21 b12 + a22 b22 + a23 b32 ⎠ .
The product of a 2 × 3 matrix with a 3 × 2 matrix yields a 2 × 2 matrix. The
outer indices in each term of the sums mark the position (row on the left,
column on the right) of the elements of the product matrix.
The second example is just numerical (please check!)
⎛ ⎞⎛ ⎞ ⎛ ⎞
1 1 1 1 1 2 3 4 4 8 12 16
⎜ ⎟⎜ ⎟ ⎜ ⎟
⎜2 2 2 2⎟⎜1 2 3 4⎟ ⎜ 8 16 24 32 ⎟
⎜ ⎟⎜ ⎟ = ⎜ ⎟
⎜3 3 3 3⎟⎜1 2 3 4⎟ ⎜ 12 24 36 48 ⎟ .
⎝ ⎠⎝ ⎠ ⎝ ⎠
4 4 4 4 1 2 3 4 16 32 48 64
The third example illustrates the transformation law between the components
of a vector in two coordinate systems in R2 , which are rotated with respect
to each other, in terms of the matrix notation. Write
x = ⎛ x1 ⎞    for the components of the vector in the unprimed system,
    ⎝ x2 ⎠
x′ = ⎛ x′1 ⎞  for the components of the vector in the primed system and
     ⎝ x′2 ⎠
D = ⎛ d11 d12 ⎞  for the rotation matrix which mediates the transition
    ⎝ d21 d22 ⎠  between the two systems.
The matrix relation is then
x′ = Dx =⇒ ⎛ x′1 ⎞ = ⎛ d11 d12 ⎞ ⎛ x1 ⎞ = ⎛ d11 x1 + d12 x2 ⎞ .
           ⎝ x′2 ⎠   ⎝ d21 d22 ⎠ ⎝ x2 ⎠   ⎝ d21 x1 + d22 x2 ⎠
The fourth example illustrates the consecutive application of transformations
D(β)D(α) = ⎛  cos β   sin β ⎞ ⎛  cos α   sin α ⎞
           ⎝ − sin β  cos β ⎠ ⎝ − sin α  cos α ⎠
         = ⎛  cos α cos β − sin α sin β     sin α cos β + cos α sin β  ⎞
           ⎝ − cos α sin β − sin α cos β   − sin α sin β + cos α cos β ⎠
         = ⎛  cos(α + β)   sin(α + β) ⎞  = D(α + β) .
           ⎝ − sin(α + β)  cos(α + β) ⎠
This equation should be read in the following fashion: A rotation with the
angle α in R2 followed by a rotation with the angle β is equivalent to a
rotation with the angle α + β .
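The composition rule can be confirmed numerically. A short check (a sketch,
assuming Python with numpy; the two angles are arbitrary):

import numpy as np

def D(angle):
    # rotation matrix in the convention used in the text
    return np.array([[ np.cos(angle), np.sin(angle)],
                     [-np.sin(angle), np.cos(angle)]])

alpha, beta = 0.4, 1.1
print(np.allclose(D(beta) @ D(alpha), D(alpha + beta)))   # prints True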
The fifth example shows that a system of M linear equations in N un-
knowns
a11 x1 + a12 x2 + . . . + a1N xN = b1
a21 x1 + a22 x2 + . . . + a2N xN = b2
..
.
aM 1 x1 + aM 2 x2 + . . . + aM N xN = bM
can be written as a matrix equation
Ax = b
if the definitions
A = ⎛ a11  . . . a1N  ⎞    x = ⎛ x1 ⎞    b = ⎛ b1 ⎞
    ⎜  ..        ..   ⎟        ⎜ ..  ⎟        ⎜ ..  ⎟
    ⎝ aM 1 . . . aM N ⎠        ⎝ xN ⎠        ⎝ bM ⎠
are used.
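The matrix form is also the form in which such systems are treated
numerically. A sketch (assuming Python with numpy; the 3 × 3 system is an
arbitrary illustrative example):

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

x = np.linalg.solve(A, b)       # solves the matrix equation A x = b
print(np.allclose(A @ x, b))    # prints True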
The examples 3 and 5 hint at the fact that there might be some connection
between the discussion of systems of linear equations and the transformation
of vectors (in spaces of higher dimensionality).
Scalar products of vectors can be expressed in matrix form. Using the
representation of vectors in the form of columns and keeping in mind the
statements concerning transposition one may write e.g. in R3
y T x = (y1 y2 y3 ) ⎛ x1 ⎞
                    ⎜ x2 ⎟ = (y1 x1 + y2 x2 + y3 x3 ) =⇒ (x · y) .
                    ⎝ x3 ⎠
A multiplication of a row vector with a column vector results in a 1 × 1
matrix, a scalar.
There exist a number of rules for matrix multiplication, which could be
proven in detail. This will not be done as the rules are, with one exception,
similar to the rules for handling numbers. A comment will, however, be
offered in each case.
Rule 1: The associative law
(A · B) · C = A · (B · C)
is valid. The shape of the matrices has to be matched though
(A)M N (B)N R (C)RS = (D)M S .
Rule 2: Distributive laws are
(A + B)C = AC + BC C(A + B) = CA + CB .
Rule 3: The multiplication of a matrix product with a scalar (number) can
be handled in different ways
(αA)B = A(αB) = α(AB) .
Rule 4: Matrix multiplication is, as already mentioned, not commutative.
The order of the factors can (in general) not be exchanged
in general AB ≠ BA .
This is quite apparent for the case of rectangular matrices, as e.g.
(A)M N (B)N M = (C)M M ,   (B)N M (A)M N = (C)N N .
The original order yields a M ×M matrix, the second product a N ×N ma-
trix. However, even for square matrices commutativity does not necessarily
hold. This can be demonstrated with a simple example
AB = ⎛ 0 1 ⎞ ⎛ 1 0 ⎞ = ⎛ 0 0 ⎞
     ⎝ 0 0 ⎠ ⎝ 0 0 ⎠   ⎝ 0 0 ⎠
BA = ⎛ 1 0 ⎞ ⎛ 0 1 ⎞ = ⎛ 0 1 ⎞ .
     ⎝ 0 0 ⎠ ⎝ 0 0 ⎠   ⎝ 0 0 ⎠
On the other hand, square matrices can be commuted in some cases. It can
e.g. be checked explicitly that this is the case for two rotations in the plane
D(α)D(β) = D(β)D(α) = D(α + β) .
The interpretation of this statement is: it does not matter in which sequence
the two rotations in the plane are executed. The situation is quite different
for rotations in the three dimensional world (see e.g. Chap. 6.2 and 6.3).
Rule 5: A useful rule concerns the transposition of products
(AB)T = BT AT .
The transposed of a matrix product is the product of the transposed factors
in a reversed order. The proof contains the statements:
(a) The rule for the matching of shapes demands the reversed order
(A)M R (B)RN = (C)M N −→ CT = (C̃)N M
BT AT = (B̃)N R (Ã)RM = (C̃)N M .
(b) Explicit consideration of the element with the index ik on both sides
of the equation and comparison of these elements is then sufficient.
Rule 6: The discussion of multiplication raises the question of the unit
matrix. This matrix is defined as
E = (δik )N N = ⎛ 1 0 0 . . . 0 0 ⎞
                ⎜ 0 1 0 . . . 0 0 ⎟
                ⎜  ..  ..      .. ⎟ .
                ⎜ 0 0 0 . . . 1 0 ⎟
                ⎝ 0 0 0 . . . 0 1 ⎠
The unit element of matrix calculus is a square matrix with the number 1
at each position of the main diagonal and 0 in the off-diagonal elements.
Its properties are
(A)M N (E)N N = (E)M M (A)M N = (A)M N
or written in short hand
AE = EA = A .
The last mathematical operation to be discussed is ’matrix division’ or
more correctly the question of the existence of an inverse matrix. A matrix
B with the property
AB = E
is called the inverse of the matrix A . The notation is
B = A−1 .
A distinction between the left inverse and the right inverse is necessary as
matrix multiplication is in general not commutative
right inverse:  A A−1_R = E
left inverse:   A−1_L A = E .
The situation is not simple concerning matrices of a general shape. This is
illustrated by the following example for a 3 × 2 matrix
A = ⎛ 1  2 ⎞
    ⎜ 1 −2 ⎟ .
    ⎝ 1  2 ⎠
The left inverse exists as one has
⎛ 1/2  1/2  0 ⎞ ⎛ 1  2 ⎞   ⎛ 1 0 ⎞
⎝ 1/4 −1/4  0 ⎠ ⎜ 1 −2 ⎟ = ⎝ 0 1 ⎠ .
                ⎝ 1  2 ⎠
For the right inverse one ought to have
⎛ 1  2 ⎞                 ⎛ 1 0 0 ⎞
⎜ 1 −2 ⎟ ⎛ a b c ⎞   =   ⎜ 0 1 0 ⎟ .
⎝ 1  2 ⎠ ⎝ d e f ⎠       ⎝ 0 0 1 ⎠
The first column of the matrix product on the left hand side of this equation
demands
a + 2d = 1
a − 2d = 0     −→ a = d = 0 =⇒ contradiction .
a + 2d = 0
The relations can not be satisfied. The right inverse does not exist. The
following statements can, however, be made for square matrices:
• The existence of one of the inverses implies the existence of the other (the
proof is lengthy and calls for tools which have not been prepared).
• The right and the left inverse matrices are then equal
A−1_L = A−1_R = A−1 .
The proof is
A−1_L = A−1_L E = A−1_L A A−1_R = E A−1_R = A−1_R .
Square matrices which have an inverse are called invertible or regular, square
matrices without an inverse are called singular. A direct criterion, which al-
lows an answer to the question of the existence of the inverse of a square ma-
trix, will be found during the discussion of determinants (Math.Chap. 3.2.4).
There exist some useful rules for operations with regular matrices:
1. (AB)−1 = B−1 A−1          (the inverse of a product)
2. (A−1 )−1 = A              (the inverse of the inverse)
3. (αA)−1 = (1/α) A−1 ,  α ≠ 0
4. (AT )−1 = (A−1 )T         (exchange of operations)
The corresponding proofs are collected below (without comment):
1. (AB)−1 = B−1 A−1 .
Proof:
E = B−1 B = B−1 EB = (B−1 A−1 )(AB)
E = (AB)−1 (AB)
⇐⇒ (AB)−1 = B−1 A−1 .
2. (A−1 )−1 = A .
Proof:
A−1 A = E =⇒ (A−1 A)−1 = E−1 = E
With 1. : A−1 (A−1 )−1 = (AA−1 )−1 = E =⇒ (A−1 )−1 = A .
3. (αA)−1 = (1/α) A−1 .
Proof:
(αA)−1 (αA) = E =⇒ α (αA)−1 A = E
=⇒ (αA)−1 = (1/α) E A−1 = (1/α) A−1 .
4. (AT )−1 = (A−1 )T .
Proof:
(A−1 A)T = ET = E
=⇒ AT (A−1 )T = E        (see rule for transposition)
=⇒ (AT )−1 AT (A−1 )T = (AT )−1
(A−1 )T = (AT )−1   as   (AT )−1 AT = E .
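The rules can also be checked numerically. A short sketch (assuming Python
with numpy; the two regular 2 × 2 matrices are arbitrary examples) confirms
rules 1 and 4:

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 1.0]])
B = np.array([[1.0, 3.0], [0.0, 1.0]])
inv = np.linalg.inv

print(np.allclose(inv(A @ B), inv(B) @ inv(A)))   # rule 1
print(np.allclose(inv(A.T), inv(A).T))            # rule 4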
The discussion of linear coordinate transformations can now be continued
using matrix calculus as a means to formulate the theory more concisely.

3.2.3 Linear coordinate transformations II
The compact mathematical language for the presentation of coordinate
transformations allows the addition of a few details in the two-dimensional
world.
The rotations considered in part I
⎛ x′1 ⎞ = ⎛  cos α   sin α ⎞ ⎛ x1 ⎞
⎝ x′2 ⎠   ⎝ − sin α  cos α ⎠ ⎝ x2 ⎠
can be interpreted in two different ways.
The previous interpretation has been: the coordinate system has been ro-
tated while the vector x remained fixed. The equations represent the connec-
tion between the components of one vector with respect to the two coordinate
systems. From the point of view of the vectors this could be called a passive
approach (Fig. 3.22a).
It is also possible to take the point of view that the vector is rotated
while the coordinate system stays fixed. The interpretation of the equation
is in this case: the vector x′ is rotated by the angle (−α) with respect to the
vector x . This statement is supported by the following argument
x′ = ⎛  cos α   sin α ⎞ ⎛ x cos ϕ ⎞
     ⎝ − sin α  cos α ⎠ ⎝ x sin ϕ ⎠
(x is a vector of length x in the ϕ direction)
   = x ⎛  cos α cos ϕ + sin α sin ϕ ⎞ = x ⎛ cos(ϕ − α) ⎞ .
       ⎝ − sin α cos ϕ + cos α sin ϕ ⎠    ⎝ sin(ϕ − α) ⎠
The vector x′ is a vector with the same length pointing in the direction given
by the angle (ϕ − α) . From the point of view of the vectors this is the active
approach (Fig. 3.22b).

(a) (b)
x x

S’ x’
−α

α ϕ
S S
passive active
Fig. 3.22. Passive and active interpretation of rotations

A general linear transformation in R2 has the form
x′ = Ax + b =⇒ ⎛ x′1 ⎞ = ⎛ a11 a12 ⎞ ⎛ x1 ⎞ + ⎛ b1 ⎞ .
               ⎝ x′2 ⎠   ⎝ a21 a22 ⎠ ⎝ x2 ⎠   ⎝ b2 ⎠
The question is: Which operations are represented by this transformation
equation?
The transformation is reduced to
x′ = x + b
if the matrix A is identical with the unit matrix E . The two options for the
interpretation of this equation are:
• The origin of the system S′ is shifted by the vector (−b) with respect to
the origin of the system S . The vector x as viewed from S corresponds to
a vector x′ = x + b as viewed from S′ (passive point of view, Fig. 3.23a).
• A vector x′ is obtained by adding the vector b to the vector x (active
point of view, Fig. 3.23b).
This parallel displacement will not be considered further in the following.
The discussion will concentrate on the homogeneous linear transformation
x′ = Ax .
It begins with the analysis of a few examples on the basis of the active point
of view.
Fig. 3.23. Transformations in R2 : translation; (a) passive, (b) active

(1) The first example is
x′ = ⎛ 2 1 ⎞ ⎛ 1 ⎞ = ⎛ 3 ⎞ .
     ⎝ 2 3 ⎠ ⎝ 1 ⎠   ⎝ 5 ⎠
The original vector (1, 1) is rotated and stretched (Fig. 3.24).

Fig. 3.24. Transformations in R2 : rotation plus stretching

Special cases of this rotation plus stretching are rotation only (as discussed
above) or stretching only, e.g.
x′ = ⎛ α 0 ⎞ x = αx .
     ⎝ 0 α ⎠
(2) The second example
x′ = ⎛ 1 0 ⎞ ⎛ x1 ⎞ = ⎛ x1 ⎞
     ⎝ 0 0 ⎠ ⎝ x2 ⎠   ⎝ 0  ⎠
is a projection onto the 1 - or x -axis (Fig. 3.25a). Every vector x with
the same x1 -component is transformed into the same vector x′ . A more
general transformation is
x′ = ⎛ a 0 ⎞ ⎛ x1 ⎞ = ⎛ ax1 ⎞ .
     ⎝ 0 0 ⎠ ⎝ x2 ⎠   ⎝ 0   ⎠
Fig. 3.25. Transformations in R2 : (a) projection, (b) projection and stretching

This is a projection onto the 1 -axis with additional stretching (Fig. 3.25b).
The two examples differ in the following aspects:
(1) Each vector x is associated with a unique image vector x′ and vice versa
in example (1). The transformation matrix is regular with the inverse
A−1 = ⎛  3/4  −1/4 ⎞ .
      ⎝ −1/2   1/2 ⎠
The matrix equation x′ = Ax can be inverted in the form x = A−1 x′ .
(2) Many vectors x lead to the same image vector x′ in the example (2), the
case of a projection. The transformation matrix is singular. There exists
no inverse matrix of A and a resolution of the transformation in the form
x = A−1 x′ is not possible.
The two examples correspond to a first classification of transformations
A regular −→ rotation plus stretching
A singular −→ projection plus stretching .
A subclass of the regular transformations are the orthogonal transforma-
tions. These transformations are characterised by the statement
Scalar products are invariant with respect to orthogonal transformations.
(Orthogonal transformations do not change scalar products.)
This implies that the length of vectors and the angles between vectors are
not changed. Orthogonal transformations are isometric and isogonal. The fol-
lowing argument is used for a characterisation of orthogonal transformations:
Begin with the vectors
x′ = Ax and y′ = Ay
and postulate the invariance of the scalar product
y′T · x′ = yT · x .
Insert the transformation on the right hand side and obtain
y′T · x′ = yT AT A x .
The two sides agree with each other if the transformation matrix A satisfies
AT A = E
or
AT = A−1 .
The inverse of the transformation matrix of an orthogonal transformation
equals its transposed.
The matrix relation corresponds in R2 to three conditions which restrict
the form of the matrix
⎛ a11 a21 ⎞ ⎛ a11 a12 ⎞   ⎛ 1 0 ⎞
⎝ a12 a22 ⎠ ⎝ a21 a22 ⎠ = ⎝ 0 1 ⎠ ,
or explicitly
(1) a11² + a21² = 1
(2) a11 a12 + a21 a22 = 0
(3) a12² + a22² = 1 .
The following properties of the matrix can be extracted from these relations:
Equations (1) and (3) state that none of the four matrix elements can be
larger than 1 (|aik | ≤ 1 ). It is possible to choose one of the matrix elements
freely (observing the restriction) as there exist three conditions for four ma-
trix elements. The matrix elements are then determined up to a sign by the
conditions stated above.
Choose without loss of generality
a11 = cos α
and obtain from Eq. (1)
a21 = ± sin α .
Equation (2) then gives
a11² a12² = a21² a22² .
Substitute a12² from Eq. (3), resolve with respect to a22²
a22² = a11² /(a11² + a21²) = a11²
and find
a22 = ± cos α .
From Eq. (3) follows finally
a12 = ± sin α .
Only two of the four combinations of signs are compatible with Eq. (2)
For a11 = +a22 follows a21 = −a12 .
For a11 = −a22 follows a21 = +a12 .
The final result can be summarised in the following statements:
(a) The matrix
AD = ⎛  cos α   ∓ sin α ⎞ ≡ D(±α) ,
     ⎝ ± sin α    cos α ⎠
which is obtained for the choice a11 = +a22 , describes rotations of a
vector x by the angle α in the anticlockwise direction for the matrix
element a12 = − sin α. The case a12 = + sin α corresponds then to a
rotation by the angle (−α) in an anticlockwise direction (or by an angle
α in the clockwise direction). An explicit calculation with a unit vector
in the ϕ -direction
D(±α) ⎛ cos ϕ ⎞ = ⎛ cos(ϕ ± α) ⎞
      ⎝ sin ϕ ⎠   ⎝ sin(ϕ ± α) ⎠
illustrates these statements. The notation D is standard for rotation ma-
trices.
(b) The matrix
AS = ⎛  cos α    ± sin α ⎞ ≡ S(±α) ,
     ⎝ ± sin α   − cos α ⎠
which is obtained for a11 = −a22 , represents reflections at straight lines
through the origin. A vector x is reflected at a straight line with the slope
m = ± tan α/2 for the matrix with a12 = ± sin α . The image of a vector
x , which points in the direction ϕ , is a vector pointing in the direction
(α − ϕ) for the reflection at the straight line with m = + tan α/2 .
The statement concerning the reflections can be checked with a simple cal-
culation
x′ = ⎛ cos α    sin α ⎞ ⎛ cos ϕ ⎞ = ⎛ cos(α − ϕ) ⎞ .
     ⎝ sin α  − cos α ⎠ ⎝ sin ϕ ⎠   ⎝ sin(α − ϕ) ⎠
The unit vector x with the direction ϕ and the vector x′ with the direction
(α − ϕ) are mirror images with respect to the straight line with the slope
m = tan α/2 (Fig. 3.26a). The fact that the transformation represents a
reflection and not a rotation can be checked with the following argument:
set ϕ = α/2, and find
x′ = ⎛ cos(α/2) ⎞ = x .
     ⎝ sin(α/2) ⎠
Fig. 3.26. Transformations in R2 : reflections; (a) arbitrary vector x, (b) vector
x in the reflecting straight line
Vectors in the reflecting straight line are not changed (Fig. 3.26b). Every
vector is transformed for a rotation about an angle α ≠ 0 . Rotations and
reflections differ in addition in the following point: the relative orientation
of two vectors is preserved for rotations, it is interchanged for reflections
(Fig. 3.27). This is compatible with the postulate of the invariance of the
scalar product: the cosine function, which features in the definition of the
scalar product, is an even function.

Fig. 3.27. Orthogonal transformations in R2 : the invariance of the scalar product;
(a) for reflections, (b) for rotations
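The difference between the two kinds of orthogonal transformations can be
illustrated numerically. A sketch (assuming Python with numpy; the angle is
arbitrary, and determinants are only introduced in Math.Chap. 3.2.4) checks
that S(α), in contrast to a rotation, reproduces the identity when applied
twice:

import numpy as np

alpha = 0.7
S = np.array([[np.cos(alpha),  np.sin(alpha)],     # reflection matrix S(alpha)
              [np.sin(alpha), -np.cos(alpha)]])

print(np.allclose(S @ S, np.eye(2)))               # S S = E: a reflection
print(np.isclose(np.linalg.det(S), -1.0))          # reflections have det = -1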
The corresponding discussion of linear transformations in R3 is more
involved although similar in spirit. Not all details will be presented for this
reason. A transformation
x′ = Ax −→ ⎛ x′1 ⎞   ⎛ a11 a12 a13 ⎞ ⎛ x1 ⎞
           ⎜ x′2 ⎟ = ⎜ a21 a22 a23 ⎟ ⎜ x2 ⎟
           ⎝ x′3 ⎠   ⎝ a31 a32 a33 ⎠ ⎝ x3 ⎠
can be classified again into
A singular −→ projections (onto straight lines and planes),
as e.g. a projection onto the x1 axis
x′ = ⎛ 1 0 0 ⎞ ⎛ x1 ⎞   ⎛ x1 ⎞
     ⎜ 0 0 0 ⎟ ⎜ x2 ⎟ = ⎜ 0  ⎟ .
     ⎝ 0 0 0 ⎠ ⎝ x3 ⎠   ⎝ 0  ⎠
A regular −→ rotation, stretching, reflection.
Only orthogonal transformations will be discussed in some detail. The
compact notation of the equations for these transformations is identical with
that of the case R2 so that the postulate of the invariance of the scalar
product can again be written as
AT A = E .
The details are different though
Σ_{l=1}^{3} a_li a_lk = δ_ik    (i, k = 1, 2, 3) .
There exist 6 conditions, which restrict the structure of the transformation
matrix, as interchange of the indices i and k does not lead to a new condition.
The original transformation matrix contained 9 elements. Orthogonal trans-
formations in R3 are therefore characterised by (maximally) 3 parameters.
The orthogonal transformations are rotations (about an arbitrary axis
through the origin) and reflections (at planes and straight lines through the
origin) also in this case. A general form of the transformation matrix can be
found. As the argumentation is relatively lengthy (and not of interest in the
present context) the discussion will be restricted to a number of instructive
examples.
Rotations in R3 about the coordinate axis have a simple representation.
The matrix (note the extension of the notation)
D3 (α) = ⎛ cos α  − sin α  0 ⎞
         ⎜ sin α    cos α  0 ⎟
         ⎝ 0        0      1 ⎠
describes an anticlockwise rotation of a vector by the angle α about the 3 -axis
(active point of view). A direct calculation yields
x′ = D3 (α) x = ⎛ x1 cos α − x2 sin α ⎞
                ⎜ x1 sin α + x2 cos α ⎟ .
                ⎝ x3                  ⎠
The 3 -component is (as is expected) not changed, the 1, 2 -components are
transformed in the same fashion as in R2 (Fig. 3.28). Rotations about the
other coordinate axes are represented by matrices with a corresponding
structure. The interpretation depends, however, on the choice of a right-
handed or left-handed coordinate system.

Fig. 3.28. Transformations in R3 : rotation about 3 -axis

The matrix
D2 (β) = ⎛ cos β  0  − sin β ⎞
         ⎜ 0      1    0     ⎟
         ⎝ sin β  0    cos β ⎠
describes a clockwise rotation by the angle β about the 2 -axis for a right-
handed system (the standard choice).

Fig. 3.29. Transformations in R3 : projection of rotation about 2 -axis onto 1 - 3
plane
The complication, that one encounters in the discussion of rotations in
R3 , can be pointed out by the following consideration. Compare the matrices
D23 (βα) = D2 (β)D3 (α)
and
D32 (αβ) = D3 (α)D2 (β) .
A given vector is first rotated by the angle α about the 3 -axis and then by the
angle β about the 2 -axis for D23 (Fig. 3.30). In the second case the rotations
are executed in reverse order (Fig. 3.31). A calculation of the matrices for
the two combinations of rotations yields
D23 = ⎛ cos α cos β   − sin α cos β   − sin β ⎞
      ⎜ sin α           cos α           0     ⎟
      ⎝ cos α sin β   − sin α sin β     cos β ⎠
Fig. 3.30. Consecutive rotations in R3 , sequence 3, 2: (a) starting position,
(b) rotation about 3 -axis, (c) rotation about 2 -axis

Fig. 3.31. Consecutive rotations in R3 , sequence 2, 3: (a) starting position,
(b) rotation about 2 -axis, (c) rotation about 3 -axis
D32 = ⎛ cos α cos β   − sin α   − cos α sin β ⎞
      ⎜ sin α cos β     cos α   − sin α sin β ⎟ .
      ⎝ sin β           0         cos β       ⎠
One finds D23 ≠ D32 . Consecutive rotations about different axes can not be
interchanged.
The special case with α = β = π/2 can be considered for a direct numer-
ical illustration. The result is in this case
x′ = D23 x = ⎛ 0  0 −1 ⎞ ⎛ x1 ⎞   ⎛ −x3 ⎞
             ⎜ 1  0  0 ⎟ ⎜ x2 ⎟ = ⎜  x1 ⎟
             ⎝ 0 −1  0 ⎠ ⎝ x3 ⎠   ⎝ −x2 ⎠
x′ = D32 x = ⎛ 0 −1  0 ⎞ ⎛ x1 ⎞   ⎛ −x2 ⎞
             ⎜ 0  0 −1 ⎟ ⎜ x2 ⎟ = ⎜ −x3 ⎟ .
             ⎝ 1  0  0 ⎠ ⎝ x3 ⎠   ⎝  x1 ⎠
It is apparent that the transformed vectors are not the same. The fact that
rotations about different axes in R3 can not be interchanged, leads to a
number of complications in applications.
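The non-commutativity can be reproduced with a few lines of code. A sketch
(assuming Python with numpy), using the two rotation matrices of the text:

import numpy as np

def D3(a):
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

def D2(b):
    return np.array([[np.cos(b), 0.0, -np.sin(b)],
                     [0.0,       1.0,  0.0      ],
                     [np.sin(b), 0.0,  np.cos(b)]])

a = b = np.pi / 2
print(np.allclose(D2(b) @ D3(a), D3(a) @ D2(b)))   # prints False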
A general rotation in R3 has to be represented by three parameters. One
possibility is the use of two angles to fix the direction of the axis of rotation
and a third angle for the actual rotation. An alternative (and a standard)
choice is to use the three Euler angles. A general rotation is composed of
individual rotations by the Euler angles. The rotation axes used in this
case are: rotations about the z -axis, the x′ -axis (to complicate matters there
are variants which use the y′ -axis instead) and the z″ -axis (for details see
Chap. 6.3.4). This choice is used for the discussion of the theory of tops (the
rotation of rigid bodies).
Reflections at coordinate planes in R3 are represented by simple matrices.
The example
x′ = S12 x = ⎛ 1 0  0 ⎞ ⎛ x1 ⎞   ⎛  x1 ⎞
             ⎜ 0 1  0 ⎟ ⎜ x2 ⎟ = ⎜  x2 ⎟
             ⎝ 0 0 −1 ⎠ ⎝ x3 ⎠   ⎝ −x3 ⎠
describes a reflection of a vector at the 1 - 2 plane. The reflection of a vector
at planes, which are not coordinate planes, can be composed of two rotations
and a simple reflection, for instance with the steps
(1) Determine the rotation, which is necessary to rotate the plane into a
position, so that it coincides with a coordinate plane.
(2) Rotate the original vector in the same fashion.
(3) Reflect the rotated vector at the coordinate plane chosen.
(4) Rotate the reflected vector with the inverse transformation corresponding
to step (1).
These statements describe the active interpretation from a point of view of
the vector. A simple example is the reflection at a plane, which contains the
3 -axis and includes an angle α with the 1 -axis (see Fig. 3.32).

Fig. 3.32. Transformations in R3 : example for a reflection at a plane

The explicit steps are (step (1) is simple):
(2) Apply a rotation about the 3 -axis by the angle −α . The reflecting plane
is now at the position of the 1 - 3 plane. Alternatively, it can be said that
the vector has now the same position with respect to the 1 - 3 plane as it
had originally with respect to the reflecting plane.
(3) Reflect the resulting vector at the 1 - 3 plane
(4) Rotate the reflected vector about the 3 -axis by the angle +α.
This transformation of the vector is described by
x′ = AS x = D3 (α) S13 D3 (−α) x .
The complete transformation matrix is
AS = ⎛ cos 2α    sin 2α   0 ⎞
     ⎜ sin 2α  − cos 2α   0 ⎟ .
     ⎝ 0         0        1 ⎠
It is easy to show that this matrix
(a) does not change a vector along the 3 -axis,
(b) transforms a vector along the 1 -axis into a vector which includes an angle
2α with this axis
AS ⎛ x1 ⎞   ⎛ x1 cos 2α ⎞
   ⎜ 0  ⎟ = ⎜ x1 sin 2α ⎟ ,
   ⎝ 0  ⎠   ⎝ 0         ⎠
(c) transforms a vector along the 2 -axis into a vector which includes an angle
2α − π/2 with the 1 -axis
AS ⎛ 0  ⎞   ⎛  x2 sin 2α ⎞   ⎛ x2 cos(2α − π/2) ⎞
   ⎜ x2 ⎟ = ⎜ −x2 cos 2α ⎟ = ⎜ x2 sin(2α − π/2) ⎟ ,
   ⎝ 0  ⎠   ⎝  0         ⎠   ⎝ 0                ⎠
(d) it does not change a vector in the line of intersection of the 1 - 2 plane
and the reflecting plane
AS ⎛ r cos α ⎞   ⎛ r cos α ⎞
   ⎜ r sin α ⎟ = ⎜ r sin α ⎟ .
   ⎝ 0       ⎠   ⎝ 0       ⎠
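The construction of AS from two rotations and a simple reflection can be
verified numerically. A sketch (assuming Python with numpy; the angle is
arbitrary):

import numpy as np

def D3(a):
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0,        0.0,       1.0]])

S13 = np.diag([1.0, -1.0, 1.0])          # reflection at the 1 - 3 plane
alpha = 0.5
A_S = D3(alpha) @ S13 @ D3(-alpha)       # rotate, reflect, rotate back

expected = np.array([[np.cos(2*alpha),  np.sin(2*alpha), 0.0],
                     [np.sin(2*alpha), -np.cos(2*alpha), 0.0],
                     [0.0,              0.0,             1.0]])
print(np.allclose(A_S, expected))        # prints True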
The consecutive application of two reflections, as e.g. reflections at planes
through the 3 -axis which include the angles β and α with the 1 -axis, results
in
AS (α, β) = AS (α)AS (β) = ⎛ cos 2(α − β)   − sin 2(α − β)   0 ⎞
                           ⎜ sin 2(α − β)     cos 2(α − β)   0 ⎟ .
                           ⎝ 0                0              1 ⎠
The two reflections correspond to a rotation about the 3 -axis (the intersecting
line of the two planes) by the angle 2(α − β) (twice the angle between the
two planes). It can once more be ascertained that the order of the operations
can not be interchanged. The rotation, which follows from the sequence
AS (β, α) = AS (β)AS (α)
is a rotation in the reverse sense, D3 (2(β − α)) .
A last example for a transformation in R3 is the transformation
x′ = ⎛ −1  0  0 ⎞ ⎛ x1 ⎞   ⎛ −x1 ⎞
     ⎜  0 −1  0 ⎟ ⎜ x2 ⎟ = ⎜ −x2 ⎟ .
     ⎝  0  0 −1 ⎠ ⎝ x3 ⎠   ⎝ −x3 ⎠
This transformation describes a reflection at the origin. It plays a particular
role in quantum mechanics and the physics of elementary particles under the
name parity operation.

3.2.4 Determinants
A pragmatic approach to this subject can be based on the consideration of
systems of linear equations. A system
Ax = b
with two unknowns and the explicit form
a11 x1 + a12 x2 = b1
a21 x1 + a22 x2 = b2
has the solution
x1 = (b1 a22 − b2 a12 ) / (a11 a22 − a12 a21 )      x2 = (b2 a11 − b1 a21 ) / (a11 a22 − a12 a21 ) .
The question concerning the existence of a solution can then be answered
directly. The common denominator in the formulae for the solution has to be
different from zero. The denominator can be written as a determinant of the
2 × 2 matrix A
det(A) = |A| = | a11 a12 | = a11 a22 − a12 a21 .
               | a21 a22 |
The first three expressions indicate the variation of nomenclature, the last
entry is the actual definition. The determinant associates a number with a
square matrix.

Square matrix −→ form determinant −→ number

The usefulness of this concept can already be illustrated for the simplest
case, the 2 × 2 determinant:
• The formula for the solution of the system of equations can be given com-
pletely in terms of determinants
x1 = | b1 a12 | / | a11 a12 |        x2 = | a11 b1 | / | a11 a12 |
     | b2 a22 |   | a21 a22 |             | a21 b2 |   | a21 a22 |
These expressions are known as Cramer’s rule¹⁰. The rule states: the
determinant in the denominator is the determinant of the matrix A . The
determinant in the numerator for the unknown xi is obtained by replacing
the i-th column by the vector b .

¹⁰ A corresponding rule exists also in the case of n equations with n unknowns,
see below.
• The determination of the inverse matrix involves also the solution of a
system of linear equations, for example for a 2 × 2 matrix
A−1 A = E → ⎛ c1 c2 ⎞ ⎛ a11 a12 ⎞ = ⎛ 1 0 ⎞ .
            ⎝ c3 c4 ⎠ ⎝ a21 a22 ⎠   ⎝ 0 1 ⎠
The resolution of the system for the matrix elements c1 to c4
a11 c1 + a21 c2 = 1        a11 c3 + a21 c4 = 0
a12 c1 + a22 c2 = 0        a12 c3 + a22 c4 = 1
yields with Cramer’s rule
A−1 = (1/|A|) ⎛  a22  −a12 ⎞ .
              ⎝ −a21   a11 ⎠
The inverse matrix contains the factor |A|−1 . It exists only if the
determinant |A| is different from zero.
The tasks to solve a linear system of equations (of n equations in n un-
knowns) and to find the inverse of a (square) matrix are identical. The
resolution of Ax = b is x = A−1 b .
A geometrical interpretation of these statements is the following: given
are the transformation represented by A and a vector b which results from
transforming the original vector x . The determination of the original vector
corresponds to the determination of the inverse matrix A−1 .
• Determinants can be used to classify transformations. Rotations and
stretching operations are characterised in R2 by det(ADS ) ≠ 0, projec-
tions by det(AP ) = 0 . Concerning orthogonal transformations one finds
det(AD ) = 1 for rotations and det(AS ) = −1 for reflections. Corresponding
statements are valid in R3 (and in Rn ).
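Cramer’s rule and the solution via the inverse matrix can be compared
directly. A sketch (assuming Python with numpy; the 2 × 2 system is an
arbitrary example):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
det = np.linalg.det

x1 = det(np.column_stack((b, A[:, 1]))) / det(A)   # replace 1st column by b
x2 = det(np.column_stack((A[:, 0], b))) / det(A)   # replace 2nd column by b
print(np.allclose([x1, x2], np.linalg.inv(A) @ b)) # prints True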
It is useful to look at 2 × 2 determinants from a different angle before dis-
cussing determinants of square n × n matrices. The columns of the determi-
nant (or the rows) are interpreted as vectors for this purpose
det(A) = | a11 a12 | = det(a1 , a2 ) .
         | a21 a22 |
The vectors a1 , a2 are then complemented to form a vector in R3 (they still
lie in the 1 - 2 plane)
ai = ⎛ a1i ⎞
     ⎜ a2i ⎟    i = 1, 2 .
     ⎝ 0   ⎠
The vector product is
(a1 × a2 ) = ⎛ 0                 ⎞
             ⎜ 0                 ⎟ .
             ⎝ a11 a22 − a12 a21 ⎠
The third component of the vector product is the 2 × 2 determinant. The
product is a null vector (the determinant has the value zero) if one of the
vectors a1 , a2 is a null vector or if the two vectors point in the same or
the opposite direction. These are exactly the conditions which have been
discussed before under the heading ’linear independence’.
Two vectors are linearly dependent if the vector equation
c1 a1 + c2 a2 = 0 can be satisfied with at least one ci ≠ 0 .
Two vectors are linearly independent if the equation can only be
satisfied with c1 = c2 = 0 .
The vector product considered above vanishes if the vectors a1 , a2 are
linearly dependent. The corresponding statement for the 2 × 2 determinant
is:
The determinant det(a1 , a2 ) has the value zero if the column
vectors (the row vectors) are linearly dependent.

A (small) addition to this discussion is the remark: the determinants
of the transformation matrices for rotations and reflections are composed of
of the transformation matrices for rotations and reflections are composed of
column vectors which are orthogonal and normalised to the length 1 . Their
sign is, however, different
det(AD ) = det(a1 , a2 ) = 1 and det(AS ) = det(a1 , −a2 ) = −1 .
The question of the definition of the determinant of higher dimensional
square matrices can be answered either by looking at the corresponding sys-
tem of linear equations or by a discussion of the properties of linearity. The
pragmatic approach is rather cumbersome for larger matrices but gives some
useful hints. The solution of a system of three linear equations for three
unknowns
Ax = b
explicitly
a11 x1 + a12 x2 + a13 x3 = b1
a21 x1 + a22 x2 + a23 x3 = b2
a31 x1 + a32 x2 + a33 x3 = b3
can (do the elementary but not exactly very brief calculation) be expressed
in the form (Cramer’s rule)
x1 = | b1 a12 a13 |  / | a11 a12 a13 |
     | b2 a22 a23 |    | a21 a22 a23 |    and similar expressions for x2 and x3 .
     | b3 a32 a33 |    | a31 a32 a33 |
The 3 × 3 determinants involved are e.g. defined by
det(A) = a11 a22 a33 + a12 a23 a31 + a13 a21 a32
−a13 a22 a31 − a11 a23 a32 − a12 a21 a33 .
The sequence of the indices is arranged according to the rule: the first indices
in each product are in natural order (123) . The second indices of the first
three products with the positive sign correspond to even permutations¹¹ of
the numbers (123)
(123), (231), (312) ,
the second indices of the last three products to odd permutations
(321), (132), (213) .
A more practical rule is the rule of Sarrus. The determinant is aug-
mented by repetition of the first two columns on the right hand side. The
positive contributions correspond to the products of elements in the diagonals
from left to right
| a11 a12 a13 |  a11 a12
| a21 a22 a23 |  a21 a22
| a31 a32 a33 |  a31 a32
(positive diagonals: a11 a22 a33 , a12 a23 a31 , a13 a21 a32 ),
the negative contributions to the products of elements from right to left
(negative diagonals: a13 a22 a31 , a11 a23 a32 , a12 a21 a33 ).
Here is an example to check the rule:
The determinant | 1 2 3 |
                | 2 1 3 |   has the value 9 .
                | 3 1 1 |
¹¹ An even (odd) permutation is obtained by an even (odd) number of transpo-
sitions of two of the numbers starting from (123). Thus (231) is an even per-
mutation as (123) → (213) → (231), (321) is an odd permutation as only one
transposition (123) → (321) is required.
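The value quoted for the example can be confirmed numerically (a short
check, assuming Python with numpy):

import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 3.0],
              [3.0, 1.0, 1.0]])
print(np.isclose(np.linalg.det(A), 9.0))   # prints True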
The solution of a system of equations in four unknowns can (after a rather
lengthy direct calculation) be summarised in Cramer’s rule with 4 × 4 deter-
minants
| a11 · · · a14 |
|  ..        .. |   −→ 24 terms .
| a41 · · · a44 |
A comparison of such results shows that the number of terms increases as
n! . This corresponds to the number of permutations of n figures. It is also
possible to find a general formula for a n × n determinant. The validity of
this formula
det(A) = Σ_P sign(P) a_{1 i1} a_{2 i2} · · · a_{n in}
can be demonstrated by complete induction. The sum runs over all n! per-
mutations
(i1 , i2 , . . . , in )   of the numbers   (1, . . . , n) .
The sign (expressed by sign(P)) is positive for even permutations, negative
for odd permutations. This formula does not necessarily represent a practical
way for the calculation of the value of a determinant. A task (not unusual)
such as the calculation of the value of a 10 × 10 determinant would involve
the evaluation of 10! = 3628800 products with 10 factors, followed by their
addition or subtraction.
A much more practical method for the solution of larger systems of linear
equations is the elimination technique. A system of equations of the form
a11 x1 + ··· ··· + a1n xn = b1
.. .. ..
. . .
an1 x1 + ··· ··· + ann xn = bn
can be converted into triangular shape by constructing suitable linear com-
binations of pairs of equations
ã11 x1 + ã12 x2 + · · · + ã1n xn = b̃1
         ã22 x2 + · · · + ã2n xn = b̃2
                   . . .
                         ãnn xn = b̃n .
The result for the determinant of coefficients can now be read off directly
det(A) = det(Ã) = ã11 ã22 · · · ãnn .
The unknowns can be determined consecutively, starting with the last line.
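The elimination technique for the value of a determinant can be cast into a
short program. The following sketch (in plain Python, as an illustration only;
it omits pivoting and therefore assumes nonzero diagonal elements during the
elimination) reproduces the value of the example used for the rule of Sarrus:

def det_by_elimination(A):
    # reduce A to triangular form by row operations, then multiply
    # the diagonal elements; the row operations used here do not
    # change the value of the determinant
    A = [row[:] for row in A]              # work on a copy
    n = len(A)
    for j in range(n):
        for i in range(j + 1, n):
            factor = A[i][j] / A[j][j]
            A[i] = [aik - factor * ajk for aik, ajk in zip(A[i], A[j])]
    result = 1.0
    for j in range(n):
        result *= A[j][j]                  # product of the diagonal
    return result

print(det_by_elimination([[1, 2, 3], [2, 1, 3], [3, 1, 1]]))   # prints 9.0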
A justification of the elimination technique and the assembly of further
rules for handling determinants can be obtained by a generalisation of the
alternative definition of a determinant indicated above for the case of a 2 × 2
determinant. This generalisation comprises the following points: interpret the
columns (or rows) of a square n × n matrix as vectors in the space Rn
(A) = ⎛ a11 · · · a1n ⎞          ⎛ a11 ⎞               ⎛ a1n ⎞
      ⎜  ..        .. ⎟  −→ a1 = ⎜  ..  ⎟ , . . . , an = ⎜  ..  ⎟ .
      ⎝ an1 · · · ann ⎠          ⎝ an1 ⎠               ⎝ ann ⎠
The determinant of this matrix is defined by postulating the following prop-
erties:
1. The sum of two determinants, which differ in one column, has the same
value as the determinant which contains the sum of the two column vectors.
det(a1 , · · · , ak + a′k , · · · , an ) = det(a1 , · · · , ak , · · · , an )
                                        + det(a1 , · · · , a′k , · · · , an ) .
The case n = 2 corresponds to the distributive law for the vector product.
2. The multiplication of a column vector with the number c corresponds to
a multiplication of the determinant by the factor c
det(a1 , · · · , c ak , · · · , an ) = c det(a1 , · · · , ak , · · · , an ) .
This is the associative law for the multiplication of the vector product with
a scalar in the case n = 2 .
The postulates 1. and 2. define the property of linearity of determinants.
A determinant is therefore called a multilinear form as the properties of
linearity are valid for each column vector. The two postulates do, however, not
define the value of a determinant uniquely. For this one needs the additional
postulates
3. The determinant has the value zero if the column vectors are linearly
dependent
det(a1 , · · · , an ) = 0
if the vector equation
c1 a1 + c2 a2 + · · · + cn an = 0
can be satisfied with at least one ck ≠ 0 . This involves the possibilities
that one of the column vectors is a null vector or that one of the vectors
can be represented by a linear combination of a subset of the other vectors
ak = Σ_{i≠k} c̃_ki a_i .
4. The determinant formed with the basis vectors of the space Rn has the
value 1
det(E) = det(e1 , · · · , en ) = 1 .
The following properties of determinants can be proven on the basis of the
postulates 1 to 4:
• The value of a multilinear form, which satisfies these postulates, is identical
with the value of the pragmatic sum formula.
• A set of calculational rules for determinants can be established as for
instance the equivalence of using rows or columns
det(A) = det(AT )
or the rule for the exchange of two columns
det(a1 , · · · , ai , · · · , ak , · · · , an ) = −det(a1 , · · · , ak , · · · , ai , · · · , an ) .
• Expansion theorems for determinants are also of (practical) interest. The
possibility of an expansion with respect to the elements of a column (row)
can already be gleaned from the case of a 3 × 3 determinant. The explicit
result
det(A) = | a11 a12 a13 |
         | a21 a22 a23 | = a11 a22 a33 + a12 a23 a31 + a13 a21 a32
         | a31 a32 a33 |
           − a13 a22 a31 − a11 a23 a32 − a12 a21 a33 .
can be arranged in the form
det(A) = a11 (a22 a33 − a23 a32 ) − a12 (a21 a33 − a23 a31 )
+a13 (a21 a32 − a22 a31 )
or in terms of 2 × 2 subdeterminants
\det(A) = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} .
The expansion in terms of the elements of the first row can be recognised
clearly. The discussion of the general case could start from the formula
\det(A) = \sum_{P} \operatorname{sign}(P)\, a_{1 i_1} a_{2 i_2} \cdots a_{n i_n} .
The expansion with respect to the k-th row is¹²
\det(A) = \sum_{i=1}^{n} (-1)^{(i-k)}\, a_{ki}\, \det(A)^{(ki)} .
The sum runs over all the elements of the row. The factor is a determinant
from which the k -th row and the i -th column have been removed
\det(A)^{(ki)} = \sum_{P' \neq (k,i)} \operatorname{sign}(P')\, a_{1 i_1} a_{2 i_2} \cdots a_{n i_n}
¹² Consult the literature for details of the proof as well as for expansions in terms of subdeterminants instead of single rows and columns.
= \begin{vmatrix}
a_{11} & \cdots & a_{1(i-1)} & a_{1(i+1)} & \cdots & a_{1n} \\
\vdots & & \vdots & \vdots & & \vdots \\
a_{(k-1)1} & \cdots & a_{(k-1)(i-1)} & a_{(k-1)(i+1)} & \cdots & a_{(k-1)n} \\
a_{(k+1)1} & \cdots & a_{(k+1)(i-1)} & a_{(k+1)(i+1)} & \cdots & a_{(k+1)n} \\
\vdots & & \vdots & \vdots & & \vdots \\
a_{n1} & \cdots & a_{n(i-1)} & a_{n(i+1)} & \cdots & a_{nn}
\end{vmatrix} .
The sign in the expansion is due to the difference of the number of transpositions of the permutation P with respect to the permutation P' (a small numerical sketch of this expansion follows after this list).
• The justification of the elimination technique is based on the properties of
the multilinear form. A linear combination of two rows leads for instance
to the statement
det(b1 , · · · , bi + ck bk , · · · , bn )
= det(b1 , · · · , bi , · · · , bn ) + ck det(b1 , · · · , bk , · · · , bk , · · · , bn )
= det(b1 , · · · , bi , · · · , bn ) .
Operations with rows or columns of this kind do not change the value of a
determinant.
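As announced above, a small numerical sketch of the expansion with respect to the k-th row (assuming numpy; the function name det_expand is a hypothetical choice, and the recursion is meant for illustration only, since it visits n! products and is far more expensive than the elimination technique):

import numpy as np

def det_expand(A, k=0):
    """Determinant by expansion with respect to row k (here k = 0)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for i in range(n):
        # remove the k-th row and the i-th column to form the subdeterminant
        minor = np.delete(np.delete(A, k, axis=0), i, axis=1)
        total += (-1) ** (i + k) * A[k, i] * det_expand(minor)
    return total

A = np.random.rand(5, 5)
print(np.isclose(det_expand(A), np.linalg.det(A)))   # True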
Determinants also play a role in the discussion of the algebraic eigen-
value problem. This problem can be formulated from a geometric point of
view in the following fashion. Specify a regular transformation in R3 (a brief
remark concerning Rn follows) which is represented by the 3 × 3 matrix A .
The question can be posed whether there exist some vectors x which are, up
to a factor λ, transformed into themselves? An alternative formulation of
this question is: can the factors λ and the vectors x be determined from the
equation
A x = λx ?
In order to answer this question two statements concerning the solution of
systems of linear equations are needed.
• Use Cramer’s rule for the system of n inhomogeneous equations with n
unknowns C x = b and infer the statement
The inhomogeneous system of equations possesses a solution if the
determinant of the matrix C does not vanish.
• The homogeneous system of equations C x = 0 possesses only the trivial
solution x = 0 if the determinant of the matrix C does not vanish. One
of the column vectors in each of the determinants of the numerator is a
null vector. If, however, the determinant of the matrix C does vanish, the
situation is different. Cramer’s rule does not lead to a well defined solution,
as it gives xi → 0/0 . This leads to the statement:
The homogeneous system possesses a nontrivial solution x ≠ 0 only if the determinant of the matrix C vanishes.
A vanishing determinant indicates that there exists a linear dependence
between the column (row) vectors of the matrix C . One (or more) of
the equations of the system are linear combinations of the others. A so-
lution with one parameter is obtained if only one of the equations is a
linear combination of the others. One of the unknowns can then be chosen
(arbitrarily). The remaining system of equations with the dimension n − 1
\tilde C_{(n-1,\,n-1)}\; x_{(n-1),\,1} = \tilde b(x_n)_{(n-1),\,1}
possesses then a unique solution, as the determinant of this system, det(C̃), will not vanish.
The transformation equation given above can be interpreted as a homo-
geneous linear system of equations (using the unit matrix E)
(A − λE)x = 0 .
There exists a nontrivial solution if the condition
det(A − λE) = 0
is satisfied. Evaluation of the determinant for the case of R³, as indicated
above, leads to a cubic equation for the factors λ . Inspection of the
coefficients of the cubic equation shows that real solutions for λ occur if the
matrix A is symmetric. For each of the three real solutions (the eigenvalues)
there exists a linear system of equations
(A − λi E)xi = 0 i = 1, 2, 3 ,
with nontrivial solutions (eigenvectors). The solution that can be chosen
freely is normally fixed by the normalisation condition
\sum_{k=1}^{3} x_{i,k}^{2} = 1 .
Corresponding statements are valid for Rn . The eigenvalues are then de-
termined by an equation of degree n, the eigenvector has n components and
in the normalisation condition the upper limit 3 is replaced by n . Further in-
formation on the algebraic eigenvalue problem is found in Vol. 3 in connection
with the matrix formulation of quantum mechanics.
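A short numerical illustration of these statements for a symmetric 3 × 3 matrix (a sketch assuming numpy; the matrix entries are arbitrary choices):

import numpy as np

# A symmetric matrix: real eigenvalues are guaranteed.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lam, x = np.linalg.eigh(A)       # eigenvalues and normalised eigenvectors
for i in range(3):
    # (A - lambda_i E) x_i = 0  and  sum_k x_{i,k}^2 = 1
    print(np.allclose(A @ x[:, i], lam[i] * x[:, i]),
          np.isclose(x[:, i] @ x[:, i], 1.0))

# det(A - lambda E) vanishes for each eigenvalue
print(np.isclose(np.linalg.det(A - lam[0] * np.eye(3)), 0.0))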
The section will be concluded by stating once more the cases for which
the concept of a determinant has been used in anticipation of this chapter.
• The 2 × 2 Wronski determinant (Math.Chap. 2.2.2)
W(x_1(t), x_2(t)) = \begin{vmatrix} x_1(t) & x_2(t) \\ \dot x_1(t) & \dot x_2(t) \end{vmatrix} .
• The scalar triple product (Math.Chap 3.1.2)
(a\, b\, c) = a \cdot (b \times c) = \det(a, b, c) = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix} .
• The rule for the vector product (Math.Chap 3.1.2)
(a \times b) = \begin{vmatrix} e_1 & e_2 & e_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}
is not really a determinant. It is, however, applied in connection with the rule of Sarrus or similar rules, even though some of its elements are vectors rather than numbers.
4 Analysis II: Functions of several variables
Functions of several variables are an important tool of theoretical physics.
The concepts and techniques, which are necessary for their discussion, are
related to those of functions of one variable. On the other hand new aspects
originate due to the higher dimensionality of the ’representation space’. For
instance, a point in a one-dimensional world can be approached from exactly
two directions, a point in three-dimensional space from an arbitrary number
of directions. This affects for instance the discussion of continuity. Derivatives
can be defined with respect to each of the independent variables and two- or
multi-dimensional domains replace the one-dimensional interval of ordinary
integration.
4.1 Functions

Our perception of space is restricted to three dimensions. This means that a
certain ability to abstract is needed for the discussion of functions of several
variables. This ability is, however, not required in the simplest case, functions
of two independent variables.
4.1.1 Functions of two independent variables

An explicit function of two independent variables is written in the form
z = f(x, y) . The three variables x, y, z are interpreted as Cartesian coordi-
nates in a three-dimensional space. The domain of definition is normally an
area of the x - y plane. The z -values constitute the co-domain (Fig. 4.1). The
points (x, y, f (x, y)) represent a spatial surface over the domain of definition.
A few examples indicate the possibilities:
• The domain of definition of the function z = [1−x2 −y 2 ]1/2 is the interior of
the unit circle in the x - y plane, including the circumference x2 + y 2 ≤ 1 .
The co-domain of the function is the interval 0 ≤ z ≤ 1 . The function
represents a hemisphere over the unit circle in the x - y plane (Fig. 4.2a).
• The function z = x2 + y 2 is defined over the complete x - y plane, the co-
domain is determined by z ≥ 0 . The function describes a paraboloid of
revolution which is open at the top (Fig. 4.2b).
Fig. 4.1. Representation of an explicit function of two independent variables
• The linear equation z = ax + by is the equation of a plane in space, which
runs through the origin. The domain of definition is the complete x - y
plane, the co-domain is given by −∞ ≤ z ≤ ∞ (Fig. 4.2c).

Fig. 4.2. Examples for functions of two variables: (a) hemisphere, (b) paraboloid of revolution, (c) plane in space
A concise graphical representation of such surfaces is the representation
by topographical contour lines¹. The curves, which are obtained by the in-
tersection of the surface f(x, y) with the planes z = const., are projected
onto the x - y plane. An example are the contour lines of the hemisphere
described above. The lines are concentric circles about the origin with the
radius R = [1 - z_{fixed}^2]^{1/2} . A good choice for the actual representation are
equidistant z-values (as used for maps (Fig. 4.3)). The contour lines of the
paraboloid are concentric circles with the radii R = \sqrt{z_{fixed}} ; for the third
example they are the straight lines ax + by = z_{fixed} .
A more complicated example is the function
z = \frac{x^2 - y^2}{x^2 + y^2} .
¹ The common usage of this term is contour line, topographic line or, more informally, topo line.
Fig. 4.3. Representation of a hemisphere by projection: intersections and contour lines (z = 0, 1/4, 1/2, 3/4, 1)
The domain of definition is the complete x - y plane without the origin. This
function, a rational fraction, is not defined for x = y = 0 . It is more convenient
to use polar coordinates in the x - y plane instead of the Cartesian coordinates
for the discussion of this function
z = f(r\cos\varphi, r\sin\varphi) = \frac{r^2(\cos^2\varphi - \sin^2\varphi)}{r^2(\cos^2\varphi + \sin^2\varphi)} = \cos 2\varphi .
Fig. 4.4. The function z = (x² − y²)/(x² + y²): (a) representation of the surface, (b) contour lines
One recognizes easily that
• the co-domain of the function is the interval −1 ≤ z ≤ 1 and that
• the contour lines are the straight lines ϕ = const.
The representation of the contour lines indicates that the function can take
any value between −1 and 1 in the vicinity of the origin (Fig. 4.4b). The surface is for this reason not easy to draw (Fig. 4.4a): it varies from +1 to −1 (and vice versa) within each quadrant, the origin itself being excluded.
A definite z -value is assigned to each point of a domain in the x - y plane
if the explicit form of such functions is used. A generalisation is possible via
the use of implicit functions (here still of two variables)
F (x, y, z) = 0 .
Examples for this form are:
• Hemispheres under and above the x - z plane are characterised by the ex-
plicit functions
z = [R2 − x2 − y 2 ]1/2
z = −[R2 − x2 − y 2 ]1/2 .
A complete sphere (Fig. 4.5a) can be represented as
\left( z - \sqrt{R^2 - x^2 - y^2} \right)\left( z + \sqrt{R^2 - x^2 - y^2} \right) = 0 .
This equation describes points of the lower and upper hemisphere. Multi-
plication of the two factors yields the implicit function
x2 + y 2 + z 2 − R2 = 0 .
Implicit functions allow the representation of closed surfaces (with multiple
values for each point of the domain of definition) in a compact fashion.
• The implicit function
x4 + 2x2 y 2 + 2x2 z 2 + y 4 + 2y 2 z 2 + z 4 − 5(x2 + y 2 + z 2 ) + 4 = 0
seems to be rather complicated. It only takes a moment to find out that
the expression can be factorised
(x2 + y 2 + z 2 − 1)(x2 + y 2 + z 2 − 4) = 0 .
The ’surface’ consists of two concentric spheres with the radii 1 and 2
(Fig. 4.5b). The surface is characterised by two or four values for each
point of the domain of definition.
• The implicit equation for a plane in space
Ax + By + Cz + D = 0
can be written in an explicit form, e.g. for C ≠ 0 as
z = -\frac{A}{C}\, x - \frac{B}{C}\, y - \frac{D}{C} .
The specification reduces to
x = −D/A
if e.g. B = C = 0. This equation describes also a plane, namely a plane
through the point −D/A on the x -axis which is parallel to the y - z plane.
The role of the dependent and independent variables can be interchanged
for an implicit form of a function. For instance, the equation
x − y2 − z2 = 0
should not necessarily be resolved with respect to z
z = ±[x − y 2 ]1/2 ,
but rather in the form
x = y2 + z2 .
This characterises a paraboloid of revolution with respect to the x -axis
(Fig. 4.5c).
Fig. 4.5. Implicit functions: (a) sphere, (b) concentric spheres, (c) paraboloid about the x -axis (represented for z ≥ 0)
4.1.2 Functions of three or more independent variables
Visualisation is already difficult for functions of three independent variables
explicit: u = f(x, y, z)
implicit: F(x, y, z, u) = 0 .
The domain of definition is normally a region (a volume) of three-dimensional
space, for instance the interior and the surface of a sphere about a given point
(x − x0 )2 + (y − y0 )2 + (z − z0 )2 ≤ R2 .
A fourth dimension would be needed for the representation of such func-
tions. It represents (in analogy to the case of two independent variables) a
three-dimensional manifold embedded in a four-dimensional space. The offi-
cial language is: a hypersurface in R4 . There exist two possibilities to arrive
at a visualisation of such surfaces:
• Imagine that a value of the function is attached to each point of the domain
of definition. Such a construct could, for instance, represent the distribution
of temperature in space
T = T (x, y, z) .
This is also an example for a scalar field (see Math.Chap. 5.1).
• A projection from 4 to 3 dimensions is also feasible. The implicit function
f (x, y, z) = const.
represents a surface in the three-dimensional space. The contour lines, dis-
cussed above, have to be replaced by families of surfaces. This possibility
is rarely employed.
No useful possibility of visualisation exists for the case of more than three
independent variables
z = f (x1 , x2 , . . . , xn ) (in the explicit form) .
The domain of definition is a region of n-dimensional space. The function
itself can be characterised as an n-dimensional hypersurface in an (n + 1)-
dimensional space. The difference with respect to the simpler cases is, excepting
the lack of visualisation, not that large. All necessary geometric properties (as
points, distances between points, etc.) can be defined in higher dimensional
Euclidean spaces (see Math.Chap 3.1.3).
The topics ’limiting values’ and ’differentiation’ are, as for the case of
functions of one variable, of particular interest for functions of several vari-
ables as well.
4.2 Limiting values and differentiation

The discussion of limiting values can be restricted to a few remarks. The
concepts and the discussion for the case of one variable can be carried over
more or less directly.
4.2.1 Limiting values

A definition of a limiting value for functions of one variable is:

A function f(x) possesses the limiting value A at the point x_0 if the value
\lim_{\nu\to\infty} f(x_\nu) = A
is obtained for every sequence \{x_\nu\} with
\lim_{\nu\to\infty} x_\nu = x_0
within the domain of definition of the function.
This definition is (as has been remarked before) not very practical as the
postulate is for every sequence. It can, however, be readily carried over to the
case of several variables. The term ’sequence (of points on the number ray)’
can be read strictly as ’sequence (of points in space)’.
The domain of definition of a function of two variables is a region of the
x - y plane. A point P0 in a plane (or in space) can be approached from an
arbitrary number of directions. Sequences of points
P1 = (x11 , . . . , x1n ), P2 = (x21 , . . . , x2n ), . . . , P∞ = P0 ,
which approach the limiting point P0 from a chosen direction can therefore
be defined for functions of n (≥ 2) variables. The transcription of the criterion
for limiting values can then be noted as:
The function f(x_1, \ldots, x_n) possesses the limiting value A at the
point P_0 if the sequence of function values f(P_\nu) = f(x_{\nu 1}, \ldots, x_{\nu n})
converges to the value A for every sequence of points \{P_\nu\} which
all lie in the domain of definition of the function and have the
limiting value
\lim_{\nu\to\infty} P_\nu = P_0 .
Two examples can serve as a comment on this criterion: the domain of
definition of the function (discussed beforehand)
z = f(x, y) = \frac{x^2 - y^2}{x^2 + y^2}
is the x - y plane without the point (0, 0) . Consider now the following se-
quences of points with the limiting value (0, 0):
• The limiting value 1 is obtained for each sequence along the x -axis
\lim_{x\to 0} f(x, 0) = \lim_{x\to 0} \frac{x^2}{x^2} = 1 .
• The limiting value −1 is obtained for each sequence along the y -axis
\lim_{y\to 0} f(0, y) = \lim_{y\to 0}\left( -\frac{y^2}{y^2} \right) = -1 .
Different sequences lead to different values. A limiting value at the point
(0, 0) does not exist.
The domain of definition of the function
z = f(x, y) = \frac{x^2 y^2}{x^2 + y^2}
is also the x - y plane without the point (0, 0) . It is again useful to go over to
polar coordinates in order to discuss the limiting value at this position
z = r2 sin2 ϕ cos2 ϕ .
Every sequence of points with the limiting value (0, 0) can be characterised
by r → 0 . One finds therefore
\lim_{r\to 0} r^2 \sin^2\varphi\, \cos^2\varphi = 0 .
The limiting value (zero) exists at the point (0, 0) .
All criteria for the case of functions of one variable, which involve lim-
iting values (as the Cauchy criterion, etc.), can be transcribed to the case
of functions of several variables in view of the formal similarity of the basic
definition. This discussion will, however, be restricted to a remark on the
concept of continuity.
A function f(x_1, \ldots, x_n) is continuous at the position
P_0 = (x_{10}, \ldots, x_{n0}) if the value f(P_0) exists (the function
is defined at the position P_0) and if f(x_1, \ldots, x_n)
possesses a limiting value A at the position P_0 which
agrees with the value of the function.
The function
f(x, y) = \frac{x^2 y^2}{x^2 + y^2}
can be complemented at the position (0, 0) to form a continuous function in
the complete x - y plane according to this definition. This is not possible for
the function (shown in Fig. 4.4)
f(x, y) = \frac{x^2 - y^2}{x^2 + y^2} .
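The direction dependence discussed above can also be probed numerically; a minimal sketch (assuming numpy, with the rays towards the origin characterised by the polar angle ϕ):

import numpy as np

def f1(x, y):  # no limiting value at (0, 0)
    return (x**2 - y**2) / (x**2 + y**2)

def f2(x, y):  # limiting value 0 at (0, 0)
    return (x**2 * y**2) / (x**2 + y**2)

for phi in (0.0, np.pi / 4, np.pi / 2):   # approach (0,0) along different rays
    r = 1e-8
    x, y = r * np.cos(phi), r * np.sin(phi)
    print(phi, f1(x, y), f2(x, y))
# f1 gives cos(2*phi): 1, 0, -1 -> direction dependent, no limiting value;
# f2 is of order 1e-16 on every ray -> limiting value 0.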
The topic of major interest is, however, the investigation of limiting values
which lead to the differentiation of functions of several variables.
4.2.2 Differentiation
This section addresses the concept of a partial derivative, to begin with for
the case of two variables. The definition of the partial derivatives is in this
case:
A function z = f(x, y) , which is defined in a region D, possesses
a partial derivative with respect to x or with respect to y in the
point (x, y) ∈ D if the limiting value
\frac{\partial f(x, y)}{\partial x} = f_x(x, y) = \lim_{h\to 0} \frac{f(x + h, y) - f(x, y)}{h}
respectively the limiting value
\frac{\partial f(x, y)}{\partial y} = f_y(x, y) = \lim_{k\to 0} \frac{f(x, y + k) - f(x, y)}{k}
exists.
The definition indicates that partial differentiation does not require any
new technical skills. The function is differentiated with respect to one of the
variables while the other variable is kept constant.
A brief list of examples is sufficient to explain the technique
f(x, y) = x\,y : \quad f_x = y , \quad f_y = x
f(x, y) = \frac{1}{(x^2 + y^2)} : \quad f_x = -\frac{2x}{(x^2 + y^2)^2} , \quad f_y = -\frac{2y}{(x^2 + y^2)^2}
f(x, y) = e^x (x^2 - y^5) : \quad f_x = e^x (x^2 - y^5 + 2x) , \quad f_y = e^x (-5y^4) .
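This table can be reproduced with a computer algebra system; a minimal sketch assuming sympy:

import sympy as sp

x, y = sp.symbols('x y')
examples = [x * y,
            1 / (x**2 + y**2),
            sp.exp(x) * (x**2 - y**5)]
for f in examples:
    # differentiate with respect to one variable, the other kept constant
    print(f, '->', sp.simplify(sp.diff(f, x)), ',', sp.simplify(sp.diff(f, y)))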
The geometrical interpretation of the partial derivatives is not difficult. The
intersection of the surface f (x, y) and a plane y = const., parallel to the x - z
plane, is the curve used for the definition of the partial derivative with respect
to x (Fig. 4.6a). This derivative characterises the slope of the intersecting
curve in a point P , or expressed differently, the rise of the surface in this
point in the x -direction. The slope of the tangent lines in different points of
the intersecting curve is itself a function of x and y . The partial derivative
with respect to y describes the slope of the surface in the y -direction in a
corresponding fashion (Fig. 4.6b).
Fig. 4.6. The partial derivatives of f(x, y): (a) with respect to x, (b) with respect to y
Partial derivatives of higher order can also be discussed. There exist four
partial derivatives of second order for the case of functions of two variables
f_{xx} = \frac{\partial}{\partial x}\frac{\partial f}{\partial x} = \frac{\partial^2 f}{\partial x^2}
f_{xy} = \frac{\partial}{\partial x}\frac{\partial f}{\partial y} = \frac{\partial^2 f}{\partial x\, \partial y} \qquad \text{(differentiate first with respect to } y\text{)}
f_{yx} = \frac{\partial}{\partial y}\frac{\partial f}{\partial x} = \frac{\partial^2 f}{\partial y\, \partial x} \qquad \text{(differentiate first with respect to } x\text{)}
f_{yy} = \frac{\partial}{\partial y}\frac{\partial f}{\partial y} = \frac{\partial^2 f}{\partial y^2} .
The notation for the sequence of the indices is not standardised in the liter-
ature. The notation used here is: the derivative with respect to the variable
indicated on the right hand side is executed first (shorthand – left column,
standard notation – right hand column). The second derivatives for the ex-
amples given are
f(x, y) = x\,y :
f_{xx} = 0 , \quad f_{xy} = f_{yx} = 1 , \quad f_{yy} = 0

f(x, y) = [x^2 + y^2]^{-1} :
f_{xx} = -\frac{2}{(x^2 + y^2)^2} + \frac{8x^2}{(x^2 + y^2)^3} , \quad f_{yy} = -\frac{2}{(x^2 + y^2)^2} + \frac{8y^2}{(x^2 + y^2)^3}
f_{xy} = f_{yx} = \frac{8xy}{(x^2 + y^2)^3}

f(x, y) = e^x (x^2 - y^5) :
f_{xx} = e^x (x^2 - y^5 + 4x + 2) , \quad f_{xy} = f_{yx} = -5y^4 e^x , \quad f_{yy} = -20 y^3 e^x .
It can be noticed that the mixed partial derivatives of second order agree.
This raises the question, whether this is always the case or which conditions
have to be satisfied for this to happen. An answer will be given shortly.
There exist eight derivatives of third order for the case of functions of two
variables
fxxx , fxxy , fxyx , fyxx , fxyy , fyxy , fyyx , fyyy .
The number of possible derivatives grows with the order k as 2^k .
The definition of the partial derivative can be extended to the case of
functions of n variables. There exist n partial derivatives of first order with
the definition of the corresponding limiting values (i = 1, 2, . . . , n )
f_{x_i} = \frac{\partial f(x_1, x_2, \ldots, x_i, \ldots, x_n)}{\partial x_i} = \lim_{h_i\to 0} \left\{ \frac{f(x_1, \ldots, x_i + h_i, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{h_i} \right\} .
The definition implies once again the actual technique: evaluate the derivative
with respect to one of the variables while treating the other variables as
constant. There exist n² partial derivatives of second order
\frac{\partial}{\partial x_i}\left( \frac{\partial f}{\partial x_k} \right) = f_{x_i, x_k} \qquad i, k = 1, 2, \ldots, n ,
and n³ derivatives of third order, etc.
The following example of a function of three variables is often used in
theoretical physics. The function
f(x, y, z) = \frac{1}{[x^2 + y^2 + z^2]^{1/2}} = \frac{1}{r}
describes the inverse distance of a point from the origin (with a simple form
in spherical coordinates). The three partial derivatives of first order with
respect to the Cartesian coordinates are
f_x = -\frac{1}{2}\, \frac{2x}{[x^2 + y^2 + z^2]^{3/2}} = -\frac{x}{r^3}
f_y = -\frac{1}{2}\, \frac{2y}{[x^2 + y^2 + z^2]^{3/2}} = -\frac{y}{r^3}
f_z = -\frac{1}{2}\, \frac{2z}{[x^2 + y^2 + z^2]^{3/2}} = -\frac{z}{r^3} .
There exist nine derivatives of second order
f_{xx} = -\frac{1}{r^3} + \frac{3x^2}{r^5} \qquad f_{yy} = -\frac{1}{r^3} + \frac{3y^2}{r^5} \qquad f_{zz} = -\frac{1}{r^3} + \frac{3z^2}{r^5}
f_{xy} = f_{yx} = \frac{3xy}{r^5} \qquad f_{yz} = f_{zy} = \frac{3yz}{r^5} \qquad f_{xz} = f_{zx} = \frac{3xz}{r^5} .
The mixed partial derivatives of second order are again independent of the
sequence of the differentiation. The sum of the double derivatives with respect
to the three coordinates is
f_{xx} + f_{yy} + f_{zz} = -\frac{3}{r^3} + \frac{3(x^2 + y^2 + z^2)}{r^5} = 0 .
This statement can be interpreted differently. The function f (x, y, z) = 1/r
is determined by a differential equation
\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} = 0 .
This is a partial differential equation. The function 1/r is a particular solution (not a general solution) of this differential equation². The combination
of partial derivatives indicated above is used in many branches of theoret-
ical physics (electrodynamics, quantum mechanics). It is usually quoted in
operator form
\Delta f(x, y, z) = \left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} \right) f(x, y, z) .
The differential operator acting on the function f (x, y, z)
\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} = \partial_x^2 + \partial_y^2 + \partial_z^2
is known as the Laplace operator. The differential equation
Δf (x, y, z) = 0
is Laplace’s differential equation.
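A quick symbolic check that 1/r indeed satisfies Laplace's differential equation (a sketch assuming sympy):

import sympy as sp

x, y, z = sp.symbols('x y z')
r = sp.sqrt(x**2 + y**2 + z**2)
f = 1 / r

# sum of the three second derivatives, i.e. the Laplace operator applied to f
laplace_f = sp.diff(f, x, 2) + sp.diff(f, y, 2) + sp.diff(f, z, 2)
print(sp.simplify(laplace_f))   # 0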
There does not seem to be a substantial difference between ordinary and
partial differentiation. The following remarks show that this is not quite the
case.
(a) A function of two variables is defined over the domain
x^2 + y^2 \le R^2 , \quad \frac{x^2}{R^2} + \frac{y^2}{b^2} \ge 1 \qquad x, y \ge 0 , \; b < R
illustrated in Fig. 4.7. The partial derivatives do not exist in the tip of
Fig. 4.7. Partial derivatives: a problematic domain

the domain on the x - axis, even if the function defined over the domain is
reasonable. The surrounding area does not belong to the domain of defini-
tion so that a differential quotient can not be defined. Peculiar situations
can occur if the domain of definition is an area.
² Partial differential equations are discussed in detail in Vol. 2.
(b) It can be demonstrated that the existence of the derivative f'(x) at the
point x implies continuity of the function of one variable f (x) at this
point. A corresponding statement is not possible for functions of several
variables. Any point is approached from two directions in the construc-
tion of the partial derivatives. This does not imply anything about the ap-
proach from an arbitrary direction. Such a statement would be necessary
for the transcription of the statement at the beginning of this paragraph.
The relation between differentiability and continuity is, as a consequence,
more involved in the case of functions of several variables. This point is
addressed in the section on directional derivatives (Math.Chap. 4.2.3).
The fact, that mixed partial derivatives are independent of the sequence in
which the derivatives are executed, has been found in all examples discussed
so far. This feature is explained by the theorem of Schwarz. This theorem,
formulated as a sufficient condition, states:
The mixed derivatives of k-th order are independent of the sequence in
which the derivatives are executed if all the derivatives in k-th order are
continuous.
There exist more rigorous variants of this theorem. Neither the variants
nor the proof of the soft version will be given here. One consequence of the
theorem is, however, worth a remark. The number of independent derivatives
of second order is reduced from n² to n(n + 1)/2 if the theorem holds.
The fact that derivatives of functions of several variables are defined over
a region opens the possibility to consider further types of derivatives. The
most important for theoretical physics are the directional derivative and the
gradient.
4.2.3 Directional derivatives and gradient

The partial derivatives f_x and f_y of a function of two variables have been
interpreted as the slope of the surface in the x- or the y- direction (Fig. 4.8a).
This interpretation suggests the question: how can the slope with respect
to an arbitrary direction be calculated? This question is answered in the
following way: consider a plane through a point of the x - y plane which is
perpendicular to this plane. The angle between the intersecting line of this
plane with the x - y plane and the x -axis is the angle α . The perpendicular
plane intersects a surface f (x, y) in a curve K . The slope of the tangent
line on K in the point P is called the directional derivative (Fig. 4.8b). It
represents the slope of the surface in the point P in the direction α. The
definition of this quantity is
D^{(\alpha)} f(x, y) = \lim_{\rho\to 0} \left\{ \frac{f(x + \rho\cos\alpha,\; y + \rho\sin\alpha) - f(x, y)}{\rho} \right\} .
The point P = (x, y) is approached along the straight line in the x - y plane.
It can be shown that this limiting value exists if
Fig. 4.8. Directional derivatives: (a) the partial derivatives of f(x, y) in a point P, (b) defining the directional derivative
• the partial derivatives fx and fy exist,
• the partial derivatives are also continuous in the point P .
The proof proceeds in the following fashion: augment the numerator by ad-
dition and subtraction
{f (x + ρ cos α, y + ρ sin α) − f (x, y)} =
{f (x + ρ cos α, y + ρ sin α) − f (x + ρ cos α, y)}
+ {f (x + ρ cos α, y) − f (x, y)} .
Apply the mean value theorem of ordinary analysis for the appropriate vari-
able in the two expressions in brackets, e.g.
f(x + h) - f(x) = h\, f'(x + ch) \qquad 0 \le c \le 1 .
Use the assumption, that the partial derivatives exist, to obtain
= ρ sin α fy (x + ρ cos α, y + ρc sin α) + ρ cos α fx (x + ρc cos α, y) .
The assumption, that they are continuous, allows then, after division by ρ,
the evaluation of the limiting value
D^{(\alpha)} f(x, y) = \lim_{\rho\to 0} \left\{ \cos\alpha\, f_x(x + \rho c\cos\alpha, y) + \sin\alpha\, f_y(x + \rho\cos\alpha, y + \rho c\sin\alpha) \right\}
= \cos\alpha\, f_x(x, y) + \sin\alpha\, f_y(x, y) .
The result is only valid if the partial derivatives which feature in this expres-
sion are continuous functions of x and y . The standard partial derivatives
can be recovered as
D^{(0)} f(x, y) = f_x(x, y) \qquad D^{(\pi/2)} f(x, y) = f_y(x, y) .
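The result can be checked numerically against the difference quotient that defines the directional derivative; a sketch assuming numpy and using the example function e^x(x² − y⁵) from above (the point and the angle are arbitrary choices):

import numpy as np

def f(x, y):
    return np.exp(x) * (x**2 - y**5)

def grad_f(x, y):
    return np.array([np.exp(x) * (x**2 - y**5 + 2*x), -5 * y**4 * np.exp(x)])

x0, y0, alpha = 0.5, 1.2, 0.7
e_alpha = np.array([np.cos(alpha), np.sin(alpha)])

exact = e_alpha @ grad_f(x0, y0)         # cos(a) f_x + sin(a) f_y
rho = 1e-6                               # difference quotient along e_alpha
numeric = (f(x0 + rho*np.cos(alpha), y0 + rho*np.sin(alpha)) - f(x0, y0)) / rho
print(exact, numeric)                    # the two values agree closely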
This argument can be extended to the case of more than two variables. A
characterisation of a direction in space is required in the case of a function
of three variables f (x, y, z) . A suitable unit vector eα with the following
projections on the coordinate directions
(eα · ex ) = cos αx (eα · ey ) = cos αy (eα · ez ) = cos αz
can be used for this purpose. The angles αi are the angles between the vector
eα and the coordinate directions. The three scalar products are known as the
directional cosines (in the respective direction). The properties
e_\alpha = \sum_{i=1}^{3} \cos\alpha_i\; e_i \qquad\text{and}\qquad e_\alpha \cdot e_\alpha = 1
yield the relation
\sum_{i=1}^{3} (\cos\alpha_i)^2 = 1 .
This shows that only two angles are required for the specification of a direc-
tion in R3 . The third angle is then determined (uniquely).
The derivative of the function f (x, y, z) in the direction given by eα is
defined as
D^{(\alpha_x \alpha_y \alpha_z)} f(x, y, z) = \lim_{\rho\to 0} \left\{ \frac{1}{\rho} \left( f(x + \rho\cos\alpha_x,\; y + \rho\cos\alpha_y,\; z + \rho\cos\alpha_z) - f(x, y, z) \right) \right\} .
This yields under the same assumptions as in the case of two variables the
limiting value
= cos αx fx (x, y, z) + cos αy fy (x, y, z) + cos αz fz (x, y, z) .
This limiting value describes the slope of the function f (x, y, z) in the point
P = (x, y, z) in the direction characterised by eα .
A direction in an n-dimensional Euclidean space is characterised by a unit vector
e_\alpha = \sum_{i=1}^{n} (\cos\alpha_i)\, e_i .
The values of the n directional cosines describe (for an orthonormal basis) the
projections of the vector eα onto the coordinate axes (see Math.Chap. 3.1.3)
cos αi = (eα · ei )
with the restriction
\sum_{i} (\cos\alpha_i)^2 = 1 .
An arbitrary direction in an n-dimensional space is fixed by (n − 1) quantities (angles).
The directional derivative of a function z = f(x_1, \ldots, x_n) is then given
(under the assumption that all partial derivatives of first order are continuous
in the point P = (x_1, \ldots, x_n)) by
D^{(\alpha_1 \ldots \alpha_n)} f(x_1, \ldots, x_n) = \lim_{\rho\to 0} \left\{ \frac{1}{\rho} \left( f(x_1 + \rho\cos\alpha_1, \ldots, x_n + \rho\cos\alpha_n) - f(x_1, \ldots, x_n) \right) \right\}
= \sum_{i=1}^{n} (\cos\alpha_i)\, f_{x_i}(x_1, \ldots, x_n) .
The result for the case n = 2 can be recovered with α1 + α2 = π/2 and
cos α2 = cos(π/2 − α1 ) = sin α1 .
It is very useful to express the directional derivative in terms of the gra-
dient operator ∇ which is defined as
\nabla = \sum_{i=1}^{n} e_i\, \frac{\partial}{\partial x_i} .
The shorthand version in terms of components is
\nabla = \left( \frac{\partial}{\partial x_1},\; \frac{\partial}{\partial x_2},\; \ldots,\; \frac{\partial}{\partial x_n} \right) .
The gradient operator is a differential operator with a vector character. An
alternative notation is
∇ = grad .
The application of this operator to a function f (x1 , . . . , xn ) , that is a
scalar function, yields a vector function (a function with n components, see
Math.Chap. 5.1)
\nabla f(x_1, \ldots, x_n) = \sum_{i=1}^{n} e_i\, f_{x_i}(x_1, \ldots, x_n) ,

in shorthand
∇f (x1 , . . . , xn ) = (fx1 (x1 , . . . , xn ), . . . , fxn (x1 , . . . , xn )) .
The components of the vector function are the n partial derivatives of the
scalar function f . The directional derivative can be expressed with the aid
of the gradient in the form
D^{(\alpha_1 \ldots \alpha_n)} f(x_1, \ldots, x_n) = e_\alpha \cdot \nabla f(x_1, \ldots, x_n) = \sum_{i=1}^{n} (e_\alpha \cdot e_i)\, f_{x_i}(x_1, \ldots, x_n) = \sum_{i=1}^{n} (\cos\alpha_i)\, f_{x_i}(x_1, \ldots, x_n) .
The directional derivative corresponds to the scalar product of the gradient of
f with the given directional vector eα . This demonstrates that the directional
derivative is a scalar quantity.
The properties of the gradient operator can be illustrated in terms of the
following examples.
• The domain of definition of the first example is the x₁ - x₂ plane³. The
function
f(x_1, x_2) = x_1^2 + x_2^2
represents a paraboloid of revolution in R3 (Fig. 4.9a). The contour lines
of this function are concentric circles about the origin. The gradient vector
of this function is
gradf = ∇f = 2x1 e1 + 2x2 e2 .
This can be expressed in terms of planar polar coordinates as
∇f = 2r(cos ϕe1 + sin ϕe2 ) = 2rer .
The gradient vector points in the radial direction, the direction of the
steepest slope of the function under consideration.

Fig. 4.9. The gradient vector for a paraboloid of revolution: (a) illustration in R³, (b) illustration of the projection

The directional derivative (the rise of the function in an arbitrary direction α) is in this example
D(α) f (x1 , x2 ) = eα · ∇f (x1 , x2 )
= 2r {cos ϕ cos α + sin ϕ sin α}
= 2r cos(ϕ − α) .
The directional derivative is maximal for α = ϕ, it vanishes for α = ϕ±π/2,
that is for a direction tangential to the contour lines (Fig. 4.10).
³ Use x₁ and x₂ instead of x and y .
Fig. 4.10. Directional derivative and gradient
• The situation in R2 can be discussed in general terms. The directional
derivative (Fig. 4.11) of a function z = f (x1 , x2 ) in a point P = (x10 , x20 )
is
D(α) f (x10 , x20 ) = cos α fx1 (x10 , x20 ) + sin α fx2 (x10 , x20 ) .
Fig. 4.11. Interpreting the gradient operator
The direction of a tangent line on a contour is characterised by
D(β) f (x10 , x20 ) = cos βfx1 (x10 , x20 ) + sin βfx2 (x10 , x20 ) = 0 .
There is no increase of the function in this direction. The direction of the
tangent line is therefore given by
\tan\beta = \frac{\sin\beta}{\cos\beta} = -\frac{f_{x_1}(x_{10}, x_{20})}{f_{x_2}(x_{10}, x_{20})}
or by
\sin\beta = \frac{\tan\beta}{\sqrt{1 + \tan^2\beta}} = -\frac{f_{x_1}}{\left[ f_{x_1}^2 + f_{x_2}^2 \right]^{1/2}}
\cos\beta = \frac{1}{\sqrt{1 + \tan^2\beta}} = \frac{f_{x_2}}{\left[ f_{x_1}^2 + f_{x_2}^2 \right]^{1/2}} .
The directional derivative in the direction of the tangent line can also be
calculated by
eβ · ∇f = (cos β e1 + sin β e2 ) · (fx1 e1 + fx2 e2 ) = 0 .
The vector ∇f (indicated by eγ in Fig. 4.11) is perpendicular to the tan-
gent line on a contour.
The square of the magnitude of the gradient vector is |\nabla f|^2 = f_{x_1}^2 + f_{x_2}^2 .
This quantity can be compared with the square of the magnitude of the
directional derivative in the direction α
\left[ D^{(\alpha)} f \right]^2 = (\cos\alpha\, f_{x_1} + \sin\alpha\, f_{x_2})^2
= \cos^2\alpha\, f_{x_1}^2 + 2\sin\alpha\cos\alpha\, f_{x_1} f_{x_2} + \sin^2\alpha\, f_{x_2}^2
= f_{x_1}^2 + f_{x_2}^2 - \left( \sin^2\alpha\, f_{x_1}^2 - 2\sin\alpha\cos\alpha\, f_{x_1} f_{x_2} + \cos^2\alpha\, f_{x_2}^2 \right)
= f_{x_1}^2 + f_{x_2}^2 - (\sin\alpha\, f_{x_1} - \cos\alpha\, f_{x_2})^2
\le f_{x_1}^2 + f_{x_2}^2 .
The increase in the direction eα is smaller than the increase in the direction
of the gradient vector.
These arguments demonstrate that the gradient vector ∇f is perpendicu-
lar to the tangent line on a contour and that it marks the direction of the
strongest increase of a function.
A corresponding statement can be made (and demonstrated) for a func-
tion of three variables z = f (x1 , x2 , x3 ): the vector
gradf = fx1 e1 + fx2 e2 + fx3 e3
is perpendicular to the surfaces of equal value (the correct name is equipoten-
tial surfaces, see Chap. 3.2.3) through the point P and marks the strongest
rise of the function (Fig. 4.12). This assertion can be supported by the fol-
Fig. 4.12. Illustration of the gradient vector
lowing example in R³ with the function
f(x_1, x_2, x_3) = [x_1^2 + x_2^2 + x_3^2]^{1/2} = r .
The equipotential surfaces are concentric spherical shells about the origin.
The gradient of this function is
\operatorname{grad} f = \frac{x_1}{r}\, e_1 + \frac{x_2}{r}\, e_2 + \frac{x_3}{r}\, e_3 ,
in spherical coordinates
gradf = (sin θ cos ϕ)e1 + (sin θ sin ϕ)e2 + (cos θ)e3 = er .
The gradient vector marks also the steepest slope of this function.
The gradient can be linked with an additional differential concept, the
total differential.
4.2.4 The total differential

The increase of a function of two variables z = f (x, y) evaluated at two points
in the x - y plane with an infinitesimal separation can be expressed as
dz = f (x + dx, y + dy) − f (x, y) .
This can be reformulated as
dz = f (x + dx, y + dy) − f (x, y + dy) + f (x, y + dy) − f (x, y)
so that the mean value theorem can be applied in linear approximation
dz = fx (x, y)dx + fy (x, y)dy + O(d2 ) .
This linear approximation of the increase is the first total differential
(Fig. 4.13) of the function f (x, y) .
Fig. 4.13. Defining the total differential
A geometric interpretation of this expression can be obtained by replacing
the differentials with differences (finite quantities). The resulting equation
(z − z0 ) = fx (x0 , y0 )(x − x0 ) + fy (x0 , y0 )(y − y0 )
describes a plane through the point P = (x0 , y0 , z0 ) . This plane is the tan-
gential plane of the surface f (x, y) in the space point (x0 , y0 , f (x0 , y0 )) . The
total differential (a function of four variables x, y, dx, dy) describes therefore
infinitesimal tangential planes of the surface f (x, y) .
Additional remarks concerning the total differential are:
• The right hand side of the definition can be written in the form
dz = ∇f · dr with dr = dx ex + dy ey .
The linear increase of the function in an arbitrary infinitesimal direction dr
is given by the scalar product of gradf with dr . An expression of the form
vector function times (infinitesimal) displacement (with the appropriate
vector function) plays a central role in physics for the discussion of the
concept of work (see Chap. 3.2.3).
• Error analysis (e.g. for lab sessions) is based on the total differential.
The following statement applies to a situation characterised by a function
f (x, y): Two quantities x, y are measured and determine via the relation
z = f (x, y) the quantity z . The magnitude of the total differential is
|dz| = |fx dx + fy dy| .
Use of the triangle inequality |a + b| ≤ |a| + |b| gives the standard estimate
|dz| ≤ |fx ||dx| + |fy ||dy|
if the differentials dx, dy are interpreted as the errors of the measurement.
This estimate is, in view of the linear approximation, only correct if the
errors of measurement |dx| and |dy| are not too large (a numerical sketch follows after this list).
• The terminology ’first’ total differential indicates that total differentials of
higher order can be considered. The second total differential of a function
of two variables is4
d^2 z = \frac{\partial}{\partial x}(dz)\, dx + \frac{\partial}{\partial y}(dz)\, dy .
The derivatives involved are
\frac{\partial}{\partial x}(dz) = f_{xx}\, dx + f_{yx}\, dy \qquad \frac{\partial}{\partial y}(dz) = f_{xy}\, dx + f_{yy}\, dy
so that the second total differential can be written as
d^2 z = f_{xx}\,(dx)^2 + 2 f_{xy}\,(dx\, dy) + f_{yy}\,(dy)^2
if the order of the differentiation in the mixed derivatives can be inter-
changed. It approximates a surface f by an infinitesimal, tangential surface
of second order.
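As announced, a minimal sketch of the error estimate based on the total differential (assuming Python; the relation z = x y² and the measured values are hypothetical choices for illustration):

# z = f(x, y) = x * y**2 with measured values and measurement errors dx, dy
x, dx = 2.00, 0.01
y, dy = 3.00, 0.02

fx = y**2        # partial derivative with respect to x
fy = 2 * x * y   # partial derivative with respect to y

dz = abs(fx) * dx + abs(fy) * dy   # |dz| <= |f_x||dx| + |f_y||dy|
print(x * y**2, '+/-', dz)         # 18.0 +/- 0.33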
The definition can be extended to the case of n variables. The first total
differential of a function z = f(x_1, \ldots, x_n)
dz = \sum_{i=1}^{n} f_{x_i}(x_1, \ldots, x_n)\, dx_i = \nabla f \cdot dr
⁴ Note that x, y, dx and dy are independent variables.
characterises the increase of a function in linear approximation. The geometrical statement 'the equation describes a tangential hyperplane on a hypersurface f in an (n + 1)-dimensional space' is not very helpful.
4.2.5 The chain rule

The rules for partial differentiation do not differ greatly from the rules for
ordinary differentiation. An example in support of this statement is the rule
for the differentiation of a product
\frac{\partial}{\partial x}\left( u(x, y)\, v(x, y) \right) = u_x\, v + u\, v_x .
The proof can be given on the basis of the definition of the partial derivative
and a repeat of the arguments leading to the corresponding rule of ordinary
differentiation.
An exception to the statement is the chain rule. The increased number
of variables in the case of functions of several variables allows a larger variety
of formulae but also a larger spectrum of applications. Consider, for instance,
the following situation: the functions x = x(t), y = y(t) can be inserted into
a function z = f (x, y) . The result is a function of t
z = f (x(t), y(t)) = F (t) .
The set of functions (x(t), y(t)) could represent the parametric representation
of a curve K in the x - y plane. The function F (t) describes in this case the
values of the surface f (x, y) over this curve (Fig. 4.14), that is a curve in
space.
Fig. 4.14. A variant of the chain rule
The ordinary derivative dF/dt (the tangent line on the curve in space)
can be expressed in terms of the partial derivatives of f and the ordinary
derivatives of x(t) and y(t) . The corresponding chain rule can be derived
with the argument: the infinitesimal difference
dz = f (x + dx, y + dy) − f (x, y)
is in linear approximation
dz = fx (x, y) dx + fy (x, y) dy .
A displacement along the curve K can be expressed as
dx = \dot x(t)\, dt \qquad dy = \dot y(t)\, dt .
Insertion into the expression for the total differential gives in the limit dt → 0
\frac{dF}{dt} = f_x\, \dot x + f_y\, \dot y = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} .
This result can, for instance, be used to calculate the derivatives of functions as
F(t) = (\cos t)^{\sin t} \qquad\text{or more generally}\qquad F(t) = x(t)^{y(t)} .
These derivatives are not easily calculated with standard methods. The chain
rule, on the other hand, gives directly
\frac{dF}{dt} = y\, x^{y-1}\, \dot x + (x^y \ln x)\, \dot y = x^{y-1}\left( y\, \dot x + x\, \dot y\, \ln x \right)
in the general case and
\frac{d}{dt}\,(\cos t)^{\sin t} = (\cos t)^{(\sin t - 1)}\left[ -\sin^2 t + \cos^2 t\, \ln(\cos t) \right]
for the explicit example.
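The explicit example can be verified symbolically; a sketch assuming sympy:

import sympy as sp

t = sp.Symbol('t')
F = sp.cos(t) ** sp.sin(t)
claim = sp.cos(t)**(sp.sin(t) - 1) * (-sp.sin(t)**2
                                      + sp.cos(t)**2 * sp.log(sp.cos(t)))

diff = sp.diff(F, t) - claim
print(sp.simplify(diff))          # expected to reduce to 0
for t0 in (0.2, 0.7, 1.2):        # numerical spot checks (cos t0 > 0)
    print(diff.subs(t, t0).evalf())   # 0 at every test point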
Additional variants of the chain rule arise e.g. if the following functions
z = f(x) , \quad x = g(t_1, t_2) \;\longrightarrow\; z = F(t_1, t_2)
z = f(x, y) , \quad x = x(t_1, t_2) ,\; y = y(t_1, t_2) \;\longrightarrow\; z = F(t_1, t_2)
are considered. All possible cases are covered by the general problem: Cal-
culate the derivatives of the composite function z = F (t1 , . . . , tm ) , which is
obtained by insertion of the functions of m variables xi = xi (t1 , . . . , tm ) into
the function z = f (x1 , . . . , xn ) of n variables.
The derivation of the formulae can be based, as in the introductory exam-
ple, on the composition of total differentials⁵. The results up to second order
(with the proviso that all second order derivatives exist and are continuous)
are
\frac{\partial F}{\partial t_l} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, \frac{\partial x_i}{\partial t_l} \qquad l = 1, 2, \ldots, m
\frac{\partial^2 F}{\partial t_l\, \partial t_k} = \sum_{i,j=1}^{n} \frac{\partial^2 f}{\partial x_i\, \partial x_j}\, \frac{\partial x_i}{\partial t_l}\, \frac{\partial x_j}{\partial t_k} + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, \frac{\partial^2 x_i}{\partial t_l\, \partial t_k} \qquad l, k = 1, 2, \ldots, m .
All symbols for partial differentiations have to be replaced by the symbol for
ordinary differentiation if only one x - or only one t -variable occurs.
An example of particular interest in physics is the application of the chain
rule for the transformation of the Laplace and the gradient operators into
⁵ See list of literature for the proof of these formulae.
curvilinear coordinates. The answer is already lengthy in the two-dimensional
world so that the explicit calculation is restricted to this case. The following
quantities are discussed for a function of two variables
∇U (x, y) = Ux ex + Uy ey
ΔU (x, y) = Uxx + Uyy .
It is assumed that the function U is a composite function, as e.g.
U(x, y) = U(x(r, ϕ), y(r, ϕ)) = u(r, ϕ)
with
r(x, y) = \sqrt{x^2 + y^2} \qquad \varphi(x, y) = \arctan\frac{y}{x} .
Four derivatives of U , that is Ux , Uy , Uxx , Uyy , have to be calculated with
the chain rule⁶. The first two are
Ux = ur rx + uϕ ϕx
Uy = ur ry + uϕ ϕy .
The following derivatives of r and ϕ are needed
r_x = \frac{x}{r} = \cos\varphi \qquad r_y = \frac{y}{r} = \sin\varphi
\varphi_x = \frac{1}{(1 + (y/x)^2)}\left( -\frac{y}{x^2} \right) = -\frac{y}{r^2} = -\frac{\sin\varphi}{r}
\varphi_y = \frac{1}{(1 + (y/x)^2)}\, \frac{1}{x} = \frac{x}{r^2} = \frac{\cos\varphi}{r} .
Insertion and sorting yields
\nabla U(x, y) = u_r\, (\cos\varphi\, e_x + \sin\varphi\, e_y) + \frac{1}{r}\, u_\varphi\, (-\sin\varphi\, e_x + \cos\varphi\, e_y)
= u_r\, e_r + \frac{1}{r}\, u_\varphi\, e_\varphi = \frac{\partial u}{\partial r}\, e_r + \frac{1}{r}\, \frac{\partial u}{\partial \varphi}\, e_\varphi .
The formal expression for the gradient operator in planar polar coordinates
is therefore
\nabla = e_r\, \frac{\partial}{\partial r} + e_\varphi\, \frac{1}{r}\, \frac{\partial}{\partial \varphi} .
The calculation for the Laplace operator needs more time. The starting
point is the chain rule
⁶ Note the notation U(x, y) versus u(r, ϕ).
U_{xx} = u_{rr}\, r_x r_x + 2 u_{r\varphi}\, r_x \varphi_x + u_{\varphi\varphi}\, \varphi_x \varphi_x + u_r\, r_{xx} + u_\varphi\, \varphi_{xx}
U_{yy} = u_{rr}\, r_y r_y + 2 u_{r\varphi}\, r_y \varphi_y + u_{\varphi\varphi}\, \varphi_y \varphi_y + u_r\, r_{yy} + u_\varphi\, \varphi_{yy} .
Insertion of all the required derivatives of r and ϕ,
r_{xx} = \frac{1}{r} - \frac{x^2}{r^3} = \frac{y^2}{r^3} = \frac{\sin^2\varphi}{r}
r_{yy} = \frac{1}{r} - \frac{y^2}{r^3} = \frac{x^2}{r^3} = \frac{\cos^2\varphi}{r}
\varphi_{xx} = \frac{2xy}{(x^2 + y^2)^2} = \frac{2\cos\varphi\,\sin\varphi}{r^2}
\varphi_{yy} = -\frac{2xy}{(x^2 + y^2)^2} = -\frac{2\cos\varphi\,\sin\varphi}{r^2} ,
gives
\Delta U(x, y) = u_{rr}\left( \cos^2\varphi + \sin^2\varphi \right) + 2 u_{r\varphi}\left( -\frac{\cos\varphi\,\sin\varphi}{r} + \frac{\cos\varphi\,\sin\varphi}{r} \right)
+ u_{\varphi\varphi}\, \frac{1}{r^2}\left( \sin^2\varphi + \cos^2\varphi \right) + u_r\, \frac{1}{r}\left( \sin^2\varphi + \cos^2\varphi \right) + u_\varphi\, \frac{2}{r^2}\left( \cos\varphi\,\sin\varphi - \cos\varphi\,\sin\varphi \right)
= u_{rr} + \frac{1}{r}\, u_r + \frac{1}{r^2}\, u_{\varphi\varphi} .
The formal version of the Laplace operator in planar polar coordinates is
\Delta = \frac{\partial^2}{\partial r^2} + \frac{1}{r}\, \frac{\partial}{\partial r} + \frac{1}{r^2}\, \frac{\partial^2}{\partial \varphi^2} .
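A symbolic spot check of this formula (a sketch assuming sympy; the test function U is an arbitrary choice):

import sympy as sp

x, y, r, phi = sp.symbols('x y r phi', positive=True)

U = x**3 * y - x * y**2 + y       # arbitrary test function U(x, y)
lap_cart = sp.diff(U, x, 2) + sp.diff(U, y, 2)

# the same function expressed in polar coordinates, u(r, phi)
u = U.subs({x: r*sp.cos(phi), y: r*sp.sin(phi)})
lap_polar = sp.diff(u, r, 2) + sp.diff(u, r)/r + sp.diff(u, phi, 2)/r**2

diff = lap_cart.subs({x: r*sp.cos(phi), y: r*sp.sin(phi)}) - lap_polar
print(sp.simplify(diff))   # 0: both forms of the Laplace operator agree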
The derivation of the corresponding formulae for the conversion of the
two differential operators from Cartesian coordinates in R³ into cylindrical
coordinates does not introduce any new aspects. The calculation for the
conversion into spherical coordinates or other curvilinear coordinates is
more involved⁷. As the formulae for spherical coordinates are often needed,
the results are already quoted here. The formal gradient operator is
\nabla = e_r\, \frac{\partial}{\partial r} + e_\varphi\, \frac{1}{r\sin\theta}\, \frac{\partial}{\partial \varphi} + e_\theta\, \frac{1}{r}\, \frac{\partial}{\partial \theta} ,
the Laplace operator
\Delta = \frac{\partial^2}{\partial r^2} + \frac{2}{r}\, \frac{\partial}{\partial r} + \frac{1}{r^2 \sin^2\theta}\, \frac{\partial^2}{\partial \varphi^2} + \frac{1}{r^2}\, \frac{\partial^2}{\partial \theta^2} + \frac{\cot\theta}{r^2}\, \frac{\partial}{\partial \theta} .

⁷ A different approach is discussed in Vol. 2, Math.Chap. 5.
It is advised that you try the explicit derivation, at least for spherical coordinates⁸.
A last remark might be useful in order to avoid a common misconception:
The evaluation of the partial derivative of the function
z = f(x_1, x_2, \ldots, x_n, t) \qquad\text{with}\qquad x_i = x_i(t)
with respect to t,
\frac{\partial f(x_1, x_2, \ldots, x_n, t)}{\partial t} ,
involves only differentiation with respect to the explicit variable t . The total
derivative of f with respect to t is
\frac{df}{dt} = \frac{\partial f}{\partial t} + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}\, \frac{dx_i}{dt} .
The chain rule has to be applied in this case.
4.3 Integration

There exist a number of options for the integration of functions of several
variables. A choice of the domain of integration in the form of curves, areas,
volumes and higher dimensional domains and the consideration of fixed and
open (dependence on the variables) limits of integration is possible. The re-
sult of the integration comprises areas, volumes (also in higher dimensional
spaces) and also functions, if the integration involves e.g. only one of the
variables. The integration of functions of two variables allows, for instance,
the definition and the representation of ’higher functions’. Elliptic integrals,
which are introduced in a separate section (Math.Chap. 4.3.4), constitute an
example of this class of functions. The discussion of integration begins with
integrals of functions of two variables.

4.3.1 Single integrals of f(x, y)

A function z = f(x, y) is defined over a suitable domain (and continuous).
The (definite) integral
F_{ab}(x) = \int_a^b f(x, y)\, dy
⁸ Note also that all expressions are found in Mathematical Tables.
can be evaluated with this function. The variable x is treated as a constant
during the integration. The limits of integration are supposed to lie in the in-
terior of the domain of definition of the function (Fig. 4.15a).

Fig. 4.15. The integral F_{ab}(x): (a) domain of integration, (b) dependence on x

The geometric
interpretation of this integral is: the function f (xfixed , y) describes the inter-
section of the surface f with the plane x = xfixed . The integral corresponds
to the contents of the planar area which is confined by the line segment ab
in the x - y plane, the curve of intersection and appropriate parallels to the
z-axis. It is actually a standard integral which is embedded in a three dimen-
sional world. Different values of x yield different curves of intersection and
hence different areas. The evaluation of the integral determines not only one
area but a whole family of areas (Fig. 4.15b).
The result for the explicit example (Fig. 4.16)
\int_0^1 (x^2 + y^2)\, dy = \left( x^2 y + \frac{1}{3}\, y^3 \right)\bigg|_0^1 = x^2 + \frac{1}{3}
can be interpreted as follows: the intersection for x = 0 is the parabola
z = y² . The area under the parabola between the limits 0 and 1 has the value
1/3 . The area under the curve z = x² + y² for x ≠ 0 is composed of the
rectangle x² · 1 and the area under the arc of the parabola (Fig. 4.16b).
There exists no reason to favour the y -coordinate. The integral
G_{\alpha\beta}(y) = \int_\alpha^\beta f(x, y)\, dx
can be interpreted in a similar fashion: it represents the area under the in-
tersection of the surface f and the plane y = yfixed . Additional examples are
the calculation of two integrals with the function y^x
\int_0^1 y^x\, dy = \frac{y^{x+1}}{(x+1)}\bigg|_0^1 = \frac{1}{x+1} \qquad (x \neq -1)
\int_0^1 y^x\, dx = \frac{y^x}{\ln y}\bigg|_0^1 = \frac{(y-1)}{\ln y} \qquad (y \neq 0)
Fig. 4.16. Integration under a paraboloid: (a) cut for x = 0, (b) cut for x fixed ≠ 0
(use the substitution y = e^{\ln y} in the second integral).
The result of such integrations is a function of one variable, which is
defined and represented via the integral (independent of any geometrical
interpretation). Some of the 'higher functions', as e.g. the elliptic integrals
F(k) = \int_0^{\pi/2} \left[ 1 - k^2 \sin^2 y \right]^{-1/2} dy \qquad \text{(first kind)}
E(k) = \int_0^{\pi/2} \left[ 1 - k^2 \sin^2 y \right]^{1/2} dy \qquad \text{(second kind)}
are defined in this fashion. These functions play a role in the discussion of
different physical problems, as the discussion of the motion of a mathematical
pendulum for large displacements from the equilibrium position (Chap. 4.2.1)
or of the motion of spinning tops (Chap. 6.3.7). Elliptic integrals are in-
troduced in detail in Math.Chap. 4.3.4. Two simpler functions, which are
represented in terms of such integrals are:
• The function defined by the integral (Fig. 4.17)
F(x) = \int_0^\infty \frac{\sin xy}{y}\, dy = \lim_{b\to\infty} \int_0^b \frac{\sin xy}{y}\, dy .
The integrand is continuous at y = 0
\lim_{y\to 0} \frac{\sin xy}{y} = x \qquad \lim_{x\to 0} \frac{\sin xy}{y} = 0 .
The area, which is represented by the integral, is reasonably complicated.
The intersection of a plane x = const. with the surface
f(x, y) = \frac{\sin(\text{const.}\; y)}{y}
is illustrated in Fig. 4.17. The integral can be simplified slightly by a sub-
stitution.
Fig. 4.17. The function (sin xy)/y for x = const.

∗ Use u = xy and du = x dy for x > 0 and find
F(x) = \int_0^\infty \frac{\sin u}{u}\, du = \frac{\pi}{2} .
The value of the integral is taken from a Table of definite integrals. The
explicit evaluation has to be based on the Cauchy integral theorem of
complex analysis.
∗ The result F (0) = 0 follows for x = 0 (as f (0, y) = 0).
∗ The substitution u = |x| y , du = |x| dy has to be used for the case of
negative x -values (x < 0). The result is here
F(x) = -\int_0^\infty \frac{\sin(|x|\, y)}{y}\, dy = -\int_0^\infty \frac{\sin u}{u}\, du = -\frac{\pi}{2} .
The function F (x) is a step function (Fig. 4.18). The function is not con-
tinuous even though the integrand is and the improper integral converges.
Fig. 4.18. The integral with (sin xy)/y defines a step function
• The next example is the often used Γ-function (Gamma function) which
is defined by the improper integral
\Gamma(x) = \int_0^\infty y^{x-1}\, e^{-y}\, dy
(see also Vol. 2, Math.Chap. 4.1). On the basis of the definition the follow-
ing properties can be established:
∗ The integral is simple for x = 1
\Gamma(1) = \int_0^\infty e^{-y}\, dy = -e^{-y}\Big|_0^\infty = 1 .
∗ Partial integration of
x\, \Gamma(x) = x \int_0^\infty y^{x-1}\, e^{-y}\, dy \qquad (x > 0)
yields
= y^x\, e^{-y}\Big|_0^\infty + \int_0^\infty y^x\, e^{-y}\, dy .
The first term vanishes for x > 0 . The second term represents the function
Γ (x + 1) .
This result can be used as an alternative definition of the Gamma function
(for x > 0) in the form of a functional equation
Γ (x + 1) = xΓ (x) .
For integer values of x it follows that
Γ (n) = (n − 1)Γ (n − 1) .
The exploitation of this recursion relation starting with Γ (1) = 1 leads to
Γ (2) = 1 Γ (3) = 2 · 1 Γ (4) = 3 · 2 · 1 , etc. → Γ (n) = (n − 1)! .
The Γ -function is a generalisation of the factorial.
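The functional equation and the connection with the factorial can be checked directly (a sketch assuming Python's standard library; the argument x = 2.7 is an arbitrary choice):

import math

# functional equation: Gamma(x + 1) = x Gamma(x)
x = 2.7
print(math.isclose(math.gamma(x + 1), x * math.gamma(x)))   # True

# Gamma(n) = (n - 1)! for integer arguments
for n in range(1, 7):
    print(n, math.gamma(n), math.factorial(n - 1))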
It is necessary to answer the question whether and how a function, defined
by such integrals, can be differentiated. The answer (rule) is:
The derivative of a function F_{ab}(x) = \int_a^b f(x, y)\, dy is
\frac{dF_{ab}(x)}{dx} = \int_a^b f_x(x, y)\, dy
provided the functions f(x, y) and f_x(x, y) are continuous.

It is permitted to interchange integration and differentiation under the


assumptions stated. A corresponding rule holds for
 β
Gαβ (y) = f (x, y) dx .
α
These rules can be illustrated with
\frac{1}{(x+1)} = \int_0^1 y^x\, dy \qquad\text{and}\qquad \frac{d}{dx}\, \frac{1}{(x+1)} = -\frac{1}{(x+1)^2} .
The rule results in
\frac{d}{dx}\, \frac{1}{(x+1)} = \int_0^1 (y^x \ln y)\, dy
so that partial integration leads to
\frac{d}{dx}\, \frac{1}{(x+1)} = \frac{1}{(x+1)}\, y^{x+1}\ln y\, \bigg|_0^1 - \frac{1}{(x+1)} \int_0^1 y^x\, dy = 0 - \frac{1}{(x+1)^2}
as expected.
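The same interchange can be confirmed numerically; a sketch assuming scipy (the point x = 2 is an arbitrary choice):

import numpy as np
from scipy.integrate import quad

x = 2.0
# left-hand side: d/dx of the integral, via a difference quotient
h = 1e-6
F = lambda x: quad(lambda y: y**x, 0, 1)[0]           # equals 1/(x+1)
lhs = (F(x + h) - F(x - h)) / (2 * h)
# right-hand side: integral of the partial derivative f_x = y**x ln(y)
rhs = quad(lambda y: y**x * np.log(y), 0, 1)[0]
print(lhs, rhs, -1.0 / (x + 1)**2)   # all three approximately -1/9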
The discussion of integration of a function of two variables over one of
the variables can be extended by consideration of variable rather than fixed
limits
\int_{a(x)}^{b(x)} f(x, y)\, dy = g(x, a(x), b(x)) = F(x) .
The limits of integration are functions of x in this case (Fig. 4.19). The
function y = a(x) as well as the function y = b(x) represent curves in the
x - y plane. The interval of integration for the integration over y is confined
by these curves (for each value of x).
Fig. 4.19. Integration over y with variable limits
A direct example is an integral with the function z = 3 − x² − y² – an
inverted paraboloid of revolution – and variable limits of integration in the
form of semicircular arcs
a(x) = -\sqrt{1 - x^2} \qquad b(x) = \sqrt{1 - x^2} .
The results, areas embedded in a three-dimensional world, are represented in
Fig. 4.20 a-c for x = 0, 1/2 and 1 .
Evaluation of the integral results in
Fig. 4.20. Integration under a paraboloid, the range of integration confined by semicircular arcs: illustration of the integral for x = 0, 1/2, 1
F(x) = \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} (3 - x^2 - y^2)\, dy = \left[ (3 - x^2)\, y - \frac{1}{3}\, y^3 \right]_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} = \frac{4}{3}\, (4 - x^2)\, \sqrt{1 - x^2} .
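The closed form can be compared with a direct numerical quadrature; a sketch assuming scipy:

import numpy as np
from scipy.integrate import quad

def F_quad(x):
    b = np.sqrt(1 - x**2)                  # b(x) = -a(x)
    return quad(lambda y: 3 - x**2 - y**2, -b, b)[0]

def F_closed(x):
    return 4.0/3.0 * (4 - x**2) * np.sqrt(1 - x**2)

for x in (0.0, 0.5, 1.0):
    print(x, F_quad(x), F_closed(x))   # the pairs agree: 16/3, 4.330..., 0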
The rules for the differentiation of integrals, which define a function, have
to be extended for the case of integrals with variable limits. This extension
is based on the following
(i) The chain rule requires
\frac{d}{dx}\, g(x, a(x), b(x)) = \frac{\partial g}{\partial x} + \frac{\partial g}{\partial a}\, \frac{da}{dx} + \frac{\partial g}{\partial b}\, \frac{db}{dx} .
(ii) The derivative with respect to variable limits is
\frac{d}{dx} \int_{x_0}^{x} f(\tilde x)\, d\tilde x = f(x) .
Combination of these statements yields the extended rule
\frac{d}{dx} \int_{a(x)}^{b(x)} f(x, y)\, dy = \int_{a(x)}^{b(x)} f_x(x, y)\, dy + f(x, b(x))\, \frac{db(x)}{dx} - f(x, a(x))\, \frac{da(x)}{dx} .
The last term arises from the derivative with respect to the lower limit.
(Check the validity of this rule for the example given above).
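One way to carry out this check numerically (a sketch assuming scipy; the point x = 0.5 is an arbitrary choice):

import numpy as np
from scipy.integrate import quad

f  = lambda x, y: 3 - x**2 - y**2      # integrand of the example above
fx = lambda x, y: -2 * x               # its partial derivative f_x
a  = lambda x: -np.sqrt(1 - x**2)
b  = lambda x:  np.sqrt(1 - x**2)
da = lambda x:  x / np.sqrt(1 - x**2)  # da/dx
db = lambda x: -x / np.sqrt(1 - x**2)  # db/dx

x = 0.5
rule = (quad(lambda y: fx(x, y), a(x), b(x))[0]
        + f(x, b(x)) * db(x) - f(x, a(x)) * da(x))
h = 1e-6                               # compare with a difference quotient
Fd = (quad(lambda y: f(x+h, y), a(x+h), b(x+h))[0]
      - quad(lambda y: f(x-h, y), a(x-h), b(x-h))[0]) / (2*h)
print(rule, Fd)    # agree; both equal dF/dx of the closed form above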
A step beyond the standard integration, although still embedded in R3 ,
is taken by the discussion of double integrals of a function f (x, y) in the
following section.
4.3.2 Double and domain integrals with f (x, y)
Starting point of the discussion is the integral
I_1 = \int_\alpha^\beta dx\; F_{ab}(x) .
It is assumed that the function Fab (x) has been obtained by integration over
the second variable of a function f (x, y)
I_1 = \int_\alpha^\beta dx\; F_{ab}(x) = \int_\alpha^\beta dx \int_a^b dy\; f(x, y) .
This is a double integral. The result of the calculation is a volume over a
rectangle in the x - y plane which is limited from above by the surface f . The
lateral surfaces are planar (Fig. 4.21).
Fig. 4.21. Integration over a rectangular domain: illustration of the integral
The explicit form of the integral indicates the manner in which it is evalu-
ated: the areas Fab (x) (inner integration) are obtained first. The second step
involves the addition of infinitesimal slabs Fab (x) dx . The result of this outer
integration is a volume (Fig. 4.22a). It is assumed that appropriate limiting
processes are used for each of the integrations.
There exist additional options to subdivide the total volume. A second
possibility is expressed by the integral
I_2 = \int_a^b dy \int_\alpha^\beta dx\; f(x, y) .
Areas parallel to the x - z plane are obtained first in this case. This is followed
by adding infinitesimal slabs in the y -direction (Fig. 4.22b). It might be
expected that the two volumes calculated are equal I1 = I2 . This is indeed
the case if the following condition is met: f (x, y) has to be bounded over the
domain of integration
|f (x, y)| < M
except for a finite number of points with infinities.
Fig. 4.22. Integration over a rectangular domain: (a) segmentation with dx, (b) segmentation with dy
A third possibility to evaluate the integral is indicated by the notation
I_3 = \iint_R f(x, y)\, dx\, dy .
The index R denotes the rectangular domain of integration used in the other
options. The volume is subdivided in the case of this domain integral as:
subdivide the domain of integration into infinitesimal rectangles dxdy and
calculate the volume of a column for each of these rectangles (Fig. 4.23)
dV = dxdy f (suitable intermediate point) .
The total volume is obtained by addition of the contributions of all infinites-
imal columns (using a proper limiting process). All three options give the

x
Fig. 4.23. Integration over a rectangular domain: division into infinitesimal
columns

same value for the volume I1 = I2 = I3 provided f is bounded over the do-
main R . This statement is in so far very useful as the evaluation of a domain
integral is in general only possible if it can be reduced to a double integral
(with two ordinary integrations which are executed one after the other).
The following example with the function
$$f(x, y) = x^2 + y^2$$
and the domain of integration
$$R: \quad -2 \le y \le 2\,, \qquad -1 \le x \le 1$$
illustrates the individual steps. The domain of integration is a rectangle about the origin. The result of the integration is the volume between the rectangle and the paraboloid of revolution f (x, y) (Fig. 4.24). The actual calculation proceeds e.g. in the following fashion
$$I = \int_{-1}^{1} dx \int_{-2}^{2} dy\,(x^2 + y^2) = \int_{-1}^{1} dx \left[x^2 y + \frac{1}{3}\,y^3\right]_{-2}^{2} = \int_{-1}^{1} dx \left(4x^2 + \frac{16}{3}\right) = \left[\frac{4}{3}\,x^3 + \frac{16}{3}\,x\right]_{-1}^{1} = \frac{40}{3}\,.$$
The same result is obtained by an interchange of the sequence of integrations.
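Both orders of integration can be checked numerically. The following sketch (an assumption: scipy is available; it is not part of the original text) uses dblquad, which expects the integrand with the inner variable as its first argument.

```python
# A minimal numerical cross-check (not from the text) of the rectangle
# example; dblquad integrates the inner variable first.
from scipy.integrate import dblquad

I_xy, _ = dblquad(lambda y, x: x**2 + y**2,   # integrand f(y, x)
                  -1, 1,                       # outer x-limits
                  lambda x: -2, lambda x: 2)   # inner y-limits
I_yx, _ = dblquad(lambda x, y: x**2 + y**2,    # interchanged order
                  -2, 2,
                  lambda y: -1, lambda y: 1)
print(I_xy, I_yx, 40 / 3)   # all three agree: 13.333...
```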
Fig. 4.24. Integration over a rectangle under a paraboloid of revolution
The first step of the next task, the calculation of volumes over an arbitrary domain, involves the integrals
$$I_1 = \int_{\alpha}^{\beta} dx \int_{a(x)}^{b(x)} dy\, f(x, y) = \iint_{B_1} f(x, y)\, dxdy$$
respectively
$$I_2 = \int_{a}^{b} dy \int_{\alpha(y)}^{\beta(y)} dx\, f(x, y) = \iint_{B_2} f(x, y)\, dxdy\,.$$
The domain of integration is confined by the straight lines x = α and x = β and by the curves y = a(x) and y = b(x) in the first example (Fig. 4.25a). The integral represents the volume between this area in the x - y plane and the surface f over this domain. The domain of integration is confined by the curves x = α(y) and x = β(y) and the straight lines y = a respectively y = b in the second example (Fig. 4.25b). The values of the integrals will obviously be different (in general) as the domains of integration are different. It is preferable, in view of the shapes of the domain of integration, to divide the domain of integration into slabs parallel to the y-axis in the first and into slabs parallel to the x-axis in the second case. The two integrals can also be listed as
$$\iint_{B_i} f(x, y)\, dxdy \qquad i = 1, 2\,,$$
independent of the subdivision. The understanding is that the reduction to a double integral is to be geared to the shape of the domain Bi at hand.
Fig. 4.25. Domains of integration with variable limits: (a) variable limits in the y-direction; (b) variable limits in the x-direction
Integrals of the type indicated can also be used to deal with a domain of integration which is limited only by curvilinear boundaries.
• This is, for example, the case if the curves y = a(x) and y = b(x) intersect in the points with x = α and x = β (Fig. 4.26a).
• A similar situation is encountered if the curves x = α(y) and x = β(y) intersect in the points with y = a and y = b (Fig. 4.26b).
• More complicated shapes of the domain can be treated by a subdivision into suitable sub-domains. An example is the domain of Fig. 4.27 with a kidney shape. The subdivision can be oriented with respect to the x-direction or the y-direction. The following subdivision offers itself for the choice of an inner integration in the y-direction
$$\iint_{B} f(x, y)\, dxdy = \iint_{B_1} f(x, y)\, dxdy + \iint_{B_2} f(x, y)\, dxdy + \iint_{B_3} f(x, y)\, dxdy\,.$$
This relation is written in the form
Fig. 4.26. Examples for domains of integration with curvilinear boundaries: (a) variable boundaries in the y-direction; (b) variable boundaries in the x-direction
Fig. 4.27. Subdivision of a domain with a kidney shape (boundary curves a1(x), ..., a4(x); abscissae α1, ..., α4)
$$\iint_{B} f(x, y)\, dxdy = \int_{\alpha_1}^{\alpha_2} dx \int_{a_4(x)}^{a_1(x)} dy\, f(x, y) + \int_{\alpha_2}^{\alpha_3} dx \int_{a_2(x)}^{a_1(x)} dy\, f(x, y) + \int_{\alpha_2}^{\alpha_4} dx \int_{a_4(x)}^{a_3(x)} dy\, f(x, y)$$
if full details have to be exhibited.
The subdivision of more general domains of integration is illustrated in the following examples.
• The first example is a circular area with radius R about the origin. The subdivision of the integral
$$\iint_{K} f(x, y)\, dxdy = \int_{-R}^{R} dx \int_{-\sqrt{R^2 - x^2}}^{\sqrt{R^2 - x^2}} dy\, f(x, y)$$
uses stripes parallel to the y-axis. The limiting curves are semicircles in the upper and lower half-planes (Fig. 4.28a). The subdivision
$$\iint_{K} f(x, y)\, dxdy = \int_{-R}^{R} dy \int_{-\sqrt{R^2 - y^2}}^{\sqrt{R^2 - y^2}} dx\, f(x, y)$$
can be used as well if the integrand is bounded. A subdivision into stripes parallel to the x-axis and addition of infinitesimal slabs between −R and R in the y-direction is used in the second option (Fig. 4.28b).
Fig. 4.28. Integration over a circular area: (a) subdivision with dx; (b) subdivision with dy
A definite problem is the calculation of the volume of a cylinder of radius R = 1 with a lid in the form of a paraboloid which is described by the function f = 3 − x² − y². The volume is obtained with the steps
$$V = \int_{-1}^{1} dx \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} dy\,(3 - x^2 - y^2) = \int_{-1}^{1} dx\; \frac{4}{3}\,(4 - x^2)\sqrt{1 - x^2}$$
(as calculated before). The second step is often more involved, as the first integration frequently leads to more complicated functions. The integrals which are still to be evaluated can be taken from suitable tables in the present case
$$V = \left[\frac{16}{3}\left(\frac{x}{2}\sqrt{1-x^2} + \frac{1}{2}\arcsin x\right) - \frac{4}{3}\left(\frac{x}{4}\,(x^2 - 1)\sqrt{1-x^2} + \frac{x}{8}\sqrt{1-x^2} + \frac{1}{8}\arcsin x\right)\right]_{-1}^{1}\,.$$
Only the terms with arcsin contribute for the limits specified. The final result is
$$V = \frac{8}{3}\,\pi - \frac{1}{6}\,\pi = \frac{5}{2}\,\pi$$
because of arcsin 1 = −arcsin(−1) = π/2. This corresponds to the volume of the cylinder V_Z = πR²h = 2π plus the volume of the parabolic 'hat' V_H = (1/2)π.
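The value 5π/2 can also be confirmed with a double integral with variable limits, as in the following sketch (assuming numpy and scipy; not part of the original text).

```python
# A small check of the cylinder-with-paraboloid-lid volume, using variable
# y-limits in dblquad (scipy assumed).
import numpy as np
from scipy.integrate import dblquad

V, _ = dblquad(lambda y, x: 3 - x**2 - y**2,   # integrand f(y, x)
               -1, 1,                           # outer x-limits
               lambda x: -np.sqrt(1 - x**2),    # a(x)
               lambda x:  np.sqrt(1 - x**2))    # b(x)
print(V, 2.5 * np.pi)   # 7.853981... in both cases
```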
• The center of the circular area with radius R is shifted to the point (A, B) in the second example. A subdivision into stripes parallel to the y-axis leads to
$$\iint f(x, y)\, dxdy = \int_{A-R}^{A+R} dx \int_{B-\sqrt{R^2-(x-A)^2}}^{B+\sqrt{R^2-(x-A)^2}} dy\, f(x, y)\,.$$
• The third example deals with a triangular domain marked by the corner points (0, 0), (0, 2) and (1, 2). The double integral is
$$\iint_{B} f(x, y)\, dxdy = \int_{0}^{1} dx \int_{2x}^{2} dy\, f(x, y)$$
if the y-integration is chosen as the inner integration.
Fig. 4.29. Integration over a triangular domain (bounded by y = 2x, y = 2, and the y-axis)
• The following options are possible for integration over a circular ring (example 4) with the radii R1 and R2 (R1 < R2). The decomposition is simple if the function f is defined and continuous over the interior of the annulus (Fig. 4.30a)
$$\iint_{B} f(x, y)\, dxdy = \iint_{K_2} f(x, y)\, dxdy - \iint_{K_1} f(x, y)\, dxdy\,.$$
The contributions of the circular areas can be calculated as in the first example. A subdivision of the ring becomes necessary if the function f is not defined or not bounded over the interior of the ring. The division is indicated in Fig. 4.30b. The outer circle provides the limiting curves
$$a_1(x) = \sqrt{R_2^2 - x^2} \qquad\text{and}\qquad a_3(x) = -\sqrt{R_2^2 - x^2}\,,$$
the inner circle
$$a_2(x) = \sqrt{R_1^2 - x^2} \qquad\text{and}\qquad a_4(x) = -\sqrt{R_1^2 - x^2}\,,$$
so that
Fig. 4.30. Domains of integration for an annulus: (a) f (x, y) bounded in the interior; (b) f (x, y) singular or not defined in the interior
$$\iint_{B} f(x, y)\, dxdy = \int_{-R_2}^{-R_1} dx \int_{-\sqrt{R_2^2 - x^2}}^{\sqrt{R_2^2 - x^2}} dy\, f(x, y) + \int_{-R_1}^{R_1} dx \int_{\sqrt{R_1^2 - x^2}}^{\sqrt{R_2^2 - x^2}} dy\, f(x, y)$$
$$\qquad + \int_{-R_1}^{R_1} dx \int_{-\sqrt{R_2^2 - x^2}}^{-\sqrt{R_1^2 - x^2}} dy\, f(x, y) + \int_{R_1}^{R_2} dx \int_{-\sqrt{R_2^2 - x^2}}^{\sqrt{R_2^2 - x^2}} dy\, f(x, y)\,.$$
Two additional points can be noted concerning the calculation of volumes in R3:
(1) The volume is negative if the function f is smaller than zero over the complete domain of integration (similar to the appearance of negative areas for integrals of one variable).
(2) The surface area of the domain of integration can be calculated with f = 1
$$F_B = \iint_{B} dxdy\,.$$
Some examples are
(i) The circular area about the origin
$$F_K = \int_{-R}^{R} dx \int_{-\sqrt{R^2-x^2}}^{\sqrt{R^2-x^2}} dy = 2\int_{-R}^{R} dx\, \sqrt{R^2 - x^2} = \left[x\sqrt{R^2 - x^2} + R^2 \arcsin\frac{x}{R}\right]_{-R}^{R} = \pi R^2\,.$$
(ii) The circular area about (A, B)
$$F_{K'} = 2\int_{A-R}^{A+R} dx \left[R^2 - (x - A)^2\right]^{1/2} = \pi R^2\,.$$
(iii) The triangular area of the third example
$$F_D = \int_{0}^{1} dx \int_{2x}^{2} dy = \int_{0}^{1} dx\,(2 - 2x) = \left[2x - x^2\right]_{0}^{1} = 1\,.$$
(iv) The annulus
$$F_O = \pi\left(R_2^2 - R_1^2\right)$$
as the function f = 1 is bounded over the complete domain.
The following rules have been used in the examples given above:
$$\iint_{B} c\, f(x, y)\, dxdy = c \iint_{B} f(x, y)\, dxdy$$
$$\iint_{B} \left(f(x, y) \pm g(x, y)\right) dxdy = \iint_{B} f(x, y)\, dxdy \pm \iint_{B} g(x, y)\, dxdy$$
$$\iint_{B_1 + B_2} f(x, y)\, dxdy = \iint_{B_1} f(x, y)\, dxdy + \iint_{B_2} f(x, y)\, dxdy\,.$$
They correspond to the rules for ordinary integration or are a consequence of the definition of the domain integral. Only the extension of the rule for substitution has to be explained in more detail. The next example demonstrates that substitution is helpful in the evaluation of domain integrals.
The domain of integration is a circular area about the origin. It is subdivided into infinitesimal rectangles if Cartesian coordinates are used, $I = \iint_B f(x, y)\, dxdy$. A more appropriate subdivision of a circular area should, however, be based on polar coordinates. This implies a decomposition of the circular area with the aid of rays and concentric circles (Fig. 4.31). An infinitesimal element of this subdivision has the magnitude dS = (dr)(r dϕ).
Fig. 4.31. Subdivision of a circular area with polar coordinates
The volume I can then be obtained by substitution of the integrand f (x, y)
$$f(x, y) = f(r\cos\varphi, r\sin\varphi) = F(r, \varphi)\,,$$
construction of the infinitesimal volumes
$$dV = F(r, \varphi)\, r\, dr\, d\varphi$$
and addition of all contributions with the usual limiting process
$$I = \iint_{B'} F(r, \varphi)\, r\, dr\, d\varphi\,.$$
The index B′ characterises the domain of integration in terms of polar coordinates. The circular area is defined by the limits
$$B': \quad 0 \le \varphi \le 2\pi\,, \qquad 0 \le r \le R\,.$$
The image of the circular area (the domain B′) is a rectangle if the curvilinear (orthogonal) coordinates are plotted in the manner of Cartesian coordinates (Fig. 4.32). A subdivision with a rectangular grid is possible so that the integral can be evaluated (for bounded integrands) as a double integral
Fig. 4.32. The image of a circle in planar polar coordinates
$$I = \int_{0}^{R} dr\, r \int_{0}^{2\pi} f(r\cos\varphi, r\sin\varphi)\, d\varphi = \int_{0}^{2\pi} d\varphi \int_{0}^{R} r\, f(r\cos\varphi, r\sin\varphi)\, dr\,.$$
The advantage of using polar coordinates in the present example becomes apparent in the last equation. The use of coordinates which are adapted to the limits of the integration leads to constant limits and simplifies in general the evaluation of the integral. For instance, the double integral factorises completely if the integrand depends only on the radial coordinate, e.g. for the
• calculation of the surface area of a circle (f = 1)
$$F = \int_{0}^{2\pi} d\varphi \int_{0}^{R} r\, dr = (2\pi)\,\frac{R^2}{2}$$
• or the calculation of the volume of a hemisphere (f = (R² − x² − y²)^{1/2}):
$$V = \int_{0}^{2\pi} d\varphi \int_{0}^{R} r\left(R^2 - r^2\right)^{1/2} dr = (2\pi)\left[-\frac{1}{3}\left(R^2 - r^2\right)^{3/2}\right]_{0}^{R} = \frac{2\pi}{3}\,R^3\,.$$
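The factorisation can be made explicit numerically, as in the following sketch (assuming numpy and scipy; not part of the original text): for a radially symmetric integrand the double integral is the product of a radial and an angular factor.

```python
# Sketch of the factorisation in polar coordinates: the hemisphere volume
# splits into a radial integral times the angular factor 2*pi.
import numpy as np
from scipy.integrate import quad

R = 1.0
radial, _ = quad(lambda r: r * np.sqrt(R**2 - r**2), 0, R)   # radial factor
V = 2 * np.pi * radial                                       # angular factor
print(V, 2 * np.pi * R**3 / 3)   # both 2.0943...
```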
The discussion of the general substitution rule takes longer: a domain B of the x - y plane is covered by a set of curvilinear coordinates
$$u = u(x, y)\,, \quad v = v(x, y) \quad\Longleftrightarrow\quad x = x(u, v)\,, \quad y = y(u, v)\,,$$
for example by the coordinates
$$u = \left[\left(\frac{x}{a}\right)^2 + \left(\frac{y}{b}\right)^2\right]^{1/2} \qquad v = \arctan\frac{ay}{bx}$$
$$x = au\cos v \qquad y = bu\sin v\,.$$
The curves u = const. are ellipses, the curves v = const. represent rays originating from the origin (Fig. 4.33).
Fig. 4.33. Subdivision of the area of an ellipse
The next step is the specification of the area between the two pairs of (infinitesimally) neighbouring curves u = const. and u + du as well as v = const. and v + dv (Fig. 4.34a). This area represents an irregular tetragon in linear approximation. The coordinates of the corner points are
$$P_1: \quad x_1 = x(u, v) \qquad y_1 = y(u, v)$$
$$P_2: \quad x_2 = x(u, v + dv) = x(u, v) + \frac{\partial x}{\partial v}\, dv \qquad y_2 = y(u, v + dv) = y(u, v) + \frac{\partial y}{\partial v}\, dv$$
$$P_3: \quad x_3 = x(u + du, v + dv) = x(u, v) + \frac{\partial x}{\partial u}\, du + \frac{\partial x}{\partial v}\, dv \qquad y_3 = y(u + du, v + dv) = y(u, v) + \frac{\partial y}{\partial u}\, du + \frac{\partial y}{\partial v}\, dv$$
$$P_4: \quad x_4 = x(u + du, v) = x(u, v) + \frac{\partial x}{\partial u}\, du \qquad y_4 = y(u + du, v) = y(u, v) + \frac{\partial y}{\partial u}\, du\,.$$
The area of the tetragon can be calculated by a decomposition into two triangles (e.g. as in Fig. 4.34b). The corresponding formula is
$$d\boldsymbol{S} = \frac{1}{2}\left[(\boldsymbol{r}_1 - \boldsymbol{r}_2)\times(\boldsymbol{r}_4 - \boldsymbol{r}_2) + (\boldsymbol{r}_4 - \boldsymbol{r}_2)\times(\boldsymbol{r}_3 - \boldsymbol{r}_2)\right] = \frac{1}{2}\left[(\boldsymbol{r}_1 - \boldsymbol{r}_3)\times(\boldsymbol{r}_4 - \boldsymbol{r}_2)\right]\,.$$
The vectors r_i represent the vectors to the four corner points.
Fig. 4.34. Infinitesimal subdivision of a two-dimensional domain in arbitrary curvilinear coordinates: (a) the infinitesimal tetragon; (b) vectorial representation of the area of the tetragon
Only the magnitude of the infinitely small area is of interest and not its orientation. It is therefore sufficient to use
$$dS = |dS_z| = \frac{1}{2}\left|(x_1 - x_3)(y_4 - y_2) - (x_4 - x_2)(y_1 - y_3)\right|\,.$$
Insertion of the coordinates of the corner points yields
$$dS = \left|\frac{\partial x}{\partial u}\frac{\partial y}{\partial v} - \frac{\partial x}{\partial v}\frac{\partial y}{\partial u}\right| dudv\,.$$
The absolute value can be written in terms of a determinant
$$|D| = \left|\;\begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v}\\[2mm] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix}\;\right| = \left|\frac{\partial(x, y)}{\partial(u, v)}\right|\,.$$
The determinant D is known as the Jacobian determinant or simply the Jacobian of the transformation x = x(u, v), y = y(u, v). The corresponding matrix is the Jacobian matrix. The substitution rule can be formulated with this definition in the form
$$\iint_{B} f(x, y)\, dxdy = \iint_{B'} f(x(u, v), y(u, v))\, \left|\frac{\partial(x, y)}{\partial(u, v)}\right|\, dudv\,.$$
It corresponds to a transcription of the rule used in ordinary integration
$$\int_{a}^{b} f(x)\, dx = \int_{a}^{b} f(x(u))\, \frac{dx}{du}\, du\,.$$
• The extension of the substitution of the line element is
$$dx = \frac{dx}{du}\, du \quad\longrightarrow\quad dxdy = |D|\, dudv\,.$$
Only the magnitude of the Jacobian occurs as the order of the coordinates is not important. The determinant itself would change sign if two rows or columns are interchanged. The sign describes the two possible vectorial orientations of the infinitesimal element which are not of interest.
• The substitution of the integrand need not be discussed.
• The change of the domain of integration is the equivalent of the change of the limits of integration. The domain B′ in the u - v plane is the image of the original domain (Fig. 4.35). The substitution is particularly useful if the image B′ of the original domain is a rectangle.
Fig. 4.35. Imaging domains
The technique of substitution is illustrated with an example: the domain of integration is the area of an ellipse
$$B: \quad \frac{x^2}{a^2} + \frac{y^2}{b^2} \le 1\,.$$
The problem which is posed is: calculate the area of the ellipse and the volume of an ellipsoid with the semi-axes a, b, c (Fig. 4.36) by integration over the domain stated. This implies that the corresponding domain integral
$$\iint_{B} f(x, y)\, dxdy$$
is to be evaluated for
$$f_1 = 1 \qquad\text{and}\qquad f_2 = c\left[1 - \left(\frac{x^2}{a^2} + \frac{y^2}{b^2}\right)\right]^{1/2}\,.$$
Fig. 4.36. Area of an ellipse and volume of an ellipsoid
The image of the domain B is a rectangle with the sides 1 and 2π for the substitution
$$x = au\cos v \qquad y = bu\sin v\,.$$
The Jacobian is
$$|D| = \left|\;\begin{vmatrix} a\cos v & -au\sin v\\ b\sin v & bu\cos v \end{vmatrix}\;\right| = abu\,.$$
The calculation yields therefore
$$I_1 = F(\text{ellipse}) = \iint_{B} dxdy = ab \int_{0}^{2\pi} dv \int_{0}^{1} u\, du = (ab)(2\pi)\left[\frac{u^2}{2}\right]_{0}^{1} = ab\pi$$
and
$$I_2 = V(\text{ellipsoid}/2) = \iint_{B} c\left[1 - \frac{x^2}{a^2} - \frac{y^2}{b^2}\right]^{1/2} dxdy = abc \int_{0}^{2\pi} dv \int_{0}^{1} u\left(1 - u^2\right)^{1/2} du = abc\,(2\pi)\left[-\frac{1}{3}\left(1 - u^2\right)^{3/2}\right]_{0}^{1} = \frac{2}{3}\,abc\,\pi\,.$$
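The Jacobian and both integrals can be verified symbolically, as in the following sketch (an assumption: the sympy library is available; it is not part of the original text).

```python
# A short symbolic check of the elliptic substitution x = a*u*cos(v),
# y = b*u*sin(v): Jacobian, ellipse area I1 and half-ellipsoid volume I2.
import sympy as sp

a, b, c, u, v = sp.symbols('a b c u v', positive=True)
x, y = a * u * sp.cos(v), b * u * sp.sin(v)

J = sp.Matrix([[sp.diff(x, u), sp.diff(x, v)],
               [sp.diff(y, u), sp.diff(y, v)]]).det()
print(sp.simplify(J))                          # a*b*u

I1 = sp.integrate(J, (u, 0, 1), (v, 0, 2 * sp.pi))
f2 = c * sp.sqrt(1 - u**2)                     # integrand in the new coordinates
I2 = sp.integrate(f2 * J, (u, 0, 1), (v, 0, 2 * sp.pi))
print(I1, I2)                                  # pi*a*b and 2*pi*a*b*c/3
```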

4.3.3 Integrals with f (x, y, z)

The discussion of integrals with functions of three variables proceeds along the same lines as for functions of two variables. The details can be more laborious, though, as an additional integration is involved. This section will be restricted to a survey rather than the full story.
The following hierarchy of integrals can be considered for the case of fixed limits of integration of the three variables
$$1.\quad g(y, z) = \int_{\alpha}^{\beta} f(x, y, z)\, dx$$
$$2.\quad h(z) = \int_{a}^{b} dy \int_{\alpha}^{\beta} f(x, y, z)\, dx$$
$$3.\quad I = \int_{A}^{B} dz \int_{a}^{b} dy \int_{\alpha}^{\beta} f(x, y, z)\, dx\,.$$
A graphical presentation of the results is not possible, even if they can be indicated verbally. The domains of integration can, however, be characterised and plotted.
1. The domain of integration is a straight line segment. The straight line which contains the segment runs through the point (y, z) and is parallel to the x-axis (Fig. 4.37). The result of the calculation is an 'area' in the fourth space dimension.
Fig. 4.37. Integration over a line segment in R3
2. The domain of integration is a rectangle in the plane z = const. which is limited by straight lines parallel to the x- and y-axes (Fig. 4.38a). The result of the calculation is a volume above the rectangle. This volume is embedded in the three-dimensional subspace spanned by the coordinates x, y, u.
3. The domain of integration is a cuboid in R3 (Fig. 4.38b) bordered by six planes x = α, β, y = a, b and z = A, B. The result represents a four-dimensional volume.
Fig. 4.38. Integration over two- and three-dimensional domains: (a) integration over a planar rectangular area in R3; (b) integration over a cuboid in R3
No additional calculational aspects arise in any of the three cases. The integral I with f (x, y, z) = x + y + z is for instance
$$I = \int_{0}^{1} dz \int_{0}^{1} dy \int_{0}^{1} dx\,(x + y + z) = \int_{0}^{1} dz \int_{0}^{1} dy \left(\frac{1}{2} + y + z\right) = \int_{0}^{1} dz\,(1 + z) = \frac{3}{2}\,.$$
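A numerical sketch of this triple integral (assuming scipy; not part of the original text) follows; tplquad expects the integrand with the innermost variable first.

```python
# Numerical check of the triple integral with fixed limits; the integrand
# is passed as f(z, y, x), innermost variable first.
from scipy.integrate import tplquad

I, _ = tplquad(lambda z, y, x: x + y + z,
               0, 1,                            # x-limits (outer)
               lambda x: 0, lambda x: 1,        # y-limits
               lambda x, y: 0, lambda x, y: 1)  # z-limits (inner)
print(I)   # 1.5
```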
The question of an interchange of the order of the integration in the cases 2 and 3 can be answered as before. The condition is: interchange is possible if the function f (x, y, z) is bounded over the appropriate domain. The following shorthand formulae can be used in this case
$$h(z) = \iint_{R} f(x, y, z)\, dxdy = \iint_{R} f(x, y, z)\, dS_{xy}$$
respectively
$$I = \iiint_{Q} f(x, y, z)\, dxdydz = \iiint_{Q} f(x, y, z)\, dV\,.$$
The domain is supposed to be subdivided into infinitesimal rectangles or cuboids.
A corresponding hierarchy exists for integrals with variable limits. The form of the domains of integration can, however, be quite complex. The different classes of integrals of this hierarchy (using a notation similar to the integrals with fixed limits) are
$$1.\quad g(y, z) = \int_{\alpha(y,z)}^{\beta(y,z)} f(x, y, z)\, dx\,.$$
The domain of integration is still a straight line segment parallel to the x-axis. The segment is confined by the surfaces x = α(y, z) and x = β(y, z) rather than the planes x = α, β (Fig. 4.39).
Fig. 4.39. Integration over a straight line segment in R3 with variable limits
$$2.\quad h(z) = \int_{a(z)}^{b(z)} dy \int_{\alpha(y,z)}^{\beta(y,z)} f(x, y, z)\, dx\,.$$
The domain of integration is an area in the plane z = const. The border of this area can be characterised in the following fashion:
(a) The limit in the x-direction is determined by the intersecting curves of the surfaces x = α(y, z) and x = β(y, z) with the plane z = const. (Fig. 4.40a).
(b) The limit in the y-direction is determined by straight line sections parallel to the x-axis which are determined by the intersection of the curves y = a(z) and y = b(z) with the plane z = const. (Fig. 4.40b).
Fig. 4.40. Integration over a planar surface in R3 with a curvilinear border: (a) limits in the x-direction; (b) variation of the limits with z
The form of the domain of integration changes with the z-coordinate. The most important of these integrals is
$$3.\quad I = \int_{A}^{B} dz \int_{a(z)}^{b(z)} dy \int_{\alpha(y,z)}^{\beta(y,z)} f(x, y, z)\, dx\,.$$

This integral represents a four-dimensional volume. The domain of integration is a kind of 'standard domain' in R3. Arbitrary three-dimensional domains of integration can be composed of such standard domains. One example of a standard domain, with a different order of the integration, is given by
$$I = \int_{\alpha}^{\beta} dx \int_{a(x)}^{b(x)} dy \int_{A(x,y)}^{B(x,y)} f(x, y, z)\, dz\,.$$
This standard domain BS (Fig. 4.41), defined by the limits indicated, can be described as follows: the integration in the x-direction is limited by the planes x = α, β. The limits of the y-integration are the surfaces y = a(x), b(x). The basic area, which is defined in this fashion, is projected into space and provided with a surface at the bottom z = A(x, y) and on the top z = B(x, y). The three-dimensional domain can be subdivided into arbitrary infinitesimal volume elements (infinitesimal cuboids are only one possibility) if the integrand is bounded. The limiting process leading to
$$I = \iiint_{B_S} f(x, y, z)\, dV$$
Fig. 4.41. A standard domain in R3
can be carried out after construction and addition of the infinitesimal four-
dimensional volume elements f (x, y, z) dV .
The integral with integrand f = 1 can be used to calculate the contents of arbitrary three-dimensional volumes (compare the situation for integrals with f (x, y))
$$V_B = \iiint_{B} dxdydz = \iiint_{B} dV\,.$$
This statement is illustrated with some examples using a subdivision of the domain of integration in the form
$$\int_{\alpha}^{\beta} dx \int_{a(x)}^{b(x)} dy \int_{A(x,y)}^{B(x,y)} f(x, y, z)\, dz\,.$$
• The limits of the integration for a spherical volume (Fig. 4.42) about the origin (radius R) are for this order of integration:
two hemispheres (bottom and lid) in the z-direction
$$A(x, y) = -\left(R^2 - x^2 - y^2\right)^{1/2} \qquad B(x, y) = \left(R^2 - x^2 - y^2\right)^{1/2}\,,$$
two circular arcs in the y-direction
$$a(x) = -\left(R^2 - x^2\right)^{1/2} \qquad b(x) = \left(R^2 - x^2\right)^{1/2}$$
and constant limits
$$\alpha = -R \qquad \beta = R$$
in the x-direction.
Fig. 4.42. A spherical domain of integration
The symmetric form of the domain allows an interchange of the order of the integration with a corresponding adjustment of the limits. The volume of a sphere could now be calculated. It is, however, apparent that a transition to spherical coordinates offers advantages.
• An elliptic cylinder over the x - y plane with a parabolic lid (Fig. 4.43) is characterised by the following limits: the bottom of the domain is the x - y plane, A(x, y) = 0; the lid is the paraboloid B(x, y) = h − x² − y². The domain of integration in the y-direction is confined by the elliptic arcs
$$a(x) = -b\left(1 - \frac{x^2}{a^2}\right)^{1/2} \qquad b(x) = b\left(1 - \frac{x^2}{a^2}\right)^{1/2}\,,$$
in the x-direction by the constants
$$\alpha = -a \qquad \beta = a\,.$$
Fig. 4.43. An elliptic cylinder with a parabolic lid
• A subdivision into standard domains is necessary in this example. The domain is a pyramid with the height h over a square with the side length a in the corner of the first quadrant of the x - y plane (Fig. 4.44). The pyramid is confined by four triangular areas from above. It is therefore not a standard domain. The total domain has to be divided into four subdomains. The four corresponding areas in the x - y plane have to be characterised by an appropriate choice of the order of integration. Two of these four subdomains are discussed explicitly:
Fig. 4.44. A domain in the form of a quarter pyramid
1. The subdomain which borders on the x-axis is limited by the plane A(x, y) = 0 in the z-direction and by a plane through the x-axis and the tip of the pyramid (a/2, a/2, h)
$$B(x, y) = \frac{2h}{a}\, y\,.$$
The straight lines α(y) = y and β(y) = a − y are the limits in the x-direction. The values a = 0 and b = a/2 limit the integration in the y-direction.
2. The limits for the subdomain bordering on the y-axis are
$$\text{in the } z\text{-direction:}\quad A(x, y) = 0\,, \quad B(x, y) = \frac{2h}{a}\, x\,,$$
$$\text{in the } y\text{-direction:}\quad a(x) = x \qquad b(x) = a - x\,,$$
$$\text{in the } x\text{-direction:}\quad \alpha = 0 \qquad \beta = \frac{a}{2}\,.$$
(Work out the limits for the remaining subdomains.)
A quarter of the volume of the pyramid can, e.g., be calculated with
$$\frac{V}{4} = I = \int_{0}^{a/2} dx \int_{x}^{a-x} dy \int_{0}^{2hx/a} dz = \int_{0}^{a/2} dx \int_{x}^{a-x} dy\, \frac{2h}{a}\,x = \frac{2h}{a} \int_{0}^{a/2} dx\,(ax - 2x^2) = \frac{2h}{a}\left[\frac{ax^2}{2} - \frac{2}{3}\,x^3\right]_{0}^{a/2} = \frac{1}{4}\,\frac{ha^2}{3}\,.$$
This result could have been obtained by elementary means, results for the
first two examples definitely not.
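The quarter-pyramid volume can be cross-checked numerically, as in the following sketch (assuming scipy; not part of the original text).

```python
# Numerical check of the quarter-pyramid volume; compare with h*a**2/12
# from the calculation above.
from scipy.integrate import tplquad

a, h = 2.0, 3.0
I, _ = tplquad(lambda z, y, x: 1.0,
               0, a / 2,                                   # x from 0 to a/2
               lambda x: x, lambda x: a - x,               # y from x to a-x
               lambda x, y: 0, lambda x, y: 2 * h * x / a) # z-limits
print(I, h * a**2 / 12)   # both 1.0
```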
Substitution is very helpful for the evaluation of triple integrals. The rule is in this case: the integral
$$I = \iiint_{B} f(x, y, z)\, dxdydz$$
is transformed into
$$I = \iiint_{B'} f(x(u, v, w), y(u, v, w), z(u, v, w))\, \left|\frac{\partial(x, y, z)}{\partial(u, v, w)}\right|\, dudvdw = \iiint_{B'} F(u, v, w)\, dV'$$
upon application of the transformation
$$x = x(u, v, w) \qquad y = y(u, v, w) \qquad z = z(u, v, w)\,.$$
The Jacobian of the transformation is
$$\frac{\partial(x, y, z)}{\partial(u, v, w)} = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} & \dfrac{\partial x}{\partial w}\\[2mm] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} & \dfrac{\partial y}{\partial w}\\[2mm] \dfrac{\partial z}{\partial u} & \dfrac{\partial z}{\partial v} & \dfrac{\partial z}{\partial w} \end{vmatrix}\,.$$
The order of the coordinates does not matter in the present case either. Only the absolute value of the Jacobian is required. B′ is the image of the domain B in the u, v, w-system. The image of a sphere (in Cartesian coordinates) is a cuboid in spherical coordinates (Fig. 4.45).
Fig. 4.45. The image of a sphere in Cartesian coordinates is a cuboid in spherical coordinates
The present rule is a direct extension of the rule for double integrals. The domain B is subdivided with the aid of the surfaces u(x, y, z) = const., v(x, y, z) = const., w(x, y, z) = const. in order to prove the rule. The irregular infinitesimal volume elements, which are formed by the infinitesimal surfaces, are then approximated linearly.
Two important infinitesimal volume elements, which are used constantly, should be kept in mind:
$$\text{cylinder coordinates:}\quad dV' = \rho\, d\rho\, d\varphi\, dz$$
$$\text{spherical coordinates:}\quad dV' = r^2 \sin\theta\, dr\, d\varphi\, d\theta\,.$$
Spherical geometry calls for the use of spherical coordinates. The volume of a sphere is evaluated as
$$V_{Ku} = \iiint_{B'} dV' = \int_{0}^{R} r^2\, dr \int_{0}^{2\pi} d\varphi \int_{0}^{\pi} \sin\theta\, d\theta = \left(\frac{R^3}{3}\right)(2\pi)\left(-\cos\theta\right)\Big|_{0}^{\pi} = \frac{4\pi}{3}\,R^3\,.$$
The volume of a segment of a sphere with the opening angle 2α can be calculated in a similar fashion (Fig. 4.46)
$$V_{\text{seg}} = \int_{0}^{R} r^2\, dr \int_{0}^{2\pi} d\varphi \int_{0}^{\alpha} \sin\theta\, d\theta = \frac{2\pi}{3}\,R^3\,(1 - \cos\alpha)\,.$$
Fig. 4.46. Calculation of the volume of a segment of a sphere
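Both spherical-coordinate volumes can be checked numerically, as in the following sketch (assuming numpy and scipy; not part of the original text); the Jacobian r² sin θ supplies the volume element.

```python
# Numerical check of the sphere and spherical-segment volumes in spherical
# coordinates; the integrand f = 1 is multiplied by the Jacobian.
import numpy as np
from scipy.integrate import tplquad

R, alpha = 1.0, np.pi / 3
jac = lambda r, phi, theta: r**2 * np.sin(theta)   # volume element

V_sphere, _ = tplquad(jac, 0, np.pi,  lambda t: 0, lambda t: 2 * np.pi,
                      lambda t, p: 0, lambda t, p: R)
V_seg, _    = tplquad(jac, 0, alpha, lambda t: 0, lambda t: 2 * np.pi,
                      lambda t, p: 0, lambda t, p: R)
print(V_sphere, 4 * np.pi / 3 * R**3)                       # 4.18879... twice
print(V_seg, 2 * np.pi / 3 * R**3 * (1 - np.cos(alpha)))    # 1.0471... twice
```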
An additional exercise is offered by the task to calculate the volume of an elliptic cylinder with a parabolic lid. Suitable coordinates are
$$x = au\cos v \qquad y = bu\sin v \qquad z = w\,.$$
The result is
$$V = ab\pi\left[h - \frac{1}{4}\left(a^2 + b^2\right)\right]\,.$$
Triple integrals are found in all areas of theoretical physics. A typical mechanics problem is the calculation of properties of a body with a definite surface, given the density as an (arbitrary) function of the coordinates
$$\rho = \begin{cases} \rho(x, y, z) & \text{for } (x, y, z) \in B\\ 0 & \text{for } (x, y, z) \notin B\,. \end{cases}$$
The total mass and the coordinates of the centre of mass can be calculated with
1. the total mass
$$M = \iiint_{B} \rho(x, y, z)\, dV\,,$$
2. the coordinates of the centre of mass
$$X = \frac{1}{M} \iiint_{B} x\, \rho(x, y, z)\, dV \qquad Y = \frac{1}{M} \iiint_{B} y\, \rho(x, y, z)\, dV \qquad Z = \frac{1}{M} \iiint_{B} z\, \rho(x, y, z)\, dV\,.$$
These relations follow from an infinitesimal subdivision of the body. Summation over all the contributions leads in the limit to a triple integral, e.g.
$$M \;\Longrightarrow\; \sum_i dm_i \;\Longrightarrow\; \sum_i \rho(i)\, dV_i \;\Longrightarrow\; \iiint \rho(x, y, z)\, dV\,.$$
An example is the calculation of these quantities for a hemisphere with a uniform distribution of mass ρ0. The mass is
$$M = \rho_0 \iiint dV = \frac{2\pi}{3}\,\rho_0 R^3\,.$$
The centre of mass coordinates are evaluated with spherical coordinates r, ϕ, θ (with 0 ≤ θ ≤ π/2) and yield for a hemisphere
$$X = \frac{\rho_0}{M} \int_{0}^{R} r^2\, dr \int_{0}^{2\pi} d\varphi \int_{0}^{\pi/2} \sin\theta\,(r\cos\varphi\sin\theta)\, d\theta = \frac{\rho_0}{M} \int_{0}^{R} r^3\, dr \int_{0}^{2\pi} \cos\varphi\, d\varphi \int_{0}^{\pi/2} \sin^2\theta\, d\theta\,.$$
The integral factorises into three ordinary integrals. The x- and y-coordinates of the centre of mass vanish because of the symmetry expressed e.g. by
$$\int_{0}^{2\pi} \cos\varphi\, d\varphi = 0\,.$$
One finds also
$$Y = \frac{\rho_0}{M} \int_{0}^{R} r^3\, dr \int_{0}^{2\pi} \sin\varphi\, d\varphi \int_{0}^{\pi/2} \sin^2\theta\, d\theta = 0$$
but
$$Z = \frac{\rho_0}{M} \int_{0}^{R} r^3\, dr \int_{0}^{2\pi} d\varphi \int_{0}^{\pi/2} \sin\theta\cos\theta\, d\theta = \frac{\rho_0}{M}\,\frac{R^4}{4}\,(2\pi)\left[\frac{1}{2}\sin^2\theta\right]_{0}^{\pi/2} = \frac{\rho_0\pi}{M}\,\frac{R^4}{4} = \frac{3}{8}\,R\,.$$
The centre of mass lies on the z-axis (a symmetry axis) at a distance of 3R/8 above the origin.
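The centre of mass can be confirmed numerically, as in the following sketch (assuming numpy and scipy; not part of the original text); the constant density cancels, so it is set to 1.

```python
# Numerical check of the hemisphere centre of mass in spherical coordinates.
import numpy as np
from scipy.integrate import tplquad

R = 1.0
w = lambda r, p, t: r**2 * np.sin(t)         # volume element r^2 sin(theta)

M, _ = tplquad(w, 0, np.pi / 2, lambda t: 0, lambda t: 2 * np.pi,
               lambda t, p: 0, lambda t, p: R)
Mz, _ = tplquad(lambda r, p, t: (r * np.cos(t)) * w(r, p, t),
                0, np.pi / 2, lambda t: 0, lambda t: 2 * np.pi,
                lambda t, p: 0, lambda t, p: R)
print(Mz / M, 3 * R / 8)   # both 0.375
```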
Integrals with more than three variables of integration
$$I = \int \cdots \int_{B_n} f(x_1, \ldots, x_n)\, dx_1 \cdots dx_n$$
can be found in all branches of physics. Their discussion and methods for their evaluation are fashioned after the simpler cases. The actual evaluation might, however, prove to be quite taxing.
4.3.4 Addendum: Elliptic integrals

Elliptic integrals have been mentioned in Math.Chap. 4.3.1 as examples of functions which are defined via an integral. These functions are briefly introduced in this Addendum. The aim is merely to discuss the usual classification and to list the different forms found in applications rather than to give a complete discussion of their properties. Integrals of the form
$$\int dx\, R\left(x, \sqrt{a_0 + a_1 x + a_2 x^2 + a_3 x^3}\,;\; k\right)$$
or
$$\int dx\, R\left(x, \sqrt{a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4}\,;\; k\right)$$
can in general not be represented by elementary functions if R is a rational function of x and the square roots. The quantity k is a parameter which is called the modulus.
The rational function R can be represented as a quotient of two polynomials (the modulus is not of interest in the present discussion and is therefore suppressed)
$$R(x, y(x)) = \frac{P(x, y(x))}{Q(x, y(x))}\,.$$
The square root of the polynomial of third or fourth order is denoted by y(x). The actual structure of the function becomes apparent if the quotient is augmented as
$$R(x, y) = \frac{y\, P(x, y)\, Q(x, -y)}{y\, Q(x, y)\, Q(x, -y)}\,.$$
The product of the two polynomials in the denominator is a polynomial in y² as it does not change under the transformation y → −y. This implies that it is a polynomial in x; the square root does not appear any more. The product in the numerator can be expanded: every term of the form y^{2n} (an even power of y) corresponds again to a polynomial in x, while terms of the form y^{2n+1} factorise into a polynomial times y. The expression for R finally takes the form
$$R(x, y) = \frac{P_1(x) + P_2(x)\, y}{y} = \frac{P_1(x)}{y} + P_2(x)\,.$$
An integral with a polynomial P2(x) is not really of interest as it can be represented in terms of elementary functions. The discussion can centre on integrals of the form
$$\int \frac{P_1(x)\, dx}{\left[a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4\right]^{1/2}}\,.$$
The function P1(x) is a polynomial. In addition, it is assumed that a3 and a4 do not equal zero at the same time.
All integrals of this kind can be classified with respect to three standard integrals. They are known as elliptic integrals of the first, the second or the third kind. The definitions of the simplest forms (forms more or less adapted to applications in physics) of these integrals are
$$\text{first kind:}\quad F(\varphi, k) = \int_{0}^{\varphi} \frac{d\varphi}{\left[1 - k^2 \sin^2\varphi\right]^{1/2}}\,,$$
$$\text{second kind:}\quad E(\varphi, k) = \int_{0}^{\varphi} d\varphi \left[1 - k^2 \sin^2\varphi\right]^{1/2}\,,$$
$$\text{third kind:}\quad \Pi_h(\varphi, k) = \int_{0}^{\varphi} \frac{d\varphi}{\left(1 - h \sin^2\varphi\right)\left[1 - k^2 \sin^2\varphi\right]^{1/2}}$$
(h is a number from the interval −∞ < h < ∞).
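These definitions can be evaluated numerically and compared with a library implementation, as in the following sketch (an assumption: scipy is available, whose incomplete elliptic integrals use the convention m = k² for the second argument; this is not part of the original text).

```python
# Evaluate F and E by direct quadrature of the definitions and compare
# with scipy.special (which takes m = k**2, not k).
import numpy as np
from scipy.integrate import quad
from scipy.special import ellipkinc, ellipeinc

phi, k = 0.8, 0.6
F_quad, _ = quad(lambda t: 1 / np.sqrt(1 - k**2 * np.sin(t)**2), 0, phi)
E_quad, _ = quad(lambda t: np.sqrt(1 - k**2 * np.sin(t)**2), 0, phi)
print(F_quad, ellipkinc(phi, k**2))   # the two values agree
print(E_quad, ellipeinc(phi, k**2))   # the two values agree
```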
The parameter k is restricted to the interval 0 ≤ k² ≤ 1. It is usually written in the form k² = sin² α. A suitable substitution is required if values of k larger than 1 occur. The substitution
$$\varphi = \arcsin\left(\frac{1}{k}\,\sin\theta\right) \qquad\text{with}\qquad d\varphi = \frac{\cos\theta\, d\theta}{\left[k^2 - \sin^2\theta\right]^{1/2}}$$
transforms e.g. an elliptic integral of the first kind (write κ = 1/k) into
$$F(\varphi, k) = \int_{0}^{\varphi} \frac{d\varphi}{\left[1 - k^2 \sin^2\varphi\right]^{1/2}} = \kappa \int_{0}^{\theta} \frac{\cos\theta\, d\theta}{\cos\theta\left[1 - \kappa^2 \sin^2\theta\right]^{1/2}} = \kappa \int_{0}^{\theta} \frac{d\theta}{\left[1 - \kappa^2 \sin^2\theta\right]^{1/2}}\,.$$
Values of k² larger than 1 can be handled in this fashion.
A form of the elliptic integrals which is used often in mathematics follows from the substitution
$$t = \sin\varphi \qquad d\varphi = \frac{dt}{\sqrt{1 - t^2}}\,.$$
The integrals then take the form
$$F(t, k) = \int_{0}^{t} dt' \left[(1 - t'^2)(1 - k^2 t'^2)\right]^{-1/2}$$
$$E(t, k) = \int_{0}^{t} dt' \left[\frac{1 - k^2 t'^2}{1 - t'^2}\right]^{1/2}$$
$$\Pi_h(t, k) = \int_{0}^{t} dt' \left(1 - h t'^2\right)^{-1} \left[(1 - t'^2)(1 - k^2 t'^2)\right]^{-1/2}\,.$$
The original polynomials under the square root are apparent here.
The elliptic integral of the second kind is often rewritten as
$$E(t, k) = \int_{0}^{t} \frac{(1 - k^2 t'^2)\, dt'}{\left[(1 - t'^2)(1 - k^2 t'^2)\right]^{1/2}} = F(t, k) - k^2 \int_{0}^{t} \frac{t'^2\, dt'}{\left[(1 - t'^2)(1 - k^2 t'^2)\right]^{1/2}} = F(t, k) - E_{\text{red}}(t, k)$$
with the reduced form
$$E_{\text{red}}(t, k) = k^2 \int_{0}^{t} \frac{t'^2\, dt'}{\left[(1 - t'^2)(1 - k^2 t'^2)\right]^{1/2}}\,.$$
A generalisation of the simplest variants are the integrals
$$\text{first kind:}\quad \int dt \left[(A_1 + B_1 t^2)(A_2 + B_2 t^2)\right]^{-1/2}$$
$$\text{second kind:}\quad \int dt\; t^2 \left[(A_3 + B_3 t^2)(A_4 + B_4 t^2)\right]^{-1/2}$$
$$\text{third kind:}\quad \int dt \left(1 - t^2\right)^{-1} \left[(A_5 + B_5 t^2)(A_6 + B_6 t^2)\right]^{-1/2}\,.$$
The parameters Ai, Bi are constants. These generalisations are also known as elliptic integrals of the first to third kind. It is possible to show that every cubic or quartic polynomial can be converted by a suitable substitution (with minor restrictions) into the radicand indicated
$$a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 \;\xrightarrow{\text{substitution}}\; \left(A_i t^2 + B_i\right)\left(A_k t^2 + B_k\right)\,.$$
Elliptic integrals with the upper limit ϕ = π/2 respectively t = 1 are called complete elliptic integrals. For example, the function
$$F(k) \equiv F\left(\frac{\pi}{2}, k\right) = \int_{0}^{1} \frac{dt}{\left[(1 - t^2)(1 - k^2 t^2)\right]^{1/2}}$$
of the modulus k is a complete elliptic integral of the first kind.
There exist compilations of the properties of elliptic integrals (incomplete
or complete) – relations with other higher functions, special values, numerical
approximations, etc. – as well as tables of their values⁹.
⁹ See list of literature.
5 Basic concepts of vector analysis

The analysis of functions of several variables deals with the situation that exactly one number is assigned to each point of a domain in an n-dimensional space
$$(x_1, \ldots, x_n) \;\xrightarrow{\;f\;}\; f(x_1, \ldots, x_n)\,.$$
It is, however, also possible to assign an m-tuple of numbers to each point of such a domain
$$(x_1, \ldots, x_n) \;\xrightarrow{\{f_1, \ldots, f_m\}}\; \{f_1(x_1, \ldots, x_n), \ldots, f_m(x_1, \ldots, x_n)\}\,.$$
These m functions can be interpreted as the components of a vector in an m-dimensional representation space (with an orthogonal basis) so that the set can be summarised as
$$\boldsymbol{f}(x_1, \ldots, x_n) = \sum_{k=1}^{m} f_k(x_1, \ldots, x_n)\, \boldsymbol{e}_k\,.$$
This relation defines a vector function in Rn which is also referred to as a vector field. The term 'vector field' is to be understood in the following sense: the word 'vector' refers to the m-tuple, the word 'field' expresses the assignment of this vector to the points of Rn. The assignment of a single number to each point of space
$$m = 1 \;\longrightarrow\; f(x_1, \ldots, x_n)$$
can be referred to as a scalar field in this connection.
This chapter explores, in the language of physics, the different aspects
offered by the differential and integral calculus of vector fields. In particu-
lar, the integral theorems, which are presented in the last section, are an
indispensable tool of theoretical physics.

5.1 Vector fields

The term 'vector field' encompasses a wealth of possibilities. Some examples from physics and mathematics are:
1. The position vector of a point particle
$$\boldsymbol{r}(t) = (x(t), y(t), z(t)) \quad\rightarrow\quad m = 3\,,\; n = 1$$
corresponds to three functions of one variable which represent the parametric representation of a curve in (three-dimensional) space.
2. Force fields, as e.g. the gravitational action of a mass M at the origin on a mass m at the position r
$$\boldsymbol{G}(x, y, z) = -\gamma m M \left(\frac{x}{r^3}, \frac{y}{r^3}, \frac{z}{r^3}\right) \quad\rightarrow\quad m = n = 3 \qquad\text{with } r = \left(x^2 + y^2 + z^2\right)^{1/2}\,,$$
are represented by three functions of three variables. This is illustrated by an appropriate arrow at each point of R3.
3. A set of three functions of two variables
$$\boldsymbol{r}(u, v) = (x(u, v), y(u, v), z(u, v)) \quad\rightarrow\quad m = 3\,,\; n = 2\,.$$
This vector field can be interpreted as the parametric representation of a surface in space. Explicit examples are e.g. spherical surfaces and tori. The end point of a vector r traces the surface of a sphere for
$$x = R\cos u \sin v \qquad y = R\sin u \sin v \qquad z = R\cos v \qquad (0 \le u \le 2\pi\,,\; 0 \le v \le \pi\,,\; R = \text{const.})$$
if the parameters u and v vary within the limits stated.
A torus is described by
$$x = (a + R\cos v)\cos u \qquad y = (a + R\cos v)\sin u \qquad z = R\sin v \qquad (0 \le u \le 2\pi\,,\; 0 \le v \le 2\pi\,,\; R, a = \text{const.})\,.$$
The specification for the value u = 0
$$u = 0: \quad x = a + R\cos v \qquad y = 0 \qquad z = R\sin v$$
is the parametric representation (see Fig. 5.1a) of a circle in the x - z plane about the point (a, 0, 0). This circle is rotated with (cos u, sin u) about the z-axis so that a ring surface is obtained (Fig. 5.1b).
It is apparent that the cases with n = 3, m ≤ 3 play a special role for applications in physics. The discussion will therefore mainly concentrate on these cases. The first task of vector analysis is the transcription of the various limiting processes (differentiation and integration) from the case of scalar fields to the case of vector fields.
Fig. 5.1. A torus: (a) construction; (b) perspective view
5.2 Differentiation of vector fields

The discussion of the differentiation of vector fields has to start pro forma with the definition of the derivative with respect to one variable. The definition (and the nomenclature) of this limiting value is
$$\frac{\partial}{\partial x_i}\, \boldsymbol{f}(x_1, \ldots, x_n) = \boldsymbol{f}_{x_i}(x_1, \ldots, x_n) = \lim_{h_i \to 0} \left\{\frac{1}{h_i}\left[\boldsymbol{f}(x_1, \ldots, x_i + h_i, \ldots, x_n) - \boldsymbol{f}(x_1, \ldots, x_i, \ldots, x_n)\right]\right\}$$
$$= \lim_{h_i \to 0} \left\{\frac{1}{h_i} \sum_{k=1}^{m} \bigl(f_k(x_1, \ldots, x_i + h_i, \ldots, x_n) - f_k(x_1, \ldots, x_i, \ldots, x_n)\bigr)\, \boldsymbol{e}_k\right\}\,.$$
Summation and the limiting procedure can be interchanged (and this is the case of interest here) for a finite sum so that one obtains
$$\frac{\partial}{\partial x_i}\, \boldsymbol{f}(x_1, \ldots, x_n) = \sum_{k=1}^{m} \frac{\partial f_k(x_1, \ldots, x_n)}{\partial x_i}\, \boldsymbol{e}_k\,.$$
The (partial) derivative of a vector function corresponds (under the condition that the components can be differentiated) to the vector function of the partial derivatives of the components of this function.
Three quantities play a special role in the discussion of the differentiation of vector functions. One of these quantities, the gradient operator, has already been introduced in Math.Chap. 4.2.3. The others are the divergence and the rotation of a vector field.
5.2.1 Gradient, divergence and rotation of vector fields

These basic concepts will be introduced in this section by an ad hoc definition for the case m = n = 3. Extensions to the case of higher dimensions are possible and of use in theoretical physics. They will be addressed, together with the question of visualisation, only in Math.Chap. 5.3. The action of the gradient operator on a scalar function φ(x1, x2, x3) generates a vector function
$$\varphi(x_1, x_2, x_3) \;\xrightarrow{\text{grad}}\; \boldsymbol{f}(x_1, x_2, x_3) = \nabla\varphi(x_1, x_2, x_3) = \sum_{k=1}^{3} \frac{\partial\varphi}{\partial x_k}\, \boldsymbol{e}_k\,.$$
The components of the vector function are the partial derivatives of the scalar function. The visualisation of the vector f = grad φ is: it is perpendicular to the tangential plane in each point of a surface φ = const. (Fig. 5.2).
Fig. 5.2. Illustration of the gradient vector of a function φ(x1, x2, x3) = const.
The divergence of a vector field f is defined by
$$\operatorname{div} \boldsymbol{f} = \frac{\partial f_1}{\partial x_1} + \frac{\partial f_2}{\partial x_2} + \frac{\partial f_3}{\partial x_3} = \sum_{k=1}^{3} \frac{\partial f_k(x_1, x_2, x_3)}{\partial x_k}\,.$$
This differential operation associates a scalar function with the vector function f
$$\boldsymbol{f} \;\xrightarrow{\text{div}}\; \Phi = \operatorname{div} \boldsymbol{f}\,.$$
The rotation of a vector field f is defined by the following equation
$$\operatorname{rot} \boldsymbol{f}(x_1, x_2, x_3) = \left(\frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3}\right)\boldsymbol{e}_1 + \left(\frac{\partial f_1}{\partial x_3} - \frac{\partial f_3}{\partial x_1}\right)\boldsymbol{e}_2 + \left(\frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2}\right)\boldsymbol{e}_3\,.$$
The application¹ of the operator rot on a vector function f yields a vector function rot f
$$\boldsymbol{f} \;\xrightarrow{\text{rot}}\; \boldsymbol{g} = \operatorname{rot} \boldsymbol{f}\,.$$
The structure of the operator and the sequence of the indices is best represented by the symbolic determinant
$$\operatorname{rot} \boldsymbol{f}(x_1, x_2, x_3) = \begin{vmatrix} \boldsymbol{e}_1 & \boldsymbol{e}_2 & \boldsymbol{e}_3\\[1mm] \dfrac{\partial}{\partial x_1} & \dfrac{\partial}{\partial x_2} & \dfrac{\partial}{\partial x_3}\\[1mm] f_1(x_1, \ldots) & f_2 & f_3 \end{vmatrix}\,.$$
'Evaluation' of the determinant leads to the explicit definition. Alternatively, the notation in terms of sums with the Levi-Civita symbol is often used for formal manipulations
$$\operatorname{rot} \boldsymbol{f}(x_1, x_2, x_3) = \sum_{ijk} \epsilon_{ijk}\, \frac{\partial f_j}{\partial x_i}\, \boldsymbol{e}_k\,.$$
A definite analogy cannot be overlooked: the formation of the divergence corresponds to the scalar product of two vectors, the formation of the rotation to the vector product. This analogy is expressed formally with the introduction of the differential operator ∇ (with vector character, see Math.Chap. 4.2.5), the nabla operator
$$\nabla = \sum_{k=1}^{3} \boldsymbol{e}_k\, \frac{\partial}{\partial x_k}\,.$$
This operator can be used to represent the three differential operators.
• The construction of the gradient of a scalar function
$$\operatorname{grad} \varphi(x_1, x_2, x_3) = \nabla\varphi(x_1, x_2, x_3)$$
corresponds to the multiplication of a vector with a scalar (from the right hand side!).
• The divergence of a vector function
$$\operatorname{div} \boldsymbol{f}(x_1, x_2, x_3) = \nabla\cdot\boldsymbol{f}(x_1, x_2, x_3)$$
corresponds to the scalar product of two vectors. The details are
$$\nabla\cdot\boldsymbol{f}(x_1, x_2, x_3) = \left(\sum_{i=1}^{3} \boldsymbol{e}_i\, \frac{\partial}{\partial x_i}\right)\cdot\left(\sum_{k=1}^{3} \boldsymbol{e}_k\, f_k(x_1, x_2, x_3)\right) = \sum_{ik} (\boldsymbol{e}_i\cdot\boldsymbol{e}_k)\, \frac{\partial f_k(x_1, x_2, x_3)}{\partial x_i} = \sum_{k} \frac{\partial f_k(x_1, x_2, x_3)}{\partial x_k}\,.$$
¹ The name curl is often used in the Anglo-Saxon literature instead of rot.
The order of the 'vectors' can obviously not be interchanged in this 'scalar product'.
• The rotation of a vector function
$$\operatorname{rot} \boldsymbol{f}(x_1, x_2, x_3) = \nabla\times\boldsymbol{f}(x_1, x_2, x_3)$$
is the analogue of the vector product, in detail
$$\nabla\times\boldsymbol{f}(x_1, x_2, x_3) = \sum_{ij} (\boldsymbol{e}_i\times\boldsymbol{e}_j)\, \frac{\partial f_j}{\partial x_i} = \sum_{ijk} \epsilon_{ijk}\, \frac{\partial f_j}{\partial x_i}\, \boldsymbol{e}_k\,.$$
The application of the nabla operator can be extended to a complete nabla calculus. The most important rules of the nabla calculus are²:
(1) One set of rules is concerned with the application of the nabla operator
to products of functions as e.g.
(a) ∇ · (φ(r) f (r)) = (∇φ) · f + φ (∇ · f )
or in semi verbal form
div (φ f ) = gradφ · f + φ div f .
(b) ∇ · (f (r) × g(r)) = g · (∇ × f ) − f · (∇ × g)
or
div (f × g) = g · rotf − f · rot g .
The correctness of these rules can be demonstrated by explicitly writ-
ing out the terms and using the rules for partial differentiation.
(2) A second set of rules deals with multiple applications of the nabla operator as e.g.
(a) The Laplace operator, which has been introduced in Math.Chap. 4.2.3, can be written as
$$\Delta\varphi(\boldsymbol{r}) = \nabla\cdot(\nabla\varphi) = \operatorname{div}(\operatorname{grad}\varphi)\,.$$
Application of the inner nabla operator gives the vector function grad φ which is turned into a scalar function through the application of the divergence operator. The detailed argument involves the steps
$$\Delta\varphi(\boldsymbol{r}) = \left(\sum_i \boldsymbol{e}_i\, \frac{\partial}{\partial x_i}\right)\cdot\left(\sum_k \boldsymbol{e}_k\, \frac{\partial\varphi}{\partial x_k}\right) = \sum_{ik} (\boldsymbol{e}_i\cdot\boldsymbol{e}_k)\, \frac{\partial^2\varphi}{\partial x_i \partial x_k} = \sum_i \frac{\partial^2\varphi}{\partial x_i^2}\,,$$
so that finally the operator
$$\Delta = \sum_i \frac{\partial^2}{\partial x_i^2}$$
acts on the scalar function φ.
² A complete list of these rules is found in most mathematical tables, see list of literature.
(b) The combination ∇ × (∇φ(r)) yields a zero vector for every scalar function which can be differentiated twice
$$\nabla\times(\nabla\varphi(\boldsymbol{r})) = \operatorname{rot}(\operatorname{grad}\varphi) = \boldsymbol{0}\,.$$
(c) The combination ∇ · (∇ × f (r)) generates the number zero for every vector function which can be differentiated twice
$$\nabla\cdot(\nabla\times\boldsymbol{f}(\boldsymbol{r})) = \operatorname{div}(\operatorname{rot}\boldsymbol{f}) = 0\,.$$
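Both identities can be spot-checked symbolically, as in the following sketch (an assumption: sympy is available; it is not part of the original text), for arbitrary twice-differentiable sample fields.

```python
# Symbolic spot-check of rot(grad phi) = 0 and div(rot f) = 0.
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
X = (x1, x2, x3)

def grad(phi):
    return [sp.diff(phi, xi) for xi in X]

def div(f):
    return sum(sp.diff(fi, xi) for fi, xi in zip(f, X))

def rot(f):
    return [sp.diff(f[2], x2) - sp.diff(f[1], x3),
            sp.diff(f[0], x3) - sp.diff(f[2], x1),
            sp.diff(f[1], x1) - sp.diff(f[0], x2)]

phi = sp.sin(x1 * x2) * sp.exp(x3)           # sample scalar field
f = [x2 * x3, x1**2 * x3, sp.cos(x1 * x2)]   # sample vector field
print([sp.simplify(c) for c in rot(grad(phi))])  # [0, 0, 0]
print(sp.simplify(div(rot(f))))                  # 0
```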
A physical interpretation of the application of the operators divergence
and rotation on a vector function can be obtained with the aid of the inverse
operation, the integration of vector functions. This topic is the theme of the
next section.

5.3 Integration of vector functions

The first class of integrals, which is of particular importance for theoretical mechanics, is the class of line integrals.
5.3.1 Line integrals

The definition of this type of integration (in a more mathematically oriented form) involves the statements:
• A curve K in R3 is given by the parametric representation (Fig. 5.3)
$$\boldsymbol{r}(t) = \sum_{i=1}^{3} \boldsymbol{e}_i\, x_i(t) \qquad t_a \le t \le t_b\,.$$
Fig. 5.3. Curve in R3
The curve is called smooth if the functions xi(t) are differentiable.
• A vector function
$$\boldsymbol{f}(x_1, x_2, x_3) = \sum_i f_i(x_1, x_2, x_3)\, \boldsymbol{e}_i$$
is defined in a spatial domain G, which contains the curve.
The definition of the line integral of the vector function is then
$$I = \int_K \boldsymbol{f}\cdot d\boldsymbol{r} = \int_K \left(f_1\, dx_1 + f_2\, dx_2 + f_3\, dx_3\right) = \int_{t_a}^{t_b} \sum_i f_i(x_1(t), x_2(t), x_3(t))\, \dot{x}_i(t)\, dt\,.$$

Note that the result is a number, a scalar.


A list of rules for line integration should contain the following items:
1. The substitution t = t(τ) with dt = (dt/dτ) dτ does not change the value of the line integral
$$I = \int_{t_a}^{t_b} \sum_i f_i(t)\, \frac{dx_i(t)}{dt}\, dt = \int_{\tau_a}^{\tau_b} \sum_i f_i(t(\tau))\, \frac{dx_i(t(\tau))}{d\tau}\, d\tau\,.$$
One of the consequences of this rule for physics is the following: the value of the work for the motion of a point particle in a force field along the curve from A to B is independent of the nature of the movement. The same number is obtained for the actual motion of the point particle (t → time) or any other mode of travelling (τ) on the same curve.
2. A number of standard rules are
$$\int_K (\boldsymbol{f} + \boldsymbol{g})\cdot d\boldsymbol{r} = \int_K \boldsymbol{f}\cdot d\boldsymbol{r} + \int_K \boldsymbol{g}\cdot d\boldsymbol{r}$$
$$\int_{-K} \boldsymbol{f}\cdot d\boldsymbol{r} = -\int_K \boldsymbol{f}\cdot d\boldsymbol{r}\,.$$
The second equation states that the sign of the line integral is changed if the line segment is traversed in the opposite direction. This rule follows from
$$\int_{t_a}^{t_b} F(t)\, dt = -\int_{t_b}^{t_a} F(t)\, dt\,.$$
3. The following relation holds for two curves which are joined together
$$\int_{K_1 + K_2} \boldsymbol{f}\cdot d\boldsymbol{r} = \int_{K_1} \boldsymbol{f}\cdot d\boldsymbol{r} + \int_{K_2} \boldsymbol{f}\cdot d\boldsymbol{r}\,.$$
This rule implies that line integrals can be defined not only for smooth curves but also for arbitrary curves which are joined together.
4. An extension of the third rule is provided by the decomposition theorems. The following theorem is particularly important in the discussion to follow: a vector function f is defined in a domain G of R3. A surface F, which is bordered by a closed curve K, exists in G. The curve K is traversed in a definite sense. The surface is now decomposed into a set of bordered subdomains (Fig. 5.4). The following theorem holds in this case
$$\oint_K \boldsymbol{f}\cdot d\boldsymbol{r} = \oint_{K_1} \boldsymbol{f}\cdot d\boldsymbol{r} + \ldots + \oint_{K_n} \boldsymbol{f}\cdot d\boldsymbol{r}\,.$$
Fig. 5.4. Decomposition theorem for line integrals
The proof of this statement is obtained by remarking that each of the curves within G which is used for the decomposition is traversed twice, but in opposite directions. The contributions of the dividing lines cancel and the contribution of the exterior border K remains.
This relation also introduces the standard notation ∮ for integrals over closed curves K (with or without an arrow, which indicates the sense of the circulation).
One point is particularly important in the application of line integration: which conditions guarantee that the line integral between two points is independent of the path of integration? The question can also be expressed with the aid of an equation: which are the conditions so that the value of a line integral between two points A and B
$$\int_{K_1(A,B)} \boldsymbol{f}\cdot d\boldsymbol{r} = \int_{K_2(A,B)} \boldsymbol{f}\cdot d\boldsymbol{r}$$
is independent of the path of integration (provided the paths are in the domain of definition of the vector function)?
The following example (Fig. 5.5) demonstrates that the value of a line integral need not necessarily be independent of the path. The example is the evaluation of the line integral with the vector function
$$\boldsymbol{f} = \left(2xy,\; y^2,\; 0\right)$$
along the following paths in the x - y plane
$$K_1: \quad x = t \quad y = 0 \quad (0 \le t \le 1)\,; \qquad x = 1 \quad y = t - 1 \quad (1 \le t \le 2)$$
$$K_2: \quad x = t \quad y = t \quad (0 \le t \le 1)\,.$$
The value for the path K1 from the position (0, 0) to the position (1, 1) parallel to the coordinate axes is
Fig. 5.5. Path dependence of line integrals
$$I_1 = \int_{K_1} \boldsymbol{f}\cdot d\boldsymbol{r} = \int_0^1 (0)\, dt + \int_1^2 \left(0 + (t - 1)^2\right) dt = \int_1^2 (t - 1)^2\, dt = \frac{1}{3}\,.$$
By contrast, the integral along the diagonal in the first quadrant (path K2) yields the result
$$I_2 = \int_{K_2} \boldsymbol{f}\cdot d\boldsymbol{r} = \int_0^1 \left(2t^2 + t^2\right) dt = 1\,.$$
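The two path values can be reproduced numerically, as in the following sketch (assuming numpy and scipy; not part of the original text): the same endpoints, different paths, different values.

```python
# Numerical evaluation of the two line integrals of the example.
import numpy as np
from scipy.integrate import quad

f = lambda x, y: np.array([2 * x * y, y**2, 0.0])

# K1, leg 1: (t, 0) with dr/dt = (1, 0, 0); leg 2: (1, t-1) with dr/dt = (0, 1, 0)
I1 = (quad(lambda t: f(t, 0.0)[0], 0, 1)[0]
      + quad(lambda t: f(1.0, t - 1)[1], 1, 2)[0])
# K2: (t, t) with dr/dt = (1, 1, 0)
I2 = quad(lambda t: f(t, t)[0] + f(t, t)[1], 0, 1)[0]
print(I1, I2)   # 0.3333... and 1.0
```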

The question concerning the independence of a line integral from the path can be answered by the following two arguments:
1. It is quite simple to prove that the integral $\int_K \boldsymbol{f}\cdot d\boldsymbol{r}$ is independent of the path if the vector function is the gradient of a scalar function, f = ∇φ. The assumption allows the statement
$$\int_K \boldsymbol{f}\cdot d\boldsymbol{r} = \int_{t_a}^{t_b} \left(\frac{\partial\varphi}{\partial x_1}\frac{dx_1}{dt} + \frac{\partial\varphi}{\partial x_2}\frac{dx_2}{dt} + \frac{\partial\varphi}{\partial x_3}\frac{dx_3}{dt}\right) dt\,.$$
Rewrite the expression in the bracket with the chain rule and integrate to find
$$= \int_{t_a}^{t_b} \frac{d\varphi}{dt}\, dt = \varphi(t_b) - \varphi(t_a)$$
or in detail
$$= \varphi\bigl(x_1(t_b), x_2(t_b), x_3(t_b)\bigr) - \varphi\bigl(x_1(t_a), x_2(t_a), x_3(t_a)\bigr)\,.$$
The line integral depends only on the values of the function φ at the starting point and the endpoint of the path. It is therefore path independent.
2. The proof of the inverse statement (from $\int_{K(A,B)} \boldsymbol{f}\cdot d\boldsymbol{r} = \varphi(B) - \varphi(A)$ follows f = ∇φ) is more involved. The assumption can be used to write (see Fig. 5.6)
$$\int_{K} \boldsymbol{f}\cdot d\boldsymbol{r} = \varphi(x_1, x_2, x_3) - \varphi(A)$$
$$\int_{K + K_h} \boldsymbol{f}\cdot d\boldsymbol{r} = \varphi(x_1, x_2, x_3 + h) - \varphi(A)\,.$$
An additional, infinitesimal path Kh, which is chosen to run parallel to the x3-axis, has been added in the second line.
Fig. 5.6. Illustration for the proof of the relation between line integration and gradient formation
The definition of the partial derivative in the x3-direction
$$\frac{\partial\varphi}{\partial x_3} = \lim_{h \to 0} \left\{\frac{1}{h}\bigl(\varphi(x_1, x_2, x_3 + h) - \varphi(x_1, x_2, x_3)\bigr)\right\} = \lim_{h \to 0} \left\{\frac{1}{h} \int_{K_h} \boldsymbol{f}\cdot d\boldsymbol{r}\right\}$$
and the parametric representation of the path Kh
$$x_1(t) = x_1 \qquad x_2(t) = x_2 \qquad x_3(t) = x_3 + t \qquad (0 \le t \le h)$$
gives
$$\frac{\partial\varphi}{\partial x_3} = \lim_{h \to 0} \frac{1}{h} \int_0^h f_3(x_1, x_2, x_3 + t)\, dt\,.$$
Application of the mean value theorem of integral calculus allows the step
$$= \lim_{h \to 0} \left\{\frac{1}{h}\bigl(h\, f_3(x_1, x_2, x_3 + c\, h)\bigr)\right\} \quad (0 \le c \le 1) \qquad = f_3(x_1, x_2, x_3)\,.$$
A corresponding argument can be given for an infinitesimal path parallel to the x1- or the x2-axis.
The statements
$$\int_K \boldsymbol{f}\cdot d\boldsymbol{r} \;\text{is path independent} \qquad\text{and}\qquad \boldsymbol{f} = \nabla\varphi$$
are completely equivalent. The first statement claims that a special class of vector functions (those which can be represented as the gradient of a scalar function) exists for which the line integral is path independent. The second statement confirms that this is the only class of functions with path independence.
These two basic statements can be cast into a different form.
3. The statement
$$\int_{K_1} \boldsymbol{f}\cdot d\boldsymbol{r} = \int_{K_2} \boldsymbol{f}\cdot d\boldsymbol{r}$$
implies
$$\oint \boldsymbol{f}\cdot d\boldsymbol{r} = \int_{K_2} \boldsymbol{f}\cdot d\boldsymbol{r} + \int_{-K_1} \boldsymbol{f}\cdot d\boldsymbol{r} = 0\,.$$
The line integral along a closed curve vanishes if it is path independent.
4. The rotation of a differentiable vector function of the form f = ∇φ vanishes according to the rules of nabla calculus
$$\nabla\times(\nabla\varphi) = \boldsymbol{0}\,.$$
The rotation of a vector function which leads to a path independent line integral therefore vanishes
$$\operatorname{rot} \boldsymbol{f} = \boldsymbol{0}\,.$$
The arguments can be combined in the following fashion. There exist four equivalent statements
$$\int_K \boldsymbol{f}\cdot d\boldsymbol{r} = \varphi(B) - \varphi(A) \qquad \boldsymbol{f} = \nabla\varphi \qquad \oint \boldsymbol{f}\cdot d\boldsymbol{r} = 0 \qquad \operatorname{rot} \boldsymbol{f} = \boldsymbol{0}\,.$$
The validity of one of the statements implies the validity of the other three. If it has e.g. been verified that the rotation of a vector field f vanishes in a domain G, then the statements that the line integral along a closed curve vanishes, or the line integral is path independent, or f can be represented as the gradient of a scalar function, follow directly. This property of path independence is of interest for the discussion of the law of energy conservation in mechanics or for the foundation of electrostatics.
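The criterion connects directly to the earlier example: a quick symbolic check (assuming sympy; not part of the original text) shows that the field f = (2xy, y², 0) of Fig. 5.5 has nonvanishing rotation, consistent with the path dependence found there.

```python
# Rotation of the example field from the path-dependence example.
import sympy as sp

x, y, z = sp.symbols('x y z')
f = [2 * x * y, y**2, 0]
rot_f = [sp.diff(f[2], y) - sp.diff(f[1], z),
         sp.diff(f[0], z) - sp.diff(f[2], x),
         sp.diff(f[1], x) - sp.diff(f[0], y)]
print(rot_f)   # [0, 0, -2*x]: nonzero, so the line integral is path dependent
```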
The next class of integrals with vector functions that has to be considered are the slightly more intricate surface integrals.

5.3.2 Surface integrals with vector functions

The integrals which are discussed in this section can be described in the following manner: specified are an arbitrary surface S in space and a vector function f which is defined in a domain G. The domain contains the surface. The surface is subdivided into infinitesimal surface elements, which are characterised by a magnitude and an orientation. These elements can be represented by a vector dS. The length of this vector is a measure of the size of the infinitesimal element. The direction of the vector is perpendicular to the surface elements and indicates an orientation, which will be detailed immediately. The scalar products
$$\boldsymbol{f}\,(\text{at the position of } d\boldsymbol{S})\cdot d\boldsymbol{S}$$
for every infinitesimal surface element are then added (in the sense of the usual limiting process) to form the surface integral (Fig. 5.7)
$$I = \int_S \boldsymbol{f}\cdot d\boldsymbol{S}\,.$$

Fig. 5.7. Definition of a surface integral
The question of the orientation of the infinitesimal elements is best explained by a specific example, a spherical surface with radius R. A standard subdivision of this surface uses a grid of parallels and meridians, in mathematical terms a grid with spherical coordinates (Fig. 5.8a). The size of an infinitesimal element is (Fig. 5.8b)
$$dS = (R\, d\theta)(R\, d\varphi\sin\theta) = R^2 \sin\theta\, d\theta\, d\varphi\,.$$
The direction of the outward normal is defined as the direction of the elements, that is
$$d\boldsymbol{S} = dS\, \boldsymbol{e}_r\,.$$
There is no profound thinking behind this definition. The choice 'in the inward direction' would have been as acceptable.
The distinction between interior and exterior is not possible for an open surface (a spherical shell or any section of a surface in space). The orientation is determined in this case in the following fashion: the surface is endowed with a boundary curve. The orientation of the surface is chosen according to the sense of circulation on the boundary. This choice determines the orientation of the boundary curves of the infinitesimal elements with the aid of the decomposition theorem. The direction of dS corresponds to the direction of the
Fig. 5.8. Surface integrals on a sphere: (a) general view; (b) definition of an infinitesimal surface element
Fig. 5.9. Decomposition of a surface in space
surface normal. This direction is obtained (see Fig. 5.9) from the circulation according to the right hand rule (or the rule of the screw). A simpler rule is the right hand grip rule: the four fingers of the right hand indicate the sense of circulation, the thumb then points in the direction of the (normal) vector. It should be said, however, that not all surfaces in space can be oriented. A popular counterexample is the Möbius strip.
The surface integral can be calculated directly with this descriptive definition for simple situations. An example is the surface integral with a central field f (r) = f(r) e_r on a spherical surface
$$I = \oint_{S_p} \boldsymbol{f}(\boldsymbol{r})\cdot d\boldsymbol{S} = \oint_{S_p} f(R)\, dS$$
(as f and dS point in the same direction)
$$= R^2 f(R) \int_0^{2\pi} d\varphi \int_0^{\pi} \sin\theta\, d\theta = 4\pi f(R)\, R^2\,.$$
In particular, the surface of the sphere can be calculated directly with f(r) = 1
$$\oint_{S_p} \boldsymbol{e}_r\cdot d\boldsymbol{S} = 4\pi R^2\,.$$
One of the applications of surface integrals can be recognised here: the surface is smoothed out and charted to scale if it is traced with a suitable vector field (unit vector in the direction of the normal).
The evaluation of surface integrals is more complicated if the vector field is not a central field, even if the surface is spherical. A decomposition of the surface element dS in terms of Cartesian components has to be used in this case
$$d\boldsymbol{S} = dS\, \boldsymbol{e}_r = R^2 \sin\theta\, \left(\cos\varphi\sin\theta,\; \sin\varphi\sin\theta,\; \cos\theta\right) d\varphi\, d\theta$$
so that the surface integral of a vector function f = (f1, f2, f3) takes the form
$$I = \oint_{S_p} \boldsymbol{f}\cdot d\boldsymbol{S} = R^2 \iint d\varphi\, d\theta\, \bigl\{f_1(x(R, \varphi, \theta), y(R, \varphi, \theta), z(R, \varphi, \theta))\,\cos\varphi\sin^2\theta$$
$$\qquad + f_2(x(R, \varphi, \theta), y(R, \varphi, \theta), z(R, \varphi, \theta))\,\sin\varphi\sin^2\theta + f_3(x(R, \varphi, \theta), y(R, \varphi, \theta), z(R, \varphi, \theta))\,\sin\theta\cos\theta\bigr\}\,.$$
Now three double integrals have to be evaluated.
For example, the surface integrals with the simpler functions with only one component (Fig. 5.10a, b)
$$\boldsymbol{f}_1 = (x, 0, 0) \qquad \boldsymbol{f}_2 = (x^2, 0, 0)$$
Fig. 5.10. Details for the evaluation of surface integrals on a hemisphere: (a) with the function f1; (b) with the function f2
yield for an integration over a hemisphere
$$I_1 = \int_H \boldsymbol{f}_1\cdot d\boldsymbol{S} = R^3 \int_0^{2\pi} \cos^2\varphi\, d\varphi \int_0^{\pi/2} \sin^3\theta\, d\theta = R^3\, [\pi]\left[\frac{2}{3}\right] = \frac{2}{3}\,\pi R^3$$
$$I_2 = \int_H \boldsymbol{f}_2\cdot d\boldsymbol{S} = R^4 \int_0^{2\pi} \cos^3\varphi\, d\varphi \int_0^{\pi/2} \sin^4\theta\, d\theta = R^4\, [0]\left[\frac{3\pi}{16}\right] = 0\,.$$
Opposite surface elements contribute with cos α(dS, f) on one side and with
$$\cos(\pi - \alpha)(d\boldsymbol{S}, \boldsymbol{f}) = -\cos\alpha(d\boldsymbol{S}, \boldsymbol{f})$$
on the other, so that the contributions of the front (the positive x-direction) and of the back of the hemisphere cancel.
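Both hemisphere integrals can be reproduced numerically in the spherical parametrisation, as in the following sketch (assuming numpy and scipy; not part of the original text); only the x-component dS_x = R² cos ϕ sin² θ dϕ dθ contributes for these fields.

```python
# Numerical evaluation of the two hemisphere surface integrals.
import numpy as np
from scipy.integrate import dblquad

R = 1.0

def I_surface(fx):
    # integrand: fx(x) * dS_x with x = R cos(phi) sin(theta)
    g = lambda t, p: (fx(R * np.cos(p) * np.sin(t))
                      * R**2 * np.cos(p) * np.sin(t)**2)
    return dblquad(g, 0, 2 * np.pi, lambda p: 0, lambda p: np.pi / 2)[0]

print(I_surface(lambda x: x), 2 * np.pi * R**3 / 3)   # f1: 2.0943... twice
print(I_surface(lambda x: x**2))                       # f2: 0 (up to roundoff)
```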
The descriptive definition used so far has to be translated into an explicit algorithm for the discussion of the general situation. An arbitrary open surface with oriented boundary can be projected onto the three coordinate planes. The domains which are obtained in this fashion are subscripted with the cyclic complement. The domain B1 is e.g. the projection onto the 2 - 3 plane. The orientation of the boundary of S is transferred to the orientation of the boundaries of the projections (Fig. 5.11).
Fig. 5.11. Projection of a surface in space on the coordinate planes
This projection of an infinitesimal surface element dS leads to three irregular surfaces in the coordinate planes (Fig. 5.12). The projection corresponds exactly to the decomposition of the vector dS into Cartesian components
$$d\boldsymbol{S} = dS_1\, \boldsymbol{e}_1 + dS_2\, \boldsymbol{e}_2 + dS_3\, \boldsymbol{e}_3\,.$$
Fig. 5.12. Projection of a surface in space on the coordinate planes: details
The decomposition of the complete integral is therefore
$$\int_S \boldsymbol{f}\cdot d\boldsymbol{S} = \int_{B_1} f_1\, dS_1 + \int_{B_2} f_2\, dS_2 + \int_{B_3} f_3\, dS_3\,.$$
A surface integral of a vector function with a curved surface in space can be transcribed into three domain integrals (see Math.Chap. 4.3.2) over planar domains in the coordinate planes. The domain integrals can (for bounded integrands) be subdivided in an arbitrary manner. A rectangular subdivision
$$dS_1 = dx_2\, dx_3 \qquad dS_2 = dx_1\, dx_3 \qquad dS_3 = dx_1\, dx_2$$
can be used instead of the irregular subdivision which originates from the projection. This allows the decomposition
$$\int_S \boldsymbol{f}\cdot d\boldsymbol{S} = \iint_{B_1} f_1(x_1, x_2, x_3)\, dx_2 dx_3 + \iint_{B_2} f_2(x_1, x_2, x_3)\, dx_1 dx_3 + \iint_{B_3} f_3(x_1, x_2, x_3)\, dx_1 dx_2\,.$$

Two additional points have to be accounted for:


• The integration variables in the integral over B1 are x2 and x3 . The points
(x1, x2, x3) define the surface S . This implies that x1 is not an independent
variable but a function of x2 and x3 in the first of the three integrals.
If the surface can be described by an implicit equation

    S(x1, x2, x3) = 0    (e.g. x1² + x2² + x3² − R² = 0 for a spherical surface) ,

resolution with respect to x1 gives

    x1 = x1(x2, x3)    e.g. x1 = ± (R² − x2² − x3²)^{1/2} .

A similar statement is possible for the other two domain integrals. The
complete form of the surface integral in Cartesian decomposition is there-
fore
    ∫_S f · dS = ∫_{B1} f1(x1(x2, x3), x2, x3) dx2 dx3
               + ∫_{B2} f2(x1, x2(x1, x3), x3) dx1 dx3
               + ∫_{B3} f3(x1, x2, x3(x1, x2)) dx1 dx2 .

This expression reduces the calculation of surface integrals of a vector
function over an arbitrary surface in space completely to the calculation
of domain integrals (of functions of two variables). A small difficulty still
remains, however.
• The projection onto the coordinate planes can cause a double covering.
For example: the image of a hemisphere (with x3 ≥ 0) is a half disc for a
projection onto the 2 - 3 plane. The back and the front of the hemisphere
can only be distinguished if the orientation of the quarter spheres is taken
into account (Fig. 5.13). The vectors dS_1 are

Fig. 5.13. Double covering

    dS_1 = dx2 dx3 e_1    respectively    dS_1 = −dx2 dx3 e_1


for the front respectively the back part of the hemisphere. Each of the
three domain integrals must be split according to double covering under
these circumstances. In addition an appropriate choice of the form of x1 =
x1 (x2 , x3 ) is required.
The result obtained with these arguments for the example f_1 = (x1, 0, 0)
with integration over a hemisphere is

    ∫_H f_1 · dS = ∫_{B1} (R² − x2² − x3²)^{1/2} dx2 dx3          (front QS)
                 − ∫_{B1} [ −(R² − x2² − x3²)^{1/2} ] dx2 dx3     (back QS)

                 = 2 ∫_{B1} (R² − x2² − x3²)^{1/2} dx2 dx3 = (2π/3) R³ .

The integrand for both parts of the hemisphere is

    +(R² − x2² − x3²)

for the example f_2 = (x1², 0, 0) ; as the surface elements of the two parts
carry opposite signs, the contributions cancel. The result is therefore (as
calculated before)

    ∫_H f_2 · dS = 0 .
The problem of double covering and the necessary discussion of the details
can be avoided by evaluation of the domain integrals in adapted, rather than
Cartesian, coordinates. This is achieved by characterising the surface in terms
of a parametric representation
    x1 = x1(u, v)    x2 = x2(u, v)    x3 = x3(u, v)
    a ≤ u ≤ b        α ≤ v ≤ β .
Application of the substitution rule for each of the domain integrals results
in
    ∫_S f · dS = ∫_{B(S)} { f1(x1(u, v), x2(u, v), x3(u, v)) ∂(x2, x3)/∂(u, v)
                          + f2(x1(u, v), x2(u, v), x3(u, v)) ∂(x3, x1)/∂(u, v)
                          + f3(x1(u, v), x2(u, v), x3(u, v)) ∂(x1, x2)/∂(u, v) } du dv .
The integration is over the image of S which follows from the substitution.
One point has to be observed though: the Jacobians have to be used without
the absolute value. The order of the coordinates in the terms of the integral
is cyclic. The orientation and the covering are resolved automatically in this
fashion (with the exception of an overall sign, which follows from the order of
the coordinates u and v). The final form also demonstrates that the surface
integral (with two integration variables) is a consistent extension of the line
integral (with one integration variable).
The general discussion can be illustrated once more by the two examples
with the functions f_1 and f_2

    ∫_H f_i · dS = ∫_{B(H)} f_{i,1}(x1(θ, ϕ)) ∂(x2, x3)/∂(θ, ϕ) dθ dϕ

which have been introduced above. The relevant Jacobian is

    ∂(x2, x3)/∂(θ, ϕ) = det [ ∂x2/∂θ  ∂x3/∂θ ; ∂x2/∂ϕ  ∂x3/∂ϕ ]
                      = det [ R sin ϕ cos θ  −R sin θ ; R cos ϕ sin θ  0 ]
                      = R² cos ϕ sin²θ .

Insertion of the parametric representation and the Jacobian gives

    ∫_H f_1 · dS = ∫₀^{2π} dϕ ∫₀^{π/2} dθ (R cos ϕ sin θ) (R² cos ϕ sin²θ)

respectively

    ∫_H f_2 · dS = ∫₀^{2π} dϕ ∫₀^{π/2} dθ (R² cos²ϕ sin²θ) (R² cos ϕ sin²θ) .
The image of the hemisphere is a rectangle. The integrals are the same as
those found in the elementary approach.
The evaluation via the parametric representation offers a certain automa-
tism which can also be applied in more complicated situations. The next
section addresses, however, an alternative approach which is more efficient in
most cases.
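The automatism can be checked directly by numerical quadrature. The
following lines are a minimal sketch in Python (assuming numpy is available;
the grid sizes and the value of R are illustrative choices): they evaluate the
two parametric integrals with a midpoint rule and compare with the analytic
values 2πR³/3 and 0.

import numpy as np

R, n = 2.0, 400
phi = (np.arange(2 * n) + 0.5) * np.pi / n         # midpoints on [0, 2pi]
theta = (np.arange(n) + 0.5) * (np.pi / 2) / n     # midpoints on [0, pi/2]
PHI, THETA = np.meshgrid(phi, theta)

jac = R**2 * np.cos(PHI) * np.sin(THETA)**2        # Jacobian d(x2,x3)/d(theta,phi)
x1 = R * np.cos(PHI) * np.sin(THETA)               # first component of f_1 on the sphere
dA = (np.pi / n) * (np.pi / 2 / n)

I1 = np.sum(x1 * jac) * dA                         # expect 2*pi*R**3/3 ~ 16.755
I2 = np.sum(x1**2 * jac) * dA                      # expect 0
print(I1, 2 * np.pi * R**3 / 3, I2)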

5.3.3 The integral theorems of Gauss and Stokes

The representation of vector fields with the aid of field patterns constitutes
a good introduction to this topic. The field lines are the curves, endowed
with a direction, which are tangential to neighbouring field vectors. The field
lines of the simple example of the gravitational field of a mass M at the origin

    G = −γM r/r³
are rays which are directed radially towards the origin from all directions
(Fig. 5.14). The gravitational field of two equal masses, which are placed in

Fig. 5.14. Field with spherical symmetry

a symmetric fashion on one of the coordinate axes, is

    G(r) = G_1 + G_2 = −γM { (r − a)/|r − a|³ + (r + a)/|r + a|³ } .
The geometry and the field pattern are indicated in Figs. 5.15a,b.

Fig. 5.15. The dipole field: (a) geometry of the distribution of the mass points,
(b) field pattern

The masses

dominate the pattern in the vicinity of their position. The field lines are
strongly modified in the region between the two masses. The lines adapt
themselves to the parting plane. The complete field pattern is rotationally
symmetric with respect to the axis on which the masses are placed. The
pattern is an example of a dipole field.
The electric field of a point charge q (replacing a point mass) at the origin
can be repulsive or attractive with field lines emanating from or entering into
the origin
    E = −γ' q r/r³        (q = ±) .
Another example is the vector field
    B = ( −y/(x² + y²), x/(x² + y²), 0 ) .
This field possesses translational symmetry with respect to the z-axis. The
same pattern is obtained for each plane z = const. The field lines are concen-
tric circles about the z -axis (Fig. 5.16). The field represents (up to a constant
factor) the magnetic field of a thin current carrying wire along the z -axis.
The surface integrals for the different fields
    ∫_S G · dS        ∫_S E · dS        ∫_S B · dS
are named the flux of the fields through the surface S, in particular the
gravitational flux, the electric flux, the magnetic flux, etc. This terminology
originates from hydrodynamics, for instance in the form of the velocity field
of a stationary fluid flow (Fig. 5.17). A velocity field v(r) is associated with
every infinitesimal volume element. The flux S v would be a measure of the
amount of fluid which flows through the surface per unit time if the velocity
Fig. 5.16. Field pattern of a simple magnetic field

Fig. 5.17. Hydrodynamical flow

is uniform and if a planar surface is placed perpendicularly with respect to


the direction of the flow. The measure corresponds to the scalar product S · v
if the planar surface is inclined with respect to the direction of the flow.
The velocity flux is equal to the integral
    ∫_S v(r) · dS

if the flow is not uniform and/or the surface is not planar (Fig. 5.18). An
equivalent quantity describing the 'strength of the flow' is needed if this
idea of a flux is applied to the discussion of arbitrary vector fields.

Fig. 5.18. Definition of the flux

Graphically,


such a measure is supplied by the number of field lines (normalised to a
given mass, charge, current) which pass through the surface. The flux of a
gravitational field of a point particle at the origin through a spherical surface
of radius R about the origin is
    ∮_{Sp} G · dS = −γM (1/R²) ∮_{Sp} e_r · dS = −4πγM .

The result is independent of the radius of the sphere. It is therefore valid for
any spherical surface about the origin. This corresponds to the interpretation
suggested above. The same number of field lines (however normalised) passes
through all spherical surfaces about the origin (Fig. 5.19a).
The same result can be expected for any closed surface S(or) of arbitrary
shape about the origin
    ∮_{S(or)} G · dS = −4πγM .

This assertion would follow from the interpretation of the concept of flux.
The same number of field lines, that pass through each of the spherical sur-
faces, passes through any arbitrary closed surface S(or) (Fig. 5.19b). The

(a) (b)

V V

through a sphere surrounding through an arbitrary surface


the mass surrounding the mass
Fig. 5.19. Flux of a point mass in the origin

expectation for an arbitrary closed surface S(nor) , which does not contain
the origin with a point mass, would be
    ∮_{S(nor)} G · dS = 0 .

The number of field lines entering into the volume enclosed by this surface
equals the number that leave this volume (Fig. 5.20). The proof of these
statements is provided by the theorem of Gauss which is discussed below.

Fig. 5.20. Flux through a surface not containing a point mass

The language used in this connection is: The mass point is referred to as
a source of the field. More generally sources and sinks3 have to be distin-
guished (Figs. 5.21a,b). Field lines emanate from a source, they enter into a
sink.

Fig. 5.21. Illustration of sources and sinks: (a) source, (b) sink

The flux through a closed surface about a source or a sink is not equal
to zero. The flux through a closed surface, which does not contain a source
or sink, vanishes.
A quantitative description of this situation is the contents of the theorem
of Gauss which is also known as divergence theorem. The theorem reads
as follows
³ In some instances (e.g. for semiconductors) the word 'drain' is used instead of
  'sink'.

A differentiable vector function f is defined in a domain V of R³ .
The shape of the domain is such that any straight line parallel to the
coordinate axes enters and leaves the domain only once. The following
relation between a volume and a surface integral

    ∫_V (div f(x1, x2, x3)) dx1 dx2 dx3 = ∮_{S(V)} f(x1, x2, x3) · dS

is valid under these assumptions.


The theorem states that a volume integral over the divergence of a vector
field is equal to the surface integral of this function over the boundary of
the volume (Fig. 5.22). The closed surface S(V) is (by standard agreement)
oriented in such a way that the infinitesimal vectors dS point outwards.


Fig. 5.22. Illustrating the theorem of Gauss

The restriction for the volume indicates that the domain should be convex
(Fig. 5.23).

Fig. 5.23. Classification of volumina: (a) convex, (b) not convex

The proof of the theorem proceeds in the following fashion: write the left
side of the central relation as

    ∫_V div f(x1, x2, x3) dV = ∫∫∫_V ( ∂f1/∂x1 + ∂f2/∂x2 + ∂f3/∂x3 ) dx1 dx2 dx3 .

The domain of integration of the first term

    T1 = ∫_V (∂f1/∂x1) dx1 dx2 dx3

can be stated directly if the boundary of the volume is divided into a lower
and an upper surface with respect to the x1 -direction (Fig. 5.24)

    lower : x1 = B(x2, x3)        upper : x1 = D(x2, x3) .

This statement uses the restriction to a convex volume. The triple integral
can then be written as

    T1 = ∫_{B1} dx2 dx3 ∫_{B(x2,x3)}^{D(x2,x3)} (∂f1/∂x1) dx1 .
The remaining double integral over x2 and x3 has to be evaluated by projec-
tion of the volume V into the 2 - 3 plane. This domain (B1 ) does not have
to be specified further. The integration over the coordinate x1 is trivial. The
primitive is f1 so that the result
    T1 = ∫_{B1} dx2 dx3 { f1(D(x2, x3), x2, x3) − f1(B(x2, x3), x2, x3) }
       = ∮_{S(V)} f1 dS1

is obtained. This is exactly the first term of the surface integral (for a Carte-
sian subdivision and double covering of the domain B1 ) on the right hand side
of the equation. A corresponding argument for the terms T2 and T3 would
complete the proof.


Fig. 5.24. Concerning the proof of the Gauss theorem

It is relatively easy to extend the theorem to non-convex domains. A


volume with a neck can be divided into two convex subdomains by means of
an interface (Fig. 5.25a). The theorem is valid for the subdomains
    ∫_{V1} div f dV = ∮_{S(V1)} f · dS

    ∫_{V2} div f dV = ∮_{S(V2)} f · dS .

Addition of these relations yields the volume integral over the complete vol-
ume on the left hand side. The contributions of the two sides of the interface
cancel on the right hand side as they are oriented in opposite directions in
each point (Fig. 5.25b). There remains the surface integral over the boundary
of the total volume.

Fig. 5.25. Non-convex volumina: (a) an example, (b) contributions on the interface


The practical aspect of the Gauss theorem is the option to change from a
volume integral to a surface integral or vice versa. Depending on the situation,
it might be simpler to calculate one or the other integral. For instance, the
problem: calculate the surface integral with the vector field f = (x1, 0, 0)
for an ellipsoid around the origin with the semi-axes a, b, c, is tedious if the
surface integral

    ∮_{S(Ell)} x1 e_1 · dS

is evaluated.

is evaluated. The theorem of Gauss leads directly to


    ∮_{S(Ell)} x1 e_1 · dS = ∫_{V(Ell)} dV = (4/3) π a b c

as div f = 1 .
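The theorem can also be tested directly on this example. A minimal numerical
sketch in Python (assuming numpy; the semi-axes and grid size are illustrative
choices) evaluates the surface integral with the parametric representation
x = a sin θ cos ϕ, y = b sin θ sin ϕ, z = c cos θ and the surface element
dS = (r_θ × r_ϕ) dθ dϕ, and compares it with (4/3)πabc:

import numpy as np

a, b, c, n = 1.0, 2.0, 3.0, 400
theta = (np.arange(n) + 0.5) * np.pi / n
phi = (np.arange(2 * n) + 0.5) * np.pi / n
TH, PH = np.meshgrid(theta, phi, indexing="ij")

x1 = a * np.sin(TH) * np.cos(PH)
r_th = np.stack([a * np.cos(TH) * np.cos(PH),
                 b * np.cos(TH) * np.sin(PH),
                 -c * np.sin(TH)], axis=-1)
r_ph = np.stack([-a * np.sin(TH) * np.sin(PH),
                 b * np.sin(TH) * np.cos(PH),
                 np.zeros_like(TH)], axis=-1)
dS = np.cross(r_th, r_ph)                     # outward normal for this parametrisation

flux = np.sum(x1 * dS[..., 0]) * (np.pi / n) ** 2
print(flux, 4 * np.pi * a * b * c / 3)        # both ~ 25.13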

The theoretical aspect of the theorem is the possibility to obtain a better


understanding of the importance of the concept of divergence of a vector
field. This point will be approached first in terms of an explicit example, a
central field with a point source
    f_ce = c (1/r²) e_r = c r/r³ .
The surface integral of this field for a spherical surface about the origin has
the value
    ∮_{Sp} f_ce · dS = 4πc .

The divergence of this field is also needed for a discussion of the theorem
    div f_ce = c { ∂/∂x1 (x1/r³) + ∂/∂x2 (x2/r³) + ∂/∂x3 (x3/r³) } .

It can be evaluated with the relation

    ∂/∂xi (xi/r³) = 1/r³ − 3xi²/r⁵        for r ≠ 0    (i = 1, 2, 3)

which is valid for all points of space with the exception of the origin. The
divergence of the central field for these points turns out to be

    div f_ce = c { 3/r³ − 3(x1² + x2² + x3²)/r⁵ } = 0 .
It vanishes for all points except the origin. No statement can be made for
this point for the time being but the fact that the surface integral does not
vanish, calls for further discussion.
First of all it can be stated that this result allows the conclusion that an
integral over a closed surface S(nor), which does not contain the origin (with
the point source), satisfies the relation

    ∮_{S(nor)} f_ce · dS = ∫_{V(S)} div f_ce dV = 0 .

Surface integrals, which do not contain the source, vanish because the diver-
gence of the field vanishes in the relevant regions of space. This result can be
used to demonstrate that any surface integral, which contains the source, has
the same value. An arbitrary volume about the origin can be decomposed into
a spherical volume about the origin and a number of part volumina which do
not contain the origin (Fig. 5.26). The contributions of the interfaces cancel
due to the orientation. It follows that

    ∮_{S(or)} f_ce · dS = ∮_{Sp(or)} f_ce · dS + Σ_n ∮_{S(nor)_n} f_ce · dS

Fig. 5.26. Decomposition of an arbitrary surface about a point mass

so that the result

    ∮_{S(or)} f_ce · dS = ∮_{Sp(or)} f_ce · dS = 4πc

is obtained. Surfaces, which do not contain the origin, do not contribute.


The situation at the origin can be discussed as follows: the volume integral
containing the origin has a finite value not equal to zero
    ∫_{V(S(or))} (div f_ce) dV = ∮_{S(or)} f_ce · dS = 4πc

even though div f_ce vanishes in all points except the origin. This implies that
div f_ce can not vanish for r = 0

    lim_{r→0} (div f_ce) ≠ 0 .

An exceptional situation must exist at the origin as the contribution to the


integral originates from a single point. A new class of mathematical objects
is indeed encountered in this situation, which is expressed by
div f ce −→ 4πc δ(x1 )δ(x2 )δ(x3 ) for r −→ 0 .
The ’function’ δ(x) on the right hand side is called (Dirac’s) delta function.
It is, however, not a function but a distribution4 . A naive characterisation
of this distribution, which does not meet mathematical standards in any way,
would state: the object δ(x) takes the value zero at every point of space except
for x = 0 . It is singular in this point (∞) in a way so that the integral
    ∫_{−∞}^{∞} δ(x) dx = 1

has the value 1 (Fig. 5.27). Such objects can obviously only be defined rigor-
ously by an extension of the concept of functions.
⁴ Distributions are discussed in Vol. 2, Math.Chap. 1.
Fig. 5.27. Naive representation of the δ -function

The relation

    div f_ce = 4πc δ(r)    with    div f_ce = 0 for r ≠ 0 ,  div f_ce ≠ 0 for r = 0

and δ(r) = δ(x1)δ(x2)δ(x3) is valid, independent of this sideline, for a point
source (a central field). The point with the source (or sink) – that is a point
mass or a point charge – is characterised by div f_ce ≠ 0, the remaining
(empty) space by div f_ce = 0 . The divergence of a vector field describes in
this sense the distribution of the sources of the field in differential form. The
integral form can be obtained with the Gauss theorem in terms of surface
integrals over closed surfaces. All surface integrals, which enclose a point-like
source, have the value 4πc, that is solid angle times 'strength of the source'.
This statement can be expressed in a different form: Enclose a point of the
space carrying a (vector) field in an infinitesimal volume ΔV . The divergence
theorem states
    ∫_{ΔV} div f_ce dV = ∮_{S(ΔV)} f_ce · dS

or expressed in terms of a limiting value

    div f_ce = lim_{ΔV→0} { (1/ΔV) ∮_{S(ΔV)} f_ce · dS } .

The divergence of a vector field has the dimension flux of the field per volume.
This flux density can be termed a source density as it is a net flux. A positive
source density corresponds to a true source, negative source density to a sink.
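The limiting value can be imitated numerically: the net flux of a field through
the surface of a small cube, divided by the volume of the cube, should
reproduce the analytic divergence at the centre. A minimal sketch in Python
(assuming numpy; the test field f = (x1², x2², x3²) with div f = 2(x1+x2+x3),
the cube half-width h and the test point are illustrative choices):

import numpy as np

def flux_per_volume(f, p, h=1e-3, n=50):
    # net flux of f through a cube of side 2h centred at p, divided by (2h)^3
    u = ((np.arange(n) + 0.5) / n) * 2 * h - h       # midpoint grid on a face
    U, V = np.meshgrid(u, u)
    dA = (2 * h / n) ** 2
    total = 0.0
    for axis in range(3):
        for sign in (+1.0, -1.0):
            pts = np.empty(U.shape + (3,))
            pts[..., axis] = p[axis] + sign * h
            pts[..., (axis + 1) % 3] = p[(axis + 1) % 3] + U
            pts[..., (axis + 2) % 3] = p[(axis + 2) % 3] + V
            total += sign * np.sum(f(pts)[..., axis]) * dA
    return total / (2 * h) ** 3

f = lambda r: r**2
p = np.array([1.0, 2.0, 3.0])
print(flux_per_volume(f, p), 2 * np.sum(p))          # both ~ 12.0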
Similar statements can be made for extended sources, for instance for
the gravitational field of a homogeneous (from a macroscopic point of view),
spherical mass distribution
    ρ(r) = ρ0 for r ≤ R ,    ρ(r) = 0 for r > R .
The mass is (see Math.Chap. 4.3)

    M = ∫_{Ku} ρ(r) dV = (4/3) π ρ0 R³ .
In order to calculate the gravitational field the volume is divided into
infinitesimal elements dV  at the position r  (Fig. 5.28a). The contribution
of this volume element to the field at the position r is
    dG(r) = −γ (ρ0 dV′ / |r − r′|³) (r − r′) .
The contributions of all mass elements have to be added in the sense of a
limiting value
    G(r) = −γρ0 ∫ (dV′ / |r − r′|³) (r − r′) .
Actually three (!) volume integrals have to be evaluated. The following consid-
eration reduces the labour of the explicit evaluation: there exists a diametrical
element for each element at the position r′ which gives a contribution of the
same magnitude. The vector sum of the contributions of the two volume
elements is a vector in the radial direction (Fig. 5.28b).

Fig. 5.28. Calculation of the gravitational field of a sphere with a homogeneous
mass distribution: (a) division, (b) symmetry

The gravitational field of a sphere with a homogeneous mass distribution is
a radial field. Explicit
integration yields
    G(r) = −γ (M/r²) e_r        for r ≥ R
    G(r) = −γ (M/R³) r e_r      for r ≤ R .
******************************************************************
The following paragraph contains the details for the evaluation of the vol-
ume integral. It can be skipped. It seems to be useful, on the other hand, to
present the steps of the calculation as they may serve as a model for similar
calculations.

The symmetry allows the field point to be placed on the z -axis for the
evaluation of the integral
    G(r) = −γρ0 ∫ (dV′ / |r − r′|³) (r − r′) .

This choice facilitates the calculation, the general result can be regained at
the end. The coordinates of the field point r and the variable of integration
r′ are then

    r = (0, 0, z)
    r′ = (r′ cos ϕ′ sin θ′ , r′ sin ϕ′ sin θ′ , r′ cos θ′) ,

the magnitude of the difference vector

    |r − r′| = [ r′² + z² − 2 r′ z cos θ′ ]^{1/2} .

The relation

    ∫₀^{2π} cos ϕ′ dϕ′ = ∫₀^{2π} sin ϕ′ dϕ′ = 0

implies

    G(r) = −2πγρ0 I e_z

with

    I = ∫₀^R r′² dr′ ∫_{−1}^{1} (z − r′x) dx / (z² + r′² − 2r′zx)^{3/2}
using the substitution cos θ′ = x . The inner integration can be performed
with the steps

    I(r′, z) = ∫_{−1}^{1} (z − r′x) dx / (z² + r′² − 2r′zx)^{3/2}

             = (1/(2r′z²)) { (z² − r′²) [ (z² + r′² − 2r′z)^{−1/2} − (z² + r′² + 2r′z)^{−1/2} ]
                           − [ (z² + r′² − 2r′z)^{1/2} − (z² + r′² + 2r′z)^{1/2} ] } .

The two cases

    z > r′ :    I(r′, z) = 2/z²
    z < r′ :    I(r′, z) = 0
have to be distinguished. The result for the case z > R is
    I = (2/z²) ∫₀^R r′² dr′ = (2/3) (R³/z²)

and

    G(r) = −(4π/3) ρ0 R³ γ (1/z²) e_z = −γM (1/z²) e_z .
It is

    I = (2/z²) ∫₀^z r′² dr′ = (2/3) z

and

    G(r) = −(4π/3) ρ0 γ z e_z = −γM (z/R³) e_z
for z < R . The symmetry allows the replacement of z by r and of e_z by e_r
at this stage.
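The case distinction in the inner integral can be spot-checked numerically.
A minimal sketch in Python (assuming scipy is available; the sample values
of r′ and z are arbitrary choices):

from scipy.integrate import quad

def I_num(rp, z):
    integrand = lambda x: (z - rp * x) / (z**2 + rp**2 - 2 * rp * z * x) ** 1.5
    val, _ = quad(integrand, -1.0, 1.0)
    return val

print(I_num(0.5, 2.0), 2 / 2.0**2)   # z > r': both 0.5
print(I_num(2.0, 0.5))               # z < r': ~ 0.0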
******************************************************************
The strength of the field increases linearly with r within the sphere and
decreases as 1/r² on the outside (Fig. 5.29).

Fig. 5.29. Radial variation of the gravitational field of a sphere with a homogeneous
mass distribution

The field of the homogeneous sphere outside the sphere can not be distinguished
from the field of a point mass of the same magnitude in the centre. The
divergence of the gravitational field of the homogeneous sphere is
    div G = Σ_i ∂G_i/∂x_i = 0 for r > R ,    div G = −3γM/R³ for r ≤ R ,

or, after insertion of the mass for r ≤ R ,

    div G = 0 for r > R ,    div G = −4πγρ0 for r ≤ R .
The form is in this case also 'solid angle times density of the source'. The
divergence is not equal to zero (div G ≠ 0) for all points which carry mass
(the sources of the field), it vanishes (div G = 0) for all other points. The
divergence of the field describes the distribution of its sources and their
strength (−4πγρ0). The step of div G (Fig. 5.30) at the surface of the sphere
is a consequence of the sharp edge of the distribution.


Fig. 5.30. Illustration of the divergence of the gravitational field of the homoge-
neous sphere

The second central integral theorem of vector analysis is the theorem


of Stokes. This theorem deals with the concept of ’rotation’, for which it
supplies, among other points, a more graphic interpretation. The theorem
can be stated in the form
A vector function f (x1 , x2 , x3 ) is defined and continuously differ-
entiable on an open surface S in space with an oriented boundary
K(S) . The following relation between a surface integral and a line
integral
    ∫_S rot f · dS = ∮_{K(S)} f · dr

is valid under these assumptions.

The surface integral with the rotation of the vector field is equal to the
line integral over the oriented boundary of the surface S (Fig. 5.31a). The
orientation of the boundary and the orientation of the infinitesimal surface
elements dS are related by the right hand rule (or the right hand grip rule)
(Fig. 5.31b).
The proof of Stokes’ theorem uses the same pattern as the proof of the
divergence theorem. It is more involved though as the two integrals contain
scalar products of vectors in the present case.
The first step is the decomposition of the oriented surface S into subdo-
mains (Fig. 5.32). Each subdomain is characterised by
    I_i = ∮_{R_i} f · dr .

Fig. 5.31. The theorem of Stokes: (a) surface with oriented boundary,
(b) orientation of the infinitesimal elements

Fig. 5.32. Proof of the theorem of Stokes: subdivision and projection

Addition of all subdomains gives
    Σ_{i=1}^{n} I_i = ∮_{K(S)} f · dr

as the contributions of the dividing lines cancel. The same relation holds for
an arbitrarily fine subdivision

    lim_{n→∞} Σ_{i=1}^{n} I_i = ∮_{K(S)} f · dr .

The second step is a decomposition (actually projection) of the curved


subdomains into (sufficiently fine) planar structures (Fig. 5.33a). The re-
sults obtained in this fashion are the contributions parallel to the coordinate
planes, e.g. in the 1 - 2 plane (Fig. 5.33b)
    I_i(1, 2) = ∫_{x1}^{x1+dx1} f1(x1, x2, x3) dx1 + ∫_{x2}^{x2+dx2} f2(x1+dx1, x2, x3) dx2
              + ∫_{x1+dx1}^{x1} f1(x1, x2+dx2, x3) dx1 + ∫_{x2+dx2}^{x2} f2(x1, x2, x3) dx2 .
Fig. 5.33. Proof of the theorem of Stokes: (a) decomposition, spatial view,
(b) projection of an element

This can be sorted according to

    I_i(1, 2) = ∫_{x2}^{x2+dx2} dx2 ∫_{x1}^{x1+dx1} dx1 (∂/∂x1) f2(x1, x2, x3)
              − ∫_{x1}^{x1+dx1} dx1 ∫_{x2}^{x2+dx2} dx2 (∂/∂x2) f1(x1, x2, x3)

and summarised in the form

    I_i(1, 2) = ∫_{B_{i,3}} (rot f)_3 dS3 .

The domain Bi,3 is the projection of Bi onto the 1 - 2 plane. Summation of
all the contributions in this plane yields

    Σ_i I_i(1, 2) = ∫_{B_3} (rot f)_3 dS3 .

The domain B3 is then the projection of S itself on the 1 - 2 plane. It can


again be shown that an equivalent result follows for the domains parallel to
the 2 - 3 and 1 - 3 planes
    Σ_i I_i(2, 3) = ∫_{B_1} (rot f)_1 dS1

    Σ_i I_i(1, 3) = ∫_{B_2} (rot f)_2 dS2 .

The final result is the theorem

    Σ_i I_i = ∮_{K(S)} f · dr = ∫_S (rot f) · dS .

An extension of the theorem is very useful for applications in physics. For


this purpose the assumption has to be used that the vector function is defined
and continuously differentiable in a spatial domain containing the surface S
(rather than only on the surface, Fig. 5.34). The theorem is in this case:
Fig. 5.34. Surfaces with the same oriented boundary curve

A vector function f is defined and continuously differentiable in a spatial
domain G . The relation

    ∮_K f · dr = ∫_{S1} rot f · dS = ∫_{S2} rot f · dS = . . . = ∫_{S_i(K)} rot f · dS

is valid for every surface Si , which is fully embedded in G and which has the
same oriented boundary curve K .
This form of the theorem illustrates again the connection between the
statement rotf = 0 and the path independence of the line integral which
has been discussed in Math.Chap. 5.3.1.
• The line integral over a closed curve vanishes

      ∮_K f · dr = 0

  if the relation rot f = 0 holds in the domain containing the curve.
• The relation ∮_K f · dr = 0 for a closed curve in G implies rot f = 0 in the
  domain as all surface integrals for surfaces with the same boundary vanish
  according to the theorem.
An idea for the interpretation of the concept rotf can be gleaned from
the comparison of two examples: The pattern of the field lines for the vector
field with cylindrical symmetry
    f = ( x g(ρ), y g(ρ), 0 )        ρ = (x² + y²)^{1/2}

is centrally symmetric with respect to every plane parallel to the x - y plane
(Fig. 5.35a). The rotation of this field

    rot f = det [ e_x  e_y  e_z ; ∂_x  ∂_y  ∂_z ; xg  yg  0 ] = e_z ( y g_x − x g_y )

can be calculated with the aid of the chain rule


    g_x = (∂g/∂ρ)(∂ρ/∂x) = (x/ρ) g_ρ        g_y = (∂g/∂ρ)(∂ρ/∂y) = (y/ρ) g_ρ .
The rotation of this field vanishes rotf = 0 for every point in space. This
implies according to the theorem of Stokes that line integrals over closed
curves (around the z -axis, or without inclusion of this axis) vanish as well.
This can easily be checked for a circle about the z -axis. With the parametric
representation
    x = R cos t        y = R sin t        0 ≤ t ≤ 2π
of the curve (the orientation has been chosen to be counter clockwise) one
finds
    ∮_{circle} f · dr = R² g(R) ∫₀^{2π} (− cos t sin t + sin t cos t) dt = 0 .

The answer could also have been guessed, as f is always perpendicular
to dr in this case.

Fig. 5.35. Two vector fields with cylindrical symmetry: (a) radial with respect to
the z -axis, (b) concentric about the z -axis

The field lines of the vector field


f = (−yg(ρ), xg(ρ), 0) ,
which corresponds to the magnetic field B discussed above for g = 1/ρ2 , are
concentric circles about the z -axis (Fig. 5.35b). The rotation of this field is
    rot f = det [ e_x  e_y  e_z ; ∂_x  ∂_y  ∂_z ; −yg  xg  0 ] = e_z ( g + x g_x + g + y g_y ) .

Insertion of the expressions for g_x and g_y as in the previous example gives

    rot f = e_z ( 2g + ρ g_ρ ) .

The rotation does not vanish in general except for a function g which is
singular for points on the z -axis

    g_ρ = −(2/ρ) g    →    g = c/ρ² .
The theorem of Stokes states in this case

    ∮ f · dr ≠ 0 .

The line integral for a circle about the z -axis (traversed counter clockwise)
is
    ∮_{circle} f · dr = R² g(R) ∫₀^{2π} (sin²t + cos²t) dt = 2π R² g(R) .
A line integral with this vector field is in addition path dependent. The
integral for a curve along the sides of a square with the side length 2R about
the z-axis (Fig. 5.36) can be evaluated with the parametric representation

Fig. 5.36. A contour for the line integral of the second vector field with cylindrical
symmetry

    x = R      y = t
    x = −t     y = R
    x = −R     y = −t        −R ≤ t ≤ R .
    x = t      y = −R

The result

    ∮_{square} f · dr = 4R ∫_{−R}^{R} g( (R² + t²)^{1/2} ) dt

can not be evaluated without further specification of g(ρ) . The value for the
magnetic field with g = 1/ρ² is

    ∮_{square} f · dr = 8 arctan(1) = 2π = ∮_{circle} f · dr .
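The equality of the two circulations is easy to confirm numerically. A minimal
sketch in Python (assuming numpy; R and the number of sample points are
illustrative choices):

import numpy as np

g = lambda rho: 1.0 / rho**2
R, n = 1.5, 200_000
t = (np.arange(n) + 0.5) * 2 * np.pi / n

# circle of radius R: the integrand reduces to R^2 g(R)
x, y = R * np.cos(t), R * np.sin(t)
dx, dy = -R * np.sin(t), R * np.cos(t)
rho = np.hypot(x, y)
circ_circle = np.sum(-y * g(rho) * dx + x * g(rho) * dy) * (2 * np.pi / n)

# square of side 2R: the four sides give 4R * int_{-R}^{R} g(sqrt(R^2+t^2)) dt
s = (np.arange(n) + 0.5) * 2 * R / n - R
circ_square = 4 * R * np.sum(g(np.hypot(R, s))) * (2 * R / n)

print(circ_circle, circ_square, 2 * np.pi)   # all ~ 6.2832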

The comparison of the two examples indicates a possible interpretation of
the quantity rot f : the rotation describes the occurrence of closed field lines,
or, in other words, the occurrence of vortices. The field of the first example is
free of vortices, therefore rot f vanishes. The field of the second example is
a simple vortex field.
The theorem of Stokes can be used for a formal definition of the concept
’rotation of a vector field‘: it allows the statement
    (rot f)_normal = lim_{ΔS→0} { (1/ΔS) ∮_{K(ΔS)} f · dr }
for an infinitesimal loop with the area ΔS about a point of the space carrying
a field. The orientation of the corresponding surface is the direction of the
normal. The vector rotf is projected onto this direction in the scalar product
of rotf and ΔS (Fig. 5.37). The line integral which features in the equation


Fig. 5.37. Definition of the rotation of a vector field

above is termed the circulation of the vector field. The rotation is then the
circulation per unit area or specific circulation. The limiting value defines a
quantity which can be called the vortex density.
The connection between a surface integral of (rot f) and a line integral
of f can also be put to use in practical applications. It is, however, not quite
as useful as the Gauss theorem due to the more complicated structure of
the surface integrals. One application of interest is the explanation of the
connection between different methods for the calculation of the contents of
planar surfaces.

• A method, which can be extracted from the discussion of the law of areas
in mechanics (Chap. 2.3.3), determines the contents of a planar area by
tracing the boundary of the area with the aid of a parametric representation
(Fig. 5.38a)
      F1 = (1/2) ∮_K ( x(t) ẏ(t) − y(t) ẋ(t) ) dt .
• The contents of planar surfaces can also be calculated by domain integra-
tion with functions of two variables. A subdivision of the area e.g. by means
of rectangles calls for the evaluation of (Fig. 5.38b)
      F2 = ∫_B dx dy .

Fig. 5.38. The contents of planar surfaces: (a) by tracing the boundary, (b) by
domain integration

A connection between the two methods can be established with the following
argument: rewrite the integrand of the first method as
    F1 = ∮_K f · dr

with

    f = ( −(1/2) y, (1/2) x, 0 )    and    dx = ẋ(t) dt ,  dy = ẏ(t) dt .

Calculate rot f for this field function

    rot f = e_z

and use the theorem of Stokes to obtain

    ∮_K f · dr = ∫_{S(K)} rot f · dS = ∫_{S(K)} 1 · dS_z = ∫_{S(K)} dx dy .

The theorem provides an elegant connection of the two methods for the cal-
culation of the contents of planar surfaces.
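The equivalence can be illustrated numerically for an ellipse with semi-axes
a and b (a minimal sketch in Python, assuming numpy; all parameters are
illustrative choices):

import numpy as np

a, b, n = 3.0, 2.0, 100_000
t = (np.arange(n) + 0.5) * 2 * np.pi / n

# F1: trace the boundary, (1/2) * closed integral of (x ydot - y xdot) dt
x, y = a * np.cos(t), b * np.sin(t)
xdot, ydot = -a * np.sin(t), b * np.cos(t)
F1 = 0.5 * np.sum(x * ydot - y * xdot) * (2 * np.pi / n)

# F2: domain integration, sum the rectangles inside x^2/a^2 + y^2/b^2 <= 1
m = 2000
gx = (np.arange(m) + 0.5) * 2 * a / m - a
gy = (np.arange(m) + 0.5) * 2 * b / m - b
GX, GY = np.meshgrid(gx, gy)
F2 = np.sum((GX / a) ** 2 + (GY / b) ** 2 <= 1.0) * (2 * a / m) * (2 * b / m)

print(F1, F2, np.pi * a * b)   # all ~ 18.85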
The discussion of the basic concepts of vector analysis (in R3 ) can be
summarised in the following fashion: An idea of the structure of a vector
field f (x, y, z) can be obtained by examination of the quantities div f (a
scalar quantity) and rotf (a vector quantity). The divergence describes the
distribution of sources and sinks, the rotation the occurrence of vortices.
Conversely, these quantities can be used for an overall classification of vector
fields:
(1) A field f with div f = 0 is called source free or solenoidal. An example
    is the magnetic field.
(2) A field f with rot f = 0 is called vortex free. Examples are the fields of
    electrostatics or the conservative force fields of mechanics.
(3) A field f with rot f ≠ 0 is called a vortex field.
Finally, two possibilities to extend the discussion are mentioned without
giving any details:
• The integral theorems of Green constitute a variation of the two inte-
gral theorems discussed here. They are considered in Vol. 2 in connection
with the theory of electrostatic fields.
• The statements of the last sections have dealt exclusively with the situation

      f(x1, x2, x3) = (f1(x1, x2, x3), f2(x1, x2, x3), f3(x1, x2, x3)) ,

  that is a vector field with m = n = 3 . They can be extended to the case
  of arbitrary dimensions (with m = n)

      f(x1 . . . xn) = (f1(x1 . . . xn), f2(x1 . . . xn) . . . fn(x1 . . . xn)) .

  The gradient operator is

      ∇ = Σ_{i=1}^{n} e_i (∂/∂x_i) .
The concept ’divergence’ and the divergence theorem can be generalised
in a simple manner. The generalisation of the concept ’rotation’ and the
theorem of circulation is more complicated. Such extensions are e.g. of
interest in the discussion of the theory of relativity. The dimension of the
appropriate space is then n = 4; it is, however, not Euclidean.
6 Differential equations II

The discussion of differential equations is continued in this chapter. It begins


with the second part of the overview which was started in Math.Chap. 2.
The main theme is, however, the investigation of methods for the solution of
classes of differential equations which play a role in applications.

6.1 Further orientation


A general form of an ordinary differential equation for a function of one
variable x = x(t) can be expressed by the implicit equation¹

    F(t, x, x', x'', . . . , x^(n)) = 0 .
The following options exist for the type of solution of these differential equa-
tions:
• Any function, which satisfies the differential equation is called a special
solution (or particular solution).
• Normally the solutions of differential equations of physics are expected
to satisfy additional conditions. The value of the functions or the deriva-
tives are prescribed for particular t -values. Three different situations are
distinguished: initial value problems, boundary value problems or initial-
boundary value problems.
• The general solution is the most general function which satisfies the differ-
ential equation. The general solution of an ordinary differential equation
of n -th order contains n integration constants (see Math.Chap 2.1). The
particular solutions indicated above can be obtained from the general so-
lution.
The diversity encountered for differential equations is underscored once more
by an additional set of examples:
1. The differential equation of the exponential function x' = kx, with a
   given constant k .
¹ The notation for derivatives of first, second respectively n -th order is
  x', x'', . . . , x^(n) in this chapter.

2. The general differential equation for oscillations x'' + ax' + bx + c = f(t)
   with the constants a, b, c and the driving function f(t) .
3. The differential equation of the confluent hypergeometric function

       t x'' + (c − t) x' − a x = 0

   with constants a, c . This function is one of the special (this means non-
   elementary) functions of mathematical physics (some remarks on special
   functions are found in Math.Chap. 6.3.3).
4. The differential equation of a loaded beam x^(4) = f(t) . The function
   f(t) represents the variable load in the horizontal (variable t) and x(t)
   the bending of the beam under the load.
5. A product of fantasy (x^(3))² + (x')⁴ + (sin t) x − x⁵ = 0 .
6. Another product of this kind e^{x'} = 1 + t .
A rough classification uses, besides the term ’order’ of a differential equa-
tion (see Math.Chap. 2.1), the term degree of a differential equation. Use
of this term is possible if the differential equation can be written as a poly-
nomial in the derivatives. The power of the highest derivative is referred to
as the degree of the differential equation. The standard form of a differential
equation, which can be characterised by a degree, is
    f1(x, t) (x^(n))^{n1} + f2(x, t) (x^(n−1))^{n2} + · · · + f_{n+1}(x, t) = 0 .
This is a differential equation of n -th order and n1 -th degree. The superficial
classification of the differential equations listed above is
    example : 1   2   3   4   5   6
    order   : 1   2   2   4   3   1
    degree  : 1   1   1   1   2   1 .
The only comment, which might be necessary, concerns the degree of
example 6 . This differential equation can be reformulated as x' = ln(1 + t)
by taking the logarithm.
A function of several variables
x = x(t1 , t2 , . . . , tn )
can be characterised by a partial differential equation. The general im-
plicit form contains the variables, the function and its partial derivatives²

    F(t1, . . . , tn ; x ; x_{t1}, . . . , x_{tn} ; x_{t1t1}, x_{t1t2}, . . .) = 0 .
The task is still the determination of the function x on the basis of the differ-
ential equation specified. The implementation of this task is in general more
involved than the solution of ordinary differential equations. It has to be stated
though, that most of the basic differential equations of theoretical physics
² The partial derivatives are denoted as x_{t1} = (∂x(t1, . . .)/∂t1) , etc. Compare
  Math.Chap. 4.

(Maxwell equations, Schroedinger equation, . . .) are partial differential equa-


tions. Two examples from the field of mechanics are the wave equation (see
Chap. 6.1.4) and the Poisson equation of the potential theory
    ΔV(r) = ∂²V/∂x² + ∂²V/∂y² + ∂²V/∂z² = 4πρ(r) .
The problem, which is e.g. posed by the Poisson equation, is the determi-
nation of the potential function V (r) of a given mass distribution which is
described by the density function ρ(r) . The solution of partial differential
equations is discussed in detail in Vol. 2 against the background of electro-
dynamics.
Other forms of differential equations, which play a role in mathematical
physics, are
• Systems of differential equations. A set of functions of one variable
{x1 (t), x2 (t), . . . , xn (t)} can be characterised by a set of (coupled) differ-
ential equations
    F1(t; x1, . . . , xn ; x1', . . . , xn' ; x1'', . . .) = 0
    ...
    Fk(t; x1, . . . , xn ; x1', . . . , xn' ; x1'', . . .) = 0 ,
generally with k = n . One example is Newton's equations of motion in
the three-dimensional world for one point mass, in detail

    m ẍ_i = F_i(t; x1, x2, x3 ; ẋ1, ẋ2, ẋ3)        i = 1, 2, 3

or in vector shorthand

    m r̈ = F(t, r, ṙ) .
A statement, which is of use for the proof of theorems and for numer-
ical applications, is: a differential equation of n -th order can always be
reformulated as a system of n differential equations of first order and vice
versa. The differential equation of second order F(t, x, x', x'') = 0 can, for
example, be replaced by the system

    x1' = x2        F(t, x1, x2, x2') = 0

by using the substitutions x = x1 and x' = x2 .
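This reduction is also the standard route for numerical work. A minimal
sketch in Python (assuming scipy is available; the oscillator x'' + x = 0 with
x(0) = 1, x'(0) = 0 is an illustrative choice):

import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, X):
    x1, x2 = X
    return [x2, -x1]            # x1' = x2 , x2' = -x1 , i.e. x'' = -x

sol = solve_ivp(rhs, (0.0, 10.0), [1.0, 0.0], dense_output=True, rtol=1e-9)
print(sol.sol(np.pi)[0])        # ~ cos(pi) = -1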
• Integral equations. A typical example for an integral equation is
    x(t) = f(t) + ∫₀^t K(t, t̃) x(t̃) dt̃ .

The function f (t) and the ’kernel’ of the integral K(t, t̃) are specified.
The task is the determination of the function x(t) which occurs under the
integral sign. There exists a relation between differential equations and
integral equations which is similar to the relation between differentiation
and integration. This relation is relevant for both formal as well as practical
aspects.

• Integro-differential equations are characterised by the fact that the


function to be determined and its derivatives occur in an integral equation.
The derivative can also be found under the integral sign, as e.g. in the
following example
    d²x(t)/dt² = f(t) + ∫₀^t [ K1(t, t̃) x(t̃) + K2(t, t̃) (dx(t̃)/dt̃) ] dt̃ .
• A last option are difference-differential equations which will be indi-
cated by an explicit example. The differential equation for the exponential
function
    dx(t)/dt = k x(t)        with the solution x(t) = c e^{kt}
describes a particular form of increase or decrease. The verbal character-
isation of this differential equation is: the change of the quantity x(t) is
proportional to this quantity at the time t . The differential equation
    dx(t)/dt = k x(t − τ)        with τ = const.
looks similar, but the structure of its solution is entirely different. The
seemingly harmless variant: the change of the quantity x(t) is proportional
to the quantity at an earlier or (even !) a later time, is responsible for the
structural change. Time lag effects play a role for the discussion of control
mechanisms and other feedback control systems.
The actual discussion of this chapter concentrates, however, on the question of
the solution of the ordinary differential equation F(t, x, x', x'', . . . , x^(n)) = 0 .

6.2 Differential equation of first order


One might have thought that differential equations of first order F(t, x, x') = 0
could all be solved analytically. This is not the case. Only specific classes of
differential equations of first order, for which an analytic solution can be ob-
tained, can be identified. It might even be more correct to say 'be obtained
in principle', as some of the integrals at issue might not be calculated easily.
A first order differential equation, which can be reduced to a quadrature, is
called solvable. A differential equation, which can not be associated with any
of the classes of solvable differential equations, will probably have to be solved
numerically.
The most important differential equations of first order, which can be
solved analytically, are introduced below. The simplest is the differential
equation of first degree with the general form

    f(t, x) x' + g(t, x) = 0 .
It is opportune to suppress the distinction between the dependent variable x
and the independent variable t by writing

f (t, x) dx + g(t, x) dt = 0 .
The replacement of the differential quotient by differentials can be justified
rigorously (see Math.Chap. 2.2.1). But not even all differential equations of
first order and first degree can be solved analytically. This is only possible if
the functions f and g possess a special form. The simplest case can be treated
with the method of separation of variables, which will be developed further
below.

6.2.1 Separation of variables and transformation of variables

The differential equation


f (t, x) dx + g(t, x) dt = 0
can be solved directly by integration
    ∫^x ( f2(x̃)/g2(x̃) ) dx̃ + ∫^t ( g1(t̃)/f1(t̃) ) dt̃ = c
if the functions f and g factorise in the form
f (t, x) = f1 (t) f2 (x) and g(t, x) = g1 (t) g2 (x) .
There are no integration constants on the left hand side, all constants are
summarised in the constant c . Separation of variables yields e.g. for the
differential equation
    x' = 1/(2xt)

the solution

    x² − ln t = c .
It is often required to use an explicit form instead of the implicit expression
which is found here, e.g. with the resolution
    x = ± √(c + ln t) .
The actual form of the constant of integration can be changed without a
problem. The completely equivalent solution
    x = ± √( ln(c1 t) )
can be obtained by writing c = ln c1 . Resolution with respect to the variable
t gives, with another renaming of the constant of integration,
    t = c2 e^{x²} .
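The same family of solutions can be recovered symbolically. A minimal sketch
in Python (an assumption that sympy is available):

import sympy as sp

t = sp.symbols('t', positive=True)
x = sp.Function('x')
sol = sp.dsolve(sp.Eq(x(t).diff(t), 1 / (2 * x(t) * t)))
print(sol)   # the two branches x(t) = ±sqrt(C1 + log(t))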
A differential equation, for which a separation of variables is not possible,
can in certain circumstances be reduced to a separable form by a transforma-
tion of variables. The differential equation (t + x)dt + dx = 0 is not directly
212 6 Differential equations II

separable. It can, however, be brought into a separable form with the trans-
formation
    v = t + x        dv = dt + dx .

The transformed differential equation

    v dt + dv − dt = 0    or    (v − 1) dt + dv = 0

has the implicit solution t + ln(v − 1) = c which can be rewritten with the
steps

    e^t (v − 1) = c    and    e^t (t + x − 1) = c

in the explicit form x = c e^{−t} − t + 1 . There exist no definite rules for finding
a suitable transformation. Everything rests on the so-called 'mathematical
intuition' which is not easily defined.

6.2.2 The total differential equation


The differential equation
f (t, x) dx + g(t, x) dt = 0
is a total differential equation (also called exact differential equation)
if the expression is the total differential of an implicit function of two variables
z(t, x) = c . The way to recognise a total differential equation and its general
solution can be taken from the material collected in Math.Chap. 4.2.4. The
following statements are valid for an explicit function of two variables z =
z(t, x) which is continuously differentiable in a domain G 3 .
• The total differential of the function is
      dz = (∂z/∂t) dt + (∂z/∂x) dx .
• The mixed partial derivatives of second order agree (theorem of Schwarz)
      ∂²z/∂t∂x = ∂²z/∂x∂t .
• The line integral
      ∫_K dz = z(t, x) − z(t0, x0)
along a curve, which connects the points (t0 , x0 ) and (t, x), is path inde-
pendent.
• The implicit function z(t, x) = c describes the contours of the function
z = z(t, x) in the t - x plane so that
      dz = (∂z/∂t) dt + (∂z/∂x) dx = 0 .
The increment of the function z = z(t, x) along a contour line is zero.
3
All partial derivatives up to second order exist and are continuous.

The definition of a total differential equation of the form


f (t, x) dx + g(t, x) dt = 0 ,
which can be extracted from these statements, implies that the functions (the
coefficients) of the differential equation are partial derivatives of a function
z = z(t, x)
    g(t, x) = ∂z(t, x)/∂t    and    f(t, x) = ∂z(t, x)/∂x
and that the general solution is the implicit function z(t, x) = c . The condi-
tion (condition of integrability)
    ∂g(t, x)/∂x = ∂f(t, x)/∂t
is a sufficient and necessary condition for the presence of a total differential
equation.
The solution is obtained by line integration
    ∫_K ( g(t, x) dt + f(t, x) dx ) = z(t, x) − z(t0, x0) .

The simplest possible path can be chosen as the line integral is (under the
conditions stated) path-independent. For a decomposition with a lower
path parallel to the axes according to Fig. 6.1 it follows that

    ∫_{t0}^{t} g(t̃, x0) dt̃ + ∫_{x0}^{x} f(t, x̃) dx̃ = c .

Fig. 6.1. Line integration

The first section runs parallel to the t -axis with x0 fixed, the second parallel
to the x -axis with t fixed. The upper path with

    ∫_{t0}^{t} g(t̃, x) dt̃ + ∫_{x0}^{x} f(t0, x̃) dx̃ = c ,

that is a t -integration for fixed x and an x -integration for fixed t0 , could have
been chosen as well. The starting position can be chosen freely. A change of
the starting position corresponds to a renaming of the constant of integration.
The actual process of solution is in general much simpler than the de-
scription above. This will be demonstrated in terms of two examples. The
differential equation of the first example is
(3t2 x2 + t2 )dt + (2t3 x + x2 )dx = 0 .
The first step is a check whether the differential equation is indeed exact
with the aid of the condition of integrability. This condition is satisfied as one
finds gx = ft = 6t2 x . The second step is the execution of the line integration.
Several options will be presented for the present example as an exercise.
The path (Fig. 6.2) begins at the origin (often a good choice of the starting
point), runs along the t -axis to the point (t, 0) and then parallel to the x -axis.
The corresponding integral
    ∫₀^t t̃² dt̃ + ∫₀^x (2t³x̃ + x̃²) dx̃ = c

leads to the implicit solution

    (1/3) t³ + t³x² + (1/3) x³ = c .

Fig. 6.2. Variation of the path of integration: (0, 0) → (t, 0) → (t, x)

The second path (Fig. 6.3a) runs along the x -axis to the point (0, x) and
then parallel to the t -axis to the point (t, x). This gives
    ∫₀^t (3t̃²x² + t̃²) dt̃ + ∫₀^x x̃² dx̃ = c
with the same solution as before.
A third variant uses a similar path as in the first case, but the starting
position is the point (1, 1) . The path shown in Fig. 6.3b connects the points
(1, 1), (t, 1) and (t, x) with straight lines parallel to the axes. The integral
is here

    ∫₁^t 4t̃² dt̃ + ∫₁^x (2t³x̃ + x̃²) dx̃ = c1 .

The result

    [ (4/3) t³ − 4/3 ] + [ t³x² + (1/3) x³ − t³ − 1/3 ] = c1

agrees with the previous ones after renaming c1 + 5/3 = c .

Fig. 6.3. Variation of the paths of integration: (a) (0, 0) → (0, x) → (t, x),
(b) (1, 1) → (t, 1) → (t, x)
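The exactness check and the line integration of the first example can be
delegated to a computer algebra system. A minimal sketch in Python (an
assumption that sympy is available; the helper symbols tt, xx are
illustrative names):

import sympy as sp

t, x, tt, xx = sp.symbols('t x tt xx')
g = 3 * t**2 * x**2 + t**2        # coefficient of dt
f = 2 * t**3 * x + x**2           # coefficient of dx

print(sp.diff(g, x) - sp.diff(f, t))   # 0: condition of integrability

# line integration along (0,0) -> (t,0) -> (t,x)
z = sp.integrate(g.subs({x: 0, t: tt}), (tt, 0, t)) \
    + sp.integrate(f.subs(x, xx), (xx, 0, x))
print(sp.expand(z))                    # t**3/3 + t**3*x**2 + x**3/3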

The second example is the differential equation


(x cos t) dt + (sin t) dx = 0 .
It can be seen without further calculation that the condition of integrability
gx = ft = cos t is satisfied. The solution can be guessed as well (a method
which is quite acceptable). The question, which function possesses the partial
derivatives zt = x cos t and zx = sin t , can be answered with z = x sin t . A
check is advisable if the solution is found by guessing. The explicit differen-
tial equation of the present example is x' = −x cot t . The derivative of the
function x = c/sin t is indeed

    x' = −(c cos t)/sin²t = −x cot t .
It should also be noted that differential equations of the form
f (x)dx + g(t)dt = 0 ,
which can be solved by separation of variables, are a special case of the exact
differential equation, as ft = gx = 0 . On the other hand not every differential
equation of the form
f (t, x) dx + g(t, x) dt = 0

is exact. It can, however, be stated that every differential equation of this class
can be converted into an exact differential equation (at least in principle).
The keyword is ’integrating factor’.

6.2.3 The integrating factor

This topic is best introduced with a simple example. The differential equation
    t dx − x dt = 0

is not exact, as ft = 1 and gx = −1 so that ft ≠ gx . The solution is
nonetheless simple. Write

    dx/x − dt/t = 0

and obtain ln x − ln t = c1 , which can also be written as (after renaming
the constant of integration) x = ct . The total differential of the solution in
implicit form

    d(x/t) = (1/t) dx − (x/t²) dt = (1/t²) (t dx − x dt) = 0
t t t t
gives a hint, how the differential equation could have been solved in a different
fashion. The differential equation, which results from multiplication of the
original form with 1/t²

    (1/t) dx − (x/t²) dt = 0

is exact (ft = gx = −1/t²) so that it could be solved by line integration.
The function u(t) = 1/t2 is called an integrating factor of the differen-
tial equation at hand. There exists, however, not only one but an arbitrary
number of such factors. Write e.g. the solution in the form t/x = c2 and find
that u(x) = −1/x² is an integrating factor as

    d(t/x) = −(t/x²) dx + (1/x) dt = −(1/x²) (t dx − x dt) = 0 .

A third possibility can be obtained with the modified implicit solution

    arctan(x/t) = c3    with    d arctan(x/t) = (1/(x² + t²)) (t dx − x dt) = 0 .
The function u(t, x) = 1/(x² + t²) is also an integrating factor. A large number
of additional options are possible.
The general argument proceeds as follows: Try to choose an integrating
factor u(t, x) of the differential equation
    u(t, x) f(t, x) dx + u(t, x) g(t, x) dt = 0

in such a way that the relation

    ∂( u(t, x) f(t, x) )/∂t = ∂( u(t, x) g(t, x) )/∂x

is satisfied. This requirement corresponds to a partial differential equation
for the function u

    f(t, x) ∂u(t, x)/∂t − g(t, x) ∂u(t, x)/∂x = u(t, x) ( gx(t, x) − ft(t, x) ) .
The statement ’in principle’ becomes apparent at this point of the argument.
It is possible to demonstrate that solutions of the partial differential equation
above exist (provided the functions f and g have suitable properties). The
partial differential equation can also be solved for a number of basic types, but
shifting the problem does not necessarily make it easier to obtain a solution.
There remains the discussion of a last special case of differential equations
of first order and first degree, the linear differential equation.

6.2.4 Linear differential equation

The general solution of the linear differential equation


    x'(t) + a(t) x(t) = b(t)

can be obtained via the principle of superposition⁴. The general solution of
the inhomogeneous differential equation is the sum of the general solution of
the homogeneous differential equation and a special solution of the inhomo-
geneous differential equation

    xi(t, c) = xh(t, c) + xp(t) .

The solution of the homogeneous differential equation is obtained by separa-
tion of variables

    dx/x + a(t) dt = 0    gives    ln x + ∫^t a(t̃) dt̃ = c1

which can also be expressed in the form (resolve and rename the constant of
integration)

    xh(t, c) = c exp( − ∫^t a(t̃) dt̃ ) .

The special solution of the inhomogeneous differential equation, which is still


needed, can be obtained in simple situations with an appropriate ansatz. The
table shows some examples for which this method leads to a solution
⁴ The statements for the linear differential equation of second order, discussed in
  Math.Chap. 2.2.2, can be carried over to the case of linear differential equations
  of n -th order.

    differential equation    xh          xp (ansatz)           xp
    x' − 3x = 15             c e^{3t}    −                     −5
    x' − x/t = t³            c t         a t⁴                  t⁴/3
    x' + x = cos t           c e^{−t}    a cos t + b sin t     (cos t + sin t)/2 .
This naive approach might not be possible. A general method to find the
special solution of the inhomogeneous differential equation is the method of
the variation of the constant. This method is based on the solution of the
homogeneous differential equation, which can be written as xh (t, c) = c g(t)
with

    g(t) = exp( − ∫^t a(t̃) dt̃ ) .

The ansatz

    xp(t) = c(t) g(t)

for the desired special solution accounts for the term 'variation of the con-
stant'. Insertion of the ansatz into the differential equation gives

    c'(t) g(t) + c(t) [ g'(t) + a(t) g(t) ] = b(t) .

The term in the square brackets corresponds to the homogeneous differential
equation and vanishes. The remaining differential equation c'(t) = b(t)/g(t)
for the function c(t) has the special solution

    c(t) = ∫^t ( b(t̃)/g(t̃) ) dt̃ .
The same results for the solutions xp (t) of the simple examples above are
obtained by application of this method.
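For the third entry of the table this can be traced symbolically. A minimal
sketch in Python (an assumption that sympy is available):

import sympy as sp

t, tt = sp.symbols('t tt')
g = sp.exp(-t)                                    # homogeneous solution of x' + x = 0
c = sp.integrate(sp.cos(tt) / g.subs(t, tt), (tt, 0, t))
xp = sp.expand(c * g)
print(xp)                                         # (sin t + cos t)/2 up to a homogeneous piece
print(sp.simplify(xp.diff(t) + xp - sp.cos(t)))   # 0: xp solves the equation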
The differential equations of first order and first degree, which have been
discussed here, are part of the everyday tools of theoretical physics. The list of
differential equations of this class, which can be solved analytically, can be
enlarged to some extent5 . The next section contains, however, some remarks
on the related class of differential equations of first order and higher degree.

6.2.5 Differential equations of first order and higher degree

The standard form is a polynomial in derivatives of first order

    (x')ⁿ + f1(t, x) (x')^{n−1} + · · · + f_{n−1}(t, x) x' + fn(t, x) = 0 .

Such a polynomial can be factorised

    (x' − g1(t, x)) (x' − g2(t, x)) · · · (x' − gn(t, x)) = 0
⁵ See list of literature.

where it might be necessary to deal with complex functions. The strategy


would be to solve the differential equations for the individual factors and use
these solutions to construct the complete solution. This strategy is illustrated
in terms of two examples.
The factorisation of the differential equation

    (x')² − 4t² = 0

is (x' − 2t)(x' + 2t) = 0 . The individual solutions are

    x' − 2t = 0    →    x = t² + c1
    x' + 2t = 0    →    x = −t² + c2 .

The product (x − t² − c1)(x + t² − c2) is, however, not a solution of the
original differential equation. This is of first order and can therefore contain
only one constant of integration (independent of the degree). One option to
avoid this dilemma is to set c1 = c2 = c . It is then found that the function
(x − c)² − t⁴ = 0 or x = c ± t² is indeed a general solution of the original
differential equation. The functions represent a family of parabolae, shifted
parallel and oriented upwards or downwards.
The second example is, for simplicity's sake, also a differential equation
of second degree

    (x')² − x = 0 .

Factorisation gives the individual equations (x' − √x)(x' + √x) = 0, with the
solutions

    x' = √x     →    √x = (c1 + t)/2
    x' = −√x    →    √x = (c2 − t)/2 .

The construction of the general solution according to the procedure suggested
is

    ( √x − (c + t)/2 ) ( √x − (c − t)/2 ) = 0    or    x = (1/4) (t ± c)² .
The parameter c can take the values −∞ ≤ c ≤ ∞, so that one of the signs
is sufficient. The solution represents a family of parabolae with a minimum
on the t - axis.
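The family x = c ± t² of the first example can be verified directly. A minimal
sketch in Python (an assumption that sympy is available):

import sympy as sp

t, c = sp.symbols('t c')
for x in (c + t**2, c - t**2):
    print(sp.simplify(sp.diff(x, t)**2 - 4 * t**2))   # 0 for both branches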

6.3 Differential equations of second order


Ordinary differential equations of second order

    F(t, x, x', x'') = 0
have been discussed in Math.Chap. 2.2. These examples were
220 6 Differential equations II

• Differential equations in which only one of the other variables appeared


next to the second derivative of the function x
x″ = f(t),      x″ = f(x),      x″ = f(x′) ,
• The linear differential equation with constant coefficients.
The technique for the solution of the first class is a two step method. The
solution is achieved by consecutive integration of two differential equations of
first order. This method can also be used for solvable, but more complicated
differential equations of second order.

6.3.1 Solvable implicit differential equations

The differential equations


F(t, x′, x″) = 0      and      F(x, x′, x″) = 0
are characterised by the fact that the dependent variable x does not occur in
the first case. The independent variable t does not occur in the second case.
The application of the two step method for the equation F(t, x′, x″) = 0
involves the steps:
• Step 1: Substitute x′ = v, x″ = v′ in order to obtain a differential equation of first order
F(t, v, v′) = 0 .
The solution is a one parameter family of curves g(t, v, c1 ) = 0 .
• Step 2: The result of the first step is another differential equation of first
order
g(t, x′, c1) = 0 .
The solution f (t, x, c1 , c2 ) = 0 constitutes the general solution of the orig-
inal differential equation.
The execution of these steps might present some difficulties, however. This is
illustrated by the example of the differential equation

2x′x″ − x″/t + x′/t^2 = 0 .
The substitution in the first step gives

2v v′ − v′/t + v/t^2 = 0      or      (2v − 1/t) dv + (v/t^2) dt = 0 .
It can be checked that this differential equation is exact as
∂(2v − 1/t)/∂t = ∂(v/t^2)/∂v = 1/t^2 .
The general solution, which can be found via line integration, is
v^2 − v/t + c1 = 0 .
The result of the first step is an implicit function (the normal situation). The
following options are possible for further processing:
• Option 1: A direct integration is all that is needed if the result of the first step can be resolved in the form v = x′ = v(t) . This is possible in the present example. The resolution leads to
x′ = (1 ± √(1 − 4c1 t^2)) / (2t) .
The implementation of the second integration is not trivial at all (this is
also the normal situation). The result
x(t) = (1/2) { ln t ± [ √(1 − 4c1 t^2) − ln( (1 + √(1 − 4c1 t^2)) / (2 √c1 t) ) ] } + c2
can be obtained with some effort or with a Table of Integrals. The result
is a complicated curve with two branches.
• Option 2: It is often simpler to resolve the result of the first step in the
form t = t(v). The second differential equation of first order can then be
obtained by combination of the relations

dx = v dt      and      dt = (dt(v)/dv) dv
to give

dx = v (dt(v)/dv) dv .
By integration one finds
x(v) = ∫^v ṽ (dt(ṽ)/dṽ) dṽ + c3 .
This is rewritten by partial integration
x(v) = v t(v) − ∫^v t(ṽ) dṽ + c3 .

The solution is obtained in this option in the form of a parametric repre-


sentation
t = t(v) x = x(v) with A ≤ v ≤ B .
The range of the parameter v follows from the structure of the functions
t(v) and x(v) .
Resolution with respect to t gives for the present example
t(v) = v / (v^2 + c1)
so that the result
x(v) = v^2/(v^2 + c1) − ∫^v ṽ/(ṽ^2 + c1) dṽ + c3
      = v^2/(v^2 + c1) − (1/2) ln(v^2 + c1) + c3
is obtained for the function x(v) . Elimination of the parameter v repro-
duces (after a number of steps) the previous result for x = x(t).
A corresponding procedure can be used for the differential equation F(x, x′, x″) = 0 . The substitutions
x′ = v      and      x″ = v (dv/dx)      (chain rule)
lead to a differential equation of first order G(x, v, dv/dx) = 0 . The solution
is normally again an implicit function g(x, v, c1 ) = 0 . The final result is found
by solution of the differential equation g(x, x′, c1) = 0 .
An example is the differential equation
x(x − 1) x″ + (x′)^2 = 0 .
The substitution leads to
x(x − 1) v (dv/dx) + v^2 = 0      or      dv/v + dx/(x(x − 1)) = 0 .
The solution of this separable differential equation is v = c1 x/(x − 1) so that
the second step has to deal with the differential equation

x′ = c1 x/(x − 1)      or      (1 − 1/x) dx = c1 dt .
The solution and hence the general solution of the original differential equa-
tion of second order is
t = (1/c1) [x − ln x] + c2 .
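The implicit result can be verified symbolically. The following sketch (Python with sympy; the notation is our own) checks that x′ = c1 x/(x − 1), as implied by the implicit solution, satisfies the original differential equation:

import sympy as sp

x, c1 = sp.symbols('x c1')
xp = c1 * x / (x - 1)              # x' expressed through x
xpp = sp.diff(xp, x) * xp          # chain rule: x'' = (dx'/dx) x'
print(sp.simplify(x * (x - 1) * xpp + xp**2))    # prints 0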
These examples demonstrate once more that the discussion of differential
equations of first order is not an unnecessary exercise. The two step method
for differential equations of second order relies on this discussion. There ex-
ist additional classes of differential equations of second order, for which the
two step method can be applied6 . The most important class of second order
differential equations of theoretical physics are, however, the (general) linear
differential equations.
6
See list of literature.

6.3.2 Linear differential equation

The importance of this class of differential equations cannot be appreciated fully within the field of classical mechanics alone. Differential equations of this kind are one of the main tools of electrodynamics and quantum mechanics.
The starting equations are actually partial differential equations which are,
as a rule, reduced to a set of ordinary differential equations (of second order)
by separation of variables. The basic items of electrodynamics, the electro-
magnetic fields, and of quantum mechanics, the wave functions, follow the
principle of superposition. This accounts for the fact that the partial and
finally the ordinary differential equations have to be linear.
General statements concerning the solution of the differential equation
a0(t) x″(t) + a1(t) x′(t) + a2(t) x(t) = b(t)
have been discussed in Math.Chap. 2.2.2. Here is a summary:
• The general solution of the inhomogeneous differential equation is the sum
of the general solution of the homogeneous differential equation and a par-
ticular solution of the inhomogeneous differential equation
xi (t, c1 , c2 ) = xh (t, c1 , c2 ) + xp (t) .
• Two particular solutions of the homogeneous differential equation {x1 (t), x2 (t)}
are linearly independent if the Wronski determinant does not vanish
 
W(x1(t), x2(t)) = x1(t) x2′(t) − x1′(t) x2(t) ≠ 0 .
They represent a fundamental system of solutions in this case.
• The linear combination xh (t) = c1 x1 (t) + c2 x2 (t) of the fundamental solu-
tions is the general solution of the homogeneous differential equation.
The solution of the linear differential equation with constant coefficients has
also been outlined in Math.Chap. 2.2.2. The class of linear differential equa-
tions, for which the coefficients are (more or less simple) polynomials in t, will
be discussed here. This discussion includes the determination of the general solution of the homogeneous differential equation and the preparation of methods
for the calculation of particular solutions of the inhomogeneous differential
equations.
The determination of the particular solution represents an extension of the corresponding problem for differential equations of first order (see Math.Chap. 6.2.3). The ansatz for the case of an inhomogeneous differential equation of second order is
xp (t) = c1 (t)x1 (t) + c2 (t)x2 (t) .
The functions x1 and x2 form a fundamental system and there are two ’con-
stants’ which can be varied. As only one particular solution is required there

exists a certain leeway which can be used to advantage. The first step is the
calculation of the derivative of the ansatz
xp′ = c1′ x1 + c2′ x2 + c1 x1′ + c2 x2′
followed by the demand (this is the leeway) that the coefficient functions ci (t)
satisfy the equation
c1′(t) x1(t) + c2′(t) x2(t) = 0 .
The ansatz for xp, the first derivative xp′ = c1 x1′ + c2 x2′ and the second
derivative are inserted into the inhomogeneous differential equation in the
second step. The result (properly sorted) is
c1 (a0 x1″ + a1 x1′ + a2 x1) + c2 (a0 x2″ + a1 x2′ + a2 x2) + a0 (c1′ x1′ + c2′ x2′) = b .
The expressions in the first two brackets vanish as x1 and x2 are solutions of
the homogeneous differential equation. The two equations
x1(t) c1′(t) + x2(t) c2′(t) = 0
x1′(t) c1′(t) + x2′(t) c2′(t) = b(t)/a0(t)      (a0 ≠ 0)
represent a system of linear equations for the derivatives c1′(t) and c2′(t) . The solution of this system of equations can, e.g., be expressed with Cramer's rule
c1′(t) = − b(t) x2(t) / (a0(t) W(x1(t), x2(t)))      c2′(t) = b(t) x1(t) / (a0(t) W(x1(t), x2(t))) .
A non-trivial solution is assured as neither the Wronski determinant W(t) of the fundamental solutions nor a0(t) vanishes, by assumption. The
functions themselves can be obtained by integration
ci(t) = ∫^t ci′(t̃) dt̃      (i = 1, 2) .

The example below illustrates the method in some detail. The general
solution of the homogeneous part of the differential equation
x″ + x = (cos t)^(−1)
is xh (t) = c1 cos t + c2 sin t . The Wronski determinant is W (cos t, sin t) = 1 .
The system of equations for the derivatives of the constants to be varied is
cos t c1′(t) + sin t c2′(t) = 0
−sin t c1′(t) + cos t c2′(t) = (cos t)^(−1)
so that the result
c1′ = −tan t      and      c1(t) = − ∫^t tan t̃ dt̃ = ln |cos t|
c2′ = 1      and      c2(t) = ∫^t dt̃ = t

can be obtained. The general solution of the inhomogeneous differential equa-


tion of the example is therefore
xi (t, c1 , c2 ) = {c1 + ln | cos t|} cos t + {c2 + t} sin t .
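The individual steps of this example can be retraced with a computer algebra system. A minimal sketch (Python with sympy; the names are our own choice) follows the scheme above for x″ + x = (cos t)^(−1):

import sympy as sp

t = sp.symbols('t')
x1, x2 = sp.cos(t), sp.sin(t)          # fundamental system
b, a0 = 1 / sp.cos(t), 1               # inhomogeneity and leading coefficient

W = sp.simplify(x1 * sp.diff(x2, t) - sp.diff(x1, t) * x2)   # Wronskian, here 1
c1p = -b * x2 / (a0 * W)               # Cramer's rule for c1'(t)
c2p = b * x1 / (a0 * W)                # Cramer's rule for c2'(t)

c1 = sp.integrate(c1p, t)              # yields log(cos(t))
c2 = sp.integrate(c2p, t)              # yields t
print(c1, c2, sp.simplify(c1 * x1 + c2 * x2))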
The method of the variation of the constants is applicable in general but
somewhat cumbersome. The search for a particular solution of inhomoge-
neous, linear differential equations can often be abbreviated by inspection
and use of the superposition principle. The procedure can be gleaned from the following simple example. A particular solution of the differential equation
x″ − 4x = 5 + e^t
can be found by considering the individual differential equations
x″ − 4x = 5      and      x″ − 4x = e^t .
Particular solutions can immediately be written down: xp1 = −5/4 and xp2 = −e^t/3 . The sum xp = −(5/4 + e^t/3) is clearly a particular solution of the full differential equation.

6.3.3 Differential equations of the Fuchs class

An important class of homogeneous linear differential equations of second


order is characterised by coefficient functions in the form of polynomials in t
P0(t) x″ + P1(t) x′ + P2(t) x = 0 .
They are usually, after multiplication with 1/P0 (t), discussed as
x″ + p1(t) x′ + p2(t) x = 0 .
The functions pi are ratios of polynomials. These differential equations are denoted as differential equations of the Fuchs class if the functions pi are weakly singular. This signifies that they can be represented by an expansion in terms of partial fractions
p1(t) = Σ_{k=1}^{m} ak/(t − tk)      and      p2(t) = Σ_{k=1}^{m} [ bk/(t − tk)^2 + ck/(t − tk) ]

with the constants ak , bk , ck . The particular importance of this class of dif-


ferential equations is due to the fact that many ’higher’ functions (Legendre
functions, Bessel functions, confluent hypergeometric functions, etc.), which
are important in theoretical physics, are defined by differential equations of
this kind.
The method, which is used in most cases to obtain the solutions, is an
expansion in terms of power series. This technique is introduced here with
an example which does not yet address the question of higher functions.
The problem is the determination of the general solution of the differential
equation

2t x″ + (t + 1) x′ + 3x = 0 .
The ansatz for the solution is a power series in t multiplied by an arbitrary
power of t

x(t) = t^ρ Σ_{n=0}^{∞} bn t^n = b0 t^ρ + b1 t^(ρ+1) + b2 t^(ρ+2) + . . . .
The additional factor t^ρ serves to capture a possible dependence of the solution on non-integer powers such as t^(−1), t^(1/3), etc. Insertion of the ansatz into the differential equation (after calculation of x′ and x″) and sorting by
powers of t yields the expansion
t^(ρ−1) [2ρ(ρ − 1) b0 + ρ b0]
+ t^ρ [(ρ + 1)(2ρ + 1) b1 + (ρ + 3) b0]
+ . . .
+ t^(ρ+k) [(ρ + k + 1)(2ρ + 2k + 1) b_(k+1) + (ρ + k + 3) b_k]
+ . . . = 0 .
A power series can only have the value zero for all values of the variable if
the coefficients of all powers vanish

Σ_{n=0} dn t^n = 0 −→ dn = 0 for all n .

This condition leads to the following statements for the present case
• The factor of the power t^(ρ−1) is
2ρ(ρ − 1) + ρ = 0 .
This is the indicial or characteristic equation which determines the power
of the prefactor. The roots of the indicial equation in the present example
are ρ1 = 0 and ρ2 = 1/2 .
• The coefficient b0 cannot be determined from the homogeneous differential equation, whose solutions can be multiplied by an arbitrary constant factor. The factors of the remaining powers (t^(ρ+k) with k = 0, 1, . . .) yield the two-term recursion relation
b_(k+1) = − (ρ + k + 3) / ((ρ + k + 1)(2ρ + 2k + 1)) b_k .
The coefficients b1 , b2 , . . . can now be calculated after the choice of one of
the roots of the indicial equation and the value of the coefficient b0 (for
simplicity mostly 1). It can be expected that the different roots correspond
to linearly independent solutions. This has to be checked, however.
The recursion relation of the present example is a two-term relation and can therefore be handled easily. Recursion relations connecting more than two coefficients do occur (one example is the differential equation of the parabolic cylinder function with a three-term recursion) and are more difficult to resolve.
The recursion formula for the root ρ = 0 of the indicial equation is
b_(k+1) = − (k + 3) / ((k + 1)(2k + 1)) b_k .
The resolution
b1 = −3 b0      b2 = −(2/3) b1 = 2 b0
b3 = −(1/3) b2 = −(2/3) b0      b4 = −(3/14) b3 = (1/7) b0
etc.
gives the power series (b0 = 1)
x1(t) = 1 − 3t + 2t^2 − (2/3) t^3 + (1/7) t^4 − (1/45) t^5 + . . .
        + (−1)^k (k + 1)(k + 2)/(2 · 1 · 3 · 5 · · · (2k − 1)) t^k + . . . .
The first question which would have to be answered is the question of the ra-
dius of convergence. After this has been established, further properties of the
function, which is defined by the power series, would have to be investigated
in detail.
The second root of the indicial equation (ρ2 = 1/2), with the recursion relation
b_(k+1) = − (2k + 7) / ((2k + 2)(2k + 3)) b_k ,
leads to
x2(t) = √t [ 1 − (7/6) t + (21/40) t^2 − (11/80) t^3 + . . .
        + (−1)^k (2k + 3)(2k + 5)/(3 · 5 · 2 · 4 · 6 · · · (2k)) t^k + . . . ] .
It can be established that the radius of convergence of both series is infinite.
The general solution of the differential equation can therefore be given in the
form (after checking linear independence)
xh (t) = c1 x1 (t) + c2 x2 (t) .
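The recursion can easily be evaluated by machine. The sketch below (Python with exact rational arithmetic; the function name is our own) generates the first coefficients for both roots of the indicial equation and reproduces the series given above:

from fractions import Fraction

def coefficients(rho, n_max):
    # coefficients b_0 ... b_n_max from the recursion relation, b_0 = 1
    b = [Fraction(1)]
    for k in range(n_max):
        num = rho + k + 3
        den = (rho + k + 1) * (2 * rho + 2 * k + 1)
        b.append(-num / den * b[-1])
    return b

print(coefficients(Fraction(0), 5))        # 1, -3, 2, -2/3, 1/7, -1/45
print(coefficients(Fraction(1, 2), 3))     # 1, -7/6, 21/40, -11/80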
Unfortunately, no general recipe can be given if the coefficient functions
do not have the form of a polynomial in the independent variable. It might,
in some circumstances, be possible to obtain the polynomial form with a
suitable transformation. An important example is the differential equation
of Legendre for a function p(θ) of the polar angle of the spherical coordinates
with 0 ≤ θ ≤ π . The differential equation
d^2p/dθ^2 + cot θ (dp/dθ) + [λ(λ + 1) − μ^2/sin^2 θ] p = 0
with parameters μ and λ can be transformed into the differential equation
(1 − t^2) P″ − 2t P′ + [λ(λ + 1) − μ^2/(1 − t^2)] P = 0
with the substitution

t = cos θ      √(1 − t^2) = sin θ      (−1 ≤ t ≤ 1) .
The function P is a function of t = cos θ and is equivalent to the function p
P (cos θ) ≡ p(θ) . The application of the transformation involved the relations
dp/dθ = −sin θ (dP/dt)      d^2p/dθ^2 = sin^2 θ (d^2P/dt^2) − cos θ (dP/dt)
which are based on the chain rule.
Most of the ’special functions of mathematical physics’ are (as the Leg-
endre functions) defined by differential equations of the Fuchs class. These
functions are discussed in detail in Vol. 2 and Vol. 3 as they play a special
role in electrodynamics and quantum mechanics.

6.4 Addendum: Numerical methods of solution


Approximate numerical methods have to be employed if the solution of a
differential equation can not be obtained analytically. A small selection of
numerical techniques for the solution of initial value problems of differen-
tial equations of first order is presented in this section. The restriction to
differential equations of first order can be justified with the statement that
• differential equations of n -th order (the major interest in physics is second
order) can be recast into the form of a system of n differential equations
of first order and
• the arguments presented below can also be applied to systems of differential
equations of first order.
The problem to be addressed is therefore
Find a solution of the differential equation x′(t) = f(t, x(t)) which satisfies the initial condition x(a) = xa in the interval a ≤ t ≤ b .
The numerical treatment of this problem can either be based on one-step or multi-step techniques. One-step techniques determine the approximate value of the function x at the position t_(k+1) solely from the approximate value x(t_k) of the previous step. In multi-step methods the value of the function x(t_(k+1)) is determined by the approximations at several previous points.
This method can be implemented by a representation of the function f (t, x(t))
in terms of interpolation polynomials which use

• the values of the function x(t) at the positions tk−l , tk−l+1 , . . . , tk (explicit
method)
or
• the values of the function x(t) at the positions tk−l , tk−l+1 , . . . , tk , tk+1 (im-
plicit method).

One-step methods can be handled more easily. An interval [a, b] is divided


into N parts of equal size. The sampling points are
tk = a + kh (k = 0, 1, . . . , N )
so that the step size amounts to h = (b − a)/N . A direct one-step method is
based on the Taylor expansion method. It uses the expansion (assuming
differentiability up to the order required)
x(t + h) = x(t) + h x′(t) + . . . + (h^n/n!) x^(n)(t) + . . .
and obtains the approximate formula by insertion into the differential equa-
tion
 
x(t + h) = x(t) + h [ f(t, x(t)) + . . . + (h^(n−1)/n!) d^(n−1)f(t, x(t))/dt^(n−1) + . . . ]
         = x(t) + h F(t, x(t), h) .
This approach can be applied if the total derivatives, which emerge, can
actually be calculated. An approximate solution of the initial value problem
could then be obtained by successive evaluation of the equation
x(tk+1 ) = x(tk ) + h F (tk , x(tk ), h)
beginning at t0 = a with x(t0 ) = xa .
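As an illustration, a sketch of this Taylor expansion method of second order in Python; the test problem x′ = t + x with x(0) = 1 and its exact solution x(t) = 2e^t − t − 1 are our own choice:

import math

def taylor2_step(t, x, h):
    f = t + x                        # f(t, x) for the test problem x' = t + x
    df = 1 + f                       # total derivative df/dt = f_t + f_x f
    return x + h * (f + h / 2 * df)  # x(t+h) = x(t) + h F(t, x(t), h)

a, b, N = 0.0, 1.0, 100
h = (b - a) / N
t, x = a, 1.0                        # initial condition x(0) = 1
for _ in range(N):
    x = taylor2_step(t, x, h)
    t += h
print(x, 2 * math.exp(1.0) - 2.0)    # approximation vs. exact value x(1)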
Most of the one-step methods are, however, based on the evaluation of
the differential equation from one sampling point to the next
∫_t^(t+h) (dx(t̃)/dt̃) dt̃ = ∫_t^(t+h) dt̃ f(t̃, x(t̃))
or
x(t + h) − x(t) = ∫_t^(t+h) dt̃ f(t̃, x(t̃)) .
Different one-step procedures are obtained by approximating the integral in
this expression
∫_t^(t+h) dt̃ f(t̃, x(t̃)) ≈ I(t, x(t), h) ,      (6.1)
in the simplest cases by direct extension of the approximation formulae of the integral calculus for functions of one variable,
I = ∫_(t0)^(t0+h) dt̃ f(t̃) .

The simplest approximation, which can be used in this case, is the rectan-
gle rule (see Fig. 6.4a). It uses only the lower sampling point t0 of the interval
[t0 , t0 + h], so that
Ir = f (t0 )h .

Fig. 6.4. Evaluation of integrals, simple approximation rules 1: (a) rectangle rule, (b) tangent trapezoidal rule

Variants with a linear approximation of the integrand are the tangent trape-
zoidal rule (Fig. 6.4b)
Itt = f (t0 + h/2)h ,
in which the central point of the interval is used as a sampling point, or the
direct trapezoidal rule (Fig. 6.5a) with
Itr = (1/2) (f(t0) + f(t0 + h)) h .

Fig. 6.5. Evaluation of integrals, simple approximation rules 2: (a) direct trapezoidal rule, (b) Simpson rule

The best known quadrature rule, Simpson’s rule (Fig. 6.5b),


IS = (1/6) (f(t0) + 4f(t0 + h/2) + f(t0 + h)) h      (6.2)
is obtained by first interpolating the function f (t) at the sampling points
t0 , t0 + h/2 and t0 + h in the interval [t0 , t0 + h] with a quadratic function.
The ansatz
fans (t) = a + b(t − t0 ) + c(t − t0 )2
and the determination of the three coefficients by
f (tsample ) = fans (tsample )
yields the interpolation

f(t) ≈ f(t0) + [−3f(t0) + 4f(t0 + h/2) − f(t0 + h)] (t − t0)/h
       + [2f(t0) − 4f(t0 + h/2) + 2f(t0 + h)] ((t − t0)/h)^2 ,
integration over the interval [t0, t0 + h] finally yields the classical formula (6.2).
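A quick numerical check (a sketch with an arbitrarily chosen cubic integrand) confirms that formula (6.2) is exact for polynomials up to degree three:

def simpson(f, t0, h):
    # Simpson's rule (6.2) on the interval [t0, t0 + h]
    return (f(t0) + 4 * f(t0 + h / 2) + f(t0 + h)) * h / 6

f = lambda t: t**3 - 2 * t + 1       # cubic test integrand
print(simpson(f, 0.0, 1.0))          # 0.25, the exact value of the integral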
The application of these rules to the case of a linear differential equation,
in which an integral over the function f (t, x(t)) is involved, leads to the
simplest methods for the iterative numerical integration of the differential
equation. It is useful to denote the interval by the shorthand notation [t, t+h]
(instead of [tk , tk + h]). The simplest approximations for the integral in (6.1)
are
• The Euler-Cauchy method corresponds to the rectangle rule
Ir = hf (t, x(t)) .
• Improvements of the Euler-Cauchy method are based on the direct trape-
zoidal rule
Itr = (h/2) [ f(t, x(t)) + f(t + h, x(t + h)) ]
• and the tangent trapezoidal rule
Itt = hf (t + h/2, x(t + h/2)) .
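A minimal implementation of the first and the last of these rules might look as follows (a sketch in Python; the midpoint value x(t + h/2), which is not available in an explicit method, is approximated here by an Euler half step, and the test problem x′ = −x, x(0) = 1 is our own choice):

import math

def euler_cauchy_step(f, t, x, h):
    return x + h * f(t, x)                     # rectangle rule

def tangent_trapezoid_step(f, t, x, h):
    x_mid = x + h / 2 * f(t, x)                # Euler predictor for x(t + h/2)
    return x + h * f(t + h / 2, x_mid)         # tangent trapezoidal rule

f = lambda t, x: -x                            # exact solution: exp(-t)
for step in (euler_cauchy_step, tangent_trapezoid_step):
    t, x, h = 0.0, 1.0, 0.01
    for _ in range(100):
        x = step(f, t, x, h)
        t += h
    print(step.__name__, x, math.exp(-1.0))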
The trapezoidal rule (tr) and the tangent trapezoidal rule (tt) are the
starting point of the widely used Runge-Kutta method. An expansion of
the function f in the integral I in these approximations in powers of h gives
for instance the results
(Δx)tt ≈ f(t, x) h + [ft(t, x) + fx(t, x) f(t, x)] h^2/2
        + [ftt(t, x) + 2ftx(t, x) f(t, x) + fxx(t, x) f(t, x)^2] h^3/8
(Δx)tr ≈ f(t, x) h + [ft(t, x) + fx(t, x) f(t, x)] h^2/2
        + [ftt(t, x) + 2ftx(t, x) f(t, x) + fxx(t, x) f(t, x)^2] h^3/4 .
The notation Δx = x(t + h) − x(t) and the standard notation for the partial
derivatives of the function f (t, x)
ftx(t, x) ≡ ftx(t, x(t)) = ∂^2 f(t, x(t)) / (∂t ∂x)
have been used to write this result.
On the other hand, expansion of the full integrand in the integral in (6.1)
about the position t leads to
(Δx)exact = ∫_t^(t+h) dt̃ [ f(t, x(t)) + (df(t, x(t))/dt) (t̃ − t) + . . .
          + (1/n!) (d^n f(t, x(t))/dt^n) (t̃ − t)^n + . . . ]
and after integration and evaluation of the total derivatives to the exact
expansion
(Δx)exact = f(t, x) h + (1/2!) [ft(t, x) + fx(t, x) f(t, x)] h^2
          + (1/3!) [ftt(t, x) + 2ftx(t, x) f(t, x) + fxx(t, x) f(t, x)^2
          + fx(t, x)(ft(t, x) + fx(t, x) f(t, x))] h^3 + . . . .
The expansions for (Δx)tt and (Δx)tr agree with the exact result up to
second order in h . Construction of the combination (first suggested by Runge:
C. Runge, Mathematische Annalen, 46 (1895), p. 167)
(Δx)comb1 = (2/3) (Δx)tt + (1/3) (Δx)tr ,
gives a result in which the third-order contribution carries at least the correct prefactor; two of the third-order terms are, however, still missing.
The agreement can be improved if the form
(Δx)ans1 = (1/2) [ k0(t, x) + k2(t, x) ]      (6.3)
with
k0(t, x) = f(t, x) h
k1(t, x) = f(t + h, x + k0(t, x)) h
k2(t, x) = f(t + h, x + k1(t, x)) h

is taken into consideration. Expansion in terms of powers of h gives in this


case
(Δx)ans1 = f(t, x) h + (1/2) [ft(t, x) + fx(t, x) f(t, x)] h^2
          + (1/4) [ftt(t, x) + 2ftx(t, x) f(t, x) + fxx(t, x) f(t, x)^2
          + 2fx(t, x)(ft(t, x) + fx(t, x) f(t, x))] h^3 + . . . ,

so that the combination
(Δx)comb2 = (4/3) (Δx)ans1 − (1/3) (Δx)tt
agrees now up to third order with the exact expansion.
The basic idea indicated here
• Use a general ansatz following the pattern

(Δx)RK = Σ_{n=0} an kn(t, x)      (6.4)
with
k0(t, x) = f(t, x) h
. . .
kn(t, x) = f(t + αn h, x + Σ_{i=0}^{n−1} βni ki(t, x)) h      (n ≥ 1)

and
• determine the parameters of the ansatz (as far as possible) by expansion
in powers of h and comparison with the exact expansion
has been employed by W. Kutta (W. Kutta, Z. für Mathematik und Physik,
46 (1901), p. 435). It is the basis for a large number of classical and of
modern variants of the Runge-Kutta method (see list of literature e.g. in
M. Abramowitz and I. Stegun).
The ansatz suggested has the virtue that optimal intermediate sampling points in the interval [t, t + h] can be obtained by fixing αn (0 ≤ αn ≤ 1), and that each direction kn(t, x(t))/h is determined by a polygonal line along the already calculated directions, starting at the initial point. In addition to the adaptation to the exact expansion, the postulate is used that each of the intermediate points should lie on the tangent line to the integral curve of f(t, x(t)) through the initial point with an error of better than second order.
This postulate is implemented by
αn = Σ_{i=0}^{n−1} βni

as well as

Σ_{n=0} an = 1 .

A Runge-Kutta formula, which is widely employed, represents an approx-


imation which agrees with the exact expansion up to fourth order. The cal-
culation of four values of the function is necessary in fourth order. The com-
parison with the exact expansion and the requirement concerning the four
sampling points correspond to eight equations for the ten parameters
a0, a1, a2, a3, β10, β20, β21, β30, β31, β32 .
Two of the parameters can be chosen freely. The set of parameters with
a0 = a3 = 1/6      a1 = a2 = 1/3
β10 = 1/2
β20 = 0      β21 = 1/2
β30 = 0      β31 = 0      β32 = 1
results in
(Δx)RK = (h/6) [ k0(t, x) + 2k1(t, x) + 2k2(t, x) + k3(t, x) ]
with the auxiliary functions
k0(t, x) = f(t, x)
k1(t, x) = f(t + h/2, x + (h/2) k0(t, x))
k2(t, x) = f(t + h/2, x + (h/2) k1(t, x))
k3(t, x) = f(t + h, x + h k2(t, x)) .
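These formulas translate directly into a short program. The sketch below (Python; the test problem x′ = −x is again our own choice) implements one classical Runge-Kutta step and iterates it:

import math

def rk4_step(f, t, x, h):
    k0 = f(t, x)
    k1 = f(t + h / 2, x + h / 2 * k0)
    k2 = f(t + h / 2, x + h / 2 * k1)
    k3 = f(t + h, x + h * k2)
    return x + h / 6 * (k0 + 2 * k1 + 2 * k2 + k3)

f = lambda t, x: -x                  # exact solution: exp(-t)
t, x, h = 0.0, 1.0, 0.1
for _ in range(10):                  # integrate from t = 0 to t = 1
    x = rk4_step(f, t, x, h)
    t += h
print(x, math.exp(-1.0))             # agreement to about seven digits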
The Simpson rule for integration of a function f(t) can be recovered for the simple differential equation x′(t) = f(t) . Insertion of the solution
x(t) = xa + ∫^t dt̃ f(t̃)
gives the Runge-Kutta functions
k1(t) = k2(t) = f(t + h/2)      k3(t) = f(t + h)
and hence the original Simpson formula.
The following points would complement this brief introduction to approx-
imation methods. The respective literature should be consulted for a proper
discussion.

• A system of M differential equations of first order is characterised by


xi′(t) = fi(t, x1(t), . . . , xM(t))      with      xi(a) = xia      (i = 1, . . . , M) .
The arguments presented above can be carried over directly. The set of
Runge-Kutta formulae with the integral Ii,RK
xi (t + h) = xi (t) + Ii,RK (t, x1 (t), . . . , xM (t))
agrees with the one obtained for the case of a single differential equation
Ii,RK = (h/6) [ ki,0 + 2ki,1 + 2ki,2 + ki,3 ]
if the auxiliary functions
ki,0(t, x1, . . . , xM) = fi(t, x1, . . . , xM)
ki,1(t, x1, . . . , xM) = fi(t + h/2, x1 + (h/2) k1,0(t, . . .), . . . , xM + (h/2) kM,0(t, . . .))
ki,2(t, x1, . . . , xM) = fi(t + h/2, x1 + (h/2) k1,1(t, . . .), . . . , xM + (h/2) kM,1(t, . . .))
ki,3(t, x1, . . . , xM) = fi(t + h, x1 + h k1,2(t, . . .), . . . , xM + h kM,2(t, . . .))
are used.
• Explicit or implicit approaches are also possible for one-step methods. The
trapezoid rule without an expansion (and approximation) of x(t + h) leads
e.g. to the implicit one-step procedure
x(t + h) = x(t) + (h/2) [ f(t, x(t)) + f(t + h, x(t + h)) ] .
The quantity x(t + h) has to be determined by solution of this implicit equation; a sketch of such a scheme follows after this list.
• Concerning the mathematical foundation of approximation methods for the
solution of differential equations the following points have to be considered.
1. The question whether the approximate solution at the sampling points
approaches the exact solution in the limit h → 0 (for the initial condi-
tion given) has to be answered. This point is discussed under the headings consistency and convergence. In the case of consistency the question is investigated whether the error for the approximation of the integral decreases uniformly in the limit h → 0 . Consistency alone is, however, not sufficient to assure convergence. Convergence implies that the sequence of solutions really approaches the exact solution of the initial

value problem with vanishing step size. This happens only if the func-
tion f (t, x(t)) satisfies an additional condition concerning continuity (the
Lipschitz condition). Convergence can be demonstrated for all the meth-
ods discussed above.
2. An additional question of interest is: of which order in h is the error for the different methods, and do explicit estimates of the error exist?
• The question of roundoff errors has to be raised from a more practical point of view. Too small a choice of the step size favours the accumulation of roundoff errors; a step size which is too large leads, on the other hand, to an inaccurate representation of the integral. The optimal choice of the step size is therefore not an easy matter. In practical calculations the stability of the solution is usually assessed by varying the number of sampling points.
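As announced above, here is a sketch of the implicit trapezoidal one-step procedure (in Python; the fixed-point iteration as solver of the implicit equation and the test problem x′ = −x are our own choices):

import math

def implicit_trapezoid_step(f, t, x, h, iterations=20):
    x_new = x + h * f(t, x)                    # explicit Euler start value
    for _ in range(iterations):                # fixed-point iteration
        x_new = x + h / 2 * (f(t, x) + f(t + h, x_new))
    return x_new

f = lambda t, x: -x                            # exact solution: exp(-t)
t, x, h = 0.0, 1.0, 0.1
for _ in range(10):
    x = implicit_trapezoid_step(f, t, x, h)
    t += h
print(x, math.exp(-1.0))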
7 Complex numbers and functions

7.1 Definitions
The solutions of quadratic equations are not necessarily real numbers. For
instance, application of the standard method for the solution of the equation
x2 − 10x + 40 = 0
yields

x = 5 ± √(−15) .
It is necessary to extend the system of real numbers and the corresponding
rules of arithmetic in a suitable and consistent manner. The first step is the
introduction of the imaginary unit

i = √(−1)      with      i^2 = −1
so that the solution of the example can be written as

x = 5 ± i√15 ,
that is, a complex number with a real and an imaginary part. Complex numbers have to be represented in a plane, in contrast to real numbers, which can be represented on the number line. The complex number
z = x + iy      (x, y real)

Fig. 7.1. The complex number z = x + iy

is represented in the complex (number) plane which is spanned by a real


axis and a perpendicular imaginary axis. An analogy with vectors in a two-
dimensional space can be recognised immediately (compare Math.Chap. 3.1.2).

7.2 Fundamental rules of complex arithmetic

The basic statement concerning the four fundamental rules of arithmetic is:
it is possible to use complex numbers formally in the same manner as real
numbers. Addition and subtraction are executed by adding or subtracting
the real parts and the imaginary parts separately
z = z1 ± z2
= (x1 + iy1 ) ± (x2 + iy2 )
= (x1 ± x2 ) + i(y1 ± y2 ) .
The graphical representation of addition is indicated in Fig. 7.2.

Fig. 7.2. Addition of two complex numbers

Multiplication corresponds to the application of the distributive law, using


i^2 = −1:
z = z1 · z2 = (x1 + iy1) · (x2 + iy2)
  = (x1x2 − y1y2) + i(x1y2 + x2y1) .
Division can, as usual, be defined as the inverse operation with respect to
multiplication. The equation z1 · z = z2 with the unknown number z = x + iy
can be written as
(x1 x − y1 y) + i(x1 y + xy1 ) = x2 + iy2 .
Comparison of real and imaginary parts leads to
x1 x − y1 y = x2 , x1 y + xy1 = y2 (7.1)
and finally by resolution with respect to x, y to
x = (x1x2 + y1y2)/(x1^2 + y1^2) ,      y = (x1y2 − y1x2)/(x1^2 + y1^2) .      (7.2)
The same result can be obtained by extending the fraction z2 /z1 in a suitable
fashion
z = z2/z1 = ((x2 + iy2)(x1 − iy1)) / ((x1 + iy1)(x1 − iy1))
and explicitly calculating numerator and denominator following the rules for
multiplication
= ((x1x2 + y1y2) + i(x1y2 − x2y1)) / (x1^2 + y1^2) .
Division by the complex number z = 0 + i0 does, naturally, not make sense.
The four fundamental arithmetic operations reduce to the corresponding rules for real numbers if the imaginary parts of the numbers z1 and z2 vanish. The usual extensions, as e.g. raising a complex number z to
the power n , where n is a natural number, with rules like
z n · z m = z n+m ,
follow directly from a consequent application of the fundamental rules.
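These rules are built into most programming languages. A short check (in Python, whose complex type implements exactly this arithmetic; the sample values are arbitrary) compares the built-in division with formula (7.2):

z1, z2 = 3 + 4j, 1 - 2j                        # arbitrary sample values

x1, y1, x2, y2 = z1.real, z1.imag, z2.real, z2.imag
x = (x1 * x2 + y1 * y2) / (x1**2 + y1**2)      # formula (7.2)
y = (x1 * y2 - y1 * x2) / (x1**2 + y1**2)

print(z2 / z1)                                 # built-in: (-0.2-0.4j)
print(complex(x, y))                           # formula:  (-0.2-0.4j)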
The trigonometric representation of complex numbers (Fig. 7.3) is
very useful besides the Cartesian representation introduced above. This ap-
plies in particular to the visualisation of multiplication and division. The real

Fig. 7.3. Trigonometric decomposition of a complex number

and the imaginary parts of a complex number z = x + iy are represented as


x = ρ cos φ y = ρ sin φ
in this case. Such a representation is possible for every real pair of num-
bers x, y . The inverse relation for the transition from the Cartesian to the
trigonometric representation is
ρ = √(x^2 + y^2)      tan φ = y/x .

The quantity ρ is the magnitude of the complex number |z| = ρ . An an-


gle φ ≡ arc z is needed for a characterisation of the complex number be-
sides the magnitude. The trigonometric decomposition corresponds (compare
Chap. 2.4) to the use of polar coordinates in R2 .
The concept of the distance (or separation) of two complex numbers is
needed for the discussion of limiting processes in the complex plane. A natural
definition of this quantity is the magnitude of the difference (Fig. 7.4)
|z| = |z2 − z1 | = |z1 − z2 | .

Fig. 7.4. Distance of two complex numbers

The multiplication of two complex numbers in the trigonometric repre-


sentation
z1 · z2 = ρ1 ρ2 (cos φ1 + i sin φ1 )(cos φ2 + i sin φ2 )
can be carried out by sorting and use of the sum formulae for the sine and
the cosine
= ρ1 ρ2 (cos(φ1 + φ2 ) + i sin(φ1 + φ2 )) .
The result can also be quoted in the form
|z1 z2 | = |z1 ||z2 |
arc(z1 z2 ) = arc z1 + arc z2 .
The product in the trigonometric representation can therefore be charac-
terised by the statement: the magnitude of the product is the product of the
magnitudes, the angle is the sum of the angles (Fig. 7.5).
A corresponding statement is valid for the division
z1/z2 = (ρ1/ρ2) (cos(φ1 − φ2) + i sin(φ1 − φ2)) ,
|z1/z2| = |z1|/|z2|      arc(z1/z2) = arc z1 − arc z2 .
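Both statements can be confirmed numerically, e.g. with Python's cmath module (a sketch with arbitrary sample magnitudes and angles):

import cmath

z1 = 2.0 * cmath.exp(1j * 0.5)       # magnitude 2, angle 0.5
z2 = 3.0 * cmath.exp(1j * 1.2)       # magnitude 3, angle 1.2

print(cmath.polar(z1 * z2))          # (6.0, 1.7): magnitudes multiply, angles add
print(cmath.polar(z1 / z2))          # (0.666..., -0.7): magnitudes divide, angles subtract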
The division of complex numbers is illustrated in Fig. 7.6. The special case

Fig. 7.5. The product of two complex numbers

Fig. 7.6. Division of complex numbers

of the inverse of a complex number with z1 = 1 gives
1/z2 = (1/ρ2) (cos φ2 − i sin φ2) .
The inverse of the complex number z2 is obtained by reflection of the corre-
sponding point at the unit circle followed by a reflection at the real axis (the
sequence can be interchanged). The image of a complex number z, which has
been reflected at the unit circle, can be obtained, as indicated in Fig. 7.7, by
elementary geometrical means. The construction is based on the theorem of
intersecting lines in the form
OA/1 = 1/|z|
or
|z| · |z′| = 1 .
An additional operation, which is used often, is complex conjugation

Fig. 7.7. Illustrating the intersecting lines theorem

z = x + iy z ∗ = x − iy ,
in trigonometric form
z ∗ = ρ(cos φ − i sin φ) .

Fig. 7.8. Complex conjugation

This corresponds to a reflection at the real axis (Fig. 7.8). The product z z ∗
is clearly a real number.
Multiple application of the rules for multiplication gives the Moivre for-
mula
z^n = [ρ(cos φ + i sin φ)]^n = ρ^n (cos nφ + i sin nφ) .
Application of the binomial theorem for (cos φ + i sin φ)n followed by a sep-
aration of real and imaginary parts results in a representation of cos nφ and
sin nφ in terms of powers of cos φ and sin φ, as for instance in the simplest
case
(cos φ + i sin φ)^2 = (cos^2 φ − sin^2 φ) + 2i sin φ cos φ = cos 2φ + i sin 2φ .

The standard double-angle relations can be read off here. Corresponding formulae for triple, quadruple, . . . angles can be generated relatively easily.
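A short numerical check of the Moivre formula (the angle and the power are arbitrary sample values):

import math

phi, n = 0.7, 5
lhs = complex(math.cos(phi), math.sin(phi))**n
rhs = complex(math.cos(n * phi), math.sin(n * phi))
print(abs(lhs - rhs))                # of the order 1e-16, rounding errors only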

7.3 Elementary functions

A function of one complex variable


w = f (z)

Fig. 7.9. Domain of definition and codomain of a function of one complex variable

maps a given domain of the complex plane (domain of definition) onto a


different domain of the complex plane (codomain). The situation is best rep-
resented in terms of two planes: the original z plane and the explicit image
plane (w).
Simple examples are:
• The linear function w = az + b with complex constants a, b .
Every point of the plane is shifted by the constant b for a = 1, b ≠ 0 (Fig. 7.10). The parameters a ≠ 0, b = 0 (Fig. 7.11) describe a rotation with stretching.
• The function w = 1/z represents a reflection at the unit circle plus complex conjugation. The fixed points of this mapping are the points ±1 on the real axis (Fig. 7.12).
• The quadratic function w = z^2 maps points of the z plane onto a doubly covered image plane (Fig. 7.13). Points of the upper half plane are mapped onto the complete w plane (as φ_w = 2φ_z). A corresponding statement holds
Fig. 7.10. w = f(z) = 1 · z + b
Fig. 7.11. w = f(z) = a · z
Fig. 7.12. w = f(z) = 1/z
Fig. 7.13. w = f(z) = z^2



for the lower half plane. An unambiguously reversible mapping is obtained


by the use of a representation of the map on Riemann surfaces.

upper half plane (without positive real axis),
Im(z) > 0, plus Im(z) = 0 with Re(z) < 0 −→ full w plane, the first sheet;
lower half plane (without negative real axis),
Im(z) < 0, plus Im(z) = 0 with Re(z) > 0 −→ a second sheet of the w plane.

As one ought to return to the first sheet of the w plane after a full rev-
olution, one has to imagine that the two sheets of the w plane are cut
along the positive real axis and are then connected crosswise. Figure 7.14
illustrates the situation as viewed in the direction of the axis.

Fig. 7.14. Riemann surfaces (the two sheets, joined along the positive real axis)

The doubly covered w plane, which is connected along the positive real axis, is referred to as the Riemann surface of the function w = z^2 . Each point of this surface
is endowed with two values (on the lower and the upper sheet) except the
origin (w = z = 0), which only occurs once. This point is called a branch
point of the surface. The advantage of the Riemann construction for the
function w = z 2 is the unambiguous mapping of the simply covered z plane
onto the two sheets of the w plane.
The complete analysis of functions of one real variable can be carried
over to the complex case in spite of the slightly more involved form of the
representation. Topics, which would have to be covered (see Mathematical
Supplement of Vol. 2), are e.g.
• sequences of numbers with complex members and their convergence,
• infinite series including power series,
• limiting values of functions,
• continuity and differentiability of functions.
(Compare Math.Chap. 1 for a corresponding discussion of these points for
functions of one real variable).

For instance, the definition of the derivative of a complex function f (z)


at the position ζ is
df/dz |_ζ = f′(ζ) = lim_(z→ζ) (f(z) − f(ζ)) / (z − ζ) .      (7.3)
The point ζ can be approached from all sides in the plane. This fact is respon-
sible for some significant differences with respect to the analysis of functions
of one real variable.
Higher complex functions are usually defined by their power series. For
instance, the series


Σ_{n=0}^{∞} (1/n!) z^n = 1 + z + . . . = e^z
is called the complex exponential function as it represents the usual expo-
nential function for real values of z.
A short indication of the properties of this function is
1. The power series converges absolutely for every z . This allows (by mul-
tiplication of the power series and sorting) the derivation of the formula
e^(z1) e^(z2) = e^(z1+z2) .
2. This formula leads to the relation
e^z = e^(x+iy) = e^x e^(iy)      (x, y real) .
Consideration of the power series for eiy and sorting with respect to real
and imaginary parts then yields


e^(iy) = Σ_{n=0}^{∞} (iy)^n / n!
       = Σ_{k=0}^{∞} (−1)^k y^(2k)/(2k)! + i Σ_{k=0}^{∞} (−1)^k y^(2k+1)/(2k+1)!
       = cos y + i sin y .
One extracts
|e^z| = e^(Re(z))      arc(e^z) = Im(z) .
3. One finds with the power series for e^(−iy)
e^(−iy) = cos y − i sin y .
From the statements for e^(±iy) follow, after resolution with respect to cos y or sin y, the relations
cos y = (1/2) (e^(iy) + e^(−iy))
sin y = (1/(2i)) (e^(iy) − e^(−iy)) .

These are the relations which are used often for a representation of oscil-
lations or of wave phenomena.
4. Other frequently used relations are
e^(2πi) = 1      e^(πi) = e^(−πi) = −1      e^(πi/2) = i      e^(−πi/2) = −i .
The multiplication formula given above states therefore in particular
e^(z+2πi) = e^z .
The complex exponential function is a periodic function with the period1
2πi . This implies that this function maps a fundamental strip of the
z plane (the standard choice is −π < Im z ≤ π) onto the entire w plane
(Fig. 7.15).

Fig. 7.15. Domain of definition of the fundamental strip and corresponding codomain of the function e^z

This extremely compact sketch of complex analysis should be augmented by consultation of the list of literature. The discussion of complex functions is continued in Vol. 2.

1
Note: in the direction of the imaginary axis.
8 List of literature

The following listing contains textbooks with mathematical topics which are
available in book shops and in libraries. The books are listed separately for
the different fields. The alphabetic order does not refer to the level or the
quality of the presentation.

Reference books
Handbooks, general

• I.N. Bronstein, K.A. Semendjajew: ’Handbook of Mathematics’ (Springer


Verlag, Berlin, 2007)
• A.C. Fischer-Cripps: ’The Mathematics Companion’ (Institute of Physics,
Bristol, 2005)
• A. Jeffrey: ’Handbook of Mathematical Formulae and Integrals’ (Elsevier,
Amsterdam, 2008)
• L. Råde, B. Westergren: ’Mathematics Handbook for Science and Engi-
neering’ (Springer Verlag, New York, 1999)
• J. Harris, H. Stöcker: ’Handbook of Mathematical and Computational Sci-
ence’ (Springer Verlag, New York, 1998)
• E. Zeidler (editor): ’Oxford Users’ Guide to Mathematics’ (Oxford Univer-
sity Press, Oxford, 2004)
• D. Zwillinger ’Standard Mathematical Tables and Formulae’ (CRC Chap-
man and Hall, London, 2003)

Special functions

• M. Abramowitz, I. Stegun: ’Handbook of Mathematical Functions’ (Dover,


New York, 1964, tenth printing 1972, electronic version: see web)
• G. Andrews, R. Askey, R. Roy: ’Special Functions’ (Cambridge University
Press, Cambridge, 2000)
• A. Erdelyi, W. Magnus, F. Oberhettinger, F.G. Tricomi: ’Higher transcen-
dental Functions’ Vols. I and II (McGraw Hill, New York, 1953 and 1954,
reprint 1981 by R.E. Krieger Publ. Co.)

• W. Magnus, F. Oberhettinger, R.P. Soni: ’Formulae and Theorems for the


Special Functions of mathematical Physics’ (Springer Verlag, New York,
1966)
• N.M. Temme: ’Special Functions, an Introduction’ (Wiley, New York, 1996)
• Z.X. Wang, D.R. Guo: ’Special Functions’ (World Scientific, Singapore,
1989)

Tables of integrals

• A. Apelblat: ’Tables of Integrals and Series’ (Verlag H. Deutsch, Frank-


furt/M, 1996)
• Y. Brytschkow, O. Maritschew, A. Prudnikow: ’Integrals and Series’ (Nauka,
Moscow, 1981) IN RUSSIAN
• I.S. Gradshteyn, I.M. Ryzhik: ’Tables of Integrals, Series and Products’
(Academic Press, New York, 1998 and electronic version 2007)
• W. Gröbner, N. Hofreiter: ’Integraltafel’ Band I und II (Springer Verlag,
Wien, 1975 and 1973) IN GERMAN
• R.J. Tallarida: ’Pocketbook of Integrals and Mathematical Formulae’ (CRC
Chapman and Hall, London, 2009)
• see also the relevant sections of the general handbooks

Special areas

Linear Algebra

• S. Lang: ’Linear Algebra’ (Springer Verlag, Berlin, 2004)


• J.F. Montague, E.F. Robertson, T.S. Blyth: ’Basic Linear Algebra’ (Springer
Verlag, Berlin, 2002)
• D.J.S. Robinson: ’A Course in Linear Algebra with Applications’ World
Scientific, Singapore, 2006)
• G. Strang: ’Introduction to Linear Algebra’ (Cambridge University Press,
Cambridge, 2009)
• G. Strang: ’Linear Algebra’ (Springer Verlag, Berlin, 2003)

Analysis

• R.G. Bartle, D.R. Sherbert: ’Introduction to Real Analysis’ (Wiley and


Sons, New York, 2000)
• L. Brand: ’Advanced Calculus’ (Dover, New York, 2006)
• R.C. Buck: ’Advanced Calculus’ (Waveland Press, Long Grove, 2004)
• R. Courant, F. John: ’Introduction to Calculus and Analysis’ Vols. I and
II/1, II/2 (Springer Verlag, Heidelberg, 1989)
• S. Lang: ’A First Course in Calculus’ (Springer Verlag, Heidelberg, 1998)

• S. Lang: ’Calculus of Several Variables’ (Springer Verlag, Heidelberg, 1996)


• G.V. Limaye, S.R. Ghorpade: ’A course in Calculus and Real Analysis’
(Springer Verlag, Heidelberg, 2006)

Vector Analysis

• K. Jaenisch: ’Vector Analysis’ (Springer Verlag, Heidelberg, 2001)


• P.C. Matthews: ’Vector Calculus’ (Springer Verlag, Heidelberg, 1998)
• M. Spiegel: ’Schaum’s Outline of Vector Analysis’ (McGraw-Hill, New
York, 1959)

Ordinary Differential Equations

• V.I. Arnold: ’Ordinary Differential Equations’ (MIT Press, Boston, 1998)


• E.A. Coddington, ’An Introduction to Ordinary Differential Equations’
(Dover, New York, 1989)
• L. Collatz: ’Differential Equations: an Introduction with Applications’
(John Wiley, New York, 1986)
• H. Heuser: ’Ordinary Differential Equations’ (Teubner Verlag, Wiesbaden,
1991)
• A.D. Polyanin, V.F. Zajitsev: ’Handbook of Exact Solutions for Ordinary
Differential Equations’ (Chapman and Hall, London, 2002)
• M. Tenenbaum, H. Pollard: ’Ordinary Differential Equations’ (Dover, New
York, 1985)
• W. Walter: ’Ordinary Differential Equations’ (Springer Verlag, Heidelberg,
1998)

Theory of Functions – Complex Variables

• L. Ahlfors: ’Complex Analysis’ (McGraw-Hill, New York, 1979)


• R.V. Churchill, J.W. Brown: ’Complex Variables and Applications’ (McGraw-
Hill, New York, 1989)
• J.M. Howie: ’Complex Analysis’ (Springer Verlag, Heidelberg, 2003)
• K. Knopp: ’Theory of Functions’ (Dover Publications, New York, 1996)
• S.G. Krantz, S. Kress, R. Kress: ’Handbook of Complex Variables’ (Birkhaeuser
Verlag, Heidelberg, 1999)
• S. Lang: ’Complex Analysis’, (Springer Verlag, Heidelberg, 2003)
Index

Bessel’s inequality, 27 – complex conjugation, 241


Grassmann’s expansion, 65 – distance, 240
– inverse, 241
approximation method – Moivre formula, 242
– Euler-Cauchy, 231 – trigonometric representation, 239
– multi-step, 228 constant of integration, 34
– numerical, 228 continuity, 7, 114
– one-step, 228 convergence, 4
– Runge-Kutta, 231 – criteria, 18
– Simpson’s rule, 231 – majorant criterion, 18
– Taylor expansion, 229 – quotient criterion, 20
– trapezoidal rule, 231 – radius, 21
– root criterion, 19
basis vector, 58 – Taylor series, 21
binomial formula, 15 coordinate system
binomial symbol, 15 – Cartesian, 58
boundary value problem, 37 – plane polar coordinates, 131
branch point – reciprocal, 69
– Riemann surface, 245 – spherical, 117, 126, 131
Cramer’s rule, 97
calculation rules
– complex numbers, 238 decomposition into components, 60
– determinant, 103 – contravariant, 69
– line integration, 172 – covariant, 69
– matrix, 84 delta function, 193
– matrix multiplication, 81 derivative, 9
– nabla operator, 170 determinant
– partial differentiation, 128 – calculation rules, 103
– vector, 67 – Cramer’s rule, 97
– vectors, 61 – definition, 97, 102
Cartesian components, 60 – expansion, 103
Cauchy principal value, 32 – rule of Sarrus, 100
chain rule, 128 difference-differential equation, 210
characteristic equation, 47, 226 differentiability, 9
classification differential
– of functions, 10 – total, 126, 212
complex number, 237 differential equation
– calculation rules, 238 – boundary value problem, 37

– characteristic equation, 47 – normalisation condition, 105


– condition of integrability, 213 Euler-Cauchy method, 231
– constant of integration, 34 exponential function
– degree, 208 – complex, 246
– Fuchs class, 225
– fundamental system, 223 factorial
– higher degree, 218 – Gamma function, 136
– implicit, 35, 220 flux, 185
– indicial equation, 226 function
– initial value problem, 38 – classification, 10
– Laplace, 118 – complex
– linear, 44, 223 – – branch point, 245
– – general solution, 44 – – exponential function, 246
– – inhomogeneous, 44, 49 – – fundamental strip, 247
– – solution, 223 – – Riemann surfaces, 245
– linear, second order, 47 – complex variables, 243
– numerical approximation of solution, – continuity, 7
228 – – n variables, 114
– order, 33, 208 – definition, 1
– ordinary – differentiability, 9
– – second order, 219 – differentiation
– – separation of variables, 211 – – partial, 114
– partial, 39, 117, 208 – directional derivative, 119
– particular solution, 37, 223 – explicit, 107
– principle of superposition, 45, 217 – family of curves, 34
– recursion relation, 226 – implicit, 110
– separation of variables, 40 – limiting value, 6
– superposition principle, 225 – – n variables, 113
– systems, 209 – periodic, 25
– transformation of variables, 211 – points of discontinuity, 8
– variation of the constant, 218 – three variables, 111
differential quotient, 9 – two variables, 107
differentiation fundamental strip, 247
– partial, 114
– – chain rule, 128 Gamma function, 135
– vector fields, 167 gradient, 168
directional cosines, 121 gradient operator, 122
directional derivative, 119 – planar polar coordinates, 130
distribution, 193 – spherical coordinates, 131
divergence
– definition, 168 Hesse canonical form, 63
– illustration, 191
divergence theorem, 188 imaginary unit, 237
domain integral, 140 indicial equation, 226
double integral, 139 initial value problem, 38
inner product, 54
eigenvalue problem, algebraic, 104 integral
– eigenvalue, 105 – with f (x)
– eigenvector, 105 – – Cauchy principal value, 32

– – complete elliptic, 164 – multiplication with number, 78


– – elliptic, 163 – null, 79
– – improper, 28 – regular/singular, 84
– – improper, limiting value, 29 – row vector, 77
– with f (x, y) – rules of calculus, 79
– – arbitrary domain, 141 – similar, 78
– – domain integral, 140 – square, 76
– – double integral, 139 – unit, 83
– – elliptic, 134 Moivre formula, 242
– – rule for substitution, 147
– – simple, 132 nabla calculus, 170
– – simple, rule of differentiation, 137 nabla operator, 169
– – simple,extended rule of differentia-
operator
tion, 138
– gradient, 122
– with f (x, y, z)
– Laplace, 118
– – fixed limits, 152
– nabla, 169
– – rule for substitution, 158
Orthogonality relations
– – variable limits, 154
– trigonometric functions, 26
integral equation, 209
integro-differential equation, 210 parallelepipedal product, 64
inverse matrix, 83 particular solution, 37
power series, 12
Jacobian, 150 – binomial series, 14
– exponential function, 13
Kronecker symbol, 59 – geometrical series, 15
– Taylor expansion, 13
Laplace operator, 118, 170 – trigonometric functions, 14
– plane polar coordinates, 131 principle of superposition, 217
– spherical coordinates, 131 product
Levi-Civita symbol, 59 – matrix, 79
limiting value, 6, 113
– improper integral, 29 Riemann surfaces, 245
line integration right hand grip rule, 178
– calculation rules, 172 rotation
– definition line integral, 212 – definition, 168
– path of integration, 173 – – formal, 204
linear independence, 45, 99 – interpretation, 201
rule
matrix – for matrix multiplication, 81
– addition, 78 – for scalar product, 55
– calculation rules, 84 – for vector product, 58
– column vector, 77 – human figure, 57
– definition, 76 – matrix calculus, 79
– difference, 79 – of Sarrus, 100
– inverse, 83 – right hand, 178, 198
– invertible, 84 – right hand grip, 178
– main diagonal, 77 – right hand rule, 57
– multiplication, 79 – screw, 178, 198
– – rules, 81 – screw rule, 57

Runge-Kutta method, 231 – linear, R2 , 86


– linear, R3 , 91
scalar, 51 – orthogonal, 88, 92
scalar field, 165 – parity operation, 97
scalar product, 54 – projection, 87, 92
– nonorthogonal decomposition, 71 – reflections
separation of variables, 40 – – R3 , 95
sequences – rotation, 75
– convergence, 4 – – R3 , 92
– limiting value, 4 – rotation plus stretching, 87, 92
– numerical sequence, 4 – translation, 86
series trapezoidal rule, 231
– Fourier, 25 triple vector product, 65
– function, 24
– numerical, 16 unit matrix, 83
– partial sum, 17 unit vector, 58
– power, 12
Simpson’s rule, 231 variation of the constant, 218
space vector
– Euclidian, 60 – Grassmann’s expansion, 65
– – n-dimensional, 68 – addition, 53, 61
– vector – basis, 58
– – n-dimensional, 67 – calculation rules
spat product, 64 – – n-dimensional, 67
Stokes – decomposition
– theorem of, 198 – – contravariant, 69
superposition principle, 45, 225 – – covariant, 69
surface integral, 177 – decomposition into components
– double covering, 182 – – Cartesian, 60
– evaluation, 179 – definition, 51
– orientation, 180 – human figure rule, 57
symbol – inner product, 54
– binomial, 15 – linear independence, 99
– Kronecker, 59 – multiplication with scalar, 53, 61
– Levi-Civita, 59 – orthogonality, 55
– outer product, 56
Taylor expansion, 13 – parallelepipedal product , 64
– method of , 229 – reciprocal coordinate system, 69
tensor – right hand rule, 57
– metric, 68 – scalar product, 54, 61
theorem – screw rule, 57
– of Gauss, 188 – spat product, 64
– – proof, 189 – subtraction, 54, 61
– of Schwarz, 119, 212 – triple vector product, 65
– of Stokes, 198 – unit, 58
– – proof, 198 – vector product, 56, 62
transformation vector field, 165
– coordinate, 72 – circulation, 204
– Euler angles, 95 – classification, 206

– differentiation, 167 – definition, 165


– divergence, 168 – line integrals, 171
– flux, 185 – surface integral, 177
– nabla operator, 169 vector product, 56
– rotation, 168 vortex field, 206
– source/sink, 188
vector function Wronski determinant, 46, 223
