
Notes on

Classical and Quantum Mechanics

Jos Thijssen

February 10, 2005



These notes have been developed over several years for use with the courses Classical and Quantum
Mechanics A and B, which are part of the third year applied physics degree program at Delft University
of Technology. Parts of these notes stem from courses which I taught at Cardiff University of
Wales, UK.
These notes are intended to be used alongside standard textbooks. For the classical part, several
texts can be used, such as the books by Hand and Finch (Analytical Mechanics, Cambridge University
Press, 1999) and Goldstein (Classical Mechanics, third edition, Addison Wesley, 2004), the
older book by Corben and Stehle (Classical Mechanics, second edition, Dover, 1994, reprint of 1960
edition), and the textbook by Kibble and Berkshire (Classical Mechanics, 5th edition, World Scientific,
2004). The part on classical mechanics is more self-contained than the quantum part, although
consultation of one or more of the texts mentioned is essential for a thorough understanding of this
subject.
For the quantum mechanics part, we use the book by D. J. Griffiths (Introduction to Quantum
Mechanics, Second Edition, Pearson Education International/Prentice Hall, 2005). This is a very
nice, student-friendly text which, however, has two drawbacks. Firstly, the informal way in which
the material is covered has led to an inconsistent use of Dirac notation; very often, the wavefunction
formalism is used instead of the linear algebra notation. Secondly, the book does not go into
modern applications of quantum mechanics, such as quantum cryptography and quantum computing.
Hopefully these notes remedy that situation. Other books which are useful for learning this material
are Introductory Quantum Mechanics by Liboff (fourth edition, Addison Wesley, 2004) and
Quantum Mechanics by Bransden and Joachain (second edition, Prentice Hall, 2000). Many more
standard texts are available; we mention here Quantum Mechanics by Basdevant and Dalibard
(Springer, 2002) and, by the same authors, The Quantum Mechanics Solver (Springer, 2000).
Finally, the older text by Messiah (North Holland, 1961), and the books by Cohen-Tannoudji, Diu and
Laloë (2 vols., John Wiley, 1996), by Gasiorowicz (John Wiley, 3rd edition, 2003) and by Merzbacher
(John Wiley, 1997) can all be recommended.
Not all the material in these notes can be found in undergraduate standard texts. In particular, the
chapter on the relation between classical and quantum mechanics, and those on quantum cryptography
and on quantum information theory are not found in all books listed here, although Liboff’s book
contains a chapter on the last two subjects. If you want to know more about these new developments,
consult Quantum Computing and Quantum Information by Nielsen and Chuang (Cambridge, 2000).
Along with these notes, there is a large problem set, which is more essential than the notes themselves.
There are many things in life which you can only learn by doing them yourself. Nobody would
seriously believe you can master a sport or a musical instrument by reading books. For
physics, the situation is exactly the same. You have to learn the subject by doing it yourself; even by
failing to solve a difficult problem you learn a lot, since in that situation you start thinking about the
structure of the subject.
In writing these notes I benefited from numerous discussions with, and advice from, Herre van der Zant and
Miriam Blaauboer. I hope the resulting set of notes and problems will help students learn and appreciate
the beautiful theory of classical and quantum mechanics.


Contents

Preface

1 Introduction: Newtonian mechanics and conservation laws
    1.1 Newton’s laws
    1.2 Systems of point particles – symmetries and conservation laws

2 Lagrange and Hamilton formulations of classical mechanics
    2.1 Generalised coordinates and virtual displacements
    2.2 d’Alembert’s principle
    2.3 Examples
        2.3.1 The pendulum
        2.3.2 The block on the inclined plane
        2.3.3 Heavy bead on a rotating wire
    2.4 d’Alembert’s principle in generalised coordinates
    2.5 Conservative systems – the mechanical path
    2.6 Examples
        2.6.1 A system of pulleys
        2.6.2 Example: the spinning top
    2.7 Non-conservative forces – charged particle in an electromagnetic field
        2.7.1 Charged particle in an electromagnetic field
    2.8 Hamilton mechanics
    2.9 Applications of the Hamiltonian formalism
        2.9.1 The three-pulley system
        2.9.2 The spinning top
        2.9.3 Charged particle in an electromagnetic field

3 The two-body problem
    3.1 Formulation and analysis of the two-body problem
    3.2 Solution of the Kepler problem

4 Examples of variational calculus, constraints
    4.1 Variational problems
    4.2 The brachistochrone
    4.3 Fermat’s principle
    4.4 The minimal area problem
    4.5 Constraints
        4.5.1 Constraint forces
        4.5.2 Global constraints

5 From classical to quantum mechanics
    5.1 The postulates of quantum mechanics
    5.2 Relation with classical mechanics
    5.3 The path integral: from classical to quantum mechanics
    5.4 The path integral: from quantum mechanics to classical mechanics

6 Operator methods for the harmonic oscillator
    6.1 Introduction
    6.2 The harmonic oscillator

7 Angular momentum
    7.1 Spectrum of the angular momentum operators
    7.2 Orbital angular momentum
    7.3 Spin
    7.4 Addition of angular momenta
    7.5 Angular momentum and rotations

8 Introduction to Quantum Cryptography
    8.1 Introduction
    8.2 The idea of classical encryption
    8.3 Quantum Encryption

9 Scattering in classical and in quantum mechanics
    9.1 Classical analysis of scattering
    9.2 Quantum scattering with a spherical potential
        9.2.1 Calculation of scattering cross sections
        9.2.2 The Born approximation

10 Symmetry and conservation laws
    10.1 Noether’s theorem
    10.2 Liouville’s theorem

11 Systems close to equilibrium
    11.1 Introduction
    11.2 Analysis of a system close to equilibrium
        11.2.1 Example: Double pendulum
    11.3 Normal modes
    11.4 Vibrational analysis
    11.5 The chain of particles

12 Density operators – Quantum information theory
    1 Introduction
    2 The density operator
    3 Entanglement
    4 The EPR paradox and Bell’s theorem
    5 No cloning theorem
    6 Dense coding
    7 Quantum computing and Shor’s factorisation algorithm

Appendix A Review of Linear Algebra
    1 Hilbert spaces
    2 Operators

Appendix B The time-dependent Schrödinger equation

Appendix C Review of the Schrödinger equation in one dimension


1 Introduction: Newtonian mechanics and conservation laws

In this lecture course, we shall introduce some mathematical techniques for studying problems in
classical mechanics and apply them to several systems. In a previous course, you have already met
Newton’s laws and some of their applications. In this chapter, we briefly review the basic theory, and
consider the interpretation of Newton’s laws in some detail. Furthermore, we consider conservation
laws of classical mechanics which are connected to symmetries of the forces, and derive these conservation
laws starting from Newton’s laws.

1.1 Newton’s laws

The aim of a mechanical theory is to predict the motion of objects. It is convenient to start with point
particles which have no dimensions. The trajectory of such a point particle is described by its position
at each time. Denoting the spatial position vector by r, the trajectory of the particle is given as r(t),
a three-dimensional function depending on a one-dimensional coordinate: the time. The velocity is
defined as the time-derivative of the vector r(t), and by convention it is denoted as ṙ(t):

$$\dot{\mathbf r}(t) = \frac{d}{dt}\,\mathbf r(t), \tag{1.1}$$
and the acceleration a is defined as the second derivative of the position vector with respect to time:

a(t) = r̈(t). (1.2)

The last concept we must introduce is that of momentum p: it is defined as

p = mṙ(t), (1.3)

where m is the mass. Although we have an intuitive idea about the meaning of mass, this is also a
rather subtle physical concept, as is clear from the frequent confusion of mass with the concept of
weight (see below).
Now let us state Newton’s laws:

1. A body not influenced by any other matter will move at constant velocity.

2. The rate of change of momentum of a body is equal to the force, F:

$$\frac{d\mathbf p}{dt} = \mathbf F(\mathbf r, t). \tag{1.4}$$


Table 1.1: Forces for various systems. The symbol $m_i$ stands for the mass of point particle $i$, $q_i$ stands for the electric
charge of particle $i$, B is a magnetic field and E an electric field. $G$, $\epsilon_0$ and $g$ are known constants. The gravitational
and the electrostatic forces are directed along the line connecting the two particles i = 1, 2.

Forces in nature
System                                  Force
Gravity                                 $F_G = G\,\dfrac{mM}{r_{12}^2}$
Gravity near earth’s surface            $\mathbf F_g = -mg\,\hat{\mathbf z}$
Electrostatics                          $F_C = \dfrac{1}{4\pi\epsilon_0}\,\dfrac{q_1 q_2}{r_{12}^2}$
Particle in an electromagnetic field    $\mathbf F_{EM} = q\,(\mathbf E + \dot{\mathbf r} \times \mathbf B)$
Air friction                            $\mathbf F_{fr} = -\gamma\,\dot{\mathbf r}$

3. When a particle exerts a force F on another particle, then the other particle exerts a force on the
first particle which is equal in magnitude but opposite in direction to the force F; these forces are
directed along the line connecting the two particles. Denoting the particles by indices 1 and 2,
the force exerted on 1 by 2 by $\mathbf F_{1,2}$ and the force exerted on 2 by 1 by $\mathbf F_{2,1}$, we have:

$$\mathbf F_{1,2} = -\mathbf F_{2,1} = \pm F_{1,2}\,\hat{\mathbf r}_{1,2}, \tag{1.5}$$

where $F_{1,2}$ is the magnitude of the force and $\hat{\mathbf r}_{1,2}$ is a unit vector pointing from $\mathbf r_1$ to $\mathbf r_2$.
The ± denotes whether the force is repulsive (−) or attractive (+).

Some remarks about these laws are in order. It is questionable whether the second law is really
a statement, as a new vector quantity, called ‘force’, is introduced, which is not yet defined. Only if
we know the force can we predict how a particle will move. In that sense, a real ‘law’ is only formed
by combining Newton’s second law with an explicit expression for the force. In table 1.1,
known forces are given for several systems. Note that the force generally depends on the position r,
on the velocity ṙ, and also explicitly on time (e.g. when an external, time-varying field is present). An
implicit dependence on time is further provided by the time dependence of the position vector r(t).
In most cases, the mass is taken to be constant, although this is not always true: you may think of
a rocket burning its fuel, or disposing of its launching system, or bodies moving at a speed of the order
of the speed of light, where the mass deviates from the rest mass. With constant mass, the second law
reads
$$m\ddot{\mathbf r}(t) = \mathbf F(\mathbf r, t). \tag{1.6}$$
In fact, the second law disentangles two ingredients of the motion. One is the mass m, which is
a property of the moving particle which is acted upon by the force, and the other the force, which
itself arises from some external origin. In the case of gravitational interaction, the force depends
on the mass, which drops out of the equation of motion. Generally, mass can be described as the
resistance to velocity change, as the second law states that the larger the mass, the smaller the change
in velocity (for the same force). It is an experimental fact that the mass which enters the expression
for the gravitational force is the same as this universal quantity mass, which occurs for any force in
the equation of motion. The weight is the gravity force acting on a body.
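To illustrate how the second law becomes predictive once an explicit force is supplied, the following minimal sketch integrates Eq. (1.6) numerically for a particle subject to gravity near the earth's surface and to air friction, both taken from table 1.1. The integration scheme (a simple Euler-type update), the parameter values and the names `force` and `integrate` are assumptions made for this illustration only; they are not part of these notes.

```python
import numpy as np

def force(v, m, g=9.81, gamma=0.5):
    """Gravity near the earth's surface plus air friction (table 1.1).

    The values of g and gamma are assumed, illustrative numbers.
    """
    return np.array([0.0, 0.0, -m * g]) - gamma * v

def integrate(r0, v0, m, dt=1e-3, steps=50_000):
    """Integrate m r'' = F step by step (semi-implicit Euler scheme)."""
    r, v = np.array(r0, float), np.array(v0, float)
    for _ in range(steps):
        v = v + dt * force(v, m) / m   # velocity update from the acceleration F/m
        r = r + dt * v                 # position update with the new velocity
    return r, v

# A particle released from rest approaches the terminal velocity
# v_z = -m g / gamma, at which gravity and friction balance.
r, v = integrate(r0=[0, 0, 0], v0=[0, 0, 0], m=1.0)
print(v[2], -1.0 * 9.81 / 0.5)
```

Note that the mass enters twice: through the gravitational force and through the acceleration F/m. Without friction it would drop out of the motion, as stated above.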
Usually, the first law is phrased as follows: ‘when there is no force acting on a point particle, the
particle moves at constant velocity’. This statement obviously follows from the second law by taking
F = 0. The formulation adopted above emphasises that force has a material origin. It is impossible to
fulfil the requirements of this law, as gravitational forces are present everywhere in the universe: the
first law is an idealisation. The first law is not obvious from everyday life, where it is never possible
to switch friction off completely: in everyday life, motion requires a force in order to be maintained.
The third law is a statement about forces. It turns out that this statement does not hold exactly,
as it requires the two forces to act simultaneously. In quantum field theory, particles travelling
between the interacting particles are held responsible for the interactions, and these particles cannot
travel at a speed faster than that of light in vacuum (about 3 · 10^8 m/s). However, for the mechanics
of everyday life, the third law holds to sufficient precision, unless the moving particles carry a charge and
interact through electromagnetic interactions. In that case, the force no longer acts along the line
connecting the two particles.

1.2 Systems of point particles – symmetries and conservation laws

Real objects which we describe in mechanics are not point particles, but to a very good approximation
they can be considered as large collections of interacting point particles; in this section we consider
systems consisting of N point particles. It is possible to disentangle the mutual forces acting between
these particles from the external ones. The mutual forces satisfy Newton’s third law: for every force
$\mathbf F_{i,j}$, which is the force exerted by particle j on particle i, the force $\mathbf F_{j,i}$ is equal in magnitude but
opposite in direction to $\mathbf F_{i,j}$. For a particle i, we consider all the mutual forces $\mathbf F_{i,j}$ for j ≠ i; the
remaining forces on i must then be due to external sources (i.e., not depending on the other particles
in our system), and we lump these forces together in one external force, $\mathbf F_i^{\rm Ext}$:

$$\mathbf F_i = \sum_{j=1;\,j\neq i}^{N} \mathbf F_{i,j} + \mathbf F_i^{\rm Ext}, \qquad i = 1, \ldots, N. \tag{1.7}$$

The equations of motion read:

$$m_i \ddot{\mathbf r}_i = \sum_{j=1;\,j\neq i}^{N} \mathbf F_{i,j} + \mathbf F_i^{\rm Ext}. \tag{1.8}$$

The total momentum of the system is the sum of the momenta of all the particles:

$$\mathbf p = \sum_{i=1}^{N} \mathbf p_i = \sum_{i=1}^{N} m_i \dot{\mathbf r}_i. \tag{1.9}$$

We can view the total momentum of the system as the momentum of a single particle with a mass
equal to the total mass M of the system, and position vector $\mathbf r_C$. This position vector is then defined
by
$$\mathbf p = M\dot{\mathbf r}_C = \sum_{i=1}^{N} m_i \dot{\mathbf r}_i; \qquad M = \sum_{i=1}^{N} m_i. \tag{1.10}$$

This is equivalent to
$$\mathbf r_C = \frac{1}{M} \sum_{i=1}^{N} m_i \mathbf r_i \tag{1.11}$$

up to an integration constant which is always taken to be zero. The vector $\mathbf r_C$ is called the centre of mass
of the system. A particle of mass M at the centre of mass (which obviously changes in time) carries
the same momentum as the total momentum of the system.

Let us find an equation of motion for the centre of mass. We do this by summing Eq. (1.8) over i:
$$\sum_{i=1}^{N} m_i \ddot{\mathbf r}_i = \sum_{i,j=1;\,i\neq j}^{N} \mathbf F_{i,j} + \sum_{i=1}^{N} \mathbf F_i^{\rm Ext}. \tag{1.12}$$

In the first term on the right hand side, for every term Fi, j , there will also be a term F j,i , but this is
equal in magnitude and opposite to Fi, j ! So the first term vanishes, and we are left with
$$\sum_{i=1}^{N} m_i \ddot{\mathbf r}_i = \dot{\mathbf p} = \sum_{i=1}^{N} \mathbf F_i^{\rm Ext} \equiv \mathbf F^{\rm Ext}. \tag{1.13}$$

We see that the centre of mass behaves as a point particle with mass M subject to the total external
force acting on the system.
Conservation of physical quantities, such as energy, momentum etcetera, is always the result of
some symmetry. This deep relation is borne out in a beautiful theorem, formulated by E. Noether,
which we shall consider in the next semester. In this section we shall derive three conservation prop-
erties from Newton’s laws and the appropriate symmetries.
The first symmetry we consider is that of a system of particles experiencing only mutual forces,
and no external ones. We then see immediately from Eq. (1.13) with FExt = 0 that ṗ = 0, in other
words, the total momentum is conserved.

Conservation of momentum
In a system consisting of interacting particles, not subject to an external force, the total
momentum is always conserved.
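The cancellation of the mutual forces in Eq. (1.12) can be checked numerically. The sketch below, which is only an illustration under assumed values, places a few particles at random positions, evaluates the gravitational pair forces of table 1.1 and sums them over all particles. The number of particles, the random seed, the choice G = 1 and the name `pair_force` are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
r = rng.normal(size=(N, 3))          # positions r_i (arbitrary units)
m = rng.uniform(1.0, 2.0, size=N)    # masses m_i

def pair_force(ri, rj, mi, mj, G=1.0):
    """Gravitational force exerted on particle i by particle j (attractive)."""
    d = rj - ri
    return G * mi * mj * d / np.linalg.norm(d) ** 3

# F_int[i] is the sum over j != i of F_{i,j}, the first term of Eq. (1.7)
F_int = np.zeros((N, 3))
for i in range(N):
    for j in range(N):
        if i != j:
            F_int[i] += pair_force(r[i], r[j], m[i], m[j])

total_internal = F_int.sum(axis=0)   # the double sum in Eq. (1.12)
print(total_internal)                # zero up to rounding errors
```

Each individual particle feels a nonzero net force, but the sum over all particles vanishes, so that only external forces can change the total momentum.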

Next, let us consider the angular momentum L. This is a vector quantity, which for particle i is
defined as Li = ri × pi . The total angular momentum L is the sum of the vectors Li :
L = ∑ Li . (1.14)

To see how L varies in time, we calculate the time derivative of Li :

L̇i = ṙi × pi + ri × ṗi . (1.15)

The first term of the right hand side vanishes because pi is parallel to ṙi , so we are left with

L̇i = ri × ṗi = Ni ; (1.16)

Ni is the torque acting on particle i.

Now we calculate the torque on the total system by summing over i and replacing ṗi by the force
(according to the second law):
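$$\dot{\mathbf L} = \sum_{i=1}^{N} \mathbf r_i \times \mathbf F_i = \sum_{i=1}^{N} \mathbf r_i \times \left( \sum_{j=1;\,j\neq i}^{N} \mathbf F_{i,j} + \mathbf F_i^{\rm Ext} \right). \tag{1.17}$$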

The first term on the right-hand side vanishes, again as a result of the third law:

$$\mathbf r_i \times \mathbf F_{i,j} + \mathbf r_j \times \mathbf F_{j,i} = (\mathbf r_i - \mathbf r_j) \times \mathbf F_{i,j} = 0, \tag{1.18}$$


Figure 1.1: Path from r1 to r2 . The force at some point along the path is shown, together with the contribution
to the work of a small segment dr along the path.

where the last equality is a result of the direction of Fi, j coinciding with that of the line connecting ri
and r j (which excludes electromagnetic interactions between moving particles from the discussion).
We therefore have
$$\dot{\mathbf L} = \sum_{i} \mathbf r_i \times \mathbf F_i^{\rm Ext}. \tag{1.19}$$
We see that if the external forces vanish, the angular momentum does not change:

Conservation of angular momentum
In a system consisting of interacting particles (not electromagnetic), not subject to an external
force, the angular momentum is always conserved.

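The role of the direction of the force in Eq. (1.18) can be made explicit with a small numerical example. The sketch below compares a pair force directed along the line connecting two particles with one that still satisfies $\mathbf F_{1,2} = -\mathbf F_{2,1}$ but points in another direction. The positions, the force values and the name `internal_torque` are arbitrary choices made for this illustration.

```python
import numpy as np

r1 = np.array([1.0, 0.0, 0.0])
r2 = np.array([0.0, 2.0, 1.0])

def internal_torque(F12):
    """Total torque r1 x F_{1,2} + r2 x F_{2,1}, using F_{2,1} = -F_{1,2}."""
    return np.cross(r1, F12) + np.cross(r2, -F12)

# Central force: F_{1,2} is parallel to the line connecting the two particles.
F_central = 3.0 * (r2 - r1) / np.linalg.norm(r2 - r1)
# Non-central force: equal and opposite, but not directed along that line.
F_noncentral = np.array([0.0, 0.0, 1.0])

print(internal_torque(F_central))     # vanishes, as in Eq. (1.18)
print(internal_torque(F_noncentral))  # does not vanish
```

Equal magnitude and opposite direction alone are therefore not sufficient for conservation of angular momentum; the forces must also act along the connecting line.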
Finally, we consider the energy. Let us evaluate the work W done by moving a single particle from
r1 to r2 along some path Γ (see figure 1.1). This is by definition the inner product of the force and the
infinitesimal displacements, summed over the path:
$$W = \int_\Gamma \mathbf F \cdot d\mathbf r(t) = \int_{t_1}^{t_2} \mathbf F \cdot \dot{\mathbf r}\, dt. \tag{1.20}$$

Using Newton’s second law, we can write:

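$$W = \int_{t_1}^{t_2} m\,\ddot{\mathbf r} \cdot \dot{\mathbf r}\, dt = \frac{m}{2} \int_{t_1}^{t_2} \frac{d}{dt}\,\dot{\mathbf r}^2\, dt = \frac{m}{2} \left( \dot{\mathbf r}_2^2 - \dot{\mathbf r}_1^2 \right), \tag{1.21}$$
where $\dot{\mathbf r}_1$ is the velocity at time $t_1$ and similarly for $\dot{\mathbf r}_2$. We see that from Newton’s second law it follows
that the work done along the path Γ is equal to the change in the kinetic energy $T = m\dot{\mathbf r}^2/2$.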
A conservation law can be derived for the case where F is a conservative and time-independent
force. This means that F can be written as the negative gradient of some scalar function, called the
potential V (see footnote 1):
$$\mathbf F(\mathbf r) = -\nabla V(\mathbf r). \tag{1.22}$$
In that case we can write the work in a different way:
$$W = -\int_\Gamma \nabla V(\mathbf r) \cdot d\mathbf r(t) = -\int_{t_1}^{t_2} \frac{dV(\mathbf r)}{dt}\, dt = V(\mathbf r_1) - V(\mathbf r_2). \tag{1.23}$$
Footnote 1: From vector calculus it is known that a necessary and sufficient condition for this to be possible is that the force is
curl-free, i.e. ∇ × F = 0.

From this and from Eq. (1.21) it follows that

T1 +V1 = T2 +V2 , (1.24)

where T1 is the kinetic energy in the point r1 (or at the time t1 ) etcetera. Thus T + V is a conserved
quantity, which we call the energy E.
Of course, now that we know the expression for the energy, we can verify that it is a conserved
quantity by calculating its time derivative, using Newton’s second law:

Ė = mṙ · r̈ + ∇V (r) · ṙ = F · ṙ − F · ṙ = 0. (1.25)
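Conservation of energy can also be observed in a numerical solution of the equation of motion. The following sketch is an illustration under assumptions: it uses a harmonic potential $V(\mathbf r) = k r^2/2$, arbitrary values for m and k, and the velocity Verlet integration scheme, none of which are prescribed by these notes. It compares the energy T + V before and after a large number of time steps.

```python
import numpy as np

m, k = 1.0, 4.0                       # assumed mass and spring constant

def V(r):
    return 0.5 * k * np.dot(r, r)     # potential V(r) = k r^2 / 2

def F(r):
    return -k * r                     # F = -grad V, Eq. (1.22)

def energy(r, v):
    return 0.5 * m * np.dot(v, v) + V(r)   # E = T + V

r = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 0.5, 0.0])
dt = 1e-3
E0 = energy(r, v)

# Velocity Verlet integration of m r'' = F(r)
for _ in range(10_000):
    v_half = v + 0.5 * dt * F(r) / m
    r = r + dt * v_half
    v = v_half + 0.5 * dt * F(r) / m

print(E0, energy(r, v))   # the values agree up to a small discretisation error
```

The small remaining difference is due to the finite time step, not to the physics: it shrinks when dt is reduced.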

For a many-particle system, the derivation is similar – the condition on the force is then that there
exists a potential function V (r1 , r2 , . . . , rN ), such that the force Fi on particle i is given by

Fi = −∇iV (r1 , r2 , . . . , rN ). (1.26)

Note that V depends on 3N coordinates – the gradient ∇i acting on V gives a 3-dimensional vector
$$\nabla_i V = \left( \frac{\partial V}{\partial x_i}, \frac{\partial V}{\partial y_i}, \frac{\partial V}{\partial z_i} \right). \tag{1.27}$$
The kinetic energy is the sum of the one-particle kinetic energies. Now the energy conservation is
derived as follows:

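$$\dot E = \sum_i m_i \dot{\mathbf r}_i \cdot \ddot{\mathbf r}_i + \sum_i \nabla_i V \cdot \dot{\mathbf r}_i = \sum_i \left( \mathbf F_i \cdot \dot{\mathbf r}_i - \mathbf F_i \cdot \dot{\mathbf r}_i \right) = 0. \tag{1.28}$$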

The function V above depends on time only through the time-dependence of the arguments ri .
If we consider a charged particle in a time-dependent electric field, this is no longer the case: then
t occurs as an additional, explicit argument in V. If V depends explicitly on time, the energy
changes at a rate

$$\dot E = \frac{\partial}{\partial t} V(\mathbf r_1, \mathbf r_2, \ldots, \mathbf r_N, t), \tag{1.29}$$
where the arguments $\mathbf r_i$ also depend on time (but do not take part in the differentiation with respect to
t). If V does not depend explicitly on time, we can define the zero of time (i.e. the time when we set
our clock to zero) arbitrarily. This time translation invariance is essential for having conservation of
energy.
Similarly, the conservation of momentum is related to space translation invariance of the potential,
i.e. this potential should not change when we translate all particles over the same vector. Finally,
conservation of angular momentum is related to rotational symmetry of the potential. In quantum mechanics,
all these symmetries lead to the same conserved quantities (or rather their quantum mechanical analogues).
A final remark concerns the evaluation of the kinetic energy of a many-particle system. As we
have seen above, the motion of the centre of mass can be split off from the analysis in a suitable way.
This procedure also works for the kinetic energy. Let us decompose the position vector $\mathbf r_i$ of particle
i into two parts: the centre of mass position vector $\mathbf r_C$ and the position relative to the centre of mass,
which we call $\mathbf r'_i$:
$$\mathbf r_i = \mathbf r_C + \mathbf r'_i. \tag{1.30}$$
As, by definition, $\mathbf r_C = \sum_i m_i \mathbf r_i / M$, we have

$$\sum_i m_i \mathbf r'_i = \sum_i m_i \mathbf r_i - M \mathbf r_C = 0. \tag{1.31}$$


We can use this decomposition to rewrite the kinetic energy:

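$$T = \sum_i \frac{m_i}{2} \left( \dot{\mathbf r}_C + \dot{\mathbf r}'_i \right)^2 = \frac{M}{2}\, \dot{\mathbf r}_C^2 + \dot{\mathbf r}_C \cdot \sum_i m_i \dot{\mathbf r}'_i + \sum_i \frac{m_i}{2}\, \dot{\mathbf r}'^2_i. \tag{1.32}$$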

The second term vanishes as a result of (1.31) and therefore we have succeeded in writing the kinetic
energy of the many-particle system as the kinetic energy of the centre of mass plus the kinetic energy
of the relative coordinates:
$$T = T_{\rm CM} + \sum_i \frac{m_i}{2}\, \dot{\mathbf r}'^2_i. \tag{1.33}$$

This formula is a convenient device for calculating the kinetic energy in many applications.
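The decomposition (1.33) is easily verified numerically. The sketch below draws random masses and velocities (the number of particles, the seed and the variable names are arbitrary choices for this illustration), computes the centre-of-mass velocity from Eq. (1.10) and compares the total kinetic energy with the sum of the centre-of-mass and relative contributions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 6
m = rng.uniform(0.5, 2.0, size=N)     # masses m_i
v = rng.normal(size=(N, 3))           # velocities of the particles

M = m.sum()
v_C = (m[:, None] * v).sum(axis=0) / M     # centre-of-mass velocity, Eq. (1.10)
v_rel = v - v_C                            # velocities relative to the centre of mass

T_total = 0.5 * np.sum(m * np.sum(v**2, axis=1))
T_CM = 0.5 * M * np.dot(v_C, v_C)
T_rel = 0.5 * np.sum(m * np.sum(v_rel**2, axis=1))

print(T_total, T_CM + T_rel)               # the two numbers coincide
```

The cross term of Eq. (1.32) does not appear because the mass-weighted sum of the relative velocities vanishes, which is the time derivative of Eq. (1.31).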

2 Lagrange and Hamilton formulations of classical mechanics


The laws of classical mechanics, formulated by Newton, and the various laws for the forces (see
table 1.1) supply sufficient ingredients for predicting the motion of mechanical systems in the classical
limit. Working out the solution for particular cases is not always easy, however. In this chapter
we shall develop an alternative formulation of the laws of classical mechanics, which renders the
analysis of many systems easier than the traditional Newtonian formulation, in particular when the
moving particles are subject to constraints. The new formulation will not only enable us to analyse
new applications more easily than using Newton’s laws, but it also leads to an important example
of a variational formulation of a physical theory. Broadly speaking, in a variational formulation, a
physical solution is found by minimising a mathematical expression involving a function by varying
that function. Many physical theories can be formulated in a variational way, in particular quantum
mechanics and electrodynamics.

2.1 Generalised coordinates and virtual displacements

When observing motion in everyday life, we often encounter systems in which the moving particles
are subject to constraints. For example, when a car moves on the road, the road surface prevents
the car from moving downward, as is the case with the balls on a billiard table. Another example is a
particle suspended on a rigid rod (i.e. the pendulum), which can only move on the sphere around
the suspension point with radius equal to the rod length. The constraints are realised by forces, which
we call the forces of constraint. The forces of constraint guarantee that the constraints are met; they
often do not influence the motion within the subspace (see footnote 1). The main object of the next few
sections is to show that it is possible to eliminate these constraint forces from the description of the
mechanical system.
As the presence of constraints reduces the actual degrees of freedom of the system, it is useful
to use a smaller set of degrees of freedom to describe the system. As an example, consider a ball
on a billiard table. In that case, the z-coordinate drops out of the description, and we are left with
the x and y coordinates only. This is obviously a very simple example, in which one of the Cartesian
coordinates is simply left out of the description of the system. More interesting is a ball suspended
on a rod. In that case we can use the angular coordinates ϑ and ϕ to describe the system – that is, we
replace the coordinates x, y and z by the angles ϕ and ϑ (see figure 2.1). In this case, we see that the
coordinates no longer represent distances, that is, they do not have the dimension of length, but rather
they are values of angles, and therefore dimensionless. This is the reason why we speak of generalised
coordinates. These coordinates form a reduced representation of a system subject to constraints. In
chapter 2 of the Schaum book you find many examples of constraints and generalised coordinates.
Generalised coordinates are denoted by $q_j$, where j is an index which runs over the degrees of freedom
of the constrained system.

Footnote 1: The subspace on which the particle is allowed to move is not necessarily a linear subspace, e.g. the spherical subspace
in the case of a pendulum. Mathematicians would use the term ‘submanifold’ rather than subspace.

Figure 2.1: The pendulum in three dimensions. The position of the mass is described by the two angles ϕ and ϑ.

We now shall look at constraints and generalised coordinates from a more formal viewpoint. Let
us consider a system consisting of N particles in 3 dimensions, so that the total number of coordinates
is 3N. The system is subject to a number of constraints, which are of the form

g(k) (r1 , . . . , rN ,t) = 0, k = 1, . . . , K. (2.1)

Constraints of this form (i.e. independent of the velocities) are called holonomic. Usually, it is then
possible to transform the 3N degrees of freedom to a reduced set of 3N − K generalised coordinates
$\{q\} = q_j$, j = 1, . . . , 3N − K. It is now possible to express the position vectors in terms of these new
coordinates:
$$\mathbf r_i = \mathbf r_i(\{q\}, t). \tag{2.2}$$
As an example, consider the particle suspended on a rod; see figure 2.1. The Cartesian coordinates
are x, y and z and they can be written in terms of the generalised coordinates ϑ and ϕ as:

x = l sin ϑ cos ϕ; (2.3)

y = l sin ϑ sin ϕ; (2.4)
z = −l cos ϑ, (2.5)

where l is the length of the rod (and therefore fixed). These equations are a particular example of
Eqs. (2.2).
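The transformation (2.3)–(2.5) can be written down directly as a small function. The sketch below is an illustration only: the function names `position` and `constraint`, and the numerical values of the angles and of the rod length, are assumptions for this example. It checks that every choice of the two generalised coordinates automatically satisfies the holonomic constraint that the mass stays at distance l from the suspension point.

```python
import numpy as np

def position(theta, phi, l=1.0):
    """Cartesian position of the pendulum mass, Eqs. (2.3)-(2.5)."""
    return np.array([l * np.sin(theta) * np.cos(phi),
                     l * np.sin(theta) * np.sin(phi),
                     -l * np.cos(theta)])

def constraint(r, l=1.0):
    """Holonomic constraint g(r) = x^2 + y^2 + z^2 - l^2, cf. Eq. (2.1)."""
    return np.dot(r, r) - l**2

r = position(theta=0.3, phi=1.2, l=2.0)
print(constraint(r, l=2.0))   # zero up to rounding errors
```

For ϑ = 0 the mass hangs straight below the suspension point, at z = −l.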
The velocity can be expressed in terms of the q̇ j :

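$$\dot{\mathbf r}_i = \sum_{j=1}^{3N-K} \frac{\partial \mathbf r_i}{\partial q_j}\, \dot q_j + \frac{\partial \mathbf r_i}{\partial t}. \tag{2.6}$$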

From this equation we also find directly

$$\frac{\partial \dot{\mathbf r}_i}{\partial \dot q_j} = \frac{\partial \mathbf r_i}{\partial q_j}, \tag{2.7}$$

a result which will be very useful further on.
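As a worked illustration of Eqs. (2.6) and (2.7), consider again the particle suspended on a rod of fixed length l, with generalised coordinates ϑ and ϕ. Because the constraint does not depend on time, the term $\partial \mathbf r / \partial t$ is absent, and the velocity is linear in $\dot\vartheta$ and $\dot\varphi$, so that differentiating with respect to these gives back the derivatives of the position:

```latex
\begin{align*}
\mathbf r &= l\,(\sin\vartheta\cos\varphi,\ \sin\vartheta\sin\varphi,\ -\cos\vartheta),\\
\dot{\mathbf r} &= \frac{\partial \mathbf r}{\partial\vartheta}\,\dot\vartheta
                 + \frac{\partial \mathbf r}{\partial\varphi}\,\dot\varphi ,\\
\frac{\partial \mathbf r}{\partial\vartheta} &= l\,(\cos\vartheta\cos\varphi,\ \cos\vartheta\sin\varphi,\ \sin\vartheta),\qquad
\frac{\partial \mathbf r}{\partial\varphi} = l\,(-\sin\vartheta\sin\varphi,\ \sin\vartheta\cos\varphi,\ 0),\\
\frac{\partial \dot{\mathbf r}}{\partial\dot\vartheta} &= \frac{\partial \mathbf r}{\partial\vartheta},\qquad
\frac{\partial \dot{\mathbf r}}{\partial\dot\varphi} = \frac{\partial \mathbf r}{\partial\varphi}.
\end{align*}
```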

Newton’s laws predict the evolution of a mechanical system without ambiguity from a given initial
state (if that state is not on an unstable point, such as zero velocity at the top of a hill). However, we are
sometimes interested in a variation of the path of a system, i.e. a displacement of one or more particles
in some direction. Such displacements are called virtual displacements in order to distinguish them
from the actual displacement, which is always governed by the Newton equations of motion. If we
now generalise the definition of work, Eq. (1.20) to include virtual displacements δ ri rather than the
mechanical displacements which actually take place, then the work done due to this displacement is
defined as
$$\delta W = \sum_i \mathbf F_i \cdot \delta \mathbf r_i. \tag{2.8}$$

The notion of virtual work is very important in the following section.

2.2 d’Alembert’s principle

We start from Newton’s law of motion for an N-particle system:

$$\dot{\mathbf p}_i = m_i \ddot{\mathbf r}_i = \mathbf F_i, \qquad i = 1, \ldots, N. \tag{2.9}$$

It is always possible to decompose the total force on a particle into a force of constraint FC and the
remaining force, which we call the applied force FA :

F = FC + FA . (2.10)

If you consider any system consisting of a single particle (or nonrotating rigid body), subject to con-
straints, you will find that the work forces of constraint are always perpendicular to the space in which
the particle is allowed to move. For example, if a particle is attached to a rigid rod which is suspended
such that it can rotate freely, the particle can only rotate on a spherical surface. The force of con-
straint, which is the tension in the rod, is always normal to that surface. Similarly, the force of the
billiard table on the balls is always vertical, i.e. perpendicular to the plane of motion. This notion
provides a way to eliminate these forces from the description. Consider an arbitrary but small virtual
displacement δ r within the subspace allowed by the constraint. Because the force of constraint is
perpendicular to this subspace, we have:

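\[ \dot{\mathbf p}\cdot\delta\mathbf r = (\mathbf F^C + \mathbf F^A)\cdot\delta\mathbf r = \mathbf F^A\cdot\delta\mathbf r. \tag{2.11} \]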


We see that the force of constraint drops out of the system, and we are left with a motion determined
by the applied force only. Because (2.11) holds for every small δ r, we have

ṗ = FA (2.12)

if we restrict all vectors to be tangential to the constraint subspace. The principle we have formu-
lated in Eq. (2.11) is called d’Alembert’s principle. For systems consisting of a single rigid body, it
expresses the fact that the forces of constraint are perpendicular to the subspace of the constraint. The
expression F · δ r is the virtual work done as a result of the virtual displacement.
It is important to note that the virtual displacements are always considered to be spatial – the time
is not changed. This is particularly important in cases where the constraints are time-dependent. In
the next section we shall consider an example of this.

Figure 2.2: The pendulum moving in a plane. The rod of length l is rigid and massless, and its suspension is frictionless.

For more than one object, the contributions to the virtual work must be added, so that we obtain:

\[ \sum_i \dot{\mathbf p}_i\cdot\delta\mathbf r_i = \sum_i \mathbf F^A_i\cdot\delta\mathbf r_i. \tag{2.13} \]

In this form, the contributions of the constraint forces to the virtual work do not all vanish for each
individual object, but the total virtual work due to the constraint forces vanishes:

\[ \sum_i \mathbf F^C_i\cdot\delta\mathbf r_i = 0. \tag{2.14} \]

In summary, we can formulate d’Alembert’s principle in the following, concise form:

The virtual work due to the forces of constraint is always zero for virtual displacements
which do not violate the constraint.

The use of d’Alembert’s principle can simplify the analysis of systems subject to constraints,
although we often use this principle tacitly in tackling problems in the ‘Newtonian’ approach. In
that approach we usually demand that the forces of constraint balance the components of the applied
force perpendicular to the constraint subspace. Nevertheless, it is convenient to skip this step, using
d’Alembert’s principle, especially in complicated problems (many applied forces and constraints).

2.3 Examples

2.3.1 The pendulum

As a simple example, let us consider a pendulum moving in a plane. This system is shown in figure 2.2.
Using Newton’s mechanics, we say that the ball of mass m is kept on the circle by the tension in the
suspension rod. This tension is directed along the rod, and it precisely compensates the component
of the gravitational force along the same line. The component of the gravitational force tangential to
the circle of motion determines the motion. The motion is given by r(t) = lϕ(t), where ϕ is the angle
shown in the figure. So r̈(t) = l ϕ̈(t), and the equation of motion is

l ϕ̈(t) = −g sin ϕ(t). (2.15)

Figure 2.3: Small block on an inclined plane.

Using d’Alembert’s principle simplifies the first part of this analysis. We can simply say that the
motion is determined by the component of the applied force (i.e. gravity) lying in the subspace of the
motion (i.e. the circle) and this leads to the same equation of motion. Although in this simple case
the difference between the approaches with and without d’Alembert’s principle is minute, in more
complicated systems, the possibility to avoid analysing the forces of constraint is a real gain.

2.3.2 The block on the inclined plane

Now we consider a more complicated example: that of a block sliding on a wedge. We shall denote
the block by SB (small block) and the wedge by IP (inclined plane). The setup is shown in figure 2.3.
It consists of the wedge (inclined plane) of mass M which can move freely (i.e. without friction)
over a horizontal table, and the small block of mass m, which can slide over the inclined plane (also
frictionless). The aim is to find expressions for the accelerations of IP and SB. The cartesian unit
vectors are x̂ and ŷ, and the unit vector along the inclined plane, pointing to the right, is d̂, and the
upward normal unit vector to the plane is called n̂. Let us solve this problem using the standard
approach. The acceleration of IP is called A, and that of the small block is A + a, i.e., a is the
acceleration of the small block with respect to the inclined plane.
Newton’s second law for the two bodies reads:

MA = −Mgŷ + F2 ŷ − F1 n̂, (2.16a)

m(A + a) = −mgŷ + F1 n̂. (2.16b)

As we know that the motion of IP is horizontal, we know that all ŷ components of the forces acting
on it will cancel, and A is directed along x̂. Similarly, we know that a is zero along n̂. This allows us
to simplify the equations:

\[ MA = -F_1\sin\alpha; \tag{2.17a} \]
\[ m(A\hat{\mathbf x} + a_\parallel\hat{\mathbf d}) = -mg\hat{\mathbf y} + F_1\hat{\mathbf n}, \tag{2.17b} \]

where a∥ is the component of a directed along d̂. The first of these equations is a scalar equation. The
second equation represents in fact two equations, one for the x and one for the y component. We have
three unknowns: A, a∥ and F1. Translating d̂ and n̂ into x- and y-components is straightforward, and
(2.17b) becomes:

\[ m(A + a_\parallel\cos\alpha) = F_1\sin\alpha, \tag{2.18a} \]
\[ -m a_\parallel\sin\alpha = F_1\cos\alpha - mg. \tag{2.18b} \]

Now we can solve for the accelerations by eliminating F1 from our equations, and we find:
\[ a_\parallel = g\,\frac{(M+m)\sin\alpha}{M + m\sin^2\alpha}; \tag{2.19a} \]
\[ A = -g\,\frac{m\sin\alpha\cos\alpha}{M + m\sin^2\alpha}. \tag{2.19b} \]
The solution of this problem contains one nontrivial step: the fact that we have split the acceleration
of SB into the acceleration of IP plus the acceleration of the SB with respect to the IP has enabled us
to remove the latter’s component along n̂. This is not so easy a step when a different representation is
used (e.g. when the acceleration is not split into these parts).
Now we turn to the solution using d’Alembert’s principle:

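\[ \dot{\mathbf p}_{SB}\cdot\delta\mathbf r_{SB} + \dot{\mathbf p}_{IP}\cdot\delta\mathbf r_{IP} = \mathbf F^A_{SB}\cdot\delta\mathbf r_{SB} + \mathbf F^A_{IP}\cdot\delta\mathbf r_{IP}. \tag{2.20} \]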

We identify two natural coordinates: the coordinate X of the IP along the horizontal direction, and the
distance d from the top of the IP to the SB. The total virtual work done as a result of displacements
δ X and δ d is the sum of the work done by both bodies:

δ rSB = δ d d̂ + δ X x̂ and δ rIP = δ X x̂. (2.21)

The applied forces are the gravity forces (we do not care about constraint forces any longer) and we have
\[ \mathbf F^A_{IP}\cdot\delta\mathbf r_{IP} = 0, \tag{2.22} \]
as the displacement is perpendicular to the applied (gravity) force. Furthermore

\[ \mathbf F^A_{SB}\cdot\delta\mathbf r_{SB} = mg\sin\alpha\,\delta d. \tag{2.23} \]

On the other hand:

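\[ \mathbf p_{SB} = m(\dot X\hat{\mathbf x} + \dot d\,\hat{\mathbf d}) \tag{2.24} \]
\[ \mathbf p_{IP} = M\dot X\hat{\mathbf x}, \tag{2.25} \]
so that ṗIP = MAx̂ and ṗSB = m(Ax̂ + a∥ d̂). Taking time derivatives of (2.24) and (2.25) and using
d’Alembert’s equations (2.20) for this problem, together with (2.22) and (2.23), we obtain

\[ mA\,\delta X + m a_\parallel\,\delta d + mA\cos\alpha\,\delta d + m a_\parallel\cos\alpha\,\delta X + MA\,\delta X = mg\sin\alpha\,\delta d. \tag{2.26} \]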

Figure 2.4: Bead on a rotating wire.

As this equation should hold for any pair of virtual displacements δ X and δ d, the coefficients of
δ X and of δ d on the two sides must be equal, giving the equations:

\[ (m+M)A + m a_\parallel\cos\alpha = 0, \tag{2.27a} \]
\[ m(a_\parallel + A\cos\alpha) = mg\sin\alpha. \tag{2.27b} \]

Not surprisingly, these equations lead to the same result (2.19) as obtained before. Although the
second approach does not seem simpler, it is safer since the constraint forces do not have to be taken
into account explicitly. This manifests itself explicitly in the fact that we do not have to eliminate the
constraint force F1 as in the direct approach.
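
The agreement between the two routes can be checked numerically. The following Python sketch solves the linear system (2.27) for A and a∥ and compares the result with the closed forms (2.19). The function names and the value g = 9.81 m/s² are assumptions made for illustration only.

```python
import math

def accelerations_from_dalembert(m, M, alpha, g=9.81):
    """Solve the 2x2 linear system (2.27a,b) for A and a_par by Cramer's rule."""
    c, s = math.cos(alpha), math.sin(alpha)
    # (2.27a): (m + M) A + m cos(alpha) a_par = 0
    # (2.27b): m cos(alpha) A + m a_par       = m g sin(alpha)
    a11, a12, b1 = m + M, m * c, 0.0
    a21, a22, b2 = m * c, m, m * g * s
    det = a11 * a22 - a12 * a21
    A = (b1 * a22 - a12 * b2) / det
    a_par = (a11 * b2 - a21 * b1) / det
    return A, a_par

def accelerations_closed_form(m, M, alpha, g=9.81):
    """Closed-form accelerations (2.19a,b) from the Newtonian approach."""
    c, s = math.cos(alpha), math.sin(alpha)
    denom = M + m * s * s
    A = -g * m * s * c / denom
    a_par = g * (M + m) * s / denom
    return A, a_par

if __name__ == "__main__":
    print(accelerations_from_dalembert(1.0, 3.0, 0.5))
    print(accelerations_closed_form(1.0, 3.0, 0.5))
```

Both functions should return the same pair of numbers up to rounding: a negative A (the wedge recoils) and a positive a∥ (the block slides down).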

2.3.3 Heavy bead on a rotating wire

In this section, we consider a system with a time-dependent constraint. A bead slides without friction
along a straight wire which rotates about a vertical axis with angular velocity ω, making an angle α with that axis (see figure 2.4). The
position of the bead along the wire is denoted by q, which is the distance of the bead from the origin.
The momentum of the bead is given by

p = mqω sin α t̂ + mq̇ q̂ (2.28)

It should however be noted that the unit vectors t̂ and q̂ rotate themselves, and hence their time
derivatives occur in ṗ. The latter occurs in d’Alembert’s equation, in which gravity enters as the
applied force FA . Instead of working out ṗ explicitly, we can use the following trick:

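\[ \dot{\mathbf p}\cdot\delta\mathbf r = \frac{d}{dt}(\mathbf p\cdot\delta\mathbf r) - \mathbf p\cdot\delta\dot{\mathbf r}. \tag{2.29} \]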
At first sight, you might think that the second term on the right hand side is zero as δ r = δ q q̂ and δ q
does not involve any time dependence: virtual displacements are always assumed to be instantaneous
and do not involve any time dependence. However, even with a time-independent δ q, the displacement
δ r is time-dependent as the displacement is carried out in a rotating frame. This can also be seen from
the fact that q̂ is time-dependent. In fact, in our system the displacement along the wire will cause a
change in the rotational velocity, and it is this velocity change which gives δ ṙ. If the bead is moved
upward, for example, the bead will move along a circle which has a larger radius, but still at the same
angular velocity, so that the orbital speed increases. The orbital speed is given as qω sin α, so that we
have
\[ \delta\dot{\mathbf r} = \omega\sin\alpha\,\delta q\,\hat{\mathbf t}. \tag{2.30} \]
As δ r is given by δ q q̂, we find

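\[ \dot{\mathbf p}\cdot\delta\mathbf r = m\ddot q\,\delta q - m\omega^2\sin^2\alpha\; q\,\delta q = \mathbf F^A\cdot\delta\mathbf r = -mg\cos\alpha\,\delta q \tag{2.31} \]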

and we find the equation of motion:

q̈ − ω 2 sin2 α q = −g cos α. (2.32)

The solution to this equation can be found straightforwardly:

\[ q(t) = q_0 + A e^{\Omega t} + B e^{-\Omega t} \tag{2.33} \]

with q0 = g cot α/(ω² sin α), A and B arbitrary constants and Ω = ω sin α. Later we shall encounter
more powerful techniques which enable us to solve such a problem more easily.
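
As a check on (2.32) and (2.33), the sketch below integrates the equation of motion with a classical fourth-order Runge-Kutta scheme and compares the result with the analytic solution, where A and B are fixed by the initial position and velocity. The function names, the step count and g = 9.81 m/s² are illustrative assumptions.

```python
import math

def bead_analytic(t, q_init, v_init, omega, alpha, g=9.81):
    """Analytic solution (2.33) with A, B fixed by q(0) = q_init, q'(0) = v_init."""
    Omega = omega * math.sin(alpha)
    q0 = g * math.cos(alpha) / (omega * math.sin(alpha)) ** 2  # = g cot(alpha) / (omega^2 sin(alpha))
    A = 0.5 * (q_init - q0 + v_init / Omega)
    B = 0.5 * (q_init - q0 - v_init / Omega)
    return q0 + A * math.exp(Omega * t) + B * math.exp(-Omega * t)

def bead_numeric(t_end, q_init, v_init, omega, alpha, g=9.81, steps=2000):
    """Integrate q'' = omega^2 sin^2(alpha) q - g cos(alpha), Eq. (2.32), with RK4."""
    Omega2 = (omega * math.sin(alpha)) ** 2

    def rhs(q, v):
        return v, Omega2 * q - g * math.cos(alpha)

    h = t_end / steps
    q, v = q_init, v_init
    for _ in range(steps):
        k1q, k1v = rhs(q, v)
        k2q, k2v = rhs(q + 0.5 * h * k1q, v + 0.5 * h * k1v)
        k3q, k3v = rhs(q + 0.5 * h * k2q, v + 0.5 * h * k2v)
        k4q, k4v = rhs(q + h * k3q, v + h * k3v)
        q += h * (k1q + 2 * k2q + 2 * k3q + k4q) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return q

if __name__ == "__main__":
    args = (1.0, 0.0, 2.0, math.pi / 3)  # q_init, v_init, omega, alpha
    print(bead_numeric(1.0, *args), bead_analytic(1.0, *args))
```

Because q0 is an unstable equilibrium, any initial position other than q0 leads to exponential growth of the distance from q0, which both routines should reproduce.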

2.4 d’Alembert’s principle in generalised coordinates

In the previous section we have encountered a few examples of systems subject to constraints, and
analysed them using d’Alembert’s principle. In this section we shall do the same for an unspecified
system and derive the equations of motion for a general constrained system using d’Alembert’s
principle. We start from d’Alembert’s equation for N objects:
\[ \sum_{i=1}^{N} \dot{\mathbf p}_i\cdot\delta\mathbf r_i = \sum_{i=1}^{N} \mathbf F^A_i\cdot\delta\mathbf r_i. \tag{2.34} \]

If we write
\[ \delta\mathbf r_i = \sum_j \frac{\partial\mathbf r_i}{\partial q_j}\,\delta q_j, \tag{2.35} \]

and realise that the q j can be varied independently, we see that we must have
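\[ \sum_{i=1}^{N} \dot{\mathbf p}_i\cdot\frac{\partial\mathbf r_i}{\partial q_j} = \sum_{i=1}^{N} \mathbf F^A_i\cdot\frac{\partial\mathbf r_i}{\partial q_j}. \tag{2.36} \]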

In order to reformulate this equation we use a trick similar to the one we applied already to the bead
sliding along the wire:
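\[ \sum_{i=1}^{N} \dot{\mathbf p}_i\cdot\frac{\partial\mathbf r_i}{\partial q_j} = \frac{d}{dt}\sum_{i=1}^{N} \mathbf p_i\cdot\frac{\partial\mathbf r_i}{\partial q_j} - \sum_{i=1}^{N} \mathbf p_i\cdot\frac{d}{dt}\frac{\partial\mathbf r_i}{\partial q_j}. \tag{2.37} \]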

We note furthermore that in the second term, the time derivative can be written as
\[ \frac{d}{dt}\frac{\partial\mathbf r_i}{\partial q_j} = \frac{\partial\dot{\mathbf r}_i}{\partial q_j}. \tag{2.38} \]
In section 1.2 we have seen that the work done equals the change in kinetic energy. This suggests
that the kinetic energy might be a convenient device for expressing d’Alembert’s equation in generalised
coordinates. To see that this is indeed the case, we calculate its derivatives with respect to
q j and q̇ j :
\[ \frac{\partial T}{\partial q_j} = \sum_{i=1}^{N} m_i\dot{\mathbf r}_i\cdot\frac{\partial\dot{\mathbf r}_i}{\partial q_j} \tag{2.39} \]
and
\[ \frac{\partial T}{\partial\dot q_j} = \sum_{i=1}^{N} m_i\dot{\mathbf r}_i\cdot\frac{\partial\dot{\mathbf r}_i}{\partial\dot q_j} = \sum_{i=1}^{N} \mathbf p_i\cdot\frac{\partial\mathbf r_i}{\partial q_j}, \tag{2.40} \]
where we have used (2.7). We see that the left hand side of d’Alembert’s equation leads to
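\[ \frac{d}{dt}\frac{\partial T}{\partial\dot q_j} - \frac{\partial T}{\partial q_j}. \tag{2.41} \]

If we define
\[ \sum_{i=1}^{N} \mathbf F^A_i\cdot\frac{\partial\mathbf r_i}{\partial q_j} = F_j, \tag{2.42} \]

where F j is the generalised force, we have the following

Formulation for d’Alembert’s principle in generalised coordinates:

\[ \frac{d}{dt}\frac{\partial T}{\partial\dot q_j} - \frac{\partial T}{\partial q_j} = F_j. \tag{2.43} \]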

There is no sum over j in this equation because the variations δ q j are arbitrary and independent. It is
then possible to obtain the form (2.43) from d’Alembert’s principle by taking only one particular δ q j
to be nonzero.
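
As a small worked example of (2.43), consider the plane pendulum of section 2.3.1 with the single generalised coordinate q = ϕ. The parametrisation r = l(sin ϕ, −cos ϕ), with the y-axis pointing upward, is an assumption made here for illustration:

\[ T = \tfrac12 m l^2\dot\varphi^2, \qquad F_\varphi = \mathbf F^A\cdot\frac{\partial\mathbf r}{\partial\varphi} = (0,-mg)\cdot l(\cos\varphi,\sin\varphi) = -mgl\sin\varphi, \]
\[ \frac{d}{dt}\frac{\partial T}{\partial\dot\varphi} - \frac{\partial T}{\partial\varphi} = ml^2\ddot\varphi = -mgl\sin\varphi, \]

which reproduces the equation of motion (2.15) without any reference to the tension in the rod.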

2.5 Conservative systems – the mechanical path

Consider now a particle which moves in a constrained subspace under the influence of a potential. As
an example you can imagine a non-flat surface on which a ball is moving from r1 to r2 . If the ball is
not forced to obey the laws of mechanics, it can move from r1 at time t1 to r2 at time t2 along many
different paths. Instead of approaching the problem of finding the motion of the ball from a differential
point of view, where we update the position and the velocity of a particle at each infinitesimal time
step, we consider the path allowed for by the laws of mechanics1 as a special one among all the
available paths from r1 at t1 to r2 at t2 .
We thus try to find a condition on the path as a whole rather than for each of its infinitesimal
segments. To this end, we start from d’Alembert’s principle, and apply it to two paths, ra (t) and rb (t),
which are close together for all times. The difference between the two paths at some time t between
t1 and t2 is δ r(t) = rb (t) − ra (t), and we write down d’Alembert’s principle at time t using this δ r(t):

mr̈(t) · δ r(t) = F · δ r(t), (2.44)

1 This path is not always, but nearly always, unique.

where it is understood that F is the applied force only, as δ r lies in the constrained subspace.2 This
equation holds for every t between t1 and t2 , and we can formulate a global condition on the path by
integrating over time from t1 to t2 :
\[ \int_{t_1}^{t_2} m\ddot{\mathbf r}(t)\cdot\delta\mathbf r(t)\,dt = \int_{t_1}^{t_2} \mathbf F\cdot\delta\mathbf r(t)\,dt. \tag{2.45} \]

The analysis which follows resembles that of the previous chapter when we derived the conservation
property of the energy. Indeed, the right hand side looks like an expression for the work, but it should
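be kept in mind that δ r is not a real displacement of the particle, but a difference between two possible
paths.
Via partial integration, and using the fact that the begin and end points of the path are fixed, we can
transform the left hand side of (2.45):

\[ \int_{t_1}^{t_2} m\ddot{\mathbf r}(t)\cdot\delta\mathbf r(t)\,dt = -\int_{t_1}^{t_2} \frac{m}{2}\frac{\partial\dot{\mathbf r}^2}{\partial\dot{\mathbf r}}\cdot\delta\dot{\mathbf r}\,dt = -\int_{t_1}^{t_2} \frac{\partial T}{\partial\dot{\mathbf r}}\cdot\delta\dot{\mathbf r}\,dt \approx -\int_{t_1}^{t_2} \left[T(\dot{\mathbf r}_b) - T(\dot{\mathbf r}_a)\right]dt, \tag{2.46} \]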

where the approximation holds to first order in δ r. The resulting expression is the difference in kinetic
energy between the two paths, integrated over time.
If we are dealing with a conservative force field, the right hand side of (2.45) can also be trans-
formed to a difference between two global quantities:
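\[ \int_{t_1}^{t_2} \mathbf F\cdot\delta\mathbf r(t)\,dt = -\int_{t_1}^{t_2} \nabla V\cdot\delta\mathbf r(t)\,dt \approx -\int_{t_1}^{t_2} \left[V(\mathbf r_b) - V(\mathbf r_a)\right]dt. \tag{2.47} \]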

Combining (2.46) and (2.47) we obtain:

\[ \int_{t_1}^{t_2} \delta(T - V)\,dt = 0, \tag{2.48} \]

in other words, d’Alembert’s principle for a conservative force can be transformed to the condition
that the linear variation (2.48) vanishes. This global condition distinguishes the mechanical path from
all other ones.
The quantity T −V is called the Lagrangian, L. The integral over time of this quantity, \(\int_{t_1}^{t_2} L\,dt\), is
called the action, denoted by S:

\[ S = \int_{t_1}^{t_2} dt\,(T - V) = \int_{t_1}^{t_2} dt\,L. \tag{2.49} \]

We have derived a new principle:

The mechanical path of a particle moving in a conservative potential field from a position r1
at time t1 to a position r2 at t2 is a stationary solution of the action, i.e. the linear variation
of the action with respect to an allowed variation of the path around the mechanical path,
vanishes.
This principle is called Hamilton’s principle. Note that the variations of the path are restricted
to lie within the constrained subspace. The advantage of this new formulation of mechanics with
conservative force fields over the Newtonian formulation is that it holds for any system subject to
2 We suppose that the constrained subspace is smooth and that ra (t) is close to rb (t) for all t.
constraints, and that it holds independently of the coordinates which are chosen to represent the mo-
tion. This is clear from the fact that we search for the minimum of the action within the subspace
allowed for by the constraint, and this subspace is properly described by the generalised coordinates
q j . When solving the motion of some particular mechanical system our task is therefore to properly
express T and V in terms of these generalised coordinates, plug the Lagrangian L = T −V into the ac-
tion, and minimise the latter with respect to the generalised coordinates (which are functions of time).
Although this might seem a complicated way of solving a simple problem, it should be realised that
the transformation of forces and accelerations to generalised coordinates is usually more complicated
than writing the kinetic energy and the potential in terms of these new coordinates. Furthermore we
shall see below that the problem of finding the stationary solution for a given action leads straightfor-
wardly to a second-order differential equation, which is the correct form of the Newtonian equation
of motion in terms of the chosen generalised coordinates.
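
Hamilton's principle can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the derivation: a particle of mass m moves vertically in a uniform gravitational field, V = mgy, from y = 0 at t = 0 back to y = 0 at t = T. The mechanical path y(t) = v0 t − g t²/2 is varied by ε sin(πt/T), which vanishes at both end points. The function names, the midpoint quadrature and the parameter values are all choices made here.

```python
import math

def action(path, velocity, potential, m, t1, t2, n=2000):
    """Midpoint-rule approximation of S = integral of (T - V) dt."""
    h = (t2 - t1) / n
    total = 0.0
    for k in range(n):
        t = t1 + (k + 0.5) * h
        total += (0.5 * m * velocity(t) ** 2 - potential(path(t))) * h
    return total

def varied_action(eps, m=1.0, g=9.81, T=1.0):
    """Action of the path y(t) = v0 t - g t^2/2 + eps sin(pi t / T)."""
    v0 = 0.5 * g * T  # the mechanical path returns to y = 0 at t = T

    def path(t):
        return v0 * t - 0.5 * g * t * t + eps * math.sin(math.pi * t / T)

    def velocity(t):
        return v0 - g * t + eps * (math.pi / T) * math.cos(math.pi * t / T)

    def potential(y):
        return m * g * y

    return action(path, velocity, potential, m, 0.0, T)

if __name__ == "__main__":
    s0 = varied_action(0.0)
    for eps in (0.1, -0.1, 0.05):
        print(eps, varied_action(eps) - s0)
```

The change in the action should be the same for +ε and −ε and should scale as ε², which shows that the term linear in the variation vanishes for the mechanical path. For this particular example the exact change is ε² m π²/(4T), so the mechanical path is in fact a minimum of the action.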
As an example, consider the pendulum. The position of the mass m is given by the 2 coordinates
x and y (we neglect the third coordinate z). The constraint obeyed by these coordinates is x2 + y2 = l 2 .
This constraint allows us to use only a single generalised coordinate ϕ: x = l sin ϕ and y = −l cos ϕ.
The velocity is given by vϕ = l ϕ̇. This example shows that the generalised coordinate q = ϕ does not
necessarily have to have the dimension of length, and likewise q̇ = ϕ̇ does not necessarily have the
dimension of velocity. The kinetic energy is now given as T = ml²ϕ̇²/2, and the potential energy by
V = −mgl cos ϕ. The Lagrangian of the pendulum is therefore
\[ L = T - V = m\left(\frac{l^2\dot\varphi^2}{2} + gl\cos\varphi\right). \tag{2.50} \]
We now turn to the problem of determining the stationary solution for an action with such a Lagrangian.
The Lagrangian can have many different forms, depending on the particular set of generalised
coordinates chosen; therefore we shall now work out a general prescription for determining the sta-
tionary solution of the action without making any assumptions concerning the form of the Lagrangian,
except that it may depend on the q j and on their time derivatives q̇ j :
\[ S[q] = \int_{t_1}^{t_2} L(q,\dot q,t)\,dt. \tag{2.51} \]

Here q(t) is any vector-valued function, q(t) = (q1 (t), . . . , qN (t)). We now consider an arbitrary, but
small variation δ q(t) of the path q(t), and calculate the change in S as a result of this variation:
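\[ \delta S[q] = S[q+\delta q] - S[q] = \int_{t_1}^{t_2} L(q+\delta q,\dot q+\delta\dot q,t)\,dt - \int_{t_1}^{t_2} L(q,\dot q,t)\,dt \approx \int_{t_1}^{t_2} \left[\frac{\partial L(q,\dot q,t)}{\partial q}\,\delta q + \frac{\partial L(q,\dot q,t)}{\partial\dot q}\,\delta\dot q\right]dt. \tag{2.52} \]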
Note that both q and q̇ depend on time. Note further that ∂ /∂ q is a vector – the derivative must
be interpreted as a gradient with respect to all the components of q. The use of ∂ and not d in the
derivatives indicates that when calculating the gradient with respect to q, q̇ is considered as a constant,
and vice-versa.
Of course, δ q and δ q̇ are not independent: if we know q(t) for all t in the interval under consid-
eration, we also know the time derivative q̇. We can remove δ q̇ by partial integration:
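\[ \int_{t_1}^{t_2} \left[\frac{\partial L(q,\dot q,t)}{\partial q}\,\delta q + \frac{\partial L(q,\dot q,t)}{\partial\dot q}\,\delta\dot q\right]dt = \int_{t_1}^{t_2} \left[\frac{\partial L}{\partial q} - \frac{d}{dt}\frac{\partial L}{\partial\dot q}\right]\delta q\,dt. \tag{2.53} \]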
Because δ q is small but arbitrary, this variation can only vanish when the term in brackets on the right
hand side vanishes. Consider for example a δ q which is zero except for a very small range of t-values
around some t0 in the interval between t1 and t2 . Then the term between the square brackets must
vanish in that small range. We can do this for any small interval on the time axis, and we conclude
that the term in brackets vanishes for all t in the integration interval. So our conclusion reads

The action S[q] is stationary, that is, its variation with respect to q vanishes to first order, if
the following equations are satisfied:

\[ \frac{\partial L}{\partial q_j} = \frac{d}{dt}\frac{\partial L}{\partial\dot q_j}, \quad\text{for } j = 1,\ldots,N. \tag{2.54} \]

The equations (2.54) are called Euler equations. In the case where L is the Lagrangian of classical
mechanics, L = T − V , the equations are called Euler–Lagrange equations (note that in the above
derivation, no assumption has been made with respect to the form of L nor what it means – the only
assumption is that L depends at most on q, q̇ and t). The Euler equations have many applications
outside mechanics.
Often the following notation is used:
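\[ \delta L = \sum_j \left[\frac{\partial L}{\partial q_j} - \frac{d}{dt}\frac{\partial L}{\partial\dot q_j}\right]\delta q_j \tag{2.55} \]

\[ \frac{\delta L}{\delta q} = \frac{\partial L}{\partial q} - \frac{d}{dt}\frac{\partial L}{\partial\dot q}, \tag{2.56} \]
or, written in another way:
\[ \frac{\delta L}{\delta q_j} = \frac{\partial L}{\partial q_j} - \frac{d}{dt}\frac{\partial L}{\partial\dot q_j}. \tag{2.57} \]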
Note that (2.56) is an equality between (N-dimensional) vector quantities.
The analysis given here can be summarised by a procedure for solving a mechanical problem in
classical mechanics with conservative forces:

• Find a suitable set of coordinates which parametrises the subspace of the motion allowed for by
the constraints.

• Express the kinetic energy T and the potential V in those coordinates.

• Write down the Lagrange equations (2.54) for the Lagrangian L = T −V and solve them.

Turning again to our simple example of a pendulum, we use the Lagrangian found in (2.50) and
write down the Euler–Lagrange equation for this:

\[ \frac{\partial L}{\partial\varphi} = -mgl\sin\varphi = \frac{d}{dt}\frac{\partial L}{\partial\dot\varphi} = ml^2\ddot\varphi. \tag{2.58} \]

The solution to this equation can be found through numerical integration. In the next section we shall
encounter some more complicated examples which show the advantages of the new approach more
clearly.
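
A minimal sketch of such a numerical integration of (2.58) is given below, using a classical fourth-order Runge-Kutta scheme. The function names, the default values g = 9.81 m/s² and l = 1 m, and the choice of integrator are assumptions made for illustration; any standard ODE solver would serve equally well.

```python
import math

def pendulum_rk4(phi0, phidot0, t_end, steps, g=9.81, l=1.0):
    """Integrate l phi'' = -g sin(phi) from t = 0 to t_end; returns (phi, phidot)."""
    def rhs(phi, w):
        return w, -(g / l) * math.sin(phi)

    h = t_end / steps
    phi, w = phi0, phidot0
    for _ in range(steps):
        k1p, k1w = rhs(phi, w)
        k2p, k2w = rhs(phi + 0.5 * h * k1p, w + 0.5 * h * k1w)
        k3p, k3w = rhs(phi + 0.5 * h * k2p, w + 0.5 * h * k2w)
        k4p, k4w = rhs(phi + h * k3p, w + h * k3w)
        phi += h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
        w += h * (k1w + 2 * k2w + 2 * k3w + k4w) / 6
    return phi, w

def energy_per_mass(phi, w, g=9.81, l=1.0):
    """(T + V)/m for the pendulum, with T and V as given above (2.50)."""
    return 0.5 * l * l * w * w - g * l * math.cos(phi)

if __name__ == "__main__":
    phi, w = pendulum_rk4(1.0, 0.0, 10.0, 10000)
    print(phi, w, energy_per_mass(phi, w) - energy_per_mass(1.0, 0.0))
```

Two simple sanity checks are that the energy T + V stays constant along the computed path, and that for small amplitudes the motion repeats after the period 2π√(l/g) of the linearised equation.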

2.6 Examples

2.6.1 A system of pulleys

We consider a system of massless pulleys as in the figure below.

[Figure: pulley system with string segments l1, l2, l3 and l4, the left mass ma and the right mass mc.]

The string is also massless and furthermore inextensible. It is quite complicated to find out what
the forces on the system are when taking all the forces on the pulleys and on the wire into account.
However, it turns out that using Hamilton’s principle makes it an easy problem. The total string length
is l = l1 + l2 + l3 + l4 and is fixed. Of course l2 = l3 . Therefore, we can take l1 and l4 as generalised
coordinates, and we have:
\[ l_2 = l_3 = \tfrac12(l - l_1 - l_4). \tag{2.59} \]
The height of the central pulley is given by l2 (or l3 ), and the total potential energy is therefore given
as:
\[ V = -g\left[m_a l_1 + \frac{m_b}{2}(l - l_1 - l_4) + m_c l_4\right]. \tag{2.60} \]
The speed of the left mass ma is given by l̇1 , and that of the right one, mc , by l̇4 . Using (2.59) we find
that the speed of the central pulley is given by ½(−l̇1 − l̇4 ). The Lagrangian is therefore given as

\[ L = \frac12 m_a\dot l_1^2 + \frac12 m_c\dot l_4^2 + \frac18 m_b\left(\dot l_1^2 + \dot l_4^2 + 2\dot l_1\dot l_4\right) + g\left[m_a l_1 + \frac{m_b}{2}(l - l_1 - l_4) + m_c l_4\right]. \tag{2.61} \]

The Euler-Lagrange equations can be derived straightforwardly:

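\[ \left(m_a + \tfrac14 m_b\right)\ddot l_1 + \tfrac14 m_b\,\ddot l_4 = \left(m_a - \tfrac12 m_b\right)g; \tag{2.62a} \]
\[ \left(m_c + \tfrac14 m_b\right)\ddot l_4 + \tfrac14 m_b\,\ddot l_1 = \left(m_c - \tfrac12 m_b\right)g. \tag{2.62b} \]

The two equations can be solved for l̈1 and l̈4 and the result is
\[ \ddot l_1 = \frac{4m_a m_c + m_a m_b - 3m_c m_b}{m_c m_b + 4m_a m_c + m_a m_b}\,g; \tag{2.63a} \]
\[ \ddot l_4 = \frac{4m_a m_c + m_c m_b - 3m_a m_b}{m_a m_b + 4m_a m_c + m_b m_c}\,g. \tag{2.63b} \]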
To check whether the answer is reasonable we verify that a stationary motion (i.e. a motion with
constant velocity) is possible if mb = 2ma = 2mc . The solution is now trivial, since the right hand
sides of (2.63) vanish as should indeed be the case. We see that the Lagrange equations provide a
framework which enables us to find the equations of motion quite easily.
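
The step from (2.62) to (2.63) can be verified with a few lines of Python. The sketch below solves the linear system (2.62) by Cramer's rule and compares the result with the closed forms (2.63); the function names and g = 9.81 m/s² are illustrative assumptions.

```python
def pulley_accelerations(ma, mb, mc, g=9.81):
    """Solve the linear system (2.62a,b) for the accelerations of l1 and l4."""
    a11 = ma + mb / 4.0
    a12 = mb / 4.0          # the coupling term is the same in both equations
    a22 = mc + mb / 4.0
    b1 = (ma - mb / 2.0) * g
    b2 = (mc - mb / 2.0) * g
    det = a11 * a22 - a12 * a12
    l1_ddot = (b1 * a22 - a12 * b2) / det
    l4_ddot = (a11 * b2 - a12 * b1) / det
    return l1_ddot, l4_ddot

def pulley_closed_form(ma, mb, mc, g=9.81):
    """Closed-form accelerations (2.63a,b)."""
    denom = 4 * ma * mc + ma * mb + mb * mc
    l1_ddot = (4 * ma * mc + ma * mb - 3 * mc * mb) * g / denom
    l4_ddot = (4 * ma * mc + mc * mb - 3 * ma * mb) * g / denom
    return l1_ddot, l4_ddot

if __name__ == "__main__":
    print(pulley_accelerations(1.0, 2.5, 3.0))
    print(pulley_closed_form(1.0, 2.5, 3.0))
    print(pulley_accelerations(1.0, 2.0, 1.0))  # balanced case mb = 2 ma = 2 mc
```

In the balanced case both accelerations should vanish, in line with the check described above.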

2.6.2 Example: the spinning top

Consider a top with cylindrical symmetry. The position of the top is defined by its two polar angles
ϑ and ϕ and a third angle, ψ, defines the rotation of the top around its symmetry axis. The angular
velocity is given in terms of these three polar angles as:

ω = ϕ̇ ẑ + ϑ̇ ê + ψ̇ d̂ (2.64)

where ẑ is a unit vector along the z-axis; ê is a unit vector in the xy plane which is perpendicular to
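the axis of the top, and d̂ is a unit vector along the axis of the top. The axis of the top is shown in the
figure.

[Figure: orientation of the axis of the top, with the x-axis and the unit vector f̂ indicated.]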

From this figure, it is clear that

ê = (− sin ϕ, cos ϕ, 0) and (2.65a)

d̂ = (cos ϕ sin ϑ , sin ϕ sin ϑ , cos ϑ ). (2.65b)

And it follows that

f̂ = ê × d̂ = (cos ϕ cos ϑ , sin ϕ cos ϑ , − sin ϑ ). (2.66)
The rotational kinetic energy of the top is given by
\[ T = \tfrac12\,\boldsymbol\omega^T I\,\boldsymbol\omega \tag{2.67} \]
(the superscript T turns the column vector ω into a row vector). It is always possible to find some axes
with respect to which the moment of inertia tensor is diagonal, and as a result of the axial symmetry
of the top one diagonal element, which we shall denote by I3 , corresponds to the symmetry axis d,
and two other diagonal elements correspond to axes in the plane perpendicular to the body axis, such
as e and f – we call these elements I1 .
The kinetic energy is then given by

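\[ T = \frac12 I_1(\boldsymbol\omega\cdot\hat{\mathbf e})^2 + \frac12 I_1(\boldsymbol\omega\cdot\hat{\mathbf f})^2 + \frac12 I_3(\boldsymbol\omega\cdot\hat{\mathbf d})^2 = \frac12 I_1\dot\varphi^2\sin^2\vartheta + \frac{I_1}{2}\dot\vartheta^2 + \frac12 I_3(\dot\psi + \dot\varphi\cos\vartheta)^2. \tag{2.68} \]
The gravitational force results in a potential V = MgR cos ϑ , where M is the top’s mass and R the
distance from the point where it rests on the ground to the centre of mass. The Lagrangian therefore
reads
\[ L = \frac12 I_1\dot\varphi^2\sin^2\vartheta + \frac{I_1}{2}\dot\vartheta^2 + \frac12 I_3(\dot\psi + \dot\varphi\cos\vartheta)^2 - MgR\cos\vartheta. \tag{2.69} \]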
The Lagrange equations for ϑ , ϕ and ψ are then given by:

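\[ I_1\ddot\vartheta = I_1\dot\varphi^2\sin\vartheta\cos\vartheta - I_3(\dot\psi + \dot\varphi\cos\vartheta)\dot\varphi\sin\vartheta + MgR\sin\vartheta; \tag{2.70a} \]
\[ \frac{d}{dt}\left[I_1\dot\varphi\sin^2\vartheta + I_3(\dot\psi + \dot\varphi\cos\vartheta)\cos\vartheta\right] = 0; \tag{2.70b} \]
\[ \frac{d}{dt}\left[I_3(\dot\psi + \dot\varphi\cos\vartheta)\right] = 0. \tag{2.70c} \]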
We immediately see that ψ̇ + ϕ̇ cos ϑ is a constant of the motion – we shall call this ω3 :

ω3 = ψ̇ + ϕ̇ cos ϑ = Constant. (2.71)

ω3 denotes the component of angular velocity along the spinning axis.

Let us search for solutions of constant precession: ϑ = constant, so that ϑ̈ = 0. We furthermore set
ϕ̇ = Ω. The first Lagrange equation, (2.70a), then gives:

I1 Ω2 cos ϑ − I3 ω3 Ω + MgR = 0. (2.72)

If ω3 is large, we find the two solutions

\[ \Omega = \frac{MgR}{I_3\omega_3} \tag{2.73} \]
for which Ω is inversely proportional to ω3 and

\[ \Omega = \frac{I_3\omega_3}{I_1\cos\vartheta} \tag{2.74} \]

i.e. Ω is proportional to ω3 . The first solution corresponds to slow precession and fast spinning around
the spinning axis; the second solution corresponds to rapid precession in which the gravitational force
is negligible.
For general ω3 , the quadratic equation (2.72) with ϑ̈ = 0 has two real solutions for Ω if

I32 ω32 > 4I1 cos ϑ MgR. (2.75)

For smaller values of ω3 , a wobbling motion sets in (“nutation”).
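
The two precession rates can be computed directly from the quadratic equation (2.72) and compared with the large-ω3 approximations (2.73) and (2.74). The sketch below assumes ω3 > 0, cos ϑ > 0 and MgR > 0; the function name and the parameter values are illustrative.

```python
import math

def precession_rates(I1, I3, omega3, MgR, theta):
    """Real roots (slow, fast) of I1 cos(theta) W^2 - I3 omega3 W + MgR = 0, Eq. (2.72).

    Returns None when condition (2.75) is violated, i.e. when no steady
    precession exists. Assumes omega3 > 0, cos(theta) > 0 and MgR > 0.
    """
    a = I1 * math.cos(theta)
    disc = (I3 * omega3) ** 2 - 4.0 * a * MgR
    if disc < 0.0:
        return None
    # Numerically stable form: avoids subtracting two nearly equal numbers.
    q = 0.5 * (I3 * omega3 + math.sqrt(disc))
    slow = MgR / q
    fast = q / a
    return slow, fast

if __name__ == "__main__":
    I1, I3, omega3, MgR, theta = 1.0, 0.5, 100.0, 2.0, 0.5
    slow, fast = precession_rates(I1, I3, omega3, MgR, theta)
    print(slow, MgR / (I3 * omega3))                   # compare with (2.73)
    print(fast, I3 * omega3 / (I1 * math.cos(theta)))  # compare with (2.74)
    print(precession_rates(I1, I3, 1.0, MgR, theta))   # too slow a spin
```

For a rapidly spinning top the exact roots should lie close to the two approximations, whereas for a slow spin the function returns None.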

2.7 Non-conservative forces – charged particle in an electromagnetic field

In this section we consider one particular type of force which is not conservative, but which can still
be analysed fully within the Lagrangian approach. This is the very important example of a charged
particle in an electromagnetic field.
Suppose we have a collection of N particles which experience a non-conservative force which can
be derived from a generalised potential W (ri , ṙi ) in the following way:

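\[ \mathbf F_i = -\frac{\partial W}{\partial\mathbf r_i} + \frac{d}{dt}\frac{\partial W}{\partial\dot{\mathbf r}_i}. \tag{2.76} \]
Analogous to the previous section we can derive a variational condition, starting from d’Alembert’s
principle:
\[ \int_{t_1}^{t_2} m\ddot{\mathbf r}_i\cdot\delta\mathbf r_i\,dt = -\int_{t_1}^{t_2} \delta T\,dt = \int_{t_1}^{t_2} \left[-\frac{\partial W}{\partial\mathbf r_i} + \frac{d}{dt}\frac{\partial W}{\partial\dot{\mathbf r}_i}\right]\cdot\delta\mathbf r_i\,dt. \tag{2.77} \]
The left hand side has been transformed as in (2.46), and the procedure for the right hand side is
similar with the extension that the second term of the integrand is subject to a partial integration,
leading to
\[ -\int_{t_1}^{t_2} \delta T\,dt = \int_{t_1}^{t_2} \left[-\frac{\partial W}{\partial\mathbf r_i}\cdot\delta\mathbf r_i - \frac{\partial W}{\partial\dot{\mathbf r}_i}\cdot\delta\dot{\mathbf r}_i\right]dt = -\int_{t_1}^{t_2} \delta W\,dt. \tag{2.78} \]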

So we see that the variation of the action

\[ S[q] = \int_{t_1}^{t_2} [T - W]\,dt \tag{2.79} \]

vanishes. This can also be checked by working out the Euler-Lagrange equations, which for this action
lead directly to the classical equation of motion mr̈i = Fi .

2.7.1 Charged particle in an electromagnetic field

A point particle with charge q moving in an electromagnetic field experiences a force

F = q (E + v × B) . (2.80)

The charge q of the particle should not be confused with the generalised coordinates qi introduced
before. E is the electric field, B is the magnetic field. These fields are not independent, but they are
related through the Maxwell equations. We use the following two Maxwell equations

\[ \nabla\cdot\mathbf B = 0 \quad\text{and} \tag{2.81a} \]
\[ \nabla\times\mathbf E + \frac{\partial\mathbf B}{\partial t} = 0. \tag{2.81b} \]
We know from vector calculus that a vector field whose divergence is zero can always be written
as the curl of a vector function depending on space (and, in our case, time); applying this to (2.81a)
we see that we can write B in the form B = ∇ × A, where A is a vector function, called the vector
potential, depending on space and time. Substituting this expression for B in Eq. (2.81b) leads to
\[ \nabla\times\left(\mathbf E + \frac{\partial\mathbf A}{\partial t}\right) = 0. \tag{2.82} \]

Now we use another result from vector calculus, which says that any function whose curl is zero can
be written as the gradient of a scalar function, which in this case we call the potential, φ (r,t). This
results in the following representations of the electromagnetic field:

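\[ \mathbf E(\mathbf r,t) = -\nabla\phi(\mathbf r,t) - \frac{\partial\mathbf A}{\partial t}(\mathbf r,t); \tag{2.83a} \]
\[ \mathbf B(\mathbf r,t) = \nabla\times\mathbf A(\mathbf r,t). \tag{2.83b} \]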

In fact, by using two Maxwell equations, we have reduced the set of 6 field values (3 for E and 3 for
B) to 4 (3 for A and 1 for φ ).
As the force is velocity-dependent, it is not conservative. We are after a function W (r, ṙ) which,
when used in an action of the usual form, yields the correct equation of motion with the force (2.80).
The potential which does the job is

W (r, ṙ) = qφ (r,t) − qṙ · A(r,t) = qφ − q(ẋAx + ẏAy + żAz ). (2.84)

Note that Ax denotes the x-component and not the partial derivative with respect to x. The Lagrangian
occurring in the action is therefore:

\[ L = \tfrac12 m\dot{\mathbf r}^2 + q\dot{\mathbf r}\cdot\mathbf A(\mathbf r,t) - q\phi(\mathbf r,t). \tag{2.85} \]
To see that this Lagrangian is indeed correct we work out the force component in the x-direction.
First we calculate the derivative of the potential W with respect to x:
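\[ -\frac{\partial W}{\partial x} = -q\frac{\partial\phi}{\partial x} + q\left(\dot x\frac{\partial A_x}{\partial x} + \dot y\frac{\partial A_y}{\partial x} + \dot z\frac{\partial A_z}{\partial x}\right). \tag{2.86} \]

\[ \frac{d}{dt}\frac{\partial W}{\partial\dot x} = -q\frac{dA_x}{dt} = -q\left(\frac{\partial A_x}{\partial t} + \dot x\frac{\partial A_x}{\partial x} + \dot y\frac{\partial A_x}{\partial y} + \dot z\frac{\partial A_x}{\partial z}\right). \tag{2.87} \]
The Euler-Lagrange equations for the action contain the two contributions resulting from the potential.
We have
\[ m\ddot x = -\frac{\partial W}{\partial x} + \frac{d}{dt}\frac{\partial W}{\partial\dot x} = -q\left(\frac{\partial\phi}{\partial x} + \frac{\partial A_x}{\partial t}\right) + q\left[\dot y\left(\frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y}\right) + \dot z\left(\frac{\partial A_z}{\partial x} - \frac{\partial A_x}{\partial z}\right)\right] = qE_x + q(\dot y B_z - \dot z B_y), \tag{2.88} \]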

i.e. precisely the equation of motion with the force given in (2.80)!
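
The same check can be carried out numerically for all three components. The sketch below picks arbitrary smooth test potentials φ and A (purely hypothetical, chosen only so that all derivatives are nonzero), evaluates the force from the generalised potential (2.84) via (2.76), and compares it with the Lorentz force (2.80) built from the fields (2.83). All derivatives are approximated by central differences; the helper names and the step size H are assumptions of this sketch.

```python
import math

H = 1e-5  # finite-difference step (an arbitrary choice for this sketch)

def phi(r, t):
    """Hypothetical scalar potential used only for testing."""
    x, y, z = r
    return x * x * y + z * t

def A(r, t):
    """Hypothetical vector potential used only for testing."""
    x, y, z = r
    return (y * z * t, x * x + t * z, math.sin(x) * y)

def A_comp(i):
    """Return the i-th component of A as a scalar function of (r, t)."""
    return lambda r, t: A(r, t)[i]

def d_dx(f, r, t, axis):
    """Central difference of f(r, t) with respect to the coordinate `axis`."""
    rp, rm = list(r), list(r)
    rp[axis] += H
    rm[axis] -= H
    return (f(rp, t) - f(rm, t)) / (2 * H)

def d_dt(f, r, t):
    """Central difference of f(r, t) with respect to time."""
    return (f(r, t + H) - f(r, t - H)) / (2 * H)

def lorentz_force(q, r, v, t):
    """F = q (E + v x B), with E and B obtained from (2.83a,b)."""
    E = [-d_dx(phi, r, t, i) - d_dt(A_comp(i), r, t) for i in range(3)]
    dA = [[d_dx(A_comp(i), r, t, j) for j in range(3)] for i in range(3)]  # dA[i][j] = dA_i/dx_j
    B = [dA[2][1] - dA[1][2], dA[0][2] - dA[2][0], dA[1][0] - dA[0][1]]
    vxB = [v[1] * B[2] - v[2] * B[1], v[2] * B[0] - v[0] * B[2], v[0] * B[1] - v[1] * B[0]]
    return [q * (E[i] + vxB[i]) for i in range(3)]

def generalised_force(q, r, v, t):
    """F_i = -dW/dx_i + d/dt (dW/dv_i), with W = q phi - q v.A as in (2.84)."""
    def W(rr, tt):
        return q * phi(rr, tt) - q * sum(v[k] * A(rr, tt)[k] for k in range(3))
    force = []
    for i in range(3):
        minus_dW_dx = -d_dx(W, r, t, i)  # velocity held fixed
        # dW/dv_i = -q A_i; its total time derivative along the trajectory:
        dAi_dt = d_dt(A_comp(i), r, t) + sum(v[j] * d_dx(A_comp(i), r, t, j) for j in range(3))
        force.append(minus_dW_dx - q * dAi_dt)
    return force

if __name__ == "__main__":
    args = (1.5, (0.3, -0.7, 1.1), (0.2, 0.5, -0.4), 0.9)  # q, r, v, t
    print(lorentz_force(*args))
    print(generalised_force(*args))
```

The two printed vectors should agree up to rounding for any choice of charge, position, velocity and time.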

2.8 Hamilton mechanics

It is possible to formulate Lagrangian mechanics in a different way. At first sight this does not add
anything new to the formalism which was constructed in the previous sections, but we shall see that
this new formalism provides us with a conserved quantity which is the energy or some analogous
object. More importantly, this formalism is essential for setting up quantum mechanics in a structured
way, as will be shown in a later course.
Let us again consider a system described by a Lagrangian formulated in terms of generalised
coordinates, with the equations of motion given by:

\[ \frac{d}{dt}\frac{\partial L}{\partial\dot q_j} = \frac{\partial L}{\partial q_j}. \tag{2.89} \]

This is a second order differential equation, which we shall transform into two first order ones.
We define the canonical momenta p j as

\[ p_j = \frac{\partial L}{\partial\dot q_j}. \tag{2.90} \]

The canonical momentum should not be confused with the mechanical momentum, which is simply
∑i mṙi , although the two coincide when the generalised coordinates are simply the ri . Using the
canonical momenta, the equations of motion can be formulated as:

\[ \dot p_j = \frac{\partial L}{\partial q_j}. \tag{2.91} \]

In the particular example of a conservative system formulated in terms of the position coordinates ri :
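\[ L = \sum_{i=1}^{N} \frac{m_i}{2}\dot{\mathbf r}_i^2 - V(\mathbf r_1,\ldots,\mathbf r_N), \tag{2.92} \]

the momenta are given as

\[ \mathbf p_i = m_i\dot{\mathbf r}_i \tag{2.93} \]
and the equations of motion are
\[ \dot{\mathbf p}_i = -\frac{\partial V}{\partial\mathbf r_i}. \tag{2.94} \]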
We see that in the case of a particle moving in a conservative force field, the generalised momentum
corresponds to the usual definition of momentum.

We have reformulated the Euler-Lagrange equation as two first-order differential equations. The
Euler-Lagrange equations were derived from a variational principle, the Hamilton principle, which
requires the action to be stationary for the mechanical path. We may ask ourselves if it is possible to
define our two new equations in terms of the same variational principle. This turns out to be the case
indeed. If a variational principle is to lead to two equations for each generalised coordinate, the corresponding functional to be made stationary should have two independent parameters per generalised coordinate $q_j$ which should be varied. Of course, in addition to the generalised coordinate $q_j$ we use $p_j$ for the second coordinate. We know the form of the Lagrangian in terms of $q_j$ and $p_j$ (the parameter $\dot{q}_j$ disappears from the description once it is expressed in terms of the momenta).
The problem is that straightforward application of variational calculus with respect to $q_j$ and $p_j$ is quite intricate. In fact, in order to simplify the derivation of the variational principle, it is useful to introduce a new function, called the Hamiltonian $H$, depending on the generalised coordinates and momenta, and the time $t$, as follows:

$$ H(p_j, q_j, t) = \sum_j p_j \dot{q}_j - L\left[q_j, \dot{q}_j(q_k, p_k), t\right]. \qquad (2.95) $$

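Note that we can indeed express $\dot{q}_j$ in terms of the $p_k$ and $q_k$, as indicated in the second argument of $L$, by inversion of Eq. (2.90) (see footnote 1).
Let us calculate the derivatives of $H$ with respect to $q_j$ and $p_j$:

$$ \frac{\partial H}{\partial p_j} = \dot{q}_j + \sum_k p_k \frac{\partial \dot{q}_k}{\partial p_j} - \sum_k \frac{\partial L}{\partial \dot{q}_k}\frac{\partial \dot{q}_k}{\partial p_j}. \qquad (2.96) $$

Note that it follows from (2.90) that the second and third terms on the right hand side cancel, so that we are left with

$$ \frac{\partial H}{\partial p_j} = \dot{q}_j. \qquad (2.97) $$

Now let us calculate the derivative with respect to $q_j$:

$$ \frac{\partial H}{\partial q_j} = -\frac{\partial L}{\partial q_j} + \sum_k p_k \frac{\partial \dot{q}_k}{\partial q_j} - \sum_k \frac{\partial L}{\partial \dot{q}_k}\frac{\partial \dot{q}_k}{\partial q_j}. \qquad (2.98) $$

Again using (2.90) we see that the second and third terms on the right hand side cancel. Furthermore, by (2.91) the first term on the right hand side is equal to $-\dot{p}_j$, and we are left with:

$$ \frac{\partial H}{\partial q_j} = -\dot{p}_j. \qquad (2.99) $$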

Eqs. (2.97) and (2.99), together with the definition of the Hamiltonian (2.95) and of the momentum (2.90), are equivalent to the equations of motion. Eqs. (2.97) and (2.99) are called Hamilton’s equations. Note that we must consider the generalised coordinates and the canonical momenta as independent coordinates, in contrast to the Lagrange picture, in which $q_j$ and $\dot{q}_j$ are related by

$$ \dot{q}_j = \frac{dq_j}{dt}. $$
This independence of coordinates and momenta is needed in order to arrive at the correct equations of
motion. When these equations are solved, we obtain relations between them. It is very important to
realise the difference between the formal independence of the coordinates at the level of formulating
the Hamiltonian and deriving the equations of motion and the dependence which is a consequence of
the solution of these equations.
If the system does not depend explicitly on time, the Hamiltonian is the analogue of the energy.
The simplest case is a conservative system with the positions ri as coordinates. In that case it is easy
to see that
$$ H = \sum_{i=1}^{N} \frac{\mathbf{p}_i^2}{2m_i} + V(\mathbf{r}_1, \ldots, \mathbf{r}_N). \qquad (2.100) $$

More generally, let us consider a conservative system formulated in terms of generalised coordinates
q1 , . . . , qs . Note the difference with Eq. (2.2), where ri may contain an explicit time-dependence – in
the present case we assume that the constraints have no explicit time-dependence. In that case it is
possible to express the position coordinates ri in terms of the s generalised coordinates q j , j = 1, . . . , s:

$$ \mathbf{r}_i = \mathbf{r}_i(q_1, \ldots, q_s) \qquad (2.101) $$

and therefore the velocities can be calculated as

$$ \dot{\mathbf{r}}_i = \sum_{j=1}^{s} \frac{\partial \mathbf{r}_i}{\partial q_j}\dot{q}_j. \qquad (2.102) $$

Therefore, if we formulate the kinetic energy $\sum_i \frac{1}{2} m_i \dot{\mathbf{r}}_i^2$ in terms of the generalised coordinates, we obtain an expression which is quadratic in the $\dot{q}_j$:

$$ T = \sum_{k,j=1}^{s} M_{kj}(q_1, \ldots, q_s)\,\dot{q}_j \dot{q}_k \qquad (2.103) $$

with

$$ M_{jk} = M_{kj} = \sum_{i=1}^{N} \frac{m_i}{2}\frac{\partial \mathbf{r}_i}{\partial q_j}\cdot\frac{\partial \mathbf{r}_i}{\partial q_k}. \qquad (2.104) $$

Footnote 1: For this inversion to be possible, the Lagrangian should be convex, but we shall not go into details concerning this point.

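If we calculate the contribution to the momenta arising from the kinetic energy, we find that they depend linearly on the $\dot{q}_j$:

$$ p_j = \frac{\partial T}{\partial \dot{q}_j} = \sum_{k=1}^{s} \left(M_{jk} + M_{kj}\right)\dot{q}_k = 2\sum_{k=1}^{s} M_{kj}\dot{q}_k, \qquad (2.105) $$

so that

$$ \sum_j \dot{q}_j p_j = 2T \qquad (2.106) $$

and therefore

$$ H = 2T - (T - V) = T + V = \text{Energy}. \qquad (2.107) $$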
For a general system Hamilton’s equations of motion can be used to derive the time derivative of the Hamiltonian:

$$ \frac{dH}{dt} = \sum_{j=1}^{s} \frac{\partial H}{\partial q_j}\dot{q}_j + \sum_{j=1}^{s} \frac{\partial H}{\partial p_j}\dot{p}_j + \frac{\partial H}{\partial t}. \qquad (2.108) $$

Using Hamilton’s equations of motion (2.97) and (2.99) we see that the first two terms on the right hand side cancel and we are left with:

$$ \frac{dH}{dt} = \frac{\partial H}{\partial t}. \qquad (2.109) $$
We see therefore that if H (or L) does not depend explicitly on time, then H is a conserved quantity. If
the potential does not contain a q̇ j dependence, this implies conservation of energy. If the potential on
the other hand does contain such a dependence, then (2.109) implies conservation of some quantity
which plays a role more or less equivalent to energy.
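
The conservation of $H$ can be illustrated numerically. The following is a minimal sketch, not part of the original notes: it integrates Hamilton's equations for a plane pendulum with the assumed Hamiltonian $H = p^2/(2ml^2) - mgl\cos q$, using a leapfrog scheme chosen here only for illustration. The function names, parameter values and step size are all assumptions.

```python
import math

# Assumed illustration parameters (not from the notes).
M, L, G = 1.0, 1.0, 9.81


def hamiltonian(q, p):
    """H = p^2 / (2 m l^2) - m g l cos(q) for a plane pendulum; q is the angle."""
    return p * p / (2.0 * M * L * L) - M * G * L * math.cos(q)


def dH_dq(q):
    """Partial derivative of H with respect to q (equals -dp/dt)."""
    return M * G * L * math.sin(q)


def dH_dp(p):
    """Partial derivative of H with respect to p (equals dq/dt)."""
    return p / (M * L * L)


def integrate(q, p, dt, n_steps):
    """Leapfrog integration of dq/dt = dH/dp, dp/dt = -dH/dq."""
    energies = [hamiltonian(q, p)]
    for _ in range(n_steps):
        p -= 0.5 * dt * dH_dq(q)   # half step in momentum
        q += dt * dH_dp(p)         # full step in coordinate
        p -= 0.5 * dt * dH_dq(q)   # second half step in momentum
        energies.append(hamiltonian(q, p))
    return q, p, energies


q_end, p_end, energies = integrate(q=1.0, p=0.0, dt=1e-3, n_steps=10_000)
drift = max(abs(e - energies[0]) for e in energies)
print(f"maximum deviation of H from its initial value: {drift:.2e}")
```

Since this $H$ has no explicit time dependence, Eq. (2.109) predicts a constant value; the small residual drift is a discretisation error of the integrator and shrinks with the step size.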

2.9 Applications of the Hamiltonian formalism

In this section we shall reconsider the systems studied before in the Lagrange framework and point out
which features are different when these systems are considered within the Hamiltonian framework.
From the derivation of the Hamiltonian and Hamilton’s equations, it is seen that the latter can be
viewed as a different way of writing Lagrange’s equations. The reason for introducing the Hamiltonian
and Hamilton’s equations is that they are often used in quantum mechanics and because the Hamilton
formalism is more convenient for discovering some conserved quantities.

2.9.1 The three-pulley system

From the Lagrangian (2.61), the momenta $p_1$ and $p_4$ associated with the degrees of freedom $l_1$ and $l_4$ are found as:

$$ p_1 = \left(m_a + \frac{m_b}{4}\right)\dot{l}_1 + \frac{m_b}{4}\dot{l}_4; \qquad (2.110a) $$

$$ p_4 = \left(m_c + \frac{m_b}{4}\right)\dot{l}_4 + \frac{m_b}{4}\dot{l}_1. \qquad (2.110b) $$

After some calculation, we therefore find for the Hamiltonian:

$$ H = \frac{1}{2\Delta}\left[m_c p_1^2 + m_a p_4^2 + \frac{m_b}{4}(p_1 - p_4)^2\right] - g\left[m_a l_1 + \frac{m_b}{2}(l - l_1 - l_4) + m_c l_4\right], \qquad (2.111) $$

where

$$ \Delta = (m_a + m_c)\,m_b/4 + m_a m_c. \qquad (2.112) $$

The Hamilton equations read:

$$ \dot{p}_1 = \left(m_a - \frac{m_b}{2}\right)g; \qquad (2.113a) $$

$$ \dot{p}_4 = \left(m_c - \frac{m_b}{2}\right)g. \qquad (2.113b) $$

The solution is simple since the right hand sides of these equations are constants:

$$ p_1 = \left(m_a - \frac{m_b}{2}\right)gt; \qquad (2.114a) $$

$$ p_4 = \left(m_c - \frac{m_b}{2}\right)gt, \qquad (2.114b) $$

where the initial conditions are that the system is standing still at $t = 0$. Together with Eqs. (2.110), we obtain the same solution as in the Lagrangian case. We see that the differences between the two approaches are not very dramatic in this case. Note that it is now easy to see from (2.113) that for $m_a = m_c = m_b/2$ the system is in equilibrium.

2.9.2 The spinning top

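From the Lagrangian, we can derive the momenta associated with the three degrees of freedom $\varphi$, $\vartheta$ and $\psi$:

$$ p_\varphi = I_1 \dot{\varphi}\sin^2\vartheta + I_3(\dot{\psi} + \dot{\varphi}\cos\vartheta)\cos\vartheta; \qquad (2.115a) $$

$$ p_\vartheta = I_1 \dot{\vartheta}; \qquad (2.115b) $$

$$ p_\psi = I_3(\dot{\psi} + \dot{\varphi}\cos\vartheta). \qquad (2.115c) $$

If we want to express the kinetic energy in terms of these momenta, we need to solve for the time derivatives of the angular coordinates $\vartheta$, $\varphi$ and $\psi$ in terms of these momenta:

$$ \dot{\varphi} = \frac{p_\varphi - p_\psi\cos\vartheta}{I_1\sin^2\vartheta}; \qquad (2.116a) $$

$$ \dot{\vartheta} = \frac{p_\vartheta}{I_1}; \qquad (2.116b) $$

$$ \dot{\psi} = \frac{p_\psi}{I_3} - \frac{p_\varphi - p_\psi\cos\vartheta}{I_1\sin^2\vartheta}\cos\vartheta. \qquad (2.116c) $$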

After some calculation, the Hamiltonian is then found to be

$$ H = \frac{(p_\varphi - p_\psi\cos\vartheta)^2}{2I_1\sin^2\vartheta} + \frac{p_\vartheta^2}{2I_1} + \frac{p_\psi^2}{2I_3} + Mgr\cos\vartheta. \qquad (2.117) $$

As the Hamiltonian does not depend on $\psi$ and $\varphi$, we see immediately that $p_\psi$ and $p_\varphi$ must be constant. Coordinates which do not themselves appear in the Hamiltonian, so that only their conjugate momenta appear, are called ignorable: these momenta are constant in time and represent constants of the motion.
The Hamiltonian now reduces to a simple form:

$$ H = \frac{p_\vartheta^2}{2I_1} + U(\vartheta), \qquad (2.118) $$

with

$$ U(\vartheta) = \frac{(p_\varphi - p_\psi\cos\vartheta)^2}{2I_1\sin^2\vartheta} + \frac{p_\psi^2}{2I_3} + Mgr\cos\vartheta. \qquad (2.119) $$

The Hamilton equations yield

$$ -I_1\ddot{\vartheta} = -\dot{p}_\vartheta = \frac{\partial U}{\partial \vartheta}. \qquad (2.120) $$

This equation is difficult to solve analytically. Note that apart from the momenta conjugate to the ignorable coordinates, we have an additional constant of the motion, the energy:

$$ \frac{p_\vartheta^2}{2I_1} + U(\vartheta) = E = \text{constant}. \qquad (2.121) $$
The motion and its analysis will be considered in a worksheet.

2.9.3 Charged particle in an electromagnetic field

Finally we consider again the charged particle in an electromagnetic field. The momentum can be
found as usual from the Lagrangian – we obtain

p = mṙ + qA. (2.122)

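The Hamiltonian is

$$ H = \mathbf{p}\cdot\dot{\mathbf{r}} - \frac{m}{2}\dot{\mathbf{r}}^2 - q\,\dot{\mathbf{r}}\cdot\mathbf{A} + q\phi = \frac{m}{2}\dot{\mathbf{r}}^2 + q\phi(\mathbf{r}) = \frac{(\mathbf{p} - q\mathbf{A})^2}{2m} + q\phi(\mathbf{r}). \qquad (2.123) $$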
You might already know that this Hamiltonian is used in quantum mechanics for a particle in an
electromagnetic field.

The two-body problem

In this chapter we consider the two-body problem within the framework of Lagrangian mechanics.
One of the most impressive results of classical mechanics is the correct description of the planetary
motion around the sun, which is equivalent to electric charges moving in each other’s field. With
the analytic solution of this problem, we shall recover the famous Kepler laws. The problem is also
important in quantum mechanics: the hydrogen atom is a quantum version of the Kepler problem.

3.1 Formulation and analysis of the two-body problem

The two-body problem describes two point particles with masses m1 and m2 . We denote their positions
by r1 and r2 respectively, and their relative position, r2 − r1 , by r. Finding the Lagrangian is quite
simple. The kinetic energy is the sum of the kinetic energies of the two particles, and the potential
energy is the interaction, which depends only on the separation r = |r| of the two particles, and is
directed along the line connecting them (note that this last restriction excludes magnetic interactions).
We therefore have:
$$ L = \frac{m_1}{2}\dot{\mathbf{r}}_1^2 + \frac{m_2}{2}\dot{\mathbf{r}}_2^2 - V(r). \qquad (3.1) $$
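Before deriving the equations of motion, we note that instead of writing the kinetic energy as the sum of the kinetic energies of the two particles, it can also be separated into the kinetic energy of the centre of mass and that of the relative motion, as in Eq. (1.33):

$$ T = T_{\rm CM} + \sum_{i=1}^{2} \frac{m_i}{2}\dot{\mathbf{r}}_i'^{\,2}, \qquad (3.2) $$

where

$$ \mathbf{r}_i' = \mathbf{r}_i - \mathbf{r}_{\rm CM}; \qquad (3.3) $$

$$ \mathbf{r}_{\rm CM} = \frac{m_1\mathbf{r}_1 + m_2\mathbf{r}_2}{M}, \qquad (3.4) $$

$$ T_{\rm CM} = \frac{M}{2}\dot{\mathbf{r}}_{\rm CM}^2, \qquad M = m_1 + m_2. \qquad (3.5) $$

As there are only two particles, we can work out the coordinates $\mathbf{r}_i'$ relative to the centre of mass $\mathbf{r}_{\rm CM}$, and we find, using Eq. (3.4):

$$ \mathbf{r}_1' = \mathbf{r}_1 - \mathbf{r}_{\rm CM} = \frac{m_2}{M}(\mathbf{r}_1 - \mathbf{r}_2), \qquad (3.6) $$

$$ \mathbf{r}_2' = \mathbf{r}_2 - \mathbf{r}_{\rm CM} = \frac{m_1}{M}(\mathbf{r}_2 - \mathbf{r}_1). \qquad (3.7) $$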

We can take time derivatives by simply putting a dot on each $\mathbf{r}$ in these equations and then, after some calculation, we find for the kinetic energy:

$$ T = \frac{M}{2}\dot{\mathbf{r}}_{\rm CM}^2 + \frac{m_1 m_2}{2M}\dot{\mathbf{r}}^2. \qquad (3.8) $$

The Lagrangian is therefore

$$ L = T - V = \frac{M}{2}\dot{\mathbf{r}}_{\rm CM}^2 + \frac{m_1 m_2}{2M}\dot{\mathbf{r}}^2 - V(r). \qquad (3.9) $$

We see that the kinetic energy of the relative motion has the form of the kinetic energy of a single particle of mass $m_1 m_2/(m_1 + m_2)$ and position vector $\mathbf{r}(t)$. The mass term $\mu = m_1 m_2/(m_1 + m_2)$ is called the reduced mass.
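
The separation of the kinetic energy into a centre-of-mass part and a relative part with the reduced mass can be checked numerically. The sketch below is an illustration only; the function names and the sample masses and velocities are arbitrary assumptions.

```python
def squared_norm(v):
    """Squared Euclidean length of a vector given as a sequence of components."""
    return sum(c * c for c in v)


def kinetic_direct(m1, m2, v1, v2):
    """T as the sum of the kinetic energies of the two particles."""
    return 0.5 * m1 * squared_norm(v1) + 0.5 * m2 * squared_norm(v2)


def kinetic_split(m1, m2, v1, v2):
    """T as centre-of-mass kinetic energy plus reduced-mass kinetic energy."""
    total_mass = m1 + m2
    mu = m1 * m2 / total_mass                        # reduced mass
    v_cm = [(m1 * a + m2 * b) / total_mass for a, b in zip(v1, v2)]
    v_rel = [b - a for a, b in zip(v1, v2)]          # time derivative of r = r2 - r1
    return 0.5 * total_mass * squared_norm(v_cm) + 0.5 * mu * squared_norm(v_rel)


# Arbitrary sample values (assumed for illustration).
m1, m2 = 2.0, 3.0
v1, v2 = (1.0, -2.0, 0.5), (-0.3, 0.4, 2.0)
difference = kinetic_direct(m1, m2, v1, v2) - kinetic_split(m1, m2, v1, v2)
print(f"difference between the two expressions: {difference:.2e}")
```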
Of course we could write down the Euler–Lagrange equations for this Lagrangian as before, but it is convenient to perform a further separation: that of the kinetic energy of the relative coordinate into a radial and a tangential part. First we must realise that the plane through the origin which contains the initial relative position $\mathbf{r}$ and the initial velocity vector $\dot{\mathbf{r}}$ will always remain the plane of the motion, as the force acts only within that plane. In this plane, we choose an $x$ and a $y$ axis. Then we can conveniently introduce polar coordinates $r$ and $\varphi$, in which the $x$ and $y$ coordinates can be expressed as follows:

$$ x = r\cos\varphi; \qquad (3.10) $$

$$ y = r\sin\varphi. \qquad (3.11) $$

It then immediately follows that the kinetic energy of the relative motion can be rewritten as

$$ \frac{\mu}{2}\left(\dot{x}^2 + \dot{y}^2\right) = \frac{\mu}{2}\left(\dot{r}^2 + r^2\dot{\varphi}^2\right). \qquad (3.12) $$

The Lagrange equations given in Eq. (2.54) then take the form:

$$ M\ddot{\mathbf{r}}_{\rm CM} = 0; \qquad (3.13a) $$

$$ \frac{d}{dt}\left(\mu r^2\dot{\varphi}\right) = 0; \qquad (3.13b) $$

$$ \mu\ddot{r} - \mu r\dot{\varphi}^2 = -\frac{dV(r)}{dr}. \qquad (3.13c) $$
The first equation tells us that the centre of mass moves at constant velocity: it does not feel a net force. This follows from the fact that it does not appear in the potential, and is in accordance with the conservation of total momentum in the absence of external forces. Coordinates such as $\mathbf{r}_{\rm CM}$, with constant canonical momentum, are called ignorable (see section 2.9.2). The second and third equations do not depend on $\mathbf{r}_{\rm CM}$; therefore we see that the relative motion can entirely be understood in terms of a single particle with mass $\mu$ moving in a plane under the influence of a potential $V(r)$.
We now use the second equation to eliminate $\dot{\varphi}$. First note that the term in brackets occurring in this equation must be a constant. We call this constant $\ell$ ($\ell = \mu r^2\dot{\varphi}$ is precisely the angular momentum, and we see that it is conserved). The third equation then transforms into

$$ \mu\ddot{r} - \frac{\ell^2}{\mu r^3} = -\frac{dV(r)}{dr}. \qquad (3.14) $$

Note that this equation can be viewed as that of a one-dimensional particle subject to a force $F = \ell^2/(\mu r^3) - dV(r)/dr$. Such a force can in turn be derived from a conservative potential:

$$ F(r) = -\frac{d}{dr}V_{\rm Eff}(r) = -\frac{d}{dr}\left[\frac{\ell^2}{2\mu r^2} + V(r)\right]. \qquad (3.15) $$
[Figure 3.1: Effective potential for a two-particle system. The plot shows $V_{\rm Eff}(r)$ with the turning point $r_{\min}$ for $E > 0$ and the turning points $r_{\min}$ and $r_{\max}$ for $E < 0$.]

The subscript Eff is used to distinguish between this ‘effective’ potential and the original, bare attraction potential $V(r)$. The potential $V_{\rm Eff}$ is represented in figure 3.1 for the case $V(r) = -1/r$. From Eqs. (3.13b) and (3.13c) and from figure 3.1, we can infer the qualitative behaviour of the motion.
We have seen [Eq. (3.13b)] that the angular momentum is constant. This implies that the motion will always keep the same orientation (i.e. clockwise or anti-clockwise). If the particles move apart, the speed at which they orbit around each other will be slower (since increasing $r$ implies decreasing $\dot{\varphi}$).
The motion in the radial direction can be understood qualitatively as follows. Note that we can interpret Eq. (3.13c) as the motion of a particle in one dimension. This particle has a mechanical energy which is the sum of its kinetic energy and the effective potential, and this energy should remain constant. Furthermore, the energy cannot be lower than the lowest value of the effective potential shown in figure 3.1. If it lies between this value and 0, then $r$ will vary between some minimum and maximum value as shown in this figure. If $E$ is on the other hand positive, $r$ will vary between some minimum value and infinity.
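
The turning points of the radial motion follow from setting the effective potential equal to the energy. As a minimal sketch for the special case $V(r) = -A/r$, the condition $\ell^2/(2\mu r^2) - A/r = E$ is a quadratic equation in $r$. The function names and the default values $\mu = A = 1$ below are assumptions made for illustration.

```python
import math


def effective_potential(r, ell, mu=1.0, A=1.0):
    """V_Eff(r) = ell^2 / (2 mu r^2) - A / r for an attractive 1/r potential."""
    return ell ** 2 / (2.0 * mu * r ** 2) - A / r


def turning_points(E, ell, mu=1.0, A=1.0):
    """Return (r_min, r_max) where V_Eff(r) = E; r_max is infinite for E >= 0."""
    disc = A ** 2 + 2.0 * E * ell ** 2 / mu
    if disc < 0.0:
        raise ValueError("energy lies below the minimum of the effective potential")
    root = math.sqrt(disc)
    if E < 0.0:
        # Bound motion: both roots of E r^2 + A r - ell^2/(2 mu) = 0 are positive.
        return (A - root) / (-2.0 * E), (A + root) / (-2.0 * E)
    if E == 0.0:
        return ell ** 2 / (2.0 * mu * A), math.inf
    # Unbound motion: only one positive root.
    return (root - A) / (2.0 * E), math.inf


r_min, r_max = turning_points(E=-0.25, ell=1.0)
print(f"bound orbit: r varies between {r_min:.4f} and {r_max:.4f}")
```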
We have seen that the r-component of the two-body motion can be described in terms of a single
particle in one dimension. The energy of this particle is the sum of its kinetic and potential energy
– the latter is the effective potential [see Eq. (3.15)]. It turns out that this energy is equal to the total
energy of the two-particle system (neglecting the contribution of the centre of mass motion to the
latter). As we have already worked out the kinetic and potential energy of the two-body problem
above, we immediately see that

$$ E = T + V = \frac{\mu}{2}\left(\dot{r}^2 + r^2\dot{\varphi}^2\right) + V(r), \qquad (3.16) $$

which can easily be identified as the kinetic energy µ ṙ2 /2 of the one-dimensional particle plus the
effective potential.

3.2 Solution of the Kepler problem

The special case V (r) = −A/r is very important as it describes the gravitational and the Coulomb
attraction. Also, in this special case, the motion can be studied further by analytical means. Finding
the solution in the form r(t), ϕ(t) is not convenient – rather, we search for r(ϕ), which contains
explicit information about the shape of the orbit.
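We use the fact that the angular momentum $\ell = \mu r^2\dot{\varphi}$ is constant and combine this with the fact that the energy is constant and given by (3.16):

$$ \dot{\varphi} = \frac{\ell}{\mu r^2}; \qquad (3.17) $$

$$ \dot{r}^2 = \frac{2}{\mu}(E - V) - \frac{\ell^2}{\mu^2 r^2}. \qquad (3.18) $$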

Eliminating the $dt$ of the time derivatives by dividing (3.17) by the square root of (3.18) leads to

$$ \frac{d\varphi}{dr} = \frac{\pm\ell}{r^2\left[2\mu(E - V(r)) - \ell^2/r^2\right]^{1/2}}. \qquad (3.19) $$

With $V(r) = -A/r$ this can directly be integrated to give

$$ \varphi - C = \arcsin\frac{\mu A r - \ell^2}{\varepsilon\mu A r}. \qquad (3.20) $$

In addition to the integration constant $C$ on the left hand side, we see a constant $\varepsilon$, called the eccentricity, which is given in terms of the problem parameters as

$$ \varepsilon = \sqrt{1 + \frac{2E\ell^2}{\mu A^2}}. \qquad (3.21) $$

Inverting Eq. (3.20) to find $r$ as a function of the polar angle $\varphi$ gives:

$$ r = \frac{\ell^2}{\mu A\left[1 - \varepsilon\sin(\varphi - C)\right]}. \qquad (3.22) $$

We have some freedom in choosing $C$: it changes the definition of the angle $\varphi$. If we take $\varphi = 0$ as the angle for which the two particles are closest (perihelion), we see that $C = \pi/2$.
The motion can now be classified according to the value of $\varepsilon$. We take $\varepsilon$ positive; changing the sign of $\varepsilon$ does not change the shape of the orbit (putting $\varphi \to \varphi + \pi$ compensates this sign change). For $\varepsilon = 0$, $r$ does not depend on $\varphi$. This corresponds to a circle. If $0 < \varepsilon < 1$, we have an ellipse ($r$ varies between some maximum and minimum value). For $\varepsilon = 1$, we have a parabola ($r \to \infty$ for $\varphi = \pi$), and for $\varepsilon > 1$ we have a hyperbola ($r \to \infty$ for $\cos\varphi = -1/\varepsilon$).
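
The classification can be turned into a small numerical check. The sketch below assumes the standard Kepler-problem expression $\varepsilon = \sqrt{1 + 2E\ell^2/(\mu A^2)}$ for the eccentricity; the function names and the tolerance are illustrative assumptions.

```python
import math


def eccentricity(E, ell, mu, A):
    """Eccentricity of a Kepler orbit with energy E and angular momentum ell."""
    # max() guards against tiny negative values caused by rounding for circular orbits.
    return math.sqrt(max(0.0, 1.0 + 2.0 * E * ell ** 2 / (mu * A ** 2)))


def classify_orbit(eps, tol=1e-9):
    """Name the conic section belonging to a non-negative eccentricity eps."""
    if eps < tol:
        return "circle"
    if abs(eps - 1.0) <= tol:
        return "parabola"
    if eps < 1.0:
        return "ellipse"
    return "hyperbola"


# Sample energies for mu = A = ell = 1 (assumed units).
for energy in (-0.5, -0.25, 0.0, 0.5):
    eps = eccentricity(energy, ell=1.0, mu=1.0, A=1.0)
    print(f"E = {energy:+.2f}: eps = {eps:.4f} -> {classify_orbit(eps)}")
```

Negative energies give bound orbits (circle or ellipse), while $E = 0$ and $E > 0$ give the parabola and the hyperbola, in line with the discussion of the effective potential.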
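Usually, the notation

$$ \lambda = \frac{\ell^2}{\mu A}\,\frac{1}{1 + \varepsilon} \qquad (3.23) $$

is used, so that the equation relating the two polar coordinates on the curve of the motion reads:

$$ r = \frac{\lambda(1 + \varepsilon)}{1 + \varepsilon\cos\varphi}. \qquad (3.24) $$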
[Figure 3.2: Ellipse with various parameters indicated: the semi-major axis $a$, the focal distance $a\varepsilon$ and the focal points $F_1$ and $F_2$.]

In figure 3.2, we indicate the semi-major and semi-minor axes $a$ and $b$ respectively and the focal points. The semi-major axis can be related to the parameters we use to represent the motion:

$$ a = \frac{\lambda}{1 - \varepsilon}. \qquad (3.25) $$

The area of an ellipse in terms of its semi-major axis is $\pi a^2\sqrt{1 - \varepsilon^2}$. This can be related to the angular momentum by realising that the infinitesimal area swept by a line from the origin to the point of the motion is given by $r^2 d\varphi/2$. This tells us that the rate at which this area changes is given as $r^2\dot{\varphi}/2 = \ell/(2\mu)$, so that the total area which is swept in one revolution of period $T$ is equal to $T\ell/(2\mu)$, so that we have

$$ \frac{\ell}{2\mu}T = \pi a^2\sqrt{1 - \varepsilon^2}. \qquad (3.26) $$

The quantities $a$ and $\varepsilon$ are not independent: remember $a = \lambda/(1 - \varepsilon)$; furthermore $\ell$ is related to $\lambda$ and $\varepsilon$ [see Eq. (3.23)]. Using this to eliminate $\varepsilon$ finally leads to

$$ T^2 = \frac{4\pi^2\mu}{A}a^3. \qquad (3.27) $$
We have now recovered all three laws of Kepler:

• All planets move around the sun in elliptical paths. In fact, most planets have eccentricities very close to zero.
• A line drawn from the sun to a planet sweeps out equal areas in equal times. The rate at which this area increases is given by $\ell/(2\mu)$, as we have seen above.
• The squares of the periods of revolution of the planets about the sun are proportional to the cubes of the semimajor axes of the ellipses. See Eq. (3.27).
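
As a rough numerical illustration of the third law, we can evaluate $T = 2\pi\sqrt{\mu a^3/A}$ with $A = G m_1 m_2$ for the earth-sun system. The constants below are approximate textbook values inserted here as assumptions; they do not come from these notes.

```python
import math

# Approximate values, assumed for illustration.
G_NEWTON = 6.674e-11       # gravitational constant in m^3 kg^-1 s^-2
MASS_SUN = 1.989e30        # kg
MASS_EARTH = 5.972e24      # kg
SEMI_MAJOR_AXIS = 1.496e11 # m, semi-major axis of the earth's orbit


def orbital_period(a, m1, m2):
    """Period from T^2 = 4 pi^2 mu a^3 / A with A = G m1 m2 and mu the reduced mass."""
    mu = m1 * m2 / (m1 + m2)
    A = G_NEWTON * m1 * m2
    return 2.0 * math.pi * math.sqrt(mu * a ** 3 / A)


period_days = orbital_period(SEMI_MAJOR_AXIS, MASS_SUN, MASS_EARTH) / 86400.0
print(f"orbital period of the earth: {period_days:.1f} days")
```

The result should come out close to one year, and the ratio $T^2/a^3$ depends only on the total mass $m_1 + m_2$, which is why it is nearly the same for all planets.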

Examples of variational calculus, constraints

4.1 Variational problems

In the previous chapters, we have considered a reformulation of classical mechanics in terms of a variational principle. This will lead the way to formulating quantum mechanics, which is the subject of the next chapter. In this chapter we make an excursion which is still in the field of classical problems, though not classical dynamics as in the previous chapters. In fact, variational calculus is not only useful for mechanics. Many physical problems which occur in every day life can be formulated as variational problems. In the next sections we shall consider a few examples.
We shall first introduce some further analysis concerning the problems we are about to treat in this chapter. Consider an expression of the form

$$ J = \int dx\, F(y, y', x). \qquad (4.1) $$

We have used a notation which differs from that used in previous chapters in order to emphasise that $J$ is not always the action and $F$ not always the Lagrangian. $J$ assigns a real value to every function $y(x)$; it is called a functional. There is a whole branch of mathematics, called functional analysis, dedicated to such objects. Here we shall only consider finding the stationary solutions (minima, maxima or saddle points) of $J$; they are given as the solutions to the Euler equations

$$ \frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'} = 0. \qquad (4.2) $$

In the case where $F$ does not depend explicitly on $x$, i.e.

$$ F = F(y, y'), \qquad (4.3) $$

we can directly integrate the Euler equation(s) once: by multiplying the Euler equation by $y'$ we find

$$ \frac{d}{dx}\left[F(y, y') - y'\frac{\partial F(y, y')}{\partial y'}\right] = 0. \qquad (4.4) $$

From this it follows that the solution must obey

$$ F(y, y') - y'\frac{\partial F(y, y')}{\partial y'} = \text{Constant}. \qquad (4.5) $$

This is a first order differential equation: we have integrated the second order Euler equations once.
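
The step from the Euler equation to the first integral can be spelled out as follows. This is a short worked derivation added for clarity; it assumes only that $F$ has no explicit $x$-dependence.

```latex
% Total derivative of F(y, y') along a curve y(x):
\frac{dF}{dx} = \frac{\partial F}{\partial y}\,y' + \frac{\partial F}{\partial y'}\,y''.

% Derivative of the combination F - y' \partial F/\partial y':
\frac{d}{dx}\left[F - y'\frac{\partial F}{\partial y'}\right]
  = \frac{\partial F}{\partial y}\,y' + \frac{\partial F}{\partial y'}\,y''
    - y''\frac{\partial F}{\partial y'} - y'\frac{d}{dx}\frac{\partial F}{\partial y'}
  = y'\left[\frac{\partial F}{\partial y} - \frac{d}{dx}\frac{\partial F}{\partial y'}\right].

% The bracket on the right is the left hand side of the Euler equation, so it vanishes:
\frac{d}{dx}\left[F - y'\frac{\partial F}{\partial y'}\right] = 0.
```

The same manipulation, with $x \to t$, $y \to q$ and $F \to L$, shows that $\dot{q}\,\partial L/\partial\dot{q} - L$ is conserved when $L$ has no explicit time dependence.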

4.2 The brachistochrone

Near the end of the 17th century, Johann (Jean) Bernoulli was studying a problem, which we formulate as follows. Suppose you are to design a monorail in an amusement park. There is a section of track in your monorail where the trolleys, which arrive at some high point A with low (approximately zero) speed, should move to another place B under the influence of gravity (no motor is used and friction is neglected) in the shortest possible time. The problem is to design the shape of the track in order to achieve this goal. It will be clear that the track lies in a plane. Let us first consider the possible solutions heuristically.
One could argue that a straight line would be the best solution because it is the shortest path between
A and B. On the other hand, it would seem favourable to increase the particle’s velocity as much as
possible in the beginning of the motion. This would call for a steep slope near the starting point A,
followed by a more or less horizontal path to B, but the resulting curve is considerably longer than
the straight line, which is the shortest path between A and B. We must therefore find the optimum
between the shortness of the path and the earliest increase of the velocity by a steeper slope.
We can solve this problem using the techniques of the previous section. We must minimise
the time for a curve which can be parametrised as x(s), y(s). Obviously, there are many ways to
parametrise a curve – we shall use for s the distance along the curve, measured from the point A. The
infinitesimal segment $ds$ is given by

$$ ds = \sqrt{dx^2 + dy^2} = dx\sqrt{1 + \left(\frac{dy}{dx}\right)^2} = dx\sqrt{1 + y'^2}. \qquad (4.6) $$

$s$ can be expressed as a function of $t$; the relation between the two is given by

$$ ds = v\,dt \qquad (4.7) $$

where $v = \sqrt{v_x^2 + v_y^2}$ is the particle speed. The time needed to go from A to B is given by

$$ t = \int_A^B \frac{ds}{v}. \qquad (4.8) $$
We need an equation for $v$ in terms of the position along the path. As the gravitational force is responsible for the change in velocity it is useful to consider the $x$- and $y$-components of the path. In fact, we have the following relation between $v$ and $y$ as a result of conservation of energy:

$$ \frac{1}{2}v^2 = gy, \qquad (4.9) $$

where the height $y$ is measured downward from the point A. This means that we have put in the boundary condition that when $y = 0$, then $v = 0$, which is correct since the particle is released from A with zero velocity. Therefore, using (4.9), we arrive at

$$ t[y] = \int_0^{x_0} dx\,\frac{\sqrt{1 + y'^2}}{\sqrt{2gy}} \qquad (4.10) $$
where $x_0$ is the horizontal distance between A and B. We have to find the stationary function $y(x)$ for the functional $t[y]$. The Euler-Lagrange equations have the solution [see Eq. (4.5)]:

$$ \sqrt{\frac{1 + y'^2}{2gy}} - \frac{y'^2}{\sqrt{2gy\,(1 + y'^2)}} = \text{Constant}. \qquad (4.11) $$

This can be simplified to

$$ y\left(1 + y'^2\right) = C = \text{Constant}. \qquad (4.12) $$
In order to solve this equation we substitute

$$ y' = \tan\phi \qquad (4.13) $$

so that we have:

$$ y = C\cos^2\phi = C\left[\frac{1}{2} + \frac{1}{2}\cos(2\phi)\right]. \qquad (4.14) $$

Then

$$ \frac{dx}{d\phi} = \frac{1}{y'}\frac{dy}{d\phi} = -\frac{C\sin(2\phi)}{\tan\phi} = -2C\cos^2\phi. \qquad (4.15) $$

The solution is therefore

$$ x = -C\left[\phi + \frac{1}{2}\sin(2\phi)\right] + D \qquad (4.16a) $$

$$ y = C\left[\frac{1}{2} + \frac{1}{2}\cos(2\phi)\right]. \qquad (4.16b) $$

$D$ and $C$ are integration constants. If we identify the point A with $(0, 0)$, the curve starts at $\phi = \pi/2$, so that $D/C = \pi/2$, and $\phi$ decreases along the curve. The two coordinates of B are used to fix the value of $\phi$ at point B and the constants $D$ and $C$. Note that the boundary condition $v = 0$ at point A was already realised in Eq. (4.9). The resulting curve is called the cycloid: it is the curve described by a point on the rim of a horizontally rolling wheel.
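
To see that the cycloid indeed beats the straight line, we can compare descent times numerically. The sketch below is an illustration under stated assumptions: the curve is approximated by a polyline, and on each straight segment the acceleration is uniform, so the time for that segment is $2\Delta s/(v_0 + v_1)$ with $v = \sqrt{2gy}$. The cycloid is written in the standard rolling-wheel form $x = R(\theta - \sin\theta)$, $y = R(1 - \cos\theta)$ with $R = C/2$ and $y$ measured downward; the function names, $R = 1$ and the end point $\theta = \pi$ are arbitrary choices.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def descent_time(points):
    """Time to slide without friction, starting at rest, along a polyline.

    points is a list of (x, y) pairs with y measured downward from the start.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        ds = math.hypot(x1 - x0, y1 - y0)
        v0 = math.sqrt(2.0 * G * y0)
        v1 = math.sqrt(2.0 * G * y1)
        total += 2.0 * ds / (v0 + v1)  # uniform acceleration on a straight segment
    return total


R = 1.0   # wheel radius, assumed
N = 2000  # number of polyline segments
cycloid = [(R * (t - math.sin(t)), R * (1.0 - math.cos(t)))
           for t in (math.pi * k / N for k in range(N + 1))]
x_b, y_b = cycloid[-1]  # end point B = (pi R, 2 R)
line = [(x_b * k / N, y_b * k / N) for k in range(N + 1)]

t_cycloid = descent_time(cycloid)
t_line = descent_time(line)
print(f"cycloid: {t_cycloid:.4f} s, straight line: {t_line:.4f} s")
```

For this end point the exact cycloid time is $\pi\sqrt{R/g}$, which the polyline estimate should approach as the number of segments grows.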

4.3 Fermat’s principle

The path traversed by a ray of light in a medium with a varying refractive index is not a straight line.
According to Fermat’s principle this path is determined by the requirement that the light ray follows
the path which allows it to go from one point to another in the shortest possible time. The time needed
to traverse a path is determined by the speed of light along that path, and this quantity is given as

$$ c(n) = \frac{c}{n} \qquad (4.17) $$

where $c$ is the speed of light in vacuum and $n$ is the refractive index. The latter might vary with position.
If the path lies in the $xy$ plane, the path length $dl$ of a segment corresponding to a distance $dx$ along the $x$-axis is given by

$$ dl = dx\sqrt{1 + \left(\frac{dy}{dx}\right)^2}. \qquad (4.18) $$

The time $dt$ needed to traverse the path $dl$ is given as:

$$ dt = \frac{n\,dl}{c}, \qquad (4.19) $$

so that the total time can now be given as an integral over $dx$:

$$ t = \int_0^L dx\,\frac{n(y)}{c}\sqrt{1 + \left(\frac{dy}{dx}\right)^2}. \qquad (4.20) $$

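Here we have assumed that $n$ depends on the coordinate $y$ only. Now take $n(y) = \sqrt{1 + y^2}$; then we must minimise

$$ ct = \int_0^L dx\,\sqrt{1 + y^2}\,\sqrt{1 + \left(\frac{dy}{dx}\right)^2}. \qquad (4.21) $$

For this case, the Euler-Lagrange equations reduce to the equation [see Eq. (4.5)]

$$ \frac{dy}{dx} = \frac{1}{A}\sqrt{(1 - A^2) + y^2}, \qquad (4.22) $$

where $A$ is the constant appearing in Eq. (4.5). The solution is given as

$$ y(x) = \pm\sqrt{1 - A^2}\,\sinh\left(\frac{x}{A} + B\right). \qquad (4.23) $$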
The possible range of A-values is |A| ≤ 1.
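
The sinh solution can be checked against the first integral of Eq. (4.5). For $F = \sqrt{1 + y^2}\sqrt{1 + y'^2}$ the conserved combination is $F - y'\,\partial F/\partial y' = \sqrt{1 + y^2}/\sqrt{1 + y'^2}$, which should equal the constant $A$ everywhere along the ray. The sketch below evaluates it with a numerical derivative; the function names, the sample values of $A$ and $B$ and the step size are assumptions.

```python
import math


def ray_height(x, A, B):
    """Candidate light path y(x) = sqrt(1 - A^2) sinh(x / A + B)."""
    return math.sqrt(1.0 - A * A) * math.sinh(x / A + B)


def first_integral(x, A, B, h=1e-5):
    """sqrt(1 + y^2) / sqrt(1 + y'^2), with y' from a central difference."""
    y = ray_height(x, A, B)
    dy = (ray_height(x + h, A, B) - ray_height(x - h, A, B)) / (2.0 * h)
    return math.sqrt(1.0 + y * y) / math.sqrt(1.0 + dy * dy)


A_SAMPLE, B_SAMPLE = 0.6, 0.2  # arbitrary illustration values with |A| <= 1
for x in (0.0, 0.3, 1.0):
    value = first_integral(x, A_SAMPLE, B_SAMPLE)
    print(f"x = {x:.1f}: conserved combination = {value:.8f}")
```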

4.4 The minimal area problem

Consider a soap film which is suspended between two parallel hoops (see figure). The soap film has
a finite surface tension, which means that its energy scales linearly with its area. As the film tends to
minimise its energy, it minimises its area. The area of a surface of revolution described by a function $y(x)$ is given by:

$$ 2\pi\int_0^L dx\, y\sqrt{1 + y'^2}. \qquad (4.24) $$

Minimising this functional of $y$ leads to the standard Euler-Lagrange solution Eq. (4.5) for functionals with no explicit dependence on $x$:

$$ \frac{y}{\sqrt{1 + y'^2}} = C. \qquad (4.25) $$

The solution to this equation is given by

$$ y(x) = C\cosh\left(\frac{x - D}{C}\right), \qquad (4.26) $$

where $D$ is a second integration constant.
We now assume that the hoops have the same diameter. Let us furthermore choose the $x$-axis such that the origin is in the middle between the two hoops. Using the fact that cosh is an even function, we have:

$$ R = C\cosh\left(\frac{L}{2C}\right), \qquad (4.27) $$
where $R$ is the radius of the hoops. Consider now the graph of $C\cosh[L/(2C)]$ as a function of $C$ for fixed $L$:

[Figure: graph of $C\cosh[L/(2C)]$ as a function of $C$ for fixed $L$.]

It is clear that for $R$ lying in the “gap” of the graph, no solution can be found. What happens is that if the hoops are not too far apart, the soap film will form a nice cosh-shaped surface. However, when we pull the hoops apart, there will be a moment at which the film can no longer be sustained and it collapses.
It can be seen from the graph that usually there are two different solutions. The one with the smallest surface is to be selected. The surface area is found as

$$ A(y) = \pi\left[LC + 2R\sqrt{R^2 - C^2}\right]. \qquad (4.28) $$
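
The two solutions, and the collapse of the film when the hoops are too far apart, can be found numerically. The following sketch is an assumption-laden illustration: it locates the minimum of $C\cosh[L/(2C)]$ from the condition $u\tanh u = 1$ with $u = L/(2C)$, and then solves $R = C\cosh[L/(2C)]$ on either side of that minimum by bisection. The area is evaluated directly from the integral of $C\cosh^2(x/C)$ between the hoops, which gives $\pi[LC + C^2\sinh(L/C)]$. All function names are hypothetical.

```python
import math


def hoop_radius(C, L):
    """Radius reached at the hoops, x = +-L/2, by the film y = C cosh(x / C)."""
    return C * math.cosh(L / (2.0 * C))


def film_area(C, L):
    """Area of the surface of revolution of y = C cosh(x / C) between the hoops."""
    return math.pi * (L * C + C * C * math.sinh(L / C))


def bisect(func, lo, hi, tol=1e-13):
    """Root of func between lo and hi, assuming a sign change on the interval."""
    f_lo = func(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid < 0.0) == (f_lo < 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def soap_film_solutions(R, L):
    """List of (C, area) pairs solving R = C cosh(L / 2C), smallest area first."""
    u_star = bisect(lambda u: u * math.tanh(u) - 1.0, 1.0, 2.0)
    c_star = L / (2.0 * u_star)          # location of the minimum of the graph
    if hoop_radius(c_star, L) > R:
        return []                        # R lies in the gap: no film possible
    gap = lambda C: hoop_radius(C, L) - R
    lo = c_star
    while gap(lo) <= 0.0:                # walk left until the curve is above R
        lo *= 0.5
    roots = [bisect(gap, lo, c_star), bisect(gap, c_star, R)]
    return sorted(((C, film_area(C, L)) for C in roots), key=lambda s: s[1])


solutions = soap_film_solutions(R=1.0, L=1.0)
for C, area in solutions:
    print(f"C = {C:.4f}, area = {area:.4f}")
print("hoops too far apart:", soap_film_solutions(R=1.0, L=2.0))
```

In this sketch the solution with the larger $C$, i.e. the shallower film, should turn out to have the smaller area.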

4.5 Constraints

4.5.1 Constraint forces

In d’Alembert’s approach, the forces of constraint are neglected as they are usually of limited physical
importance. In some cases, however, it might be useful to know what these forces are. For example, a
designer of a monorail would like to know the force which is exerted on that rail by the train in order
to certify that the monorail is robust enough. In fact, it is possible to work out within a Lagrangian
analysis what the forces of constraint are.
Let us first recall the solution to the following problem:

Find the minimum of the function $f(\mathbf{x})$, where $\mathbf{x} = (x_1, x_2, \ldots, x_N)$, under the conditions $g^{(k)}(\mathbf{x}) = C_k$, where the $C_k$ are constants; $k = 1, \ldots, K$.

Consider a small variation $\delta\mathbf{x}$ such that $g^{(k)}(\mathbf{x} + \delta\mathbf{x}) = C_k$ still holds for all $k$. Then it holds to first order that

$$ g^{(k)}(\mathbf{x}) + \delta\mathbf{x}\cdot\nabla g^{(k)}(\mathbf{x}) = g^{(k)}(\mathbf{x}) = C_k, \qquad (4.29) $$

so that

$$ \delta\mathbf{x}\cdot\nabla g^{(k)}(\mathbf{x}) = 0 \qquad (4.30) $$

for all $k$. On the other hand, for variations $\delta\mathbf{x}$ satisfying (4.30), $f$ should not change to first order along $\delta\mathbf{x}$, so we have

$$ \delta\mathbf{x}\cdot\nabla f(\mathbf{x}) = 0. \qquad (4.31) $$

Now we can show that $\nabla f(\mathbf{x})$ must lie in the span of the set $\nabla g^{(k)}(\mathbf{x})$. If it lay outside the span, we could write it as the sum of a vector lying in the span of the $\nabla g^{(k)}(\mathbf{x})$ plus a nonzero vector perpendicular to this space. If we take $\delta\mathbf{x}$ to be proportional to the latter, then (4.30) is satisfied, but (4.31) is not. Therefore we conclude that $\nabla f$ can be written as a linear combination of the gradients $\nabla g^{(k)}$:

$$ \nabla f(\mathbf{x}) = \sum_{k=1}^{K}\lambda_k\nabla g^{(k)}(\mathbf{x}). \qquad (4.32) $$

This is the well-known Lagrange multiplier theorem.

Let us consider a simple example: finding the minimum or maximum of the function $f(x, y) = xy$ on the unit circle: $g(x, y) = x^2 + y^2 - 1 = 0$. There is only one Lagrange parameter $\lambda$, and Eq. (4.32) for this case reads

$$ (y, x) = \lambda(2x, 2y), \qquad (4.33) $$

whose solution is $x = \pm y$, $\lambda = \pm 1/2$. The constraint $x^2 + y^2 = 1$ then fixes the solution to $x = \pm y = 1/\sqrt{2}$ and $x = \pm y = -1/\sqrt{2}$. Indeed, the symmetry of the problem allows only the axes $x = \pm y$ and the lines $x = 0$ or $y = 0$ as the possible solutions, and it is easy to identify the stationary points (minima, maxima, saddle points) among these.
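
This example is simple enough to confirm by brute force. The sketch below, with hypothetical function names, scans the unit circle, locates the extreme values of $f(x, y) = xy$, and checks that at the maximum the gradient of $f$ is $\lambda$ times the gradient of $g$ with $\lambda = 1/2$.

```python
import math


def f(x, y):
    """Function to be extremised."""
    return x * y


def scan_unit_circle(n=3600):
    """Return ((f_max, angle), (f_min, angle)) over n equally spaced points."""
    samples = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        samples.append((f(math.cos(t), math.sin(t)), t))
    return max(samples), min(samples)


(f_max, t_max), (f_min, t_min) = scan_unit_circle()
x_max, y_max = math.cos(t_max), math.sin(t_max)

# Lagrange condition (y, x) = lambda (2x, 2y): both components give the same lambda.
lam_from_x = y_max / (2.0 * x_max)
lam_from_y = x_max / (2.0 * y_max)
print(f"maximum {f_max:.4f} at (x, y) = ({x_max:.4f}, {y_max:.4f}), lambda = {lam_from_x:.4f}")
print(f"minimum {f_min:.4f}")
```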
Now suppose we have a mechanical $N$-particle system without constraints. For such a system we know the Lagrangian $L$. The combined coordinates of the system are represented as a vector $\mathbf{R} = (\mathbf{r}_1, \ldots, \mathbf{r}_N)$. Then we have for any displacement $\delta\mathbf{R}$ that the corresponding change in Lagrangian satisfies

$$ \sum_i \delta\mathbf{r}_i\cdot\frac{\delta L(\mathbf{R}, \dot{\mathbf{R}}, t)}{\delta\mathbf{r}_i} \equiv \delta L(\mathbf{R}, \dot{\mathbf{R}}, t) = 0 \qquad (4.34) $$

for all $t$, where we have used the notation of (2.55). Now suppose that there are constraints present of the form

$$ g^{(k)}(\mathbf{R}) = 0. \qquad (4.35) $$

The argument used above for ordinary functions of a single variable can be generalised to show that we should have

$$ \frac{\delta L(\mathbf{R}, \dot{\mathbf{R}}, t)}{\delta\mathbf{R}} = \sum_{k=1}^{K}\lambda_k\nabla_{\mathbf{R}}\,g^{(k)}(\mathbf{R}); \qquad (4.36) $$

the reader is invited to verify this. As L is the Lagrangian of a mechanical system without constraints,
we know that the left hand side of this equation can be written as ṗ − FA . The right hand side has the
dimension of a force and must therefore coincide with the constraint force.
Let us analyse the simple example of the pendulum once again. Without constraints we have
$$ L = \frac{m}{2}\left(\dot{x}^2 + \dot{y}^2\right) - mgy. \qquad (4.37) $$

The constraint is given by

$$ l^2 = x^2 + y^2. \qquad (4.38) $$

So the pendulum equations of motion become

mẍ = 2λ x (4.39a)
mÿ = −mg + 2λ y. (4.39b)

These equations cannot be solved analytically, as they describe the full pendulum, and not the small angle limit. In the small angle limit the motion is predominantly in the $x$-direction, with $y \approx -l$ and $\ddot{y} \approx 0$, so that (4.39b) gives $2\lambda \approx -mg/l$; that is, $mg/(2|\lambda|)$ should be approximately equal to $l$. We then see that $\lambda$ is negative, so that the solution of (4.39a) is oscillatory and the frequency is given by $\omega = \sqrt{2|\lambda|/m} = \sqrt{g/l}$. The $\lambda$-dependent terms in the equation of motion indeed represent a force in the $+y$ direction of magnitude $mg$: this is the tension in the string or rod on which the weight is suspended.
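When using polar coordinates, we have

$$ L = m\left(\frac{\dot{r}^2}{2} + \frac{r^2\dot{\varphi}^2}{2}\right) + mgr\cos\varphi, \qquad (4.40) $$

with the constraint $r = l$, which leads to the Lagrange equations:

$$ m\ddot{r} = mg\cos\varphi + mr\dot{\varphi}^2 - \lambda; \qquad (4.41) $$

$$ m\left(r^2\ddot{\varphi} + 2r\dot{r}\dot{\varphi}\right) = -mgr\sin\varphi. \qquad (4.42) $$

Filling in the constraint is particularly easy: with $r = l$ we have $\dot{r} = \ddot{r} = 0$. The constraint force is given by

$$ \lambda = mg\cos\varphi + mr\dot{\varphi}^2. \qquad (4.43) $$

The constraint force consists of a term which compensates for the gravity force (first term) and an extra term which is necessary for keeping the circular motion going (a centripetal force, the second term). The equation for $\ddot{\varphi}$ reduces to the usual pendulum equation when the constraint is used:

$$ \ddot{\varphi} = -\frac{g}{l}\sin\varphi. \qquad (4.44) $$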

In practice, constraints are seldom used explicitly in the solution of mechanical problems.
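
As a small worked illustration of the constraint force, consider a pendulum released from rest at an angle $\varphi_0$. Energy conservation, $\frac{1}{2}ml^2\dot{\varphi}^2 - mgl\cos\varphi = -mgl\cos\varphi_0$, then fixes $\dot{\varphi}^2$ at every angle, and inserting it into $\lambda = mg\cos\varphi + ml\dot{\varphi}^2$ gives the tension. The function names and the parameter values below are assumptions chosen for illustration.

```python
import math


def constraint_force(phi, phi_dot, m=1.0, l=1.0, g=9.81):
    """lambda = m g cos(phi) + m l phi_dot^2, evaluated on the constraint r = l."""
    return m * g * math.cos(phi) + m * l * phi_dot ** 2


def phi_dot_squared(phi, phi0, l=1.0, g=9.81):
    """Angular velocity squared from energy conservation, released at rest at phi0."""
    return 2.0 * g / l * (math.cos(phi) - math.cos(phi0))


# Release from the horizontal position (phi0 = pi/2) and look at the lowest point.
phi_dot_bottom = math.sqrt(phi_dot_squared(0.0, math.pi / 2.0))
tension_bottom = constraint_force(0.0, phi_dot_bottom)
print(f"tension at the lowest point: {tension_bottom:.3f} N for m = 1 kg")
```

For this release angle the tension at the lowest point is three times the weight: one $mg$ to balance gravity and $2mg$ of centripetal force.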

4.5.2 Global constraints

In section 4.5.1 we have analysed constraints of the form:

$$ g^{(k)}(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N; t) = 0. \qquad (4.45) $$

This type of constraint is called holonomic, and it frequently allows us to represent the system using
generalised coordinates. This type of constraint imposes conditions on the system which should hold
at any moment in time. We may therefore consider this type of constraints as an infinite set (one
constraint for each time). Such constraints are called local, where this term refers to the fact that the
constraint is local in time.
Sometimes however, we must deal with constraints of a different form. Consider for example the
problem of finding the shape of a chain of homogeneous density ρ (= mass/unit length) suspended
at its two end points. We represent this shape by a function y(x) where x is the coordinate along the
line connecting the two end points and y(x) is the height of the chain at coordinate x. The shape
is determined by the condition that it minimises the (gravitational) potential energy, and it is readily
seen that this energy is given by the functional

$$V = g\rho \int_0^X dx\, y \sqrt{1 + \left(\frac{dy}{dx}\right)^2}. \tag{4.46}$$

Figure 4.1: Example of a function δy(x) which is nonzero only near x1 and x2.

We leave out the constants g and ρ in the following as they do not affect the shape.
If we were to minimise the potential energy (4.46) as it stands, we would get divergences, as we have not yet
required the total length L of the chain to be fixed. This requirement can be formulated as

$$L = \int_0^X dx \sqrt{1 + \left(\frac{dy}{dx}\right)^2}. \tag{4.47}$$

This is a constraint which is not holonomic, and there is no way to reduce the number of degrees of
freedom. This type of constraint is called global as it is formulated as a condition on an integral of the
same type as the functional to be minimised, and is not to be satisfied for all values of the integration
variable. Therefore, we must generalise the derivation of the Euler equations to cases in which a
functional constraint is present.
Let us consider two functionals, J and K:
$$J = \int_a^b dx\, F(y, y', x) \tag{4.48a}$$

$$K = \int_a^b dx\, G(y, y', x), \tag{4.48b}$$

and suppose we want to minimise J under the condition that K has a given value, i.e., for each variation
δy which satisfies

$$\delta \int_a^b G(y, y', x)\, dx = 0, \tag{4.49}$$

we require that

$$\delta \int_a^b F(y, y', x)\, dx = 0. \tag{4.50}$$

Consider now a particular variation which is nonzero only in a small neighbourhood of two values x1
and x2 (see figure 4.1). If the areas under these two humps are A1 and A2 respectively, we have

$$\delta \int_a^b G(y, y', x)\, dx = A_1 \frac{\delta G[y(x_1), y'(x_1), x_1]}{\delta y} + A_2 \frac{\delta G[y(x_2), y'(x_2), x_2]}{\delta y}, \tag{4.51}$$

and therefore

$$\frac{\delta G_1/\delta y}{\delta G_2/\delta y} = -\frac{A_2}{A_1} \tag{4.52}$$

with an obvious shorthand notation.

Figure 4.2: The cosh solution to the suspended chain problem. The end points are at (0,0) and (X,0).
Applying this argument once again we see that for functions y(x) satisfying requirement (4.52),
we should have
$$\frac{\delta F_1/\delta y}{\delta F_2/\delta y} = -\frac{A_2}{A_1}. \tag{4.53}$$

But this can only be true for arbitrary x1 and x2 when δF/δy and δG/δy are proportional:

$$\frac{\delta F}{\delta y} = \lambda \frac{\delta G}{\delta y}. \tag{4.54}$$

Therefore, we must solve the Euler equations for the combined functional

J(y) − λ K(y) (4.55)

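where λ is fixed by putting the solution of this minimisation back into the constraint. This is the
Lagrange multiplier theorem for functionals.
We shall now apply this to the suspended chain problem. We have $F = y\sqrt{1 + y'^2}$ and
$G = \sqrt{1 + y'^2}$. Neither depends explicitly on x, so the Euler equations have a first integral. With the
sign of the (as yet undetermined) multiplier λ reversed, it reads:

$$(y + \lambda)\sqrt{1 + y'^2} - \frac{(y + \lambda)\, y'^2}{\sqrt{1 + y'^2}} = \text{Constant}, \tag{4.56}$$

which leads to

$$y + \lambda = C\sqrt{1 + y'^2}. \tag{4.57}$$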
The solution is given by

$$y(x) = A\cosh[\alpha(x - x_0)] + B \tag{4.58}$$

with

$$A = C = 1/\alpha \tag{4.59a}$$

$$B = -\lambda. \tag{4.59b}$$

Boundary conditions are y(0) = y(X) = 0 and the length of the chain must be equal to L. These
conditions fix x0, λ and C: $x_0 = X/2$, $\lambda = C\cosh[X/(2C)]$ and $C = L/\{2\sinh[X/(2C)]\}$. In figure 4.2
the solution is shown.
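The length condition is a transcendental equation for C and has to be solved numerically. The sketch below does this by bisection, using the chain length of the cosh solution, L = 2C sinh[X/(2C)]. The function names, the bracketing interval and the sample values X = 1, L = 1.5 are assumptions made for this example.

```python
import math

def catenary_parameter(X, L, tol=1e-12):
    """Solve L = 2 C sinh(X / (2 C)) for C by bisection (needs L > X)."""
    if L <= X:
        raise ValueError("the chain must be longer than the span")

    def length(C):
        return 2.0 * C * math.sinh(X / (2.0 * C))

    # length(C) decreases monotonically towards X as C grows
    lo = X / 1000.0           # length(lo) is astronomically large
    hi = X
    while length(hi) > L:     # enlarge hi until the chain is too short
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if length(mid) > L:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def chain_height(x, X, C):
    """y(x) = C cosh[(x - X/2)/C] - C cosh[X/(2C)], so that y(0) = y(X) = 0."""
    return C * math.cosh((x - X / 2.0) / C) - C * math.cosh(X / (2.0 * C))

C = catenary_parameter(1.0, 1.5)
print("C =", C, " lowest point y(X/2) =", chain_height(0.5, 1.0, C))
```

The lowest point of the chain lies at x = X/2, where y = C − λ is negative.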

From classical to quantum mechanics

In the first few chapters we have considered classical problems, in particular the variational formula-
tion of classical mechanics, in the formulations of Hamilton and Lagrange. In this chapter, we look at
quantum mechanics. In the first section, we introduce quantum mechanics by formulating the postu-
lates on which the quantum theory is based. Later on, we shall then try to establish the link between
the classical mechanics and quantum mechanics, via Poisson brackets and via the path integral.

5.1 The postulates of quantum mechanics

When we consider classical mechanics, we start from Newton’s laws and derive the behaviour of
moving bodies subject to forces from these laws. This is a nice approach as we always like to see
a structured presentation of the world surrounding us. However, in reality, for thousands of years
people have thought about motion and forces before Newton’s compact formulation of the underlying
principles was found. It is not justified to forget this and to pretend that physics only consists of
understanding and predicting phenomena from a limited set of laws. The ‘dirty’ process of walking
in the dark and trying to find a comprehensive formulation of the phenomena under consideration is
an essential part of physics.
This also holds for quantum mechanics, although it was developed in a substantially shorter
amount of time than classical mechanics. In fact, quantum mechanics started at the beginning of
the twentieth century, and its formulation was more or less complete around 1930. This formulation
consisted of a set of postulates which however do not have a canonized form similar to Newton’s laws:
most books have their own version of these postulates and even their number varies.
We now present a particular formulation of these postulates.
1. The state of a physical system at any time t is given by the wavefunction of the system at that time.
This wavefunction is an element of the Hilbert space of the system. The evolution of the system
in time is determined by the Schrödinger equation:

$$i\hbar \frac{\partial}{\partial t}|\psi(t)\rangle = \hat H |\psi(t)\rangle.$$

Here $\hat H$ is a Hermitian operator: the Hamiltonian.
2. Any physical quantity Q is represented by a Hermitian operator $\hat Q$.
When we perform a measurement of the quantity Q, we will always find one of the eigenvalues of
the operator $\hat Q$. For a system in the state $|\psi(t)\rangle$, the probability of finding a particular eigenvalue
$\lambda_i$, with an associated eigenvector $|\phi_i\rangle$ of $\hat Q$, is given by

$$P_i = \frac{|\langle\phi_i|\psi(t)\rangle|^2}{\langle\psi(t)|\psi(t)\rangle\,\langle\phi_i|\phi_i\rangle}.$$

Immediately after the measurement, the system will find itself in the state $|\phi_i\rangle$ corresponding to
the value $\lambda_i$ which was found in the measurement of Q.

Several remarks can be made.

1. The wavefunction contains the maximum amount of information we can have about the system. In
practice, we often do not know the wavefunction of the system.

2. Note that the eigenvectors $|\phi_i\rangle$ always form a basis of the Hilbert space of the system under
consideration. This implies that the state $|\psi(t)\rangle$ of the system before the measurement can always
be written in the form

$$|\psi(t)\rangle = \sum_i c_i |\phi_i\rangle.$$

The probability to find the value $\lambda_i$ in a measurement is therefore given by

$$P_i = \frac{|c_i|^2}{\sum_j |c_j|^2}.$$

For a normalised state $|\psi(t)\rangle$ it holds that, if the eigenvectors $|\phi_i\rangle$ are normalised too:

$$\sum_i |c_i|^2 = 1.$$

In that case

$$P_i = |c_i|^2.$$

3. So far we have suggested in our notation that the eigenvalues and eigenvectors form a discrete set.
In reality, not only discrete, but also continuous spectra are possible. In those cases, the sums are
replaced by integrals.

4. In understanding quantum mechanics, it helps to make a clear distinction between the formalism
which describes the evolution of the wavefunction (the Schrödinger equation, postulate 1) and
the interpretation scheme. We see that the wavefunction contains the information we need to
predict the outcome of measurements, using the measurement postulate (number 2).
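The measurement postulate can be tried out on a small finite-dimensional example. The sketch below evaluates the probability formula of postulate 2 for states given as lists of complex components. The two-level example (a state along the first basis vector, measured in the basis (1,1), (1,−1)) and all function names are assumptions chosen for illustration. The eigenvectors are deliberately left unnormalised to show the role of the denominator.

```python
def inner(u, v):
    """Inner product <u|v> for vectors given as lists of complex numbers."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def measurement_probabilities(psi, eigenvectors):
    """P_i = |<phi_i|psi>|^2 / (<psi|psi> <phi_i|phi_i>) for each eigenvector."""
    norm_psi = inner(psi, psi).real
    probabilities = []
    for phi in eigenvectors:
        amplitude = inner(phi, psi)
        probabilities.append(abs(amplitude) ** 2 / (norm_psi * inner(phi, phi).real))
    return probabilities

# Hypothetical two-level example with orthogonal, unnormalised eigenvectors
psi = [1 + 0j, 0j]
eigenvectors = [[1 + 0j, 1 + 0j], [1 + 0j, -1 + 0j]]
probs = measurement_probabilities(psi, eigenvectors)
print(probs)
```

For an orthogonal basis the probabilities add up to one, whatever the normalisation of the vectors.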

It now seems that we have arrived at a formulation of quantum mechanics which is similar to
that of classical mechanics: a limited set of laws (prescriptions) from which everything can be de-
rived, provided we know the form of the Hamiltonian (this is analogous to the situation in classical
mechanics, where Newton’s laws do not tell us what the form of the forces is).
However there is an important difference: the classical laws of motion can be understood by
using our everyday life experience so that we have some intuition for their meaning and content. In
quantum mechanics, however, our laws are formulated as mathematical statements concerning objects
(vectors and operators) for which we do not have a natural intuition. This is the reason why quantum
mechanics is so difficult in the beginning (although its mathematical structure as such is rather simple).
You should not despair when quantum mechanics seems difficult: many people find it difficult, and
the role of the measurement is still the object of intensive debate. Sometimes you must switch your
intuition off and use the rules of linear algebra to solve problems.
Above, we have mentioned that quantum mechanics does not prescribe the form of the Hamiltonian.
In fact, although the Schrödinger equation, quite unlike the classical equation of motion, is
a linear equation, which allows us to make ample use of linear algebra knowledge, the structure of
quantum mechanics is richer than that of classical mechanics because in principle any type of Hilbert
space could occur in Nature. In classical mechanics, the space containing all possible states of a system
is essentially a 6N-dimensional space (for an N-body system we have 3N space and 3N momentum
coordinates). In quantum mechanics, wavefunctions can be part of infinite-dimensional spaces (like
the wavefunctions of a particle moving along a one-dimensional axis) but they can also lie in a
finite-dimensional space (for example spin, which has no classical analogue).

5.2 Relation with classical mechanics

In order to see whether we can guess the structure of the Hamiltonian for systems which have a
classical analogue, we consider the time evolution of a physical quantity Q. We assume that Q does
not depend on time explicitly. However, the expectation value of Q may vary in time due to the change
of the wavefunction in the course of time. For normalised wavefunctions:
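$$\frac{d}{dt}\langle\psi(t)|\hat Q|\psi(t)\rangle = \left[\frac{\partial}{\partial t}\langle\psi(t)|\right]\hat Q\,|\psi(t)\rangle + \langle\psi(t)|\,\hat Q\left[\frac{\partial}{\partial t}|\psi(t)\rangle\right].$$

Using the Schrödinger equation and its Hermitian conjugate:

$$-i\hbar\frac{\partial}{\partial t}\langle\psi(t)| = \langle\psi(t)|\hat H,$$

(note the minus-sign on the left hand side which results from the Hermitian conjugate) we obtain

$$i\hbar\frac{d}{dt}\langle\psi(t)|\hat Q|\psi(t)\rangle = \langle\psi(t)|\hat Q\hat H - \hat H\hat Q|\psi(t)\rangle = \langle\psi(t)|[\hat Q, \hat H]|\psi(t)\rangle,$$

where $[\hat Q, \hat H]$ is the commutator. We see that the time derivative of the expectation value of Q̂ is related to the commutator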
between Q̂ and Ĥ. This should wake you up or ring a bell. In the exercises, we have seen that for any
function f (q j , p j ) of the coordinates q j and momenta p j , the time derivative is given by
$$\frac{df}{dt} = \sum_j \left(\frac{\partial f}{\partial q_j}\frac{\partial H}{\partial p_j} - \frac{\partial f}{\partial p_j}\frac{\partial H}{\partial q_j}\right) \equiv \{f, H\}.$$

We see that this equation is very similar to that obtained above for the time derivative of the expectation
value of the operator Q̂! The differences consist of replacing the Poisson bracket by the commutator
and adding a factor iħ. It seems that classical and quantum mechanics are not that different after all.
Could this perhaps be a guide to formulating quantum mechanics for systems which already have a
classical version? This turns out to be the case.
As an example, we start by considering a one-dimensional system for which the relevant classical
observables are the position x and the momentum p. Classically, we have
$$\{x, p\} = \frac{\partial x}{\partial x}\frac{\partial p}{\partial p} - \frac{\partial x}{\partial p}\frac{\partial p}{\partial x} = 1.$$

The second term in the expression vanishes because x and p are to be considered as independent
coordinates. From this, we may guess the quantum version of this relation:

$$[\hat x, \hat p] = i\hbar,$$

which should sound familiar (if it does not, return to the second year quantum mechanics course). It
seems that our recipe of making quantum mechanics out of classical mechanics makes sense!
Therefore we can now state the following rule:

If the Hamiltonian of some classical system is known, we can use the same form in quantum
mechanics, taking into account the fact that the coordinates $q_j$ and $p_j$ become Hermitian operators
and that their commutation relations are:

$$[q_j, q_k] = 0; \quad [p_j, p_k] = 0; \quad [q_j, p_k] = i\hbar\,\delta_{jk}.$$

You can verify these extended commutation relations easily by working out the corresponding classi-
cal Poisson brackets.
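The classical side of this correspondence is easy to check numerically. The sketch below evaluates Poisson brackets for one degree of freedom with central finite differences. The helper name `poisson_bracket`, the step size and the harmonic-oscillator Hamiltonian used as a test case are assumptions made for this example.

```python
def poisson_bracket(f, g, q, p, h=1e-5):
    """{f, g} = df/dq dg/dp - df/dp dg/dq at the point (q, p),
    approximated with central differences (one degree of freedom)."""
    df_dq = (f(q + h, p) - f(q - h, p)) / (2.0 * h)
    df_dp = (f(q, p + h) - f(q, p - h)) / (2.0 * h)
    dg_dq = (g(q + h, p) - g(q - h, p)) / (2.0 * h)
    dg_dp = (g(q, p + h) - g(q, p - h)) / (2.0 * h)
    return df_dq * dg_dp - df_dp * dg_dq

MASS, OMEGA = 2.0, 1.5   # illustrative values

def position(q, p):
    return q

def momentum(q, p):
    return p

def hamiltonian(q, p):
    return p ** 2 / (2.0 * MASS) + 0.5 * MASS * OMEGA ** 2 * q ** 2

q0, p0 = 0.3, -1.2
print(poisson_bracket(position, momentum, q0, p0))     # close to 1
print(poisson_bracket(position, hamiltonian, q0, p0))  # close to p0 / MASS
print(poisson_bracket(momentum, hamiltonian, q0, p0))  # close to -MASS * OMEGA**2 * q0
```

The last two brackets reproduce Hamilton's equations for the oscillator, which is the classical counterpart of the commutator equation for expectation values.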
In the second year, you have learned that

$$\hat p = \frac{\hbar}{i}\frac{d}{dx}.$$

What about this relation? It was not mentioned here so far. The striking message here is that this
relation can be derived from the commutation relation. In order to show this, we must discuss another
object you might have missed too: the wavefunction written in the form ψ(r) (for a particle in 3D). It is
important to study the relation between this and the state $|\psi\rangle$. Consider a vector a in two dimensions.
This vector can be represented by two numbers a1 and a2 , which are the components of the vector
a. However, the actual values of the components depend on how we have chosen our basis vectors.
The vector a is an arrow in a two dimensional space. In that space, a has a particular length and a
particular orientation. By changing the basis vectors, we do not change the object a, but we do change
the numbers a1 and a2 .
In the case of the Hilbert space of a one-dimensional particle, we can use as basis vectors the
states in which the particle is localised at a particular position x. We call these states $|x\rangle$. They are
eigenvectors of the position operator $\hat x$ with eigenvalue x:

$$\hat x|x\rangle = x|x\rangle.$$

The states $|x\rangle$ are properly normalised:

$$\langle x|x'\rangle = \delta(x - x'),$$

where $\delta(x - x')$ is the Dirac delta-function. We now can define ψ(x):

$$\psi(x) = \langle x|\psi\rangle,$$

that is, ψ(x) are the ‘components’ of the ‘vector’ $|\psi\rangle$ with respect to the basis $|x\rangle$. For three
dimensions, we have a wavefunction which is expressed with respect to the basis $|\mathbf r\rangle$.
In order to derive the representation of the momentum operator, $\hat p = \frac{\hbar}{i}\frac{d}{dx}$, we first calculate the
matrix element of the commutator:

$$\langle x|[\hat x, \hat p]|x'\rangle = \langle x|\hat x\hat p - \hat p\hat x|x'\rangle = (x - x')\langle x|\hat p|x'\rangle.$$

The last expression is obtained by having $\hat x$ in the first term act on the bra-vector $\langle x|$ on its left, and on
the ket $|x'\rangle$ on the right in the second term.
On the other hand, using the commutation relation, we know that

$$\langle x|[\hat x, \hat p]|x'\rangle = i\hbar\langle x|x'\rangle = i\hbar\,\delta(x - x').$$

This is an even function of $x - x'$, as interchanging x and x' does not change the matrix element on the
right hand side. Since this function is equal to $(x - x')\langle x|\hat p|x'\rangle$, we know that $\langle x|\hat p|x'\rangle$ must be an odd
function of $x - x'$.
Now we evaluate the matrix element $\langle x|\hat p|\psi\rangle$. We recall from linear algebra that, since $|x\rangle$ are the
eigenstates of a Hermitian operator, they form a complete set, that is:

$$I = \int |x\rangle\langle x|\, dx,$$

where I is the unit operator. Then we can write

$$\langle x|\hat p|\psi\rangle = \int \langle x|\hat p|x'\rangle\langle x'|\psi\rangle\, dx'.$$

Now we perform a Taylor expansion around x in order to rewrite $\langle x'|\psi\rangle$:

$$\langle x'|\psi\rangle = \langle x|\psi\rangle + (x' - x)\frac{d}{dx}\langle x|\psi\rangle + \frac{(x' - x)^2}{2!}\frac{d^2}{dx^2}\langle x|\psi\rangle + \cdots$$

Then we obtain

$$\langle x|\hat p|\psi\rangle = \int \langle x|\hat p|x'\rangle\left[\langle x|\psi\rangle + (x' - x)\frac{d}{dx}\langle x|\psi\rangle + \frac{(x' - x)^2}{2!}\frac{d^2}{dx^2}\langle x|\psi\rangle + \cdots\right] dx'.$$

The first term in brackets gives a zero after integration, as it is multiplied by $\langle x|\hat p|x'\rangle$, which was an
odd function of $x - x'$. The second term gives

$$\int \langle x|\hat p|x'\rangle (x' - x)\, dx'\, \frac{d}{dx}\langle x|\psi\rangle = -i\hbar\frac{d}{dx}\langle x|\psi\rangle,$$

where we have used the relation

$$\langle x|\hat p|x'\rangle (x' - x) = -i\hbar\,\delta(x' - x).$$

We use the same relation for the third term. But then we obtain a term of the form

$$(x' - x)\,\delta(x' - x)$$

in the integral over dx'. This obviously yields a zero. The same holds for all higher order terms, so
we are left with

$$\langle x|\hat p|\psi\rangle = \frac{\hbar}{i}\frac{d}{dx}\langle x|\psi\rangle,$$

which is the required result.
Having obtained this we can analyse the form of the eigenstates of the momentum operator:

$$\hat p|p\rangle = p|p\rangle.$$

The states $|p\rangle$ can be represented in the basis $|x\rangle$; the components then are $\langle x|p\rangle$. We can find the form
of these functions by using the eigenvalue equation and the representation of the momentum operator
as a derivative:

$$\langle x|\hat p|p\rangle = p\langle x|p\rangle \quad\text{and}\quad \langle x|\hat p|p\rangle = \frac{\hbar}{i}\frac{d}{dx}\langle x|p\rangle.$$

The first of these equations expresses the fact that $|p\rangle$ is an eigenstate of the operator $\hat p$, and the
second one follows directly from the fact that the momentum operator acts as a derivative in the
x-representation. Combining these two we obtain a simple differential equation

$$\frac{\hbar}{i}\frac{d}{dx}\langle x|p\rangle = p\langle x|p\rangle,$$

with a normalised solution:

$$\langle x|p\rangle = \frac{1}{\sqrt{2\pi\hbar}}\, e^{ipx/\hbar}.$$

This allows us to find any state ψ in the momentum representation, that is, the representation in
which we use the states $|p\rangle$ as basis states:

$$\langle p|\psi\rangle = \int \langle p|x\rangle\langle x|\psi\rangle\, dx = \frac{1}{\sqrt{2\pi\hbar}}\int e^{-ipx/\hbar}\,\psi(x)\, dx.$$
The analysis presented here for a one-dimensional particle can be generalised to three or more dimen-
sions in a natural way.
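The eigenvalue property of the plane wave can be checked numerically. The sketch below applies the x-representation of the momentum operator, approximated by a central difference, to exp(ipx/ħ)/√(2πħ). Units with ħ = 1, the step size and the function names are assumptions made for this example.

```python
import cmath
import math

HBAR = 1.0  # assumed units

def plane_wave(x, p):
    """<x|p> = exp(i p x / hbar) / sqrt(2 pi hbar)."""
    return cmath.exp(1j * p * x / HBAR) / math.sqrt(2.0 * math.pi * HBAR)

def apply_momentum(psi, x, h=1e-5):
    """(hbar / i) d psi / dx at the point x, using a central difference."""
    return (HBAR / 1j) * (psi(x + h) - psi(x - h)) / (2.0 * h)

p_value, x_value = 2.0, 0.7

def psi(x):
    return plane_wave(x, p_value)

lhs = apply_momentum(psi, x_value)   # momentum operator acting on <x|p>
rhs = p_value * psi(x_value)         # eigenvalue times <x|p>
print(abs(lhs - rhs))
```

The small remaining difference is the discretisation error of the finite difference, which shrinks as h is reduced (until rounding errors take over).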

5.3 The path integral: from classical to quantum mechanics

The path integral is a very powerful concept for connecting classical and quantum mechanics. More-
over, this formulation renders the connection between quantum mechanics and statistical mechanics
very explicit. We shall restrict ourselves here to a discussion of the path integral in quantum mechan-
ics. The reader is advised to consult the excellent book of Feynman and Hibbs (Quantum Mechanics
and Path Integrals, McGraw-Hill, 1965) for more details.
The path integral formulation can be derived from the following heuristics:

• A point particle which moves with momentum p at energy E can also be viewed as a wave with a
phase ϕ given by
$$\varphi = \mathbf k\cdot\mathbf r - \omega t$$
where $\mathbf p = \hbar\mathbf k$ and $E = \hbar\omega$.

• For a single path, these phases are additive, i.e. the phases for different segments of the path should
be added.

• The probability to find a particle which at t = t0 was at r0, at position r1 at time t = t1, is given
by the absolute square of the sum of the phase factors exp(iϕ) of all possible paths leading from
(r0, t0) to (r1, t1):
$$P(\mathbf r_0, t_0; \mathbf r_1, t_1) = \left|\sum_{\text{all paths}} e^{i\varphi_{\text{path}}}\right|^2.$$
This probability is defined up to a constant which can be fixed by normalization (i.e. the term
within the absolute bars must reduce to a delta-function in r1 − r0).

These heuristics are the analog of the Huygens principle in wave optics.
To analyse the consequences of these heuristics, we chop the time interval between t0 and t1 into
many identical time slices (see Fig. 5.1) and consider one such slice. Within this slice we take the
path to be linear. To simplify the analysis we consider one-dimensional motion.

Figure 5.1: A possible path running from an initial position xi at time ti to a final position xf at time tf. The time
is divided up into many identical slices.

We first consider the contribution of k · x to the phase difference. If the particle moves in a time ∆t over a
distance ∆x, we know that its k-vector is given by

$$k = \frac{mv}{\hbar} = \frac{m\Delta x}{\hbar\Delta t}.$$

The phase change resulting from the displacement of the particle can therefore be given as

$$\Delta\varphi = k\Delta x = \frac{m\Delta x^2}{\hbar\Delta t}.$$
We still must add the contribution of ω∆t to the phase. Neglecting the potential energy we obtain

$$\Delta\varphi = \frac{m\Delta x^2}{\hbar\Delta t} - \frac{\hbar^2 k^2}{2m\hbar}\Delta t = \frac{m\Delta x^2}{2\hbar\Delta t}.$$

The potential also enters through the ω∆t term, to give the result:

$$\Delta\varphi = \frac{m\Delta x^2}{2\hbar\Delta t} - \frac{V(x)}{\hbar}\Delta t.$$

For x occurring in the potential we may choose any value between x0 and x1; the most accurate result
is obtained by substituting the mean value.
If we now use the fact that phases are additive, we see that for the entire path the phases are given
by

$$\varphi = \frac{1}{\hbar}\sum_j \left\{\frac{m}{2}\left[\frac{x(t_{j+1}) - x(t_j)}{\Delta t}\right]^2 - \frac{V[x(t_j)] + V[x(t_{j+1})]}{2}\right\}\Delta t.$$

This is nothing but the discrete form of the classical action of the path! Taking the limit ∆t → 0 we obtain

$$\varphi = \frac{1}{\hbar}\int_{t_0}^{t_1}\left[\frac{m\dot x^2}{2} - V(x)\right] dt = \frac{1}{\hbar}\int_{t_0}^{t_1} L(x, \dot x)\, dt.$$
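The discrete action is straightforward to evaluate for a given sampled path. The sketch below does this for a free particle and compares the straight (classical) path with paths perturbed by a multiple of sin(πt/T), which leaves the end points fixed. The function name, the choice of perturbation and the values m = 1, T = 1, 100 slices are assumptions made for this example.

```python
import math

def discrete_action(path, dt, mass=1.0, potential=lambda x: 0.0):
    """Sum over slices of {m/2 [(x_{j+1} - x_j)/dt]^2 - [V(x_j) + V(x_{j+1})]/2} dt."""
    action = 0.0
    for x_start, x_end in zip(path[:-1], path[1:]):
        kinetic = 0.5 * mass * ((x_end - x_start) / dt) ** 2
        mean_potential = 0.5 * (potential(x_start) + potential(x_end))
        action += (kinetic - mean_potential) * dt
    return action

N_SLICES, T_TOTAL = 100, 1.0
DT = T_TOTAL / N_SLICES

def perturbed_path(eps):
    """Straight line from x = 0 to x = 1 plus eps * sin(pi t / T)."""
    return [j / N_SLICES + eps * math.sin(math.pi * j / N_SLICES)
            for j in range(N_SLICES + 1)]

s_classical = discrete_action(perturbed_path(0.0), DT)
s_small = discrete_action(perturbed_path(0.01), DT)
s_large = discrete_action(perturbed_path(0.02), DT)
print(s_classical, s_small - s_classical, s_large - s_classical)
```

Doubling the size of the perturbation multiplies the change in the action by four: the first-order variation vanishes on the classical path, which is exactly the stationarity that makes neighbouring paths interfere constructively.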
We therefore conclude that the probability to go from r0 at time t0 to r1 at time t1 is given by

$$P(\mathbf r_0, t_0; \mathbf r_1, t_1) = \left|\, N \sum_{\text{all paths}} \exp\left[\frac{i}{\hbar}\int_{t_0}^{t_1} L(x, \dot x)\, dt\right]\right|^2$$

where N is the normalization factor, which for a single time slice ∆t reads

$$N = \sqrt{\frac{m}{2\pi i\hbar\Delta t}}.$$

This now is the path integral formulation of quantum mechanics.
Let us spend a moment to study this formulation. First note the large prefactor 1/ħ in front of
the exponent. If the phase factor varies when varying the path, this large prefactor will cause the
exponential to vary wildly over the unit circle in the complex plane. The joint contribution to the
probability will therefore become very small. If on the other hand there is a region in phase space
(or ‘path space’) where the variation of the phase factor with the path is zero or very small, the phase
factors will add up to a significant amount. Such regions are those where the action is stationary, that
is, we recover the classical paths as those giving the major contribution to the phase factor. For ħ → 0
(the classical case), only the stationary paths remain, whereas for small ħ, small fluctuations around
these paths are allowed: these are the quantum fluctuations.
You may not yet recognise how this formulation is related to the Schrödinger equation. On the
other hand, we may identify the expression within the absolute signs in the last equation with a matrix
element of the time evolution operator since both have the same meaning:
$$\langle x_1|\hat U(t_1 - t_0)|x_0\rangle = N\sum_{\text{all paths}} \exp\left[\frac{i}{\hbar}\int_{t_0}^{t_1} L(x, \dot x)\, dt\right].$$

This particular form of the time evolution operator is sometimes called the propagator. Let us now
evaluate this form of the time evolution operator acting for a small time interval ∆t on the wavefunction
ψ(x, t):

$$\psi(x_1, t_1) = N\int_{-\infty}^{\infty}\int D[x(t)]\, \exp\left\{\frac{i}{\hbar}\int_{t_0}^{t_1}\left[m\frac{\dot x^2(t)}{2} - V[x(t)]\right] dt\right\}\psi(x_0, t_0)\, dx_0.$$

The notation D[x(t)] indicates an integral over all possible paths from (x0, t0) to (x1, t1). We first
approximate the integral over time in the same fashion as above, taking t1 very close to t0, and assuming
a linear variation of x(t) from x0 to x1:

$$\psi(x_1, t_1) = N\int_{-\infty}^{\infty} \exp\left\{\frac{i}{\hbar}\left[m\frac{(x_1 - x_0)^2}{2\Delta t^2} - \frac{V(x_0) + V(x_1)}{2}\right]\Delta t\right\}\psi(x_0, t_0)\, dx_0.$$

A similar argument as used above to single out paths close to stationary ones can be used here to argue
that the (imaginary) Gaussian factor will force x0 to be very close to x1. The allowed range for x0 is
roughly the range over which the phase of this factor stays of order one:

$$(x_1 - x_0)^2 \lesssim \frac{\hbar\Delta t}{m}.$$

As ∆t is taken very small, we may expand the exponent with respect to the V∆t term:

$$\psi(x_1, t_1) = N\int_{-\infty}^{\infty} \exp\left[\frac{i}{\hbar}\, m\frac{(x_1 - x_0)^2}{2\Delta t}\right]\left\{1 - \frac{i[V(x_0) + V(x_1)]}{2\hbar}\Delta t\right\}\psi(x_0, t_0)\, dx_0.$$

As x0 is close to x1 we may approximate $[V(x_0) + V(x_1)]/(2\hbar)$ by $V(x_1)/\hbar$. We now change the integration
variable from x0 to u = x0 − x1:

$$\psi(x_1, t_1) = N\int_{-\infty}^{\infty} \exp\left(\frac{i}{\hbar}\, m\frac{u^2}{2\Delta t}\right)\left[1 - \frac{i}{\hbar}V(x_1)\Delta t\right]\psi(x_1 + u, t_0)\, du.$$
As u must be small, we can expand ψ(x) about x1 and obtain

$$\psi(x_1, t_1) = N\int_{-\infty}^{\infty} \exp\left(\frac{im}{\hbar}\frac{u^2}{2\Delta t}\right)\left[1 - \frac{i}{\hbar}V(x_1)\Delta t\right]\left[\psi(x_1, t_0) + u\frac{\partial}{\partial x}\psi(x_1, t_0) + \frac{u^2}{2}\frac{\partial^2}{\partial x^2}\psi(x_1, t_0)\right] du.$$

Note that the second term in the Taylor expansion of ψ leads to a vanishing integral as the integrand
is an antisymmetric function of u. All in all, after evaluating the Gaussian integrals and keeping only
terms up to first order in ∆t, we are left with

$$\psi(x_1, t_1) = \psi(x_1, t_0) - \frac{i\Delta t}{\hbar}V(x_1)\psi(x_1, t_0) + \frac{i\hbar\Delta t}{2m}\frac{\partial^2}{\partial x^2}\psi(x_1, t_0).$$

Using

$$\frac{\psi(x_1, t_1) - \psi(x_1, t_0)}{\Delta t} \approx \frac{\partial}{\partial t}\psi(x_1, t_1),$$

we obtain the time dependent Schrödinger equation for a particle moving in one dimension:

$$i\hbar\frac{\partial}{\partial t}\psi(x, t) = \left[-\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + V(x)\right]\psi(x, t).$$

You may have found this derivation a bit involved. It certainly is not the easiest way to arrive at
the Schrödinger equation, but it has two attractive features:
• Everything was derived from simple heuristics which were based on viewing a particle as a wave
and allowing for interference of the waves;

• The formulation shows that the classical path is obtained from quantum mechanics when we let
ħ → 0.
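The first-order update that came out of the Gaussian integrals can be tested on a case where the exact time evolution is known. The sketch below applies it to a plane wave exp(ikx) in a constant potential V0, for which the exact evolution over ∆t is multiplication by exp(−iω∆t) with ω = ħk²/(2m) + V0/ħ. Units with ħ = m = 1, the finite-difference second derivative and all names and values are assumptions made for this example.

```python
import cmath

HBAR = 1.0  # assumed units
MASS = 1.0

def short_time_step(psi, potential, x, dt, h=1e-3):
    """psi(x, t0 + dt) = psi - (i dt / hbar) V psi + (i hbar dt / 2m) psi''."""
    second_derivative = (psi(x + h) - 2.0 * psi(x) + psi(x - h)) / h ** 2
    return (psi(x)
            - 1j * dt / HBAR * potential(x) * psi(x)
            + 1j * HBAR * dt / (2.0 * MASS) * second_derivative)

K_WAVE, V0, DT_STEP, X_POINT = 1.5, 0.5, 1e-3, 0.3

def psi_initial(x):
    return cmath.exp(1j * K_WAVE * x)

stepped = short_time_step(psi_initial, lambda x: V0, X_POINT, DT_STEP)
omega = HBAR * K_WAVE ** 2 / (2.0 * MASS) + V0 / HBAR
exact = cmath.exp(-1j * omega * DT_STEP) * psi_initial(X_POINT)
print("first-order step:", stepped)
print("exact evolution :", exact)
print("difference      :", abs(stepped - exact))
```

The difference is of second order in ∆t, as expected for an update that keeps only the terms linear in ∆t.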

5.4 The path integral: from quantum mechanics to classical mechanics

In the previous section we have considered how we can arrive from classical mechanics at the Schrödinger
equation. This formalism can be generalised in the sense that for each system for which we can
write down a Lagrangian, we have a way to find a quantum formulation in terms of the path integral.
Whether a Schrödinger-like equation can be found is not sure: sometimes we run into problems which
are beyond the scope of these notes. In this section we assume that we have a system described by
some Hamiltonian and show that the time evolution operator has the form of a path integral as found
in the previous section.
The starting point is the time evolution operator, or propagator, which, for a time-independent
Hamiltonian, takes the form
$$U(\mathbf r_f, t_f; \mathbf r_i, t_i) = \left\langle \mathbf r_f\right| e^{-\frac{i}{\hbar}(t_f - t_i)\hat H}\left|\mathbf r_i\right\rangle.$$

The matrix element is difficult to evaluate: the reason is that the Hamiltonian which, for a particle in
one dimension, takes the form

$$\hat H = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + V(x)$$

is the sum of two noncommuting operators. Although it is possible to evaluate the exponentials of the
separate terms occurring in the Hamiltonian, the exponential of the sum involves an infinite series of
increasingly complicated commutators. For any two noncommuting operators $\hat A$ and $\hat B$ we have

$$e^{\hat A + \hat B} = e^{\hat A}\, e^{\hat B}\, e^{-\frac{1}{2}[\hat A, \hat B] - \frac{1}{12}\left([\hat A, [\hat A, \hat B]] + [\hat B, [\hat B, \hat A]]\right) + \frac{1}{24}[\hat A, [\hat B, [\hat A, \hat B]]] + \ldots}$$

This is the so-called Campbell–Baker–Hausdorff (CBH) formula. The cumbersome commutators
occurring on the right can only be neglected if the operators A and B are small in some sense. We can
try to arrive at an expression involving small commutators by applying the time slicing procedure of
the previous section:

$$e^{-\frac{i}{\hbar}(t_f - t_i)\hat H} = e^{-\frac{i}{\hbar}\Delta t\hat H}\, e^{-\frac{i}{\hbar}\Delta t\hat H}\, e^{-\frac{i}{\hbar}\Delta t\hat H} \cdots$$

Note that no CBH commutators occur because $\Delta t\hat H$ commutes with itself.

Having this, we can rewrite the propagator as (we omit the hat for operators)

$$U(x_f, t_f; x_i, t_i) = \int dx_1 \ldots dx_{N-1}\, \langle x_f|e^{-i\Delta tH/\hbar}|x_{N-1}\rangle \langle x_{N-1}|e^{-i\Delta tH/\hbar}|x_{N-2}\rangle \cdots \langle x_1|e^{-i\Delta tH/\hbar}|x_i\rangle.$$

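Now that the operators occurring in the exponents can be made arbitrarily small by taking ∆t very
small, we can evaluate the matrix elements explicitly. To first order in ∆t the commutator corrections
can be neglected, so that

$$\langle x_j|e^{-i\Delta tH/\hbar}|x_{j+1}\rangle = \langle x_j|e^{-\frac{i\Delta t}{\hbar}[p^2/(2m) + V(x)]}|x_{j+1}\rangle \approx e^{-i\Delta tV(x_j)/\hbar}\, \langle x_j|e^{-\frac{i\Delta t}{\hbar}p^2/(2m)}|x_{j+1}\rangle.$$

The last matrix element can be evaluated by inserting two unit operators formulated in terms of
integrals over the complete sets $|p\rangle$:

$$\langle x|e^{-\frac{i\Delta t}{\hbar}p^2/(2m)}|x'\rangle = \int\!\!\int dp\, dp'\, \langle x|p\rangle \langle p|e^{-\frac{i\Delta t}{\hbar}\hat p^2/(2m)}|p'\rangle \langle p'|x'\rangle.$$

We have seen that $\langle x|p\rangle = \exp(ipx/\hbar)/\sqrt{2\pi\hbar}$. Realising that the exponential operator is diagonal in
p space, we find, after integrating over p:

$$\langle x|e^{-\frac{i\Delta t}{\hbar}p^2/(2m)}|x'\rangle = \frac{1}{\sqrt{2\pi i\hbar\Delta t/m}}\exp\left[im(x - x')^2/(2\Delta t\hbar)\right].$$

All in all we have

$$\langle x_j|e^{-i\Delta tH/\hbar}|x_{j+1}\rangle = \frac{1}{\sqrt{2\pi i\hbar\Delta t/m}}\, e^{-i\Delta tV(x_j)/\hbar}\exp\left[im(x_j - x_{j+1})^2/(2\Delta t\hbar)\right].$$

Note that we have evaluated matrix elements of operators. The result is expressed completely in
terms of numbers, and we no longer have to bother about commutation relations. Collecting all terms
together we obtain, up to the constant prefactors and with $x_0 = x_i$ and $x_N = x_f$,

$$U(x_f, t_f; x_i, t_i) = \int dx_1 \ldots dx_{N-1} \exp\left\{\frac{i}{\hbar}\sum_{j=0}^{N-1}\left[\frac{m(x_{j+1} - x_j)^2}{2\Delta t^2} - V(x_j)\right]\Delta t\right\}.$$

The expression in the exponent is the discrete form of the action, that is, of the time integral of the
Lagrangian; the integral over all intermediate values $x_j$ is the sum over all paths. We therefore have shown
that the time evolution operator from xi to xf is equivalent to the sum of the phase factors of all possible
paths from xi to xf.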
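The time-slicing step relies on the fact that the exponential of a sum of two noncommuting operators can be replaced by a product of exponentials when the time step is small. This can be illustrated with the smallest possible example, 2×2 matrices. In the sketch below the operators A = i a σx and B = i b σz (Pauli matrices) stand in for the kinetic and potential terms. This choice, the closed form exp[i(a σx + b σz)] = cos r + i sin r (a σx + b σz)/r with r² = a² + b², and all names are assumptions made for this example.

```python
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def exp_i_sigma(ax, az):
    """Closed form of exp[i (ax sigma_x + az sigma_z)]."""
    r = math.hypot(ax, az)
    if r == 0.0:
        return [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    c, s = math.cos(r), math.sin(r) / r
    return [[c + 1j * s * az, 1j * s * ax],
            [1j * s * ax, c - 1j * s * az]]

def sliced_exponential(ax, az, n):
    """[exp(A/n) exp(B/n)]^n with A = i ax sigma_x and B = i az sigma_z."""
    step = matmul(exp_i_sigma(ax / n, 0.0), exp_i_sigma(0.0, az / n))
    result = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    for _ in range(n):
        result = matmul(result, step)
    return result

def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

exact = exp_i_sigma(0.7, 0.4)
for n in (1, 10, 100, 1000):
    print(n, max_abs_diff(sliced_exponential(0.7, 0.4, n), exact))
```

The error decreases roughly as 1/n, reflecting the fact that the leading commutator correction per slice is of order 1/n².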

Operator methods for the harmonic oscillator

6.1 Introduction

Now that we know the basic formulation in terms of postulates of quantum mechanics, we are ready
to treat standard quantum problems. You have already met some wavefunction problems in the second
year – they are briefly mentioned in the appendix. In this chapter we consider a completely different
approach for finding the energy spectrum and eigenfunctions – this is the operator method. In the
wavefunction, or direct method one tries to find an explicit form of a wave function satisfying the
Schrödinger equation in, usually, the spatial representation. Operator methods however aim at solving
the problem by finding particular operators satisfying particular commutation relations and in which
the Hamiltonian can easily be expressed. By applying the commutation relations and a few general
physical criteria, the solution is obtained without using tedious mathematics but at the expense of a
somewhat higher level of abstraction.
We shall consider an application to the harmonic oscillator and use operator methods to find
spectra of angular momentum operators in the next chapter. The harmonic oscillator is of considerable
interest in numerous problems. The reason is that often systems in nature are close to the classical
ground state, and the potential can often be treated well in a harmonic approximation. Consider for
example the hydrogen molecule, which consists of two atoms linked together by a chemical bond
with an equilibrium distance r0 . We can stretch or contract the bond and it will then act as a spring,
which for small deviations from the equilibrium distance, is approximately harmonic as we shall see
in chapter 11. The harmonic oscillator also forms the basis of many advanced quantum mechanical
field theories, which we shall not go into.

6.2 The harmonic oscillator

Consider the one-dimensional harmonic oscillator. The Schrödinger equation reads

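$$-\frac{\hbar^2}{2m}\frac{d^2}{dx^2}\psi(x) + \frac{1}{2}m\omega^2x^2\psi(x) = E\psi(x). \tag{6.1}$$

Here ω is the frequency of the classical harmonic oscillator. This equation can also be written as:

$$\frac{p^2}{2m}\psi(x) + \frac{1}{2}m\omega^2x^2\psi(x) = E\psi(x) \tag{6.2}$$

where we have used the momentum operator $p \equiv \frac{\hbar}{i}\frac{d}{dx}$. The momentum operator does not commute
with the position x. We have:

$$[p, x] = \frac{\hbar}{i}. \tag{6.3}$$

In order to simplify the notation, we scale the momentum and the distance according to:

$$\tilde p = \frac{p}{\sqrt{m\hbar\omega}} \tag{6.4a}$$

$$\tilde x = \sqrt{\frac{m\omega}{\hbar}}\, x \tag{6.4b}$$

so that $\tilde p = -i\, d/d\tilde x$. The Schrödinger equation now assumes the form:

$$\frac{\hbar\omega}{2}\left[\tilde p^2 + \tilde x^2\right]\psi(\tilde x) = E\psi(\tilde x) \tag{6.5}$$

or

$$\frac{\hbar\omega}{2}\left[-\frac{d^2}{d\tilde x^2} + \tilde x^2\right]\psi(\tilde x) = E\psi(\tilde x). \tag{6.6}$$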
The commutation relation for p̃ and x̃ can be found using (6.3) and we have:

[ p̃, x̃] = −i. (6.7)

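We shall first consider the solution of this problem following the direct method. In order to solve
the Schrödinger equation it turns out to be convenient to write $\psi(\tilde x)$ in the form:

$$\psi(\tilde x) = e^{-\tilde x^2/2}\, u(\tilde x) \tag{6.8}$$

where a new function $u(\tilde x)$ has been introduced. Denoting derivatives with respect to $\tilde x$ by a prime,
we have:

$$\psi'(\tilde x) = \left[-\tilde x u(\tilde x) + u'(\tilde x)\right] e^{-\tilde x^2/2} \tag{6.9}$$

$$\psi''(\tilde x) = \left[\tilde x^2 u(\tilde x) - u(\tilde x) - 2\tilde x u'(\tilde x) + u''(\tilde x)\right] e^{-\tilde x^2/2} \tag{6.10}$$

and substituting these expressions in (6.6) we obtain:

$$\frac{\hbar\omega}{2}\left[2\tilde x u'(\tilde x) + u(\tilde x) - u''(\tilde x)\right] = E u(\tilde x) \tag{6.11}$$

or

$$-u''(\tilde x) + 2\tilde x u'(\tilde x) + u(\tilde x) - \frac{2E}{\hbar\omega}u(\tilde x) = 0. \tag{6.12}$$

The resulting equation can be analysed by writing u as a power series expansion in $\tilde x$:

$$u(\tilde x) = \sum_{n=0}^{\infty} c_n \tilde x^n. \tag{6.13}$$

Substituting this series into (6.12) leads to

$$\sum_{n=0}^{\infty}\left[-n(n - 1)c_n\tilde x^{n-2} + 2nc_n\tilde x^n + c_n\tilde x^n - \frac{2E}{\hbar\omega}c_n\tilde x^n\right] = 0. \tag{6.14}$$

Collecting equal powers in this expression and demanding that the resulting coefficients for each
power should vanish, we obtain a recursive equation for the $c_n$:

$$c_{n+2} = -\frac{\frac{2E}{\hbar\omega} - 1 - 2n}{(n + 2)(n + 1)}\, c_n. \tag{6.15}$$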
This power series expansion grows so strongly for large values of $\tilde x$ that it is impossible to normalise
the corresponding wave function, unless it truncates for a particular value of n. Therefore we must
require that the numerator in (6.15) vanishes for some n, so that $c_{n+2}$ and all higher coefficients of the
same parity are zero. This leads to the equation

$$E_n = \hbar\omega(n + 1/2). \tag{6.16}$$

This is the spectrum of the one-dimensional harmonic oscillator: it is equidistant and bounded from
below.
The solutions ψ can be written in terms of the solutions u which, for the condition (6.16), are the
so-called Hermite polynomials $H_n$:

$$\psi_n(\tilde x) = \left(\sqrt{\pi}\, 2^n n!\right)^{-1/2} e^{-\tilde x^2/2} H_n(\tilde x). \tag{6.17}$$
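The truncation of the series can be made explicit by iterating the recursion (6.15) with exact rational arithmetic. In the sketch below, `eps` stands for 2E/(ħω) and `parity` selects whether the even (c0 = 1) or odd (c1 = 1) series is generated. These names and the overall normalisation of the polynomials are assumptions made for this example.

```python
from fractions import Fraction

def series_coefficients(eps, parity=0, n_terms=12):
    """Coefficients c_0 ... c_{n_terms-1} from
    c_{n+2} = -(eps - 1 - 2n) / ((n + 2)(n + 1)) c_n, with eps = 2E/(hbar omega)."""
    c = [Fraction(0)] * n_terms
    c[parity] = Fraction(1)
    for n in range(n_terms - 2):
        c[n + 2] = -(Fraction(eps) - 1 - 2 * n) / ((n + 2) * (n + 1)) * c[n]
    return c

# eps = 2n + 1 corresponds to E_n = hbar omega (n + 1/2): the series stops
print(series_coefficients(5, parity=0))   # n = 2: u = 1 - 2 x^2
print(series_coefficients(7, parity=1))   # n = 3: u = x - (2/3) x^3
# any other energy, e.g. eps = 4, gives a series that never terminates
print(series_coefficients(4, parity=0))
```

Up to a constant factor the truncated series are the Hermite polynomials: $H_2 = 4\tilde x^2 - 2$ and $H_3 = 8\tilde x^3 - 12\tilde x$.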

We now show that the harmonic oscillator problem can also be solved by a different method, in
which merely commutation relations between operators are used to arrive at the energy spectrum. We
define two operators, a and a† which are each other’s Hermitian conjugates:
$$a = \frac{1}{\sqrt 2}(\tilde x + i\tilde p), \tag{6.18a}$$

$$a^\dagger = \frac{1}{\sqrt 2}(\tilde x - i\tilde p). \tag{6.18b}$$

The fact that these operators are each other’s Hermitian conjugates can easily be checked using the
fact that both $\tilde x$ and $\tilde p$ are Hermitian.
Using (6.7), it can be verified that

$$[a, a^\dagger] = \frac{1}{2}[\tilde x + i\tilde p, \tilde x - i\tilde p] = \frac{i}{2}[\tilde p, \tilde x] - \frac{i}{2}[\tilde x, \tilde p] = 1. \tag{6.19}$$

Furthermore, using Eqs. (6.19) and (6.18), we obtain immediately:

$$H = \frac{\hbar\omega}{2}\left(a^\dagger a + aa^\dagger\right) = \hbar\omega\left(a^\dagger a + 1/2\right). \tag{6.20}$$

From this, it is easy to calculate the following commutation relations:

$$[H, a] = \hbar\omega[a^\dagger a, a] = \hbar\omega[a^\dagger, a]a = -\hbar\omega a \tag{6.21}$$

and similarly

$$[H, a^\dagger] = \hbar\omega a^\dagger. \tag{6.22}$$

After these preparations, we now consider the eigenvalue problem. Suppose $\psi_E$ is an eigenstate
with energy E:

$$H\psi_E = E\psi_E. \tag{6.23}$$

We now consider the action of the Hamiltonian on the state $a\psi_E$. Using the commutation relation
(6.21) we obtain

$$Ha\psi_E = aH\psi_E - \hbar\omega a\psi_E = aE\psi_E - \hbar\omega a\psi_E \tag{6.24}$$

or

$$H(a\psi_E) = (E - \hbar\omega)(a\psi_E) \tag{6.25}$$

and we see that $a\psi_E$ is an eigenstate of H with energy $E - \hbar\omega$!
Similarly we have for $a^\dagger\psi_E$:

$$Ha^\dagger\psi_E = a^\dagger H\psi_E + \hbar\omega a^\dagger\psi_E = (E + \hbar\omega)(a^\dagger\psi_E) \tag{6.26}$$

that is, $a^\dagger\psi_E$ is an eigenstate with energy $E + \hbar\omega$. We say that a is a “lowering” operator, as it lowers
the energy eigenvalue by ħω, and accordingly $a^\dagger$ is called a “raising” operator.
Note that if ψE is normalised, aψE and a† ψE need not have this property as an eigenvector is
defined only up to a normalisation constant. We will return to this below.
In order to find the spectrum, we use a physical argument. The spectrum must be bounded from below as the potential does not assume infinitely negative values. Therefore, if we start with some $\psi_E$ and act successively on it with the lowering operator $a$, we must have at some point:

$$a^n\psi_E = 0 \tag{6.27}$$

because otherwise the spectrum would not be bounded from below. Let us call $a^{n-1}\psi_E = \psi_0$. Then $a\psi_0 = 0$. Therefore,
$$H\psi_0 = \hbar\omega\left(a^\dagger a + \frac{1}{2}\right)\psi_0 = \frac{1}{2}\hbar\omega\psi_0, \tag{6.28}$$
that is, $\psi_0$ is an eigenstate of $H$ with eigenvalue $\hbar\omega/2$. Acting with $a^\dagger$ on $\psi_0$ we obtain an eigenstate $\psi_1$ (up to a constant) with eigenvalue $3\hbar\omega/2$, etc. Acting $n$ times with $a^\dagger$ on $\psi_0$, we obtain an eigenstate $\psi_n$ (up to a constant) with energy $\hbar\omega(n + 1/2)$, in accordance with the result derived above using the direct method.
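Often the operator $a^\dagger a$ is called the number operator, denoted by $N$, and $H$ can now be written as $\hbar\omega(N + 1/2)$. $\psi_n$ is an eigenstate of $N$ with eigenvalue $n$. The norm of $a^\dagger\psi_n$ can be expressed in terms of that of $\psi_n$:

$$\langle a^\dagger\psi_n | a^\dagger\psi_n\rangle = \langle\psi_n | aa^\dagger\psi_n\rangle = \langle\psi_n | (a^\dagger a + 1)\psi_n\rangle = (n + 1)\langle\psi_n | \psi_n\rangle. \tag{6.29}$$

Therefore, if $\psi_n$ is normalised, $a^\dagger\psi_n/\sqrt{n + 1}$ is normalised too, and normalised states $\psi_n$ can be constructed from a normalised state $\psi_0$ according to:
$$\psi_n = \frac{1}{\sqrt{n!}}\left(a^\dagger\right)^n\psi_0. \tag{6.30}$$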
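
As a numerical illustration (not part of the derivation), the operator relations can be checked with truncated matrices. The sketch below assumes the basis of normalised states ψ_0 ... ψ_{D−1}, in which (6.29) and (6.30) give a†ψ_n = √(n+1) ψ_{n+1}; the names `D` and `matmul` and the choice of units ħω = 1 are illustrative assumptions.

```python
import math

D = 6  # size of the truncated basis psi_0 ... psi_{D-1} (illustrative choice)

def matmul(A, B):
    """Plain-Python product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Matrix elements <psi_m | a^dagger | psi_n> = sqrt(n + 1) if m == n + 1, else 0.
a_dag = [[math.sqrt(n + 1) if m == n + 1 else 0.0 for n in range(D)]
         for m in range(D)]
# The lowering operator is the transpose, because the matrix is real.
a = [[a_dag[n][m] for n in range(D)] for m in range(D)]

N = matmul(a_dag, a)                           # number operator a^dagger a
energies = [N[n][n] + 0.5 for n in range(D)]   # diagonal of H / (hbar * omega)

# Commutator [a, a^dagger]; the last diagonal entry is spoiled by the truncation.
aad = matmul(a, a_dag)
comm = [[aad[i][j] - N[i][j] for j in range(D)] for i in range(D)]

print(energies)                                # n + 1/2 for n = 0 ... D-1
print([comm[i][i] for i in range(D)])          # 1 everywhere except the last entry
```

The deviation in the last diagonal entry of the commutator is an artefact of cutting off the infinite ladder of states, not a property of the operators themselves.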

Using the commutation relations for $a$, $a^\dagger$, it is also possible to show that states belonging to different energy levels are mutually orthogonal:
$$\langle\psi_n|\psi_m\rangle \propto \langle\psi_0|\, a^n \left(a^\dagger\right)^m |\psi_0\rangle. \tag{6.31}$$

Moving the $a$'s to the right by application of the commutation relations leads to a form involving the lowering operator $a$ acting on $\psi_0$, which vanishes.
Exercise: show that $\langle\psi_2|\psi_3\rangle$ vanishes indeed.
We have succeeded in finding the energy spectrum but it might seem that we have not made any
progress in finding the form of the eigenfunctions ψn . However, we have a simple differential equation
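defining the ground state $\psi_0$:
$$a\psi_0(\tilde{x}) = \frac{1}{\sqrt{2}}(\tilde{x} + i\tilde{p})\psi_0(\tilde{x}) = 0, \tag{6.32}$$
$$\left(\tilde{x} + \frac{d}{d\tilde{x}}\right)\psi_0(\tilde{x}) = 0. \tag{6.33}$$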
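The solution can immediately be found as:

$$\psi_0(\tilde{x}) = \mathrm{Const.}\; e^{-\tilde{x}^2/2} \tag{6.34}$$

in accordance with the result obtained in the direct method. The normalisation constant is found as

$$\mathrm{Const.} = \left(\frac{m\omega}{\pi\hbar}\right)^{1/4}. \tag{6.35}$$

Using (6.30) and the definition (6.18b) of $a^\dagger$, we can write the solution for general $n$ as:
$$\psi_n(\tilde{x}) = \left(\frac{m\omega}{\pi\hbar}\right)^{1/4} \frac{1}{\sqrt{n!\,2^n}}\, (\tilde{x} - i\tilde{p})^n\, e^{-\tilde{x}^2/2}, \tag{6.36}$$

which indeed turns out to be in accordance with the solution found in the direct method, but we shall not go into this any further.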
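
The link with the Hermite polynomials can be made concrete with a few lines of code. By (6.18b) and (6.33), the raising operator is proportional to (x̃ − d/dx̃), and acting with it on P(x̃)e^{−x̃²/2} gives (2x̃P − P′)e^{−x̃²/2}. The sketch below applies this rule repeatedly to the ground state; the function names and the coefficient-list representation of polynomials are illustrative assumptions, and normalisation constants are ignored.

```python
def raise_polynomial(p):
    """Map the coefficients of P(x) to those of 2*x*P(x) - P'(x).

    p[k] is the coefficient of x**k.
    """
    out = [0] * (len(p) + 1)
    for k, c in enumerate(p):
        out[k + 1] += 2 * c          # contribution of 2*x*P
        if k >= 1:
            out[k - 1] -= k * c      # contribution of -P'
    return out

def polynomial_prefactor(n):
    """Polynomial multiplying exp(-x**2/2) after n applications of (x - d/dx)."""
    p = [1]                          # ground state: P_0(x) = 1
    for _ in range(n):
        p = raise_polynomial(p)
    return p

for n in range(4):
    print(n, polynomial_prefactor(n))
```

For n = 2 and n = 3 the coefficient lists correspond to 4x² − 2 and 8x³ − 12x, which are the Hermite polynomials H_2 and H_3.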

Angular momentum

7.1 Spectrum of the angular momentum operators

We have seen that the energy spectrum of the harmonic oscillator is easy to find using creation and
annihilation operators. Similar methods can be used to find the eigenvalues of angular momentum
operators. We know two such types of operators: the analogue of the classical angular momentum:

L = r×p (7.1)

and the spin S. These operators satisfy the commutation relations:

$$[J_i, J_j] = i\hbar\,\varepsilon_{ijk} J_k. \tag{7.2}$$

Here, i, j and k are indices denoting the Cartesian components x, y, z, and a sum over the repeated index k is implied. The operator J is an angular momentum operator like L or S. $\varepsilon_{ijk}$ is the Levi-Civita tensor: it is 1 if ijk is an even permutation of 123 (see footnote 1), −1 for an odd permutation, and 0 if two indices are equal. In fact, we will call every operator satisfying (7.2) an angular momentum operator.
From the commutation relations (7.2) it can be derived that the components of J commute with $J^2$; we can write this symbolically as:
$$[\mathbf{J}, J^2] = 0. \tag{7.3}$$
Exercise: prove this relation.
The operator $J^2$ is positive. This means that for any state $|u\rangle$, $\langle u|J^2|u\rangle \geq 0$.

Exercise: prove this.

If the Hamiltonian of a physical system commutes with every component of an angular momentum operator J, the eigenstates can be rearranged to be simultaneous eigenstates of the Hamiltonian, $J^2$ and $J_z$ (it is impossible to include $J_x$ or $J_y$ because they do not commute with $J_z$).
In analogy to the raising and lowering operators for the harmonic oscillator we define the operators
J+ and J− as follows:

$$J_+ = J_x + iJ_y, \tag{7.4a}$$
$$J_- = J_x - iJ_y. \tag{7.4b}$$

These operators are not Hermitian; they are each other's Hermitian conjugates. They satisfy the following commutation relations:

$$[J_z, J_\pm] = \pm\hbar J_\pm; \tag{7.5a}$$
$$[J_+, J_-] = 2\hbar J_z. \tag{7.5b}$$

Footnote 1: The even permutations of 123 are 123, 231 and 312. The remaining three are the odd permutations.

By definition, we call the eigenvalues of $J^2$ $\hbar^2 j(j + 1)$ and those of $J_z$ $\hbar m$. Here $j$ and $m$ are real (i.e. not necessarily integer) numbers which we will have to find. The eigenstates can now be written as $|jm\rangle$, where we have omitted quantum labels associated with other operators, such as the Hamiltonian. Note that we can always take $j \geq 0$ because of the fact that $J^2$ is a positive operator.
We now show that for an angular momentum eigenstate $|jm\rangle$, $J_\pm|jm\rangle$ is an angular momentum eigenstate too:
$$J_z\left[J_+|jm\rangle\right] = (J_+J_z + \hbar J_+)|jm\rangle = \hbar(m + 1)\left[J_+|jm\rangle\right] \tag{7.6}$$
and because $J_+$ commutes with $J^2$, we see that $J_+|jm\rangle$ is proportional to an angular momentum eigenstate $|j, m + 1\rangle$. Similarly, $J_-|jm\rangle$ is proportional to $|j, m - 1\rangle$. Therefore, $J_\pm$ are called raising and lowering operators for the quantum number $m$. This means that, given an eigenstate $|jm\rangle$, we can in principle construct an infinite sequence of eigenstates by acting an arbitrary number of times on it with $J_\pm$. The sequence is finite only if after acting a finite number of times with either $J_+$ or $J_-$, the new state is zero. The first result we have obtained is that the eigenstates $|jm\rangle$ occur in sequences of states with the same $j$ but $m$ stepping up and down by 1.
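Suppose $|jm\rangle$ is normalised; then we can calculate the norm of $J_+|jm\rangle$. Using $J_-J_+ = J^2 - J_z^2 - \hbar J_z$ (check!) we have:

$$\langle J_+ jm|J_+ jm\rangle = \langle jm|J_-J_+|jm\rangle = \langle jm|J^2 - J_z^2 - \hbar J_z|jm\rangle = \hbar^2 (j - m)(j + m + 1). \tag{7.7}$$

Similarly,

$$\langle J_- jm|J_- jm\rangle = \langle jm|J_+J_-|jm\rangle = \langle jm|J^2 - J_z^2 + \hbar J_z|jm\rangle = \hbar^2 (j + m)(j - m + 1). \tag{7.8}$$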
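
The operator identity marked "check!" follows in two lines from the definitions (7.4) and the commutation relations (7.2). A short sketch of that step, with the analogous result for the reversed product:

```latex
\begin{aligned}
J_- J_+ &= (J_x - iJ_y)(J_x + iJ_y) = J_x^2 + J_y^2 + i[J_x, J_y] \\
        &= (J^2 - J_z^2) + i\,(i\hbar J_z) = J^2 - J_z^2 - \hbar J_z , \\
J_+ J_- &= J_x^2 + J_y^2 - i[J_x, J_y] = J^2 - J_z^2 + \hbar J_z .
\end{aligned}
```

Acting on $|jm\rangle$, the first line gives $\hbar^2[j(j+1) - m^2 - m] = \hbar^2(j-m)(j+m+1)$, which is the factor appearing in the norm of $J_+|jm\rangle$.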


Both expressions must be non-negative and this restricts $m$ to the values

$$-j \leq m \leq j. \tag{7.9}$$

The only way to restrict $m$ to $|m| \leq j$ is when $J_+$ acting a certain number of times on $|jm\rangle$ yields zero:

$$J_+^p|jm\rangle = 0, \tag{7.10}$$

and likewise for $J_-$:

$$J_-^q|jm\rangle = 0. \tag{7.11}$$
Now consider the state
$$|j, m + p - 1\rangle = J_+^{p-1}|jm\rangle \tag{7.12}$$
where the equality holds up to a normalisation constant. This is an angular momentum eigenstate since it is obtained by acting $p - 1$ times with $J_+$ on an eigenstate $|jm\rangle$. We must have

$$J_+|j, m + p - 1\rangle = 0 \tag{7.13}$$

which implies that the norm of the resulting state vanishes. By (7.7) it follows that

$$j = m + p - 1 \tag{7.14}$$

(note that the other solution $j = -m - p$ is impossible because $|m| \leq j$ and $p > 0$). In a similar fashion we find
$$-j = m - q + 1 \tag{7.15}$$
and combining the last two equations yields:

$$2j = p + q - 2 = \text{integer}. \tag{7.16}$$

Therefore, $j$ is either integer or half-integer and $m$ assumes the values

$$m = -j, -j + 1, \ldots, j - 1, j. \tag{7.17}$$

In conclusion we have:
The angular momentum states can be labelled $|j, m\rangle$. The numbers $j$ are either integer or half-integer. For a given $j$, the numbers $m$ run through the values

$$-j, -j + 1, \ldots, j - 1, j. \tag{7.18}$$

From (7.7) and (7.8) we see that from a properly normalised state $|jm\rangle$ we can obtain properly normalised states as follows:
$$|j, m - 1\rangle = \frac{1}{\hbar\sqrt{(j + m)(j - m + 1)}}\, J_-|jm\rangle, \tag{7.19a}$$
$$|j, m + 1\rangle = \frac{1}{\hbar\sqrt{(j - m)(j + m + 1)}}\, J_+|jm\rangle. \tag{7.19b}$$

These states are defined up to a phase $e^{i\alpha}$.
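
The ladder structure can be turned into explicit matrices. The following sketch builds J_+, J_− and J_z for a given j from the normalisation factors in (7.19), in units ħ = 1 and with the phase choice α = 0, and checks the commutator (7.5b) and the eigenvalue j(j+1) of J². The function names and the basis ordering are illustrative assumptions.

```python
import math
from fractions import Fraction

def matmul(A, B):
    """Plain-Python product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def ladder_matrices(j):
    """Return (m_values, Jplus, Jminus, Jz) for angular momentum j, with hbar = 1.

    Basis order: m = j, j-1, ..., -j.  j is a Fraction such as Fraction(3, 2).
    """
    dim = int(2 * j + 1)
    ms = [j - k for k in range(dim)]
    Jz = [[float(ms[r]) if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    Jp = [[0.0] * dim for _ in range(dim)]
    for c, m in enumerate(ms):
        if m < j:
            # <j, m+1 | J+ | j, m> = sqrt((j - m)(j + m + 1)), cf. (7.19b)
            Jp[c - 1][c] = math.sqrt(float((j - m) * (j + m + 1)))
    Jm = [[Jp[c][r] for c in range(dim)] for r in range(dim)]  # transpose of J+
    return ms, Jp, Jm, Jz

j = Fraction(3, 2)
ms, Jp, Jm, Jz = ladder_matrices(j)
dim = len(ms)
pm, mp, zz = matmul(Jp, Jm), matmul(Jm, Jp), matmul(Jz, Jz)
# J^2 = Jz^2 + (J+ J- + J- J+) / 2
J2 = [[zz[r][c] + 0.5 * (pm[r][c] + mp[r][c]) for c in range(dim)] for r in range(dim)]
comm = [[pm[r][c] - mp[r][c] for c in range(dim)] for r in range(dim)]

print([str(m) for m in ms])                     # the allowed m values
print([round(J2[r][r], 10) for r in range(dim)])  # j(j+1) on every diagonal entry
```

Unlike the harmonic oscillator, no truncation is needed here: the ladder terminates by itself at m = ±j, so the commutation relations hold exactly in the (2j+1)-dimensional space.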

7.2 Orbital angular momentum

As we have seen, the quantum analogue of the classical angular momentum L = r × p is an angular momentum operator because it satisfies the commutation relations (7.2). This can be shown using the commutation relation
$$[p, x] = \frac{\hbar}{i}. \tag{7.20}$$
Exercise: Show that (7.2) holds, using the following formulation of the cross-product:

$$(\mathbf{a} \times \mathbf{b})_k = \varepsilon_{ijk}\, a_i b_j. \tag{7.21}$$

This type of angular momentum will be called orbital angular momentum since it is expressed in the
orbital coordinates of the particle. Another type of angular momentum operator is that representing
the spin. We will now find the spectrum of the orbital angular momentum operators for a single
particle in three dimensions.
It turns out to be convenient to express $L_z$ in polar coordinates. We will not derive this expression but simply give:

$$L_z = -i\hbar\frac{\partial}{\partial\varphi}. \tag{7.22}$$
The fact that it depends only on ϕ is obvious since $L_z$ is associated with a rotation around the z-axis and such a motion is expressed as a variation of the angle ϕ. The eigenfunctions of $L^2$, $L_z$ can be written as functions of the angles ϑ and ϕ. We know that these are eigenfunctions of $L_z$ with eigenvalue $\hbar m$. Denoting the eigenfunctions F(ϑ, ϕ) and dividing by $\hbar$, we have:
$$-i\frac{\partial F(\vartheta, \varphi)}{\partial\varphi} = mF(\vartheta, \varphi). \tag{7.23}$$
This differential equation has a solution

$$F(\vartheta, \varphi) = G(\vartheta)e^{im\varphi}. \tag{7.24}$$

The wavefunction of the particle should be single-valued; hence it should be equal for ϕ and ϕ + 2π, and this restricts m to integer values. Since m runs from −j to j in integer steps, j must then be an integer too. Hence we have:

The orbital angular momentum of a single particle has only integer quantum numbers j, m.

This result can be generalised to the orbital angular momentum of a system consisting of an arbitrary number of particles. Half-integer values of j can only come about by having particles with half-integer spin.

7.3 Spin

Classically, a charged particle having a nonzero angular momentum has a nonzero magnetic moment. The magnetic moment for a particle of charge q and mass m is given by:
$$\mathbf{m} = \frac{q}{2}(\mathbf{r} \times \mathbf{v}) = \frac{q}{2m}\mathbf{L}. \tag{7.25}$$
The energy of a magnetic moment in an external magnetic field B is given by

$$-\mathbf{m} \cdot \mathbf{B}. \tag{7.26}$$

According to the correspondence principle we add this energy as an extra term to the Hamiltonian of
a (spinless) electron (q = −e):

$$H = H_0 + H_1, \tag{7.27a}$$
$$H_0 = \frac{p^2}{2m} + V(\mathbf{r}); \tag{7.27b}$$
$$H_1 = \frac{e}{2m}\mathbf{L} \cdot \mathbf{B}. \tag{7.27c}$$
We take B in the z-direction. If the potential is spherically symmetric, $V(\mathbf{r}) = V(r)$, the eigenstates of $H_0$ can be taken as simultaneous eigenstates of $L^2$ and $L_z$. But in that case, they are also eigenstates of $H_1$ and therefore of $H$. For an eigenstate of $H_0$ with energy $E_0$, $H_1$ shifts the energy by an amount

$$\Delta E_1 = \frac{e\hbar}{2m} MB = \mu_B MB. \tag{7.28}$$
Here M is the quantum number associated with $L_z$ (capital letter M is used in order to avoid confusion with the mass m). We see that a magnetic field lifts the $L_z$-degeneracy, yielding a splitting of an l-level into 2l + 1 sublevels.
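
To get a feeling for the size of the splitting (7.28), the shift can be evaluated numerically. The sketch below uses standard SI values of the constants; the field strength of 1 T and the choice l = 1 are illustrative assumptions, as is the function name.

```python
E_CHARGE = 1.602176634e-19      # elementary charge in C (exact SI value)
HBAR = 1.054571817e-34          # reduced Planck constant in J s
M_ELECTRON = 9.1093837015e-31   # electron mass in kg (CODATA 2018)

mu_B = E_CHARGE * HBAR / (2 * M_ELECTRON)   # Bohr magneton in J/T

def zeeman_shifts(l, B):
    """Energy shifts mu_B * M * B (in eV) for M = -l ... l in a field B (tesla)."""
    return {M: mu_B * M * B / E_CHARGE for M in range(-l, l + 1)}

shifts = zeeman_shifts(l=1, B=1.0)
print(mu_B)      # about 9.27e-24 J/T
print(shifts)    # three levels, spaced by about 5.8e-5 eV
```

The spacing is tiny compared with typical atomic level distances of a few eV, which is why the splitting shows up as fine structure in spectral lines.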
Zeeman observed indeed a splitting of the levels of atoms in a magnetic field, but these splittings
were not in accordance with (7.28). Later, Uhlenbeck and Goudsmit (1925) explained the observed
anomaly by the assumption of the existence of an intrinsic angular momentum variable, i.e. not asso-
ciated with the orbital coordinates. This angular momentum was called spin, S. With the spin there is
associated a magnetic moment and this is given by
$$\mathbf{m} = -\frac{e}{2m}\, g\, \mathbf{S}. \tag{7.29}$$
The factor g is very close to 2 and its value can be derived only by using relativistic quantum mechanics. The eigenvalues of the spin operators $S^2$, $S_z$ are by convention $\hbar^2 s(s + 1)$ and $\hbar m_s$. s is always 1/2 for an electron and therefore $m_s$ can only assume the values 1/2 and −1/2. Therefore, the eigenvalue of $S^2$ is always $3\hbar^2/4$ for an electron. Other particles have been found with spin 0, 1, 3/2 etc.
Writing down the Hamiltonian for an electron with spin (taking g = 2), we have:

$$H = H_0 + H_1, \tag{7.30}$$
$$H_0 = \frac{p^2}{2m} + V(\mathbf{r}); \tag{7.31}$$
$$H_1 = \frac{e}{2m}(\mathbf{L} + 2\mathbf{S}) \cdot \mathbf{B}. \tag{7.32}$$
We have however forgotten something. Associating a magnetic moment with the spin and one with the
angular momentum, we must also take into account the interaction between these two! For a proper
calculation of this interaction we would need relativistic electrodynamics and therefore we simply
quote the result:
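$$H_{SO} = \frac{ge}{4\pi\varepsilon_0\, 2m^2c^2}\, \mathbf{S} \cdot \mathbf{L}\, \frac{1}{r}\frac{dV(r)}{dr}. \tag{7.33}$$
For the hydrogen atom, with V(r) = 1/r (choosing suitable units), we have:

$$H_{SO} = \frac{ge^2}{8\pi\varepsilon_0 m^2c^2}\, \mathbf{S} \cdot \mathbf{L}\, \frac{1}{r^3}. \tag{7.34}$$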
This spin-orbit splitting is observed experimentally.

7.4 Addition of angular momenta

Consider an electron in a hydrogen atom. The electron has orbital angular momentum, characterised
by the quantum numbers l, ml and spin quantum numbers, s, ms . The total angular momentum J is the
sum of the vector operators L and S:
J = L + S. (7.35)
What are the possible eigenvalues of $J^2$ and $J_z$? Heuristically, we can approach this problem by adding
L and S as vectors. However, the relative orientation of the two is not arbitrary as we know that the
eigenvalues j, m of the resultant operator are quantised. If L and S are “aligned”, we have j = l + s
and if they are opposite we have j = l − s. This means that j is half-integer and does not differ more
than 1/2 from l. We want to analyse the combination of angular momenta now in a more formal way,
starting with the problem of adding two spins, S1 and S2 .
Let us first ask ourselves why we would like to know the relation between the two angular momenta to be added and the result. To answer this question, consider a system consisting of two particles with orbital angular momentum zero (l = 0, $m_l$ = 0) and each spin 1/2, described by the interaction
$$V(r) = V_1(r) + V_2(r)\,\frac{\mathbf{S}_1 \cdot \mathbf{S}_2}{\hbar^2}. \tag{7.36}$$
The second term contains the magnetic interaction between the spins. Neither $S_{1z}$ nor $S_{2z}$ commutes with the second term and therefore the eigenstates of the Hamiltonian are not simultaneous eigenstates of $S_{1z}$ and $S_{2z}$. To find observables which do commute with the second term, we note that, with $\mathbf{S} = \mathbf{S}_1 + \mathbf{S}_2$ the total spin,
$$\mathbf{S}_1 \cdot \mathbf{S}_2 = \frac{1}{2}\left(S^2 - S_1^2 - S_2^2\right) \tag{7.37}$$
and this commutes with $S^2$, $S_1^2$, $S_2^2$ and $S_z = S_{1z} + S_{2z}$.

Exercise: Prove these commutation relations.
Therefore, the eigenstates can be labelled by $s_1$, $s_2$ (both 1/2), $s_{tot}$ (to be evaluated) and $m_s$ (the eigenvalue for $S_z$; to be evaluated).
So let us consider the possible eigenvalues of $S^2$ and $S_z$. We start from the states $|s_1, m_1; s_2, m_2\rangle$ where the labels belonging to the two particles are still separated. As $m_1$ and $m_2$ can assume the values 1/2 and −1/2 (denoted by + and − respectively), we have four such states. As the values of $s_1$ and $s_2$ are fixed, we can denote the four states simply by $|m_1, m_2\rangle$:

$$\chi_1 = |{+}{+}\rangle, \tag{7.38a}$$
$$\chi_2 = |{+}{-}\rangle, \tag{7.38b}$$
$$\chi_3 = |{-}{+}\rangle, \tag{7.38c}$$
$$\chi_4 = |{-}{-}\rangle. \tag{7.38d}$$

We must find linear combinations of these states which are eigenstates of $S^2$ and $S_z$. It turns out that all four states are indeed eigenstates of $S_z$:

$$S_z\chi_1 = (S_{1z} + S_{2z})\chi_1 = \hbar(1/2 + 1/2)\chi_1 = \hbar\chi_1 \tag{7.39}$$

and furthermore:

$$S_z\chi_2 = 0, \tag{7.40a}$$
$$S_z\chi_3 = 0, \tag{7.40b}$$
$$S_z\chi_4 = -\hbar\chi_4. \tag{7.40c}$$

Now consider the state $\chi_2 + \chi_3$. This is certainly not an eigenstate of $S_{1z}$ and neither of $S_{2z}$. But it is an eigenstate of $S_z$ with eigenvalue 0. We see therefore that eigenstates of $S_z$ need not necessarily be eigenstates of $S_{1z}$ or $S_{2z}$.
Now we try to find eigenstates of $S^2$. It is convenient to write this operator in the following form:

$$S^2 = S_1^2 + S_2^2 + 2S_{1z}S_{2z} + S_{1+}S_{2-} + S_{1-}S_{2+} \tag{7.41}$$

where we have used the raising and lowering operators:

$$S_{1\pm} = S_{1x} \pm iS_{1y} \tag{7.42}$$

etc. These operators have the usual effect when acting on our states:

$$S_{1+}|{+}{+}\rangle = 0, \tag{7.43a}$$
$$S_{1+}|{-}{+}\rangle = \hbar|{+}{+}\rangle, \tag{7.43b}$$

etcetera (check this). From this, it can be verified that the required eigenstates are:

$$\Psi_1 = \chi_1 = |{+}{+}\rangle; \tag{7.44a}$$
$$\Psi_2 = \chi_4 = |{-}{-}\rangle; \tag{7.44b}$$
$$\Psi_3 = \frac{\chi_2 + \chi_3}{\sqrt{2}} = \frac{1}{\sqrt{2}}\left(|{+}{-}\rangle + |{-}{+}\rangle\right); \tag{7.44c}$$
$$\Psi_4 = \frac{\chi_2 - \chi_3}{\sqrt{2}} = \frac{1}{\sqrt{2}}\left(|{+}{-}\rangle - |{-}{+}\rangle\right). \tag{7.44d}$$
Using Eq. (7.41), it follows that

$$S^2\Psi_1 = 2\hbar^2\Psi_1, \quad \text{hence } s = 1; \tag{7.45a}$$
$$S^2\Psi_2 = 2\hbar^2\Psi_2, \quad \text{hence } s = 1; \tag{7.45b}$$
$$S^2\Psi_3 = 2\hbar^2\Psi_3, \quad \text{hence } s = 1; \tag{7.45c}$$
$$S^2\Psi_4 = 0, \quad \text{hence } s = 0. \tag{7.45d}$$
Exercise: Check these results.
The states can now be labelled $|s, m_s\rangle$ (both $s_1$ and $s_2$ are equal to 1/2), with either s = 1 and $m_s$ equal to −1, 0 or +1, or s = 0 and $m_s$ = 0. The s = 1 states are called triplet states and the s = 0 state the singlet state; the names refer to the degeneracy.
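
The eigenvalues quoted above can be checked mechanically. The sketch below applies S²/ħ², written as in (7.41), to two-spin states stored as dictionaries; it uses that for spin 1/2 the ladder operators simply flip a spin with coefficient ħ, cf. (7.43b). The dictionary representation and the function names are illustrative assumptions.

```python
import math

# A two-spin state is a dict mapping labels (m1, m2), e.g. ('+', '-'), to amplitudes.
def apply_S2(state):
    """Apply S^2 / hbar^2, written as in (7.41), to a two-spin-1/2 state."""
    out = {}
    for (m1, m2), amp in state.items():
        z1 = 0.5 if m1 == '+' else -0.5
        z2 = 0.5 if m2 == '+' else -0.5
        # S1^2 + S2^2 + 2 S1z S2z is diagonal: 3/4 + 3/4 + 2 z1 z2
        out[(m1, m2)] = out.get((m1, m2), 0.0) + amp * (1.5 + 2 * z1 * z2)
        # S1+ S2- turns |-+> into |+->; S1- S2+ turns |+-> into |-+>
        if (m1, m2) == ('-', '+'):
            out[('+', '-')] = out.get(('+', '-'), 0.0) + amp
        if (m1, m2) == ('+', '-'):
            out[('-', '+')] = out.get(('-', '+'), 0.0) + amp
    return out

def eigenvalue(state):
    """Return lam if S^2 state = lam * state (within rounding), else None."""
    image = apply_S2(state)
    key = next(iter(state))
    lam = image.get(key, 0.0) / state[key]
    ok = all(abs(image.get(k, 0.0) - lam * state.get(k, 0.0)) < 1e-12
             for k in set(image) | set(state))
    return lam if ok else None

r = 1 / math.sqrt(2)
states = {
    'Psi1': {('+', '+'): 1.0},
    'Psi2': {('-', '-'): 1.0},
    'Psi3': {('+', '-'): r, ('-', '+'): r},
    'Psi4': {('+', '-'): r, ('-', '+'): -r},
}
for name, st in states.items():
    print(name, eigenvalue(st))   # s(s+1): 2 for the triplet, 0 for the singlet
```

Applying `eigenvalue` to χ_2 alone returns `None`, which reflects that χ_2 is an eigenstate of S_z but not of S².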
Now we consider the addition of an orbital angular momentum L to a single spin S which has quantum number s = 1/2:
$$\mathbf{J} = \mathbf{L} + \mathbf{S}. \tag{7.46}$$
The eigenstates we start from are $|lm; sm_s\rangle$ where s = 1/2, and we will omit the quantum number s in the remainder. Furthermore, we again denote the two possible values for $m_s$, 1/2 and −1/2, by + and − respectively. The fact that we should end up with linear combinations being eigenstates of $J_z = L_z + S_z$ restricts combinations to the pairs
$$\alpha|l, m; +\rangle + \beta|l, m + 1; -\rangle \tag{7.47}$$
which has eigenvalue $\hbar m_j = \hbar(m + 1/2)$ of $J_z$. α and β will be fixed by the requirement that the resulting combination is an eigenstate of $J^2$, which we write in the form:
$$J^2 = L^2 + S^2 + 2L_zS_z + L_+S_- + L_-S_+. \tag{7.48}$$
Consider the action of this operator on the state (7.47):

$$\begin{aligned}
\frac{J^2}{\hbar^2}\left[\alpha|l, m; +\rangle + \beta|l, m + 1; -\rangle\right] = {} & \alpha\left[l(l + 1) + \frac{3}{4} + m\right]|l, m; +\rangle + \alpha\sqrt{(l - m)(l + m + 1)}\,|l, m + 1; -\rangle \\
& + \beta\left[l(l + 1) + \frac{3}{4} - m - 1\right]|l, m + 1; -\rangle + \beta\sqrt{(l - m)(l + m + 1)}\,|l, m; +\rangle.
\end{aligned} \tag{7.49}$$
We require that this is equal to $j(j + 1)[\alpha|l, m; +\rangle + \beta|l, m + 1; -\rangle]$ (the factor $\hbar^2$ has already been divided out on the left-hand side). This leads to a linear homogeneous set of equations for α, β:
$$\alpha\left[l(l + 1) + \frac{3}{4} + m - j(j + 1)\right] + \beta\sqrt{(l - m)(l + m + 1)} = 0, \tag{7.50}$$
$$\alpha\sqrt{(l - m)(l + m + 1)} + \beta\left[l(l + 1) + \frac{3}{4} - m - 1 - j(j + 1)\right] = 0. \tag{7.51}$$
This can only hold if the determinant of the system of linear equations vanishes and this leads to:
$$\left[l(l + 1) + 3/4 + m - j(j + 1)\right]\left[l(l + 1) + 3/4 - m - 1 - j(j + 1)\right] = (l - m)(l + m + 1) \tag{7.52}$$
and for given l and m this equation has two solutions for j, given by the conditions:
$$j(j + 1) - l(l + 1) - \frac{3}{4} = -l - 1 \quad \text{or} \tag{7.53a}$$
$$j(j + 1) - l(l + 1) - \frac{3}{4} = l \tag{7.53b}$$
which leads to
$$j = l + 1/2 \quad \text{or} \quad j = l - 1/2. \tag{7.54}$$
The ratio α/β follows from the above equations and these coefficients are fixed, up to an overall phase, by the requirement that the state is normalised. For j = l + 1/2:
$$\alpha = \sqrt{\frac{l + m + 1}{2l + 1}}; \quad \beta = \sqrt{\frac{l - m}{2l + 1}} \tag{7.55}$$
and for j = l − 1/2:
$$\alpha = \sqrt{\frac{l - m}{2l + 1}}; \quad \beta = -\sqrt{\frac{l + m + 1}{2l + 1}}. \tag{7.56}$$
The relative minus sign in (7.56) follows from (7.50) and makes the two solutions orthogonal.
The analysis presented here can be generalised to arbitrary angular momentum operators. This becomes a tedious job, which leads to the identification of the linear expansion coefficients, which are called Clebsch-Gordan coefficients. For details, see Messiah.
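
The coefficients for the case s = 1/2 can be cross-checked numerically by solving the homogeneous equations directly. The sketch below takes (7.50) to fix the ratio α/β, normalises, and then verifies that (7.51) is satisfied as well. Units are ħ = 1; the function names and the sign convention α > 0 are illustrative assumptions, and the code is only meant for −l ≤ m < l, where both basis states exist.

```python
import math

def coefficients(l, m, j):
    """Normalised (alpha, beta) solving (7.50) for given l, m and j = l +/- 1/2."""
    root = math.sqrt((l - m) * (l + m + 1))
    diag = l * (l + 1) + 0.75 + m - j * (j + 1)
    # (7.50): alpha * diag + beta * root = 0, so (alpha, beta) is proportional to (root, -diag)
    alpha, beta = root, -diag
    norm = math.hypot(alpha, beta)
    alpha, beta = alpha / norm, beta / norm
    if alpha < 0:                       # overall sign is a convention
        alpha, beta = -alpha, -beta
    return alpha, beta

def residual_751(l, m, j, alpha, beta):
    """Left-hand side of (7.51); it should vanish for a valid solution."""
    root = math.sqrt((l - m) * (l + m + 1))
    return alpha * root + beta * (l * (l + 1) + 0.75 - m - 1 - j * (j + 1))

l, m = 2, 0
for j in (l + 0.5, l - 0.5):
    alpha, beta = coefficients(l, m, j)
    print(j, alpha ** 2, beta ** 2, residual_751(l, m, j, alpha, beta))
```

For l = 2, m = 0 the squared coefficients come out as 3/5 and 2/5, in agreement with the ratios (l + m + 1)/(2l + 1) and (l − m)/(2l + 1), with the two values of j swapping their roles.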

7.5 Angular momentum and rotations

In this section we consider rotations of the physical system at hand. Such a rotation can for the
three-dimensional space be expressed as a rotation matrix. This is the class of matrices which are
orthogonal: the columns when considered as vectors form an orthonormal set and the same can be
said of the rows. Furthermore the determinant of the matrix is +1 (if the determinant is −1, there is
an additional reflection). For simplicity, we will confine ourselves in the analysis which follows to
rotations around the z-axis. The matrix of such a rotation over an angle α reads:
 
$$R(\alpha) = \begin{pmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{7.57}$$

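Of course, if we rotate a physical system, its state, which we will denote $|\psi\rangle$, will change and we represent this change by an operator $\mathcal{R}$ (written in calligraphic type to distinguish it from the rotation matrix R):
$$|\psi\rangle \longrightarrow \mathcal{R}|\psi\rangle. \tag{7.58}$$

Now consider the r-representation

$$\psi(\mathbf{r}) = \langle\mathbf{r}|\psi\rangle. \tag{7.59}$$
The new state of the system is the same as the old one up to a rotation, so if we evaluate the old state at a position r rotated back over an angle α, we should get exactly the same result as when we evaluate the new state in r (see figure 1):
$$\langle R^{-1}\mathbf{r}|\psi\rangle = \langle\mathbf{r}|\mathcal{R}\psi\rangle. \tag{7.60}$$
Using this relation we can find an expression for the operator $\mathcal{R}$.
Consider an infinitesimal rotation of a single particle around the z-axis with rotation angle δ, evaluated at r = (x, y, z):

$$\begin{aligned}
\mathcal{R}(\delta)\psi(\mathbf{r}) = \psi(R^{-1}\mathbf{r}) &= \psi(x + y\delta,\, y - x\delta,\, z) \\
&= \psi(x, y, z) + \delta\left[y\frac{\partial\psi(\mathbf{r})}{\partial x} - x\frac{\partial\psi(\mathbf{r})}{\partial y}\right] = (1 - i\delta L_z/\hbar)\psi(x, y, z).
\end{aligned} \tag{7.61}$$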
This relation is valid for small angles. The expression for larger angles can be found by applying many rotations over small angles in succession. Chopping the angle α into N pieces (N large), we have for the rotation operator:
$$\mathcal{R}(\alpha) = \lim_{N \to \infty}\left(1 - i\frac{\alpha}{N}\frac{L_z}{\hbar}\right)^N = \exp(-iL_z\alpha/\hbar). \tag{7.62}$$
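
The limiting procedure can be tried out numerically on a simpler object with the same structure: the 2×2 block of the rotation matrix (7.57) acting on (x, y). In the sketch below a finite rotation is built from many first-order steps, and the result is compared with the exact rotation matrix. This is an analogy for the operator statement above, not a computation with L_z itself; the function names and the chosen angle are illustrative assumptions.

```python
import math

def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def rotation_from_small_steps(alpha, n_steps):
    """Apply n_steps first-order rotations over alpha / n_steps in the xy-plane."""
    d = alpha / n_steps
    step = [[1.0, -d], [d, 1.0]]      # identity + d * generator, valid to first order in d
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n_steps):
        result = matmul2(result, step)
    return result

alpha = math.pi / 3
exact = [[math.cos(alpha), -math.sin(alpha)], [math.sin(alpha), math.cos(alpha)]]
errors = {}
for n_steps in (10, 100, 10000):
    approx = rotation_from_small_steps(alpha, n_steps)
    errors[n_steps] = max(abs(approx[i][j] - exact[i][j])
                          for i in range(2) for j in range(2))
    print(n_steps, errors[n_steps])   # the error shrinks roughly as 1 / n_steps
```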
This result can be generalised for rotations around an arbitrary axis characterised by a unit vector u:

$$\mathcal{R}(\alpha) = \exp(-i\alpha\,\mathbf{u} \cdot \mathbf{L}/\hbar). \tag{7.63}$$

This equation has been derived for a single particle, but it can be generalised for systems consist-
ing of more particles. The angular momentum operator is then the sum of the angular momentum
operators of the individual particles. If the particles have spin, this is to be included in the total an-
gular momentum. Equation (7.63) is in fact often used as the definition of total angular momentum.
The commutation relations (7.2) can be derived from it, using the commutation relations for rotation
matrices (exercise!).
Suppose we have a Hamiltonian H which is spherically symmetric. This implies that a rotation
has no effect on the matrix elements of H:

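$$\langle\psi|H|\phi\rangle = \langle\psi'|H|\phi'\rangle, \tag{7.64}$$

where the primed states are related to the unprimed ones through a rotation. Therefore we have:
$$\langle\psi|H|\phi\rangle = \langle\psi|\mathcal{R}^\dagger H\mathcal{R}|\phi\rangle = \langle\psi|e^{i\alpha\mathbf{u}\cdot\mathbf{J}/\hbar}\, H\, e^{-i\alpha\mathbf{u}\cdot\mathbf{J}/\hbar}|\phi\rangle, \tag{7.65}$$

where $\mathcal{R}$ is the rotation operator. This relation should hold in particular for infinitesimal rotations and expanding the exponentials to first order in α we obtain:
$$\langle\psi|H|\phi\rangle = \langle\psi|H - \frac{i\alpha}{\hbar}\left[H\,\mathbf{J}\cdot\mathbf{u} - \mathbf{J}\cdot\mathbf{u}\,H\right]|\phi\rangle = \langle\psi|H - \frac{i\alpha}{\hbar}\,\mathbf{u}\cdot[H, \mathbf{J}]|\phi\rangle. \tag{7.66}$$

As this should hold for arbitrary directions u and arbitrary states ψ, φ, we have

$$[H, \mathbf{J}] = 0 \tag{7.67}$$

and therefore J is a conserved quantity, as

$$\frac{d\langle\mathbf{J}\rangle}{dt} = \frac{i}{\hbar}\langle[H, \mathbf{J}]\rangle = 0. \tag{7.68}$$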
Note that it is essential here to consider the total angular momentum, that is, including the spin degrees
of freedom.

Introduction to Quantum Cryptography

8.1 Introduction

Some of the most important technical developments in the next few years will be based on quantum
mechanics. In particular, spectacular developments and applications are expected in the areas of
quantum cryptography, quantum teleportation and quantum computing. In this note, I shall briefly
explain some issues involved in quantum cryptography.
The idea of quantum cryptography hinges upon the measurement postulate of quantum mechanics. This postulate deals with measurements of physical observables. In quantum mechanics, such an observable is represented by a Hermitian operator, say $\hat{Q}$. The eigenvectors of this Hermitian operator are denoted $|\phi_n\rangle$ with corresponding eigenvalues $\lambda_n$. The measurement postulate says that, for a state $|\psi\rangle$ in Hilbert space, which can be expanded as

$$|\psi\rangle = \sum_n c_n|\phi_n\rangle, \tag{8.1}$$

the result of a measurement of $\hat{Q}$ yields one of its eigenvalues $\lambda_n$. The probability of finding a particular value $\lambda_n$ is given by $|c_n|^2$, and after the measurement the state of the system is reduced to the corresponding eigenvector $|\phi_n\rangle$. This last aspect, the fact that the state of a system is influenced by any observation, enables us to detect whether someone, an eavesdropper, has tried to read information, as we shall see below.
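
The measurement postulate is easy to mimic in a small simulation. The sketch below draws an outcome n with probability |c_n|² and returns the collapsed state; the two-component example state, the eigenvalues ±1 and the function name are illustrative assumptions, and a non-degenerate spectrum is assumed.

```python
import random

def measure(amplitudes, eigenvalues, rng):
    """Simulate one measurement: return (eigenvalue, collapsed amplitudes)."""
    probs = [abs(c) ** 2 for c in amplitudes]
    total = sum(probs)
    probs = [p / total for p in probs]            # guard against rounding errors
    n = rng.choices(range(len(probs)), weights=probs)[0]
    collapsed = [1.0 if k == n else 0.0 for k in range(len(probs))]
    return eigenvalues[n], collapsed

rng = random.Random(0)                             # fixed seed for reproducibility
c = [0.6, 0.8j]                                    # |c_1|^2 = 0.36, |c_2|^2 = 0.64
lam = [+1, -1]
outcomes = [measure(c, lam, rng)[0] for _ in range(10000)]
print(outcomes.count(+1) / len(outcomes))          # close to 0.36
```

Measuring the collapsed state a second time always reproduces the first outcome, which is the property that makes an eavesdropper's intervention detectable.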

8.2 The idea of classical encryption

Encryption of messages can be useful for many different applications. In all these applications, some-
one, denoted as A, sends a message to B, in such a way that an eavesdropper (E) cannot detect the
information sent. To make the example more lively, A is usually given the name Alice, B is Bob, and
the eavesdropper E is called Eve.
A schematic drawing of the procedure is depicted here:

[Figure: schematic with Alice and Bob.]

A message which Alice sends to Bob is a series of bits:

In order to prevent Eve from eavesdropping on the message, Alice and Bob decide to encrypt their messages. For this purpose, several schemes exist, and we shall present the simplest one here.
Alice and Bob have met once, and on that occasion they have agreed on a key which they will use to encrypt messages. A key is some sequence of bits, e.g.:
The key does not have any particular structure. Before Alice sends over her message, she encrypts it by performing an exclusive or with her message and the key. An exclusive or performed on message and key performs a bitwise comparison: if two corresponding bits (at the same position) of the message and the key are equal, the result has a bit value 0. In the other cases, i.e. when the bits are unequal, the result has a bit value of 1:
0110011101011101.... Message
1111010001010100.... Key
1001001100001001.... Message XOR Key=Encrypted message.
Bob receives the encrypted message and performs again an exclusive or with the key, which unveils
the original contents of the message:
1001001100001001.... Encrypted message
1111010001010100.... Key
0110011101011101.... Message.
Eve can only intercept the encrypted message and it is difficult (usually impossible) for her to make
sense of it.
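
The exclusive-or scheme takes only a few lines of code. The sketch below reproduces the example above with bit strings; the function name `xor_bits` is an illustrative assumption.

```python
def xor_bits(bits, key):
    """Bitwise exclusive or of two equally long strings of '0'/'1' characters."""
    if len(bits) != len(key):
        raise ValueError("message and key must have the same length")
    return "".join("0" if b == k else "1" for b, k in zip(bits, key))

message = "0110011101011101"
key = "1111010001010100"
encrypted = xor_bits(message, key)     # what Alice sends
decrypted = xor_bits(encrypted, key)   # what Bob recovers
print(encrypted)   # 1001001100001001
print(decrypted)   # 0110011101011101
```

Decryption works because applying the exclusive or with the same key twice restores every bit to its original value.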
Suppose however that Alice and Bob communicate very frequently, using the same key for each message. In that case, Eve might guess what the key is, for instance by letting her computer generate many different keys and using them to decrypt the messages exchanged between Alice and Bob. She might then quickly guess parts of the key, so that smaller and smaller parts of the key remain to be discovered, which takes less and less effort. Therefore, it would be wise for Alice and Bob to use a key which is at least as long as their messages, and to use it only once. Here we have a problem. In order to safely exchange the keys, Bob and Alice have to meet before each message, or they must use a (hopefully) reliable courier. The dependence on couriers makes this encryption method cumbersome and vulnerable.
Another way of encrypting messages is to use much shorter keys and to encrypt the message using some elaborate mathematical transformation depending on this key. This is done in the Rivest, Shamir and Adleman (RSA) encryption. The idea is based on factorisation of numbers into prime numbers. Consider the product of two large prime numbers. If you know that product, it is difficult to find out its two prime factors. On the other hand, if you know one of these factors, it is easy to compute the other. So, if Bob and Alice have an encryption and decryption algorithm based on the two prime factors, they can encrypt and decrypt their messages if they know these factors. The product is public in this case, that is, it is available to everyone, and to Eve in particular.
Now suppose that Eve finds out the factorisation of the product; then she can eavesdrop on all the messages. The point is now that, with the known classical algorithms, the factorisation requires an amount of CPU time which grows very rapidly (faster than any power) with the number of bits of the product. So, if that number is large enough, Eve will never be able to crack the code. So this method seems quite safe.
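
A toy version of the scheme may help to see the roles of the public product and the secret factors. The sketch below follows the standard textbook form of RSA with modular exponentiation, which is more specific than the description given above; the tiny primes 61 and 53, the exponent 17 and the message 65 are illustrative assumptions and offer no security whatsoever. The three-argument `pow` with exponent −1 requires Python 3.8 or later.

```python
p, q = 61, 53                 # two (far too small) primes, kept secret
n = p * q                     # public product, 3233
phi = (p - 1) * (q - 1)       # easy to compute only if the factors are known
e = 17                        # public exponent, chosen coprime to phi
d = pow(e, -1, phi)           # private exponent: e * d = 1 (mod phi)

message = 65                  # a number smaller than n
cipher = pow(message, e, n)   # encryption uses only the public pair (n, e)
plain = pow(cipher, d, n)     # decryption needs d, hence the factors

def smallest_factor(number):
    """Trial division: the work grows rapidly with the size of the number."""
    f = 2
    while f * f <= number:
        if number % f == 0:
            return f
        f += 1
    return number

print(cipher, plain, smallest_factor(n))
```

For a product of a few decimal digits, trial division finds a factor instantly; for the products of several hundred digits used in practice, no known classical method finishes in a reasonable time.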
In 1994, it was shown that a new type of computer, based on the quantum mechanical behaviour of matter, should be able to do the factorisation in a number of steps which grows as a power of the number of bits used by the product number. Only a very primitive quantum computer has been developed to date, but people believe that in the future, RSA will not be safe anymore.

8.3 Quantum Encryption

In quantum encryption, two channels are used: one is a public channel, such as the internet, and the other is a private one. The channels are shown in the figure.

[Figure: Alice and Bob connected by a public channel and by a QM (quantum mechanical) channel.]

This private channel cannot always be guaranteed to be safe from Eve (otherwise, encryption would not be necessary any longer), but Bob and Alice can detect whether their communication has been eavesdropped on by Eve, as we shall explain below.
The private channel is used to communicate the key only, and this key can be used for the standard
exclusive-or encryption described in the previous section. The communication through the private
channel is based on quantum mechanics. The information carriers of this channel are photons in some
polarisation state. Unfortunately, the details of the quantum states of photons cannot be given here,
as they involve quantum field theory. Therefore you must accept some of the facts which are given in
the following.
Recall that light is an electromagnetic wave phenomenon and is therefore a wave with a certain
polarisation. A polaroid filter will be transparent for photons with a certain direction of polarisation
only, and opaque for photons with the perpendicular polarisation. A photon polarisation state can be
represented as a unit vector in the two-dimensional plane perpendicular to the direction of propagation
of the photon. The states |1i and |0i shown in the figure below form a basis in the Hilbert space of all
possible polarisation states (the wave propagates along a direction perpendicular to the paper).



A state with angle ϑ with respect to the x-axis would then have the form

$$|\vartheta\rangle = \cos\vartheta\,|1\rangle + \sin\vartheta\,|0\rangle.$$

If a detector is put behind a polarisation filter aligned along the x-direction, and a photon is sent to that filter, the detector will register the arrival of the photon if it is polarised in the x-direction, and not when its polarisation is along the y-axis. A photon in the state $|\vartheta\rangle$ would thus be detected with a probability $\cos^2\vartheta$.

Now we consider the transmission of data through the quantum channel. This channel is a glass fiber through which Alice sends photons which she selects by a polariser which has one of the four possible orientations depicted in the figure below:

[Figure: the four polariser orientations 1α, 0α, 1β and 0β.]

Note that

$$|1\beta\rangle = \frac{1}{\sqrt{2}}\left(|1\alpha\rangle + |0\alpha\rangle\right).$$


Now Bob will receive these photons at his end of the fiber. He first lets them pass through a polariser before they can arrive at the detector. For each photon, he aligns his polariser along either 1α in the figure, or along 1β. When he detects a photon, he records a '1', otherwise a '0'. Whether Bob detects a photon depends on his and on Alice's polariser. Suppose Bob has his polariser along 1α. Then, if Alice has sent a $|1\alpha\rangle$ photon, Bob will detect it. If she has sent a 1β photon, Bob will or will not detect it with equal probabilities. The same holds for a 0β photon. If a 0α photon was sent by Alice, Bob will not detect it. The figure below gives the probabilities with which Bob detects a photon for each of the four possible polarisations which Alice can send over.
Alice sends    Bob: 1α    Bob: 1β
0α             0          0.5
1β             0.5        1
1α             1          0.5
0β             0.5        0
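
These probabilities follow from the cos² rule given earlier. The sketch below assigns an angle to each polarisation and evaluates the detection probability for both of Bob's settings. The specific angles (0°, 90°, 45°, 135°) are an assumption chosen to be consistent with the relation between |1β⟩, |1α⟩ and |0α⟩ stated above, and the ASCII labels "1a", "0a", "1b", "0b" stand for 1α, 0α, 1β, 0β.

```python
import math

# Assumed polariser angles in degrees with respect to the 1-alpha direction.
ANGLES = {"1a": 0.0, "0a": 90.0, "1b": 45.0, "0b": 135.0}

def detection_probability(sent, polariser):
    """cos^2 of the angle between the photon polarisation and the polariser axis."""
    delta = math.radians(ANGLES[sent] - ANGLES[polariser])
    return math.cos(delta) ** 2

table = {}
for sent in ("0a", "1b", "1a", "0b"):
    table[sent] = [round(detection_probability(sent, pol), 3) for pol in ("1a", "1b")]
    print(sent, table[sent])   # columns: Bob's polariser along 1a, along 1b
```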

After a number of photons has been sent over, Bob sends to Alice the settings of the polariser he has used, 1α or 1β, using the public channel for this information. Alice responds and tells Bob which of his settings corresponded to hers (i.e. whether she used an α or a β polariser, and not whether she used the 1 or the 0 setting). For compatible settings, i.e. when Bob and Alice both used α or both used β, they both know the result of Bob's detections. They both keep these results as bits of a sequence. For all other photons, Bob has detected at random 0 or 1 photon, so these events are deleted. The sequence of retained bits is now taken as the key for encrypting a message using the exclusive-or encryption described in the previous section.
Let us consider what would happen during a sample session:

Alice:             1α  0α  0β  1β  1α  1β  1α
Bob's settings:    1α  1β  1β  1α  1α  1β  1β
Bob's detections:  1   1   0   0   1   1   1
Now Bob sends over his settings (see second line).
Alice tells Bob which of these were compatible with hers.
Retained bits:     1   x   0   x   1   1   x

An ‘x’ in the last line denotes a discarded bit. The sequence 1011 is now the key.
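
The sifting procedure can be simulated for many photons. The sketch below covers the case without an eavesdropper: Alice picks a random bit and a random polariser family, Bob picks a random setting, and only the events with compatible settings are kept. The function name, the labels "a" and "b" for the α and β families and the number of photons are illustrative assumptions.

```python
import random

def sift_key(n_photons, rng):
    """Simulate the key exchange without eavesdropper; return (alice_key, bob_key)."""
    alice_key, bob_key = [], []
    for _ in range(n_photons):
        bit = rng.randint(0, 1)              # Alice's bit: the 1 or the 0 setting
        alice_basis = rng.choice("ab")       # Alice's polariser family, alpha or beta
        bob_basis = rng.choice("ab")         # Bob's polariser: 1-alpha or 1-beta
        if bob_basis == alice_basis:
            detected = bit                   # detection probability is 1 or 0
        else:
            detected = rng.randint(0, 1)     # detection probability is 0.5
        if bob_basis == alice_basis:         # settings compared over the public channel
            alice_key.append(bit)
            bob_key.append(detected)
        # incompatible settings: the random record is discarded
    return alice_key, bob_key

alice_key, bob_key = sift_key(1000, random.Random(1))
print(len(alice_key), alice_key == bob_key)  # about half the photons survive
```

On average half of the photons are discarded, so a key of a given length requires about twice as many photons to be sent.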
Now consider the possibility of eavesdropping. If Eve intercepts the channel, the photons she measures are lost, so she has to send new photons to Bob. Suppose however that Alice used polarisation $|1\alpha\rangle$ and Bob has used the 1α polariser. Then he would receive a correct bit of the key, which in this case is a 1. But suppose Eve used polariser β. If Eve detects $|0\beta\rangle$, she will send a similar photon to Bob. But Bob used polariser α so he will find a 0 (i.e. no detection) with 50% probability. If Bob and Alice exchange messages they immediately discover a mismatch in the keys, so they stop
It is thus necessary to send over only one photon at a time, otherwise Eve could insert a beamsplit-
ter in the quantum channel and detect half or more of the key without being noticed. Therefore, only
low intensities must be used, which limits the distance over which communication is possible. With
present-day technology, a few tens of kilometers can be reliably bridged with low intensity optical

Scattering in classical and in quantum mechanics

Scattering experiments are perhaps the most important tool for obtaining detailed information on the structure of matter, in particular the interaction between particles. Examples of scattering techniques include neutron and X-ray scattering for liquids, atoms scattering from crystal surfaces, and elementary particle collisions in accelerators. In most of these scattering experiments, a beam of incident particles hits a target which also consists of many particles. The distribution of scattered particles over the different directions is then measured, for different energies of the incident particles. This distribution is the result of many individual scattering events. Quantum mechanics enables us, in principle, to evaluate for an individual event the probabilities for the incident particles to be scattered off in different directions; and this probability is identified with the measured distribution.
Suppose we have an idea of what the potential between the particles involved in the scattering
process might look like, for example from quantum mechanical energy calculations (programs for this
purpose will be discussed in the next few chapters). We can then parametrise the interaction potential,
i.e. we write it as an analytic expression involving a set of constants: the parameters. If we evaluate the
scattering probability as a function of the scattering angles for different values of these parameters, and
compare the results with experimental scattering data, we can find those parameter values for which
the agreement between theory and experiment is optimal. Of course, it would be nice if we could
evaluate the scattering potential directly from the scattering data (this is called the inverse problem),
but this is unfortunately very difficult (if not impossible) as many different interaction potentials can
have similar scattering properties as we shall see below.
Many different motivations for obtaining accurate interaction potentials can be given. One is that
we might use the interaction potential to make predictions about the behaviour of a system consisting
of many interacting particles, such as a dense gas or a liquid.
Scattering might be elastic or inelastic. In the former case the energy is conserved, in the latter
energy disappears. This means that energy transfer takes place from the scattered particles to degrees
of freedom which are not included explicitly in the system (inclusion of these degrees of freedom
would cause the energy to be conserved). In this chapter we shall consider elastic scattering.

9.1 Classical analysis of scattering

In chapter 3, we have analysed the motion of two bodies attracting each other by a gravitational force
whose value decays with increasing separation r as 1/r2 . This analysis is also correct for opposite
charges which feel an attractive force of the same form (Coulomb’s law). When the force is repulsive,
the solution remains the same – we only have to change the sign of the parameter A which defines
the interaction potential according to V (r) = A/r. One of the key experiments in physics which led
to the notion that atoms consist of small but heavy kernels, surrounded by a cloud of light electrons,
is Rutherford scattering. In this experiment, a thin gold sheet was bombarded with α-particles (i.e.


helium-4 nuclei) and the scattering of the latter was analysed using detectors behind the gold film. In
this section, we shall first formulate some new quantities for describing scattering processes and then
calculate those quantities for the case of Rutherford scattering.
Rutherford scattering is chosen as an example here – scattering problems can be studied more
generally; see Griffiths, chapter 11, section 11.1.1 for a nice description of classical scattering.
We consider scattering of particles incident on a so-called ‘scattering centre’, which may be another
particle. The scattering centre is supposed to be at rest. This might not always be justified in a
real experiment, but the analysis in chapter 3, in which the full two-body problem was reduced to
a one-body problem with a reduced mass, pertains to the present case. The incident particles
interact with the scattering centre located at r = 0 by the usual scalar two-point potential V (r) which
satisfies the requirements of Newton’s third law. Suppose we have a beam of incident particles par-
allel to the z-axis. The beam has a homogeneous density close to that axis, and we can define a flux,
which is the number of particles passing a unit area perpendicular to the beam, per unit time. Usually,
particles close to the z-axis will be scattered more strongly than particles far from the z-axis, as the
interaction potential between the incident particles and scattering centre falls off with their separation
r. An experimentalist cannot analyse the detailed orbits of the individual particles – instead a detector
is placed at a large distance from the scattering centre and this detector counts the number of parti-
cles arriving at each position. You may think of this detector as a photographic plate which changes
colour to an extent related to the number of particles hitting it. The theorist wants to predict what the
experimentalist measures, starting from the interaction potential V (r) which governs the interaction
In figure 9.1, the geometry of the process is shown. In addition a small cone, spanned by the
spherical polar angles dϑ and dϕ, is displayed. It is assumed here that the scattering takes place in a
small neighbourhood of the scattering centre, and for the detector the orbits of the scattered particles
all seem to be directed radially outward from the scattering centre. The surface dA of the intersection
of the cone with a sphere of radius R around the scattering centre is given by dA = R² sin ϑ dϑ dϕ. The
quantity sin ϑ dϑ dϕ is called the solid angle and is usually denoted by dΩ. This dΩ defines a cone like
the one shown in figure 9.1. Now consider the number of particles which will hit the detector within
this small area per unit time. This number, divided by the total incident flux (see above) is called the
differential scattering cross section, dσ /dΩ:

$$ \frac{d\sigma(\Omega)}{d\Omega} = \frac{\text{Number of particles leaving the scattering centre through the cone } d\Omega \text{ per unit time}}{\text{Flux of incident beam}}. \tag{9.1} $$
The differential cross section has the dimension of area (length²).
First we note that the problem is symmetric with respect to rotations around the z-axis,
so the differential scattering cross section only depends on ϑ . The only two relevant parameters of
the incoming particle then are its velocity and its distance b from the z-axis. This distance is called
the impact parameter – it is also shown in figure 9.1.
We first calculate the scattering angle ϑ as a function of the impact parameter b. We use the
solution found in chapter 3 [Eq. (3.24)] which is now a hyperbola. We write this solution in the form

$$ r = \frac{\lambda}{\varepsilon \cos(\vartheta - C) - 1}. \tag{9.2} $$

The integration constant C reappears in the cosine because we have not chosen ϑ = 0 at the perihelion
– the closest approach occurs when the particle crosses the dashed line in figure 9.1 which bisects the
in- and outgoing particle direction.

Figure 9.1: Geometry of the scattering process. b is the impact parameter and ϕ and ϑ are the angles of the
orbit of the outcoming particle.

We know that for ϑ = π, r → ∞, from which we have

cos(π −C) = 1/ε. (9.3)

Because of the fact that cosine is even [cos x = cos(−x)] we can infer that the other value of ϑ for
which r goes to infinity, and which corresponds to the outgoing direction occurs when the argument
of the cosine is C − π, so that we find

ϑ∞ −C = C − π, (9.4)

or ϑ∞ = 2C − π. The subscript ∞ indicates that this value corresponds to t → ∞. From the last two
equations we find the following relation between the scattering angle ϑ∞ and ε:

sin(ϑ∞ /2) = 1/ε. (9.5)
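The step from (9.3) and (9.4) to (9.5) can be written out explicitly. The short derivation below uses only the two relations above:

```latex
\begin{align*}
\vartheta_\infty = 2C - \pi
  \;&\Longrightarrow\; \pi - C = \frac{\pi}{2} - \frac{\vartheta_\infty}{2}, \\
\frac{1}{\varepsilon} = \cos(\pi - C)
  &= \cos\!\left(\frac{\pi}{2} - \frac{\vartheta_\infty}{2}\right)
   = \sin\frac{\vartheta_\infty}{2}.
\end{align*}
```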

We want to know ϑ∞ as a function of b rather than ε however. To this end we note that the angular
momentum is given as
$$ \ell = \mu v_{\mathrm{inc}} b, \tag{9.6} $$
where ‘inc’ stands for ‘incident’, and the total energy as
$$ E = \frac{\mu}{2} v_{\mathrm{inc}}^2, \tag{9.7} $$
so that the impact parameter can be found as
$$ b = \frac{\ell}{\sqrt{2\mu E}}. \tag{9.8} $$

Using Eq. (3.21), we can finally write (9.5) in the form:

$$ \cot(\vartheta_\infty/2) = \sqrt{\varepsilon^2 - 1} = \frac{2Eb}{|A|}. \tag{9.9} $$

From the relation between b and ϑ∞ we can find the differential scattering cross section. The
particles scattered with angle between ϑ and ϑ + dϑ , must have approached the scattering centre
with impact parameters between particular boundaries b and b + db. The number of particles flowing
per unit time through the ring segment with radius b and width db is given as j 2πb db, where j is the
incident flux. We consider a segment dϕ of this ring. Hence:

dσ (Ω) = b(ϑ )dbdϕ. (9.10)

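Relation (9.9) can be used to express the right hand side in terms of ϑ∞ :
$$ d\sigma(\Omega) = \left(\frac{A}{2E}\right)^2 \cot(\vartheta/2)\, d\cot(\vartheta/2)\, d\varphi = \left(\frac{A}{2E}\right)^2 \cot(\vartheta/2)\, \frac{d\cot(\vartheta/2)}{d\vartheta}\, \frac{d\vartheta}{d\cos\vartheta}\, d\cos\vartheta\, d\varphi. \tag{9.11} $$
This can be worked out straightforwardly to yield:
$$ \frac{d\sigma(\Omega)}{d\Omega} = \left(\frac{A}{4E}\right)^2 \frac{1}{\sin^4(\vartheta/2)}. \tag{9.12} $$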

This is known as the famous Rutherford formula.
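As a numerical cross-check, the Rutherford formula can be compared with the general relation dσ/dΩ = b |db/dϑ| / sin ϑ, which follows from dσ = b db dϕ and dΩ = sin ϑ dϑ dϕ. The sketch below is illustrative only: the function names and the values chosen for A and E are assumptions, and the derivative is taken by a simple central difference.

```python
import math

def impact_parameter(theta, A, E):
    """b(theta) from Eq. (9.9): cot(theta/2) = 2 E b / |A|."""
    return abs(A) / (2.0 * E) / math.tan(theta / 2.0)

def rutherford(theta, A, E):
    """Differential cross section of Eq. (9.12)."""
    return (A / (4.0 * E)) ** 2 / math.sin(theta / 2.0) ** 4

def cross_section_from_b(theta, A, E, h=1e-6):
    """b |db/dtheta| / sin(theta), with db/dtheta from a central difference."""
    db = (impact_parameter(theta + h, A, E)
          - impact_parameter(theta - h, A, E)) / (2.0 * h)
    return impact_parameter(theta, A, E) * abs(db) / math.sin(theta)

A, E = 1.0, 0.5  # arbitrary illustrative values
for theta in (0.3, 1.0, 2.5):
    print(theta, cross_section_from_b(theta, A, E), rutherford(theta, A, E))
```

The two columns should agree to within the accuracy of the finite difference.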

9.2 Quantum scattering with a spherical potential

We now consider the scattering problem within quantum mechanics, by looking at a particle incident
on a scattering centre which is usually another particle.1 We assume that we know the scattering
potential which is spherically symmetric so that it depends on the distance between the particle and
the scattering centre only.

We shall again calculate the differential cross section, dσ/dΩ (Ω), which describes how these intensities
are distributed over the various solid angles Ω. This quantity, integrated over the spherical
angles ϑ and ϕ, is the total cross section, σtot .
The scattering process is described by the solutions of the single-particle Schrödinger equation
involving the (reduced) mass m, the relative coordinate r and the interaction potential V between the
particle and the interaction centre:

$$ \left[ -\frac{\hbar^2}{2m} \nabla^2 + V(r) \right] \psi(\mathbf{r}) = E\psi(\mathbf{r}). \tag{9.13} $$

This is a partial differential equation in three dimensions, which could be solved using the ‘brute
force’ discretisation methods presented in appendix A, but exploiting the spherical symmetry of the
potential, we can solve the problem in another, more elegant, way which, moreover, works much
faster on a computer. More specifically, in section 9.2.1 we shall establish a relation between the
phase shift and the scattering cross sections. In this section, we shall restrict ourselves to a description
of the concept of phase shift and describe how it can be obtained from the solutions of the radial
Schrödinger equation.

Footnote 1: Every two-particle collision can be transformed into a single scattering problem involving the relative position; in the
transformed problem the incoming particle has the reduced mass m = m1 m2 /(m1 + m2 ).

Figure 9.2: The radial wave functions for l = 0 for various square well potential depths (curves labelled V = 0, V = 10 and V = 205).

For the potential V (r) we make the assumption that it vanishes for r larger than a certain value
rmax . In case we are dealing with an asymptotically decaying potential, we neglect contributions from
the potential beyond the range rmax , which must be chosen suitably, or treat the tail in a perturbative
For a spherically symmetric potential, the solution of the Schrödinger equation can always be
written as
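$$ \psi(\mathbf{r}) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_{lm} \frac{u_l(r)}{r} Y_l^m(\vartheta, \varphi) \tag{9.14} $$
where ul satisfies the radial Schrödinger equation:
$$ \left[ \frac{\hbar^2}{2m} \frac{d^2}{dr^2} + E - V(r) - \frac{\hbar^2 l(l+1)}{2mr^2} \right] u_l(r) = 0. \tag{9.15} $$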
Figure 9.2 shows the solution of the radial Schrödinger equation with l = 0 for a square well potential
for various well depths – our discussion applies also to nonzero values of l. Outside the well, the
solution ul can be written as a linear combination of the two independent solutions jl and nl , the
regular and irregular spherical Bessel functions. We write this linear combination in the particular form
$$ u_l(r > r_{\max}) \propto kr \left[ \cos\delta_l\, j_l(kr) + \sin\delta_l\, n_l(kr) \right]. \tag{9.16} $$
δl is determined via a matching procedure at the well boundary. The motivation for writing ul in this
form follows from the asymptotic expansion for the spherical Bessel functions:

$$ kr\, j_l(kr) \approx \sin(kr - l\pi/2) \tag{9.17a} $$

$$ kr\, n_l(kr) \approx \cos(kr - l\pi/2) \tag{9.17b} $$

(with k = √(2mE)/ℏ),

which can be used to rewrite (9.16) as

ul (r) ∝ sin(kr − lπ/2 + δl ), large r. (9.18)

We see that ul approaches a sine-wave form for large r and the phase of this wave is determined by δl ,
hence the name ‘phase shift’ for δl (for l = 0 ul is a sine wave for all r > rmax ).
The phase shift as a function of energy and l contains all the information about the scattering
properties of the potential. In particular, the phase shift enables us to calculate the scattering cross
sections and this will be done in section 9.2.1; here we simply quote the results. The differential cross
section is given in terms of the phase shift by
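$$ \frac{d\sigma}{d\Omega} = \frac{1}{k^2} \left| \sum_{l=0}^{\infty} (2l+1) e^{i\delta_l} \sin(\delta_l) P_l(\cos\vartheta) \right|^2 \tag{9.19} $$

and for the total cross section we find

$$ \sigma_{\mathrm{tot}} = 2\pi \int d\vartheta\, \sin\vartheta\, \frac{d\sigma}{d\Omega}(\vartheta) = \frac{4\pi}{k^2} \sum_{l=0}^{\infty} (2l+1) \sin^2\delta_l. \tag{9.20} $$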

Summarising the analysis up to this point, we see that the potential determines the phase shift
through the solution of the Schrödinger equation for r < rmax . The phase shift acts as an intermediate
object between the interaction potential and the experimental scattering cross sections, as the latter
can be determined from it.
Unfortunately, the expressions (9.19) and (9.20) contain sums over an infinite number of terms
– hence they cannot be evaluated on the computer exactly. However, cutting off these sums can be
motivated by a physical argument. Classically, only the waves with an angular momentum smaller
than ℏ lmax = ℏ k rmax will ‘feel’ the potential – particles with higher l-values will pass by unaffected.
Therefore we can safely cut off the sums at a somewhat higher value of l – we can always check
whether the results obtained change significantly when taking more terms into account.
How is the phase shift determined in practice? First, the Schrödinger equation must be integrated
from r = 0 outwards with boundary condition ul (r = 0) = 0. At rmax , the numerical solution must be
matched onto the form (9.16) to fix δl . This can be done straightforwardly in the few cases where an
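analytical solution is known. For example, if the potential is a hard core with
$$ V(r) = \begin{cases} \infty & \text{for } r < a \\ 0 & \text{for } r \ge a, \end{cases} \tag{9.21} $$

we know that the solution for l = 0 is given as

$$ u(r) \sim k(r - a)\, j_0(k(r - a)) = \sin(k(r - a)) \tag{9.22} $$

which vanishes for r = a. Comparing with (9.18), we therefore immediately see that δ0 = −ka, which can be substituted directly
in the expressions for the cross sections. For l > 0 the phase shift follows from the condition ul (a) = 0 applied to (9.16).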
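The truncated partial-wave sum for the hard core can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the spherical Bessel functions are generated by a simple upward recurrence using only the standard library, and n_l is taken in the sign convention of (9.17b). The phase shifts are fixed by requiring u_l(a) = 0 in (9.16), which gives sin²δ_l = j_l(ka)² / [j_l(ka)² + n_l(ka)²]; for l = 0 this reduces to sin²(ka).

```python
import math

def bessel_pair(l, x):
    """Return (j_l(x), n_l(x)) by upward recurrence.

    n_l follows the sign convention of (9.17b): x*n_l(x) -> cos(x - l*pi/2).
    Upward recurrence loses accuracy in j_l for l much larger than x, which is
    harmless here because those terms are negligible in the sum.
    """
    j_prev, n_prev = math.sin(x) / x, math.cos(x) / x
    if l == 0:
        return j_prev, n_prev
    j_cur = math.sin(x) / x**2 - math.cos(x) / x
    n_cur = math.cos(x) / x**2 + math.sin(x) / x
    for m in range(1, l):
        j_prev, j_cur = j_cur, (2 * m + 1) / x * j_cur - j_prev
        n_prev, n_cur = n_cur, (2 * m + 1) / x * n_cur - n_prev
    return j_cur, n_cur

def hard_sphere_sigma_tot(k, a, l_max):
    """Sum (9.20), cut off at l_max, for a hard core of radius a."""
    total = 0.0
    for l in range(l_max + 1):
        j, n = bessel_pair(l, k * a)
        sin2_delta = j * j / (j * j + n * n)  # from u_l(a) = 0 in (9.16)
        total += (2 * l + 1) * sin2_delta
    return 4.0 * math.pi / k**2 * total

# Low-energy limit: only l = 0 contributes and sigma_tot approaches 4*pi*a^2.
print(hard_sphere_sigma_tot(0.01, 1.0, 3) / (4.0 * math.pi))

# Cut-off check: adding terms beyond l ~ k*a should hardly change the result.
print(hard_sphere_sigma_tot(5.0, 1.0, 10), hard_sphere_sigma_tot(5.0, 1.0, 15))
```

Comparing two cut-offs, as in the last line, is the practical check described above for whether enough terms have been included.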
Figure 9.3: The effective potential for the Lennard-Jones interaction for various l-values (axes: Veff (r) in meV against r in units of σ).

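In a computational approach, the phase shift can be extracted either from the value and the derivative of the
numerical solution at a single point, or from its values at two different points r1 and r2 beyond rmax ;
we will use the latter method in order to avoid calculating derivatives. From
(9.16) it follows directly that the phase shift is given by
$$ \tan\delta_l = \frac{K j_l^{(1)} - j_l^{(2)}}{n_l^{(2)} - K n_l^{(1)}} \quad \text{with} \tag{9.23a} $$
$$ K = \frac{r_1 u_l^{(2)}}{r_2 u_l^{(1)}}. \tag{9.23b} $$
In this equation, jl^(1) stands for jl (kr1 ) and ul^(1) for ul (r1 ) etc., and nl is taken in the sign convention of (9.16) and (9.17b).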
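The whole procedure (outward integration followed by two-point matching) can be sketched as below. Everything specific in this example is an assumption made for illustration: units with ℏ²/2m = 1, an attractive square well V = −depth for r < a, Numerov's method as the integrator, and the function names. The matching step solves u(r_i) ∝ k r_i [cos δ j_l(k r_i) + sin δ n_l(k r_i)] at the two points for tan δ, with n_l in the sign convention of (9.17b).

```python
import math

def bessel_pair(l, x):
    """(j_l(x), n_l(x)) by upward recurrence; x*n_l(x) -> cos(x - l*pi/2)."""
    j_prev, n_prev = math.sin(x) / x, math.cos(x) / x
    if l == 0:
        return j_prev, n_prev
    j_cur = math.sin(x) / x**2 - math.cos(x) / x
    n_cur = math.cos(x) / x**2 + math.sin(x) / x
    for m in range(1, l):
        j_prev, j_cur = j_cur, (2 * m + 1) / x * j_cur - j_prev
        n_prev, n_cur = n_cur, (2 * m + 1) / x * n_cur - n_prev
    return j_cur, n_cur

def phase_shift(l, energy, depth, a=1.0, h=1e-3, r1=1.5, r2=2.0):
    """Phase shift (mod pi) for a square well of the given depth and radius a."""
    k = math.sqrt(energy)
    n_a, n_1, n_2 = round(a / h), round(r1 / h), round(r2 / h)

    def F(n):
        """u'' = F u at r = n*h; the jump of V at r = a is averaged."""
        v = -depth if n < n_a else (-0.5 * depth if n == n_a else 0.0)
        return l * (l + 1) / (n * h) ** 2 + v - energy

    # Regular start u ~ r^(l+1) near the origin, then Numerov steps outward.
    u = [0.0, h ** (l + 1), (2 * h) ** (l + 1)]
    for n in range(2, n_2):
        w_next = 1.0 - h * h * F(n + 1) / 12.0
        w_cur = 1.0 + 5.0 * h * h * F(n) / 12.0
        w_prev = 1.0 - h * h * F(n - 1) / 12.0
        u.append((2.0 * w_cur * u[n] - w_prev * u[n - 1]) / w_next)

    x1, x2 = n_1 * h, n_2 * h
    K = x1 * u[n_2] / (x2 * u[n_1])
    j1, nn1 = bessel_pair(l, k * x1)
    j2, nn2 = bessel_pair(l, k * x2)
    delta = math.atan2(K * j1 - j2, nn2 - K * nn1)
    # Reduce to the interval (-pi/2, pi/2]; delta is only defined modulo pi.
    if delta > math.pi / 2:
        delta -= math.pi
    elif delta <= -math.pi / 2:
        delta += math.pi
    return delta

# For l = 0 the square well can be solved by hand, which gives a check:
# delta_0 = atan(k tan(kappa a) / kappa) - k a, with kappa = sqrt(E + depth).
kappa = math.sqrt(1.0 + 2.0)
exact = math.atan(math.tan(kappa) / kappa) - 1.0
print(math.tan(phase_shift(0, 1.0, 2.0)), math.tan(exact))
```

Comparing tan δ rather than δ itself avoids the ambiguity of adding multiples of π.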
A computational example is based on the work by Toennies et al., (J. Chem. Phys., 71, p. 614,
1979) on the scattering of hydrogen off noble gas atoms. Figure 9.3 shows the Lennard-Jones inter-
action potential plus the centrifugal barrier l(l + 1)/r2 of the radial Schrödinger equation. For higher
l-values, the potential consists essentially of a hard core, a well and a barrier which is caused by the
1/r2 centrifugal term in the Schrödinger equation. In such a potential, quasi-bound states are possible.
These are states which would be genuine bound states for a potential for which the barrier does not
drop to zero for larger values of r, but remains at its maximum height. You can imagine the following
to happen when a particle is injected into the potential at precisely this energy: it tunnels through the
barrier, remains in the well for a relatively long time, and then tunnels outward through the barrier in
an arbitrary direction because it has ‘forgotten’ its original direction. In wave-like terms, the particle
resonates in the well, and this state decays after a relatively long time. This phenomenon is called
‘scattering resonance’. This means that particles injected at this energy are strongly scattered and this
shows up as a peak in the total cross section.
Figure 9.4: The total cross section shown as function of the energy for a Lennard-Jones potential modeling the
H–Kr system (axes: total cross section against energy in meV). Peaks correspond to the resonant scattering states.

Such peaks can be seen in figure 9.4, which shows the total cross section as a function of the energy
calculated with a program as described above. The peaks are due to l = 4, l = 5 and l = 6 scattering,
with energies increasing with l. Figure 9.5 finally shows the experimental results for the total cross
section for H–Kr. We see that the agreement is excellent.

9.2.1 Calculation of scattering cross sections

In this section we derive Eqs. (9.19) and (9.20). At a large distance from the scattering centre we can
make an Ansatz for the wave function. This consists of the incoming beam and a scattered wave:

$$ \psi(\mathbf{r}) \sim e^{i\mathbf{k}\cdot\mathbf{r}} + f(\vartheta) \frac{e^{ikr}}{r}. \tag{9.24} $$
ϑ is the angle between the incoming beam and the line passing through r and the scattering centre. f
does not depend on the azimuthal angle ϕ because the incoming wave has azimuthal symmetry, and
the spherically symmetric potential will not generate m ≠ 0 contributions to the scattered wave. f (ϑ )
is called the scattering amplitude. From the Ansatz it follows that the differential cross section is given
directly by the square of this amplitude:

$$ \frac{d\sigma}{d\Omega} = |f(\vartheta)|^2. \tag{9.25} $$

Figure 9.5: Experimental results as obtained by Toennies et al. for the total cross section (arbitrary units) of the
scattering of hydrogen atoms by noble gas atoms as function of centre of mass energy.

Beyond rmax , the solution can also be written in the form (9.14) leaving out all m ≠ 0 contributions
because of the azimuthal symmetry:

$$ \psi(\mathbf{r}) = \sum_{l=0}^{\infty} A_l \frac{u_l(r)}{r} P_l(\cos\vartheta) \tag{9.26} $$

where we have used the fact that Y_l^0 (ϑ , ϕ) is proportional to Pl [cos(ϑ )]. Because the potential vanishes
in the region r > rmax , the solution ul (r)/r is given by the linear combination of the regular and
irregular spherical Bessel functions, and as we have seen this reduces for large r to

$$ u_l(r) \approx \sin(kr - l\pi/2 + \delta_l). \tag{9.27} $$
We want to derive the scattering amplitude f (ϑ ) by equating the expressions (9.24) and (9.26) for the
wave function. For large r we obtain, using (9.27):

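$$ \sum_{l=0}^{\infty} A_l \frac{\sin(kr - l\pi/2 + \delta_l)}{kr} P_l(\cos\vartheta) = e^{i\mathbf{k}\cdot\mathbf{r}} + f(\vartheta) \frac{e^{ikr}}{r}. \tag{9.28} $$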

We write the right hand side of this equation as an expansion similar to that in the left hand side, using
the following expression for a plane wave (see e.g. Abramowitz and Stegun, Handbook of Mathematical
Functions, 1965, Dover)

$$ e^{i\mathbf{k}\cdot\mathbf{r}} = \sum_{l=0}^{\infty} (2l+1)\, i^l j_l(kr) P_l(\cos\vartheta). \tag{9.29} $$
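The plane-wave expansion (9.29) is easy to verify numerically. The sketch below uses only the Python standard library; the function names, the upward recurrences and the values of kr, cos ϑ and the cut-off are illustrative assumptions. The upward recurrence for j_l is adequate only while the cut-off is not much larger than kr.

```python
import cmath
import math

def spherical_jn_list(l_max, x):
    """[j_0(x), ..., j_lmax(x)] by upward recurrence."""
    js = [math.sin(x) / x, math.sin(x) / x**2 - math.cos(x) / x]
    for l in range(1, l_max):
        js.append((2 * l + 1) / x * js[l] - js[l - 1])
    return js[: l_max + 1]

def legendre_list(l_max, c):
    """[P_0(c), ..., P_lmax(c)] from the standard three-term recurrence."""
    ps = [1.0, c]
    for l in range(1, l_max):
        ps.append(((2 * l + 1) * c * ps[l] - l * ps[l - 1]) / (l + 1))
    return ps[: l_max + 1]

def plane_wave_series(kr, cos_theta, l_max):
    """Right hand side of (9.29), truncated at l_max."""
    js = spherical_jn_list(l_max, kr)
    ps = legendre_list(l_max, cos_theta)
    return sum((2 * l + 1) * (1j) ** l * js[l] * ps[l] for l in range(l_max + 1))

kr, cos_theta = 2.0, 0.3
series = plane_wave_series(kr, cos_theta, 10)
exact = cmath.exp(1j * kr * cos_theta)  # k.r = k r cos(theta)
print(series, exact)
```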

f (ϑ ) can also be written as an expansion in Legendre polynomials:

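$$ f(\vartheta) = \sum_{l=0}^{\infty} f_l P_l(\cos\vartheta), \tag{9.30} $$

so that we obtain:
$$ \sum_{l=0}^{\infty} A_l \frac{\sin(kr - l\pi/2 + \delta_l)}{kr} P_l(\cos\vartheta) = \sum_{l=0}^{\infty} \left[ (2l+1)\, i^l j_l(kr) + f_l \frac{e^{ikr}}{r} \right] P_l(\cos\vartheta). \tag{9.31} $$

If we substitute the asymptotic form (9.17) of jl in the right hand side, we find:

$$ \sum_{l=0}^{\infty} A_l \frac{\sin(kr - l\pi/2 + \delta_l)}{kr} P_l(\cos\vartheta) = \frac{1}{r} \sum_{l=0}^{\infty} \left[ \frac{2l+1}{2ik} (-)^{l+1} e^{-ikr} + \left( f_l + \frac{2l+1}{2ik} \right) e^{ikr} \right] P_l(\cos\vartheta). \tag{9.32} $$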

Both the left and the right hand side of (9.32) contain in- and outgoing spherical waves (the occurrence
of incoming spherical waves does not violate causality: they arise from the incoming plane wave). For
each l, the prefactors of the incoming and outgoing waves should be equal on both sides in (9.32).
This condition leads to
$$ A_l = (2l+1)\, e^{i\delta_l}\, i^l \tag{9.33} $$
$$ f_l = \frac{2l+1}{k}\, e^{i\delta_l} \sin(\delta_l). \tag{9.34} $$
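For completeness, the matching of prefactors can be spelled out. The following is a sketch of the intermediate algebra, obtained by writing the sine on the left hand side of (9.32) in terms of exponentials:

```latex
\begin{align*}
A_l\,\frac{\sin(kr - l\pi/2 + \delta_l)}{kr}
  &= \frac{A_l}{2ikr}\left[(-i)^l e^{i\delta_l} e^{ikr} - i^l e^{-i\delta_l} e^{-ikr}\right], \\
e^{-ikr}:\quad -\frac{A_l\, i^l e^{-i\delta_l}}{2ik} &= \frac{2l+1}{2ik}(-)^{l+1}
  \;\Longrightarrow\; A_l = (2l+1)\, i^l e^{i\delta_l}, \\
e^{ikr}:\quad \frac{A_l\,(-i)^l e^{i\delta_l}}{2ik} &= f_l + \frac{2l+1}{2ik}
  \;\Longrightarrow\; f_l = \frac{2l+1}{2ik}\left(e^{2i\delta_l} - 1\right)
   = \frac{2l+1}{k}\, e^{i\delta_l}\sin\delta_l .
\end{align*}
```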
Using (9.25), (9.30), and (9.34), we can write down an expression for the differential cross section
in terms of the phase shifts δl :
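$$ \frac{d\sigma}{d\Omega} = \frac{1}{k^2} \left| \sum_{l=0}^{\infty} (2l+1) e^{i\delta_l} \sin(\delta_l) P_l(\cos\vartheta) \right|^2. \tag{9.35} $$

For the total cross section we find, using the orthogonality relations of the Legendre polynomials:

$$ \sigma_{\mathrm{tot}} = 2\pi \int d\vartheta\, \sin\vartheta\, \frac{d\sigma}{d\Omega}(\vartheta) = \frac{4\pi}{k^2} \sum_{l=0}^{\infty} (2l+1) \sin^2\delta_l. \tag{9.36} $$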
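The integration step can be made explicit. The sketch below uses the standard orthogonality relation of the Legendre polynomials, with x = cos ϑ:

```latex
\begin{align*}
\int_{-1}^{1} P_l(x)\,P_{l'}(x)\,dx &= \frac{2}{2l+1}\,\delta_{ll'}, \\
\sigma_{\mathrm{tot}}
  &= \frac{2\pi}{k^2}\sum_{l,l'}(2l+1)(2l'+1)\,
     e^{i(\delta_l-\delta_{l'})}\sin\delta_l\,\sin\delta_{l'}
     \int_{-1}^{1} P_l(x)\,P_{l'}(x)\,dx \\
  &= \frac{4\pi}{k^2}\sum_{l=0}^{\infty}(2l+1)\sin^2\delta_l .
\end{align*}
```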

9.2.2 The Born approximation

Consider again the solution of a particle which is being scattered by a potential. We shall now relax the
condition that the potential be spherically symmetric. Let us write down the stationary Schrödinger
equation for the wavefunction:

$$ \left[ -\frac{\hbar^2}{2m} \nabla^2 + V(\mathbf{r}) \right] \psi(\mathbf{r}) = E\psi(\mathbf{r}). $$

For V (r) ≡ 0, an incoming plane wave would be a solution to this equation. It turns out to be possible to
write the solution to the Schrödinger equation with potential formally as an integral expression. This
is done using the Green’s function formalism. The Green’s function depends on two positions r and r′
– it is defined by
$$ \left[ E + \frac{\hbar^2}{2m} \nabla^2 - V(\mathbf{r}) \right] G(\mathbf{r}, \mathbf{r}') = \delta(\mathbf{r} - \mathbf{r}'). $$
To understand the Green’s function (and easily recall its definition) you may view the delta function
on the right hand side as a unit operator, so that G may be called the inverse of the operator E Î − Ĥ,
where Î is the unit operator. For V (r) ≡ 0 we call the Green’s function G0 :

$$ \left[ E + \frac{\hbar^2}{2m} \nabla^2 \right] G_0(\mathbf{r}, \mathbf{r}') = \delta(\mathbf{r} - \mathbf{r}'). $$
Before calculating G0 let us assume we have it at our disposal. We then may write the solution to
the full Schrödinger equation, i.e. including the potential V , in terms of a solution φ (r) to the ‘bare’
Schrödinger equation, that is, the Schrödinger equation with potential V ≡ 0:
$$ \psi(\mathbf{r}) = \phi(\mathbf{r}) + \int G_0(\mathbf{r}, \mathbf{r}') V(\mathbf{r}') \psi(\mathbf{r}')\, d^3r'. \tag{9.37} $$

This can easily be checked by substituting the solution into the full Schrödinger equation and using
the fact that E Iˆ − Ĥ, acting on the Green’s function, gives a delta-function.
Now we consider the scattering problem with an incoming beam of the form φ (r) = exp(iki · r)
(the subscript ‘i’ denotes the incoming wave vector). We see from Eq. (9.37) that this wave persists
but that it is accompanied by a scattering term which is the integral on the right hand side. Now the
wavefunction ψ(r) is still very difficult to find, as it occurs in Eq. (9.37) in an implicit form. We can
make the equation explicit if we assume that the potential V (r) is small, so that the scattered part of
the wave is much smaller than the wavefunction of the incoming beam. In a first approximation we
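might then replace ψ(r′) on the right hand side of Eq. (9.37) by φ (r′), which is a plane wave:
$$ \psi(\mathbf{r}) = \phi(\mathbf{r}) + \int G_0(\mathbf{r}, \mathbf{r}') V(\mathbf{r}') \phi(\mathbf{r}')\, d^3r' = e^{i\mathbf{k}_i\cdot\mathbf{r}} + \int G_0(\mathbf{r}, \mathbf{r}') V(\mathbf{r}') e^{i\mathbf{k}_i\cdot\mathbf{r}'}\, d^3r'. $$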

The key to the scattering amplitude is given by the notion that it must always be possible to write the
solution (9.37) in the form:
$$ \psi(\mathbf{r}) = e^{i\mathbf{k}_i\cdot\mathbf{r}} + f(\vartheta, \varphi) \frac{e^{ikr}}{r}. $$
At this moment we hardly recognise this form in the expression obtained for the wavefunction. We first
must find the explicit expression for the Green’s function G0 . Without going through the derivation
(see for example Griffiths, pp. 364–366) we give it here:
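$$ G_0(\mathbf{r}, \mathbf{r}') = -\frac{2m}{\hbar^2} \frac{e^{ik|\mathbf{r}-\mathbf{r}'|}}{4\pi|\mathbf{r}-\mathbf{r}'|} $$
with k = √(2mE)/ℏ.
Now we take r far from the origin. As the range of the potential is finite, we know that only
contributions with r′ ≪ r have to be taken into account. Taylor expanding the exponent occurring in
the Green’s function:
$$ |\mathbf{r} - \mathbf{r}'| = \sqrt{r^2 - 2\mathbf{r}\cdot\mathbf{r}' + r'^2} \approx r - \frac{\mathbf{r}\cdot\mathbf{r}'}{r} $$
leads to
$$ G_0(\mathbf{r}, \mathbf{r}') = -\frac{2m}{\hbar^2} \frac{e^{ikr}}{4\pi r}\, e^{-ik\mathbf{r}\cdot\mathbf{r}'/r}. $$

The correction in the denominator does not have to be taken into account as it gives a much smaller contribution to the
result for r ≫ 1/k. Now we define kf = kr/r, i.e. kf is a wave vector corresponding to an outgoing
wave from the scattering centre to the point r. We have

$$ \psi(\mathbf{r}) = \phi(\mathbf{r}) - \frac{2m}{\hbar^2} \frac{e^{ikr}}{4\pi r} \int V(\mathbf{r}')\, e^{-i\mathbf{k}_f\cdot\mathbf{r}'} e^{i\mathbf{k}_i\cdot\mathbf{r}'}\, d^3r'. $$
This is precisely of the required form provided we set
$$ f(\vartheta, \varphi) = -\frac{m}{2\pi\hbar^2} \int V(\mathbf{r}')\, e^{i(\mathbf{k}_i - \mathbf{k}_f)\cdot\mathbf{r}'}\, d^3r'. $$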
This is the so-called first Born approximation. It is valid for weak scattering – higher order approxi-
mations can be made by iterative substitution for ψ(r0 ) in the integral occurring in Eq. (9.37). In the
first order Born approximation, the scattering amplitude f (ϑ , ϕ) is in fact a Fourier transform of the
scattering potential.
As an example, we consider a potential which is not weak but which is easily tractable within the
Born scheme: the Coulomb potential
$$ V(r) = \frac{q_1 q_2}{4\pi\varepsilon_0} \frac{1}{r}. $$
The Fourier transform of this potential, defined as V (k) = ∫ V (r) exp(ik · r) d³r, reads
$$ V(\mathbf{k}) = \frac{q_1 q_2}{\varepsilon_0} \frac{1}{k^2}. $$
Therefore, we immediately find for f (ϑ ):
$$ f(\vartheta) = -\frac{m q_1 q_2}{2\pi\varepsilon_0 \hbar^2 |\mathbf{k}_i - \mathbf{k}_f|^2}. $$

The angle ϑ is hidden in ki − kf , the norm of which is equal to 2k sin(ϑ /2). The result therefore is,
using E = ℏ²k²/(2m):
$$ \frac{d\sigma}{d\Omega} = \left( \frac{q_1 q_2}{16\pi\varepsilon_0 E \sin^2(\vartheta/2)} \right)^2. $$
This is precisely the classical Rutherford formula (9.12) with A = q1 q2 /(4πε0 ), which also turns out to be the exact
quantum mechanical result. This could not possibly be anticipated beforehand, but it is a happy coincidence.
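The algebra behind this result can be sketched as follows. The sketch assumes the Fourier convention V(q) = ∫ V(r) exp(iq · r) d³r, for which the Coulomb transform is q1 q2/(ε0 q²) and the first Born amplitude is f = −(m/2πℏ²) V(ki − kf); with a different convention the intermediate factors change but the cross section does not.

```latex
\begin{align*}
|\mathbf{k}_i - \mathbf{k}_f|^2 &= 2k^2(1 - \cos\vartheta) = 4k^2\sin^2(\vartheta/2), \\
f(\vartheta) &= -\frac{m}{2\pi\hbar^2}\,\frac{q_1 q_2}{\varepsilon_0}\,
                 \frac{1}{4k^2\sin^2(\vartheta/2)}
   = -\frac{q_1 q_2}{16\pi\varepsilon_0 E \sin^2(\vartheta/2)}
   \qquad \left(\hbar^2 k^2 = 2mE\right), \\
\frac{d\sigma}{d\Omega} &= |f(\vartheta)|^2
   = \left(\frac{A}{4E}\right)^2\frac{1}{\sin^4(\vartheta/2)},
  \qquad A = \frac{q_1 q_2}{4\pi\varepsilon_0}.
\end{align*}
```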

Symmetry and conservation laws

In this chapter, we return to classical mechanics and shall explore the relation between the symmetry of
a physical system and the conservation of physical quantities. In the first chapter, we have already seen
that translational symmetry implies momentum conservation, that time translation symmetry implies
energy conservation and that rotational symmetry implies conservation of angular momentum. There
exists a fundamental theorem, called Noether’s theorem, which shows that, indeed, for every spatial
continuous symmetry of a system which can be described by a Lagrangian, some physical quantity is
conserved, and the theorem also allows us to find that quantity.
The special form of the equations of motion for a system described by a Lagrangian (or Hamil-
tonian) leads already to a large number of conserved quantities, called Poincaré invariants. We shall
consider only one Poincaré invariant here: phase space volume. The associated conservation law is
called Liouville’s theorem.

10.1 Noether’s theorem

Suppose a mechanical system is invariant under symmetry transformations which can be parametrised
using some real, continuous parameter. Examples include those mentioned already above: rotations
(parametrised by the rotation angles) or translations in space or time. The fact that the system is
invariant under these transformations is reflected by the Lagrangian being invariant under these sym-
metries. For simplicity we shall restrict ourselves to a single continuous parameter, s. In the case of
rotations one could imagine s to be the rotation angle about an axis fixed in space, such as the z-axis.
The mechanical path for some system, i.e. the solution of the Euler-Lagrange equations of motion, is
called q(t). Now we perform a symmetry transformation. This gives rise to a different path, which
we call Q(s,t), with Q(0,t) = q(t). The path Q(s,t) should have the same value of the Lagrangian L
as the path q(t), in other words, L should not depend on s:

$$ \frac{d}{ds} L(Q(s,t), \dot{Q}(s,t)) = 0. \tag{10.1} $$

This leads to
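$$ \sum_j \left[ \frac{\partial L}{\partial Q_j} \frac{\partial Q_j}{\partial s} + \frac{\partial L}{\partial \dot{Q}_j} \frac{\partial \dot{Q}_j}{\partial s} \right] = 0. \tag{10.2} $$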

Now we use the Euler-Lagrange equations:

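$$ \frac{\partial L}{\partial Q_j} = \frac{d}{dt} \frac{\partial L}{\partial \dot{Q}_j} \tag{10.3} $$

in order to write
$$ \frac{dL}{ds} = \sum_{j=1}^{N} \left[ \frac{\partial L}{\partial Q_j} \frac{\partial Q_j}{\partial s} + \frac{\partial L}{\partial \dot{Q}_j} \frac{\partial \dot{Q}_j}{\partial s} \right] = \sum_{j=1}^{N} \left[ \left( \frac{d}{dt} \frac{\partial L}{\partial \dot{Q}_j} \right) \frac{\partial Q_j}{\partial s} + \frac{\partial L}{\partial \dot{Q}_j} \frac{\partial \dot{Q}_j}{\partial s} \right] = \frac{d}{dt} \left[ \sum_{j=1}^{N} \frac{\partial L}{\partial \dot{Q}_j} \frac{dQ_j}{ds} \right] = 0, \tag{10.4} $$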
and we see that the term within brackets in the last expression must be a constant of the motion:
$$ \sum_{j=1}^{N} \frac{\partial L}{\partial \dot{Q}_j} \frac{dQ_j}{ds} = \sum_{j=1}^{N} p_j \frac{dQ_j}{ds} = \text{Constant in time}. \tag{10.5} $$

We see that any continuous symmetry of the Lagrangian leads to a constant of the motion, given by
(10.5). This analysis is obviously rather abstract, so let us now consider an example.
Suppose a one-particle system in three dimensional space is invariant under rotations around the
z-axis. The rotation angle is called α. In order to be able to evaluate the derivatives of the coordinates
with respect to α, we use cylindrical coordinates (r, ϕ, z) with x = r cos ϕ and y = r sin ϕ. A rotation
about the z axis over an angle α then corresponds to

ϕ → ϕ + α. (10.6)

so that we have
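$$ p_x \frac{dx}{d\alpha} = -p_x\, r \sin(\varphi + \alpha) = -p_x y; \tag{10.7} $$

$$ p_y \frac{dy}{d\alpha} = p_y\, r \cos(\varphi + \alpha) = p_y x; \tag{10.8} $$

$$ p_z \frac{dz}{d\alpha} = 0 \tag{10.9} $$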

so that the conserved quantity, from (10.5) is

xpy − ypx = Lz , (10.10)

the z-component of the angular momentum. Similarly, we would find Lx and Ly for the conserved
quantities associated with rotations around the x- and y- axes respectively. Also, it is easy to verify
that for more than one particle, the total angular momentum is conserved.
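The conservation of Lz can also be observed numerically. The sketch below is an illustration under assumptions of its own: a unit-mass particle in two dimensions, the rotationally symmetric potential V = (x² + y²)²/4, an optional symmetry-breaking term eps·x²/2, and a velocity-Verlet integrator. None of these choices come from the text.

```python
def simulate_lz(eps=0.0, steps=2000, dt=0.01):
    """Return L_z = x*p_y - y*p_x after every velocity-Verlet step."""
    x, y, px, py = 1.0, 0.0, 0.3, 0.8

    def force(x, y):
        # Minus the gradient of V = (x^2 + y^2)^2 / 4 + eps * x^2 / 2.
        r2 = x * x + y * y
        return -r2 * x - eps * x, -r2 * y

    fx, fy = force(x, y)
    lz = [x * py - y * px]
    for _ in range(steps):
        px += 0.5 * dt * fx
        py += 0.5 * dt * fy
        x += dt * px
        y += dt * py
        fx, fy = force(x, y)
        px += 0.5 * dt * fx
        py += 0.5 * dt * fy
        lz.append(x * py - y * px)
    return lz

symmetric = simulate_lz(eps=0.0)
broken = simulate_lz(eps=1.0)
print(max(abs(v - symmetric[0]) for v in symmetric))  # rounding errors only
print(max(abs(v - broken[0]) for v in broken))        # clearly nonzero
```

With eps = 0 the Lagrangian is invariant under rotations about the z-axis and Lz stays constant; with eps ≠ 0 the symmetry is broken and Lz drifts.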
The reader is invited to check that space translation symmetry results in momentum conservation.
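A sketch of this check, assuming for illustration a set of particles with Cartesian coordinates x_i that are all shifted by the same amount s along the x-axis (the labels X_i and P_x are introduced only for this sketch):

```latex
\begin{align*}
X_i(s,t) &= x_i(t) + s \quad\Longrightarrow\quad \frac{dX_i}{ds} = 1, \\
\sum_i p_{x,i}\,\frac{dX_i}{ds} &= \sum_i p_{x,i} \equiv P_x = \text{Constant in time}.
\end{align*}
```

Inserted into (10.5), the translation along x therefore yields the x-component of the total momentum as the conserved quantity.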

10.2 Liouville’s theorem

A special conservation law is due to the fact that the equations of motion can be derived from a
Hamiltonian (or from a Lagrangian). Such equations of motion are called canonical. The fact that the
equations of motion are canonical reflects a symmetry which is called symplecticity (or symplectic-
ness), a discussion of which is outside the scope of these notes. The important notion is that this type
of symmetry leads to a number of conserved quantities, called Poincaré invariants, of which we shall
consider only one, the volume of phase space.
The proof of Liouville’s theorem hinges upon the fact that whenever in a volume integral like
$$ V = \int_{\Omega} d^n x \tag{10.11} $$

we perform a variable transformation x → y, we must put a correction factor det(J) in the integral,
where J is the Jacobian matrix, given by

$$ J_{ij} = \frac{\partial y_i}{\partial x_j}. \tag{10.12} $$

We thus have
$$ V = \int_{\Omega} d^n x\, \det(J) = \int_{\Omega'} d^n y, \tag{10.13} $$

where Ω′ is the volume Ω transformed to y-space.

The state of a mechanical system consisting of N degrees of freedom is represented by a point in
2N-dimensional phase space (qi , p j ). In the course of time, this point moves in phase space, and forms
a trajectory. We now consider not a single mechanical system in phase space, but a set of systems
which are initially homogeneously distributed over some region Ω0 , with volume V0 . In the course of
time, every point in Ω0 will move in phase space, Ω0 will therefore transform into some new region
Ω(t). The volume of this new space is given as
$$ V(t) = \int_{\Omega(t)} d^n q\, d^n p \tag{10.14} $$

We want to show that V (t) = V0 , hence the volume does not change in time. To this end, we consider
a transformation from time t = 0 to ∆t:
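$$ q_i' \equiv q_i(\Delta t) = q_i(0) + \Delta t\, \frac{\partial H(p, q)}{\partial p_i} + O(\Delta t^2); \tag{10.15} $$
$$ p_i' \equiv p_i(\Delta t) = p_i(0) - \Delta t\, \frac{\partial H(p, q)}{\partial q_i} + O(\Delta t^2), \tag{10.16} $$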

where we have used a first order Taylor expansion and replaced time derivatives of qi and pi using
Hamilton’s equations. Now we can evaluate the original volume V0 as follows:
$$ V_0 = \int_{\Omega_0} d^n q\, d^n p = \int_{\Omega(\Delta t)} d^n q\, d^n p\, \det[J(\Delta t)]. \tag{10.17} $$

The Jacobi determinant can be written in block-form as follows:

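$$ \det[J(\Delta t)] = \begin{vmatrix} 1 + \Delta t\, \dfrac{\partial^2 H}{\partial q_i \partial p_j} & -\Delta t\, \dfrac{\partial^2 H}{\partial q_i \partial q_j} \\[2ex] \Delta t\, \dfrac{\partial^2 H}{\partial p_i \partial p_j} & 1 - \Delta t\, \dfrac{\partial^2 H}{\partial q_j \partial p_i} \end{vmatrix} \tag{10.18} $$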

Careful consideration of this expression should convince you that det [J(∆t)] = 1 + O(∆t 2 ). We see
therefore that V (∆t) = V0 + O(∆t 2 ), from which it follows that

$$ \frac{dV(0)}{dt} = 0. \tag{10.19} $$
This argument can be extended for arbitrary times, so that we have proven that V is a constant of the
motion. We have found Liouville’s theorem in the form:

The volume of a region in phase space occupied by a set of systems does not change in time.
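The key step of the proof, det [J(∆t)] = 1 + O(∆t²), can be checked numerically for a simple example. The sketch below assumes, purely for illustration, a pendulum with H = p²/2 − cos q and evaluates the Jacobian of the first-order step (10.15)–(10.16) by central differences; for this Hamiltonian the determinant is 1 + ∆t² cos q.

```python
import math

def first_order_step(q, p, dt):
    """The map (10.15)-(10.16) for H = p^2/2 - cos(q), without the O(dt^2) terms."""
    return q + dt * p, p - dt * math.sin(q)

def jacobian_det(q, p, dt, eps=1e-5):
    """Determinant of d(q', p')/d(q, p), estimated by central differences."""
    qa, pa = first_order_step(q + eps, p, dt)
    qb, pb = first_order_step(q - eps, p, dt)
    dq_dq, dp_dq = (qa - qb) / (2 * eps), (pa - pb) / (2 * eps)
    qa, pa = first_order_step(q, p + eps, dt)
    qb, pb = first_order_step(q, p - eps, dt)
    dq_dp, dp_dp = (qa - qb) / (2 * eps), (pa - pb) / (2 * eps)
    return dq_dq * dp_dp - dq_dp * dp_dq

q0, p0 = 0.7, 0.2
for dt in (0.1, 0.05, 0.025):
    print(dt, jacobian_det(q0, p0, dt) - 1.0)  # shrinks by a factor of about 4 per halving
```

The deviation from 1 scales as ∆t², so the rate of change of the phase space volume vanishes, in line with (10.19).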







Figure 10.1: A box divided into two halves by a wall with a hole. Initially, particles will be in the right hand
volume, and will move to the left. After large times, they will all come back to the right hand volume.

Of course, the region can change in shape, but its total volume will remain constant in time. We could
have put any density distribution of points in phase space in the integrals, which does not change the
Liouville’s theorem is important in equilibrium statistical mechanics. So-called ergodic systems
are assumed to move to a time-independent distribution in phase space, that is, any large set of systems
setting off at time t = 0 from different points in phase space and moving according to the Hamiltonian
equations of motion will assume the same, invariant distribution after long times. Liouville’s theorem
moreover tells us that the systems will not all end up in the same point in phase space, but spread over
a region with a volume equal to the initial volume.
There exists a more specific theorem concerning the behaviour of systems in time. This is
Poincaré’s theorem, which says that a system which is to evolve under the mechanical equations
of motion will always return arbitrarily close to its starting point within a finite time. Consider for
example a box partitioned into two sub-volumes (figure 10.1). There is a small hole in the middle, and
there are N particles in the right hand volume. Obviously, a fraction of these particles will move to
the left hand volume, but the Poincaré recurrence theorem tells us that after a finite time, all particles
will reassemble in the right hand volume! This seems to be in contradiction with the second law of

thermodynamics. This law states that the entropy will not decrease in the course of time. Here we see
an increase of entropy when the particles distribute themselves over the two volume halves rather than
a single one, but come back in a more ordered (less entropic) state after some time. This is only an ap-
parent contradiction, as the Poincaré theorem holds for a finite number of particles (finite-dimensional
phase space). What we see here is an example of the inequivalence of interchanging the order in which
limits are taken: if we first take the system size to infinity (the approach of statistical mechanics and
thermodynamics), the recurrence time will become infinite. If, on the other hand, we consider a finite
system over infinitely large times (the mechanics approach), we see that it returns arbitrarily close to
its initial state infinitely often. Taking then the system size to infinity does not alter this conclusion.

Systems close to equilibrium

11.1 Introduction

When we prepare a conservative system in a state with a certain energy, it will conserve this energy
ad infinitum. In practice, such is never the case, as it is impossible to decouple a system from its
environment or from its internal degrees of freedom. This requires some explanation. We usually
describe macroscopic objects in terms of the coordinates of their centre of mass and their Euler
angles. These are the macroscopic degrees of freedom. As these objects consist of particles (atoms,
molecules), they have very many additional, internal, or microscopic degrees of freedom. In fact the
heat which is generated during friction is nothing but a transfer of mechanical energy associated with
the macroscopic degrees of freedom to a mechanical energy of the internal (microscopic) degrees of
freedom. So heat in the end is a form of mechanical energy. As a result of friction, macroscopic
objects will, when subject to a conservative, time-independent force (apart from friction), always end
up at rest in a point where their potential energy is minimal. Any system at a point where its potential
energy is minimal is said to be in a stable state. A system which loses its kinetic energy via friction
is called a dissipative system. All the macroscopic systems we know are dissipative, although some
can approach conservative systems very well.
If the interactions are not all harmonic (‘harmonic’ means that the potential energy is a quadratic
function of the coordinates) then there may be more than one minimum. Local minima correspond
to metastable states. A system in a metastable state will return to that state under a small perturbation,
but, when it is strongly perturbed, it might move to another metastable state with lower potential energy,
or to the stable state. An example is shown in figure 11.1, for a particle in a one-dimensional potential.
Molecular systems, in which we take all degrees of freedom explicitly into account, are believed

Figure 11.1: System with a metastable (M) and a stable (S) state. A strong perturbation may kick the ball out
of its metastable state, and under the influence of dissipation it will then move to the stable state.


to be non-dissipative. We know from statistical physics that every degree of freedom in a system in
thermal equilibrium carries a kinetic energy equal (on average) to kB T/2, where kB is Boltzmann’s
constant. At low temperatures, the energies of the particles are small, as can be seen from the Boltzmann
distribution, which gives the probability of finding a system with energy E as proportional to
exp[−E/(kB T)]. Therefore, for low temperatures, the kinetic and the potential energy of a system are
small. It can therefore be inferred that at low temperatures, a system is close to a (meta-)stable state.
In this section, we analyse systems close to mechanical equilibrium. The beautiful result of this
analysis is a description in terms of a set of uncoupled harmonic oscillators, which are themselves
trivial to analyse. Moreover we obtain a straightforward recipe for finding the resonance frequencies
(related to the coupling strengths) of those oscillators.

11.2 Analysis of a system close to equilibrium

Consider a conservative system characterised by generalised coordinates q_j, j = 1, …, N. The system
is in equilibrium, defined by q̃_1, …, q̃_N, if its potential energy is minimal. In that case we have

\frac{\partial}{\partial q_j} V(\tilde{q}_1, \ldots, \tilde{q}_N) = 0; \quad j = 1, \ldots, N. \qquad (11.1)

Now suppose that we perturb the system slightly, i.e. we change the values of the q_j slightly with
respect to their equilibrium values. As the first derivative of the potential with respect to each of the
q_j vanishes, a Taylor expansion of the potential only contains second and higher order terms:
V(q_1, \ldots, q_N) = V(\tilde{q}_1, \ldots, \tilde{q}_N) + \frac{1}{2} \sum_{j,k=1}^{N} (q_j - \tilde{q}_j)(q_k - \tilde{q}_k) \frac{\partial^2}{\partial q_j \partial q_k} V(\tilde{q}_1, \ldots, \tilde{q}_N) + \ldots \qquad (11.2)

The terms of order higher than two will be neglected, as we are interested in systems close to equilib-
rium (i.e. q j − q̃ j small).
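We can represent the resulting expansion using matrix notation. Introduce the matrix
K = \begin{pmatrix}
\frac{\partial^2 V}{\partial q_1 \partial q_1} & \frac{\partial^2 V}{\partial q_1 \partial q_2} & \cdots & \frac{\partial^2 V}{\partial q_1 \partial q_N} \\
\frac{\partial^2 V}{\partial q_2 \partial q_1} & \frac{\partial^2 V}{\partial q_2 \partial q_2} & \cdots & \frac{\partial^2 V}{\partial q_2 \partial q_N} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 V}{\partial q_N \partial q_1} & \frac{\partial^2 V}{\partial q_N \partial q_2} & \cdots & \frac{\partial^2 V}{\partial q_N \partial q_N}
\end{pmatrix} \qquad (11.3)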

The matrix K is obviously symmetric. We can write

V(q_1, \ldots, q_N) = V(\tilde{q}_1, \ldots, \tilde{q}_N) + \frac{1}{2} \delta q^T K \delta q \qquad (11.4)
where δq is a column vector with components q_j − q̃_j, j = 1, …, N; the superscript T denotes the
transpose of the vector. The factor 1/2 is the one appearing in the expansion (11.2).
Now we write down the kinetic energy in terms of the generalised coordinates. We assume that
the constraints only depend on the generalised coordinates q_j and not on their derivatives or on time.
In that case, the kinetic energy can be written in the form (see page 26):
T = \frac{1}{2} \sum_{j,k=1}^{N} M_{jk} \dot{q}_j \dot{q}_k, \qquad (11.5)
where the matrix M is symmetric: M_jk = M_kj. Note that M_jk may depend on the q_j. In terms of
q_j − q̃_j, and using vector notation, we can rewrite the kinetic energy as
T = \frac{1}{2} \delta\dot{q}^T M \delta\dot{q}. \qquad (11.6)
The equations of motion now read:
\frac{1}{2} \sum_{k=1}^{N} (M_{jk} \delta\ddot{q}_k + \delta\ddot{q}_k M_{kj}) = -\frac{\partial V}{\partial q_j} + \frac{\partial T}{\partial q_j} = -\frac{1}{2} \sum_{k=1}^{N} (K_{jk} \delta q_k + \delta q_k K_{kj}). \qquad (11.7)

We have omitted the dependence of M_jk on the q_j: this dependence generates terms on the left- and
right-hand side which are both of order δq̇² and can therefore be neglected. Using the symmetry of the
matrices M and K, (11.7) reduces to
\sum_{k=1}^{N} M_{jk} \delta\ddot{q}_k = -\sum_{k=1}^{N} K_{jk} \delta q_k. \qquad (11.8)

Let us consider the two-dimensional case to clarify the procedure. We consider two generalised
coordinates q_1 and q_2 with the matrix M_jk equal to the identity. Then, the kinetic energy has the form:
T = \frac{1}{2} \dot{q}_1^2 + \frac{1}{2} \dot{q}_2^2. \qquad (11.9)
The potential energy depends on the two coordinates q_1 and q_2:
V = V(q_1, q_2). \qquad (11.10)
The equations of motion read:
\ddot{q}_1 = \delta\ddot{q}_1 = -\frac{\partial V}{\partial q_1} \qquad (11.11)
\ddot{q}_2 = \delta\ddot{q}_2 = -\frac{\partial V}{\partial q_2} \qquad (11.12)
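Expanding about the point q̃_1, q̃_2, where V is supposed to be minimal, we have

V(q_1, q_2) = V(\tilde{q}_1, \tilde{q}_2) + \frac{1}{2} (q_1 - \tilde{q}_1)^2 \frac{\partial^2}{\partial q_1^2} V(\tilde{q}_1, \tilde{q}_2) + (q_1 - \tilde{q}_1)(q_2 - \tilde{q}_2) \frac{\partial^2}{\partial q_1 \partial q_2} V(\tilde{q}_1, \tilde{q}_2) + \frac{1}{2} (q_2 - \tilde{q}_2)^2 \frac{\partial^2}{\partial q_2^2} V(\tilde{q}_1, \tilde{q}_2). \qquad (11.13)
This can be written in the form:
V(q_1, q_2) = V(\tilde{q}_1, \tilde{q}_2) + \frac{1}{2} (q_1 - \tilde{q}_1, \; q_2 - \tilde{q}_2) \begin{pmatrix} \frac{\partial^2 V}{\partial q_1^2} & \frac{\partial^2 V}{\partial q_1 \partial q_2} \\ \frac{\partial^2 V}{\partial q_1 \partial q_2} & \frac{\partial^2 V}{\partial q_2^2} \end{pmatrix} \begin{pmatrix} q_1 - \tilde{q}_1 \\ q_2 - \tilde{q}_2 \end{pmatrix}. \qquad (11.14)

Defining δq_1 = q_1 − q̃_1 and similarly for δq_2, this equation reads:

V(q_1, q_2) = V(\tilde{q}_1, \tilde{q}_2) + \frac{1}{2} (\delta q_1, \; \delta q_2) \begin{pmatrix} \frac{\partial^2 V}{\partial q_1^2} & \frac{\partial^2 V}{\partial q_1 \partial q_2} \\ \frac{\partial^2 V}{\partial q_1 \partial q_2} & \frac{\partial^2 V}{\partial q_2^2} \end{pmatrix} \begin{pmatrix} \delta q_1 \\ \delta q_2 \end{pmatrix}. \qquad (11.15)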

The 2 × 2 matrix occurring in this expression is our matrix K.
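
When the potential is only available as a function to be evaluated, the matrix K can be estimated by finite differences. The following is a minimal sketch, not part of the original notes: `example_potential` is a hypothetical potential with its minimum at (1, −1), and `hessian_2d` and `quadratic_form` are names introduced here for illustration.

```python
def hessian_2d(V, q1, q2, h=1e-4):
    """Matrix K of second derivatives of V at (q1, q2), by central differences."""
    v0 = V(q1, q2)
    k11 = (V(q1 + h, q2) - 2 * v0 + V(q1 - h, q2)) / h**2
    k22 = (V(q1, q2 + h) - 2 * v0 + V(q1, q2 - h)) / h**2
    k12 = (V(q1 + h, q2 + h) - V(q1 + h, q2 - h)
           - V(q1 - h, q2 + h) + V(q1 - h, q2 - h)) / (4 * h**2)
    return [[k11, k12], [k12, k22]]

def example_potential(q1, q2):
    # Hypothetical potential with a minimum at (1, -1); not taken from the notes.
    x, y = q1 - 1.0, q2 + 1.0
    return x**2 + x * y + 2 * y**2 + 0.1 * x**4

def quadratic_form(K, dq1, dq2):
    """Second-order term (1/2) dq^T K dq of the expansion around the minimum."""
    return 0.5 * (K[0][0] * dq1**2 + 2 * K[0][1] * dq1 * dq2 + K[1][1] * dq2**2)

if __name__ == "__main__":
    K = hessian_2d(example_potential, 1.0, -1.0)
    print(K)  # close to [[2, 1], [1, 4]] for this potential
    exact = example_potential(1.01, -0.98) - example_potential(1.0, -1.0)
    print(exact, quadratic_form(K, 0.01, 0.02))
```

For small displacements the quadratic form reproduces the change in potential energy up to the neglected higher-order terms (here the quartic term in the example potential).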


11.2.1 Example: Double pendulum

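Consider as an example the double pendulum, consisting of two rigid massless rods: an upper rod of
length L carrying a mass M at its end, and a lower rod of length l, attached to M, carrying a mass m.
The angles of the upper and lower rod with the vertical are ϑ and ϕ respectively.

[Figure: the double pendulum.]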


The velocity of the upper mass is Lϑ̇; that of the lower one is the vector sum of the velocity of the
upper one and that of the lower one with respect to the upper one. For very small angles ϑ and ϕ
both velocities will be approximately in the horizontal direction, so that they can simply be added:
v_m = Lϑ̇ + lϕ̇. The kinetic energy therefore reads:

T = \frac{M}{2} (L\dot{\vartheta})^2 + \frac{m}{2} (L\dot{\vartheta} + l\dot{\varphi})^2. \qquad (11.16)
Let us perform a transformation to more convenient variables

x = Lϑ (11.17a)
y = Lϑ + lϕ. (11.17b)

Note that x and y do not denote cartesian coordinates. In terms of these variables the kinetic energy can
simply be written as
T = \frac{M}{2} \dot{x}^2 + \frac{m}{2} \dot{y}^2. \qquad (11.18)
The potential energy of the upper mass is MgL(1 − cos ϑ) ≈ MgLϑ²/2, and that of the lower mass
is given as mg[L(1 − cos ϑ) + l(1 − cos ϕ)], which, in the small angle approximation, becomes

V_{\mathrm{Lower}}(\vartheta, \varphi) = \frac{mg}{2} (L\vartheta^2 + l\varphi^2). \qquad (11.19)

The total potential energy, written in terms of x and y, therefore reads:

V = \frac{(M + m) g}{2L} x^2 + \frac{mg}{2l} (y - x)^2. \qquad (11.20)

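The equations of motion can therefore be written as

\begin{pmatrix} M & 0 \\ 0 & m \end{pmatrix} \begin{pmatrix} \ddot{x} \\ \ddot{y} \end{pmatrix} = \begin{pmatrix} -\frac{(M+m) g}{L} - \frac{mg}{l} & \frac{mg}{l} \\ \frac{mg}{l} & -\frac{mg}{l} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}. \qquad (11.21)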

This is the form given in (11.8). We shall return to this example in the next section.

11.3 Normal modes

Let us try to find solutions to the equations of motion (11.8) of the form

\delta q_j = A_j e^{i\omega t} \qquad (11.22)

where ω does not depend on j: all the degrees of freedom oscillate at the same frequency. Such a
motion is called a normal mode. In the following we shall use q_j instead of δq_j: q_j is the generalised
coordinate measured with respect to its equilibrium value.
We have q̈_j = −ω² q_j, so (11.8) reduces to
\sum_{k=1}^{N} M_{jk} \omega^2 A_k = \sum_{k=1}^{N} K_{jk} A_k. \qquad (11.23)

If the mass tensor M_jk were the identity, Eq. (11.23) would be an eigenvalue equation. For general
mass tensors, the equation is a generalised eigenvalue equation. We can reduce this equation to an
ordinary eigenvalue equation by multiplying the left- and right-hand side by the inverse M⁻¹ of the mass
matrix. We then have:
\omega^2 A_j = \sum_{k,l=1}^{N} M^{-1}_{jk} K_{kl} A_l. \qquad (11.24)

In algebraic terms, the solutions to these equations are the eigenvectors A (with components A_j) and
the corresponding eigenvalues ω². In physical terms, the components A_j are the amplitudes of the
oscillatory motions of the generalised coordinates, and ω is the frequency of the oscillation. In order
for the normal modes to exist, the eigenvalues should be real. That the eigenvalues are indeed real
follows from the fact that both M and K are real, symmetric matrices. The product M⁻¹K is not
necessarily symmetric itself, but for a positive mass matrix it has the same eigenvalues as the real,
symmetric matrix M^(−1/2) K M^(−1/2), and it is a well-known result of linear algebra that the eigenvalues
of a Hermitian matrix are real (real and symmetric implies Hermitian).
Another question is whether the eigenvalues are positive or negative. Assuming that we are
expanding the potential around a minimum, the matrix K can be shown to be positive definite. A positive
definite matrix has only positive eigenvalues.¹ Moreover, the mass matrix can be shown to be positive
definite. Then its inverse M⁻¹ is also positive definite, and so is M^(−1/2) K M^(−1/2). Therefore the
eigenvalues of M⁻¹K, the ω², are positive. Hence the frequencies of the oscillations are always real and
we do not find exponential growth or decay. In physical terms one could say that perturbing the system
from equilibrium always pushes it back to this equilibrium. Therefore the ‘spring force’ experienced by
the coordinates is always opposite to the perturbation, and an oscillation arises, and not a drift away
from equilibrium, or some exponential decay. Such decay may however be found near a local maximum
or near a saddle point of the potential.
¹ In fact, we shall occasionally allow for zero eigenvalues; in that case, the matrix is called positive semidefinite.

Let us find the normal modes for the coupled pendulums. Note that this problem is relatively
simple as a result of the fact that the mass matrix is diagonal and therefore trivial to invert. After
multiplying both sides of Eq. (11.21) by M⁻¹, we have a standard diagonalisation problem for the
matrix:
M^{-1} K = \begin{pmatrix} \frac{M+m}{M} \frac{g}{L} + \frac{mg}{Ml} & -\frac{mg}{Ml} \\ -\frac{g}{l} & \frac{g}{l} \end{pmatrix}. \qquad (11.25)
The eigenvalues are the solutions of the so-called secular equation, which has the form
\begin{vmatrix} \frac{M+m}{M} \frac{g}{L} + \frac{mg}{Ml} - \omega^2 & -\frac{mg}{Ml} \\ -\frac{g}{l} & \frac{g}{l} - \omega^2 \end{vmatrix} = 0. \qquad (11.26)

This reduces to the following quadratic equation in ω²:

\omega^4 - \frac{M+m}{M} \left( \frac{g}{L} + \frac{g}{l} \right) \omega^2 + \frac{M+m}{M} \frac{g^2}{Ll} = 0. \qquad (11.27)
This equation has two solutions for ω².
We will examine some special cases. If M ≫ m, then, provided that l is not too close to L, the two
roots with corresponding eigenvectors (A_x, A_y) are given by
\omega \approx \sqrt{\frac{g}{l}}; \quad \frac{A_x}{A_y} \approx \frac{m}{M} \frac{L}{l - L} \qquad (11.28)

and
\omega \approx \sqrt{\frac{g}{L}}; \quad \frac{A_x}{A_y} \approx \frac{L - l}{L}. \qquad (11.29)
The first solution describes an almost stationary upper pendulum with the lower one
oscillating at its natural frequency. In the second case, the motions of the upper and lower masses are of
the same order of magnitude, and the oscillation takes place at the natural frequency of the upper pendulum.
If M ≪ m, the solutions are
\omega^2 \approx \frac{g}{L + l}; \quad \frac{A_x}{A_y} = \frac{L}{L + l} \qquad (11.30)
\omega^2 \approx \frac{m}{M} \left( \frac{g}{L} + \frac{g}{l} \right), \quad \frac{A_x}{A_y} \approx -\frac{m}{M} \frac{L + l}{L}. \qquad (11.31)
The first case describes a motion in which the two rods are aligned so that we have essentially a single
pendulum of length l + L and mass m. The second case corresponds to a very high frequency of the
upper mass with an almost stationary lower mass.
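
The limiting cases can be compared with the exact roots of the quadratic equation (11.27). The sketch below is an illustration only; the function name `double_pendulum_modes` and the numerical values are chosen here, and the amplitude ratio is obtained from the second row of the eigenvalue problem for the matrix (11.25).

```python
import math

def double_pendulum_modes(M, m, L, l, g=9.81):
    """Return [(omega^2, Ax/Ay), ...] for both roots of Eq. (11.27), low root first."""
    b = (M + m) / M * (g / L + g / l)      # minus the coefficient of omega^2
    c = (M + m) / M * g**2 / (L * l)       # constant term
    disc = math.sqrt(b * b - 4.0 * c)
    modes = []
    for w2 in ((b - disc) / 2.0, (b + disc) / 2.0):
        # Second row of (M^-1 K - omega^2) A = 0:
        # -(g/l) Ax + (g/l - omega^2) Ay = 0
        ratio = 1.0 - w2 * l / g
        modes.append((w2, ratio))
    return modes

if __name__ == "__main__":
    # Heavy upper mass (M >> m); illustrative values, SI units assumed
    for w2, ratio in double_pendulum_modes(M=1000.0, m=1.0, L=1.0, l=0.5):
        print(math.sqrt(w2), ratio)
```

With M = 1000 m, L = 1 and l = 0.5 the two values of ω² lie within a fraction of a percent of g/L and g/l, and the amplitude ratios are close to (L − l)/L and (m/M) L/(l − L).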

11.4 Vibrational analysis

The way in which atoms are bound together in molecules is described by quantum mechanics. There
is a long standing tradition in the quantum mechanical calculation of stationary states of molecules.
In the last fifteen years or so it has become possible to perform dynamical computations of molecules
to very good accuracy using fully quantum mechanical calculations. These calculations are quite
demanding on computer resources and they do not always give a very good insight into the dynamics
of interest. Therefore, a semi-classical approach is often adopted in order to calculate vibration spectra
for example.
First, the total energy of the molecule is calculated as a function of the nuclear positions Ri ,
i = 1, . . . , N for an N-atomic molecule. There is however a problem in doing this. Suppose we want



Figure 11.2: Interactions in a molecule.

to calculate this energy for 10 values of all the coordinates of a 10-atom molecule. As there are 30
coordinates, we need to perform 10^30 stationary quantum calculations, which would take longer than
the age of the universe. Therefore the potential is parametrised in a sensible way, which we now describe.
All the chemically bonded atoms are described by harmonic or goniometric interactions. The degrees
of freedom chosen for this parametrisation are the bond length, bond angle and dihedral angle. The
forces associated with these degrees of freedom are called stretch, bend, and torsion respectively.
These degrees of freedom are shown in figure 11.2. The form of the potentials associated with bond
stretching is given as
V_{\mathrm{Stretch}} = \frac{\kappa}{2} (l - l_0)^2 \qquad (11.32)
where l is the bond length and l0 is the equilibrium bond length. The spring constant κ determines
how difficult it is to stretch the bond. The bending potential is given in terms of the bond angle ϕ:
V_{\mathrm{Bend}} = \frac{\alpha}{2} (\varphi - \varphi_0)^2 \qquad (11.33)
A similar expression exists for the torsional energy.
The constants κ and α can be determined from stationary quantum mechanical calculations. As-
suming that these parameters are known, we shall now use the given form of the potential to calculate
the vibration spectrum of a triatomic, linear molecule, such as CS2 or CO2 (see figure 11.3). We
neglect bending here, so only bond stretching is taken into account.
If the initial configuration is linear, the motion takes place along a straight line, which we take as
our X-axis. The coordinates of the three atoms are x_1, x_2 and x_3. The kinetic energy can therefore be
written down immediately:
T = \frac{\mu}{2} (\dot{x}_1^2 + \dot{x}_3^2) + \frac{m}{2} \dot{x}_2^2. \qquad (11.34)
The potential energy is given by
V = \frac{\kappa}{2} (x_2 - x_1 - l)^2 + \frac{\kappa}{2} (x_3 - x_2 - l)^2. \qquad (11.35)
Here, l is the equilibrium bond length. The centre of mass of the system will move uniformly as there
are no external forces acting, and we take this centre as the origin. The equilibrium coordinates are

Figure 11.3: Triatomic molecule. The outer atoms 1 and 3 have mass µ.

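then x_1 = −l, x_2 = 0 and x_3 = l. The deviations from these values are

\delta x_1 = x_1 + l; \quad \delta x_2 = x_2 \quad \text{and} \quad \delta x_3 = x_3 - l. \qquad (11.36)

In this representation, we have

T = \frac{\mu}{2} (\delta\dot{x}_1^2 + \delta\dot{x}_3^2) + \frac{m}{2} \delta\dot{x}_2^2, \qquad (11.37)
V = \frac{\kappa}{2} (\delta x_2 - \delta x_1)^2 + \frac{\kappa}{2} (\delta x_3 - \delta x_2)^2. \qquad (11.38)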
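We can find the matrices K and M directly from these expressions:
M = \begin{pmatrix} \mu & 0 & 0 \\ 0 & m & 0 \\ 0 & 0 & \mu \end{pmatrix}, \qquad (11.39)

and
K = \kappa \begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix}. \qquad (11.40)
The normal modes can now be found by solving (11.23) with these matrices. The eigenfrequencies
follow from the secular equation:

\begin{vmatrix} \kappa - \mu\omega^2 & -\kappa & 0 \\ -\kappa & 2\kappa - m\omega^2 & -\kappa \\ 0 & -\kappa & \kappa - \mu\omega^2 \end{vmatrix} = 0. \qquad (11.41)

This leads to:

(\kappa - \mu\omega^2) \, \omega^2 \, (\mu m \omega^2 - \kappa m - 2\kappa\mu) = 0, \qquad (11.42)

from which we find:

\omega_1 = 0; \quad \omega_2 = \sqrt{\frac{\kappa}{\mu}}; \quad \omega_3 = \sqrt{\kappa \left( \frac{1}{\mu} + \frac{2}{m} \right)}. \qquad (11.43)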

Figure 11.4: The three modes of the triatomic molecule (Mode 1, Mode 2 and Mode 3).

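The corresponding eigenvectors can be found after some algebra:

A_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}; \quad A_2 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}; \quad A_3 = \begin{pmatrix} 1 \\ -2\mu/m \\ 1 \end{pmatrix}. \qquad (11.44)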

The first of these, corresponding to ω1 = 0, is a mode in which the atoms all slide in the same direction
with the same speed. This is a manifestation of the translational symmetry of the problem, which has
been recovered by our procedure. The second one represents a mode in which the middle atom stands
still and the two outer atoms vibrate oppositely. The frequency of this mode is ω2 = √(κ/µ), which is
that of a single outer atom of mass µ attached to a spring with constant κ. Finally, the last mode is one in
which the two outer atoms move in one direction, and the central atom in the opposite direction. The
motion can be understood by replacing the two outer masses by a single one with mass 2µ at their
midpoint, coupled by a spring with spring constant 2κ to the central mass. The reduced mass of this
system, (1/(2µ) + 1/m)⁻¹, then occurs in the expression for the resonance frequency. The three modes are
depicted in figure 11.4.
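
As a check on the algebra, the three frequency and eigenvector pairs can be substituted into the generalised eigenvalue equation K A = ω² M A. The sketch below uses illustrative values for µ, m and κ in arbitrary units (assumptions made for the example, not data from the notes); the helper names `matvec` and `residual` are introduced here.

```python
def matvec(A, v):
    """Product of a matrix (nested list) with a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def residual(K, M, A, w2):
    """Largest component of K A - omega^2 M A (zero for a normal mode)."""
    KA, MA = matvec(K, A), matvec(M, A)
    return max(abs(k - w2 * m) for k, m in zip(KA, MA))

mu, m, kappa = 16.0, 12.0, 1.0   # illustrative values in arbitrary units
M = [[mu, 0, 0], [0, m, 0], [0, 0, mu]]
K = [[kappa, -kappa, 0], [-kappa, 2 * kappa, -kappa], [0, -kappa, kappa]]

# (omega^2, eigenvector) pairs: translation, symmetric and antisymmetric stretch
modes = [
    (0.0, [1.0, 1.0, 1.0]),
    (kappa / mu, [1.0, 0.0, -1.0]),
    (kappa * (1 / mu + 2 / m), [1.0, -2 * mu / m, 1.0]),
]

if __name__ == "__main__":
    for w2, A in modes:
        print(w2, residual(K, M, A, w2))
```

Each residual vanishes up to rounding errors, for any positive choice of µ, m and κ.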

11.5 The chain of particles

In the previous section we have analysed a triatomic molecule. Now we shall analyse a larger system:
a chain of N particles. We assume that all particles have the same mass, and that they are connected
by a string with tension τ. The particles are assumed to move only in the vertical (y) direction, and
the x-components of adjacent particles differ by a separation d. The first and last spring are connected
to points at y = 0. The chain is depicted in figure 11.5. The chain is a model for a continuous string,
which is obtained by letting N → ∞ and d → 0 while keeping the string length Nd fixed.
Let us consider particle number k. The springs connecting this particle to its neighbours are
stretched, and this may result in a net force acting on particle k. The spring between particles k and
k + 1 has a length
l = \sqrt{d^2 + (y_{k+1} - y_k)^2} \approx d + \frac{(y_{k+1} - y_k)^2}{2d} \qquad (11.45)

Figure 11.5: The harmonic chain of particles.

where a first order Taylor expansion is used to obtain the second expression. The potential energy for
this link is equal to the tension τ times the extension of the string, and therefore we find for the total
potential energy:
V = \frac{\tau}{2d} \sum_{k=0}^{N} (y_{k+1} - y_k)^2, \qquad (11.46)
with
y_0 = y_{N+1} = 0. \qquad (11.47)
The kinetic energy is given by
T = \sum_{k=1}^{N} \frac{m}{2} \dot{y}_k^2. \qquad (11.48)

We now find the matrices M_kl and K_kl as:

M_{kl} = m \delta_{kl} \qquad (11.49)

where δ_kl is the Kronecker delta; in other words, M_kl is m times the unit matrix. For K_kl we
find:
K = \begin{pmatrix}
2\tau/d & -\tau/d & 0 & 0 & 0 & \cdots & 0 \\
-\tau/d & 2\tau/d & -\tau/d & 0 & 0 & \cdots & 0 \\
0 & -\tau/d & 2\tau/d & -\tau/d & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots
\end{pmatrix}. \qquad (11.50)
The normal mode equation (11.23) can be solved analytically for arbitrary N by substituting for the
eigenvector A_k = γ exp(iαk), where γ is some constant. This trial solution does not satisfy the
boundary conditions (11.47), but we do not bother about this for the moment. Then for 2 ≤ k ≤ N − 1 we
have
m \omega^2 e^{ik\alpha} = \frac{\tau}{d} \left( -e^{i\alpha(k-1)} + 2 e^{ik\alpha} - e^{i\alpha(k+1)} \right). \qquad (11.51)
Dividing left- and right-hand side by exp(ikα), we find

m \omega^2 = \frac{2\tau}{d} (1 - \cos\alpha). \qquad (11.52)
For each α, there is also a solution for −α with the same ω. This can be used to construct a solution

A_k = \gamma (e^{ik\alpha} - e^{-ik\alpha}) = 2i\gamma \sin(k\alpha). \qquad (11.53)

This solution always vanishes at k = 0, and it also vanishes at k = N + 1, as required by (11.47), when
(N + 1)α = nπ for integer n. So the conclusion is that for each n = 1, …, N, we have a solution which
vanishes at the two ends of the string; for n = 0 and n = N + 1 all amplitudes A_k vanish. For other
values of n, the solutions obtained are identical, up to a sign, to those with 1 ≤ n ≤ N. For each
solution, all particles move up and down with the same frequency, given by (11.52). The wavelength is
given by kd such that kα = 2π, so λ = 2πd/α, and the wavevector is q = α/d.
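
The standing-wave solution can be verified by direct substitution into the normal mode equation (11.23) with the matrices (11.49) and (11.50). The sketch below is an illustration under the boundary conditions (11.47), y_0 = y_{N+1} = 0, which fix the phase to α = nπ/(N + 1); the function names and the default parameter values are assumptions made for the example.

```python
import math

def chain_mode(N, n, tau=1.0, d=1.0, m=1.0):
    """Amplitudes A_1..A_N and omega^2 for mode n of the N-particle chain."""
    alpha = n * math.pi / (N + 1)            # follows from y_0 = y_{N+1} = 0
    A = [math.sin(k * alpha) for k in range(1, N + 1)]
    w2 = 2.0 * tau / (m * d) * (1.0 - math.cos(alpha))   # Eq. (11.52)
    return A, w2

def chain_residual(N, n, tau=1.0, d=1.0, m=1.0):
    """Largest violation of sum_l K_kl A_l = m omega^2 A_k over all k."""
    A, w2 = chain_mode(N, n, tau, d, m)
    padded = [0.0] + A + [0.0]                # fixed end points
    worst = 0.0
    for k in range(1, N + 1):
        # Row k of the tridiagonal matrix K acting on the amplitudes
        force = tau / d * (-padded[k - 1] + 2 * padded[k] - padded[k + 1])
        worst = max(worst, abs(force - m * w2 * padded[k]))
    return worst

if __name__ == "__main__":
    N = 8
    for n in range(1, N + 1):
        print(n, chain_mode(N, n)[1], chain_residual(N, n))
```

The residual is zero up to rounding for every n between 1 and N, including the first and last rows of K, where the fixed end points enter.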
It is possible to formulate the Lagrangian directly in a continuum form, and derive the wave
equation from this. Note that in the continuum limit, α and d small, we obtain from (11.52) for the
frequency
\omega^2 = \frac{\tau \alpha^2}{m d} = \frac{\tau d}{m} q^2. \qquad (11.54)
Comparing with the well known dispersion equation ω = cq, we learn that the sound speed c is given
as √(τd/m). Defining the density ρ = m/d, we have
c = \sqrt{\frac{\tau}{\rho}}. \qquad (11.55)
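
The approach to the linear dispersion relation can be made visible by comparing the exact chain frequency (11.52) with cq. This is a small sketch with illustrative parameter values (assumed here, in arbitrary units); `omega_chain` and `sound_speed` are names introduced for the example.

```python
import math

def omega_chain(q, tau, d, m):
    """Exact chain frequency from Eq. (11.52) with alpha = q*d."""
    return math.sqrt(2.0 * tau / (m * d) * (1.0 - math.cos(q * d)))

def sound_speed(tau, d, m):
    """Continuum-limit speed c = sqrt(tau/rho) with rho = m/d, Eq. (11.55)."""
    return math.sqrt(tau * d / m)

if __name__ == "__main__":
    tau, d, m = 2.0, 0.01, 0.5           # illustrative values
    c = sound_speed(tau, d, m)
    for q in (1.0, 10.0, 100.0, math.pi / d):
        # The ratio tends to 1 for q*d << 1 and drops to 2/pi at q*d = pi
        print(q * d, omega_chain(q, tau, d, m) / (c * q))
```

For qd ≪ 1 the ratio ω/(cq) differs from one by about (qd)²/24, while at the shortest wavelengths the discreteness of the chain lowers the frequency below cq.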

Density operators — Quantum information theory

1 Introduction

In this section, we extend the formalism of quantum physics to include statistical uncertainty which
can be traced back to our lack of knowledge of what the wavefunction actually is. States whose form
we are not certain about are described by an object called the density matrix. Density matrices can be
used to detect coupling between a particle and an outside world, which in the simplest case is another
particle. We shall see that this coupling may lead to quantum states which do not have a classical
analogue; these states are called entangled. Entanglement is used in novel technological applications
which are based on the quantum nature of matter. The most spectacular realisation of this trend which
may be achieved in the next few decades is the quantum computer, which will be briefly discussed
towards the end of this chapter.

2 The density operator

Up to this point, we have always assumed that a quantum system can be described by a wavefunction
which contains all the information which can in principle be obtained about the system. In particular,
knowledge of the wavefunction enables us to predict the possible outcomes of physical measurements
and their probabilities. For example, if we know that the electron in a hydrogen atom finds itself in a
state
\frac{1}{\sqrt{3}} \left( |2, 1, 1, +\rangle + i \, |2, 1, 0, +\rangle - |2, 1, -1, -\rangle \right), \qquad (1)
where the ket-vectors are of the form |n, l, m_z, s_z⟩, we can calculate the possible outcomes of any
measurement and their respective probabilities. In this example, a measurement of L_z would yield the
value ħ, 0 or −ħ, each with probability 1/3. Although knowledge of the quantum mechanical
wavefunction does not predict outcomes of measurements unambiguously, the wavefunction is the most
precise knowledge we can have about a system. In that sense, the wavefunction is for quantum mechanics
what the positions and velocities are for a classical many-particle system.
In some cases we might indeed know the state of a quantum system, for example at very low
temperatures where particles are almost certainly in the ground state, or when they are in some collective
quantum state, as is the case in superfluidity or superconductivity. Another example is a quantum
system for which we have just measured all quantities corresponding to the observation maximum. If
we measure, for example, L_z = −ħ, L² = 2ħ², E = E_2 and S_z = −ħ/2, we can be sure that just
after that measurement the system is in the state
|2, 1, -1, -\rangle. \qquad (2)
Note that we do not know the overall phase factor, but this factor drops out when calculating physical
quantities.

In most practical situations, however, we do not know the wavefunction at all! Moreover, if we
do not know the state, we cannot infer its shape from whatever sequence of whatever measurements,
as the first measurement reduces the state so that it changes considerably (as mentioned above, we do
know its state immediately after the measurement). Moreover, suppose we have carefully prepared
a system in a well-defined state, the interaction with its surroundings will alter that state. Does this
mean that quantum mechanics might be a nice theory, but that the state of affairs is that we cannot
do anything with it in practice? The answer is no: even if we do not know the state of a system
precisely, we usually have some statistical knowledge about it. This means that we know the probability
P_r that a system is in state |r⟩. Now this might become very confusing: quantum mechanics allows
us to make statistical predictions, and now I say that the state of a system is specified in a statistical
manner. It might be helpful to keep in mind that a wavefunction by itself is not a statistical object at
all: it is a well-defined object whose time-evolution can be calculated with — in principle — arbitrary
precision. However, measurements performed on a system described by a known wavefunction are
subject to quantum mechanical uncertainty. This is called intrinsic uncertainty. If we have — for
whatever reason — incomplete knowledge of the state of the system, we speak of extrinsic uncer-
tainty. One of the conceptual difficulties students and scientists have with quantum mechanics is the
difference between the wavefunction which evolves smoothly and deterministically according to the
time-dependent Schrödinger equation on the one hand, and the abrupt change taking place at a mea-
surement where the state is instantaneously reduced to the eigenfunction of the operator corresponding
to the physical quantity we measure, on the other hand.
As an example of a state which is not known explicitly, consider again the hydrogen atom.
Suppose we have thousands of hydrogen atoms which we can measure. All these atoms have undergone
the same preparation. We measure the energy, L² and L_z of each atom. In all cases we find that the
energy is that of the first excited state, E = E_2, and l = 1, but in 25% of the cases we find m = 1, in 50%
of the cases m = 0, and in the remaining 25% we find m = −1. It is now tempting to say that the state of
every hydrogen atom (neglecting the electron spin) can be written as
|\psi\rangle = \frac{1}{2} |2, 1, 1\rangle + \frac{1}{\sqrt{2}} |2, 1, 0\rangle + \frac{1}{2} |2, 1, -1\rangle. \qquad (3)
But can we really infer this information from our measurements? We could flip the sign of the second
term on the right-hand side, and we would find the same probabilities as with the state given above.
Can you now still tell me which state the atoms are in?
To emphasise the point more strongly, we look at the simplest possible nontrivial system,
described by a two-dimensional Hilbert space, e.g. a spin-1/2 particle. Suppose someone, Charlie, gives
us an electron but he does not know its spin state. He does however know that there is no reason for
the spin to be preferably up or down, so the probability to measure spin ‘up’ or ‘down’ is 1/2 for both.
Does that give us enough information to specify the state? Well, you might guess that the state is
|\psi\rangle = \frac{1}{\sqrt{2}} \left( |1/2\rangle + |-1/2\rangle \right), \qquad (4)
but why couldn’t it be
|\psi\rangle = \frac{1}{\sqrt{2}} \left( |1/2\rangle - |-1/2\rangle \right)? \qquad (5)
In fact the state of the system could be anything of the form
|\psi\rangle = \frac{1}{\sqrt{2}} \left( |1/2\rangle + e^{i\varphi} |-1/2\rangle \right), \qquad (6)
for any real ϕ.

Although we do not know the wavefunction exactly, we can evaluate the expectation value of the
z-component of the spin: as we find ħ/2 and −ħ/2 with equal probabilities, the expectation value is
0. More generally, if we have a spin which is in the spin-up state with probability p and in the down
state with probability 1 − p, the expectation value of the z-component of the spin is ħ(p − 1/2). So
expectation values can still be found, although we do not have complete information about the state
of the system. This fact might raise the question whether there is any difference in measured physical
quantities between one of the candidate wavefunctions suggested above, and the information that the
particle is in the ‘up’ state with probability 1/2 and in the ‘down’ state with the same probability. After
all, they both give the same value for the expectation value of the z-component of the spin.
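We now introduce the following states:

|\psi_1\rangle = |1/2\rangle; \qquad (7a)
|\psi_2\rangle = |-1/2\rangle; \qquad (7b)
|\psi_3\rangle = \frac{1}{\sqrt{2}} \left( |1/2\rangle + |-1/2\rangle \right); \qquad (7c)
|\psi_4\rangle = \frac{1}{\sqrt{2}} \left( |1/2\rangle - |-1/2\rangle \right); \qquad (7d)
|\psi_5\rangle = \frac{1}{\sqrt{2}} \left( |1/2\rangle + i \, |-1/2\rangle \right); \qquad (7e)
|\psi_6\rangle = \frac{1}{\sqrt{2}} \left( |1/2\rangle - i \, |-1/2\rangle \right). \qquad (7f)
These states are recognised as the spin-up and -down states for the z, x and y directions. Let us
consider a particle in the state (|1/2⟩ + e^{iϕ}|−1/2⟩)/√2, and calculate the probability of finding at a
measurement this particle in the state |ψ3⟩:

\left| \frac{1}{2} \left( \langle 1/2| + \langle -1/2| \right) \left( |1/2\rangle + e^{i\varphi} |-1/2\rangle \right) \right|^2 = \left| \frac{1 + \exp(i\varphi)}{2} \right|^2 = \frac{1}{2} (1 + \cos\varphi). \qquad (8)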

If we evaluate the probability to find the particle in the state |ψ3⟩ in the case it was, before the
measurement, in a so-called mixed state which is given with equal probabilities to be |1/2⟩ and |−1/2⟩,
we find 1/2, as can easily be verified. Calculating the probabilities for a particle to be found in the
states |ψ1⟩ to |ψ6⟩ we find the following results.

State    (|1/2⟩ + e^{iϕ}|−1/2⟩)/√2    Equal mixture of |1/2⟩ and |−1/2⟩
|ψ1⟩     1/2                           1/2
|ψ2⟩     1/2                           1/2
|ψ3⟩     1/2 (1 + cos ϕ)               1/2
|ψ4⟩     1/2 (1 − cos ϕ)               1/2
|ψ5⟩     1/2 (1 + sin ϕ)               1/2
|ψ6⟩     1/2 (1 − sin ϕ)               1/2
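
The entries of such a table can be generated by computing the overlaps directly. The following sketch represents each spin state by its two components in the basis (|1/2⟩, |−1/2⟩); the names `STATES`, `overlap_sq`, `prob_pure` and `prob_mixed` are introduced here for illustration and are not part of the notes.

```python
import cmath
import math

S = 1 / math.sqrt(2)
# Components (up, down) of the six states (7a)-(7f)
STATES = {
    "psi1": (1, 0), "psi2": (0, 1),
    "psi3": (S, S), "psi4": (S, -S),
    "psi5": (S, 1j * S), "psi6": (S, -1j * S),
}

def overlap_sq(a, b):
    """|<a|b>|^2 for two-component states."""
    amp = a[0].conjugate() * b[0] + a[1].conjugate() * b[1]
    return abs(amp) ** 2

def prob_pure(name, phi):
    """Probability to find (|1/2> + e^{i phi}|-1/2>)/sqrt(2) in STATES[name]."""
    psi = (S, S * cmath.exp(1j * phi))
    return overlap_sq(STATES[name], psi)

def prob_mixed(name, p_up=0.5):
    """Same for a classical mixture: |1/2> with p_up, |-1/2> with 1 - p_up."""
    target = STATES[name]
    return (p_up * overlap_sq(target, (1, 0))
            + (1 - p_up) * overlap_sq(target, (0, 1)))

if __name__ == "__main__":
    phi = 0.7  # arbitrary phase chosen for the example
    for name in sorted(STATES):
        print(name, prob_pure(name, phi), prob_mixed(name))
```

For the equal mixture every probability equals 1/2, whereas for the pure state at least one of the probabilities for |ψ3⟩ to |ψ6⟩ differs from 1/2 whatever the value of ϕ.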

We see that there is no ϕ, i.e. no pure state, which reproduces the probabilities of the equal mixture for
all measurement results. It is important to make the distinction between the two cases very clearly: if
Charlie gives us millions of times a particle which is in a state with an arbitrary ϕ, different each time,
we will find probabilities 1/2 to find the particle in either (|1/2⟩ + |−1/2⟩)/√2 or (|1/2⟩ − |−1/2⟩)/√2.
If the phase were always the same, say ϕ = 0, then we would find probabilities 1 and 0 respectively.
Therefore, a mixed state indicates uncertainty of the relative phase of the components of the wave function.
Let us summarize what we have learned:
A system can be either in a pure or a mixed state. In the first case, we know precisely the
wavefunction of the system. In the second case, we are not sure about the state, but we can
ascribe a probability for the system to be in any of the states accessible to it.

Note that the uncertainty about the state the particle is in is a classical uncertainty. We can for
example flip a coin and, depending on whether the result is heads or tails, send a spin-up or spin-down
particle to a friend. Our friend then only knows that the probability for the particle he receives to be
‘up’ is 1/2, and similarly for ‘down’.
We now turn to the general case of a system which can be in any one of a set of normalised, but
not necessarily orthogonal, states |ψ_i⟩. The probability for the system to be in the state |ψ_i⟩ is p_i, with
obviously Σ_i p_i = 1. Suppose the expectation value of some operator Â in state |ψ_i⟩ is given by A_i.
Then the expectation value of Â for the system at hand is given by

\langle A \rangle = \sum_i p_i A_i = \sum_i p_i \langle\psi_i| \hat{A} |\psi_i\rangle. \qquad (9)

We now introduce the density operator, which is in some sense the ‘optimal’ specification of the
system. The density operator is defined as

ρ̂ = ∑ pi |ψi i hψi | . (10)


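Suppose the set $|\phi_n\rangle$ forms a basis of the Hilbert space of the system under consideration. Then the expectation value of the operator $\hat{A}$ can be rewritten after inserting the unit operator $1 = \sum_n |\phi_n\rangle\langle\phi_n|$ to the right of $\hat{A}$:

$$\langle A \rangle = \sum_i p_i \langle\psi_i|\hat{A}|\psi_i\rangle = \sum_i p_i \sum_n \langle\psi_i|\hat{A}|\phi_n\rangle\langle\phi_n|\psi_i\rangle = \sum_n \langle\phi_n| \left[ \sum_i p_i |\psi_i\rangle\langle\psi_i| \right] \hat{A} |\phi_n\rangle = \sum_n \langle\phi_n|\hat{\rho}\hat{A}|\phi_n\rangle = \mathrm{Tr}\,\hat{\rho}\hat{A}. \tag{11}$$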

Here we have used the trace operator, Tr, which adds all diagonal matrix elements of an operator. For a general operator $\hat{Q}$:
$$\mathrm{Tr}\,\hat{Q} = \sum_n \langle\phi_n|\hat{Q}|\phi_n\rangle. \tag{12}$$
The trace is independent of the basis used: it is invariant under a basis transformation. We omit the hat from operators unless confusion may arise. Another property of the trace is

$$\mathrm{Tr}\,(|\psi\rangle\langle\chi|) = \langle\chi|\psi\rangle, \tag{13}$$

which is easily verified by writing out the trace with respect to a basis $\phi_n$.
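As a numerical illustration of Eqs. (9) to (11), the sketch below compares the weighted average of expectation values with the trace formula for two non-orthogonal states. The weights 0.3 and 0.7, the choice of $\sigma_x$ as the observable, and all helper names are assumptions made for this example only.

```python
import math

def outer(ket):
    """Matrix of |ket><ket| as nested lists."""
    return [[a * b.conjugate() for b in ket] for a in ket]

def matmul(m1, m2):
    """Product of two square matrices given as nested lists."""
    n = len(m1)
    return [[sum(m1[i][k] * m2[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    """Sum of the diagonal matrix elements."""
    return sum(m[i][i] for i in range(len(m)))

def expectation(ket, op):
    """<ket|op|ket> for a normalised ket."""
    n = len(ket)
    return sum(ket[i].conjugate() * op[i][j] * ket[j]
               for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
states = [[1 + 0j, 0j], [s + 0j, s + 0j]]   # |0> and (|0> + |1>)/sqrt(2): not orthogonal
probs = [0.3, 0.7]                          # illustrative weights p_i
A = [[0j, 1 + 0j], [1 + 0j, 0j]]            # sigma_x, used here as a test observable

# Density operator rho = sum_i p_i |psi_i><psi_i|
rho = [[sum(p * outer(psi)[i][j] for p, psi in zip(probs, states))
        for j in range(2)] for i in range(2)]

direct = sum(p * expectation(psi, A) for p, psi in zip(probs, states))
via_trace = trace(matmul(rho, A))
print(direct, via_trace)
```

Both routes give the same number, although the states in the ensemble are not orthogonal.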
If a system is in a well-defined quantum state $|\psi\rangle$, we say that the system is in a pure state. In that case the density operator is
$$\rho = |\psi\rangle\langle\psi|. \tag{14}$$
2. The density operator 107

If the system is not in a pure state, but only the statistical weights $p_i$ of the states $|\psi_i\rangle$ are known, we say that the system is in a mixed state. When someone gives you a density operator, how can you assess whether it corresponds to a pure or a mixed state? Well, it is clear that for a pure state we have $\rho^2 = \rho$, which means that $\rho$ is a projection operator [1]:

$$\rho^2 = |\psi\rangle\langle\psi|\psi\rangle\langle\psi| = |\psi\rangle\langle\psi| = \rho, \tag{15}$$

where we have used the fact that $\psi$ is normalised.

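For a mixed state, such as
$$\rho = \alpha |\psi\rangle\langle\psi| + \beta |\phi\rangle\langle\phi| \tag{16}$$
where $\langle\psi|\phi\rangle = 0$ and the weights $\alpha$ and $\beta$ are both nonzero with $\alpha + \beta = 1$, we have
$$\rho^2 = \alpha^2 |\psi\rangle\langle\psi| + \beta^2 |\phi\rangle\langle\phi| \neq \rho. \tag{17}$$
Although we have considered a particular example here, it holds in general for a mixed state that $\rho$ is not a projection operator.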
Another way to see this is to look at the eigenvalues of $\rho$. For a pure state, for which $\rho = |\psi\rangle\langle\psi|$, clearly $|\psi\rangle$ is an eigenstate of $\rho$ with eigenvalue 1, and all other eigenvalues are 0 (their eigenstates are all states which are perpendicular to $|\psi\rangle$). These values for the eigenvalues are the only ones which are allowed by a projection operator. As

$$\mathrm{Tr}\,\rho = \sum_i p_i = 1, \tag{18}$$

we have, for the eigenvalues $\lambda_i$ of $\rho$,
$$\sum_i \lambda_i = 1. \tag{19}$$

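Now let us evaluate, for an arbitrary normalised state $|\phi\rangle$,

$$0 \le \langle\phi|\rho|\phi\rangle = \sum_i p_i |\langle\psi_i|\phi\rangle|^2 \le 1, \tag{20}$$

where the fact that $|\langle\psi_i|\phi\rangle| \le 1$, combined with $\sum_i p_i = 1$, leads to the upper bound; the lower bound holds because every term in the sum is non-negative. Thus, for a normalised eigenstate $\phi$ of the density operator with eigenvalue $\lambda$, we have

$$\langle\phi|\rho|\phi\rangle = \langle\phi|\lambda|\phi\rangle = \lambda, \qquad 0 \le \lambda \le 1. \tag{21}$$

We see that a density operator has eigenvalues between 0 and 1. Together with the condition $\sum_i \lambda_i = 1$, this means that either one of the eigenvalues is 1 and the rest is 0, or they are all strictly smaller than 1. In summary:

The sum of the eigenvalues of the density operator is 1. The situation where only one of
these eigenvalues is 1 and the rest is 0, corresponds to a pure state.
If there are eigenvalues $0 < \lambda < 1$, then we are dealing with a mixed state.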

To summarize, we can say that if a system is in a mixed state, it can be characterized by a set of possible wavefunctions $|\psi_i\rangle$ and probabilities $p_i$ for the system to be in each of those wavefunctions. But a more compact way of representing our knowledge of the system is by using the density operator, which can be constructed when we know the possible states $\psi_i$ and their probabilities $p_i$ [see Eq. (10)]. The density operator can be used to calculate expectation values using the trace, see Eq. (11).
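The pure/mixed criterion can be tried out on $2 \times 2$ density matrices. The sketch below is an illustration, not part of the original notes: it computes the eigenvalues from the trace and determinant, and also $\mathrm{Tr}\,\rho^2$, which equals 1 exactly when $\rho^2 = \rho$ holds for a density operator. Using $\mathrm{Tr}\,\rho^2$ as a diagnostic is a common convention that the notes themselves do not introduce; the function names are hypothetical.

```python
import math

def purity(rho):
    """Tr(rho^2) for a 2x2 density matrix given as nested lists."""
    return sum(rho[i][k] * rho[k][i] for i in range(2) for k in range(2)).real

def eigenvalues(rho):
    """Eigenvalues of a 2x2 Hermitian matrix from its trace and determinant."""
    tr = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return ((tr - disc) / 2, (tr + disc) / 2)

# |psi><psi| with |psi> = (|0> + |1>)/sqrt(2): a pure state
pure = [[0.5 + 0j, 0.5 + 0j], [0.5 + 0j, 0.5 + 0j]]
# Equal mixture of |0> and |1>: a mixed state
mixed = [[0.5 + 0j, 0j], [0j, 0.5 + 0j]]

print(eigenvalues(pure), purity(pure))
print(eigenvalues(mixed), purity(mixed))
```

The pure state has eigenvalues 0 and 1, the mixture has two eigenvalues equal to 1/2, in line with the summary above.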
[1] Recall that a projection operator $P$ is a Hermitian operator satisfying $P^2 = P$.

Let us consider an example. Take again the case where Charley sends us a spin-up or spin-down particle with equal probabilities. For convenience, we denote these two states as $|0\rangle$ (spin-up) and $|1\rangle$ (spin-down). Then the density operator can be evaluated as
$$\rho = \frac{1}{2} |0\rangle\langle 0| + \frac{1}{2} |1\rangle\langle 1|. \tag{22}$$
This operator works in a two-dimensional Hilbert space; therefore it can be represented as a $2 \times 2$ matrix:
$$\rho = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}. \tag{23}$$
The matrix elements are evaluated as follows. The upper-left element is
$$\langle 0|\rho|0\rangle = \frac{1}{2} \langle 0|0\rangle\langle 0|0\rangle + \frac{1}{2} \langle 0|1\rangle\langle 1|0\rangle = 1/2 \tag{24}$$
as follows from (22) and from the orthogonality of the two basis states. The upper-right element is given by
$$\langle 0|\rho|1\rangle = \frac{1}{2} \langle 0|0\rangle\langle 0|1\rangle + \frac{1}{2} \langle 0|1\rangle\langle 1|1\rangle = 0 \tag{25}$$
as a result of orthogonality. The lower-left element $\langle 1|\rho|0\rangle$ and the lower-right element $\langle 1|\rho|1\rangle$ are found similarly. Another interesting way to find the density matrix (i.e. the matrix representation of the density operator) is by directly using the vector representation of the states $|0\rangle$ and $|1\rangle$:
$$\rho = \frac{1}{2} \begin{pmatrix} 1 \\ 0 \end{pmatrix} (1, 0) + \frac{1}{2} \begin{pmatrix} 0 \\ 1 \end{pmatrix} (0, 1) = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}. \tag{26}$$

Note the somewhat unusual order in which we encounter column and row vectors: the result is not a number, but an operator.
Another day, Charley decides to send us particles which are either ‘up’ or ‘down’ along the x-axis. As you might remember, the eigenstates are
$$\frac{1}{\sqrt{2}} (|0\rangle + |1\rangle) \tag{27}$$
for spin-up (along x) and
$$\frac{1}{\sqrt{2}} (|0\rangle - |1\rangle) \tag{28}$$
for spin-down. You recognize these states as the states $|\psi_3\rangle$ and $|\psi_4\rangle$ given above. Now let us work out the density operator:
$$\rho = \frac{1}{4} \begin{pmatrix} 1 \\ 1 \end{pmatrix} (1, 1) + \frac{1}{4} \begin{pmatrix} 1 \\ -1 \end{pmatrix} (1, -1) = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}. \tag{29}$$

We see that we obtain the same density matrix! Apparently, the particular axis used by Charley does not affect what we measure at our end.
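The equality of the two density matrices can be reproduced with a few lines of code. This is a minimal sketch with hypothetical helper names; it builds the outer products of Eqs. (26) and (29) explicitly.

```python
import math

def projector(ket):
    """Column vector times row vector: the matrix of |ket><ket|."""
    return [[a * b.conjugate() for b in ket] for a in ket]

def mixture(weighted_kets):
    """Density matrix sum_i p_i |psi_i><psi_i| for a list of (p_i, ket) pairs."""
    rho = [[0j, 0j], [0j, 0j]]
    for p, ket in weighted_kets:
        proj = projector(ket)
        for i in range(2):
            for j in range(2):
                rho[i][j] += p * proj[i][j]
    return rho

s = 1 / math.sqrt(2)
# Spin up/down along z with equal probabilities
rho_z = mixture([(0.5, [1 + 0j, 0j]), (0.5, [0j, 1 + 0j])])
# Spin up/down along x with equal probabilities
rho_x = mixture([(0.5, [s + 0j, s + 0j]), (0.5, [s + 0j, -s + 0j])])
print(rho_z)
print(rho_x)
```

Up to rounding, both results are the matrix with 1/2 on the diagonal and 0 elsewhere.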
Another question we frequently ask ourselves when dealing with quantum systems is:
What is the probability to find the system in a state $|\phi\rangle$ in a measurement?
The answer for a system which is in a pure state $|\psi\rangle$ is:

$$P_\phi = |\langle\phi|\psi\rangle|^2. \tag{30}$$


If the system can be in any one of a set of states $|\psi_i\rangle$ with respective probabilities $p_i$, the answer is
$$P_\phi = \sum_i p_i |\langle\phi|\psi_i\rangle|^2. \tag{31}$$

Another way to obtain the expression on the right hand side is by using the density operator:

$$\langle\phi|\rho|\phi\rangle = \sum_i p_i \langle\phi|\psi_i\rangle\langle\psi_i|\phi\rangle = \sum_i p_i |\langle\phi|\psi_i\rangle|^2 = P_\phi. \tag{32}$$

This equation follows directly from the definition (10) of the density operator.
Important examples of systems in a mixed state are statistical systems connected to a heat bath. Loosely speaking, the actual state of the system without the bath varies with time, and we do not know that state when we perform a measurement. We know however from statistical physics that the probability for the system to be in a state with energy $E$ is proportional to the Boltzmann factor $\exp[-E/(k_B T)]$, so the density operator can be written as

$$\rho = N \sum_i |\psi_i\rangle\, e^{-E_i/(k_B T)} \langle\psi_i| \tag{33}$$

where the $\psi_i$ are eigenstates of the Hamiltonian with energies $E_i$. The prefactor $N$ is adjusted such that $N \sum_i e^{-E_i/(k_B T)} = 1$ in order to guarantee that $\mathrm{Tr}\,\rho = 1$. The density operator can also be written as

$$\rho = N e^{-\hat{H}/(k_B T)}, \tag{34}$$

as can be verified as follows:

$$e^{-\hat{H}/(k_B T)} = \sum_i |\psi_i\rangle\langle\psi_i|\, e^{-\hat{H}/(k_B T)} \sum_j |\psi_j\rangle\langle\psi_j| = \sum_i |\psi_i\rangle\, e^{-E_i/(k_B T)} \langle\psi_i|. \tag{35}$$

Any expectation value can now in principle be evaluated. For example, consider a spin-1/2 particle connected to a heat bath of temperature $T$ in a magnetic field $B$ pointing in the z-direction. The Hamiltonian is given by
$$H = -\gamma B S_z. \tag{36}$$
Then the expectation value of the z-component of the spin can be calculated as

$$\langle S_z \rangle = \mathrm{Tr}\,(\rho S_z). \tag{37}$$

We can evaluate $\rho$. Using the notation $\beta = 1/(k_B T)$ it reads:

$$\rho = \frac{1}{e^{\beta\gamma\hbar B/2} + e^{-\beta\gamma\hbar B/2}} \begin{pmatrix} e^{\beta\gamma\hbar B/2} & 0 \\ 0 & e^{-\beta\gamma\hbar B/2} \end{pmatrix}. \tag{38}$$

Now the expectation value $\langle S_z \rangle$ can immediately be found, using $S_z = \hbar\sigma_z/2$, where $\sigma_z$ is the Pauli matrix:
$$\langle S_z \rangle = \mathrm{Tr}\,(\rho S_z) = \frac{\hbar}{2} \tanh(\beta\gamma\hbar B/2). \tag{39}$$
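The step from Eq. (38) to Eq. (39) can be checked numerically. The sketch below works in units of $\hbar$ and uses the dimensionless combination $x = \beta\gamma\hbar B/2$; the function name and the sample values of $x$ are arbitrary choices for this illustration.

```python
import math

def thermal_sz(x):
    """<S_z>/hbar for H = -gamma*B*S_z, with x = beta*gamma*hbar*B/2."""
    # Boltzmann weights of the states with S_z = +hbar/2 and S_z = -hbar/2
    w_up, w_down = math.exp(x), math.exp(-x)
    z = w_up + w_down                       # equals 1/N in the notation of Eq. (34)
    rho = [[w_up / z, 0.0], [0.0, w_down / z]]
    sz = [[0.5, 0.0], [0.0, -0.5]]          # S_z in units of hbar
    # Tr(rho S_z); both matrices are diagonal, so only diagonal products contribute
    return rho[0][0] * sz[0][0] + rho[1][1] * sz[1][1]

for x in (0.0, 0.3, 2.0):
    print(x, thermal_sz(x), 0.5 * math.tanh(x))
```

For each $x$ the trace agrees with $\frac{1}{2}\tanh x$: the spin is unpolarised at high temperature ($x \to 0$) and fully aligned with the field at low temperature ($x \to \infty$).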
Considering systems of noninteracting particles, the density operator can be used to derive the
average occupation of energy levels, leading to the well-known Fermi-Dirac distribution for fermions,
and the Bose-Einstein distribution for bosons. This derivation is however beyond the scope of this
lecture course — it is treated in your statistical mechanics course.

3 Entanglement

Entanglement is a phenomenon which can occur when two or more quantum systems are coupled. We shall focus on the simplest nontrivial system exhibiting entanglement: two particles, A and B, whose degrees of freedom each span a two-dimensional Hilbert space (as usual, you may think of two spin-1/2 particles). The states of the particles are denoted $|0\rangle$ and $|1\rangle$. Therefore, the possible states of the two-particle system are linear combinations of the states

$$|00\rangle, \quad |01\rangle, \quad |10\rangle \quad \text{and} \quad |11\rangle \tag{40}$$

(the first number denotes the state of particle A and the second one that of particle B). We use these states (in this order) as a basis of the four-dimensional Hilbert space, that is, we may identify
$$|00\rangle \Leftrightarrow \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{41}$$

and so on.
Suppose the system is in the state
$$|\psi\rangle = \frac{1}{2} (|00\rangle + |01\rangle + |10\rangle + |11\rangle) \tag{42}$$
or, in vector notation:
$$\psi = \frac{1}{2} \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}. \tag{43}$$
Note that this state is normalised.
We perform measurements of the first spin only. More specifically, we measure the probabilities for the first spin to be in the states
$$|\psi_1\rangle = |0\rangle, \quad |\psi_2\rangle = |1\rangle, \quad |\psi_3\rangle = \frac{1}{\sqrt{2}} (|0\rangle + |1\rangle) \quad \text{or} \quad |\psi_4\rangle = \frac{1}{\sqrt{2}} (|0\rangle - |1\rangle). \tag{44}$$
The resulting probabilities are (check this!):

$$P_1 = P_2 = 1/2; \tag{45a}$$
$$P_3 = 1; \quad P_4 = 0. \tag{45b}$$

These are precisely the same results as those found above for a single particle in the state $|\psi_3\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, that is, if we want to predict measurements on the first particle, we can forget about the second particle. The reason for this is that we can write the state (42) as
$$\frac{1}{2} (|0\rangle_A + |1\rangle_A) \otimes (|0\rangle_B + |1\rangle_B) \tag{46}$$
where $\otimes$ is the so-called tensor product. The fact that (42) can be written as a (tensor) product of pure states of the two subsystems A and B is responsible for the fact that the second particle does not ‘interfere’ with the first one.

Now consider the state

$$|\psi_E\rangle = \frac{1}{\sqrt{2}} (|00\rangle + |11\rangle). \tag{47}$$
If we evaluate again the probabilities of finding the first spin in the states $\psi_1$ or $\psi_2$, we find:

$$P_1 = P_2 = \frac{1}{2}, \tag{48}$$
as can readily be seen from (47). In order to evaluate the probability to find the first spin in the state $\psi_3$, we write $\psi_E$ in the form

$$|\psi_E\rangle = \frac{1}{2} \left[ (|\psi_3\rangle + |\psi_4\rangle) |0\rangle + (|\psi_3\rangle - |\psi_4\rangle) |1\rangle \right]. \tag{49}$$
Now, the probability for the first spin to be in the state $\psi_3$ or $\psi_4$, while not caring about whether the second spin is 0 or 1, is seen to be
$$P_3 = P_4 = 1/2. \tag{50}$$
These results are the same as for a single particle with density operator

$$\rho = \frac{1}{2} (|0\rangle\langle 0| + |1\rangle\langle 1|). \tag{51}$$
The conclusion is that for the state (47), the first particle is well described by a mixed state. There is no way to assign a pure state to particle A: we say that particle A is entangled with particle B and $\psi_E$ is called an entangled state. Thus, we see that the density operator is very useful for describing particles coupled to an ‘outside world’. A state is entangled when it does not allow us to assign a pure state to a part of the quantum system under consideration.
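The probabilities (45) and (48) to (50) can be reproduced by computing a density matrix for particle A alone, obtained by summing over the states of particle B (a partial trace), and then applying Eq. (32). The sketch below is an illustration under these assumptions; the function names and the index convention `psi[2*a + b]` are choices made here, following the basis order of (40).

```python
import math

def reduced_density_A(psi):
    """Density matrix of particle A after tracing out particle B.

    psi lists the amplitudes in the basis order |00>, |01>, |10>, |11>,
    so psi[2*a + b] is the amplitude of |a>_A |b>_B.
    """
    return [[sum(psi[2 * a + b] * psi[2 * a2 + b].conjugate() for b in range(2))
             for a2 in range(2)] for a in range(2)]

def probability(rho, ket):
    """<ket|rho|ket>, the probability of Eq. (32)."""
    return sum(ket[i].conjugate() * rho[i][j] * ket[j]
               for i in range(2) for j in range(2)).real

s = 1 / math.sqrt(2)
psi3 = [s, s]     # (|0> + |1>)/sqrt(2)
psi4 = [s, -s]    # (|0> - |1>)/sqrt(2)

product_state = [0.5, 0.5, 0.5, 0.5]    # state (42)
entangled_state = [s, 0.0, 0.0, s]      # state (47)

rho_product = reduced_density_A(product_state)
rho_entangled = reduced_density_A(entangled_state)

print(probability(rho_product, psi3), probability(rho_product, psi4))
print(probability(rho_entangled, psi3), probability(rho_entangled, psi4))
```

For the product state the reduced matrix is the pure state $|\psi_3\rangle\langle\psi_3|$; for the entangled state it is the mixed state (51).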
Let us now consider entanglement from another point of view. We perform measurements on
particle A and on B, checking whether these particles are found in state 1 or 0. For our entangled state
(47) we find

P00 = P11 = 1/2 (52a)

P10 = P01 = 0, (52b)

where $P_{01}$ is the probability to find particle A in state 0 and particle B in state 1, etcetera. We see that in terms of classical probabilities, system A is strongly correlated with system B. It turns out that this correlation remains complete even when the measurement is performed with respect to another basis (see exercises):

Entanglement gives rise to correlation of probabilities, and this correlation cannot be lifted by a
basis transformation.

Now let us start with a system which is not entangled; it might for example be in the state (42). We assume that the system evolves according to a Hamiltonian $H$ which, in the basis $|00\rangle$, $|01\rangle$, $|10\rangle$, $|11\rangle$, has the following form:
$$H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \tag{53}$$

The time evolution operator is given by $T = \exp(-it\hat{H}/\hbar)$; at $t = \pi\hbar/2$ it has the form
$$T = \begin{pmatrix} -i & 0 & 0 & 0 \\ 0 & -i & 0 & 0 \\ 0 & 0 & -i & 0 \\ 0 & 0 & 0 & i \end{pmatrix} \tag{54}$$

so that we find
$$|\psi(t = \pi\hbar/2)\rangle = -\frac{i}{2} (|00\rangle + |01\rangle + |10\rangle - |11\rangle), \tag{55}$$
which is an entangled state (you will find no way to write it as a tensor product of two pure states of A and B). Thus we see that when a system starts off in a non-entangled state, it might evolve into an entangled state in the course of time.
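The claim that the evolved state is entangled can be checked with a simple criterion that is not derived in these notes: a two-qubit pure state with amplitudes $\psi_{00}, \psi_{01}, \psi_{10}, \psi_{11}$ is a tensor product of two single-qubit states exactly when $\psi_{00}\psi_{11} - \psi_{01}\psi_{10} = 0$. The sketch below applies the diagonal evolution of Eq. (54) and evaluates this combination; the function names are hypothetical.

```python
import cmath
import math

def evolve(psi, energies, t_over_hbar):
    """Apply exp(-i t H / hbar) for a Hamiltonian that is diagonal in the chosen basis."""
    return [cmath.exp(-1j * e * t_over_hbar) * amp for e, amp in zip(energies, psi)]

def product_defect(psi):
    """|psi_00 psi_11 - psi_01 psi_10|, which vanishes exactly for product states."""
    return abs(psi[0] * psi[3] - psi[1] * psi[2])

energies = [1, 1, 1, -1]     # diagonal of H in Eq. (53)
psi0 = [0.5 + 0j] * 4        # the product state (42)
psi_t = evolve(psi0, energies, math.pi / 2)

print(product_defect(psi0), product_defect(psi_t))
```

The initial state gives 0, while the state at $t = \pi\hbar/2$ gives 1/2, the largest value this quantity can take for a normalised two-qubit state.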

4 The EPR paradox and Bell’s theorem

In 1935, Einstein, Podolsky and Rosen (EPR) published a thought experiment, which demonstrated that quantum mechanics is not compatible with some obvious ideas which we tacitly apply when describing phenomena. In particular, the notions of a reality existing independently of experimental measurements and of locality cannot both be reconciled with quantum mechanics. Locality is used here to denote the idea that events cannot have an effect at a distance before information has travelled from that event to another place where its effect is noticed. Together, the notions of reality and locality are commonly denoted as ‘local realism’. From the failure of quantum mechanics to comply with local realism, EPR concluded that quantum mechanics is not a complete theory.
The EPR paradox is quite simple to explain. At some point in space, a stationary particle with
spin 0 decays into two spin-1/2 particles which fly off in opposite directions (momentum conservation
does not allow the directions not to be opposite). During the decay process, angular momentum is
conserved which implies that the two particles must have opposite spin: when one particle is found to
have spin ‘up’ along some measuring axis, the other particle must have spin ‘down’ along the same
axis. Obviously, we are dealing with an entangled state.
Suppose Alice and Bob both receive an outgoing particle from the same decay event. Alice measures the spin of the particle along the z direction, and Bob does the same with his particle. Superficially, we can say that they would both have the same probability to find either $\hbar/2$ or $-\hbar/2$. However, if quantum mechanics is correct, these measurements should be strongly correlated: if Alice has measured spin up, then Bob’s particle must have spin down along the z-axis, so the measurement results are fully correlated. According to the ‘orthodox’, or ‘Copenhagen’ interpretation of quantum mechanics, if Alice is the first one to measure the spin, the particular value measured by her is decided at the very moment of that measurement. But this means that at the same moment the spin state of Bob’s particle is determined. But Bob could be lightyears away from Alice, and perform his measurement immediately after her. According to the orthodox interpretation, his measurement would be influenced by Alice’s. But this was inconceivable to Einstein, who maintained that the information about Alice’s measurement could not reach Bob’s particle instantaneously, as the speed of light is a limiting factor for communication. In Einstein’s view, the outcome of the measurements of the particles is determined at the moment when they leave the source, and he believed that a more complete theory could be found which would unveil the ‘hidden variables’ which determine the outcomes of Alice and Bob’s measurements when the particles left the source. These hidden variables would then represent some ‘reality’ which exists irrespective of the measurement.

Figure 12.1: The measuring axis for a spin.

The EPR puzzle remained unsettled for a long time, until, in 1964, John Bell formulated a theorem which makes it possible to distinguish between Einstein’s scenario and the orthodox quantum mechanical interpretation. We shall now derive Bell’s theorem. Suppose we count in an audience the numbers of people having certain properties, such as ‘red hair’, ‘wearing yellow socks’ or ‘taller than 1.70 m’. We take three such properties, called A, B and C. If we select one person from the audience, he or she will either have each of these properties or not. We denote this by a person being ‘in the state’ $A^+, B^-, C^+$ for example. The number of people in the state $A^+, B^-, C^+$ is denoted $N(A^+, B^-, C^+)$. We now write
$$N(A^+, B^-) = N(A^+, B^-, C^+) + N(A^+, B^-, C^-) \tag{56}$$

which is a rather obvious relation.

We use similar relations in order to rewrite this as

$$N(A^+, B^-) = N(A^+, C^-) - N(A^+, B^+, C^-) + N(B^-, C^+) - N(A^-, B^-, C^+) \le N(A^+, C^-) + N(B^-, C^+). \tag{57}$$
This is Bell’s inequality, which can also be formulated in terms of probabilities [$P(A^+, B^-)$ instead of $N(A^+, B^-)$ etcetera]. We have used everyday-life examples in order to emphasise that there is nothing mysterious, let alone quantum mechanical, about Bell’s inequality. But let us now turn to quantum mechanics, and spin determination in particular.
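That Bell’s inequality is a pure counting statement can be illustrated by generating random ‘audiences’ and counting. The sketch below is an illustration only: the audience size, the number of audiences, the 50% chance for each property and the fixed random seed are all arbitrary assumptions.

```python
import random

def count(people, **wanted):
    """Number of people whose properties match, e.g. count(people, A=True, B=False)."""
    return sum(all(person[k] == v for k, v in wanted.items()) for person in people)

def bell_inequality_holds(people):
    """Check N(A+, B-) <= N(A+, C-) + N(B-, C+) for one audience."""
    lhs = count(people, A=True, B=False)
    rhs = count(people, A=True, C=False) + count(people, B=False, C=True)
    return lhs <= rhs

rng = random.Random(0)   # fixed seed, purely for reproducibility
audiences = [[{key: rng.random() < 0.5 for key in "ABC"} for _ in range(200)]
             for _ in range(50)]
all_hold = all(bell_inequality_holds(audience) for audience in audiences)
print(all_hold)
```

Whatever the composition of the audience, the inequality holds, because every person counted on the left-hand side is also counted in one of the two terms on the right.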
Consider the three axes a, b and c shown in the figure. $A^+$ is now identified with a spin-up measurement along a, etcetera. We can now evaluate $P(A^+, C^-)$. Measuring $A^+$ happens with probability 1/2, but after this measurement, the particle is in the spin-up state along the a-axis. If the spin is then measured along the c direction, we have a probability $\sin^2(\pi/8)$ to find $C^-$ (see problem 16 of the exercises). The combined probability $P(A^+, C^-)$ is therefore $\frac{1}{2}\sin^2(\pi/8)$. Similarly, $P(B^-, C^+)$ is also equal to $\frac{1}{2}\sin^2(\pi/8)$, and $P(A^+, B^-)$ is 1/4. Inserting these numbers into Bell’s inequality gives:

$$\frac{1}{4} \le \sin^2(\pi/8) = \frac{1}{2}\left(1 - \frac{1}{2}\sqrt{2}\right), \tag{58}$$

which is obviously wrong. Therefore, we see that quantum mechanics does not obey Bell’s inequality.
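The numbers in Eq. (58) are quickly checked. The sketch assumes the rule that a spin prepared ‘up’ along one axis is found ‘down’ along an axis at angle $\theta$ with probability $\sin^2(\theta/2)$, and it infers from the quoted probabilities that a and c (and b and c) are $\pi/4$ apart while a and b are $\pi/2$ apart; the figure itself is not reproduced here, so these angles are an assumption.

```python
import math

p_ab = 0.5 * math.sin(math.pi / 4) ** 2   # P(A+, B-), axes a and b at angle pi/2
p_ac = 0.5 * math.sin(math.pi / 8) ** 2   # P(A+, C-), axes a and c at angle pi/4
p_bc = 0.5 * math.sin(math.pi / 8) ** 2   # P(B-, C+), axes b and c at angle pi/4

# Bell's inequality demands p_ab <= p_ac + p_bc
violated = p_ab > p_ac + p_bc
print(p_ab, p_ac + p_bc, violated)
```

The left-hand side is 0.25 and the right-hand side is roughly 0.146, so the inequality is violated.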
Now what does this have to do with the EPR paradox? Well, first of all, the EPR paradox allows us
to measure the spin in two different directions at virtually the same moment. But, more importantly, if

the particles would leave the origin with predefined probabilities, Bell’s inequality would unambiguously hold. The only way to violate Bell’s inequality is by accepting that Alice’s measurement reduces the entangled wavefunction of the two-particle system, which is also noticed by Bob instantaneously. So, there is some ‘action at a distance’, in contrast to what we usually have in physics, where every action is mediated by particles such as photons, mesons, and so on.
In 1982, Aspect, Dalibard and Roger performed experiments with photons emerging from decaying atoms in order to check whether Bell’s inequality holds or not. Since then, several other groups have redone this experiment, sometimes with different setups. The conclusion is now generally accepted that Bell’s inequality does not hold for quantum mechanical probabilities. The implications of this conclusion for our view of Nature are enormous: somehow actions can be performed without intermediary particles, so that the speed of light is not a limiting factor for this kind of communication. ‘Communication’ is however a dangerous term to use in this context, as it suggests that information can be transmitted instantaneously. However, the ‘information’ which is transmitted from Alice to Bob or vice versa is purely probabilistic, since neither Bob nor Alice can predict the outcome of their measurements. So far, no schemes have been invented or realised which would allow us to send over a Mozart symphony at speeds faster than the speed of light.

5 No cloning theorem

In recent years, much interest has arisen in quantum information processing. In this field, people try to exploit quantum mechanics in order to process information in a way completely different from classical methods. We have already encountered one example of these attempts: quantum cryptography, where a random encryption key can be shared between Bob and Alice without Eve being capable of eavesdropping. Another very important application, which unfortunately is still far from a realisation, is the quantum computer. When I speak of a quantum computer, you should not forget that I mean a machine which exists only on paper, not in reality. A quantum computer is a quantum machine in which qubits evolve in time. A qubit is a quantum system with a 2-dimensional Hilbert space. It can always be denoted
$$|\varphi\rangle = a |0\rangle + b |1\rangle, \tag{59}$$
where $a$ and $b$ are complex constants satisfying $|a|^2 + |b|^2 = 1$. The states $|0\rangle$ and $|1\rangle$ form a basis in the Hilbert space. A quantum computer manipulates several qubits in parallel. A system consisting of $n$ qubits has a $2^n$-dimensional Hilbert space. A quantum computation consists of a preparation of the qubits in some well-defined state, followed by an autonomous evolution of the qubit system, and concluded by reading out the state of the qubits. As the system is autonomous, it is described by a (Hermitian) Hamiltonian. The time-evolution operator $U = \exp(-itH/\hbar)$ is then a unitary operator, so the quantum computation between initialisation and reading out the results can be described in terms of a sequence of unitary transformations applied to the system. In this section we shall derive a general theorem for such an evolution, the no-cloning theorem:

An unknown quantum state cannot be cloned.

By cloning we mean that we can copy the state of some quantum system into some other system
without losing the state of our original system.
Before proceeding with the proof of this theorem, let us assume that cloning would be possible. In that case, communication at speeds faster than light would in principle be possible. To see this, imagine Alice has a qubit of which Bob has many clones, which are entangled with Alice’s qubit. If Alice performs a measurement on her qubit along the axis $|0\rangle$ or $|0\rangle + |1\rangle$, Bob’s clones will become

aligned along the same axis. As Bob has many clones, he can find out which measurement Alice per-
formed without ambiguity (how?). So the no-cloning theorem is essential in making communication
at speeds faster than the speed of light impossible.
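The proof of the no-cloning theorem for qubit systems proceeds as follows. Cloning for a qubit pair means that we have a unitary evolution $U$ with the following effect on a qubit pair:
$$U |\alpha 0\rangle = |\alpha\alpha\rangle. \tag{60}$$
The evolution $U$ should work for any state $\alpha$, therefore it cannot depend on $\alpha$. Therefore, for some other state $|\beta\rangle$ we must have
$$U |\beta 0\rangle = |\beta\beta\rangle. \tag{61}$$

Now let us operate with $U$ on the state $|\gamma 0\rangle$ with $|\gamma\rangle = (|\alpha\rangle + |\beta\rangle)/\sqrt{2}$. As $U$ is linear, we find

$$U |\gamma 0\rangle = (|\alpha\alpha\rangle + |\beta\beta\rangle)/\sqrt{2} \neq |\gamma\gamma\rangle, \tag{62}$$
since $|\gamma\gamma\rangle = (|\alpha\alpha\rangle + |\alpha\beta\rangle + |\beta\alpha\rangle + |\beta\beta\rangle)/2$ also contains the cross terms. This completes the proof.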
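The argument can be made concrete with a specific candidate for $U$. As an illustration (this choice is an assumption, not part of the proof), take the 2-qubit XOR gate, which copies the basis states $|0\rangle$ and $|1\rangle$ correctly, and apply it to the superposition $|\gamma\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$.

```python
import math

def xor_gate(psi):
    """2-qubit XOR in the basis |00>, |01>, |10>, |11>: it swaps |10> and |11>."""
    return [psi[0], psi[1], psi[3], psi[2]]

def tensor(u, v):
    """Tensor product of two single-qubit states."""
    return [a * b for a in u for b in v]

zero, one = [1.0, 0.0], [0.0, 1.0]
s = 1 / math.sqrt(2)
gamma = [s, s]                               # (|0> + |1>)/sqrt(2)

copied_zero = xor_gate(tensor(zero, zero))   # |00>: |0> is copied
copied_one = xor_gate(tensor(one, zero))     # |11>: |1> is copied
attempt = xor_gate(tensor(gamma, zero))      # (|00> + |11>)/sqrt(2)
target = tensor(gamma, gamma)                # (|00> + |01> + |10> + |11>)/2

print(attempt)
print(target)
```

The gate clones both basis states, but for the superposition it produces the entangled state $(|00\rangle + |11\rangle)/\sqrt{2}$ instead of $|\gamma\gamma\rangle$, exactly as in Eq. (62).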

6 Dense coding

In this section, I describe a way of sending two bits of classical information by transmitting only a single qubit. This sounds completely impossible, but, again, quantum mechanics is in principle able to realise the impossible. It is however difficult to implement, as it is based on Bob and Alice having an entangled pair of qubits, in the state
$$|00\rangle + |11\rangle. \tag{63}$$
From now on, we shall adopt the convention in this field to omit normalisation factors in front of the wavefunctions. We can imagine this state to be realised by having an entangled pair generator midway between Alice and Bob, sending entangled particles in opposite directions as in the EPR setup.
Note that the following qubit operations are all unitary:
$$I |\phi\rangle = |\phi\rangle \tag{64a}$$
$$X |0\rangle = |1\rangle, \tag{64b}$$
$$X |1\rangle = |0\rangle \tag{64c}$$
$$Z |0\rangle = |0\rangle, \tag{64d}$$
$$Z |1\rangle = -|1\rangle. \tag{64e}$$
$$Y |0\rangle = |1\rangle \tag{64f}$$
$$Y |1\rangle = -|0\rangle, \qquad Y = XZ. \tag{64g}$$
The operator $I$ is the identity; $X$ is called the NOT operator. We assume that Alice has a device with which she can perform any of the four transformations $(I, X, Y, Z)$ on her member (i.e. the first) of the entangled qubit pair. The resulting, mutually orthogonal, states for these four transformations are:
$$I (|00\rangle + |11\rangle) = |00\rangle + |11\rangle \tag{65a}$$
$$X (|00\rangle + |11\rangle) = |10\rangle + |01\rangle \tag{65b}$$
$$Y (|00\rangle + |11\rangle) = |10\rangle - |01\rangle \tag{65c}$$
$$Z (|00\rangle + |11\rangle) = |00\rangle - |11\rangle \tag{65d}$$
Alice does not perform any measurement: she performs one of these four transformations and then she sends her qubit to Bob. Bob then measures in which of the four possible states the entangled pair is; in other words, he now knows which transformation Alice applied. This information is ‘worth’ two bits, but Alice had to send only one qubit to Bob!
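The protocol can be simulated in a few lines. In the sketch below, which is an illustration with hypothetical helper names, Alice applies one of the four gates to the first qubit of the unnormalised state (63), and Bob identifies the gate from the overlap of the received state with the four possible states of Eq. (65). Bob’s measurement is modelled simply as picking the state with nonzero overlap, which is possible because the four states are mutually orthogonal.

```python
def apply_to_first(gate, psi):
    """Apply a single-qubit gate (2x2 nested list) to the first qubit of a two-qubit state."""
    out = [0, 0, 0, 0]
    for a in range(2):
        for b in range(2):
            for a_new in range(2):
                # gate[a_new][a] is the matrix element <a_new|gate|a>
                out[2 * a_new + b] += gate[a_new][a] * psi[2 * a + b]
    return out

def inner(u, v):
    """Inner product <u|v>."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

GATES = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1], [1, 0]],   # Y = XZ: Y|0> = |1>, Y|1> = -|0>
    "Z": [[1, 0], [0, -1]],
}
BELL = [1, 0, 0, 1]           # |00> + |11>, normalisation omitted as in Eq. (63)

def encode(label):
    """Alice applies the gate named by label to her qubit."""
    return apply_to_first(GATES[label], BELL)

def decode(state):
    """Bob finds which of the four orthogonal states he holds."""
    for label in GATES:
        if inner(encode(label), state) != 0:
            return label

for label in GATES:
    print(label, encode(label), decode(encode(label)))
```

Each of the four labels, i.e. two classical bits, is recovered without error.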

7 Quantum computing and Shor’s factorisation algorithm

A quantum computer is a device containing one or more sets of qubits (called registers), which can be initialised without ambiguity, which can evolve in a controlled way under the influence of unitary transformations, and which can be measured after completion of this evolution.
The most general single-qubit transformation is a four-parameter family. For more than one qubit, it can be shown that every nontrivial unitary transformation can be generated by combining single-qubit transformations of the form
$$U(\theta, \phi) = \begin{pmatrix} \cos(\theta/2) & -i e^{-i\phi} \sin(\theta/2) \\ -i e^{i\phi} \sin(\theta/2) & \cos(\theta/2) \end{pmatrix} \tag{66}$$

with one further unitary transformation involving more than a single qubit, the so-called 2-qubit XOR. This transformation acts on a qubit pair and has the following effect:

$$\mathrm{XOR}\,(|00\rangle) = |00\rangle \tag{67a}$$
$$\mathrm{XOR}\,(|01\rangle) = |01\rangle \tag{67b}$$
$$\mathrm{XOR}\,(|10\rangle) = |11\rangle \tag{67c}$$
$$\mathrm{XOR}\,(|11\rangle) = |10\rangle \tag{67d}$$

We see that the first qubit is left unchanged and the second one is the eXclusive OR of the two input bits. Unitary transformations are realised by hardware elements called gates.
Several proposals for building quantum computers exist. In the ion trap, an array of ions, each of which can be in either the ground state ($|0\rangle$) or the excited state ($|1\rangle$), is controlled by laser pulses. Coupling of neighbouring ions in order to realise an XOR-gate is achieved through a controlled momentum transfer to displacement excitations (phonons) of the chain.
Here in Delft, activities focus on arrays of Josephson junctions. Josephson junctions are very
thin layers of ordinary conductors separating two superconductors. Current can flow through these
junctions in either the clockwise or anti-clockwise direction (interpreted as 0 and 1 respectively).
Other initiatives include NMR devices and optical cavities. With this technique it has become possible
recently to factorise the number 15. Realisation of a working quantum computer will take at least a
few decades — if it will come at all.
A major problem in realising a working quantum computer is to ensure a unitary evolution. In practice, the system will always be coupled to the outside world. Quantum computing hinges upon the possibility to have controlled, coherent superpositions. Coherent superpositions are linear combinations of quantum states into another, pure state. As we have seen in the previous section, coupling to the environment may lead to entanglement which would cause the quantum computer to be described by a density operator rather than by a pure state. In particular, any phase relation between constitutive parts of a phase-coherent superposition is destroyed by coupling to the environment. We shall now treat this phenomenon in more detail.
Consider a qubit which interacts with its environment. We denote the state of the environment by the ket $|m\rangle$. The interaction is described by the following prescription:

$$|0\rangle |m\rangle \to |0\rangle |m_0\rangle; \tag{68a}$$
$$|1\rangle |m\rangle \to |1\rangle |m_1\rangle. \tag{68b}$$

In this interaction, the qubit itself does not change; if it did, our computer would be useless to start with.

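Suppose we start with a state

$$|0\rangle + e^{i\phi} |1\rangle \tag{69}$$
which is coupled to the environment. This coupling will induce the transition

$$\left( |0\rangle + e^{i\phi} |1\rangle \right) |m\rangle \to |0\rangle |m_0\rangle + e^{i\phi} |1\rangle |m_1\rangle. \tag{70}$$

Suppose this qubit is then fed into a so-called Hadamard gate, which has the effect
$$H |0\rangle = \frac{1}{\sqrt{2}} (|0\rangle + |1\rangle); \tag{71a}$$
$$H |1\rangle = \frac{1}{\sqrt{2}} (|0\rangle - |1\rangle). \tag{71b}$$
Then the outcome is, up to normalisation,
$$e^{i\phi/2} \left[ |0\rangle \left( e^{-i\phi/2} |m_0\rangle + e^{i\phi/2} |m_1\rangle \right) + |1\rangle \left( e^{-i\phi/2} |m_0\rangle - e^{i\phi/2} |m_1\rangle \right) \right]. \tag{72}$$

If we suppose that $\langle m_0|m_1\rangle$ is real, we find for the probabilities to measure the qubit in the state $|0\rangle$ or $|1\rangle$ (after normalisation) the squared norms of the environment vectors multiplying $|0\rangle$ and $|1\rangle$ in (72):
$$P_0 = \frac{1}{2} \left( 1 + \langle m_0|m_1\rangle \cos\phi \right) \tag{73a}$$
$$P_1 = \frac{1}{2} \left( 1 - \langle m_0|m_1\rangle \cos\phi \right) \tag{73b}$$
If there is no coupling, $m_0 = m_1 = m$, and we recognise the phase relation between the two states in the probabilities. On the other hand, if $\langle m_0|m_1\rangle = 0$, then we find for both probabilities 1/2, and the phase relation has disappeared completely.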
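The loss of the phase relation can be made explicit with a toy environment. The sketch below is an illustration under stated assumptions: the environment is modelled as a two-dimensional system with real vectors $|m_0\rangle = (1, 0)$ and $|m_1\rangle = (c, \sqrt{1-c^2})$, so that $\langle m_0|m_1\rangle = c$, and the probabilities are computed directly as the squared norms of the environment vectors that multiply $|0\rangle$ and $|1\rangle$ in (72).

```python
import cmath
import math

def hadamard_probabilities(phi, overlap):
    """P0 and P1 after the Hadamard gate, for a real environment overlap <m0|m1>."""
    # Two-dimensional model environment with the requested real overlap
    m0 = [1.0, 0.0]
    m1 = [overlap, math.sqrt(1.0 - overlap ** 2)]
    phase = cmath.exp(1j * phi)
    # Normalised environment vectors attached to |0> and |1> after the gate:
    # (|m0> + e^{i phi}|m1>)/2 and (|m0> - e^{i phi}|m1>)/2
    env0 = [(a + phase * b) / 2 for a, b in zip(m0, m1)]
    env1 = [(a - phase * b) / 2 for a, b in zip(m0, m1)]
    p0 = sum(abs(c) ** 2 for c in env0)
    p1 = sum(abs(c) ** 2 for c in env1)
    return p0, p1

for overlap in (1.0, 0.5, 0.0):
    print(overlap, hadamard_probabilities(0.0, overlap))
```

For $\phi = 0$ and full overlap the qubit is found in $|0\rangle$ with certainty; as the overlap decreases the two probabilities approach 1/2, whatever the phase.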
It is interesting to construct a density operator for the qubit in the final state (72). Consider a qubit

$$\alpha |0\rangle + \beta |1\rangle \tag{74}$$

which has interacted with its environment, so that we have the combined state

$$\alpha |0\rangle |m_0\rangle + \beta |1\rangle |m_1\rangle. \tag{75}$$

We can arrive at a density operator for the qubit only by performing the trace over the m-system only. Using (13) we find
$$\rho_{\text{qubit}} = \begin{pmatrix} |\alpha|^2 & \alpha\beta^* \langle m_1|m_0\rangle \\ \alpha^*\beta \langle m_0|m_1\rangle & |\beta|^2 \end{pmatrix}. \tag{76}$$
The eigenvalues of this matrix are
$$\lambda = \frac{1}{2} \pm \frac{1}{2} \sqrt{ (|\alpha|^2 - |\beta|^2)^2 + 4 |\alpha|^2 |\beta|^2 |\langle m_0|m_1\rangle|^2 } \tag{77}$$
and these lie between 0 and 1, where, for nonzero $\alpha$ and $\beta$, the value 1 is reached only for $|\langle m_0|m_1\rangle| = 1$. The terms coherence/decoherence derive from the name coherence which is often used for the matrix element $\langle m_0|m_1\rangle$.
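The closed form (77) can be compared with the eigenvalues obtained from the trace and determinant of the matrix (76). The following sketch uses arbitrary sample values for $\alpha$, $\beta$ and the overlap $\langle m_0|m_1\rangle$; all names and numbers are assumptions made for this illustration.

```python
import math

def reduced_qubit_matrix(alpha, beta, overlap):
    """Matrix (76) for amplitudes alpha, beta and overlap = <m0|m1>."""
    # <m1|m0> is the complex conjugate of <m0|m1>
    return [[abs(alpha) ** 2, alpha * beta.conjugate() * overlap.conjugate()],
            [alpha.conjugate() * beta * overlap, abs(beta) ** 2]]

def eigenvalues_formula(alpha, beta, overlap):
    """Eigenvalues according to Eq. (77)."""
    root = math.sqrt((abs(alpha) ** 2 - abs(beta) ** 2) ** 2
                     + 4 * abs(alpha) ** 2 * abs(beta) ** 2 * abs(overlap) ** 2)
    return (0.5 - 0.5 * root, 0.5 + 0.5 * root)

def eigenvalues_direct(rho):
    """Roots of lambda^2 - Tr(rho) lambda + det(rho) = 0 for a 2x2 Hermitian matrix."""
    tr = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return ((tr - disc) / 2, (tr + disc) / 2)

alpha, beta = 0.6 + 0j, 0.8j       # |alpha|^2 + |beta|^2 = 1
overlap = 0.5 + 0.2j               # sample value of <m0|m1>
rho = reduced_qubit_matrix(alpha, beta, overlap)
print(eigenvalues_formula(alpha, beta, overlap))
print(eigenvalues_direct(rho))
```

Both routes give the same pair of eigenvalues, which add up to 1 and lie strictly between 0 and 1 as long as the overlap has modulus smaller than 1.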
Now let us return to the very process of quantum computing itself. The most impressive algorithm,
which was developed in 1994 by Peter Shor, is that of factorising large integers, an important problem

in the field of encryption and code-breaking. We shall not describe this algorithm in detail, but present
a brief sketch of an important sub-step, finding the period of an integer function f . It is assumed here
that all unitary transformations used can be realised with a limited number of gates.
The algorithm works with two registers, both containing $n$ qubits. Each register is described by a $2^n$-dimensional Hilbert space. As basis states we use the bit-sequences of the integers between 0 and $2^n - 1$. The basis state corresponding to such an integer $x$ is denoted $|x\rangle_n$. Now we apply the Hadamard gate (71) to all qubits of the state $|0\rangle_n$. This yields
$$H |0\rangle_n \equiv |w\rangle_n = 2^{-n/2} \sum_{x=0}^{2^n - 1} |x\rangle_n. \tag{78}$$

It is possible (but we shall not describe the method here) to construct, for any function $f$ which maps the set of numbers 0 to $2^n - 1$ into itself, a unitary transformation $U_f$ which has the effect

$$U_f |x\rangle_n |0\rangle_n = |x\rangle_n |f(x)\rangle_n \tag{79}$$

using a limited number of gates.

Now we are ready for the big trick in quantum computing. If we let $U_f$ act on the state $|w\rangle_n |0\rangle_n$ then we obtain
$$U_f |w\rangle_n |0\rangle_n = 2^{-n/2} \sum_{x=0}^{2^n - 1} |x\rangle_n |f(x)\rangle_n. \tag{80}$$

We see that the new state contains $f(x)$ for all possible values of $x$. In other words, applying the gates $U_f$ to our state $|w\rangle_n |0\rangle_n$, we have evaluated the function $f$ for $2^n$ different arguments. This feature is called quantum parallelism and it is this feature which is responsible for the (theoretical) performance of quantum computing.
Of course, if we were to read out the results of the computation for each x-value, we would have not gained much, as this would take $2^n$ operations. In general, however, the final result that we are after consists of only few data, so a useful problem does not consist of simply calculating $f$ for all of its possible arguments. As an example we consider the problem of finding the periodicity of the function $f$, which is an important step in Shor’s algorithm. This is done by reading out only one particular value of the result in the second register, $f(x) = u$, say. The first register is then the sum of all x-states for which it holds that $f(x) = u$. If $f$ has a period $r$, we will find that these x-values lie a distance $r$ apart from each other. Now we act with a (unitary) Fourier transform operator on this register, and the result will be a linear combination of the registers corresponding to the period(s) of the function $f$. If there is only one period, we can read this out straightforwardly.
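The period-finding step can be imitated on a classical machine for a very small register. The sketch below is only an illustration of the idea, not an implementation of Shor’s algorithm: it stores all $2^n$ amplitudes explicitly, so it has none of the quantum speed-up. The function $f(x) = 7^x \bmod 15$ (period 4), the register size $n = 4$, the measured value $u = 1$ and the sign convention of the discrete Fourier transform are all assumptions made for this example.

```python
import cmath
import math

def period_finding_distribution(f, n, u):
    """Probabilities of reading k in the first register, after the value u has been
    read in the second register and the first register has been Fourier transformed."""
    size = 2 ** n
    xs = [x for x in range(size) if f(x) == u]   # x-values compatible with f(x) = u
    amp = 1 / math.sqrt(len(xs))                 # renormalised after the read-out
    probs = []
    for k in range(size):
        # Discrete Fourier transform of the first register
        total = sum(amp * cmath.exp(2j * math.pi * x * k / size) for x in xs)
        probs.append(abs(total / math.sqrt(size)) ** 2)
    return probs

def f(x):
    """Sample function with period 4: 7^x mod 15."""
    return pow(7, x, 15)

probs = period_finding_distribution(f, 4, 1)
peaks = [k for k, p in enumerate(probs) if p > 1e-9]
print(peaks)
```

The x-values with $f(x) = 1$ are 0, 4, 8 and 12, and after the Fourier transform only the values $k = 0, 4, 8, 12$ occur, each with probability 1/4. The spacing $2^n/r = 4$ of these peaks gives the period $r = 4$.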
It has been said already that finding the period of some function is an important step in the factorising algorithm. Shor’s algorithm is able to factorise an n-bit integer in about $300 n^3$ steps. A very rough estimate of the size of the number to be factorised at which a quantum computer starts outperforming a classical machine is about $10^{130}$.
Appendix A

Review of Linear Algebra

1 Hilbert spaces

A Hilbert space is defined as a linear, closed inner product space. The notions of linearity, inner
product and closure may need some explanation.
• A linear vector space is a vector space in which any linear combination of vectors is an element
of that space. In other words, if $u$ and $v$ are elements of the space $\mathcal{H}$ and $\alpha$ and $\beta$ are complex numbers, then

$$\alpha u + \beta v \ \text{lies in}\ \mathcal{H}. \tag{1}$$

• An inner product is a scalar expression depending on two vectors $u$ and $v$. It is denoted by $\langle u|v\rangle$
and it satisfies the following requirements:
1. Conjugate symmetry:
$$\langle u|v\rangle = \langle v|u\rangle^* , \tag{2a}$$
where the asterisk denotes complex conjugation.
2. Linearity:
$$\langle w|\alpha u + \beta v\rangle = \alpha \langle w|u\rangle + \beta \langle w|v\rangle . \tag{2b}$$
3. Positive-definiteness:
$$\langle u|u\rangle \geq 0, \tag{2c}$$
and the equals-sign only holds when $u = 0$.
An inner product space is a linear vector space in which an inner product is defined.
• Closure means that if we take a converging sequence of vectors in the Hilbert space then the limit
of the sequence also lies inside the space.
We shall now discuss two examples of Hilbert spaces.
1. Linear vector space in finite dimension $N$. The elements are represented as column vectors:
$$u = |u\rangle = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{pmatrix} . \tag{3}$$

The elements $u_i$ are complex. The vector $\langle u|$ is conveniently denoted as

$$\langle u| = (u_1^*, u_2^*, \ldots, u_N^*). \tag{4}$$

It is called the Hermitian conjugate of the column vector $|u\rangle$; $\langle u|$ is often denoted as $|u\rangle^\dagger$. The
inner product $\langle u|v\rangle$ is the product between the row vector $\langle u|$ and the column vector $|v\rangle$; hence it
can be written as
$$\langle u|v\rangle = \sum_{i=1}^{N} u_i^* v_i . \tag{5}$$

This definition satisfies all the requirements of the inner product, mentioned above.

2. A second example is the space of square integrable functions, i.e. complex-valued functions $f$
depending on $n$ real variables $x_1, \ldots, x_n \equiv x$ satisfying
$$\int d^n x \, |f(x)|^2 < \infty. \tag{6}$$

Note that the $x$ may be restricted to some domain.

The inner product for complex-valued functions is defined as
$$\langle f|g\rangle = \int d^n x \, f^*(x) g(x). \tag{7}$$
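For the finite-dimensional example, the three requirements on the inner product can be checked numerically. The sketch below uses arbitrarily chosen vectors and coefficients (all of them assumptions made for illustration) and the sum of Eq. (5).

```python
def inner(u, v):
    """Inner product <u|v> = sum_i u_i^* v_i of Eq. (5)."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))

# Arbitrary example vectors and coefficients in a two-dimensional space.
u = [1 + 2j, 3j]
v = [2 - 1j, 1 + 1j]
w = [0.5j, -1 + 0j]
alpha, beta = 2 - 1j, 0.5 + 3j

# The linear combination alpha*u + beta*v, component by component.
combo = [alpha * ui + beta * vi for ui, vi in zip(u, v)]

print(inner(u, v), inner(v, u).conjugate())                       # (2a)
print(inner(w, combo), alpha * inner(w, u) + beta * inner(w, v))  # (2b)
print(inner(u, u))                                                # (2c)
```

Each of the first two lines prints the same number twice, and the last line prints a real, positive number.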

2 Operators

An operator transforms a vector into some other vector. We shall be mainly concerned with linear
operators $\hat T$, which, for any two complex numbers $\alpha$ and $\beta$, satisfy

$$\hat T (\alpha |u\rangle + \beta |v\rangle) = \alpha \hat T |u\rangle + \beta \hat T |v\rangle . \tag{8}$$

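Examples are operators represented by matrices in a finite-dimensional Hilbert space:

$$\begin{pmatrix} 1 & 2 & 3 \\ -1 & -2 & 1 \\ 1 & -1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 8 \\ -4 \\ -1 \end{pmatrix} . \tag{9}$$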

An example of a linear operator in function space is the derivative operator $\hat D = d/dx$:

$$\hat D f(x) = f'(x). \tag{10}$$

The Hermitian conjugate $\hat T^\dagger$ of an operator $\hat T$ is defined by

$$\left(\hat T |u\rangle\right)^\dagger = \langle u| \hat T^\dagger . \tag{11}$$

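As an example, consider a two-dimensional Hilbert space:

$$\hat T |u\rangle = \begin{pmatrix} T_{11} & T_{12} \\ T_{21} & T_{22} \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} T_{11} u_1 + T_{12} u_2 \\ T_{21} u_1 + T_{22} u_2 \end{pmatrix} . \tag{12}$$

Taking the Hermitian conjugate of this we have, using (4):

$$\left(\hat T |u\rangle\right)^\dagger = \left( T_{11}^* u_1^* + T_{12}^* u_2^* ,\; T_{21}^* u_1^* + T_{22}^* u_2^* \right) . \tag{13}$$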
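According to (11), this must be equal to

$$(u_1^*, u_2^*) \begin{pmatrix} T^\dagger_{11} & T^\dagger_{12} \\ T^\dagger_{21} & T^\dagger_{22} \end{pmatrix} \tag{14}$$

and we immediately see that

$$\hat T^\dagger = \begin{pmatrix} T_{11}^* & T_{21}^* \\ T_{12}^* & T_{22}^* \end{pmatrix} . \tag{15}$$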
We conclude that the Hermitian conjugate of a matrix is the transpose and complex conjugate of the
original. This result holds for matrices of arbitrary size.
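The conclusion can be checked numerically for a concrete case. The sketch below uses an arbitrarily chosen $2 \times 2$ matrix and vector (assumptions made for illustration) and verifies that conjugating the column vector $\hat T |u\rangle$ gives the same row vector as multiplying $\langle u|$ by the transposed, complex conjugated matrix.

```python
def dagger(T):
    """Hermitian conjugate of a matrix: transpose and complex conjugate."""
    return [[T[i][j].conjugate() for i in range(len(T))]
            for j in range(len(T[0]))]

def mat_vec(T, u):
    """Matrix times column vector."""
    return [sum(t * x for t, x in zip(row, u)) for row in T]

def row_mat(r, T):
    """Row vector times matrix."""
    return [sum(r[i] * T[i][j] for i in range(len(r)))
            for j in range(len(T[0]))]

# Arbitrary example matrix and vector.
T = [[1 + 1j, 2 + 0j],
     [3j, 4 - 2j]]
u = [1 - 1j, 2j]

lhs = [c.conjugate() for c in mat_vec(T, u)]            # (T|u>)^dagger
rhs = row_mat([c.conjugate() for c in u], dagger(T))    # <u| T^dagger
print(lhs)
print(rhs)
```

Both printed rows agree, as required by the definition of the Hermitian conjugate.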
Now let us find the Hermitian conjugate of the operator $\hat D = d/dx$:
$$\langle f| \hat D |g\rangle = \langle g| \hat D^\dagger |f\rangle^* . \tag{16}$$

Writing out the integral expressions for the inner product we have:
$$\langle f| \hat D |g\rangle = \int dx\, f^*(x) \frac{d}{dx} g(x) = -\int dx \left[\frac{d}{dx} f^*(x)\right] g(x) = -\int dx\, g(x) \hat D f^*(x) = \langle g| {-\hat D} |f\rangle^* , \tag{17}$$
where we have used partial integration to arrive at the second equality and we have assumed that the
integrated terms vanish. This condition holds for virtually all sensible quantum systems. Comparing
(16) with (17), we see that
$$\hat D^\dagger = -\hat D. \tag{18}$$
A Hermitian operator $\hat H$ is an operator satisfying

$$\hat H^\dagger = \hat H. \tag{19}$$

We have seen that the differentiation operator $\hat D$ is not Hermitian; however, $\hat D^2$ is.
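The statement about the square of the differentiation operator follows in one line from $\hat D^\dagger = -\hat D$. The step below uses two standard identities that are not derived in these notes, $(\hat A \hat B)^\dagger = \hat B^\dagger \hat A^\dagger$ and $(c \hat A)^\dagger = c^* \hat A^\dagger$ for a complex number $c$; the second identity shows in the same way that multiplying $\hat D$ by $i$ also gives a Hermitian operator.

```latex
(\hat D^2)^\dagger = \hat D^\dagger \hat D^\dagger = (-\hat D)(-\hat D) = \hat D^2 ,
\qquad
(i\hat D)^\dagger = -i\,\hat D^\dagger = i\hat D .
```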
A unitary operator $\hat U$ is an operator which satisfies

$$\hat U \hat U^\dagger = \hat U^\dagger \hat U = \hat I, \tag{20}$$

where $\hat I$ is the unit operator which leaves any vector unchanged, $\hat I |u\rangle = |u\rangle$.
An eigenvector of a linear operator $\hat T$ is a nonzero vector which satisfies

$$\hat T |u\rangle = \lambda |u\rangle , \tag{21}$$

where $\lambda$ is a complex number, which is called the eigenvalue. In geometrical terms, this means that
an eigenvector which is operated on by $\hat T$ may change its length, but not its direction. Eigenvectors are
extremely important in quantum mechanics, as we shall see in this course. Eigenvalues are said to be
degenerate if they are shared by at least two linearly independent eigenvectors.
For an Hermitian operator we have the following:

• The eigenvectors span the whole Hilbert space, which means that any vector of the space can be
written as a linear combination of the eigenvectors. This property of the eigenvectors is called
completeness.

• All eigenvalues are real.

• Any two eigenvectors belonging to distinct eigenvalues are mutually orthogonal.

In the special case of a finite dimensional Hilbert space, the matrix representation of an Hermitian
operator $\hat H$ satisfies
$$\widehat{\mathrm{Diag}} = \hat S \hat H \hat S^\dagger \tag{22}$$
where the matrix $\widehat{\mathrm{Diag}}$ is diagonal, i.e. only its diagonal elements are nonzero, and the columns of the matrix
$\hat S^\dagger$ are the normalised eigenvectors of $\hat H$.
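These properties can be verified explicitly for a small example. The sketch below uses a $2 \times 2$ Hermitian matrix chosen for illustration (an assumption, as are all helper names) and computes its eigenvalues from the trace and determinant. The convention used is that the rows of $\hat S$ are the complex conjugated normalised eigenvectors, so that the columns of $\hat S^\dagger$ are the eigenvectors themselves; with this convention $\hat S \hat H \hat S^\dagger$ is diagonal.

```python
import math

# Example Hermitian matrix (chosen for illustration).
H = [[2 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def dagger(A):
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

# Eigenvalues of a 2x2 matrix from its trace and determinant.
trace = (H[0][0] + H[1][1]).real
det = (H[0][0] * H[1][1] - H[0][1] * H[1][0]).real
root = math.sqrt(trace ** 2 - 4 * det)
eigenvalues = [(trace - root) / 2, (trace + root) / 2]

def eigenvector(lam):
    # Solves the first row of (H - lam I) v = 0; valid since H[0][1] != 0.
    v = [H[0][1], lam - H[0][0]]
    norm = math.sqrt(sum(abs(c) ** 2 for c in v))
    return [c / norm for c in v]

vectors = [eigenvector(lam) for lam in eigenvalues]
overlap = sum(a.conjugate() * b for a, b in zip(vectors[0], vectors[1]))

# Rows of S are the conjugated eigenvectors, i.e. the columns of
# S^dagger are the eigenvectors themselves.
S = [[c.conjugate() for c in v] for v in vectors]
diag = matmul(matmul(S, H), dagger(S))

print("eigenvalues:", eigenvalues)
print("overlap of the two eigenvectors:", abs(overlap))
print("S H S^dagger:", diag)
```

The eigenvalues come out real, the two eigenvectors are orthogonal, and the transformed matrix has the eigenvalues on its diagonal and (up to rounding) zeros elsewhere.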
Two operators  and B̂ are said to commute if their product does not depend on the order in which
it is evaluated:
 and B̂ commute if ÂB̂ = B̂Â. (23)
For two commuting operators  and B̂ it holds that any nondegenerate eigenvector of B̂ is also an
eigenvector of Â. If however  has a degenerate eigenvalue, then there can always be found a special
orthogonal basis in the degenerate eigenspace of that eigenvalue such that all basis vectors are also
eigenvectors of B̂, with eigenvalues which may or may not be degenerate.
Appendix B

The time-dependent Schrödinger equation

The time-dependent Schrödinger equation reads:

$$i\hbar \frac{\partial}{\partial t} \psi(R,t) = \hat H \psi(R,t). \tag{1}$$
The coordinate $R$ denotes any dependence other than time. Usually $R$ contains space coordinates of
the particle(s) in the system and their spin.
In Dirac vector notation, the Schrödinger equation can be written as

$$i\hbar \frac{\partial}{\partial t} |\psi(t)\rangle = \hat H |\psi(t)\rangle . \tag{2}$$
This equation has a formal solution for the case where the Hamiltonian $\hat H$ does not depend on time:
$$|\psi(t)\rangle = e^{-it\hat H/\hbar} |\psi(t=0)\rangle . \tag{3}$$
This expression is difficult to evaluate as it involves the exponential of an operator.
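In case we know the eigenvectors $|\varphi_n\rangle$ and eigenvalues $E_n$ of the Hamiltonian:

$$\hat H |\varphi_n\rangle = E_n |\varphi_n\rangle , \tag{4}$$

the solution is not difficult to find. We have for any eigenvector $|\varphi_n\rangle$:

$$|\varphi_n(t)\rangle = e^{-itE_n/\hbar} |\varphi_n\rangle . \tag{5}$$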

Because of completeness, we can write $|\psi(t=0)\rangle$ as

$$|\psi(t=0)\rangle = \sum_n c_n |\varphi_n\rangle , \tag{6}$$

and we see that the following solution

$$|\psi(t)\rangle = \sum_n c_n e^{-itE_n/\hbar} |\varphi_n\rangle \tag{7}$$

satisfies the time-dependent Schrödinger equation with starting value $|\psi(t=0)\rangle$, as can easily be
verified by substitution.
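The expansion in eigenvectors translates directly into a small numerical recipe. The sketch below is a minimal illustration under stated assumptions: a hypothetical two-level Hamiltonian with matrix $((0, 1), (1, 0))$, units in which $\hbar = 1$, and orthonormal eigenvectors, so that the expansion coefficients are $c_n = \langle \varphi_n | \psi(t=0) \rangle$.

```python
import cmath
import math

HBAR = 1.0  # units with hbar = 1 (assumption)

# Hypothetical two-level Hamiltonian H = [[0, 1], [1, 0]]:
# eigenvalues +1 and -1 with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
s = 1 / math.sqrt(2)
energies = [1.0, -1.0]
eigenvectors = [[s, s], [s, -s]]

def inner(u, v):
    """Inner product <u|v> for finite-dimensional vectors."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def evolve(psi0, t):
    """Return |psi(t)> = sum_n c_n exp(-i t E_n / hbar) |phi_n>."""
    # Expansion coefficients c_n = <phi_n|psi(0)> (orthonormal eigenvectors).
    coefficients = [inner(phi, psi0) for phi in eigenvectors]
    psi_t = [0j] * len(psi0)
    for c, energy, phi in zip(coefficients, energies, eigenvectors):
        phase = cmath.exp(-1j * energy * t / HBAR)
        for i, component in enumerate(phi):
            psi_t[i] += c * phase * component
    return psi_t

psi0 = [1.0, 0.0]
psi = evolve(psi0, math.pi / 2)
print(psi)
```

For this Hamiltonian the state starting in the first basis vector evolves as $(\cos t, -i \sin t)$, so at $t = \pi/2$ it has moved entirely to the second basis vector, while the norm stays equal to one.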
The stationary Schrödinger equation can be derived from the time-dependent Schrödinger equation
by a separation of variables. Let us try to write the solution to the time-dependent Schrödinger
equation in the form
$$\psi(R,t) = \Phi(R) Q(t). \tag{8}$$

Substitution into the time-dependent Schrödinger equation and division of the left- and right-hand
side by $\psi(R,t)$ leads to
$$\frac{i\hbar \, \partial Q(t)/\partial t}{Q(t)} = \frac{\hat H \Phi(R)}{\Phi(R)} . \tag{9}$$
On the left hand side, we have an expression depending on $t$, whereas on the right hand side we have
an expression depending on $R$. These two expressions can therefore be equal only when they are
constant. We call this constant the energy, $E_n$. This leads to the two equations

$$i\hbar \frac{\partial Q(t)}{\partial t} = E_n Q(t); \tag{10a}$$
$$\hat H \Phi(R) = E_n \Phi(R). \tag{10b}$$

The second equation is the stationary Schrödinger equation which is essentially an eigenvalue equation
for the operator $\hat H$. The first equation has as its solution a time-dependent phase factor $\exp(-iE_n t/\hbar)$
which must be multiplied by the eigenfunction of $\hat H$ at energy $E_n$ in order to obtain a solution to the
time-dependent Schrödinger equation. From (7) we see that the full solution to the time-dependent
Schrödinger equation can always be written as a linear combination of the solutions found via the
stationary approach.
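That the phase factor solves the first of the two separated equations can be checked by direct substitution. Here $Q(0)$ denotes the value of $Q$ at $t = 0$, a constant introduced for this check.

```latex
Q(t) = Q(0)\, e^{-iE_n t/\hbar}
\quad\Longrightarrow\quad
i\hbar \frac{\partial Q(t)}{\partial t}
  = i\hbar \left(-\frac{iE_n}{\hbar}\right) Q(0)\, e^{-iE_n t/\hbar}
  = E_n Q(t).
```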
Appendix C

Review of the Schrödinger equation in one dimension

In your first quantum mechanics course, you have encountered the stationary Schrödinger equation in
one dimension. In this appendix we review briefly some aspects of this equation and its solutions.
The stationary Schrödinger equation in one dimension reads
$$\left[ -\frac{\hbar^2}{2m} \frac{d^2}{dx^2} + V(x) \right] \psi(x) = E\psi(x). \tag{1}$$

This is an eigenvalue equation: on the left hand side, we have an operator acting on the wave function
$\psi(x)$, and the result must be proportional to $\psi$, with proportionality constant $E$, the energy. A
restriction on the possible solutions is that they must be square integrable, that is, they must have finite
norm.
The solution of this equation is known analytically in a few cases only: the constant potential, the harmonic
oscillator and the Morse potential, which is related to the hydrogen atom. Here we shall restrict
ourselves to the constant potential, $V(x) = V$.
For $E > V$, the solutions can be written as

$$\psi(x) = e^{\pm ikx}, \tag{2}$$
with
$$k^2 = 2m(E - V)/\hbar^2 . \tag{3}$$
In Eq. (2), the $+$ sign in the exponent corresponds to a wave running to the right, and the $-$ sign
to a left-running wave. This can be seen when the solutions are multiplied by the appropriate
time-dependent phase factor $\exp(-iEt/\hbar)$. The solution $\exp(\pm ikx)$ is not normalisable. Nevertheless, it
is accepted as a solution, because it is the limit of a sequence of normalisable solutions $\psi_n$ which
are of the form $\exp(\pm ikx)$ for $-n < x < n$, and which are smoothly cut off to zero beyond these two
points.
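The direction in which the plane waves run can be made explicit by following a point of constant phase. In the lines below, $k$ is taken positive and $\omega \equiv E/\hbar$ is a symbol introduced here for convenience.

```latex
e^{\pm ikx}\, e^{-iEt/\hbar} = e^{i(\pm kx - \omega t)}, \qquad \omega \equiv E/\hbar ,
\\
\pm kx - \omega t = \text{const}
\quad\Longrightarrow\quad
x(t) = \pm \frac{\omega}{k}\, t + \text{const}.
```

For the $+$ sign a point of constant phase moves towards larger $x$, that is, to the right; for the $-$ sign it moves to the left.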
For $E < V$, the solution is
$$\psi(x) = e^{\pm qx}, \tag{4}$$
with
$$q^2 = 2m(V - E)/\hbar^2 . \tag{5}$$
When the solution extends to $x \to \infty$, only the $-$ sign is allowed as a result of the normalisability of the
wave function. For $x \to -\infty$, only the $+$ sign is admissible for the same reason. The difference with
$\exp(\pm ikx)$ (which is also not normalisable but nevertheless accepted) is that it is not possible to find a
sequence of normalisable solutions whose limit behaves as a diverging $\exp(\pm qx)$.

We often deal with a potential which is piecewise constant. At the boundary between two regions
with constant potential, the boundary condition must be met that the value and the derivative of the
wave function are equal on both sides. Often we do not care about the normalisation of the wave
function at first (we do however care about the normalisability!). In that case, the two matching
conditions for value and derivative can be replaced by a continuity condition for the so-called logarithmic
derivative $\psi'(x)/\psi(x)$ (the prime $'$ stands for the derivative).
We now consider a general (i.e. a non-constant) potential $V(x)$. When $E > V$ for $x \to \infty$ or $-\infty$,
then a solution can be found for all values of $E$ larger than $V$. In other words, the energy spectrum is
then continuous. When $E < V$ for $x \to \infty$ and $x \to -\infty$, then a normalisable solution is found only for
a discrete set of $E$-values. The spectrum is then discrete. Note that it is the normalisability condition
which restricts in this case the energy to a discrete set.
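Both the matching of logarithmic derivatives and the resulting discreteness of the spectrum can be seen in a simple piecewise constant example. The sketch below is an illustration under stated assumptions: a square well with $V = 0$ for $|x| < a$ and $V = V_0$ outside, units in which $\hbar = m = 1$, the particular values of `A` and `V0`, and only the solutions that are even in $x$. Inside the well such a solution is $\cos(kx)$, outside it is the decaying $\exp(-q|x|)$, and continuity of $\psi'/\psi$ at $x = a$ gives $k \tan(ka) = q$.

```python
import math

A = 1.0    # half-width of the well (assumption)
V0 = 10.0  # height of the potential outside the well (assumption)

def mismatch(E):
    """k sin(kA) - q cos(kA): zero exactly when k tan(kA) = q.

    This is the difference of the logarithmic derivatives at x = A,
    multiplied by cos(kA) to avoid the poles of the tangent.
    """
    k = math.sqrt(2 * E)
    q = math.sqrt(2 * (V0 - E))
    return k * math.sin(k * A) - q * math.cos(k * A)

def bisect(func, lo, hi, tol=1e-12):
    """Locate a sign change of func between lo and hi by bisection."""
    flo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = func(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def even_bound_states(samples=2000):
    """Scan 0 < E < V0 for energies at which the matching condition holds."""
    energies = []
    grid = [V0 * i / samples for i in range(1, samples)]
    for e1, e2 in zip(grid, grid[1:]):
        if mismatch(e1) * mismatch(e2) < 0:
            energies.append(bisect(mismatch, e1, e2))
    return energies

energies = even_bound_states()
print("even bound-state energies:", energies)
```

For these parameters the matching condition is satisfied at two isolated energies only; for every other energy below $V_0$ the solution that matches at the boundary grows exponentially on the outside and is not normalisable.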